Tag Archives: Best practices

Q: Is DevOps all about automation?

Answered succinctly, no. But automation is an important means to accomplishing the work. Whether you’re new to DevOps or migrating from another set of automation solutions, adopting new tooling with a small with a trial project or process will give you the base set of learnings to help scale and standardize automation across your entire organization. With this automation in place you will lay the foundation for measuring effectiveness and further progress toward your goals.

Tightening application security with Amazon CodeGuru

2020-12-01 Brian Farnhill

Post Syndicated from Brian Farnhill original https://aws.amazon.com/blogs/devops/tightening-application-security-with-amazon-codeguru/

Amazon CodeGuru is a developer tool powered by machine learning (ML) that provides intelligent recommendations for improving code quality and identifies an application’s most expensive lines of code. To help you find and remediate potential security issues in your code, Amazon CodeGuru Reviewer now includes an expanded set of security detectors. In this post, we discuss the new types of security issues CodeGuru Reviewer can detect.

Time to read			9 minutes
Services used			Amazon CodeGuru

The new security detectors are now a feature in CodeGuru Reviewer for Java applications. These detectors focus on finding security issues in your code before you deploy it. They extend CodeGuru Reviewer by providing additional security-specific recommendations to the existing set of application improvements it already recommends. When an issue is detected, a remediation recommendation and explanation is generated. This allows you to find and remediate issues before the code is deployed. These findings can help in addressing the OWASP top 10 web application security risks, with many of the recommendations being based on specific issues customers have had in this space.

You can run a security scan by creating a repository analysis. CodeGuru Reviewer now provides an additional option to get both code and security recommendations for Java codebases. Selecting this option enables you to find potential security vulnerabilities before they are promoted to production, and support users remaining secure when using your service.

Types of security issues CodeGuru Reviewer detects

Previously, CodeGuru Reviewer helped address security by detecting potential sensitive information leaks (such as personally identifiable information or credit card numbers). The additional CodeGuru Reviewer security detectors expand on this by addressing:

AWS API security best practices – Helps you follow security best practices when using AWS APIs, such as avoiding hard-coded credentials in API calls
Java crypto library best practices – Identifies when you’re not using best practices for common Java cryptography libraries, such as avoiding outdated cryptographic ciphers
Secure web applications – Inspects code for insecure handling of untrusted data, such as not sanitizing user-supplied input to protect against cross-site scripting, SQL injection, LDAP injection, path traversal injection, and more
AWS Security best practices – Developed in collaboration with AWS Security, these best practices help bring our internal expertise to customers

Examples of new security findings

The following are examples of findings that CodeGuru Reviewer security detectors can now help you identify and resolve.

AWS API security best practices

AWS API security best practice detectors inspect your code to identify issues that can be caused by not following best practices related to AWS SDKs and APIs. An example of a detected issue in this category is using hard-coded AWS credentials. Consider the following code:

import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.BasicAWSCredentials;

static String myKeyId ="AKIAX742FUDUQXXXXXXX";
static String mySecretKey = "MySecretKey";

public static void main(String[] args) {
    AWSCredentials creds = getCreds(myKeyId, mySecretKey);
}

static AWSCredentials getCreds(String id, String key) {
    return new BasicAWSCredentials(id, key);}
}

In this code, the variables myKeyId and mySecretKey are hard-coded in the application. This may have been done to move quickly, but it can also lead to these values being discovered and misused.

In this case, CodeGuru Reviewer recommends using environment variables or an AWS profile to store these values, because these can be retrieved at runtime and aren’t stored inside the application (or its source code). Here you can see an example of what this finding looks like in the console:

An example of the CodeGuru reviewer finding for IAM credentials in the AWS console

The recommendation suggests using environment variables or an AWS profile instead, and that after you delete or rotate the affected key you monitor it with CloudWatch for any attempted use. Following the learn more link, you’ll see additional detail and recommended approaches for remediation, such as using the DefaultAWSCredentialsProviderChain. An example of how to remediate this in the preceding code is to update the getCreds() function:

import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;

static AWSCredentials getCreds() {
    DefaultAWSCredentialsProviderChain creds =
        new DefaultAWSCredentialsProviderChain();
    return creds.getCredentials();
}

Java crypto library best practices

When working with data that must be protected, cryptography provides mechanisms to encrypt and decrypt the information. However, to ensure the security of this data, the application must use a strong and modern cipher. Consider the following code:

import javax.crypto.Cipher;

static final String CIPHER = "DES";

public void run() {
    cipher = Cipher.getInstance(CIPHER);
}

A cipher object is created with the DES algorithm. CodeGuru Reviewer recommends a stronger cipher to help protect your data. This is what the recommendation looks like in the console:

An example of the CodeGuru reviewer finding for encryption ciphers in the AWS console

Based on this, one example of how to address this is to substitute a different cipher:

static final String CIPHER ="RSA/ECB/OAEPPadding";

This is just one option for how it could be addressed. The CodeGuru Reviewer recommendation text suggests several options, and a link to documentation to help you choose the best cipher.

Secure web applications

When working with sensitive information in cookies, such as temporary session credentials, those values must be protected from interception. This is done by flagging the cookies as secure, which prevents them from being sent over an unsecured HTTP connection. Consider the following code:

import javax.servlet.http.Cookie;

public static void createCookie() {
    Cookie cookie = new Cookie("name", "value");
}

In this code, a new cookie is created that is not marked as secure. CodeGuru Reviewer notifies you that you could make a correction by adding:

cookie.setSecure(true);

This screenshot shows you an example of what the finding looks like.

An example CodeGuru finding that shows how to ensure cookies are secured.

AWS Security best practices

This category of detectors has been built in collaboration with AWS Security and assists in detecting many other issue types. Consider the following code, which illustrates how a string can be re-encrypted with a new key from AWS Key Management Service (AWS KMS):

import java.nio.ByteBuffer;
import com.amazonaws.services.kms.AWSKMS;
import com.amazonaws.services.kms.AWSKMSClientBuilder;
import com.amazonaws.services.kms.model.DecryptRequest;
import com.amazonaws.services.kms.model.EncryptRequest;

AWSKMS client = AWSKMSClientBuilder.standard().build();
ByteBuffer sourceCipherTextBlob = ByteBuffer.wrap(new byte[]{1, 2, 3, 4, 5, 6, 7, 8, 9, 0});

DecryptRequest req = new DecryptRequest()
                         .withCiphertextBlob(sourceCipherTextBlob);
ByteBuffer plainText = client.decrypt(req).getPlaintext();

EncryptRequest res = new EncryptRequest()
                         .withKeyId("NewKeyId")
                         .withPlaintext(plainText);
ByteBuffer ciphertext = client.encrypt(res).getCiphertextBlob();

This approach puts the decrypted value at risk by decrypting and re-encrypting it locally. CodeGuru Reviewer recommends using the ReEncrypt method—performed on the server side within AWS KMS—to avoid exposing your plaintext outside AWS KMS. A solution that uses the ReEncrypt object looks like the following code:

import com.amazonaws.services.kms.model.ReEncryptRequest;

ReEncryptRequest req = new ReEncryptRequest()
                           .withCiphertextBlob(sourceCipherTextBlob)
                           .withDestinationKeyId("NewKeyId");

client.reEncrypt(req).getCiphertextBlob();

This screenshot shows you an example of what the finding looks like.

An example CodeGuru finding to show how to avoid decrypting and encrypting locally when it's not needed

Detecting issues deep in application code

Detecting security issues can be made more complex by the contributing code being spread across multiple methods, procedures and files. This separation of code helps ensure humans work in more manageable ways, but for a person to look at the code, it obscures the end to end view of what is happening. This obscurity makes it harder, or even impossible to find complex security issues. CodeGuru Reviewer can see issues regardless of these boundaries, deeply assessing code and the flow of the application to find security issues throughout the application. An example of this depth exists in the code below:

import java.io.UnsupportedEncodingException;
import javax.servlet.http.Cookie;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

private String decode(final String val, final String enc) {
    try {
        return java.net.URLDecoder.decode(val, enc);
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }
    return "";
}

public void pathTraversal(HttpServletRequest request) throws IOException {
    javax.servlet.http.Cookie[] theCookies = request.getCookies();
    String path = "";
    if (theCookies != null) {
        for (javax.servlet.http.Cookie theCookie : theCookies) {
            if (theCookie.getName().equals("thePath")) {
                path = decode(theCookie.getValue(), "UTF-8");
                break;
            }
        }
    }
    if (!path.equals("")) {
        String fileName = path + ".txt";
        String decStr = new String(org.apache.commons.codec.binary.Base64.decodeBase64(
            org.apache.commons.codec.binary.Base64.encodeBase64(fileName.getBytes())));
        java.io.FileOutputStream fileOutputStream = new java.io.FileOutputStream(decStr);
        java.io.FileDescriptor fd = fileOutputStream.getFD();
        System.out.println(fd.toString());
    }
}

This code presents an issue around path traversal, specifically relating to the Broken Access Control rule in the OWASP top 10 (specifically CWE 22). The issue is that a FileOutputStream is being created using an external input (in this case, a cookie) and the input is not being checked for invalid values that could traverse the file system. To add to the complexity of this sample, the input is encoded and decoded from Base64 so that the cookie value isn’t passed directly to the FileOutputStream constructor, and the parsing of the cookie happens in a different function. This is not something you would do in the real world as it is needlessly complex, but it shows the need for tools that can deeply analyze the flow of data in an application. Here the value passed to the FileOutputStream isn’t an external value, it is the result of the encode/decode line and as such, is a new object. However CodeGuru Reviewer follows the flow of the application to understand that the input still came from a cookie, and as such it should be treated as an external value that needs to be validated. An example of a fix for the issue here would be to replace the pathTraversal function with the sample shown below:

static final String VALID_PATH1 = "./test/file1.txt";
static final String VALID_PATH2 = "./test/file2.txt";
static final String DEFAULT_VALID_PATH = "./test/file3.txt";

public void pathTraversal(HttpServletRequest request) throws IOException {
    javax.servlet.http.Cookie[] theCookies = request.getCookies();
    String path = "";
    if (theCookies != null) {
        for (javax.servlet.http.Cookie theCookie : theCookies) {
            if (theCookie.getName().equals("thePath")) {
                path = decode(theCookie.getValue(), "UTF-8");
                break;
            }
        }
    }
    String fileName = "";
    if (!path.equals("")) {
        if (path.equals(VALID_PATH1)) {
            fileName = VALID_PATH1;
        } else if (path.equals(VALID_PATH2)) {
            fileName = VALID_PATH2;
        } else {
            fileName = DEFAULT_VALID_PATH;
        }
        String decStr = new String(org.apache.commons.codec.binary.Base64.decodeBase64(
            org.apache.commons.codec.binary.Base64.encodeBase64(fileName.getBytes())));
        java.io.FileOutputStream fileOutputStream = new java.io.FileOutputStream(decStr);
        java.io.FileDescriptor fd = fileOutputStream.getFD();
        System.out.println(fd.toString());
    }
}

The main difference in this sample is that the path variable is tested against known good values that would prevent path traversal, and if one of the two valid path options isn’t provided, the third default option is used. In all cases the externally provided path is validated to ensure that there isn’t a path through the code that allows for path traversal to occur in the subsequent call. As with the first sample, the path is still encoded/decoded to make it more complicated to follow the flow through, but the deep analysis performed by CodeGuru Reviewer can follow this and provide meaningful insights to help ensure the security of your applications.

Extending the value of CodeGuru Reviewer

CodeGuru Reviewer already recommends different types of fixes for your Java code, such as concurrency and resource leaks. With these new categories, CodeGuru Reviewer can let you know about security issues as well, bringing further improvements to your applications’ code. The new security detectors operate in the same way that the existing detectors do, using static code analysis and ML to provide high confidence results. This can help avoid signaling non-issue findings to developers, which can waste time and erode trust in the tool.

You can provide feedback on recommendations in the CodeGuru Reviewer console or by commenting on the code in a pull request. This feedback helps improve the performance of the reviewer, so the recommendations you see get better over time.

Conclusion

Security issues can be difficult to identify and can impact your applications significantly. CodeGuru Reviewer security detectors help make sure you’re following security best practices while you build applications.

CodeGuru Reviewer is available for you to try. For full repository analysis, the first 30,000 lines of code analyzed each month per payer account are free. For pull request analysis, we offer a 90 day free trial for new customers. Please check the pricing page for more details. For more information, see Getting started with CodeGuru Reviewer.

About the author

Brian Farnhill

Brian Farnhill is a Developer Specialist Solutions Architect in the Australian Public Sector team. His background is building solutions and helping customers improve DevOps tools and processes. When he isn’t working, you’ll find him either coding for fun or playing online games.

Application integration patterns for microservices: Orchestration and coordination

2020-11-18 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/application-integration-patterns-for-microservices-orchestration-and-coordination/

This post is courtesy of Stephen Liedig, Sr. Serverless Specialist SA.

This is the final blog post in the “Application Integration Patterns for Microservices” series. Previous posts cover asynchronous messaging for microservices, fan-out strategies, and scatter-gather design patterns.

In this post, I look at how to implement messaging patterns to help orchestrate and coordinate business workflows in our applications. Specifically, I cover two patterns:

Pipes and Filters, as presented in the book “Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions” (Hohpe and Woolf, 2004)
Saga Pattern, which is a design pattern for dealing with “long-lived transactions” (LLT), published by Garcia-Molina and Salem in 1987.

I discuss these patterns using the Wild Rydes example from this series.

Wild Rydes

Wild Rydes is a fictional technology start-up created to disrupt the transportation industry by replacing traditional taxis with unicorns. Several hands-on AWS workshops use the Wild Rydes scenario. It illustrates concepts such as serverless development, event-driven design, API management, and messaging in microservices.

This blog post explores the build process of the Wild Rydes workshop, to help you apply these concepts to your applications.

After completing a unicorn ride, the Wild Rydes customer application charges the customer. Once the driver submits a ride completion, an event triggers the following steps:

Registers the fare: registers the fare ride completion event.
Initiates the payment (via a payment service): calls a payment gateway for credit card pre-authorization. Using the pre-authorization code, it completes the payment transaction.
Updates customer accounting system: once the payment is processed, updates the Wild Rydes customer accounting system with the transaction detail.
Publishes “Fare Processed” event: sends a notification to interested components that the process is completed.

Each of the steps interfaces with separate systems – the Wild Rydes system, a third-party payment provider, and the customer accounting system. You could implement these steps inside a single component, but that would make it difficult to change and adapt. It’d also reduce the potential for components reuse within our application. Breaking down the steps into individual components allows you to build components with a single responsibility making it easier to manage each components dependencies and application lifecycle. You can be selective about how you implement the respective components, for example, different teams responsible for the development of the respective components may choose to use different languages. This is where the Pipes and Filters architectural pattern can help.

Pipes and filters

Hohpe and Woolf define Pipes and Filters as an “architectural style to divide a larger processing task into a sequence of smaller, independent processing steps (filters) that are connected by channels (pipes).”

Pipes provide a communications channel that abstracts the consumer of messages sent through that channel. It decouples your filter from one another, so components only need to know the messaging channel, or endpoint, where they are sending messages. They do not know who, or what, is processing that message, or where the receiver is located on the network.

Amazon SQS provides a lightweight solution with the power and scale of messaging middleware. It is a simple, flexible, fully managed message queuing service for reliably and continuously exchanging large volume of messages. It has virtually limitless scalability and the ability to increase message throughput without pre-provisioning capacity.

You can create an SQS queue with this AWS CLI command:

aws sqs create-queue --queue-name MyQueue

For the fare processing scenario, you could implement a Pipes and Filters architectural pattern using AWS services. This uses two Amazon SQS queues and an Amazon SNS topic:

Amazon SQS provides a mechanism for decoupling the components. The filters only need to know to which queue to send the message, without knowing which component processes that message nor when it is processed. SQS does this in a secure, durable, and scalable way.

Despite the fact that none of the filters have a direct dependency on one another, there is still a degree of coupling at the pipe level. Changing execution order therefore forces you to update and redeploy your existing filters to point to a new pipe. In the Wild Rydes example, you can reduce the impact of this by defining an environment variable for the destination endpoint in AWS Lambda function configuration, rather than hardcoding this inside your implementations.

Dealing with failures and retries requires some consideration too. In Amazon SQS terms, this requires you to define configurations, such a message VisibilityTimeOut. The VisibilityTimeOut setting provides you with some transactional support. It ensures that the message is not removed from the queue until after you have finished processing the message and you explicitly delete it from the queue. Using Amazon SQS as an Event Source for AWS Lambda further simplifies that for you because the message polling implementation is managed by the service, so you don’t need to create an explicit implementation in your filter.

Amazon SQS helps deal with failures gracefully as it maintains a count of how many times a message is processed via ReceiveCount. By specifying a maxReceiveCount, you can limit the number of times a poisoned message gets processed. Combine this with a dead letter queue (DLQ), you can then move messages that have exceeded the maxReceiveCount number to the DLQ. Adding Amazon CloudWatch alarms on metrics such as ApproximateNumberOfMessagesVisible on the DLQ, you can proactively alert on system failures if the number of messages on the dead letter queue exceed and acceptable threshold.

Alternatively, you can model the fare payment scenario with AWS Step Functions. Step Functions externalizes the Pipes and Filters pattern. It extracts the coordination from the filter implementations into a state machine that orchestrates the sequence of events. Visual workflows allow you to change the sequence of execution without modifying code, reducing the amount of coupling between collaborating components.

Here is how you could model the fare processing scenario using Step Functions:

{
  "Comment": "StateMachine for Processing Fare Payments",
  "StartAt": "RegisterFare",
  "States": {
    "RegisterFare": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:RegisterFareFunction",
      "Next": "ProcessPayment"
    },
    "ProcessPayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:ChargeFareFunction",
      "Next": "UpdateCustomerAccount"
    },
    "UpdateCustomerAccount": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:UpdateCustomerAccountFunction",
      "Next": "PublishFareProcessedEvent"
    },
    "PublishFareProcessedEvent": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sns:publish",
      "Parameters": {
        "TopicArn": "arn:aws:sns:REGION:ACCOUNT_ID:myTopic",
        "Message": {
          "Input": "Hello from Step Functions!"
        }
      },
      "End": true
    }
  }
}

AWS Step Functions allows you to easily build more sophisticated workflows. These could include decision points, parallel processing, wait states to pause the state machine execution, error handling, and retry logic. Error and Retry states help you simplify your component implementation by providing a framework for error handling and implementation exponential backoff on retries. You can define alternate execution paths if failures cannot be handled.

In this implementation, each of these states is a discrete transaction. Some implement database transactions when registering the fare, others are calling the third-party payment provider APIs, and internal APIs or programming interfaces when updating the customer accounting system.

Dealing with each of these transactions independently is relatively straightforward. But what happens if you require consistency across all steps so that either all or none of the transactions complete? How can you deal with consistency across multiple, distributed transactions? How do we deal with the temporal aspects of coordinating these potentially long running heterogeneous integrations?

Cloud providers do not support Distributed Transaction Coordinators (DTC) or two-phase commit protocols responsible for coordinating transactions across multiple cloud resources. Therefore, you need a mechanism to explicitly coordinate multiple local transactions. This is where the saga pattern and AWS Step Functions can help.

A saga is a design pattern for dealing with “long-lived transactions” (LLT), published by Garcia-Molina and Salem in 1987, they define the concept of a saga as:

“LLT is a saga if it can be written as a sequence of transactions that can be interleaved with other transactions.” (Garcia-Molina, Salem 1987)

Fundamentally, saga can provide a failure management pattern to establish consistency across all of your distributed applications, by implementing a compensating transaction for each step in a series of functions. Compensating transactions allow you to back out of the changes that were previously committed in your series of functions, so that if one of your steps fails you can “undo” what you did before, and leave your system in stable state, devoid of side-effects.

AWS Step Functions provides a mechanism for implementing a saga pattern with the ability to build fully managed state machines that allow you to catch custom business exceptions and manage and share data across state transitions.

Figure 1: Using Step Functions’ Service Integrations for Amazon DynamoDB and Amazon SNS, you can further reduce the need for a custom AWS Lambda implementation to persist data to the database, or send a notification.

By using these capabilities, you can expand on the previous Fare Processing state machine and implementing compensating transaction states. If Register Fare fails, you may want to emit an event that invokes an external support function or generates a notification informing operators of the system the error.

If payment processing failed, you would want to ensure that the status is updated to reflect state change and then notify operators of the failed event. You might decide to refund customers, update the fare status and notify support, until you have been able to resolve issues with the customer accounting system. Regardless of the approach, Step Functions allows you to model a failure scenario that aligns with a more business-centric view of consistency.

If you want to see the full state machine implementation in Lab4 of Wild Rydes Asynchronous Messaging Workshop. The workshop guides you through building your own state machine so you can see how to apply the pattern to your own scenarios. There are also three other workshops you can walk through that cover the other patterns in the series.

Conclusion

Using Wild Rydes, I show how to use Amazon SQS and AWS Step Functions to decouple your application components and services. I show you how these services help to coordinate and orchestrate distributed components to build resilient and fault tolerant microservices architectures.

Take part in the Wild Rydes Asynchronous Messaging Workshop and learn about the other messaging patterns you can apply to microservices architectures, including fan-out and message filtering, topic-queue-chaining and load balancing (blog post), and scatter-gather.

The Wild Rydes Asynchronous Messaging Workshop resources are hosted on our AWS Samples GitHub repository, including the sample code for this blog post under Lab-4: Choreography and orchestration.

For a deeper dive into queues and topics and how to use these in microservices architectures, read:

For more information on enterprise integration patterns, see:

Application integration patterns for microservices: Running distributed RFQs

2020-11-11 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/application-integration-patterns-running-distributed-rfqs/

This post is courtesy of Dirk Fröhner, Principal Solutions Architect.

The first blog in this series introduces asynchronous messaging for building loosely coupled systems that can scale, operate, and evolve individually. It considers messaging as a communications model for microservices architectures. Part 2 dives into fan-out strategies and applies the respective patterns to a concrete use case.

In this post, I look at how to apply messaging patterns to help coordinate distributed requests and responses. Specifically, I focus on a composite pattern called scatter-gather, as presented in the book “Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions” (Hohpe and Woolf, 2004).

I also show how a client can communicate with a backend via synchronous REST API operations while asynchronous messaging is applied internally for processing.

Overview

The use case is for Wild Rydes, a fictional application that replaces traditional taxis with unicorns. It’s used in several hands-on AWS workshops that illustrate serverless development concepts.

Wild Rydes wants to allow customers to initiate requests for quotation (RFQs) for their rides. This allows unicorns to make special offers to potential customers within a defined schedule. A customer can send their ride details and ask for quotations from all unicorns that are within a certain vicinity. The customer can then choose the best offer.

The scatter-gather pattern

The scatter-gather pattern can be used to implement this use-case on the server side. This pattern is ideal for requesting responses from multiple parties, then aggregating and processing that data.

As presented by Hohpe and Woolf, the scatter-gather pattern is a composite pattern that illustrates how to “broadcast a message to multiple recipients and re-aggregate the responses back into a single message”. The pattern is illustrated in the following diagram.

The flow starts with the Requester to initiate the broadcast to all potential Responders. This can be architected in a loosely coupled manner using pub-sub messaging with Amazon SNS or Amazon MQ, as shown in this blog post.

All responders must send their answers somewhere for aggregation and processing. This can also be architected in a loosely coupled manner using a message queue with Amazon SQS or Amazon MQ, as described in this blog post.

The Aggregator component consumes the individual responses from the response queue. It forwards the aggregate to the Processor component for final processing. Both Aggregator and Processor can be part of the same application or process. If separated, they can be decoupled through messaging. The Requester can also be part of the same application or process as Aggregator and Processor.

Explaining the architecture and API

In this section, I walk through the use-case and explain how it can be architected and implemented. I show how the scatter-gather pattern works in the backend, and the client-to-backend communication.

Submit instant ride RFQ

To initiate such an RFQ, the customer app communicates with the ride booking service on the backend. The ride booking service exposes a REST API. By default, an RFQ runs for five minutes, but Wild Rydes is working on a feature to let a customer individually set that value.

A request to submit an instant-ride RFQ contains start and destination locations for the ride and the customer ID:

POST /<submit-instant-ride-rfq-resource-path> HTTP/1.1
...

{
    "from": "...",
    "to": "...",
    "customer": "..."
}

The RFQ is a lengthy process so the client app should not expect an immediate response. Instead, the API accepts the RFQ, creates an RFQ task resource, and returns to the client. The response contains a URL to request an update for the status. It also provides an estimated time for the end of the RFQ:

HTTP/1.1 202 Accepted
...

{
    "links": {
        "self": "http://.../<rfq-task-resource-path>",
        "...": "..."
    },
    "status": "running",
    "eta": "..."
}

The following architecture shows this interaction, excluding the process after a new RFQ is submitted.

Processing the RFQ

The backend uses the scatter-gather pattern to publish the RFQ to unicorns and collect responses for aggregation and processing.

1. The ride booking service acts as the requester in the scatter-gather pattern. Following a new RFQ from the client app, it publishes the details into an SNS topic. This topic is related to the location of the ride’s starting point since customers need quotes from unicorns within the vicinity. These messages are the green request messages.

2. The unicorn management service maintains instances of unicorn management resources and subscribes them to RFQ topics related to their current location. These resources receive the RFQ request messages and handle the interaction with the Wild Rydes unicorn app.

3. The unicorns in the vicinity are notified through the Wild Rydes unicorn app about the new RFQ and can react if they are available. Notification options between the unicorn management service and the Wild Rydes unicorn app include push notifications and web sockets.

4. Every addressed unicorn can now submit their quote. All quotes go back through the unicorn management resources and the unicorn management service into the RFQ response queue. They act as the responders in the sense of the scatter-gather pattern.

5. The ride booking service also acts as aggregator and processor in the sense of the scatter-gather pattern. It uses SQS to consume messages from an RFQ response queue that eventually contains the RFQ responses from the involved unicorns. It starts doing so immediately after it publishes the details of a new RFQ into the RFQ topic. The messages from the RFQ response queue relate to the blue response messages.

The ride booking service consumes all incoming responses from that queue. This continues until the deadline or all participating unicorns have answered, whatever occurs first. The aggregator responsibility can be as simple as persisting the details of each incoming RFQ response into an Amazon DynamoDB table.

To match incoming responses to the right RFQ, it uses a fundamental integration pattern, correlation ID. In this pattern, a requester adds a unique ID to an outgoing message and each responder is asked to forward this ID in their response.

Also, responders must know where to send their responses to. To keep this dynamic, there is another fundamental integration pattern: return address. It suggests that a requester adds meta information into outgoing messages that indicate the address for their responses. In this architecture, this is the ARN of the SQS queue that acts as the RFQ response queue. This supports an option to simplify the response management: the RFQ response queue is a dedicated queue per customer.

Lastly, the processor responsibility in the ride booking service reads the RFQ responses from the DynamoDB table. It converts the data to JSON for the Wild Rydes customer app.

Check RFQ status

During the RFQ processing, a customer may want to know how many responses have already arrived, or if the results are already available. After submitting an instant ride RFQ, the client receives a representation of the running task. It can use the self-link to request an update:

GET /<rfq-task-resource-path> HTTP/1.1

While the task is running, a response from the ride booking service comes back with the respective status value and the count of responses that have already arrived:

HTTP/1.1 200 OK
...

{
    "links": {
        "self": "http://.../<rfq-task-resource-path>",
        "...": "..."
    },
    "status": "running",
    "responses-received": 2,
    "eta": "..."
}

After the RFQ is completed

An RFQ is completed if either the time is up or all unicorns have answered. The result of the RFQ is then available to the customer. If the client requests an update to the task representation, the response indicates this by redirecting to the RFQ result:

HTTP/1.1 303 See Other
Location: <url-of-rfq-result-resource>

Requesting a representation of the results resource, the client receives the quotes of all the participating unicorns. The frontend customer app can visualize these accordingly:

HTTP/1.1 200 OK
...

{
    "links": { ... },
    "from": "...",
    "to": "...",
    "customer": "...",
    "quotes": [ ... ]
}

The ride booking service can also use means of active notifications to make the customer app aware once the RFQ result is ready, including the link to the RFQ result. Examples for this include push notifications and web sockets.

Conclusion

In this blog, I present the scatter-gather pattern, which is a composite pattern based on pub-sub and point-to-point messaging channels. It also employs correlation ID and return address. I show how this is implemented in the Wild Rydes example application. You can use this integration pattern for communication in your microservices.

I cover how synchronous API communication between end user client and backend can work along with asynchronous messaging for request processing internally.

To learn more:

The use-cases of this blog series are discussed and implemented in our Decoupled Microservices hands-on workshop.
Watch the AWS re:Invent 2019 talk “Application integration patterns for microservices”. This explains both the fundamental integration patterns in addition to all the use-cases of this blog series.

For more serverless learning resources, visit https://serverlessland.com.

Integrating CloudEndure Disaster Recovery into your security incident response plan

2020-11-10 Gonen Stein

Post Syndicated from Gonen Stein original https://aws.amazon.com/blogs/security/integrating-cloudendure-disaster-recovery-into-your-security-incident-response-plan/

An incident response plan (also known as procedure) contains the detailed actions an organization takes to prepare for a security incident in its IT environment. It also includes the mechanisms to detect, analyze, contain, eradicate, and recover from a security incident. Every incident response plan should contain a section on recovery, which outlines scenarios ranging from single component to full environment recovery. This recovery section should include disaster recovery (DR), with procedures to recover your environment from complete failure. Effective recovery from an IT disaster requires tools that can automate preparation, testing, and recovery processes. In this post, I explain how to integrate CloudEndure Disaster Recovery into the recovery section of your incident response plan. CloudEndure Disaster Recovery is an Amazon Web Services (AWS) DR solution that enables fast, reliable recovery of physical, virtual, and cloud-based servers on AWS. This post also discusses how you can use CloudEndure Disaster Recovery to reduce downtime and data loss when responding to a security incident, and best practices for maintaining your incident response plan.

How disaster recovery fits into a security incident response plan

The AWS Well-Architected Framework security pillar provides guidance to help you apply best practices and current recommendations in the design, delivery, and maintenance of secure AWS workloads. It includes a recommendation to integrate tools to secure and protect your data. A secure data replication and recovery tool helps you protect your data if there’s a security incident and quickly return to normal business operation as you resolve the incident. The recovery section of your incident response plan should define recovery point objectives (RPOs) and recovery time objectives (RTOs) for your DR-protected workloads. RPO is the window of time that data loss can be tolerated due to a disruption. RTO is the amount of time permitted to recover workloads after a disruption.

Your DR response to a security incident can vary based on the type of incident you encounter. For example, your DR plan for responding to a security incident such as ransomware—which involves data corruption—should describe how to recover workloads on your secondary DR site using a recovery point prior to the data corruption. This use case will be discussed further in the next section.

In addition to tools and processes, your security incident response plan should define the roles and responsibilities necessary during an incident. This includes the people and roles in your organization who perform incident mitigation steps, in addition to those who need to be informed and consulted. This can include technology partners, application owners, or subject matter experts (SMEs) outside of your organization who can offer additional expertise. DR-related roles for your incident response plan include:

A person who analyzes the situation and provides visibility to decision-makers.
A person who decides whether or not to trigger a DR response.
A person who actively triggers the DR response.

Be sure to include all of the stakeholders you identify in your documented security incident response procedures and runbooks. Test your plan to verify that the people in these roles have the pre-provisioned access they need to perform their defined role.

How to use CloudEndure Disaster Recovery during a security incident

CloudEndure Disaster Recovery continuously replicates your servers—including OS, system state configuration, databases, applications, and files—to a staging area in your target AWS Region. The staging area contains low-cost resources automatically provisioned and managed by CloudEndure Disaster Recovery. This reduces the cost of provisioning duplicate resources during normal operation. Your fully provisioned recovery environment is launched only during an incident or drill.

If your organization experiences a security incident that can be remediated using DR, you can use CloudEndure Disaster Recovery to perform failover to your target AWS Region from your source environment. When you perform failover, CloudEndure Disaster Recovery orchestrates the recovery of your environment in your target AWS Region. This enables quick recovery, with RPOs of seconds and RTOs of minutes.

To deploy CloudEndure Disaster Recovery, you must first install the CloudEndure agent on the servers in your environment that you want to replicate for DR, and then initiate data replication to your target AWS Region. Once data replication is complete and your data is in sync, you can launch machines in your target AWS Region from the CloudEndure User Console. CloudEndure Disaster Recovery enables you to launch target machines in either Test Mode or Recovery Mode. Your launched machines behave the same way in either mode; the only difference is how the machine lifecycle is displayed in the CloudEndure User Console. Launch machines by opening the Machines page, shown in the following figure, and selecting the machines you want to launch. Then select either Test Mode or Recovery Mode from the Launch Target Machines menu.

Figure 1: Machines page on the CloudEndure User Console

You can launch your entire environment, a group of servers comprising one or more applications, or a single server in your target AWS Region. When you launch machines from the CloudEndure User Console, you’re prompted to choose a recovery point from the Choose Recovery Point dialog box (shown in the following figure).

Use point-in-time recovery to respond to security incidents that involve data corruption, such as ransomware. Your incident response plan should include a mechanism to determine when data corruption occurred. Knowing how to determine which recovery point to choose in the CloudEndure User Console helps you minimize response time during a security incident. Each recovery point is a point-in-time snapshot of your servers that you can use to launch recovery machines in your target AWS Region. Select the latest recovery point before the data corruption to restore your workloads on AWS, and then choose Continue With Launch.

Figure 2: Selection of an earlier recovery point from the Choose Recovery Point dialog box

Run your recovered workloads in your target AWS Region until you’ve resolved the security incident. When the incident is resolved, you can perform failback to your primary environment using CloudEndure Disaster Recovery. You can learn more about CloudEndure Disaster Recovery setup, operation, and recovery by taking this online CloudEndure Disaster Recovery Technical Training.

Test and maintain the recovery section of your incident response plan

Your entire incident response plan must be kept accurate and up to date in order to effectively remediate security incidents if they occur. A best practice for achieving this is through frequently testing all sections of your plan, including your tools. When you first deploy CloudEndure Disaster Recovery, begin running tests as soon as all of your replicated servers are in sync on your target AWS Region. DR solution implementation is generally considered complete when all initial testing has succeeded.

By correctly configuring the networking and security groups in your target AWS Region, you can use CloudEndure Disaster Recovery to launch a test workload in an isolated environment without impacting your source environment. You can run tests as often as you want. Tests don’t incur additional fees beyond payment for the fully provisioned resources generated during tests.

Testing involves two main components: launching the machines you wish to test on AWS, and performing user acceptance testing (UAT) on the launched machines.

Launch machines to test.

Select the machines to test from the Machines page of the CloudEndure User Console by selecting the check box next to the machine. Then choose Test Mode from the Launch Target Machines menu, as shown in the following figure. You can select the latest recovery point or an earlier recovery point.

Figure 3: Select Test Mode to launch selected machines

The following figure shows the CloudEndure User Console. The Disaster Recovery Lifecycle column shows that the machines have been Tested Recently.

Figure 4: Machines launched in Test Mode display purple icons in the Status column and Tested Recently in the Disaster Recovery Lifecycle column
Perform UAT testing.

Begin UAT testing when the machine launch job is successfully completed and your target machines have booted.

After you’ve successfully deployed, configured, and tested CloudEndure Disaster Recovery on your source environment, add it to your ongoing change management processes so that your incident response plan remains accurate and up-to-date. This includes deploying and testing CloudEndure Disaster Recovery every time you add new servers to your environment. In addition, monitor for changes to your existing resources and make corresponding changes to your CloudEndure Disaster Recovery configuration if necessary.

How CloudEndure Disaster Recovery keeps your data secure

CloudEndure Disaster Recovery has multiple mechanisms to keep your data secure and not introduce new security risks. Data replication is performed using AES 256-bit encryption in transit. Data at rest can be encrypted by using Amazon Elastic Block Store (Amazon EBS) encryption with an AWS managed key or a customer key. Amazon EBS encryption is supported by all volume types, and includes built-in key management infrastructure that has no performance impact. Replication traffic is transmitted directly from your source servers to your target AWS Region, and can be restricted to private connectivity such as AWS Direct Connect or a VPN. CloudEndure Disaster Recovery is ISO 27001 and GDPR compliant and HIPAA eligible.

Summary

Each organization tailors its incident response plan to meet its unique security requirements. As described in this post, you can use CloudEndure Disaster Recovery to improve your organization’s incident response plan. I also explained how to recover from an earlier point in time when you respond to security incidents involving data corruption, and how to test your servers as part of maintaining the DR section of your incident response plan. By following the guidance in this post, you can improve your IT resilience and recover more quickly from security incidents. You can also reduce your DR operational costs by avoiding duplicate provisioning of your DR infrastructure.

Visit the CloudEndure Disaster Recovery product page if you would like to learn more. You can also view the AWS Raise the Bar on Data Protection and Security webinar series for additional information on how to protect your data and improve IT resilience on AWS.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Integrating AWS Outposts with existing security zones

2020-11-05 Shubha Kumbadakone

Post Syndicated from Shubha Kumbadakone original https://aws.amazon.com/blogs/compute/integrating-aws-outposts-with-existing-security-zones/

This post is contributed by Santiago Freitas and Matt Lehwess.

AWS Outposts is a fully managed service that extends AWS infrastructure, services, APIs, and tools to your on-premises facility. This blog post explains how the resources created on an Outpost can be integrated with security zones of an existing infrastructure. Integrating Outposts with existing security zones is important for customers that segment their environment into multiple domains based on their security policy.

Background

It’s common for customers to have different security zones within their infrastructure. For example, a customer might have a DMZ security zone where internet facing systems are located, an extra-net security zone where systems used to communicate with business partners are hosted, and an internal security zone where the systems accessible only by employees operate.

As illustrated in the following figure, customers often use firewalls and network ACLs to perform packet inspection and filtering for traffic that flows between security zones. Customers also usually use similar security controls for traffic flowing between different application tiers.

Example of firewall being used for security zone segmentation

Enabling connectivity from an Outposts VPC to your on-premises network

During your Outpost deployment, AWS works with you to establish connectivity to your local network. During this installation process, AWS creates an address pool based on an IP range you assign to the Outpost out of your local network’s IP ranges. This address pool is known as a customer-owned IP address pool or CoIP pool.

When resources running on Outposts (like an EC2 instance) need to communicate with existing on-premises systems, you must assign an Elastic IP address to them. This Elastic IP address comes from the CoIP pool. The resources in your local area network (LAN), including firewalls and network ACLs, see the Elastic IP address as the source IP of the packets sent from the resources running on Outposts. This Elastic IP address is also used as the destination IP for traffic from your on-premises systems to the instance on the Outpost.

How to enable connectivity from an Outposts VPC to your on-premises network

With a typical Outpost deployment, as shown in the following diagram, the following items are required to enable connectivity from the Outpost’s VPC to your on-premises network:

VPC routing: Routes to your on-premises network in the route table of the VPC’s Outpost subnet.
CoIP pool: Assigned by you, the customer, and created by AWS at time of install, for example, 192.168.1.0/26 in the diagram. It’s used to allocate Elastic IP addresses that can then be associated with instances or other resources on the Outpost.
Elastic IP address association: Customer configured, association of Elastic IP addresses from the CoIP pool with EC2 instances and other resources on the Outpost. For example, VPC IP 10.0.3.112 from instance Y has Elastic IP address 192.168.1.15 associated with it.
Routing advertisement: The Outpost advertises the assigned CoIP range to the local devices within your on-premises network via Border Gateway Protocol (BGP). This is configured by AWS during installation and based on the CoIP pool IP range you provide.

AWS Outposts – typical network topology

Integrating Outposts with existing security zones

CoIP pools, and AWS Resource Access Manager (RAM) are used to facilitate the integration of Outposts with your existing security zones. AWS RAM lets you share your resources with AWS account or through AWS Organizations.

When AWS deploys your Outposts, you can ask AWS to create multiple CoIP pools. Each CoIP pool is configured with an IP range assigned by you during initial installation. If after the initial installation you need AWS to create additional CoIP pools for an existing Outpost, you can open a case with AWS Support and request the creation of additional pools.

CoIP pools are owned initially by the AWS account that owns the Outpost. In a multi-account Outpost deployment, you can share your customer-owned IP pool(s) with other AWS accounts in your AWS Organization using AWS RAM.

VPCs that have subnets on the Outpost must be created in the same account that owns the Outpost. After creation of these subnets within the Outpost account, AWS RAM can then be used to share these subnets with other accounts or organizational units that are in the same organization from AWS Organizations.

After you’ve shared the CoIP pool with the account where you’d like to run your workloads, and you’ve shared an Outpost subnet with that same account, users of that account can deploy resources in the shared Outpost subnet. For the Outposts resources that must communicate with resources in your existing infrastructure, users can assign Elastic IP addresses from one or more of the shared CoIP pools to those Outposts resources.

Multiple CoIP pools and the ability to share them and Outpost subnets with a particular account, gives you granular control over the IP range used by Outpost resources requiring connectivity to your on-premises LAN.

The following figure illustrates the sharing of a subnet and CoIP pool from Account 1, which is the account where the Outpost was initially deployed, with Account 2. As the CoIP pool was shared with Account 2, the users in Account 2 are able to assign an Elastic IP address to the EC2 instance running within Account 2.

AWS Outposts and VPC Sharing

Creating an AWS RAM resource share

The following screenshots demonstrate how to use AWS RAM to share a CoIP pool and a subnet with another account in the same organization from AWS Organizations.

Step 1. Navigate to the AWS RAM service page within the AWS Management Console, and click Create Resource Share. This step is done within the AWS account that owns the Outpost.

Step 2. Next, give the share an appropriate name.

Step 3. Select one or more subnets you’d like to share with your application owner’s account. In this case, select a subnet that exists in your VPC on the Outpost.

Step 4. In this case, share the CoIP range. You can find that in the resources type list.

Step 5. Select the Customer-owned Ipv4Pool resource type, and then select the CoIP to be shared. Selecting the the CoIP adds it to the list of shared resources along with the subnet selected earlier.

Step 6. Add the account number for the account you’d like to share the Outposts subnet and CoIP pool with.

Step 7. Click Create Resource. After clicking the Create Resource share button, you should see the resource share listed and be in the state “active.”

Now that you’ve successfully shared the subnet and CoIP pool with your application account (that’s in the same organization), you can now go in to that account and allocate an Elastic IP address from the CoIP pool. You can then assign the Elastic IP address to an instance launched in the subnet previously shared, which is on the Outpost.

Allocating and associating the Elastic IP address from your CoIP pool

Step 1. From within the application account, allocate an Elastic IP address from the CoIP pool. This is done by navigating to the VPC console, then to Elastic IP addresses, and then click Allocate Elastic IP address.

Step 2. Select Customer owned pool of IPv4 addresses, select the CoIP pool that’s been shared with this account previously, then click Allocate. You can see this step in the following image.

Step 3. You can now see the new Elastic IP address that has been allocated from the shared CoIP pool. You can now associate that Elastic IP address with an instance in our shared subnet.

Note: When using the AWS Management Console to allocate an Elastic IP address, the ElasticIP address is automatically allocated from the CoIPpool. If you’re using the AWS CLI, you have the option to select a specific Elastic IP address from the CoIP pool by using the --customer-owned-ipv4-pool <value> option.

Step 4. You can now associate the Elastic IP address of type CoIP with an instance. Click Associate after selecting the instance that is running on your Outpost in the shared subnet. This step is represented in the following image.

Step 5 – Once the operation is complete, you can see the CoIP Elastic IP address in the Elastic IP address console, and that it’s assigned to the instance that’s running in your shared subnet.

The steps preceding demonstrated how to share the Outpost with other accounts through the use of AWS Resource Access Manager, subnet sharing, and CoIP sharing.

Use case

Let’s apply the capabilities described prior into a design. A customer is running a 3-tier web application and would like to host the web and application tiers on Outposts. The database tier is hosted on the existing infrastructure. The application’s user traffic comes from the internet, through an external firewall towards the web tier hosted on the Outpost. Traffic between web and application tiers is secured with security groups and remains within the Outpost. Application servers connect to the database servers via the existing internal firewalls. The customer uses independent external and internal firewalls.

During the Outposts installation, the customer asks AWS to create two CoIP pools. The first CoIP pool, referenced in the following image as CoIP Web, is assigned the IP range 172.0.1.0/24. The second CoIP pool, referenced in the following image as CoIP App, is assigned the IP range 172.0.2.0/24. The customer then creates two additional AWS accounts, one for the web tier and another for the app tier. Within the account that owns the Outpost, the customer creates two VPCs and within each of those VPCs a subnet is created.

The subnet with IP address 10.0.1.0/24 is shared with the web tier account and the subnet with IP address 10.1.2.0/24 is shared with the app tier account. The customer then configures VPC peering between the VPCs to enable communication between the web and app tiers. VPC peering configuration is performed within the account that owns the Outposts.

Note: With VPC peering traffic between the VPCs remains within the Outpost. However, data transfer charges for VPC peering applies. This use-case could also be built with the web tier and app tier using different subnets within the same VPC to save on data transfer costs over VPC peering.

The following image shows initial setup with two CoIP pools, two accounts, and two VPCs.

Initial setup with two CoIP pools, two accounts and two VPCs

After the initial setup the customer then shares each CoIP pool with the appropriate account using AWS RAM. The customer can then create their web and application servers within the respective accounts. As shown in the prior image, the web server is assigned IP address 10.0.1.5 and the application server is assigned IP address 10.1.2.10. Each IP address is part of their respective VPC IP range and are reachable only within the Outpost at this point.

To enable the integration of the web and application servers with the existing infrastructure, the customer assigns an Elastic IP address from the CoIP pool shared with each account to their respective Amazon EC2 instances.

With this configuration, the customer can integrate the web and application servers into the existing security zones. To integrate the web servers, path “A” in the following image, the customer creates a rule in their external firewall that allows communication from any source (internet) to Elastic IP address 172.0.1.5 on port 443 (HTTPS) which has been assigned to the web server.

For the communication between the web and application servers, path “B” in the following image, the customer has configured VPC peering between the two VPCs and created the required security groups.

To integrate the application servers into the Internal Security Zone, path “C” in the following image, the customer has assigned Elastic IP address 172.0.2.10 to the application server and has configured a rule on the Internal Firewall allowing the IP range 172.0.2.0/24 that is assigned to the app CoIP pool to communicate with the database server.

End to end traffic flow from users to web tier and from app tier to database tier

In addition to setting up their Outposts as covered in the preceding details, to enable the communication between the Outposts hosted resources and the existing infrastructure you must create a custom route table and associate it with an Outpost subnet.

Conclusion

AWS Outposts extends AWS infrastructure, services, APIs, and tools to your on-premises facility. During deployment of your Outposts, AWS works with you to establish connectivity to your local network. This blog post builds upon this initial deployment of an Outpost to allow the Outpost’s resources to integrate into your existing security zones. The design is applicable for customers that segment their environment into multiple security zones.

Santiago Freitas is AWS Head of Technology for Central Eastern Europe (CEE), Middle East, North Africa (MENA), Sub Saharan Africa (SSA), Turkey (TUR), and Russia and Commonwealth of Independent States (RUS-CIS). Previously he was an AWS Global Solutions Architect for financial services. Before joining AWS, Santiago was a Distinguished Engineer at Cisco. He is based in Dubai, United Arab Emirates.

Matt is a Principal Developer Advocate for Amazon Web Services where he has spent the last 7 years working with AWS customers and partners, helping them to build best-in-class AWS networking solutions. He is co-author of the AWS Certified Advanced Networking Official Study Guide, as well as an Amazon Re:Invent speaker extraordinaire (5 years and counting!). His primary focus at AWS is to help evangelize AWS networking solutions and more recently, AWS Outposts.

Announcing Outposts and local gateway sharing for multi-account access

2020-11-03 Shubha Kumbadakone

Post Syndicated from Shubha Kumbadakone original https://aws.amazon.com/blogs/compute/announcing-outposts-and-local-gateway-sharing-for-multi-account-access/

This post was contributed by James Devine, Sr. Outposts SA

AWS Outposts enables customers to run AWS services in their on-premises environments. With the release of Outposts and local gateway (LGW) sharing, customers can now configure multi-account access and sharing within an AWS Organization.

Prior to this release, an Outpost was only viable within a single AWS account. VPC sharing was the main way to enable multiple accounts to use Outposts capacity. With the release of Outposts and LGW sharing support, there is now additional functionality to enable multi-account access Outpost capacity within an AWS Organization.

Outposts and LGW sharing is facilitated through AWS Resource Access Manager (RAM). It enables Outposts and LGWs to be shared with AWS accounts within the same AWS Organization. The account that orders Outposts is the owner account that can create resource shares. The accounts that have access to the share are called consumer accounts. Each consumer account can create its own VPCs with subnets that reside on the shared Outpost.

This post will discuss how to start using this new functionality and considerations to take into account.

Use cases

Per AWS best practices, customers typically deploy a number of AWS accounts. Utilizing multiple accounts allows for reduced blast radius and the ability to provide infrastructure isolation by line of business, environment type, and even down to individual workloads. Outposts sharing enables customers to extend their existing AWS account structures to seamlessly integrate with Outposts.

Getting started – creating a resource share

Before any resources can be shared, the first step is to configure an AWS Organization (if one does not already exist). Outposts resources can only be shared with accounts under the same AWS Organization. The Outposts can reside in any account under an organization. For centralized management of Outposts, it is recommend to create a dedicated account, or set of accounts, to host Outposts.

Once an organization is created with member AWS accounts, resources shares can then be created. It’s possible to place multiple resources into a resource share. To facilitate Outposts, LGW, and customer-owned IP (CoIP) sharing, a single resource share can be created that includes all three resources. Principals can then be added to the resource share. The principals can be both organizational units (OUs) and individual AWS account IDs within the AWS Organization. In this case, I’ve shared all three resources with a consumer account ID as a principal, as demonstrated in the following screenshot.

Screen shot of resource sharing.

Sharing an Outpost

After an Outposts is provisioned, the logical Outpost ID can be shared with any account under the AWS Organization. The consumer account then has access to provision resources on the Outposts, such as Amazon EBS volumes and Outposts subnets, as well as launching instances on the shared Outpost.

From the AWS Management Console in the consumer account, I can see the shared Outposts ID, its associated Availability Zone, and the owner account ID.

Once the Outposts ID is selected, I can use the Actions drop down menu to create Outposts subnets and EBS volumes. I can also select Launch instance to provision instances on the Outpost.

Sharing an LGW

Each consumer account can create their own Outposts subnets within their own VPCs. LGW sharing enables the consumer account to create routes an Outposts subnet route table that has a shared LGW as the destination. This enables Outposts subnets in the consumer account to have communication with the on-premises network through the shared LGW.

The consumer account view shared LGWs, as shown in the following screenshot.

The consumer account can then select VPCs within the account to associate with the LGW route table. This enables routing to on-premises if a CoIP is assigned to an instance.

This enables routing to on-premises if a CoIP is assigned to an instance.

Considerations

LGW and Outposts sharing is meant to enable sharing of resources between various accounts within a larger organizational structure. It is not suitable for multi-tenancy outside of an AWS Organization. Additional considerations around capacity planning, access, and local network connectivity should be taken into account.

Resources created in the consumer account are only visible from within the consumer account. The AWS account that owns the Outpost does not have the ability to view instances, EBS volumes, VPCs, subnets, or any other resource created within the consumer account. Since the consumer account is part of an AWS Organization, it is possible to use the default OrganizationAccountAccessRole role that is created by AWS Organizations. This allows for visibility and management of Outpost resources across the AWS Organization.

Capacity information is not shared with the consumer account. However, it is possible to use cross-account CloudWatch metric sharing. Outposts utilization metrics from the account that owns the Outpost can be shared with the consumer account. This allows the consumer account to see what capacity is available on the shared Outposts. I’ve configured the cross-account sharing, and from my consumer account I can see that there is ample c5.xlarge capacity on the shared Outposts.

I’ve configured the cross-account sharing, and from my consumer account I can see that there is ample c5.xlarge capacity on the shared Outposts.

If a principal (consumer account or organizational unit) no longer requires access to Outposts capacity, the resource share can be deleted through RAM in the primary Outposts account. It is important to note that this does not delete subnets, EBS volumes, instances, or other resources running on the shared Outposts. Proper cleanup of Outposts resources within the consumer account (EBS volumes, instances, subnets, etc.) should be planned for whenever removing principals from a resource share to ensure that the capacity is released.

Conclusion

In the blog post, I described the Outposts and LGW sharing capabilities and demonstrated how they can be used to enable multi-account sharing of an Outpost within an AWS Organization. These new capabilities unlock even more customer use cases and allow for stronger blast-radius and account isolation. It’s exciting to see continued functionality come to Outposts! You can start using LGW and Outposts sharing today. There’s no need to upgrade or modify your Outposts in any way to take advantage of this new and exciting functionality.

Getting started with DevOps automation

2020-10-29 Jared Murrell

Post Syndicated from Jared Murrell original https://github.blog/2020-10-29-getting-started-with-devops-automation/

This is the second post in our series on DevOps fundamentals. For a guide to what DevOps is and answers to common DevOps myths check out part one.

What role does automation play in DevOps?

First things first—automation is one of the key principles for accelerating with DevOps. As noted in my last blog post, it enables consistency, reliability, and efficiency within the organization, making it easier for teams to discover and troubleshoot problems.

However, as we’ve worked with organizations, we’ve found not everyone knows where to get started, or which processes can and should be automated. In this post, we’ll discuss a few best practices and insights to get teams moving in the right direction.

A few helpful guidelines

The path to DevOps automation is continually evolving. Before we dive into best practices, there are a few common guidelines to keep in mind as you’re deciding what and how you automate.

Choose open standards. Your contributors and team may change, but that doesn’t mean your tooling has to. By maintaining tooling that follows common, open standards, you can simplify onboarding and save time on specialized training. Community-driven standards for packaging, runtime, configuration, and even networking and storage—like those found in Kubernetes—also become even more important as DevOps and deployments move toward the cloud.
Use dynamic variables. Prioritizing reusable code will reduce the amount of rework and duplication you have, both now and in the future. Whether in scripts or specialized tools, securely using externally-defined variables is an easy way to apply your automation to different environments without needing to change the code itself.
Use flexible tooling you can take with you. It’s not always possible to find a tool that fits every situation, but using a DevOps tool that allows you to change technologies also helps reduce rework when companies change direction. By choosing a solution with a wide ecosystem of partner integrations that works with any cloud, you’ll be able to define your unique set of best practices and reach your goals—without being restricted by your toolchain.

DevOps automation best practices

Now that our guidelines are in place, we can evaluate which sets of processes we need to automate. We’ve broken some best practices for DevOps automation into four categories to help you get started.

1. Continuous integration, continuous delivery, and continuous deployment

We often think of the term “DevOps” as being synonymous with “CI/CD”. At GitHub we recognize that DevOps includes so much more, from enabling contributors to build and run code (or deploy configurations) to improving developer productivity. In turn, this shortens the time it takes to build and deliver applications, helping teams add value and learn faster. While CI/CD and DevOps aren’t precisely the same, CI/CD is still a core component of DevOps automation.

Continuous integration (CI) is a process that implements testing on every change, enabling users to see if their changes break anything in the environment.
Continuous delivery (CD) is the practice of building software in a way that allows you to deploy any successful release candidate to production at any time.
Continuous deployment (CD) takes continuous delivery a step further. With continuous deployment, every successful change is automatically deployed to production. Since some industries and technologies can’t immediately release new changes to customers (think hardware and manufacturing), adopting continuous deployment depends on your organization and product.

Together, continuous integration and continuous delivery (commonly referred to as CI/CD) create a collaborative process for people to work on projects through shared ownership. At the same time, teams can maintain quality control through automation and bring new features to users with continuous deployment.

2. Change management

Change management is often a critical part of business processes. Like the automation guidelines, there are some common principles and tooling that development and operations teams can use to create consistency.

Version control: The practice of using version control has a long history rooted in helping people revert changes and learn from past decisions. From RCS to SVN, CVS to Perforce, ClearCase to Git, version control is a staple for enabling teams to collaborate by providing a common workflow and code base for individuals to work with.
Change control: Along with maintaining your code’s version history, having a system in place to coordinate and facilitate changes helps to maintain product direction, reduces the probability of harmful changes to your code, and encourages a collaborative process.
Configuration management: Configuration management makes it easier for everyone to manage complex deployments through templates and manage changes at scale with proper controls and approvals.

3. ‘X’ as code

By now, you also may have heard of “infrastructure as code,” “configuration as code,” “policy as code,” or some of the other “as code” models. These models provide a declarative framework for managing different aspects of your operating environments through high level abstractions. Stated another way, you provide variables to a tool and the output is consistently the same, allowing you to recreate your resources consistently. DevOps implements the “as code” principle with several goals, including: an auditable change trail for compliance, collaborative change process via version control, a consistent, testable and reliable way of deploying resources, and as a way to lower the learning curve for new team members.

Infrastructure as code (IaC) provides a declarative model for creating immutable infrastructure using the same versioning and workflow that developers use for source code. As changes are introduced to your infrastructure requirements, new infrastructure is defined, tested, and deployed with new configurations through automated declarative pipelines.
Platform as code (PaC) provides a declarative model for services similar to how infrastructure as code provides a framework for recreating the same infrastructure—allowing you to rapidly deploy services to existing infrastructure with high-level abstractions.
Configuration as code (CaC) brings the next level of declarative pipelining by defining the configuration of your applications as versioned resources.
Policy as code brings versioning and the DevOps workflow to security and policy management.

4. Continuous monitoring

Operational insights are an invaluable component of any production environment. In order to understand the behaviors of your software in production, you need to have information about how it operates. Continuous monitoring—the processes and technology that monitor performance and stability of applications and infrastructure throughout the software lifecycle—provides operations teams with data to help troubleshoot, and development teams the information needed to debug and patch. This also leads into an important aspect of security, where DevSecOps takes on these principles with a security focus. Choosing the right monitoring tools can be the difference between a slight service interruption and a major outage. When it comes to gaining operational insights, there are some important considerations:

Logging gives you a continuous stream of data about your business’ critical components. Application logs, infrastructure logs, and audit logs all provide important data that helps teams learn and improve products.
Monitoring provides a level of intelligence and interpretation to the raw data provided in logs and metrics. With advanced tooling, monitoring can provide teams with correlated insights beyond what the raw data provides.
Alerting provides proactive notifications to respective teams to help them stay ahead of major issues. When effectively implemented, these alerts not only let you know when something has gone wrong, but can also provide teams with critical debugging information to help solve the problem quickly.
Tracing takes logging a step further, providing a deeper level of application performance and behavioral insights that can greatly impact the stability and scalability of applications in production environments.

Putting DevOps automation into action

At this point, we’ve talked much about automation in the DevOps space, so is DevOps all about automation? Put simply, no. Automation is an important means to accomplishing this work efficiently between teams. Whether you’re new to DevOps or migrating from another set of automation solutions, testing new tooling with a small project or process is a great place to start. It will lay the foundation for scaling and standardizing automation across your entire organization, including how to measure effectiveness and progression toward your goals.

Regardless of which toolset you choose to automate your DevOps workflow, evaluating your teams’ current workflows and the information you need to do your work will help guide you to your tool and platform selection, and set the stage for success. Here are a few more resources to help you along the way:

Want to see what DevOps automation looks like in practice? See how engineers at Wiley build faster and more securely with GitHub Actions.

Choosing between AWS Lambda data storage options in web apps

2020-10-28 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/choosing-between-aws-lambda-data-storage-options-in-web-apps/

AWS Lambda is an on-demand compute service that powers many serverless applications. Lambda functions are ephemeral, with execution environments only existing for a brief time when the function is invoked. Many compute operations need access to external data for a variety of purposes. This includes importing third-party libraries, accessing machine learning models, or exporting the output of the compute operation.

Lambda provides a comprehensive range of storage options to meet the needs of web application developers. These include other AWS services such as Amazon S3 and Amazon EFS. There are also native storage options available, such as temporary storage or Lambda layers. In this blog post, I explain the differences between these options, and discuss common use-cases to help you choose for your own applications.

This post references the Happy Path web application series, and you can download the code for that application from the repository.

Amazon S3 – Object storage

Amazon S3 is an object storage service that scales elastically. It offers high availability and 11 9’s of durability. The service is ideal for storing unstructured data. This includes binary data, such as images or media, log files and sensor data.

There are certain characteristics of S3 object storage that are important to remember. While S3 objects can be versioned, you cannot append data as you could in a file system. You have to store an entirely new version of an object. S3 also has a flat storage hierarchy that’s different to a file system. Instead of directories, you use folders to logically organize objects, by prefixing ‘foldername/’ in the key name.

S3 has important event integrations for serverless developers. It has a native integration with Lambda, which allows you to invoke a function in response to an S3 event. This can provide a scalable way to trigger application workflows when objects are created or deleted in S3. In the Happy Path application, the image-processing workflows are initiated by this event integration. To learn more about using S3 to trigger automated serverless workflows, visit the learning path.

S3 is often an important repository for an organization’s data lake. If your application writes data to S3 buckets, this can be a useful staging area for downstream processing. For analytics workloads, you can use AWS Glue to perform extract, transform, and loan (ETL) operations. To create ad hoc visualizations and business analysis reports, Amazon QuickSight can connect to your S3 buckets and produce interactive dashboards. To learn how to build business intelligence dashboards for your web application, visit the Innovator Island workshop.

S3 also provides object lifecycle management. This allows you to automatically change storage classes when certain conditions are met. For example, an application for uploading expenses could automatically archive PDFs after 1 year to Amazon S3 Glacier to reduce storage costs. In the Happy Path application, the original high-resolution uploads are stored in a separate bucket from the optimized distribution assets. To reduce storage costs, lifecycle management could be configured to automatically delete these original photo assets after 30 days.

Temporary storage with /tmp

The Lambda execution environment provides a file system for your code to use at /tmp. This space has a fixed size of 512 MB. The same Lambda execution environment may be reused by multiple Lambda invocations to optimize performance. The /tmp area is preserved for the lifetime of the execution environment and provides a transient cache for data between invocations. Each time a new execution environment is created, this area is deleted.

Consequently, this is intended as an ephemeral storage area. While functions may cache data here between invocations, it should be used only for data needed by code in a single invocation. It’s not a place to store data permanently, and is better-used to support operations required by your code.

Operationally, working with files in /tmp is the same as your local hard disk, and offers fast I/O throughput. For example, to unzip a file into this space in Python, use:

import os, zipfile
os.chdir('/tmp')
with zipfile.ZipFile(myzipfile, 'r') as zip:
    zip.extractall()

Lambda layers

Your Lambda functions may use additional libraries as part of the deployment package. You can bundle these in the deployment archive or optionally move to a layer instead. A Lambda function can have up to five layers, and is subject to the maximum deployment size of 50 MB (zipped). Packages in layers are available in the /opt directory during invocations. While layers are private to you by default, you can also share layers with other AWS accounts, or make layers public.

There are many benefits to using layers throughout the functions in your serverless application. It’s best practice to include the AWS SDK instead of depending on the version bundled with the Lambda service. This enables you to pin the version of the SDK. By using a layer, you don’t need to bundle the package with each function, which can increase your deployment package size and slow down deployments. You can create an AWS SDK layer and then include a reference to the layer in each function.

Layers can be an effective way to bundle large dependencies, or share compiled libraries with binaries that vary by operating system. For example, the Happy Path application uses the Sharp npm graphics library to process images. Similarly, the Innovator Island workshop uses the OpenCV library to perform image manipulation, and this is imported using a shared layer.

Layers are static once they are deployed. You can only change the contents of a layer by deploying a new version. Any Lambda function using the layer binds to a specific version and must be updated to change layer versions. To learn more, see using Lambda layers to simplify your development process.

Amazon EFS for Lambda

Amazon EFS is a fully managed, elastic, shared file system that integrates with other AWS services. It is durable storage option that offers high availability. You can now mount EFS volumes in Lambda functions, which makes it simpler to share data across invocations. The file system grows and shrinks as you add or delete data, so you do not need to manage storage limits.

The Lambda service mounts EFS file systems when the execution environment is prepared. This happens in parallel with other initialization operations so typically does not impact cold start latency. If the execution environment is warm from previous invocations, the mount is already prepared. To use EFS, your Lambda function must be in the same VPC as the file system.

EFS enables new capabilities for serverless applications. The file system is a dynamic binding for Lambda functions, unlike layers. This makes it useful for deploying code libraries where you want to always use the latest version. You configure the mount path when integrating the file system with your function, and then include packages from this location. Additionally, you can use this to include packages that exceed the limits of layers.

Due to its speed and support of standard file operations, EFS is also useful for ingesting or writing large numbers files durably. This can be helpful for zipping or unzipping large archives, for example. For appending to existing files, EFS is also a preferred option to using S3.

To learn more, see using Amazon EFS for AWS Lambda in your serverless applications.

Comparing the different data storage options

This table compares the characteristics of these four different data storage options for Lambda:

	Amazon S3	/tmp	Lambda Layers	Amazon EFS
Maximum size	Elastic	512 MB	50 MB	Elastic
Persistence	Durable	Ephemeral	Durable	Durable
Content	Dynamic	Dynamic	Static	Dynamic
Storage type	Object	File system	Archive	File system
Lambda event source integration	Native	N/A	N/A	N/A
Operations supported	Atomic with versioning	Any file system operation	Immutable	Any file system operation
Object tagging	Y	N	N	N
Object metadata	Y	N	N	N
Pricing model	Storage + requests + data transfer	Included in Lambda	Included in Lambda	Storage + data transfer + throughput
Sharing/permissions model	IAM	Function-only	IAM	IAM + NFS
Source for AWS Glue	Y	N	N	N
Source for Amazon QuickSight	Y	N	N	N
Relative data access speed from Lambda	Fast	Fastest	Fastest	Very fast

Conclusion

Lambda is a flexible, on-demand compute service for serverless application. It supports a wide variety of workloads by providing a number of different data storage options.

In this post, I compare the capabilities and use-cases of S3, EFS, Lambda layers, and temporary storage for Lambda functions. There are benefits to each approach, as each type has different behaviors and characteristics. For web application developers, these storage types support different operations depending upon the needs of your serverless backend.

As the newest integration with Lambda, EFS now enables new workloads and capabilities. This includes sharing large code packages with Lambda, or durably operating on large numbers of files. It also opens up new possibilities for developers working on deep learning inference models.

To learn more about storage options available, visit the AWS Serverless homepage. For more serverless learning resources, visit https://serverlessland.com.

Optimizing the cost of serverless web applications

2020-10-19 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/optimizing-the-cost-of-serverless-web-applications/

Web application backends are one of the most frequent types of serverless use-case for customers. The pay-for-value model can make it cost-efficient to build web applications using serverless tools.

While serverless cost is generally correlated with level of usage, there are architectural decisions that impact cost efficiency. The impact of these choices is more significant as your traffic grows, so it’s important to consider the cost-effectiveness of different designs and patterns.

This blog post reviews some common areas in web applications where you may be able to optimize cost. It uses the Happy Path web application as a reference example, which you can read about in the introductory blog post.

Serverless web applications generally use a combination of the services in the following diagram. I cover each of these areas to highlight common areas for cost optimization.

The API management layer: Selecting the right API type

Most serverless web applications use an API between the frontend client and the backend architecture. Amazon API Gateway is a common choice since it is a fully managed service that scales automatically. There are three types of API offered by the service – REST APIs, WebSocket APIs, and the more recent HTTP APIs.

HTTP APIs offer many of the features in the REST APIs service, but the cost is often around 70% less. It supports Lambda service integration, JWT authorization, CORS, and custom domain names. It also has a simpler deployment model than REST APIs. This feature set tends to work well for web applications, many of which mainly use these capabilities. Additionally, HTTP APIs will gain feature parity with REST APIs over time.

The Happy Path application is designed for 100,000 monthly active users. It uses HTTP APIs, and you can inspect the backend/template.yaml to see how to define these in the AWS Serverless Application Model (AWS SAM). If you have existing AWS SAM templates that are using REST APIs, in many cases you can change these easily:

Content distribution layer: Optimizing assets

Amazon CloudFront is a content delivery network (CDN). It enables you to distribute content globally across 216 Points of Presence without deploying or managing any infrastructure. It reduces latency for users who are geographically dispersed and can also reduce load on other parts of your service.

A typical web application uses CDNs in a couple of different ways. First, there is the distribution of the application itself. For single-page application frameworks like React or Vue.js, the build processes create static assets that are ideal for serving over a CDN.

However, these builds may not be optimized and can be larger than necessary. Many frameworks offer optimization plugins, and the JavaScript community frequently uses Webpack to bundle modules and shrink deployment packages. Similarly, any media assets used in the application build should be optimized. You can use tools like Lighthouse to analyze your web apps to find images that can be resized or compressed.

The second common CDN use-case for web apps is for user-generated content (UGC). Many apps allow users to upload images, which are then shared with other users. A typical photo from a 12-megapixel smartphone is 3–9 MB in size. This high resolution is not necessary when photos are rendered within web apps. Displaying the high-resolution asset results in slower download performance and higher data transfer costs.

The Happy Path application uses a Resizer Lambda function to optimize these uploaded assets. This process creates two different optimized images depending upon which component loads the asset.

The upload S3 bucket shows the original size of the upload from the smartphone:

The distribution S3 bucket contains the two optimized images at different sizes:

The distribution file sizes are 98–99% smaller. For a busy web application, using optimized image assets can make a significant difference to data transfer and CloudFront costs.

Additionally, you can convert to highly optimized file formats such as WebP to reduce file size even further. Not all browsers support this format, but you can use CSS on the frontend to fall back to other types if needed:

<img src="myImage.webp" onerror="this.onerror=null; this.src='myImage.jpg'">

The data layer

AWS offers many different database and storage options that can be useful for web applications. Billing models vary by service and Region. By understanding the data access and storage requirements of your app, you can make informed decisions about the right service to use.

Generally, it’s more cost-effective to store binary data in S3 than a database. First, when the data is uploaded, you can upload directly to S3 with presigned URLs instead of proxying data via API Gateway or another service.

If you are using Amazon DynamoDB, it’s best practice to store larger items in S3 and include a reference token in a table item. Part of DynamoDB pricing is based on read capacity units (RCUs). For binary items such as images, it is usually more cost-efficient to use S3 for storage.

Many web developers who are new to serverless are familiar with using a relational database, so choose Amazon RDS for their database needs. Depending upon your use-case and data access patterns, it may be more cost effective to use DynamoDB instead. RDS is not a serverless service so there are monthly charges for the underlying compute instance. DynamoDB pricing is based upon usage and storage, so for many web apps may be a lower-cost choice.

Integration layer

This layer includes services like Amazon SQS, Amazon SNS, and Amazon EventBridge, which are essential for decoupling serverless applications. Each of these have a request-based pricing component, where 64 KB of a payload is billed as one request. For example, a single SQS message with a 256 KB payload is billed as four requests. There are two optimization methods common for web applications.

1. Combine messages

Many messages sent to these services are much smaller than 64 KB. In some applications, the publishing service can combine multiple messages to reduce the total number of publish actions to SNS. Additionally, by either eliminating unused attributes in the message or compressing the message, you can store more data in a single request.

For example, a publishing service may be able to combine multiple messages together in a single publish action to an SNS topic:

Before optimization, a publishing service sends 100,000,000 1KB-messages to an SNS topic. This is charged as 100 million messages for a total cost of $50.00.
After optimization, the publishing service combines messages to send 1,562,500 64KB-messages to an SNS topic. This is charged as 1,562,500 messages for a total cost of $0.78.

2. Filter messages

In many applications, not every message is useful for a consuming service. For example, an SNS topic may publish to a Lambda function, which checks the content and discards the message based on some criteria. In this case, it’s more cost effective to use the native filtering capabilities of SNS. The service can filter messages and only invoke the Lambda function if the criteria is met. This lowers the compute cost by only invoking Lambda when necessary.

For example, an SNS topic receives messages about customer orders and forwards these to a Lambda function subscriber. The function is only interested in canceled orders and discards all other messages:

Before optimization, the SNS topic sends all messages to a Lambda function. It evaluates the message for the presence of an order canceled attribute. On average, only 25% of the messages are processed further. While SNS does not charge for delivery to Lambda functions, you are charged each time the Lambda service is invoked, for 100% of the messages.
After optimization, using an SNS subscription filter policy, the SNS subscription filters for canceled orders and only forwards matching messages. Since the Lambda function is only invoked for 25% of the messages, this may reduce the total compute cost by up to 75%.

3. Choose a different messaging service

For complex filtering options based upon matching patterns, you can use EventBridge. The service can filter messages based upon prefix matching, numeric matching, and other patterns, combining several rules into a single filter. You can create branching logic within the EventBridge rule to invoke downstream targets.

EventBridge offers a broader range of targets than SNS destinations. In cases where you publish from an SNS topic to a Lambda function to invoke an EventBridge target, you could use EventBridge instead and eliminate the Lambda invocation. For example, instead of routing from SNS to Lambda to AWS Step Functions, instead create an EventBridge rule that routes events directly to a state machine.

Business logic layer

Step Functions allows you to orchestrate complex workflows in serverless applications while eliminating common boilerplate code. The Standard Workflow service charges per state transition. Express Workflows were introduced in December 2019, with pricing based on requests and duration, instead of transitions.

For workloads that are processing large numbers of events in shorter durations, Express Workflows can be more cost-effective. This is designed for high-volume event workloads, such as streaming data processing or IoT data ingestion. For these cases, compare the cost of the two workflow types to see if you can reduce cost by switching across.

Lambda is the on-demand compute layer in serverless applications, which is billed by requests and GB-seconds. GB-seconds is calculated by multiplying duration in seconds by memory allocated to the function. For a function with a 1-second duration, invoked 1 million times, here is how memory allocation affects the total cost in the US East (N. Virginia) Region:

Memory (MB)	GB/S	Compute cost	Total cost
128	125,000	$ 2.08	$ 2.28
512	500,000	$ 8.34	$ 8.54
1024	1,000,000	$ 16.67	$ 16.87
1536	1,500,000	$ 25.01	$ 25.21
2048	2,000,000	$ 33.34	$ 33.54
3008	2,937,500	$ 48.97	$ 49.17

There are many ways to optimize Lambda functions, but one of the most important choices is memory allocation. You can choose between 128 MB and 3008 MB, but this also impacts the amount of virtual CPU as memory increases. Since total cost is a combination of memory and duration, choosing more memory can often reduce duration and lower overall cost.

Instead of manually setting the memory for a Lambda function and running executions to compare duration, you can use the AWS Lambda Power Tuning tool. This uses Step Functions to run your function against varying memory configurations. It can produce a visualization to find the optimal memory setting, based upon cost or execution time.

Conclusion

Web application backends are one of the most popular workload types for serverless applications. The pay-per-value model works well for this type of workload. As traffic grows, it’s important to consider the design choices and service configurations used to optimize your cost.

Serverless web applications generally use a common range of services, which you can logically split into different layers. This post examines each layer and suggests common cost optimizations helpful for web app developers.

To learn more about building web apps with serverless, see the Happy Path series. For more serverless learning resources, visit https://serverlessland.com.

Building a cross-account CI/CD pipeline for single-tenant SaaS solutions

2020-10-18 Rafael Ramos

Post Syndicated from Rafael Ramos original https://aws.amazon.com/blogs/devops/cross-account-ci-cd-pipeline-single-tenant-saas/

With the increasing demand from enterprise customers for a pay-as-you-go consumption model, more and more independent software vendors (ISVs) are shifting their business model towards software as a service (SaaS). Usually this kind of solution is architected using a multi-tenant model. It means that the infrastructure resources and applications are shared across multiple customers, with mechanisms in place to isolate their environments from each other. However, you may not want or can’t afford to share resources for security or compliance reasons, so you need a single-tenant environment.

To achieve this higher level of segregation across the tenants, it’s recommended to isolate the environments on the AWS account level. This strategy brings benefits, such as no network overlapping, no account limits sharing, and simplified usage tracking and billing, but it comes with challenges from an operational standpoint. Whereas multi-tenant solutions require management of a single shared production environment, single-tenant installations consist of dedicated production environments for each customer, without any shared resources across the tenants. When the number of tenants starts to grow, delivering new features at a rapid pace becomes harder to accomplish, because each new version needs to be manually deployed on each tenant environment.

This post describes how to automate this deployment process to deliver software quickly, securely, and less error-prone for each existing tenant. I demonstrate all the steps to build and configure a CI/CD pipeline using AWS CodeCommit, AWS CodePipeline, AWS CodeBuild, and AWS CloudFormation. For each new version, the pipeline automatically deploys the same application version on the multiple tenant AWS accounts.

There are different caveats to build such cross-account CI/CD pipelines on AWS. Because of that, I use AWS Command Line Interface (AWS CLI) to manually go through the process and demonstrate in detail the various configuration aspects you have to handle, such as artifact encryption, cross-account permission granting, and pipeline actions.

Single-tenancy vs. multi-tenancy

One of the first aspects to consider when architecting your SaaS solution is its tenancy model. Each brings their own benefits and architectural challenges. On multi-tenant installations, each customer shares the same set of resources, including databases and applications. With this mode, you can use the servers’ capacity more efficiently, which generally leads to significant cost-saving opportunities. On the other hand, you have to carefully secure your solution to prevent a customer from accessing sensitive data from another. Designing for high availability becomes even more critical on multi-tenant workloads, because more customers are affected in the event of downtime.

Because the environments are by definition isolated from each other, single-tenant solutions are simpler to design when it comes to security, networking isolation, and data segregation. Likewise, you can customize the applications per customer, and have different versions for specific tenants. You also have the advantage of eliminating the noisy-neighbor effect, and can plan the infrastructure for the customer’s scalability requirements. As a drawback, in comparison with multi-tenant, the single-tenant model is operationally more complex because you have more servers and applications to maintain.

Which tenancy model to choose depends ultimately on whether you can meet your customer needs. They might have specific governance requirements, be bound to a certain industry regulation, or have compliance criteria that influences which model they can choose. For more information about modeling your SaaS solutions, see SaaS on AWS.

Solution overview

To demonstrate this solution, I consider a fictitious single-tenant ISV with two customers: Unicorn and Gnome. It uses one central account where the tools reside (Tooling account), and two other accounts, each representing a tenant (Unicorn and Gnome accounts). As depicted in the following architecture diagram, when a developer pushes code changes to CodeCommit, Amazon CloudWatch Events triggers the CodePipeline CI/CD pipeline, which automatically deploys a new version on each tenant’s AWS account. It ensures that the fictitious ISV doesn’t have the operational burden to manually re-deploy the same version for each end-customers.

Architecture diagram of a CI/CD pipeline for single-tenant SaaS solutions

For illustration purposes, the sample application I use in this post is an AWS Lambda function that returns a simple JSON object when invoked.

Prerequisites

Before getting started, you must have the following prerequisites:

Three AWS accounts:
- Tooling – Where the CodeCommit repository, the artifact store, and the pipeline orchestration reside.
- Tenant 1 – The dedicated account for the first tenant, called Unicorn.
- Tenant 2 – The dedicated account for the second tenant, called Gnome.
Install and authenticate the AWS CLI. You can authenticate with an AWS Identity and Access Management (IAM) user or an AWS Security Token Service (AWS STS) token.
Install Git.

Setting up the Git repository

Your first step is to set up your Git repository.

Create a CodeCommit repository to host the source code.

The CI/CD pipeline is automatically triggered every time new code is pushed to that repository.

Make sure Git is configured to use IAM credentials to access AWS CodeCommit via HTTP by running the following command from the terminal:

git config --global credential.helper '!aws codecommit credential-helper $@'
git config --global credential.UseHttpPath true

Clone the newly created repository locally, and add two files in the root folder: index.js and application.yaml.

The first file is the JavaScript code for the Lambda function that represents the sample application. For our use case, the function returns a JSON response object with statusCode: 200 and the body Hello!\n. See the following code:

exports.handler = async (event) => {
    const response = {
        statusCode: 200,
        body: `Hello!\n`,
    };
    return response;
};

The second file is where the infrastructure is defined using AWS CloudFormation. The sample application consists of a Lambda function, and we use AWS Serverless Application Model (AWS SAM) to simplify the resources creation. See the following code:

AWSTemplateFormatVersion: '2010-09-09'
Transform: 'AWS::Serverless-2016-10-31'
Description: Sample Application.

Parameters:
    S3Bucket:
        Type: String
    S3Key:
        Type: String
    ApplicationName:
        Type: String
        
Resources:
    SampleApplication:
        Type: 'AWS::Serverless::Function'
        Properties:
            FunctionName: !Ref ApplicationName
            Handler: index.handler
            Runtime: nodejs12.x
            CodeUri:
                Bucket: !Ref S3Bucket
                Key: !Ref S3Key
            Description: Hello Lambda.
            MemorySize: 128
            Timeout: 10

Push both files to the remote Git repository.

Creating the artifact store encryption key

By default, CodePipeline uses server-side encryption with an AWS Key Management Service (AWS KMS) managed customer master key (CMK) to encrypt the release artifacts. Because the Unicorn and Gnome accounts need to decrypt those release artifacts, you need to create a customer managed CMK in the Tooling account.

From the terminal, run the following command to create the artifact encryption key:

aws kms create-key --region <YOUR_REGION>

This command returns a JSON object with the key ARN property if run successfully. Its format is similar to arn:aws:kms:<YOUR_REGION>:<TOOLING_ACCOUNT_ID>:key/<KEY_ID>. Record this value to use in the following steps.

The encryption key has been created manually for educational purposes only, but it’s considered a best practice to have it as part of the Infrastructure as Code (IaC) bundle.

Creating an Amazon S3 artifact store and configuring a bucket policy

Our use case uses Amazon Simple Storage Service (Amazon S3) as artifact store. Every release artifact is encrypted and stored as an object in an S3 bucket that lives in the Tooling account.

To create and configure the artifact store, follow these steps in the Tooling account:

From the terminal, create an S3 bucket and give it a unique name:

aws s3api create-bucket \
    --bucket <BUCKET_UNIQUE_NAME> \
    --region <YOUR_REGION> \
    --create-bucket-configuration LocationConstraint=<YOUR_REGION>

Configure the bucket to use the customer managed CMK created in the previous step. This makes sure the objects stored in this bucket are encrypted using that key, replacing <KEY_ARN> with the ARN property from the previous step:

aws s3api put-bucket-encryption \
    --bucket <BUCKET_UNIQUE_NAME> \
    --server-side-encryption-configuration \
        '{
            "Rules": [
                {
                    "ApplyServerSideEncryptionByDefault": {
                        "SSEAlgorithm": "aws:kms",
                        "KMSMasterKeyID": "<KEY_ARN>"
                    }
                }
            ]
        }'

The artifacts stored in the bucket need to be accessed from the Unicorn and Gnome Configure the bucket policies to allow cross-account access:

aws s3api put-bucket-policy \
    --bucket <BUCKET_UNIQUE_NAME> \
    --policy \
        '{
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": [
                        "s3:GetBucket*",
                        "s3:List*"
                    ],
                    "Effect": "Allow",
                    "Principal": {
                        "AWS": [
                            "arn:aws:iam::<UNICORN_ACCOUNT_ID>:root",
                            "arn:aws:iam::<GNOME_ACCOUNT_ID>:root"
                        ]
                    },
                    "Resource": [
                        "arn:aws:s3:::<BUCKET_UNIQUE_NAME>"
                    ]
                },
                {
                    "Action": [
                        "s3:GetObject*"
                    ],
                    "Effect": "Allow",
                    "Principal": {
                        "AWS": [
                            "arn:aws:iam::<UNICORN_ACCOUNT_ID>:root",
                            "arn:aws:iam::<GNOME_ACCOUNT_ID>:root"
                        ]
                    },
                    "Resource": [
                        "arn:aws:s3:::<BUCKET_UNIQUE_NAME>/CrossAccountPipeline/*"
                    ]
                }
            ]
        }'

This S3 bucket has been created manually for educational purposes only, but it’s considered a best practice to have it as part of the IaC bundle.

Creating a cross-account IAM role in each tenant account

Following the security best practice of granting least privilege, each action declared on CodePipeline should have its own IAM role. For this use case, the pipeline needs to perform changes in the Unicorn and Gnome accounts from the Tooling account, so you need to create a cross-account IAM role in each tenant account.

Repeat the following steps for each tenant account to allow CodePipeline to assume role in those accounts:

Configure a named CLI profile for the tenant account to allow running commands using the correct access keys.
Create an IAM role that can be assumed from another AWS account, replacing <TENANT_PROFILE_NAME> with the profile name you defined in the previous step:

aws iam create-role \
    --role-name CodePipelineCrossAccountRole \
    --profile <TENANT_PROFILE_NAME> \
    --assume-role-policy-document \
        '{
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Principal": {
                        "AWS": "arn:aws:iam::<TOOLING_ACCOUNT_ID>:root"
                    },
                    "Action": "sts:AssumeRole"
                }
            ]
        }'

Create an IAM policy that grants access to the artifact store S3 bucket and to the artifact encryption key:

aws iam create-policy \
    --policy-name CodePipelineCrossAccountArtifactReadPolicy \
    --profile <TENANT_PROFILE_NAME> \
    --policy-document \
        '{
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": [
                        "s3:GetBucket*",
                        "s3:ListBucket"
                    ],
                    "Resource": [
                        "arn:aws:s3:::<BUCKET_UNIQUE_NAME>"
                    ],
                    "Effect": "Allow"
                },
                {
                    "Action": [
                        "s3:GetObject*",
                        "s3:Put*"
                    ],
                    "Resource": [
                        "arn:aws:s3:::<BUCKET_UNIQUE_NAME>/CrossAccountPipeline/*"
                    ],
                    "Effect": "Allow"
                },
                {
                    "Action": [ 
                        "kms:DescribeKey", 
                        "kms:GenerateDataKey*", 
                        "kms:Encrypt", 
                        "kms:ReEncrypt*", 
                        "kms:Decrypt" 
                    ], 
                    "Resource": "<KEY_ARN>",
                    "Effect": "Allow"
                }
            ]
        }'

Attach the CodePipelineCrossAccountArtifactReadPolicy IAM policy to the CodePipelineCrossAccountRole IAM role:

aws iam attach-role-policy \
    --profile <TENANT_PROFILE_NAME> \
    --role-name CodePipelineCrossAccountRole \
    --policy-arn arn:aws:iam::<TENANT_ACCOUNT_ID>:policy/CodePipelineCrossAccountArtifactReadPolicy

Create an IAM policy that allows to pass the IAM role CloudFormationDeploymentRole to CloudFormation and to perform CloudFormation actions on the application Stack:

aws iam create-policy \
    --policy-name CodePipelineCrossAccountCfnPolicy \
    --profile <TENANT_PROFILE_NAME> \
    --policy-document \
        '{
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": [
                        "iam:PassRole"
                    ],
                    "Resource": "arn:aws:iam::<TENANT_ACCOUNT_ID>:role/CloudFormationDeploymentRole",
                    "Effect": "Allow"
                },
                {
                    "Action": [
                        "cloudformation:*"
                    ],
                    "Resource": "arn:aws:cloudformation:<YOUR_REGION>:<TENANT_ACCOUNT_ID>:stack/SampleApplication*/*",
                    "Effect": "Allow"
                }
            ]
        }'

Attach the CodePipelineCrossAccountCfnPolicy IAM policy to the CodePipelineCrossAccountRole IAM role:

aws iam attach-role-policy \
    --profile <TENANT_PROFILE_NAME> \
    --role-name CodePipelineCrossAccountRole \
    --policy-arn arn:aws:iam::<TENANT_ACCOUNT_ID>:policy/CodePipelineCrossAccountCfnPolicy

Additional configuration is needed in the Tooling account to allow access, which you complete later on.

Creating a deployment IAM role in each tenant account

After CodePipeline assumes the CodePipelineCrossAccountRole IAM role into the tenant account, it triggers AWS CloudFormation to provision the infrastructure based on the template defined in the application.yaml file. For that, AWS CloudFormation needs to assume an IAM role that grants privileges to create resources into the tenant AWS account.

Repeat the following steps for each tenant account to allow AWS CloudFormation to create resources in those accounts:

Create an IAM role that can be assumed by AWS CloudFormation:

aws iam create-role \
    --role-name CloudFormationDeploymentRole \
    --profile <TENANT_PROFILE_NAME> \
    --assume-role-policy-document \
        '{
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Principal": {
                        "Service": "cloudformation.amazonaws.com"
                    },
                    "Action": "sts:AssumeRole"
                }
            ]
        }'

Create an IAM policy that grants permissions to create AWS resources:

aws iam create-policy \
    --policy-name CloudFormationDeploymentPolicy \
    --profile <TENANT_PROFILE_NAME> \
    --policy-document \
        '{
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "iam:PassRole",
                    "Resource": "arn:aws:iam::<TENANT_ACCOUNT_ID>:role/*",
                    "Effect": "Allow"
                },
                {
                    "Action": [
                        "iam:GetRole",
                        "iam:CreateRole",
                        "iam:DeleteRole",
                        "iam:AttachRolePolicy",
                        "iam:DetachRolePolicy"
                    ],
                    "Resource": "arn:aws:iam::<TENANT_ACCOUNT_ID>:role/*",
                    "Effect": "Allow"
                },
                {
                    "Action": "lambda:*",
                    "Resource": "*",
                    "Effect": "Allow"
                },
                {
                    "Action": "codedeploy:*",
                    "Resource": "*",
                    "Effect": "Allow"
                },
                {
                    "Action": [
                        "s3:GetObject*",
                        "s3:GetBucket*",
                        "s3:List*"
                    ],
                    "Resource": [
                        "arn:aws:s3:::<BUCKET_UNIQUE_NAME>",
                        "arn:aws:s3:::<BUCKET_UNIQUE_NAME>/*"
                    ],
                    "Effect": "Allow"
                },
                {
                    "Action": [
                        "kms:Decrypt",
                        "kms:DescribeKey"
                    ],
                    "Resource": "<KEY_ARN>",
                    "Effect": "Allow"
                },
                {
                    "Action": [
                        "cloudformation:CreateStack",
                        "cloudformation:DescribeStack*",
                        "cloudformation:GetStackPolicy",
                        "cloudformation:GetTemplate*",
                        "cloudformation:SetStackPolicy",
                        "cloudformation:UpdateStack",
                        "cloudformation:ValidateTemplate"
                    ],
                    "Resource": "arn:aws:cloudformation:<YOUR_REGION>:<TENANT_ACCOUNT_ID>:stack/SampleApplication*/*",
                    "Effect": "Allow"
                },
                {
                    "Action": [
                        "cloudformation:CreateChangeSet"
                    ],
                    "Resource": "arn:aws:cloudformation:<YOUR_REGION>:aws:transform/Serverless-2016-10-31",
                    "Effect": "Allow"
                }
            ]
        }'

The granted permissions in this IAM policy depend on the resources your application needs to be provisioned. Because the application in our use case consists of a simple Lambda function, the IAM policy only needs permissions over Lambda. The other permissions declared are to access and decrypt the Lambda code from the artifact store, use AWS CodeDeploy to deploy the function, and create and attach the Lambda execution role.

Attach the IAM policy to the IAM role:

aws iam attach-role-policy \
    --profile <TENANT_PROFILE_NAME> \
    --role-name CloudFormationDeploymentRole \
    --policy-arn arn:aws:iam::<TENANT_ACCOUNT_ID>:policy/CloudFormationDeploymentPolicy

Configuring an artifact store encryption key

Even though the IAM roles created in the tenant accounts declare permissions to use the CMK encryption key, that’s not enough to have access to the key. To access the key, you must update the CMK key policy.

From the terminal, run the following command to attach the new policy:

aws kms put-key-policy \
    --key-id <KEY_ARN> \
    --policy-name default \
    --region <YOUR_REGION> \
    --policy \
        '{
             "Id": "TenantAccountAccess",
             "Version": "2012-10-17",
             "Statement": [
                {
                    "Sid": "Enable IAM User Permissions",
                    "Effect": "Allow",
                    "Principal": {
                        "AWS": "arn:aws:iam::<TOOLING_ACCOUNT_ID>:root"
                    },
                    "Action": "kms:*",
                    "Resource": "*"
                },
                {
                    "Effect": "Allow",
                    "Principal": {
                        "AWS": [
                            "arn:aws:iam::<GNOME_ACCOUNT_ID>:role/CloudFormationDeploymentRole",
                            "arn:aws:iam::<GNOME_ACCOUNT_ID>:role/CodePipelineCrossAccountRole",
                            "arn:aws:iam::<UNICORN_ACCOUNT_ID>:role/CloudFormationDeploymentRole",
                            "arn:aws:iam::<UNICORN_ACCOUNT_ID>:role/CodePipelineCrossAccountRole"
                        ]
                    },
                    "Action": [
                        "kms:Decrypt",
                        "kms:DescribeKey"
                    ],
                    "Resource": "*"
                }
             ]
         }'

Provisioning the CI/CD pipeline

Each CodePipeline workflow consists of two or more stages, which are composed by a series of parallel or serial actions. For our use case, the pipeline is made up of four stages:

Source – Declares CodeCommit as the source control for the application code.
Build – Using CodeBuild, it installs the dependencies and builds deployable artifacts. In this use case, the sample application is too simple and this stage is used for illustration purposes.
Deploy_Dev – Deploys the sample application on a sandbox environment. At this point, the deployable artifacts generated at the Build stage are used to create a CloudFormation stack and deploy the Lambda function.
Deploy_Prod – Similar to Deploy_Dev, at this stage the sample application is deployed on the tenant production environments. For that, it contains two actions (one per tenant) that are run in parallel. CodePipeline uses CodePipelineCrossAccountRole to assume a role on the tenant account, and from there, CloudFormationDeploymentRole is used to effectively deploy the application.

To provision your resources, complete the following steps from the terminal:

Download the CloudFormation pipeline template:

curl -LO https://cross-account-ci-cd-pipeline-single-tenant-saas.s3.amazonaws.com/pipeline.yaml

Deploy the CloudFormation stack using the pipeline template:

aws cloudformation deploy \
    --template-file pipeline.yaml \
    --region <YOUR_REGION> \
    --stack-name <YOUR_PIPELINE_STACK_NAME> \
    --capabilities CAPABILITY_IAM \
    --parameter-overrides \
        ArtifactBucketName=<BUCKET_UNIQUE_NAME> \
        ArtifactEncryptionKeyArn=<KMS_KEY_ARN> \
        UnicornAccountId=<UNICORN_TENANT_ACCOUNT_ID> \
        GnomeAccountId=<GNOME_TENANT_ACCOUNT_ID> \
        SampleApplicationRepositoryName=<YOUR_CODECOMMIT_REPOSITORY_NAME> \
        RepositoryBranch=<YOUR_CODECOMMIT_MAIN_BRANCH>

This is the list of the required parameters to deploy the template:

- ArtifactBucketName – The name of the S3 bucket where the deployment artifacts are to be stored.
- ArtifactEncryptionKeyArn – The ARN of the customer managed CMK to be used as artifact encryption key.
- UnicornAccountId – The AWS account ID for the first tenant (Unicorn) where the application is to be deployed.
- GnomeAccountId – The AWS account ID for the second tenant (Gnome) where the application is to be deployed.
- SampleApplicationRepositoryName – The name of the CodeCommit repository where source changes are detected.
- RepositoryBranch – The name of the CodeCommit branch where source changes are detected. The default value is master in case no value is provided.

Wait for AWS CloudFormation to create the resources.

When stack creation is complete, the pipeline starts automatically.

For each existing tenant, an action is declared within the Deploy_Prod stage. The following code is a snippet of how these actions are configured to deploy the application on a different account:

RoleArn: !Sub arn:aws:iam::${UnicornAccountId}:role/CodePipelineCrossAccountRole
Configuration:
    ActionMode: CREATE_UPDATE
    Capabilities: CAPABILITY_IAM,CAPABILITY_AUTO_EXPAND
    StackName: !Sub SampleApplication-unicorn-stack-${AWS::Region}
    RoleArn: !Sub arn:aws:iam::${UnicornAccountId}:role/CloudFormationDeploymentRole
    TemplatePath: CodeCommitSource::application.yaml
    ParameterOverrides: !Sub | 
        { 
            "ApplicationName": "SampleApplication-Unicorn",
            "S3Bucket": { "Fn::GetArtifactAtt" : [ "ApplicationBuildOutput", "BucketName" ] },
            "S3Key": { "Fn::GetArtifactAtt" : [ "ApplicationBuildOutput", "ObjectKey" ] }
        }

The code declares two IAM roles. The first one is the IAM role assumed by the CodePipeline action to access the tenant AWS account, whereas the second is the IAM role used by AWS CloudFormation to create AWS resources in the tenant AWS account. The ParameterOverrides configuration declares where the release artifact is located. The S3 bucket and key are in the Tooling account and encrypted using the customer managed CMK. That’s why it was necessary to grant access from external accounts using a bucket and KMS policies.

Besides the CI/CD pipeline itself, this CloudFormation template declares IAM roles that are used by the pipeline and its actions. The main IAM role is named CrossAccountPipelineRole, which is used by the CodePipeline service. It contains permissions to assume the action roles. See the following code:

{
    "Action": "sts:AssumeRole",
    "Effect": "Allow",
    "Resource": [
        "arn:aws:iam::<TOOLING_ACCOUNT_ID>:role/<PipelineSourceActionRole>",
        "arn:aws:iam::<TOOLING_ACCOUNT_ID>:role/<PipelineApplicationBuildActionRole>",
        "arn:aws:iam::<TOOLING_ACCOUNT_ID>:role/<PipelineDeployDevActionRole>",
        "arn:aws:iam::<UNICORN_ACCOUNT_ID>:role/CodePipelineCrossAccountRole",
        "arn:aws:iam::<GNOME_ACCOUNT_ID>:role/CodePipelineCrossAccountRole"
    ]
}

When you have more tenant accounts, you must add additional roles to the list.

After CodePipeline runs successfully, test the sample application by invoking the Lambda function on each tenant account:

aws lambda invoke --function-name SampleApplication --profile <TENANT_PROFILE_NAME> --region <YOUR_REGION> out

The output should be:

{
    "StatusCode": 200,
    "ExecutedVersion": "$LATEST"
}

Cleaning up

Follow these steps to delete the components and avoid future incurring charges:

Delete the production application stack from each tenant account:

aws cloudformation delete-stack --profile <TENANT_PROFILE_NAME> --region <YOUR_REGION> --stack-name SampleApplication-<TENANT_NAME>-stack-<YOUR_REGION>

Delete the dev application stack from the Tooling account:

aws cloudformation delete-stack --region <YOUR_REGION> --stack-name SampleApplication-dev-stack-<YOUR_REGION>

Delete the pipeline stack from the Tooling account:

aws cloudformation delete-stack --region <YOUR_REGION> --stack-name <YOUR_PIPELINE_STACK_NAME>

Delete the customer managed CMK from the Tooling account:

aws kms schedule-key-deletion --region <YOUR_REGION> --key-id <KEY_ARN>

Delete the S3 bucket from the Tooling account:

aws s3 rb s3://<BUCKET_UNIQUE_NAME> --force

Optionally, delete the IAM roles and policies you created in the tenant accounts

Conclusion

This post demonstrated what it takes to build a CI/CD pipeline for single-tenant SaaS solutions isolated on the AWS account level. It covered how to grant cross-account access to artifact stores on Amazon S3 and artifact encryption keys on AWS KMS using policies and IAM roles. This approach is less error-prone because it avoids human errors when manually deploying the exact same application for multiple tenants.

For this use case, we performed most of the steps manually to better illustrate all the steps and components involved. For even more automation, consider using the AWS Cloud Development Kit (AWS CDK) and its pipeline construct to create your CI/CD pipeline and have everything as code. Moreover, for production scenarios, consider having integration tests as part of the pipeline.

Rafael Ramos

Rafael is a Solutions Architect at AWS, where he helps ISVs on their journey to the cloud. He spent over 13 years working as a software developer, and is passionate about DevOps and serverless. Outside of work, he enjoys playing tabletop RPG, cooking and running marathons.

Unlocking Data from Existing Systems with a Serverless API Facade

2020-10-15 Santiago Freitas

Post Syndicated from Santiago Freitas original https://aws.amazon.com/blogs/architecture/unlocking-data-from-existing-systems-with-serverless-api-facade/

In today’s modern world, it’s not enough to produce a good product; it’s critical that your products and services are well integrated into the surrounding business ecosystem. Companies lose market share when valuable data about their products or services are locked inside their systems. Business partners and internal teams use data from multiple sources to enhance their customers’ experience.

This blog post explains an architecture pattern for providing access to data and functionalities from existing systems in a consistent way using well-defined APIs. It then covers what the API Facade architecture pattern looks like when implemented on AWS using serverless for API management and mediation layer.

Background

Modern applications are often developed with an application programming interface (API)-first approach. This significantly eases integrations with internal and third-party applications by exposing data and functionalities via well-documented APIs.

On the other hand, applications built several years ago have multiple interfaces and data formats which creates a challenge for integrating their data and functionalities into new applications. Those existing applications store vast amounts of historical data. Integrating their data to build new customer experiences can be very valuable.

Figure 1: Existing applications use a broad range of integration methods and data formats

API Facade pattern

When building modern APIs for existing systems, you can use an architecture pattern called API Facade. This pattern creates a layer that exposes well-structured and well-documented APIs northbound, and it integrates southbound with the required interfaces and protocols that existing applications use. This pattern is about creating a facade, which creates a consistent view from the perspective of the API consumer—usually an application developer, and ultimately another application.

In addition to providing a simple interface for complex existing systems, an API Facade allows you to protect future compatibility of your solution. This is because if the underlying systems are modified or replaced, the facade layer will remain the same. From the API consumer perspective, nothing will have changed.

The API facade consists of two layers: 1) API management layer; and 2) mediation layer.

Figure 2: Conceptual representation of API facade pattern.

The API management layer exposes a set of well-designed, well-documented APIs with associated URLs, request parameters and responses, a list of supported headers and query parameters, and possible error codes and descriptions. A developer portal is used to help API consumers discover which APIs are available, browse the API documentation, and register for—and immediately receive—an API key to build applications. The APIs exposed by this layer can be used by external as well as internal consumers and enables them to build applications faster.

The mediation layer is responsible for integration between API and underlying systems. It transforms API requests into formats acceptable for different systems and then process and transform underlying systems’ responses into response and data formats the API has promised to return to the API consumers. This layer can perform tasks ranging from simple data manipulations, such as converting a response from XML to JSON, to much more complex operations where an application-specific client is required to run in order to connect to existing systems.

API Facade pattern on AWS serverless platform

To build the API management and the mediation layer, you can leverage services from the AWS serverless platform.

Amazon API Gateway allows you to build the API management. With API Gateway you can create RESTful APIs and WebSocket APIs. It supports integration with the mediation layer running on containers on Amazon Elastic Container Service (ECS) or Amazon Elastic Kubernetes Service (EKS), and also integration with serverless compute using AWS Lambda. API Gateway allows you to make your APIs available on the Internet for your business partners and third-party developers or keep them private. Private APIs hosted within your VPC can be accessed by resources inside your VPC, or those connected to your VPC via AWS Direct Connect or Site-to-Site VPN. This allows you to leverage API Gateway for building the API management of the API facade pattern for internal and external API consumers.

When it comes to building the Mediation layer, AWS Lambda is a great choice as it runs your mediation code without requiring you to provision or manage servers. AWS Lambda hosts the code that ingests the request coming from the API management layer, processes it, and makes the required format and protocols transformations. It can connect to the existing systems, and then return the response to the API management layer to send it back to the system which originated the request. AWS Lambda functions run outside your VPC or they can be configured to access systems in your VPC or those running in your own data centers connected to AWS via Direct Connect or Site-to-Site VPN.

However, some of the most complex mediations may require a custom client or have the need to maintain a persistent connection to the backend system. In those cases, using containers, and specifically AWS Fargate, would be more suitable. AWS Fargate is a serverless compute engine for containers with support for Amazon ECS and Amazon EKS. Containers running on AWS Fargate can access systems in your VPC or those running in your own data centers via Direct Connect or Site-to-Site VPN.

When building the API Facade pattern using AWS Serverless, you can focus most of your resources writing the API definition and mediation logic instead of managing infrastructure. This makes it easier for the teams who own the existing applications that need to expose data and functionality to own the API management and mediation layer implementations. A team that runs an existing application usually knows the best way to integrate with it. This team is also better equipped to handle changes to the mediation layer, which may be required as a result of changes to the existing application. Those teams will then publish the API information into a developer portal, which could be made available as a central API repository provided by a company’s tools team.

The following figure shows the API Facade pattern built on AWS Serverless using API Gateway for the API management layer and AWS Lambda and Fargate for the mediation layer. It functions as a facade for the existing systems running on-premises connected to AWS via Direct Connect and Site-to-Site VPN. The APIs are also exposed to external consumers via a public API endpoint as well as to internal consumers within a VPC. API Gateway supports multiple mechanisms for controlling and managing access to your API.

Figure 3: API Facade pattern built on AWS Serverless

To provide an example of a practical implementation of this pattern we can look into UK Open Banking. The Open Banking standard set the API specifications for delivering account information and payment initiation services banks such as HSBC had to implement. HSBC internal landscape is hugely varied and they needed to harness the power of multiple disparate on-premises systems while providing uniform API to the outside world. HSBC shared how they met the requirements on this re:Invent 2019 session.

Conclusion

You can build differentiated customer experiences and bring services to market faster when you integrate your products and services into the surrounding business ecosystem. Your systems can participate in a business ecosystem more effectively when they expose their data and capabilities via well-established APIs. The API Facade pattern enables existing systems that don’t offer well-established APIs natively to participate on this well-integrated business ecosystem. By building the API Facade pattern on the AWS serverless platform, you can focus on defining the APIs and the mediation layer code instead of spending resources on managing the infrastructure required to implement this pattern. This allows you to implement this pattern faster.

Architecting for database encryption on AWS

2020-10-08 Jonathan Jenkyn

Post Syndicated from Jonathan Jenkyn original https://aws.amazon.com/blogs/security/architecting-for-database-encryption-on-aws/

In this post, I review the options you have to protect your customer data when migrating or building new databases in Amazon Web Services (AWS). I focus on how you can support sensitive workloads in ways that help you maintain compliance and regulatory obligations, and meet security objectives.

Understanding transparent data encryption

I commonly see enterprise customers migrating existing databases straight from on-premises to AWS without reviewing their design. This might seem simpler and faster, but they miss the opportunity to review the scalability, cost-savings, and feature capability of native cloud services. A straight lift and shift migration can also create unnecessary operational overheads, carry-over unneeded complexity, and result in more time spent troubleshooting and responding to events over time.

One example is when enterprise customers who are using Transparent Data Encryption (TDE) or Extensible Key Management (EKM) technologies want to reuse the same technologies in their migration to AWS. TDE and EKM are database technologies that encrypt and decrypt database records as the records are written and read to the underlying storage medium. Customers use TDE features in Microsoft SQL Server, Oracle 10g and 11g, and Oracle Enterprise Edition to meet requirements for data-at-rest encryption. This shouldn’t mean that TDE is the requirement. It’s infrequent that an organizational policy or compliance framework specifies a technology such as TDE in the actual requirement. For example, the Payment Card Industry Data Security Standard (PCI-DSS) standard requires that sensitive data must be protected using “Strong cryptography with associated key-management processes and procedures.” Nowhere does PCI-DSS endorse or require the use of a specific technology.

Understanding risks

It’s important that you understand the risks that encryption-at-rest mitigates before selecting a technology to use. Encryption-at-rest, in the context of databases, generally manages the risk that one of the disks used to store database data is physically stolen and thus compromised. In on-premises scenarios, TDE is an effective technology used to manage this risk. All data from the database—up to and including the disk—is encrypted. The database manages all key management and cryptographic operations. You can also use TDE with a hardware security module (HSM) so that the keys and cryptography for the database are managed outside of the database itself. In TDE implementations, the HSM is used only to manage the key encryption keys (KEK), and not the data encryption keys (DEK) themselves. The DEKs are in volatile memory in the database at runtime, and so the cryptographic operations occur on the database itself.

You can also use native operating system encryption technologies such as dm-crypt or LUKS (Linux Unified Key Setup). Dm-crypt is a full disk encryption (FDE) subsystem in Linux kernel version 2.6 and beyond. Dm-crypt can be used on its own or with LUKS as an extension to add more features. When using dm-crypt, the operating system kernel is responsible for encrypting and decrypting data as it’s written and read from the attached volumes. This would achieve the same outcome as TDE—data written and read to the disk volume is encrypted, and the risk related to physical disk compromise is managed. DEKs are in runtime memory of the machine running the database.

With some TDE implementations, you can encrypt tables, rows, columns, and cells with different DEKs to achieve granular separation of duties between operators. Customers can then configure TDE to authorize access to each DEK based on database login credentials and job function, helping to manage risks associated with unauthorized access. However, the most common configuration I’ve seen is to rely on whole database encryption when using TDE. This configuration gives similar protection against the identified risks as dm-crypt with LUKS used without an HSM, since the DEKs and KEKs are stored within the instance in both cases and the result is that the database data on disk is encrypted.

Using encryption to manage data at rest risks in AWS

When you move to AWS, you gain additional security capabilities that can simplify your security implementations. Since the announcement of the AWS Key Management Service (AWS KMS) in 2014, it has been tightly integrated with Amazon Elastic Block Store (Amazon EBS), Amazon Simple Storage Service (Amazon S3), and dozens of other services on AWS. This means that data is encrypted on disk by checking a single check box. Furthermore, you get the benefits of AWS KMS for key management and cryptographic operations, while being transparent to the Amazon Elastic Compute Cloud (Amazon EC2) instance where the data is being encrypted and decrypted. For simplicity, the authorization for access to the data is managed entirely by AWS Identity and Access Management (IAM) and AWS KMS key resource policies.

If you need more granular access control to the data, you can use the AWS Encryption SDK to encrypt data at the application layer. That provides the same effect as TDE cell-level protection, with a FIPS140-2 Level 2 validated HSM, as might be required by a recognizing standard.

If you must use a FIPS140-2 Level 3 validated HSM to meet more stringent compliance standards or regulations, then you can use the Custom Key Store capability of AWS KMS to achieve that—again in a transparent way. This option has a trade-off, as there is additional operational overhead in terms of managing an AWS CloudHSM cluster.

Many customers choose to migrate their database into the managed Amazon Relational Database Service (Amazon RDS), rather than managing the database instance themselves. Like the Amazon EC2 service, RDS uses Amazon EBS volumes for its data storage, and so can seamlessly use AWS KMS for encryption at rest functionality. When you do so, your management overhead for the protection of data-at-rest reduces to almost zero. This lets you focus on business value while AWS is responsible for the management of your database and the protection of the underlying data. The next section reviews this option and others in more detail.

You can review the available Amazon RDS database engines and versions via the Amazon RDS User Guide documentation, or by running the following AWS Command Line Interface (AWS CLI) command:

aws rds describe-db-engine-versions --query "DBEngineVersions[].DBEngineVersionDescription" --region <regionIdentifier>

Summary

While you can operate in AWS similar to how you operate in your on-premises environment, the preceding configurations and recommendations show how you can significantly reduce your challenges and increase your benefits by using cloud-native security services like AWS KMS, Amazon RDS, and CloudHSM. Specifically, using Amazon RDS with Amazon EBS volumes encrypted by AWS KMS provides a highly scalable, resilient, and secure way to manage your keys in AWS.

While there might be some architectural redesign and configuration work needed to move an on-premises database into Amazon RDS, you can leverage AWS services to help you meet your compliance requirements with less effort. By offloading the OS and database maintenance responsibility to AWS, you simultaneously reduce operational friction and increase security. By migrating this way, you can benefit from the scalability and resilience of the AWS global infrastructure and expertise. Lastly, to get started with migrating your database to AWS, I encourage you to use the AWS Database Migration Service.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Choosing between messaging services for serverless applications

2020-09-28 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/choosing-between-messaging-services-for-serverless-applications/

Most serverless application architectures use a combination of different AWS services, microservices, and AWS Lambda functions. Messaging services are important in allowing distributed applications to communicate with each other, and are fundamental to most production serverless workloads.

Messaging services can improve the resilience, availability, and scalability of applications, when used appropriately. They can also enable your applications to communicate beyond your workload or even the AWS Cloud, and provide extensibility for future service features and versions.

In this blog post, I compare the primary messaging services offered by AWS and how you can use these in your serverless application architectures. I also show how you use and deploy these integrations with the AWS Serverless Application Model (AWS SAM).

Examples in this post refer to code that can be downloaded from this GitHub repository. The README.md file explains how to deploy and run each example.

Overview

Three of the most useful messaging patterns for serverless developers are queues, publish/subscribe, and event buses. In AWS, these are provided by Amazon SQS, Amazon SNS, and Amazon EventBridge respectively. All of these services are fully managed and highly available, so there is no infrastructure to manage. All three integrate with Lambda, allowing you to publish messages via the AWS SDK and invoke functions as targets. Each of these services has an important role to play in serverless architectures.

SNS enables you to send messages reliably between parts of your infrastructure. It uses a robust retry mechanism for when downstream targets are unavailable. When the delivery policy is exhausted, it can optionally send those messages to a dead-letter queue for further processing. SNS uses topics to logically separate messages into channels, and your Lambda functions interact with these topics.

SQS provides queues for your serverless applications. You can use a queue to send, store, and receive messages between different services in your workload. Queues are an important mechanism for providing fault tolerance in distributed systems, and help decouple different parts of your application. SQS scales elastically, and there is no limit to the number of messages per queue. The service durably persists messages until they are processed by a downstream consumer.

EventBridge is a serverless event bus service, simplifying routing events between AWS services, software as a service (SaaS) providers, and your own applications. It logically separates routing using event buses, and you implement the routing logic using rules. You can filter and transform incoming messages at the service level, and route events to multiple targets, including Lambda functions.

Integrating an SQS queue with AWS SAM

The first example shows an AWS SAM template defining a serverless application with two Lambda functions and an SQS queue:

You can declare an SQS queue in an AWS SAM template with the AWS::SQS::Queue resource:

  MySqsQueue:
    Type: AWS::SQS::Queue

To publish to the queue, the publisher function must have permission to send messages. Using an AWS SAM policy template, you can apply policy that enables send messaging to one specific queue:

      Policies:
        - SQSSendMessagePolicy:
            QueueName: !GetAtt MySqsQueue.QueueName

The AWS SAM template passes the queue name into the Lambda function as an environment variable. The function uses the sendMessage method of the AWS.SQS class to publish the message:

const AWS = require('aws-sdk')
AWS.config.region = process.env.AWS_REGION 
const sqs = new AWS.SQS({apiVersion: '2012-11-05'})

// The Lambda handler
exports.handler = async (event) => {
  // Params object for SQS
  const params = {
    MessageBody: `Message at ${Date()}`,
    QueueUrl: process.env.SQSqueueName
  }
  
  // Send to SQS
  const result = await sqs.sendMessage(params).promise()
  console.log(result)
}

When the SQS queue receives the message, it publishes to the consuming Lambda function. To configure this integration in AWS SAM, the consumer function is granted the SQSPollerPolicy policy. The function’s event source is set to receive messages from the queue in batches of 10:

  QueueConsumerFunction:
    Type: AWS::Serverless::Function 
    Properties:
      CodeUri: code/
      Handler: consumer.handler
      Runtime: nodejs12.x
      Timeout: 3
      MemorySize: 128
      Policies:  
        - SQSPollerPolicy:
            QueueName: !GetAtt MySqsQueue.QueueName
      Events:
        MySQSEvent:
          Type: SQS
          Properties:
            Queue: !GetAtt MySqsQueue.Arn
            BatchSize: 10

The payload for the consumer function is the message from SQS. This is an array of messages up to the batch size, containing a body attribute with the publishing function’s MessageBody. You can see this in the CloudWatch log for the function:

Integrating an SNS topic with AWS SAM

The second example shows an AWS SAM template defining a serverless application with three Lambda functions and an SNS topic:

You declare an SNS topic and the subscribing Lambda functions with the AWS::SNS:Topic resource:

  MySnsTopic:
    Type: AWS::SNS::Topic
    Properties:
      Subscription:
        - Protocol: lambda
          Endpoint: !GetAtt TopicConsumerFunction1.Arn    
        - Protocol: lambda
          Endpoint: !GetAtt TopicConsumerFunction2.Arn

You provide the SNS service with permission to invoke the Lambda functions but defining an AWS::Lambda::Permission for each:

  TopicConsumerFunction1Permission:
    Type: 'AWS::Lambda::Permission'
    Properties:
      Action: 'lambda:InvokeFunction'
      FunctionName: !Ref TopicConsumerFunction1
      Principal: sns.amazonaws.com

The SNSPublishMessagePolicy policy template grants permission to the publishing function to send messages to the topic. In the function, the publish method of the AWS.SNS class handles publishing:

const AWS = require('aws-sdk')
AWS.config.region = process.env.AWS_REGION 
const sns = new AWS.SNS({apiVersion: '2012-11-05'})

// The Lambda handler
exports.handler = async (event) => {
  // Params object for SNS
  const params = {
    Message: `Message at ${Date()}`,
    Subject: 'New message from publisher',
    TopicArn: process.env.SNStopic
  }
  
  // Send to SQS
  const result = await sns.publish(params).promise()
  console.log(result)
}

The payload for the consumer functions is the message from SNS. This is an array of messages, containing subject and message attributes from the publishing function. You can see this in the CloudWatch log for the function:

Differences between SQS and SNS configurations

SQS queues and SNS topics offer different functionality, though both can publish to downstream Lambda functions.

An SQS message is stored on the queue for up to 14 days until it is successfully processed by a subscriber. SNS does not retain messages so if there are no subscribers for a topic, the message is discarded.

SNS topics may broadcast to multiple targets. This behavior is called fan-out. It can be used to parallelize work across Lambda functions or send messages to multiple environments (such as test or development). An SNS topic can have up to 12,500,000 subscribers, providing highly scalable fan-out capabilities. The targets may include HTTP/S endpoints, SMS text messaging, SNS mobile push, email, SQS, and Lambda functions.

In AWS SAM templates, you can retrieve properties such as ARNs and names of queues and topics, using the following intrinsic functions:

	Amazon SQS	Amazon SNS
Channel type	Queue	Topic
Get ARN	!GetAtt MySqsQueue.Arn	!Ref MySnsTopic
Get name	!GetAtt MySqsQueue.QueueName	!GetAtt MySnsTopic.TopicName

Integrating with EventBridge in AWS SAM

The third example shows the AWS SAM template defining a serverless application with two Lambda functions and an EventBridge rule:

The default event bus already exists in every AWS account. You declare a rule that filters events in the event bus using the AWS::Events::Rule resource:

  EventRule: 
    Type: AWS::Events::Rule
    Properties: 
      Description: "EventRule"
      EventPattern: 
        source: 
          - "demo.event"
        detail: 
          state: 
            - "new"
      State: "ENABLED"
      Targets: 
        - Arn: !GetAtt EventConsumerFunction.Arn
          Id: "ConsumerTarget"

The rule describes an event pattern specifying matching JSON attributes. Events that match this pattern are routed to the list of targets. You provide the EventBridge service with permission to invoke the Lambda functions in the target list:

  PermissionForEventsToInvokeLambda: 
    Type: AWS::Lambda::Permission
    Properties: 
      FunctionName: 
        Ref: "EventConsumerFunction"
      Action: "lambda:InvokeFunction"
      Principal: "events.amazonaws.com"
      SourceArn: !GetAtt EventRule.Arn

The AWS SAM template uses an IAM policy statement to grant permission to the publishing function to put events on the event bus:

  EventPublisherFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: code/
      Handler: publisher.handler
      Timeout: 3
      Runtime: nodejs12.x
      Policies:
        - Statement:
          - Effect: Allow
            Resource: '*'
            Action:
              - events:PutEvents

The publishing function then uses the putEvents method of the AWS.EventBridge class, which returns after the events have been durably stored in EventBridge:

const AWS = require('aws-sdk')
AWS.config.update({region: 'us-east-1'})
const eventbridge = new AWS.EventBridge()

exports.handler = async (event) => {
  const params = {
    Entries: [ 
      {
        Detail: JSON.stringify({
          "message": "Hello from publisher",
          "state": "new"
        }),
        DetailType: 'Message',
        EventBusName: 'default',
        Source: 'demo.event',
        Time: new Date 
      }
    ]
  }
  const result = await eventbridge.putEvents(params).promise()
  console.log(result)
}

The payload for the consumer function is the message from EventBridge. This is an array of messages, containing subject and message attributes from the publishing function. You can see this in the CloudWatch log for the function:

Comparing SNS with EventBridge

SNS and EventBridge have many similarities. Both can be used to decouple publishers and subscribers, filter messages or events, and provide fan-in or fan-out capabilities. However, there are differences in the list of targets and features for each service, and your choice of service depends on the needs of your use-case.

EventBridge offers two newer capabilities that are not available in SNS. The first is software as a service (SaaS) integration. This enables you to authorize supported SaaS providers to send events directly from their EventBridge event bus to partner event buses in your account. This replaces the need for polling or webhook configuration, and creates a highly scalable way to ingest SaaS events directly into your AWS account.

The second feature is the Schema Registry, which makes it easier to discover and manage OpenAPI schemas for events. EventBridge can infer schemas based on events routed through an event bus by using schema discovery. This can be used to generate code bindings directly to your IDE for type-safe languages like Python, Java, and TypeScript. This can help accelerate development by automating the generation of classes and code directly from events.

This table compares the major features of both services:

	Amazon SNS	Amazon EventBridge
Number of targets	10 million (soft)	5
Availability SLA	99.9%	99.99%
Limits	100,000 topics. 12,500,000 subscriptions per topic.	100 event buses. 300 rules per event bus.
Publish throughput	Varies by Region. Soft limits.	Varies by Region. Soft limits.
Input transformation	No	Yes – see details.
Message filtering	Yes – see details.	Yes, including IP address matching – see details.
Message size maximum	256 KB	256 KB
Billing	Per 64 KB
Format	Raw or JSON	JSON
Receive events from AWS CloudTrail	No	Yes
Targets	HTTP(S), SMS, SNS Mobile Push, Email/Email-JSON, SQS, Lambda functions.	15 targets including AWS Lambda, Amazon SQS, Amazon SNS, AWS Step Functions, Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose.
SaaS integration	No	Yes – see integrations.
Schema Registry integration	No	Yes – see details.
Dead-letter queues supported	Yes	No
FIFO ordering available	No	No
Public visibility	Can create public topics	Cannot create public buses
Pricing	$0.50/million requests + variable delivery cost + data transfer out cost. SMS varies.	$1.00/million events. Free for AWS events. No charge for delivery.
Billable request size	1 request = 64 KB	1 event = 64 KB
AWS Free Tier eligible	Yes	No
Cross-Region	You can subscribe your AWS Lambda functions to an Amazon SNS topic in any Region.	Targets must be in the same Region. You can publish across Regions to another event bus.
Retry policy	For SQS/Lambda, exponential backoff over 23 days. For SMTP, SMS and Mobile push, exponential backoff over 6 hours.	At-least-once event delivery to targets, including retry with exponential backoff for up to 24 hours.

Conclusion

Messaging is an important part of serverless applications and AWS services provide queues, publish/subscribe, and event routing capabilities. This post reviews the main features of SNS, SQS, and EventBridge and how they provide different capabilities for your workloads.

I show three example applications that publish and consume events from the three services. I walk through AWS SAM syntax for deploying these resources in your applications. Finally, I compare differences between the services.

To learn more building decoupled architectures, see this Learning Path series on EventBridge. For more serverless learning resources, visit https://serverlessland.com.

Architecture Patterns for Red Hat OpenShift on AWS

2020-09-22 Ryan Niksch

Post Syndicated from Ryan Niksch original https://aws.amazon.com/blogs/architecture/architecture-patterns-for-red-hat-openshift-on-aws/

Editor’s note: Although this blog post and its accompanying code make use of the word “Master,” Red Hat is making open source code more inclusive by eradicating “problematic language.” Read more about this.

Introduction

Red Hat OpenShift is an application platform that provides customers with turnkey application platform that is much more than a simple Kubernetes orchestration.

OpenShift customers choose AWS as their cloud of choice because of the efficiency, security, and reliability, scalability, and elasticity it provides. Customers seeking to modernize their business, process, and application stacks are drawn to the rich AWS service and feature sets.

As such, we see some customers migrate from on-premises to AWS or exist in a hybrid context with application workloads running in various locations. For OpenShift customers, this poses a few questions and considerations:

What are the recommendations for the best way to deploy OpenShift on AWS?
How is this different from what customers were used to on-premises?
How does this ensure resilience and availability?
Do customers need a multi-region, multi-account approach?

For hybrid customers, there are assumptions and misconceptions:

Where does the control plane exist?
Is there replication, and if so, what are the considerations and ramifications?

In this post I will run through some of the more common questions and patterns for OpenShift on AWS, while looking at some of the terminology and conceptual differences of AWS. I’ll explore migration and hybrid use cases and address some misconceptions.

OpenShift building blocks

On AWS, OpenShift 4x is the norm. To that effect, I will focus on OpenShift 4, but many of the considerations will apply to both OpenShift 3 and OpenShift 4.

Let’s unpack some of the OpenShift building blocks. An OpenShift cluster consists of Master, infrastructure, and worker nodes. The Master forms the control plane and infrastructure nodes cater to a routing layer and additional functions, such as logging, monitoring etc. Worker nodes are the nodes that customer application container workloads will exist on.

When deployed on-premises, OpenShift nodes will be placed in separate network subnets. Depending on distance, latency, etc., a single OpenShift cluster may span two data centers that have some nodes in a subnet in one data center and other subnets in a different data center. This applies to customers with data centers within a few miles of each other with high-speed connectivity. An alternative would be an OpenShift cluster in each data center.

AWS concepts and terminology

At AWS, the concept of “region” is a geolocation, such as EMEA (Europe, Middle East, and Africa) or APAC (Asian Pacific) rather than a data center or specific building. An Availability Zone (AZ) is the closest construct on AWS that maps to a physical data center. Within each region you will find multiple (typically three or more) AZs. Note that a single AZ will contain multiple physical data centers but we treat it as a single point of failure. For example, an event that impacts an AZ would be expected to impact all the data centers within that AZ. To this effect, customers should deploy workloads spanning multiple AZs to protect against any event that would impact a single AZ.

Deploying OpenShift

When deploying an OpenShift cluster on AWS, we recommend starting with three Master nodes spread across three AWS AZs and three worker nodes spread across three AZs. This allows for the combination of resilience and availably constructs provided by AWS as well as Red Hat OpenShift. The OpenShift installer provides a means of deploying the underlying AWS infrastructure in two ways: IPI Installer-provisioned infrastructure and UPI user-provisioned infrastructure. Both Red Hat and AWS collect customer feedback and use this to drive recommended patterns that are then included in the OpenShift installer. As such, the OpenShift installer IPI mode becomes a living reference architecture for deploying OpenShift on AWS.

Deploying OpenShift

The installer will require inputs for the environment on which it’s being deployed. In this case, since I am deploying on AWS, I will need to provide the AWS region, AZs, or subnets that related to the AZs, as well as EC2 instance type. The installer will then generate a set of ignition files that will be used during the deployment of OpenShift:

apiVersion: v1
baseDomain: example.com 
controlPlane: 
  hyperthreading: Enabled   
  name: master
  platform:
    aws:
      zones:
      - us-west-2a
      - us-west-2b
      - us-west-2c
      rootVolume:
        iops: 4000
        size: 500
        type: io1
      type: m5.xlarge 
  replicas: 3
compute: 
- hyperthreading: Enabled 
  name: worker
  platform:
    aws:
      rootVolume:
        iops: 2000
        size: 500
        type: io1 
      type: m5.xlarge
      zones:
      - us-west-2a
      - us-west-2b
      - us-west-2c
  replicas: 3
metadata:
  name: test-cluster 
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  aws:
    region: us-west-2 
    userTags:
      adminContact: jdoe
      costCenter: 7536
pullSecret: '{"auths": ...}' 
fips: false 
sshKey: ssh-ed25519 AAAA...

What does this look like at scale?

For larger implementations, we would see additional worker nodes spread across three or more AZs. As more worker nodes are added, use of the control plane increases. Initially scaling up the Amazon Elastic Compute Cloud (EC2) instance type to a larger instance type is an effective way of addressing this. It’s possible to add more Master nodes, and we recommend that an odd number of nodes are maintained. It is more common to see scaling out of the infrastructure nodes before there is a need to scale Masters. For large-scale implementations, infrastructure functions such as the router, monitoring, and logging functions can be moved to separate EC2 instances from the Master nodes, as well as from each other. It is important to spread the routing layer across multiple AZs, which is critical to maintaining availability and resilience.

The process of resource separation is now controlled by infrastructure machine sets within OpenShift. An infrastructure machine set would need to be defined, then the infrastructure role edited to be moved from the default to this new infrastructure machine set. Read about this in greater detail.

OpenShift in a multi-account context

Using AWS accounts as a means of separation is a common well-architected pattern. AWS Organizations and AWS Control Tower are services that are commonly adopted as part of a multi-account strategy. This is very much the case when looking to enable teams to use their own accounts and when an account vending process is needed to cater for self-service account provisioning.

OpenShift in a multi-account context

OpenShift clusters are deployed into multiple accounts. An OpenShift dev cluster is deployed into an AWS Dev account. This account would typically have AWS Developer Support associated with it. A separate production OpenShift cluster would be provisioned into an AWS production account with AWS Enterprise Support. Enterprise support provides for faster support case response times, and you get the benefit of dedicated resources such as a technical account manager and solutions architect.

CICD pipelines and processes are then used to control the application life cycle from code to dev to production. The pipelines would push the code to different OpenShift cluster end points at different stages of the life cycle.

Hybrid use case implementation

A common misconception of hybrid implementations is that there is a single cluster or control plan that has worker nodes in various locations. For example, there could be a cluster where the Master and infrastructure nodes are deployed in one location, but also worker nodes registered with this cluster that exist on-premises as well as in the cloud.

Having a single customer control plane for a hybrid implementation, even if technically possible, introduces undesired risks.

There is the potential to take multiple environments with very different resilience characteristics and make them interdependent of each other. This can result in performance and reliability issues, and these may increase not only the possibility of the risk manifesting, but also increase in the impact or blast radius.

Instead, hybrid implementations will see separate OpenShift clusters deployed into various locations. A customer may deploy clusters on-premises to cater for a workload that can’t be migrated to the cloud in the short term. Separate OpenShift clusters can then deployed into accounts in AWS for workloads on the cloud. Customers can also deploy separate OpenShift clusters in different AWS regions to cater for proximity to the consuming customer.

Though adding multiple clusters doesn’t add significant administrative overhead, there is a desire to be able to gain visibility and telemetry to all the deployed clusters from a central location. This may see the OpenShift clusters registered with Red Hat Advanced Cluster Manager for Kubernetes.

Summary

Take advantage of the IPI model, not only as a guide but to also save time. Make AWS Organizations, AWS Control Tower, and the AWS Service catalog part of your cloud and hybrid strategies. These will not only speed up migrations but also form building blocks for a modernized business with a focus of enabling prescriptive self-service. Consider Red Hat advanced cluster manager for multi cluster management.

Using Lambda layers to simplify your development process

2020-09-08 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/using-lambda-layers-to-simplify-your-development-process/

Serverless developers frequently import libraries and dependencies into their AWS Lambda functions. While you can zip these dependencies as part of the build and deployment process, in many cases it’s easier to use layers instead. In this post, I explain how layers work, and how you can build and include layers in your own applications.

This blog post references the Happy Path application, which shows how to build a flexible backend to a photo-processing web application. To learn more, refer to Using serverless backends to iterate quickly on web apps – part 1. This code in this post is available at this GitHub repo.

Overview of Lambda layers

A Lambda layer is an archive containing additional code, such as libraries, dependencies, or even custom runtimes. When you include a layer in a function, the contents are extracted to the /opt directory in the execution environment. You can include up to five layers per function, which count towards the standard Lambda deployment size limits.

Layers are deployed as immutable versions, and the version number increments each time you publish a new layer. When you include a layer in a function, you specify the layer version you want to use. Layers are automatically set as private, but they can be shared with other AWS accounts, or shared publicly. Permissions only apply to a single version of a layer.

Using layers can make it faster to deploy applications with the AWS Serverless Application Model (AWS SAM) or the Serverless framework. By moving runtime dependencies from your function code to a layer, this can help reduce the overall size of the archive uploaded during a deployment.

Creating a layer containing the AWS SDK

The AWS SDK allows you to interact programmatically with AWS services using one of the supported runtimes. The Lambda service includes the AWS SDK so you can use it without explicitly importing in your deployment package.

However, there is no guarantee of the version provided in the execution environment. The SDK is upgraded frequently to support new AWS services and features. As a result, the version may change at any time. You can see the current version used by Lambda by declaring an instance of the SDK and logging out the version method:

For production workloads, it’s best practice to lock the version of the AWS SDK used in your functions. You can achieve this by including the SDK with your code package. Once you include this library, your code always uses the version in the deployment package and not the version included in the Lambda service.

A serverless application may consist of many functions, which all use a common SDK version. Instead of bundling the SDK with each function deployment, you can create a layer containing the SDK. The effect of this is to reduce the size of the uploaded archive, which makes your deployments faster.

To create an AWS SDK layer:

First, clone this blog post’s GitHub repo. From a terminal window, execute:
git clone https://github.com/aws-samples/aws-lambda-layers-aws-sam-examples
cd ./aws-sdk-layer
This directory contains an AWS SAM template and Node.js package.json file. Install the package.json contents:
npm install
Create the layer directory defined in the AWS SAM template and the nodejs directory required by Lambda. Next, move the node_modules directory:
mkdir -p ./layer/nodejs
mv ./node_modules ./layer/nodejs
Next, deploy the AWS SAM template to create the layer:
sam deploy --guided
For the Stack name, enter “aws-sdk-layer”. Enter your preferred AWS Region and accept the other defaults.
After the deployment completes, the new Lambda layer is available to use. Run this command to see the available layers:aws lambda list-layers

After adding a layer to a function, you can use console.log to log out the AWS SDK version. This shows that the function is now using the SDK version in the layer instead of the version provided by the Lambda service:

Creating layers with OS-specific binaries

Many code libraries include binaries that are operating-system specific. When you build packages on your local development machine, by default the binaries for that operating system are used. These may not be the right binaries for Lambda, which runs on Amazon Linux. If you are not using a compatible operating system, you must ensure you include Linux binaries in the layer.

The simplest way to package these libraries correctly is to use AWS Cloud9. This is an IDE in the AWS Cloud, which runs on Amazon EC2. After creating an environment, you can clone a git repository directly to the local storage of the instance, and run the necessary build scripts.

The Happy Path application resizes images using the Sharp npm library. This library uses libvips, which is written in C, so the compilation is operating system-specific. By creating a layer containing this library, it simplifies the packaging and deployment of the consuming Lambda function.

To create a Sharp layer using AWS Cloud9:

Navigate to the AWS Cloud9 console.
Choose Create environment.
Enter the name “My IDE” and choose Next step.
Accept all the default and choose Next step.
Review the settings and choose Create environment.
In the terminal panel, enter:
git clone https://github.com/aws-samples/aws-lambda-layers-aws-sam-examples
cd ./aws-lambda-layers-aws-sam-examples/sharp-layer
npm install
From a terminal window, ensure you are in the directory where you cloned this post’s GitHub repo. Execute the following commands:cd ./sharp-layer
npm install
mkdir -p ./layer/nodejs
mv ./node_modules ./layer/nodejs
Next, deploy the AWS SAM template to create the layer:
sam deploy --guided
For the Stack name, enter “sharp-layer”. Enter your preferred AWS Region and accept the other defaults. After the deployment completes, the new Lambda layer is available to use.

In some runtimes, you can specify a local set of packages for development, and another set for production. For example, in Node.js, the package.json file allows you to specify two sections for dependencies. If your development machine uses a different operating system to Lambda, and therefore uses different binaries, you can use package.json to resolve this. In the Happy Path Resizer function, which uses the Sharp layer, the package.json refers to a local binary for development.

AWS SAM defines Lambda functions with the AWS::Serverless::Function resource. Layers are defined as a property of functions, as a list of layer ARNs including the version:

  MyLambdaFunction:
    Type: AWS::Serverless::Function 
    Properties:
      CodeUri: myFunction/
      Handler: app.handler
      MemorySize: 128
      Layers:
        - !Ref SharpLayerARN

Sharing a layer

Layers are private to your account by default but you can optionally share with other AWS accounts or make a layer public. You cannot share layers via the AWS Management Console but instead use the AWS CLI.

To share a layer, use add-layer-version-permission, specifying the layer name, version, AWS Region, and principal:

aws lambda add-layer-version-permission \
  --layer-name node-sharp \
  --principal '*' \
  --action lambda:GetLayerVersion \
  --version-number 3 
  --statement-id public 
  --region us-east-1

In the principal parameter, specify an individual account ID or use an asterisk to make the layer public. The CLI responds with a RevisionId containing the current revision of the policy:

You can check the permissions associated with a layer version by calling get-layer-version-policy with the layer name and version:

aws lambda get-layer-version-policy \
  --layer-name node-sharp \
  --version-number 3 \
  --region us-east-1

Similarly, you can delete permissions associated with a layer version by calling remove-layer-vesion-permission with the layer name, statement ID, and version:

aws lambda remove-layer-version-permission \
 -- layer-name node-sharp \
 -- statement-id public \
 -- version-number 3

Once the permissions are removed, calling get-layer-version-policy results in an error:

Conclusion

Lambda layers provide a convenient and effective way to package code libraries for sharing with Lambda functions in your account. Using layers can help reduce the size of uploaded archives and make it faster to deploy your code.

Layers can contain packages using OS-specific binaries, providing a convenient way to distribute these to developers. While layers are private by default, you can share with other accounts or make a layer public. Layers are published as immutable versions, and deleting a layer has no effect on deployed Lambda functions already using that layer.

To learn more about using Lambda layers, visit the documentation, or see how layers are used in the Happy Path web application.

Introducing the AWS Best Practices for Security, Identity, & Compliance Webpage and Customer Polling Feature

2020-09-04 Marta Taggart

Post Syndicated from Marta Taggart original https://aws.amazon.com/blogs/security/introducing-aws-best-practices-security-identity-compliance-webpage-and-customer-polling-feature/

The AWS Security team has made it easier for you to find information and guidance on best practices for your cloud architecture. We’re pleased to share the Best Practices for Security, Identity, & Compliance webpage of the new AWS Architecture Center. Here you’ll find top recommendations for security design principles, workshops, and educational materials, and you can browse our full catalog of self-service content including blogs, whitepapers, videos, trainings, reference implementations, and more.

We’re also running polls on the new AWS Architecture Center to gather your feedback. Want to learn more about how to protect account access? Or are you looking for recommendations on how to improve your incident response capabilities? Let us know by completing the poll. We will use your answers to help guide security topics for upcoming content.

Poll topics will change periodically, so bookmark the Security, Identity, & Compliance webpage for easy access to future questions, or to submit your topic ideas at any time. Our first poll, which asks what areas of the Well-Architected Security Pillar are most important for your use, is available now. We look forward to hearing from you.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Option	Database management	Host	Encryption	Key management
1	Amazon managed	Amazon RDS	Amazon EBS	AWS KMS
2	Amazon managed	Amazon RDS	Amazon EBS	AWS KMS Custom Key Store
3	Customer managed	Amazon EC2	Amazon EBS	AWS KMS
4	Customer managed	Amazon EC2	Amazon EBS	AWS KMS Custom Key Store
5	Customer managed	Amazon EC2	Amazon EBS	LUKS
6	Customer managed	Amazon EC2	Database	Database TDE
7	Customer managed	Amazon EC2	Database	CloudHSM

Types of security issues CodeGuru Reviewer detects

Examples of new security findings

AWS API security best practices

Java crypto library best practices

Secure web applications

AWS Security best practices

Detecting issues deep in application code

Extending the value of CodeGuru Reviewer

Conclusion

About the author

Brian Farnhill

Wild Rydes

Pipes and filters

Figure 1: Using Step Functions’ Service Integrations for Amazon DynamoDB and Amazon SNS, you can further reduce the need for a custom AWS Lambda implementation to persist data to the database, or send a notification.

Conclusion

Overview

The scatter-gather pattern

Explaining the architecture and API

Submit instant ride RFQ

Processing the RFQ

Check RFQ status

After the RFQ is completed

Conclusion

How disaster recovery fits into a security incident response plan

How to use CloudEndure Disaster Recovery during a security incident

Test and maintain the recovery section of your incident response plan

How CloudEndure Disaster Recovery keeps your data secure

Summary

What role does automation play in DevOps?

A few helpful guidelines

DevOps automation best practices

1. Continuous integration, continuous delivery, and continuous deployment

2. Change management

3. ‘X’ as code

4. Continuous monitoring

Putting DevOps automation into action

Amazon S3 – Object storage

Temporary storage with /tmp

Lambda layers

Amazon EFS for Lambda

Comparing the different data storage options

Conclusion

The API management layer: Selecting the right API type

Content distribution layer: Optimizing assets

The data layer

Integration layer

Business logic layer

Conclusion

Single-tenancy vs. multi-tenancy

Solution overview

Prerequisites

Setting up the Git repository

Creating the artifact store encryption key

Creating an Amazon S3 artifact store and configuring a bucket policy

Creating a cross-account IAM role in each tenant account

Creating a deployment IAM role in each tenant account

Configuring an artifact store encryption key

Provisioning the CI/CD pipeline

Cleaning up

Conclusion

Rafael Ramos

Background

API Facade pattern

API Facade pattern on AWS serverless platform

Conclusion

Understanding transparent data encryption

Understanding risks

Using encryption to manage data at rest risks in AWS

Recommended Solutions

Option 1 – Using Amazon RDS with Amazon EBS encryption and key management provided by AWS KMS

Benefits

Challenges

Option 2 – Using Amazon RDS with Amazon EBS encryption and key management provided by AWS KMS custom key store

Benefits

Challenges

Option 3 – Customer-managed database platform hosted on Amazon EC2 with Amazon EBS encryption and key management provided by KMS

Benefits

Challenges

Option 4 – Customer-managed database platform hosted on Amazon EC2 with Amazon EBS encryption and key management provided by KMS custom key store

Benefits