All posts by James Beswick

Extending SaaS products with serverless functions

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/extending-saas-products-with-serverless-functions/

This post was written by Santiago Cardenas, Sr Partner SA. and Nir Mashkowski, Principal Product Manager.

Increasingly, customers turn to software as a service (SaaS) solutions for their potential to lower the total cost of ownership (TCO). This enables customers to focus their teams on business priorities instead of managing and maintaining software and infrastructure. Startups are building SaaS products for a wide variety of common application types to meet this market demand.

As SaaS adoption accelerates, enterprise customers expect the same capabilities that are available with traditional, on-premises software. They want the ability to customize system behavior and use rich integrations that can help build solutions rapidly.

For customization and extensibility, many independent software vendors (ISVs) are building application programming interfaces (APIs) and integration hooks. To provide these capabilities, many SaaS builders expose a common set of APIs:

  • Event APIs emit events when SaaS entities change. Synchronous event APIs block the SaaS action until the API completes a request. Asynchronous event APIs are non-blocking and use mechanisms like pub/sub and webhooks to inform the caller of updates (see the sketch after this list). Event APIs are used for many purposes, such as enriching incoming data or triggering workflows.
  • CRUD APIs allow developers to interact with entities within the SaaS product. They can be used by mobile or web clients to add, update, and remove records, for example.
  • Schema APIs allow developers to create data entities in the SaaS product, such as tables, key-value stores, or document repositories.
  • User experience (UX) components. Many SaaS products include an SDK that helps provide a consistent look-and-feel and built-in support for common functions, such as authentication. Components are sometimes delivered as code libraries or as an online API that renders the UI.
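Here is a minimal Node.js sketch of the asynchronous, webhook-style event delivery described above. The endpoint URL, event shape, and field names are hypothetical, not taken from any specific SaaS product.

// Hypothetical asynchronous event delivery: the SaaS product POSTs
// entity-change events to a customer-registered webhook URL.
const https = require("https");

function deliverEvent(webhookUrl, event) {
  const body = JSON.stringify(event);
  const { hostname, pathname } = new URL(webhookUrl);

  return new Promise((resolve, reject) => {
    const req = https.request(
      {
        hostname,
        path: pathname,
        method: "POST",
        headers: { "Content-Type": "application/json" },
      },
      (res) => {
        res.resume(); // drain the response so the socket closes
        resolve(res.statusCode);
      }
    );
    req.on("error", reject);
    req.end(body);
  });
}

// Example: notify a subscriber that an order entity changed.
deliverEvent("https://example.com/hooks/orders", {
  type: "order.updated",
  entityId: "1234",
}).then((status) => console.log(`Webhook responded with ${status}`));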

Business systems expose different subsets of the APIs based on the application domain. Extensibility models are built on top of those APIs and can take various forms. ISVs use these APIs to build features such as “no code” workflow engines, UX, and report generators. In those cases, the SaaS product runs a domain-specific language (DSL) where it controls compute, storage, and memory consumption.

Figure 1: Example of various APIs providing extensibility within a SaaS app

This level of customization is acceptable for many business users. However, more sophisticated customization requires the ability to write custom code. When coding is needed, some business systems choose to sandbox the user code within the service. Others ask developers to host the extensibility model themselves.

The growth of vendor-hosted SaaS extensions

First-generation SaaS products are essentially a “lift and shift” of on-premises enterprise software, where each customer has a copy of the entire stack. This single-tenant model offers simplicity, a smaller blast radius, and faster time to market.

Newer, born-in-the-cloud SaaS products implement a multi-tenant approach, where all resources are shared across customers. This model may be easier to maintain but can present challenges for handling security, isolation, and resource allocation.

Multi-tenancy challenges are harder when customers can run custom code inside the SaaS infrastructure. To solve this, SaaS builders may start with a customer-hosted approach, where customers implement their own extensions by consuming SaaS APIs. This means customers must learn and install an SDK, then deploy and maintain an app in their own cloud. This often results in higher cost and slower time to market.

To simplify this model, SaaS builders are finding ways to allow developers to write code directly within the SaaS product. The event-driven, pay-per-execution, and polyglot nature of serverless functions provides new capabilities for implementing SaaS extensibility. This model is called vendor-hosted SaaS extensions.

SaaS builders are using AWS Lambda for serverless functions to provide flexible compute options to their customers. The goal is to abstract away and simplify the consumption model. AWS provides SaaS builders with features and controls to customize the execution environments as part of their own SaaS product. This allows SaaS owners more flexibility when deciding on isolation models, usability, and cost considerations.

Isolating tenant requests

Isolation of customer requests is important both at the product level and at the tenant level. Product-level isolation focuses on controlling and enforcing the access to data between tenants. It ensures that one tenant is separated from another tenant’s functions. Tenant-level isolation focuses on resources allocated to serve requests. These may include identity, network and internet access, file system access, and memory/CPU allocation.

Figure 2: Example of hierarchical levels of abstraction
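One way to implement tenant-level isolation is to generate short-lived, tenant-scoped credentials for each request. The following sketch assumes a hypothetical DynamoDB table partitioned by tenant ID and a hypothetical IAM role created for tenant access; it illustrates the approach rather than prescribing an implementation.

// Sketch: request credentials that can only access one tenant's items.
const AWS = require("aws-sdk");
const sts = new AWS.STS();

async function getTenantCredentials(tenantId) {
  // The session policy narrows the assumed role so the returned
  // credentials are valid only for this tenant's partition key.
  const sessionPolicy = {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Action: ["dynamodb:GetItem", "dynamodb:Query"],
        Resource: "arn:aws:dynamodb:us-east-1:123456789012:table/SaasData",
        Condition: {
          "ForAllValues:StringEquals": { "dynamodb:LeadingKeys": [tenantId] },
        },
      },
    ],
  };

  const { Credentials } = await sts
    .assumeRole({
      RoleArn: "arn:aws:iam::123456789012:role/TenantAccessRole", // hypothetical
      RoleSessionName: `tenant-${tenantId}`,
      Policy: JSON.stringify(sessionPolicy),
    })
    .promise();

  return Credentials;
}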

Usability

SaaS product owners can allow customers to use familiar programming languages within the serverless functions. This allows customers to grow with the service and potentially host and scale independently, using their own infrastructure.

Usability considers the domain and industry of the product. For example, if the SaaS product enables data processing, it may enable invocation of serverless functions during these workflows. Additionally, these functions may provide the customer with the context of the user, application, tenant, and domain. A streamlined, opinionated deployment workflow that abstracts away initial configuration can also aid customer adoption.
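As an illustration, a customer-authored extension function might receive that context directly in its event payload. The field names below are assumptions; each SaaS product defines its own event shape.

// Hypothetical extension function: the platform injects tenant and
// user context alongside the record being processed.
exports.handler = async (event) => {
  const { tenantId, userId, record } = event; // assumed field names

  // Custom enrichment logic written by the SaaS customer.
  return {
    ...record,
    tenant: tenantId,
    processedBy: userId,
    enrichedAt: new Date().toISOString(),
  };
};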

Managing costs

Cost is an important factor in driving adoption. It’s an important differentiator to pay only for the resources used, while being able to scale in response to events. This can help reduce costs that are passed on to SaaS customers.

Examples of SaaS product extensibility

Multiple AWS Partners are extending their SaaS products using Lambda for on-demand, scalable compute. This enables them to focus on enriching the customer experience associated with their business domain. Examples include:

  • Segment Functions, which seamlessly integrates as a source or destination. The service uses code snippets to allow customers to enrich data, enforce consistency, and connect to APIs and services that power their workflows.
  • Freshworks’ Neo platform provides extensibility using the concept of apps. These are powered by Lambda functions hosting the core business logic and backends. Apps are triggered by unplanned and scheduled Freshworks events (customer support tickets, IT service cases, contacts, and deal updates), in addition to app-specific and external events.
  • Netlify Functions enables customers to supercharge frontend code with functions in their development workflow. These can power automated triggers, connect to third-party APIs, or provide user authentication.

All of these SaaS partners abstract away the deployment, versioning, and configuration of custom code using Lambda.

Conclusion

As customers increasingly use SaaS solutions in their businesses, they want the same customization and extensibility available in on-premises solutions. SaaS partners have developed APIs and integration hooks to help address this need. For more sophisticated customization, products enable custom code to run within their SaaS workflows.

This presents SaaS partners with isolation, usability, and cost challenges and many of them are now using serverless functions to address these challenges. Lambda provides a pay-per-value compute service that scales automatically to meet customer demand. Segment Functions, Freshworks, and Netlify Functions have all used Lambda to provide extensibility to their customers.

Lambda continues to develop features and functionality to power the extensibility of SaaS products. We look forward to seeing the new ways you use Lambda to extend your SaaS product for your customers. Share your Lambda extensibility story with us at [email protected].

For more serverless learning resources, visit Serverless Land.

Operating Lambda: Application design – Scaling and concurrency: Part 2

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/operating-lambda-application-design-scaling-and-concurrency-part-2/

In the Operating Lambda series, I cover important topics for developers, architects, and systems administrators who are managing AWS Lambda-based applications. This three-part series discusses application design for Lambda-based applications.

Part 1 shows how to work with Service Quotas, when to request increases, and architecting with quotas in mind. This post covers scaling and concurrency and the different behaviors of on-demand and Provisioned Concurrency.

Scaling and concurrency in Lambda

Lambda is engineered to provide managed scaling in a way that does not rely upon threading or any custom engineering in your code. As traffic increases, Lambda increases the number of concurrent executions of your functions.

When a function is first invoked, the Lambda service creates an instance of the function and runs the handler method to process the event. After completion, the function remains available for a period of time to process subsequent events. If other events arrive while the function is busy, Lambda creates more instances of the function to handle these requests concurrently.

For an initial burst of traffic, your cumulative concurrency in a Region can reach between 500 and 3,000, depending upon the Region. After this initial burst, functions can scale by an additional 500 instances per minute. If requests arrive faster than a function can scale, or if a function reaches maximum capacity, additional requests fail with a throttling error (status code 429).

All AWS accounts start with a default concurrency limit of 1,000 per Region. This is a soft limit that you can increase by submitting a request in the AWS Support Center.
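To check the current limit and usage in a Region, you can call the GetAccountSettings API. A minimal sketch with the AWS SDK for JavaScript:

// Inspect the concurrency quota and current usage for this Region.
const AWS = require("aws-sdk");
const lambda = new AWS.Lambda({ region: "us-east-1" });

lambda
  .getAccountSettings()
  .promise()
  .then(({ AccountLimit, AccountUsage }) => {
    console.log(`Concurrent executions limit: ${AccountLimit.ConcurrentExecutions}`);
    console.log(`Unreserved concurrency: ${AccountLimit.UnreservedConcurrentExecutions}`);
    console.log(`Function count: ${AccountUsage.FunctionCount}`);
  });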

On-demand scaling example

In this example, a Lambda function receives 10,000 synchronous requests from Amazon API Gateway. The concurrency limit for the account is 10,000. The following shows four scenarios:

On-demand scaling example

In each case, all of the requests arrive at the same time in the minute they are scheduled:

  1. All requests arrive immediately: 3000 requests are handled by new execution environments; 7000 are throttled.
  2. Requests arrive over 2 minutes: 3000 requests are handled by new execution environments in the first minute; the remaining 2000 are throttled. In minute 2, another 500 environments are created and the 3000 original environments are reused; 1500 are throttled.
  3. Requests arrive over 3 minutes: 3000 requests are handled by new execution environments in the first minute; the remaining 333 are throttled. In minute 2, another 500 environments are created and the 3000 original environments are reused; all requests are served. In minute 3, the remaining 3334 requests are served by warm environments.
  4. Requests arrive over 4 minutes: In minute 1, 2500 requests are handled by new execution environments; the same environments are reused in subsequent minutes to serve all requests.

Provisioned Concurrency scaling example

The majority of Lambda workloads are asynchronous, so the default scaling behavior provides a reasonable trade-off between throughput and configuration management overhead. However, for synchronous invocations from interactive workloads, such as web or mobile applications, there are times when you need more control over how many concurrent function instances are ready to receive traffic.

Provisioned Concurrency is a Lambda feature that prepares concurrent execution environments in advance of invocations. Consequently, this can be used to address two issues. First, if expected traffic arrives more quickly than the default burst capacity, Provisioned Concurrency can ensure that your function is available to meet the demand. Second, if you have latency-sensitive workloads that require predictable double-digit millisecond latency, Provisioned Concurrency solves the typical cold start issues associated with default scaling.

Provisioned Concurrency is a configuration available for a specific published version or alias of a Lambda function. It does not rely on any custom code or changes to a function’s logic, and it’s compatible with features such as VPC configuration and Lambda layers. For more information on how Provisioned Concurrency optimizes performance for Lambda-based applications, watch this Tech Talk video.
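For example, the following sketch applies Provisioned Concurrency to an alias using the AWS SDK for JavaScript. The function name and alias are placeholders.

// Prepare 7,000 execution environments for the "live" alias.
const AWS = require("aws-sdk");
const lambda = new AWS.Lambda();

lambda
  .putProvisionedConcurrencyConfig({
    FunctionName: "checkout-api", // hypothetical function
    Qualifier: "live",            // a published version or alias
    ProvisionedConcurrentExecutions: 7000,
  })
  .promise()
  .then((config) => console.log(config.Status));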

Using the same scenarios with 10,000 requests, the function is configured with a Provisioned Concurrency of 7,000:

Provisioned Concurrency scaling example

  1. In case #1, 7,000 requests are handled by the provisioned environments with no cold start. The remaining 3,000 requests are handled by new, on-demand execution environments.
  2. In cases #2-4, all requests are handled by provisioned environments in the minute when they arrive.

Using service integrations and asynchronous processing

Synchronous requests from services like API Gateway require immediate responses. In many cases, these workloads can be rearchitected as asynchronous workloads. In this case, API Gateway uses a service integration to durably persist messages in an Amazon SQS queue. A Lambda function consumes these messages from the queue, and updates the status in an Amazon DynamoDB table. Another API endpoint provides the status of the request by querying the DynamoDB table:

Async with polling example

API Gateway has a default throttle limit of 10,000 requests per second, which can be raised upon request. SQS standard queues support a virtually unlimited throughput of API actions such as SendMessage.

The Lambda function consuming the messages from SQS can control the speed of processing through a combination of two factors. The first is BatchSize, which is the number of messages received by each invocation of the function, and the second is concurrency. Provided there is still concurrency available in your account, the Lambda function is not throttled while processing messages from an SQS queue.

In asynchronous workflows, it’s not possible to pass the result of the function back through the invocation path. The original API Gateway call receives an acknowledgment that the message has been stored in SQS, which is returned back to the caller. There are multiple mechanisms for returning the result back to the caller. One uses a DynamoDB table, as shown, to store a transaction ID and status, which is then polled by the caller via another API Gateway endpoint until the work is finished. Alternatively, you can use webhooks via Amazon SNS or WebSockets via AWS IoT Core to return the result.
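A minimal sketch of the consumer side of this pattern follows. It assumes a hypothetical RequestStatus table and a transactionId attribute in each message.

// The function receives a batch of SQS messages, performs the work,
// and records the outcome so the status endpoint can report it.
const AWS = require("aws-sdk");
const dynamo = new AWS.DynamoDB.DocumentClient();

exports.handler = async (event) => {
  for (const record of event.Records) {
    const message = JSON.parse(record.body);

    // ... perform the long-running work here ...

    await dynamo
      .put({
        TableName: "RequestStatus",
        Item: {
          transactionId: message.transactionId,
          status: "COMPLETE",
          completedAt: Date.now(),
        },
      })
      .promise();
  }
};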

Using this asynchronous approach can make it much easier to handle unpredictable traffic with significant volumes. While it is not suitable for every use case, it can simplify scalability operations.

Reserved concurrency

Lambda functions in a single AWS account in one Region share the concurrency limit. If one function consumes the entire concurrency limit, this prevents other functions from being invoked by the Lambda service. You can set reserved capacity for Lambda functions to ensure that they can be invoked even if the overall capacity has been exhausted. Reserved capacity has two effects on a Lambda function:

  1. The reserved capacity is deducted from the overall capacity for the AWS account in a given Region. The Lambda function always has the reserved capacity available exclusively for its own invocations.
  2. The reserved capacity restricts the maximum number of concurrent invocations for that function. Synchronous requests arriving in excess of the reserved capacity limit fail with a throttling error.

You can also use reserved capacity to throttle the rate of requests processed by your workload. For Lambda functions that are invoked asynchronously or using an internal poller, such as for S3, SQS, or DynamoDB integrations, reserved capacity limits how many requests are processed simultaneously. In this case, events are stored durably in internal queues until the Lambda function is available. This can help create a smoothing effect for handling spiky levels of traffic.

For example, a Lambda function receives messages from an SQS queue and writes to a DynamoDB table. It has a reserved concurrency of 10 with a batch size of 10 items. The SQS queue rapidly receives 1,000 messages. The Lambda function scales up to 10 concurrent instances, each processing 10 messages from the queue. While it takes longer to process the entire queue, this results in a consistent rate of write capacity units (WCUs) consumed by the DynamoDB table.

Reserved concurrency for throttling example
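Applying that configuration is a single API call. This sketch assumes a hypothetical function name:

// Cap the worker at 10 concurrent instances to smooth DynamoDB writes.
const AWS = require("aws-sdk");
const lambda = new AWS.Lambda();

lambda
  .putFunctionConcurrency({
    FunctionName: "sqs-to-dynamodb-worker", // hypothetical
    ReservedConcurrentExecutions: 10,
  })
  .promise()
  .then(() => console.log("Reserved concurrency applied"));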

To learn more, read “Managing AWS Lambda Function Concurrency” and “Managing concurrency for a Lambda function”.

Conclusion

This post explains scaling and concurrency in Lambda and the different behaviors of on-demand and Provisioned Concurrency. It also shows how to use service integrations and asynchronous patterns in Lambda-based applications. Finally, I discuss how reserved concurrency works and how to use it in your application design.

Part 3 will discuss choosing and managing runtimes, networking and VPC configurations, and different invocation modes.

For more serverless learning resources, visit Serverless Land.

Operating Lambda: Application design and Service Quotas – Part 1

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/operating-lambda-application-design-and-service-quotas-part-1/

In the Operating Lambda series, I cover important topics for developers, architects, and systems administrators who are managing AWS Lambda-based applications. This three-part series discusses application design for Lambda-based applications.

A well-architected event-driven application uses a combination of AWS services and custom code to process and manage requests and data. This series focuses on Lambda-specific topics in application design and how Lambda interacts with other services. There are many important considerations for serverless architects when designing applications for busy production systems.

Part 1 shows how to work with Service Quotas, when to request increases, and architecting with quotas in mind. It also explains how to control traffic for downstream server-based resources.

Understanding quotas

The Lambda service is designed for short-lived compute tasks that do not retain or rely upon state between invocations. The Lambda service invokes your custom code on demand in response to events from other AWS services, or requests via the AWS CLI or AWS SDKs. Code can run for up to 15 minutes in a single invocation and a single function can use up to 10,240 MB of memory.

Lambda is designed to scale rapidly to meet demand, allowing your functions to scale up to serve traffic in your application. Other AWS services frequently used in serverless applications, such as Amazon API Gateway, Amazon SNS, and AWS Step Functions, also scale up in response to increased load. This has enabled our largest customers to build applications that scale to millions of requests quickly without having to manage underlying infrastructure.

However, before you scale an application to these levels, it’s important to understand the guardrails that are put in place to protect your account and the workloads of other customers. Service Quotas exist in all AWS services and consist of hard limits, which you cannot change, and soft limits, which you can request increases for.

By default, all new accounts are assigned a quota profile that allows exploration of the services. However, the values may need to be raised to support medium-to-large application workloads. Typically, customers request increases for their accounts as they start to expand usage of their applications. This allows the quotas to grow with usage, and helps protect the account from unexpected costs caused by unintended usage.

Different AWS services have different quotas. These quotas may apply at the Region level, or account level, and may also include time-interval restrictions, such as requests per second. For example, the maximum number of IAM roles is an account-based quota, whereas the maximum number of concurrent Lambda executions is a per-Region quota.

To see the quotas that apply to your account, navigate to the Service Quotas dashboard. This allows you to view your Service Quotas, request a service quota increase, and view current utilization. From here, you can drill down to a specific AWS service, such as Lambda:

Service Quotas for AWS Lambda

In this example, sorted by the Adjustable column, Concurrent executions, Elastic network interfaces per VPC, and Function and layer storage are all adjustable limits. You could request a quota increase for any of these via the AWS Support Center. The other items provide a useful reference for other limits applying to the service.

Architecting with Service Quotas

Most serverless applications use multiple AWS services, and different services have different quotas for different features. Once you have a serverless architecture designed and know which services your application uses, you can compare the different quotas across services and find any potential issues.

Example serverless application architecture

In this example, API Gateway has a default throttle limit of 10,000 requests per second. Many applications use API Gateway endpoints to invoke Lambda functions. Lambda has a default concurrency limit of 1,000. Since API Gateway invokes Lambda synchronously, it’s possible to have more incoming requests than a Lambda function can handle simultaneously when using the default limits. You can resolve this by requesting an increase to the Lambda concurrency limit for this account to match the expected level of traffic.

Another common challenge is handling payload sizes in different services. Consider an application moving a payload from API Gateway to Lambda to Amazon SQS. API Gateway supports payloads up to 10 MB, while Lambda’s payload limit is 6 MB and the SQS message size limit is 256 KB. In this example, you could store the payload in an Amazon S3 bucket instead of uploading it to API Gateway, and pass a reference token across the services. The token size is much smaller than any payload limit and may provide a more efficient design for your workload, depending upon the use case.
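The following sketch illustrates this claim-check style of design. The bucket name and queue URL are placeholders.

// Store the large payload in S3, then pass only a small reference token.
const AWS = require("aws-sdk");
const crypto = require("crypto");
const s3 = new AWS.S3();
const sqs = new AWS.SQS();

async function submitLargePayload(payload) {
  const key = `payloads/${crypto.randomBytes(16).toString("hex")}`;

  // The payload may exceed the 256 KB SQS message limit, so it goes to S3.
  await s3
    .putObject({ Bucket: "my-payload-bucket", Key: key, Body: payload })
    .promise();

  // The consumer receives the token and fetches the object itself.
  await sqs
    .sendMessage({
      QueueUrl: "https://sqs.us-east-1.amazonaws.com/123456789012/jobs",
      MessageBody: JSON.stringify({ payloadKey: key }),
    })
    .promise();
}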

Load testing your serverless application also allows you to monitor the performance of an application before it is deployed to production. Serverless applications can be relatively simple to load test, thanks to the automatic scaling built into many of the services. During a load test, you can identify any quotas that may act as a limiting factor for the traffic levels you expect and take action accordingly.

There are several tools available for serverless developers to perform this task. One of the most popular is Artillery Community Edition, which is an open-source tool for testing serverless APIs. You configure the number of requests per second and overall test duration and it uses a headless Chromium browser to run tests. Other popular tools include Nordstrom’s Serverless-Artillery and Gatling.

Using multiple AWS accounts for managing quotas

Many customers have multiple workloads running in the AWS Cloud but many quotas are set at the account level. This means that as you add more serverless workloads, some quotas are shared across more workloads, reducing the quotas available for each workload. Additionally, if you have development resources in the same account as production workloads, quotas are shared across both. It’s possible for development activity to exhaust resources unintentionally that you may want to reserve only for production.

An effective way to solve this issue is to use multiple AWS accounts, dedicating workloads to their own specific account. This prevents quotas from being shared with other workloads or non-production resources. Using AWS Organizations, you can centrally manage the billing, compliance, and security of these accounts. You can attach policies to groups of accounts to avoid custom scripts and manual processes.

One common approach is to provide each developer with an AWS account, and then use separate accounts for a beta deployment stage and production:

Multiple AWS account by environment

The developer accounts can contain copies of production resources and provide the developer with admin-level permissions to these resources. Each developer has their own set of limits for the account, so their usage does not impact your production environment. Individual developers can deploy AWS CloudFormation stacks and AWS Serverless Application Model (AWS SAM) templates into these accounts with minimal risk to production assets.

This approach allows developers to test Lambda functions locally on their development machines against live cloud resources in their individual accounts. It can help create a robust unit testing process, and developers can then push code to a repository like AWS CodeCommit when ready.

By integrating with AWS Secrets Manager, you can store different sets of secrets in each environment and replace any need for credentials stored in code. As code is promoted from developer account through to the beta and production accounts, the correct set of credentials is automatically used. You do not need to share environment-level credentials with individual developers.
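In practice, the function code can reference a secret by name and receive whichever value is stored in the current account. A minimal sketch, assuming a hypothetical secret name:

// Fetch environment-specific credentials at runtime; the same code works
// in developer, beta, and production accounts.
const AWS = require("aws-sdk");
const secretsManager = new AWS.SecretsManager();

async function getDatabaseCredentials() {
  const { SecretString } = await secretsManager
    .getSecretValue({ SecretId: "app/database-credentials" })
    .promise();
  return JSON.parse(SecretString);
}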

To learn more, read “Best practices for organizing larger serverless applications”.

Controlling traffic flow for server-based resources

While Lambda can scale up quickly in response to traffic, many non-serverless services cannot. If your Lambda functions interact with those services downstream, it’s possible to overwhelm those services with data or connection requests.

Amazon RDS is one of the most common Lambda integrations that relies on a server-based resource. However, relational databases are connection-based, so they are intended to work with a few long-lived clients, such as web servers. By contrast, Lambda functions are ephemeral and short-lived, so their database connections are numerous and brief. If Lambda scales up to hundreds or thousands of instances, you may overwhelm downstream relational databases with connection requests. This is typically only an issue for moderately busy applications. If you are using a Lambda function for low-volume tasks, such as running daily SQL reports, you do not experience this behavior.

The Amazon RDS Proxy service is built to solve the high-volume use case. It pools the connections between the Lambda service and the downstream Amazon RDS database. This means that a scaling Lambda function is able to reuse connections via the proxy. As a result, the relational database is not overwhelmed with connection requests from individual Lambda functions. This does not require code changes in many cases. You only need to replace the database endpoint with the proxy endpoint in your Lambda function.
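For example, a function using the mysql2 library only needs to point at the proxy endpoint instead of the database endpoint. The endpoint and credentials below are placeholders.

// Connect through RDS Proxy; only the host changes compared with a
// direct database connection.
const mysql = require("mysql2/promise");

async function queryOrders() {
  const connection = await mysql.createConnection({
    host: "my-proxy.proxy-abc123.us-east-1.rds.amazonaws.com", // proxy endpoint
    user: "app_user",
    password: process.env.DB_PASSWORD,
    database: "orders",
  });
  const [rows] = await connection.execute("SELECT * FROM orders LIMIT 10");
  await connection.end();
  return rows;
}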

For other downstream server-based resources, APIs, or third-party services, it’s important to know the limits around connections, transactions, and data transfer. If your serverless workload has the capacity to overwhelm those resources, use an SQS queue to decouple the Lambda function from the target. This allows the server-based resource to process messages from the queue at a steady rate. The queue also durably stores the requests if the downstream resource becomes unavailable.

Conclusion

Lambda works with other AWS services to process and manage requests and data. This post explains how to understand and manage Service Quotas, when to request increases, and architecting with quotas in mind. It also explains how to control traffic for downstream server-based resources.

Part 2 of this series will discuss scaling and concurrency in Lambda and the different behaviors of on-demand and Provisioned Concurrency.

For more guidance, see the Operating Lambda: Understanding event-driven architectures series.

For more serverless learning resources, visit Serverless Land.

Building server-side rendering for React in AWS Lambda

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/building-server-side-rendering-for-react-in-aws-lambda/

This post is courtesy of Roman Boiko, Solutions Architect.

React is a popular front-end framework used to create single-page applications (SPAs). It is rendered and run on the client side in the browser. However, for SEO or performance reasons, you may need to render parts of a React application on the server. This is where server-side rendering (SSR) is useful.

This post introduces the concepts and demonstrates rendering a React application with AWS Lambda. To deploy this solution and to provision the AWS resources, I use the AWS Cloud Development Kit (CDK). This is an open-source framework, which helps you reduce the amount of code required to automate deployment.

Overview

This solution uses Amazon S3, Amazon CloudFront, Amazon API Gateway, AWS Lambda, and Lambda@Edge. It creates a fully serverless SSR implementation, which automatically scales according to the workload. This solution addresses three scenarios.

1. A static React app hosted in an S3 bucket with a CloudFront distribution in front of the website. The backend is running behind API Gateway, implemented as a Lambda function. Here, the application is fully downloaded to the client and rendered in a web browser. It sends requests to the backend.

SSR app 1

2. The React app is rendered with a Lambda function. The CloudFront distribution is configured to forward requests from the /ssr path to the API Gateway endpoint. This calls the Lambda function where the rendering is happening. While rendering the requested page, the Lambda function calls the backend API to fetch the data. It returns a static HTML page with all the data. This page may be cached in CloudFront to optimize subsequent requests.

SSR app 2


3. The React app is rendered with a Lambda@Edge function. This scenario is similar but rendering happens at edge locations. The requests to /edgessr are handled by the Lambda@Edge function. This sends requests to the backend and returns a static HTML page.

SSR app 3


Walkthrough

The example application shows how the preceding scenarios are implemented with the AWS CDK.

This solution deploys a Lambda@Edge function, so it must be provisioned in the US East (N. Virginia) Region.

To get started, download and configure the sample:

  1. From a terminal, clone the GitHub repository:
    git clone https://github.com/aws-samples/react-ssr-lambda
  2. Provide a unique name for the S3 bucket, which is created by the stack and used for React application hosting. Change the placeholder <your bucket name> to your bucket name. To install the solution, run:
    cd react-ssr-lambda
    cd ./cdk
    npm install
    npm run build
    cdk bootstrap
    cdk deploy SSRApiStack --outputs-file ../simple-ssr/src/config.json
    
    cd ../simple-ssr
    npm install
    npm run build-all
    cd ../cdk
    cdk deploy SSRAppStack --parameters mySiteBucketName=<your bucket name>
  3. Note the following values from the output:
    • SSRAppStack.CFURL – the URL of the CloudFront distribution. Its root path returns the React application stored in S3.
    • SSRAppStack.LambdaSSRURL – the URL of the CloudFront /ssr distribution, which returns a page rendered by the Lambda function.
    • SSRAppStack.LambdaEdgeSSRURL – the URL of the CloudFront /edgessr distribution, which returns a page rendered by the Lambda@Edge function.

Stack outputs
  4. In a browser, open each of the URLs from step 3. You see the same page with a different footer, indicating how it is rendered.

Comparing the served pages

Understanding the React app

The application is created by the create-react-app utility. You can run and test this application locally by navigating to the simple-ssr directory and running the npm start command.

This small application consists of two components that render the list of products received from the backend. The App.js file sends the request, parses the result, and passes it to the component.

import React, { useEffect, useState } from "react";
import ProductList from "./components/ProductList";
import config from "./config.json";
import axios from "axios";

const App = ({ isSSR, ssrData }) => {
  const [err, setErr] = useState(false);
  const [result, setResult] = useState({ loading: true, products: null });
  useEffect(() => {
    const getData = async () => {
      const url = config.SSRApiStack.apiurl;
      try {
        let result = await axios.get(url);
        setResult({ loading: false, products: result.data });
      } catch (error) {
        setErr(error);
      }
    };
    getData();
  }, []);
  if (err) {
    return <div>Error {err}</div>;
  } else {
    return (
      <div>
        <ProductList result={result} />
      </div>
    );
  }
};

export default App;

Adding server-side rendering

To support SSR, I change the preceding application, using several Lambda functions to implement the rendering. Because I change the way data is retrieved from the backend, I remove the data-fetching code from App.js. Instead, the data is retrieved in the Lambda function and injected into the application during the rendering process.

The new file SSRApp.js reflects these changes:

import React, { useState } from "react";
import ProductList from "./components/ProductList";

const SSRApp = ({ data }) => {
  const [result, setResult] = useState({ loading: false, products: data });
  return (
    <div>
      <ProductList result={result} />
    </div>
  );
};

export default SSRApp;

Next, I implement the SSR logic in the Lambda function. For simplicity, I use React’s built-in renderToString method, which returns an HTML string. You can find the corresponding file at simple-ssr/src/server/index.js. The handler function fetches data from the backend, renders the React components, and injects them into the HTML template. It returns the response to API Gateway, which responds to the client.

const handler = async function (event) {
  try {
    const url = config.SSRApiStack.apiurl;
    const result = await axios.get(url);
    const app = ReactDOMServer.renderToString(<SSRApp data={result.data} />);
    const html = indexFile.replace(
      '<div id="root"></div>',
      `<div id="root">${app}</div>`
    );
    return {
      statusCode: 200,
      headers: { "Content-Type": "text/html" },
      body: html,
    };
  } catch (error) {
    console.log(`Error ${error.message}`);
    return `Error ${error}`;
  }
};

For rendering the same code on Lambda@Edge, I change the code to work with CloudFront events and also modify the response format. This function searches for a specific path (/edgessr). All other logic stays the same. You can view the full code at simple-ssr/src/edge/index.js:

const handler = async function (event) {
  try {
    const request = event.Records[0].cf.request;
    if (request.uri === "/edgessr") {
      const url = config.SSRApiStack.apiurl;
      const result = await axios.get(url);
      const app = ReactDOMServer.renderToString(<SSRApp data={result.data} />);
      const html = indexFile.replace(
        '<div id="root"></div>',
        `<div id="root">${app}</div>`
      );
      return {
        status: "200",
        statusDescription: "OK",
        headers: {
          "cache-control": [
            {
              key: "Cache-Control",
              value: "max-age=100",
            },
          ],
          "content-type": [
            {
              key: "Content-Type",
              value: "text/html",
            },
          ],
        },
        body: html,
      };
    } else {
      return request;
    }
  } catch (error) {
    console.log(`Error ${error.message}`);
    return `Error ${error}`;
  }
};

The create-react-app utility configures tools such as Babel and webpack for the client-side React application. However, it is not designed to work with SSR. To make the functions work as expected, I transpile these into CommonJS format in addition to transpiling React JSX files. The standard tool for this task is Babel. To add it to this project, I create the configuration file .babelrc.json with instructions to transpile the functions into Node.js v12 format:

{
  "presets": [
    [
      "@babel/preset-env",
      {
        "targets": {
          "node": 12
        }
      }
    ],
    "@babel/preset-react"
  ]
}

I also include all the dependencies. I use the popular frontend tool webpack, which also works with Lambda functions. It adds only the necessary dependencies and minimizes the package size. For this purpose, I create configurations for both functions. You can find these in the webpack.edge.js and webpack.server.js files:

const path = require("path");

module.exports = {
  entry: "./src/edge/index.js",

  target: "node",

  externals: [],

  output: {
    path: path.resolve("edge-build"),
    filename: "index.js",
    library: "index",
    libraryTarget: "umd",
  },

  module: {
    rules: [
      {
        test: /\.js$/,
        use: "babel-loader",
      },
      {
        test: /\.css$/,
        use: "css-loader",
      },
    ],
  },
};

The result of running webpack is one file for each build. I use these files to deploy the Lambda and Lambda@Edge functions. To automate the build process, I add several scripts to package.json.

"build-server": "webpack --config webpack.server.js --mode=development",
"build-edge": "webpack --config webpack.edge.js --mode=development",
"build-all": "npm-run-all --parallel build build-server build-edge"

Launch the build by running npm run build-all.

Deploying the application

After the application successfully builds, I deploy to the AWS Cloud. I use AWS CDK for an infrastructure as code approach. The code is located in cdk/lib/ssr-stack.ts.

First, I create an S3 bucket for storing the static content, and I pass the name of the bucket as a parameter. To ensure that only CloudFront can access my S3 bucket, I use an origin access identity (OAI) configuration:

const mySiteBucketName = new CfnParameter(this, "mySiteBucketName", {
      type: "String",
      description: "The name of S3 bucket to upload react application"
    });

const mySiteBucket = new s3.Bucket(this, "ssr-site", {
      bucketName: mySiteBucketName.valueAsString,
      websiteIndexDocument: "index.html",
      websiteErrorDocument: "error.html",
      publicReadAccess: false,
      //only for demo not to use in production
      removalPolicy: cdk.RemovalPolicy.DESTROY
    });

new s3deploy.BucketDeployment(this, "Client-side React app", {
      sources: [s3deploy.Source.asset("../simple-ssr/build/")],
      destinationBucket: mySiteBucket
    });

const originAccessIdentity = new cloudfront.OriginAccessIdentity(
      this,
      "ssr-oia"
    );
    mySiteBucket.grantRead(originAccessIdentity);

I deploy the Lambda function from the build directory and configure an integration with API Gateway. I also note the API Gateway domain name for later use in the CloudFront distribution.

const ssrFunction = new lambda.Function(this, "ssrHandler", {
      runtime: lambda.Runtime.NODEJS_12_X,
      code: lambda.Code.fromAsset("../simple-ssr/server-build"),
      memorySize: 128,
      timeout: Duration.seconds(5),
      handler: "index.handler"
    });

const ssrApi = new apigw.LambdaRestApi(this, "ssrEndpoint", {
      handler: ssrFunction
    });

const apiDomainName = `${ssrApi.restApiId}.execute-api.${this.region}.amazonaws.com`;

I configure the Lambda@Edge function. It’s important to create a function version explicitly to use with CloudFront:

const ssrEdgeFunction = new lambda.Function(this, "ssrEdgeHandler", {
      runtime: lambda.Runtime.NODEJS_12_X,
      code: lambda.Code.fromAsset("../simple-ssr/edge-build"),
      memorySize: 128,
      timeout: Duration.seconds(5),
      handler: "index.handler"
    });

const ssrEdgeFunctionVersion = new lambda.Version(
      this,
      "ssrEdgeHandlerVersion",
      { lambda: ssrEdgeFunction }
    );

Finally, I configure the CloudFront distribution to communicate with all the origins:

const distribution = new cloudfront.CloudFrontWebDistribution(
      this,
      "ssr-cdn",
      {
        originConfigs: [
          {
            s3OriginSource: {
              s3BucketSource: mySiteBucket,
              originAccessIdentity: originAccessIdentity
            },
            behaviors: [
              {
                isDefaultBehavior: true,
                lambdaFunctionAssociations: [
                  {
                    eventType: cloudfront.LambdaEdgeEventType.ORIGIN_REQUEST,
                    lambdaFunction: ssrEdgeFunctionVersion
                  }
                ]
              }
            ]
          },
          {
            customOriginSource: {
              domainName: apiDomainName,
              originPath: "/prod",
              originProtocolPolicy: cloudfront.OriginProtocolPolicy.HTTPS_ONLY
            },
            behaviors: [
              {
                pathPattern: "/ssr"
              }
            ]
          }
        ]
      }
    );

The template is now ready for deployment. This approach allows you to use this code in your Continuous Integration and Continuous Delivery/Deployment (CI/CD) pipelines to automate deployments of your SSR applications. Also, you can create a CDK construct to reuse this code in different applications.

Cleaning up

To delete all the resources used in this solution, run:

cd react-ssr-lambda/cdk
cdk destroy SSRApiStack
cdk destroy SSRAppStack

Conclusion

This post demonstrates two ways you can implement and deploy a solution for server-side rendering in React applications, by using Lambda or Lambda@Edge.

It also shows how to use open-source tools and AWS CDK to automate the building and deployment of such applications.

For more serverless learning resources, visit Serverless Land.

Operating Lambda: Anti-patterns in event-driven architectures – Part 3

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/operating-lambda-anti-patterns-in-event-driven-architectures-part-3/

In the Operating Lambda series, I cover important topics for developers, architects, and systems administrators who are managing AWS Lambda-based applications. This three-part section discusses event-driven architectures and how these relate to Lambda-based applications.

Part 1 covers the benefits of the event-driven paradigm and how it can improve throughput, scale, and extensibility. Part 2 explains some of the design principles and best practices that can help developers gain the benefits of building Lambda-based applications. This post explores anti-patterns in event-driven architectures.

Lambda is not a prescriptive service and provides broad functionality for you to build applications as needed. While this flexibility is important to customers, there are some designs that are technically functional but suboptimal from an architecture standpoint.

The Lambda monolith

In many applications migrated from traditional servers, such as Amazon EC2 instances or AWS Elastic Beanstalk applications, developers “lift and shift” existing code. Frequently, this results in a single Lambda function that contains all of the application logic that is triggered for all events. For a basic web application, for example, a monolithic Lambda function handles all Amazon API Gateway routes and integrates with all necessary downstream resources:

Monolithic Lambda application

This approach has several drawbacks:

  • Package size: The Lambda function may be much larger because it contains all possible code for all paths, which makes it slower for the Lambda service to download and run.
  • Harder to enforce least privilege: The function’s IAM role must grant permissions for all resources needed for all paths, making the permissions very broad. Many paths in the functional monolith do not need all the permissions that have been granted.
  • Harder to upgrade: In a production system, any upgrades to the single function are more risky and could cause the entire application to stop working. Upgrading a single path in the Lambda function is an upgrade to the entire function.
  • Harder to maintain: It’s more difficult to have multiple developers working on the service since it’s a monolithic code repository. It also increases the cognitive burden on developers and makes it harder to create appropriate test coverage for code.
  • Harder to reuse code: Typically, it can be harder to separate libraries from monoliths, making code reuse more difficult. As you develop and support more projects, this can make it harder to support the code and scale your team’s velocity.
  • Harder to test: As the lines of code increase, it becomes harder to unit test all the possible combinations of inputs and entry points in the code base. It’s generally easier to implement unit testing for smaller services with less code.

The preferred alternative is to decompose the monolithic Lambda function into individual microservices, mapping a single Lambda function to a single, well-defined task. In this example web application with a few API endpoints, the resulting microservice-based architecture is based on the API routes.

Microservice architecture

The process of decomposing a monolith depends upon the complexity of your workload. Using strategies like the strangler pattern, you can migrate code from larger code bases to microservices. There are many potential benefits to running a Lambda-based application this way:

  • Package sizes can be optimized for only the code needed for a single task, which helps make the function more performant, and may reduce running cost.
  • IAM roles can be scoped to precisely the access needed by the microservice, making it easier to enforce the principles of least privilege. In controlling the blast radius, using IAM roles this way can give your application a stronger security posture.
  • Easier to upgrade: you can apply upgrades at a microservice level without impacting the entire workload. Upgrades occur at the functional level, not at the application level, and you can implement canary releases to control the rollout.
  • Easier to maintain: adding new features is usually easier when working with a single small service than a monolith with significant coupling. Frequently, you implement features by adding new Lambda functions without modifying existing code.
  • Easier to reuse code: when you have specialized functions that perform a single task, it’s often easier to copy these across multiple projects. Building a library of generic specialized functions can help accelerate development in future projects.
  • Easier to test: unit testing is easier when there are few lines of code and the range of potential inputs for a function is smaller.
  • Lower cognitive load for developers since each development team has a smaller surface area of the application to understand. This can help accelerate onboarding for new developers.

To learn more, read “Decomposing the Monolith with Event Storming”.

Lambda as orchestrator

Many business workflows result in complex workflow logic, where the flow of operations depends on multiple factors. In an ecommerce example, a payments service is an example of a complex workflow:

  • A payment type may be cash, check, or credit card, all of which have different processes.
  • A credit card payment has many possible states, from successful to declined.
  • The service may need to issue refunds or credits for a portion or the entire amount.
  • A third-party service that processes credit cards may be unavailable due to an outage.
  • Some payments may take multiple days to process.

Implementing this logic in a Lambda function can result in ‘spaghetti code’ that’s difficult to read, understand, and maintain. It can also become fragile in production systems. The complexity is compounded if you must also implement error handling, retry logic, and input and output processing. These types of orchestration functions are an anti-pattern in Lambda-based applications.

Instead, use AWS Step Functions to orchestrate these workflows using a versionable, JSON-defined state machine. State machines can handle nested workflow logic, errors, and retries. A workflow can also run for up to 1 year, and the service can maintain different versions of workflows, allowing you to upgrade production systems in place. Using this approach also results in less custom code, making an application easier to test and maintain.
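As a sketch of what this looks like, the following AWS CDK fragment (written in JavaScript, inside a stack) wires the payment example into a state machine. The function references and state names are illustrative, not a complete implementation.

// Each step invokes a small, single-purpose Lambda function.
const sfn = require("@aws-cdk/aws-stepfunctions");
const tasks = require("@aws-cdk/aws-stepfunctions-tasks");

const processPayment = new tasks.LambdaInvoke(this, "Process payment", {
  lambdaFunction: processPaymentFn, // hypothetical function construct
});
const createInvoice = new tasks.LambdaInvoke(this, "Create invoice", {
  lambdaFunction: createInvoiceFn,
});

// Retries and failure handling live in the state machine, not in code.
processPayment.addRetry({ maxAttempts: 3 });
processPayment.addCatch(
  new tasks.LambdaInvoke(this, "Refund payment", {
    lambdaFunction: refundFn,
  })
);

new sfn.StateMachine(this, "PaymentWorkflow", {
  definition: processPayment.next(createInvoice),
});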

While Step Functions is generally best-suited for workflows within a bounded context or microservice, to coordinate state changes across multiple services, instead use Amazon EventBridge. This is a serverless event bus that routes events based upon rules, and simplifies orchestration between microservices.

Recursive patterns that cause invocation loops

AWS services generate events that invoke Lambda functions, and Lambda functions can send messages to AWS services. Generally, the service or resource that invokes a Lambda function should be different from the service or resource that the function outputs to. Failure to manage this can result in invocation loops.

For example, a Lambda function writes an object to an Amazon S3 bucket, which in turn invokes the same Lambda function via a put event. The invocation causes a second object to be written to the bucket, which invokes the same Lambda function:

Event loops in Lambda-based applications

While the potential for infinite loops exists in most programming languages, this anti-pattern has the potential to consume more resources in serverless applications. Both Lambda and S3 automatically scale based upon traffic, so the loop may cause Lambda to scale to consume all available concurrency and S3 to continue to write objects and generate more events for Lambda. In this situation, you can press the “Throttle” button in the Lambda console to scale the function concurrency down to zero and break the recursion cycle.

This example uses S3 but the risk of recursive loops also exists in Amazon SNS, Amazon SQS, Amazon DynamoDB, and other services. In most cases, it is safer to separate the resources that produce and consume events from Lambda. However, if you need a Lambda function to write data back to the same resource that invoked the function, ensure that you:

  • Use a positive trigger: For example, an S3 object trigger may use a naming convention or metadata tag that is only matched on the first invocation. This prevents objects written from the Lambda function from invoking the same Lambda function again (see the sketch after this list). See the S3-to-Lambda translation application for an example of this mechanism.
  • Use reserved concurrency: Setting the function’s reserved concurrency to a lower limit prevents the function from scaling concurrently beyond that limit. It does not prevent the recursion, but limits the resources consumed as a safety mechanism. This can be useful during the development and test phases.
  • Use Amazon CloudWatch monitoring and alarming: By setting an alarm on a function’s concurrency metric, you can receive alerts if the concurrency suddenly spikes and take appropriate action.
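A minimal sketch of the positive trigger approach, assuming hypothetical incoming/ and processed/ prefixes:

// Process only objects under incoming/ and write results under
// processed/, so the function's own writes never re-invoke it.
const AWS = require("aws-sdk");
const s3 = new AWS.S3();

exports.handler = async (event) => {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));

    // Guard clause: ignore anything outside the input prefix.
    if (!key.startsWith("incoming/")) continue;

    const obj = await s3.getObject({ Bucket: bucket, Key: key }).promise();
    const transformed = obj.Body.toString().toUpperCase(); // placeholder work

    await s3
      .putObject({
        Bucket: bucket,
        Key: key.replace("incoming/", "processed/"),
        Body: transformed,
      })
      .promise();
  }
};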

Lambda functions calling Lambda functions

Functions enable encapsulation and code reuse. Most programming languages support the concept of code synchronously calling functions within a code base. In this case, the caller waits until the function returns a response. This model does not generally adapt well to serverless development.

For example, consider a simple ecommerce application consisting of three Lambda functions that process an order:

Ecommerce example with three functions

In this case, the Create order function calls the Process payment function, which in turn calls the Create invoice function. While this synchronous flow may work within a single application on a server, it introduces several avoidable problems in a distributed serverless architecture:

  • Cost: With Lambda, you pay for the duration of an invocation. In this example, while the Create invoice function runs, two other functions are also running in a wait state, shown in red on the diagram.
  • Error handling: In nested invocations, error handling can become more complex. Either errors are thrown to parent functions to handle at the top-level function, or functions require custom handling. For example, an error in Create invoice might require the Process payment function to reverse the charge, or it may instead retry the Create invoice process.
  • Tight coupling: Processing a payment typically takes longer than creating an invoice. In this model, the availability of the entire workflow is limited by the slowest function.
  • Scaling: The concurrency of all three functions must be equal. In a busy system, this uses more concurrency than would otherwise be needed.

In serverless applications, there are two common approaches to avoid this pattern. First, use an SQS queue between Lambda functions. If a downstream process is slower than an upstream process, the queue durably persists messages and decouples the two functions. In this example, the Create order function publishes a message to an SQS queue, and the Process payment function consumes messages from the queue.
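A sketch of the producer side of this decoupling, with an assumed queue URL:

// Create order publishes a message instead of invoking Process payment
// directly, then acknowledges the request immediately.
const AWS = require("aws-sdk");
const sqs = new AWS.SQS();

exports.handler = async (event) => {
  const order = JSON.parse(event.body);

  await sqs
    .sendMessage({
      QueueUrl: "https://sqs.us-east-1.amazonaws.com/123456789012/payments",
      MessageBody: JSON.stringify(order),
    })
    .promise();

  // Payment processing happens asynchronously downstream.
  return { statusCode: 202, body: JSON.stringify({ accepted: true }) };
};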

The second approach is to use AWS Step Functions. For complex processes with multiple types of failure and retry logic, Step Functions can help reduce the amount of custom code needed to orchestrate the workflow. As a result, Step Functions orchestrates the work and robustly handles errors and retries, and the Lambda functions contain only business logic.

Synchronous waiting within a single Lambda function

Within a single Lambda function, ensure that any potentially concurrent activities are not scheduled synchronously. For example, a Lambda function might write to an S3 bucket and then write to a DynamoDB table:

The wait states, shown in red in the diagram, are compounded because the activities are sequential. If the tasks are independent, they can be run in parallel, which results in the total wait time being set by the longest-running task.

Parallel tasks in Lambda functions
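In Node.js, this parallelization is a Promise.all call. A minimal sketch with placeholder resource names:

// The total wait is set by the slower of the two calls, not their sum.
const AWS = require("aws-sdk");
const s3 = new AWS.S3();
const dynamo = new AWS.DynamoDB.DocumentClient();

exports.handler = async (event) => {
  await Promise.all([
    s3
      .putObject({ Bucket: "my-bucket", Key: event.id, Body: event.payload })
      .promise(),
    dynamo
      .put({ TableName: "my-table", Item: { id: event.id, status: "stored" } })
      .promise(),
  ]);
};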

In cases where the second task depends on the completion of the first task, you may be able to reduce the total waiting time and the cost of execution by splitting the Lambda functions:

Splitting tasks over two functions

In this design, the first Lambda function responds immediately after putting the object to the S3 bucket. The S3 service invokes the second Lambda function, which then writes data to the DynamoDB table. This approach minimizes the total wait time in the Lambda function executions.

To learn more, read the “Serverless Applications Lens” from the AWS Well-Architected Framework.

Conclusion

This post discusses anti-patterns in event-driven architectures using Lambda. I show some of the issues when using monolithic Lambda functions or custom code to orchestrate workflows. I explain how to avoid recursive architectures that may cause invocation loops and why you should avoid calling functions from functions. I also explain different approaches to handling waiting in functions to minimize cost.

For more serverless learning resources, visit Serverless Land.

Operating Lambda: Design principles in event-driven architectures – Part 2

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/operating-lambda-design-principles-in-event-driven-architectures-part-2/

In the Operating Lambda series, I cover important topics for developers, architects, and systems administrators who are managing AWS Lambda-based applications. This three-part section discusses event-driven architectures and how these relate to Lambda-based applications.

Part 1 covers the benefits of the event-driven paradigm and how it can improve throughput, scale and extensibility. This post explains some of the design principles and best practices that can help developers gain the benefits of building Lambda-based applications.

Overview

Many of the best practices that apply to software development and distributed systems also apply to serverless application development. The broad principles are consistent with the Well-Architected Framework. The overall goal is to develop workloads that are:

  • Reliable: offering your end users a high level of availability. AWS serverless services are reliable because they are also designed for failure.
  • Durable: providing storage options that meet the durability needs of your workload.
  • Secure: following best practices and using the tools provided to secure access to workloads and limit the blast radius, if any issues occur.
  • Performant: using computing resources efficiently and meeting the performance needs of your end users.
  • Cost-efficient: designing architectures that avoid unnecessary cost, can scale without overspending, and can be decommissioned without significant overhead, if necessary.

When you develop Lambda-based applications, there are several important design principles that can help you build workloads that meet these goals. You may not apply every principle to every architecture and you have considerable flexibility in how you approach building with Lambda. However, they should guide you in general architecture decisions.

Use services instead of custom code

Serverless applications usually comprise several AWS services, integrated with custom code run in Lambda functions. While Lambda can be integrated with most AWS services, the services most commonly used in serverless applications are:

  • Compute: AWS Lambda
  • Data storage: Amazon S3, Amazon DynamoDB, Amazon RDS
  • API: Amazon API Gateway
  • Application integration: Amazon EventBridge, Amazon SNS, Amazon SQS
  • Orchestration: AWS Step Functions
  • Streaming data and analytics: Amazon Kinesis Data Firehose

There are many well-established, common patterns in distributed architectures that you can build yourself or implement using AWS services. For most customers, there is little commercial value in investing time to develop these patterns from scratch. When your application needs one of these patterns, use the corresponding AWS service:

  • Queue: Amazon SQS
  • Event bus: Amazon EventBridge
  • Publish/subscribe (fan-out): Amazon SNS
  • Orchestration: AWS Step Functions
  • API: Amazon API Gateway
  • Event streams: Amazon Kinesis

These services are designed to integrate with Lambda and you can use infrastructure as code (IaC) to create and discard resources in the services. You can use any of these services via the AWS SDK without needing to install applications or configure servers. Becoming proficient with using these services via code in your Lambda functions is an important step to producing well-designed serverless applications.
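
For example, publishing a custom event to an EventBridge event bus is a single SDK call from within a function. This sketch is illustrative only; the bus name and event fields are assumptions:

import { EventBridge } from "aws-sdk";

const eventBridge = new EventBridge();

export const handler = async () => {
  // Publish a custom event; EventBridge rules then filter and route it.
  await eventBridge.putEvents({
    Entries: [{
      EventBusName: "orders",         // hypothetical bus name
      Source: "com.example.orders",   // hypothetical source
      DetailType: "OrderCreated",
      Detail: JSON.stringify({ orderId: "1234" }),
    }],
  }).promise();
};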

Understanding the level of abstraction

The Lambda service limits your access to the underlying operating systems, hypervisors, and hardware running your Lambda functions. The service continuously improves and changes infrastructure to add features, reduce cost and make the service more performant. Your code should assume no knowledge of how Lambda is architected and assume no hardware affinity.

Similarly, the integration of other services with Lambda is managed by AWS with only a small number of configuration options exposed. For example, when API Gateway and Lambda interact, there is no concept of load balancing available since it is entirely managed by the services. You also have no direct control over which Availability Zones the services use when invoking functions at any point in time, or how and when Lambda execution environments are scaled up or destroyed.

This abstraction allows you to focus on the integration aspects of your application, the flow of data, and the business logic where your workload provides value to your end users. Allowing the services to manage the underlying mechanics helps you develop applications more quickly with less custom code to maintain.

Implementing statelessness in functions

When building Lambda functions, you should assume that the environment exists only for a single invocation. The function should initialize any required state when it is first started – for example, fetching a shopping cart from a DynamoDB table. It should commit any permanent data changes to a durable store such as S3, DynamoDB, or SQS before exiting. It should not rely on any existing data structures or temporary files, or any internal state that would be managed by multiple invocations (such as counters or other calculated, aggregate values).

Lambda provides an initializer before the handler where you can initialize database connections, libraries, and other resources. Since execution environments are reused where possible to improve performance, you can amortize the time taken to initialize these resources over multiple invocations. However, you should not store any variables or data used in the function within this global scope.
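
A short sketch of this separation, with a reusable client in the initializer and all per-invocation state inside the handler (the table name variable is hypothetical):

import { DynamoDB } from "aws-sdk";

// INIT code: runs once per execution environment and is reused across
// invocations. Safe to share because the client holds no business state.
const dynamo = new DynamoDB.DocumentClient();

export const handler = async (event: { cartId: string }) => {
  // Per-invocation state lives here - never in global variables like counters.
  const cart = await dynamo.get({
    TableName: process.env.CART_TABLE!, // hypothetical variable name
    Key: { pk: event.cartId },
  }).promise();

  // ...process the cart, committing permanent changes to a durable store.
  return cart.Item;
};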

Lambda function design

Most architectures should prefer many, shorter functions over fewer, larger ones. Making Lambda functions highly specialized for your workload means that they are concise and generally result in shorter executions. The purpose of each function should be to handle the event passed into the function, with no knowledge or expectations of the overall workflow or volume of transactions. This makes the function agnostic to the source of the event with minimal coupling to other services.

Any global-scope constants that change infrequently should be implemented as environment variables to allow updates without deployments. Any secrets or sensitive information should be stored in AWS Systems Manager Parameter Store or AWS Secrets Manager and loaded by the function. Since these resources are account-specific, this allows you to create build pipelines across multiple accounts. The pipelines load the appropriate secrets per environment, without exposing these to developers or requiring any code changes.
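
As an illustration, a function can read non-sensitive configuration from environment variables and fetch a secret from Parameter Store during initialization. The parameter name and variable names here are assumptions:

import { SSM } from "aws-sdk";

const ssm = new SSM();

// Non-sensitive, infrequently changing configuration via environment variables.
const logLevel = process.env.LOG_LEVEL ?? "info";

// Sensitive values are fetched at runtime, not stored in the deployment package.
const apiKeyPromise = ssm.getParameter({
  Name: "/myapp/prod/api-key", // hypothetical parameter name
  WithDecryption: true,
}).promise();

export const handler = async () => {
  const apiKey = (await apiKeyPromise).Parameter?.Value;
  // ...use logLevel and apiKey in the business logic
};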

Building for on-demand data instead of batches

Many traditional systems are designed to run periodically and process batches of transactions that have built up over time. For example, a banking application may run every hour to process ATM transactions into central ledgers. In Lambda-based applications, the custom processing should be triggered by every event, allowing the service to scale up concurrency as needed, to provide near-real time processing of transactions.

While you can run cron tasks in serverless applications by using scheduled expressions for rules in Amazon EventBridge, these should be used sparingly or as a last resort. In any scheduled task that processes a batch, there is the potential for the volume of transactions to grow beyond what can be processed within the 15-minute Lambda timeout. If the limitations of external systems force you to use a scheduler, you should generally schedule for the shortest reasonable recurring time period.

For example, it’s not best practice to use a batch process that triggers a Lambda function to fetch a list of new S3 objects. This is because the service may receive more new objects in between batches than can be processed within a 15-minute Lambda function.

S3 fetch anti-pattern

Instead, the Lambda function should be invoked by the S3 service each time a new object is put into the S3 bucket. This approach is significantly more scalable and also invokes processing in near-real time.

S3 to Lambda events
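
A sketch of the event-driven version: the function receives one S3 notification at a time, and the Lambda service scales concurrency as objects arrive. The event shape below follows the standard S3 notification structure:

import { S3 } from "aws-sdk";

const s3 = new S3();

// Invoked by S3 for each new object - there is no batch to track or schedule.
export const handler = async (event: {
  Records: { s3: { bucket: { name: string }; object: { key: string } } }[];
}) => {
  for (const record of event.Records) {
    const object = await s3.getObject({
      Bucket: record.s3.bucket.name,
      // S3 URL-encodes keys in notifications, so decode before use.
      Key: decodeURIComponent(record.s3.object.key.replace(/\+/g, " ")),
    }).promise();
    // ...process the object in near-real time
  }
};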

Orchestrating workflows

Workflows that involve branching logic, different types of failure models and retry logic typically use an orchestrator to keep track of the state of the overall execution. Avoid using Lambda functions for this purpose, since it results in tightly coupled groups of functions and services and complex code handling routing and exceptions.

With AWS Step Functions, you use state machines to manage orchestration. This extracts the error handling, routing, and branching logic from your code, replacing it with state machines declared using JSON. Apart from making workflows more robust and observable, it allows you to add versioning to workflows and make the state machine a codified resource that you can add to a code repository.

It’s common for simpler workflows in Lambda functions to become more complex over time, and for developers to use a Lambda function to orchestrate the flow. When operating a production serverless application, it’s important to identify when this is happening, so you can migrate this logic to a state machine.
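
A sketch of this migration using the CDK, which synthesizes the state machine into its JSON definition. The function names and retry settings are assumptions, not part of the original text:

import { Duration } from "@aws-cdk/core";
import * as sfn from "@aws-cdk/aws-stepfunctions";
import * as tasks from "@aws-cdk/aws-stepfunctions-tasks";

// Each step wraps a Lambda function that contains only business logic.
const processPayment = new tasks.LambdaInvoke(this, "ProcessPayment", {
  lambdaFunction: processPaymentFn, // assumed to exist in the stack
});
processPayment.addRetry({
  maxAttempts: 3,
  interval: Duration.seconds(2),
  backoffRate: 2,
});

const fulfillOrder = new tasks.LambdaInvoke(this, "FulfillOrder", {
  lambdaFunction: fulfillOrderFn, // assumed to exist in the stack
});

// Routing, error handling, and retries live in the state machine, not the code.
new sfn.StateMachine(this, "OrderWorkflow", {
  definition: processPayment.next(fulfillOrder),
});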

Developing for retries and failures

AWS serverless services, including Lambda, are fault-tolerant and designed to handle failures. In the case of Lambda, if a service invokes a Lambda function and there is a service disruption, Lambda invokes your function in a different Availability Zone. If your function throws an error, the Lambda service retries your function.

Since the same event may be received more than once, functions should be designed to be idempotent. This means that receiving the same event multiple times does not change the result beyond the first time the event was received.

For example, if a credit card transaction is attempted twice due to a retry, the Lambda function should process the payment on the first receipt. On the second receipt, either the Lambda function should discard the event or the downstream service it uses should be idempotent.

A Lambda function typically implements idempotency by using a DynamoDB table to track recently processed identifiers and determine whether a transaction has been handled previously. The DynamoDB table usually sets a Time To Live (TTL) value on items so they expire, limiting the storage space used.

Idempotent microservice
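
A sketch of this check, using a conditional write so that a duplicate event fails fast. The table name variable and 24-hour TTL are assumptions:

import { DynamoDB } from "aws-sdk";

const dynamo = new DynamoDB.DocumentClient();

export const handler = async (event: { transactionId: string }) => {
  try {
    // Record the identifier only if it has not been seen before.
    await dynamo.put({
      TableName: process.env.IDEMPOTENCY_TABLE!, // hypothetical variable name
      Item: {
        pk: event.transactionId,
        // TTL attribute: expire the record after 24 hours to limit storage.
        expiresAt: Math.floor(Date.now() / 1000) + 24 * 60 * 60,
      },
      ConditionExpression: "attribute_not_exists(pk)",
    }).promise();
  } catch (err) {
    if ((err as any).code === "ConditionalCheckFailedException") {
      return; // Duplicate event: already processed, so discard it.
    }
    throw err;
  }
  // ...process the payment exactly once here
};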

For failures within the custom code of a Lambda function, the service offers a number of features to help preserve and retry the event, and provide monitoring to capture that the failure has occurred. Using these approaches can help you develop workloads that are resilient to failure and improve the durability of events as they are processed by Lambda functions.

Conclusion

This post discusses the design principles that can help you develop well-architected serverless applications. I explain why using services instead of code can help improve your application’s agility and scalability. I also show how statelessness and function design contribute to good application architecture. I cover how using events instead of batches helps serverless development, and how to plan for retries and failures in your Lambda-based applications.

Part 3 of this series will look at common anti-patterns in event-driven architectures and how to avoid building these into your microservices.

For more serverless learning resources, visit Serverless Land.

Introducing message archiving and analytics for Amazon SNS

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/introducing-message-archiving-and-analytics-for-amazon-sns/

This blog post is courtesy of Sebastian Caceres (AWS Consultant, DevOps), Otavio Ferreira (Sr. Manager, Amazon SNS), Prachi Sharma and Mary Gao (Software Engineers, Amazon SNS).

Today, we are announcing the release of a message delivery protocol for Amazon SNS based on Amazon Kinesis Data Firehose. This is a new way to integrate SNS with storage and analytics services, without writing custom code.

SNS provides topics for push-based, many-to-many pub/sub messaging to help you decouple distributed systems, microservices, and event-driven serverless applications. As applications grow, so does the need to archive messages to meet compliance goals. These archives can also provide important operational and business insights.

Previously, custom code was required to create data pipelines, using general-purpose SNS subscription endpoints, such as Amazon SQS queues or AWS Lambda functions. You had to manage data transformation, data buffering, data compression, and the upload to data stores.

Overview

With the new native integration between SNS and Kinesis Data Firehose, you can send messages to storage and analytics services, using a purpose-built SNS subscription type.

Once you configure a subscription, messages published to the SNS topic are sent to the subscribed Kinesis Data Firehose delivery stream. The messages are then delivered to the destination endpoint configured in the delivery stream, which can be an Amazon S3 bucket, an Amazon Redshift table, or an Amazon Elasticsearch Service index.

You can also use a third-party service provider as the destination of a delivery stream, including Datadog, New Relic, MongoDB, and Splunk. No custom code is required to bridge the services. For more information, see Fanout to Kinesis Data Firehose streams, in the SNS Developer Guide.

Amazon SNS subscriber types with Amazon Kinesis Data Firehose.

The new Kinesis Data Firehose subscription type and its destinations are part of the application-to-application (A2A) messaging offering of SNS. The addition of this subscription type expands the SNS A2A offering to include the following use cases:

  • Run analytics on SNS messages, using Amazon Kinesis Data Analytics, Amazon Elasticsearch Service, or Amazon Redshift as a delivery stream destination. You can use this option to gain insights and detect anomalies in workloads.
  • Index and search SNS messages, using Amazon Elasticsearch Service as a delivery stream destination. From there, you can create dashboards using Kibana, a data visualization and exploration tool.
  • Store SNS messages for backup and auditing purposes, using S3 as a destination of choice. You can then use Amazon Athena to query the S3 bucket for analytics purposes.
  • Apply transformation to SNS messages. For example, you may obfuscate personally identifiable information (PII) or protected health information (PHI) using a Lambda function invoked by the delivery stream.
  • Feed SNS messages into cloud-based application monitoring and observability tools, using Datadog, New Relic, or Splunk as a destination. You can choose this option to enrich DevOps or marketing workflows.

As with all supported message delivery protocols, you can filter, monitor, and encrypt messages.

To simplify architecture and further avoid custom code, you can use an SNS subscription filter policy. This enables you to route only the relevant subset of SNS messages to the Kinesis Data Firehose delivery stream. For more information, see SNS message filtering.
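
As a sketch, the filter policy is set as a subscription attribute when subscribing the delivery stream to the topic. The ARNs, role, and attribute values below are placeholders:

import { SNS } from "aws-sdk";

const sns = new SNS();

export const createFilteredSubscription = async () => {
  // Only messages whose "eventType" attribute matches reach the stream.
  await sns.subscribe({
    TopicArn: "arn:aws:sns:us-east-1:123456789012:ticketTopic", // placeholder
    Protocol: "firehose",
    Endpoint: "arn:aws:firehose:us-east-1:123456789012:deliverystream/ticketUploadStream",
    Attributes: {
      SubscriptionRoleArn: "arn:aws:iam::123456789012:role/SNSFirehoseRole", // placeholder
      FilterPolicy: JSON.stringify({ eventType: ["ticket_sold"] }),
    },
  }).promise();
};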

To monitor the throughput, you can check the NumberOfMessagesPublished and the NumberOfNotificationsDelivered metrics for SNS, and the IncomingBytes, IncomingRecords, DeliveryToS3.Records and DeliveryToS3.Success metrics for Kinesis Data Firehose. For additional information, see Monitoring SNS topics using CloudWatch and Monitoring Kinesis Data Firehose using CloudWatch.

For security purposes, you can choose to have data encrypted at rest, using server-side encryption (SSE), in addition to encrypted in transit using HTTPS. For more information, see SNS SSE, Kinesis Data Firehose SSE, and S3 SSE.

Applying SNS message archiving and analytics in a use case

For example, consider an airline ticketing platform that operates in a regulated environment. The compliance framework requires that the company archives all ticket sales for at least 5 years.

Example architecture of a flight ticket selling platform.

The platform is based on an event-driven serverless architecture. It has a ticket seller Lambda function that publishes an event to an SNS topic for every ticket sold. The SNS topic fans out the event to subscribed systems that are interested in processing this type of event. In the preceding diagram, two systems are interested: one focused on payment processing, and another on fraud control. Each subscribed system consists of an SQS queue and an event processing Lambda function.

To meet the compliance goal on data retention, the airline company subscribes a Kinesis Data Firehose delivery stream to their existing SNS topic. They use an S3 bucket as the stream destination. After this, all events published to the SNS topic are archived in the S3 bucket.

The company can then use Athena to query the S3 bucket with standard SQL to run analytics and gain insights on ticket sales. For example, they can query for the most popular flight destinations or the most frequent flyers.

Subscribing a Kinesis Data Firehose stream to an SNS topic

You can set up a Kinesis Data Firehose subscription to an SNS topic using the AWS Management Console, the AWS CLI, or the AWS SDKs. You can also use AWS CloudFormation to automate the provisioning of these resources.

We use CloudFormation for this example. The provided CloudFormation template creates the following resources:

  • An SNS topic
  • An S3 bucket
  • A Kinesis Data Firehose delivery stream
  • A Kinesis Data Firehose subscription in SNS
  • Two SQS subscriptions in SNS
  • Two IAM roles with access to deliver messages:
    • From SNS to Kinesis Data Firehose
    • From Kinesis Data Firehose to S3

To provision the infrastructure, use the following template:

---
AWSTemplateFormatVersion: '2010-09-09'
Description: Template for creating an SNS archiving use case
Resources:
  ticketUploadStream:
    DependsOn:
    - ticketUploadStreamRolePolicy
    Type: AWS::KinesisFirehose::DeliveryStream
    Properties:
      S3DestinationConfiguration:
        BucketARN: !Sub 'arn:${AWS::Partition}:s3:::${ticketArchiveBucket}'
        BufferingHints:
          IntervalInSeconds: 60
          SizeInMBs: 1
        CompressionFormat: UNCOMPRESSED
        RoleARN: !GetAtt ticketUploadStreamRole.Arn
  ticketArchiveBucket:
    Type: AWS::S3::Bucket
  ticketTopic:
    Type: AWS::SNS::Topic
  ticketPaymentQueue:
    Type: AWS::SQS::Queue
  ticketFraudQueue:
    Type: AWS::SQS::Queue
  ticketQueuePolicy:
    Type: AWS::SQS::QueuePolicy
    Properties:
      PolicyDocument:
        Statement:
          Effect: Allow
          Principal:
            Service: sns.amazonaws.com
          Action:
            - sqs:SendMessage
          Resource: '*'
          Condition:
            ArnEquals:
              aws:SourceArn: !Ref ticketTopic
      Queues:
        - !Ref ticketPaymentQueue
        - !Ref ticketFraudQueue
  ticketUploadStreamSubscription:
    Type: AWS::SNS::Subscription
    Properties:
      TopicArn: !Ref ticketTopic
      Endpoint: !GetAtt ticketUploadStream.Arn
      Protocol: firehose
      SubscriptionRoleArn: !GetAtt ticketUploadStreamSubscriptionRole.Arn
  ticketPaymentQueueSubscription:
    Type: AWS::SNS::Subscription
    Properties:
      TopicArn: !Ref ticketTopic
      Endpoint: !GetAtt ticketPaymentQueue.Arn
      Protocol: sqs
  ticketFraudQueueSubscription:
    Type: AWS::SNS::Subscription
    Properties:
      TopicArn: !Ref ticketTopic
      Endpoint: !GetAtt ticketFraudQueue.Arn
      Protocol: sqs
  ticketUploadStreamRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Sid: ''
          Effect: Allow
          Principal:
            Service: firehose.amazonaws.com
          Action: sts:AssumeRole
  ticketUploadStreamRolePolicy:
    Type: AWS::IAM::Policy
    Properties:
      PolicyName: FirehoseticketUploadStreamRolePolicy
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Effect: Allow
          Action:
          - s3:AbortMultipartUpload
          - s3:GetBucketLocation
          - s3:GetObject
          - s3:ListBucket
          - s3:ListBucketMultipartUploads
          - s3:PutObject
          Resource:
          - !Sub 'arn:aws:s3:::${ticketArchiveBucket}'
          - !Sub 'arn:aws:s3:::${ticketArchiveBucket}/*'
      Roles:
      - !Ref ticketUploadStreamRole
  ticketUploadStreamSubscriptionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Effect: Allow
          Principal:
            Service:
            - sns.amazonaws.com
          Action:
          - sts:AssumeRole
      Policies:
      - PolicyName: SNSKinesisFirehoseAccessPolicy
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
          - Action:
            - firehose:DescribeDeliveryStream
            - firehose:ListDeliveryStreams
            - firehose:ListTagsForDeliveryStream
            - firehose:PutRecord
            - firehose:PutRecordBatch
            Effect: Allow
            Resource:
            - !GetAtt ticketUploadStream.Arn

To test, publish a message to the SNS topic. After the delivery stream buffer interval of 60 seconds, the message appears in the destination S3 bucket. For information on message formats, see Amazon SNS message formats in Amazon Kinesis Data Firehose destinations.
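
For example, a minimal test publish with the AWS SDK for JavaScript, using a placeholder topic ARN:

import { SNS } from "aws-sdk";

const sns = new SNS();

export const publishTestMessage = async () => {
  await sns.publish({
    TopicArn: "arn:aws:sns:us-east-1:123456789012:ticketTopic", // placeholder ARN
    Message: JSON.stringify({ ticketId: "1234", price: 100 }),
  }).promise();
};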

Cleaning up

After testing, avoid incurring usage charges by deleting the resources you created during the walkthrough. If you used the CloudFormation template, delete all the objects from the S3 bucket before deleting the stack.

Conclusion

In this post, we show how SNS delivery to Kinesis Data Firehose enables you to integrate SNS with storage and analytics services. The example shows how to create an SNS subscription to use a Kinesis Data Firehose delivery stream to store SNS messages in an S3 bucket.

You can adapt this configuration for your needs for storage, encryption, data transformation, and data pipeline architecture. For more information, see Fanout to Kinesis Data Firehose streams in the SNS Developer Guide.

For details on pricing, see SNS pricing and Kinesis Data Firehose pricing. For more serverless learning resources, visit Serverless Land.

Operating Lambda: Understanding event-driven architecture – Part 1

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/operating-lambda-understanding-event-driven-architecture-part-1/

In the Operating Lambda series, I cover important topics for developers, architects, and systems administrators who are managing AWS Lambda-based applications. This three-part series discusses event-driven architectures and how these relate to serverless applications.

Part 1 covers the benefits of the event-driven paradigm and how it can improve throughput, scale and extensibility, while also reducing complexity and the overall amount of code in an application.

Event-driven architectures have grown in popularity because they help address some of the inherent challenges in building the complex systems commonly used in modern organizations. This approach promotes the use of microservices, which are small, specialized services performing a narrow set of functions. A well-designed, Lambda-based application is compatible with the principles of microservice architectures.

How Lambda fits into the event-driven paradigm

Lambda is an on-demand compute service that runs custom code in response to events. Most AWS services generate events, and many can act as an event source for Lambda. Within Lambda, your code is stored in a code deployment package and contains an event handler. All interaction with the code occurs through the Lambda API and there is no direct invocation of functions from outside of the service. The main purpose of Lambda functions is to process events.

Lambda API triggers function code

Unlike traditional servers, Lambda functions do not run constantly. When a function is triggered by an event, this is called an invocation. Lambda functions are purposefully limited to 15 minutes in duration but, on average across all AWS customers, most invocations last less than a second. In some intensive compute operations, it may take several minutes to process a single event, but in the majority of cases the duration is brief.

An event triggering a Lambda function could be almost anything: an HTTP request via Amazon API Gateway, a schedule managed by an Amazon EventBridge rule, or an Amazon S3 notification. Even the simplest Lambda-based application uses at least one event.

Different Lambda event sources

The event itself is a JSON object that contains information about what happened. Events are facts about a change in the system state; they are immutable, and the time when they happen is significant. The first parameter of every Lambda handler contains the event. An event could be custom-generated from another microservice, such as a new order generated in an ecommerce application:

Defining a console test event

The event may also be generated by an AWS service, such as Amazon SQS when a new message is available in a queue:

SQS test event

In either case, the event is passed to the Lambda function as the first parameter in the Lambda handler:

INIT code and event handler

  1. The code outside of the handler, also known as “INIT” code, is run before the handler. This is used for tasks like importing libraries or declaring and initializing global objects.
  2. The handler itself is a function that takes the event object. Regardless of the runtime used in the Lambda function, the event is a JSON object, as shown in the sketch after this list.
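
A minimal Node.js sketch of this structure:

import { DynamoDB } from "aws-sdk";

// 1. INIT code: runs once per execution environment, before any invocation.
const dynamo = new DynamoDB.DocumentClient();

// 2. The handler: receives the event as a JSON object in its first parameter.
export const handler = async (event: any) => {
  console.log("Received event:", JSON.stringify(event));
  // ...process the event
};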

For smaller applications, the difference between event-driven and request-driven applications may not be clear. As your applications develop more functionality and handle more traffic, this becomes more apparent. Request-driven applications typically use directed commands to coordinate downstream functions to complete an activity and are often tightly coupled. Event-driven applications create events that are observable by other services and systems, but the event producer is unaware of which consumers, if any, are listening. Typically, these are loosely coupled.

Most Lambda-based applications use a combination of AWS services for durably storing data and integrating with other systems and services. In these applications, Lambda acts as glue between the services, providing business logic to transform data as it moves between services.

Grouping AWS serverless services into layers

Building Lambda-based applications follows many of the best practices of building any event-based architecture. A number of development approaches have emerged to help developers create event-driven systems. Event storming, which is an interactive approach to domain-driven design (DDD), is one popular methodology. As you explore the events in your workload, you can group these as bounded contexts to develop the boundaries of the microservices in your application.

To learn more about event-driven architectures, read “What is an Event-Driven Architecture?” and “What do you mean by Event-Driven?”.

The benefits of event-driven architectures

Replacing polling and webhooks with events

Many traditional architectures frequently use polling and webhook mechanisms to communicate state between different components. Polling can be highly inefficient for fetching updates since there is a lag between new data becoming available and synchronization with downstream services. Webhooks are not always supported by other microservices that you want to integrate with. They may also require custom authorization and authentication configurations. In both cases, these integration methods are challenging to scale on-demand without additional work by development teams.

Polling and webhooks

Both of these mechanisms can be replaced by events, which can be filtered, routed, and pushed downstream to consuming microservices. This approach can result in lower bandwidth consumption, lower CPU utilization, and potentially lower cost. These architectures can reduce complexity, since each functional unit is smaller and there is often less code.

Event communication

Event-driven architectures can also make it easier to design near-real-time systems, helping organizations move away from batch-based processing. Events are generated at the time when state in the application changes, so the custom code of a microservice should be designed to handle the processing of a single event. Since scaling is handled by the Lambda service, this architecture can handle significant increases in traffic without changing custom code. As events scale up, so does the compute layer that processes events.

Reducing complexity

Microservices enable developers and architects to decompose complex workflows. For example, an ecommerce monolith may be broken down into order acceptance and payment processes with separate inventory, fulfillment and accounting services. What may be complex to manage and orchestrate in a monolith becomes a series of decoupled services that communicate asynchronously with event messages.

Ecommerce microservices example

This approach also makes it possible to assemble services that process data at different rates. In this case, an order acceptance microservice can store high volumes of incoming orders by buffering the messages in an Amazon SQS queue.

A payment processing service, which is typically slower due to the complexity of handling payments, can take a steady stream of messages from the SQS queue. It can orchestrate complex retry and error handling logic using AWS Step Functions, and coordinate active payment workflows for hundreds of thousands of orders.

Improving scalability and extensibility

Microservices generate events that are typically published to messaging services like Amazon SNS and SQS. These behave like an elastic buffer between microservices and help handle scaling when traffic increases. Services like EventBridge can then filter and route messages depending upon the content of the event, as defined in rules. As a result, event-based applications can be more scalable and offer greater redundancy than monolithic applications.

This system is also highly extensible, allowing other teams to extend features and add functionality without impacting the order processing and payment processing microservices. By publishing events using EventBridge, this application integrates with existing systems, such as the inventory microservice, but also enables any future application to integrate as an event consumer. Producers of events have no knowledge of event consumers, which can help simplify the microservice logic.

To learn more, read “How event-driven architecture solves modern web app problems” and “How to Use Amazon EventBridge to Build Decoupled, Event-Driven Architectures”.

Trade-offs of event-driven architectures

Variable latency

Unlike monolithic applications, which may process everything within the same memory space on a single device, event-driven applications communicate across networks. This design introduces variable latency. While it’s possible to engineer applications to minimize latency, monolithic applications can almost always be optimized for lower latency at the expense of scalability and availability.

The serverless services in AWS are highly available, meaning that they operate in more than one Availability Zone in a Region. In the event of a service disruption, services automatically fail over to alternative Availability Zones and retry transactions. As a result, instead of a transaction failing, it may be completed successfully but with higher latency.

Workloads that require consistent low-latency performance, such as high-frequency trading applications in banks or submillisecond robotics automation in warehouses, are not good candidates for event-driven architecture.

Eventual consistency

An event represents a change in state. With many events flowing through different services in an architecture at any given point in time, such workloads are often eventually consistent. This makes it more complex to process transactions, handle duplicates, or determine the exact overall state of a system.

Some workloads are not well suited for event-driven architecture, due to the need for ACID properties. However, many workloads contain a combination of requirements that are eventually consistent (for example, total orders in the current hour) or strongly consistent (for example, current inventory). For those features needing strong data consistency, there are architecture patterns to support this.

Event-based architectures are designed around individual events instead of large batches of data. Generally, workflows are designed to manage the steps of an individual event or execution flow instead of operating on multiple events simultaneously. Real-time event processing is preferred to batch processing in event-driven systems, replacing a batch with many small incremental updates. While this can make workloads more available and scalable, it also makes it more challenging for events to have awareness of other events.

Returning values to callers

In many cases, event-based applications are asynchronous. This means that caller services do not wait for responses from other services before continuing with other work. This fundamental characteristic of event-driven architectures enables scalability and flexibility. However, it means that passing return values or the result of a workflow is often more complex than in synchronous execution flows.

Most Lambda invocations in production systems are asynchronous, responding to events from services like S3 or SQS. In these cases, the success or failure of processing an event is often more important than returning a value. Features such as dead-letter queues (DLQs) in Lambda are provided to ensure you can identify and retry failed events, without needing to notify the caller.

For interactive workloads, such as web and mobile applications, the end user usually expects to receive a return value or the current status of a transaction. For these workloads, there are several design patterns that can provide rich eventing back to the caller. However, these implementations are more complex than using a traditional synchronous return value.

Debugging across services and functions

Debugging event-driven systems is also different from troubleshooting a monolithic application. With different systems and services passing events, it is often not possible to record and reproduce the exact state of multiple services when an error occurs. Since each service and function invocation has separate log files, it can be more complicated to determine what happened to a specific event that caused an error.

To learn more, read “Challenges with distributed systems” and “Implementing Microservices on AWS”.

Conclusion

Event-driven architectures have grown in popularity in modern organizations. This approach promotes the use of microservices, which can be designed as Lambda-based applications. This post discusses the benefits of the event-driven approach, along with the trade-offs involved.

Part 2 of this series will discuss design principles and the best practices for developing Lambda-based applications.

Discovering sensitive data in AWS CodeCommit with AWS Lambda

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/discovering-sensitive-data-in-aws-codecommit-with-aws-lambda-2/

This post is courtesy of Markus Ziller, Solutions Architect.

Today, git is a de facto standard for version control in modern software engineering. The workflows enabled by git’s branching capabilities are a major reason for this. However, with git’s distributed nature, it can be difficult to reliably remove changes that have been committed from all copies of the repository. This is problematic when secrets such as API keys have been accidentally committed into version control. The longer it takes to identify and remove secrets from git, the more likely it is that the secret has been checked out by another user.

This post shows a solution that automatically identifies credentials pushed to AWS CodeCommit in near-real-time. I also show three remediation measures that you can use to reduce the impact of secrets pushed into CodeCommit:

  • Notify users about the leaked credentials.
  • Lock the repository for non-admins.
  • Hard reset the CodeCommit repository to a healthy state.

I use the AWS Cloud Development Kit (CDK). This is an open source software development framework to model and provision cloud application resources. Using the CDK can reduce the complexity and amount of code needed to automate the deployment of resources.

Overview of solution

The services in this solution are AWS Lambda, AWS CodeCommit, Amazon EventBridge, and Amazon SNS. These services are part of the AWS serverless platform. They help reduce undifferentiated work around managing servers, infrastructure, and the parts of the application that add less value to your customers. With serverless, the solution scales automatically, has built-in high availability, and you only pay for the resources you use.

Solution architecture

This diagram outlines the workflow implemented in this blog:

  1. After a developer pushes changes to CodeCommit, it emits an event to an event bus.
  2. A rule defined on the event bus routes this event to a Lambda function.
  3. The Lambda function uses the AWS SDK for JavaScript to get the changes introduced by commits pushed to the repository.
  4. It analyzes the changes for secrets. If secrets are found, it publishes another event to the event bus.
  5. Rules associated with this event type then trigger invocations of three Lambda functions A, B, and C with information about the problematic changes.
  6. Each of the Lambda functions runs a remediation measure:
    • Function A sends out a notification to an SNS topic that informs users about the situation (A1).
    • Function B locks the repository by setting a tag with the AWS SDK (B1). It sends out a notification about this action (B2).
    • Function C runs git commands that remove the problematic commit from the CodeCommit repository (C2). It also sends out a notification (C1).

Walkthrough

The following walkthrough explains the required components, their interactions and how the provisioning can be automated via CDK.

For this walkthrough, you need:

Checkout and deploy the sample stack:

  1. After completing the prerequisites, clone the associated GitHub repository by running the following command in a local directory:
    git clone git@github.com:aws-samples/discover-sensitive-data-in-aws-codecommit-with-aws-lambda.git
  2. Open the repository in a local editor and review the contents of cdk/lib/resources.ts, src/handlers/commits.ts, and src/handlers/remediations.ts.
  3. Follow the instructions in the README.md to deploy the stack.

The CDK will deploy resources for the following services in your account.

Using CodeCommit to manage your git repositories

The CDK creates a new empty repository called TestRepository and adds a tag RepoState with an initial value of ok. You later use this tag in the LockRepo remediation strategy to restrict access.

It also creates two IAM groups with one user in each. Members of the CodeCommitSuperUsers group are always able to access the repository, while members of the CodeCommitUsers group can only access the repository when the value of the tag RepoState is not locked.

I also import the CodeCommitSystemUser into the CDK. Since the user requires git credentials in a downloaded CSV file, it cannot be created by the CDK. Instead it must be created as described in the README file.

The following CDK code sets up all the described resources:

// Imports assume the CDK v1 module layout used by this project.
import { Tags } from "@aws-cdk/core";
import { Effect, Group, PolicyStatement, User } from "@aws-cdk/aws-iam";
import { Repository } from "@aws-cdk/aws-codecommit";
import { Secret } from "@aws-cdk/aws-secretsmanager";

const TAG_NAME = "RepoState";

const superUsers = new Group(this, "CodeCommitSuperUsers", { groupName: "CodeCommitSuperUsers" });
superUsers.addUser(new User(this, "CodeCommitSuperUserA", {
    password: new Secret(this, "CodeCommitSuperUserPassword").secretValue,
    userName: "CodeCommitSuperUserA"
}));

const users = new Group(this, "CodeCommitUsers", { groupName: "CodeCommitUsers" });
users.addUser(new User(this, "User", {
    password: new Secret(this, "CodeCommitUserPassword").secretValue,
    userName: "CodeCommitUserA"
}));

const systemUser = User.fromUserName(this, "CodeCommitSystemUser", props.codeCommitSystemUserName);

const repo = new Repository(this, "Repository", {
    repositoryName: "TestRepository",
    description: "The repository to test this project out",
});
Tags.of(repo).add(TAG_NAME, "ok");

users.addToPolicy(new PolicyStatement({
    effect: Effect.ALLOW,
    actions: ["*"],
    resources: [repo.repositoryArn],
    conditions: {
        StringNotEquals: {
            [`aws:ResourceTag/${TAG_NAME}`]: "locked"
        }
    }
}));

superUsers.addToPolicy(new PolicyStatement({
    effect: Effect.ALLOW,
    actions: ["*"],
    resources: [repo.repositoryArn]
}));

Using EventBridge to pass events between components

I use EventBridge, a serverless event bus, to connect the Lambda functions together. Many AWS services like CodeCommit are natively integrated into EventBridge and publish events about changes in their environment.

repo.onCommit is a higher-level CDK construct. It creates the required resources to invoke a Lambda function for every commit to a given repository. The created event rule looks like this:

EventBridge rule definition

Note that this event rule only matches commit events in TestRepository. To send commits of all repositories in that account to the inspecting Lambda function, remove the resources filter in the event pattern.

CodeCommit Repository State Change is a default event that is published by CodeCommit if changes are made to a repository. In addition, I define CodeCommit Security Event, a custom event, which Lambda publishes to the same event bus if secrets are discovered in the inspected code.

The sample below shows how you can set up Lambda functions as targets for both types of events.

// Imports assume the CDK v1 module layout used by this project.
import { EventBus, Rule } from "@aws-cdk/aws-events";
import * as targets from "@aws-cdk/aws-events-targets";

const DETAIL_TYPE = "CodeCommit Security Event";
const eventBus = new EventBus(this, "CodeCommitEventBus", {
    eventBusName: "CodeCommitSecurityEvents"
});

repo.onCommit("AnyCommitEvent", {
    ruleName: "CallLambdaOnAnyCodeCommitEvent",
    target: new targets.LambdaFunction(commitInspectLambda)
});


new Rule(this, "CodeCommitSecurityEvent", {
    eventBus,
    enabled: true,
    ruleName: "CodeCommitSecurityEventRule",
    eventPattern: {
        detailType: [DETAIL_TYPE]
    },
    targets: [
        new targets.LambdaFunction(lockRepositoryLambda),
        new targets.LambdaFunction(raiseAlertLambda),
        new targets.LambdaFunction(forcefulRevertLambda)
    ]
});

Using Lambda functions to run remediation measures

AWS Lambda functions allow you to run code in response to events. The example defines four Lambda functions.

By comparing each commit to its predecessor, the commitInspectLambda function analyzes whether a commit introduces secrets. With the CDK, you can create a Lambda function with:

// Imports assume the CDK v1 module layout used by this project.
import * as path from "path";
import { Code, Function, Runtime } from "@aws-cdk/aws-lambda";

const myLambdaInCDK = new Function(this, "UniqueIdentifierRequiredByCDK", {
    runtime: Runtime.NODEJS_12_X,
    handler: "<handlerfile>.<function name>",
    code: Code.fromAsset(path.join(__dirname, "..", "..", "src", "handlers")),
    // See git repository for complete code
});

The code for this Lambda function uses the AWS SDK for JavaScript to fetch the details of the commit, the differences introduced, and the new content.

The code checks each modified file line by line with a regular expression that matches typical secret formats. In src/handlers/regex.json, I provide a few regular expressions that match common secrets. You can extend this with your own patterns.
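
A simplified sketch of that check; the two patterns shown are illustrative stand-ins for the fuller set in regex.json:

// Example patterns only - the repository's regex.json contains the full list.
const secretPatterns: RegExp[] = [
  /AKIA[0-9A-Z]{16}/,                   // AWS access key ID format
  /-----BEGIN (RSA )?PRIVATE KEY-----/, // PEM private key header
];

// Returns the line numbers of lines that appear to contain secrets.
export const findSecretLines = (fileContent: string): number[] =>
  fileContent
    .split("\n")
    .flatMap((line, index) =>
      secretPatterns.some((pattern) => pattern.test(line)) ? [index + 1] : []
    );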

If a secret is discovered, a CodeCommit Security Event is published to the event bus. EventBridge then invokes all Lambda functions that are registered as targets with this event. This demo triggers three remediation measures.

The raiseAlertLambda function uses the AWS SDK for JavaScript to send out a notification to all subscribers (that is, CodeCommit administrators) on an SNS topic. It takes no further action.

SNS.publish({
    TopicArn: "<TOPIC_ARN>",
    Subject: "[ACTION REQUIRED] Secrets discovered in <repo>",
    Message: "<Your message>"
}).promise();

Notification about secrets discovered in a commit in TestRepository

The lockRepositoryLambda function uses the AWS SDK for JavaScript to change the RepoState tag from ok to locked. This restricts access to members of the CodeCommitSuperUsers IAM group.

CodeCommit.tagResource({
    resourceArn: event.detail.repositoryArn,
    tags: {
        RepoState: "locked"
    }
})

In addition, the Lambda function uses SNS to send out a notification. The forcefulRevertLambda function runs the following git commands:

git clone <repository>
git checkout <branch>
git reset --hard <previousCommitId>
git push origin <branch> --force

These commands reset the repository to the last accepted commit, by forcefully removing the respective commit from the git history of your CodeCommit repo. I advise you to handle this with care and only activate it on a real project if you fully understand the consequences of rewriting git history.

The Node.js v12 runtime for Lambda does not include git by default. You can add it by using the git-lambda2 Lambda layer. This allows you to run git commands from within the Lambda function.

Logs for the remediation measure Hard Reset

Finally, this Lambda function also sends out a notification. The complete code is available in the GitHub repo.

Using SNS to notify users

To notify users about secrets discovered and actions taken, you create an SNS topic and subscribe to it via email.

// Imports assume the CDK v1 module layout used by this project.
import { Topic } from "@aws-cdk/aws-sns";
import * as subs from "@aws-cdk/aws-sns-subscriptions";

const topic = new Topic(this, "CodeCommitSecurityEventNotification", {
    displayName: "CodeCommitSecurityEventNotification",
});

topic.addSubscription(new subs.EmailSubscription(/* your email address */));

Testing the solution

You can test the deployed solution by running these two sets of commands. First, add a file with no credentials:

echo "Clean file - no credentials here" > clean_file.txt
git add clean_file.txt
git commit clean_file.txt -m "Adds clean_file.txt"
git push

Then add a file containing credentials:

SECRET_LIKE_STRING=$(cat /dev/urandom | env LC_CTYPE=C tr -dc 'a-zA-Z0-9' | fold -w 32 | head -n 1)
echo "secret=$SECRET_LIKE_STRING" > problematic_file.txt
git add problematic_file.txt
git commit problematic_file.txt -m "Adds secret-like string to problematic_file.txt"
git push

The first set of commands creates, commits, and pushes an unproblematic file, clean_file.txt, which passes the checks of commitInspectLambda. The second set creates, commits, and pushes problematic_file.txt, which matches the regular expressions and triggers the remediation measures.

If you check your email, you soon receive notifications about actions taken by the Lambda functions.

Cleaning up

To avoid incurring charges, delete the resources by running cdk destroy and confirming the deletion.

Conclusion

This post demonstrates how you can implement a solution to discover secrets in commits to AWS CodeCommit repositories. It also defines different strategies to remediate this.

The CDK code to set up all components is minimal and can be extended for remediation measures. The template is portable between Regions and uses serverless technologies to minimize cost and complexity.

For more serverless learning resources, visit Serverless Land.

ICYMI: Serverless Q4 2020

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/icymi-serverless-q4-2020/

Welcome to the 12th edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all of the most recent product launches, feature enhancements, blog posts, webinars, Twitch live streams, and other interesting things that you might have missed!

ICYMI Q4 calendar

In case you missed our last ICYMI, check out what happened last quarter here.

AWS re:Invent

re:Invent 2020 banner

re:Invent was entirely virtual in 2020 and free to all attendees. The conference had a record number of registrants and featured over 700 sessions. The serverless developer advocacy team presented a number of talks to help developers build their skills. These are now available on-demand:

AWS Lambda

There were three major Lambda announcements at re:Invent. Lambda duration billing changed granularity from 100 ms to 1 ms, which is shown in the December billing statement. All functions benefit from this change automatically, and it’s especially beneficial for sub-100ms Lambda functions.

Lambda has also increased the maximum memory available to 10 GB. Since memory also controls CPU allocation in Lambda, this means that functions now have up to 6 vCPU cores available for processing. Finally, Lambda now supports container images as a packaging format, enabling teams to use familiar container tooling, such as Docker CLI. Container images are stored in Amazon ECR.

There were three feature releases that make it easier for developers working on data processing workloads. Lambda now supports self-hosted Kafka as an event source, allowing you to source events from on-premises or instance-based Kafka clusters. You can also process streaming analytics with tumbling windows and use custom checkpoints for processing batches with failed messages.

We launched Lambda Extensions in preview, enabling you to more easily integrate monitoring, security, and governance tools into Lambda functions. You can also build your own extensions that run code during Lambda lifecycle events. See this example extensions repo for starting development.

You can now send logs from Lambda functions to custom destinations by using Lambda Extensions and the new Lambda Logs API. Previously, you could only forward logs after they were written to Amazon CloudWatch Logs. Now, logging tools can receive log streams directly from the Lambda execution environment. This makes it easier to use your preferred tools for log management and analysis, including Datadog, Lumigo, New Relic, Coralogix, Honeycomb, or Sumo Logic.

Lambda Logs API architecture

Lambda launched support for Amazon MQ as an event source. Amazon MQ is a managed broker service for Apache ActiveMQ that simplifies deploying and scaling queues. The event source operates in a similar way to using Amazon SQS or Amazon Kinesis. In all cases, the Lambda service manages an internal poller to invoke the target Lambda function.

Lambda announced support for AWS PrivateLink. This allows you to invoke Lambda functions from a VPC without traversing the public internet. It provides private connectivity between your VPCs and AWS services. By using VPC endpoints to access the Lambda API from your VPC, this can replace the need for an Internet Gateway or NAT Gateway.

For developers building machine learning inferencing, media processing, high performance computing (HPC), scientific simulations, and financial modeling in Lambda, you can now use AVX2 support to help reduce duration and lower cost. In this blog post’s example, enabling AVX2 for an image-processing function increased performance by 32-43%.

Lambda now supports batch windows of up to 5 minutes when using SQS as an event source. This is useful for workloads that are not time-sensitive, allowing developers to reduce the number of Lambda invocations from queues. Additionally, the batch size has been increased from 10 to 10,000. This is now the same batch size as Kinesis as an event source, helping Lambda-based applications process more data per invocation.

Code signing is now available for Lambda, using AWS Signer. This allows account administrators to ensure that Lambda functions only accept signed code for deployment. You can learn more about using this new feature in the developer documentation.

AWS Step Functions

Synchronous Express Workflows have been launched for AWS Step Functions, providing a new way to run high-throughput Express Workflows. This feature allows developers to receive workflow responses without needing to poll services or build custom solutions. This is useful for high-volume microservice orchestration and fast compute tasks communicating via HTTPS.

The Step Functions service recently added support for other AWS services in workflows. You can now integrate API Gateway REST and HTTP APIs. This enables you to call API Gateway directly from a state machine as an asynchronous service integration.

Step Functions now also supports Amazon EKS service integration. This allows you to build workflows with steps that synchronously launch tasks in EKS and wait for a response. The service also announced support for Amazon Athena, so workflows can now query data in your S3 data lakes.

Amazon API Gateway

API Gateway now supports mutual TLS authentication, which is commonly used for business-to-business applications and standards such as Open Banking. This is provided at no additional cost. You can now also disable the default REST API endpoint when deploying APIs using custom domain names.

HTTP APIs now support service integrations with Step Functions Synchronous Express Workflows. This is a result of the service team’s work to add the most popular features of REST APIs to HTTP APIs.

AWS X-Ray

X-Ray now integrates with Amazon S3 to trace upstream requests. If a Lambda function uses the X-Ray SDK, S3 sends tracing headers to downstream event subscribers. This allows you to use the X-Ray service map to view connections between S3 and other services used to process an application request.

X-Ray announced support for end-to-end tracing in Step Functions to make it easier to trace requests across multiple AWS services. It also launched X-Ray Insights in preview, which generates actionable insights based on anomalies detected in an application. For Java developers, the service released an auto-instrumentation agent that collects traces without requiring changes to existing code.

Additionally, the AWS Distro for Open Telemetry is now in preview. OpenTelemetry is a collaborative effort by tracing solution providers to create common approaches to instrumentation.

Amazon EventBridge

You can now use event replay to archive and replay events with Amazon EventBridge. After configuring an archive, EventBridge automatically stores all events or filtered events, based upon event pattern matching logic. Event replay can help with testing new features or changes in your code, or hydrating development or test environments.

EventBridge archive and replay

EventBridge also launched resource policies that simplify managing access to events across multiple AWS accounts. Resource policies provide a powerful mechanism for modeling event buses across multiple accounts and providing fine-grained access control to EventBridge API actions.

EventBridge resource policies

EventBridge announced support for Server-Side Encryption (SSE). Events are encrypted using AES-256 at no additional cost for customers. EventBridge also increased PutEvent quotas to 10,000 transactions per second in US East (N. Virginia), US West (Oregon), and Europe (Ireland). This helps support workloads with high throughput.

Developer tools

The AWS SDK for JavaScript v3 was launched and includes first-class TypeScript support and a modular architecture. This makes it easier to import only the services needed to minimize deployment package sizes.

The AWS Serverless Application Model (AWS SAM) is an AWS CloudFormation extension that makes it easier to build, manage, and maintain serverless applications. The latest versions include support for cached and parallel builds, together with container image support for Lambda functions.

You can use AWS SAM in the new AWS CloudShell, which provides a browser-based shell in the AWS Management Console. This can help run a subset of AWS SAM CLI commands as an alternative to using a dedicated instance or AWS Cloud9 terminal.

AWS CloudShell

Amazon SNS

Amazon SNS announced support for First-In-First-Out (FIFO) topics. These are used with SQS FIFO queues for applications that require strict message ordering with exactly once processing and message deduplication.

Amazon DynamoDB

Developers can now use PartiQL, an SQL-compatible query language, with DynamoDB tables, bringing familiar SQL syntax to NoSQL data. You can also choose to use Kinesis Data Streams to capture changes to tables.

For customers using DynamoDB global tables, you can now use your own encryption keys. While all data in DynamoDB is encrypted by default, this feature enables you to use customer managed keys (CMKs). DynamoDB also announced the ability to export table data to data lakes in Amazon S3. This enables you to use services like Amazon Athena and AWS Lake Formation to analyze DynamoDB data with no custom code required.

AWS Amplify and AWS AppSync

You can now use existing Amazon Cognito user pools and identity pools for Amplify projects, making it easier to build new applications for an existing user base. With the new AWS Amplify Admin UI, you can configure application backends without using the AWS Management Console.

AWS AppSync enabled AWS WAF integration, making it easier to protect GraphQL APIs against common web exploits. You can also implement rate-based rules to help slow down brute force attacks. Using AWS Managed Rules for AWS WAF provides a faster way to configure application protection without creating the rules directly.

Serverless Posts

October

November

December

Tech Talks & Events

We hold AWS Online Tech Talks covering serverless topics throughout the year. These are listed in the Serverless section of the AWS Online Tech Talks page. We also regularly deliver talks at conferences and events around the world, speak on podcasts, and record videos you can find to learn in bite-sized chunks.

Videos


You can find helpful videos covering serverless topics on the Serverless Land YouTube channel.

The Serverless Land website

Serverless Land website

To help developers find serverless learning resources, we have curated a list of serverless blogs, videos, events, and training programs at a new site, Serverless Land. This is regularly updated with new information – you can subscribe to the RSS feed for automatic updates or follow the LinkedIn page.

Still looking for more?

The Serverless landing page has lots of information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials.

You can also follow all of us on Twitter to see the latest news, follow conversations, and interact with the team.

Ingesting MongoDB Atlas data using Amazon EventBridge

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/ingesting-mongodb-atlas-data-using-amazon-eventbridge/

This post is courtesy of Gopalakrishnan Ramaswamy, Solutions Architect

Amazon EventBridge is a serverless event bus that makes it easier to connect applications together using data from your own applications, integrated software as a service (SaaS) applications, and AWS services. It does so by delivering a stream of real-time data from various event sources. You can set up routing rules to send data to targets like AWS Lambda and build loosely coupled application architectures that react in near-real time to data sources.

MongoDB is a document database, which means it stores data in JSON-like documents. It provides a query language and has support for multi-document ACID transactions. MongoDB Atlas is a fully managed MongoDB database service hosted on the cloud. It can be used as a globally distributed database that automates administrative tasks such as database configuration, infrastructure provisioning, patching, scaling, and backups.

With EventBridge, you can use data from MongoDB to trigger workflows for customer support, business operations and more. In this post, I walk through the process of connecting MongoDB Atlas with the AWS Cloud and triggering events from changes in the MongoDB collections data.

Overview

The following diagram shows the high-level architecture of an example scenario to ingest MongoDB data into the AWS Cloud using Amazon EventBridge.

Solution architecture

MongoDB stores data records as BSON documents, which are gathered together in collections. A database stores one or more collections of documents.

This walkthrough shows you how to:

  1. Create a MongoDB cluster and load sample data.
  2. Create a database trigger associated to a collection.
  3. Create an event bus in AWS, linked to the partner event source.
  4. Create a Lambda function and the associated role with permissions.
  5. Create an EventBridge rule and associate it to the Lambda function.
  6. Verify the process.

Steps 3–5 create and configure the AWS resources using the AWS Serverless Application Model (AWS SAM). To set up the sample application, visit the GitHub repo and follow the instructions in the README.md file.

Prerequisites

This walkthrough requires:

  • An AWS account.
  • A MongoDB account.
  • The AWS SAM CLI installed and configured on your machine.

Creating a MongoDB Atlas cluster and loading sample data

For detailed steps to create a cluster and load data, see MongoDB Atlas documentation. To create the test cluster:

  1. Create a MongoDB Atlas account.
  2. Deploy a free tier cluster using these instructions, selecting your preferred cloud provider and Region.
  3. Add your trusted connection IP address to the IP access list. This allows you to connect to the cluster and access the data.
  4. After connecting to your cluster, load sample data into your cluster:
    • Navigate to the clusters view by choosing Clusters in the left navigation pane.
    • Select the cluster, choose the ellipses (…) button, and Load Sample Dataset.

MongoDB clusters UI

Create MongoDB database trigger

MongoDB database triggers allow you to run server-side logic when a document is added, updated, or removed in a linked cluster. Use database triggers to implement complex data interactions, including updating information in one document when a related document changes or interacting with an external service when a new document is inserted.

  1. Sign in to your account and choose Triggers in the left-hand panel.
  2. Choose Add Trigger to open the trigger configuration page.
  3. Select Database for Trigger Type.

    Add trigger

  4. Enter a name for the trigger.
  5. In the Trigger Source Details section:
    • Select the cluster with sample data loaded (for example, Cluster0) for Cluster Name.
    • For Database Name select sample_analytics.
    • Select customers for Collection Name.
    • Check Insert, Update, Delete, and Replace for Operation Type.

    Trigger source details

  6. In the Function section:
    • For Select An Event Type, select EventBridge.
    • Enter your AWS Account ID. Learn how to find your account ID in this documentation.
    • Select an AWS Region where the event bus will be created.

    EventBridge configuration

  7. Choose Save.

Once a MongoDB Atlas trigger is created, it creates a corresponding partner event source in the Amazon EventBridge console. Initially, these event sources show as Pending with no event bus associated with them.

Partner event source

Next, use the AWS SAM template in the GitHub repo to create the event bus, Lambda function, and event rule.

  1. Clone the GitHub repo and deploy the AWS SAM template:
    git clone https://github.com/aws-samples/amazon-eventbridge-partnerevent-example
    cd ./amazon-eventbridge-partnerevent-example
    sam deploy --guided
  2. Choose a stack name and enter the partner event source name.

The next section explains the steps that are performed by the AWS SAM template.

Creating the event bus

To receive events from SaaS partners, an event bus must be created that is associated with the partner event source:

  PartnerEventBus: 
    Type: AWS::Events::EventBus
    Properties: 
      EventSourceName: !Ref PartnerEventSource
      Name: !Ref PartnerEventSource

The partner event source name and the name of the event bus are derived from the parameter entered when running the template.

Once you create an event bus associated with a partner event source, the status of the partner event source changes to Active. A new event bus with the same name as the partner event source is created. You can see this in the EventBridge console, in Event buses in the left-hand panel.

Partner event sources

Creating the Lambda function

The following section of the template creates a Lambda function that is invoked by an event rule:

  myeventfunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: eventLambda/
      Handler: index.handler
      Runtime: nodejs12.x
      FunctionName: myeventfunction
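
The function code in the eventLambda/ directory is available in the GitHub repo. As a minimal sketch, a handler that logs the incoming partner event could look like this:

exports.handler = async (event) => {
  // Log the full EventBridge event, including the MongoDB change details
  console.log(JSON.stringify(event, null, 2))
}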

Creating an event bus rule

The following section in the template creates an event rule that triggers the preceding Lambda function. The event pattern used by the rule selects and routes events to targets.

  myeventrule:
    Type: 'AWS::Events::Rule'
    Properties:
      Description: Test Events Rule
      EventBusName: !Ref PartnerEventSource
      EventPattern: 
        account: [!Ref AWS::AccountId]
      Name: myeventrule
      State: ENABLED
      Targets:
       - 
         Arn: 
           Fn::GetAtt:
             - "myeventfunction"
             - "Arn"
         Id: "idmyeventrule"

Finally, the template grants EventBridge permission to invoke the Lambda function. This allows the rule to trigger the associated Lambda function:

  PermissionForEventsToInvokeLambda: 
    Type: AWS::Lambda::Permission
    Properties: 
      FunctionName: 
        Ref: "myeventfunction"
      Action: "lambda:InvokeFunction"
      Principal: "events.amazonaws.com"
      SourceArn: 
        Fn::GetAtt: 
          - "myeventrule"
          - "Arn"         

Verifying the integration

After deploying the AWS SAM template, verify that the EventBridge integration works by inserting test data into the source MongoDB collection. After adding this data, the event is sent to the event bus, which invokes the Lambda function. The event payload appears in the function’s CloudWatch logs.

To verify the deployment:

  1. Download and install the MongoDB shell.
  2. Connect to MongoDB shell using:
    mongo "mongodb+srv://cluster0.xvo4o.mongodb.net/sample_analytics" --username yourusername

    Replace the cluster name with the cluster you created. Connect to the sample_analytics database, which has the sample data and collections.

  3. Next, insert a record into the customers collection that is associated with the database trigger. In the MongoDB shell, run the following command:
    db.customers.insertOne(
    {
      username: "myuser99",
      name: "Eventbridge Mongo",
      address: "My Address XYZ",
      birthdate: { "$date": "1975-03-02T02:20:31.000Z" },
      email: "[email protected]",
      active: true,
      accounts: [371138, 324287, 276528, 332179, 422649, 387979],
      tier_and_details: {
        "0df078f33aa74a2e9696e0520c1a828a": {
          tier: "Bronze",
          id: "0df078f33aa74a2e9696e0520c1a828a",
          active: true,
          benefits: ["sports tickets"]
        },
        "699456451cc24f028d2aa99d7534c219": {
          tier: "Bronze",
          benefits: ["24 hour dedicated line", "concierge services"],
          active: true,
          id: "699456451cc24f028d2aa99d7534c219"
        }
      }
    }
    )
    
  4. Once the record is successfully inserted:
    • Navigate to CloudWatch in the AWS console and choose Log groups in the left-hand panel.
    • Search for the log group /aws/lambda/myeventfunction and choose the latest log stream.
    • Expand the log items to reveal the event. This contains the payload that was sent from MongoDB Atlas to EventBridge.

Conclusion

This post demonstrates how to connect MongoDB Atlas data with the AWS Cloud using Amazon EventBridge. EventBridge helps you connect data from a range of SaaS applications using minimal code. It can help reduce operational overhead and build powerful event-driven architectures more easily. For more information about integrating data between SaaS applications, see Amazon EventBridge.

For more serverless learning resources, visit Serverless Land.

Automating mutual TLS setup for Amazon API Gateway

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/automating-mutual-tls-setup-for-amazon-api-gateway/

This post is courtesy of Pankaj Agrawal, Solutions Architect.

In September 2020, Amazon API Gateway announced support for mutual Transport Layer Security (TLS) authentication. This is a new method for client-to-server authentication that can be used with API Gateway’s existing authorization options. Mutual TLS (mTLS) is an extension of Transport Layer Security (TLS), requiring both the server and client to verify each other.

Mutual TLS is commonly used for business-to-business (B2B) applications. It’s used in standards such as Open Banking, which enables secure open API integrations for financial institutions. It’s also common for Internet of Things (IoT) applications to authenticate devices using digital certificates.

This post covers automating the mTLS setup for API Gateway HTTP APIs, but the same steps can also be used for REST APIs. Download the code used in this walkthrough from the project’s GitHub repo.

Overview

To enable mutual TLS, you must create an API with a valid custom domain name. Mutual TLS is available for both regional REST APIs and the newer HTTP APIs. To set up mutual TLS with API Gateway, you must upload a certificate authority (CA) public key certificate to Amazon S3. This is called a truststore and is used for validating client certificates.

Reference architecture

The AWS Certificate Manager Private Certificate Authority (ACM Private CA) is a highly available private CA service. I am using the ACM Private CA as a certificate authority to configure HTTP APIs and to distribute certificates to clients.

Deploying the solution

To deploy the application, the solution uses the AWS Serverless Application Model (AWS SAM). AWS SAM provides shorthand syntax to define functions, APIs, databases, and event source mappings. As a prerequisite, you must have AWS SAM CLI and Java 8 installed. You must also have the AWS CLI configured.

To deploy the solution:

  1. Clone the GitHub repository and build the application with the AWS SAM CLI. Run the following commands in a terminal:
    git clone https://github.com/aws-samples/api-gateway-auth.git
    cd api-gateway-auth
    sam build

    Console output

  2. Deploy the application:
    sam deploy --guided

Provide a stack name and preferred AWS Region for the deployment process. The template requires three parameters:

  1. HostedZoneId: The template uses an Amazon Route 53 public hosted zone to configure the custom domain. Provide the hosted zone ID where the record set must be created.
  2. DomainName: The custom domain name for the API Gateway HTTP API.
  3. TruststoreKey: The name of the truststore file in the S3 bucket, which is used by API Gateway for mTLS. By default, it is truststore.pem.

SAM deployment configuration

After deployment, the stack outputs the ARN of a test client certificate (ClientOneCertArn). This is used to validate the setup later. The API Gateway HTTP API endpoint is also provided as output.

SAM deployment output

You have now created an API Gateway HTTP API endpoint that uses mTLS.

Setting up the ACM Private CA

The AWS SAM template starts with setting up the ACM Private CA. This enables you to create a hierarchy of certificate authorities with up to five levels. A well-designed CA hierarchy offers benefits such as granular security controls and division of administrative tasks. To learn more about the CA hierarchy, visit designing a CA hierarchy. The ACM Private CA is used to configure HTTP APIs and to distribute certificates to clients.

First, a root CA is created and activated. A subordinate CA is then created, following best practices. The subordinate CA is used to configure mTLS for the API and distribute the client certificates.

  PrivateCA:
    Type: AWS::ACMPCA::CertificateAuthority
    Properties:
      KeyAlgorithm: RSA_2048
      SigningAlgorithm: SHA256WITHRSA
      Subject:
        CommonName: !Sub "${AWS::StackName}-rootca"
      Type: ROOT

  PrivateCACertificate:
    Type: AWS::ACMPCA::Certificate
    Properties:
      CertificateAuthorityArn: !Ref PrivateCA
      CertificateSigningRequest: !GetAtt PrivateCA.CertificateSigningRequest
      SigningAlgorithm: SHA256WITHRSA
      TemplateArn: 'arn:aws:acm-pca:::template/RootCACertificate/V1'
      Validity:
        Type: YEARS
        Value: 10

  PrivateCAActivation:
    Type: AWS::ACMPCA::CertificateAuthorityActivation
    Properties:
      Certificate: !GetAtt
        - PrivateCACertificate
        - Certificate
      CertificateAuthorityArn: !Ref PrivateCA
      Status: ACTIVE

  MtlsCA:
    Type: AWS::ACMPCA::CertificateAuthority
    Properties:
      Type: SUBORDINATE
      KeyAlgorithm: RSA_2048
      SigningAlgorithm: SHA256WITHRSA
      Subject:
        CommonName: !Sub "${AWS::StackName}-mtlsca"

  MtlsCertificate:
    DependsOn: PrivateCAActivation
    Type: AWS::ACMPCA::Certificate
    Properties:
      CertificateAuthorityArn: !Ref PrivateCA
      CertificateSigningRequest: !GetAtt
        - MtlsCA
        - CertificateSigningRequest
      SigningAlgorithm: SHA256WITHRSA
      TemplateArn: 'arn:aws:acm-pca:::template/SubordinateCACertificate_PathLen3/V1'
      Validity:
        Type: YEARS
        Value: 3

  MtlsActivation:
    Type: AWS::ACMPCA::CertificateAuthorityActivation
    Properties:
      CertificateAuthorityArn: !Ref MtlsCA
      Certificate: !GetAtt
        - MtlsCertificate
        - Certificate
      CertificateChain: !GetAtt
        - PrivateCAActivation
        - CompleteCertificateChain
      Status: ACTIVE

Issuing a client certificate from ACM Private CA

Create a client certificate, which is used as a test certificate to validate the mTLS setup:

  ClientOneCert:
    DependsOn: MtlsActivation
    Type: AWS::CertificateManager::Certificate
    Properties:
      CertificateAuthorityArn: !Ref MtlsCA
      CertificateTransparencyLoggingPreference: ENABLED
      DomainName: !Ref DomainName
      Tags:
        - Key: Name
          Value: ClientOneCert

Setting up a truststore in Amazon S3

The ACM Private CA is ready for configuring mTLS on the API. The configuration uses an S3 object as its truststore to validate client certificates. To automate this, an AWS Lambda-backed custom resource copies the public certificate chain of the ACM Private CA to the S3 bucket:

  TrustStoreBucket:
    Type: AWS::S3::Bucket
    Properties:
      VersioningConfiguration:
        Status: Enabled

  TrustedStoreCustomResourceFunction:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: TrustedStoreCustomResourceFunction
      Handler: com.auth.TrustedStoreCustomResourceHandler::handleRequest
      Timeout: 120
      Policies:
        - S3CrudPolicy:
            BucketName: !Ref TrustStoreBucket

The example custom resource is written in Java, but it could be written in any other supported language runtime. The custom resource is invoked with the public certificate details of the private root CA, subordinate CAs, and the target S3 bucket. The Lambda function then concatenates the certificate chain and stores the object in the S3 bucket.
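
To illustrate only the core logic, here is a hedged Node.js sketch of the same steps. Note that a real custom resource must also send a success or failure response back to CloudFormation, which is omitted here:

const AWS = require('aws-sdk')
const s3 = new AWS.S3()

exports.handler = async (event) => {
  // Properties passed in from the Custom::TrustedStore resource
  const { TrustStoreBucket, TrustStoreKey, Certs } = event.ResourceProperties

  // Concatenate the PEM certificates into a single truststore object
  const result = await s3.putObject({
    Bucket: TrustStoreBucket,
    Key: TrustStoreKey,
    Body: Certs.join('\n')
  }).promise()

  // Values surfaced to the template via !GetAtt on the custom resource
  // (a real handler returns these in the CloudFormation response data)
  return {
    TrustStoreUri: `s3://${TrustStoreBucket}/${TrustStoreKey}`,
    ObjectVersion: result.VersionId
  }
}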

  TrustedStoreCustomResource:
    Type: Custom::TrustedStore
    Properties:
      ServiceToken: !GetAtt TrustedStoreCustomResourceFunction.Arn
      TrustStoreBucket: !Ref TrustStoreBucket
      TrustStoreKey: !Ref TruststoreKey
      Certs:
        - !GetAtt MtlsCertificate.Certificate
        - !GetAtt PrivateCACertificate.Certificate

You can view and download the handler code for the Lambda-backed custom resource from the repo.

Configuring Amazon API Gateway HTTP APIs with mTLS

With a valid truststore object in the S3 bucket, you can set up the API. A valid custom domain must be configured for API Gateway to enable mTLS. The following code creates and sets up a custom domain for HTTP APIs. See template.yaml for a complete example.

  CustomDomainCert:
    Type: AWS::CertificateManager::Certificate
    Properties:
      CertificateTransparencyLoggingPreference: ENABLED
      DomainName: !Ref DomainName
      DomainValidationOptions:
        - DomainName: !Ref DomainName
          HostedZoneId: !Ref HostedZoneId
      ValidationMethod: DNS

  SampleHttpApi:
    Type: AWS::Serverless::HttpApi
    DependsOn: TrustedStoreCustomResource
    Properties:
      CorsConfiguration:
        AllowMethods:
          - GET
        AllowOrigins:
          - http://localhost:8080
      Domain:
        CertificateArn: !Ref CustomDomainCert
        DomainName: !Ref DomainName
        EndpointConfiguration: REGIONAL
        SecurityPolicy: TLS_1_2
        MutualTlsAuthentication:
          TruststoreUri: !GetAtt TrustedStoreCustomResource.TrustStoreUri
          TruststoreVersion: !GetAtt TrustedStoreCustomResource.ObjectVersion
        Route53:
          EvaluateTargetHealth: False
          HostedZoneId: !Ref HostedZoneId
        DisableExecuteApiEndpoint: true

An Amazon Route 53 public hosted zone is used to configure the custom domain. This must be set up in your AWS account separately and you must provide the hosted zone ID as a parameter to the template.

Since the HTTP API’s default execute-api endpoint does not require mutual TLS, it is disabled via DisableExecuteApiEndpoint. This helps to ensure that mTLS authentication is enforced for all traffic to the API.

The sample API invokes a Lambda function and returns the request payload as the response.
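
A minimal sketch of such an echo handler:

exports.handler = async (event) => ({
  // Echo the incoming request payload back to the caller
  statusCode: 200,
  body: JSON.stringify(event)
})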

Testing and validating the setup

To validate the setup, first export the client certificate created earlier. You can export the certificate by using the AWS Management Console or AWS CLI. This example uses the AWS CLI to export the certificate. To learn how to do this via the console, see exporting a private certificate using the console.

  1. Export the base64 PEM-encoded certificate to a local file, client.pem:
    aws acm export-certificate --certificate-arn <<Certificate ARN from stack output>> --passphrase $(echo -n 'your passphrase' | base64) --region us-east-2 | jq -r '"\(.Certificate)"' > client.pem
  2. Export the encrypted private key associated with the public key in the certificate and save it to a local file client.encrypted.key. You must provide a passphrase to associate with the encrypted private key. This is used to decrypt the exported private key:
    aws acm export-certificate --certificate-arn <<Certificate ARN from stack output>> --passphrase $(echo -n 'your passphrase' | base64) --region us-east-2 | jq -r '"\(.PrivateKey)"' > client.encrypted.key
  3. Decrypt the exported private key using the passphrase and OpenSSL:
    openssl rsa -in client.encrypted.key -out client.decrypted.key
  4. Access the API using mutual TLS:
    curl -v --cert client.pem --key client.decrypted.key https://demo-api.example.com

Adding a certificate revocation list

AWS Certificate Manager Private Certificate Authority (ACM Private CA) can be natively configured with an optional certificate revocation list (CRL).

A CRL is a way for a certificate authority (CA) to make it known that one or more of its digital certificates is no longer trustworthy. When it revokes a certificate, it invalidates the certificate ahead of its expiration date. A certificate authority can revoke an issued certificate for several reasons, the most common being that the certificate’s private key is compromised.

The API Gateway HTTP API mTLS setup can be used together with all existing API Gateway authorizer options. You can further extend validation to AWS Lambda authorizers, which can be configured to validate the client certificates against this certificate revocation list (CRL). For example:

Certificate revocation architecture

For Lambda authorizer blueprint examples, refer to aws-apigateway-lambda-authorizer-blueprints.

Conclusion

Mutual TLS (mTLS) for API Gateway is now generally available at no additional cost. This post shows how to automate mutual TLS for Amazon API Gateway HTTP APIs using the AWS Certificate Manager Private Certificate Authority as a private CA. Using infrastructure as code (IaC) enables you to develop, deploy, and scale cloud applications, often with greater speed, less risk, and reduced cost.

Download the complete working example for deploying mTLS with API Gateway at this GitHub repo. To learn more about Amazon API Gateway, visit the API Gateway developer guide documentation.

For more serverless learning resources, visit Serverless Land.

Optimizing batch processing with custom checkpoints in AWS Lambda

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/optimizing-batch-processing-with-custom-checkpoints-in-aws-lambda/

AWS Lambda can process batches of messages from sources like Amazon Kinesis Data Streams or Amazon DynamoDB Streams. In normal operation, the processing function moves from one batch to the next to consume messages from the stream.

However, when an error occurs in one of the items in the batch, this can result in reprocessing some of the same messages in that batch. With the new custom checkpoint feature, there is now much greater control over how you choose to process batches containing failed messages.

This blog post explains the default behavior of batch failures and options available to developers to handle this error state. I also cover how to use this new checkpoint capability and show the benefits of using this feature in your stream processing functions.

Overview

When using a Lambda function to consume messages from a stream, the batch size property controls the maximum number of messages passed in each event.

The stream manages two internal pointers: a checkpoint and a current iterator. The checkpoint is the last known item position that was successfully processed. The current iterator is the position in the stream for the next read operation. In a successful operation, here are two batches processed from a stream with a batch size of 10:

Checkpoints and current iterators

  1. The first batch delivered to the Lambda function contains items 1–10. The function processes these items without error.
  2. The checkpoint moves to item 11. The next batch delivered to the Lambda function contains items 11–20.

In default operation, the processing of the entire batch must succeed or fail. If a single item fails processing and the function returns an error, the batch fails. The entire batch is then retried until the maximum retries is reached. This can result in the same failure occurring multiple times and unnecessary processing of individual messages.

You can also enable the BisectBatchOnFunctionError property in the event source mapping. If there is a batch failure, the calling service splits the failed batch into two and retries the half-batches separately. The process continues recursively until there is a single item in a batch or messages are processed successfully. For example, in a batch of 10 messages, where item number 5 is failing, the processing occurs as follows:

Bisect batch on error processing

  1. Batch 1 fails. It’s split into batches 2 and 3.
  2. Batch 2 fails, and batch 3 succeeds. Batch 2 is split into batches 4 and 5.
  3. Batch 4 fails and batch 5 succeeds. Batch 4 is split into batches 6 and 7.
  4. Batch 6 fails and batch 7 succeeds.

While this provides a way to process messages in a batch with one failing message, it results in multiple invocations of the function. In this example, message number 4 is processed four times before succeeding.

With the new custom checkpoint feature, you can return the sequence identifier for the failed messages. This provides more precise control over how to choose to continue processing the stream. For example, in a batch of 10 messages where the sixth message fails:

Custom checkpoint behavior

  1. Lambda processes the batch of messages, items 1–10. The sixth message fails and the function returns the failed sequence identifier.
  2. The checkpoint in the stream is moved to the position of the failed message. The batch is retried for only messages 6–10.

Existing stream processing behaviors

In the following examples, I use a DynamoDB table with a Lambda function that is invoked by the stream for the table. You can also use a Kinesis data stream if preferred, as the behavior is the same. The event source mapping is set to a batch size of 10 items so all the stream messages are passed in the event to a single Lambda invocation.

Architecture diagram

I use the following Node.js script to generate batches of 10 items in the table.

const AWS = require('aws-sdk')
AWS.config.update({ region: 'us-east-1' })
const docClient = new AWS.DynamoDB.DocumentClient()

const ddbTable = 'ddbTableName'
const BATCH_SIZE = 10

const createRecords = async () => {
  // Create envelope
  const params = {
    RequestItems: {}
  }
  params.RequestItems[ddbTable] = []

  // Add items to batch and write to DDB
  for (let i = 0; i < BATCH_SIZE; i++) {
    params.RequestItems[ddbTable].push({
      PutRequest: {
        Item: {
          ID: Date.now() + i
        }
      }
    })
  }
  await docClient.batchWrite(params).promise()
}

const main = async() => await createRecords()
main()

After running this script, there are 10 items in the DynamoDB table, which are then put into the DynamoDB stream for processing.

10 items in DynamoDB table

The processing Lambda function uses the following code. This contains a constant called FAILED_MESSAGE_NUM to force an error on the message with the corresponding index in the event batch:

exports.handler = async (event) => {
  console.log(JSON.stringify(event, null, 2))
  console.log('Records: ', event.Records.length)
  const FAILED_MESSAGE_NUM = 6
  
  let recordNum = 1
  let batchItemFailures = []

  event.Records.map((record) => {
    const sequenceNumber = record.dynamodb.SequenceNumber
    
    if ( recordNum === FAILED_MESSAGE_NUM ) {
      console.log('Error! ', sequenceNumber)
      throw new Error('kaboom')
    }
    console.log('Success: ', sequenceNumber)
    recordNum++
  })
}

The code uses the DynamoDB item’s sequence number, which is provided in each record of the stream event:

Item sequence number in event

In the default configuration of the event source mapping, the failure of message 6 causes the whole batch to fail. The entire batch is then retried multiple times. This appears in the CloudWatch Logs for the function:

Logs with retried batches

Next, I enable the bisect-on-error feature in the function’s event trigger. The first invocation fails as before but this causes two subsequent invocations with batches of five messages. The original batch is bisected. These batches complete processing successfully.

Logs with bisected batches

Configuring a custom checkpoint

Finally, I enable the custom checkpoint feature. This is configured in the Lambda function console by selecting the “Report batch item failures” check box in the DynamoDB trigger:

Add trigger settings
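
You can also enable this setting outside the console. Here is a sketch using the AWS SDK for JavaScript, where the UUID value is a placeholder for your existing event source mapping identifier:

const AWS = require('aws-sdk')
const lambda = new AWS.Lambda({ region: 'us-east-1' })

const enableBatchItemFailures = async () => {
  // Update an existing event source mapping to report batch item failures
  await lambda.updateEventSourceMapping({
    UUID: 'a1b2c3d4-5678-90ab-cdef-EXAMPLE11111', // placeholder mapping ID
    FunctionResponseTypes: ['ReportBatchItemFailures']
  }).promise()
}

enableBatchItemFailures()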

I update the processing Lambda function with the following code:

exports.handler = async (event) => {
  console.log(JSON.stringify(event, null, 2))
  console.log('Records: ', event.Records.length)
  const FAILED_MESSAGE_NUM = 4
  
  let recordNum = 1
  let sequenceNumber = 0
    
  try {
    event.Records.map((record) => {
      sequenceNumber = record.dynamodb.SequenceNumber
  
      if ( recordNum === FAILED_MESSAGE_NUM ) {
        throw new Error('kaboom')
      }
      console.log('Success: ', sequenceNumber)
      recordNum++
    })
  } catch (err) {
    // Return failed sequence number to the caller
    console.log('Failure: ', sequenceNumber)
    return { "batchItemFailures": [ {"itemIdentifier": sequenceNumber} ]  }
  }
}

In this version of the code, the processing of each message is wrapped in a try…catch block. When processing fails, the function stops processing any remaining messages. It returns the sequence number of the failed message in a JSON object:

{ 
  "batchItemFailures": [ 
    {
      "itemIdentifier": sequenceNumber
    }
  ]
}

The calling service then updates the checkpoint value with the sequence number provided. If the batchItemFailures array is empty, the caller assumes all messages have been processed correctly. If the batchItemFailures array contains multiple items, the lowest sequence number is used as the checkpoint.
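
If you prefer to attempt every message in the batch rather than stopping at the first failure, here is a sketch of a variant handler that collects all failed sequence numbers (processMessage is a hypothetical helper for your own logic):

exports.handler = async (event) => {
  let batchItemFailures = []

  event.Records.map((record) => {
    const sequenceNumber = record.dynamodb.SequenceNumber
    try {
      processMessage(record) // hypothetical per-message processing logic
      console.log('Success: ', sequenceNumber)
    } catch (err) {
      // Record the failure and continue with the remaining messages
      console.log('Failure: ', sequenceNumber)
      batchItemFailures.push({ itemIdentifier: sequenceNumber })
    }
  })

  return { batchItemFailures }
}

Because the caller checkpoints at the lowest returned sequence number, any later messages that succeeded in this invocation are delivered again in the next batch, so the processing logic should be idempotent.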

In this example, I also modify the FAILED_MESSAGE_NUM constant to 4 in the Lambda function. This causes the fourth message in every batch to throw an error. After adding 10 items to the DynamoDB table, the CloudWatch log for the processing function shows:

Lambda function logs

This is how the stream of 10 messages has been processed using the custom checkpoint:

Custom checkpointing walkthrough

  1. In the first invocation, all 10 messages are in the batch. The fourth message throws an error. The function returns this position as the checkpoint.
  2. In the second invocation, messages 4–10 are in the batch. Message 7 throws an error and its sequence number is returned as the checkpoint.
  3. In the third invocation, the batch contains messages 7–10. Message 10 throws an error and its sequence number is now the returned checkpoint.
  4. The final invocation contains only message 10, which is successfully processed.

Using this approach, subsequent invocations do not receive messages that have been successfully processed previously.

Conclusion

The default behavior for stream processing in Lambda functions enables entire batches of messages to succeed or fail. You can also use batch bisecting functionality to retry batches iteratively if a single message fails. Now with custom checkpoints, you have more control over handling failed messages.

This post explains the three different processing modes and shows example code for handling failed messages. Depending upon your use-case, you can choose the appropriate mode for your workload. This can help reduce unnecessary Lambda invocations and prevent reprocessing of the same messages in batches containing failures.

To learn more about how to use this feature, read the developer documentation. To learn more about building with serverless technology, visit Serverless Land.

Using AWS Lambda for streaming analytics

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/using-aws-lambda-for-streaming-analytics/

AWS Lambda now supports streaming analytics calculations for Amazon Kinesis and Amazon DynamoDB. This allows developers to calculate aggregates in near-real time and pass state across multiple Lambda invocations. This feature provides an alternative way to build analytics in addition to services like Amazon Kinesis Data Analytics.

In this blog post, I explain how this feature works with Kinesis Data Streams and DynamoDB Streams, together with example use-cases.

Overview

For workloads using streaming data, data arrives continuously, often from different sources, and is processed incrementally. Discrete data processing tasks, such as operating on files, have a known beginning and end boundary for the data. For applications with streaming data, the processing function does not know when the data stream starts or ends. Consequently, this type of data is commonly processed in batches or windows.

Before this feature, Lambda-based stream processing was limited to working on the incoming batch of data. For example, in Amazon Kinesis Data Firehose, a Lambda function transforms the current batch of records with no information or state from previous batches. This is also the same for processing DynamoDB streams using Lambda functions. This existing approach works well for MapReduce or tasks focused exclusively on the data in the current batch.

Comparing DynamoDB and Kinesis streams

  1. DynamoDB streams invoke a processing Lambda function asynchronously. After processing, the function may then store the results in a downstream service, such as Amazon S3.
  2. Kinesis Data Firehose invokes a transformation Lambda function synchronously, which returns the transformed data back to the service.

This new feature introduces the concept of a tumbling window, which is a fixed-size, non-overlapping time interval of up to 15 minutes. To use this, you specify a tumbling window duration in the event-source mapping between the stream and the Lambda function. When you apply a tumbling window to a stream, items in the stream are grouped by window and sent to the processing Lambda function. The function returns a state value that is passed to the next tumbling window.

You can use this to calculate aggregates over multiple windows. For example, you can calculate the total value of a data item in a stream using 30-second tumbling windows:

Tumbling windows

  1. Integer data arrives in the stream at irregular time intervals.
  2. The first tumbling window consists of data in the 0–30 second range, passed to the Lambda function. It adds the items and returns the total of 6 as a state value.
  3. The second tumbling window invokes the Lambda function with the state value of 6 and the 30–60 second batch of stream data. This adds the items to the existing total, returning 18.
  4. The third tumbling window invokes the Lambda function with a state value of 18 and the next window of values. The running total is now 28 and returned as the state value.
  5. The fourth tumbling window invokes the Lambda function with a state value of 28 and the 90–120 second batch of data. The final total is 32.

This feature is useful in workloads where you need to calculate aggregates continuously. For example, for a retailer streaming order information from point-of-sale systems, it can generate near-live sales data for downstream reporting. Using Lambda to generate aggregates only requires minimal code, and the function can access other AWS services as needed.

Using tumbling windows with Lambda functions

When you configure an event source mapping between Kinesis or DynamoDB and a Lambda function, use the new setting, Tumbling window duration. This appears in the trigger configuration in the Lambda console:

Trigger configuration
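
As a sketch, you can apply the same setting when creating the event source mapping with the AWS SDK for JavaScript; the stream ARN and function name are placeholders:

const AWS = require('aws-sdk')
const lambda = new AWS.Lambda({ region: 'us-east-1' })

const createMapping = async () => {
  // Create an event source mapping with a 30-second tumbling window
  await lambda.createEventSourceMapping({
    EventSourceArn: 'arn:aws:kinesis:us-east-1:123456789012:stream/testStream', // placeholder
    FunctionName: 'tumblingWindowFunction', // placeholder
    StartingPosition: 'LATEST',
    TumblingWindowInSeconds: 30
  }).promise()
}

createMapping()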

You can also set this value in AWS CloudFormation and AWS SAM templates. After the event source mapping is created, events delivered to the Lambda function have several new attributes:

New attributes in events

These include:

  • Window start and end: the beginning and ending timestamps for the current tumbling window.
  • State: an object containing the state returned from the previous window, which is initially empty. The state object can contain up to 1 MB of data.
  • isFinalInvokeForWindow: indicates if this is the last invocation for the tumbling window. This only occurs once per window period.
  • isWindowTerminatedEarly: a window ends early only if the state exceeds the maximum allowed size of 1 MB.

In any tumbling window, there is a series of Lambda invocations following this pattern:

Tumbling window process in Lambda

  1. The first invocation contains an empty state object in the event. The function returns a state object containing custom attributes that are specific to the custom logic in the aggregation.
  2. The second invocation contains the state object provided by the first Lambda invocation. This function returns an updated state object with new aggregated values. Subsequent invocations follow this same sequence.
  3. The final invocation in the tumbling window has the isFinalInvokeForWindow flag set to true. This contains the state returned by the most recent Lambda invocation. This invocation is responsible for storing the result in S3 or in another data store, such as a DynamoDB table. There is no state returned in this final invocation.

Using tumbling windows with DynamoDB

DynamoDB streams can invoke Lambda function using tumbling windows, enabling you to generate aggregates per shard. In this example, an ecommerce workload saves orders in a DynamoDB table and uses a tumbling window to calculate the near-real time sales total.

First, I create a DynamoDB table to capture the order data and a second DynamoDB table to store the aggregate calculation. I create a Lambda function with a trigger from the first orders table. The event source mapping is created with a Tumbling window duration of 30 seconds:

DynamoDB trigger configuration

I use the following code in the Lambda function:

const AWS = require('aws-sdk')
AWS.config.update({ region: process.env.AWS_REGION })
const docClient = new AWS.DynamoDB.DocumentClient()
const TableName = 'tumblingWindowsAggregation'

function isEmpty(obj) { return Object.keys(obj).length === 0 }

exports.handler = async (event) => {
    // Save aggregation result in the final invocation
    if (event.isFinalInvokeForWindow) {
        console.log('Final: ', event)
        
        const params = {
          TableName,
          Item: {
            windowEnd: event.window.end,
            windowStart: event.window.start,
            sales: event.state.sales,
            shardId: event.shardId
          }
        }
        return await docClient.put(params).promise()
    }
    console.log(event)
    
    // Create the state object on first invocation or use state passed in
    let state = event.state

    if (isEmpty (state)) {
        state = {
            sales: 0
        }
    }
    console.log('Existing: ', state)

    // Process records with custom aggregation logic

    event.Records.map((item) => {
        // Only processing INSERTs
        if (item.eventName != "INSERT") return
        
        // Add sales to total
        let value = parseFloat(item.dynamodb.NewImage.sales.N)
        console.log('Adding: ', value)
        state.sales += value
    })

    // Return the state for the next invocation
    console.log('Returning state: ', state)
    return { state: state }
}

This function code processes the incoming event to aggregate a sales attribute, and return this aggregated result in a state object. In the final invocation, it stores the aggregated value in another DynamoDB table.

I then use this Node.js script to generate random sample order data:

const AWS = require('aws-sdk')
AWS.config.update({ region: 'us-east-1' })
const docClient = new AWS.DynamoDB.DocumentClient()

const TableName = 'tumblingWindows'
const ITERATIONS = 100
const SLEEP_MS = 100

let totalSales = 0

function sleep(ms) { 
  return new Promise(resolve => setTimeout(resolve, ms));
}

const createSales = async () => {
  for (let i = 0; i < ITERATIONS; i++) {

    let sales = Math.round (parseFloat(100 * Math.random()))
    totalSales += sales
    console.log ({i, sales, totalSales})

    await docClient.put ({
      TableName,
      Item: {
        ID: Date.now().toString(),
        sales,
        timeStamp: new Date().toString()
      }
    }).promise()
    await sleep(SLEEP_MS)
  }
}

const main = async() => {
  await createSales()
  console.log('Total Sales: ', totalSales)
}

main()

Once the script is complete, the console shows the individual order transactions and the total sales:

Script output

After the tumbling window duration is finished, the second DynamoDB table shows the aggregate values calculated and stored by the Lambda function:

Aggregate values in second DynamoDB table

Since aggregation for each shard is independent, the totals are stored by shardId. If I continue to run the test data script, the aggregation function continues to calculate and store more totals per tumbling window period.

Using tumbling windows with Kinesis

Kinesis data streams can also invoke a Lambda function using a tumbling window in a similar way. The biggest difference is that you control how many shards are used in the data stream. Since aggregation occurs per shard, this controls the total number of aggregate results per tumbling window.

Using the same sales example, first I create a Kinesis data stream with one shard. I use the same DynamoDB tables from the previous example, then create a Lambda function with a trigger from the Kinesis data stream. The event source mapping is created with a Tumbling window duration of 30 seconds:

Kinesis trigger configuration

I use the following code in the Lambda function, modified to process the incoming Kinesis data event:

const AWS = require('aws-sdk')
AWS.config.update({ region: process.env.AWS_REGION })
const docClient = new AWS.DynamoDB.DocumentClient()
const TableName = 'tumblingWindowsAggregation'

function isEmpty(obj) {
    return Object.keys(obj).length === 0
}

exports.handler = async (event) => {

    // Save aggregation result in the final invocation
    if (event.isFinalInvokeForWindow) {
        console.log('Final: ', event)
        
        const params = {
          TableName,
          Item: {
            windowEnd: event.window.end,
            windowStart: event.window.start,
            sales: event.state.sales,
            shardId: event.shardId
          }
        }
        console.log({ params })
        return await docClient.put(params).promise()

    }
    console.log(JSON.stringify(event, null, 2))
    
    // Create the state object on first invocation or use state passed in
    let state = event.state

    if (isEmpty (state)) {
        state = {
            sales: 0
        }
    }
    console.log('Existing: ', state)

    // Process records with custom aggregation logic

    event.Records.map((record) => {
        const payload = Buffer.from(record.kinesis.data, 'base64').toString('ascii')
        const item = JSON.parse(payload).Item

        // Add sales to total
        let value = parseFloat(item.sales)
        console.log('Adding: ', value)
        state.sales += value
    })

    // Return the state for the next invocation
    console.log('Returning state: ', state)
    return { state: state }
}

This function code processes the incoming event in the same way as the previous example. I then use this Node.js script to generate random sample order data, modified to put the data on the Kinesis stream:

const AWS = require('aws-sdk')
AWS.config.update({ region: 'us-east-1' })
const kinesis = new AWS.Kinesis()

const StreamName = 'testStream'
const ITERATIONS = 100
const SLEEP_MS = 10

let totalSales = 0

function sleep(ms) { 
  return new Promise(resolve => setTimeout(resolve, ms));
}

const createSales = async() => {

  for (let i = 0; i < ITERATIONS; i++) {

    let sales = Math.round (parseFloat(100 * Math.random()))
    totalSales += sales
    console.log ({i, sales, totalSales})

    const data = {
      Item: {
        ID: Date.now().toString(),
        sales,
        timeStamp: new Date().toString()
      }
    }

    await kinesis.putRecord({
      Data: Buffer.from(JSON.stringify(data)),
      PartitionKey: 'PK1',
      StreamName
    }).promise()
    await sleep(SLEEP_MS)
  }
}

const main = async() => {
  await createSales()
}

main()

Once the script is complete, the console shows the individual order transactions and the total sales:

Console output

After the tumbling window duration is finished, the second DynamoDB table shows the aggregate values calculated and stored by the Lambda function:

Aggregate values in second DynamoDB table

As there is only one shard in this Kinesis stream, there is only one aggregation value for all the data items in the test.

Conclusion

With tumbling windows, you can calculate aggregate values in near-real time for Kinesis data streams and DynamoDB streams. Unlike existing stream-based invocations, state can be passed forward by Lambda invocations. This makes it easier to calculate sums, averages, and counts on values across multiple batches of data.

In this post, I walk through an example that aggregates sales data stored in Kinesis and DynamoDB. In each case, I create an aggregation function with an event source mapping that uses the new tumbling window duration attribute. I show how state is passed between invocations and how to persist the aggregated value at the end of the tumbling window.

To learn more about how to use this feature, read the developer documentation. To learn more about building with serverless technology, visit Serverless Land.

Using self-hosted Apache Kafka as an event source for AWS Lambda

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/using-self-hosted-apache-kafka-as-an-event-source-for-aws-lambda/

Apache Kafka is an open source event streaming platform used to support workloads such as data pipelines and streaming analytics. It is a distributed streaming platform that is conceptually similar to Amazon Kinesis.

With the launch of Kafka as an event source for Lambda, you can now consume messages from a topic in a Lambda function. This makes it easier to integrate your self-hosted Kafka clusters with downstream serverless workflows.

In this blog post, I explain how to set up an Apache Kafka cluster on Amazon EC2 and configure key elements in the networking configuration. I also show how to create a Lambda function to consume messages from a Kafka topic. Although the process is similar to using Amazon Managed Streaming for Apache Kafka (Amazon MSK) as an event source, there are also some important differences.

Overview

Using Kafka as an event source operates in a similar way to using Amazon SQS or Amazon Kinesis. In all cases, the Lambda service internally polls for new records or messages from the event source, and then synchronously invokes the target Lambda function. Lambda reads the messages in batches and provides the message batches to your function in the event payload.

Lambda is a consumer application for your Kafka topic. It processes records from one or more partitions and sends the payload to the target function. Lambda continues to process batches until there are no more messages in the topic.

Configuring networking for self-hosted Kafka

It’s best practice to deploy the Amazon EC2 instances running Kafka in private subnets. For the Lambda function to poll the Kafka instances, you must ensure that there is a NAT Gateway running in a public subnet of the VPC.

It’s possible to route the traffic to a single NAT Gateway in one AZ for test and development workloads. For redundancy in production workloads, it’s recommended that there is one NAT Gateway available in each Availability Zone. This walkthrough creates the following architecture:

Self-hosted Kafka architecture

  1. Deploy a VPC with public and private subnets and a NAT Gateway that enables internet access. To configure this infrastructure with AWS CloudFormation, deploy this template.
  2. From the VPC console, edit the default security group created by this template to provide inbound access to the following ports:
    • Custom TCP: ports 2888–3888 from all sources.
    • SSH (port 22), restricted to your own IP address.
    • Custom TCP: port 2181 from all sources.
    • Custom TCP: port 9092 from all sources.
    • All traffic from the same security group identifier.

Security Group configuration

Deploying the EC2 instances and installing Kafka

Next, you deploy the EC2 instances using this network configuration and install the Kafka application:

  1. From the EC2 console, deploy an instance running Ubuntu Server 18.04 LTS. Ensure that there is one instance in each private subnet, in different Availability Zones. Assign the default security group configured by the template.
  2. Next, deploy another EC2 instance in either of the public subnets. This is a bastion host used to access the private instances. Assign the default security group configured by the template.

    EC2 instances

  3. Connect to the bastion host, then SSH to the first private EC2 instance using the method for your preferred operating system. This post explains different methods. Repeat the process in another terminal for the second private instance.

    EC2 terminals

  4. On each instance, install Java:
    sudo add-apt-repository ppa:webupd8team/java
    sudo apt update
    sudo apt install openjdk-8-jdk
    java -version
  5. On each instance, install Kafka:
    wget http://www-us.apache.org/dist/kafka/2.3.1/kafka_2.12-2.3.1.tgz
    tar xzf kafka_2.12-2.3.1.tgz
    ln -s kafka_2.12-2.3.1 kafka

Configure and start Zookeeper

Configure and start the Zookeeper service that manages the Kafka brokers:

  1. On the first instance, configure the Zookeeper ID:
    cd kafka
    mkdir /tmp/zookeeper
    touch /tmp/zookeeper/myid
    echo "1" >> /tmp/zookeeper/myid
  2. Repeat the process on the second instance, using a different ID value:
    cd kafka
    mkdir /tmp/zookeeper
    touch /tmp/zookeeper/myid
    echo "2" >> /tmp/zookeeper/myid
  3. On the first instance, edit the config/zookeeper.properties file, adding the private IP address of the second instance:
    initLimit=5
    syncLimit=2
    tickTime=2000
    # list of servers: <ip>:2888:3888
    server.1=0.0.0.0:2888:3888 
    server.2=<<IP address of second instance>>:2888:3888
    
  4. On the second instance, edit the config/zookeeper.properties file, adding the private IP address of the first instance:
    initLimit=5
    syncLimit=2
    tickTime=2000
    # list of servers: <ip>:2888:3888
    server.1=<<IP address of first instance>>:2888:3888 
    server.2=0.0.0.0:2888:3888
  5. On each instance, start Zookeeper:
    bin/zookeeper-server-start.sh config/zookeeper.properties

Configure and start Kafka

Configure and start the Kafka broker:

  1. On the first instance, edit the config/server.properties file:
    broker.id=1
    zookeeper.connect=0.0.0.0:2181,<<IP address of second instance>>:2181
  2. On the second instance, edit the config/server.properties file:
    broker.id=2
    zookeeper.connect=0.0.0.0:2181,<<IP address of first instance>>:2181
  3. Start Kafka on each instance:
    bin/kafka-server-start.sh config/server.properties

At the end of this process, Zookeeper and Kafka are running on both instances. If you use separate terminals, it looks like this:

Zookeeper and Kafka terminals

Configuring and publishing to a topic

Kafka organizes channels of messages around topics, which are virtual groups of one or many partitions across Kafka brokers in a cluster. Multiple producers can send messages to Kafka topics, which can then be routed to and processed by multiple consumers. Producers publish to the tail of a topic and consumers read the topic at their own pace.
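
Applications typically publish with a Kafka client library rather than the console scripts used below. As an illustrative sketch using the open source KafkaJS library, with placeholder broker addresses:

const { Kafka } = require('kafkajs')

const kafka = new Kafka({
  clientId: 'test-producer',
  brokers: ['10.0.1.10:9092', '10.0.2.10:9092'] // placeholder private broker addresses
})

const publish = async () => {
  const producer = kafka.producer()
  await producer.connect()
  // Publish a single message to the test topic
  await producer.send({
    topic: 'test',
    messages: [{ value: 'hello from KafkaJS' }]
  })
  await producer.disconnect()
}

publish()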

From either of the two instances:

  1. Create a new topic called test:
    bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 2 --partitions 2 --topic test
  2. Start a producer:
    bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
  3. Enter test messages to check for successful publication:

    Sending messages to the Kafka topic

At this point, you can successfully publish messages to your self-hosted Kafka cluster. Next, you configure a Lambda function as a consumer for the test topic on this cluster.

Configuring the Lambda function and event source mapping

You can create the Lambda event source mapping using the AWS CLI or AWS SDK, which provide the CreateEventSourceMapping API. In this walkthrough, you use the AWS Management Console to create the event source mapping.

Create a Lambda function that uses the self-hosted cluster and topic as an event source:

  1. From the Lambda console, select Create function.
  2. Enter a function name, and select Node.js 12.x as the runtime.
  3. Select the Permissions tab, and select the role name in the Execution role panel to open the IAM console.
  4. Choose Add inline policy and create a new policy called SelfHostedKafkaPolicy with the following permissions. Replace the resource example with the ARNs of your instances:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "ec2:CreateNetworkInterface",
                    "ec2:DescribeNetworkInterfaces",
                    "ec2:DescribeVpcs",
                    "ec2:DeleteNetworkInterface",
                    "ec2:DescribeSubnets",
                    "ec2:DescribeSecurityGroups",
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents"
                ],
                "Resource": " arn:aws:ec2:<REGION>:<ACCOUNT_ID>:instance/<instance-id>"
            }
        ]
    }
    

    Create policy

  5. Choose Create policy and ensure that the policy appears in Permissions policies.

    IAM role page

  6. Back in the Lambda function, select the Configuration tab. In the Designer panel, choose Add trigger.
  7. In the dropdown, select Apache Kafka:
    • For Bootstrap servers, add each of the two instances’ private IPv4 DNS addresses with port 9092 appended.
    • For Topic name, enter ‘test’.
    • Enter your preferred batch size and starting position values (see this documentation for more information).
    • For VPC, select the VPC created by the template.
    • For VPC subnets, select the two private subnets.
    • For VPC security groups, select the default security group.
    • Choose Add.

Add trigger configuration

The trigger’s status changes to Enabled in the Lambda console after a few seconds. It then takes several minutes for the trigger to receive messages from the Kafka cluster.

Testing the Lambda function

At this point, you have created a VPC with two private and two public subnets and a NAT Gateway. You have created a Kafka cluster on two EC2 instances in private subnets. You set up a target Lambda function with the necessary IAM permissions. Next, you publish messages to the test topic in Kafka and see the resulting invocation in the logs for the Lambda function.

  1. In the Function code panel, replace the contents of index.js with the following code and choose Deploy:
    exports.handler = async (event) => {
        // Iterate through keys
        for (let key in event.records) {
          console.log('Key: ', key)
          // Iterate through records
          event.records[key].map((record) => {
            console.log('Record: ', record)
            // Decode base64
            const msg = Buffer.from(record.value, 'base64').toString()
            console.log('Message:', msg)
          }) 
        }
    }
  2. Back in the terminal with the producer script running, enter a test message:

    Send test message in Kafka

  3. In the Lambda function console, select the Monitoring tab then choose View logs in CloudWatch. In the latest log stream, you see the original event and the decoded message:

    Log events output

Using Lambda as an event source

The Lambda function target in the event source mapping does not need to be connected to a VPC to receive messages from the private instance hosting Kafka. However, you must provide details of the VPC, subnets, and security groups in the event source mapping for the Kafka cluster.

The Lambda function must have permission to describe VPCs and security groups, and to manage elastic network interfaces. These execution role permissions are:

  • ec2:CreateNetworkInterface
  • ec2:DescribeNetworkInterfaces
  • ec2:DescribeVpcs
  • ec2:DeleteNetworkInterface
  • ec2:DescribeSubnets
  • ec2:DescribeSecurityGroups

The event payload for the Lambda function contains an array of records. Each array item contains details of the topic and Kafka partition identifier, together with a timestamp and base64 encoded message:

Event payload example
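
A representative payload has the following shape; the broker addresses, offset, timestamp, and message value here are illustrative:

{
    "eventSource": "SelfManagedKafka",
    "bootstrapServers": "ip-10-0-1-10.ec2.internal:9092,ip-10-0-2-10.ec2.internal:9092",
    "records": {
        "test-0": [
            {
                "topic": "test",
                "partition": 0,
                "offset": 0,
                "timestamp": 1606000000000,
                "timestampType": "CREATE_TIME",
                "value": "SGVsbG8gZnJvbSBLYWZrYQ=="
            }
        ]
    }
}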

There is an important difference in the way the Lambda service connects to the self-hosted Kafka cluster compared with Amazon MSK. MSK encrypts data in transit by default so the broker connection defaults to using TLS. With a self-hosted cluster, TLS authentication is not supported when using the Apache Kafka event source. Instead, if accessing brokers over the internet, the event source uses SASL/SCRAM authentication, which can be configured in the event source mapping:

SASL/SCRAM configuration

To learn how to configure SASL/SCRAM authentication for your self-hosted Kafka cluster, see this documentation.
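
As a sketch, the SASL/SCRAM credentials are referenced from an AWS Secrets Manager secret in the event source mapping's source access configuration. The broker address and secret ARN below are placeholders:

import boto3

lambda_client = boto3.client('lambda')

# Sketch: SASL/SCRAM credentials are stored in Secrets Manager and
# referenced by ARN in the source access configuration.
lambda_client.create_event_source_mapping(
    FunctionName='my-kafka-consumer',
    Topics=['test'],
    StartingPosition='TRIM_HORIZON',
    SelfManagedEventSource={
        'Endpoints': {'KAFKA_BOOTSTRAP_SERVERS': ['broker1.example.com:9092']}
    },
    SourceAccessConfigurations=[
        {
            'Type': 'SASL_SCRAM_512_AUTH',
            'URI': 'arn:aws:secretsmanager:us-east-1:123456789012:secret:KafkaCredentials'
        }
    ]
)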

Conclusion

Lambda now supports self-hosted Kafka as an event source, so you can invoke Lambda functions from messages in Kafka topics and integrate them with other downstream serverless workflows.

This post shows how to configure a self-hosted Kafka cluster on EC2 and set up the network configuration. I also cover how to set up the event source mapping in Lambda and test a function to decode the messages sent from Kafka.

To learn more about how to use this feature, read the documentation. For more serverless learning resources, visit Serverless Land.

ICYMI: Serverless pre:Invent 2020

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/icymi-serverless-preinvent-2020/

During the last few weeks, the AWS serverless team has been releasing a wave of new features in the build-up to AWS re:Invent 2020. This post recaps some of the most important releases for serverless developers.

re:Invent is virtual and free to all attendees in 2020 – register here. See the complete list of serverless sessions planned and join the serverless Developer Advocate (DA) team live on Twitch. Also, follow your DAs on Twitter for live recaps and Q&A during the event.

AWS re:Invent 2020

AWS Lambda

We launched Lambda Extensions in preview, enabling you to more easily integrate monitoring, security, and governance tools into Lambda functions. You can also build your own extensions that run code during Lambda lifecycle events, and there is an example extensions repo for starting development.

You can now send logs from Lambda functions to custom destinations by using Lambda Extensions and the new Lambda Logs API. Previously, you could only forward logs after they were written to Amazon CloudWatch Logs. Now, logging tools can receive log streams directly from the Lambda execution environment. This makes it easier to use your preferred tools for log management and analysis, such as Datadog, Lumigo, New Relic, Coralogix, Honeycomb, and Sumo Logic.

Lambda Extensions API

Lambda launched support for Amazon MQ as an event source. Amazon MQ is a managed broker service for Apache ActiveMQ that simplifies deploying and scaling queues. This integration increases the range of messaging services that customers can use to build serverless applications. The event source operates in a similar way to using Amazon SQS or Amazon Kinesis. In all cases, the Lambda service manages an internal poller to invoke the target Lambda function.

We also released a new layer to make it simpler to integrate Amazon CodeGuru Profiler. This service helps identify the most expensive lines of code in a function and provides recommendations to help reduce cost. With this update, you can enable the profiler by adding the new layer and setting environment variables. There are no changes needed to the custom code in the Lambda function.

Lambda announced support for AWS PrivateLink. This allows you to invoke Lambda functions from a VPC without traversing the public internet. It provides private connectivity between your VPCs and AWS services. By using VPC endpoints to access the Lambda API from your VPC, this can replace the need for an Internet Gateway or NAT Gateway.

For developers building machine learning inferencing, media processing, high performance computing (HPC), scientific simulations, and financial modeling in Lambda, you can now use AVX2 support to help reduce duration and lower cost. By using packages compiled for AVX2 or compiling libraries with the appropriate flags, your code can then benefit from using AVX2 instructions to accelerate computation. In the blog post’s example, enabling AVX2 for an image-processing function increased performance by 32-43%.

Lambda now supports batch windows of up to 5 minutes when using SQS as an event source. This is useful for workloads that are not time-sensitive, allowing developers to reduce the number of Lambda invocations from queues. Additionally, the batch size has been increased from 10 to 10,000. This is now the same as the batch size for Kinesis as an event source, helping Lambda-based applications process more data per invocation.
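
As an illustrative sketch, both settings are applied when creating the event source mapping; the queue ARN and function name are placeholders:

import boto3

lambda_client = boto3.client('lambda')

# Batch up to 10,000 messages or wait up to 5 minutes,
# whichever limit is reached first.
lambda_client.create_event_source_mapping(
    FunctionName='my-queue-consumer',
    EventSourceArn='arn:aws:sqs:us-east-1:123456789012:my-queue',
    BatchSize=10000,
    MaximumBatchingWindowInSeconds=300
)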

Code signing is now available for Lambda, using AWS Signer. This allows account administrators to ensure that Lambda functions only accept signed code for deployment. Using signing profiles for functions, this provides granular control over code execution within the Lambda service. You can learn more about using this new feature in the developer documentation.

Amazon EventBridge

You can now use event replay to archive and replay events with Amazon EventBridge. After configuring an archive, EventBridge automatically stores all events or filtered events, based upon event pattern matching logic. You can configure a retention policy for archives to delete events automatically after a specified number of days. Event replay can help with testing new features or changes in your code, or hydrating development or test environments.

EventBridge archived events
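
For example, this hedged boto3 sketch creates an archive that retains matching events for 30 days; the bus ARN and event pattern are illustrative:

import json
import boto3

events = boto3.client('events')

# Archive events from a bus that match the pattern, keeping them
# for 30 days before automatic deletion.
events.create_archive(
    ArchiveName='order-events-archive',
    EventSourceArn='arn:aws:events:us-east-1:123456789012:event-bus/default',
    RetentionDays=30,
    EventPattern=json.dumps({'source': ['com.example.orders']})
)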

EventBridge also launched resource policies that simplify managing access to events across multiple AWS accounts. This expands the use of a policy associated with event buses to authorize API calls. Resource policies provide a powerful mechanism for modeling event buses across multiple accounts and providing fine-grained access control to EventBridge API actions.

EventBridge resource policies

EventBridge announced support for Server-Side Encryption (SSE). Events are encrypted using AES-256 at no additional cost for customers. EventBridge also increased PutEvent quotas to 10,000 transactions per second in US East (N. Virginia), US West (Oregon), and Europe (Ireland). This helps support workloads with high throughput.

AWS Step Functions

Synchronous Express Workflows have been launched for AWS Step Functions, providing a new way to run high-throughput Express Workflows. This feature allows developers to receive workflow responses without needing to poll services or build custom solutions. This is useful for high-volume microservice orchestration and fast compute tasks communicating via HTTPS.

The Step Functions service recently added support for other AWS services in workflows. You can now integrate API Gateway REST and HTTP APIs. This enables you to call API Gateway directly from a state machine as an asynchronous service integration.

Step Functions now also supports Amazon EKS service integration. This allows you to build workflows with steps that synchronously launch tasks in EKS and wait for a response. In October, the service also announced support for Amazon Athena, so workflows can now query data in your S3 data lakes.

These new integrations help minimize custom code and provide built-in error handling, parameter passing, and applying recommended security settings.

AWS SAM CLI

The AWS Serverless Application Model (AWS SAM) is an AWS CloudFormation extension that makes it easier to build, manage, and maintain serverless applications. On November 10, the AWS SAM CLI tool released version 1.9.0 with support for cached and parallel builds.

By using sam build --cached, AWS SAM no longer rebuilds functions and layers that have not changed since the last build. Additionally, you can use sam build --parallel to build functions in parallel, instead of sequentially. Both of these new features can substantially reduce the build time of larger applications defined with AWS SAM.

Amazon SNS

Amazon SNS announced support for First-In-First-Out (FIFO) topics. These are used with SQS FIFO queues for applications that require strict message ordering with exactly-once processing and message deduplication. This is designed for workloads that perform tasks like bank transaction logging or inventory management. You can also use message filtering in FIFO topics to publish updates selectively.

SNS FIFO
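
As a minimal sketch, a FIFO topic is created by giving the topic a name ending in .fifo and setting the FifoTopic attribute; the topic name is illustrative:

import boto3

sns = boto3.client('sns')

# FIFO topic names must end in .fifo
topic = sns.create_topic(
    Name='transactions.fifo',
    Attributes={
        'FifoTopic': 'true',
        'ContentBasedDeduplication': 'true'
    }
)
print(topic['TopicArn'])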

AWS X-Ray

X-Ray now integrates with Amazon S3 to trace upstream requests. If a Lambda function uses the X-Ray SDK, S3 sends tracing headers to downstream event subscribers. With this, you can use the X-Ray service map to view connections between S3 and other services used to process an application request.

AWS CloudFormation

AWS CloudFormation announced support for nested stacks in change sets. This allows you to preview changes in your application and infrastructure across the entire nested stack hierarchy. You can then review those changes before confirming a deployment. This is available in all Regions supporting CloudFormation at no extra charge.

The new CloudFormation modules feature was released on November 24. This helps you develop building blocks with embedded best practices and common patterns that you can reuse in CloudFormation templates. Modules are available in the CloudFormation registry and can be used in the same way as any native resource.

Amazon DynamoDB

For customers using DynamoDB global tables, you can now use your own encryption keys. While all data in DynamoDB is encrypted by default, this feature enables you to use customer managed keys (CMKs). DynamoDB also announced support for global tables in the Europe (Milan) and Europe (Stockholm) Regions. This feature enables you to scale global applications for local access in workloads running in different Regions and replicate tables for higher availability and disaster recovery (DR).

The DynamoDB service announced the ability to export table data to data lakes in Amazon S3. This enables you to use services like Amazon Athena and AWS Lake Formation to analyze DynamoDB data with no custom code required. This feature does not consume table capacity and does not impact performance or availability. To learn how to use this feature, see this documentation.

AWS Amplify and AWS AppSync

You can now use existing Amazon Cognito user pools and identity pools for Amplify projects, making it easier to build new applications for an existing user base. AWS Amplify Console, which provides a fully managed static web hosting service, is now available in the Europe (Milan), Middle East (Bahrain), and Asia Pacific (Hong Kong) Regions. This service makes it simpler to bring automation to deploying and hosting single-page applications and static sites.

AWS AppSync enabled AWS WAF integration, making it easier to protect GraphQL APIs against common web exploits. You can also implement rate-based rules to help slow down brute force attacks. Using AWS Managed Rules for AWS WAF provides a faster way to configure application protection without creating the rules directly. AWS AppSync also recently expanded service availability to the Asia Pacific (Hong Kong), Middle East (Bahrain), and China (Ningxia) Regions, making the service now available in 21 Regions globally.

Still looking for more?

Join the AWS Serverless Developer Advocates on Twitch throughout re:Invent for live Q&A, session recaps, and more! See this page for the full schedule.

For more serverless learning resources, visit Serverless Land.

Using Amazon SQS dead-letter queues to replay messages

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/using-amazon-sqs-dead-letter-queues-to-replay-messages/

Amazon Simple Queue Service (Amazon SQS) is a fully managed message queuing service. It enables you to decouple and scale microservices, distributed systems, and serverless applications. A commonly used feature of Amazon SQS is dead-letter queues. The DLQ (dead-letter queue) is used to store messages that can’t be processed (consumed) successfully.

This post describes how to add automated resilience to an existing SQS queue. It monitors the dead-letter queue and moves a message back to the main queue to see if it can be processed again. It also uses an exponential backoff algorithm to make sure the replay does not repeat forever: each time it attempts to reprocess the message, the replay delay increases until the message is finally considered dead.

I use Amazon SQS dead-letter queues, AWS Lambda, and an exponential backoff algorithm with jitter to decrease the rate of retries for failed messages. I then package and publish this serverless solution to the AWS Serverless Application Repository.

Dead-letter queues and message replay

The main task of a dead-letter queue (DLQ) is to handle message failure. It allows you to set aside and isolate non-processed messages to determine why processing failed. Often these failed messages are caused by application errors. For example, a consumer application fails to parse a message correctly and throws an unhandled exception. This exception then triggers an error response that sends the message to the DLQ. The AWS documentation contains a tutorial detailing the configuration of an Amazon SQS dead-letter queue.

To process the failed messages, I build a retry mechanism by implementing an exponential backoff algorithm. The idea behind exponential backoff is to use progressively longer waits between retries for consecutive error responses. Most exponential backoff algorithms use jitter (randomized delay) to prevent successive collisions. This spreads the message retries more evenly across time, allowing them to be processed more efficiently.
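
The "full jitter" variant used here picks a random delay between zero and an exponentially growing cap. A minimal sketch of the calculation:

import random

def backoff_full_jitter(base, cap, n):
    """Return a delay for attempt n: a random value in
    [0, min(cap, base * 2^n)] seconds."""
    return random.uniform(0, min(cap, base * 2 ** n))

# Example: base of 2 seconds, capped at 15 minutes
for attempt in range(1, 4):
    print(attempt, round(backoff_full_jitter(2, 900, attempt), 1))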

Solution overview

Solution architecture

The flow of the message sent by the producer to SQS is as follows:

  1. The producer application sends a message to an SQS queue.
  2. The consumer application fails to process the message in the same SQS queue.
  3. The message is moved from the main SQS queue to the default dead-letter queue, as per the component settings.
  4. A Lambda function is configured with the main queue's dead-letter queue as an event source. It receives the message and sends it back to the original queue with a message timer added.
  5. The message timer is defined by the exponential backoff and jitter algorithm.
  6. You can limit the number of retries. If the message exceeds this limit, it is moved to a second DLQ where an operator processes it manually.

How the replay function works

Each time the SQS dead-letter queue receives a message, it triggers Lambda to run the replay function. The replay code uses an SQS message attribute `sqs-dlq-replay-nb` as a persistent counter for the current number of retries attempted. The number of retries is compared to the maximum number (defined in the application configuration file). If it exceeds the maximum, the message is moved to the human-operated queue. If not, the function uses the AWS Lambda event data to build a new message for the Amazon SQS main queue. Finally, it updates the retry counter, adds a new message timer to the message, and sends the message back (replays it) to the main queue.

import boto3

import backoff   # local module from the sample app implementing the jitter algorithm
import config    # application configuration (queue URL, retry limits, backoff rate)

# MaxAttempsError and _sqs_attributes_cleaner are helpers defined
# elsewhere in the sample application.
SQS = boto3.client('sqs')


def handler(event, context):
    """Lambda function handler."""
    for record in event['Records']:
        # Read the current replay count from the message attributes
        nbReplay = 0
        if 'sqs-dlq-replay-nb' in record['messageAttributes']:
            nbReplay = int(record['messageAttributes']['sqs-dlq-replay-nb']["stringValue"])

        # Stop replaying once the maximum number of attempts is exceeded
        nbReplay += 1
        if nbReplay > config.MAX_ATTEMPS:
            raise MaxAttempsError(replay=nbReplay, max=config.MAX_ATTEMPS)

        # Update the replay counter in the SQS message attributes
        attributes = record['messageAttributes']
        attributes.update({'sqs-dlq-replay-nb': {'StringValue': str(nbReplay), 'DataType': 'Number'}})

        _sqs_attributes_cleaner(attributes)

        # Calculate the message timer using exponential backoff with full jitter
        b = backoff.ExpoBackoffFullJitter(base=config.BACKOFF_RATE, cap=config.MESSAGE_RETENTION_PERIOD)
        delaySeconds = b.backoff(n=int(nbReplay))

        # Replay: send the message back to the main queue with the delay applied
        SQS.send_message(
            QueueUrl=config.SQS_MAIN_URL,
            MessageBody=record['body'],
            DelaySeconds=int(delaySeconds),
            MessageAttributes=record['messageAttributes']
        )

How to use the application

You can use this serverless application via:

  • The Lambda console: choose the “Browse serverless app repository” option to create a function. Select the “amazon-sqs-dlq-replay-backoff” application in the public applications repository. Then, configure the application with the default SQS parameters and the replay feature parameters.
  • The Serverless Framework, as described by Yan Cui in this blog post.
  • An AWS CloudFormation template by using the AWS::ServerlessRepo::Application resource, as described in the documentation.

Here is an example of a CloudFormation template using the AWS Serverless Application Repository application:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Resources:
  ReplaySqsQueue:
    Type: AWS::Serverless::Application
    Properties:
      Location: 
        ApplicationId: arn:aws:serverlessrepo:eu-west-1:1234123412:applications/sqs-dlq-replay
        SemanticVersion: 1.0.0
      Parameters:
        BackoffRate: "2"
        MaxAttempts: "3"

Conclusion

I describe how an exponential backoff algorithm (with jitter) enhances the message processing capabilities of an Amazon SQS queue. You can now find the amazon-sqs-dlq-replay-backoff application in the AWS Serverless Application Repository. Download the code from this GitHub repository.

To get started with dead-letter queues in Amazon SQS, read:

To implement replay mechanisms, see:

For more serverless learning resources, visit https://serverlessland.com.

Creating faster AWS Lambda functions with AVX2

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/creating-faster-aws-lambda-functions-with-avx2/

Customers use AWS Lambda to build a wide range of applications, including mission-critical and compute-intensive applications. The most demanding workloads include machine learning inferencing, media processing, high performance computing (HPC), scientific simulations, and financial modeling. With the release of Advanced Vector Extensions 2 (AVX2) support for Lambda, builders can benefit from improved performance for these types of applications.

Overview

This blog post explains AVX2 and how you can take advantage of this instruction set in your Lambda functions. I walk through an example of how to enhance performance of a typical use case using AVX2 and measure the performance gain. This feature is available for new or existing Lambda-based workloads at no additional cost.

AVX2 provides extensions to the x86 instruction set architecture. This is a Single Instruction Multiple Data (SIMD) instruction set that enables running a set of highly parallelizable operations simultaneously. AVX2 allows CPUs to perform a higher number of integer and floating-point operations per clock cycle. For vectorizable algorithms, this can enhance performance resulting in lower latencies and higher throughput.

AVX2 for Lambda

Implementing AVX2 in an example application

Pillow is a popular Python-based imaging library. It provides powerful image manipulation functions that use computationally complex processes. Computer vision operations such as convolution resampling can benefit from parallelization. This is because the filters are applied on different windows that are independent and can be processed in parallel. In this section, I compare the performance of an image transformation after applying AVX2 instructions.

The following example downloads an original JPEG object from an Amazon S3 bucket, resizes the image, and then saves the result to another S3 bucket. There are three resizing filters used: bilinear, bicubic, and Lanczos.

import boto3
import os

from PIL import Image

# Download the image to /tmp once per execution environment, outside
# the handler, so the S3 transfer is excluded from the timed resize
s3 = boto3.client('s3')
s3.download_file('my-input-bucket', 'photo.jpeg', '/tmp/photo.jpeg')

def lambda_handler(event, context):
    # Open image and perform resize
    image = Image.open('/tmp/photo.jpeg')

    # Select one of the three algorithms
    image = image.resize((256, 128), Image.BILINEAR) 
    # image = image.resize((256, 128), Image.BICUBIC) 
    # image = image.resize((256, 128), Image.LANCZOS) 
    
    # Save and upload to S3
    image.save('/tmp/thumbnail.jpeg', 'JPEG')
    s3.upload_file('/tmp/thumbnail.jpeg', 'my-output-bucket', 'thumbnail.jpeg')
    
    return "Success!"

To convert code to use AVX2, you must recompile the source code with the appropriate flags, or use packages and dependencies optimized for AVX2. In this example, you can use a production-ready fork of Pillow called pillow-simd. When compiled for AVX2, it uses the AVX2 instructions to accelerate many of the features in Pillow.

First, you must compile the library using the same Amazon Linux AMI and kernel version that is used by the Lambda service. To do this, use an EC2 instance or AWS Cloud9 environment running Amazon Linux 2, or use a Docker container with a Lambda-supported image. Compile the library using the following commands:

# Install dependencies
yum install -y \
    freetype-devel \
    gcc \
    ghostscript \
    lcms2-devel \
    libffi-devel \
    libimagequant-devel \
    libjpeg-devel \
    libraqm-devel \
    libtiff-devel \
    libwebp-devel \
    make \
    openjpeg2-devel \
    rh-python36 \
    rh-python36-python-virtualenv \
    sudo \
    tcl-devel \
    tk-devel \
    tkinter \
    which \
    xorg-x11-server-Xvfb \
    zlib-devel \
    && yum clean all

# Compile code with AVX2 flag
CC="cc -mavx2" pip install --force-reinstall --no-cache-dir -t . --compile  pillow-simd

You can then use this compiled, AVX2-compatible version of the Pillow library in a Lambda function. This can be bundled with the deployment code or you can deploy the library as a Lambda layer. Depending on the version, you may have to include the following binaries from the `/usr/lib64` directory with the function. If so, add this location to LD_LIBRARY_PATH so the binaries are discoverable:

cp /usr/lib64/libtiff.so.5 lib/libtiff.so.5
cp /usr/lib64/libjpeg.so.62 lib/libjpeg.so.62
cp /usr/lib64/libjbig.so.2.0 lib/libjbig.so.2.0

In this test, I compare the performance of both the original function and the AVX2-optimized version with 1024 MB of memory. The test uses the following image:

Resampled photo

Source: https://unsplash.com/photos/IMXhx6qhvf0. Photo credit: Daniel Seßler.

The timings exclude S3 transfers and only compare the image transformation operation. The results of the three resize operations are:

Filter     Without AVX2    With AVX2    Performance improvement
Bilinear   105 ms          71 ms        32%
Bicubic    122 ms          73 ms        40%
Lanczos    136 ms          77 ms        43%

Using AVX2 in popular Lambda runtimes

This process involves recompiling the source code with appropriate flags, or selecting packages and dependencies optimized for AVX2. For popular runtimes used in Lambda:

  • Python: Python developers frequently use scipy and numpy libraries to support scientific or computationally complex work. These libraries can be compiled with the AVX2 flag or linked with MKL to take advantage of AVX2.
  • Java: Java’s JIT compiler can auto-vectorize code to run with AVX2 instructions. To learn more, see this post on how to detect vectorization and potentially optimize code to take advantage of this.
  • Golang: the standard Go compiler does not currently support auto-vectorization. However, you can use gccgo, the GCC-based compiler for Go.
  • Node: for compute intensive workloads, use the AVX2-enabled or MKL-enabled versions of libraries.
  • Compiling from source: for C or C++ libraries for vectorizable work, compile with the appropriate flags to allow the compiler to automatically vectorize your code. See the documentation for additional details.

Enabling AVX2 for the Intel Math Kernel Library

The Intel Math Kernel Library (MKL) is a library of optimized math operations that implicitly use AVX2 instructions when available on the compute platform. Many popular frameworks, such as PyTorch, build with MKL by default so you don’t need to take additional actions. Some libraries, such as TensorFlow, provide options in their build process to specify MKL optimization (set --config=mkl as an option).

You can also build popular scientific Python libraries, such as SciPy and NumPy, with MKL. For instructions on building libraries with MKL, read Numpy/Scipy with Intel MKL and Intel Compilers. Intel also provides a Python distribution that includes SciPy and NumPy with MKL.

Performance improvements

After enabling AVX2 for your Lambda functions, you can compare the before-and-after performance. Lambda emits a latency metric in CloudWatch that you can use to measure the performance improvement. Other third-party production monitoring tools (for example, Datadog or New Relic) can also capture these metrics to profile the performance.
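
For example, this boto3 sketch pulls the average Duration for a function over the past hour; the function name is a placeholder:

import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client('cloudwatch')

# Retrieve average invocation duration in 5-minute periods
stats = cloudwatch.get_metric_statistics(
    Namespace='AWS/Lambda',
    MetricName='Duration',
    Dimensions=[{'Name': 'FunctionName', 'Value': 'resize-image'}],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=['Average']
)

for point in sorted(stats['Datapoints'], key=lambda p: p['Timestamp']):
    print(point['Timestamp'], point['Average'])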

Pricing and availability

Starting today, customers can either compile their existing workloads or deploy new ones to target this instruction set at no additional cost. To learn more on how to build AVX2 compatible applications on AWS Lambda, read the Lambda Developer Guide.

Support for AVX2 is available in all Regions where Lambda is available, except for the Regions in China. For more information on availability, see the AWS Region table.

Conclusion

With the release of AVX2 for Lambda, customers can now run AVX2-optimized workloads while benefitting from the pay-for-use, reduced operational model of AWS Lambda. This feature is provided at no additional cost.

Developers can create highly scalable synchronous, asynchronous, or streaming applications. Compared to the x86 Intel baseline instruction set, AVX2 allows CPUs to perform more integer and floating-point operations per clock cycle. This speeds up compute-intensive applications with parallelizable operations, processing data faster and improving throughput. Developers can schedule data-intensive jobs in queues and deliver more responsive end-user experiences.

To learn more, read the Lambda documentation. For more serverless learning resources, visit Serverless Land.

 

Simplifying cross-account access with Amazon EventBridge resource policies

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/simplifying-cross-account-access-with-amazon-eventbridge-resource-policies/

This post is courtesy of Stephen Liedig, Sr Serverless Specialist SA.

Amazon EventBridge is a serverless event bus used to decouple event producers and consumers. Event producers publish events onto an event bus, which then uses rules to determine where to send those events. The rules determine the targets and EventBridge routes the events accordingly.

A common architectural approach adopted by customers is to isolate these application components or services by using separate AWS accounts. This “account-per-service” strategy limits the blast radius by providing a logical and physical separation of resources. It provides additional security boundaries and allows customers to easily track service costs without having to adopt a complex tagging strategy.

To enable the flow of events from one account to another, you must create a rule on one event bus that routes events to an event bus in another account. To enable this routing, you need to configure the resource policy for your event buses.

This blog post shows you how to use EventBridge resource policies to publish events and create rules on event buses in another account.

Overview

Today, EventBridge launches improvements to resource policies that make it easier to build applications that work across accounts. The service expands the use of the policy associated with an event bus to the authorization of API calls.

This means you can manage permissions for API calls that interact with the event bus, such as PutEvents, PutRule, and PutTargets, directly from that event bus’ resource policy. This replaces the need to create different IAM roles that are assumed by each account that interacts with the event bus. It also provides a central resource to manage your permissions.

There is support for organizations and tags via IAM conditions. Now, when you call an API, EventBridge considers both the user’s IAM policy and the event bus resource policy in the authorization process.

EventBridge APIs that accept an event bus name parameter (including PutRule, PutTargets, DeleteRule, RemoveTargets, DisableRule, and EnableRule) now also support an event bus ARN. This allows you to target cross-account event buses through the APIs. For example, you can call PutRule to create a rule on an event bus in another account, without needing to assume a role.
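
As a hedged sketch, this is what creating a rule on another account's bus looks like with boto3; the account ID, bus name, and rule name are placeholders:

import json
import boto3

events = boto3.client('events')

# Pass the cross-account event bus ARN as the EventBusName parameter.
events.put_rule(
    Name='newOrderCreatedSubscription',
    EventBusName='arn:aws:events:us-east-1:111111111111:event-bus/central-event-bus',
    EventPattern=json.dumps({
        'source': ['com.exampleCorp.webStore'],
        'detail-type': ['newOrderCreated']
    }),
    State='ENABLED'
)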

EventBridge now supports using policy conditions for the following authorization context keys in the APIs, to help scope down permissions.

  • events:detail-type (PutEvents): used to restrict PutEvents calls for events with a specific “detail-type” field.
  • events:source (PutEvents): used to restrict PutEvents calls for events with a specific “source” field.
  • events:creatorAccount (PutRule, PutTargets, DeleteRule, RemoveTargets, DisableRule, EnableRule, TagResource, UntagResource, DescribeRule, ListTargetsByRule, ListTagsForResource): used to restrict control plane API calls on rules belonging to a certain account. This can be used to allow a customer to edit or disable only rules created by their own account.
  • events:eventBusInvocation (PutEvents): used to differentiate a direct PutEvents API call from a cross-account event bus target invocation. The key is set to true during a cross-account event bus target invocation authorization (for example, when a rule matches an event and sends that event to another event bus) and to false for a direct PutEvents API call.

Ecommerce example walkthrough

In this ecommerce example, there are multiple services distributed across different accounts. A web store publishes an event when a new order is created. The event is sent via a central event bus, which is in another account. The bus has two rules with target services in different AWS accounts.

Walkthrough architecture

The goal is to create fine-grained permissions that only allow:

  • The web store to publish events for a specific detail-type and source.
  • The invoice processing service to create and manage its own rules on the central bus.

To complete this walkthrough, you set up three accounts. For account A (Web Store), you deploy an AWS Lambda function that sends the “newOrderCreated” event directly to the “central event bus” in account B. The invoice processing Lambda function in account C creates a rule on the central event bus to process the event published by account A.

Create the central event bus in account B

Account B event bus

Create the central event bus in account B, adding the following resource policy. Be sure to substitute your account numbers for accounts A, B, and C.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "WebStoreCrossAccountPublish",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::[ACCOUNT-A]:root"
      },
      "Action": "events:PutEvents",
      "Resource": "arn:aws:events:us-east-1:[ACCOUNT-B]:event-bus/central-event-bus",
      "Condition": {
        "StringEquals": {
          "events:detail-type": "newOrderCreated",
          "events:source": "com.exampleCorp.webStore"
        }
      }
    },
    {
      "Sid": "InvoiceProcessingRuleCreation",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::[ACCOUNT-C]:root"
      },
      "Action": [
        "events:PutRule",
        "events:DeleteRule",
        "events:DescribeRule",
        "events:DisableRule",
        "events:EnableRule",
        "events:PutTargets",
        "events:RemoveTargets"
      ],
      "Resource": "arn:aws:events:us-east-1:[ACCOUNT-B]:rule/central-event-bus/*",
      "Condition": {
        "StringEqualsIfExists": {
          "events:creatorAccount": "${aws:PrincipalAccount}",
          "events:source": "com.exampleCorp.webStore"
        }
      }
    }
  ]
}

There are two statements in the resource policy: WebStoreCrossAccountPublish and InvoiceProcessingRuleCreation.

The WebStoreCrossAccountPublish statement allows the Lambda function in account A to publish events directly to the central event bus. There are two conditions in the statement that further restrict the types of events that can be sent to the event bus. The first restricts the event detail-type to equal “newOrderCreated” and the second condition requires that the event source equals “com.exampleCorp.webStore”.

The InvoiceProcessingRuleCreation statement allows the invoice processing function in account C to describe, add, update, enable, disable, and delete any rules created by account C. You apply this restriction by using the events:creatorAccount context key in the statement’s condition.

Importantly, you should set the StringEqualsIfExists type for the events:creatorAccount condition. If you use StringEquals, it results in an AccessDeniedException. AWS CloudFormation calls DescribeRule to check if the rule already exists. However, as this is a new rule, and because you set a condition for events:creatorAccount for DescribeRule, this key is not populated and CloudFormation receives an AccessDeniedException instead of ResourceNotFoundException.

Here is how you create the event bus using AWS CloudFormation:

  CentralEventBus: 
      Type: AWS::Events::EventBus
      Properties: 
          Name: !Ref EventBusName

  WebStoreCrossAccountPublishStatement: 
      Type: AWS::Events::EventBusPolicy
      Properties: 
          EventBusName: !Ref CentralEventBus
          StatementId: "WebStoreCrossAccountPublish"
          Statement: 
              Effect: "Allow"
              Principal: 
                  AWS: !Sub arn:aws:iam::${AccountA}:root
              Action: "events:PutEvents"
              Resource: !GetAtt CentralEventBus.Arn
              Condition:
                  StringEquals:
                      "events:detail-type": "newOrderCreated"
                      "events:source" : "com.exampleCorp.webStore"
                      
  InvoiceProcessingRuleCreationStatement: 
      Type: AWS::Events::EventBusPolicy
      Properties: 
          EventBusName: !Ref CentralEventBus
          StatementId: "InvoiceProcessingRuleCreation"
          Statement: 
              Effect: "Allow"
              Principal: 
                  AWS: !Sub arn:aws:iam::${AccountC}:root
              Action: 
                  - "events:PutRule"
                  - "events:DeleteRule"
                  - "events:DescribeRule"
                  - "events:DisableRule"
                  - "events:EnableRule"
                  - "events:PutTargets"
                  - "events:RemoveTargets"
              Resource: 
                  - !Sub arn:aws:events:${AWS::Region}:${AWS::AccountId}:rule/${CentralEventBus.Name}/*
              Condition:
                  StringEqualsIfExists:
                      "events:creatorAccount" : "${aws:PrincipalAccount}"
                  StringEquals:
                      "events:source": "com.exampleCorp.webStore"

Now that you have a policy set up on the central event bus, configure the client applications to send and process events. The client application must also have permissions configured.

Create the web store order function in account A

Web Store order function

In account A, create a Lambda function to send the event to the central bus in account B. Set the EventBusName parameter to the central event bus ARN on the PutEvents API call. This allows you to target cross-account event buses directly.

import json
import boto3

EVENT_BUS_ARN = 'arn:aws:events:us-east-1:[ACCOUNT-B]:event-bus/central-event-bus'

# Create EventBridge client
events = boto3.client('events')

def lambda_handler(event, context):

  # new order created event detail
  eventDetail  = {
    "orderNo": "123",
    "orderDate": "2020-09-09T22:01:02Z",
    "customerId": "789",
    "lineItems": {
      "productCode": "P1",
      "quantityOrdered": 3,
      "unitPrice": 23.5,
      "currency": "USD"
    }
  }
  
  try:
    # Put an event
    response = events.put_events(
        Entries=[
            {
                'EventBusName': EVENT_BUS_ARN,
                'Source': 'com.exampleCorp.webStore',
                'DetailType': 'newOrderCreated',
                'Detail': json.dumps(eventDetail)
            }
        ]
    )
    
    print(response['Entries'])
    print('Event sent to the event bus ' + EVENT_BUS_ARN )
    print('EventID is ' + response['Entries'][0]['EventId'])
    
  except Exception as e:
      print(e)

Create the Invoice Processing service in account C

Invoice processing service in account C

Next, create the invoice processing function that processes the newOrderCreated event. You use the AWS Serverless Application Model (AWS SAM) to create the invoice processing function and other application resources. Before you can process any events from the central event bus, you must create a new event bus in account C to receive incoming events.

Next, you define the function that processes the events. Here, you define a Lambda event source that is an EventBridge rule. You set the EventBusName to the receiving invoice processing event bus. When this Lambda function is deployed, AWS SAM creates the rule on the event bus with the specified pattern and target. It configures the event source that triggers the function when an event is received.

Parameters:
  EventBusName:
    Description: Name of the central event bus
    Type: String
    Default: invoice-processing-event-bus
  CentralEventBusArn:
    Description: The ARN of the central event bus # e.g. arn:aws:events:us-east-1:[ACCOUNT-B]:event-bus/central-event-bus
    Type: String
Resources:
  # This is the receiving invoice processing event bus in account C.
  InvoiceProcessingEventBus: 
    Type: AWS::Events::EventBus
    Properties: 
        Name: !Ref EventBusName
# AWS Lambda function processes the newOrderCreated event
  InvoiceProcessingFunction:
    Type: AWS::Serverless::Function 
    Properties:
      CodeUri: invoice_processing
      Handler: invoice_processing_function/app.lambda_handler
      Runtime: python3.8
      Events:
        NewOrderCreatedRule:
          Type: EventBridgeRule
          Properties:
            EventBusName: !Ref InvoiceProcessingEventBus
            Pattern:
              source:
                - com.exampleCorp.webStore
              detail-type:
                - newOrderCreated

The next resource in the AWS SAM template is the rule that creates the target on the central event bus. It sends events to the invoice processing event bus. Though the rule is added to the central event bus, its definition is managed by the invoice processing service template. The rule definition sets the EventBusName parameter to the ARN of the central event bus.

  # This is the rule that the invoice processing service creates on the central event bus
  InvoiceProcessingRule:
    Type: AWS::Events::Rule
    Properties:
      Name: InvoiceProcessingNewOrderCreatedSubscription
      Description: Cross account rule created by Invoice Processing service
      EventBusName: !Ref CentralEventBusArn # ARN of the central event bus
      EventPattern:
        source:
          - com.exampleCorp.webStore
        detail-type:
          - newOrderCreated
      State: ENABLED
      Targets: 
        - Id: SendEventsToInvoiceProcessingEventBus
          Arn: !GetAtt InvoiceProcessingEventBus.Arn
          RoleArn: !GetAtt CentralEventBusToInvoiceProcessingEventBusRole.Arn
          DeadLetterConfig:
            Arn: !GetAtt InvoiceProcessingTargetDLQ.Arn

For the central event bus target to send the event to the invoice processing event bus in account C, EventBridge needs the necessary permissions to invoke the PutEvents API. The CentralEventBusToInvoiceProcessingEventBusRole IAM role provides that permission. It is assumed by the central event bus in account B when it needs to send events to the invoice processing event bus, without you having to create an additional resource policy on the invoice processing event bus.

  CentralEventBusToInvoiceProcessingEventBusRole:
    Type: 'AWS::IAM::Role'
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service:
              - events.amazonaws.com
            Action:
              - 'sts:AssumeRole'
      Path: /
      Policies:
        - PolicyName: PutEventsOnInvoiceProcessingEventBus
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action: 'events:PutEvents'
                Resource: !GetAtt InvoiceProcessingEventBus.Arn

You can also set up a dead-letter queue (DLQ) configuration for the rule in account C. This allows the subscriber of the event to control where events that fail to get delivered to the invoice processing event bus get sent. All you need to do to make this happen is create an Amazon SQS queue in account C, and a queue policy that sets a resource policy to allow EventBridge to send failed events from account B to the queue in account C.

  # Invoice Processing Target Dead Letter Queue 
  InvoiceProcessingTargetDLQ:
    Type: AWS::SQS::Queue

  # SQS resource policy required to allow target on central bus to send failed messages to target DLQ
  InvoiceProcessingTargetDLQPolicy: 
    Type: AWS::SQS::QueuePolicy
    Properties: 
      Queues: 
        - !Ref InvoiceProcessingTargetDLQ
      PolicyDocument: 
        Statement: 
          - Action: 
              - "SQS:SendMessage" 
            Effect: "Allow"
            Resource: !GetAtt InvoiceProcessingTargetDLQ.Arn
            Principal:  
              Service: "events.amazonaws.com"
            Condition:
              ArnEquals:
                "aws:SourceArn": !GetAtt InvoiceProcessingRule.Arn 

Conclusion

This post shows you how to use the new Amazon EventBridge resource policy features that make it easier to build applications that work across accounts. Resource policies provide you with a powerful mechanism for modeling your event buses across multiple accounts, and give you fine-grained control over EventBridge API invocations.

Download the code in this blog from https://github.com/aws-samples/amazon-eventbridge-resource-policy-samples.

For more serverless learning resources, visit Serverless Land.

Application integration patterns for microservices: Orchestration and coordination

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/application-integration-patterns-for-microservices-orchestration-and-coordination/

This post is courtesy of Stephen Liedig, Sr. Serverless Specialist SA.

This is the final blog post in the “Application Integration Patterns for Microservices” series. Previous posts cover asynchronous messaging for microservices, fan-out strategies, and scatter-gather design patterns.

In this post, I look at how to implement messaging patterns to help orchestrate and coordinate business workflows in our applications. Specifically, I cover two patterns:

  • Pipes and Filters, as presented in the book “Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions” (Hohpe and Woolf, 2004)
  • Saga Pattern, which is a design pattern for dealing with “long-lived transactions” (LLT), published by Garcia-Molina and Salem in 1987.

I discuss these patterns using the Wild Rydes example from this series.

Wild Rydes

Wild Rydes is a fictional technology start-up created to disrupt the transportation industry by replacing traditional taxis with unicorns. Several hands-on AWS workshops use the Wild Rydes scenario. It illustrates concepts such as serverless development, event-driven design, API management, and messaging in microservices.

Wild Rydes

This blog post explores the build process of the Wild Rydes workshop, to help you apply these concepts to your applications.

After completing a unicorn ride, the Wild Rydes customer application charges the customer. Once the driver submits a ride completion, an event triggers the following steps:

  • Registers the fare: registers the fare ride completion event.
  • Initiates the payment (via a payment service): calls a payment gateway for credit card pre-authorization. Using the pre-authorization code, it completes the payment transaction.
  • Updates customer accounting system: once the payment is processed, updates the Wild Rydes customer accounting system with the transaction detail.
  • Publishes “Fare Processed” event: sends a notification to interested components that the process is completed.

Each of the steps interfaces with separate systems – the Wild Rydes system, a third-party payment provider, and the customer accounting system. You could implement these steps inside a single component, but that would make it difficult to change and adapt. It would also reduce the potential for component reuse within our application. Breaking down the steps into individual components allows you to build components with a single responsibility, making it easier to manage each component's dependencies and application lifecycle. You can also be selective about how you implement the respective components; for example, different teams responsible for the development of the respective components may choose to use different languages. This is where the Pipes and Filters architectural pattern can help.

Pipes and filters

Hohpe and Woolf define Pipes and Filters as an “architectural style to divide a larger processing task into a sequence of smaller, independent processing steps (filters) that are connected by channels (pipes).”

Pipes and filters architecture

Pipes provide a communications channel that abstracts the consumer of messages sent through that channel. It decouples your filters from one another, so components only need to know the messaging channel, or endpoint, where they are sending messages. They do not know who, or what, is processing that message, or where the receiver is located on the network.

Amazon SQS provides a lightweight solution with the power and scale of messaging middleware. It is a simple, flexible, fully managed message queuing service for reliably and continuously exchanging large volumes of messages. It has virtually limitless scalability and the ability to increase message throughput without pre-provisioning capacity.

You can create an SQS queue with this AWS CLI command:

aws sqs create-queue --queue-name MyQueue

For the fare processing scenario, you could implement a Pipes and Filters architectural pattern using AWS services. This uses two Amazon SQS queues and an Amazon SNS topic:

Pipes and filters pattern with AWS services

Amazon SQS provides a mechanism for decoupling the components. The filters only need to know to which queue to send the message, without knowing which component processes that message nor when it is processed. SQS does this in a secure, durable, and scalable way.

Although none of the filters has a direct dependency on another, there is still a degree of coupling at the pipe level. Changing the execution order therefore forces you to update and redeploy existing filters to point to a new pipe. In the Wild Rydes example, you can reduce the impact of this by defining an environment variable for the destination endpoint in the AWS Lambda function configuration, rather than hardcoding it inside your implementations, as shown in the sketch below.
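
A minimal sketch of this approach, assuming an environment variable named OUTPUT_QUEUE_URL and a placeholder process function for the filter's own logic:

import os
import boto3

sqs = boto3.client('sqs')

# The destination pipe is configuration, not code, so the execution
# order can change without redeploying the filter.
OUTPUT_QUEUE_URL = os.environ['OUTPUT_QUEUE_URL']

def process(message_body):
    # Placeholder for this filter's single processing responsibility
    return message_body.upper()

def lambda_handler(event, context):
    for record in event['Records']:
        sqs.send_message(
            QueueUrl=OUTPUT_QUEUE_URL,
            MessageBody=process(record['body'])
        )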

Dealing with failures and retries requires some consideration too. In Amazon SQS terms, this means defining configuration such as the message visibility timeout. The visibility timeout provides you with some transactional support: it ensures that the message is not removed from the queue until after you have finished processing it and explicitly delete it from the queue. Using Amazon SQS as an event source for AWS Lambda further simplifies this because the service manages message polling, so you don't need an explicit polling implementation in your filter.

Amazon SQS also helps you deal with failures gracefully, as it maintains a count of how many times a message is received (ReceiveCount). By specifying a maxReceiveCount, you can limit the number of times a poisoned message gets processed. Combined with a dead-letter queue (DLQ), this lets you move messages that have exceeded the maxReceiveCount to the DLQ, as sketched below. By adding Amazon CloudWatch alarms on metrics such as ApproximateNumberOfMessagesVisible on the DLQ, you can proactively alert on system failures if the number of messages on the dead-letter queue exceeds an acceptable threshold.
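
As a sketch, the redrive policy that connects a queue to its DLQ with a maxReceiveCount of five looks like this; the queue URL and DLQ ARN are placeholders:

import json
import boto3

sqs = boto3.client('sqs')

# After five failed receives, SQS moves the message to the DLQ.
sqs.set_queue_attributes(
    QueueUrl='https://sqs.us-east-1.amazonaws.com/123456789012/MyQueue',
    Attributes={
        'RedrivePolicy': json.dumps({
            'deadLetterTargetArn': 'arn:aws:sqs:us-east-1:123456789012:MyQueueDLQ',
            'maxReceiveCount': '5'
        })
    }
)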

Alternatively, you can model the fare payment scenario with AWS Step Functions. Step Functions externalizes the Pipes and Filters pattern. It extracts the coordination from the filter implementations into a state machine that orchestrates the sequence of events. Visual workflows allow you to change the sequence of execution without modifying code, reducing the amount of coupling between collaborating components.

Here is how you could model the fare processing scenario using Step Functions:

Fare processing with Step Functions

{
  "Comment": "StateMachine for Processing Fare Payments",
  "StartAt": "RegisterFare",
  "States": {
    "RegisterFare": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:RegisterFareFunction",
      "Next": "ProcessPayment"
    },
    "ProcessPayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:ChargeFareFunction",
      "Next": "UpdateCustomerAccount"
    },
    "UpdateCustomerAccount": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:UpdateCustomerAccountFunction",
      "Next": "PublishFareProcessedEvent"
    },
    "PublishFareProcessedEvent": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sns:publish",
      "Parameters": {
        "TopicArn": "arn:aws:sns:REGION:ACCOUNT_ID:myTopic",
        "Message": {
          "Input": "Hello from Step Functions!"
        }
      },
      "End": true
    }
  }
}

AWS Step Functions allows you to easily build more sophisticated workflows. These could include decision points, parallel processing, wait states to pause the state machine execution, error handling, and retry logic. Error and Retry states help you simplify your component implementation by providing a framework for error handling and for implementing exponential backoff on retries, as shown in the sketch below. You can define alternate execution paths if failures cannot be handled.
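
For example, the ProcessPayment task from the earlier state machine could declare retry and compensation behavior like this sketch; the error names and the RefundCustomer state are illustrative:

"ProcessPayment": {
  "Type": "Task",
  "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:ChargeFareFunction",
  "Retry": [
    {
      "ErrorEquals": ["States.TaskFailed"],
      "IntervalSeconds": 2,
      "MaxAttempts": 3,
      "BackoffRate": 2.0
    }
  ],
  "Catch": [
    {
      "ErrorEquals": ["States.ALL"],
      "Next": "RefundCustomer"
    }
  ],
  "Next": "UpdateCustomerAccount"
}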

In this implementation, each of these states is a discrete transaction. Some implement database transactions when registering the fare, others are calling the third-party payment provider APIs, and internal APIs or programming interfaces when updating the customer accounting system.

Dealing with each of these transactions independently is relatively straightforward. But what happens if you require consistency across all steps, so that either all or none of the transactions complete? How can you deal with consistency across multiple, distributed transactions? How do you deal with the temporal aspects of coordinating these potentially long-running, heterogeneous integrations?

Consistency across multiple, distributed transactions

Cloud providers do not support Distributed Transaction Coordinators (DTC) or two-phase commit protocols responsible for coordinating transactions across multiple cloud resources. Therefore, you need a mechanism to explicitly coordinate multiple local transactions. This is where the saga pattern and AWS Step Functions can help.

A saga is a design pattern for dealing with “long-lived transactions” (LLT), published by Garcia-Molina and Salem in 1987. They define the concept of a saga as:

“LLT is a saga if it can be written as a sequence of transactions that can be interleaved with other transactions.” (Garcia-Molina, Salem 1987)

Fundamentally, saga can provide a failure management pattern to establish consistency across all of your distributed applications, by implementing a compensating transaction for each step in a series of functions. Compensating transactions allow you to back out of the changes that were previously committed in your series of functions, so that if one of your steps fails you can “undo” what you did before and leave your system in a stable state, devoid of side effects.

AWS Step Functions provides a mechanism for implementing a saga pattern with the ability to build fully managed state machines that allow you to catch custom business exceptions and manage and share data across state transitions.

Infrastructure with service integrations

Figure 1: Using Step Functions’ Service Integrations for Amazon DynamoDB and Amazon SNS, you can further reduce the need for a custom AWS Lambda implementation to persist data to the database, or send a notification.

By using these capabilities, you can expand on the previous Fare Processing state machine and implement compensating transaction states. If Register Fare fails, you may want to emit an event that invokes an external support function or generates a notification informing system operators of the error.

If payment processing fails, you want to ensure that the status is updated to reflect the state change and then notify operators of the failed event. You might decide to refund customers, update the fare status, and notify support until you have resolved issues with the customer accounting system. Regardless of the approach, Step Functions allows you to model a failure scenario that aligns with a more business-centric view of consistency.

Step Functions workflow results

You can see the full state machine implementation in Lab 4 of the Wild Rydes Asynchronous Messaging Workshop. The workshop guides you through building your own state machine, so you can see how to apply the pattern to your own scenarios. There are also three other workshops you can walk through that cover the other patterns in the series.

Conclusion

Using Wild Rydes, I show how to use Amazon SQS and AWS Step Functions to decouple your application components and services. I show you how these services help to coordinate and orchestrate distributed components to build resilient and fault-tolerant microservices architectures.

Take part in the Wild Rydes Asynchronous Messaging Workshop and learn about the other messaging patterns you can apply to microservices architectures, including fan-out and message filtering, topic-queue-chaining and load balancing, and scatter-gather.

The Wild Rydes Asynchronous Messaging Workshop resources are hosted on our AWS Samples GitHub repository, including the sample code for this blog post under Lab-4: Choreography and orchestration.

For a deeper dive into queues and topics and how to use these in microservices architectures, read:

  1. The AWS whitepaper, Implementing Microservices on AWS.
  2. Implementing enterprise integration patterns with AWS messaging services: point-to-point channels.
  3. Implementing enterprise integration patterns with AWS messaging services: publish-subscribe channels.
  4. Building Scalable Applications and Microservices: Adding Messaging to Your Toolbox.

For more information on enterprise integration patterns, see: