All posts by James Beswick

Understanding data streaming concepts for serverless applications

Post Syndicated from James Beswick original

Amazon Kinesis is a suite of managed services that can help you collect, process, and analyze streaming data in near-real time. It consists of four separate services that are designed for common tasks with streaming data: This blog post focuses on Kinesis Data Streams.

One of the main benefits of processing streaming data is that an application can react as new data is generated, instead of waiting for batches. This real-time capability enables new functionality for applications. For example, payment processors can analyze payments in real time to detect fraudulent transactions. Ecommerce websites can use streams of clickstream activity to determine site engagement metrics in near-real time.

Kinesis can be used with Amazon EC2-based and container-based workloads. However, its integration with AWS Lambda can make it a useful data source for serverless applications. Using Lambda as a stream consumer can also help minimize the amount of operational overhead for managing streaming applications.

In this post, I explain important streaming concepts and how they affect the design of serverless applications. This post references the Alleycat racing application. Alleycat is a home fitness system that allows users to compete in an intense series of 5-minute virtual bicycle races. Up to 1,000 racers at a time take the saddle and push the limits of cadence and resistance to set personal records and rank on leaderboards. The Alleycat software connects the stationary exercise bike with a backend application that processes the data from thousands of remote devices.

The Alleycat frontend allows users to configure their races and view real-time leaderboard and historical rankings. The frontend could wait until the end of each race and collect the total output from each racer. Once the batch is ready, it could rank the results and provide a leaderboard once the race is completed. However, this is not very engaging for competitors. By using streaming data instead of a batch, the application show the racers a view of who is winning during the race. This makes the virtual environment more like a real-life cycling race.

Producers and consumers

In streaming data workloads, producers are the applications that produce data and consumers are those that process it. In a serverless streaming application, a consumer is usually a Lambda function, Amazon Kinesis Data Firehose, or Amazon Kinesis Data Analytics.

Kinesis producers and consumers

There are a number of ways to put data into a Kinesis stream in serverless applications, including direct service integrations, client libraries, and the AWS SDK.


Kinesis Data Streams

Kinesis Data Firehose

Amazon CloudWatch Logs Yes, using subscription filters Yes, using subscription filters
AWS IoT Core Yes, using IoT rule actions Yes, using IoT rule actions
AWS Database Migration Service Yes – set stream as target Not directly.
Amazon API Gateway Yes, via REST API direct service integration Yes, via REST API direct service integration
AWS Amplify Yes – via JavaScript library Not directly

A single stream may have tens of thousands of producers, which could be web or mobile applications or IoT devices. The Alleycat application uses the AWS IoT SDK for JavaScript to publish messages to an IoT topic. An IoT rule action then uses a direct integration with Kinesis Data Streams to push the data to the stream. This configuration is ideal at the device level, especially since the device may already use AWS IoT Core to receive messages.

The Alleycat simulator uses the AWS SDK to send a large number of messages to the stream. The SDK provides two methods: PutRecord and PutRecords. The first allows you to send a single record, while the second supports up to 500 records per request (or up to 5 MB in total). The simulator uses the putRecords JavaScript API to batch messages to the stream.

A producer can put records directly on a stream, for example via the AWS SDK, or indirectly via other services such as Amazon API Gateway or AWS IoT Core. If direct, the producer must have appropriate permission to write data to the stream. If indirect, the producer must have permission to invoke the proxy service, and then the service must have permission to put data onto the stream.

While there may be many producers, there are comparatively fewer consumers. You can register up to 20 consumers per data stream, which share the outgoing throughout limit per shard. Consumers receive batches of records sequentially, which means processing latency increases as you add more consumers to a stream. For latency-sensitive applications, Kinesis offers enhanced fan-out which gives consumers 2 MB per second dedicated throughput and uses a push model to reduce latency.

Shards, streams, and partition keys

A shard is a sequence of data records in a stream with a fixed capacity. Part of Kinesis billing is based upon the number of shards. A single shard can process up to 1 MB per second or 1,000 records of incoming data. One shard can also send up to 2 MB per second of outgoing data to downstream consumers. These are hard limits on the throughputs of a shard and as your application approaches these limits, you must add more shards to avoid exceeding these limits.

A stream is a collection of these shards and is often a grouping at the workload or project level. Adding another shard to a stream effectively doubles the throughput, though it also doubles the cost. When there is only one shard in a stream, all records sent to that sent are routed to the same shard. With multiple shards, the routing of incoming messages to shards is determined by a partition key.

The data producer adds the partition key before sending the record to Kinesis. The service calculates an MD5 hash of the key, which maps to one of the shards in the stream. Each shard is assigned a range of non-overlapping hash values, so each partition key maps to only one shard.

MD5 hash function

The partition key exists as an alternative to specifying a shard ID directly, since it’s common in production applications to add and remove shards depending upon traffic. How you use the partition key determines the shard-mapping behavior. For example:

  • Same value: If you specify the same string as the partition key, every message is routed to a single shard, regardless of the number of shards in the stream. This is called overheating a shard.
  • Random value: Using a pseudo-random value, such as a UUID, evenly distributes messages between all the shards available.
  • Time-based: Using a timestamp as a partition key may result in a preference for a single shard if multiple messages arrive at the same time.
  • Applicationspecific: The Alleycat application uses the raceId as a partition key to ensure that all messages from a single race are processed by the same shard consumer.

A Lambda function is a consumer application for a data stream and processes one batch of records for each shard. Since Alleycat uses a tumbling window to calculate aggregates between batches, this use of the partition key ensures that all messages for each raceId are processed by the same function. The downside to this architecture is that it is limited to 1,000 incoming messages per second with the same raceId since it is bound to a single shard.

Deciding on a partition key strategy depends upon the specific needs of your workload. In most cases, a random value partition key is often the best approach.

Streaming payloads in serverless applications

When using the SDK to put messages to a stream, the Data attribute can be a buffer, typed array, blob, or string. Combined with the partition key value, the maximum record size is 1 MB. The Data value is base64 encoded when serialized by Kinesis and delivered as an encoded value to downstream consumers. When using a Lambda function consumer, Kinesis delivers batches in a Records array. Each record contains the encoded data attribute, partition key, and additional metadata in a JSON envelope:

JSON transformation from producer to consumer

Ordering and idempotency

Records in a Kinesis stream are delivered to consuming applications in the same order that they arrive at the Kinesis service. The service assigns a sequence number to the record when it is received and this is delivered as part of the payload to a Kinesis consumer:

Sequence number in payload

When using Lambda as a consuming application for Kinesis, by default each shard has a single instance of the function processing records. In this case, ordering is guaranteed as Kinesis invokes the function serially, one batch of records at a time.

Parallelization factor of 1

You can increase the number of concurrent function invocations by setting the ParallelizationFactor on the event source mapping. This allows you to set a concurrency of between 1 and 10, which provides a way to increase Lambda throughout if the IteratorAge metric is increasing. However, one side effect is that ordering per shard is no longer guaranteed, since the shard’s messages are split into multiple subgroups based upon an internal hash.

Parallelization factor is 2

Kinesis guarantees that every record is delivered “at least once”, but occasionally messages are delivered more than once. This is caused by producers that retry messages, network-related timeouts, and consumer retries, which can occur when worker processes restart. In both cases, these are normal activities and you should design your application to handle infrequent duplicate records.

To prevent the duplicate messages causing unintentional side effects, such as charging a payment twice, it’s important to design your application with idempotency in mind. By using transaction IDs appropriately, your code can determine if a given message has been processed previously, and ignore any duplicates. In the Alleycat application, the aggregation and processes of messages is idempotent. If two identical messages are received, processing completes with the same end result.

To learn more about implementing idempotency in serverless applications, read the “Serverless Application Lens: AWS Well-Architected Framework”.


In this post, I introduce some of the core streaming concepts for serverless applications. I explain some of the benefits of streaming architectures and how Kinesis works with producers and consumers. I compare different ways to ingest data, how streams are composed of shards, and how partition keys determine which shard is used. Finally, I explain the payload formats at the different stages of a streaming workload, how message ordering works with shards, and why idempotency is important to handle.

To learn more about building serverless web applications, visit Serverless Land.

Developing evolutionary architecture with AWS Lambda

Post Syndicated from James Beswick original

This post was written by Luca Mezzalira, Principal Solutions Architect, Media and Entertainment.

Agility enables you to evolve a workload quickly, adding new features, or introducing new infrastructure as required. The key characteristics for achieving agility in a code base are loosely coupled components and strong encapsulation.

Loose coupling can help improve test coverage and create atomic refactoring. With encapsulation, you expose only what is needed to interact with a service without revealing the implementation logic.

Evolutionary architectures can help achieve agility in your design. In the book “Building Evolutionary Architectures”, this architecture is defined as one that “supports guided, incremental change across multiple dimensions”.

This blog post focuses on how to structure code for AWS Lambda functions in a modular fashion. It shows how to embrace the evolutionary aspect provided by the hexagonal architecture pattern and apply it to different use cases.

Introducing ports and adapters

Hexagonal architecture is also known as the ports and adapters architecture. It is an architectural pattern used for encapsulating domain logic and decoupling it from other implementation details, such as infrastructure or client requests.

Ports and adapters

  1. Domain logic: Represents the task that the application should perform, abstracting any interaction with the external world.
  2. Ports: Provide a way for the primary actors (on the left) to interact with the application, via the domain logic. The domain logic also uses ports for interacting with secondary actors (on the right) when needed.
  3. Adapters: A design pattern for transforming one interface into another interface. They wrap the logic for interacting with a primary or secondary actor.
  4. Primary actors: Users of the system such as a webhook, a UI request, or a test script.
  5. Secondary actors: used by the application, these services are either a Repository (for example, a database) or a Recipient (such as a message queue).

Hexagonal architecture with Lambda functions

Lambda functions are units of compute logic that accomplish a specific task. For example, a function could manipulate data in a Amazon Kinesis stream, or process messages from an Amazon SQS queue.

In Lambda functions, hexagonal architecture can help you implement new business requirements and improve the agility of a workload. This approach can help create separation of concerns and separate the domain logic from the infrastructure. For development teams, it can also simplify the implementation of new features and parallelize the work across different developers.

The following example introduces a service for returning a stock value. The service supports different currencies for a frontend application that displays the information in a dashboard. The translation of a stock value between currencies happens in real time. The service must retrieve the exchange rates with every request made by the client.

The architecture for this service uses an Amazon API Gateway endpoint that exposes a REST API. When the client calls the API, it triggers a Lambda function. This gets the stock value from a DynamoDB table and the currency information from a third-party endpoint. The domain logic uses the exchange rate to convert the stock value to other currencies before responding to the client request.

The full example is available in the AWS GitHub samples repository. Here is the architecture for this service:

Hexagonal architecture example

  1. A client makes a request to the API Gateway endpoint, which invokes the Lambda function.
  2. The primary adapter receives the request. It captures the stock ID and pass it to the port:
    exports.lambdaHandler = async (event) => {
    	// retrieve the stockID from the request
            const stockID = event.pathParameters.StockID;
    	// pass the stockID to the port
            const response = await getStocksRequest(stockID);
            return response
  3. The port is an interface for communicating with the domain logic. It enforces the separation between an adapter and the domain logic. With this approach, you can change and test the infrastructure and domain logic in isolation without impacting another part of the code base:
    const retrieveStock = async (stockID) => {
    	//use the port “stock” to access the domain logic
            const stockWithCurrencies = await stock.retrieveStockValues(stockID)
            return stockWithCurrencies;
  4. The port passing the stock ID invokes the domain logic entry point. The domain logic fetches the stock value from a DynamoDB table, then it requests the exchange rates. It returns the computed values to the primary adapter via the port. The domain logic always uses a port to interact with an adapter because the ports are the interfaces with the external world:
    const CURRENCIES = [“USD”, “CAD”, “AUD”]
    const retrieveStockValues = async (stockID) => {
    try {
    //retrieve the stock value from DynamoDB using a port
            const stockValue = await Repository.getStockData(stockID);
    //fetch the currencies value using a port
            const currencyList = await Currency.getCurrenciesData(CURRENCIES);
    //calculate the stock value in different currencies
            const stockWithCurrencies = {
                stock: stockValue.STOCK_ID,
                values: {
                    "EUR": stockValue.VALUE
            for(const currency in currencyList.rates){
                stockWithCurrencies.values[currency] =  (stockValue.VALUE * currencyList.rates[currency]).toFixed(2)
    // return the final computation to the port
            return stockWithCurrencies;


This is how the domain logic interacts with the DynamoDB table:

DynamoDB interaction

  1. The domain logic uses the Repository port for interacting with the database. There is not a direct connection between the domain and the adapter:
    const getStockData = async (stockID) => {
    //the domain logic pass the request to fetch the stock ID value to this port
            const data = await getStockValue(stockID);
            return data.Item;
  2. The secondary adapter encapsulates the logic for reading an item from a DynamoDB table. All the logic for interacting with DynamoDB is encapsulated in this module:
    const getStockValue = async (stockID) => {
        let params = {
            TableName : DB_TABLE,
                'STOCK_ID': stockID
        try {
            const stockData = await documentClient.get(params).promise()
            return stockData


The domain logic uses an adapter for fetching the exchange rates from the third-party service. It then processes the data and responds to the client request:


  1. Currencies API interactionThe second operation in the business logic is retrieving the currency exchange rates. The domain logic requests the operation via a port that proxies the request to the adapter:
    const getCurrenciesData = async (currencies) => {
            const data = await getCurrencies(currencies);
            return data
  2. The currencies service adapter fetches the data from a third-party endpoint and returns the result to the domain logic.
    const getCurrencies = async (currencies) => {
            const res = await axios.get(`${currencies.toString()}`)

These eight steps show how to structure the Lambda function code using a hexagonal architecture.

Adding a cache layer

In this scenario, the production stock service experiences traffic spikes during the day. The external endpoint for the exchange rates cannot support the level of traffic. To address this, you can implement a caching strategy with Amazon ElastiCache using a Redis cluster. This approach uses a cache-aside pattern for offloading traffic to the external service.

Typically, it can be challenging to evolve code to implement this change without the separation of concerns in the code base. However, in this example, there is an adapter that interacts with the external service. Therefore, you can change the implementation to add the cache-aside pattern and maintain the same API contract with the rest of the application:

const getCurrencies = async (currencies) => {
// Check the exchange rates are available in the Redis cluster
        let res = await asyncClient.get("CURRENCIES");
// If present, return the value retrieved from Redis
            return JSON.parse(res);
// Otherwise, fetch the data from the external service
        const getCurr = await axios.get(`${currencies.toString()}`)
// Store the new values in the Redis cluster with an expired time of 20 seconds
        await asyncClient.set("CURRENCIES", JSON.stringify(, "ex", 20);
// Return the data to the port

This is a low-effort change only affecting the adapter. The domain logic and port interacting with the adapter are untouched and maintain the same API contract. The encapsulation provided by this architecture helps to evolve the code base. It also preserves many of the tests in place, considering only an adapter is modified.

Moving domain logic from a container to a Lambda function

In this example, the team working on this workload originally wrap all the functionality inside a container using AWS Fargate with Amazon ECS. In this case, the developers define a route for the GET method for retrieving the stock value:

// This web application uses the Fastify framework 
  fastify.get('/stock/:StockID', async (request, reply) => {
        const stockID = request.params.StockID;
        const response = await getStocksRequest(stockID);
        return response

In this case, the route’s entry point is exactly the same for the Lambda function. The team does not need to change anything else in the code base, thanks to the characteristics provided by the hexagonal architecture.

This pattern can help you more easily refactor code from containers or virtual machines to multiple Lambda functions. It introduces a level of code portability that can be more challenging with other solutions.

Benefits and drawbacks

As with any pattern, there are benefits and drawbacks to using hexagonal architecture.

The main benefits are:

  • The domain logic is agnostic and independent from the outside world.
  • The separation of concerns increases code testability.
  • It may help reduce technical debt in workloads.

The drawbacks are:

  • The pattern requires an upfront investment of time.
  • The domain logic implementation is not opinionated.

Whether you should use this architecture for developing Lambda functions depends upon the needs of your application. With an evolving workload, the extra implementation effort may be worthwhile.

The pattern can help improve code testability because of the encapsulation and separation of concerns provided. This approach can also be used with compute solutions other than Lambda, which may be useful in code migration projects.


This post shows how you can evolve a workload using hexagonal architecture. It explains how to add new functionality, change underlying infrastructure, or port the code base between different compute solutions. The main characteristics enabling this are loose coupling and strong encapsulation.

To learn more about hexagonal architecture and similar patterns, read:

For more serverless learning resources, visit Serverless Land.

Translating content dynamically by using Amazon S3 Object Lambda

Post Syndicated from James Beswick original

This post is written by Sandeep Mohanty, Senior Solutions Architect.

The recent launch of Amazon S3 Object Lambda creates many possibilities to transform data in S3 buckets dynamically. S3 Object Lambda can be used with other AWS serverless services to transform content stored in S3 in many creative ways. One example is using S3 Object Lambda with Amazon Translate to translate and serve content from S3 buckets on demand.

Amazon Translate is a serverless machine translation service that delivers fast and customizable language translation. With Amazon Translate, you can localize content such as websites and applications to serve a diverse set of users.

Using S3 Object Lambda with Amazon Translate, you do not need to translate content in advance for all possible permutations of source to target languages. Instead, you can transform content in near-real time using a data driven model. This can serve multiple language-specific applications simultaneously.

S3 Object Lambda enables you to process and transform data using Lambda functions as objects are being retrieved from S3 by a client application. S3 GET object requests invoke the Lambda function and you can customize it to transform the content to meet specific requirements.

For example, if you run a website or mobile application with global visitors, you must provide translations in multiple languages. Artifacts such as forms, disclaimers, or product descriptions can be translated to serve a diverse global audience using this approach.

Solution architecture

This is the high-level architecture diagram for the example application that translates dynamic content on demand:

Solution architecture

In this example, you create an S3 Object Lambda that intercepts S3 GET requests for an object. It then translates the file to a target language, passed as an argument appended to the S3 object key. At a high level, the steps can be summarized as follows:

  1. Create a Lambda function to translate data from a source language to a target language using Amazon Translate.
  2. Create an S3 Object Lambda Access Point from the S3 console.
  3. Select the Lambda function created in step 1.
  4. Provide a supporting S3 Access Point to give S3 Object Lambda access to the original object.
  5. Retrieve a file from S3 by invoking the S3 GetObject API, and pass the Object Lambda Access Point ARN as the bucket name instead of the actual S3 bucket name.

Creating the Lambda function

In the first step, you create the Lambda function, DynamicFileTranslation. This Lambda function is invoked by an S3 GET Object API call and it translates the requested object. The target language is passed as an argument appended to the S3 object key, corresponding to the object being retrieved.

For example, for the object key passed in the S3 GetObject API call is customized to look something like “ContactUs/contact-us.txt#fr”, the characters after the pound sign represent the code for the target language. In this case, ‘fr’ is French. The full list of supported languages and language codes can be found here.

This Lambda function dynamically translates the content of an object in S3 to a target language:

import json
import boto3
from urllib.parse import urlparse, unquote
from pathlib import Path
def lambda_handler(event, context):

   # Extract the outputRoute and outputToken from the object context
    object_context = event["getObjectContext"]
    request_route = object_context["outputRoute"]
    request_token = object_context["outputToken"]

   # Extract the user requested URL and the supporting access point arn
    user_request_url = event["userRequest"]["url"]
    supporting_access_point_arn = event["configuration"]["supportingAccessPointArn"]

    print("USER REQUEST URL: ", user_request_url)
   # The S3 object key is after the Host name in the user request URL.
   # The user request URL looks something like this, 
   # https://<User Request Host>/ContactUs/contact-us.txt#fr.
   # The target language code in the S3 GET request is after the "#"
    user_request_url = unquote(user_request_url)
    result = user_request_url.split("#")
    user_request_url = result[0]
    targetLang = result[1]
   # Extract the S3 Object Key from the user requested URL
    s3Key = str(Path(urlparse(user_request_url).path).relative_to('/'))
   # Get the original object from S3
    s3 = boto3.resource('s3')
   # To get the original object from S3,use the supporting_access_point_arn 
    s3Obj = s3.Object(supporting_access_point_arn, s3Key).get()
    srcText = s3Obj['Body'].read()
    srcText = srcText.decode('utf-8')

   # Translate original text
    translateClient = boto3.client('translate')
    response = translateClient.translate_text(
                                                Text = srcText,
  # Write object back to S3 Object Lambda
    s3client = boto3.client('s3')
                                        RequestToken=request_token )
    return { 'statusCode': 200 }

The code in the Lambda function:

  • Extracts the outputRoute and outputToken from the object context. This defines where the WriteGetObjectResponse request is delivered.
  • Extracts the user-requested url from the event object.
  • Parses the S3 object key and the target language that is appended to the S3 object key.
  • Calls S3 GetObject to fetch the raw text of the source object.
  • Invokes Amazon Translate with the raw text extracted.
  • Puts the translated output back to S3 using the WriteGetObjectResponse API.

Configuring the Lambda IAM role

The Lambda function needs permissions to call back to the S3 Object Lambda access point with the WriteGetObjectResponse. It also needs permissions to call S3 GetObject and Amazon Translate. Add the following permissions to the Lambda execution role:

    "Version": "2012-10-17",
    "Statement": [
            "Sid": "S3ObjectLambdaAccess",
            "Effect": "Allow",
            "Action": [
            "Resource": [
                "<arn of your S3 access point/*>”,
                "<arn of your Object Lambda accesspoint>"
            "Sid": "AmazonTranslateAccess",
            "Effect": "Allow",
            "Action": "translate:TranslateText",
            "Resource": "*"

Deploying the Lambda using an AWS SAM template

Alternatively, deploy the Lambda function with the IAM role by using an AWS SAM template. The code for the Lambda function and the AWS SAM template is available for download from GitHub.

Creating the S3 Access Point

  1. Navigate to the S3 console and create a bucket with a unique name.
  2. In the S3 console, select “Access Points” and choose “Create access point”. Enter a name for the access point.
  3. For Bucket name, enter the S3 bucket name you entered in step 1.Create access point

This access point is the supporting access point for the Object Lambda Access Point you create in the next step. Keep all other settings on this page as default.

After creating the S3 access point, create the S3 Object Lambda Access Point using the supporting Access Point. The Lambda function you created earlier uses the supporting Access Point to download the original untransformed objects from S3.

Create Object Lambda Access Point

In the S3 console, go to the Object Lambda Access Point configuration and create an Object Lambda Access Point. Enter a name.

Create Object Lambda Access Point

For the Lambda function configurations, associate this with the Lambda function created earlier. Select the latest version of the Lambda function and keep all other settings as default.

Select Lambda function

To understand how to use the other settings of the S3 Object Lambda configuration, refer to the product documentation.

Testing dynamic content translation

In this section, you create a Python script and invoke the S3 GetObject API twice. First, against the S3 bucket and then against the Object Lambda Access Point. You can then compare the output to see how content is transformed using Object Lambda:

  1. Upload a text file to the S3 bucket using the Object Lambda Access Point you configured. For example, upload a sample “Contact Us” file in English to S3.
  2. To use the object Lambda Access Point, locate its ARN from the Properties tab of the Object Lambda Access Point.Object Lambda Access Point
  3. Create a local file called that contains the following Python script:
    import json
    import boto3
    import sys, getopt
    def main(argv):
        targetLang = sys.argv[1]
        print("TargetLang = ", targetLang)
        s3 = boto3.client('s3')
        s3Bucket = "my-s3ol-bucket"
        s3Key = "ContactUs/contact-us.txt"
        # Call get_object using the S3 bucket name
        response = s3.get_object(Bucket=s3Bucket,Key=s3Key)
        print("Original Content......\n")
        # Call get_object using the S3 Object Lambda access point ARN
        s3Bucket = "arn:aws:s3-object-lambda:us-west-2:123456789012:accesspoint/my-s3ol-access-point"
        s3Key = "ContactUs/contact-us.txt#" + targetLang
        response = s3.get_object(Bucket=s3Bucket,Key=s3Key)
        print("Transformed Content......\n")
        return {'Success':200}             
        print("\n\nUsage: <Target Language Code>")
    #********** Program Entry Point ***********
    if __name__ == '__main__':
        main(sys.argv[1: ])
  4. Run the client program from the command line, passing the target language code as an argument. The full list of supported languages and codes in Amazon Translate can be found here.python "fr"

The output looks like this:

Example output

The first output is the original content that was retrieved when calling GetObject with the S3 bucket name. The second output is the transformed content when calling GetObject against the Object Lambda access point. The content is transformed by the Lambda function as it is being retrieved, and translated to French in near-real time.


This blog post shows how you can use S3 Object Lambda with Amazon Translate to simplify dynamic content translation by using a data driven approach. With user-provided data as arguments, you can dynamically transform content in S3 and generate a new object.

It is not necessary to create a copy of this new object in S3 before returning it to the client. We also saw that it is not necessary for the object with the same name to exist in the S3 bucket when using S3 Object Lambda. This pattern can be used to address several real world use cases that can benefit from the ability to transform and generate S3 objects on the fly.

For more serverless learning resources, visit Serverless Land.

ICYMI: Serverless Q2 2021

Post Syndicated from James Beswick original

Welcome to the 14th edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all of the most recent product launches, feature enhancements, blog posts, webinars, Twitch live streams, and other interesting things that you might have missed!

Q2 calendar

In case you missed our last ICYMI, check out what happened last quarter here.

AWS Step Functions

Step Functions launched Workflow Studio, a new visual tool that provides a drag-and-drop user interface to build Step Functions workflows. This exposes all the capabilities of Step Functions that are available in Amazon States Language (ASL). This makes it easier to build and change workflows and build definitions in near-real time.

For more:

Workflow Studio

The new data flow simulator in the Step Functions console helps you evaluate the inputs and outputs passed through your state machine. It allows you to simulate each of the fields used to process data and updates in real time. It can help accelerate development with workflows and help visualize JSONPath processing.

For more:

Data flow simulator

Also, Amazon API Gateway can now invoke synchronous Express Workflows using REST APIs.

Amazon EventBridge

EventBridge now supports cross-Region event routing from any commercial AWS Region to a list of supported Regions. This feature allows you to centralize global events for auditing and monitoring or replicate events across Regions.

EventBridge cross-Region routing

The service now also supports bus-to-bus event routing in the same Region and in the same AWS account. This can be useful for centralizing events related to a single project, application, or team within your organization.

EventBridge bus-to-bus

You can now use EventBridge as a resource within Step Functions workflows. This provides a direct service integration for both standard and Express Workflows. You can publish events directly to a specified event bus using either a request-response or wait-for-callback pattern.

EventBridge added a new target for rules – Amazon SageMaker Pipelines. This allows you to use a rule to trigger a continuous integration and continuous deployment (CI/CD) service for your machine learning workloads.

AWS Lambda

Lambda Extensions

AWS Lambda extensions are now generally available including some performance and functionality improvements. Lambda extensions provide a new way to integrate your chosen monitoring, observability, security, and governance tools with AWS Lambda. These use the Lambda Runtime Extensions API to integrate with the execution environment and provide hooks into the Lambda lifecycle.

To help build your own extensions, there is an updated GitHub repository with example code.

To learn more:

  • Watch a Tech Talk with Julian Wood.
  • Watch the 8-episode Learning Path series covering all aspects of extensions.

Extensions available today

Amazon CloudWatch Lambda Insights support for Lambda container images is now generally available.

Amazon SNS

Amazon SNS has expanded the set of filter operators available to include IP address matching, existence of an attribute key, and “anything-but” matching.

The service has also introduced an SMS sandbox to help developers testing workloads that send text messages.

To learn more:

Amazon DynamoDB

DynamoDB announced CloudFormation support for several features. First, it now supports configuring Kinesis Data Streams using CloudFormation. This allows you to use infrastructure as code to set up Kinesis Data Streams instead of DynamoDB streams.

The service also announced that NoSQL Workbench now supports CloudFormation, so you can build data models and configure table capacity settings directly from the tool. Finally, you can now create and manage global tables with CloudFormation.

Learn how to use the recently launched Serverless Patterns Collection to configure DynamoDB as an event source for Lambda.

AWS Amplify

Amplify Hosting announced support for server-side rendered (SSR) apps built with the Next.js framework. This provides a zero configuration option for developers to deploy and host their Next.js-based applications.

The Amplify GLI now allows developers to make multiple DynamoDB GSI updates in a single deployment. This can help accelerate data model iterations. Additionally, the data management experience in the Amplify Admin UI launched at AWS re:Invent 2020 is now generally available.

AWS Serverless Application Model (AWS SAM)

AWS SAM has a public preview of support for local development and testing of AWS Cloud Development Kit (AWS CDK) projects.

To learn more:

Serverless blog posts

Operating Lambda

The “Operating Lambda” blog series includes the following posts in this quarter:

Streaming data

The “Building serverless applications with streaming data” blog series shows how to use Lambda with Kinesis.

Getting started with serverless for developers

Learn how to build serverless applications from your local integrated development environment (IDE).




Tech Talks & Events

We hold AWS Online Tech Talks covering serverless topics throughout the year. These are listed in the Serverless section of the AWS Online Tech Talks page. We also regularly deliver talks at conferences and events around the world, speak on podcasts, and record videos you can find to learn in bite-sized chunks.

Here are some from Q2:

Serverless Live was a day of talks held on May 19, featuring the serverless developer advocacy team, along with Adrian Cockroft and Jeff Barr. You can watch a replay of all the talks on the AWS Twitch channel.


YouTube ServerlessLand channel

Serverless Office Hours – Tues 10 AM PT / 1PM EST

Weekly live virtual office hours. In each session we talk about a specific topic or technology related to serverless and open it up to helping you with your real serverless challenges and issues. Ask us anything you want about serverless technologies and applications.





DynamoDB Office Hours

Are you an Amazon DynamoDB customer with a technical question you need answered? If so, join us for weekly Office Hours on the AWS Twitch channel led by Rick Houlihan, AWS principal technologist and Amazon DynamoDB expert. See upcoming and previous shows

Learning Path – AWS Lambda Extensions: The deep dive

Are you looking for a way to more easily integrate AWS Lambda with your favorite monitoring, observability, security, governance, and other tools? Welcome to AWS Lambda extensions: The deep dive, a learning path video series that shows you everything about augmenting Lambda functions using Lambda extensions.

There are also other helpful videos covering serverless available on the Serverless Land YouTube channel.

Still looking for more?

The Serverless landing page has more information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials.

You can also follow the Serverless Developer Advocacy team on Twitter to see the latest news, follow conversations, and interact with the team.

Monitoring and troubleshooting serverless data analytics applications

Post Syndicated from James Beswick original

This series is about building serverless solutions in streaming data workloads. The application example used in this series is Alleycat, which allows bike racers to compete with each other virtually on home exercise bikes.

The first four posts have explored the architecture behind the application, which is enabled by Amazon Kinesis, Amazon DynamoDB, and AWS Lambda. This post explains how to monitor and troubleshoot issues that are common in streaming applications.

To set up the example, visit the GitHub repo and follow the instructions in the file. Note that this walkthrough uses services that are not covered by the AWS Free Tier and incur cost.

Monitoring the Alleycat application

The business requirements for Alleycat state that it must handle up to 1,000 simultaneous racers. With each racer emitting a message every second, each 5-minute race results in 300,000 messages.

Reference architecture

While the architecture can support this throughput, the settings for each service determine how the workload scales up. The deployment templates in the GitHub repo do not use sufficiently high settings to handle this amount of data. In the section, I show how this results in errors and what steps you can take to resolve the issues. To start, I run the simulator for several races with the maximum racers configuration set to 1,000.

Monitoring the Kinesis stream

The monitoring tab of the Kinesis stream provides visualizations of stream metrics. This immediately shows that there is a problem in the application when running at full capacity:

Monitoring the Kinesis stream

  1. The iterator age is growing, indicating that the data consumers are falling behind the data producers. The Get records graph also shows the number of records in the stream growing.
  2. The Incoming data (count) metric shows the number of separate records ingested by the stream. The red line indicates the maximum capacity of this single-shard stream. With 1,000 active racers, this is almost at full capacity.
  3. However, the Incoming data – sum (bytes) graph shows that the total amount of data ingested by the stream is currently well under the maximum level shown by the red line.

There are two solutions for improving the capacity on the stream. First, the data producer application (the Alleycat frontend) could combine messages before sending. It’s currently reaching the total number of messages per second but the total byte capacity is significantly below the maximum. This action improves message packing but increases latency since the frontend waits to group messages.

Alternatively, you can add capacity by resharding. This enables you to increase (or decrease) the number of shards in a stream to adapt to the rate of data flowing through the application. You can do this with the UpdateShardCount API action. The existing stream goes into an Updating status and the stream scales by splitting shards. This creates two new child shards that split the partition keyspace of the parent. It also results in another, separate Lambda consumer for the new shard.

Monitoring the Lambda function

The monitoring tab of the consuming Lambda function provides visualization of metrics that can highlight problems in the workload. At full capacity, the monitoring highlights issues to resolve:

Monitoring the Lambda function

  1. The Duration chart shows that the function is exceeding its 15-second timeout, when the function normally finishes in under a second. This typically indicates that there are too many records to process in a single batch or throttling is occurring downstream.
  2. The Error count metric is growing, which highlights either logical errors in the code or errors from API calls to downstream resources.
  3. The IteratorAge metric appears for Lambda functions that are consuming from streams. In this case, the growing metric confirms that data consumption is falling behind data production in the stream.
  4. Concurrent executions remain at 1 throughout. This is set by the parallelization factor in the event source mapping and can be increased up to 10.

Monitoring the DynamoDB table

The metric tab on the application’s table in the DynamoDB console provides visualizations for the performance of the service:

Monitoring the DynamoDB table

  1. The consumed Read usage is well within the provisioned maximum and there is no read throttling on the table.
  2. Consumed Write usage, shown in blue, is frequently bursting through the provisioned capacity.
  3. The number of Write throttled requests confirms that the DynamoDB service is throttling requests since the table is over capacity.

You can resolve this issue by increasing the provisioned throughput on the table and related global secondary indexes. Write capacity units (WCUs) provide 1 KB of write throughput per second. You can set this value manually, use automatic scaling to match varying throughout, or enable on-demand mode. Read more about the pricing models for each to determine the best approach for your workload.

Monitoring Kinesis Data Streams

Kinesis Data Streams ingests data into shards, which are fixed capacity sequences of records, up to 1,000 records or 1 MB per second. There is no limit to the amount of data held within a stream but there is a configurable retention period. By default, Kinesis stores records for 24 hours but you can increase this up to 365 days as needed.

Kinesis is integrated with Amazon CloudWatch. Basic metrics are published every minute, and you can optionally enable enhanced metrics for an additional charge. In this section, I review the most commonly used metrics for monitoring the health of streams in your application.

Metrics for monitoring data producers

When data producers are throttled, they cannot put new records onto a Kinesis stream. Use the WriteProvisionedThroughputExceeded metric to detect if producers are throttled. If this is more than zero, you won’t be able to put records to the stream. Monitoring the Average for this statistic can help you determine if your producers are healthy.

When producers succeed in sending data to a stream, the PutRecord.Success and PutRecords.Success are incremented. Monitoring for spikes or drops in these metrics can help you monitor the health of producers and catch problems early. There are two separate metrics for each of the API calls, so watch the Average statistic for whichever of the two calls your application uses.

Metrics for monitoring data consumers

When data consumers are throttled or start to generate errors, Kinesis continues to accept new records from producers. However, there is growing latency between when records are written and when they are consumed for processing.

Using the GetRecords.IteratorAgeMilliseconds metric, you can measure the difference between the age of the last record consumed and the latest record put to the stream. It is important to monitor the iterator age. If the age is high in relation to the stream’s retention period, you can lose data as records expire from the stream. This value should generally not exceed 50% of the stream’s retention period – when the value reaches 100% of the stream retention period, data is lost.

If the iterator age is growing, one temporary solution is to increase the retention time of the stream. This gives you more time to resolve the issue before losing data. A more permanent solution is to add more consumers to keep up with data production, or resolve any errors that are slowing consumers.

When consumers exceed the ReadProvisionedThroughputExceeded metric, they are throttled and you cannot read from the stream. This results in a growth of records in the stream waiting for processing. Monitor the Average statistic for this metric and aim for values as close to 0 as possible.

The GetRecords.Success metric is the consumer-side equivalent of PutRecords.Success. Monitor this value for spikes or drops to ensure that your consumers are healthy. The Average is usually the most useful statistic for this purpose.

Increasing data processing throughput for Kinesis Data Streams

Adjusting the parallelization factor

Kinesis invokes Lambda consumers every second with a configurable batch size of messages. It’s important that the processing in the function keeps pace with the rate of traffic to avoid a growing iterator age. For compute intensive functions, you can increase the memory allocated in the function, which also increases the amount of virtual CPU available. This can help reduce the duration of a processing function.

If this is not possible or the function is falling behind data production in the stream, consider increasing the parallelization factor. By default, this is set to 1, meaning that each shard has a single instance of a Lambda function it invokes. You can increase this up to 10, which results in multiple instances of the consumer function processing additional batches of messages.

Adjusting the parallelization factor

Using enhanced fan-out to reduce iterator age

Standard consumers use a pull model over HTTP to fetch batches of records. Each consumer operates in serial. A stream with five consumers averages 200 ms of latency each, meaning it takes up to 1 second for all five to receive batches of records.

You can improve the overall latency by removing any unnecessary data consumers. If you use Kinesis Data Firehose and Kinesis Data Analytics on a stream, these count as consumers too. If you can remove subscribers, this helps with over data consumption throughput.

If the workload needs all of the existing subscribers, use enhanced fan-out (EFO). EFO consumers use a push model over HTTP/2 and are independent of each other. With EFO, the same five consumers in the previous example would receive batches of messages in parallel, using dedicated throughput. Overall latency averages 70 ms and typically data delivery speed is improved by up to 65%. There is an additional charge for this feature.

Enhanced fan-out

To learn more about processing streaming data with Lambda, see this AWS Online Tech Talk presentation.


In this post, I show how the existing settings in the Alleycat application are not sufficient for handling the expected amount of traffic. I walk through the metrics visualizations for Kinesis Data Streams, Lambda, and DynamoDB to find which quotas should be increased.

I explain which CloudWatch metrics can be used with Kinesis Data Stream to ensure that data producers and data consumers are healthy. Finally, I show how you can use the parallelization factor and enhanced fan-out features to increase the throughput of data consumers.

For more serverless learning resources, visit Serverless Land.

Building leaderboard functionality with serverless data analytics

Post Syndicated from James Beswick original

This series is about building serverless solutions in streaming data workloads. The application example used in this series is Alleycat, which allows bike racers to compete with each other virtually on home exercise bikes.

Part 1 explains the application’s functionality, how to deploy to your AWS account, and provides an architectural review. Part 2 compares different ways to ingest streaming data into Amazon Kinesis Data Streams and shows how to optimize shard capacity. Part 3 uses Amazon Kinesis Data Firehose with AWS Lambda to implement the all-time rankings functionality.

This post walks through the leaderboard functionality in the application. This solution uses Kinesis Data Streams, Lambda, and Amazon DynamoDB to provide both real-time and historical rankings.

To set up the example, visit the GitHub repo and follow the instructions in the file. Note that this walkthrough uses services that are not covered by the AWS Free Tier and incur cost.

Overview of leaderboards in Alleycat

Alleycat races are 5 minutes long and run continuously throughout the day for each class. Competitors select a class type and automatically join the current race by choosing Start Race. There are up to 1,000 competitors per race. To track performance, there are two types of leaderboard in Alleycat: Race results for completed races and the Realtime rankings for active races.

Alleycat frontend

  1. The user selects a completed race from the dropdown to a leaderboard ranking of results. This table is not real time and only reflects historical results.
  2. The user selects the Here now option to see live results for all competitors in the current virtual race for the chosen class.

Architecture overview

The backend microservice supporting these features is located in the 1-streaming-kds directory in the GitHub repo. It uses the following architecture:

Solution architecture

  1. The tumbling window function receives results every second from Kinesis Data Streams. This function aggregates metrics and saves the latest results to the application’s DynamoDB table.
  2. DynamoDB Streams invoke the publishing Lambda function every time an item is changed in the table. This reformats the message and publishes to the application’s IoT topic in AWS IoT Core to update the frontend.
  3. The final results function is an additional consumer on Kinesis Data Streams. It filters for only the last result in each competitor’s race and publishes the results to the DynamoDB table.
  4. The frontend calls an Amazon API Gateway endpoint to fetch the historical results for completed races via a Lambda function.

Configuring the tumbling window function

This Lambda function aggregates data from the Kinesis stream and stores the result in the DynamoDB table:

Tumbling window architecture

This function receives batches of 1,000 messages per invocation. The messages may contain results for multiple races and competitors. The function groups the results by race and racer ID and then flattens the data structure. It writes an item to the DynamoDB table in this format:

 "PK": "race-5403307",
 "SK": "results",
 "ts": 1620992324960,
 "results": "{\"0\":167.04,\"1\":136,\"2\":109.52,\"3\":167.14,\"4\":129.69,\"5\":164.97,\"6\":149.86,\"7\":123.6,\"8\":154.29,\"9\":89.1,\"10\":137.41,\"11\":124.8,\"12\":131.89,\"13\":117.18,\"14\":143.52,\"15\":95.04,\"16\":109.34,\"17\":157.38,\"18\":81.62,\"19\":165.76,\"20\":181.78,\"21\":140.65,\"22\":112.35,\"23\":112.1,\"24\":148.4,\"25\":141.75,\"26\":173.24,\"27\":131.72,\"28\":133.77,\"29\":118.44}",
 "GSI": 2

This transformation significantly reduces the number of writes to the DynamoDB table per function invocation. Each time a write occurs, this triggers the publishing process and notifies the frontend via the IoT topic.

DynamoDB can handle almost any number of writes to the table. You can set up to 40,000 write capacity units with default limits and can request even higher limits via an AWS Support ticket. However, you may want to limit the throughput to reduce the WCU cost.

The tumbling window feature of Lambda allows invocations from a streaming source to pass state between invocations. You can specify a window interval of up to 15 minutes. During the window, a state is passed from one invocation to the next, until a final invocation at the end of the window. Alleycat uses this feature to buffer aggregated results and only write the output at the end of the tumbling window:

Tumbling window process

For a tumbling window period of 5 seconds, this means that the Lambda function is invoked multiple times, passing an intermediate state from invocation to invocation. Once the window ends, it then writes the final aggregated result to DynamoDB. The tradeoff in this solution is that it reduces the number of real-time notifications to the frontend since these are published from the table’s stream. This increases the latency of live results in the frontend application.

Buffer by the Lambda function

Implementing tumbling windows in Lambda functions

The template.yaml file describes the tumbling Lambda function. The event definition specifies the tumbling window duration in the TumblingWindowInSeconds attribute:

    Type: AWS::Serverless::Function
      CodeUri: tumblingFunction/    
      Handler: app.handler
      Runtime: nodejs14.x
      Timeout: 15
      MemorySize: 256
          DDB_TABLE: !Ref DynamoDBtableName
          TableName: !Ref DynamoDBtableName
          Type: Kinesis
            Stream: !Sub "arn:aws:kinesis:${AWS::Region}:${AWS::AccountId}:stream/${KinesisStreamName}"
            BatchSize: 1000
            StartingPosition: TRIM_HORIZON      
            TumblingWindowInSeconds: 15  

When you enable tumbling windows, the function’s event payload contains several new attributes:

    "Records": [
    "shardId": "shardId-000000000000",
    "eventSourceARN": "arn:aws:kinesis:us-east-2:123456789012:stream/alleycat",
    "window": {
        "start": "2021-05-05T18:51:00Z",
        "end": "2021-05-05T18:51:15Z"
    "state": {},
    "isFinalInvokeForWindow": false,
    "isWindowTerminatedEarly": false

These include:

  • Window start and end: The beginning and ending timestamps for the current tumbling window.
  • State: An object containing the state returned from the previous invocation, which is initially empty in a new window. The state object can contain up to 1 MB of data.
  • isFinalInvokeForWindow: Indicates if this is the last invocation for the current window. This only occurs once per window period.
  • isWindowTerminatedEarly: A window ends early if the state exceeds the maximum allowed size of 1 MB.

The event handler in app.js uses tumbling windows if they are defined in the AWS SAM template:

// Main Lambda handler
exports.handler = async (event) => {

	// Retrieve existing state passed during tumbling window	
	let state = event.state || {}
	// Process the results from event
	let jsonRecords = getRecordsFromPayload(event) => raceMap[record.raceId] = record.classId)
	state = getResultsByRaceId(state, jsonRecords)

	// If tumbling window is not configured, save and exit
	if (event.window === undefined) {
		return await saveCurrentRaces(state) 

	// If tumbling window is configured, save to DynamoDB on the 
	// final invocation in the window
	if (event.isFinalInvokeForWindow) {
		await saveCurrentRaces(state) 
	} else {
		return { state }

The final results function and API

The final results function filters for the last event in each competitor’s race, which contains the final score. The function writes each score to the DynamoDB table. Since there may be many write events per invocation, this function uses the Promise.all construct in Node.js to complete the database operations in parallel:

const saveFinalScores = async (raceResults) => {
	let paramsArr = [] => {
		  TableName : process.env.DDB_TABLE,
		  Item: {
		    PK: `race-${result.raceId}`,
		    SK: `racer-${result.racerId}`,
		    GSI: result.output,
	// Save to DDB in parallel
	await Promise.all( => documentClient.put (params).promise()))

Using this approach, each call to documentClient.put is made without waiting for a response from the SDK. The call returns a Promise object, which is in a pending state until the database operation returns with a status. Promise.all waits for all promises to resolve or reject before code execution continues. Comparing the serial and concurrent approach, this reduces the overall time for multiple database writes. The tradeoff is that it increases the number of writes to DynamoDB and the number of WCUs consumed.

Serial vs concurrent writes

For a large number of put operations to DynamoDB, you can also use the DocumentClient’s batchWrite operation. This delegates to the underlying DynamoDB BatchWriteItem operation in the AWS SDK and can accept up to 25 separate put requests in a single call. To handle more than 25, you can make multiple batchWrite requests and still use the parallelized invocation method shown above.

The DynamoDB table maintains a list of race results with one item per racer per race:

DynamoDB table items

The frontend calls an API Gateway endpoint that invokes the getLeaderboard function. This function uses the DocumentClient’s query API to return results for a selected race, sorted by a global secondary index containing the final score:

const AWS = require('aws-sdk')
AWS.config.region = process.env.AWS_REGION 
const documentClient = new AWS.DynamoDB.DocumentClient()

// Main Lambda handler
exports.handler = async (event) => {

    const classId = parseInt(event.queryStringParameters.classId)

    const params = {
        TableName: process.env.DDB_TABLE,
        IndexName: 'GSI_PK_Index',
        KeyConditionExpression: 'PK = :ID',
        ExpressionAttributeValues: {
          ':ID': `class-${classId}`
        ScanIndexForward: false,
        Limit: 1000

    const result = await documentClient.query(params).promise()   
    return result.Items

By default, this returns the top 1,000 places by using the Limit parameter. You can customize this or use pagination to implement fetching large result sets more efficiently.


In this post, I explain the all-time leaderboard logic in the Alleycat application. This is an asynchronous, eventually consistent process that checks batches of incoming records for new personal best records. This uses Kinesis Data Firehose to provide a zero-administration way to deliver and process large batches of records continuously.

This post shows the architecture in Alleycat and how this is defined in AWS SAM. Finally, I walk through how to build a data transformation Lambda function that decodes a payload and returns records back to Kinesis.

Part 5 discusses how to troubleshoot issues in Alleycat and how to monitor streaming applications generally.

For more serverless learning resources, visit Serverless Land.

Building serverless applications with streaming data: Part 3

Post Syndicated from James Beswick original

This series is about building serverless solutions in streaming data workloads. These are traditionally challenging to build, since data can be streamed from thousands or even millions of devices continuously. The application example used in this series is Alleycat, which allows bike racers to compete with each other virtually on home exercise bikes.

Part 1 explains the application’s functionality, how to deploy to your AWS account, and provides an architectural review. Part 2 compares different ways to ingest streaming data into Amazon Kinesis Data Streams and shows how to optimize shard capacity.

In this post, I explain how the all-time rankings functionality is implemented. This uses Amazon Kinesis Data Firehose to deliver hundreds of thousands of data points for processing by AWS Lambda functions.

To set up the example, visit the GitHub repo and follow the instructions in the file. Note that this walkthrough uses services that are not covered by the AWS Free Tier and incur cost.

Overview of real time rankings in Alleycat

In the example scenario, there are 40,000 users and up to 1,000 competitors may race at any given time. While a competitor is racing, there is a real time rankings display that shows their performance in the selected class:

Alleycat front end

The racer can select Here now to compare against racers in the current virtual race. Alternatively, selecting All time compares their performance against the best performance of all races who have ever competed in the race. The rankings board is dynamic and shows the rankings for the current second on the display for the local racer:

Leaderboard changes over time

In Alleycat, races occur every five minutes continuously, and the all-time data is gathered from races that are completed. For this to work, the application must do the following:

  • Continuously aggregate race data from each of the six exercise classes and deliver to an Amazon S3 bucket.
  • Compare the incoming race data with the personal best records for every competitor in the same class.
  • If the current performance is a record, update the all-time best records dataset.
  • Compress the dataset, since there are thousands of racers, each with 5 minutes of personal racing history.

The resulting dataset contains second-by-second personal records for every racer in a selected race. This is saved in the application’s history bucket in S3 and loaded by the Alleycat front end before the beginning of each race:

        // From Home.vue's loadRealtimeHistory method

        // Load gz from S3 history bucket, unzip and parse to JSON
        const URL = `https://${this.$appConfig.historyBucket}.s3.${this.$appConfig.region}${this.selectedClassId}.gz`
        const response = await axios.get(URL, { responseType: 'arraybuffer' })
        const buffer = Buffer.from(, 'base64')
        const dezipped = await gunzip(buffer)
        const history = JSON.parse(dezipped.toString())

Architecture overview

The backend microservice handling this feature is located in the 2-streaming-kdf directory in the GitHub repo. It uses the following architecture:

Solution architecture

  1. Amazon Kinesis Data Firehose is a consumer of Kinesis Data Streams. Records are delivered from the application’s main stream as they arrive.
  2. Kinesis Data Firehose invokes a data transformation Lambda function. This calculates the output for each racer’s data points and returns the modified records to Kinesis.
  3. Kinesis Data Firehose delivers batches of records to an S3 bucket.
  4. When objects are written to the S3 bucket, this event triggers the S3 processor Lambda function. This compares incoming racer performance with historical records.
  5. The Lambda function saves the new all-time records in a compressed format to the application’s history bucket in S3.

The process happens continuously providing that new records are delivered to the application’s main Kinesis stream.

Using Kinesis Data Firehose

Kinesis Data Firehose can consume records from Kinesis Data Streams, or directly from the AWS SDK, AWS IoT Core, and other data producers. In Alleycat, there are multiple consumers that process incoming data, so Kinesis Data Firehose is configured as a consumer on the main stream.

The process of updating historical records is not a real-time process in Alleycat. Processing individual racer messages and comparing against all-time records is possible but would be computationally more complex. In practice, only a few incoming data points are all-time records. As a result, Alleycat asynchronously processes batches of records to find and update all-time records. The process is eventually consistent and the tradeoff is latency. There may be up to 1 minute between recording a personal record and updating the historical dataset in S3.

Kinesis Data Firehose provides several key functions for this process. First, it batches groups of messages based upon the batching hints provided in the AWS Serverless Application Model (AWS SAM) template. This application buffers records for up to 60 seconds or until 1 MB of records are available, whichever is reached first. You can adjust these settings and batch for up to 900 seconds or 128 MB of data. Note that these settings are hints and not absolute – the service can dynamically adjust these if data delivery falls behind data writing in the stream. As a result, hints should be treated as guidance and the actual settings may change.

Kinesis Data Firehose also enables data compression before delivery in various common formats. Since the S3 delivery bucket is an intermediary storage location in this application, using data compression reduces the storage cost. Kinesis can also encrypt data before delivery but this feature is not used in Alleycat. Finally, Kinesis Data Firehose can transform the incoming records by invoking a Lambda function. This enables the application to calculate the racer’s output before delivering the records.

The configuration for Kinesis Data Firehose, the S3 buckets, and the Lambda function is located in the template.yaml file in the GitHub repo. This AWS SAM template shows how to define the complete integration:

    Type: AWS::KinesisFirehose::DeliveryStream
      - DeliveryStreamPolicy    
      DeliveryStreamName: "alleycat-data-firehose"
      DeliveryStreamType: "KinesisStreamAsSource"
        KinesisStreamARN: !Sub "arn:aws:kinesis:${AWS::Region}:${AWS::AccountId}:stream/${KinesisStreamName}"
        RoleARN: !GetAtt DeliveryStreamRole.Arn
        BucketARN: !GetAtt DeliveryBucket.Arn
          SizeInMBs: 1
          IntervalInSeconds: 60
          Enabled: true
          LogGroupName: "/aws/kinesisfirehose/alleycat-firehose"
          LogStreamName: "S3Delivery"
        CompressionFormat: "GZIP"
          NoEncryptionConfig: "NoEncryption"
        Prefix: ""
        RoleARN: !GetAtt DeliveryStreamRole.Arn
          Enabled: true
            - Type: "Lambda"
                - ParameterName: "LambdaArn"
                  ParameterValue: !GetAtt FirehoseProcessFunction.Arn

Once Kinesis Data Firehose is set up, there is no ongoing administration. The service scales automatically to adjust to the amount of the data available in the application’s Kinesis data stream.

How the Lambda data transformer works

Kinesis Data Firehose buffers up to 3 MB before invoking the data transformation function (you can configure this setting with the ProcessingConfiguration API). The service delivers records as a JSON array:

    "invocationId": "d8ce3cc6-abcd-407a-abcd-d9bc2ce58e72",
    "sourceKinesisStreamArn": "arn:aws:kinesis:us-east-2:012345678912:stream/alleycat",
    "deliveryStreamArn": "arn:aws:firehose:us-east-2:012345678912:deliverystream/alleycat-firehose",
    "region": "us-east-2",
    "records": [
            "recordId": "49617596128959546022271021234567...",
            "approximateArrivalTimestamp": 1619185789847,
            "data": "eyJ1dWlkIjoiYjM0MGQzMGYtjI1Yi00YWM4LThjY2QtM2ZhNThiMWZmZjNlIiwiZXZlbnQiOiJ1cGRhdGUiLCJkZXZpY2VUaW1lc3RhbXiOjE2MTkxODU3ODk3NUsInNlY29uZCI6MwicmFjZUlkIjoxNjE5MTg1NjIwMj1LCJuYW1lIjoiSHViZXJ0IiwicmFjZXJJZCI6MSwiY2xhc3NJZCI6MSwiY2FkZW5jZSI6ODAsInJlc2lzdGFuY2UiOjc4fQ==",
            "kinesisRecordMetadata": {
                "sequenceNumber": "49617596128959546022271020452",
                "subsequenceNumber": 0,
                "partitionKey": "1619185789798",
                "shardId": "shardId-000000000000",
                "approximateArrivalTimestamp": 1619185789847

The payload contains metadata about the invocation and data source and each record contains a recordId and a base64 encoded data attribute. The Lambda function can then decode and modify the data attribute for each record as needed. The Lambda function must finally return a JSON array of the same length as the incoming event.records array, with the following attributes:

  • recordId: this must match the incoming recordId, so Kinesis can map the modified data payload back to the original record.
  • result: this must be “Ok”, “Dropped”, or “ProcessingFailed”. “Dropped” means that the function has intentionally removed a payload from processing. Any records with “ProcessingFailed” are delivered to the S3 bucket in a folder called processing-failed. This includes metadata indicating the number of delivery attempts, the timestamp of the last attempt, and the Lambda function’s ARN.
  • data: the returned data payload must be base64 encoded and the modified record must be within the 1 MB limit per record.

The transformer function in Alleycat shows how to implement this process in Node.js:

exports.handler = (event) => {
  const output = => {
    // Extract JSON record from base64 data
    const buffer = Buffer.from(, "base64").toString()
    const jsonRecord = JSON.parse(buffer)

    // Add calculated field
    jsonRecord.output = ((jsonRecord.cadence + 35) * (jsonRecord.resistance + 65)) / 100

    // Convert back to base64 + add a newline
    const dataBuffer = Buffer.from(JSON.stringify(jsonRecord) + "\n", "utf8").toString("base64")

    return {
      recordId: record.recordId,
      result: "Ok",
      data: dataBuffer,

  console.log(`{ recordsTotal: ${output.length} }`)
  return { records: output }

If you are processing the data in downstream services in JSON format, adding a newline at the end of the data buffer for each record can simplify the import process.

The data transformation Lambda function can run for up to 5 minutes and can use other AWS services for data processing. The Kinesis Data Firehose service scales up the Lambda function automatically if traffic increases, up to 5 outstanding invocations per shard (or 10 if the destination is Splunk).

Processing the Kinesis Data Firehose output

Kinesis Data Firehose puts a series of objects in the intermediary S3 bucket. Each put event invokes the S3 processor Lambda function. This function compares each data point to historical best performance, per racer ID, per class ID, at the same second of the race.

Data points that do not beat existing records are discarded. Otherwise, the function merges new historical records into the dataset and saves the result into the final S3 history bucket. This object is compressed in gzip format and contains the history for a single class ID.

When the frontend downloads this dataset from the S3 bucket, it decompresses the object. The resulting JSON structure shows personal output records per racer ID for each second of the race. The frontend uses this data to update the leaderboard on the local device during each second of the active race.

JSON output


In this post, I explain the all-time leaderboard logic in the Alleycat application. This is an asynchronous, eventually consistent process that checks batches of incoming records for new personal records. This uses Kinesis Data Firehose to provide a zero-administration way to deliver and process large batches of records continuously.

This post shows the architecture in Alleycat and how this is defined in AWS SAM. Finally, I walk through how to build a data transformation Lambda function that correctly decodes a payload and returns records back to Kinesis.

Part 4 show how to combine Kinesis with Amazon DynamoDB to support queries for streaming data. Alleycat uses this architecture to provide real-time rankings for competitors in the same virtual race.

For more serverless learning resources, visit Serverless Land.

Announcing migration of the Java 8 runtime in AWS Lambda to Amazon Corretto

Post Syndicated from James Beswick original

This post is written by Jonathan Tuliani, Principal Product Manager, AWS Lambda.

What is happening?

Beginning July 19, 2021, the Java 8 managed runtime in AWS Lambda will migrate from the current Open Java Development Kit (OpenJDK) implementation to the latest Amazon Corretto implementation.

To reflect this change, the AWS Management Console will change how Java 8 runtimes are displayed. The display name for runtimes using the ‘java8’ identifier will change from ‘Java 8’ to ‘Java 8 on Amazon Linux 1’. The display name for runtimes using the ‘java8.al2’ identifier will change from ‘Java 8 (Corretto)’ to ‘Java 8 on Amazon Linux 2’. The ‘java8’ and ‘java8.al2’ identifiers themselves, as used by tools such as the AWS CLI, CloudFormation, and AWS SAM, will not change.

Why are you making this change?

This change enables customers to benefit from the latest innovations and extended support of the Amazon Corretto JDK distribution. Amazon Corretto is a no-cost, multiplatform, production-ready distribution of the OpenJDK. Corretto is certified as compatible with the Java SE standard and used internally at Amazon for many production services.

Amazon is committed to Corretto, and provides regular updates that include security fixes and performance enhancements. With this change, these benefits are available to all Lambda customers. For more information on improvements provided by Amazon Corretto 8, see Amazon Corretto 8 change logs.

How does this affect existing Java 8 functions?

Amazon Corretto 8 is designed as a drop-in replacement for OpenJDK 8. Most functions benefit seamlessly from the enhancements in this update without any action from you.

In rare cases, switching to Amazon Corretto 8 introduces compatibility issues. See below for known issues and guidance on how to verify compatibility in advance of this change.

When will this happen?

This migration to Amazon Corretto takes place in several stages:

  • June 15, 2021: Availability of Lambda layers for testing the compatibility of functions with the Amazon Corretto runtime. Start of AWS Management Console changes to java8 and java8.al2 display names.
  • July 19, 2021: Any new functions using the java8 runtime will use Amazon Corretto. If you update an existing function, it will transition to Amazon Corretto automatically. The container base image is updated to use Amazon Corretto.
  • August 16, 2021: For functions that have not been updated since June 28, AWS will begin an automatic transition to the new Corretto runtime.
  • September 10, 2021: Migration completed.

These changes are only applied to functions not using the arn:aws:lambda:::awslayer:Java8Corretto or arn:aws:lambda:::awslayer:Java8OpenJDK layers described below.

Which of my Lambda functions are affected?

Lambda supports two versions of the Java 8 managed runtime: the java8 runtime, which runs on Amazon Linux 1, and the java8.al2 runtime, which runs on Amazon Linux 2. This change only affects functions using the java8 runtime. Functions the java8.al2 runtime are already using the Amazon Corretto implementation of Java 8 and are not affected.

The following command shows how to use the AWS CLI to list all functions in a specific Region using the java8 runtime. To find all such functions in your account, repeat this command for each Region:

aws lambda list-functions --function-version ALL --region us-east-1 --output text --query "Functions[?Runtime=='java8'].FunctionArn"

What do I need to do?

If you are using the java8 runtime, your functions will be updated automatically. For production workloads, we recommend that you test functions in advance for compatibility with Amazon Corretto 8.

For Lambda functions using container images, the existing container base image will be updated to use the Amazon Corretto Java implementation. You must manually update your functions to use the updated container base image.

How can I test for compatibility with Amazon Corretto 8?

If you are using the java8 managed runtime, you can test functions with the new version of the runtime by adding the layer reference arn:aws:lambda:::awslayer:Java8Corretto to the function configuration. This layer instructs the Lambda service to use the Amazon Corretto implementation of Java 8. It does not contain any data or code.

If you are using container images, update the JVM in your image to Amazon Corretto for testing. Here is an example Dockerfile:


# Update the JVM to the latest Corretto version
## Import the Corretto public key
rpm --import

## Add the Corretto yum repository to the system list
curl -L -o /etc/yum.repos.d/corretto.repo

## Install the latest version of Corretto 8
yum install -y java-1.8.0-amazon-corretto-devel

# Copy function code and runtime dependencies from Gradle layout
COPY build/classes/java/main ${LAMBDA_TASK_ROOT}
COPY build/dependency/* ${LAMBDA_TASK_ROOT}/lib/

# Set the CMD to your handler
CMD [ "com.example.LambdaHandler::handleRequest" ]

Can I continue to use the OpenJDK version of Java 8?

You can continue to use the OpenJDK version of Java 8 by adding the layer reference arn:aws:lambda:::awslayer:Java8OpenJDK to the function configuration. This layer tells the Lambda service to use the OpenJDK implementation of Java 8. It does not contain any data or code.

This option gives you more time to address any code incompatibilities with Amazon Corretto 8. We do not recommend that you use this option to continue to use Lambda’s OpenJDK Java implementation in the long term. Following this migration, it will no longer receive bug fix and security updates. After addressing any compatibility issues, remove this layer reference so that the function uses the Lambda-Amazon Corretto managed implementation of Java 8.

What are the known differences between OpenJDK 8 and Amazon Corretto 8 in Lambda?

Amazon Corretto caches TCP sessions for longer than OpenJDK 8. Functions that create new connections (for example, new AWS SDK clients) on each invoke without closing them may experience an increase in memory usage. In the worst case, this could cause the function to consume all the available memory, which results in an invoke error and a subsequent cold start.

We recommend that you do not create AWS SDK clients in your function handler on every function invocation. Instead, create SDK clients outside the function handler as static objects that can be used by multiple invocations. For more information, see static initialization in the Lambda Operator Guide.

If you must use a new client on every invocation, make sure it is shut down at the end of every invocation. This avoids TCP session caches using unnecessary resources.

What if I need additional help?

Contact AWS Support, the AWS Lambda discussion forums, or your AWS account team if you have any questions or concerns.

For more serverless learning resources, visit Serverless Land.

Building serverless applications with streaming data: Part 2

Post Syndicated from James Beswick original

Part 1 introduces the Alleycat application that allows bike racers to compete with each other virtually on home exercise bikes. I explain the application’s functionality, how to deploy to your AWS account, and provide an architectural review.

This series is about building serverless solutions in streaming data workloads. These are traditionally challenging to build, since data can be streamed from thousands or even millions of devices continuously.

In the example scenario, there are 40,000 users and up to 1,000 competitors may race at any given time. The workload must continuously ingest and buffer this data, then process and analyze the information to provide analytics and leaderboard content for the frontend application.

In this post, I focus on data ingestion. I compare the two different methods used in Alleycat, and discuss other approaches available. This post refers to Amazon Kinesis Data Streams, the AWS SDK, and AWS IoT Core in the solutions.

To set up the example, visit the GitHub repo and follow the instructions in the file. Note that this walkthrough uses services that are not covered by the AWS Free Tier and incur cost.

Using AWS IoT Core to ingest streaming data

AWS IoT Core enables publish-subscribe capabilities for large numbers of client applications. Clients can send data to the backend using the AWS IoT Device SDK, which uses the MQTT standard for IoT messaging. After processing, the backend can publish aggregation and status messages back to the frontend via AWS IoT Core. This service fans out the messages to clients using topics.

When using this approach, note the Quality of Service (QoS) options available. By default, the SDK uses QoS level 0, which means the device does not confirm the message is received. This is intended for workloads that can lose messages occasionally without impacting performance. In Alleycat, if performance metrics are sometimes lost, this does not likely impact the overall end user experience.

For workloads requiring higher reliability, use QoS level 1, which causes the SDK to resend the message until an acknowledgement is received. While there is no additional charge for using QoS level 1, it generally increases the number of messages, which increases the overall cost. You are not charged for the PUBACK acknowledgement message – for more details, read more about AWS IoT Core pricing.


In this scenario, the Alleycat frontend application is running on a physical exercise bike. The user selects a racer ID and exercise class and chooses Start Race to join the current virtual race for that class.

Start race UI

Every second, the frontend sends a message containing the cadence and resistance metrics and the current second in the race for the local racer. This message is created as a JSON object in the Home.vue component and sent to the ‘alleycat-publish’ topic:

      const message = {
        uuid: uuidv4(),
        event: this.event,
        second: this.currentSecond,
        raceId: RACE_ID,
        classId: this.selectedClassId,
        cadence: this.racer.getCurrentCadence(),
        resistance: this.racer.getCurrentResistance

The IoT.vue component contains the logic for this integration and uses the AWS IoT Device SDK to send and receive messages. On startup, the frontend connects to AWS IoT Core and publishes the messages using an MQTT client:

    bus.$on('publish', (data) => {
      console.log('Publish: ', data)
      mqttClient.publish(topics.publish, JSON.stringify(data))

The SDK automatically attempts to retry in the event of a network disconnection and exposes an error handler to allow custom logic if other errors occur.


The resources used in the backend are defined using the AWS Serverless Application Model (AWS SAM) and configured in the core setup templates:

Reference architecture

Messages are published to topics in AWS IoT Core, which act as channels of interest. The message broker uses topic names and topic filters to route messages between publishers and subscribers. Incoming messages are routed using rules. Alleycat’s IoT rule routes all incoming messages to a Kinesis stream:

    Type: AWS::IoT::TopicRule
      RuleName: 'alleycatIngest'
        RuleDisabled: 'false'
        Sql: "SELECT * FROM 'alleycat-publish'"
        - Kinesis:
            StreamName: 'alleycat'
            PartitionKey: "${timestamp()}"
            RoleArn: !GetAtt IoTKinesisRole.Arn

    Type: AWS::IAM::Role
        Version: "2012-10-17"
          - Effect: Allow
              - 'sts:AssumeRole'
      Path: /
        - PolicyName: IoTKinesisPutPolicy
            Version: "2012-10-17"
              - Effect: Allow
                Action: 'kinesis:PutRecord'
                Resource: !GetAtt KinesisStream.Arn

Using the AWS::IoT::TopicRule resource, you can optionally define an error action. This allows you to store messages in a durable location, such as an Amazon S3 bucket, if an error occurs. Errors can occur if a rule does not have permission to access a destination or throttling occurs in a target.

Rules can route matching messages to up to 10 targets. For debugging purposes, you can also enable Amazon CloudWatch Logs, which can help in troubleshoot failed message deliveries. The AWS IoT Core Message Broker allows up to 20,000 publish requests per second – if you need a higher limit for your workload, submit a request to AWS Support.

Using the AWS SDK to ingest streaming data

The Alleycat frontend creates traffic for a single user but there is also a simulator application that can generate messages for up to 1,000 riders. Instead of routing messages using an MQTT client, the simulator uses the AWS SDK to put messages directly into the Kinesis data stream.

The SDK provides a service interface object for Kinesis and two API methods for putting messages into streams: putRecord and putRecords. The first option accepts only a single message but the second enables batching of up to 500 messages per request. This is the preferred option for adding multiple messages, compared with calling putRecord multiple times.

The putRecords API takes parameters as a JSON array of messages:

const params = {
   StreamName: 'alley-cat',

The SDK automatically base64 encodes the Data attribute, which in this case is the JSON string output from JSON.stringify. In the JavaScript SDK, the putRecords API can return a promise, allowing the code to await the operation:

const result = await kinesis.putRecords(params).promise()

Shards and partition keys

Kinesis data streams consist of one or more shards, which are sequences of data records with a fixed capacity. Each shard can support up to 1,000 records per second for writes, up to maximum total data write rate of 1MB per second. The total capacity of a stream is the total of its shards.

When you send messages to a stream, the partitionKey attribute determines which shard it is routed to. The example application configures a Kinesis data stream with a single shard so the partitionKey attribute has no effect – all messages are routed to the same shard. However, many production applications have more than one shard and use the partitionKey to assign messages to shards.

The partitionKey is hashed by the Kinesis service to route to a shard. This diagram shows how partitionKey values from data producers are hashed by an MD5 function and mapped to individual shards:

MD5 hash process

While you cannot designate a specific shard ID in a message, you can influence the assignment depending on your choice of partitionKey:

  • Random: Using a randomized value results in random hash so messages are randomly sent to different shards. This effectively load balances messages across all available shards.
  • Time-based: A timestamp value may cause groups of messages sent to a single shard, if the messages arrive at the same time. The identical timestamp results in an identical hash.
  • Application-specific: if Alleycat used the classID as a partitionKey, racers in each class would always be routed to the same shard. This could be useful for downstream aggregation logic but would limit the capacity of messages per classID.

Optimizing capacity in a shard

Each shard can ingest data at a rate of 1 MB per second or 1,000 records per second, whichever limit is reached first. Since the payload maximum is 1MB, this could equate to one 1MB message per second. If the payload is larger, you must divide it into smaller pieces to avoid an error. For 1,000 messages, each payload must be under 1 KB on average to fit within the allowed capacity.

The combination of the two payload limits can result in different capacity profiles for a shard:

Capacity profiles in a shard

  1. The data payloads are evenly sized and use the 1 MB per second capacity.
  2. Data payload sizes vary, so the number of messages that can be packed into 1 MB varies per second.
  3. There are a large number of very messages, consuming all 1,000 messages per second. However, the total data capacity used is significantly less than 1 MB.

In the Alleycat application, the average payload size is around 170 bytes. When producing 1,000 messages a second, the workload is only using about 20% of the 1 MB per second limit. Since PUT payload size is a factor in Kinesis pricing, messages that are much smaller than 25 KB are less cost-efficient. Compare these two messaging patterns for the Alleycat application:

Producer message patterns

  1. In this default mode, a smaller message is published once per second. This reduces overall latency but results in higher overall messaging cost.
  2. The client application batches outgoing messages and sends to Kinesis every 5 seconds. This results in lower cost and better packing of messages, but introduces additional latency.

There is a tradeoff between cost and latency when optimizing a shard’s capacity and the decision depends upon the needs of your workload. If the client buffers messages, this adds latency on the client side. This is acceptable in many workloads that collect metrics for archival or asynchronous reporting purchases. However, for low-latency applications like Alleycat, it provides a better experience for the application user to send messages as soon as they are available.


This post focuses on ingesting data into Kinesis Data Streams. I explain the two approaches used by the Alleycat frontend and the simulator application and highlight other approaches that you can use. I show how messages are routed to shards using partition keys. Finally, I explore additional factors to consider when ingesting data, to improve efficiency and reduce cost.

Part 3 covers using Amazon Kinesis Data Firehose for transforming, aggregating, and loading streaming data into data stores. This is used to provide the historical, second-by-second leaderboard for the frontend application.

For more serverless learning resources, visit Serverless Land.

Using API destinations with Amazon EventBridge

Post Syndicated from James Beswick original

Amazon EventBridge enables developers to route events between AWS services, integrated software as a service (SaaS) applications, and your own applications. It can help decouple applications and produce more extensible, maintainable architectures. With the new API destinations feature, EventBridge can now integrate with services outside of AWS using REST API calls.

API destinations architecture

This feature enables developers to route events to existing SaaS providers that integrate with EventBridge, like Zendesk, PagerDuty, TriggerMesh, or MongoDB. Additionally, you can use other SaaS endpoints for applications like Slack or Contentful, or any other type of API or webhook. It can also provide an easier way to ingest data from serverless workloads into Splunk without needing to modify application code or install agents.

This blog post explains how to use API destinations and walks through integration examples you can use in your workloads.

How it works

API destinations are third-party targets outside of AWS that you can invoke with an HTTP request. EventBridge invokes the HTTP endpoint and delivers the event as a payload within the request. You can use any preferred HTTP method, such as GET or POST. You can use input transformers to change the payload format to match your target.

An API destination uses a Connection to manage the authentication credentials for a target. This defines the authorization type used, which can be an API key, OAuth client credentials grant, or a basic user name and password.

Connection details

The service manages the secret in AWS Secrets Manager and the cost of storing the secret is included in the pricing for API destinations.

You create a connection to each different external API endpoint and share the connection with multiple endpoints. The API destinations console shows all configured connections, together with their authorization status. Any connections that cannot be established are shown here:

Connections list

To create an API destination, you provide a name, the HTTP endpoint and method, and the connection:

Create API destination

When you configure the destination, you must also set an invocation rate limit between 1 and 300 events per second. This helps protect the downstream endpoint from surges in traffic. If the number of arriving events exceeds the limit, the EventBridge service queues up events. It delivers to the endpoint as quickly as possible within the rate limit.

It continues to do this for 24 hours. To make sure you retain any events that cannot be delivered, set up a dead-letter queue on the event bus. This ensures that if the event is not delivered within this timeframe, it is stored durably in an Amazon SQS queue for further processing. This can also be useful if the downstream API experiences an outage for extended periods of time.

Throttling and retries

Once you have configured the API destination, it becomes available in the list of targets for rules. Matching events are sent to the HTTP endpoint with the event serialized as part of the payload.

Select targets

As with API Gateway targets in EventBridge, the maximum timeout for API destination is 5 seconds. If an API call exceeds this timeout, it is retried.

Debugging the payload from API destinations

You can send an event via API Destinations to debugging tools like to view the headers and payload of the API call:

  1. Create a connection with the credential, such as an API key.Connection details
  2. Create an API destination with the webhook URL endpoint and then create a rule to match and route events.Target configuration
  3. The testing service shows the headers and payload once the webhook is triggered. This can help you test rules if you are adding headers or manipulating the payload using an input testing service

Customizing the payload

Third-party APIs often require custom headers or payload formats when accepting data. EventBridge rules allow you to customize header parameters, query strings, and payload formats without the need for custom code. Header parameters and query strings can be configured with static values or attributes from the event:

Header parameters

To customize the payload, configure an input transformer, which consists of an Input Path and Input template. You use an Input Path to define variables and use JSONPath query syntax to identify the variable source in the event. For example, to extract two attributes from an Amazon S3 PutObject event, the Input Path is:

  "key" : "$.[0].s3.object.key", 
  "bucket" : " $.[0] "

Next, the Input template defines the structure of the data passed to the target, which references the variables. With this release you can now use variables inside quotes in the input transformer. As a result, you can pass these values as a string or JSON, for example:

  "filename" : "<key>", 
  "container" : "mycontainer-<bucket>"

Sending AWS events to DataDog

Using API destinations, you can send any AWS-sourced event to third-party services like DataDog. This approach uses the DataDog API to put data into the service. To do this, you must sign up for an account and create an API key. In this example, I send S3 events via CloudTrail to DataDog for further analysis.

  1. Navigate to the EventBridge console, select API destinations from the menu.
  2. Select the Connections tab and choose Create connection:Connections UI
  3. Enter a connection name, then select API Key for Authorization Type. Enter the API key name DD-API-KEY and paste your secret API key as the value. Choose Create.Create new connection UI
  4. In the API destinations tab, choose Create API destination.Create API destination UI
  5. Enter a name, set the API destination endpoint to, and the HTTP method to POST. Enter 300 for Invocation rate limit and select the DataDog connection from the dropdown. Choose Create.API destination detail UI
  6. From the EventBridge console, select Rules and choose Create rule. Enter a name, select the default bus, and enter this event pattern:
      "source": ["aws.s3"]
  7. In Select targets, choose API destination and select DataDog for the API destination. Expand, Configure Input.
  8. In the Input transformer section, enter {"detail":"$.detail"} in the Input Path field and enter {"message": <detail>} in the Input Template.Select targets UI
  9. You can optionally add a dead letter queue. To do this, open the Retry policy and dead-letter queue section. Under Dead-letter queue, select an existing SQS queue.
    Retry policy and DLQ
  10. Choose Create.
  11. Open AWS CloudShell and upload an object to an S3 bucket in your account to trigger an event:
    echo "test" > testfile.txt
    aws s3 cp testfile.txt s3://YOUR_BUCKET_NAME

    CloudShell output

  12. The logs appear in the DataDog Logs console, where you can process the raw data for further analysis:Datadog Logs console

Sending AWS events to Zendesk

Zendesk is a SaaS provider that provides customer support solutions. It can already send events to EventBridge using a partner integration. This post shows how you can consume ticket events from Zendesk and run a sentiment analysis using Amazon Comprehend.

With API destinations, you can now use events to call the Zendesk API to create and modify tickets and interact with chats and customer profiles.

To create an API destination for Zendesk:

  1. Log in with an existing Zendesk account or register for a trial account.
  2. Navigate to the EventBridge console, select API destinations from the menu and choose Create API destination.
  3. On the Create API destination, page:
    1. Enter a name for the destination (e.g. “SendToZendesk”).
    2. For API destination endpoint, enter https://<<your-subdomain>>
    3. For HTTP method, select POST.
    4. For Invocation rate, enter 10.
  4. In the connection section:
    1. Select the Create a new connection radio button.
    2. For Connection name, enter ZendeskConnection.
    3. For Authorization type, select Basic (Username/Password).
    4. Enter your Zendesk username and password.
  5. Choose Create.
    Connection configuration with basic auth

When you create a rule to route to this API destination, use the Input transformer to build the defined JSON payload, as shown in the previous DataDog example. When an event matches the rule, EventBridge calls the Zendesk Create Ticket API. The new ticket appears in the Zendesk dashboard:

Zendesk dashboard

For more information on the Zendesk API, visit the Zendesk Developer Portal.

Building an integration with AWS CloudFormation and AWS SAM

To support this new feature, there are two new AWS CloudFormation resources available. These can also be used in AWS Serverless Application Model (AWS SAM) templates:

The connection resource defines the connection credential and optional invocation HTTP parameters:

    Type: AWS::Events::Connection
      AuthorizationType: API_KEY
      Description: 'My connection with an API key'
          ApiKeyName: VHS
          ApiKeyValue: Testing
          - Key: 'my-integration-key'
            Value: 'ABCDEFGH0123456'

    Value: !Ref TestConnection
    Value: !GetAtt TestConnection.Arn

The API destination resource provides the connection, endpoint, HTTP method, and invocation limit:

    Type: AWS::Events::ApiDestination
      Name: 'datadog-target'
      ConnectionArn: arn:aws:events:us-east-1:123456789012:connection/datadogConnection/2
      InvocationEndpoint: ''
      HttpMethod: POST
      InvocationRateLimitPerSecond: 300

    Value: !Ref TestApiDestination
    Value: !GetAtt TestApiDestination.Arn
    Value: !GetAtt TestApiDestination.SecretArn

You can use the existing AWS::Events::Rule resource to configure an input transformer for API destination targets:

  Type: AWS::Events::Rule
        - "EventsForMyAPIdestination"
    State: "ENABLED"
        Arn: !Ref TestApiDestinationArn
            detail: $.detail
          InputTemplate: >
                    "message": <detail>


The API destinations feature of EventBridge enables developers to integrate workloads with third-party applications using REST API calls. This provides an easier way to build decoupled, extensible applications that work with applications outside of the AWS Cloud.

To use this feature, you configure a connection and an API destination. You can use API destinations in the same way as existing targets for rules, and also customize headers, query strings, and payloads in the API call.

Learn more about using API destinations with the following SaaS providers: DatadogFreshworks, MongoDB, TriggerMesh, and Zendesk.

For more serverless learning resources, visit Serverless Land.

Using AWS X-Ray tracing with Amazon EventBridge

Post Syndicated from James Beswick original

AWS X-Ray allows developers to debug and analyze distributed applications. It can be useful for tracing transactions through microservices architectures, such as those typically used in serverless applications. Amazon EventBridge allows you to route events between AWS services, integrated software as a service (SaaS) applications, and your own applications. EventBridge can help decouple applications and produce more extensible, maintainable architectures.

EventBridge now supports trace context propagation for X-Ray, which makes it easier to trace transactions through event-based architectures. This means you can potentially trace a single request from an event producer through to final processing by an event consumer. These may be decoupled application stacks where the consumer has no knowledge of how the event is produced.

This blog post explores how to use X-Ray with EventBridge and shows how to implement tracing using the example application in this GitHub repo.

How it works

X-Ray works by adding a trace header to requests, which acts as a unique identifier. In the case of a serverless application using multiple AWS services, this allows X-Ray to group service interactions together as a single trace. X-Ray can then produce a service map of the transaction flow or provide the raw data for a trace:

X-Ray service map

When you send events to EventBridge, the service uses rules to determine how the events are routed from the event bus to targets. Any event that is put on an event bus with the PutEvents API can now support trace context propagation.

The trace header is provided as internal metadata to support X-Ray tracing. The header itself is not available in the event when it’s delivered to a target. For developers using the EventBridge archive feature, this means that a trace ID is not available for replay. Similarly, it’s not available on events sent to a dead-letter queue (DLQ).

Enabling tracing with EventBridge

To enable tracing, you don’t need to change the event structure to add the trace header. Instead, you wrap the AWS SDK client in a call to AWSXRay.captureAWSClient and grant IAM permissions to allow tracing. This enables X-Ray to instrument the call automatically with the X-Amzn-Trace-Id header.

For code using the AWS SDK for JavaScript, this requires changes to the way that the EventBridge client is instantiated. Without tracing, you declare the AWS SDK and EventBridge client with:

const AWS = require('aws-sdk')
const eventBridge = new AWS.EventBridge()

To use tracing, this becomes:

const AWSXRay = require('aws-xray-sdk')
const AWS = AWSXRay.captureAWS(require('aws-sdk'))
const eventBridge = new AWS.EventBridge()

The interaction with the EventBridge client remains the same but the calls are now instrumented by X-Ray. Events are put on the event bus programmatically using a PutEvents API call. In a Node.js Lambda function, the following code processes an event to send to an event bus, with tracing enabled:

const AWSXRay = require('aws-xray-sdk')
const AWS = AWSXRay.captureAWS(require('aws-sdk'))
const eventBridge = new AWS.EventBridge()

exports.handler = async (event) => {

  let myDetail = { "name": "Alice" }

  const myEvent = { 
    Entries: [{
      Detail: JSON.stringify({ myDetail }),
      DetailType: 'myDetailType',
      Source: 'myApplication',
      Time: new Date

  // Send to EventBridge
  const result = await eventBridge.putEvents(myEvent).promise()

  // Log the result
  console.log('Result: ', JSON.stringify(result, null, 2))

You can also define a custom tracing header using the new TraceHeader attribute on the PutEventsRequestEntry API model. The unique value you provide overrides any trace header on the HTTP header. The value is also validated by X-Ray and discarded if it does not pass validation. See the X-Ray Developer Guide to learn about generating valid trace headers.

Deploying the example application

The example application consists of a webhook microservice that publishes events and target microservices that consume events. The generated event contains a target attribute to determine which target receives the event:

Example application architecture

To deploy these microservices, you must have the AWS SAM CLI and Node.js 12.x installed. to To complete the deployment, follow the instructions in the GitHub repo.

EventBridge can route events to a broad range of target services in AWS. Targets that support active tracing for X-Ray can create comprehensive traces from the event source. The services offering active tracing are AWS Lambda, AWS Step Functions, and Amazon API Gateway. In each case, you can trace a request from the producer to the consumer of the event.

The GitHub repo contains examples showing how to use active tracing with EventBridge targets. The webhook application uses a query string parameter called target to determine which events are routed to these targets.

For X-Ray to detect each service in the webhook, tracing must be enabled on both the API Gateway stage and the Lambda function. In the AWS SAM template, the Tracing: Active property turns on active tracing for the Lambda function. If an IAM role is not specified, the AWS SAM CLI automatically adds the arn:aws:iam::aws:policy/AWSXrayWriteOnlyAccess policy to the Lambda function’s execution role. For the API definition, adding TracingEnabled: True enables tracing for this API stage.

When you invoke the webhook’s API endpoint, X-Ray generates a trace map of the request, showing each of the services from the REST API call to putting the event on the bus:

X-Ray trace map with EventBridge

The CloudWatch Logs from the webhook’s Lambda function shows the event that has been put on the event bus:

CloudWatch Logs from a webhook

Tracing with a Lambda target

In the targets-lambda example application, the Lambda function uses the X-Ray SDK and has active tracing enabled in the AWS SAM template:

    Type: AWS::Serverless::Function
      CodeUri: src/
      Handler: app.handler
      MemorySize: 128
      Timeout: 3
      Runtime: nodejs12.x
      Tracing: Active

With these two changes, the target Lambda function propagates the tracing header from the original webhook request. When the webhook API is invoked, the X-Ray trace map shows the entire request through to the Lambda target. X-Ray shows two nodes for Lambda – one is the Lambda service and the other is the Lambda function invocation:

Downstream service node in service map

Tracing with an API Gateway target

Currently, active tracing is only supported by REST APIs but not HTTP APIs. You can enable X-Ray tracing from the AWS CLI or from the Stages menu in the API Gateway console, in the Logs/Tracing tab:

Enable X-Ray tracing in API Gateway

You cannot currently create an API Gateway target for EventBridge using AWS SAM. To invoke an API endpoint from the EventBridge console, create a rule and select the API as a target. The console automatically creates the necessary IAM permissions for EventBridge to invoke the endpoint.

Setting API Gateway as an EventBridge target

If the API invokes downstream services with active tracing available, these services also appear as nodes in the X-Ray service graph. Using the webhook application to invoke the API Gateway target, the trace shows the entire request from the initial API call through to the second API target:

API Gateway node in X-Ray service map

Tracing with a Step Functions target

To enable tracing for a Step Functions target, the state machine must have tracing enabled and have permissions to write to X-Ray. The AWS SAM template can enable tracing, define the EventBridge rule and the AWSXRayDaemonWriteAccess policy in one resource:

    Type: AWS::Serverless::StateMachine
      DefinitionUri: definition.asl.json
        LoggerFunctionArn: !GetAtt LoggerFunction.Arn
        Enabled: True
          Type: EventBridgeRule
                - !Sub '${AWS::AccountId}'
                - !Ref EventSource
                    - 'sfn'

        - AWSXRayDaemonWriteAccess
        - LambdaInvokePolicy:
            FunctionName: !Ref LoggerFunction

If the state machine uses services that support active tracing, these also appear in the trace map for individual requests. Using the webhook to invoke this target, X-Ray now shows the request trace to the state machine and the Lambda function it contains:

Step Functions in X-Ray service map

Adding X-Ray tracing to existing Lambda targets

To wrap the SDK client, you must enable active tracing and include the AWS X-Ray SDK in the Lambda function’s deployment package. Unlike the AWS SDK, the X-Ray SDK is not included in the Lambda execution environment.

Another option is to include the X-Ray SDK as a Lambda layer. You can build this layer by following the instructions in the GitHub repo. Once deployed, you can attach the X-Ray layer to any Lambda function either via the console or the CLI:

Adding X-Ray tracing a Lambda function

To learn more about using Lambda layers, read “Using Lambda layers to simplify your development process”.


X-Ray is a powerful tool for providing observability in serverless applications. With the launch of X-Ray trace context propagation in EventBridge, this allows you to trace requests across distributed applications more easily.

In this blog post, I walk through an example webhook application with three targets that support active tracing. In each case, I show how to enable tracing either via the console or using AWS SAM and show the resulting X-Ray trace map.

To learn more about how to use tracing with events, read the X-Ray Developer Guide or see the Amazon EventBridge documentation for this feature.

For more serverless learning resources, visit Serverless Land.

Operating Lambda: Building a solid security foundation – Part 2

Post Syndicated from James Beswick original

In the Operating Lambda series, I cover important topics for developers, architects, and systems administrators who are managing AWS Lambda-based applications. This two-part series discusses core security concepts for Lambda-based applications.

Part 1 explains the Lambda execution environment and how to apply the principles of least privilege to your workload. This post covers securing workloads with public endpoints, encrypting data, and using AWS CloudTrail for governance, compliance, and operational auditing.

Securing workloads with public endpoints

For workloads that are accessible publicly, AWS provides a number of features and services that can help mitigate certain risks. This section covers authentication and authorization of application users and protecting API endpoints.

Authentication and authorization

Authentication relates to identity and authorization refers to actions. Use authentication to control who can invoke a Lambda function, and then use authorization to control what they can do. For many applications, AWS Identity & Access Management (IAM) is sufficient for managing both control mechanisms.

For applications with external users, such as web or mobile applications, it is common to use JSON Web Tokens (JWTs) to manage authentication and authorization. Unlike traditional, server-based password management, JWTs are passed from the client on every request. They are a cryptographically secure way to verify identity and claims using data passed from the client. For Lambda-based applications, this allows you to secure APIs for each microservice independently, without relying on a central server for authentication.

You can implement JWTs with Amazon Cognito, which is a user directory service that can handle registration, authentication, account recovery, and other common account management operations. For frontend development, Amplify Framework provides libraries to simplify integrating Cognito into your frontend application. You can also use third-party partner services like Auth0.

Given the critical security role of an identity provider service, it’s important to use professional tooling to safeguard your application. It’s not recommended that you write your own services to handle authentication or authorization. Any vulnerabilities in custom libraries may have significant implications for the security of your workload and its data.

Protecting API endpoints

For serverless applications, the preferred way to serve a backend application publicly is to use Amazon API Gateway. This can help you protect an API from malicious users or spikes in traffic.

For authenticated API routes, API Gateway offers both REST APIs and HTTP APIs for serverless developers. Both types support authorization using AWS Lambda, IAM or Amazon Cognito. When using IAM or Amazon Cognito, incoming requests are evaluated and if they are missing a required token or contain invalid authentication, the request is rejected. You are not charged for these requests and they do not count towards any throttling quotas.

Unauthenticated API routes may be accessed by anyone on the public internet so it’s recommended that you limit their use. If you must use unauthenticated APIs, it’s important to protect these against common risks, such as denial-of-service (DoS) attacks. Applying AWS WAF to these APIs can help protect your application from SQL injection and cross-site scripting (XSS) attacks. API Gateway also implements throttling at the AWS account-level and per-client level when API keys are used.

In some cases, the functionality provided by an unauthenticated API can be achieved with an alternative approach. For example, a web application may provide a list of customer retail stores from an Amazon DynamoDB table to users who are not logged in. This request may originate from a frontend web application or from any other source that calls the URL endpoint. This diagram compares three solutions:

Solutions for an unauthenticated API

  1. The unauthenticated API can be called by anyone on the internet. In a denial of service attack, it’s possible to exhaust API throttling limits, Lambda concurrency, or DynamoDB provisioned read capacity on an underlying table.
  2. An Amazon CloudFront distribution in front of the API endpoint with an appropriate time-to-live (TTL) configuration may help absorb traffic in a DoS attack, without changing the underlying solution for fetching the data.
  3. Alternatively, for static data that rarely changes, the CloudFront distribution could serve the data from an S3 bucket.

The AWS Well-Architected Tool provides a Serverless Lens that analyzes the security posture of serverless workloads.

Encrypting data in Lambda-based applications

Managing secrets

For applications handling sensitive data, AWS services provide a range of encryption options for data in transit and at rest. It’s important to identity and classify sensitive data in your workload, and minimize the storage of sensitive data to only what is necessary.

When protecting data at rest, use AWS services for key management and encryption of stored data, secrets and environment variables. Both the AWS Key Management Service and AWS Secrets Manager provide a robust approach to storing and managing secrets used in Lambda functions.

Do not store plaintext secrets or API keys in Lambda environment variables. Instead, use KMS to encrypt environment variables. Also ensure you do not embed secrets directly in function code, or commit these secrets to code repositories.

Using HTTPS securely

HTTPS is encrypted HTTP, using TLS (SSL) to encrypt the request and response, including headers and query parameters. While query parameters are encrypted, URLs may be logged by different services in plaintext, so you should not use these to store sensitive data such as credit card numbers.

AWS services make it easier to use HTTPS throughout your application and it is provided by default in services like API Gateway. Where you need an SSL/TLS certificate in your application, to support features like custom domain names, it’s recommended that you use AWS Certificate Manager (ACM). This provides free public certificates for ACM-integrated services and managed certificate renewal.

Governance controls with AWS CloudTrail

For compliance and operational auditing of application usage, AWS CloudTrail logs activity related to your AWS account usage. It tracks resource changes and usage, and provides analysis and troubleshooting tools. Enabling CloudTrail does not have any negative performance implications for your Lambda-based application, since the logging occurs asynchronously.

Separate from application logging (see chapter 4), CloudTrail captures two types of events:

  • Control plane: These events apply to management operations performed on any AWS resources. Individual trails can be configured to capture read or write events, or both.
  • Data plane: Events performed on the resources, such as when a Lambda function is invoked or an S3 object is downloaded.

For Lambda, you can log who creates and invokes functions, together with any changes to IAM roles. You can configure CloudTrail to log every single activity by user, role, service, and API within an AWS account. The service is critical for understanding the history of changes made to your account and also detecting any unintended changes or suspicious activity.

To research which AWS user interacted with a Lambda function, CloudTrail provides an audit log to find this information. For example, when a new permission is added to a Lambda function, it creates an AddPermission record. You can interpret the meaning of individual attributes in the JSON message by referring to the CloudTrail Record Contents documentation.

CloudTrail Record Contents documentation

CloudTrail data is considered sensitive so it’s recommended that you protect it with KMS encryption. For any service processing encrypted CloudTrail data, it must use an IAM policy with kms:Decrypt permission.

By integrating CloudTrail with Amazon EventBridge, you can create alerts in response to certain activities and respond accordingly. With these two services, you can quickly implement an automated detection and response pattern, enabling you to develop mechanisms to mitigate security risks. With EventBridge, you can analyze data in real-time, using event rules to filter events and forward to targets like Lambda functions or Amazon Kinesis streams.

CloudTrail can deliver data to Amazon CloudWatch Logs, which allows you to process multi-Region data in real time from one location. You can also deliver CloudTrail to Amazon S3 buckets, where you can create event source mappings to start data processing pipelines, run queries with Amazon Athena, or analyze activity with Amazon Macie.

If you use multiple AWS accounts, you can use AWS Organizations to manage and govern individual member accounts centrally. You can set an existing trail as an organization-level trail in a primary account that can collect events from all other member accounts. This can simplify applying consistent auditing rules across a large set of existing accounts, or automatically apply rules to new accounts. To learn more about this feature, see Creating a Trail for an Organization.


In this blog post, I explain how to secure workloads with public endpoints and the different authentication and authorization options available. I also show different approaches to exposing APIs publicly.

CloudTrail can provide compliance and operational auditing for Lambda usage. It provides logs for both the control plane and data plane. You can integrate CloudTrail with EventBridge to create alerts in response to certain activities. Customers with multiple AWS accounts can use AWS Organizations to manage trails centrally.

For more serverless learning resources, visit Serverless Land.

Building a serverless multi-player game that scales

Post Syndicated from James Beswick original

This post is written by Tim Bruce, Sr. Solutions Architect, Developer Acceleration.

Game development is a highly iterative process with rapidly changing requirements. Many game developers want to maximize the time spent building features and less time configuring servers, managing infrastructure, and mastering scale.

AWS Serverless provides four key benefits for customers. First, it can help move from idea to market faster, by reducing operational overhead. Second, customers may realize lower costs with serverless by not over-provisioning hardware and software to operate. Third, serverless scales with user activity. Finally, serverless services provide built-in integration, allowing you to focus on your game instead of connecting pieces together.

For AWS Gaming customers, these benefits allow your teams to spend more time focusing on gameplay and content, instead of undifferentiated tasks such as setting up and maintaining servers and software. This can result in better gameplay and content, and a faster time-to-market.

This blog post introduces a game with a serverless-first architecture. Simple Trivia Service is a web-based game showing architectural patterns that you can apply in your own games.

Introducing the Simple Trivia Service

The Simple Trivia Service offers single- and multi-player trivia games with content created by players. There are many features in Simple Trivia Service found in games, such as user registration, chat, content creation, leaderboards, game play, and a marketplace.

Simple Trivia Service UI

Authenticated players can chat with other players, create and manage quizzes, and update their profile. They can play single- and multi-player quizzes, host quizzes, and buy and sell quizzes on the marketplace. The single- and multi-player game modes show how games with different connectivity and technical requirements can be delivered with serverless first architectures. The game modes and architecture solutions are covered in the Simple Trivia Service backend architecture section.

Simple Trivia Service front end

The Simple Trivia Service front end is a Vue.js single page application (SPA) that accesses backend services. The SPA app, accessed via a web browser, allows users to make requests to the game endpoints using HTTPS, secure WebSockets, and WebSockets over MQTT. These requests use integrations to access the serverless backend services.

Vue.js helps make this reference architecture more accessible. The front end uses AWS Amplify to build, deploy, and host the SPA without the need to provision and manage any resources.

Simple Trivia Service backend architecture

The backend architecture for Simple Trivia Service is defined in a set of AWS Serverless Application Model (AWS SAM) templates for portions of the game. A deployment guide is included in the file in the GitHub repository. Here is a visual depiction of the backend architecture.

Reference architecture

Services used

Simple Trivia Service is built using AWS Lambda, Amazon API Gateway, AWS IoT, Amazon DynamoDB, Amazon Simple Notification Service (SNS), AWS Step Functions, Amazon Kinesis, Amazon S3, Amazon Athena, and Amazon Cognito:

  • Lambda enables serverless microservice features in Simple Trivia Service.
  • API Gateway provides serverless endpoints for HTTP/RESTful and WebSocket communication while IoT delivers a serverless endpoint for WebSockets over MQTT communication.
  • DynamoDB enables data storage and retrieval for internet-scale applications.
  • SNS provides microservice communications via publish/subscribe functionality.
  • Step Functions coordinates complex tasks to ensure appropriate outcomes.
  • Analytics for Simple Trivia Service are delivered via Kinesis and S3 with Athena providing a query/visualization capability.
  • Amazon Cognito provides secure, standards-based login and a user directory.

Two managed services that are not serverless, Amazon VPC NAT Gateway and Amazon ElastiCache for Redis, are also used. VPC NAT Gateway is required by VPC-enabled Lambda functions to reach services outside of the VPC, such as DynamoDB. ElastiCache provides an in-memory database suited for applications with submillisecond latency requirements.

User security and enabling communications to backend services

Players are required to register and log in before playing. Registration and login credentials are sent to Amazon Cognito using the Secure Remote Password protocol. Upon successfully logging in, Amazon Cognito returns a JSON Web Token (JWT) and an Amazon Cognito user session.

The JWT is included within requests to API Gateway, which validates the token before allowing the request to be forwarded to Lambda.

IoT requires additional security for users by using an AWS Identity and Access Management (IAM) policy. A policy attached to the Amazon Cognito user allows the player to connect, subscribe, and send messages to the IoT endpoint.

Game types and supporting architectures

Simple Trivia Service’s three game modes define how players interact with the backend services. These modes align to different architectures used within the game.

“Single Player” quiz architecture

“Single Player” quiz architecture

Single player quizzes have simple rules, short play sessions, and appeal to wide audiences. Single player game communication is player-to-endpoint only. This is accomplished with API Gateway via an HTTP API.

Four Lambda functions (ActiveGamesList, GamePlay, GameAnswer, and LeaderboardGet) enable single player games. These functions are integrated with API Gateway and respond to specific requests from the client. API Gateway forwards the request, including URI, body, and query string, to the appropriate Lambda function.

When a player chooses “Play”, a request is sent to API Gateway, which invokes the ActiveGamesList function. This function queries the ActiveGames DynamoDB table and returns the list of active games to the user.

The player selects a game, resulting in another request triggering the GamePlay function. GamePlay retrieves the game’s questions from the GamesDetail DynamoDB table. The front end maintains the state for the user during the game.

When all questions are answered, the SPA sends the player’s responses to API Gateway, invoking the GameAnswer function. This function scores the player’s responses against the GameDetails table. The score and answers are sent to the user.

Additionally, this function sends the player score for the leaderboard and player experience to two SNS topics (LeaderboardTopic and PlayerProgressTopic). The ScorePut and PlayerProgressPut functions subscribe to these topics. These two functions write the details to the Leaderboard and Player Progress DynamoDB tables.

This architecture processes these two actions asynchronously, resulting in the player receiving their score and answers without having to wait. This also allows for increased security for player progress, as only the PlayerProgressPut function is allowed to write to this table.

Finally, the player can view the game’s leaderboard, which is returned to the player as the response to the GetLeaderboard function. The function retrieves the top 10 scores and the current player’s score from the Leaderboard table.

“Multi-player – Casual and Competitive” architecture

“Multiplayer – Casual and Competitive” architecture

These game types require player-to-player and service-to-player communication. This is typically performed using TCP/UDP sockets or the WebSocket protocol. API Gateway WebSockets provides WebSocket communication and enables Lambda functions to send messages to and receive messages from game hosts and players.

Game hosts start games via the “Host” button, messaging the LiveAdmin function via API Gateway. The function adds the game to the LiveGames table, which allows players to find and join the game. A list of questions for the game is sent to the game host from the LiveAdmin function at this time. Additionally, the game host is added to the GameConnections table, which keeps track of which connections are related to a game. Players, via the LivePlayer function, are also added to this table when they join a game.

The game host client manages the state of the game for all players and controls the flow of the game, sending questions, correct answers, and leaderboards to players via API Gateway and the LiveAdmin function. The function only sends game messages to the players in the GameConnections table. Player answers are sent to the host via the LivePlayer function.

To end the game, the game host sends a message with the final leaderboard to all players via the LiveAdmin function. This function also stores the leaderboard in the Leaderboard table, removes the game from the ActiveGames table, and sends player progression messages to the Player Progress topic.

“Multi-player – Live Scoreboard” architecture

“Multiplayer – Live Scoreboard” architecture

This is an extension of other multi-player game types requiring similar communications. This uses IoT with WebSockets over MQTT as the transport. It enables the client to subscribe to a topic and act on messages it receives. IoT manages routing messages to clients based on their subscriptions.

This architecture moves the state management from the game host client to a data store on the backend. This change requires a database that can respond quickly to user actions. Simple Trivia Service uses ElastiCache for Redis for this database. Game questions, player responses, and the leaderboard are all stored and updated in Redis during the quiz. The ElastiCache instance is blocked from internet traffic by placing it in a VPC. A security group configures access for the Lambda functions in the same VPC.

Game hosts for this type of game start the game by hosting it, which sends a message to IoT, triggering the CacheGame function. This function adds the game to the ActiveGames table and caches the quiz details from DynamoDB into Redis. Players join the game by sending a message, which is delivered to the JoinGame function. This adds the user record to Redis and alerts the game host that a player has joined.

Game hosts can send questions to the players via a message that invokes the AskQuestion function. This function updates the current question number in Redis and sends the question to subscribed players via the AskQuestion function. The ReceiveAnswer function processes player responses. It validates the response, stores it in Redis, updates the scoreboard, and replies to all players with the updated scoreboard after the first correct answer. The game scoreboard is updated for players in real time.

When the game is over, the game host sends a message to the EndGame function via IoT. This function writes the game leaderboard to the Leaderboard table, sends player progress to the Player Progress SNS topic, deletes the game from cache, and removes the game from the ActiveGames table.


This post introduces the Simple Trivia Service, a single- and multi-player game built using a serverless-first architecture on AWS. I cover different solutions that you can use to enable connectivity from your game client to a serverless-first backend for both single- and multi-player games. I also include a walkthrough of the architecture for each of these solutions.

You can deploy the code for this solution to your own AWS account via instructions in the Simple Trivia Service GitHub repository.

For more serverless learning resources, visit Serverless Land.

Operating Lambda: Building a solid security foundation – Part 1

Post Syndicated from James Beswick original

In the Operating Lambda series, I cover important topics for developers, architects, and systems administrators who are managing AWS Lambda-based applications. This two-part series discusses core security concepts for Lambda-based applications.

In the AWS Cloud, the most important foundational security principle is the shared responsibility model. This broadly shares security responsibilities between AWS and our customers. AWS is responsible for “security of the cloud”, such as the underlying physical infrastructure and facilities providing the services. Customers are responsible for “security in the cloud”, which includes applying security best practices, controlling access, and taking measures to protect data.

One of the main reasons for the popularity of Lambda-based applications is that AWS manages even more of the security operations compared with traditional cloud-based compute. For example, Lambda customers using zip file deployments do not need to patch underlying operating systems or apply security patches – these tasks are managed automatically by the Lambda service.

This post explains the Lambda execution environment and mechanisms used by the service to protect customer data. It also covers applying the principles of least privilege to your application and what this means in terms of permissions and Lambda function scope.

Understanding the Lambda execution environment

When your functions are invoked, the Lambda service runs your code inside an execution environment. Lambda scrubs the memory before it is assigned to an execution environment. Execution environments are run on hardware virtualized virtual machines (MicroVMs) which are dedicated to a single AWS account. Execution environments are never shared across functions and MicroVMs are never shared across AWS accounts. This is the isolation model for the Lambda service:

Isolation model for the Lambda service

A single execution environment may be reused by subsequent function invocations. This helps improve performance since it reduces the time taken to prepare and environment. Within your code, you can take advantage of this behavior to improve performance further, by caching locally within the function or reusing long-lived connections. All of these invocations are handled by a single process, so any process-wide state (such as static state in Java) is available across all invocations within the same execution environment.

There is also a local file system available at /tmp for all Lambda functions. This is local to each function but shared across invocations within the same execution environment. If your function must access large libraries or files, these can be downloaded here first and then used by all subsequent invocations. This mechanism provides a way to amortize the cost and time of downloading this data across multiple invocations.

While data is never shared across AWS customers, it is possible for data from one Lambda function to be shared with another invocation of the same function instance. This can be useful for caching common values or sharing libraries. However, if you have information only intended for a single invocation, you should:

  • Ensure that data is only used in a local variable scope.
  • Delete any /tmp files before exiting, and use a UUID name to prevent different instances from accessing the same temporary files.
  • Ensure that any callbacks are complete before exiting.

For applications requiring the highest levels of security, you may also implement your own memory encryption and wiping process before a function exits. At the function level, the Lambda service does not inspect or scan your code. Many of the best practices in security for software development continue to apply in serverless software development.

The security posture of an application is determined by the use-case but developers should always take precautions against common risks such as misconfiguration, injection flaws, and handling user input. Developers should be familiar with common security concepts and security risks, such as those listed in the OWASP Top 10 Web Application Security Risks and the OWASP Serverless Top 10. The use of static code analysis tools, unit tests, and regression tests are still valid in a serverless compute environment.

To learn more, read “Amazon Web Services: Overview of Security Processes”, “Compliance validation for AWS Lambda”, and “Security Overview of AWS Lambda”.

Applying the principles of least privilege

AWS Identity and Access Management (IAM) is the service used to manage access to AWS services. Before using IAM, it’s important to review security best practices that apply across AWS, to ensure that your user accounts are secured appropriately.

Lambda is fully integrated with IAM, allowing you to control precisely what each Lambda function can do within the AWS Cloud. There are two important policies that define the scope of permissions in Lambda functions. The event source uses a resource policy that grants permission to invoke the Lambda function, whereas the Lambda service uses an execution role to constrain what the function is allowed to do. In many cases, the console configures both of these policies with default settings.

As you start to build Lambda-based applications with frameworks such as AWS SAM, you describe both policies in the application’s template.

Resource and execution role policy

By default, when you create a new Lambda function, a specific IAM role is created for only that function.

IAM role for a Lambda function

This role has permissions to create an Amazon CloudWatch log group in the current Region and AWS account, and create log streams and put events to those streams. The policy follows the principle of least privilege by scoping precise permissions to specific resources, AWS services, and accounts.

Developing least privilege IAM roles

As you develop a Lambda function, you expand the scope of this policy to enable access to other resources. For example, for a function that processes objects put into an Amazon S3 bucket, it requires read access to objects stored in that bucket. Do not grant the function broader permissions to write or delete data, or operate in other buckets.

Determining the exact permissions can be challenging, since IAM permissions are granular and they control access to both the data plane and control plane. The following references are useful for developing IAM policies:

One of the fastest ways to scope permissions appropriately is to use AWS SAM policy templates. You can reference these templates directly in the AWS SAM template for your application, providing custom parameters as required:

SAM policy templates

In this example, the S3CrudPolicy template provides full create, read, update, and delete permissions to one bucket, and the S3ReadPolicy template provides only read access to another bucket. AWS SAM named templates expand into more verbose AWS CloudFormation policy definitions that show how the principle of least privilege is applied. The S3ReadPolicy is defined as:

        "Statement": [
            "Effect": "Allow",
            "Action": [
            "Resource": [
                "Fn::Sub": [
                    "bucketName": {
                      "Ref": "BucketName"
                "Fn::Sub": [
                    "bucketName": {
                      "Ref": "BucketName"

It includes the necessary, minimal permissions to retrieve the S3 object, including getting the bucket location, object version, and lifecycle configuration.

Access to CloudWatch Logs

To log output, Lambda roles must provide access to CloudWatch Logs. If you are building a policy manually, ensure that it includes:

    "Version": "2012-10-17",
    "Statement": [
            "Effect": "Allow",
            "Action": "logs:CreateLogGroup",
            "Resource": "arn:aws:logs:region:accountID:*"
            "Effect": "Allow",
            "Action": [
            "Resource": [

If the role is missing these permissions, the function still runs but it is unable to log any output to the CloudWatch service.

Avoiding wildcard permissions in IAM policies

The granularity of IAM permissions means that developers may choose to use overly broad permissions when they are testing or developing code.

IAM supports the “*” wildcard in both the resources and actions attributes, making it easier to select multiple matching items automatically. These may be useful when developing and testing functions in specific development AWS accounts with no access to production data. However, you should ensure that “star permissions” are never used in production environments.

Wildcard permissions grant broad permissions, often for many permissions or resources. Many AWS managed policies, such as AdministratorAccess, provide broad access intended only for user roles. Do not apply these policies to Lambda functions, since they do not specify individual resources.

In Application design and Service Quotas – Part 1, the section Using multiple AWS accounts for managing quotas shows a multiple account example. This approach provisions a separate AWS account for each developer in a team, and separates accounts for beta and production. This can help prevent developers from unintentionally transferring overly broad permissions to beta or production accounts.

For developers using the Serverless Framework, the Safeguards plugin is a policy-as-code framework to check deployed templates for compliance with security.

Specialized Lambda functions compared with all-purpose functions

In the post on Lambda design principles, I discuss architectural decisions in choosing between specialized functions and all-purpose functions. From a security perspective, it can be more difficult to apply the principles of least privilege to all-purpose functions. This is partly because of the broad capabilities of these functions and also because developers may grant overly broad permissions to these functions.

When building smaller, specialized functions with single tasks, it’s often easier to identify the specific resources and access requirements, and grant only those permissions. Additionally, since new features are usually implemented by new functions in this architectural design, you can specifically grant permissions in new IAM roles for these functions.

Avoid sharing IAM roles with multiple Lambda functions. As permissions are added to the role, these are shared across all functions using this role. By using one dedicated IAM role per function, you can control permissions more intentionally. Every Lambda function should have a 1:1 relationship with an IAM role. Even if some functions have the same policy initially, always separate the IAM roles to ensure least privilege policies.

To learn more, the series of posts for “Building well-architected serverless applications: Controlling serverless API access” – part 1, part 2, and part 3.


This post explains the Lambda execution environment and how the service protects customer data. It covers important steps you should take to prevent data leakage between invocations and provides additional security resources to review.

The principles of least privilege also apply to Lambda-based applications. I show how you can develop IAM policies and practices to ensure that IAM roles are scoped appropriately, and why you should avoid wildcard permissions. Finally, I explain why using smaller, specialized Lambda functions can help maintain least privilege.

Part 2 will discuss security workloads with public endpoints and how to use AWS CloudTrail for governance, compliance, and operational auditing of Lambda usage.

For more serverless learning resources, visit Serverless Land.

Operating Lambda: Application design – Part 3

Post Syndicated from James Beswick original

In the Operating Lambda series, I cover important topics for developers, architects, and systems administrators who are managing AWS Lambda-based applications. This three-part series discusses application design for Lambda-based applications.

Part 1 shows how to work with Service Quotas, when to request increases, and architecting with quotas in mind. Part 2 covers scaling and concurrency and the different behaviors of on-demand and Provisioned Concurrency. This post discusses choosing and managing runtimes, networking and VPC configurations, and different invocation modes.

Choosing and managing runtimes in Lambda functions

Lambda natively supports a variety of common runtimes, including Python, Node.js, Java, .NET, and others. If you prefer to use any other runtime, such as PHP or Perl, you can use a custom runtime. There are lists of community-maintained runtimes for a wide range of programming languages or you can build your own. As a result, Lambda customers can run Erlang, COBOL, Haskell, and almost any other runtime needed to support their workloads.

Regardless of compute platform, developers must take action if their preferred runtime version is no longer supported by the maintaining organization. Lambda has a documented runtime support policy for languages and frameworks that explains the process for runtime deprecation. Deprecation dates are driven by each runtime’s maintaining organization. Generally, AWS allows you to continue running functions on runtime versions for a period of time after the official runtime deprecation. You will receive emails from AWS if you have functions affected by an upcoming deprecation.

Runtimes and performance

Your choice of runtime is likely determined by a variety of factors. These include the skills available in your development team and the runtimes used in existing projects, especially in migrations. This choice may also be influenced by IT policy in your organization and other external factors. Lambda is agnostic to the choice of runtime, so you are free to choose without sacrificing capabilities within the service.

Different runtimes have different performance profiles in on-demand compute services like Lambda. For example, both Python and Node.js are fast to initialize and offer reasonable overall performance. Java is much slower to initialize but can be fast once running. The programming language Go can be extremely performant for both start-up and runtime. If performance is critical to your application, then profiling and comparing runtime performance is an important step before coding applications.

Multiple runtimes in single applications

Serverless applications usually consist of multiple Lambda functions. Each Lambda function can use only one runtime but you can use multiple runtimes across multiple functions. This enables you to choose the most appropriate runtime for the task performed by the function. Unlike traditional applications that tend to use a single language runtime, serverless applications allow you to mix-and-match runtimes as needed.

For example, in a Lambda function that transforms JSON between services, you could choose Node.js for your business logic. In another function handling data processing, you may choose Python. Both can co-exist in a single serverless application.

Managing AWS SDKs in Lambda functions

The Lambda service also provides AWS SDKs for your chosen runtime. These enable you to interact with AWS services using familiar code constructs. SDK versions change frequently as AWS adds new features and services, and the Lambda service periodically updates the bundled SDKs. Consequently, if you are using the bundled SDK version, you will notice the version changes in your function even if your function code has not changed.

The bundled SDK is provided as a convenience for developers building simpler functions or using the Lambda console for development. In these cases, SDK version changes typically do not impact the functionality or performance. To lock an SDK version and make it immutable, it’s recommended that you create a Lambda layer with a specific version of an SDK and include this in your deployment package.

To learn more, see the “Creating a layer containing the AWS SDK” section at

Networking and VPC configurations

Lambda functions always run inside VPCs owned by the Lambda service. As with customer-owned VPCs, this allows the service to apply network access and security rules to everything within the VPC. These VPCs are not visible to customers, the configurations are maintained automatically, and monitoring is managed by the service.

When you use some AWS services, they create resources that are only accessible from within your customer VPC. To access these resources with Lambda, your Lambda function must also be configured for access to the same VPC. Importantly, unless you are accessing services with resources in a customer VPC, there is no additional benefit to add a VPC configuration.

By default, Lambda functions have access to the public internet. This is not the case after they have been configured with access to one of your VPCs. If you continue to need access to resources on the internet, set up a NAT instance or Amazon NAT Gateway. Alternatively, you can also use VPC endpoints to enable private communications between your VPC and supported AWS services.

The high availability of the Lambda service depends upon access to multiple Availability Zones within the Region where your code runs. When you create a Lambda function without a VPC configuration, it’s automatically available in all Availability Zones within the Region. When you set up VPC access, you choose which Availability Zones the Lambda function can use. As a result, to provide continued high availability, ensure that the function has access to at least two Availability Zones.

The Lambda service uses a Network Function Virtualization platform to provide NAT capabilities from the Lambda VPC to customer VPCs. This configures the required elastic network interfaces (ENIs) at the point where Lambda functions are created or updated. It also enables ENIs from your account to be shared across multiple execution environments, which allows Lambda to make more efficient use of a limited network resource when functions scale.

Since ENIs are an exhaustible resource and there is a soft limit of 350 ENIs per Region, you should monitor elastic network interface usage if you are configuring Lambda functions for VPC access. Generally, if you increase concurrency limits in Lambda, you should evaluate if you need an elastic network interface increase. If the limit is reached, this causes invocations of VPC-enabled Lambda functions to be throttled.

Most serverless services can be used without further VPC configuration, while most instance-based services require VPC configuration:

AWS services accessible by default AWS services requiring VPC configuration
Amazon API Gateway
Amazon CloudFront
Amazon CloudWatch
Amazon Comprehend
Amazon DynamoDB
Amazon EventBridge
Amazon Kinesis
Amazon Lex
Amazon Polly
Amazon Rekognition
Amazon S3
Amazon SNS
Amazon SQS
AWS Step Functions
Amazon Textract
Amazon Transcribe
Amazon Translate
Amazon ECS
Amazon EFS
Amazon ElastiCache
Amazon Elasticsearch Service
Amazon MSK
Amazon MQ
Amazon RDS
Amazon Redshift

To learn more, read about how VPC networking works with Lambda functions.

Comparing Lambda invocation modes

Lambda functions can be invoked either synchronously or asynchronously, depending upon the trigger. In synchronous invocations, the caller waits for the function to complete execution and the function can return a value. In asynchronous operation, the caller places the event on an internal queue, which is then processed by the Lambda function.

Synchronous invocation Asynchronous invocation Polling invocation
Elastic Load Balancing (Application Load Balancer)
Amazon Cognito
Amazon Lex
Amazon Alexa
Amazon API Gateway
Amazon CloudFront via [email protected]
Amazon Kinesis Data Firehose
Amazon S3 Batch
Amazon S3
Amazon SNS
Amazon Simple Email Service
AWS CloudFormation
Amazon CloudWatch Logs
Amazon CloudWatch Events
AWS CodeCommit
AWS Config
AWS IoT Events
AWS CodePipeline
Amazon DynamoDB
Amazon Kinesis
Amazon Managed Streaming for Apache Kafka (Amazon MSK)
Amazon SQS

Synchronous invocations are well suited for short-lived Lambda functions. Although Lambda functions can run for up to 15 minutes, synchronous callers may have shorter timeouts. For example, API Gateway has a 29-second integration timeout, so a Lambda function running for more than 29 seconds will not return a value successfully. In synchronous invocations, if the Lambda function fails, retries are the responsibility of the trigger.

In asynchronous invocations, the caller continues with other work and cannot receive a return value from the Lambda function. The function can send the result to a destination, configurable based on success or failure. The internal queue between the caller and the function ensures that messages are stored durably. The Lambda service scales up the concurrency of the processing function as this internal queue grows. If an error occurs in the Lambda function, the retry behavior is determined by the Lambda service.

AWS service Invocation type Retry behavior
Amazon API Gateway Synchronous None – returns error to the client
Amazon S3 Asynchronous Retries with exponential backoff
Amazon SNS Asynchronous Retries with exponential backoff
Amazon DynamoDB Streams Synchronous from poller Retries until data expiration (24 hours)
Amazon Kinesis Synchronous from poller Retries until data expiration (24 hours to 7 days)
AWS CLI Synchronous/Asynchronous Configured by CLI call
AWS SDK Synchronous/Asynchronous Application-specific
Amazon SQS Synchronous from poller Retries until Message Retention Period expires or is sent to a dead-letter queue

To learn more, read “Invoking AWS Lambda functions” and “Introducing AWS Lambda Destinations”.


This post discusses choosing and managing runtimes, the effect on performance, and how you can use multiple runtimes within a single serverless application. It explains the networking model and whether a Lambda function must have access to a customer VPC or can run with the default VPC configuration. It also compares the different invocation modes for Lambda functions.

For more serverless learning resources, visit Serverless Land.

Extending SaaS products with serverless functions

Post Syndicated from James Beswick original

This post was written by Santiago Cardenas, Sr Partner SA. and Nir Mashkowski, Principal Product Manager.

Increasingly, customers turn to software as a service (SaaS) solutions for the potential of lowering the total cost of ownership (TCO). This enables customers to focus their teams on business priorities instead of managing and maintaining software and infrastructure. Startups are building SaaS products for a wide variety of common application types to take advantage of these market needs.

As SaaS accelerates adoption, enterprise customers expect the same capabilities that are available with traditional, on-premises software. They want the ability to customize system behavior and use rich integrations that can help build solutions rapidly.

For customization and extensibility, many independent software vendors (ISVs) are building application programming interfaces (APIs) and integration hooks. To extend these capabilities, many SaaS builders expose a common set of APIs:

  • Event APIs emit events when SaaS entities change. Synchronous event APIs block the SaaS action until the API completes a request. Asynchronous are non-blocking and use mechanisms like pub/sub and webhooks to inform the caller of updates. Event APIs are used for many purposes, such as enriching incoming data or triggering workflows.
  • CRUD APIs allow developers to interact with entities within the SaaS product. They can be used by mobile or web clients to add, update, and remove records, for example.
  • Schema APIs allow developers to create data entities in the SaaS product, such as tables, key-value stores, or document repositories.
  • User experience (UX) components. Many SaaS products include an SDK that helps provide a consistent look-and-feel and built-in support for common functions, such as authentication. Components are sometimes delivered as code libraries or as an online API that renders the UI.

Business systems expose different subsets of the APIs based on the application domain. Extensibility models are built on top of those APIs and can take various different forms. ISVs use these APIs to build features such as “no code” workflow engines, UX, and report generators. In those cases, the SaaS product runs a domain-specific language (DSL) where it controls compute, storage, and memory consumption.

Figure 1: Example of various APIs providing extensibility within a SaaS app

This level of customization is acceptable for many business users. However, for more sophisticated customization, this requires the ability to write custom code. When coding is needed, some business systems choose to provide sandboxing for the user code within the service. Others choose to ask developers to host the extensibility model themselves.

The growth of vendor-hosted SaaS extensions

First-generation SaaS products essentially “lift and shift” on-premises enterprise software, where each customer has a copy of the entire stack. This single tenant model offers simplicity, a smaller blast radius, and faster time to market.

Newer, born-in-the-cloud SaaS products implement a multi-tenant approach, where all resources are shared across customers. This model may be easier to maintain but can present challenges for handling security, isolation, and resource allocation.

Multi-tenancy challenges are harder when customers can run custom code inside the SaaS infrastructure. To solve this, SaaS builders may start with a customer hosted approach, where customers implement their own extensions by consuming SaaS APIs. This means customers must learn and install an SDK, deploy, and maintain an app in their cloud. This often results in higher cost and slower time to market.

To simplify this model, SaaS builders are finding ways to allow developers to write code directly within the SaaS product. The event driven, pay-per-execution, and polyglot nature of serverless functions provides new capabilities for implementing SaaS extensibility. This model is called vendor hosted SaaS extensions.

SaaS builders are using AWS Lambda for serverless functions to provide flexible compute options to their customers. The goal is to abstract away and simplify the consumption model. AWS provides SaaS builders with features and controls to customize the execution environments as part of their own SaaS product. This allows SaaS owners more flexibility when deciding on isolation models, usability, and cost considerations.

Isolating tenant requests

Isolation of customer requests is important both at the product level and at the tenant level. Product-level isolation focuses on controlling and enforcing the access to data between tenants. It ensures that one tenant is separated from another tenant’s functions. Tenant-level isolation focuses on resources allocated to serve requests. These may include identity, network and internet access, file system access, and memory/CPU allocation.

Figure 2: Example of hierarchical levels of abstraction


SaaS product owners can allow customers to use familiar programming languages within the serverless functions. This allows customers to grow with the service and potentially host and scale independently, using their own infrastructure.

Usability considers the domain and industry of the product. For example, if the SaaS product enables data processing, it may enable invocation of serverless functions during these workflows. Additionally, these functions may provide the customer the context of the user, application, tenant, and the domain. A streamlined, opinionated deployment workflow that abstracts away initial configuration can also aid customer adoption.

Managing costs

Cost is an important factor in driving adoption. It’s an important differentiator to pay only for the resources used, while being able to scale in response to events. This can help reduce costs that are passed on to SaaS customers.

Examples of SaaS product extensibility

Multiple AWS Partners are extending their SaaS product using Lambda for on-demand scalable compute. This enables them to focus on enriching the customer experience that is associated with their business domain. Examples include:

  • Segment Functions, which seamlessly integrates as a source or destination. The service uses code snippets to allow customers to enrich data, enforce consistency, and connect to APIs and services that power their workflows.
  • Freshworks’ Neo platform provides extensibility using the concept of apps. These are powered by Lambda functions hosting the core business logic and backends. Apps are triggered by unplanned and scheduled Freshworks events (customer support tickets, IT service cases, contacts, and deal updates), in addition to app-specific and external events.
  • Netlify Functions enables customers to supercharge frontend code with functions in their development workflow. These can power automated triggers, connect to third-party APIs, or provide user authentication.

All of these SaaS partners abstract away the deployment, versioning, and configuration of custom code using Lambda.


As customers increasingly use SaaS solutions in their businesses, they want the same customization and extensibility available in on-premises solutions. SaaS partners have developed APIs and integration hooks to help address this need. For more sophisticated customization, products enable custom code to run within their SaaS workflows.

This presents SaaS partners with isolation, usability, and cost challenges and many of them are now using serverless functions to address these challenges. Lambda provides a pay-per-value compute service that scales automatically to meet customer demand. Segment Functions, Freshworks, and Netlify Functions have all used Lambda to provide extensibility to their customers.

Lambda continues to develop features and functionality to power the extensibility of SaaS products. We look forward to seeing the new ways you use Lambda to extend your SaaS product for your customers. Share your Lambda extensibility story with us at [email protected].

For more serverless learning resources, visit Serverless Land.

Operating Lambda: Application design – Scaling and concurrency: Part 2

Post Syndicated from James Beswick original

In the Operating Lambda series, I cover important topics for developers, architects, and systems administrators who are managing AWS Lambda-based applications. This three-part series discusses application design for Lambda-based applications.

Part 1 shows how to work with Service Quotas, when to request increases, and architecting with quotas in mind. This post covers scaling and concurrency and the different behaviors of on-demand and Provisioned Concurrency.

Scaling and concurrency in Lambda

Lambda is engineered to provide managed scaling in a way that does not rely upon threading or any custom engineering in your code. As traffic increases, Lambda increases the number of concurrent executions of your functions.

When a function is first invoked, the Lambda service creates an instance of the function and runs the handler method to process the event. After completion, the function remains available for a period of time to process subsequent events. If other events arrive while the function is busy, Lambda creates more instances of the function to handle these requests concurrently.

For an initial burst of traffic, your cumulative concurrency in a Region can reach between 500 and 3000 per minute, depending upon the Region. After this initial burst, functions can scale by an additional 500 instances per minute. If requests arrive faster than a function can scale, or if a function reaches maximum capacity, additional requests fail with a throttling error (status code 429).

All AWS accounts start with a default concurrent limit of 1000 per Region. This is a soft limit that you can increase by submitting a request in the AWS Support Center.

On-demand scaling example

In this example, a Lambda receives 10,000 synchronous requests from Amazon API Gateway. The concurrency limit for the account is 10,000. The following shows four scenarios:

On-demand scaling example

In each case, all of the requests arrive at the same time in the minute they are scheduled:

  1. All requests arrive immediately: 3000 requests are handled by new execution environments; 7000 are throttled.
  2. Requests arrive over 2 minutes: 3000 requests are handled by new execution environments in the first minute; the remaining 2000 are throttled. In minute 2, another 500 environments are created and the 3000 original environments are reused; 1500 are throttled.
  3. Requests arrive over 3 minutes: 3000 requests are handled by new execution environments in the first minute; the remaining 333 are throttled. In minute 2, another 500 environments are created and the 3000 original environments are reused; all requests are served. In minute 3, the remaining 3334 requests are served by warm environments.
  4. Requests arrive over 4 minutes: In minute 1, 2500 requests are handled by new execution environment; the same environments are reused in subsequent minutes to serve all requests.

Provisioned Concurrency scaling example

The majority of Lambda workloads are asynchronous so the default scaling behavior provides a reasonable trade-off between throughput and configuration management overhead. However, for synchronous invocations from interactive workloads, such as web or mobile applications, there are times when you need more control over how many concurrent function instances are ready to receive traffic.

Provisioned Concurrency is a Lambda feature that prepares concurrent execution environments in advance of invocations. Consequently, this can be used to address two issues. First, if expected traffic arrives more quickly than the default burst capacity, Provisioned Concurrency can ensure that your function is available to meet the demand. Second, if you have latency-sensitive workloads that require predictable double-digit millisecond latency, Provisioned Concurrency solves the typical cold start issues associated with default scaling.

Provisioned Concurrency is a configuration available for a specific published version or alias of a Lambda function. It does not rely on any custom code or changes to a function’s logic, and it’s compatible with features such as VPC configuration, Lambda layers. For more information on how Provisioned Concurrency optimizes performance for Lambda-based applications, watch this Tech Talk video.

Using the same scenarios with 10,000 requests, the function is configured with a Provisioned Concurrency of 7,000:

Provisioned Concurrency scaling example

  1. In case #1, 7,000 requests are handled by the provisioned environments with no cold start. The remaining 3,000 requests are handled by new, on-demand execution environments.
  2. In cases #2-4, all requests are handled by provisioned environments in the minute when they arrive.

Using service integrations and asynchronous processing

Synchronous requests from services like API Gateway require immediate responses. In many cases, these workloads can be rearchitected as asynchronous workloads. In this case, API Gateway uses a service integration to persist messages in an Amazon SQS queue durably. A Lambda function consumes these messages from the queue, and updates the status in an Amazon DynamoDB table. Another API endpoint provides the status of the request by querying the DynamoDB table:

Async with polling example

API Gateway has a default throttle limit of 10,000 requests per second, which can be raised upon request. SQS standard queues support a virtually unlimited throughput of API actions such as SendMessage.

The Lambda function consuming the messages from SQS can control the speed of processing through a combination of two factors. The first is BatchSize, which is the number of messages received by each invocation of the function, and the second is concurrency. Provided there is still concurrency available in your account, the Lambda function is not throttled while processing messages from an SQS queue.

In asynchronous workflows, it’s not possible to pass the result of the function back through the invocation path. The original API Gateway call receives an acknowledgment that the message has been stored in SQS, which is returned back to the caller. There are multiple mechanisms for returning the result back to the caller. One uses a DynamoDB table, as shown, to store a transaction ID and status, which is then polled by the caller via another API Gateway endpoint until the work is finished. Alternatively, you can use webhooks via Amazon SNS or WebSockets via AWS IoT Core to return the result.

Using this asynchronous approach can make it much easier to handle unpredictable traffic with significant volumes. While it is not suitable for every use case, it can simplify scalability operations.

Reserved concurrency

Lambda functions in a single AWS account in one Region share the concurrency limit. If one function exceeds the concurrent limit, this prevents other functions from being invoked by the Lambda service. You can set reserved capacity for Lambda functions to ensure that they can be invoked even if the overall capacity has been exhausted. Reserved capacity has two effects on a Lambda function:

  1. The reserved capacity is deducted from the overall capacity for the AWS account in a given Region. The Lambda function always has the reserved capacity available exclusively for its own invocations.
  2. The reserved capacity restricts the maximum number of concurrency invocations for that function. Synchronous requests arriving in excess of the reserved capacity limit fail with a throttling error.

You can also use reserved capacity to throttle the rate of requests processed by your workload. For Lambda functions that are invoked asynchronously or using an internal poller, such as for S3, SQS, or DynamoDB integrations, reserved capacity limits how many requests are processed simultaneously. In this case, events are stored durably in internal queues until the Lambda function is available. This can help create a smoothing effect for handling spiky levels of traffic.

For example, a Lambda function receives messages from an SQS queue and writes to a DynamoDB table. It has a reserved concurrency of 10 with a batch size of 10 items. The SQS queue rapidly receives 1,000 messages. The Lambda function scales up to 10 concurrent instances, each processing 10 messages from the queue. While it takes longer to process the entire queue, this results in a consistent rate of write capacity units (WCUs) consumed by the DynamoDB table.

Reserved concurrency for throttling example

To learn more, read “Managing AWS Lambda Function Concurrency” and “Managing concurrency for a Lambda function”.


This post explains scaling and concurrency in Lambda and the different behaviors of on-demand and Provisioned Concurrency. It also shows how to use service integrations and asynchronous patterns in Lambda-based applications. Finally, I discuss how reserved concurrency works and how to use it in your application design.

Part 3 will discuss choosing and managing runtimes, networking and VPC configurations, and different invocation modes.

For more serverless learning resources, visit Serverless Land.

Operating Lambda: Application design and Service Quotas – Part 1

Post Syndicated from James Beswick original

In the Operating Lambda series, I cover important topics for developers, architects, and systems administrators who are managing AWS Lambda-based applications. This three-part series discusses application design for Lambda-based applications.

A well-architected event-driven application uses a combination of AWS services and custom code to process and manage requests and data. This series on Lambda-specific topics in application design, and how Lambda interacts with other services. There are many important considerations for serverless architects when designing applications for busy production systems.

Part 1 shows how to work with Service Quotas, when to request increases, and architecting with quotas in mind. It also explains how to control traffic for downstream server-based resources.

Understanding quotas

The Lambda service is designed for short-lived compute tasks that do not retain or rely upon state between invocations. The Lambda service invokes your custom code on demand in response to events from other AWS services, or requests via the AWS CLI or AWS SDKs. Code can run for up to 15 minutes in a single invocation and a single function can use up to 10,240 MB of memory.

Lambda is designed to scale rapidly to meet demand, allowing your functions to scale up to serve traffic in your application. Other AWS services frequently used in serverless applications, such as Amazon API Gateway, Amazon SNS, and AWS Step Functions, also scale up in response to increased load. This has enabled our largest customers to build applications that scale to millions of requests quickly without having to manage underlying infrastructure.

However, before you scale an application to these levels, it’s important to understand the guardrails that are put in place to protect your account and the workloads of other customers. Service Quotas exist in all AWS services and consist of hard limits, which you cannot change, and soft limits, which you can request increases for.

By default, all new accounts are assigned a quota profile that allows exploration of the services. However, the values may need to be raised to support medium-to-large application workloads. Typically, customers request increases for their accounts as they start to expand usage of their applications. This allows the quotas to grow with usage, and help protect the account from unexpected costs caused by unintended usage.

Different AWS services have different quotas. These quotas may apply at the Region level, or account level, and may also include time-interval restrictions, such as requests per second. For example, the maximum number of IAM roles is an account-based quota, whereas the maximum number of concurrent Lambda executions is a per-Region quota.

To see the quotas that apply to your account, navigate to the Service Quotas dashboard. This allows you to view your Service Quotas, request a service quota increase, and view current utilization. From here, you can drill down to a specific AWS service, such as Lambda:

Service Quotas for AWS Lambda

In this example, sorted by the Adjustable column, this shows that Concurrent executions, Elastic network interfaces per VPC, and Function and layer storage are all adjustable limits. You could request a quota increase for any of these via the AWS Support Center. The other items provide a useful reference for other limits applying to the service.

Architecting with Service Quotas

Most serverless applications use multiple AWS services, and different services have different quotas for different features. Once you have a serverless architecture designed and know which services your application uses, you can compare the different quotas across services and find any potential issues.

Example serverless application architecture

In this example, API Gateway has a default throttle limit of 10,000 requests per second. Many applications use API Gateway endpoints to invoke Lambda functions. Lambda has a default concurrency limit of 1,000. Since API Gateway to Lambda is a synchronous invocation, it’s possible to have more incoming requests than could be handled simultaneously by a Lambda function, when using the default limits. This can be resolved by requesting to have the Lambda concurrency limit raised for this account to match the expected level of traffic.

Another common challenge is handling payload sizes in different services. Consider an application moving a payload from API Gateway to Lambda to Amazon SQS. API Gateway supports payloads up to 10 Mb, while Lambda’s payload limit is 6 Mb and the SQS message size limit is 256 Kb. In this example, you could instead store the payload in an Amazon S3 bucket instead of uploading to API Gateway, and pass a reference token across the services. The token size is much smaller than any payload limit and may provide a more efficient design for your workload, depending upon the use-case.

Load testing your serverless application also allows you to monitor the performance of an application before it is deployed to production. Serverless applications can be relatively simple to load test, thanks to the automatic scaling built into many of the services. During a load test, you can identify any quotas that may act as a limiting factor for the traffic levels you expect and take action accordingly.

There are several tools available for serverless developers to perform this task. One of the most popular is Artillery Community Edition, which is an open-source tool for testing serverless APIs. You configure the number of requests per second and overall test duration and it uses a headless Chromium browser to run tests. Other popular tools include Nordstrom’s Serverless-Artillery and Gatling.

Using multiple AWS accounts for managing quotas

Many customers have multiple workloads running in the AWS Cloud but many quotas are set at the account level. This means that as you add more serverless workloads, some quotas are shared across more workloads, reducing the quotas available for each workload. Additionally, if you have development resources in the same account as production workloads, quotas are shared across both. It’s possible for development activity to exhaust resources unintentionally that you may want to reserve only for production.

An effective way to solve this issue is to use multiple AWS accounts, dedicating workloads to their own specific account. This prevents quotas from being shared with other workloads or non-production resources. Using AWS Organizations, you can centrally manage the billing, compliance, and security of these accounts. You can attach policies to groups of accounts to avoid custom scripts and manual processes.

One common approach is to provide each developer with an AWS account, and then use separate accounts for a beta deployment stage and production:

Multiple AWS account by environment

The developer accounts can contain copies of production resources and provide the developer with admin-level permissions to these resources. Each developer has their own set of limits for the account, so their usage does not impact your production environment. Individual developers can deploy AWS CloudFormation stacks and AWS Serverless Application Model (AWS SAM) templates into these accounts with minimal risk to production assets.

This approach allows developers to test Lambda functions locally on their development machines against live cloud resources in their individual accounts. It can help create a robust unit testing process, and developers can then push code to a repository like AWS CodeCommit when ready.

By integrating with AWS Secrets Manager, you can store different sets of secrets in each environment and replace any need for credentials stored in code. As code is promoted from developer account through to the beta and production accounts, the correct set of credentials is automatically used. You do not need to share environment-level credentials with individual developers.

To learn more, read “Best practices for organizing larger serverless applications”.

Controlling traffic flow for server-based resources

While Lambda can scale up quickly in response to traffic, many non-serverless services cannot. If your Lambda functions interact with those services downstream, it’s possible to overwhelm those services with data or connection requests.

Amazon RDS is one of the most common Lambda integrations that relies on a server-based resource. However, relational databases are connection-based, so they are intended to work with a few long-lived clients, such as web servers. By contrast, Lambda functions are ephemeral and short-lived, so their database connections are numerous and brief. If Lambda scales up to hundreds or thousands of instances, you may overwhelm downstream relational databases with connection requests. This is typically only an issue for moderately busy applications. If you are using a Lambda function for low-volume tasks, such as running daily SQL reports, you do not experience this behavior.

The Amazon RDS Proxy service is built to solve the high-volume use-case. It pools the connections between the Lambda service and the downstream Amazon RDS database. This means that a scaling Lambda function is able to reuse connections via the proxy. As a result, the relational database is not overwhelmed with connections requests from individual Lambda functions. This does not require code changes in many cases. You only need to replace the database endpoint with the proxy endpoint in your Lambda function.

For other downstream server-based resources, APIs, or third-party services, it’s important to know the limits around connections, transactions, and data transfer. If your serverless workload has the capacity to overwhelm those resources, use an SQS queue to decouple the Lambda function from the target. This allows the server-based resource to process messages from the queue at a steady rate. The queue also durably stores the requests if the downstream resource becomes unavailable.


Lambda works with other AWS services to process and manage requests and data. This post explains how to understand and manage Service Quotas, when to request increases, and architecting with quotas in mind. It also explains how to control traffic for downstream server-based resources.

Part 2 of this series will discuss scaling and concurrency in Lambda and the different behaviors of on-demand and Provisioned Concurrency.

For more guidance, see the Operating Lambda: Understanding event-driven architectures series.

For more serverless learning resources, visit Serverless Land.

Building server-side rendering for React in AWS Lambda

Post Syndicated from James Beswick original

This post is courtesy of Roman Boiko, Solutions Architect.

React is a popular front-end framework used to create single-page applications (SPAs). It is rendered and run on the client-side in the browser. However, for SEO or performance reasons, you may need to render parts of a React application on the server. This is where the server-side rendering (SSR) is useful.

This post introduces the concepts and demonstrates rendering a React application with AWS Lambda. To deploy this solution and to provision the AWS resources, I use the AWS Cloud Development Kit (CDK). This is an open-source framework, which helps you reduce the amount of code required to automate deployment.


This solution uses Amazon S3, Amazon CloudFront, Amazon API Gateway, AWS Lambda, and [email protected]. It creates a fully serverless SSR implementation, which automatically scales according to the workload. This solution addresses three scenarios.

1. A static React app hosted in an S3 bucket with a CloudFront distribution in front of the website. The backend is running behind API Gateway, implemented as a Lambda function. Here, the application is fully downloaded to the client and rendered in a web browser. It sends requests to the backend.

SSR app 1

2. The React app is rendered with a Lambda function. The CloudFront distribution is configured to forward requests from the /ssr path to the API Gateway endpoint. This calls the Lambda function where the rendering is happening. While rendering the requested page, the Lambda function calls the backend API to fetch the data. It returns a static HTML page with all the data. This page may be cached in CloudFront to optimize subsequent requests.

SSR app 2


3. The React app is rendered with a [email protected] function. This scenario is similar but rendering happens at edge locations. The requests to /edgessr are handled by the [email protected] function. This sends requests to the backend and returns a static HTML page.

SSR app 3



The example application shows how the preceding scenarios are implemented with the AWS CDK. This solution requires:

This solution deploys a [email protected] function so it must be provisioned in the US East (N. Virginia) Region.

To get started, download and configure the sample:

  1. From a terminal, clone the GitHub repository:
    git clone
  2. Provide a unique name for the S3 bucket, which is created by the stack and used for React application hosting. Change the placeholder <your bucket name> to your bucket name. To install the solution, run:
    cd react-ssr-lambda
    cd ./cdk
    npm install
    npm run build
    cdk bootstrap
    cdk deploy SSRApiStack --outputs-file ../simple-ssr/src/config.json
    cd ../simple-ssr
    npm install
    npm run build-all
    cd ../cdk
    cdk deploy SSRAppStack --parameters mySiteBucketName=<your bucket name>
  3. Note the following values from the output:
    • SSRAppStack.CFURL – the URL of the CloudFront distribution. Its root path returns the React application stored in S3.
    • SSRAppStack.LambdaSSRURL – the URL of the CloudFront /ssr distribution, which returns a page rendered by the Lambda function.
    • SSRAppStack.LambdaEdgeSSRURL – the URL of the CloudFront /edgessr distribution, which returns a page rendered by [email protected] function.Stack outputs
  4. In a browser, open each of the URLs from step 3. You see the same page with a different footer, indicating how it is rendered.Comparing the served pages

Understanding the React app

The application is created by the create-react-app utility. You can run and test this application locally by navigating to the simple-ssr directory and running the npm start command.

This small application consists of two components that render the list of products received from the backend. The App.js file sends the request, parses the result, and passes it to the component.

import React, { useEffect, useState } from "react";
import ProductList from "./components/ProductList";
import config from "./config.json";
import axios from "axios";

const App = ({ isSSR, ssrData }) => {
  const [err, setErr] = useState(false);
  const [result, setResult] = useState({ loading: true, products: null });
  useEffect(() => {
    const getData = async () => {
      const url = config.SSRApiStack.apiurl;
      try {
        let result = await axios.get(url);
        setResult({ loading: false, products: });
      } catch (error) {
  }, []);
  if (err) {
    return <div>Error {err}</div>;
  } else {
    return (
        <ProductList result={result} />

export default App;

Adding server-side rendering

To support SSR, I change the preceding application using several Lambda functions with the implementation. As I change the way data is retrieved from the backend, I remove this code from App.js. Instead, the data is retrieved in the Lambda function and injected into the application during the rendering process.

The new file SSRApp.js reflects these changes:

import React, { useState } from "react";
import ProductList from "./components/ProductList";

const SSRApp = ({ data }) => {
  const [result, setResult] = useState({ loading: false, products: data });
  return (
      <ProductList result={result} />

export default SSRApp;

Next, I implement SSR logic in the Lambda function. For simplicity, I use React’s built-in renderToString method, which returns an HTML string. You can find the corresponding file in the simple-ssr/src/server/index.js. The handler function fetches data from the backend, renders the React components, and injects them into the HTML template. It returns the response to API Gateway, which responds to the client.

const handler = async function (event) {
  try {
    const url = config.SSRApiStack.apiurl;
    const result = await axios.get(url);
    const app = ReactDOMServer.renderToString(<SSRApp data={} />);
    const html = indexFile.replace(
      '<div id="root"></div>',
      `<div id="root">${app}</div>`
    return {
      statusCode: 200,
      headers: { "Content-Type": "text/html" },
      body: html,
  } catch (error) {
    console.log(`Error ${error.message}`);
    return `Error ${error}`;

For rendering the same code on [email protected], I change the code to work with CloudFront events and also modify the response format. This function searches for a specific path (/edgessr). All other logic stays the same. You can view the full code at simple-ssr/src/edge/index.js:

const handler = async function (event) {
  try {
    const request = event.Records[0].cf.request;
    if (request.uri === "/edgessr") {
      const url = config.SSRApiStack.apiurl;
      const result = await axios.get(url);
      const app = ReactDOMServer.renderToString(<SSRApp data={} />);
      const html = indexFile.replace(
        '<div id="root"></div>',
        `<div id="root">${app}</div>`
      return {
        status: "200",
        statusDescription: "OK",
        headers: {
          "cache-control": [
              key: "Cache-Control",
              value: "max-age=100",
          "content-type": [
              key: "Content-Type",
              value: "text/html",
        body: html,
    } else {
      return request;
  } catch (error) {
    console.log(`Error ${error.message}`);
    return `Error ${error}`;

The create-react-app utility configures tools such as Babel and webpack for the client-side React application. However, it is not designed to work with SSR. To make the functions work as expected, I transpile these into CommonJS format in addition to transpiling React JSX files. The standard tool for this task is Babel. To add it to this project, I create the configuration file .babelrc.json with instructions to transpile the functions into Node.js v12 format:

  "presets": [
        "targets": {
          "node": 12

I also include all the dependencies. I use the popular frontend tool webpack, which also works with Lambda functions. It adds only the necessary dependencies and minimizes the package size. For this purpose, I create configurations for both functions. You can find these in the webpack.edge.js and webpack.server.js files:

const path = require("path");

module.exports = {
  entry: "./src/edge/index.js",

  target: "node",

  externals: [],

  output: {
    path: path.resolve("edge-build"),
    filename: "index.js",
    library: "index",
    libraryTarget: "umd",

  module: {
    rules: [
        test: /\.js$/,
        use: "babel-loader",
        test: /\.css$/,
        use: "css-loader",

The result of running webpack is one file for each build. I use these files to deploy the Lambda and [email protected] functions. To automate the build process, I add several scripts to package.json.

"build-server": "webpack --config webpack.server.js --mode=development",
"build-edge": "webpack --config webpack.edge.js --mode=development",
"build-all": "npm-run-all --parallel build build-server build-edge"

Launch the build by running npm run build-all.

Deploying the application

After the application successfully builds, I deploy to the AWS Cloud. I use AWS CDK for an infrastructure as code approach. The code is located in cdk/lib/ssr-stack.ts.

First, I create an S3 bucket for storing the static content and I pass the name of the bucket as a parameter. To ensure only CloudFront can access my S3 bucket, I use an access identity configuration:

const mySiteBucketName = new CfnParameter(this, "mySiteBucketName", {
      type: "String",
      description: "The name of S3 bucket to upload react application"

const mySiteBucket = new s3.Bucket(this, "ssr-site", {
      bucketName: mySiteBucketName.valueAsString,
      websiteIndexDocument: "index.html",
      websiteErrorDocument: "error.html",
      publicReadAccess: false,
      //only for demo not to use in production
      removalPolicy: cdk.RemovalPolicy.DESTROY

new s3deploy.BucketDeployment(this, "Client-side React app", {
      sources: [s3deploy.Source.asset("../simple-ssr/build/")],
      destinationBucket: mySiteBucket

const originAccessIdentity = new cloudfront.OriginAccessIdentity(

I deploy the Lambda function from the build directory and configure an integration with API Gateway. I also note the API Gateway domain name for later use in the CloudFront distribution.

const ssrFunction = new lambda.Function(this, "ssrHandler", {
      runtime: lambda.Runtime.NODEJS_12_X,
      code: lambda.Code.fromAsset("../simple-ssr/server-build"),
      memorySize: 128,
      timeout: Duration.seconds(5),
      handler: "index.handler"

const ssrApi = new apigw.LambdaRestApi(this, "ssrEndpoint", {
      handler: ssrFunction

const apiDomainName = `${ssrApi.restApiId}.execute-api.${this.region}`;

I configure the [email protected] function. It’s important to create a function version explicitly to use with CloudFront:

const ssrEdgeFunction = new lambda.Function(this, "ssrEdgeHandler", {
      runtime: lambda.Runtime.NODEJS_12_X,
      code: lambda.Code.fromAsset("../simple-ssr/edge-build"),
      memorySize: 128,
      timeout: Duration.seconds(5),
      handler: "index.handler"

const ssrEdgeFunctionVersion = new lambda.Version(
      { lambda: ssrEdgeFunction }

Finally, I configure the CloudFront distribution to communicate with all the origins:

const distribution = new cloudfront.CloudFrontWebDistribution(
        originConfigs: [
            s3OriginSource: {
              s3BucketSource: mySiteBucket,
              originAccessIdentity: originAccessIdentity
            behaviors: [
                isDefaultBehavior: true,
                lambdaFunctionAssociations: [
                    eventType: cloudfront.LambdaEdgeEventType.ORIGIN_REQUEST,
                    lambdaFunction: ssrEdgeFunctionVersion
            customOriginSource: {
              domainName: apiDomainName,
              originPath: "/prod",
              originProtocolPolicy: cloudfront.OriginProtocolPolicy.HTTPS_ONLY
            behaviors: [
                pathPattern: "/ssr"

The template is now ready for deployment. This approach allows you to use this code in your Continuous Integration and Continuous Delivery/Deployment (CI/CD) pipelines to automate deployments of your SSR applications. Also, you can create a CDK construct to reuse this code in different applications.

Cleaning up

To delete all the resources used in this solution, run:

cd react-ssr-lambda/cdk
cdk destroy SSRApiStack
cdk destroy SSRAppStack


This post demonstrates two ways you can implement and deploy a solution for server-side rendering in React applications, by using Lambda or [email protected]

It also shows how to use open-source tools and AWS CDK to automate the building and deployment of such applications.

For more serverless learning resources, visit Serverless Land.

Operating Lambda: Anti-patterns in event-driven architectures – Part 3

Post Syndicated from James Beswick original

In the Operating Lambda series, I cover important topics for developers, architects, and systems administrators who are managing AWS Lambda-based applications. This three-part section discusses event-driven architectures and how these relate to Lambda-based applications.

Part 1 covers the benefits of the event-driven paradigm and how it can improve throughput, scale, and extensibility. Part 2 explains some of the design principles and best practices that can help developers gain the benefits of building Lambda-based applications. This post explores anti-patterns in event-driven architectures.

Lambda is not a prescriptive service and provides broad functionality for you to build applications as needed. While this flexibility is important to customers, there are some designs that are technically functional but suboptimal from an architecture standpoint.

The Lambda monolith

In many applications migrated from traditional servers, Amazon EC2 instances or AWS Elastic Beanstalk applications, developers “lift and shift” existing code. Frequently, this results in a single Lambda function that contains all of the application logic that is triggered for all events. For a basic web application, for example, a monolithic Lambda function handles all Amazon API Gateway routes and integrates with all necessary downstream resources:

Monolithic Lambda application

This approach has several drawbacks:

  • Package size: The Lambda function may be much larger because it contains all possible code for all paths, which makes it slower for the Lambda service to download and run.
  • Harder to enforce least privilege: The function’s IAM role must grant permissions for all resources needed for all paths, making the permissions very broad. Many paths in the functional monolith do not need all the permissions that have been granted.
  • Harder to upgrade: In a production system, any upgrades to the single function are more risky and could cause the entire application to stop working. Upgrading a single path in the Lambda function is an upgrade to the entire function.
  • Harder to maintain: It’s more difficult to have multiple developers working on the service since it’s a monolithic code repository. It also increases the cognitive burden on developers and makes it harder to create appropriate test coverage for code.
  • Harder to reuse code: Typically, it can be harder to separate libraries from monoliths, making code reuse more difficult. As you develop and support more projects, this can make it harder to support the code and scale your team’s velocity.
  • Harder to test: As the lines of code increase, it becomes harder to unit all the possible combinations of inputs and entry points in the code base. It’s generally easier to implement unit testing for smaller services with less code.

The preferred alternative is to decompose the monolithic Lambda function into individual microservices, mapping a single Lambda function to a single, well-defined task. In this example web application with a few API endpoints, the resulting microservice-based architecture is based on the API routes.

Microservice architecture

The process of decomposing a monolith depends upon the complexity of your workload. Using strategies like the strangler pattern, you can migrate code from larger code bases to microservices. There are many potential benefits to running a Lambda-based application this way:

  • Package sizes can be optimized for only the code needed for a single task, which helps make the function more performant, and may reduce running cost.
  • IAM roles can be scoped to precisely the access needed by the microservice, making it easier to enforce the principles of least privilege. In controlling the blast radius, using IAM roles this way can give your application a stronger security posture.
  • Easier to upgrade: you can apply upgrades at a microservice level without impacting the entire workload. Upgrades occur at the functional level, not at the application level, and you can implement canary releases to control the rollout.
  • Easier to maintain: adding new features is usually easier when working with a single small service than a monolithic with significant coupling. Frequently, you implement features by adding new Lambda functions without modifying existing code.
  • Easier to reuse code: when you have specialized functions that perform a single task, it’s often easier to copy these across multiple projects. Building a library of generic specialized functions can help accelerate development in future projects.
  • Easier to test: unit testing is easier when there are few lines of code and the range of potential inputs for a function is smaller.
  • Lower cognitive load for developers since each development team has a smaller surface area of the application to understand. This can help accelerate onboarding for new developers.

To learn more, read “Decomposing the Monolith with Event Storming”.

Lambda as orchestrator

Many business workflows result in complex workflow logic, where the flow of operations depends on multiple factors. In an ecommerce example, a payments service is an example of a complex workflow:

  • A payment type may be cash, check, or credit card, all of which have different processes.
  • A credit card payment has many possible states, from successful to declined.
  • The service may need to issue refunds or credits for a portion or the entire amount.
  • A third-party service that processes credit cards may be unavailable due to an outage.
  • Some payments may take multiple days to process.

Implementing this logic in a Lambda function can result in ‘spaghetti code’ that’s different to read, understand, and maintain. It can also become fragile in production systems. The complexity is compounded if you must handle error handling, retry logic, and inputs and outputs processing. These types of orchestration functions are an anti-pattern in Lambda-based applications.

Instead, use AWS Step Functions to orchestrate these workflows using a versionable, JSON-defined state machine. State machines can handle nested workflow logic, errors, and retries. A workflow can also run for up to 1 year, and the service can maintain different versions of workflows, allowing you to upgrade production systems in place. Using this approach also results in less custom code, making an application easier to test and maintain.

While Step Functions is generally best-suited for workflows within a bounded context or microservice, to coordinate state changes across multiple services, instead use Amazon EventBridge. This is a serverless event bus that routes events based upon rules, and simplifies orchestration between microservices.

Recursive patterns that cause invocation loops

AWS services generate events that invoke Lambda functions, and Lambda functions can send messages to AWS services. Generally, the service or resource that invokes a Lambda function should be different to the service or resource that the function outputs to. Failure to manage this can result in invocation loops.

For example, a Lambda function writes an object to an Amazon S3 object, which in turn invokes the same Lambda function via a put event. The invocation causes a second object to be written to the bucket, which invokes the same Lambda function:

Event loops in Lambda-based applications

While the potential for infinite loops exists in most programming languages, this anti-pattern has the potential to consume more resources in serverless applications. Both Lambda and S3 automatically scale based upon traffic, so the loop may cause Lambda to scale to consume all available concurrency and S3 to continue to write objects and generate more events for Lambda. In this situation, you can press the “Throttle” button in the Lambda console to scale the function concurrency down to zero and break the recursion cycle.

This example uses S3 but the risk of recursive loops also exists in Amazon SNS, Amazon SQS, Amazon DynamoDB, and other services. In most cases, it is safer to separate the resources that produce and consume events from Lambda. However, if you need a Lambda function to write data back to the same resource that invoked the function, ensure that you:

  • Use a positive trigger: For example, an S3 object trigger may use a naming convention or meta tag that is only triggered on the first invocation. This prevents objects written from the Lambda function from invoking the same Lambda function again. See the S3-to-Lambda translation application for an example of this mechanism.
  • Use reserved concurrency: Setting the function’s reserved concurrency to a lower limit prevents the function from scaling concurrently beyond that limit. It does not prevent the recursion, but limits the resources consumed as a safety mechanism. This can be useful during the development and test phases.
  • Use Amazon CloudWatch monitoring and alarming: By setting an alarm on a function’s concurrency metric, you can receive alerts if the concurrency suddenly spikes and take appropriate action.

Lambda functions calling Lambda functions

Functions enable encapsulation and code reuse. Most programming languages support the concept of code synchronously calling functions within a code base. In this case, the caller waits until the function returns a response. This model does not generally adapt well to serverless development.

For example, consider a simple ecommerce application consisting of three Lambda functions that process an order:

Ecommerce example with three functions

In this case, the Create order function calls the Process payment function, which in turn calls the Create invoice function. While this synchronous flow may work within a single application on a server, it introduces several avoidable problems in a distributed serverless architecture:

  • Cost: With Lambda, you pay for the duration of an invocation. In this example, while the Create invoice functions runs, two other functions are also running in a wait state, shown in red on the diagram.
  • Error handling: In nested invocations, error handling can become more complex. Either errors are thrown to parent functions to handle at the top-level function, or functions require custom handling. For example, an error in Create invoice might require the Process payment function to reverse the charge, or it may instead retry the Create invoice process.
  • Tight coupling: Processing a payment typically takes longer than creating an invoice. In this model, the availability of the entire workflow is limited by the slowest function.
  • Scaling: The concurrency of all three functions must be equal. In a busy system, this uses more concurrency than would otherwise be needed.

In serverless applications, there are two common approaches to avoid this pattern. First, use an SQS queue between Lambda functions. If a downstream process is slower than an upstream process, the queue durably persists messages and decouples the two functions. In this example, the Create order function publishes a message to an SQS queue, and the Process payment function consumes messages from the queue.

The second approach is to use AWS Step Functions. For complex processes with multiple types of failure and retry logic, Step Functions can help reduce the amount of custom code needed to orchestrate the workflow. As a result, Step Functions orchestrates the work and robustly handles errors and retries, and the Lambda functions contain only business logic.

Synchronous waiting within a single Lambda function

Within a single Lambda, ensure that any potentially concurrent activities are not scheduled synchronously. For example, a Lambda function might write to an S3 bucket and then write to a DynamoDB table:

The wait states, shown in red in the diagram, are compounded because the activities are sequential. If the tasks are independent, they can be run in parallel, which results in the total wait time being set by the longest-running task.

Parallel tasks in Lambda functions

In cases where the second task depends on the completion of the first task, you may be able to reduce the total waiting time and the cost of execution by splitting the Lambda functions:

Splitting tasks over two functions

In this design, the first Lambda function responds immediately after putting the object to the S3 bucket. The S3 service invokes the second Lambda function, which then writes data to the DynamoDB table. This approach minimizes the total wait time in the Lambda function executions.

To learn more, read the “Serverless Applications Lens” from the AWS Well-Architected Framework.


This post discusses anti-patterns in event-driven architectures using Lambda. I show some of the issues when using monolithic Lambda functions or custom code to orchestrate workflows. I explain how to avoid recursive architectures that may cause invocation loops and why you should avoid calling functions from functions. I also explain different approaches to handling waiting in functions to minimize cost.

For more serverless learning resources, visit Serverless Land.