
Using the circuit-breaker pattern with AWS Lambda extensions and Amazon DynamoDB

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/using-the-circuit-breaker-pattern-with-aws-lambda-extensions-and-amazon-dynamodb/

This post is written by Alan Oberto Jimenez, Senior Cloud Application Architect, and Tobias Drees, Cloud Application Architect.

Modern software systems frequently rely on remote calls to other systems across networks. When failures occur, they can cascade across multiple services and cause disruptions. One technique for mitigating this risk is the circuit breaker pattern, which can detect and isolate failures in a distributed system. The circuit breaker pattern can help prevent cascading failures and improve overall system stability.

By isolating the failing service, the pattern stops failures from spreading. It improves overall responsiveness because callers no longer wait through long timeout periods. It also increases the fault tolerance of the system, since the system can resume interacting with the affected service once that service is available again.

This blog post presents an example application, showing how AWS Lambda extensions integrate with Amazon DynamoDB to implement the circuit breaker pattern.

Using Lambda extensions to implement the circuit breaker pattern

AWS Lambda extensions provide a way to integrate monitoring, observability, security, and governance tools into the Lambda execution environment without complex installation or configuration management. You can run extensions both as part of the runtime process with an internal extension or as a separate process in the execution environment with an external extension.

Lambda extensions enable the circuit breaker pattern without modifying the core function code. An external extension runs as a separate process and reports whether a given service is reachable. This approach decouples the business logic in the Lambda function from failure detection, so you can reuse the extension across different Lambda functions. Both the separation of concerns and the code reuse are in line with the best practices for building Lambda functions.

Pinging a microservice on every Lambda invocation increases network traffic and latency, so circuit breaker implementations benefit from a caching layer that stores the state of the microservices. The Lambda extension fetches the status of a microservice from a database and keeps the result in memory for a specified time, avoiding a disk write. The Lambda function checks the extension cache before calling the microservice, which reduces network traffic. Lambda extensions are well suited to building a caching layer for Lambda functions: compared with calling a network resource, an in-memory cache inside the execution environment is more secure, easier to manage, and more performant because it is always available locally.
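The caching behavior described above can be sketched in a few lines of Python. This is a minimal, hypothetical sketch rather than the extension's actual code: `CircuitStateCache`, the `load_state` callable (standing in for the DynamoDB read), and the 30-second TTL are assumptions, and the injectable `clock` only exists to make expiry easy to test.

```python
import time
from typing import Callable, Dict, Tuple


class CircuitStateCache:
    """In-memory cache of circuit states with a fixed time-to-live (TTL).

    load_state is a hypothetical callable that reads one service's state
    (for example, from the DynamoDB table). It is only called on a cache
    miss or after the cached entry has expired.
    """

    def __init__(self, load_state: Callable[[str], str],
                 ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._load_state = load_state
        self._ttl = ttl_seconds
        self._clock = clock
        # service name -> (cached state, time at which it was loaded)
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, service: str) -> str:
        now = self._clock()
        entry = self._entries.get(service)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]  # fresh hit: no database call
        state = self._load_state(service)  # miss or expired: reload
        self._entries[service] = (state, now)
        return state
```

On a cache hit, `get` returns the stored state without touching the database; once the TTL has elapsed, the next call reloads it. A shorter TTL reacts faster to state changes at the cost of more database reads.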


Architecture Overview

  1. The main function process handles the event on every AWS Lambda invocation. Before making any call to the external components, it sends an HTTP POST request to the Lambda extension process to fetch the latest status of the circuits.
  2. The extension process provides the circuit state to the main process via HTTP POST.
    1. The extension checks its internal cache and returns a valid value if one is available; otherwise, it reads the state of the circuits from the DynamoDB table and updates the cache.
    2. Finally, the extension process returns the state of the circuits to the main function in the API call response.
    3. Because of the Lambda extension lifecycle, this process repeats periodically to keep the local cache updated until the execution environment is terminated.
  3. If the circuit is in the OPEN state, the main function process calls the external microservices; otherwise, it returns a local response.
  4. An Amazon EventBridge rule periodically invokes a Lambda function responsible for updating the circuit states.
  5. This Lambda function performs the checks needed to determine the status of the different remote microservices (circuits), which are exposed through an Amazon API Gateway entry point.
  6. The Lambda function writes the result of the verification process to the DynamoDB table.
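Steps 1 to 3 can be sketched from the function's side as follows. The extension's local endpoint (`http://localhost:4000/circuit-state`), the request body `{"service": ...}`, and the response field `state` are hypothetical placeholders, not the sample repository's actual interface. The branching follows this post's convention, in which OPEN means the remote service is reachable.

```python
import json
import urllib.request
from typing import Callable

# Hypothetical address of the extension's local HTTP server; the real
# port, path, and payload format are defined by the sample repository.
EXTENSION_URL = "http://localhost:4000/circuit-state"


def fetch_circuit_state(service: str, url: str = EXTENSION_URL,
                        timeout: float = 1.0) -> str:
    """Ask the extension process for one service's circuit state (HTTP POST)."""
    body = json.dumps({"service": service}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"},
        method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["state"]


def handle(service: str, get_state: Callable[[str], str],
           call_remote: Callable[[], dict],
           local_response: Callable[[], dict]) -> dict:
    """Call the remote service only if its circuit is OPEN (reachable)."""
    if get_state(service) == "OPEN":
        return call_remote()
    return local_response()
```

Passing `get_state`, `call_remote`, and `local_response` as callables keeps the decision independent of how the state is fetched, so it can be tested without a running extension; in the function itself, `get_state` would be `fetch_circuit_state`.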


The following prerequisites are required to complete the walkthrough:

  • An active AWS account
  • AWS CLI 2.15.17 or later
  • AWS SAM CLI 1.116.0 or later
  • Git 2.39.3 or later
  • Python 3.12

Initial setup

  1. Clone the code from GitHub onto a local machine:
    git clone https://github.com/aws-samples/implementing-the-circuit-breaker-pattern-with-lambda-extensions-and-dynamodb.git
  2. To install the packages, use a virtual environment:
    python -m venv circuit_breaker_venv && source circuit_breaker_venv/bin/activate
  3. To prepare the services for deployment, execute the following AWS Serverless Application Model (SAM) command:
    sam build
  4. To deploy the services, run this command, specifying the AWS CLI profile (from the config file in the .aws folder) of the AWS account to deploy the services to:
    sam deploy --guided --profile <AWSProfile>

    Answer the question prompts as appropriate.

  5. You can deploy subsequent local changes in the code with:
    sam build 
    sam deploy

Testing and adjusting the solution

The Lambda function that updates the state in DynamoDB runs every minute, as specified in the template. After it runs for the first time, about one minute after deployment, the DynamoDB entry containing the status ("OPEN" or "CLOSED") is available. Because the mock API is deployed as part of the stack and is reachable, the status is "OPEN".

You can invoke the MyMicroservice Lambda function manually to see the result.


The Lambda function updating the state in DynamoDB is invoked with an EventBridge rule that specifies the URL and the ID of the service to be monitored. By creating a new EventBridge rule with the correct URL and a new ID, you can use the AWS SAM template for monitoring multiple services.
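Because each rule passes its own `URL` and `ID`, a single state-updating function can serve many services. The following Python sketch shows one way to structure such a handler. The injected `check_service` and `put_item` callables and the item attribute names (`id`, `state`, `updated_at`) are assumptions for illustration, not the sample's actual table schema.

```python
import time
from typing import Callable, Dict


def make_handler(check_service: Callable[[str], str],
                 put_item: Callable[[Dict], None]) -> Callable:
    """Build a state-updating handler from two hypothetical dependencies.

    check_service probes a URL and returns "OPEN" or "CLOSED";
    put_item writes one item to the circuit-state table.
    """
    def handler(event: Dict, context=None) -> Dict:
        # Each EventBridge rule supplies its own payload, for example
        # {"URL": "https://aws.amazon.com/", "ID": "NewMicroservice"}
        url, service_id = event["URL"], event["ID"]
        item = {
            "id": service_id,             # hypothetical key attribute
            "state": check_service(url),  # "OPEN" or "CLOSED"
            "updated_at": int(time.time()),
        }
        put_item(item)
        return item
    return handler
```

In a deployed function, `put_item` could wrap `boto3.resource("dynamodb").Table(table_name).put_item(Item=item)`, and the module would expose `handler = make_handler(...)` as the Lambda entry point.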

To add a new EventBridge rule, add this to the template:

    NewEventRule:
      Type: AWS::Events::Rule
      Properties:
        Description: Event rule to trigger the Lambda function with a JSON payload
        ScheduleExpression: rate(1 minute)
        State: ENABLED
        Targets:
          - Arn: !GetAtt UpdatingStateLambda.Arn
            Id: TargetFunction
            Input: '{ "URL": "https://aws.amazon.com/", "ID": "NewMicroservice"}'  # Add the JSON payload here

    NewEventRulePermission:  # illustrative logical ID
      Type: AWS::Lambda::Permission
      Properties:
        FunctionName: !Ref UpdatingStateLambda
        Action: lambda:InvokeFunction
        Principal: events.amazonaws.com
        SourceArn: !GetAtt NewEventRule.Arn

In the Lambda function that contains the business logic, add the following environment variable. For more complex cases with multiple microservices to monitor, consider AWS AppConfig instead: it stores configuration for Lambda functions and gives more granular control than environment variables.

      Environment:
        Variables:
          service_name: "NewMicroservice"

You can adjust the logic of this Lambda function by changing the code in my-microservice/lambda-handler.py or directly in the Lambda section of the AWS Management Console.

If you use your own Lambda function with the circuit breaker Lambda extension, include the extension as a layer:

    MyMicroservice:  # illustrative logical ID
      Type: AWS::Serverless::Function
      Properties:
        CodeUri: business-logic-microservice/
        Handler: lambda_function.lambda_handler
        MemorySize: 128
        Policies:
          - DynamoDBCrudPolicy:
              TableName: !Ref CircuitBreakerStateTable
        Timeout: 100
        Runtime: python3.12
        Layers:
          - !Ref CircuitBreakerExtensionLayer

Circuit breaker in closed state

So far, the sample application only shows an open circuit breaker state, which in this solution signals a functioning microservice. This section simulates an unresponsive microservice to test the behavior of the system with a closed circuit breaker state.

  1. Edit the environment variable of the MyMicroservice Lambda function in line 47 of the template.yaml file, and the URL in the input that the event rule in line 107 passes to the state-updating Lambda function, so that both point to a domain that times out, such as "https://aws.amazon.com:81/".
    API_URL: "https://aws.amazon.com:81/"
    Input: '{ "URL": "https://aws.amazon.com:81/", "ID": "MyMicroservice"}'
  2. Deploy these changes:
    sam build
    sam deploy

The event rule invokes the state-updating Lambda function every minute. To see the output of this Lambda function, invoke it manually:

Execution result

This Lambda function changes the DynamoDB entry for this URL to:

DynamoDB entry

The MyMicroservice Lambda function receives the status from the DynamoDB entries over HTTP from the circuit breaker Lambda extension and proceeds with the logic for a closed state. The output of invoking the Lambda function manually is:

Manual output

This shows the circuit breaker pattern working as intended. In the Lambda function that updates the state, the request raises a timeout exception after 4 seconds; you can adjust this value to your use case:

requests.get(API_URL, headers=headers, timeout=4)
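The same kind of check can be sketched with only the Python standard library instead of `requests`. This is an illustrative sketch: the function name `check_service` is hypothetical, it uses this post's convention of "OPEN" for a reachable service, and treating 4xx answers as reachable is a design assumption (the server answered, so the network path works).

```python
import http.client
import urllib.error
import urllib.request


def check_service(url: str, timeout: float = 4.0) -> str:
    """Probe a URL and map the outcome to a circuit state.

    "OPEN": the service answered. "CLOSED": the request failed, the server
    returned a 5xx status, or nothing arrived within `timeout` seconds.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            # urlopen raises HTTPError for 4xx/5xx, so reaching here means 2xx/3xx
            return "OPEN"
    except urllib.error.HTTPError as err:
        # The server responded, but with an error status code
        return "OPEN" if err.code < 500 else "CLOSED"
    except (OSError, http.client.HTTPException):
        # URLError, refused connections, DNS failures, and timeouts are OSError subclasses
        return "CLOSED"
```

With the unresponsive URL used above, `check_service("https://aws.amazon.com:81/")` would wait up to four seconds and then return "CLOSED".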


To delete all resources from this stack, run:

sam delete --stack-name new-circuit-breaker-sam-stack


The provided AWS SAM template does not provide an Amazon Virtual Private Cloud (VPC) in which to host the resources. Integrate the resources into an appropriate networking configuration if you use this solution in production applications.

The solution has auditability characteristics, as calls to the circuit breaker and to the microservices are logged to an Amazon CloudWatch Logs log group. The audit log is encrypted using AWS Key Management Service.

To monitor the security of your account with the solution, use Amazon GuardDuty, AWS CloudTrail, AWS Config, and AWS WAF for API Gateway.


The circuit breaker pattern is a powerful tool for helping to ensure the resiliency and stability of serverless applications. Lambda extensions are a good fit for its implementation, as demonstrated in this example. With the provided Lambda extension and code, you can incorporate the circuit breaker pattern into your applications and customize it to suit your specific requirements, helping to ensure a robust and reliable system.

For more serverless learning resources, visit Serverless Land.

Running code after returning a response from an AWS Lambda function

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/running-code-after-returning-a-response-from-an-aws-lambda-function/

This post is written by Uri Segev, Principal Serverless Specialist SA.

When you invoke an AWS Lambda function synchronously, you expect the function to return a response. For example, this is the case when a client invokes a Lambda function through Amazon API Gateway or from AWS Step Functions. As the client is waiting for the response, you should return the response as soon as possible.

However, there may be instances where you must perform additional work that does not affect the response and you can do it asynchronously, after you send the response. For example, you may store data in a database or send information to a logging system.

Once you send the response from the function, the Lambda service freezes the execution environment, and the function cannot run additional code. Even if you create a thread to run a task in the background, the Lambda service freezes the execution environment once the handler returns, so the thread stays frozen until the next invocation. While you can delay returning the response to the client until all work is complete, this approach can negatively impact the user experience.

This blog explores ways to run a task that may start before the function returns but continues running after the function returns the response to the client.

Invoking an asynchronous Lambda function

The first option is to break the code into two functions. The first function runs the synchronous code; the second function runs the asynchronous code. Before the synchronous function returns, it invokes the second function asynchronously, either directly, using the Invoke API, or indirectly, for example, by sending a message to Amazon SQS to trigger the second function.

This Python code demonstrates how to implement this:

import json
import time
import os
import boto3
from aws_lambda_powertools import Logger

logger = Logger()
client = boto3.client('lambda')

def calc_response(event):
    logger.info(f"[Function] Calculating response")
    time.sleep(1) # Simulate sync work
    return {
        "message": "hello from async"
    }

def submit_async_task(response):
    # Invoke async function to continue.
    # InvocationType='Event' queues the invocation and returns immediately.
    logger.info(f"[Function] Invoking async task in async function")
    client.invoke(
        FunctionName=os.getenv('ASYNC_FUNCTION'),
        InvocationType='Event',
        Payload=json.dumps(response)
    )

def handler(event, context):
    logger.info(f"[Function] Received event: {json.dumps(event)}")

    response = calc_response(event)
    # Done calculating response, submit async task
    submit_async_task(response)

    # Return response to client
    logger.info(f"[Function] Returning response to client")
    return {
        "statusCode": 200,
        "body": json.dumps(response)
    }

The following is the Lambda function that performs the asynchronous work:

import json
import time
from aws_lambda_powertools import Logger

logger = Logger()

def handler(event, context):
    logger.info(f"[Async task] Starting async task: {json.dumps(event)}")
    time.sleep(3)  # Simulate async work
    logger.info(f"[Async task] Done")

Use Lambda response streaming

Response streaming enables developers to start streaming the response as soon as they have the first byte of the response, without waiting for the entire response. You usually use response streaming when you must minimize the Time to First Byte (TTFB) or when you must send a response that is larger than 6 MB (the Lambda response payload size limit).

Using this method, the function can send the response using the response streaming mechanism and can continue running code even after sending the last byte of the response. This way, the client receives the response, and the Lambda function can continue running.

This Node.js code demonstrates how to implement this:

import { Logger } from '@aws-lambda-powertools/logger';

const logger = new Logger();

export const handler = awslambda.streamifyResponse(async (event, responseStream, _context) => {
    logger.info("[Function] Received event: ", event);
    // Do some stuff with event
    let response = await calc_response(event);
    // Return response to client
    logger.info("[Function] Returning response to client");
    responseStream.setContentType('application/json');
    responseStream.write(JSON.stringify(response));
    responseStream.end();

    // The client already has the full response; keep working
    await async_task(response);
});

const calc_response = async (event) => {
    logger.info("[Function] Calculating response");
    await sleep(1);  // Simulate sync work

    return {
        message: "hello from streaming"
    };
};

const async_task = async (response) => {
    logger.info("[Async task] Starting async task");
    await sleep(3);  // Simulate async work
    logger.info("[Async task] Done");
};

const sleep = async (sec) => {
    return new Promise((resolve) => {
        setTimeout(resolve, sec * 1000);
    });
};

Use Lambda extensions

Lambda extensions can augment Lambda functions to integrate with your preferred monitoring, observability, security, and governance tools. You can also use an extension to run your own code in the background so that it continues running after your function returns the response to the client.

There are two types of Lambda extensions: external extensions and internal extensions. External extensions run as separate processes in the same execution environment. The Lambda function can communicate with the extension using files in the /tmp folder or using a local network, for example, via HTTP requests. You must package external extensions as a Lambda layer.

Internal extensions run as separate threads within the same process that runs the handler. The handler can communicate with the extension using any in-process mechanism, such as internal queues. This example shows an internal extension, which is a dedicated thread within the handler process.

When the Lambda service invokes a function, it also notifies all the extensions of the invocation. The Lambda service only freezes the execution environment when the Lambda function has returned a response and all the extensions have signaled, by calling the Extensions API /next endpoint, that they are finished. With this approach, the function hands the task to the extension, which runs it independently of the function and notifies the Lambda service when it is done. This way, the execution environment stays active until the task is done.

The following Python code example isolates the extension code into its own file and the handler imports and uses it to run the background task:

import json
import time
import async_processor as ap
from aws_lambda_powertools import Logger

logger = Logger()

def calc_response(event):
    logger.info(f"[Function] Calculating response")
    time.sleep(1) # Simulate sync work
    return {
        "message": "hello from extension"
    }

# This function is performed after the handler code calls start_async_task
# and it can continue running after the function returns
def async_task(response):
    logger.info(f"[Async task] Starting async task: {json.dumps(response)}")
    time.sleep(3)  # Simulate async work
    logger.info(f"[Async task] Done")

def handler(event, context):
    logger.info(f"[Function] Received event: {json.dumps(event)}")

    # Calculate response
    response = calc_response(event)

    # Done calculating response
    # call async processor to continue
    logger.info(f"[Function] Invoking async task in extension")
    ap.start_async_task(async_task, response)

    # Return response to client
    logger.info(f"[Function] Returning response to client")
    return {
        "statusCode": 200,
        "body": json.dumps(response)
    }

The following Python code demonstrates how to implement the extension that runs the background task:

import os
import requests
import threading
import queue
from aws_lambda_powertools import Logger

logger = Logger()

LAMBDA_EXTENSION_NAME = "AsyncProcessor"
EXTENSIONS_API = f"http://{os.environ['AWS_LAMBDA_RUNTIME_API']}/2020-01-01/extension"

# An internal queue used by the handler to notify the extension that it can
# start processing the async task.
async_tasks_queue = queue.Queue()

def start_async_processor():
    # Register internal extension (internal extensions can only subscribe to INVOKE)
    logger.debug(f"[{LAMBDA_EXTENSION_NAME}] Registering with Lambda service...")
    response = requests.post(
        url=f"{EXTENSIONS_API}/register",
        json={'events': ['INVOKE']},
        headers={'Lambda-Extension-Name': LAMBDA_EXTENSION_NAME}
    )
    ext_id = response.headers['Lambda-Extension-Identifier']
    logger.debug(f"[{LAMBDA_EXTENSION_NAME}] Registered with ID: {ext_id}")

    def process_tasks():
        while True:
            # Call /next to get notified when there is a new invocation and let
            # Lambda know that we are done processing the previous task.

            logger.debug(f"[{LAMBDA_EXTENSION_NAME}] Waiting for invocation...")
            response = requests.get(
                url=f"{EXTENSIONS_API}/event/next",
                headers={'Lambda-Extension-Identifier': ext_id},
                timeout=None
            )

            # Get next task from internal queue
            logger.debug(f"[{LAMBDA_EXTENSION_NAME}] Woke up, waiting for async task from handler")
            async_task, args = async_tasks_queue.get()
            if async_task is None:
                # No task to run this invocation
                logger.debug(f"[{LAMBDA_EXTENSION_NAME}] Received null task. Ignoring.")
            else:
                # Invoke task
                logger.debug(f"[{LAMBDA_EXTENSION_NAME}] Received async task from handler. Starting task.")
                async_task(args)
            logger.debug(f"[{LAMBDA_EXTENSION_NAME}] Finished processing task")

    # Start processing extension events in a separate thread
    threading.Thread(target=process_tasks, daemon=True, name='AsyncProcessor').start()

# Used by the function to indicate that there is work that needs to be
# performed by the async task processor
def start_async_task(async_task=None, args=None):
    async_tasks_queue.put((async_task, args))

# Starts the async task processor
start_async_processor()

Use a custom runtime

Lambda supports several runtimes out of the box: Python, Node.js, Java, .NET, and Ruby. Lambda also supports custom runtimes, which let you develop Lambda functions in any other programming language you need.

When you invoke a Lambda function that uses a custom runtime, the Lambda service invokes a process called ‘bootstrap’ that contains your custom code. The custom code needs to interact with the Lambda Runtime API. It calls the /next endpoint to obtain information about the next invocation. This API call is blocking and it waits until a request arrives. When the function is done processing the request, it must call the /response endpoint to send the response back to the client and then it must call the /next endpoint again to wait for the next invocation. Lambda freezes the execution environment after you call /next, until a request arrives.

Using this approach, you can run the asynchronous task after calling /response, which sends the response back to the client, and before calling /next, which indicates that processing is done.

The following Python code example isolates the custom runtime code into its own file and the function imports and uses it to interact with the runtime API:

import time
import json
import runtime_interface as rt
from aws_lambda_powertools import Logger

logger = Logger()

def calc_response(event):
    logger.info(f"[Function] Calculating response")
    time.sleep(1) # Simulate sync work
    return {
        "message": "hello from custom"
    }

def async_task(response):
    logger.info(f"[Async task] Starting async task: {json.dumps(response)}")
    time.sleep(3)  # Simulate async work
    logger.info(f"[Async task] Done")

def main():
    # You can add initialization code here

    # The following loop runs forever waiting for the next invocation
    # and sending the response back to the client
    while True:
        # Call /next to wait for next request (and indicate
        # that we are done processing the previous request)

        requestId, event = rt.get_next()

        # The code from here to send_response() is the code
        # that usually goes inside the Lambda handler()

        logger.info(f"[Function] Received event: {json.dumps(event)}")

        # Calculate response
        response = calc_response(event)

        # Done calculating response, send response to client
        logger.info(f"[Function] Returning response to client")
        rt.send_response(requestId, {
            "statusCode": 200,
            "body": json.dumps(response)
        })

        # The client already has the response; run the async task
        # before looping back to /next
        logger.info(f"[Function] Invoking async task")
        async_task(response)

if __name__ == "__main__":
    main()


This Python code demonstrates how to interact with the runtime API:

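import requests
import os
from aws_lambda_powertools import Logger

logger = Logger()
run_time_endpoint = os.environ['AWS_LAMBDA_RUNTIME_API']

def get_next():
    logger.debug("[Custom runtime] Waiting for invocation...")
    # Blocks until the next invocation arrives
    request = requests.get(
        url=f"http://{run_time_endpoint}/2018-06-01/runtime/invocation/next",
        timeout=None
    )
    event = request.json()
    requestId = request.headers["Lambda-Runtime-Aws-Request-Id"]
    return requestId, event

def send_response(requestId, response):
    logger.debug("[Custom runtime] Sending response")
    requests.post(
        url=f"http://{run_time_endpoint}/2018-06-01/runtime/invocation/{requestId}/response",
        json=response,
        timeout=None
    )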


This blog shows four ways of combining synchronous and asynchronous tasks in a Lambda function, allowing you to run tasks that continue running after the function returns a response to the client. The following table summarizes the pros and cons of each solution:


| | Asynchronous invocation | Response streaming | Lambda extensions | Custom runtime |
|---|---|---|---|---|
| Complexity | Easier to implement | Easiest to implement | The most complex solution to implement, as it requires interacting with the extensions API and a dedicated thread | Medium, as it interacts with the runtime API |
| Deployment | Needs two artifacts: the synchronous function and the asynchronous function | A single deployment artifact that contains all code | A single deployment artifact that contains all code | A single deployment artifact; requires packaging all needed runtime files |
| Cost | Most expensive, as it incurs an additional invocation cost and the overall duration of both functions is higher than having it in one | Least expensive | Least expensive | Least expensive |
| Starting the async task | Before returning from the handler | Anytime during the handler invocation | Anytime during the handler invocation | After returning the response to the client, unless you use a dedicated thread |
| Limitations | Payload sent to the asynchronous function cannot exceed 256 KB | Only supported with Node.js and custom runtimes. Requires Lambda Function URLs, cannot be used with API Gateway, always public | | |
| Additional benefits | Better decoupling between synchronous and asynchronous code | Ability to send the response in stages. Supports payloads larger than 6 MB (at additional cost) | The asynchronous task runs in its own thread, which can reduce overall duration and cost | |
| Retries in case of failure in async code | Managed by the Lambda service | Responsibility of the developer | Responsibility of the developer | Responsibility of the developer |

Choosing the right approach depends on your use case. If you write your function in Node.js and you invoke it using Lambda Function URLs, use response streaming. This is the easiest way to implement, and it is the most cost effective.

If there is a chance for a failure in the asynchronous task (for example, a database is not accessible), and you must ensure that the task completes, use the asynchronous Lambda invocation method. If the asynchronous function fails, the Lambda service retries it, by default up to two more times. If all retries fail, Lambda can send the event to an on-failure destination that you configure, so you can take action.
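The retry count and the failure destination are configured per function. The following is a hedged sketch using the boto3 Lambda client's `put_function_event_invoke_config` call; the wrapper name `configure_async_retries` is hypothetical, and the function name and destination ARN in the usage note are placeholders.

```python
from typing import Any, Dict


def configure_async_retries(lambda_client: Any, function_name: str,
                            on_failure_arn: str, max_retries: int = 2,
                            max_event_age_seconds: int = 3600) -> Dict:
    """Set retries and an on-failure destination for asynchronous invocations."""
    # Lambda accepts 0 to 2 retries for asynchronous invocations
    if not 0 <= max_retries <= 2:
        raise ValueError("MaximumRetryAttempts must be between 0 and 2")
    return lambda_client.put_function_event_invoke_config(
        FunctionName=function_name,
        MaximumRetryAttempts=max_retries,
        MaximumEventAgeInSeconds=max_event_age_seconds,
        DestinationConfig={"OnFailure": {"Destination": on_failure_arn}},
    )
```

For example, `configure_async_retries(boto3.client("lambda"), "my-async-function", "arn:aws:sqs:us-east-1:123456789012:async-failures")` keeps the default two retries and sends events that still fail to that (placeholder) queue.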

If you need a custom runtime because you need to use a programming language that Lambda does not natively support, use the custom runtime option. Otherwise, use the Lambda extensions option. It is more complex to implement, but it is cost effective. This allows you to package the code in a single artifact and start processing the asynchronous task before you send the response to the client.

For more serverless learning resources, visit Serverless Land.

Accelerating workflow development with the TestState API in AWS Step Functions

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/accelerating-workflow-development-with-the-teststate-api-in-aws-step-functions/

This post is written by Ben Freiberg, Senior Solutions Architect.

Developers often choose AWS Step Functions to orchestrate the services that comprise their applications. Step Functions is a visual workflow service that makes it easier for developers to build distributed applications, automate processes, orchestrate microservices, and create data and machine learning (ML) pipelines. Step Functions integrates with over 220 AWS services and any publicly accessible HTTP endpoint. Step Functions provides many features that help developers build, such as built-in error handling, real-time and auditable workflow execution history, and large-scale parallel processing.

Several areas can be time-consuming when testing Step Functions workflows, for example authentication with external services, input/output processing, AWS IAM permissions, or intrinsic functions. To simplify and speed up resolving these issues, Step Functions released a new capability last year to test individual states: the TestState API. This feature allows you to test states independently from the execution of your workflow. You can change the input and test different scenarios without deploying your workflow or executing the whole state machine. This feature is available for all Task, Choice, and Pass states.

Since developers spend significant time in IDEs and terminals, TestState is also available via an API. This allows you to iterate over changes for an individual state and lets you refine the input/output processing or conditional logic in a choice state without leaving your IDE. In this post, you’ll learn how the TestState API can speed up your testing and development.

Getting started with TestState

Suppose that you are developing a payment processing workflow that consists of three states. First, a Choice state that checks the type of payment based on the input data. Depending on the type, it calls either an AWS Lambda function or an external endpoint. The task state that invokes the Lambda function includes some input/output processing.


To get started with the TestState API, you must create an IAM role that the service can assume. The role must contain the required IAM permissions for the resources your state is accessing. For information about the permissions a state might need, see IAM permissions to test a state. The following snippet shows the minimal necessary permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "lambda:InvokeFunction",
        "states:InvokeHTTPEndpoint",
        "events:RetrieveConnectionCredentials",
        "secretsmanager:GetSecretValue",
        "secretsmanager:DescribeSecret"
      ],
      "Resource": "*"
    }
  ]
}

Next, you must provide the definition of the state being tested. The Choice state is configured to check that a payment type is present and to route the execution based on that type. The following snippet shows the state definition:

{
    "Type": "Choice",
    "Choices": [
        {
            "And": [
                {
                    "Variable": "$.payment.type",
                    "IsPresent": true
                },
                {
                    "Variable": "$.payment.type",
                    "StringEquals": "voucher"
                }
            ],
            "Next": "Process voucher"
        },
        {
            "Variable": "$.payment.type",
            "StringEquals": "credit",
            "Next": "Call payment provider"
        }
    ],
    "Default": "Fail"
}

Using the role and state definition, you can now test whether an input results in the expected next state:

aws stepfunctions test-state \
--definition file://choice.json \
--role-arn "arn:aws:iam::<account-id>:role/StepFunctions-TestState-Role" \
--input '{"payment":{"type":"voucher"}}'

The response shows that the test did not encounter any errors and that the next state would be invoking the Lambda function to process the voucher as expected.

{
    "output": "{\"payment\":{\"type\":\"voucher\"}}",
    "nextState": "Process voucher",
    "status": "SUCCEEDED"
}

Similarly, with a payment type of credit as input, the next state is the one that calls the third-party endpoint:

aws stepfunctions test-state \
--definition file://choice.json \
--role-arn "arn:aws:iam::<account-id>:role/StepFunctions-TestState-Role" \
--input '{"payment":{"type":"credit"}}'

{
    "output": "{\"payment\":{\"type\":\"credit\"}}",
    "nextState": "Call payment provider",
    "status": "SUCCEEDED"
}

Because the TestState API takes the state definition as an argument, you do not have to redeploy the state machine when changing the state definition. Instead, you can iterate and test your settings by passing the modified state definition to the TestState API.

Using inspection levels

For each state, you can specify the amount of detail you want to view in the test results. These details provide additional information about the state that you are testing. For example, if you've used any input and output data processing filters, such as InputPath or ResultPath, in a state, you can view the intermediate and final data processing results. Step Functions provides the following inspection levels: INFO, DEBUG, and TRACE. All of these levels return the status and nextState fields.

Next, the Lambda Invoke state is tested. In this scenario, the state includes input/output processing. The output from the function is transformed by renaming and restructuring its fields and is then merged with the original input. This is the relevant part of the task definition:

"Process voucher": {
  "Type": "Task",
  "Resource": "arn:aws:states:::lambda:invoke",
  "Parameters": {...},
  "Retry": [...],
  "Next": "Success",
  "ResultPath": "$.voucherProcessed",
  "ResultSelector": {
    "status.$": "$.Payload.result",
    "workflowId.$": "$.Payload.workflow"
  }
}

This time, test the state using the Step Functions console, which can make the input/output processing steps easier to understand. To get started, open the state machine in Workflow Studio, select the state, and then choose Test State. Make sure to select DEBUG as the inspection level. After testing the state, switch to the Input/output processing tab to check the intermediate steps.
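To predict what the intermediate steps should show, here is a minimal local sketch of what ResultSelector and ResultPath do in the Process voucher state. It is not how Step Functions implements these fields: it supports only dot-notation paths and the "$.x" form of ResultPath, and the sample Lambda result (with result and workflow fields under Payload) is an assumption for illustration.

```javascript
// Resolve a simple JSONPath such as "$.Payload.result" (dot notation only).
function getPath(obj, path) {
  return path
    .replace(/^\$\.?/, "")
    .split(".")
    .filter(Boolean)
    .reduce((value, key) => (value == null ? undefined : value[key]), obj);
}

// Apply a ResultSelector: keys ending in ".$" are paths evaluated against the raw result.
function applyResultSelector(selector, rawResult) {
  const selected = {};
  for (const [key, value] of Object.entries(selector)) {
    if (key.endsWith(".$")) {
      selected[key.slice(0, -2)] = getPath(rawResult, value);
    } else {
      selected[key] = value;
    }
  }
  return selected;
}

// Apply a ResultPath such as "$.voucherProcessed": place the result into a copy of the input.
function applyResultPath(resultPath, stateInput, result) {
  const keys = resultPath.replace(/^\$\.?/, "").split(".").filter(Boolean);
  if (keys.length === 0) return result; // "$" replaces the input entirely
  const output = structuredClone(stateInput);
  let target = output;
  for (const key of keys.slice(0, -1)) {
    target[key] = target[key] ?? {};
    target = target[key];
  }
  target[keys[keys.length - 1]] = result;
  return output;
}

// Hypothetical raw output of the lambda:invoke integration.
const rawResult = { StatusCode: 200, Payload: { result: "processed", workflow: "wf-123" } };
const stateInput = { payment: { type: "voucher" } };

const selected = applyResultSelector(
  { "status.$": "$.Payload.result", "workflowId.$": "$.Payload.workflow" },
  rawResult
);
const output = applyResultPath("$.voucherProcessed", stateInput, selected);
console.log(JSON.stringify(output));
// {"payment":{"type":"voucher"},"voucherProcessed":{"status":"processed","workflowId":"wf-123"}}
```

The printed object is what the afterResultPath stage should contain for that sample result; if the console shows something different, the ResultSelector or ResultPath is the first place to look.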


When you call the TestState API and set the inspectionLevel parameter to DEBUG, the API response includes an object called inspectionData. This object contains fields to help you inspect how data was filtered or manipulated within the state when it was executed. This data is shown in the Input/output processing tab in the console.
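When you call the API from a script instead of the console, a small helper can list the same stages. This sketch assumes the inspectionData field names from the TestState API reference (input, afterInputPath, afterParameters, result, afterResultSelector, afterResultPath), each holding a JSON string; check them against the SDK version you use. The sample response is made up for illustration.

```javascript
// Order in which Step Functions applies input/output processing to a state.
const STAGES = [
  "input",
  "afterInputPath",
  "afterParameters",
  "result",
  "afterResultSelector",
  "afterResultPath",
];

// Return [stage, parsedValue] pairs for every stage present in a DEBUG response.
function processingStages(testStateResponse) {
  const data = testStateResponse.inspectionData ?? {};
  return STAGES.filter((stage) => data[stage] !== undefined).map((stage) => [
    stage,
    JSON.parse(data[stage]),
  ]);
}

// Made-up response shaped like a TestState result with inspectionLevel DEBUG.
const sampleResponse = {
  status: "SUCCEEDED",
  nextState: "Success",
  inspectionData: {
    input: '{"payment":{"type":"voucher"}}',
    afterResultSelector: '{"status":"processed"}',
    afterResultPath: '{"payment":{"type":"voucher"},"voucherProcessed":{"status":"processed"}}',
  },
};

for (const [stage, value] of processingStages(sampleResponse)) {
  console.log(`${stage}: ${JSON.stringify(value)}`);
}
```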

Being able to see all the processing steps easily in one place allows developers to spot issues and iterate more quickly, saving time.

Testing third-party endpoint integrations

Applications might call third-party endpoints that require authentication. Step Functions offers the HTTPS endpoint resource to connect to third-party HTTP targets outside of the AWS Cloud.

HTTPS endpoints use Amazon EventBridge connections to manage the authentication credentials for the target. A connection defines the authorization type, which can be basic authentication with a username and password, an API key, or OAuth. EventBridge connections use AWS Secrets Manager to store the secret. This keeps secrets out of the state machine, reducing the risk of accidentally exposing them in logs or in the state machine definition.

Getting the authentication configuration right might take several time-consuming iterations. With the TRACE inspection level, developers can see the raw HTTP request and response, which is useful for verifying headers, query parameters, and other API-specific details. This option is only available for the HTTP Task. You can also view the secrets included in the EventBridge connection by setting the revealSecrets parameter to true in the TestState API. This can help verify that the correct authentication parameters are used.
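As a sketch, assuming the @aws-sdk/client-sfn TestStateCommand used later in this post, the parameters for a TRACE test of an HTTP Task could be assembled as follows. The endpoint, the connection ARN, and the buildTestStateInput helper are hypothetical; the helper only builds the request object and does not call AWS.

```javascript
const INSPECTION_LEVELS = new Set(["INFO", "DEBUG", "TRACE"]);

// Build the input object for TestStateCommand. The API expects the definition
// and input as strings, so objects are serialized here.
function buildTestStateInput({ definition, roleArn, input = {}, inspectionLevel = "INFO", revealSecrets = false }) {
  if (!INSPECTION_LEVELS.has(inspectionLevel)) {
    throw new Error(`Unknown inspection level: ${inspectionLevel}`);
  }
  const params = {
    definition: JSON.stringify(definition),
    roleArn,
    input: JSON.stringify(input),
    inspectionLevel,
  };
  // Only request secrets explicitly: they appear in the test result when enabled.
  if (revealSecrets) {
    params.revealSecrets = true;
  }
  return params;
}

// Hypothetical HTTP Task definition; replace the endpoint and connection ARN with your own.
const httpTask = {
  Type: "Task",
  Resource: "arn:aws:states:::http:invoke",
  Parameters: {
    ApiEndpoint: "https://api.example.com/payments",
    Method: "GET",
    Authentication: { ConnectionArn: "arn:aws:events:<region>:<account-id>:connection/<name>/<id>" },
  },
  End: true,
};

const params = buildTestStateInput({
  definition: httpTask,
  roleArn: "arn:aws:iam::<account-id>:role/StepFunctions-TestState-Role",
  inspectionLevel: "TRACE",
  revealSecrets: true,
});
// Send with: await client.send(new TestStateCommand(params));
console.log(params);
```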

To get started, ensure that the execution role used for testing has the necessary permissions, as shown in the following policies:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": ["secretsmanager:DescribeSecret", "secretsmanager:GetSecretValue"],
          "Resource": "arn:aws:secretsmanager:<your-region>:<account-id>:secret:events!connection/<your-connection-id>"
        }
      ]
    }

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "RetrieveConnectionCredentials",
          "Effect": "Allow",
          "Action": ["events:RetrieveConnectionCredentials"],
          "Resource": ["<your-connection-arn>"]
        }
      ]
    }

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "InvokeHTTPEndpoint",
          "Effect": "Allow",
          "Action": ["states:InvokeHTTPEndpoint"],
          "Resource": ["<your-state-machine-arn>"]
        }
      ]
    }

When you test the HTTP task, make sure to set the inspection level to TRACE. Then use the HTTP request and response tab to check the details. This capability saves you time when debugging complex authentication issues.


Automating testing

Testing is not only a manual activity for getting the configuration right. Most often, tests run as part of a suite that is performed automatically to validate correct behavior and to prevent regressions when making changes. The TestState API can easily be integrated into such tests as well.

The following snippet shows a test using the Jest framework in JavaScript. The test checks if the correct next state is produced given a definition and input. The definition resides in a different file, which can also be used for infrastructure as code (IaC) to create the state machine.

const { SFNClient, TestStateCommand } = require("@aws-sdk/client-sfn");
// Import the state definition
const definition = require("./definition.json");

const client = new SFNClient({});

describe("Step Functions", () => {
  test("that next state is correct", async () => {
    const command = new TestStateCommand({
      definition: JSON.stringify(definition),
      roleArn: "arn:aws:iam::<account-id>:role/<role-with-sufficient-permissions>",
      input: "{}", // Adjust as necessary
    });
    const data = await client.send(command);

    expect(data.nextState).toBe("Success"); // Adjust as necessary
  });
});

With automated tests, you can safely change your workflow definitions without manual effort, and you are alerted immediately if a change would result in an incompatibility.

With TestState, you can increase your test coverage with less effort because you can test states directly. This is especially helpful for complex workflows and for states that are reached only under a specific set of circumstances. It also makes it easier to validate the correctness of your error handling: you can test the potentially many combinations of your configured Retriers and Catchers much more easily.
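When exploring Retrier combinations, it helps to know the wait schedule a configuration implies. The sketch below computes it from IntervalSeconds, BackoffRate, MaxAttempts, and MaxDelaySeconds, using the documented Retrier defaults and ignoring JitterStrategy; retryDelays is a hypothetical helper, not part of any SDK.

```javascript
// Wait (in seconds) before each retry of a Step Functions Retrier, without jitter.
// Defaults: IntervalSeconds 1, BackoffRate 2.0, MaxAttempts 3; MaxDelaySeconds caps each wait.
function retryDelays({ IntervalSeconds = 1, BackoffRate = 2.0, MaxAttempts = 3, MaxDelaySeconds } = {}) {
  const delays = [];
  for (let attempt = 0; attempt < MaxAttempts; attempt++) {
    const delay = IntervalSeconds * BackoffRate ** attempt;
    delays.push(MaxDelaySeconds === undefined ? delay : Math.min(delay, MaxDelaySeconds));
  }
  return delays;
}

console.log(retryDelays({ IntervalSeconds: 2, BackoffRate: 2, MaxAttempts: 4 })); // [ 2, 4, 8, 16 ]
console.log(retryDelays({ IntervalSeconds: 2, BackoffRate: 2, MaxAttempts: 4, MaxDelaySeconds: 5 })); // [ 2, 4, 5, 5 ]
```

Summing the schedule gives a quick check of whether a Retrier could outlast a state or workflow timeout before the Catcher takes over.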


The TestState API helps developers to iterate faster, resolve issues efficiently, and deliver high-quality applications with greater confidence. By enabling developers to test individual states independently and integrating testing into their preferred development workflows, it simplifies the debugging process and reduces context switches. Whether testing input/output processing, authentication with external services, or third-party endpoint integrations, the TestState API can be a useful tool for testing.

Automating chaos experiments with AWS Fault Injection Service and AWS Lambda

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/automating-chaos-experiments-with-aws-fault-injection-service-and-aws-lambda/

This post is written by André Stoll, Solution Architect.

Chaos engineering is a popular practice for building confidence in system resilience. However, many existing tools assume the ability to alter infrastructure configurations, and cannot be easily applied to the serverless application paradigm. Due to the stateless, ephemeral, and distributed nature of serverless architectures, you must evolve the traditional technique when running chaos experiments on these systems.

This blog post explains a technique for running chaos engineering experiments on AWS Lambda functions. The approach uses Lambda extensions to induce failures in a runtime-agnostic way requiring no function code changes. It shows how you can use the AWS Fault Injection Service (FIS) to automate and manage chaos experiments across different Lambda functions to provide a reusable testing method.


Chaos experiments are commonly applied to cloud applications to uncover latent issues and prevent service disruptions. IT teams use chaos experiments to build confidence in the robustness of their systems. However, the traditional methods used in server-based chaos engineering do not easily translate to the serverless world since many existing tools are based on altering the underlying infrastructure configurations, such as cluster nodes or server instances of your applications.

In serverless applications, AWS handles the undifferentiated heavy lifting of managing infrastructure, so you can focus on delivering business value. But this also means that engineering teams have limited control over the infrastructure, and must rely on application-level tooling to run chaos experiments. Two techniques commonly used in the serverless community for conducting chaos experiments on Lambda functions are modifying the function configuration or using runtime-specific libraries.

Changing the configuration of a Lambda function allows you to induce rudimentary failures. For example, you can set the reserved concurrency of a Lambda function to simulate invocation throttling. Alternatively, you might change the function execution role permissions or the function policy to simulate IAM access denial. These types of failures are easy to implement, but the range of possible fault injection types is limited.

The other technique—injecting chaos into Lambda functions through purpose-built, runtime-specific libraries—is more flexible. There are various open-source libraries that allow you to inject failures, such as added latency, exceptions, or disk exhaustion. Examples of such libraries are Python’s chaos_lambda and failure-lambda for Node.js. The downside is that you must change the function code for every function you want to run chaos experiments on. In addition, those libraries are runtime-specific and each library comes with a set of different capabilities and configurations. This reduces the reusability of your chaos experiments across Lambda functions implemented in different languages.
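To illustrate the library-based approach and why it requires code changes, here is a minimal, hypothetical wrapper in the style of such libraries. The withChaos name and its options are invented for this sketch and do not reflect the API of chaos_lambda or failure-lambda.

```javascript
// Hypothetical chaos wrapper: wraps a handler and injects latency or an exception.
// `random` is injectable so the failure decision can be made deterministic in tests.
function withChaos(handler, { enabled = false, latencyMs = 0, errorRate = 0 } = {}, random = Math.random) {
  return async (event, context) => {
    if (enabled) {
      if (latencyMs > 0) {
        await new Promise((resolve) => setTimeout(resolve, latencyMs));
      }
      if (random() < errorRate) {
        throw new Error("Injected chaos failure");
      }
    }
    return handler(event, context);
  };
}

// The function code itself must change to opt in.
const handler = withChaos(async (event) => ({ statusCode: 200, body: JSON.stringify(event) }), {
  enabled: true,
  latencyMs: 50,
  errorRate: 0.1,
});
```

Every function that takes part in an experiment must import and apply a wrapper like this, and a Python function would need a different library with different options, which is the reusability problem described above.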

Injecting chaos using Lambda extensions

Implementing chaos experiments using Lambda extensions allows you to address all of the previous concerns. Lambda extensions augment your functions by adding functionality, such as capturing diagnostic information or automatically instrumenting your code. You can integrate your preferred monitoring, observability, or security tooling deeply into the Lambda environment without complex installation or configuration management. Lambda extensions are generally packaged as Lambda layers and run as a separate process in the Lambda execution environment. You can use extensions from AWS and AWS Lambda Partners, or build your own custom functionality.

With Lambda extensions, you can implement a chaos extension to inject the desired failures into your Lambda environments. This chaos extension uses the Runtime API proxy pattern that enables you to hook into the function invocation request and response lifecycle. Lambda runtimes use the Lambda Runtime API to retrieve the next incoming event to be processed by the function handler and return the handler response to the Lambda service.

The Runtime API HTTP endpoint is available within the Lambda execution environment. Runtimes get the API endpoint from the environment variable AWS_LAMBDA_RUNTIME_API. During the initialization of the execution environment, you can modify the runtime startup behavior. This lets you change the value of AWS_LAMBDA_RUNTIME_API to the port the chaos extension process is listening on. Now, all requests to the Runtime API go through the chaos extension proxy. You can use this workflow for blocking malicious events, auditing payloads, or injecting failures.


  1. The chaos extension intercepts incoming events and outbound responses, and injects failures according to the chaos experiment configuration.
  2. The extension accesses environment variables to read the chaos experiment configuration.
  3. A wrapper script configures the runtime to proxy requests through the chaos extension.

By intercepting incoming events and outbound responses to the Lambda Runtime API, you can simulate failures such as introducing an artificial delay or generating an error response to return to the Lambda service. For example, the extension can add latency to your function calls.
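The decision logic for an outbound handler response can be sketched as follows. The environment variable names (CHAOS_LATENCY_ENABLED and so on) are hypothetical; the open-source extensions mentioned below define their own keys. The error body uses the errorMessage and errorType fields that the Runtime API accepts for invocation errors.

```javascript
// Read the experiment configuration from (hypothetical) environment variables.
function readChaosConfig(env = process.env) {
  return {
    latencyEnabled: env.CHAOS_LATENCY_ENABLED === "true",
    latencyMs: Number.parseInt(env.CHAOS_LATENCY_MS ?? "0", 10) || 0,
    errorEnabled: env.CHAOS_ERROR_ENABLED === "true",
    errorProbability: Number.parseFloat(env.CHAOS_ERROR_PROBABILITY ?? "0") || 0,
  };
}

// Decide what the proxy forwards upstream for an intercepted handler response:
// how long to wait first, and which Runtime API path and body to send.
function planResponse(requestId, body, config, random = Math.random) {
  const delayMs = config.latencyEnabled ? config.latencyMs : 0;
  if (config.errorEnabled && random() < config.errorProbability) {
    return {
      delayMs,
      path: `/2018-06-01/runtime/invocation/${requestId}/error`,
      body: JSON.stringify({ errorMessage: "Injected by chaos extension", errorType: "ChaosError" }),
    };
  }
  return { delayMs, path: `/2018-06-01/runtime/invocation/${requestId}/response`, body };
}
```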


All Lambda runtimes support extensions. Since extensions run as a separate process, you can implement them in a language other than the function code. AWS recommends you implement extensions using a programming language that compiles to a binary executable, such as Golang or Rust. This allows you to use the extension with any Lambda runtime.

Some of the open source projects following this technique are the chaos-lambda-extension, implemented in Rust, or the serverless-chaos-extension, implemented in Python.

Extensions provide you with a flexible and reusable method to run your chaos experiments on Lambda functions. You can reuse the chaos extension for all runtimes without having to change function code. Add the extension to any Lambda function where you want to run chaos experiments.

Automating with AWS FIS experiment templates

According to the Principles of Chaos Engineering, you should “automate your experiments to run continuously”. To achieve this, you can use the AWS Fault Injection Service (FIS).

This service allows you to create reusable experiment templates. A template specifies the targets, the actions to run on them during the experiment, and an optional stop condition that prevents the experiment from going out of bounds. You can also run AWS Systems Manager Automation runbooks, which support custom fault types. You can write your own Systems Manager documents to define the individual steps involved in the automation. To carry out the actions of the experiment, you define scripts in the document that manage your Lambda function and set it up for the chaos experiment.

To use the chaos extension for your serverless chaos experiments:

  1. Set up the Lambda function for the experiment. Add the chaos extension as a layer and configure the experiment, for example, by adding environment variables specifying the fault type and its corresponding value.
  2. Pause the automation and conduct the experiment. To do this, use the aws:sleep automation action. During this period, you conduct the experiment, measure and observe the outcome.
  3. Clean up the experiment. The script removes the layer again and also resets the environment variables.
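The setup and cleanup steps can be sketched as pure functions that compute the configuration to apply. This is a hypothetical sketch, not the sample repository's scripts. It assumes the shapes of the Lambda GetFunctionConfiguration response (Layers as objects with an Arn) and the UpdateFunctionConfiguration input (Layers as a list of ARNs, Environment.Variables); the layer ARNs and the CHAOS_LATENCY_MS variable are placeholders.

```javascript
// Capture the pre-experiment state from a GetFunctionConfiguration-style response.
function snapshotConfig(current) {
  return {
    Layers: (current.Layers ?? []).map((layer) => layer.Arn),
    Variables: { ...(current.Environment?.Variables ?? {}) },
  };
}

// Setup step: append the chaos layer and add the experiment variables.
function planSetup(snapshot, chaosLayerArn, chaosVariables) {
  return {
    Layers: [...snapshot.Layers, chaosLayerArn],
    Environment: { Variables: { ...snapshot.Variables, ...chaosVariables } },
  };
}

// Cleanup step: restore exactly what was captured before the experiment.
function planCleanup(snapshot) {
  return { Layers: [...snapshot.Layers], Environment: { Variables: { ...snapshot.Variables } } };
}

const current = {
  Layers: [{ Arn: "arn:aws:lambda:<region>:<account-id>:layer:shared-utils:3" }],
  Environment: { Variables: { TABLE_NAME: "orders" } },
};
const snapshot = snapshotConfig(current);
const setup = planSetup(snapshot, "arn:aws:lambda:<region>:<account-id>:layer:chaos-extension:1", {
  CHAOS_LATENCY_MS: "1000",
});
const cleanup = planCleanup(snapshot);
```

Restoring from a snapshot rather than deleting the added variables also preserves any pre-existing variable that the experiment temporarily overwrote. Because UpdateFunctionConfiguration replaces the whole Layers list and environment, the snapshot has to be persisted between the setup and cleanup steps of the runbook.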

Running your first serverless chaos experiment

This sample repository provides you with the necessary code to run your first serverless chaos experiment in AWS. The experiment uses the chaos-lambda-extension extension to inject chaos.

The sample deploys the AWS FIS experiment template and the necessary SSM Automation runbooks, including the IAM role that the runbooks use to configure the Lambda functions. The sample also provisions a Lambda function for testing and an Amazon CloudWatch alarm used to roll back the experiment.


Running the experiment

Follow the steps outlined in the repository to conduct your first experiment. Starting the experiment triggers the automation execution.

Actions summary

This automation adds the extension and configures the experiment, pauses the execution so that you can observe the system, and then reverts all changes to the initial state.

Executed steps

If you invoke the targeted Lambda function during the second step, failures (in this case, artificial latency) are simulated.

Output result

Security best practices

Extensions run within the same execution environment as the function, so they have the same level of access to resources such as file system, networking, and environment variables. IAM permissions assigned to the function are shared with extensions. AWS recommends you assign the least required privileges to your functions.

Always install extensions from a trusted source only. Use Infrastructure as Code (IaC) and automation tools, such as CloudFormation or AWS Systems Manager, to simplify attaching the same extension configuration, including AWS Identity and Access Management (IAM) permissions, to multiple functions. IaC and automation tools allow you to have an audit record of extensions and versions used previously.

When building extensions, do not log sensitive data. Sanitize payloads and metadata before logging or persisting them for audit purposes.


This blog post details how to run chaos experiments on serverless applications built with Lambda. The described approach uses a Lambda extension to inject faults into the execution environment. This allows you to use the same method regardless of the runtime or configuration of the Lambda function.

To automate and successfully conduct the experiment, you can use the AWS Fault Injection Service. By creating an experiment template, you can specify the actions to run on the defined targets, such as adding the extension during the experiment. Since the extension can be used for any runtime, you can reuse the experiment template to inject failures into different Lambda functions.

Visit this repository to deploy your first serverless chaos experiment, or watch this video guide for learning more about building extensions. Explore the AWS FIS documentation to learn how to create your own experiments.

For more serverless learning resources, visit Serverless Land.

Building a Serverless Streaming Pipeline to Deliver Reliable Messaging

Post Syndicated from Chris McPeek original https://aws.amazon.com/blogs/compute/building-a-serverless-streaming-pipeline-to-deliver-reliable-messaging/

This post is written by Jeff Harman, Senior Prototyping Architect, Vaibhav Shah, Senior Solutions Architect and Erik Olsen, Senior Technical Account Manager.

Many industries are required to provide audit trails for decision and transactional systems. AI-assisted decision making requires monitoring the full inputs to the decision system in near real time to prevent fraud and to detect model drift and discrimination. Modern systems often use a much wider array of inputs for decision making, including images, unstructured text, historical values, and other large data elements. These large data elements pose a challenge to traditional audit systems that deal with relatively small text messages in structured formats. This blog shows the use of serverless technology to create a reliable, performant, traceable, and durable streaming pipeline for audit processing.


Consider the following four requirements to develop an architecture for audit record ingestion:

  1. Audit record size: Store and manage large payloads (256 KB to 6 MB in size) that may be heterogeneous, including text, binary data, and references to other storage systems.
  2. Audit traceability: The stored data provides full traceability of the payload, and external processes can monitor progress through subscription-based events.
  3. High Performance: The time required for blocking writes to the system is limited to the time it takes to transmit the audit record over the network.
  4. High data durability: Once the system sends a payload receipt, the payload is at very low risk of loss because of system failures.

The following diagram shows an architecture that meets these requirements and models the flow of the audit record through the system.

The primary source of latency is the time it takes for an audit record to be transmitted across the network. Applications sending audit records make an API call to an Amazon API Gateway endpoint. An AWS Lambda function receives the message, and an Amazon ElastiCache for Redis cluster provides a low-latency initial storage mechanism for the audit record. Once the data is stored in ElastiCache, the AWS Step Functions workflow orchestrates the communication and persistence functions.

Subscribers receive four Amazon Simple Notification Service (Amazon SNS) notifications pertaining to arrival and storage of the audit record payload, storage of the audit record metadata, and audit record archive completion. Users can subscribe an Amazon Simple Queue Service (SQS) queue to the SNS topic and use fan out mechanisms to achieve high reliability.

  1. The Ingest Message Lambda function sends an initial receipt notification.
  2. The Message Archive Handler Lambda function notifies on storage of the audit record from ElastiCache to Amazon Simple Storage Service (Amazon S3).
  3. The Message Metadata Handler Lambda function notifies on storage of the message metadata into Amazon DynamoDB.
  4. The Final State Aggregation Lambda function notifies that the audit record has been archived.

A failure in any of the three fundamental processing steps (Ingestion, Data Archive, and Metadata Archive) sends a message to an SQS dead-letter queue (DLQ) that contains the original request and an explanation of the failure reason. Any failure in the Ingest Message function invokes the Ingest Message Failure function, which stores the original parameters in the S3 Failed Message Storage bucket for later analysis.

The Step Functions workflow provides orchestration and parallel path execution for the system. The detailed workflow below shows the execution flow and notification actions. The transformer steps convert the internal data structures into the format required for consumers.

Data structures

There are three types of events and messages managed by this system:

  1. Incoming message: This is the message the producer sends to an API Gateway endpoint.
  2. Internal message: This event contains the message metadata allowing subsequent systems to understand the originating message producer context.
  3. Notification message: Messages that allow downstream subscribers to act based on the message.

Solution walkthrough

The message producer calls the API Gateway endpoint, which enforces the security requirements defined by the business. In this implementation, API Gateway requires an API key. API Gateway also creates a security header for consumption by the Ingest Message Lambda function. You can also configure API Gateway to enforce message format standards; for more information, see Use request validation in API Gateway.

The Ingest Message Lambda function generates a message ID that tracks the message payload throughout its lifecycle. Then it stores the full message in the ElastiCache for Redis cache. The Ingest Message Lambda function generates an internal message with all the elements necessary as described above. Finally, the Lambda function handler code starts the Step Functions workflow with the internal message payload.
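A minimal sketch of this ingest flow follows. The cache, workflow starter, and notifier are injected so the logic can run without AWS; in the sample they correspond to ElastiCache for Redis, Step Functions, and SNS. The incoming message shape ({ producer, body }), the cache key scheme, and the TTL value are assumptions, not the repository's schema.

```javascript
const { randomUUID } = require("node:crypto");

// Hypothetical Ingest Message handler with injected dependencies.
async function ingestMessage(incoming, { cache, startWorkflow, notify, ttlSeconds = 3600 }) {
  const messageId = randomUUID();
  const cacheKey = `message:${messageId}`; // hypothetical key scheme

  // 1. Store the full payload in the low-latency cache; the TTL lets the cache expire it later.
  await cache.set(cacheKey, JSON.stringify(incoming.body), ttlSeconds);

  // 2. Build the internal message: metadata only, never the payload itself.
  const internal = {
    messageId,
    cacheKey,
    producer: incoming.producer,
    receivedAt: new Date().toISOString(),
  };

  // 3. Start the workflow and send the initial receipt notification.
  await startWorkflow(internal);
  await notify({ messageId, stage: "RECEIVED" });

  return { statusCode: 200, body: JSON.stringify({ messageID: messageId }) };
}
```

Keeping the payload out of the internal message keeps the workflow input small regardless of the audit record size.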

If the Ingest Message Lambda function fails for any reason, the Lambda function invokes the Ingestion Failure Handler Lambda function. This Lambda function writes any recoverable incoming message data to an S3 bucket and sends a notification on the Ingest Message dead letter queue.

The Step Functions workflow then runs three processes in parallel.

  • The Step Functions workflow triggers the Message Archive Data Handler Lambda function to persist message data from the ElastiCache cache to an S3 bucket. Once the data is stored, the Lambda function returns the S3 bucket reference and state information. There are two options for removing the message from the cache: remove it immediately, before sending the internal message and updating the ElastiCache cache flag, or wait for the ElastiCache lifecycle to remove the stale message. This solution waits for the ElastiCache lifecycle to remove the message.
  • The workflow triggers the Message Metadata Handler Lambda function to write all message metadata and security information to DynamoDB. The Lambda function replies with the DynamoDB reference information.
  • Finally, the Step Functions workflow sends a message to the SNS topic to inform subscribers that the message has arrived and the data persistence processes have started.

After each Lambda function completes its process, it sends a notification to the SNS notification topic to alert subscribers that the action is complete. When both the Message Metadata and Message Archive Lambda functions are done, the Final Aggregation function makes a final update to the metadata in DynamoDB to include the S3 reference information and to remove the ElastiCache Redis reference.
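The fan-out and final aggregation can be sketched with Promise.all standing in for the workflow's parallel branches. All handler names and notification stage labels here are hypothetical. Like the parallel branches in a workflow, the sketch fails as a whole if either persistence branch rejects; routing failures to the dead-letter queue is omitted.

```javascript
// Hypothetical sketch of the three parallel branches plus the final aggregation step.
async function processInParallel(internal, { archiveToS3, writeMetadata, notify, updateMetadata }) {
  const [archive, metadata] = await Promise.all([
    archiveToS3(internal).then(async (ref) => {
      await notify({ messageId: internal.messageId, stage: "ARCHIVED_TO_S3" });
      return ref;
    }),
    writeMetadata(internal).then(async (ref) => {
      await notify({ messageId: internal.messageId, stage: "METADATA_STORED" });
      return ref;
    }),
    notify({ messageId: internal.messageId, stage: "PERSISTENCE_STARTED" }),
  ]);

  // Runs only after both persistence branches succeed: point the metadata
  // at the S3 object and drop the cache reference.
  await updateMetadata(metadata.key, { s3Location: archive.location, cacheKey: null });
  await notify({ messageId: internal.messageId, stage: "ARCHIVE_COMPLETE" });
  return { archive, metadata };
}
```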

Deploying the solution


  1. AWS Serverless Application Model (AWS SAM) is installed (see Getting started with AWS SAM)
  2. AWS User/Credentials with appropriate permissions to run AWS CloudFormation templates in the target AWS account
  3. Python 3.8 – 3.10
  4. The AWS SDK for Python (Boto3) is installed
  5. The requests Python library is installed

The source code for this implementation can be found at https://github.com/aws-samples/blog-serverless-reliable-messaging

Installing the Solution:

  1. Clone the git repository to a local directory:
     git clone https://github.com/aws-samples/blog-serverless-reliable-messaging.git
  2. Change into the directory that was created by the clone operation, usually blog-serverless-reliable-messaging
  3. Execute the command: sam build
  4. Execute the command: sam deploy --guided. You are asked to supply the following parameters:
    1. Stack Name: Name given to this deployment (example: serverless-streaming)
    2. AWS Region: Where to deploy (example: us-east-1)
    3. ElasticacheInstanceClass: ElastiCache node type to use (example: cache.t3.small)
    4. ElasticReplicaCount: How many replicas should be used with ElastiCache (recommended minimum: 2)
    5. ProjectName: Used for naming resources in account (example: serverless-streaming)
    6. MultiAZ: True/False if multiple Availability Zones should be used (recommended: True)
    7. The default parameters can be selected for the remainder of questions


Once you have deployed the stack, you can test it through the API Gateway endpoint with the API key that is referenced in the deployment output. There are two methods for retrieving the API key: through the AWS console (from the link provided in the output, ApiKeyConsole) or through the AWS CLI (from the AWS CLI reference in the output, APIKeyCLI).

You can test directly in the Lambda service console by invoking the ingest message function.

A test message is available at the root of the project test_message.json for direct Lambda function testing of the Ingest function.

  1. In the console navigate to the Lambda service
  2. From the list of available functions, select the “<project name> -IngestMessageFunction-xxxxx” function
  3. Under the “Function overview” select the “Test” tab
  4. Enter an event name of your choosing
  5. Copy and paste the contents of test_message.json into the “Event JSON” box
  6. Click “Save”, then after it has saved, click “Test”
  7. If successful, you should see something similar to the below in the details:
    {
      "isBase64Encoded": false,
      "statusCode": 200,
      "headers": {
        "Access-Control-Allow-Headers": "Content-Type",
        "Access-Control-Allow-Origin": "*",
        "Access-Control-Allow-Methods": "OPTIONS,POST"
      },
      "body": "{\"messageID\": \"XXXXXXXXXXXXXX\"}"
    }
  8. In the S3 bucket “<project name>-s3messagearchive-xxxxxx“, find the payload of the original json with a key based on the date and time of the script execution, e.g.: YEAR/MONTH/DAY/HOUR/MINUTE with a file name of the messageID
  9. In a DynamoDB table named metaDataTable, you should find a record with a messageID equal to the messageID from above that contains all of the metadata related to the payload
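To locate the archived object, you can compute the expected key from the message ID and time. This sketch assumes UTC timestamps and zero-padded fields; the sample may format the key differently, so compare the result against an actual object in the bucket.

```javascript
// Build an S3 key of the form YEAR/MONTH/DAY/HOUR/MINUTE/<messageID> (UTC, zero-padded).
function archiveKey(messageId, date = new Date()) {
  const pad = (n) => String(n).padStart(2, "0");
  return [
    date.getUTCFullYear(),
    pad(date.getUTCMonth() + 1), // getUTCMonth is zero-based
    pad(date.getUTCDate()),
    pad(date.getUTCHours()),
    pad(date.getUTCMinutes()),
    messageId,
  ].join("/");
}

console.log(archiveKey("abc123", new Date(Date.UTC(2024, 0, 5, 9, 7)))); // 2024/01/05/09/07/abc123
```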

A python script is included with the code in the test_client folder

  1. Replace the <Your API key key here> and the <Your API Gateway URL here (IngestMessageApi)> values with the correct ones for your environment in the test_client.py file
  2. Execute the test script with Python 3.8 or higher with the requests package installed
    Example execution (from main directory of git clone):
    python3 -m pip install -r ./test_client/requirements.txt
    python3 ./test_client/test_client.py
  3. Successful output shows the messageID and the header JSON payload:
    "messageID": " XXXXXXXXXXXXXX"
  4. In the S3 bucket “<project name>-s3messagearchive-xxxxxx“, you should be able to find the payload of the original json with a key based on the date and time of the script execution, e.g.: YEAR/MONTH/DAY/HOUR/MINUTE with a file name of the messageID
  5. In a DynamoDB table named metaDataTable, you should find a record with a messageID equal to the messageID from above that contains all of the metadata related to the payload


This blog describes architectural patterns, messaging patterns, and data structures that support a highly reliable messaging system for large messages. The use of serverless services, including Lambda, Step Functions, ElastiCache, DynamoDB, and S3, meets the requirements of modern audit systems to be scalable and reliable. The architecture shared in this blog post is suitable for highly regulated environments that need to store and track messages larger than those handled by typical logging systems, with records sized between 256 KB and 6 MB. The architecture serves as a blueprint that can be extended and adapted to fit further serverless use cases.

For serverless learning resources, visit Serverless Land.

Comparing design approaches for building serverless microservices

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/comparing-design-approaches-for-building-serverless-microservices/

This post is written by Luca Mezzalira, Principal SA, and Matt Diamond, Principal, SA.

Designing a workload with AWS Lambda raises questions for developers, because modularity can be expressed either at the code level or at the infrastructure level. Using serverless to run code requires additional planning to separate the business logic from the underlying functional components. This deliberate separation of concerns ensures robust modularity, paving the way for evolutionary architectures.

This post focuses on synchronous workloads, but similar considerations are applicable in other workload types. After identifying the bounded context of your API and agreeing on API contracts with consumers, it’s time to structure the architecture of your bounded context and the associated infrastructure.

The two most common ways to structure an API using Lambda functions are single responsibility and Lambda-lith. However, this blog post explores an alternative to these approaches, which can provide the best of both.

Single responsibility Lambda functions

Single responsibility Lambda functions are designed to run a specific task or handle a particular event-triggered operation within a serverless architecture:


This approach provides a strong separation of concerns between business logic and capabilities. You can test specific capabilities in isolation, deploy a Lambda function independently, reduce the surface for introducing bugs, and debug issues more easily in Amazon CloudWatch.

Additionally, single-purpose functions enable efficient resource allocation: Lambda automatically scales based on demand, optimizing resource consumption and minimizing costs, and you can set the memory size, architecture, and any other available configuration per function. Moreover, requesting a concurrency increase through a support ticket becomes easier: instead of aggregating all traffic into a single Lambda function that handles every request, you can request a specific increase based on the traffic of a single task.

Another advantage is rapid execution time. Because a single-purpose Lambda function contains the business logic for only one task, you can optimize its size more easily, without the additional libraries required in other approaches. The smaller bundle size helps reduce cold start time.

Despite these benefits, some issues exist when solely relying on single-purpose Lambda functions. While the cold start time is mitigated, you might experience a higher number of cold starts, particularly for functions with sporadic or infrequent invocations. For example, a function that deletes users in an Amazon DynamoDB table likely won’t be triggered as often as one that reads user data. Also, relying heavily on single-purpose Lambda functions can lead to increased system complexity, especially as the number of functions grows.

A good separation of concerns helps maintain your code base, at the cost of a lack of cohesion. In functions with similar tasks, such as write operations of an API (POST, PUT, DELETE), you might duplicate code and behaviors across multiple functions. Moreover, updating common libraries shared via Lambda Layers, or other dependency management systems, requires multiple changes across every function instead of an atomic change on a single file. This is also true for any other change across multiple functions, for instance, updating the runtime version.

Lambda-lith: Using one single Lambda function

When many workloads use single-purpose Lambda functions, developers end up with a proliferation of Lambda functions across an AWS account. One of the main challenges developers face is updating common dependencies or function configurations. Unless a clear governance strategy addresses this problem (such as using Dependabot to enforce dependency updates, or parameters that are retrieved at provisioning time), developers may opt for a different strategy.

As a result, many development teams move in the opposite direction, aggregating all code related to an API inside the same Lambda function.

This approach is often referred to as a Lambda-lith, because it gathers all the HTTP verbs that compose an API and sometimes multiple APIs in the same function.

This allows you to have higher code cohesion and colocation across the different parts of the application. Modularity in this case is expressed at the code level, where patterns like single responsibility, dependency injection, and façade are applied to structure your code. The discipline and coding best practices applied by development teams are crucial for maintaining large code bases.
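To make the code-level modularity concrete, here is a minimal Python sketch of a Lambda-lith for a hypothetical user API: one handler receives every HTTP verb and dispatches it through a routing table to small, single-purpose functions. It assumes an Amazon API Gateway REST API proxy integration, which puts the verb in httpMethod; the in-memory USERS dictionary stands in for a real database, and all function names are illustrative.

```python
import json

# Hypothetical in-memory store standing in for a real database such as DynamoDB.
USERS: dict = {}


def _response(status: int, body=None) -> dict:
    """Build an API Gateway proxy-style response; a None body becomes an empty string."""
    return {"statusCode": status, "body": "" if body is None else json.dumps(body)}


def get_user(event: dict) -> dict:
    user_id = (event.get("pathParameters") or {}).get("id")
    if user_id is None:
        return _response(200, list(USERS.values()))  # no id: list all users
    user = USERS.get(user_id)
    return _response(200, user) if user else _response(404, {"error": "user not found"})


def create_user(event: dict) -> dict:
    user = json.loads(event.get("body") or "{}")
    if "id" not in user:
        return _response(400, {"error": "id is required"})
    USERS[user["id"]] = user
    return _response(201, user)


def delete_user(event: dict) -> dict:
    user_id = (event.get("pathParameters") or {}).get("id")
    USERS.pop(user_id, None)
    return _response(204)


# One entry per HTTP verb: the single function stays modular through plain code structure.
ROUTES = {"GET": get_user, "POST": create_user, "DELETE": delete_user}


def handler(event: dict, context=None) -> dict:
    route = ROUTES.get(event.get("httpMethod", ""))
    if route is None:
        return _response(405, {"error": "method not allowed"})
    return route(event)
```

Adding a verb means adding one entry to ROUTES, while shared helpers such as _response are written once for the whole API.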

Because there are fewer Lambda functions, updating a configuration or implementing a new standard across multiple APIs is easier than with the single responsibility approach.

Moreover, because every HTTP verb invokes the same Lambda function, little-used parts of your code are more likely to find an execution environment already available to fulfill the request, which gives them a better response time.

Another factor to consider is function size, which grows when you colocate all the verbs of an API in the same function along with their dependencies and business logic. This may affect the cold start of Lambda functions with spiky workloads. Customers should evaluate the benefits of this approach, especially when applications have restrictive SLAs that cold starts would impact. Developers can mitigate this problem by paying attention to the dependencies used and implementing techniques like tree-shaking, minification, and dead code elimination, where the programming language allows.

This coarse-grained approach doesn't allow you to tune function configurations individually. Instead, you must find one configuration that matches all the code capabilities, possibly with a higher memory size and looser security permissions that might clash with the requirements defined by the security team.

Read and write functions

These two approaches both have trade-offs, but there is a third option that can combine their benefits.

Often, API traffic leans towards more reads or writes and that forces developers to optimize code and configurations more on one side over the other.

For example, consider building a user API that allows consumers to create, update, and delete a user but also to find a user or a list of users. In this scenario, you can change one user at a time with no bulk operations available, but you can get one or more users per API request. Dividing the design of the API into read and write operations results in this architecture:

The cohesion of code for write operations (create, update, and delete) is beneficial for many reasons. For instance, you may need to validate the request body, ensuring it contains all the mandatory parameters. If the workload is heavy on writes, the less-used operations (for instance, Delete) benefit from warm execution environments. The code colocation enables reusability of code on similar actions, reducing the cognitive load to structure your projects with shared libraries or Lambda layers, for instance.
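A minimal Python sketch of the write function for the same hypothetical user API, again assuming an API Gateway REST proxy event. POST, PUT, and DELETE share one validation routine, which is the code reuse described above; the mandatory field names are assumptions for illustration, and persistence is omitted.

```python
import json

# Hypothetical mandatory fields for each write operation of the user API.
REQUIRED_FIELDS = {
    "POST": ("id", "name", "email"),  # create
    "PUT": ("id", "name", "email"),   # update
    "DELETE": ("id",),                # delete
}


def _response(status: int, body: dict) -> dict:
    return {"statusCode": status, "body": json.dumps(body)}


def missing_fields(method: str, body: dict) -> list:
    """Shared validation used by every write operation."""
    return [name for name in REQUIRED_FIELDS[method] if not body.get(name)]


def write_handler(event: dict, context=None) -> dict:
    method = event.get("httpMethod", "")
    if method not in REQUIRED_FIELDS:
        return _response(405, {"error": f"{method} is not a write operation"})
    try:
        body = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return _response(400, {"error": "body is not valid JSON"})
    if not isinstance(body, dict):
        return _response(400, {"error": "body must be a JSON object"})
    missing = missing_fields(method, body)
    if missing:
        return _response(400, {"missing": missing})
    # Persisting the change (for example, to DynamoDB) is left out of this sketch.
    return _response(200, {"operation": method, "id": body["id"]})
```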

On the read side, you can reduce the code bundled with the function for a faster cold start, and optimize its performance heavily compared to the write operations. You can also store partial or full query results in the memory of an execution environment to improve the execution time of a Lambda function.

This approach also helps you because of its evolutionary nature. Imagine the platform becomes much more popular and you must optimize the API even further by improving reads with a cache-aside pattern using Amazon ElastiCache for Redis. Moreover, you decide to optimize the read queries with a second database, optimized for reads, that serves requests when the cache misses.
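Here is a minimal cache-aside sketch in Python under stated assumptions: FakeCache is an in-memory stand-in for an ElastiCache for Redis client (its get, set with ex, and delete methods mirror the subset of redis-py used here), and a plain dictionary plays the read-optimized database. On a miss, the repository reads the database and populates the cache; after a write, the caller invalidates the key.

```python
import json
from typing import Optional, Protocol


class Cache(Protocol):
    """The subset of a Redis client used here (redis-py's Redis offers get, set(..., ex=...), delete)."""

    def get(self, key: str) -> Optional[str]: ...
    def set(self, key: str, value: str, ex: Optional[int] = None) -> object: ...
    def delete(self, key: str) -> object: ...


class FakeCache:
    """In-memory stand-in for ElastiCache so the sketch runs locally; it ignores expiry."""

    def __init__(self) -> None:
        self.data: dict = {}

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)

    def set(self, key: str, value: str, ex: Optional[int] = None) -> bool:
        self.data[key] = value
        return True

    def delete(self, key: str) -> int:
        return 1 if self.data.pop(key, None) is not None else 0


class UserRepository:
    def __init__(self, cache: Cache, read_db: dict, ttl_seconds: int = 300) -> None:
        self.cache = cache
        self.read_db = read_db  # stands in for the read-optimized second database
        self.ttl_seconds = ttl_seconds

    def get_user(self, user_id: str) -> Optional[dict]:
        key = f"user:{user_id}"
        cached = self.cache.get(key)  # with redis-py, decode_responses=True returns str, not bytes
        if cached is not None:
            return json.loads(cached)  # cache hit
        user = self.read_db.get(user_id)  # cache miss: fall back to the database
        if user is not None:
            self.cache.set(key, json.dumps(user), ex=self.ttl_seconds)
        return user

    def invalidate(self, user_id: str) -> None:
        """Call after a write so the next read repopulates the entry from the database."""
        self.cache.delete(f"user:{user_id}")
```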

On the write side, you have agreed with the API consumers that receiving and acknowledging user creation or deletion is adequate, considering they fully embraced the eventual consistency nature of distributed systems.

Now, you can improve the response time of write operations by adding an SQS queue before the Lambda function. You can update the write database in batches to reduce the number of invocations needed for handling write operations, instead of dealing with every request individually.
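A sketch of the batched write function in Python, assuming an Amazon SQS event source mapping with ReportBatchItemFailures enabled, so the function can return the IDs of only the messages that failed. write_batch is a hypothetical stand-in for a batched database write; the chunk size of 25 matches the per-request item limit of DynamoDB BatchWriteItem.

```python
import json

MAX_BATCH = 25  # DynamoDB BatchWriteItem accepts at most 25 items per request

WRITTEN: list = []  # batches handed to write_batch, recorded for illustration


def write_batch(items: list) -> None:
    """Hypothetical stand-in for one batched write to the write database."""
    WRITTEN.append(items)


def handler(event: dict, context=None) -> dict:
    failures = []
    parsed = []
    for record in event.get("Records", []):
        try:
            parsed.append((record["messageId"], json.loads(record["body"])))
        except json.JSONDecodeError:
            # A malformed message is reported on its own instead of failing the whole batch.
            failures.append({"itemIdentifier": record["messageId"]})
    for start in range(0, len(parsed), MAX_BATCH):
        chunk = parsed[start:start + MAX_BATCH]
        try:
            write_batch([item for _, item in chunk])
        except Exception:
            # Every message in a failed chunk goes back to the queue for a retry.
            failures.extend({"itemIdentifier": message_id} for message_id, _ in chunk)
    # Partial batch response: with ReportBatchItemFailures, only these messages are retried.
    return {"batchItemFailures": failures}
```

One invocation now handles up to a full SQS batch, so 30 queued writes cost one function invocation and two database requests rather than 30 of each.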

CQRS pattern

Command query responsibility segregation (CQRS) is a well-established pattern that separates the data mutation, or the command part of a system, from the query part. You can use the CQRS pattern to separate updates and queries if they have different requirements for throughput, latency, or consistency.

While it's not mandatory to start with a full CQRS pattern, the initial read and write implementation makes it easier to evolve toward CQRS later, without massive refactoring of your API.
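To show the separation in miniature, here is a toy, single-process Python sketch of CQRS for the user API. Commands update a write model and append an event; a projection builds a differently shaped read model (users grouped by team, a hypothetical query need); queries read only that view. In a real system the pending deque would be a stream or queue and the projection would run asynchronously, which is where eventual consistency comes from.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class UserService:
    """Toy CQRS split: commands change the write model, queries read a separate read model."""

    write_model: dict = field(default_factory=dict)  # source of truth, keyed by user id
    read_model: dict = field(default_factory=dict)   # denormalized view: team -> {user id: name}
    pending: deque = field(default_factory=deque)    # stands in for a stream or queue

    # Command side
    def create_user(self, user_id: str, name: str, team: str) -> None:
        user = {"name": name, "team": team}
        self.write_model[user_id] = user
        self.pending.append(("UserCreated", user_id, user))

    def delete_user(self, user_id: str) -> None:
        user = self.write_model.pop(user_id, None)
        if user is not None:
            self.pending.append(("UserDeleted", user_id, user))

    # Projection: runs asynchronously in a real system
    def project(self) -> None:
        while self.pending:
            kind, user_id, user = self.pending.popleft()
            team = self.read_model.setdefault(user["team"], {})
            if kind == "UserCreated":
                team[user_id] = user["name"]
            else:
                team.pop(user_id, None)

    # Query side
    def users_in_team(self, team: str) -> list:
        return sorted(self.read_model.get(team, {}).values())
```

Between create_user and project, users_in_team does not yet see the new user; that window is the eventual consistency listed among the trade-offs.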

Comparison of the three approaches

Here is a comparison of the three approaches:


Single responsibility
  • Pros: strong separation of concerns, granular configuration, easier debugging, rapid execution time
  • Cons: code duplication, complex maintenance, more cold start invocations

Lambda-lith
  • Pros: fewer cold start invocations, higher code cohesion, simpler maintenance
  • Cons: coarse-grained configuration, higher cold start time

Read and write
  • Pros: code cohesion where needed, evolutionary architecture, optimization of read and write operations
  • Cons: using CQRS requires two data models, and CQRS adds eventual consistency to your system


Developers often move from single responsibility functions to the Lambda-lith as their architectures evolve, but both approaches have relative trade-offs. This post shows how it’s possible to have the best of both approaches by dividing your workloads per read and write operations.

All three approaches are viable for designing serverless APIs, and understanding what you are optimizing for is the key to making the best decision. Understanding your context and business requirements leads you to the trade-offs that are acceptable for a specific workload. Keep an open mind and find the solution that solves the problem and balances security, developer experience, cost, and maintainability.

For more serverless learning resources, visit Serverless Land.

Introducing the .NET 8 runtime for AWS Lambda

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/introducing-the-net-8-runtime-for-aws-lambda/

This post is written by Beau Gosse, Senior Software Engineer and Paras Jain, Senior Technical Account Manager.

AWS Lambda now supports .NET 8 as both a managed runtime and container base image. With this release, Lambda developers can benefit from .NET 8 features including API enhancements, improved Native Ahead of Time (Native AOT) support, and improved performance. .NET 8 supports C# 12, F# 8, and PowerShell 7.4. You can develop Lambda functions in .NET 8 using the AWS Toolkit for Visual Studio, the AWS Extensions for .NET CLI, AWS Serverless Application Model (AWS SAM), AWS CDK, and other infrastructure as code tools.

Creating .NET 8 function in the console

What’s new

Upgraded operating system

The .NET 8 runtime is built on the Amazon Linux 2023 (AL2023) minimal container image. This provides a smaller deployment footprint than earlier Amazon Linux 2 (AL2) based runtimes and updated versions of common libraries such as glibc 2.34 and OpenSSL 3.

The new image also uses microdnf as a package manager, symlinked as dnf. This replaces the yum package manager used in earlier AL2-based images. If you deploy your Lambda functions as container images, you must update your Dockerfiles to use dnf instead of yum when upgrading to the .NET 8 base image. For more information, see Introducing the Amazon Linux 2023 runtime for AWS Lambda.


There are a number of language performance improvements available as part of .NET 8. Initialization time can impact performance, as Lambda creates new execution environments to scale your function automatically. There are a number of ways to optimize performance for Lambda-based .NET workloads, including using source generators in System.Text.Json or using Native AOT.

Lambda has increased the default memory size from 256 MB to 512 MB in the blueprints and templates for improved performance with .NET 8. Perform your own functional and performance tests on your .NET 8 applications. You can use AWS Compute Optimizer or AWS Lambda Power Tuning for performance profiling.

At launch, new Lambda runtimes receive less usage than existing established runtimes. This can result in longer cold start times due to reduced cache residency within internal Lambda subsystems. Cold start times typically improve in the weeks following launch as usage increases. As a result, AWS recommends not drawing performance comparison conclusions with other Lambda runtimes until the performance has stabilized.

Native AOT

Lambda introduced .NET Native AOT support in November 2022. Benchmarks show up to 86% improvement in cold start times by eliminating the JIT compilation. Deploying .NET 8 Native AOT functions using the managed dotnet8 runtime rather than the OS-only provided.al2023 runtime gives your function access to .NET system libraries. For example, libicu, which is used for globalization, is not included by default in the provided.al2023 runtime but is in the dotnet8 runtime.

While Native AOT is not suitable for all .NET functions, .NET 8 has improved trimming support. This allows you to more easily run ASP.NET APIs. Improved trimming support helps eliminate build time trimming warnings, which highlight possible runtime errors. This can give you confidence that your Native AOT function behaves like a JIT-compiled function. Trimming support has been added to the Lambda runtime libraries, AWS .NET SDK, .NET Lambda Annotations, and .NET 8 itself.

Using .NET 8 with Lambda

To use .NET 8 with Lambda, you must update your tools.

  1. Install or update the .NET 8 SDK.
  2. If you are using AWS SAM, install or update to the latest version.
  3. If you are using Visual Studio, install or update the AWS Toolkit for Visual Studio.
  4. If you use the .NET Lambda Global Tools extension (Amazon.Lambda.Tools), install the CLI extension and templates. You can upgrade existing tools with dotnet tool update -g Amazon.Lambda.Tools and existing templates with dotnet new install Amazon.Lambda.Templates.

You can also use .NET 8 with Powertools for AWS Lambda (.NET), a developer toolkit to implement serverless best practices such as observability, batch processing, retrieving parameters, idempotency, and feature flags.

Building new .NET 8 functions

Using AWS SAM

  1. Run sam init.
  2. Choose 1- AWS Quick Start Templates.
  3. Choose one of the available templates such as Hello World Example.
  4. Select N for Use the most popular runtime and package type?
  5. Select dotnet8 as the runtime. The dotnet8 Hello World Example also includes a Native AOT template option.
  6. Follow the rest of the prompts to create the .NET 8 function.

AWS SAM .NET 8 init options

You can amend the generated function code and use sam deploy --guided to deploy the function.

Using AWS Toolkit for Visual Studio

  1. From the Create a new project wizard, filter the templates to either the Lambda or Serverless project type and select a template. Use Lambda for deploying a single function. Use Serverless for deploying a collection of functions using AWS CloudFormation.
  2. Continue with the steps to finish creating your project.
  3. You can amend the generated function code.
  4. To deploy, right-click the project in the Solution Explorer and select Publish to AWS Lambda.

Using AWS extensions for the .NET CLI

  1. Run dotnet new list --tag Lambda to get a list of available Lambda templates.
  2. Choose a template and run dotnet new <template name>. To build a function using Native AOT, use dotnet new lambda.NativeAOT or dotnet new serverless.NativeAOT when using the .NET Lambda Annotations Framework.
  3. Locate the generated Lambda function in the directory under src which contains the .csproj file. You can amend the generated function code.
  4. To deploy, run dotnet lambda deploy-function and follow the prompts.
  5. You can test the function in the cloud using dotnet lambda invoke-function or by using the test functionality in the Lambda console.

You can build and deploy .NET Lambda functions using container images. Follow the instructions in the documentation.

Migrating from .NET 6 to .NET 8 without Native AOT

Using AWS SAM

  1. Open the template.yaml file.
  2. Update Runtime to dotnet8.
  3. Open a terminal window and rebuild the code using sam build.
  4. Run sam deploy to deploy the changes.

Using AWS Toolkit for Visual Studio

  1. Open the .csproj project file and update the TargetFramework to net8.0. Update NuGet packages for your Lambda functions to the latest version to pull in .NET 8 updates.
  2. Verify that the build command you are using is targeting the .NET 8 runtime.
  3. There may be additional steps depending on what build/deploy tool you’re using. Updating the function runtime may be sufficient.

.NET function in AWS Toolkit for Visual Studio

Using AWS extensions for the .NET CLI or AWS Toolkit for Visual Studio

  1. Open the aws-lambda-tools-defaults.json file if it exists.
    1. Set the framework field to net8.0. If unspecified, the value is inferred from the project file.
    2. Set the function-runtime field to dotnet8.
  2. Open the serverless.template file if it exists. For any AWS::Lambda::Function or AWS::Serverless::Function resources, set the Runtime property to dotnet8.
  3. Open the .csproj project file if it exists and update the TargetFramework to net8.0. Update NuGet packages for your Lambda functions to the latest version to pull in .NET 8 updates.
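If you have many functions to migrate, the aws-lambda-tools-defaults.json steps can be scripted. The following Python sketch is not part of the AWS tooling; it assumes the file is plain JSON without comments, sets framework and function-runtime, and leaves every other setting untouched. The function name is hypothetical.

```python
import json
from pathlib import Path


def migrate_tools_defaults(path: Path) -> dict:
    """Point an aws-lambda-tools-defaults.json file at .NET 8 and return the updated settings.

    Assumes the file is plain JSON (no comments); all other settings are preserved.
    """
    settings = json.loads(path.read_text(encoding="utf-8"))
    settings["framework"] = "net8.0"          # target framework moniker for .NET 8
    settings["function-runtime"] = "dotnet8"  # Lambda managed runtime identifier
    path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings
```

You still need to update the .csproj TargetFramework and NuGet packages separately.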

Migrating from .NET 6 to .NET 8 Native AOT

The following example migrates a .NET 6 class library function to a .NET 8 Native AOT executable function. This uses the optional Lambda Annotations framework which provides idiomatic .NET coding patterns.

Update your project file

  1. Open the project file.
  2. Set TargetFramework to net8.0.
  3. Set OutputType to exe.
  4. Remove PublishReadyToRun if it exists.
  5. Add PublishAot and set to true.
  6. Add or update NuGet package references to Amazon.Lambda.Annotations and Amazon.Lambda.RuntimeSupport. You can update using the NuGet UI in your IDE, manually, or by running dotnet add package Amazon.Lambda.RuntimeSupport and dotnet add package Amazon.Lambda.Annotations from your project directory.

Your project file should look similar to the following:

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>exe</OutputType>
    <TargetFramework>net8.0</TargetFramework>
    <PublishAot>true</PublishAot> <!-- Generate native aot images during publishing to improve cold start time. -->
    <StripSymbols>true</StripSymbols> <!-- Strips debugging symbols from the final executable on Linux into their own file, greatly reducing its size. -->
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="Amazon.Lambda.Core" Version="2.2.0" />
    <PackageReference Include="Amazon.Lambda.RuntimeSupport" Version="1.10.0" />
    <PackageReference Include="Amazon.Lambda.Serialization.SystemTextJson" Version="2.4.0" />
  </ItemGroup>
</Project>

Updating your function code

  1. Reference the annotations library with using Amazon.Lambda.Annotations;
  2. Add [assembly: LambdaGlobalProperties(GenerateMain = true)] to allow the annotations framework to create the main method. This is required because the project is now an executable instead of a library.
  3. Add the following partial class and include a JsonSerializable attribute for every type that you need to serialize, including your function input and output. This partial class is used at build time to generate reflection-free code dedicated to serializing the listed types. The following is an example:

using System.Text.Json.Serialization;

/// <summary>
/// This class is used to register the input event and return type for the FunctionHandler method with the System.Text.Json source generator.
/// There must be a JsonSerializable attribute for each type used as the input and return type or a runtime error will occur
/// from the JSON serializer unable to find the serialization information for unknown types.
/// </summary>
// Hypothetical example type: replace with one JsonSerializable attribute per input and output type of your handler.
[JsonSerializable(typeof(string))]
public partial class MyCustomJsonSerializerContext : JsonSerializerContext
{
    // By using this partial class derived from JsonSerializerContext, we can generate reflection free JSON Serializer code at compile time
    // which can deserialize our class and properties. However, we must attribute this class to tell it what types to generate serialization code for.
    // See https://docs.microsoft.com/en-us/dotnet/standard/serialization/system-text-json-source-generation
}

  4. After the using statements, add the following to specify the serializer to use: [assembly: LambdaSerializer(typeof(SourceGeneratorLambdaJsonSerializer<MyCustomJsonSerializerContext>))]

If you named your serializer context class differently, use that name in place of MyCustomJsonSerializerContext.

Updating your function configuration

If you are using aws-lambda-tools-defaults.json:

  1. Set function-runtime to dotnet8.
  2. Set function-architecture to match your build machine, either x86_64 or arm64.
  3. Set (or update) environment-variables to include ANNOTATIONS_HANDLER=<YourFunctionHandler>. Replace <YourFunctionHandler> with the method name of your function handler, so the annotations framework knows which method to call from the generated main method.
  4. Set function-handler to the name of the executable assembly in your bin directory. By default, this is your project name, which tells the .NET Lambda bootstrap script to run your native binary instead of starting the .NET runtime. If your project file sets AssemblyName, use that value for the function handler.

{
  "function-architecture": "x86_64",
  "function-runtime": "dotnet8",
  "function-handler": "<your-assembly-name>",
  "environment-variables": "ANNOTATIONS_HANDLER=<YourFunctionHandler>"
}

Deploy and test

  1. Deploy your function. If you are using Amazon.Lambda.Tools, run dotnet lambda deploy-function. Check for trim warnings during the build and refactor to eliminate them.
  2. Test your function to ensure that the native calls into AL2023 are working correctly. By default, running local unit tests on your development machine won't run natively and still uses the JIT compiler. Running with the JIT compiler does not allow you to catch Native AOT-specific runtime errors.


Lambda is introducing the new .NET 8 managed runtime. This post highlights new features in .NET 8. You can create new Lambda functions or migrate existing functions to .NET 8 or .NET 8 Native AOT.

For more information, see the AWS Lambda for .NET repository, documentation, and .NET on Serverless Land.

For more serverless learning resources, visit Serverless Land.

Re-platforming Java applications using the updated AWS Serverless Java Container

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/re-platforming-java-applications-using-the-updated-aws-serverless-java-container/

This post is written by Dennis Kieselhorst, Principal Solutions Architect.

The combination of portability, efficiency, community, and breadth of features has made Java a popular choice for businesses to build their applications for over 25 years. The introduction of serverless functions, pioneered by AWS Lambda, changed what you need in a programming language and runtime environment. Functions are often short-lived, single-purpose, and do not require extensive infrastructure configuration.

This blog post shows how you can modernize a legacy Java application to run on Lambda with minimal code changes using the updated AWS Serverless Java Container.

Deployment model comparison

Classic Java enterprise applications often run on application servers such as JBoss/WildFly, Oracle WebLogic, and IBM WebSphere, or on servlet containers like Apache Tomcat. The underlying Java virtual machine typically runs 24/7 and serves multiple requests using its multithreading capabilities.

Typical long running Java application server

When building Lambda functions with Java, an HTTP server is no longer required, and there are other considerations for running code in a Lambda environment. Code runs in an execution environment, which processes a single invocation at a time. Functions can run for up to 15 minutes with a maximum of 10 GB of allocated memory.

Functions are triggered by events such as an HTTP request with a corresponding payload. An Amazon API Gateway HTTP request invokes the function with the following JSON payload:

Amazon API Gateway HTTP request payload

The code to process these events is different from how you implement it in a traditional application.

AWS Serverless Java Container

The AWS Serverless Java Container makes it easier to run Java applications written with frameworks such as Spring, Spring Boot, or JAX-RS/Jersey in Lambda.

The container provides adapter logic to minimize code changes. Incoming events are translated to the Servlet specification so that frameworks work as before.
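The container itself is a Java library, so the following Python sketch is only an analogy for the adapter idea, not its API: it translates an API Gateway REST proxy event into a framework-neutral Request object, calls unchanged application code, and translates the result back into a proxy response. Request and app are hypothetical, with Request playing the role the Servlet request plays for Java frameworks.

```python
import base64
import json
from dataclasses import dataclass, field


@dataclass
class Request:
    """Framework-neutral request, analogous to the Servlet request that Java frameworks consume."""
    method: str
    path: str
    headers: dict = field(default_factory=dict)
    query: dict = field(default_factory=dict)
    body: bytes = b""


def app(request: Request) -> tuple:
    """Existing application code that knows nothing about Lambda (hypothetical)."""
    if request.method == "GET" and request.path == "/ping":
        return 200, {"Content-Type": "application/json"}, json.dumps({"pong": True})
    return 404, {}, ""


def proxy_handler(event: dict, context=None) -> dict:
    # Inbound: API Gateway REST proxy event -> neutral request.
    raw = event.get("body") or ""
    body = base64.b64decode(raw) if event.get("isBase64Encoded") else raw.encode("utf-8")
    request = Request(
        method=event["httpMethod"],
        path=event["path"],
        headers=event.get("headers") or {},
        query=event.get("queryStringParameters") or {},
        body=body,
    )
    status, headers, payload = app(request)
    # Outbound: application result -> API Gateway proxy response.
    return {"statusCode": status, "headers": headers, "body": payload, "isBase64Encoded": False}
```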

AWS Serverless Java Container adapter

Version 1 of this library was released in 2018. Today, AWS is announcing the release of version 2, which supports the latest Jakarta EE specification, along with Spring Framework 6.x, Spring Boot 3.x and Jersey 3.x.

Example: Modifying a Spring Boot application

The following example illustrates how to migrate a Spring Boot 3 application. You can find the full example for Spring and other frameworks in the GitHub repository.

  1. Add the AWS Serverless Java Container dependency to your Maven POM build file (or the Gradle equivalent):

    <dependency>

  2. Spring Boot, by default, embeds Apache Tomcat to handle HTTP requests. The examples use Amazon API Gateway to handle inbound HTTP requests, so you can exclude the Tomcat dependency.

    <build>

    The AWS Serverless Java Container accepts API Gateway proxy requests and transforms them into a plain Java object. The library also transforms outputs into a suitable API Gateway response object.

    Once you run your build process, the Maven Shade plugin produces an uber JAR that bundles all dependencies, which you can upload to Lambda.

  3. The Lambda runtime must know which handler method to invoke. You can configure and use the SpringDelegatingLambdaContainerHandler implementation, or implement your own handler Java class that delegates to AWS Serverless Java Container. A custom handler is useful if you want to add additional functionality.
  4. Configure the handler name in the runtime settings of your function.

    Configure the handler name

  5. Configure an environment variable named MAIN_CLASS to let the generic handler know where to find your original application main class, which is usually annotated with @SpringBootApplication.

    Configure MAIN_CLASS environment variable

You can also configure these settings using infrastructure as code (IaC) tools such as AWS CloudFormation, the AWS Cloud Development Kit (AWS CDK), or the AWS Serverless Application Model (AWS SAM).

In an AWS SAM template, the related changes are as follows. Full templates are part of the GitHub repository.

Handler: com.amazonaws.serverless.proxy.spring.SpringDelegatingLambdaContainerHandler
Environment:
  Variables:
    MAIN_CLASS: com.amazonaws.serverless.sample.springboot3.Application

Optimizing memory configuration

When running Lambda functions, start-up time and memory footprint are important considerations. The amount of memory you configure for your Lambda function also determines the amount of virtual CPU available. Adding more memory proportionally increases the amount of CPU, and therefore increases the overall computational power available. If a function is CPU-, network- or memory-bound, adding more memory can improve performance.

Lambda charges for the total amount of gigabyte-seconds consumed by a function. Gigabyte-seconds are a combination of total memory (in gigabytes) and duration (in seconds). Increasing memory incurs additional cost. However, in many cases, increasing the memory available causes a decrease in the function duration due to the additional CPU available. As a result, the overall cost increase may be negligible for additional performance, or may even decrease.
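As a worked example of the gigabyte-second arithmetic, here is a small Python sketch. The price per GB-second is left as a parameter rather than a real rate (it varies by Region and architecture), memory is converted at 1,024 MB per GB, and request charges are ignored.

```python
def gb_seconds(memory_mb: int, duration_ms: float) -> float:
    """Gigabyte-seconds for one invocation: configured memory in GB times duration in seconds."""
    return (memory_mb / 1024) * (duration_ms / 1000)


def compute_cost(memory_mb: int, duration_ms: float, invocations: int,
                 price_per_gb_second: float) -> float:
    """Compute charge only; request charges and any free tier are ignored in this sketch."""
    return gb_seconds(memory_mb, duration_ms) * invocations * price_per_gb_second
```

For example, 512 MB running for 800 ms and 1,024 MB running for 400 ms both come to 0.4 GB-seconds per invocation, so in that hypothetical case the faster configuration costs the same.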

Choosing the memory allocated to your Lambda functions is an optimization process that balances speed (duration) and cost. You can manually test functions by selecting different memory allocations and measuring the completion time. AWS Lambda Power Tuning is a tool to simplify and automate the process, which you can use to optimize your configuration.

Power Tuning uses AWS Step Functions to run multiple concurrent versions of a Lambda function at different memory allocations and measures the performance. The function runs in your AWS account, performing live HTTP calls and SDK interactions, to measure performance in a production scenario.

Improving cold-start time with AWS Lambda SnapStart

Traditional applications often have a large tree of dependencies. Lambda loads the function code and initializes dependencies during the initialization phase of the Lambda lifecycle. With many dependencies, this initialization time may be too long for your requirements. AWS Lambda SnapStart for Java-based functions can deliver up to 10 times faster startup performance.

Instead of running the function initialization phase on every cold start, Lambda SnapStart runs the function initialization process at deployment time. Lambda takes a snapshot of the initialized execution environment. This snapshot is encrypted and persisted in a tiered cache for low latency access. When the function is invoked and scales, Lambda resumes the execution environment from the persisted snapshot instead of running the full initialization process. This results in lower startup latency.

To enable Lambda SnapStart, you must first enable the configuration setting and then publish a function version.

Enabling SnapStart

Point your API Gateway endpoint to the published version or to an alias so that requests use the SnapStart-enabled function.

The corresponding settings in an AWS SAM template contain the following:

SnapStart:
  ApplyOn: PublishedVersions
AutoPublishAlias: my-function-alias

Read the Lambda SnapStart compatibility considerations in the documentation, as your application may contain specific code that requires attention.


When building serverless applications with Lambda, you can deliver features faster, but your language and runtime must work within the serverless architectural model. AWS Serverless Java Container helps to bridge between traditional Java Enterprise applications and modern cloud-native serverless functions.

You can optimize the memory configuration of your Java Lambda function using the AWS Lambda Power Tuning tool and enable SnapStart to reduce cold-start time.

The self-paced Java on AWS Lambda workshop shows how to build cloud-native Java applications and migrate existing Java applications to Lambda.

Explore the AWS Serverless Java Container GitHub repo, where you can report related issues and feature requests.

For more serverless learning resources, visit Serverless Land.

Build real-time applications with Amazon EventBridge and AWS AppSync

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/build-real-time-applications-with-amazon-eventbridge-and-aws-appsync/

This post is written by Josh Kahn, Tech Leader, Serverless.

Amazon EventBridge now supports publishing events to AWS AppSync GraphQL APIs as native targets. The new integration enables builders to publish events easily to a wider variety of consumers and simplifies updating clients with near real-time data. You can use EventBridge and AWS AppSync to build resilient, subscription-based event-driven architectures across consumers.

To illustrate using EventBridge with AWS AppSync, consider a simplified airport operations scenario. In this example, airlines publish flight events (for example, boarding, push back, gate changes, and delays) to a service that maintains flight status on in-airport displays. Airlines also publish events that are useful for other entities at the airport, such as baggage handlers and maintenance, but not to passengers. This depicts a conceptual view of the system:

Conceptual view of the system

Passengers want the in-airport displays to be up-to-date and accurate. There are a number of ways to design the display application so that data remains up-to-date. Broadly, these include the application polling some API or the application subscribing to data changes.

Subscriptions for this scenario are better as the data changes are small and incremental relative to the large amount of information displayed. In a delay, for example, the display updates the status and departure time but no other details of a single flight among a larger list of flight information.
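To illustrate why small incremental updates suit subscriptions, here is a Python sketch of how a display client might merge one StatusUpdate into the flight board it already shows. The board layout (a dictionary keyed by carrier, flight number, and date) and the function name are hypothetical; the update fields follow the StatusUpdate and DelayEventInfo types in the schema later in this post.

```python
def apply_status_update(board: dict, update: dict) -> dict:
    """Merge one StatusUpdate into the locally displayed board and return the changed flight.

    Only fields carried by the update change; everything else already on the board
    (gate, destination, and so on) stays as rendered.
    """
    key = (update["carrier"], update["num"], update["date"])
    flight = board.setdefault(key, {})
    flight["status"] = update["event"]
    info = update.get("info") or {}
    if "newDepTime" in info:  # present for delay events (DelayEventInfo)
        flight["departure"] = info["newDepTime"]
    return flight
```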

Flight board

AWS AppSync can enable clients to listen for real-time data changes through the use of GraphQL subscriptions. These are implemented using a WebSocket connection between the client and the AWS AppSync service. The display application client invokes the GraphQL subscription operation to establish a secure connection. AWS AppSync will automatically push data changes (or mutations) via the GraphQL API to subscribers using that connection.

Previously, builders could use EventBridge API Destinations to wire events published and routed through EventBridge to AWS AppSync, as described in an earlier blog post, and available in Serverless Land patterns (API Key, OAuth). The approach is useful for dealing with “out-of-band” updates in which data changes outside of an AWS AppSync mutation. Out-of-band updates generally require a NONE data source in AWS AppSync to notify subscribers of changes, as described in the AWS re:Post Knowledge Center. The addition of AWS AppSync as a target for EventBridge simplifies these use cases as you can now trigger a mutation in response to an event without additional code.

Airport Operations Events

Expanding the scenario, airport operations events look like this:

{
  "flightNum": 123,
  "carrierCode": "JK",
  "date": "2024-01-25",
  "event": "FlightDelayed",
  "message": "Delayed 15 minutes, late aircraft",
  "info": "{ \"newDepTime\": \"2024-01-25T13:15:00Z\", \"delayMinutes\": 15 }"
}

The event field identifies the type of event and whether it is relevant to passengers. The event details provide further information about the event, which varies based on the type of event. The airport publishes a variety of events, but the airport displays only need a subset of those changes.

AWS AppSync GraphQL APIs start with a GraphQL schema that defines the types, fields, and operations available in that API. AWS AppSync documentation provides an overview of schema and other GraphQL essentials. The partial GraphQL schema for the airport scenario is as follows:

type DelayEventInfo implements EventInfo {
	message: String
	delayMinutes: Int
	newDepTime: AWSDateTime
}

interface EventInfo {
	message: String
}

enum StatusEvent {
	FlightArrived
	FlightBoarding
	FlightCancelled
	FlightDelayed
	# ...
}

type StatusUpdate {
	num: Int!
	carrier: String!
	date: AWSDate!
	event: StatusEvent!
	info: EventInfo
}

input StatusUpdateInput {
	num: Int!
	carrier: String!
	date: AWSDate!
	event: StatusEvent!
	message: String
	extra: AWSJSON
}

type Mutation {
	updateFlightStatus(input: StatusUpdateInput!): StatusUpdate!
}

type Query {
	listStatusUpdates(by: String): [StatusUpdate]
}

type Subscription {
	onFlightStatusUpdate(date: AWSDate, carrier: String): StatusUpdate
		@aws_subscribe(mutations: ["updateFlightStatus"])
}

schema {
	query: Query
	mutation: Mutation
	subscription: Subscription
}

Connect EventBridge to AWS AppSync

EventBridge allows you to filter, transform, and route events to a number of targets. The airport display service only needs events that directly impact passengers. You can define a rule in EventBridge that routes only those events (included in the preceding GraphQL schema) to the AWS AppSync target. Other events are routed elsewhere, as defined by other rules, or dropped. Details on creating EventBridge rules and the event matching pattern format can be found in EventBridge documentation.

The previous flight delayed event would be delivered using EventBridge as follows:

{
  "id": "b051312994104931b0980d1ad1c5340f",
  "detail-type": "Operations: Flight delayed",
  "source": "airport-operations",
  "time": "2024-01-25T16:58:37Z",
  "detail": {
    "flightNum": 123,
    "carrierCode": "JK",
    "date": "2024-01-25",
    "event": "FlightDelayed",
    "message": "Delayed 15 minutes, late aircraft",
    "info": "{ \"newDepTime\": \"2024-01-25T13:15:00Z\", \"delayMinutes\": 15 }"
  }
}

In this scenario, there is a specific list of events of interest, but EventBridge provides a flexible set of operations to match patterns, inspect arrays, and filter by content using prefix, numerical, or other matching. Some organizations will also allow subscribers to define their own rules on an EventBridge event bus, allowing targets to subscribe to events via self-service.

The following event pattern matches on the events needed for the airport display service:

{
  "source": [ "airport-operations" ],
  "detail": {
    "event": [ "FlightArrived", "FlightBoarding", "FlightCancelled", ... ]
  }
}
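To make the matching semantics concrete, here is a minimal Python sketch of how a pattern like the one above selects events. It is a simplified, hypothetical matcher, not EventBridge's implementation: it handles only nested objects and arrays of exact values (a field matches when its value equals any element of the array), and it ignores content filters such as prefix or numeric matching. The shortened event list and the BaggageLoaded event used in testing are illustrative.

```python
from typing import Any, Dict


def matches(pattern: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Return True if every field in the pattern matches the event.

    Simplified semantics (an assumption, not the full EventBridge rules):
    - a nested dict in the pattern must match the nested dict in the event
    - a list in the pattern matches if the event value equals any element
    """
    for key, expected in pattern.items():
        if key not in event:
            return False  # fields named in the pattern must exist in the event
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True  # fields absent from the pattern (such as flightNum) are ignored


# Pattern for the airport display service, shortened to a few event types.
DISPLAY_PATTERN = {
    "source": ["airport-operations"],
    "detail": {"event": ["FlightArrived", "FlightBoarding", "FlightCancelled", "FlightDelayed"]},
}

DELAYED = {
    "source": "airport-operations",
    "detail-type": "Operations: Flight delayed",
    "detail": {"flightNum": 123, "carrierCode": "JK", "event": "FlightDelayed"},
}

if __name__ == "__main__":
    print(matches(DISPLAY_PATTERN, DELAYED))  # True
```

Events whose detail.event is not in the list, or whose source differs, do not match and are left for other rules or dropped.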

To create a new EventBridge rule, you can use the AWS Management Console or infrastructure as code. You can find the CloudFormation definition for the completed rule, with the AWS AppSync target, later in this post.

Console view

Create the AWS AppSync target

Now that EventBridge is configured to route selected events, define AWS AppSync as the target for the rule. The AWS AppSync API must support IAM authorization to be used as an EventBridge target. AWS AppSync supports multiple authorization types on a single GraphQL type, so you can also use OpenID Connect, Amazon Cognito User Pools, or other authorization methods as needed.

To configure AWS AppSync as an EventBridge target, define the target using the AWS Management Console or infrastructure as code. In the console, select the Target Type as “AWS Service” and Target as “AppSync.” Select your API. EventBridge parses the GraphQL schema and allows you to select the mutation to invoke when the rule is triggered.

When using the AWS Management Console, EventBridge will also configure the necessary AWS IAM role to invoke the selected mutation. Remember to create and associate a role with an appropriate trust policy when configuring with IaC.

EventBridge target types

EventBridge supports input transformation to customize the contents of an event before passing the information as input to the target. Configure the input transformer to extract needed values from the event using JSON path and a template in the input format expected by the AWS AppSync API. EventBridge provides a handy utility in the Console to pass and test the output of a sample event.
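The following Python sketch approximates locally the net effect of the input transformer on the sample FlightDelayed event. It resolves each JSON path in the same variable-to-path mapping used by the CloudFormation definition later in this post and assembles the {"input": {...}} object that the template describes. This is an illustration under simplifying assumptions (only simple $.a.b paths, no modeling of how EventBridge quotes values in the template text), not EventBridge's implementation. It shows that the info field, which is already a JSON-encoded string, passes through unchanged into the AWSJSON-typed extra argument.

```python
import json
from typing import Any, Dict

# Same variable-to-path mapping as the input paths in the CloudFormation rule.
INPUT_PATHS_MAP = {
    "carrier": "$.detail.carrierCode",
    "date": "$.detail.date",
    "event": "$.detail.event",
    "extra": "$.detail.info",
    "message": "$.detail.message",
    "num": "$.detail.flightNum",
}

SAMPLE_EVENT = {
    "source": "airport-operations",
    "detail-type": "Operations: Flight delayed",
    "detail": {
        "flightNum": 123,
        "carrierCode": "JK",
        "date": "2024-01-25",
        "event": "FlightDelayed",
        "message": "Delayed 15 minutes, late aircraft",
        "info": "{ \"newDepTime\": \"2024-01-25T13:15:00Z\", \"delayMinutes\": 15 }",
    },
}


def resolve_path(event: Dict[str, Any], path: str) -> Any:
    """Resolve a simple dotted JSON path such as $.detail.flightNum."""
    if not path.startswith("$."):
        raise ValueError(f"unsupported path: {path}")
    value: Any = event
    for part in path[2:].split("."):
        value = value[part]
    return value


def to_mutation_variables(event: Dict[str, Any]) -> Dict[str, Any]:
    """Build the {"input": {...}} object that the input template describes."""
    return {"input": {name: resolve_path(event, path) for name, path in INPUT_PATHS_MAP.items()}}


if __name__ == "__main__":
    variables = to_mutation_variables(SAMPLE_EVENT)
    print(json.dumps(variables, indent=2))
    # "extra" stays a JSON-encoded string, which is what the AWSJSON type expects.
    print(json.loads(variables["input"]["extra"])["delayMinutes"])  # 15
```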

Target input transformer

Finally, configure the selection set to include the response from the AWS AppSync API. These are the fields that will be returned to EventBridge when the mutation is invoked. While the result returned to EventBridge is not overly useful (aside from troubleshooting), the mutation selection set will also determine the fields available to subscribers to the onFlightStatusUpdate subscription.

Configuring the selection set

Define the EventBridge to AWS AppSync rule in CloudFormation

Infrastructure as code templates, including AWS CloudFormation and AWS CDK, are useful for codifying infrastructure definitions to deploy across Regions and accounts. While you can write CloudFormation by hand, EventBridge provides a useful CloudFormation export in the AWS Management Console. You can use this feature to export the definition for a defined rule.

Export definition

This is the CloudFormation for the previously configured rule and AWS AppSync target. This snippet includes both the rule definition and the target configuration.

    Type: AWS::Events::Rule
    Properties:
      Description: Route passenger related events to the display service endpoint
      EventBusName: eb-to-appsync
      EventPattern:
        source:
          - airport-operations
        detail:
          event:
            - FlightArrived
            - FlightBoarding
            - FlightCancelled
            - FlightDelayed
            - FlightGateChanged
            - FlightLanded
            - FlightPushBack
            - FlightTookOff
      Name: passenger-events-to-display-service
      State: ENABLED
      Targets:
        - Id: 12344535353263463
          Arn: <AppSync API GraphQL API ARN>
          RoleArn: <EventBridge Role ARN (defined elsewhere)>
          InputTransformer:
            InputPathsMap:
              carrier: $.detail.carrierCode
              date: $.detail.date
              event: $.detail.event
              extra: $.detail.info
              message: $.detail.message
              num: $.detail.flightNum
            InputTemplate: |-
              {
                "input": {
                  "num": <num>,
                  "carrier": <carrier>,
                  "date": <date>,
                  "event": <event>,
                  "message": "<message>",
                  "extra": <extra>
                }
              }
          AppSyncParameters:
            GraphQLOperation: >-
              mutation($input: StatusUpdateInput!) {
                updateFlightStatus(input: $input) {
                  num
                  carrier
                  date
                  event
                  info {
                    message
                    ... on DelayEventInfo {
                      delayMinutes
                      newDepTime
                    }
                  }
                }
              }

The target Arn is the ARN of the AWS AppSync GraphQL endpoint, which follows the form arn:aws:appsync:<AWS_REGION>:<ACCOUNT_ID>:endpoints/graphql-api/<GRAPHQL_ENDPOINT_ID>. This ARN is available in CloudFormation as the GraphQLEndpointArn return value, or you can construct it using the identifier in the AWS AppSync GraphQL endpoint URL. The EventBridge execution role policy, by contrast, references the AWS AppSync API ARN, which is a different ARN.
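As a small illustration of constructing the endpoint ARN from the endpoint URL, here is a Python sketch. It assumes the default AWS AppSync endpoint host format https://<GRAPHQL_ENDPOINT_ID>.appsync-api.<AWS_REGION>.amazonaws.com/graphql and does not handle custom domain names; the function name and the sample endpoint ID are hypothetical.

```python
from urllib.parse import urlparse


def graphql_endpoint_arn(endpoint_url: str, account_id: str) -> str:
    """Build the EventBridge target ARN from an AWS AppSync GraphQL endpoint URL.

    Assumes the default endpoint format
    https://<GRAPHQL_ENDPOINT_ID>.appsync-api.<AWS_REGION>.amazonaws.com/graphql
    (custom domain names are not handled).
    """
    host = urlparse(endpoint_url).hostname or ""
    labels = host.split(".")
    if len(labels) < 4 or labels[1] != "appsync-api":
        raise ValueError(f"not a default AppSync GraphQL endpoint: {endpoint_url}")
    endpoint_id, region = labels[0], labels[2]
    return f"arn:aws:appsync:{region}:{account_id}:endpoints/graphql-api/{endpoint_id}"


if __name__ == "__main__":
    url = "https://abc123example.appsync-api.us-east-1.amazonaws.com/graphql"
    print(graphql_endpoint_arn(url, "111122223333"))
    # arn:aws:appsync:us-east-1:111122223333:endpoints/graphql-api/abc123example
```

In CloudFormation, prefer the GraphQLEndpointArn return value; a helper like this is mainly useful in scripts that only know the endpoint URL.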

The AppSyncParameters field contains the GraphQL operation that EventBridge invokes on the AWS AppSync API. The operation must be well-formed and match the GraphQL schema. Include in the selection set any fields that must be available to subscribers.

Testing subscriptions

AWS AppSync is now configured as a target for the EventBridge rule. The real-life display application would use a GraphQL library, such as AWS Amplify, to subscribe to real-time data changes. The AWS Management Console provides a useful utility for testing. Navigate to the AWS AppSync console and select Queries in the menu for your API. Enter the following query and choose Run to subscribe for data changes:

subscription MySubscription {
  onFlightStatusUpdate {
    num carrier date event
    info {
      ... on DelayEventInfo { message delayMinutes newDepTime }
    }
  }
}

In a separate browser tab, navigate to the EventBridge console, and choose Send events. On the Send events page, select the required event bus and set the Event source to “airport-operations.” Then enter a detail type of your choice. Finally, paste the following as the Event detail, then choose Send.

{
  "flightNum": 123,
  "carrierCode": "JK",
  "date": "2024-01-25",
  "event": "FlightDelayed",
  "message": "Delayed 15 minutes, late aircraft",
  "info": "{ \"newDepTime\": \"2024-01-25T13:15:00Z\", \"delayMinutes\": 15 }"
}

Return to the AWS AppSync tab in your browser to see the changed data in the result pane:

Result pane


Directly invoking AWS AppSync GraphQL API targets from EventBridge simplifies and streamlines integration between these two services, ideal for notifying a variety of subscribers of data changes in event-driven workloads. You can also take advantage of other features available from the two services. For example, use AWS AppSync enhanced subscription filtering to update only airport displays in the terminal in which they are located.

To learn more about serverless, visit Serverless Land for a wide array of reusable patterns, tutorials, and learning materials. Newly added to the pattern library is an EventBridge to AWS AppSync pattern similar to the one described in this post. Visit the EventBridge documentation for more details.

For more serverless learning resources, visit Serverless Land.

Invoking on-premises resources interactively using AWS Step Functions and MQTT

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/invoking-on-premises-resources-interactively-using-aws-step-functions-and-mqtt/

This post is written by Alex Paramonov, Sr. Solutions Architect, ISV, and Pieter Prinsloo, Customer Solutions Manager.

Workloads in AWS sometimes require access to data stored in on-premises databases and storage locations. Traditional solutions to establish connectivity to the on-premises resources require inbound rules to firewalls, a VPN tunnel, or public endpoints.

This blog post demonstrates how to use the MQTT protocol (AWS IoT Core) with AWS Step Functions to dispatch jobs to on-premises workers to access or retrieve data stored on-premises. The state machine can communicate with the on-premises workers without opening inbound ports or the need for public endpoints on on-premises resources. Workers can run behind Network Address Translation (NAT) routers while keeping bidirectional connectivity with the AWS Cloud. This provides a more secure and cost-effective way to access data stored on-premises.


By using Step Functions with AWS Lambda and AWS IoT Core, you can access data stored on-premises securely without altering the existing network configuration.

AWS IoT Core lets you connect IoT devices and route messages to AWS services without managing infrastructure. By using a Docker container image running on-premises as a proxy IoT Thing, you can take advantage of AWS IoT Core’s fully managed MQTT message broker for non-IoT use cases.

MQTT subscribers receive information via MQTT topics. An MQTT topic acts as a matching mechanism between publishers and subscribers. Conceptually, an MQTT topic behaves like an ephemeral notification channel. You can create topics at scale with virtually no limit to the number of topics. In SaaS applications, for example, you can create topics per tenant. Learn more about MQTT topic design here.
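To show how topics act as the matching mechanism, the following Python sketch implements the standard MQTT topic-filter rules: + matches exactly one level, and # (as the last level) matches the parent level and everything below it. The per-tenant naming scheme tenants/<tenant_id>/jobs is a hypothetical example of per-tenant topics, not the one used by the sample repository.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic name matches a topic filter.

    '+' matches exactly one level; '#' (last level only) matches the parent
    level and any number of levels below it.
    """
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    # Per MQTT, topics starting with '$' (for example $aws/...) are not matched
    # by a filter whose first level is a wildcard.
    if topic_levels[0].startswith("$") and filter_levels[0] in ("+", "#"):
        return False
    for i, level in enumerate(filter_levels):
        if level == "#":
            return True
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


def tenant_job_topic(tenant_id: str) -> str:
    """Hypothetical per-tenant topic on which one worker receives its jobs."""
    return f"tenants/{tenant_id}/jobs"


if __name__ == "__main__":
    print(topic_matches("tenants/+/jobs", tenant_job_topic("acme")))  # True
    print(topic_matches("tenants/acme/#", "tenants/other/jobs"))  # False
```

Giving each tenant its own topic subtree also makes it straightforward to scope an AWS IoT Core policy so that a worker can subscribe only to its own tenant's topics.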

The following reference architecture uses the AWS Serverless Application Model (AWS SAM) for deployment, Step Functions to orchestrate the workflow, AWS Lambda to send and receive on-premises messages, and AWS IoT Core to provide the MQTT message broker, certificate and policy management, and publish/subscribe topics.

Reference architecture

  1. Start the state machine, either “on demand” or on a schedule.
  2. The state: “Lambda: Invoke Dispatch Job to On-Premises” publishes a message to an MQTT message broker in AWS IoT Core.
  3. The message broker sends the message to the topic corresponding to the worker (tenant) in the on-premises container that runs the job.
  4. The on-premises container receives the message and starts work execution. Authentication is done using client certificates and the attached policy limits the worker access to only the tenant’s topic.
  5. The worker in the on-premises container can access local resources like DBs or storage locations.
  6. The on-premises container sends the results and job status back to another MQTT topic.
  7. The AWS IoT Core rule invokes the “TaskToken Done” Lambda function.
  8. The Lambda function submits the results to Step Functions via SendTaskSuccess or SendTaskFailure API.

Deploying and testing the sample

Ensure you can manage AWS resources from your terminal and that:

  • Latest versions of AWS CLI and AWS SAM CLI are installed.
  • You have an AWS account. If not, visit this page.
  • Your user has sufficient permissions to manage AWS resources.
  • Git is installed.
  • Python version 3.11 or greater is installed.
  • Docker is installed.

You can access the GitHub repository here and follow these steps to deploy the sample.

The aws-resources directory contains the required AWS resources including the state machine, Lambda functions, topics, and policies. The directory on-prem-worker contains the Docker container image artifacts. Use it to run the on-premises worker locally.

In this example, the worker container adds two numbers, provided as an input in the following format:

{
  "a": 15,
  "b": 42
}

In a real-world scenario, you can substitute this operation with business logic. For example, retrieving data from on-premises databases, generating aggregates, and then submitting the results back to your state machine.
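To illustrate the worker side, here is a minimal, hypothetical handler for the sample job. It assumes the dispatch message carries the state machine input and the callback token under the keys Input and TaskToken, mirroring the Payload in the ASL task definition later in this section, and it returns a result message for the worker to publish back. The result field names (taskToken, status, output, error, cause) and the sum output key are assumptions, not the sample repository's format. In a real worker, the addition would be replaced by business logic such as a database query.

```python
import json
from typing import Any, Dict, Union


def handle_job(message: Union[str, bytes]) -> Dict[str, Any]:
    """Run the sample job (a + b) and build the result message to publish back."""
    job = json.loads(message)
    token = job["TaskToken"]  # without the token the result cannot be routed back
    try:
        numbers = job["Input"]
        total = numbers["a"] + numbers["b"]  # stand-in for real business logic
        return {"taskToken": token, "status": "SUCCEEDED", "output": {"sum": total}}
    except (KeyError, TypeError) as exc:
        # Report failures as well, so the state machine does not wait until it times out.
        return {"taskToken": token, "status": "FAILED",
                "error": type(exc).__name__, "cause": str(exc)}


if __name__ == "__main__":
    msg = json.dumps({"Input": {"a": 15, "b": 42}, "TaskToken": "example-token"})
    print(handle_job(msg))
```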

Follow these steps to test the sample end-to-end.

Using AWS IoT Core without IoT devices

There are no IoT devices in the example use case. However, the fully managed MQTT message broker in AWS IoT Core lets you route messages to AWS services without managing infrastructure.

AWS IoT Core authenticates clients using X.509 client certificates. You can attach a policy to a client certificate allowing the client to publish and subscribe only to certain topics. This approach does not require IAM credentials inside the worker container on-premises.

AWS IoT Core’s security, cost efficiency, managed infrastructure, and scalability make it a good fit for many hybrid applications beyond typical IoT use cases.

Dispatching jobs from Step Functions and waiting for a response

When a state machine reaches the state that dispatches the job to an on-premises worker, the execution pauses and waits until the job finishes. Step Functions supports three integration patterns: Request Response, Run a Job (.sync), and Wait for a Callback with Task Token. The sample uses the “Wait for a Callback with Task Token” integration, which allows the state machine to pause and wait for a callback for up to 1 year.

When the on-premises worker completes the job, it publishes a message to the topic in AWS IoT Core. A rule in AWS IoT Core then invokes a Lambda function, which sends the result back to the state machine by calling either SendTaskSuccess or SendTaskFailure API in Step Functions.

If the job freezes and neither SendTaskSuccess nor SendTaskFailure is called, the execution keeps waiting. To prevent this, add TimeoutSeconds and HeartbeatSeconds to the task in the Amazon States Language (ASL). With HeartbeatSeconds set, the worker must send periodic heartbeats, which reach Step Functions through the SendTaskHeartbeat API; if no heartbeat arrives within the interval, the task fails with a timeout error. HeartbeatSeconds must be less than TimeoutSeconds.

To create a task in ASL for your state machine, which waits for a callback token, use the following code:

{
  "Type": "Task",
  "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
  "Parameters": {
    "FunctionName": "${LambdaNotifierToWorkerArn}",
    "Payload": {
      "Input.$": "$",
      "TaskToken.$": "$$.Task.Token"
    }
  }
}

The .waitForTaskToken suffix indicates that the task must wait for the callback. The state machine generates a unique callback token, accessible via the $$.Task.Token built-in variable, and passes it as an input to the Lambda function defined in FunctionName.
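The task definition above sets no timeout. As a hedged sketch of the HeartbeatSeconds and TimeoutSeconds guidance earlier in this section, the following Python builds a one-state state machine definition around the same task and checks that the heartbeat interval is shorter than the overall timeout. The state name DispatchJobToOnPremises, the example function ARN, and the timeout values are hypothetical; the Resource and Parameters match the snippet above.

```python
import json
from typing import Any, Dict


def build_definition(function_arn: str, timeout_seconds: int,
                     heartbeat_seconds: int) -> Dict[str, Any]:
    """Return an ASL definition with one wait-for-callback task."""
    if not 0 < heartbeat_seconds < timeout_seconds:
        raise ValueError("HeartbeatSeconds must be positive and less than TimeoutSeconds")
    return {
        "StartAt": "DispatchJobToOnPremises",
        "States": {
            "DispatchJobToOnPremises": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
                "Parameters": {
                    "FunctionName": function_arn,
                    "Payload": {"Input.$": "$", "TaskToken.$": "$$.Task.Token"},
                },
                # Fail the task if the whole job takes longer than this.
                "TimeoutSeconds": timeout_seconds,
                # Fail the task if no SendTaskHeartbeat call arrives within this interval.
                "HeartbeatSeconds": heartbeat_seconds,
                "End": True,
            }
        },
    }


if __name__ == "__main__":
    arn = "arn:aws:lambda:us-east-1:111122223333:function:notifier"
    print(json.dumps(build_definition(arn, 3600, 300), indent=2))
```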

The Lambda function then sends the token to the on-premises worker via an AWS IoT Core topic.
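A minimal sketch of such a dispatch function follows. It uses the publish operation of the boto3 iot-data client; the JOB_TOPIC environment variable and its default value are hypothetical, not taken from the sample repository. The client is injected so that the logic can be exercised without AWS access; in Lambda it is created with boto3.client("iot-data"). Depending on your setup, you may need to pass your account's iot:Data-ATS endpoint to that client as endpoint_url.

```python
import json
import os
from typing import Any, Dict, Optional

# Hypothetical: the topic the worker subscribes to, supplied as configuration.
JOB_TOPIC = os.environ.get("JOB_TOPIC", "tenants/tenant-1/jobs")


def lambda_handler(event: Dict[str, Any], context: Any = None,
                   iot_client: Optional[Any] = None) -> None:
    """Forward the state machine input and task token to the on-premises worker.

    The event is the Payload from the task definition: {"Input": ..., "TaskToken": ...}.
    """
    if iot_client is None:
        import boto3  # bundled with the AWS Lambda Python runtime

        iot_client = boto3.client("iot-data")
    job = {"Input": event["Input"], "TaskToken": event["TaskToken"]}
    # QoS 1: the broker delivers the message at least once.
    iot_client.publish(topic=JOB_TOPIC, qos=1, payload=json.dumps(job))
    # Returning does not complete the task: the execution keeps waiting until
    # SendTaskSuccess or SendTaskFailure is called with the same token.
```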

Lambda is not the only service that supports Wait for Callback integration – see the full list of supported services here.

In addition to dispatching tasks and getting the results back, you can implement progress tracking and shutdown mechanisms. To track progress, the worker sends metrics via a separate topic.

Depending on your current implementation, you have several options:

  1. Storing progress data from the worker in Amazon DynamoDB and visualizing it via REST API calls to a Lambda function, which reads from the DynamoDB table. Refer to this tutorial on how to store data in DynamoDB directly from the topic.
  2. For a reactive user experience, create a rule to invoke a Lambda function when new progress data arrives. Open a WebSocket connection to your backend. The Lambda function sends progress data via WebSocket directly to the frontend.

To implement a shutdown mechanism, you can run jobs in separate threads on your worker and subscribe to the topic to which your state machine publishes shutdown messages. When a shutdown message arrives, end the job thread on the worker and send back the status, including the callback token of the task.
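The following Python sketch shows one way to structure this on the worker, using a threading.Event as the cancellation signal. The class, the simulated work loop, and the status fields are hypothetical. How the worker subscribes to the shutdown topic depends on the MQTT client library you use and is left out. Python threads cannot be killed from outside, so the job has to check the event cooperatively between units of work.

```python
import threading
from typing import Any, Dict


class CancellableJob:
    """Runs a job in a background thread that stops when a shutdown is requested."""

    def __init__(self, task_token: str, steps: int) -> None:
        self.task_token = task_token
        self.steps = steps
        self.completed_steps = 0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def wait(self) -> None:
        self._thread.join()

    def _run(self) -> None:
        for _ in range(self.steps):
            # wait() stands in for one unit of real work (a short sleep) and
            # returns True as soon as a shutdown has been requested.
            if self._stop.wait(timeout=0.01):
                return
            self.completed_steps += 1

    def shutdown(self) -> Dict[str, Any]:
        """Handle a shutdown message: stop the thread and build a status report."""
        self._stop.set()
        self._thread.join()
        finished = self.completed_steps == self.steps
        return {
            "taskToken": self.task_token,
            "status": "SUCCEEDED" if finished else "STOPPED",
            "completedSteps": self.completed_steps,
        }
```

The worker would call start() when a job message arrives and shutdown() from its shutdown-topic callback, then publish the returned status so that the waiting task can be completed or failed.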

Using AWS IoT Core Rules and Lambda Functions

A message with job results from the worker does not go to the Step Functions API directly. Instead, an AWS IoT Core rule and a dedicated Lambda function forward the status message to Step Functions. This allows for more granular permissions in AWS IoT Core policies and improves security, because the worker container can only publish and subscribe to specific topics and no IAM credentials exist on-premises.

The Lambda function’s execution role contains the permissions for SendTaskSuccess, SendTaskHeartbeat, and SendTaskFailure API calls only.
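A minimal sketch of that function is shown below. It assumes the AWS IoT Core rule forwards the worker's JSON message unchanged as the Lambda event (for example, with a SELECT * rule) and that the message uses the hypothetical fields taskToken, status, output, error, and cause, with IN_PROGRESS as a hypothetical heartbeat status. The send_task_success, send_task_heartbeat, and send_task_failure calls are the boto3 Step Functions client operations for the three permitted API actions. The client is injected so the routing logic can be tested without AWS access.

```python
import json
from typing import Any, Dict, Optional


def lambda_handler(event: Dict[str, Any], context: Any = None,
                   sfn_client: Optional[Any] = None) -> None:
    """Relay a worker status message from AWS IoT Core to Step Functions."""
    if sfn_client is None:
        import boto3  # bundled with the AWS Lambda Python runtime

        sfn_client = boto3.client("stepfunctions")
    token = event["taskToken"]
    status = event.get("status")
    if status == "SUCCEEDED":
        # output must be a JSON string; it becomes the result of the waiting task.
        sfn_client.send_task_success(taskToken=token, output=json.dumps(event.get("output", {})))
    elif status == "IN_PROGRESS":
        # Resets the HeartbeatSeconds timer of the waiting task.
        sfn_client.send_task_heartbeat(taskToken=token)
    else:
        # Any other status (for example FAILED or STOPPED) fails the task.
        sfn_client.send_task_failure(
            taskToken=token,
            error=str(event.get("error", status or "WorkerError")),
            cause=str(event.get("cause", "")),
        )
```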

Alternatively, a worker can call the Step Functions API directly, which removes the need for an AWS IoT Core topic, a rule, and a Lambda function to forward results. This approach requires IAM credentials inside the worker's container. You can use AWS Identity and Access Management Roles Anywhere to obtain temporary security credentials. As your worker's functionality evolves, you can add further AWS API calls by granting the corresponding permissions to the IAM role.

Cleaning up

The services used in this solution are eligible for the AWS Free Tier. To clean up the resources, run the following from the aws-resources/ directory of the repository:

sam delete

This removes all resources provisioned by the template.yml file.

To remove the client certificate from AWS, navigate to AWS IoT Core Certificates and delete the certificate, which you added during the manual deployment steps.

Next, stop and remove the on-premises Docker container:

docker rm --force mqtt-local-client

Finally, remove the container image:

docker rmi mqtt-client-waitfortoken


Accessing on-premises resources with workers controlled via Step Functions using MQTT and AWS IoT Core is a secure, reactive, and cost-effective way to run on-premises jobs. Consider moving your hybrid workloads from inefficient polling or schedulers to the reactive approach described in this post. This offers an improved user experience with fast dispatching and tracking of jobs outside of the cloud.

For more serverless learning resources, visit Serverless Land.

Consuming private Amazon API Gateway APIs using mutual TLS

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/consuming-private-amazon-api-gateway-apis-using-mutual-tls/

This post is written by Thomas Moore, Senior Solutions Architect and Josh Hart, Senior Solutions Architect.

A previous blog post explores using Amazon API Gateway to create private REST APIs that can be consumed across different AWS accounts inside a virtual private cloud (VPC). Private cross-account APIs are useful for independent software vendors (ISVs) and SaaS companies providing secure connectivity for customers, and for organizations building internal APIs and backend microservices.

Mutual TLS (mTLS) is an advanced security protocol that provides two-way authentication via certificates between a client and server. mTLS requires the client to send an X.509 certificate to prove its identity when making a request, in addition to the standard verification of the server certificate. This ensures that both parties are who they claim to be.

mTLS connection process

The mTLS connection process illustrated in the diagram above:

  1. Client connects to the server.
  2. Server presents its certificate, which is verified by the client.
  3. Client presents its certificate, which is verified by the server.
  4. Encrypted TLS connection established.
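For a concrete view of both sides of this handshake, the following Python sketch uses the standard library ssl module. The client context verifies the server certificate and hostname (step 2) and, when given a certificate and key, presents them to the server (step 3). The server context is configured to require and verify a client certificate against a CA bundle (step 3). The function names and file paths are hypothetical. In the architecture in this post, the Application Load Balancer plays the server role, so a client would only need the client side.

```python
import ssl
from typing import Optional


def make_client_context(cert_file: Optional[str] = None, key_file: Optional[str] = None,
                        ca_file: Optional[str] = None) -> ssl.SSLContext:
    # Verifies the server certificate and hostname (step 2).
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    if cert_file:
        # Certificate and private key the client presents to the server (step 3).
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def make_server_context(cert_file: Optional[str] = None, key_file: Optional[str] = None,
                        ca_file: Optional[str] = None) -> ssl.SSLContext:
    # ca_file is the bundle used to verify client certificates (the trust store).
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH, cafile=ca_file)
    if cert_file:
        # The server's own certificate, which the client verifies in step 2.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    # Reject any client that does not present a valid certificate (step 3).
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx
```

For example, urllib.request.urlopen("https://api.example.com/widgets", context=make_client_context("my_client.pem", "my_client.key")) would make a request with a client certificate.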

Customers use mTLS because it offers stronger security and identity verification than standard TLS connections. mTLS helps prevent man-in-the-middle attacks and protects against threats such as impersonation attempts, data interception, and tampering. As threats become more advanced, mTLS provides an extra layer of defense to validate connections.

Implementing mTLS increases overhead for certificate management, but for applications transmitting valuable or sensitive data, the extra security is important. If security is a priority for your systems and users, you should consider deploying mTLS.

Regional API Gateway endpoints have native support for mTLS, but private API Gateway endpoints do not, so you must terminate mTLS before requests reach API Gateway. The previous blog post shows how to build private mTLS APIs using a self-managed verification process inside a container running an NGINX proxy. Since then, Application Load Balancer (ALB) has added native mTLS support, which simplifies the architecture.

This post explores building mTLS private APIs using this new feature.

Application Load Balancer mTLS configuration

You can enable mutual authentication (mTLS) on a new or existing Application Load Balancer. By enabling mTLS on the load balancer listener, clients are required to present trusted certificates to connect. The load balancer validates the certificates before allowing requests to the backends.

Application Load Balancer mTLS configuration

There are two options available when configuring mTLS on the Application Load Balancer: Passthrough mode and Verify with trust store mode.

In Passthrough mode, the load balancer passes the client certificate chain to the backend in the X-Amzn-Mtls-Clientcert HTTP header for the application to inspect for authorization. In this mode, the backend must still verify the certificate. The benefit of adding the ALB to the architecture is that you can perform application (layer 7) routing, such as path-based routing, allowing more complex routing configurations.
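The ALB documentation describes this header as carrying the whole chain in URL-encoded PEM format, with +, =, and / left unencoded. A minimal sketch of the first step a backend might take, decoding the header and splitting it into individual PEM certificates, follows. It stops before validation on purpose: verifying the chain against your CA and making the authorization decision are still required, typically with a library such as cryptography or with OpenSSL. The function name is hypothetical.

```python
import re
from typing import List
from urllib.parse import unquote

PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----\s+.*?\s+-----END CERTIFICATE-----", re.DOTALL
)


def certs_from_header(header_value: str) -> List[str]:
    """Split the X-Amzn-Mtls-Clientcert header into PEM certificates.

    Uses unquote, not unquote_plus: '+' is left unencoded in the header
    and is a valid base64 character, so it must not become a space.
    """
    pem_text = unquote(header_value)
    return PEM_CERT.findall(pem_text)  # certificates in the order they appear
```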

In Verify with trust store mode, the load balancer validates the client certificate and only allows clients providing trusted certificates to connect. This simplifies the management and reduces load on backend applications.

This example uses AWS Private Certificate Authority but the steps are similar for third-party certificate authorities (CA).

To configure the certificate Trust Store for the ALB:

  1. Create an AWS Private Certificate Authority. Set the Common Name (CN) to the domain that hosts the application (for example, api.example.com).
  2. Export the CA certificate using either the CLI or the console, and upload the resulting Certificate.pem to an Amazon S3 bucket.
  3. Create a trust store and point it at the certificate uploaded in the previous step.
  4. Update the listener of your Application Load Balancer to use this trust store and select the required mTLS verification behavior.
  5. Generate certificates for the client application against the private certificate authority, for example using the following commands:
openssl req -new -newkey rsa:2048 -days 365 -keyout my_client.key -out my_client.csr

aws acm-pca issue-certificate --certificate-authority-arn arn:aws:acm-pca:us-east-1:111122223333:certificate-authority/certificate_authority_id --csr fileb://my_client.csr --signing-algorithm "SHA256WITHRSA" --validity Value=365,Type="DAYS" --template-arn arn:aws:acm-pca:::template/EndEntityCertificate/V1

aws acm-pca get-certificate --certificate-authority-arn arn:aws:acm-pca:us-east-1:111122223333:certificate-authority/certificate_authority_id --certificate-arn arn:aws:acm-pca:us-east-1:account_id:certificate-authority/certificate_authority_id/certificate/certificate_id --output text

For more details on this part of the process, see Use ACM Private CA for Amazon API Gateway Mutual TLS.

Private API Gateway mTLS verification using an ALB

Using the ALB Verify with trust store mode together with API Gateway can enable private APIs with mTLS, without the operational burden of a self-managed proxy service.

You can use this pattern to access API Gateway in the same AWS account, or cross-account.

Private API Gateway mTLS verification using an ALB

The same account pattern allows clients inside the VPC to consume the private API Gateway by calling the Application Load Balancer URL. The ALB is configured to verify the provided client certificate against the trust store before passing the request to the API Gateway.

If the certificate is invalid, the API never receives the request. A resource policy on the API Gateway ensures that requests are allowed only via the VPC endpoint, and a security group on the VPC endpoint ensures that it can only receive requests from the ALB. This prevents clients from bypassing mTLS by invoking the API Gateway or VPC endpoint directly.
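A resource policy for this purpose commonly takes the following shape: an explicit Deny for any request that does not arrive through the expected VPC endpoint, plus an Allow for invocation. The Python sketch below builds such a policy as a dictionary. The endpoint ID is a hypothetical placeholder, and execute-api:/* applies the statements to all stages, methods, and resources of the API. Check the result against your own API's requirements before use.

```python
import json
from typing import Any, Dict


def private_api_policy(vpce_id: str) -> Dict[str, Any]:
    """Resource policy allowing invocation only through one VPC endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # The explicit deny overrides the allow below for any other source.
                "Effect": "Deny",
                "Principal": "*",
                "Action": "execute-api:Invoke",
                "Resource": "execute-api:/*",
                "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
            },
            {
                "Effect": "Allow",
                "Principal": "*",
                "Action": "execute-api:Invoke",
                "Resource": "execute-api:/*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(private_api_policy("vpce-0123456789abcdef0"), indent=2))
```

The security group on the VPC endpoint, which restricts inbound traffic to the ALB, is configured separately and is not part of this policy.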

Cross-account private API Gateway mTLS using AWS PrivateLink.

The cross-account pattern using AWS PrivateLink provides the ability to connect to the ALB endpoint securely across accounts and across VPCs. It avoids the need to connect VPCs together using VPC Peering or AWS Transit Gateway, and it enables software vendors to deliver SaaS services to their end customers. This pattern is available to deploy as sample code in the GitHub repository.

The flow of a client request through the cross-account architecture is as follows:

  1. A client in the consumer application sends a request to the producer API endpoint.
  2. The request is routed via AWS PrivateLink to a Network Load Balancer in the producer account. The Network Load Balancer is a requirement of AWS PrivateLink services.
  3. The Network Load Balancer uses an Application Load Balancer-type target group.
  4. The Application Load Balancer listener is configured for mTLS in Verify with trust store mode.
  5. An authorization decision is made by comparing the client certificate to the chain in the certificate trust store.
  6. If the client certificate is allowed, the request is routed to the API Gateway via the execute-api VPC endpoint. An API Gateway resource policy allows connections only via the VPC endpoint.
  7. Any additional API Gateway authentication and authorization is performed, such as using a Lambda authorizer to validate a JSON Web Token (JWT).

Using the example deployed from the GitHub repo, this is the expected response from a successful request with a valid certificate:

curl --key my_client.key --cert my_client.pem https://api.example.com/widgets


When passing an invalid certificate, the following response is received:

curl: (35) Recv failure: Connection reset by peer

Custom domain names

An additional benefit to implementing the mTLS solution with an Application Load Balancer is support for private custom domain names. Private API Gateway endpoints do not support custom domain names currently. But in this case, clients first connect to an ALB endpoint, which does support a custom domain. The sample code implements private custom domains using a public AWS Certificate Manager (ACM) certificate on the internal ALB, and an Amazon Route 53 hosted DNS zone. This allows you to provide a static URL to consumers so that if the API Gateway is replaced the consumer does not need to update their code.

Certificate revocation list

Optionally, as another layer of security, you can configure a certificate revocation list for a trust store on the ALB. Revocation lists allow you to revoke and invalidate issued certificates before their expiry date. You can use this feature to off-board customers or deny compromised credentials, for example.

You can add the certificate revocation list to a new or existing trust store. The list is provided via an Amazon S3 URI as a PEM formatted file.


This post explores ways to provide mutual TLS authentication for private API Gateway endpoints. A previous post shows how to achieve this using a self-managed NGINX proxy. This post simplifies the architecture by using the native mTLS support now available for Application Load Balancers.

This new pattern centralizes authentication at the edge, streamlines deployment, and minimizes operational overhead compared to self-managed verification. AWS Private Certificate Authority and certificate revocation lists integrate with managed credentials and security policies. This makes it easier to expose private APIs safely across accounts and VPCs.

Mutual authentication and progressive security controls are growing in importance when architecting secure cloud-based workloads. To get started, visit the GitHub repository.

For more serverless learning resources, visit Serverless Land.

Using generative infrastructure as code with Application Composer

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/using-generative-infrastructure-as-code-with-application-composer/

This post is written by Anna Spysz, Frontend Engineer, AWS Application Composer

AWS Application Composer launched in the AWS Management Console one year ago, and has now expanded to the VS Code IDE as part of the AWS Toolkit. This includes access to a generative AI partner that helps you write infrastructure as code (IaC) for all 1100+ AWS CloudFormation resources that Application Composer now supports.


Application Composer lets you create IaC templates by dragging and dropping cards on a virtual canvas. These represent CloudFormation resources, which you can wire together to create permissions and references. With support for all 1100+ resources that CloudFormation allows, you can now build with everything from AWS Amplify to AWS X-Ray.

Previously, standard CloudFormation resources came only with a basic configuration. Adding an Amplify App resource resulted in the following configuration by default:

    Type: AWS::Amplify::App
    Properties:
      Name: <String>

And in the console:

AWS App Composer in the console


Now, Application Composer in the IDE uses generative AI to generate resource-specific configurations with safeguards such as validation against the CloudFormation schema to ensure valid values.

When working on a CloudFormation or AWS Serverless Application Model (AWS SAM) template in VS Code, you can sign in with your Builder ID and generate multiple suggested configurations in Application Composer. Here is an example of an AI generated configuration for the AWS::Amplify::App type:

AI generated configuration for the Amplify App type


These suggestions are specific to the resource type, and are safeguarded by a check against the CloudFormation schema to ensure valid values or helpful placeholders. You can then select, use, and modify the suggestions to fit your needs.

You now know how to generate a basic example with one resource, but let’s look at building a full application with the help of AI-generated suggestions. This example recreates a serverless application from a Serverless Land tutorial, “Use GenAI capabilities to build a chatbot,” using Application Composer and generative AI-powered code suggestions.

Getting started with the AWS Toolkit in VS Code

If you don’t yet have the AWS Toolkit extension, you can find it under the Extensions tab in VS Code. Install or update it to at least version 2.1.0, so that the screen shows Amazon Q and Application Composer:

Amazon Q and Application Composer


Next, to enable gen AI-powered code suggestions, you must enable Amazon CodeWhisperer using your Builder ID. The easiest way is to open Amazon Q chat, and select Authenticate. On the next screen, select the Builder ID option, then sign in with your Builder ID.

Enable Amazon CodeWhisperer using your Builder ID

After sign-in, your connection appears in the VS Code toolkit panel:

Connection in VS Code toolkit panel

Building with Application Composer

With the toolkit installed and connected with your Builder ID, you are ready to start building.

  1. In a new workspace, create a folder for the application and a blank template.yaml file.
  2. Open this file and initiate Application Composer by choosing the icon in the top right.
Initiate Application Composer

The original tutorial includes this architecture diagram:

Original architecture diagram

First, add the services in the diagram to sketch out the application architecture, which simultaneously creates a deployable CloudFormation template:

  1. From the Enhanced components list, drag in a Lambda function and a Lambda layer.
  2. Double-click the Function resource to edit its properties. Rename the Lambda function’s Logical ID to LexGenAIBotLambda.
  3. Change the Source path to src/LexGenAIBotLambda, and the runtime to Python.
  4. Change the handler value to TextGeneration.lambda_handler, and choose Save.
  5. Double-click the Layer resource to edit its properties. Rename the layer Boto3Layer and change its build method to Python. Change its Source path to src/Boto3PillowPyshorteners.zip.
  6. Finally, connect the layer to the function to add a reference between them. Your canvas looks like this:
Your App Composer canvas

The template.yaml file is now updated to include those resources. In the source directory, you can see some generated function files. You will replace them with the tutorial function and layers later.

In the first step, you added some resources, and Application Composer generated IaC that includes best-practice defaults. Next, you will use standard CloudFormation components.

Using AI for standard components

Start by using the search bar to search for and add several of the Standard components needed for your application.

Search for and add Standard components

  1. In the Resources search bar, enter “lambda” and add the resource type AWS::Lambda::Permission to the canvas.
  2. Enter “iam” in the search bar, and add type AWS::IAM::Policy.
  3. Add two resources of the type AWS::IAM::Role.

Your application now looks like this:

Updated canvas

Some standard resources already include all the keys you need, so you only have to fill in the placeholder values. For example, when you add the AWS::Lambda::Permission resource, replace the placeholder values with:

    FunctionName: !Ref LexGenAIBotLambda
    Action: lambda:InvokeFunction
    Principal: lexv2.amazonaws.com

Other resources, such as the IAM roles and IAM policy, come with only a minimal configuration. This is where you can use the AI assistant. Select an IAM Role resource and choose Generate suggestions to see what the generative AI suggests.

Generate suggestions

Because these suggestions are generated by a large language model (LLM), they may differ from one generation to the next. Each suggestion is checked against the CloudFormation schema, which ensures validity while still offering a range of configurations to choose from.

Generating different configurations gives you an idea of what a resource’s policy should look like, and often gives you keys that you can then fill in with the values you need. Use the following settings for each resource, replacing the generated values where applicable.

  1. Double-click the “Permission” resource to edit its settings. Change its Logical ID to LexGenAIBotLambdaInvoke and replace its Resource configuration with the following, then choose Save:
    Action: lambda:InvokeFunction
    FunctionName: !GetAtt LexGenAIBotLambda.Arn
    Principal: lexv2.amazonaws.com
  2. Double-click the “Role” resource to edit its settings. Change its Logical ID to CfnLexGenAIDemoRole and replace its Resource configuration with the following, then choose Save:
    AssumeRolePolicyDocument:
      Statement:
        - Action: sts:AssumeRole
          Effect: Allow
          Principal:
            Service: lexv2.amazonaws.com
      Version: '2012-10-17'
    ManagedPolicyArns:
      - !Join
        - ''
        - - 'arn:'
          - !Ref AWS::Partition
          - ':iam::aws:policy/AWSLambdaExecute'
  3. Double-click the “Role2” resource to edit its settings. Change its Logical ID to LexGenAIBotLambdaServiceRole and replace its Resource configuration with the following, then choose Save:
    AssumeRolePolicyDocument:
      Statement:
        - Action: sts:AssumeRole
          Effect: Allow
          Principal:
            Service: lambda.amazonaws.com
      Version: '2012-10-17'
    ManagedPolicyArns:
      - !Join
        - ''
        - - 'arn:'
          - !Ref AWS::Partition
          - ':iam::aws:policy/service-role/AWSLambdaBasicExecutionRole'
  4. Double-click the “Policy” resource to edit its settings. Change its Logical ID to LexGenAIBotLambdaServiceRoleDefaultPolicy and replace its Resource configuration with the following, then choose Save:
    PolicyDocument:
      Statement:
        - Action:
            - lex:*
            - logs:*
            - s3:DeleteObject
            - s3:GetObject
            - s3:ListBucket
            - s3:PutObject
          Effect: Allow
          Resource: '*'
        - Action: bedrock:InvokeModel
          Effect: Allow
          Resource: !Join
            - ''
            - - 'arn:aws:bedrock:'
              - !Ref AWS::Region
              - '::foundation-model/anthropic.claude-v2'
      Version: '2012-10-17'
    PolicyName: LexGenAIBotLambdaServiceRoleDefaultPolicy
    Roles:
      - !Ref LexGenAIBotLambdaServiceRole
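As a quick illustration of what the !Join expressions in these resources resolve to, here is a small Python sketch that mimics Fn::Join, which concatenates a list of values with a delimiter (an empty string in this template). The values aws and us-east-1 are assumed example values standing in for !Ref AWS::Partition and !Ref AWS::Region; CloudFormation substitutes the real values at deployment time.

```python
def fn_join(delimiter: str, values: list[str]) -> str:
    """Local stand-in for CloudFormation's Fn::Join: concatenate values with delimiter."""
    return delimiter.join(values)


# Assumed example values for the pseudo parameters; CloudFormation resolves the real ones
partition = "aws"      # !Ref AWS::Partition
region = "us-east-1"   # !Ref AWS::Region

managed_policy_arn = fn_join("", ["arn:", partition, ":iam::aws:policy/AWSLambdaExecute"])
bedrock_model_arn = fn_join(
    "", ["arn:aws:bedrock:", region, "::foundation-model/anthropic.claude-v2"]
)

print(managed_policy_arn)
print(bedrock_model_arn)
```

Note the double colon in the Bedrock ARN: foundation model ARNs have an empty account ID field.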

Once you have updated the properties of each resource, you see the connections and groupings automatically made between them:

Connections and automatic groupings

To add the Amazon Lex bot:

  1. In the resource picker, search for and add the type AWS::Lex::Bot. Here’s another chance to see what configuration the AI suggests.
  2. Change the Amazon Lex bot’s logical ID to LexGenAIBot and update its configuration to the following:
    DataPrivacy:
      ChildDirected: false
    IdleSessionTTLInSeconds: 300
    Name: LexGenAIBot
    RoleArn: !GetAtt CfnLexGenAIDemoRole.Arn
    AutoBuildBotLocales: true
    BotLocales:
      - Intents:
          - InitialResponseSetting:
              CodeHook:
                EnableCodeHookInvocation: true
                IsActive: true
                PostCodeHookSpecification: {}
              InitialResponse:
                MessageGroupsList:
                  - Message:
                      PlainTextMessage:
                        Value: Hi there, I'm a GenAI Bot. How can I help you?
            Name: WelcomeIntent
            SampleUtterances:
              - Utterance: Hi
              - Utterance: Hey there
              - Utterance: Hello
              - Utterance: I need some help
              - Utterance: Help needed
              - Utterance: Can I get some help?
          - FulfillmentCodeHook:
              Enabled: true
              IsActive: true
              PostFulfillmentStatusSpecification: {}
            InitialResponseSetting:
              CodeHook:
                EnableCodeHookInvocation: true
                IsActive: true
                PostCodeHookSpecification: {}
            Name: GenerateTextIntent
            SampleUtterances:
              - Utterance: Generate content for
              - Utterance: 'Create text '
              - Utterance: 'Create a response for '
              - Utterance: Text to be generated for
          - FulfillmentCodeHook:
              Enabled: true
              IsActive: true
              PostFulfillmentStatusSpecification: {}
            InitialResponseSetting:
              CodeHook:
                EnableCodeHookInvocation: true
                IsActive: true
                PostCodeHookSpecification: {}
            Name: FallbackIntent
            ParentIntentSignature: AMAZON.FallbackIntent
        LocaleId: en_US
        NluConfidenceThreshold: 0.4
    Description: Bot created demonstration of GenAI capabilities.
    TestBotAliasSettings:
      BotAliasLocaleSettings:
        - BotAliasLocaleSetting:
            CodeHookSpecification:
              LambdaCodeHook:
                CodeHookInterfaceVersion: '1.0'
                LambdaArn: !GetAtt LexGenAIBotLambda.Arn
            Enabled: true
          LocaleId: en_US
  3. Choose Save on the resource.

Once all of your resources are configured, your application looks like this:

New AI generated canvas

Adding function code and deployment

Once your architecture is defined, review and refine your template.yaml file. For a detailed reference and to ensure all your values are correct, visit the GitHub repository and check against the template.yaml file.

  1. Copy the Lambda layer directly from the repository, and add it to ./src/Boto3PillowPyshorteners.zip.
  2. In the function's source directory (src/LexGenAIBotLambda), rename the generated handler.py to TextGeneration.py. You can also delete any unnecessary files.
  3. Open TextGeneration.py and replace the placeholder code with the following:
    import json
    import boto3
    import os
    import logging
    from botocore.exceptions import ClientError

    LOG = logging.getLogger()
    # Set the level explicitly so the LOG.info calls below are emitted
    LOG.setLevel(logging.INFO)

    region_name = os.getenv("region", "us-east-1")
    s3_bucket = os.getenv("bucket")
    model_id = os.getenv("model_id", "anthropic.claude-v2")

    # Bedrock client used to interact with APIs around models
    bedrock = boto3.client(service_name="bedrock", region_name=region_name)
    # Bedrock Runtime client used to invoke and question the models
    bedrock_runtime = boto3.client(service_name="bedrock-runtime", region_name=region_name)

    def get_session_attributes(intent_request):
        session_state = intent_request["sessionState"]
        if "sessionAttributes" in session_state:
            return session_state["sessionAttributes"]
        return {}

    def close(intent_request, session_attributes, fulfillment_state, message):
        # Build a Lex V2 "Close" response that ends the intent with the given state
        intent_request["sessionState"]["intent"]["state"] = fulfillment_state
        return {
            "sessionState": {
                "sessionAttributes": session_attributes,
                "dialogAction": {"type": "Close"},
                "intent": intent_request["sessionState"]["intent"],
            },
            "messages": [message],
            "sessionId": intent_request["sessionId"],
            "requestAttributes": intent_request["requestAttributes"]
            if "requestAttributes" in intent_request
            else None,
        }

    def lambda_handler(event, context):
        LOG.info(f"Event is {event}")
        accept = "application/json"
        content_type = "application/json"
        prompt = event["inputTranscript"]

        try:
            # Claude v2 text-completion request body
            request = json.dumps(
                {
                    "prompt": "\n\nHuman:" + prompt + "\n\nAssistant:",
                    "max_tokens_to_sample": 4096,
                    "temperature": 0.5,
                    "top_k": 250,
                    "top_p": 1,
                    # Real newlines, so generation stops if the model starts a new "Human:" turn
                    "stop_sequences": ["\n\nHuman:"],
                }
            )

            response = bedrock_runtime.invoke_model(
                body=request,
                modelId=model_id,
                accept=accept,
                contentType=content_type,
            )

            response_body = json.loads(response.get("body").read())
            LOG.info(f"Response body: {response_body}")
            response_message = {
                "contentType": "PlainText",
                "content": response_body["completion"],
            }
            session_attributes = get_session_attributes(event)
            fulfillment_state = "Fulfilled"
            return close(event, session_attributes, fulfillment_state, response_message)

        except ClientError as e:
            LOG.error(f"Exception raised during execution and the error is {e}")
            # Return a valid Lex response instead of None so the bot can still reply
            error_message = {
                "contentType": "PlainText",
                "content": "Sorry, I couldn't generate a response. Please try again.",
            }
            return close(event, get_session_attributes(event), "Failed", error_message)
  4. To deploy the infrastructure, go back to Application Composer and choose the Sync icon. Follow the guided AWS SAM instructions to complete the deployment.
App Composer Sync

After the message "SAM Sync succeeded" appears, navigate to CloudFormation in the AWS Management Console to see the newly created resources. To continue building the chatbot, follow the rest of the original tutorial.


This guide demonstrates how AI-generated CloudFormation can streamline your workflow in Application Composer, enhance your understanding of resource configurations, and speed up the development process. As always, adhere to the AWS Responsible AI Policy when using these features.

Introducing Amazon MQ cross-Region data replication for ActiveMQ brokers

Post Syndicated from Pascal Vogel original https://aws.amazon.com/blogs/compute/introducing-amazon-mq-cross-region-data-replication-for-activemq-brokers/

This post is written by Dominic Gagné, Senior Software Development Engineer, and Vinodh Kannan Sadayamuthu, Senior Solutions Architect

Amazon MQ now supports cross-Region data replication for ActiveMQ brokers. This feature enables you to build regionally resilient messaging applications and makes it easier to set up cross-Region message replication between ActiveMQ brokers in Amazon MQ. This blog post explains how cross-Region data replication works in Amazon MQ, how to set up cross-Region replica brokers for ActiveMQ, and how to test promoting a replica broker.

Amazon MQ is a managed message broker service for Apache ActiveMQ and RabbitMQ that simplifies setting up and operating message brokers on AWS.

Cross-Region replication improves the resilience and disaster recovery capabilities of your systems. This new Amazon MQ feature makes it easier to increase resilience of your ActiveMQ messaging systems across AWS Regions.

How cross-Region data replication works in Amazon MQ for ActiveMQ

The Amazon MQ for ActiveMQ cross-Region data replication feature replicates broker state from the primary broker in one AWS Region to the replica broker in another Region. Broker state consists of messages that have been sent to a broker by a message producer. Additionally, message acknowledgments and transactions are replicated. Scheduled messages and broker XML configuration are not replicated from the primary to the replica broker.

State replication occurs asynchronously and runs in the background. When a message is sent to a broker with cross-Region data replication enabled, the data is persisted both to the primary data store and to a queue used to replicate data. The replica broker acts as a client of this queue and consumes data that represents broker state from the primary broker.

At any given moment, only the primary broker is available for client connections. The replica broker is a hot standby that passively replicates the primary broker’s state but does not accept client connections. The following diagram shows a simplified version of a cross-Region data replication broker pair. All replication traffic is encrypted using TLS and remains on the AWS private backbone.

Amazon MQ for ActiveMQ cross-region data replication architecture

Configuring cross-Region replica brokers for Amazon MQ for ActiveMQ

To set up a cross-Region replica broker, your Amazon MQ for ActiveMQ primary broker must meet the following eligibility criteria:

  • ActiveMQ version 5.17.6 or above
  • Instance size m5.large or higher
  • Active/standby broker deployment enabled
  • Be in the Running state

If you do not have an ActiveMQ broker that meets these criteria, see Creating and configuring an ActiveMQ broker for instructions on how to create a primary broker.

To configure cross-Region replication

  1. Navigate to the Amazon MQ console and choose Create replica broker.
    Amazon MQ console create replica broker
  2. Select a primary broker from the list of eligible primary brokers and choose Next.
    Amazon MQ console choose primary broker
  3. Under Replica broker details, select the Region for your replica broker and enter a Replica broker name.
    Amazon MQ console configure replica broker
  4. In the ActiveMQ console user for replica broker panel, enter a Username and Password for broker access.
    Amazon MQ console user for replica broker
  5. In the Data replication user to bridge access between brokers panel, enter a replication user Username and Password.
    Amazon MQ console user for replica broker
  6. In the Additional settings panel, keep the defaults and choose Next.
  7. Review the settings and choose Create replica broker.
    Note: The broker access type is automatically set based on the primary broker access type.
    Amazon MQ console create replica broker setting summary
  8. The creation process takes up to 25 minutes. Once the replica broker creation is complete, begin replication between the primary and the replica brokers by rebooting the primary broker.
  9. Once the primary broker is rebooted and its status is Running, you can see the replica details in the Data replication panel of the primary broker.
    Amazon MQ console broker replication details

Both brokers now synchronize with each other to establish an inter-Region network and connection through which broker state is replicated. Once both brokers are in the Running state, the primary broker accepts client connections and passes all broker state changes (messages, acknowledgments, transactions, etc.) to the replica broker.

The replica broker now asynchronously mirrors the state of the primary broker. However, it does not become available for client connections until it is promoted via a switchover or a failover. These operations are covered in the following section.

Testing data replication and promoting the replica broker

There are two ways to promote a replica broker: initiating a switchover or a failover.

Switchover:
  • Prioritizes consistency over availability.
  • Brokers are guaranteed to have identical states.
  • Brokers may not be available immediately to serve client traffic.

Failover:
  • Prioritizes availability over consistency.
  • Brokers are not guaranteed to be in identical states.
  • Replica broker is immediately available to serve client traffic.

To initiate a failover or switchover

    1. Navigate to the Amazon MQ console, choose your primary broker, and log in to the ActiveMQ Web Console using the URLs located in the Connections panel.
    2. In the top menu, select Queues. You should be able to see four ActiveMQ.Plugin.Replication queues used by the replication feature.
      Active MQ console queues
    3. To test message replication from the primary to a replica broker, create a queue and send messages. To create the queue:
      • For Queue Name, enter TestQueue.
      • Choose Create.

      ActiveMQ console create queue

    4. Under Operations for the TestQueue, choose Send To and perform the following steps:
      • For Number of messages to send, enter 10 and keep the other defaults.
      • Under Message body, enter a test message.
      • Choose Send.

      ActiveMQ console send test message

    5. To promote the replica broker, navigate to the Amazon MQ console and change the Region to the AWS Region where the replica broker is located.
    6. Select the replica broker (in this example called Secondarybroker) and choose Promote replica.
      Amazon MQ console promote broker
    7. In the Promote replica broker pop-up window:
      • Select Failover or Switchover.
      • Enter confirm in the text box.
      • Choose Confirm.

      Amazon MQ console confirm broker promotion

    8. While a replica broker is being promoted, its replication status changes to Promotion in progress. The corresponding primary broker’s replication status changes to Demotion in progress.

Replica Secondarybroker status – Promotion in progress:

Replica Secondarybroker status - Promotion in progress

Primary broker status – Demotion in progress:

Primary broker status - Demotion in progress

Secondarybroker status – Promoted to new primary broker:

Secondarybroker status – Promoted to new primary broker

    9. Once the Secondarybroker status is Running, log in to the ActiveMQ Web Console from the URLs located in the Connections panel. You can see the replicated messages sent from the former primary broker in Step 4 in the TestQueue:
      Replicated message from primary broker in TestQueue

Monitoring cross-Region data replication

To monitor cross-Region data replication progress, you can use the Amazon CloudWatch metrics TotalReplicationLag and ReplicationLag.

Amazon CloudWatch metrics TotalReplicationLag and ReplicationLag

You can use these two metrics to monitor the progress of a switchover. When both values reach zero, the broker states are synchronized, the switchover completes, and the replica broker begins accepting client connections. If the switchover does not progress fast enough, or if you need the replica broker to be immediately available to serve client traffic, you can request a failover at any time.

Note: A failover can interrupt an ongoing switchover. However, a switchover cannot interrupt an ongoing failover.

Issuing a failover request causes the replica broker to become immediately available, but does not provide any guarantees about what data has been replicated to the replica broker. This means that a failover can make data tracking and reconciliation more challenging for your client application than a switchover.

For this reason, we recommend that you always start with a switchover and interrupt it with a failover if necessary. To interrupt an ongoing switchover, follow the same steps as for promoting a replica broker, select the failover option, and confirm.
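The recommended strategy (start with a switchover, fall back to a failover only if needed) can be sketched as a small polling loop. This is a minimal sketch, not an Amazon MQ API: get_replication_lag, start_switchover, and start_failover are hypothetical callables you would wire up yourself, for example by reading the TotalReplicationLag metric from CloudWatch and promoting the replica through the console, CLI, or SDK. The clock and sleep parameters exist only so the loop can be tested without actually waiting.

```python
import time
from typing import Callable


def promote_with_fallback(
    get_replication_lag: Callable[[], float],
    start_switchover: Callable[[], None],
    start_failover: Callable[[], None],
    timeout_seconds: float,
    poll_seconds: float = 30.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Start a switchover; if replication lag has not reached zero in time, fail over."""
    start_switchover()
    deadline = clock() + timeout_seconds
    while clock() < deadline:
        # Lag of zero means the broker states are synchronized and the switchover completes
        if get_replication_lag() == 0:
            return "switchover"
        sleep(poll_seconds)
    # Switchover is taking too long: trade consistency for availability
    start_failover()
    return "failover"
```

Choose timeout_seconds based on how long your clients can tolerate an unavailable broker; after a failover, expect to reconcile any data that had not yet been replicated.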

Note: If you fail back to the original primary broker, messages that were not replicated from the primary to the replica broker during the failover still exist on the primary broker, so consumers must handle them. We recommend tracking processed message IDs in a data store such as Amazon DynamoDB global tables and checking each incoming message's ID against the stored IDs before processing it.
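Here is a minimal sketch of that deduplication idea in Python. The class name IdempotentConsumer and the in-memory set are illustrative assumptions; a real consumer would keep the IDs in a durable, cross-Region store such as the DynamoDB global table mentioned above, typically with a conditional write (for example ConditionExpression="attribute_not_exists(message_id)", where message_id is a hypothetical attribute name) so that checking and recording an ID is atomic across consumers.

```python
from typing import Callable, Set


class IdempotentConsumer:
    """Skip messages whose IDs were already processed.

    processed_ids is an in-memory set for illustration only; it does not survive
    restarts or span Regions the way a DynamoDB global table would.
    """

    def __init__(self, handler: Callable[[str, str], None]) -> None:
        self.handler = handler
        self.processed_ids: Set[str] = set()

    def on_message(self, message_id: str, body: str) -> bool:
        """Process the message once; return False if it was a duplicate."""
        if message_id in self.processed_ids:
            return False  # already handled, for example redelivered after failing back
        self.handler(message_id, body)
        # Record the ID only after the handler succeeds so failed messages can be retried
        self.processed_ids.add(message_id)
        return True
```

Because the ID is recorded only after the handler succeeds, a crash between the two steps leads to reprocessing rather than message loss, so the handler should still tolerate an occasional repeat.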

If you no longer need to replicate broker data across Regions or if you need to delete a primary or replica broker, you must unpair the replica broker and reboot the primary broker. You can unpair the replica broker in the Amazon MQ console by following Delete a CRDR broker.

To unpair the broker using the AWS Command Line Interface (AWS CLI), run the following command, replacing the --broker-id with your primary broker ID:

aws mq update-broker --broker-id <primary broker ID> \
--data-replication-mode "NONE" \
--region us-east-1
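If you automate the unpairing, the same two steps (turn off replication, then reboot the primary broker) might look like this with the AWS SDK for Python (boto3). This is a sketch under the assumption that boto3 exposes the CLI's --data-replication-mode option as the DataReplicationMode parameter of update_broker; verify this against your SDK version. The client is passed in so the function can be exercised without AWS credentials.

```python
def unpair_replica(mq_client, primary_broker_id: str) -> None:
    """Turn off cross-Region data replication on the primary broker, then reboot it.

    mq_client is expected to be boto3.client("mq", region_name=<primary broker Region>).
    """
    # Intended to mirror: aws mq update-broker --broker-id <id> --data-replication-mode "NONE"
    mq_client.update_broker(BrokerId=primary_broker_id, DataReplicationMode="NONE")
    # The primary broker must be rebooted for the change to take effect
    mq_client.reboot_broker(BrokerId=primary_broker_id)
```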


Using the cross-Region data replication feature for Amazon MQ for ActiveMQ provides a straightforward way to implement cross-Region replication to improve the resilience of your architecture and meet your business continuity and disaster recovery requirements. This post explains how cross-Region data replication works in Amazon MQ, how to set up a cross-Region replica broker, and how to test and promote the replica broker.

For more details, see the Amazon MQ documentation.

For more serverless learning resources, visit Serverless Land.

Python 3.12 runtime now available in AWS Lambda

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/python-3-12-runtime-now-available-in-aws-lambda/

This post is written by Jeff Gebhart, Sr. Specialist TAM, Serverless.

AWS Lambda now supports Python 3.12 as both a managed runtime and container base image. Python 3.12 builds on the performance enhancements that were first released with Python 3.11, and adds a number of performance and language readability features in the interpreter. With this release, Python developers can now take advantage of these new features and enhancements when creating serverless applications on AWS Lambda.

You can use Python 3.12 with Powertools for AWS Lambda (Python), a developer toolkit to implement serverless best practices such as observability, batch processing, Parameter Store integration, idempotency, feature flags, CloudWatch Metrics, and structured logging among other features.

You can also use Python 3.12 with Lambda@Edge, allowing you to customize low-latency content delivered through Amazon CloudFront.

Python is a popular language for building serverless applications. The Python 3.12 release has a number of interpreter and syntactic improvements.

At launch, new Lambda runtimes receive less usage than existing, established runtimes. This can result in longer cold start times due to reduced cache residency within internal Lambda sub-systems. Cold start times typically improve in the weeks following launch as usage increases. As a result, AWS recommends not drawing conclusions from side-by-side performance comparisons with other Lambda runtimes until the performance has stabilized. Since performance is highly dependent on workload, customers with performance-sensitive workloads should conduct their own testing, instead of relying on generic test benchmarks.

Lambda runtime changes

Amazon Linux 2023

The Python 3.12 runtime is based on the provided.al2023 runtime, which is based on the Amazon Linux 2023 minimal container image. This OS update brings several improvements over the Amazon Linux 2 (AL2)-based OS used for Lambda Python runtimes from Python 3.8 to Python 3.11.

provided.al2023 contains only the essential components necessary to install other packages and offers a smaller deployment footprint of less than 40MB compared to over 100MB for Lambda’s AL2-based images.

With glibc version 2.34, customers have access to a modern version of glibc, updated from version 2.26 in AL2-based images.

The Amazon Linux 2023 minimal image uses microdnf as a package manager, symlinked as dnf. This replaces the yum package manager used in earlier AL2-based images. If you deploy your Lambda functions as container images, you must update your Dockerfiles to use dnf instead of yum when upgrading to the Python 3.12 base image.

Additionally, curl and gnupg2 are included in their minimal versions, curl-minimal and gnupg2-minimal.

Learn more about the provided.al2023 runtime in the blog post Introducing the Amazon Linux 2023 runtime for AWS Lambda and the Amazon Linux 2023 launch blog post.

Response format change

Starting with the Python 3.12 runtime, functions return Unicode characters as part of their JSON response. Previous versions return escaped sequences for Unicode characters in responses.

For example, in Python 3.11, if you return a Unicode string such as “こんにちは”, it escapes the Unicode characters and returns “\u3053\u3093\u306b\u3061\u306f”. The Python 3.12 runtime returns the original “こんにちは”.

This change reduces the size of the payload returned by Lambda. In the previous example, the escaped version is 32 bytes compared to 17 bytes with the Unicode string. Using Unicode responses reduces the size of Lambda responses, making it easier to fit larger responses into the 6MB Lambda response (synchronous) limit.

When upgrading to Python 3.12, you may need to adjust your code in other modules to account for this new behavior. If the caller expects escaped Unicode based on the previous runtime behavior, you must either add code to the returning function to escape the Unicode manually, or adjust the caller to handle the Unicode return.
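The byte counts above can be reproduced with Python's json module, where ensure_ascii=True (the default) produces the escaped form and ensure_ascii=False keeps the characters as-is. This only models the two encodings; it is not the Lambda runtime's own serializer. It also shows why callers that parse the payload as JSON see the same value either way, while callers that compare raw response text are the ones that need changes.

```python
import json

greeting = "こんにちは"

# Escaped form, matching the Python 3.11 runtime's response
escaped = json.dumps(greeting)
# Unescaped form, matching the Python 3.12 runtime's response
unescaped = json.dumps(greeting, ensure_ascii=False)

print(escaped, len(escaped.encode("utf-8")))
print(unescaped, len(unescaped.encode("utf-8")))

# A JSON-aware caller decodes both payloads to the same string
decoded_equal = json.loads(escaped) == json.loads(unescaped) == greeting
print(decoded_equal)
```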

Extensions processing for graceful shutdown

Lambda functions with external extensions can now benefit from improved graceful shutdown capabilities. When the Lambda service is about to shut down the runtime, it sends a SIGTERM signal to the runtime and then a SHUTDOWN event to each registered external extension.

These events are sent each time an execution environment shuts down. This allows you to catch the SIGTERM signal in your Lambda function and clean up resources, such as database connections, which were created by the function.
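A minimal sketch of catching SIGTERM in a Python function follows. Per the section above, the runtime receives SIGTERM when the function has external extensions registered; FakeConnection is a hypothetical placeholder for a real resource such as a database connection created during initialization.

```python
import signal
import sys


class FakeConnection:
    """Hypothetical stand-in for a resource such as a database connection."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


# Created once during the init phase and reused across invocations
connection = FakeConnection()


def handle_sigterm(signum, frame):
    # Runs when Lambda sends SIGTERM before shutting down the execution environment
    print("[runtime] SIGTERM received, closing connection")
    connection.close()
    sys.exit(0)


signal.signal(signal.SIGTERM, handle_sigterm)


def lambda_handler(event, context):
    return {"statusCode": 200}
```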

To learn more about the Lambda execution environment lifecycle, see Lambda execution environment. More details and examples of how to use graceful shutdown with extensions are available in the AWS Samples GitHub repository.

New Python features

Comprehension inlining

With the implementation of PEP 709, dictionary, list, and set comprehensions are now inlined. Prior versions create a single-use function to execute each comprehension. Removing this overhead makes comprehensions execute up to twice as fast.

There are some behavior changes to comprehensions because of this update. For example, a call to the ‘locals()’ function from within the comprehension now includes objects from the containing scope, not just within the comprehension itself as in prior versions. You should test functions you are migrating from an earlier version of Python to Python 3.12.
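The locals() difference can be observed directly. This sketch calls locals() inside a list comprehension within a function and prints the names it reports together with the interpreter version; the function and variable names are illustrative. On Python 3.12 the enclosing function's variable outer should appear in the result, while on earlier versions it should not.

```python
import sys


def names_visible_in_comprehension():
    outer = "defined in the enclosing function"
    # Call locals() from inside a list comprehension and collect the names it reports
    return [sorted(locals()) for _ in range(1)][0]


names = names_visible_in_comprehension()
print(sys.version_info[:2], names)
```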

Typing changes

Python 3.12 continues the evolution of type annotations in Python. PEP 695 introduces a new, more compact syntax for generic classes and functions, and adds a new “type” statement for creating type aliases. Type aliases are evaluated lazily, on demand, which permits aliases to refer to types defined later.

Type parameters are visible within the scope of the declaration and any nested scopes, but not in the outer scope.
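For comparison, this is how a generic function and class are declared with an explicit TypeVar, the style that PEP 695 makes more compact. The PEP 695 spelling appears only in comments because it is a syntax error on interpreters older than 3.12; the names first, Box, and Pair are illustrative.

```python
from typing import Generic, TypeVar

# Pre-3.12 style: the type parameter is a module-level object
T = TypeVar("T")


def first(items: list[T]) -> T:
    """Return the first element of a list."""
    return items[0]


class Box(Generic[T]):
    def __init__(self, item: T) -> None:
        self.item = item


# Python 3.12 (PEP 695) spelling of the same declarations, kept as comments so this
# block also runs on older interpreters:
#   def first[T](items: list[T]) -> T: ...
#   class Box[T]:
#       def __init__(self, item: T) -> None: ...
#   type Pair[T] = tuple[T, T]
```

With the PEP 695 form, T exists only inside the declaration, which is the scoping rule described above.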

Formalization of f-strings

One of the largest changes in Python 3.12, the formalization of f-string syntax, is covered by PEP 701. Any valid expression can now be contained within an f-string, including other f-strings.

In prior versions of Python, reusing the enclosing quote character inside an f-string's replacement field results in a syntax error. With Python 3.12, quote reuse is fully supported, including in nested f-strings, as in the following example:

>>> songs = ['Take me back to Eden', 'Alkaline', 'Ascensionism']
>>> f"This is the playlist: {", ".join(songs)}"
'This is the playlist: Take me back to Eden, Alkaline, Ascensionism'
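On interpreters older than 3.12, the same output needs a workaround, such as switching to a different quote character inside the replacement field or moving the expression into a variable first. This sketch shows both approaches; it runs on any Python 3.6+ interpreter.

```python
songs = ["Take me back to Eden", "Alkaline", "Ascensionism"]

# Workaround 1: use a different quote character inside the replacement field
playlist = f"This is the playlist: {', '.join(songs)}"

# Workaround 2: move the string literal out of the f-string entirely
separator = ", "
playlist_alt = f"This is the playlist: {separator.join(songs)}"

print(playlist)
```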

Additionally, any valid Python expression can be contained within an f-string. This includes multi-line expressions and the ability to embed comments within an f-string.

Before Python 3.12, the “\” character was not permitted in the expression part (the replacement fields) of an f-string. This prevented, for example, using the “\N” syntax for named Unicode characters inside a replacement field.

Asyncio improvements

There are a number of improvements to the asyncio module. These include faster socket writes and a new implementation of asyncio.current_task() that can be 4–6 times faster. Event loops now optimize their child watchers for their underlying environment.

Using Python 3.12 in Lambda

AWS Management Console

To use the Python 3.12 runtime to develop your Lambda functions, specify the runtime parameter value python3.12 when creating or updating a function. The Python 3.12 version is now available in the Runtime dropdown on the Create function page:

To update an existing Lambda function to Python 3.12, navigate to the function in the Lambda console and choose Edit in the Runtime settings panel. The new version of Python is available in the Runtime dropdown:
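If you prefer to script the runtime change rather than use the console, here is a sketch with the AWS SDK for Python (boto3). update_function_configuration with the Runtime parameter is the SDK call that changes a function's runtime; the function name my-function is a placeholder, and the client is passed in so the code can run without credentials.

```python
def upgrade_to_python312(lambda_client, function_name: str) -> dict:
    """Switch an existing function to the python3.12 managed runtime.

    lambda_client is expected to be boto3.client("lambda").
    """
    response = lambda_client.update_function_configuration(
        FunctionName=function_name,
        Runtime="python3.12",  # API value behind "Python 3.12" in the console dropdown
    )
    return response
```

In practice you would call upgrade_to_python312(boto3.client("lambda"), "my-function") and then test the function, since behaviors such as the Unicode response format differ in the new runtime.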

AWS Lambda container image

Change the Python base image version by modifying the FROM statement in your Dockerfile:

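FROM public.ecr.aws/lambda/python:3.12
# Copy function code
COPY lambda_handler.py ${LAMBDA_TASK_ROOT}
# Set the handler as <file name>.<function name>; "handler" is an example function name
CMD [ "lambda_handler.handler" ]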

Customers running the Python 3.12 Docker images locally, including customers using AWS SAM, must upgrade their Docker install to version 20.10.10 or later.

AWS Serverless Application Model (AWS SAM)

In AWS SAM, set the Runtime attribute to python3.12 to use this version.

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Simple Lambda Function

Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Description: My Python Lambda Function
      CodeUri: my_function/
      Handler: lambda_function.lambda_handler
      Runtime: python3.12

AWS SAM supports generating this template with Python 3.12 for new serverless applications using the `sam init` command. Refer to the AWS SAM documentation.

AWS Cloud Development Kit (AWS CDK)

In AWS CDK, set the runtime attribute to Runtime.PYTHON_3_12 to use this version. In Python CDK:

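from constructs import Construct
from aws_cdk import App, Stack, aws_lambda as _lambda

class SampleLambdaStack(Stack):
    def __init__(self, scope: Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)
        base_lambda = _lambda.Function(self, 'SampleLambda',
                                       handler='lambda_handler.handler',
                                       runtime=_lambda.Runtime.PYTHON_3_12,
                                       code=_lambda.Code.from_asset('lambda'))

app = App()
SampleLambdaStack(app, 'SampleLambdaStack')
app.synth()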

In TypeScript CDK:

import * as cdk from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as path from 'path';
import { Construct } from 'constructs';

export class CdkStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // The code that defines your stack goes here

    // The python3.12 enabled Lambda Function
    const lambdaFunction = new lambda.Function(this, 'python312LambdaFunction', {
      runtime: lambda.Runtime.PYTHON_3_12,
      memorySize: 512,
      code: lambda.Code.fromAsset(path.join(__dirname, '/../lambda')),
      handler: 'lambda_handler.handler'
    });
  }
}

Lambda now supports Python 3.12. This release uses the Amazon Linux 2023 OS and supports Unicode responses, graceful shutdown for functions with external extensions, and Python 3.12 language features.

You can build and deploy Python 3.12 functions using the AWS Management Console, AWS CLI, AWS SDK, AWS SAM, AWS CDK, or your choice of Infrastructure as Code (IaC) tool. You can also use the Python 3.12 container base image if you prefer to build and deploy your functions using container images.

Python 3.12 runtime support helps developers to build more efficient, powerful, and scalable serverless applications. Try the Python 3.12 runtime in Lambda today and experience the benefits of this updated language version.

For more serverless learning resources, visit Serverless Land.

Introducing support for read-only management events in Amazon EventBridge

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/introducing-support-for-read-only-management-events-in-amazon-eventbridge/

This post is written by Pawan Puthran, Principal Specialist TAM, Serverless, and Heeki Park, Principal Solutions Architect, Serverless.

Today, AWS is announcing support for read-only management events in Amazon EventBridge. This feature enables customers to build rich event-driven responses to any action taken on AWS infrastructure to detect security vulnerabilities or identify suspicious activity in near real time. You can now gain insight into all activity across all your AWS accounts and respond to those events as appropriate.


EventBridge is a serverless event bus used to decouple event producers and consumers. Event producers publish events onto an event bus, which then uses rules to determine where to send those events. The rules determine the downstream targets that receive and process the events, and EventBridge routes the events accordingly.

EventBridge allows customers to monitor, audit, and react, in near real-time, to changes in their AWS environments through events generated by AWS CloudTrail for AWS API calls. CloudTrail records actions taken by a user, role, or an AWS service as events in a trail. Events include actions taken in the AWS Management Console, AWS Command Line Interface (CLI), and AWS SDKs and APIs.

Previously, only mutating API calls were published from CloudTrail to EventBridge for control plane changes, which are also referred to as management events. Mutating API calls include those that create, update, or delete resources. EventBridge now also supports non-mutating, or read-only, management events at no additional cost. These include API calls that list, get, or describe resources.

CloudTrail events in EventBridge enable you to build rich event-driven responses to any action taken on AWS infrastructure in near real time. Previously, customers and partners often used a polling model to iterate over batches of CloudTrail logs in Amazon S3 buckets to detect issues, which made them slower to respond. With read-only management events, you can detect and remediate issues in near real time and improve your overall security posture.

Enabling read-only management events

You can start receiving read-only management events if a CloudTrail trail is configured in your account and its event selector has a ReadWriteType of either All or ReadOnly. This ensures that read-only management events are logged by CloudTrail and then passed to EventBridge.
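If you manage the trail programmatically, the relevant setting is the event selector's ReadWriteType. The sketch below uses boto3's CloudTrail `put_event_selectors`; the helper name and trail name are hypothetical. Note that this call replaces the trail's existing basic event selectors, so in a real trail you would merge in any data event selectors you already rely on (trails that use advanced event selectors are configured differently).

```python
def enable_read_only_management_events(cloudtrail_client, trail_name: str) -> list:
    """Configure a trail to log both read and write management events.

    ReadWriteType "All" keeps the existing write (mutating) events and adds
    read-only ones such as List*, Get*, and Describe* calls.
    """
    selectors = [
        {
            "ReadWriteType": "All",  # "ReadOnly" would also deliver read-only events
            "IncludeManagementEvents": True,
        }
    ]
    # Overwrites the trail's current basic event selectors with this list.
    cloudtrail_client.put_event_selectors(TrailName=trail_name, EventSelectors=selectors)
    return selectors
```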

For example, you can receive an alert if a production account lists resources from an IP address outside of your VPC. Another example could be if an entity, such as a principal (AWS account root user, IAM role, or IAM user), calls the ListInstanceProfilesForRole or DescribeInstances APIs without any prior record of doing so. A malicious actor could use stolen credentials for conducting this reconnaissance to find more valuable credentials or determine the capabilities of the credentials they have.

To enable read-only management events with an EventBridge rule, in addition to the existing mutating events, use the new ENABLED_WITH_ALL_CLOUDTRAIL_MANAGEMENT_EVENTS state when creating the rule.

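  ReadOnlyManagementEventsRule:
    Type: 'AWS::Events::Rule'
    Properties:
      Description: 'Example for enabling read-only management events'
      State: 'ENABLED_WITH_ALL_CLOUDTRAIL_MANAGEMENT_EVENTS'
      EventPattern:
        source: [aws.s3]
        detail-type: ['AWS API Call via CloudTrail']
      Targets:
        - Arn: 'arn:aws:sns:us-east-1:123456789012:notificationTopic'
          Id: 'NotificationTarget'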

You can also create a rule on an event bus to specify a particular API action by using the eventName attribute under the detail key:

aws events put-rule --name "SampleTestRule" \
--event-pattern '{"detail": {"eventName": ["ListBuckets"]}}' \
--state ENABLED_WITH_ALL_CLOUDTRAIL_MANAGEMENT_EVENTS
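The SDK equivalent passes the same state through the `State` parameter of `put_rule`. A minimal boto3 sketch; the helper name is hypothetical, and because no `EventBusName` is given, the rule is created on the default event bus.

```python
import json


def create_read_only_rule(events_client, rule_name: str, event_names: list) -> str:
    """Create a default-bus rule that also matches read-only management events."""
    pattern = {"detail": {"eventName": event_names}}
    response = events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(pattern),
        # Plain "ENABLED" would only receive mutating management events.
        State="ENABLED_WITH_ALL_CLOUDTRAIL_MANAGEMENT_EVENTS",
    )
    return response["RuleArn"]
```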

Common scenarios and use-cases

The following section describes a couple of scenarios where you can set up EventBridge rules and act on non-mutating, or read-only, management events.

Detecting anomalous Secrets Manager GetSecretValue API Calls

Consider a security team in your organization that wants a notification whenever GetSecretValue API calls for AWS Secrets Manager are made through the CLI. The team can then investigate whether these calls are made by entities outside the organization or by an unauthorized user, and take corrective action to deny such requests.

When an application or user calls the GetSecretValue API through the CLI to retrieve a secret, it generates an event like this:

{
    "version": "0",
    "id": "d3368cc1-e6d6-e4bf-e58e-030f03b6eae3",
    "detail-type": "AWS API Call via CloudTrail",
    "source": "aws.secretsmanager",
    "account": "111111111111",
    "time": "2023-11-08T19:58:38Z",
    "region": "us-east-1",
    "resources": [],
    "detail": {
        "eventVersion": "1.08",
        "userIdentity": {
            "type": "IAMUser",
            "principalId": "AAAAAAAAAAAAA",
            "arn": "arn:aws:iam::111111111111:user/USERNAME"
            // ... additional detail fields
        },
        "eventTime": "2023-11-08T19:58:38Z",
        "eventSource": "secretsmanager.amazonaws.com",
        "eventName": "GetSecretValue",
        "awsRegion": "us-east-1",
        "userAgent": "aws-cli/2.13.15 Python/3.11.4 Darwin/22.6.0 exe/x86_64 prompt/off command/secretsmanager.get-secret-value"
        // ... additional detail fields
    }
}

You set the following event pattern on the rule to filter incoming events to specific consumers. This example also uses the recently launched wildcard filter for event matching.

{
    "source": ["aws.secretsmanager"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventName": ["GetSecretValue"],
        "userAgent": [{"wildcard": "aws-cli/*"}]
    }
}

You can create a rule matching a combination of these event properties. In this case, you are matching for aws.secretsmanager as source, AWS API Call via CloudTrail as detail-type, GetSecretValue as detail.eventName and wildcard pattern on detail.userAgent for aws-cli/*. You can filter detail.userAgent with a wildcard to catch events that come from a particular application or user.
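To see how the combined conditions behave, the following sketch evaluates the pattern locally against the sample event. It implements only a small, hypothetical subset of EventBridge matching semantics (exact values, nested keys, and `*` wildcards) and is not the service's matcher. For an authoritative answer, the EventBridge TestEventPattern API evaluates a pattern against an event on the service side.

```python
import re


def _value_matches(condition, value) -> bool:
    # A condition is either an exact value or {"wildcard": "..."}, where "*" matches any run of characters.
    if isinstance(condition, dict) and "wildcard" in condition:
        regex = ".*".join(re.escape(part) for part in condition["wildcard"].split("*"))
        return isinstance(value, str) and re.fullmatch(regex, value) is not None
    return condition == value


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: every pattern key must exist and match one of its listed conditions."""
    for key, condition in pattern.items():
        if key not in event:
            return False
        if isinstance(condition, dict):
            # Nested object such as "detail": recurse into it.
            if not isinstance(event[key], dict) or not matches(condition, event[key]):
                return False
        elif not any(_value_matches(c, event[key]) for c in condition):
            return False
    return True
```

With this pattern, a call made through the AWS CLI matches, while the same call made through an SDK (whose user agent does not start with `aws-cli/`) does not.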

You can then route these events to a target such as an Amazon CloudWatch Logs log group to record each call. You can also route them to Amazon SNS to get notified through an email subscription, or to an AWS Lambda function that performs custom business logic.

Creating an EventBridge rule for read-only management events

  1. Create a rule on the default event bus using the new state ENABLED_WITH_ALL_CLOUDTRAIL_MANAGEMENT_EVENTS.
    aws events put-rule --name "monitor-secretsmanager" \
    --event-pattern '{"source": ["aws.secretsmanager"], "detail-type": ["AWS API Call via CloudTrail"], "detail": {"eventName": ["GetSecretValue"], "userAgent": [{ "wildcard": "aws-cli/*"} ]}}' \
    --state ENABLED_WITH_ALL_CLOUDTRAIL_MANAGEMENT_EVENTS --region us-east-1

    Rule details

  2. Configure a target. In this case, the target service is CloudWatch Logs, but you can configure any of the supported targets.
    aws events put-targets --rule monitor-secretsmanager --targets Id=1,Arn=arn:aws:logs:us-east-1:ACCOUNT_ID:log-group:/aws/events/getsecretvaluelogs --region us-east-1

    Target details

You can then use CloudWatch Logs Insights and its query syntax to search and analyze the log data, for example to retrieve the user who performed these calls.
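As an illustration, the following sketch uses boto3 to start a Logs Insights query that lists who called GetSecretValue. It assumes the log group from step 2 and that the target writes each full event as JSON, so that nested fields such as `detail.userIdentity.arn` are discovered using dot notation. The query text and helper name are hypothetical. `start_query` only returns a query ID; the results are fetched afterwards with `get_query_results`.

```python
import time

# Hypothetical query over the EventBridge events stored in the log group.
QUERY = """
fields @timestamp, detail.userIdentity.arn, detail.sourceIPAddress, detail.userAgent
| filter detail.eventName = "GetSecretValue"
| sort @timestamp desc
| limit 50
"""


def start_secret_access_query(logs_client, log_group: str, hours: int = 24) -> str:
    """Start a Logs Insights query over the last `hours` hours and return its query ID."""
    end = int(time.time())
    start = end - hours * 3600
    response = logs_client.start_query(
        logGroupName=log_group,
        startTime=start,  # epoch seconds
        endTime=end,
        queryString=QUERY,
    )
    return response["queryId"]
```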

Identifying suspicious data exfiltration behavior

Consider a security or data perimeter team that wants to secure data residing in Amazon S3 buckets. The team requires notifications whenever API calls to list S3 buckets or S3 objects are made.

When a user or application calls the ListBuckets API to discover the available buckets, it generates the following CloudTrail event:

{
    "version": "0",
    "id": "345ca690-6510-85b2-ff02-090493a33cf1",
    "detail-type": "AWS API Call via CloudTrail",
    "source": "aws.s3",
    "account": "111111111111",
    "time": "2023-11-14T17:25:30Z",
    "region": "us-east-1",
    "resources": [],
    "detail": {
        "eventVersion": "1.09",
        "userIdentity": {
            "type": "IAMUser",
            "principalId": "principal-identity-uuid",
            "arn": "arn:aws:iam::111111111111:user/exploited-user",
            "accountId": "111111111111",
            "accessKeyId": "AAAABBBBCCCCDDDDEEEE",
            "userName": "exploited-user"
        },
        "eventTime": "2023-11-14T17:25:30Z",
        "eventSource": "s3.amazonaws.com",
        "eventName": "ListBuckets",
        "awsRegion": "us-east-1",
        "sourceIPAddress": "",
        "userAgent": "[aws-cli/2.13.29 Python/3.11.6 Darwin/22.6.0 exe/x86_64 prompt/off command/s3api.list-buckets]",
        "requestParameters": {
            "Host": "s3.us-east-1.amazonaws.com"
        },
        "readOnly": true,
        "eventType": "AwsApiCall",
        "managementEvent": true
        // additional detail fields
    }
}

In this scenario, you can create an EventBridge rule that matches aws.s3 in the source field and ListBuckets in the eventName field.

{
    "source": ["aws.s3"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventName": ["ListBuckets"]
    }
}

However, listing buckets alone might only be the beginning of a potential data exfiltration attempt. You may also want to check for ListObjects or ListObjectsV2 as the next action, followed by a large number of GetObject API calls. You can create the following rule to match those actions:

{
    "source": ["aws.s3"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventName": ["ListObjects", "ListObjectsV2", "GetObject"]
    }
}

You could potentially forward this log information to your central security logging solution or use anomaly detection machine learning models to evaluate these events to determine the appropriate response to these events.
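As a hypothetical example of that kind of evaluation, the sketch below groups events by principal and flags the sequence described above: ListBuckets, then ListObjects or ListObjectsV2, then a burst of GetObject calls. The threshold and time window are arbitrary assumptions, and the sketch presumes that all of these events already reach the consumer. GetObject, for instance, is recorded by CloudTrail as an S3 data event rather than a management event.

```python
from collections import defaultdict
from datetime import datetime, timedelta

LIST_OBJECT_CALLS = {"ListObjects", "ListObjectsV2"}


def _event_time(event):
    # CloudTrail eventTime values look like "2023-11-14T17:25:30Z".
    return datetime.strptime(event["detail"]["eventTime"], "%Y-%m-%dT%H:%M:%SZ")


def suspicious_principals(events, get_threshold=100, window=timedelta(minutes=10)):
    """Return ARNs whose calls follow ListBuckets -> ListObjects(V2) -> many GetObject within `window`."""
    calls_by_principal = defaultdict(list)
    for event in events:
        calls_by_principal[event["detail"]["userIdentity"]["arn"]].append(event)

    flagged = set()
    for arn, calls in calls_by_principal.items():
        calls.sort(key=_event_time)
        for i, first in enumerate(calls):
            if first["detail"]["eventName"] != "ListBuckets":
                continue
            start, listed_objects, gets = _event_time(first), False, 0
            for later in calls[i + 1:]:
                if _event_time(later) - start > window:
                    break
                name = later["detail"]["eventName"]
                if name in LIST_OBJECT_CALLS:
                    listed_objects = True
                elif name == "GetObject" and listed_objects:
                    # Only count downloads that follow an object listing.
                    gets += 1
            if gets >= get_threshold:
                flagged.add(arn)
                break
    return flagged
```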

Configuring cross-account and cross-Region event routing

You can also create rules that deliver read-only events across accounts or Regions, centralizing your AWS events in one Region or one AWS account for auditing and monitoring. For example, you can capture workload events from multiple Regions in eu-west-1 for compliance reporting.

Cross Account example for default event bus and custom event bus

To do this, create a rule with the new ENABLED_WITH_ALL_CLOUDTRAIL_MANAGEMENT_EVENTS state on the default event bus of the source account or Region, targeting either the default event bus or a custom event bus in the destination account or Region. You must also configure a rule with ENABLED_WITH_ALL_CLOUDTRAIL_MANAGEMENT_EVENTS on the destination event bus so that the forwarded read-only events can invoke targets in the destination account or Region.
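A boto3 sketch of both halves of that setup follows. It assumes an IAM role that EventBridge can assume to put events on the destination bus and, for cross-account delivery, a resource policy on that bus. Neither is created here, and attaching targets to the destination rule is left out. All names and ARNs are hypothetical.

```python
import json

STATE = "ENABLED_WITH_ALL_CLOUDTRAIL_MANAGEMENT_EVENTS"


def forward_read_only_events(source_events, dest_events, pattern: dict,
                             dest_bus_arn: str, role_arn: str, rule_name: str) -> None:
    """Route read-only management events from the source default bus to a destination bus."""
    body = json.dumps(pattern)

    # 1. Source account or Region: match on the default bus and target the destination bus.
    source_events.put_rule(Name=rule_name, EventPattern=body, State=STATE)
    source_events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "central-bus", "Arn": dest_bus_arn, "RoleArn": role_arn}],
    )

    # 2. Destination account or Region: a rule with the same state on the receiving bus,
    #    so that forwarded read-only events can invoke the targets you attach there.
    dest_bus_name = dest_bus_arn.split("/")[-1]
    dest_events.put_rule(Name=rule_name, EventPattern=body, State=STATE,
                         EventBusName=dest_bus_name)
```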

Cross-Region setup for CloudTrail read-only events


This blog shows how customers can build rich event-driven responses with the newly launched support for read-only events. You can now observe events as potential signals of reconnaissance and data exfiltration activities from any action taken on AWS infrastructure in near real time. You can also use the cross-Region and cross-account functionality to deliver the read-only events to a centralized AWS account or Region, enhancing the capability for auditing and monitoring across all your AWS environments.

For more serverless learning resources, visit Serverless Land.

Introducing advanced logging controls for AWS Lambda functions

Post Syndicated from David Boyne original https://aws.amazon.com/blogs/compute/introducing-advanced-logging-controls-for-aws-lambda-functions/

This post is written by Nati Goldberg, Senior Solutions Architect, and Shridhar Pandey, Senior Product Manager, AWS Lambda.

Today, AWS is launching advanced logging controls for AWS Lambda, giving developers and operators greater control over how function logs are captured, processed, and consumed.

This launch introduces three new capabilities to provide a simplified and enhanced default logging experience on Lambda.

First, you can capture Lambda function logs in JSON structured format without having to use your own logging libraries. JSON structured logs make it easier to search, filter, and analyze large volumes of log entries.

Second, you can control the log level granularity of Lambda function logs without making any code changes, enabling more effective debugging and troubleshooting.

Third, you can also set which Amazon CloudWatch log group Lambda sends logs to, making it easier to aggregate and manage logs at scale.


Being able to identify and filter relevant log messages is essential to troubleshoot and fix critical issues. To help developers and operators monitor and troubleshoot failures, the Lambda service automatically captures and sends logs to CloudWatch Logs.

Previously, Lambda emitted logs in plaintext format, also known as unstructured log format. This unstructured format could make the logs challenging to query or filter. For example, you had to search and correlate logs manually using well-known string identifiers such as “START”, “END”, and “REPORT”, or the request ID of the function invocation. Without a native way to enrich application logs, you needed custom work to extract data from logs for automated analysis or to build analytics dashboards.

Previously, operators could not control the level of log detail generated by functions. They relied on application development teams to make code changes to emit logs with the required granularity level, such as INFO, DEBUG, or ERROR.

Lambda-based applications often comprise microservices, where a single microservice is composed of multiple single-purpose Lambda functions. Before this launch, Lambda sent logs to a default CloudWatch log group created with the Lambda function with no option to select a log group. Now you can aggregate logs from multiple functions in one place so you can uniformly apply security, governance, and retention policies to your logs.

Capturing Lambda logs in JSON structured format

Lambda now natively supports capturing structured logs in JSON format as a series of key-value pairs, making logs easier to search and filter.

JSON also enables you to add custom tags and contextual information to logs, enabling automated analysis of large volumes of logs to help understand the function performance. The format adheres to the OpenTelemetry (OTel) Logs Data Model, a popular open-source logging standard, enabling you to use open-source tools to monitor functions.

To set the log format in the Lambda console, select the Configuration tab, choose Monitoring and operations tools in the left pane, and then change the log format property.

Currently, Lambda natively supports capturing application logs (logs generated by the function code) and system logs (logs generated by the Lambda service) in JSON structured format.

This applies to functions that use non-deprecated versions of the Python, Node.js, and Java Lambda managed runtimes with the Lambda-recommended logging methods: the logging library for Python, the console object for Node.js, and LambdaLogger or Log4j for Java.

For other managed runtimes, Lambda currently only supports capturing system logs in JSON structured format. However, you can still capture application logs in JSON structured format for these runtimes by manually configuring logging libraries. See the Configuring advanced logging controls section in the Lambda Developer Guide to learn more. You can also use Powertools for AWS Lambda to capture logs in JSON structured format.

Changing log format from text to JSON can be a breaking change if you parse logs in a telemetry pipeline. AWS recommends testing any existing telemetry pipelines after switching log format to JSON.

Working with JSON structured format for Node.js Lambda functions

You can use JSON structured format with CloudWatch Embedded Metric Format (EMF) to embed custom metrics alongside JSON structured log messages, and CloudWatch automatically extracts the custom metrics for visualization and alarming. However, to use JSON log format along with EMF libraries for Node.js Lambda functions, you must use the latest version of the EMF client library for Node.js or the latest version of Powertools for AWS Lambda (TypeScript) library.

Configuring log level granularity for Lambda function

You can now filter Lambda logs by log level, such as ERROR, DEBUG, or INFO, without code changes. Simplified log level filtering enables you to choose the required logging granularity level for Lambda functions, without sifting through large volumes of logs to debug errors.

You can specify separate log level filters for application logs (which are logs generated by the function code) and system logs (which are logs generated by the Lambda service, such as START and REPORT log messages). Note that log level controls are only available if the log format of the function is set to JSON.

The Lambda console allows setting both the Application log level and System log level properties:

You can define the granularity level of each log event in your function code, for example by emitting the function's input event as a DEBUG log message.
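The sample application later in this post uses Node.js. For a Python function, the equivalent is to log through the standard `logging` module, which Lambda's JSON log format supports natively. This is a minimal sketch: the handler and the event fields it reads are hypothetical. It deliberately does not call `setLevel()`, leaving filtering to the function's ApplicationLogLevel setting (assuming the runtime applies that setting to the root logger).

```python
import json
import logging

# Root logger configured by the Lambda Python runtime. No setLevel() call here,
# so the function's ApplicationLogLevel decides which records are published.
logger = logging.getLogger()


def lambda_handler(event, context):
    # Published only when ApplicationLogLevel is DEBUG or TRACE.
    logger.debug("Received event: %s", json.dumps(event))
    key = event.get("detail", {}).get("object", {}).get("key")
    logger.info("Processing object %s", key)
    return {"processed": key}
```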


Once configured, log events emitted with a lower log level than the one selected are not published to the function’s CloudWatch log stream. For example, setting the function’s log level to INFO results in DEBUG log events being ignored.
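The filtering rule is an ordering comparison. The sketch below models it over the six levels Lambda accepts for ApplicationLogLevel (TRACE, DEBUG, INFO, WARN, ERROR, FATAL); it only illustrates the rule and is not Lambda's implementation.

```python
# Lambda's application log levels, from most to least verbose.
LEVELS = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"]


def is_published(record_level: str, configured_level: str) -> bool:
    """True if a record at `record_level` passes a filter set to `configured_level`."""
    return LEVELS.index(record_level) >= LEVELS.index(configured_level)
```

For example, with the level set to INFO, `is_published("DEBUG", "INFO")` is False, while INFO, WARN, ERROR, and FATAL records are kept.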

This capability allows you to choose the appropriate amount of logs emitted by functions. For example, you can set a higher log level to improve the signal-to-noise ratio in production logs, or set a lower log level to capture detailed log events for testing or troubleshooting purposes.

Customizing Lambda function’s CloudWatch log group

Previously, you could not specify a custom CloudWatch log group for functions, so you could not stream logs from multiple functions into a shared log group. Also, to set a custom retention policy for multiple log groups, you had to create each log group separately using a predefined name (for example, /aws/lambda/<function name>).

Now you can select a custom CloudWatch log group to automatically aggregate logs from multiple functions within an application in one place. You can apply security, governance, and retention policies at the application level instead of individually to every function.

To distinguish between logs from different functions in a shared log group, each log stream name includes the Lambda function name and version.

You can share the same log group between multiple functions to aggregate their logs. The function's execution role must include the logs:CreateLogStream and logs:PutLogEvents permissions for Lambda to create logs in the specified log group. When you configure functions in the Lambda console, Lambda can optionally add these permissions for you.

You can set the custom log group in the Lambda console by entering the destination log group name. If the entered log group does not exist, Lambda creates it automatically.

Advanced logging controls for Lambda can be configured using the Lambda API, AWS Management Console, AWS Command Line Interface (CLI), and infrastructure as code (IaC) tools such as AWS Serverless Application Model (AWS SAM) and AWS CloudFormation.
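For example, when you call the Lambda API through boto3, all four settings go in the `LoggingConfig` parameter of `update_function_configuration`. This is a minimal sketch. The helper, function, and log group names are hypothetical, and the function's execution role still needs `logs:CreateLogStream` and `logs:PutLogEvents` on the chosen log group.

```python
def configure_logging(lambda_client, function_name: str, log_group: str) -> dict:
    """Apply JSON format, per-stream log levels, and a shared log group to one function."""
    logging_config = {
        "LogFormat": "JSON",             # log-level filtering requires JSON format
        "ApplicationLogLevel": "DEBUG",  # TRACE | DEBUG | INFO | WARN | ERROR | FATAL
        "SystemLogLevel": "INFO",        # DEBUG | INFO | WARN
        "LogGroup": log_group,           # can be shared by several functions
    }
    lambda_client.update_function_configuration(
        FunctionName=function_name,
        LoggingConfig=logging_config,
    )
    return logging_config
```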

Example of Lambda advanced logging controls

This section demonstrates how to use the new advanced logging controls for Lambda using AWS SAM to build and deploy the resources in your AWS account.


The following diagram shows Lambda functions processing newly created objects inside an Amazon S3 bucket, where both functions emit logs into the same CloudWatch log group:

The architecture includes the following steps:

  1. A new object is created inside an S3 bucket.
  2. S3 publishes an event to Amazon EventBridge through S3 Event Notifications.
  3. EventBridge triggers two Lambda functions asynchronously.
  4. Each function processes the object to extract labels and text, using Amazon Rekognition and Amazon Textract.
  5. Both functions then emit logs into the same CloudWatch log group.

The example uses AWS SAM to define the Lambda functions and configure the required logging controls. The IAM policy allows each function to create a log stream and emit logs to the selected log group:

  DetectLabelsFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: detect-labels/
      Handler: app.lambdaHandler
      Runtime: nodejs18.x
      Policies:
        - Version: '2012-10-17'
          Statement:
            - Sid: CloudWatchLogGroup
              Action:
                - logs:CreateLogStream
                - logs:PutLogEvents
              Resource: !GetAtt CloudWatchLogGroup.Arn
              Effect: Allow
      LoggingConfig:
        LogFormat: JSON
        ApplicationLogLevel: DEBUG
        SystemLogLevel: INFO
        LogGroup: !Ref CloudWatchLogGroup

Deploying the example

To deploy the example:

  1. Clone the GitHub repository and explore the application.
    git clone https://github.com/aws-samples/advanced-logging-controls-lambda/
    cd advanced-logging-controls-lambda
  2. Use AWS SAM to build and deploy the resources to your AWS account. This compiles and builds the application using npm and then populates the template required to deploy the resources:
    sam build
  3. Deploy the solution to your AWS account with a guided deployment, using AWS SAM CLI interactive flow:
    sam deploy --guided
  4. Enter the following values:
    • Stack Name: advanced-logging-controls-lambda
    • Region: your preferred Region (for example, us-east-1)
    • Parameter UploadsBucketName: enter a unique bucket name.
    • Accept the rest of the initial defaults.
  5. To test the application, use the AWS CLI to copy the sample image into the S3 bucket you created.
    aws s3 cp samples/skateboard.jpg s3://example-s3-images-bucket

Open CloudWatch Logs to view the logs emitted into the log group that was created, AggregatedLabelsLogGroup.

The DetectLabels Lambda function emits DEBUG log events in JSON format to the log stream. Log events with the same log level from the ExtractText Lambda function are omitted. This is a result of the different application log level settings for each function (DEBUG and INFO).

You can also use CloudWatch Logs Insights to search, filter, and analyze the logs in JSON format with a query, and then review the results.
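The following sketch shows how to run such a query from code with boto3: start the query, then poll `get_query_results` until its status is Complete. The query string assumes the `level` and `message` keys that appear in Lambda's JSON application log records. The query and the helper name are illustrative and not taken from the sample repository.

```python
import time

# Hypothetical query over JSON-formatted application logs.
DEBUG_QUERY = 'fields @timestamp, level, message | filter level = "DEBUG" | sort @timestamp desc'


def run_insights_query(logs_client, log_group, query, start, end, poll_seconds=1.0, max_polls=60):
    """Start a Logs Insights query, poll until it finishes, and return rows as dicts."""
    query_id = logs_client.start_query(
        logGroupName=log_group, startTime=start, endTime=end, queryString=query
    )["queryId"]
    for _ in range(max_polls):
        response = logs_client.get_query_results(queryId=query_id)
        status = response["status"]
        if status == "Complete":
            # Each result row is a list of {"field": ..., "value": ...} entries.
            return [{f["field"]: f["value"] for f in row} for row in response["results"]]
        if status in ("Failed", "Cancelled", "Timeout"):
            raise RuntimeError(f"Query {query_id} ended with status {status}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"Query {query_id} did not complete after {max_polls} polls")
```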


Advanced logging controls for Lambda give you greater control over logging. Use advanced logging controls to control your Lambda function’s log level and format, allowing you to search, query, and filter logs to troubleshoot issues more effectively.

You can also choose the CloudWatch log group where Lambda sends your logs. This enables you to aggregate logs from multiple functions into a single log group, apply retention, security, and governance policies, and manage logs at scale.

To get started, specify the required settings in the Logging Configuration for any new or existing Lambda functions.

Advanced logging controls for Lambda are available at no additional cost in all AWS Regions where Lambda is available. Learn more about AWS Lambda Advanced Logging Controls.

For more serverless learning resources, visit Serverless Land.

Introducing the AWS Integrated Application Test Kit (IATK)

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/aws-integrated-application-test-kit/

This post is written by Dan Fox, Principal Specialist Solutions Architect, and Brian Krygsman, Senior Solutions Architect.

Today, AWS announced the public preview launch of the AWS Integrated Application Test Kit (IATK). AWS IATK is a software library that helps you write automated tests for cloud-based applications. This blog post presents several initial features of AWS IATK, and then shows working examples using an example video processing application. If you are getting started with serverless testing, learn more at serverlessland.com/testing.


When you create applications composed of serverless services like AWS Lambda, Amazon EventBridge, or AWS Step Functions, many of your architecture components cannot be deployed to your desktop, but instead only exist in the AWS Cloud. In contrast to working with applications deployed locally, these types of applications benefit from cloud-based strategies for performing automated tests. For its public preview launch, AWS IATK helps you implement some of these strategies for Python applications. AWS IATK will support other languages in future launches.

Locating resources for tests

When you write automated tests for cloud resources, you need the physical IDs of your resources. The physical ID is the name AWS assigns to a resource after creation. For example, to send requests to Amazon API Gateway, you need the physical ID, which forms part of the API endpoint.

If you deploy cloud resources in separate infrastructure as code stacks, you might have difficulty locating physical IDs. In CloudFormation, you create the logical IDs of the resources in your template, as well as the stack name. With IATK, you can get the physical ID of a resource if you provide the logical ID and stack name. You can also get stack outputs by providing the stack name. These convenient methods simplify locating resources for the tests that you write.
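IATK wraps these lookups for you. To show what is being resolved, here is a sketch of the same two lookups written directly against the CloudFormation API with boto3. This is not IATK's API, and the helper names are hypothetical.

```python
def get_physical_id(cfn_client, stack_name: str, logical_id: str) -> str:
    """Resolve a resource's physical ID from its stack name and logical ID."""
    detail = cfn_client.describe_stack_resource(
        StackName=stack_name, LogicalResourceId=logical_id
    )["StackResourceDetail"]
    return detail["PhysicalResourceId"]


def get_stack_outputs(cfn_client, stack_name: str) -> dict:
    """Return a stack's outputs as {OutputKey: OutputValue}."""
    stack = cfn_client.describe_stacks(StackName=stack_name)["Stacks"][0]
    # Stacks without outputs have no "Outputs" key.
    return {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}
```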

Creating test harnesses for event driven architectures

To write integration tests for event driven architectures, establish logical boundaries by breaking your application into subsystems. Your subsystems should be simple enough to reason about, and contain understandable inputs and outputs. One useful technique for testing subsystems is to create test harnesses. Test harnesses are resources that you create specifically for testing subsystems.

For example, an integration test can begin a subsystem process by passing an input test event to it. IATK can create a test harness for you that listens to Amazon EventBridge for output events. (Under the hood, the harness is composed of an EventBridge Rule that forwards the output event to Amazon Simple Queue Service.) Your integration test then queries the test harness to examine the output and determine if the test passes or fails. These harnesses help you create integration tests in the cloud for event driven architectures.
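IATK builds this harness for you. As a rough sketch of the moving parts described above (an EventBridge rule forwarding to Amazon SQS), here is a boto3 version. It is not IATK's implementation; the function name and parameters are hypothetical, and cleanup (`remove_targets`, `delete_rule`, `delete_queue`) is omitted.

```python
import json


def create_event_harness(events_client, sqs_client, name: str, bus_name: str, pattern: dict) -> str:
    """Create an EventBridge rule that forwards matching events to a new SQS queue; return the queue URL."""
    queue_url = sqs_client.create_queue(QueueName=name)["QueueUrl"]
    queue_arn = sqs_client.get_queue_attributes(
        QueueUrl=queue_url, AttributeNames=["QueueArn"]
    )["Attributes"]["QueueArn"]

    rule_arn = events_client.put_rule(
        Name=name, EventBusName=bus_name, EventPattern=json.dumps(pattern)
    )["RuleArn"]

    # Queue policy that lets only this rule deliver messages to the queue.
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "events.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": queue_arn,
            "Condition": {"ArnEquals": {"aws:SourceArn": rule_arn}},
        }],
    }
    sqs_client.set_queue_attributes(QueueUrl=queue_url, Attributes={"Policy": json.dumps(policy)})

    events_client.put_targets(
        Rule=name, EventBusName=bus_name, Targets=[{"Id": "harness-queue", "Arn": queue_arn}]
    )
    return queue_url
```

A test would then read the forwarded events from the queue (for example with `receive_message`) and assert on their contents.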

Establishing service level agreements to test asynchronous features

If you write a synchronous service, your automated tests make requests and expect immediate responses. When your architecture is asynchronous, your service accepts a request and then performs a set of actions at a later time. How can you test for the success of an activity if it does not have a specified duration?

Consider creating reasonable timeouts for your asynchronous systems. Document timeouts as service level agreements (SLAs). You may decide to publish your SLAs externally or to document them as internal standards. IATK contains a polling feature that allows you to establish timeouts. This feature helps you to test that your asynchronous systems complete tasks in a timely manner.
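IATK's polling feature handles this for you. The underlying idea can be sketched in plain Python: keep checking for the expected output until a deadline derived from the SLA passes. The `poll_until` helper below is hypothetical and independent of IATK's API; `fetch` stands in for whatever reads your test harness.

```python
import time


def poll_until(fetch, timeout_seconds: float, interval_seconds: float = 1.0):
    """Call fetch() until it returns something truthy or the SLA deadline passes.

    Returns the first truthy result; raises TimeoutError once the deadline passes.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        result = fetch()
        if result:
            return result
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"No result within the {timeout_seconds}s SLA")
        # Never sleep past the deadline.
        time.sleep(min(interval_seconds, remaining))
```

In the example application described in this post, `timeout_seconds` would be the plugin's 20-second SLA.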

Using AWS X-Ray for detailed testing

If you want to gain more visibility into the interior details of your application, instrument with AWS X-Ray. With AWS X-Ray, you trace the path of an event through multiple services. IATK provides conveniences that help you set the AWS X-Ray sampling rate, get trace trees, and assert for trace durations. These features help you observe and test your distributed systems in greater detail.
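To give a rough idea of what a trace-duration assertion involves, the following sketch reads a trace through the boto3 X-Ray `batch_get_traces` call and compares its reported `Duration` (in seconds) with a limit. It is not IATK's interface, the helper name is hypothetical, and traces can take a short while to become available after a request completes.

```python
def assert_trace_within(xray_client, trace_id: str, max_seconds: float) -> float:
    """Fetch one trace and fail if its end-to-end duration exceeds `max_seconds`."""
    traces = xray_client.batch_get_traces(TraceIds=[trace_id])["Traces"]
    if not traces:
        raise LookupError(f"Trace {trace_id} not found (it may not be available yet)")
    duration = traces[0]["Duration"]  # seconds, as reported by X-Ray
    assert duration <= max_seconds, f"Trace took {duration}s, limit is {max_seconds}s"
    return duration
```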

Learn more about testing asynchronous architectures at aws-samples/serverless-test-samples.

Overview of the example application

To demonstrate the features of IATK, this post uses a portion of a serverless video application designed with a plugin architecture. A core development team creates the primary application. Distributed development teams throughout the organization create the plugins. One AWS CloudFormation stack deploys the primary application. Separate stacks deploy each plugin.

Communications between the primary application and the plugins are managed by an EventBridge bus. Plugins pull application lifecycle events off the bus and must put completion notification events back on the bus within 20 seconds. For testing, the core team has created an AWS Step Functions workflow that mimics the production process by emitting properly formatted example lifecycle events. Developers run this test workflow in development and test environments to verify that their plugins are communicating properly with the event bus.

The following demonstration shows an integration test for the example application that validates plugin behavior. In the integration test, IATK locates the Step Functions workflow. It creates a test harness to listen for the event completion notification to be sent by the plugin. The test then runs the workflow to begin the lifecycle process and start plugin actions. Then IATK uses a polling mechanism with a timeout to verify that the plugin complies with the 20-second service level agreement. This is the sequence of processing:

Sequence of processing

  1. The integration test starts an execution of the test workflow.
  2. The workflow puts a lifecycle event onto the bus.
  3. The plugin pulls the lifecycle event from the bus.
  4. When the plugin is complete, it puts a completion event onto the bus.
  5. The integration test polls for the completion event to determine if the test passes within the SLA.

Deploying and testing the example application

Follow these steps to review this application, build it locally, deploy it in your AWS account, and test it.

Downloading the example application

  1. Open your terminal and clone the example application from GitHub with the following command or download the code. This repository also includes other example patterns for testing serverless applications.
    git clone https://github.com/aws-samples/serverless-test-samples
  2. The root of the IATK example application is in python-test-samples/integrated-application-test-kit. Change to this directory:
    cd serverless-test-samples/python-test-samples/integrated-application-test-kit

Reviewing the integration test

Before deploying the application, review how the integration test uses the IATK by opening plugins/2-postvalidate-plugins/python-minimal-plugin/tests/integration/test_by_polling.py in your text editor. The test class instantiates the IATK at the top of the file.

iatk_client = aws_iatk.AwsIatk(region=aws_region)

In the setUp() method, the test class uses IATK to fetch CloudFormation stack outputs. These outputs are references to deployed cloud components like the plugin tester AWS Step Functions workflow:

stack_outputs = self.iatk_client.get_stack_outputs(...)  # arguments not shown in this excerpt

The test class attaches a listener to the default event bus using an Event Rule provided in the stack outputs. The test uses this listener later to poll for events.

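add_listener_output = self.iatk_client.add_listener(
    event_bus_name="default",
    rule_name=...,  # name of the event rule from the stack outputs (elided)
)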

The test class cleans up the listener in the tearDown() method.


Once the configurations are complete, the method test_minimal_plugin_event_published_polling() implements the actual test.

The test first initializes the trigger event.

trigger_event = {
    "eventHook": "postValidate",
    "pluginTitle": "PythonMinimalPlugin"
}

Next, the test starts an execution of the plugin tester Step Functions workflow. It uses the plugin_tester_arn that was fetched during setUp.
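The example's exact code for this step isn't reproduced here. As a hedged sketch, starting the workflow with the AWS SDK for Python (Boto3) Step Functions client could look like the following. The helper name `start_plugin_tester` is hypothetical, and the client is passed in so that the function can be exercised with a stub.

```python
import json
from typing import Any, Dict


def start_plugin_tester(sfn_client: Any, plugin_tester_arn: str, trigger_event: Dict[str, str]) -> str:
    """Start one execution of the plugin tester workflow and return its execution ARN."""
    response = sfn_client.start_execution(
        stateMachineArn=plugin_tester_arn,
        input=json.dumps(trigger_event),  # Step Functions expects the input as a JSON string
    )
    return response["executionArn"]
```

In the test, this would be called as `start_plugin_tester(boto3.client("stepfunctions"), self.plugin_tester_arn, trigger_event)`.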


The test polls the listener, waiting for the plugin to emit events. It stops polling once it hits the SLA timeout or receives the maximum number of messages.

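poll_output = self.iatk_client.poll_events(
    listener_id=self.listener_id,  # ID returned by add_listener() in setUp()
    wait_time_seconds=20,  # the plugin's 20-second SLA
    max_number_of_messages=1,  # the test expects a single completion event
)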

Finally, the test asserts that it receives the right number of events, and that they are well-formed.

self.assertEqual(len(poll_output.events), 1)
received_event = json.loads(poll_output.events[0])  # events are returned as JSON strings
self.assertEqual(received_event["source"], "video.plugin.PythonMinimalPlugin")
self.assertEqual(received_event["detail-type"], "plugin-complete")

Installing prerequisites

You need the following prerequisites to build this example:

Build and deploy the example application components

  1. Use AWS SAM to build the plugin tester, which is the Step Functions workflow shown in the preceding diagram. You can add the --use-container flag to the build command to instruct AWS SAM to build the application inside a container.
    cd plugins/plugin_tester # Move to the plugin tester directory
    sam build --use-container # Build the plugin tester

  2. Deploy the tester. You can accept or override the default values during the guided deployment. Note the “Stack Name” and “AWS Region” values, because you use them later to run the integration test.
    sam deploy --guided # Deploy the plugin tester

  3. Once the plugin tester is deployed, use AWS SAM to build the plugin.
    cd ../2-postvalidate-plugins/python-minimal-plugin # Move to the plugin directory
    sam build --use-container # Build the plugin

  4. Deploy the plugin:
    sam deploy --guided # Deploy the plugin

Running the test

You can run tests written with IATK using standard Python test runners like unittest and pytest. The example application test uses unittest.

    1. Use a virtual environment to organize your dependencies. From the plugin directory (python-minimal-plugin), run:
      python3 -m venv .venv # Create the virtual environment
      source .venv/bin/activate # Activate the virtual environment
    2. Install the dependencies, including the IATK:
      cd tests 
      pip3 install -r requirements.txt
    3. Run the test, providing the required environment variables from the earlier deployments. You can find the correct values in the samconfig.toml file in the plugin_tester directory.

      cd integration
      PLUGIN_TESTER_STACK_NAME=video-plugin-tester \
      AWS_REGION=us-west-2 \
      python3 -m unittest ./test_by_polling.py

You should see output as unittest runs the test.

Open the Step Functions console in your AWS account, then choose the PluginLifecycleWorkflow-<random value> workflow to validate that the plugin tester successfully ran. A recent execution shows a Succeeded status:

Recent execution status

Review other IATK features

The example application includes examples of other IATK features like generating mock events and retrieving AWS X-Ray traces.

Cleaning up

Use AWS SAM to clean up both the plugin and the plugin tester resources from your AWS account.

  1. Delete the plugin resources:
    cd ../.. # Move to the plugin directory
    sam delete # Delete the plugin

  2. Delete the plugin tester resources:
    cd ../../plugin_tester # Move to the plugin tester directory
    sam delete # Delete the plugin tester

The temporary test harness resources that IATK created during the test are cleaned up when the tearDown() method runs. If problems occur during teardown, some resources might not be deleted. IATK tags all resources that it creates, so you can use these tags to locate any leftover resources and then remove them manually. You can also add your own tags.
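One way to locate leftover resources by tag is the Resource Groups Tagging API. The following is a minimal sketch using its `get_resources` paginator from Boto3. The tag key and value are placeholders: this post doesn't state which tags IATK applies, so check an IATK-created resource in the console first. The client is passed in so that the function can be tested with a stub.

```python
from typing import Any, List


def find_resources_by_tag(tagging_client: Any, tag_key: str, tag_value: str) -> List[str]:
    """Return ARNs of resources tagged tag_key=tag_value, across all result pages."""
    paginator = tagging_client.get_paginator("get_resources")
    arns: List[str] = []
    for page in paginator.paginate(TagFilters=[{"Key": tag_key, "Values": [tag_value]}]):
        arns.extend(mapping["ResourceARN"] for mapping in page.get("ResourceTagMappingList", []))
    return arns
```

The Tagging API is Regional, so create the client (`boto3.client("resourcegroupstaggingapi", region_name=...)`) in the Region where you ran the test.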


The AWS Integrated Application Test Kit is a software library that provides conveniences to help you write automated tests for your cloud applications. This blog post shows some of the features of the initial Python version of the IATK.

To learn more about automated testing for serverless applications, visit serverlessland.com/testing. You can also view code examples at serverlessland.com/testing/patterns or at the AWS serverless-test-samples repository on GitHub.

For more serverless learning resources, visit Serverless Land.

Triggering AWS Lambda function from a cross-account Amazon Managed Streaming for Apache Kafka

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/compute/triggering-aws-lambda-function-from-a-cross-account-amazon-managed-streaming-for-apache-kafka/

This post is written by Subham Rakshit, Senior Specialist Solutions Architect, and Ismail Makhlouf, Senior Specialist Solutions Architect.

Many organizations use a multi-account strategy for stream processing applications. This involves decomposing the overall architecture into a single producer account and many consumer accounts. On AWS, you can run Amazon Managed Streaming for Apache Kafka (Amazon MSK) in the producer account and AWS Lambda functions that consume the events in the consumer accounts. This blog post explains how you can trigger Lambda functions from a cross-account Amazon MSK cluster.

The Lambda event source mapping (ESM) for Amazon MSK continuously polls for new events from the Amazon MSK cluster, aggregates them into batches, and then triggers the target Lambda function. The ESM for Amazon MSK functions as a serverless set of Kafka consumers that ensures that each event is processed at least once. Events are processed in the same order they are received within each Kafka partition. The ESM can also filter events based on configured filter criteria.
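To make the batching and ordering behavior concrete, here is a minimal Python sketch of a handler for an MSK-triggered function. It assumes the event shape that the Lambda ESM for Amazon MSK delivers: a `records` map keyed by topic-partition, where each record carries a base64-encoded `value`. The example's own function code isn't shown in this post, so `decode_msk_records` is a hypothetical helper.

```python
import base64
from typing import Any, Dict, List, Optional


def decode_msk_records(event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten an MSK event into decoded records, keeping each partition's order."""
    decoded: List[Dict[str, Any]] = []
    # "records" maps "<topic>-<partition>" to that partition's records, in offset order.
    for partition_records in event.get("records", {}).values():
        for record in partition_records:
            raw: Optional[str] = record.get("value")  # base64 text; may be missing or null
            decoded.append({
                "topic": record["topic"],
                "partition": record["partition"],
                "offset": record["offset"],
                "value": base64.b64decode(raw).decode("utf-8") if raw is not None else None,
            })
    return decoded


def lambda_handler(event, context):
    for record in decode_msk_records(event):
        # Delivery is at least once, so processing should be idempotent,
        # for example keyed on (topic, partition, offset).
        print(record["topic"], record["partition"], record["offset"], record["value"])
```

Because a batch can be redelivered, keying any side effects on topic, partition, and offset is one simple way to make reprocessing harmless.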


Amazon MSK supports two different deployment types: provisioned and serverless. Triggering a Lambda function from a cross-account Amazon MSK cluster is only supported with a provisioned cluster deployed within the same Region. To facilitate this functionality, Amazon MSK uses multi-VPC private connectivity, powered by AWS PrivateLink, which simplifies connecting Kafka consumers hosted in different AWS accounts to an Amazon MSK cluster.

The following diagram illustrates the architecture of this example:

Architecture Diagram

The architecture is divided into two parts: the producer and the consumer.

In the producer account, you have the Amazon MSK cluster with multi-VPC connectivity enabled. Multi-VPC connectivity is only available for authenticated Amazon MSK clusters. Cluster policies are required to grant permissions to other AWS accounts, allowing them to establish private connectivity to the Amazon MSK cluster. You can delegate permissions to relevant roles or users. When combined with AWS Identity and Access Management (IAM) client authentication, cluster policies offer fine-grained control over Kafka data plane permissions for connecting applications.

In the consumer account, you have the Lambda ESM for Amazon MSK and the managed VPC connection deployed within the same VPC. The managed VPC connection allows private connectivity from the consumer application VPC to the Amazon MSK cluster. The Lambda ESM for Amazon MSK connects to the cross-account Amazon MSK cluster via IAM authentication. It also supports SASL/SCRAM and mutual TLS (mTLS) authenticated clusters. The ESM receives events from the Kafka topic and invokes the Lambda function to process them.

Deploying the example application

To set up the Lambda function trigger from a cross-account Amazon MSK cluster as the event source, follow these steps. The AWS CloudFormation templates for deploying the example are accessible in the GitHub repository.

As part of this example, sample data is published with the Kafka console producer. Lambda then processes these events and writes them to Amazon S3.


For this example, you need two AWS accounts. This post uses the following naming conventions:

  • Producer (for example, account No: 1111 1111 1111): Account that hosts the Amazon MSK cluster and Kafka client instance.
  • Consumer (for example, account No: 2222 2222 2222): Account that hosts the Lambda function and consumes events from Amazon MSK.

To get started:

  1. Clone the repository locally:
    git clone https://github.com/aws-samples/lambda-cross-account-msk.git
  2. Set up the producer account: configure the VPC networking, and deploy the Amazon MSK cluster and a Kafka client instance to publish data. To do this, deploy the CloudFormation template producer-account.yaml from the AWS console and note the MSKClusterARN from the CloudFormation outputs tab.
  3. Set up the consumer account: you need the Lambda function, the IAM role used by the Lambda function, and the S3 bucket that receives the data. To create them, deploy the CloudFormation template consumer-account.yaml from the AWS console with the input parameter MSKAccountId, which is the producer AWS account ID (for example, account ID: 1111 1111 1111). Note the LambdaRoleArn from the CloudFormation outputs tab.

Setting up multi-VPC connectivity in the Amazon MSK cluster

Once both accounts are set up, you must enable connectivity between them. By enabling multi-VPC private connectivity in the Amazon MSK cluster, you set up the network connection that allows the cross-account consumers to connect to the cluster.

  1. In the producer account, navigate to the Amazon MSK console.
  2. Choose producer-cluster, and go to the Properties tab.
  3. Scroll to Networking settings, choose Edit, and select Turn on multi-VPC connectivity. This update takes some time to complete.
  4. Add the necessary cluster policy to allow cross-account consumers to connect to Amazon MSK. In the producer account, deploy the CloudFormation template producer-msk-cluster-policy.yaml from the AWS console with the following input parameters:
    • MSKClusterArn – Amazon Resource Name (ARN) of the Amazon MSK cluster in the producer account. Find this information in the CloudFormation output of producer-account.yaml.
    • LambdaRoleArn – ARN of the IAM role attached to the Lambda function in the consumer account. Find this information in the CloudFormation output of consumer-account.yaml.
    • LambdaAccountId – Consumer AWS account ID (for example, account ID: 2222 2222 2222).
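For orientation, the following Python sketch builds a cluster policy of the kind these three parameters feed into. It is an assumption-based illustration, not the contents of producer-msk-cluster-policy.yaml: one statement lets the consumer account create the managed VPC connection, and a second lets the Lambda execution role connect and read with IAM authentication. Compare it against the template before relying on it.

```python
import json
from typing import Any, Dict


def build_cluster_policy(cluster_arn: str, lambda_role_arn: str, consumer_account_id: str) -> Dict[str, Any]:
    """Sketch of an MSK cluster policy for a cross-account Lambda consumer (IAM auth)."""
    # Topic and group ARNs reuse the cluster's name/UUID path: ...:topic/<cluster>/<uuid>/<name>
    topics = cluster_arn.replace(":cluster/", ":topic/", 1) + "/*"
    groups = cluster_arn.replace(":cluster/", ":group/", 1) + "/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Control plane: lets the consumer account set up the managed VPC connection.
                "Effect": "Allow",
                "Principal": {"AWS": [f"arn:aws:iam::{consumer_account_id}:root"]},
                "Action": [
                    "kafka:CreateVpcConnection",
                    "kafka:GetBootstrapBrokers",
                    "kafka:DescribeCluster",
                    "kafka:DescribeClusterV2",
                ],
                "Resource": cluster_arn,
            },
            {   # Data plane: lets the Lambda execution role connect and consume.
                "Effect": "Allow",
                "Principal": {"AWS": [lambda_role_arn]},
                "Action": [
                    "kafka-cluster:Connect",
                    "kafka-cluster:DescribeGroup",
                    "kafka-cluster:AlterGroup",
                    "kafka-cluster:DescribeTopic",
                    "kafka-cluster:ReadData",
                    "kafka-cluster:DescribeClusterDynamicConfiguration",
                ],
                "Resource": [cluster_arn, topics, groups],
            },
        ],
    }
```

If you manage the policy through the API rather than CloudFormation, the Amazon MSK PutClusterPolicy operation (`put_cluster_policy` in Boto3) accepts this document serialized with `json.dumps`.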

Creating a Kafka topic in Amazon MSK and publishing events

In the producer account, navigate to the Amazon MSK console. Choose the Amazon MSK cluster named producer-cluster. Choose View client information to show the bootstrap server.

Client information

The CloudFormation template also deploys a Kafka client instance to create topics and publish events.

To access the client, go to the Amazon EC2 console and choose the instance producer-KafkaClientInstance1. Connect to the EC2 instance with Session Manager and run:

sudo su - ec2-user
#Set MSK Broker IAM endpoint
export BS=<<Provide IAM bootstrap address here>>

You must use the single-VPC private endpoint for the Amazon MSK cluster, not the multi-VPC private endpoint, because you publish events with the Kafka console producer from within the producer account.

Run these scripts to create the customer topic and publish sample events in the topic:


Creating a managed VPC connection in the consumer account

To establish a connection to the Amazon MSK cluster in the producer account, you must create a managed VPC connection in the consumer account. Lambda communicates with cross-account Amazon MSK through this managed VPC connection.

For detailed setup steps, read the Amazon MSK managed VPC connection documentation.

Configuring the Lambda ESM for Amazon MSK

The final step is to set up the Lambda ESM for Amazon MSK. The ESM connects to the Amazon MSK cluster in the producer account through the managed VPC connection, which allows it to trigger the Lambda function to process the data produced to the Kafka topic:

  1. In the consumer account, go to the Lambda console.
  2. Open the Lambda function msk-lambda-cross-account-iam.
  3. Go to the Configuration tab, select Triggers, and choose Add Trigger.
  4. For Trigger configuration, select Amazon MSK.

Lambda trigger

To configure this trigger:

  1. Select the shared Amazon MSK cluster. This automatically defaults to the IAM authentication that is used to connect to the cluster.
  2. By default, the Active trigger check box is enabled. This ensures that the trigger is in the active state after creation. For the other values:
    1. Keep the default Batch size of 100.
    2. Change the Starting Position to Trim horizon.
    3. Set the Topic name as customer.
    4. Set the Consumer Group ID as msk-lambda-iam.

Trigger configuration

Scroll to the bottom and choose Add. This starts creating the Amazon MSK trigger, which takes several minutes. After creation, the state of the trigger shows as Enabled.
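If you script this step instead of using the console, the same settings map onto the Lambda CreateEventSourceMapping API. The following is a hedged sketch that only builds the Boto3 parameters. It assumes that the managed VPC connection and the cluster policy are already in place, which this post sets up through the console and CloudFormation; `msk_trigger_params` is a hypothetical helper name.

```python
from typing import Any, Dict


def msk_trigger_params(function_name: str, cluster_arn: str) -> Dict[str, Any]:
    """Boto3 create_event_source_mapping parameters mirroring the console settings above."""
    return {
        "FunctionName": function_name,       # msk-lambda-cross-account-iam in this example
        "EventSourceArn": cluster_arn,       # ARN of the shared (cross-account) MSK cluster
        "Topics": ["customer"],
        "StartingPosition": "TRIM_HORIZON",
        "BatchSize": 100,
        "Enabled": True,                     # equivalent of the Active trigger check box
        "AmazonManagedKafkaEventSourceConfig": {"ConsumerGroupId": "msk-lambda-iam"},
    }
```

You would then call `boto3.client("lambda").create_event_source_mapping(**msk_trigger_params("msk-lambda-cross-account-iam", cluster_arn))` from the consumer account.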

Verifying the output on the consumer side

The Lambda function receives the events and writes them to an S3 bucket.

To validate that the function is working, go to the consumer account and navigate to the S3 console. Search for the cross-account-lambda-consumer-data-<<REGION>>-<<AWS Account Id>> bucket. In the bucket, you see the customer-data-<<datetime>>.csv files.

S3 bucket objects
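The function code isn't shown in this post. As a rough illustration of the write step, the following Python sketch turns a batch of rows into a CSV body and an object key that follows the customer-data-<<datetime>>.csv pattern. The exact datetime format is an assumption, and `build_csv_object` is a hypothetical helper.

```python
import csv
import io
from datetime import datetime, timezone
from typing import List, Optional, Tuple


def build_csv_object(rows: List[List[str]], now: Optional[datetime] = None) -> Tuple[str, str]:
    """Return (object_key, csv_body) for one batch of decoded records."""
    now = now or datetime.now(timezone.utc)
    # Hypothetical timestamp format; microseconds keep keys unique across fast invocations.
    key = f"customer-data-{now.strftime('%Y%m%d%H%M%S%f')}.csv"
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)  # csv.writer uses "\r\n" line endings by default
    return key, buffer.getvalue()
```

The result can then be uploaded with `boto3.client("s3").put_object(Bucket=bucket_name, Key=key, Body=body.encode("utf-8"))`.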

Cleaning up

In the consumer account, manually empty and delete the S3 bucket, and delete the managed VPC connection and the Lambda ESM for Amazon MSK. Next, delete the CloudFormation stacks in both the producer and consumer accounts from the AWS console to remove all other resources created as part of the example.


With Lambda and Amazon MSK, you can now build a decentralized application distributed across multiple AWS accounts. This post shows how you can set up Amazon MSK as an event source for cross-account Lambda functions and also walks you through the configuration required in both producer and consumer accounts.

For further reading on AWS Lambda with Amazon MSK as an event source, visit the documentation.

For more serverless learning resources, visit Serverless Land.

Node.js 20.x runtime now available in AWS Lambda

Post Syndicated from Pascal Vogel original https://aws.amazon.com/blogs/compute/node-js-20-x-runtime-now-available-in-aws-lambda/

This post is written by Pascal Vogel, Solutions Architect, and Andrea Amorosi, Senior Solutions Architect.

You can now develop AWS Lambda functions using the Node.js 20 runtime. This Node.js version is in active LTS status and ready for general use. To use this new version, specify a runtime parameter value of nodejs20.x when creating or updating functions or by using the appropriate container base image.

You can use Node.js 20 with Powertools for AWS Lambda (TypeScript), a developer toolkit to implement serverless best practices and increase developer velocity. Powertools for AWS Lambda includes proven libraries to support common patterns such as observability, Parameter Store integration, idempotency, batch processing, and more.

You can also use Node.js 20 with Lambda@Edge, allowing you to customize low-latency content delivered through Amazon CloudFront.

This blog post highlights important changes to the Node.js runtime, notable Node.js language updates, and how you can use the new Node.js 20 runtime in your serverless applications.

Node.js 20 runtime updates

Changes to Root CA certificate loading

By default, Node.js includes root certificate authority (CA) certificates from well-known certificate providers. Earlier Lambda Node.js runtimes up to Node.js 18 augmented these certificates with Amazon-specific CA certificates, making it easier to create functions accessing other AWS services. For example, it included the Amazon RDS certificates necessary for validating the server identity certificate installed on your Amazon RDS database.

However, loading these additional certificates has a performance impact during cold start. Starting with Node.js 20, Lambda no longer loads these additional CA certificates by default. The Node.js 20 runtime contains a certificate file with all Amazon CA certificates located at /var/runtime/ca-cert.pem. By setting the NODE_EXTRA_CA_CERTS environment variable to /var/runtime/ca-cert.pem, you can restore the behavior from Node.js 18 and earlier runtimes.

This causes Node.js to validate and load all Amazon CA certificates during a cold start. It can take longer compared to loading only specific certificates. For the best performance, we recommend bundling only the certificates that you need with your deployment package and loading them via NODE_EXTRA_CA_CERTS. The certificates file should consist of one or more trusted root or intermediate CA certificates in PEM format.

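For example, for RDS, include the required certificates alongside your code as certificates/rds.pem and then load them by setting the following environment variable (/var/task is where Lambda places your deployment package):

NODE_EXTRA_CA_CERTS=/var/task/certificates/rds.pem
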
See Using Lambda environment variables in the AWS Lambda Developer Guide for detailed instructions for setting environment variables.
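Because a malformed bundle only surfaces at cold start, a quick pre-deployment check can help. The following is a minimal Python sketch, meant for a build script rather than the function itself, that counts PEM certificate blocks in a bundle such as certificates/rds.pem. It only checks the PEM framing; it doesn't verify that the certificates are valid or trusted.

```python
import re

# Matches one PEM certificate block: BEGIN line, base64 body, END line.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----\s+[A-Za-z0-9+/=\s]+?-----END CERTIFICATE-----"
)


def count_pem_certificates(pem_text: str) -> int:
    """Return the number of PEM-encoded certificate blocks in a CA bundle."""
    return len(PEM_CERT.findall(pem_text))
```

For example, failing the build when `count_pem_certificates(Path("certificates/rds.pem").read_text()) == 0` (with `from pathlib import Path`) catches an empty or wrongly formatted file before deployment.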

Amazon Linux 2023

The Node.js 20 runtime is based on the provided.al2023 runtime. The provided.al2023 runtime in turn is based on the Amazon Linux 2023 minimal container image release and brings several improvements over Amazon Linux 2 (AL2).

provided.al2023 contains only the essential components necessary to install other packages and offers a smaller deployment footprint with a compressed image size of less than 40MB compared to the over 100MB AL2-based base image.

With glibc version 2.34, customers have access to a more recent version of glibc, updated from version 2.26 in AL2-based images.

The Amazon Linux 2023 minimal image uses microdnf as its package manager, symlinked as dnf, replacing yum in AL2-based images. Additionally, curl and gnupg2 are included in their minimal versions, curl-minimal and gnupg2-minimal.

Learn more about the provided.al2023 runtime in the blog post Introducing the Amazon Linux 2023 runtime for AWS Lambda and the Amazon Linux 2023 launch blog post.

Runtime Interface Client

The Node.js 20 runtime uses the open source AWS Lambda NodeJS Runtime Interface Client (RIC). You can now use the same RIC version in your Open Container Initiative (OCI) Lambda container images as the one used by the managed Node.js 20 runtime.

The Node.js 20 runtime supports Lambda response streaming which enables you to send response payload data to callers as it becomes available. Response streaming can improve application performance by reducing time-to-first-byte, can indicate progress during long-running tasks, and allows you to build functions that return payloads larger than 6MB, which is the Lambda limit for buffered responses.

Setting Node.js heap memory size

Node.js allows you to configure the heap size of the V8 engine via the --max-old-space-size and --max-semi-space-size options. By default, Lambda overrides the Node.js default values with values derived from the memory size configured for the function. If you need control over your runtime’s memory allocation, you can now set both of these options using the NODE_OPTIONS environment variable, without needing an exec wrapper script. See Using Lambda environment variables in the AWS Lambda Developer Guide for details.

Use the --max-old-space-size option to set the max memory size of V8’s old memory section, and the --max-semi-space-size option to set the maximum semispace size for V8’s garbage collector. See the Node.js documentation for more details on these options.
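If you set these options from a deployment script, keep in mind that the Lambda UpdateFunctionConfiguration API replaces the function's entire set of environment variables. The following Python sketch therefore merges the heap flags into an existing variables map instead of overwriting it. Both flags take values in megabytes; the numbers in the usage example are illustrative only, and the heap should leave headroom below the function's configured memory.

```python
from typing import Dict, Optional


def with_heap_options(
    env: Dict[str, str], max_old_space_mb: int, max_semi_space_mb: Optional[int] = None
) -> Dict[str, str]:
    """Return a copy of env with V8 heap flags appended to NODE_OPTIONS."""
    flags = [f"--max-old-space-size={max_old_space_mb}"]
    if max_semi_space_mb is not None:
        flags.append(f"--max-semi-space-size={max_semi_space_mb}")
    existing = env.get("NODE_OPTIONS", "").strip()
    updated = dict(env)  # keep every other variable, since the API replaces the whole map
    updated["NODE_OPTIONS"] = " ".join(([existing] if existing else []) + flags)
    return updated
```

For example, `boto3.client("lambda").update_function_configuration(FunctionName=name, Environment={"Variables": with_heap_options(current_vars, 1536, 64)})` appends the flags while preserving the function's other variables.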

Node.js 20 language updates

Language features

With this release, Lambda customers can take advantage of new Node.js 20 language features, including:

  • HTTP(S)/1.1 default keepAlive: Node.js now sets keepAlive to true by default. Outgoing HTTP(S) connections use HTTP/1.1 keep-alive with a default waiting window of 5 seconds. This can improve throughput because connections are reused by default.
  • Fetch API is enabled by default: The global Node.js Fetch API is enabled by default. However, it is still an experimental module.
  • Faster URL parsing: Node.js 20 comes with the Ada 2.0 URL parser, which improves URL parsing performance. Ada 2.0 has also been back-ported to Node.js 18.17.0.
  • Web Crypto API now stable: The Node.js implementation of the standard Web Crypto API has been marked as stable. You can access the provided cryptographic primitives through globalThis.crypto.
  • WebAssembly System Interface support: Node.js 20 enables the experimental WebAssembly System Interface (WASI) API by default, without the need to set an experimental flag.

For a detailed overview of Node.js 20 language features, see the Node.js 20 release blog post and the Node.js 20 changelog.

Performance considerations

Node.js 19.3 changed how non-essential modules are loaded during Node.js process startup: they are now lazy-loaded. For your Lambda functions, this reduces the work done during initialization of each execution environment; if your code uses these modules, they are loaded during the first function invocation instead. This behavior remains in Node.js 20.

Builders should continue to measure and test function performance and optimize function code and configuration for any impact. To learn more about how to optimize Node.js performance in Lambda, see Performance optimization in the Lambda Operator Guide, and our blog post Optimizing Node.js dependencies in AWS Lambda.

Migration from earlier Node.js runtimes

Migration from Node.js 16

Lambda occasionally delays deprecation of a Lambda runtime for a limited period beyond the end of support date of the language version that the runtime supports. During this period, Lambda only applies security patches to the runtime OS. Lambda doesn’t apply security patches to programming language runtimes after they reach their end of support date.

In the case of Node.js 16, we have delayed deprecation from the community end of support date on September 11, 2023, to June 12, 2024. This gives customers the opportunity to migrate directly from Node.js 16 to Node.js 20, skipping Node.js 18.

AWS SDK for JavaScript

Up until Node.js 16, Lambda’s Node.js runtimes included the AWS SDK for JavaScript version 2. This has since been superseded by the AWS SDK for JavaScript version 3, which was released in December 2020. Starting with Node.js 18, and continuing with Node.js 20, the Lambda Node.js runtimes have upgraded the version of the AWS SDK for JavaScript included in the runtime from v2 to v3. Customers upgrading from Node.js 16 or earlier runtimes who are using the included AWS SDK for JavaScript v2 should upgrade their code to use the v3 SDK.

For optimal performance, and to have full control over your code dependencies, we recommend bundling and minifying the AWS SDK in your deployment package, rather than using the SDK included in the runtime. For more information, see Optimizing Node.js dependencies in AWS Lambda.

Using the Node.js 20 runtime in AWS Lambda

AWS Management Console

To use the Node.js 20 runtime to develop your Lambda functions, choose Node.js 20.x as the runtime when creating or updating a function. Node.js 20.x is now available in the Runtime dropdown on the Create function page in the AWS Lambda console:

Select Node.js 20.x when creating a new AWS Lambda function in the AWS Management Console

To update an existing Lambda function to Node.js 20, navigate to the function in the Lambda console, then choose Edit in the Runtime settings panel. The new version of Node.js is available in the Runtime dropdown:

Select Node.js 20.x when updating an existing AWS Lambda function in the AWS Management Console

AWS Lambda – Container Image

Change the Node.js base image version by modifying the FROM statement in your Dockerfile:

FROM public.ecr.aws/lambda/nodejs:20
# Copy function code
COPY lambda_handler.xx ${LAMBDA_TASK_ROOT}

Customers running Node.js 20 Docker images locally, including customers using AWS SAM, will need to upgrade their Docker install to version 20.10.10 or later.

AWS Serverless Application Model (AWS SAM)

In AWS SAM, set the Runtime attribute to nodejs20.x to use this version:

AWSTemplateFormatVersion: "2010-09-09"
Transform: AWS::Serverless-2016-10-31

Resources:
  MyFunction: # placeholder logical ID
    Type: AWS::Serverless::Function
    Properties:
      Handler: lambda_function.lambda_handler
      Runtime: nodejs20.x
      CodeUri: my_function/.
      Description: My Node.js Lambda Function

AWS Cloud Development Kit (AWS CDK)

In the AWS CDK, set the runtime attribute to Runtime.NODEJS_20_X to use this version:

import * as cdk from "aws-cdk-lib";
import * as lambda from "aws-cdk-lib/aws-lambda";
import * as path from "path";
import { Construct } from "constructs";

export class CdkStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // The code that defines your stack goes here

    // The Node.js 20 enabled Lambda Function
    const lambdaFunction = new lambda.Function(this, "node20LambdaFunction", {
      runtime: lambda.Runtime.NODEJS_20_X,
      code: lambda.Code.fromAsset(path.join(__dirname, "/../lambda")),
      handler: "index.handler",
    });
  }
}

Lambda now supports Node.js 20. This release uses the Amazon Linux 2023 OS, supports configurable CA certificate loading for faster cold starts, as well as other improvements detailed in this blog post.

You can build and deploy functions with the Node.js 20 runtime using the AWS Management Console, AWS CLI, AWS SDK, AWS SAM, AWS CDK, or your choice of Infrastructure as Code (IaC). You can also use the Node.js 20 container base image if you prefer to build and deploy your functions using container images.

The Node.js 20 runtime empowers developers to build more efficient, powerful, and scalable serverless applications. Try the Node.js 20 runtime in Lambda today and read about the Node.js programming model in the Lambda documentation to learn more about writing functions in Node.js 20.

For more serverless learning resources, visit Serverless Land.

Converting Apache Kafka events from Avro to JSON using EventBridge Pipes

Post Syndicated from Pascal Vogel original https://aws.amazon.com/blogs/compute/converting-apache-kafka-events-from-avro-to-json-using-eventbridge-pipes/

This post is written by Pascal Vogel, Solutions Architect, and Philipp Klose, Global Solutions Architect.

Event streaming with Apache Kafka has become an important element of modern data-oriented and event-driven architectures (EDAs), unlocking use cases such as real-time analytics of user behavior, anomaly and fraud detection, and Internet of Things event processing. Stream producers and consumers in Kafka often use schema registries to ensure that all components follow agreed-upon event structures when sending (serializing) and processing (deserializing) events to avoid application bugs and crashes.

A common schema format in Kafka is Apache Avro, which supports rich data structures in a compact binary format. To integrate Kafka with other AWS and third-party services more easily, AWS offers Amazon EventBridge Pipes, a serverless point-to-point integration service. However, many downstream services expect JSON-encoded events, which requires custom, repetitive schema validation and Avro-to-JSON conversion logic in each downstream service.

This blog post shows how to reliably consume, validate, convert, and send Avro events from Kafka to AWS and third-party services using EventBridge Pipes, allowing you to reduce custom deserialization logic in downstream services. You can also use EventBridge event buses as targets in Pipes to filter and distribute events from Pipes to multiple targets, including cross-account and cross-Region delivery.

This blog describes two scenarios:

  1. Using Amazon Managed Streaming for Apache Kafka (Amazon MSK) and AWS Glue Schema Registry.
  2. Using Confluent Cloud and the Confluent Schema Registry.

See the associated GitHub repositories for Glue Schema Registry or Confluent Schema Registry for full source code and detailed deployment instructions.

Kafka event streaming and schema validation on AWS

To build event streaming applications with Kafka on AWS, you can use Amazon MSK, offerings such as Confluent Cloud, or self-hosted Kafka on Amazon Elastic Compute Cloud (Amazon EC2) instances.

To avoid common issues in event streaming and event-driven architectures, such as data inconsistencies and incompatibilities, it is a recommended practice to define and share event schemas between event producers and consumers. In Kafka, schema registries are used to manage, evolve, and enforce schemas for event producers and consumers. The AWS Glue Schema Registry provides a central location to discover, manage, and evolve schemas. In the case of Confluent Cloud, the Confluent Schema Registry serves the same role. Both the Glue Schema Registry and the Confluent Schema Registry support common schema formats such as Avro, Protobuf, and JSON.

To integrate Kafka with AWS services, third-party services, and your own applications, you can use EventBridge Pipes. EventBridge Pipes helps you create point-to-point integrations between event sources and targets with optional filtering, transformation, and enrichment. EventBridge Pipes reduces the amount of integration code that you have to write and maintain when building EDAs.

Many AWS and third-party services expect JSON-encoded payloads (events) as input, meaning they cannot directly consume Avro or Protobuf payloads. To replace repetitive Avro-to-JSON validation and conversion logic in each consumer, you can use the EventBridge Pipes enrichment step. This solution uses an AWS Lambda function in the enrichment step to deserialize and validate Kafka events with a schema registry, including error handling with dead-letter queues, and convert events to JSON before passing them to downstream services.

Solution overview

Architecture overview of the solution

The solution presented in this blog post consists of the following key elements:

  1. The source of the pipe is a Kafka cluster deployed using MSK or Confluent Cloud. EventBridge Pipes reads events from the Kafka stream in batches and sends them to the enrichment function.
  2. The enrichment step (Lambda function) deserializes and validates the events against the configured schema registry (Glue or Confluent), converts events from Avro to JSON with integrated error handling, and returns them to the pipe.
  3. The target of this example solution is an EventBridge custom event bus that is invoked by EventBridge Pipes with JSON-encoded events returned by the enrichment Lambda function. EventBridge Pipes supports a variety of other targets, including Lambda, AWS Step Functions, Amazon API Gateway, API destinations, and more, enabling you to build EDAs without writing integration code.
  4. In this sample solution, the event bus sends all events to Amazon CloudWatch Logs via an EventBridge rule. You can extend the example to invoke additional EventBridge targets.
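The example's enrichment function is written in Java. To show the same control flow compactly, here is a Python sketch of the enrichment step: decode each record's base64 `value`, deserialize it, and keep failures aside instead of failing the whole batch. `deserialize` is a hypothetical stand-in for a schema-registry-aware Avro deserializer, which isn't shown, and the record fields assume the Kafka record shape that Pipes passes to enrichment.

```python
import base64
from typing import Any, Callable, Dict, List, Tuple


def enrich_batch(
    records: List[Dict[str, Any]],
    deserialize: Callable[[bytes], Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Split a batch into (JSON-ready events, records that failed to deserialize)."""
    converted: List[Dict[str, Any]] = []
    failed: List[Dict[str, Any]] = []
    for record in records:
        try:
            payload = deserialize(base64.b64decode(record["value"]))
            converted.append({
                "topic": record.get("topic"),
                "partition": record.get("partition"),
                "offset": record.get("offset"),
                "value": payload,  # plain dict, so it serializes to JSON for the target
            })
        except Exception as error:  # keep the batch going; the bad record goes to the DLQ
            failed.append({"record": record, "error": str(error)})
    return converted, failed
```

In a Lambda enrichment function, you would return `converted`, which the pipe passes on to its target, and send the `failed` entries to the SQS dead-letter queue, for example with the SQS SendMessageBatch API.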

Optionally, you can add OpenAPI 3 or JSONSchema Draft 4 schemas for your events in the EventBridge schema registry, either by generating them manually from the Avro schema or by using EventBridge schema discovery. This allows you to download code bindings for the JSON-converted events for various programming languages, such as JavaScript, Python, and Java, to correctly use them in your EventBridge targets.

The remainder of this blog post describes this solution for the Glue and Confluent schema registries with code examples.

EventBridge Pipes with the Glue Schema Registry

This section describes how to implement event schema validation and conversion from Avro to JSON using EventBridge Pipes and the Glue Schema Registry. You can find the source code and detailed deployment instructions on GitHub.


You need a running Amazon MSK Serverless cluster and a configured Glue Schema Registry. This example includes an Avro schema and a Glue Schema Registry. For an introduction to schema validation with the Glue Schema Registry, see the AWS blog post Validate, evolve, and control schemas in Amazon MSK and Amazon Kinesis Data Streams with AWS Glue Schema Registry.

EventBridge Pipes configuration

Use the AWS Cloud Development Kit (AWS CDK) template provided in the GitHub repository to deploy:

  1. An EventBridge pipe that connects to your existing Amazon MSK Serverless Kafka topic as the source via AWS Identity and Access Management (IAM) authentication.
  2. EventBridge Pipes reads events from your Kafka topic using the Amazon MSK source type.
  3. An enrichment Lambda function in Java to perform event deserialization, validation, and conversion from Avro to JSON.
  4. An Amazon Simple Queue Service (Amazon SQS) dead-letter queue to hold events that failed deserialization.
  5. An EventBridge custom event bus as the pipe target. An EventBridge rule sends all incoming events into a CloudWatch Logs log group.

For MSK-based sources, EventBridge supports configuration parameters, such as batch window, batch size, and starting position, which you can set using the parameters of the CfnPipe class in the example CDK stack.

The example EventBridge pipe consumes events from Kafka in batches of 10 because it targets an EventBridge event bus, which has a maximum batch size of 10. See batching and concurrency in the EventBridge Pipes User Guide to choose an optimal configuration for other targets.
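To make these settings concrete, the following Python sketch builds the source parameters for an MSK source using the CloudFormation property names of `AWS::Pipes::Pipe` (`ManagedStreamingKafkaParameters`, `BatchSize`, `MaximumBatchingWindowInSeconds`, `StartingPosition`, `TopicName`). The helper `msk_source_parameters` and its `target_max_batch` check are hypothetical and not part of the sample repository; in the CDK stack, `CfnPipe` should expose the same properties in snake_case. The value ranges in the comments are assumptions to verify against the current documentation.

```python
# Starting positions accepted for a Kafka source (assumed: TRIM_HORIZON or LATEST).
STARTING_POSITIONS = {"TRIM_HORIZON", "LATEST"}


def msk_source_parameters(
    topic_name: str,
    batch_size: int = 10,
    batching_window_seconds: int = 0,
    starting_position: str = "TRIM_HORIZON",
    target_max_batch: int = 10,
) -> dict:
    """Build a SourceParameters block for an MSK source (CloudFormation property names)."""
    # Keep batches no larger than the target accepts: 10 for an EventBridge event bus.
    if not 1 <= batch_size <= target_max_batch:
        raise ValueError(f"batch_size must be between 1 and {target_max_batch}")
    # Assumed Pipes limit for the batching window: 0 to 300 seconds.
    if not 0 <= batching_window_seconds <= 300:
        raise ValueError("batching_window_seconds must be between 0 and 300")
    if starting_position not in STARTING_POSITIONS:
        raise ValueError(f"starting_position must be one of {sorted(STARTING_POSITIONS)}")
    return {
        "ManagedStreamingKafkaParameters": {
            "TopicName": topic_name,
            "BatchSize": batch_size,
            "MaximumBatchingWindowInSeconds": batching_window_seconds,
            "StartingPosition": starting_position,
        }
    }
```

Validating the batch size against the target's limit up front turns a misconfiguration into a synth-time error instead of a runtime one.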

EventBridge Pipes with the Confluent Schema Registry

This section describes how to implement event schema validation and conversion from Avro to JSON using EventBridge Pipes and the Confluent Schema Registry. You can find the source code and detailed deployment instructions on GitHub.


To set up this solution, you need a Kafka cluster running on Confluent Cloud and a configured Confluent Schema Registry. See the corresponding Schema Registry tutorial for Confluent Cloud to set up a schema registry for your Confluent Kafka cluster.

To connect to your Confluent Cloud Kafka cluster, you need API keys for both Confluent Cloud and Confluent Schema Registry. The solution stores these credentials securely in AWS Secrets Manager.
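The following Python sketch shows one way the two secrets could be shaped and consumed. It assumes the Kafka secret uses the `username`/`password` layout that Pipes expects for basic (SASL/PLAIN) authentication on self-managed Kafka sources; the same field names for the Schema Registry secret are a hypothetical convention, as are both helper functions. At runtime, the Lambda function would obtain `secret_string` from `boto3.client("secretsmanager").get_secret_value(SecretId=...)["SecretString"]` and pass the resulting dictionary to confluent-kafka's `SchemaRegistryClient`.

```python
import json


def kafka_basic_auth_secret(api_key: str, api_secret: str) -> str:
    """SecretString for the Confluent Cloud Kafka API key.

    Assumes the username/password layout used for Pipes BasicAuth credentials.
    """
    return json.dumps({"username": api_key, "password": api_secret})


def schema_registry_conf(secret_string: str, registry_url: str) -> dict:
    """Build a confluent-kafka SchemaRegistryClient config from a stored secret.

    The "username"/"password" field names for the Schema Registry secret are a
    hypothetical convention chosen to mirror the Kafka secret above.
    """
    secret = json.loads(secret_string)
    return {
        "url": registry_url,
        # confluent-kafka expects "<api-key>:<api-secret>" for basic auth.
        "basic.auth.user.info": f"{secret['username']}:{secret['password']}",
    }
```

Keeping the Schema Registry key separate from the cluster key lets you rotate or scope each credential independently.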

EventBridge Pipes configuration

Use the AWS CDK template provided in the GitHub repository to deploy:

  1. An EventBridge pipe that connects to your existing Confluent Kafka topic as the source via an API secret stored in Secrets Manager.
  2. EventBridge Pipes reads events from your Confluent Kafka topic using the self-managed Apache Kafka stream source type, which includes all non-MSK Kafka clusters.
  3. An enrichment Lambda function in Python to perform event deserialization, validation, and conversion from Avro to JSON.
  4. An SQS dead-letter queue to hold events that failed deserialization.
  5. An EventBridge custom event bus as the pipe target. An EventBridge rule writes all incoming events into a CloudWatch Logs log group.

For self-managed Kafka sources, EventBridge supports configuration parameters, such as batch window, batch size, and starting position, which you can set using the parameters of the CfnPipe class in the example CDK stack.

The example EventBridge pipe consumes events from Kafka in batches of 10 because it targets an EventBridge event bus, which has a maximum batch size of 10. See batching and concurrency in the EventBridge Pipes User Guide to choose an optimal configuration for other targets.

Enrichment Lambda functions

Both of the solutions described previously include an enrichment Lambda function for schema validation and conversion from Avro to JSON.

The Java Lambda function integrates with the Glue Schema Registry using the AWS Glue Schema Registry Library. The Python Lambda function integrates with the Confluent Schema Registry using the confluent-kafka library and uses Powertools for AWS Lambda (Python) to implement serverless best practices such as logging and tracing.
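It can help to know what the Confluent deserializer looks for in each message value. Confluent's serializers frame every value with a magic byte `0`, followed by the schema ID as a 4-byte big-endian integer, followed by the Avro binary body; confluent-kafka's `AvroDeserializer` reads that ID to fetch the writer schema from the registry. The hypothetical helper below only illustrates the framing and is not part of the confluent-kafka API (in the solution the library handles it). A value produced without a schema-registry-aware serializer is one possible reason an event ends up in the dead-letter queue.

```python
import struct

MAGIC_BYTE = 0


def split_confluent_payload(data: bytes) -> tuple[int, bytes]:
    """Return (schema_id, avro_body) for a Confluent-framed message value."""
    if len(data) < 5 or data[0] != MAGIC_BYTE:
        raise ValueError("value is not in Confluent Schema Registry wire format")
    # Bytes 1-4 hold the schema ID as a big-endian 32-bit integer.
    (schema_id,) = struct.unpack(">I", data[1:5])
    # Everything after the 5-byte header is the Avro-encoded record.
    return schema_id, data[5:]
```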

The enrichment Lambda functions perform the following tasks:

  1. The EventBridge pipe passes events polled from the Kafka stream with base64-encoded keys and values, so the function first decodes the key and value of each event in the batch.
  2. The event key is assumed to be serialized by the producer as a string type.
  3. The event value is deserialized using the Glue Schema Registry Serde (Java) or the confluent-kafka AvroDeserializer (Python).
  4. The function then returns the successfully converted JSON events to the EventBridge pipe, which then invokes the target for each of them.
  5. Events for which Avro deserialization failed are sent to the SQS dead-letter queue.
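The five steps above can be sketched as a small, library-free Python function. This is a minimal illustration, not the code from the sample repository. It assumes the batch arrives as a JSON array of records with base64-encoded `key` and `value` fields plus `topic`, `partition`, and `offset`. `enrich`, `deserialize_value`, and `send_to_dlq` are hypothetical names, injected as callables so the logic can be tested without a schema registry or an SQS queue.

```python
import base64
from typing import Any, Callable


def enrich(
    records: list[dict],
    deserialize_value: Callable[[bytes], Any],
    send_to_dlq: Callable[[dict, str], None],
) -> list[dict]:
    """Decode, deserialize, and convert a batch of Kafka records from EventBridge Pipes."""
    converted = []
    for record in records:
        try:
            # Step 1: the pipe delivers key and value base64 encoded.
            key_bytes = base64.b64decode(record["key"]) if record.get("key") else b""
            value_bytes = base64.b64decode(record["value"])
            # Step 2: the producer is assumed to have serialized the key as a UTF-8 string.
            key = key_bytes.decode("utf-8")
            # Step 3: schema-aware deserialization (Avro in the real solution).
            value = deserialize_value(value_bytes)
        except Exception as err:  # any decoding or deserialization failure
            # Step 5: forward the original record so it can be inspected or replayed.
            send_to_dlq(record, str(err))
            continue
        # Step 4: a JSON-serializable event; the returned list goes back to the pipe.
        converted.append({
            "topic": record.get("topic"),
            "partition": record.get("partition"),
            "offset": record.get("offset"),
            "key": key,
            "value": value,
        })
    return converted
```

In a Lambda handler for the Confluent variant, `deserialize_value` could wrap an `AvroDeserializer` call such as `deserializer(data, SerializationContext(topic, MessageField.VALUE))` from `confluent_kafka.serialization`, and `send_to_dlq` could call `boto3.client("sqs").send_message(QueueUrl=..., MessageBody=...)`. Note that Avro values such as `bytes` or decimals may need extra handling before they are JSON-serializable.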


This blog post shows how to implement event consumption, Avro schema validation, and conversion to JSON using Amazon EventBridge Pipes, Glue Schema Registry, and Confluent Schema Registry.

The source code for the presented example is available in the AWS Samples GitHub repository for Glue Schema Registry and Confluent Schema Registry. For more patterns, visit the Serverless Patterns Collection.

For more serverless learning resources, visit Serverless Land.