Tag Archives: contributed

Python 3.12 runtime now available in AWS Lambda

2023-12-14 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/python-3-12-runtime-now-available-in-aws-lambda/

This post is written by Jeff Gebhart, Sr. Specialist TAM, Serverless.

AWS Lambda now supports Python 3.12 as both a managed runtime and container base image. Python 3.12 builds on the performance enhancements that were first released with Python 3.11, and adds a number of performance and language readability features in the interpreter. With this release, Python developers can now take advantage of these new features and enhancements when creating serverless applications on AWS Lambda.

You can use Python 3.12 with Powertools for AWS Lambda (Python), a developer toolkit to implement Serverless best practices such as observability, batch processing, Parameter Store integration, idempotency, feature flags, CloudWatch Metrics, and structured logging among other features.

You can also use Python 3.12 with Lambda@Edge, allowing you to customize low-latency content delivered through Amazon CloudFront.

Python is a popular language for building serverless applications. The Python 3.12 release has a number of interpreter and syntactic improvements.

At launch, new Lambda runtimes receive less usage than existing, established runtimes. This can result in longer cold start times due to reduced cache residency within internal Lambda sub-systems. Cold start times typically improve in the weeks following launch as usage increases. As a result, AWS recommends not drawing conclusions from side-by-side performance comparisons with other Lambda runtimes until the performance has stabilized. Since performance is highly dependent on workload, customers with performance-sensitive workloads should conduct their own testing, instead of relying on generic test benchmarks.

Lambda runtime changes

Amazon Linux 2023

The Python 3.12 runtime is based on the provided.al2023 runtime, which is based on the Amazon Linux 2023 minimal container image. This OS update brings several improvements over the Amazon Linux 2 (AL2)-based OS used for Lambda Python runtimes from Python 3.8 to Python 3.11.

provided.al2023 contains only the essential components necessary to install other packages and offers a smaller deployment footprint of less than 40MB compared to over 100MB for Lambda’s AL2-based images.

With glibc version 2.34, customers have access to a modern version of glibc, updated from version 2.26 in AL2-based images.

The Amazon Linux 2023 minimal image uses microdnf as a package manager, symlinked as dnf. This replaces the yum package manager used in earlier AL2-based images. If you deploy your Lambda functions as container images, you must update your Dockerfiles to use dnf instead of yum when upgrading to the Python 3.12 base image.

Additionally, curl and gnupg2 are also included as their minimal versions curl-minimal and gnupg2-minimal.

Learn more about the provided.al2023 runtime in the blog post Introducing the Amazon Linux 2023 runtime for AWS Lambda and the Amazon Linux 2023 launch blog post.

Response format change

Starting with the Python 3.12 runtime, functions return Unicode characters as part of their JSON response. Previous versions return escaped sequences for Unicode characters in responses.

For example, in Python 3.11, if you return a Unicode string such as “こんにちは”, it escapes the Unicode characters and returns “\u3053\u3093\u306b\u3061\u306f”. The Python 3.12 runtime returns the original “こんにちは”.

This change reduces the size of the payload returned by Lambda. In the previous example, the escaped version is 32 bytes compared to 17 bytes with the Unicode string. Using Unicode responses reduces the size of Lambda responses, making it easier to fit larger responses into the 6MB Lambda response (synchronous) limit.

When upgrading to Python 3.12, you may need to adjust your code in other modules to account for this new behavior. If the caller expects escaped Unicode based on the previous runtime behavior, you must either add code to the returning function to escape the Unicode manually, or adjust the caller to handle the Unicode return.

Extensions processing for graceful shutdown

Lambda functions with external extensions can now benefit from improved graceful shutdown capabilities. When the Lambda service is about to shut down the runtime, it sends a SIGTERM signal to the runtime and then a SHUTDOWN event to each registered external extension.

These events are sent each time an execution environment shuts down. This allows you to catch the SIGTERM signal in your Lambda function and clean up resources, such as database connections, which were created by the function.

To learn more about the Lambda execution environment lifecycle, see Lambda execution environment. More details and examples of how to use graceful shutdown with extensions is available in the AWS Samples GitHub repository.

New Python features

Comprehension inlining

With the implementation of PEP 709, dictionary, list, and set comprehensions are now inlined. Prior versions create a single-use function to execute such comprehensions. Removing this overhead results in faster comprehension execution by a factor of two.

There are some behavior changes to comprehensions because of this update. For example, a call to the ‘locals()’ function from within the comprehension now includes objects from the containing scope, not just within the comprehension itself as in prior versions. You should test functions you are migrating from an earlier version of Python to Python 3.12.

Typing changes

Python 3.12 continues the evolution of including type annotations to Python. PEP 695 includes a new, more compact syntax for generic classes and functions, and adds a new “type” statement to allow for type alias creation. Type aliases are evaluated on demand. This permits aliases to refer to other types defined later.

Type parameters are visible within the scope of the declaration and any nested scopes, but not in the outer scope.

Formalization of f-strings

One of the largest changes in Python 3.12, the formalization of f-strings syntax, is covered under PEP 701. Any valid expression can now be contained within an f-string, including other f-strings.

In prior versions of Python, reusing quotes within an f-string results in errors. With Python 3.12, quote reuse is fully supported in nested f-strings such as the following example:

>>>songs = ['Take me back to Eden', 'Alkaline', 'Ascensionism']

>>>f"This is the playlist: {", ".join(songs)}"

'This is the playlist: Take me back to Eden, Alkaline, Ascensionism'

Additionally, any valid Python expression can be contained within an f-string. This includes multi-line expressions and the ability to embed comments within an f-string.

Before Python 3.12, the “\” character is not permitted within an f-string. This prevented use of “\N” syntax for defining escaped Unicode characters within the body of an f-string.

Asyncio improvements

There are a number of improvements to the asyncio module. These include performance improvements to writing of sockets and a new implementation of asyncio.current_task() that can yield a 4–6 times performance improvement. Event loops now optimize their child watchers for their underlying environment.

Using Python 3.12 in Lambda

AWS Management Console

To use the Python 3.12 runtime to develop your Lambda functions, specify a runtime parameter value Python 3.12 when creating or updating a function. The Python 3.12 version is now available in the Runtime dropdown in the Create Function page:

To update an existing Lambda function to Python 3.12, navigate to the function in the Lambda console and choose Edit in the Runtime settings panel. The new version of Python is available in the Runtime dropdown:

AWS Lambda container image

Change the Python base image version by modifying the FROM statement in your Dockerfile:

FROM public.ecr.aws/lambda/python:3.12
# Copy function code
COPY lambda_handler.py ${LAMBDA_TASK_ROOT}

Customers running the Python 3.12 Docker images locally, including customers using AWS SAM, must upgrade their Docker install to version 20.10.10 or later.

AWS Serverless Application Model (AWS SAM)

In AWS SAM set the Runtime attribute to python3.12 to use this version.

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Simple Lambda Function
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Description: My Python Lambda Function
      CodeUri: my_function/
      Handler: lambda_function.lambda_handler
      Runtime: python3.12

AWS SAM supports generating this template with Python 3.12 for new serverless applications using the `sam init` command. Refer to the AWS SAM documentation.

AWS Cloud Development Kit (AWS CDK)

In AWS CDK, set the runtime attribute to Runtime.PYTHON_3_12 to use this version. In Python CDK:

from constructs import Construct 
from aws_cdk import ( App, Stack, aws_lambda as _lambda )

class SampleLambdaStack(Stack):
    def __init__(self, scope: Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)
        
        base_lambda = _lambda.Function(self, 'SampleLambda', 
                                       handler='lambda_handler.handler', 
                                    runtime=_lambda.Runtime.PYTHON_3_12, 
                                 code=_lambda.Code.from_asset('lambda'))

In TypeScript CDK:

import * as cdk from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda'
import * as path from 'path';
import { Construct } from 'constructs';

export class CdkStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // The code that defines your stack goes here

    // The python3.12 enabled Lambda Function
    const lambdaFunction = new lambda.Function(this, 'python311LambdaFunction', {
      runtime: lambda.Runtime.PYTHON_3_12,
      memorySize: 512,
      code: lambda.Code.fromAsset(path.join(__dirname, '/../lambda')),
      handler: 'lambda_handler.handler'
    })
  }
}

Conclusion

Lambda now supports Python 3.12. This release uses the Amazon Linux 2023 OS, supports Unicode responses, and graceful shutdown for functions with external extensions, and Python 3.12 language features.

You can build and deploy functions using Python 3.12 using the AWS Management Console, AWS CLI, AWS SDK, AWS SAM, AWS CDK, or your choice of Infrastructure as Code (IaC) tool. You can also use the Python 3.12 container base image if you prefer to build and deploy your functions using container images.

Python 3.12 runtime support helps developers to build more efficient, powerful, and scalable serverless applications. Try the Python 3.12 runtime in Lambda today and experience the benefits of this updated language version.

For more serverless learning resources, visit Serverless Land.

Introducing support for read-only management events in Amazon EventBridge

2023-11-18 Eric Johnson

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/introducing-support-for-read-only-management-events-in-amazon-eventbridge/

This post is written by Pawan Puthran, Principal Specialist TAM, Serverless and Heeki Park, Principal Solutions Architect, Serverless

Today, AWS is announcing support for read-only management events in Amazon EventBridge. This feature enables customers to build rich event-driven responses from any action taken on AWS infrastructure to detect security vulnerabilities or identify suspicious activity in near real-time. You can now gain insight into all activity across all your AWS accounts and respond to those events as is appropriate.

Overview

EventBridge is a serverless event bus used to decouple event producers and consumers. Event producers publish events onto an event bus, which then uses rules to determine where to send those events. The rules determine the downstream targets that receive and process the events, and EventBridge routes the events accordingly.

EventBridge allows customers to monitor, audit, and react, in near real-time, to changes in their AWS environments through events generated by AWS CloudTrail for AWS API calls. CloudTrail records actions taken by a user, role, or an AWS service as events in a trail. Events include actions taken in the AWS Management Console, AWS Command Line Interface (CLI), and AWS SDKs and APIs.

Previously, only mutating API calls are published from CloudTrail to EventBridge for control plane changes. Events for mutating API calls include those that create, update, or delete resources. Control plane changes are also referred to as management events. EventBridge now supports non-mutating or read-only API calls for management events at no additional cost. These include those that list, get, or describe resources.

CloudTrail events in EventBridge enable you to build rich event-driven responses from any action taken on AWS infrastructure in real time. Previously, customers and partners often used a polling model to iterate over a batch of CloudTrail logs from Amazon S3 buckets to detect issues, making it slower to respond. The launch of read-only management events enables you to detect and remediate issues in near real-time and thus improve the overall security posture.

Enabling read-only management events

You can start receiving read-only management events if a CloudTrail is configured in your account and if the event selector for that CloudTrail is configured with ReadWriteType of either All or ReadOnly. This ensures that the read-only management events are logged in the CloudTrail and are then passed to EventBridge.

For example, you can receive an alert if a production account lists resources from an IP address outside of your VPC. Another example could be if an entity, such as a principal (AWS account root user, IAM role, or IAM user), calls the ListInstanceProfilesForRole or DescribeInstances APIs without any prior record of doing so. A malicious actor could use stolen credentials for conducting this reconnaissance to find more valuable credentials or determine the capabilities of the credentials they have.

To enable read-only management events with an EventBridge rule, in addition to the existing mutating events, use the new ENABLED_WITH_ALL_CLOUDTRAIL_MANAGEMENT_EVENTS state when creating the rule.

Resources:
  SampleMgmtRule:
    Type: 'AWS::Events::Rule'
    Properties:
      Description: 'Example for enabling read-only management events'
      State: ENABLED_WITH_ALL_CLOUDTRAIL_MANAGEMENT_EVENTS
      EventPattern:
        source:
          - aws.s3
        detail:
          - AWS API Call via CloudTrail
      Targets:
        - Arn: 'arn:aws:sns:us-east-1:123456789012:notificationTopic'
          Id: 'NotificationTarget'

You can also create a rule on an event bus to specify a particular API action by using the eventName attribute under the detail key:

aws events put-rule --name "SampleTestRule" \
--event-pattern '{"detail": {"eventName": ["ListBuckets"]}}' \
--state ENABLED_WITH_ALL_CLOUDTRAIL_MANAGEMENT_EVENTS --region us-east-1

Common scenarios and use-cases

The following section describes a couple of the scenarios where you can set up EventBridge rules and take actions on non-mutating or read-only management events.

Detecting anomalous Secrets Manager GetSecretValue API Calls

Consider the security team of your organization that wants a notification whenever GetSecretValue API calls for AWS Secrets Manager are made through the CLI. They can then investigate if these calls are made by entities outside their organization or by an unauthorized user and take corrective actions to deny such requests.

When the application calls the GetSecretValue API to retrieve the secrets via a CLI, it generates an event like this:

{
    "version": "0",
    "id": "d3368cc1-e6d6-e4bf-e58e-030f03b6eae3",
    "detail-type": "AWS API Call via CloudTrail",
    "source": "aws.secretsmanager",
    "account": "111111111111",
    "time": "2023-11-08T19:58:38Z",
    "region": "us-east-1",
    "resources": [],
    "detail": {
        "eventVersion": "1.08",
        "userIdentity": {
            "type": "IAMUser",
            "principalId": "AAAAAAAAAAAAA",
            "arn": "arn:aws:iam:: 111111111111:user/USERNAME"
            // ... additional detail fields
        },
        "eventTime": "2023-11-08T19:58:38Z",
        "eventSource": "secretsmanager.amazonaws.com",
        "eventName": "GetSecretValue",
        "awsRegion": "us-east-1",
        "userAgent": "aws-cli/2.13.15 Python/3.11.4 Darwin/22.6.0 exe/x86_64 prompt/off command/secretsmanager.get-secret-value"
        // ... additional detail fields
    }
}

You set the following event pattern on the rule to filter incoming events to specific consumers. This example also uses the recently launched wildcard filter for event matching.

{
    "source": [
        "aws.secretsmanager"
    ],
    "detail-type": [
        "AWS API Call via CloudTrail"
    ],
    "detail": {
        "eventName": [
            "GetSecretValue"
        ],
        "userAgent": [
            {
                "wildcard": "aws-cli/*"
            }
        ]
    }
}

You can create a rule matching a combination of these event properties. In this case, you are matching for aws.secretsmanager as source, AWS API Call via CloudTrail as detail-type, GetSecretValue as detail.eventName and wildcard pattern on detail.userAgent for aws-cli/*. You can filter detail.userAgent with a wildcard to catch events that come from a particular application or user.

You can then route these events to a target like an Amazon CloudWatch Logs stream to record the change. You can also route them to Amazon SNS to get notified via email subscription. You can alternatively route them to an AWS Lambda function in which you perform custom business logic.

Creating an EventBridge rule for read-only management events

Create a rule on the default event bus using the new state ENABLED_WITH_ALL_CLOUDTRAIL_MANAGEMENT_EVENTS.

aws events put-rule --name "monitor-secretsmanager" \
--event-pattern '{"source": ["aws.secretsmanager"], "detail-type": ["AWS API Call via CloudTrail"], "detail": {"eventName": ["GetSecretValue"], "userAgent": [{ "wildcard": "aws-cli/*"} ]}}' \
--state ENABLED_WITH_ALL_CLOUDTRAIL_MANAGEMENT_EVENTS --region us-east-1

Rule details

Configure a target. In this case, the target service is CloudWatch Logs but you can configure any of the supported targets.

aws events put-targets --rule monitor-secretsmanager --targets Id=1,Arn=arn:aws:logs:us-east-1:ACCOUNT_ID:log-group:/aws/events/getsecretvaluelogs --region us-east-1

Target details

You can then use CloudWatch Log Insights to search and analyze log data using the CloudWatch Log Insights query syntax where you can retrieve the user who performed these calls.

Identifying suspicious data exfiltration behavior

Consider the security or data perimeter team who wants to secure data residing in Amazon S3 buckets. The team requires notifications whenever API calls to list S3 buckets or to list S3 objects are made.

When a user or application calls the ListBuckets API to discover the available buckets, it generates the following CloudTrail event:

{
    "version": "0",
    "id": "345ca690-6510-85b2-ff02-090493a33cf1",
    "detail-type": "AWS API Call via CloudTrail",
    "source": "aws.s3",
    "account": "111111111111",
    "time": "2023-11-14T17:25:30Z",
    "region": "us-east-1",
    "resources": [],
    "detail": {
        "eventVersion": "1.09",
        "userIdentity": {
            "type": "IAMUser",
            "principalId": "principal-identity-uuid",
            "arn": "arn:aws:iam::111111111111:user/exploited-user",
            "accountId": "111111111111",
            "accessKeyId": "AAAABBBBCCCCDDDDEEEE",
            "userName": "exploited-user"
        },
        "eventTime": "2023-11-14T17:25:30Z",
        "eventSource": "s3.amazonaws.com",
        "eventName": "ListBuckets",
        "awsRegion": "us-east-1",
        "sourceIPAddress": "11.22.33.44",
        "userAgent": "[aws-cli/2.13.29 Python/3.11.6 Darwin/22.6.0 exe/x86_64 prompt/off command/s3api.list-buckets]",
        "requestParameters": {
            "Host": "s3.us-east-1.amazonaws.com"
        },
        "readOnly": true,
        "eventType": "AwsApiCall",
        "managementEvent": true
        // additional detail fields
    }
}

In this scenario, you can create an EventBridge rule matching for aws.s3 for the source field, and ListBuckets for the eventName.

{
    "source": [
        "aws.s3"
    ],
    "detail-type": [
        "AWS API Call via CloudTrail"
    ],
    "detail": {
        "eventName": [
            "ListBuckets "
        ]
    }
}

However, listing objects alone might only be the beginning of a potential data exfiltration attempt. You may also want to check for ListObjects or ListObjectsV2 as the next action, followed by a large number of GetObject API calls. You can create the following rule to match those actions.

{
    "source": [
        "aws.s3"
    ],
    "detail-type": [
        "AWS API Call via CloudTrail"
    ],
    "detail": {
        "eventName": [
            "ListObjects",
            "ListObjectsV2",
            "GetObject"
        ]
    }
}

You could potentially forward this log information to your central security logging solution or use anomaly detection machine learning models to evaluate these events to determine the appropriate response to these events.

Configuring cross-account and cross-Region event routing

You can also create rules to receive the read-only events to cross account or cross-Region to centralize your AWS events into one Region or one AWS account for auditing and monitoring purposes. For example, capture all workload events from multiple Regions in eu-west-1 for compliance reporting.

Cross Account example for default event bus and custom event bus

To do this, create a rule using the new ENABLED_WITH_ALL_CLOUDTRAIL_MANAGEMENT_EVENTS state on the default bus of the source account or the Region, targeting either default event bus or a custom event bus of the target account or Region. You must also ensure you have a rule configured with ENABLED_WITH_ALL_CLOUDTRAIL_MANAGEMENT_EVENTS to be able to invoke the targets in the destination account or Region.

Cross-Region setup for CloudTrail read-only events

Conclusion

This blog shows how customers can build rich event-driven responses with the newly launched support for read-only events. You can now observe events as potential signals of reconnaissance and data exfiltration activities from any action taken on AWS infrastructure in near real time. You can also use the cross-Region and cross-account functionality to deliver the read-only events to a centralized AWS account or Region, enhancing the capability for auditing and monitoring across all your AWS environments.

For more serverless learning resources, visit Serverless Land.

Introducing advanced logging controls for AWS Lambda functions

2023-11-16 David Boyne

Post Syndicated from David Boyne original https://aws.amazon.com/blogs/compute/introducing-advanced-logging-controls-for-aws-lambda-functions/

This post is written by Nati Goldberg, Senior Solutions Architect and Shridhar Pandey, Senior Product Manager, AWS Lambda

Today, AWS is launching advanced logging controls for AWS Lambda, giving developers and operators greater control over how function logs are captured, processed, and consumed.

This launch introduces three new capabilities to provide a simplified and enhanced default logging experience on Lambda.

First, you can capture Lambda function logs in JSON structured format without having to use your own logging libraries. JSON structured logs make it easier to search, filter, and analyze large volumes of log entries.

Second, you can control the log level granularity of Lambda function logs without making any code changes, enabling more effective debugging and troubleshooting.

Third, you can also set which Amazon CloudWatch log group Lambda sends logs to, making it easier to aggregate and manage logs at scale.

Overview

Being able to identify and filter relevant log messages is essential to troubleshoot and fix critical issues. To help developers and operators monitor and troubleshoot failures, the Lambda service automatically captures and sends logs to CloudWatch Logs.

Previously, Lambda emitted logs in plaintext format, also known as unstructured log format. This unstructured format could make the logs challenging to query or filter. For example, you had to search and correlate logs manually using well-known string identifiers such as “START”, “END”, “REPORT” or the request id of the function invocation. Without a native way to enrich application logs, you needed custom work to extract data from logs for automated analysis or to build analytics dashboards.

Previously, operators could not control the level of log detail generated by functions. They relied on application development teams to make code changes to emit logs with the required granularity level, such as INFO, DEBUG, or ERROR.

Lambda-based applications often comprise microservices, where a single microservice is composed of multiple single-purpose Lambda functions. Before this launch, Lambda sent logs to a default CloudWatch log group created with the Lambda function with no option to select a log group. Now you can aggregate logs from multiple functions in one place so you can uniformly apply security, governance, and retention policies to your logs.

Capturing Lambda logs in JSON structured format

Lambda now natively supports capturing structured logs in JSON format as a series of key-value pairs, making it easier to search and filter logs more easily.

JSON also enables you to add custom tags and contextual information to logs, enabling automated analysis of large volumes of logs to help understand the function performance. The format adheres to the OpenTelemetry (OTel) Logs Data Model, a popular open-source logging standard, enabling you to use open-source tools to monitor functions.

To set the log format in the Lambda console, select the Configuration tab, choose Monitoring and operations tools on the left pane, then change the log format property:

Currently, Lambda natively supports capturing application logs (logs generated by the function code) and system logs (logs generated by the Lambda service) in JSON structured format.

This is for functions that use non-deprecated versions of Python, Node.js, and Java Lambda managed runtimes, when using Lambda recommended logging methods such as using logging library for Python, console object for Node.js, and LambdaLogger or Log4j for Java.

For other managed runtimes, Lambda currently only supports capturing system logs in JSON structured format. However, you can still capture application logs in JSON structured format for these runtimes by manually configuring logging libraries. See configuring advanced logging controls section in the Lambda Developer Guide to learn more. You can also use Powertools for AWS Lambda to capture logs in JSON structured format.

Changing log format from text to JSON can be a breaking change if you parse logs in a telemetry pipeline. AWS recommends testing any existing telemetry pipelines after switching log format to JSON.

Working with JSON structured format for Node.js Lambda functions

You can use JSON structured format with CloudWatch Embedded Metric Format (EMF) to embed custom metrics alongside JSON structured log messages, and CloudWatch automatically extracts the custom metrics for visualization and alarming. However, to use JSON log format along with EMF libraries for Node.js Lambda functions, you must use the latest version of the EMF client library for Node.js or the latest version of Powertools for AWS Lambda (TypeScript) library.

Configuring log level granularity for Lambda function

You can now filter Lambda logs by log level, such as ERROR, DEBUG, or INFO, without code changes. Simplified log level filtering enables you to choose the required logging granularity level for Lambda functions, without sifting through large volumes of logs to debug errors.

You can specify separate log level filters for application logs (which are logs generated by the function code) and system logs (which are logs generated by the Lambda service, such as START and REPORT log messages). Note that log level controls are only available if the log format of the function is set to JSON.

The Lambda console allows setting both the Application log level and System log level properties:

You can define the granularity level of each log event in your function code. The following statement prints out the event input of the function, emitted as a DEBUG log message:

console.debug(event);

Once configured, log events emitted with a lower log level than the one selected are not published to the function’s CloudWatch log stream. For example, setting the function’s log level to INFO results in DEBUG log events being ignored.

This capability allows you to choose the appropriate amount of logs emitted by functions. For example, you can set a higher log level to improve the signal-to-noise ratio in production logs, or set a lower log level to capture detailed log events for testing or troubleshooting purposes.

Customizing Lambda function’s CloudWatch log group

Previously, you could not specify a custom CloudWatch log group for functions, so you could not stream logs from multiple functions into a shared log group. Also, to set custom retention policy for multiple log groups, you had to create each log group separately using a pre-defined name (for example, /aws/lambda/<function name>).

Now you can select a custom CloudWatch log group to aggregate logs from multiple functions automatically within an application in one place. You can apply security, governance, and retention policies at the application level instead of individually to every function.

To distinguish between logs from different functions in a shared log group, each log stream contains the Lambda function name and version.

You can share the same log group between multiple functions to aggregate logs together. The function’s IAM policy must include the logs:CreateLogStream and logs:PutLogEvents permissions for Lambda to create logs in the specified log group. The Lambda service can optionally create these permissions, when you configure functions in the Lambda console.

You can set the custom log group in the Lambda console by entering the destination log group name. If the entered log group does not exist, Lambda creates it automatically.

Advanced logging controls for Lambda can be configured using Lambda API, AWS Management Console, AWS Command Line Interface (CLI), and infrastructure as code (IaC) tools such as AWS Serverless Application Model (AWS SAM) and AWS CloudFormation.

Example of Lambda advanced logging controls

This section demonstrates how to use the new advanced logging controls for Lambda using AWS SAM to build and deploy the resources in your AWS account.

Overview

The following diagram shows Lambda functions processing newly created objects inside an Amazon S3 bucket, where both functions emit logs into the same CloudWatch log group:

The architecture includes the following steps:

A new object is created inside an S3 bucket.
S3 publishes an event using S3 Event Notifications to Amazon EventBridge.
EventBridge triggers two Lambda functions asynchronously.
Each function processes the object to extract labels and text, using Amazon Rekognition and Amazon Textract.
Both functions then emit logs into the same CloudWatch log group.

This uses AWS SAM to define the Lambda functions and configure the required logging controls. The IAM policy allows the function to create a log stream and emit logs to the selected log group:

DetectLabelsFunction:
    Type: AWS::Serverless::Function 
    Properties:
      CodeUri: detect-labels/
      Handler: app.lambdaHandler
      Runtime: nodejs18.x
      Policies:
        ...
        - Version: 2012-10-17
          Statement:
            - Sid: CloudWatchLogGroup
              Action: 
                - logs:CreateLogStream
                - logs:PutLogEvents
              Resource: !GetAtt CloudWatchLogGroup.Arn
              Effect: Allow
      LoggingConfig:
        LogFormat: JSON 
        ApplicationLogLevel: DEBUG 
        SystemLogLevel: INFO 
        LogGroup: !Ref CloudWatchLogGroup

Deploying the example

To deploy the example:

Clone the GitHub repository and explore the application.

git clone https://github.com/aws-samples/advanced-logging-controls-lambda/

cd advanced-logging-controls-lambda

Use AWS SAM to build and deploy the resources to your AWS account. This compiles and builds the application using npm, and then populate the template required to deploy the resources:
```
sam build
```
Deploy the solution to your AWS account with a guided deployment, using AWS SAM CLI interactive flow:
```
sam deploy --guided
```
Enter the following values:
- Stack Name: advanced-logging-controls-lambda
- Region: your preferred Region (for example, us-east-1)
- Parameter UploadsBucketName: enter a unique bucket name.
- Accept the rest of the initial defaults.
To test the application, use the AWS CLI to copy the sample image into the S3 bucket you created.
```
aws s3 cp samples/skateboard.jpg s3://example-s3-images-bucket
```

Explore CloudWatch Logs to view the logs emitted into the log group created, AggregatedLabelsLogGroup:

The DetectLabels Lambda function emits DEBUG log events in JSON format to the log stream. Log events with the same log level from the ExtractText Lambda function are omitted. This is a result of the different application log level settings for each function (DEBUG and INFO).

You can also use CloudWatch Logs Insights to search, filter, and analyze the logs in JSON format using this sample query:

You can see the results:

Conclusion

Advanced logging controls for Lambda give you greater control over logging. Use advanced logging controls to control your Lambda function’s log level and format, allowing you to search, query, and filter logs to troubleshoot issues more effectively.

You can also choose the CloudWatch log group where Lambda sends your logs. This enables you to aggregate logs from multiple functions into a single log group, apply retention, security, governance policies, and easily manage logs at scale.

To get started, specify the required settings in the Logging Configuration for any new or existing Lambda functions.

Advanced logging controls for Lambda are available in all AWS Regions where Lambda is available at no additional cost. Learn more about AWS Lambda Advanced Logging Controls.

For more serverless learning resources, visit Serverless Land.

Introducing the AWS Integrated Application Test Kit (IATK)

2023-11-16 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/aws-integrated-application-test-kit/

This post is written by Dan Fox, Principal Specialist Solutions Architect, and Brian Krygsman, Senior Solutions Architect.

Today, AWS announced the public preview launch of the AWS Integrated Application Test Kit (IATK). AWS IATK is a software library that helps you write automated tests for cloud-based applications. This blog post presents several initial features of AWS IATK, and then shows working examples using an example video processing application. If you are getting started with serverless testing, learn more at serverlessland.com/testing.

Overview

When you create applications composed of serverless services like AWS Lambda, Amazon EventBridge, or AWS Step Functions, many of your architecture components cannot be deployed to your desktop, but instead only exist in the AWS Cloud. In contrast to working with applications deployed locally, these types of applications benefit from cloud-based strategies for performing automated tests. For its public preview launch, AWS IATK helps you implement some of these strategies for Python applications. AWS IATK will support other languages in future launches.

Locating resources for tests

When you write automated tests for cloud resources, you need the physical IDs of your resources. The physical ID is the name AWS assigns to a resource after creation. For example, to send requests to Amazon API Gateway you need the physical ID, which forms the API endpoint.

If you deploy cloud resources in separate infrastructure as code stacks, you might have difficulty locating physical IDs. In CloudFormation, you create the logical IDs of the resources in your template, as well as the stack name. With IATK, you can get the physical ID of a resource if you provide the logical ID and stack name. You can also get stack outputs by providing the stack name. These convenient methods simplify locating resources for the tests that you write.

Creating test harnesses for event driven architectures

To write integration tests for event driven architectures, establish logical boundaries by breaking your application into subsystems. Your subsystems should be simple enough to reason about, and contain understandable inputs and outputs. One useful technique for testing subsystems is to create test harnesses. Test harnesses are resources that you create specifically for testing subsystems.

For example, an integration test can begin a subsystem process by passing an input test event to it. IATK can create a test harness for you that listens to Amazon EventBridge for output events. (Under the hood, the harness is composed of an EventBridge Rule that forwards the output event to Amazon Simple Queue Service.) Your integration test then queries the test harness to examine the output and determine if the test passes or fails. These harnesses help you create integration tests in the cloud for event driven architectures.

Establishing service level agreements to test asynchronous features

If you write a synchronous service, your automated tests make requests and expect immediate responses. When your architecture is asynchronous, your service accepts a request and then performs a set of actions at a later time. How can you test for the success of an activity if it does not have a specified duration?

Consider creating reasonable timeouts for your asynchronous systems. Document timeouts as service level agreements (SLAs). You may decide to publish your SLAs externally or to document them as internal standards. IATK contains a polling feature that allows you to establish timeouts. This feature helps you to test that your asynchronous systems complete tasks in a timely manner.

Using AWS X-Ray for detailed testing

If you want to gain more visibility into the interior details of your application, instrument with AWS X-Ray. With AWS X-Ray, you trace the path of an event through multiple services. IATK provides conveniences that help you set the AWS X-Ray sampling rate, get trace trees, and assert for trace durations. These features help you observe and test your distributed systems in greater detail.

Learn more about testing asynchronous architectures at aws-samples/serverless-test-samples.

Overview of the example application

To demonstrate the features of IATK, this post uses a portion of a serverless video application designed with a plugin architecture. A core development team creates the primary application. Distributed development teams throughout the organization create the plugins. One AWS CloudFormation stack deploys the primary application. Separate stacks deploy each plugin.

Communications between the primary application and the plugins are managed by an EventBridge bus. Plugins pull application lifecycle events off the bus and must put completion notification events back on the bus within 20 seconds. For testing, the core team has created an AWS Step Functions workflow that mimics the production process by emitting properly formatted example lifecycle events. Developers run this test workflow in development and test environments to verify that their plugins are communicating properly with the event bus.

The following demonstration shows an integration test for the example application that validates plugin behavior. In the integration test, IATK locates the Step Functions workflow. It creates a test harness to listen for the event completion notification to be sent by the plugin. The test then runs the workflow to begin the lifecycle process and start plugin actions. Then IATK uses a polling mechanism with a timeout to verify that the plugin complies with the 20 second service level agreement. This is the sequence of processing:

The integration test starts an execution of the test workflow.
The workflow puts a lifecycle event onto the bus.
The plugin pulls the lifecycle event from the bus.
When the plugin is complete, it puts a completion event onto the bus.
The integration test polls for the completion event to determine if the test passes within the SLA.

Deploying and testing the example application

Follow these steps to review this application, build it locally, deploy it in your AWS account, and test it.

Downloading the example application

Open your terminal and clone the example application from GitHub with the following command or download the code. This repository also includes other example patterns for testing serverless applications.
```
git clone https://github.com/aws-samples/serverless-test-samples
```
The root of the IATK example application is in python-test-samples/integrated-application-test-kit. Change to this directory:
```
cd serverless-test-samples/python-test-samples/integrated-application-test-kit
```

Reviewing the integration test

Before deploying the application, review how the integration test uses the IATK by opening plugins/2-postvalidate-plugins/python-minimal-plugin/tests/integration/test_by_polling.py in your text editor. The test class instantiates the IATK at the top of the file.

iatk_client = aws_iatk.AwsIatk(region=aws_region)

In the setUp() method, the test class uses IATK to fetch CloudFormation stack outputs. These outputs are references to deployed cloud components like the plugin tester AWS Step Functions workflow:

stack_outputs = self.iatk_client.get_stack_outputs(
    stack_name=self.plugin_tester_stack_name,
    output_names=[
        "PluginLifecycleWorkflow",
        "PluginSuccessEventRuleName"
    ],
)

The test class attaches a listener to the default event bus using an Event Rule provided in the stack outputs. The test uses this listener later to poll for events.

add_listener_output = self.iatk_client.add_listener(
    event_bus_name="default",
    rule_name=self.existing_rule_name
)

The test class cleans up the listener in the tearDown() method.

self.iatk_client.remove_listeners(
    ids=[self.listener_id]
)

Once the configurations are complete, the method test_minimal_plugin_event_published_polling() implements the actual test.

The test first initializes the trigger event.

trigger_event = {
    "eventHook": "postValidate",
    "pluginTitle": "PythonMinimalPlugin"
}

Next, the test starts an execution of the plugin tester Step Functions workflow. It uses the plugin_tester_arn that was fetched during setUp.

self.step_functions_client.start_execution(
    stateMachineArn=self.plugin_tester_arn,
    input=json.dumps(trigger_event)
)

The test polls the listener, waiting for the plugin to emit events. It stops polling once it hits the SLA timeout or receives the maximum number of messages.

poll_output = self.iatk_client.poll_events(
    listener_id=self.listener_id,
    wait_time_seconds=self.SLA_TIMEOUT_SECONDS,
    max_number_of_messages=1,
)

Finally, the test asserts that it receives the right number of events, and that they are well-formed.

self.assertEqual(len(poll_output.events), 1)
self.assertEqual(received_event["source"], "video.plugin.PythonMinimalPlugin")
self.assertEqual(received_event["detail-type"], "plugin-complete")

Installing prerequisites

You need the following prerequisites to build this example:

An AWS account
The AWS Serverless Application Model (AWS SAM) CLI with credentials that can manage AWS resources
Python 3.11
Node.js 18.x
Docker (optional, but recommended for building Python applications with AWS SAM)

Build and deploy the example application components

Use AWS SAM to build and deploy the plugin tester to your AWS account. The plugin tester is the Step Functions workflow shown in the preceding diagram. During the build process, you can add the --use-container flag to the build command to instruct AWS SAM to create the application in a provided container. You can accept or override the default values during the deploy process. You will use “Stack Name” and “AWS Region” later to run the integration test.
```
cd plugins/plugin_tester # Move to the plugin tester directory

sam build --use-container # Build the plugin tester
```

Deploy the tester:

sam deploy --guided # Deploy the plugin tester

Once the plugin tester is deployed, use AWS SAM to deploy the plugin.

cd ../2-postvalidate-plugins/python-minimal-plugin # Move to the plugin directory

sam build --use-container # Build the plugin

Deploy the plugin:
```
sam deploy --guided # Deploy the plugin
```

Running the test

You can run tests written with IATK using standard Python test runners like unittest and pytest. The example application test uses unittest.

1. Use a virtual environment to organize your dependencies. From the root of the example application, run:
```
python3 -m venv .venv # Create the virtual environment
source .venv/bin/activate # Activate the virtual environment
```
2. Install the dependencies, including the IATK:
```
cd tests 
pip3 install -r requirements.txt
```
3. Run the test, providing the required environment variables from the earlier deployments. You can find correct values in the samconfig.toml file of the plugin_tester directory.
```
cd integration

PLUGIN_TESTER_STACK_NAME=video-plugin-tester \
AWS_REGION=us-west-2 \
python3 -m unittest ./test_by_polling.py
```

You should see output as unittest runs the test.

Open the Step Functions console in your AWS account, then choose the PluginLifecycleWorkflow-<random value> workflow to validate that the plugin tester successfully ran. A recent execution shows a Succeeded status:

Review other IATK features

The example application includes examples of other IATK features like generating mock events and retrieving AWS X-Ray traces.

Cleaning up

Use AWS SAM to clean up both the plugin and the plugin tester resources from your AWS account.

Delete the plugin resources:

cd ../.. # Move to the plugin directory
sam delete # Delete the plugin

Delete the plugin tester resources:

cd ../../plugin_tester # Move to the plugin tester directory
sam delete # Delete the plugin tester

The temporary test harness resources that IATK created during the test are cleaned up when the tearDown method runs. If there are problems during teardown, some resources may not be deleted. IATK adds tags to all resources that it creates. You can use these tags to locate the resources then manually remove them. You can also add your own tags.

Conclusion

The AWS Integrated Application Test Kit is a software library that provides conveniences to help you write automated tests for your cloud applications. This blog post shows some of the features of the initial Python version of the IATK.

To learn more about automated testing for serverless applications, visit serverlessland.com/testing. You can also view code examples at serverlessland.com/testing/patterns or at the AWS serverless-test-samples repository on GitHub.

For more serverless learning resources, visit Serverless Land.

Triggering AWS Lambda function from a cross-account Amazon Managed Streaming for Apache Kafka

2023-11-16 Marcia Villalba

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/compute/triggering-aws-lambda-function-from-a-cross-account-amazon-managed-streaming-for-apache-kafka/

This post is written by Subham Rakshit, Senior Specialist Solutions Architect, and Ismail Makhlouf, Senior Specialist Solutions Architect.

Many organizations use a multi-account strategy for stream processing applications. This involves decomposing the overall architecture into a single producer account and many consumer accounts. Within AWS, in the producer account, you can use Amazon Managed Streaming for Apache Kafka (Amazon MSK), and in their consumer accounts have AWS Lambda functions for event consumption. This blog post explains how you can trigger Lambda functions from a cross-account Amazon MSK cluster.

The Lambda event sourcing mapping (ESM) for Amazon MSK continuously polls for new events from the Amazon MSK cluster, aggregates them into batches, and then triggers the target Lambda function. The ESM for Amazon MSK functions as a serverless set of Kafka consumers that ensures that each event is processed at least once. Additionally, events are processed in the same order they are received within each Kafka partition. In addition, the ESM batches the stream of data and filters the events based on configured logic.

Overview

Amazon MSK supports two different deployment types: provisioned and serverless. Triggering a Lambda function from a cross-account Amazon MSK cluster is only supported with a provisioned cluster deployed within the same Region. To facilitate this functionality, Amazon MSK uses multi-VPC private connectivity, powered by AWS PrivateLink, which simplifies connecting Kafka consumers hosted in different AWS accounts to an Amazon MSK cluster.

The following diagram illustrates the architecture of this example:

The architecture is divided in two parts: the producer and the consumer.

In the producer account, you have the Amazon MSK cluster with multi-VPC connectivity enabled. Multi-VPC connectivity is only available for authenticated Amazon MSK clusters. Cluster policies are required to grant permissions to other AWS accounts, allowing them to establish private connectivity to the Amazon MSK cluster. You can delegate permissions to relevant roles or users. When combined with AWS Identity and Access Management (IAM) client authentication, cluster policies offer fine-grained control over Kafka data plane permissions for connecting applications.

In the consumer account, you have the Lambda ESM for Amazon MSK and the managed VPC connection deployed within the same VPC. The managed VPC connection allows private connectivity from the consumer application VPC to the Amazon MSK cluster. The Lambda ESM for Amazon MSK connects to the cross-account Amazon MSK cluster via IAM authentication. It also supports SASL/SCRAM, and mutual TLS (mTLS) authenticated clusters. The ESM receives the event from the Kafka topic and invokes the Lambda function to process it.

Deploying the example application

To set up the Lambda function trigger from a cross-account Amazon MSK cluster as the event source, follow these steps. The AWS CloudFormation templates for deploying the example are accessible in the GitHub repository.

As a part of this example, some sample data is published using the Kafka console producer and Lambda processes these events and writes to Amazon S3.

Pre-requisites

For this example, you need two AWS accounts. This post uses the following naming conventions:

Producer (for example, account No: 1111 1111 1111): Account that hosts the Amazon MSK cluster and Kafka client instance.
Consumer (for example, account No: 2222 2222 2222): Account that hosts the Lambda function and consumes events from Amazon MSK.

To get started:

Clone the repository locally:
git clone https://github.com/aws-samples/lambda-cross-account-msk.git
Set up the producer account: you must configure the VPC networking, deploy the Amazon MSK cluster, and a Kafka client instance to publish data. To do this, deploy the CloudFormation template producer-account.yaml from the AWS console and take note of the MSKClusterARN from the CloudFormation outputs tab.
Set up the consumer account: To set up the consumer account, you need the Lambda function, IAM role used by the Lambda function, and S3 bucket receiving the data. For this, deploy the CloudFormation template consumer-account.yaml from the AWS console with the input parameter MSKAccountId, that is the producer AWS account ID (for example, account Id: 1111 1111 1111). Note the LambdaRoleArn from the CloudFormation outputs tab.

Setting up multi-VPC connectivity in the Amazon MSK cluster

Once the accounts are created, you must enable connectivity between them. By enabling multi-VPC private connectivity in the Amazon MSK cluster, you set up the network connection to allow the cross-account consumers to connect to the cluster.

In the producer account, navigate to the Amazon MSK console.
Choose producer-cluster, and go to the Properties tab.
Scroll to Networking settings, choose Edit, and select Turn on multi-VPC connectivity. This takes some time, then appears as follows.
Add the necessary cluster policy to allow cross-account consumers to connect to Amazon MSK. In the producer account, deploy the CloudFormation template producer-msk-cluster-policy.yaml from the AWS console with the following input parameters:

- MSKClusterArn – Amazon Resource Name (ARN) of the Amazon MSK cluster in producer account. Find this information in the CloudFormation output of producer-account.yaml.
- LambdaRoleArn – ARN of the IAM role attached to the Lambda function in the consumer account. Find this information in the CloudFormation output of consumer-account.yaml.
- LambdaAccountId – Consumer AWS account ID (for example, account Id: 2222 2222 2222).

Creating a Kafka topic in Amazon MSK and publishing events

In the producer account, navigate to the Amazon MSK console. Choose the Amazon MSK cluster named producer-cluster. Choose View client information to show the bootstrap server.

The CloudFormation template also deploys a Kafka client instance to create topics and publish events.

To access the client, go to the Amazon EC2 console and choose the instance producer-KafkaClientInstance1. Connect to EC2 instance with Session Manager:

sudo su - ec2-user
#Set MSK Broker IAM endpoint
export BS=<<Provide IAM bootstrap address here>>

You must use the single-VPC Private endpoint for the Amazon MSK cluster and not the multi-VPC private endpoint, as you are going to publish events from a Kafka console producer from the producer account.

Run these scripts to create the customer topic and publish sample events in the topic:

./kafka_create_topic.sh
./kafka_produce_events.sh

Creating a managed VPC connection in the consumer account

To establish a connection to the Amazon MSK cluster in the producer account, you must create a managed VPC connection in the consumer account. Lambda communicates with cross-account Amazon MSK through this managed VPC connection.

For detailed setup steps, read the Amazon MSK managed VPC connection documentation.

Configuring the Lambda ESM for Amazon MSK

The final step is to set up the Lambda ESM for Amazon MSK. Setting up the ESM enables you to connect to the Amazon MSK cluster in the producer account via the managed VPC endpoint. This allows you to trigger the Lambda function to process the data produced from the Kafka topic:

In the consumer account, go to the Lambda console.
Open the Lambda function msk-lambda-cross-account-iam.
Go to the Configuration tab, select Triggers, and choose Add Trigger.
For Trigger configuration, select Amazon MSK.

To configure this trigger:

Select the shared Amazon MSK cluster. This automatically defaults to the IAM authentication that is used to connect to the cluster.
By default, the Active trigger check box is enabled. This ensures that the trigger is in the active state after creation. For the other values:
1. Keep the Batch size default to 100.
2. Change the Starting Position to Trim horizon.
3. Set the Topic name as customer.
4. Set the Consumer Group ID as msk-lambda-iam.

Scroll to the bottom and choose Add. This starts creating the Amazon MSK trigger, which takes several minutes. After creation, the state of the trigger shows as Enabled.

Verifying the output on the consumer side

The Lambda function receives the events and writes them in an S3 bucket.

To validate that the function is working, go to the consumer account and navigate to the S3 console. Search for the cross-account-lambda-consumer-data-<<REGION>>-<<AWS Account Id>> bucket. In the bucket, you see the customer-data-<<datetime>>.csv files.

Cleaning up

You must empty and delete the S3 bucket, managed VPC connection, and the Lambda ESM for Amazon MSK manually from the consumer account. Next, delete the CloudFormation stacks from the AWS console from both the producer and consumer accounts to remove all other resources created as a part of the example.

Conclusion

With Lambda and Amazon MSK, you can now build a decentralized application distributed across multiple AWS accounts. This post shows how you can set up Amazon MSK as an event source for cross-account Lambda functions and also walks you through the configuration required in both producer and consumer accounts.

For further reading on AWS Lambda with Amazon MSK as an event source, visit the documentation.

For more serverless learning resources, visit Serverless Land.

Node.js 20.x runtime now available in AWS Lambda

2023-11-15 Pascal Vogel

Post Syndicated from Pascal Vogel original https://aws.amazon.com/blogs/compute/node-js-20-x-runtime-now-available-in-aws-lambda/

This post is written by Pascal Vogel, Solutions Architect, and Andrea Amorosi, Senior Solutions Architect.

You can now develop AWS Lambda functions using the Node.js 20 runtime. This Node.js version is in active LTS status and ready for general use. To use this new version, specify a runtime parameter value of nodejs20.x when creating or updating functions or by using the appropriate container base image.

You can use Node.js 20 with Powertools for AWS Lambda (TypeScript), a developer toolkit to implement serverless best practices and increase developer velocity. Powertools for AWS Lambda includes proven libraries to support common patterns such as observability, Parameter Store integration, idempotency, batch processing, and more.

You can also use Node.js 20 with Lambda@Edge, allowing you to customize low-latency content delivered through Amazon CloudFront.

This blog post highlights important changes to the Node.js runtime, notable Node.js language updates, and how you can use the new Node.js 20 runtime in your serverless applications.

Node.js 20 runtime updates

Changes to Root CA certificate loading

By default, Node.js includes root certificate authority (CA) certificates from well-known certificate providers. Earlier Lambda Node.js runtimes up to Node.js 18 augmented these certificates with Amazon-specific CA certificates, making it easier to create functions accessing other AWS services. For example, it included the Amazon RDS certificates necessary for validating the server identity certificate installed on your Amazon RDS database.

However, loading these additional certificates has a performance impact during cold start. Starting with Node.js 20, Lambda no longer loads these additional CA certificates by default. The Node.js 20 runtime contains a certificate file with all Amazon CA certificates located at /var/runtime/ca-cert.pem. By setting the NODE_EXTRA_CA_CERTS environment variable to /var/runtime/ca-cert.pem, you can restore the behavior from Node.js 18 and earlier runtimes.

This causes Node.js to validate and load all Amazon CA certificates during a cold start. It can take longer compared to loading only specific certificates. For the best performance, we recommend bundling only the certificates that you need with your deployment package and loading them via NODE_EXTRA_CA_CERTS. The certificates file should consist of one or more trusted root or intermediate CA certificates in PEM format.

For example, for RDS, include the required certificates alongside your code as certificates/rds.pem and then load it as follows:

NODE_EXTRA_CA_CERTS=/var/task/certificates/rds.pem

See Using Lambda environment variables in the AWS Lambda Developer Guide for detailed instructions for setting environment variables.

Amazon Linux 2023

The Node.js 20 runtime is based on the provided.al2023 runtime. The provided.al2023 runtime in turn is based on the Amazon Linux 2023 minimal container image release and brings several improvements over Amazon Linux 2 (AL2).

provided.al2023 contains only the essential components necessary to install other packages and offers a smaller deployment footprint with a compressed image size of less than 40MB compared to the over 100MB AL2-based base image.

With glibc version 2.34, customers have access to a more recent version of glibc, updated from version 2.26 in AL2-based images.

The Amazon Linux 2023 minimal image uses microdnf as package manager, symlinked as dnf, replacing yum in AL2-based images. Additionally, curl and gnupg2 are also included as their minimal versions curl-minimal and gnupg2-minimal.

Learn more about the provided.al2023 runtime in the blog post Introducing the Amazon Linux 2023 runtime for AWS Lambda and the Amazon Linux 2023 launch blog post.

Runtime Interface Client

The Node.js 20 runtime uses the open source AWS Lambda NodeJS Runtime Interface Client (RIC). You can now use the same RIC version in your Open Container Initiative (OCI) Lambda container images as the one used by the managed Node.js 20 runtime.

The Node.js 20 runtime supports Lambda response streaming which enables you to send response payload data to callers as it becomes available. Response streaming can improve application performance by reducing time-to-first-byte, can indicate progress during long-running tasks, and allows you to build functions that return payloads larger than 6MB, which is the Lambda limit for buffered responses.

Setting Node.js heap memory size

Node.js allows you to configure the heap size of the v8 engine via the --max-old-space-size and --max-semi-space-size options. By default, Lambda overrides the Node.js default values with values derived from the memory size configured for the function. If you need control over your runtime’s memory allocation, you can now set both of these options using the NODE_OPTIONS environment variable, without needing an exec wrapper script. See Using Lambda environment variables in the AWS Lambda Developer Guide for details.

Use the --max-old-space-size option to set the max memory size of V8’s old memory section, and the --max-semi-space-size option to set the maximum semispace size for V8’s garbage collector. See the Node.js documentation for more details on these options.

Node.js 20 language updates

Language features

With this release, Lambda customers can take advantage of new Node.js 20 language features, including:

HTTP(S)/1.1 default keepAlive: Node.js now sets keepAlive to true by default. Any outgoing HTTPs connections use HTTP 1.1 keep-alive with a default waiting window of 5 seconds. This can deliver improved throughput as connections are reused by default.
Fetch API is enabled by default: The global Node.js Fetch API is enabled by default. However, it is still an experimental module.
Faster URL parsing: Node.js 20 comes with the Ada 2.0 URL parser which brings performance improvements to URL parsing. This has also been back-ported to Node.js 18.7.0.
Web Crypto API now stable: The Node.js implementation of the standard Web Crypto API has been marked as stable. You can access the provided cryptographic primitives through globalThis.crypto.
Web assembly support: Node.js 20 enables the experimental WebAssembly System Interface (WASI) API by default without the need to set an experimental flag.

For a detailed overview of Node.js 20 language features, see the Node.js 20 release blog post and the Node.js 20 changelog.

Performance considerations

Node.js 19.3 introduced a change that impacts how non-essential modules are lazy-loaded during the Node.js process startup. In terms of the impact to your Lambda functions, this reduces the work during initialization of each execution environment, then if used, the modules will instead be loaded during the first function invoke. This change remains in Node.js 20.

Builders should continue to measure and test function performance and optimize function code and configuration for any impact. To learn more about how to optimize Node.js performance in Lambda, see Performance optimization in the Lambda Operator Guide, and our blog post Optimizing Node.js dependencies in AWS Lambda.

Migration from earlier Node.js runtimes

Migration from Node.js 16

Lambda occasionally delays deprecation of a Lambda runtime for a limited period beyond the end of support date of the language version that the runtime supports. During this period, Lambda only applies security patches to the runtime OS. Lambda doesn’t apply security patches to programming language runtimes after they reach their end of support date.

In the case of Node.js 16, we have delayed deprecation from the community end of support date on September 11, 2023, to June 12, 2024. This gives customers the opportunity to migrate directly from Node.js 16 to Node.js 20, skipping Node.js 18.

AWS SDK for JavaScript

Up until Node.js 16, Lambda’s Node.js runtimes included the AWS SDK for JavaScript version 2. This has since been superseded by the AWS SDK for JavaScript version 3, which was released in December 2020. Starting with Node.js 18, and continuing with Node.js 20, the Lambda Node.js runtimes have upgraded the version of the AWS SDK for JavaScript included in the runtime from v2 to v3. Customers upgrading from Node.js 16 or earlier runtimes who are using the included AWS SDK for JavaScript v2 should upgrade their code to use the v3 SDK.

For optimal performance, and to have full control over your code dependencies, we recommend bundling and minifying the AWS SDK in your deployment package, rather than using the SDK included in the runtime. For more information, see Optimizing Node.js dependencies in AWS Lambda.

Using the Node.js 20 runtime in AWS Lambda

AWS Management Console

To use the Node.js 20 runtime to develop your Lambda functions, specify a runtime parameter value Node.js 20.x when creating or updating a function. The Node.js 20 runtime version is now available in the Runtime dropdown on the Create function page in the AWS Lambda console:

To update an existing Lambda function to Node.js 20, navigate to the function in the Lambda console, then choose Edit in the Runtime settings panel. The new version of Node.js is available in the Runtime dropdown:

AWS Lambda – Container Image

Change the Node.js base image version by modifying the FROM statement in your Dockerfile:


FROM public.ecr.aws/lambda/nodejs:20
# Copy function code
COPY lambda_handler.xx ${LAMBDA_TASK_ROOT}

Customers running Node.js 20 Docker images locally, including customers using AWS SAM, will need to upgrade their Docker install to version 20.10.10 or later.

AWS Serverless Application Model (AWS SAM)

In AWS SAM, set the Runtime attribute to node20.x to use this version:


AWSTemplateFormatVersion: "2010-09-09"
Transform: AWS::Serverless-2016-10-31

Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: lambda_function.lambda_handler
      Runtime: nodejs20.x
      CodeUri: my_function/.
      Description: My Node.js Lambda Function

AWS Cloud Development Kit (AWS CDK)

In the AWS CDK, set the runtime attribute to Runtime.NODEJS_20_X to use this version:


import * as cdk from "aws-cdk-lib";
import * as lambda from "aws-cdk-lib/aws-lambda";
import * as path from "path";
import { Construct } from "constructs";

export class CdkStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // The code that defines your stack goes here

    // The Node.js 20 enabled Lambda Function
    const lambdaFunction = new lambda.Function(this, "node20LambdaFunction", {
      runtime: lambda.Runtime.NODEJS_20_X,
      code: lambda.Code.fromAsset(path.join(__dirname, "/../lambda")),
      handler: "index.handler",
    });
  }
}

Conclusion

Lambda now supports Node.js 20. This release uses the Amazon Linux 2023 OS, supports configurable CA certificate loading for faster cold starts, as well as other improvements detailed in this blog post.

You can build and deploy functions using the Node.js 20 runtime using the AWS Management Console, AWS CLI, AWS SDK, AWS SAM, AWS CDK, or your choice of Infrastructure as Code (IaC). You can also use the Node.js 20 container base image if you prefer to build and deploy your functions using container images.

The Node.js 20 runtime empowers developers to build more efficient, powerful, and scalable serverless applications. Try the Node.js runtime in Lambda today and read about the Node.js programming model in the Lambda documentation to learn more about writing functions in Node.js 20.

For more serverless learning resources, visit Serverless Land.

Converting Apache Kafka events from Avro to JSON using EventBridge Pipes

2023-11-15 Pascal Vogel

Post Syndicated from Pascal Vogel original https://aws.amazon.com/blogs/compute/converting-apache-kafka-events-from-avro-to-json-using-eventbridge-pipes/

This post is written by Pascal Vogel, Solutions Architect, and Philipp Klose, Global Solutions Architect.

Event streaming with Apache Kafka has become an important element of modern data-oriented and event-driven architectures (EDAs), unlocking use cases such as real-time analytics of user behavior, anomaly and fraud detection, and Internet of Things event processing. Stream producers and consumers in Kafka often use schema registries to ensure that all components follow agreed-upon event structures when sending (serializing) and processing (deserializing) events to avoid application bugs and crashes.

A common schema format in Kafka is Apache Avro, which supports rich data structures in a compact binary format. To integrate Kafka with other AWS and third-party services more easily, AWS offers Amazon EventBridge Pipes, a serverless point-to-point integration service. However, many downstream services expect JSON-encoded events, requiring custom, and repetitive schema validation and conversion logic from Avro to JSON in each downstream service.

This blog post shows how to reliably consume, validate, convert, and send Avro events from Kafka to AWS and third-party services using EventBridge Pipes, allowing you to reduce custom deserialization logic in downstream services. You can also use EventBridge event buses as targets in Pipes to filter and distribute events from Pipes to multiple targets, including cross-account and cross-Region delivery.

This blog describes two scenarios:

Using Amazon Managed Streaming for Apache Kafka (Amazon MSK) and AWS Glue Schema Registry.
Using Confluent Cloud and the Confluent Schema Registry.

See the associated GitHub repositories for Glue Schema Registry or Confluent Schema Registry for full source code and detailed deployment instructions.

Kafka event streaming and schema validation on AWS

To build event streaming applications with Kafka on AWS, you can use Amazon MSK, offerings such as Confluent Cloud, or self-hosted Kafka on Amazon Elastic Compute Cloud (Amazon EC2) instances.

To avoid common issues in event streaming and event-driven architectures, such as data inconsistencies and incompatibilities, it is a recommended practice to define and share event schemas between event producers and consumers. In Kafka, schema registries are used to manage, evolve, and enforce schemas for event producers and consumers. The AWS Glue Schema Registry provides a central location to discover, manage, and evolve schemas. In the case of Confluent Cloud, the Confluent Schema Registry serves the same role. Both the Glue Schema Registry and the Confluent Schema Registry support common schema formats such as Avro, Protobuf, and JSON.

To integrate Kafka with AWS services, third-party services, and your own applications, you can use EventBridge Pipes. EventBridge Pipes helps you create point-to-point integrations between event sources and targets with optional filtering, transformation, and enrichment. EventBridge Pipes reduces the amount of integration code that you have to write and maintain when building EDAs.

Many AWS and third-party services expect JSON-encoded payloads (events) as input, meaning they cannot directly consume Avro or Protobuf payloads. To replace repetitive Avro-to-JSON validation and conversion logic in each consumer, you can use the EventBridge Pipes enrichment step. This solution uses an AWS Lambda function in the enrichment step to deserialize and validate Kafka events with a schema registry, including error handling with dead-letter queues, and convert events to JSON before passing them to downstream services.

Solution overview

The solution presented in this blog post consists of the following key elements:

The source of the pipe is a Kafka cluster deployed using MSK or Confluent Cloud. EventBridge Pipes reads events from the Kafka stream in batches and sends them to the enrichment function (see here for an example event).
The enrichment step (Lambda function) deserializes and validates the events against the configured schema registry (Glue or Confluent), converts events from Avro to JSON with integrated error handling, and returns them to the pipe.
The target of this example solution is an EventBridge custom event bus that is invoked by EventBridge Pipes with JSON-encoded events returned by the enrichment Lambda function. EventBridge Pipes supports a variety of other targets, including Lambda, AWS Step Functions, Amazon API Gateway, API destinations, and more, enabling you to build EDAs without writing integration code.
In this sample solution, the event bus sends all events to Amazon CloudWatch Logs via an EventBridge rule. You can extend the example to invoke additional EventBridge targets.

Optionally, you can add OpenAPI 3 or JSONSchema Draft 4 schemas for your events in the EventBridge schema registry by either manually generating it from the Avro schema or using EventBridge schema discovery. This allows you to download code bindings for the JSON-converted events for various programming languages, such as JavaScript, Python, and Java, to correctly use them in your EventBridge targets.

The remainder of this blog post describes this solution for the Glue and Confluent schema registries with code examples.

EventBridge Pipes with the Glue Schema Registry

This section describes how to implement event schema validation and conversion from Avro to JSON using EventBridge Pipes and the Glue Schema Registry. You can find the source code and detailed deployment instructions on GitHub.

Prerequisites

You need an Amazon MSK serverless cluster running and the Glue Schema registry configured. This example includes a Avro schema and a Glue Schema Registry. See the following AWS blog post for an introduction to schema validation with the Glue Schema Registry: Validate, evolve, and control schemas in Amazon MSK and Amazon Kinesis Data Streams with AWS Glue Schema Registry.

EventBridge Pipes configuration

Use the AWS Cloud Development Kit (AWS CDK) template provided in the GitHub repository to deploy:

An EventBridge pipe that connects to your existing Amazon MSK Serverless Kafka topic as the source via AWS Identity and Access Management (IAM) authentication.
EventBridge Pipes reads events from your Kafka topic using the Amazon MSK source type.
An enrichment Lambda function in Java to perform event deserialization, validation, and conversion from Avro to JSON.
An Amazon Simple Queue Service (Amazon SQS) dead letter queue to hold events for which deserialization failed.
An EventBridge custom event bus as the pipe target. An EventBridge rule sends all incoming events into a CloudWatch Logs log group.

For MSK-based sources, EventBridge supports configuration parameters, such as batch window, batch size, and starting position, which you can set using the parameters of the CfnPipe class in the example CDK stack.

The example EventBridge pipe consumes events from Kafka in batches of 10 because it is targeting an EventBridge event bus, which has a max batch size of 10. See batching and concurrency in the EventBridge Pipes User Guide to choose an optimal configuration for other targets.

EventBridge Pipes with the Confluent Schema Registry

This section describes how to implement event schema validation and conversion from Avro to JSON using EventBridge Pipes and the Confluent Schema Registry. You can find the source code and detailed deployment instructions on GitHub.

Prerequisites

To set up this solution, you need a Kafka stream running on Confluent Cloud as well as the Confluent Schema Registry set up. See the corresponding Schema Registry tutorial for Confluent Cloud to set up a schema registry for your Confluent Kafka stream.

To connect to your Confluent Cloud Kafka cluster, you need an API key for Confluent Cloud and Confluent Schema Registry. AWS Secrets Manager is used to securely store your Confluent secrets.

EventBridge Pipes configuration

Use the AWS CDK template provided in the GitHub repository to deploy:

An EventBridge pipe that connects to your existing Confluent Kafka topic as the source via an API secret stored in Secrets Manager.
EventBridge Pipes reads events from your Confluent Kafka topic using the self-managed Apache Kafka stream source type, which includes all non-MSK Kafka clusters.
An enrichment Lambda function in Python to perform event deserialization, validation, and conversion from Avro to JSON.
An SQS dead letter queue to hold events for which deserialization failed.
An EventBridge custom event bus as the pipe target. An EventBridge rule writes all incoming events into a CloudWatch Logs log group.

For self-managed Kafka sources, EventBridge supports configuration parameters, such as batch window, batch size, and starting position, which you can set using the parameters of the CfnPipe class in the example CDK stack.

Enrichment Lambda functions

Both of the solutions described previously include an enrichment Lambda function for schema validation and conversion from Avro to JSON.

The Java Lambda function integrates with the Glue Schema Registry using the AWS Glue Schema Registry Library. The Python Lambda function integrates with the Confluent Schema Registry using the confluent-kafka library and uses Powertools for AWS Lambda (Python) to implement Serverless best practices such as logging and tracing.

The enrichment Lambda functions perform the following tasks:

In the events polled from the Kafka stream by the EventBridge pipe, the key and value of the event are base64 encoded. Therefore, for each event in the batch passed to the function, the key and the value are decoded.
The event key is assumed to be serialized by the producer as a string type.
The event value is deserialized using the Glue Schema registry Serde (Java) or the confluent-kafka AvroDeserializer (Python).
The function then returns the successfully converted JSON events to the EventBridge pipe, which then invokes the target for each of them.
Events for which Avro deserialization failed are sent to the SQS dead letter queue.

Conclusion

This blog post shows how to implement event consumption, Avro schema validation, and conversion to JSON using Amazon EventBridge Pipes, Glue Schema Registry, and Confluent Schema Registry.

The source code for the presented example is available in the AWS Samples GitHub repository for Glue Schema Registry and Confluent Schema Registry. For more patterns, visit the Serverless Patterns Collection.

For more serverless learning resources, visit Serverless Land.

Enhanced Amazon CloudWatch metrics for Amazon EventBridge

2023-11-11 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/enhanced-amazon-cloudwatch-metrics-for-amazon-eventbridge/

This post is written by Vaibhav Shah, Sr. Solutions Architect.

Customers use event-driven architectures to orchestrate and automate their event flows from producers to consumers. Amazon EventBridge acts as a serverless event router for various targets based on event rules. It decouples the producers and consumers, allowing customers to build asynchronous architectures.

EventBridge provides metrics to enable you to monitor your events. Some of the metrics include: monitoring the number of partner events ingested, the number of invocations that failed permanently, and the number of times a target is invoked by a rule in response to an event, or the number of events that matched with any rule.

In response to customer requests, EventBridge has added additional metrics that allow customers to monitor their events and provide additional visibility. This blog post explains these new capabilities.

What’s new?

EventBridge has new metrics mainly around the API, events, and invocations metrics. These metrics give you insights into the total number of events published, successful events published, failed events, number of events matched with any or specific rule, events rejected because of throttling, latency, and invocations based metrics.

This allows you to track the entire span of event flow within EventBridge and quickly identify and resolve issues as they arise.

EventBridge now has the following metrics:

Metric	Description	Dimensions and Units
PutEventsLatency	The time taken per PutEvents API operation	None Units: Milliseconds
PutEventsRequestSize	The size of the PutEvents API request in bytes	None Units: Bytes
MatchedEvents	Number of events that matched with any rule, or a specific rule	None RuleName, EventBusName, EventSourceName Units: Count
ThrottledRules	The number of times rule execution was throttled.	None, RuleName Unit: Count
PutEventsApproximateCallCount	Approximate total number of calls in PutEvents API calls.	None Units: Count
PutEventsApproximateThrottledCount	Approximate number of throttled requests in PutEvents API calls.	None Units: Count
PutEventsApproximateFailedCount	Approximate number of failed PutEvents API calls.	None Units: Count
PutEventsApproximateSuccessCount	Approximate number of successful PutEvents API calls.	None Units: Count
PutEventsEntriesCount	The number of event entries contained in a PutEvents request.	None Units: Count
PutEventsFailedEntriesCount	The number of event entries contained in a PutEvents request that failed to be ingested.	None Units: Count
PutPartnerEventsApproximateCallCount	Approximate total number of calls in PutPartnerEvents API calls. (visible in Partner’s account)	None Units: Count
PutPartnerEventsApproximateThrottledCount	Approximate number of throttled requests in PutPartnerEvents API calls. (visible in Partner’s account)	None Units: Count
PutPartnerEventsApproximateFailedCount	Approximate number of failed PutPartnerEvents API calls. (visible in Partner’s account)	None Units: Count
PutPartnerEventsApproximateSuccessCount	Approximate number of successful PutPartnerEvents API calls. (visible in Partner’s account)	None Units: Count
PutPartnerEventsEntriesCount	The number of event entries contained in a PutPartnerEvents request.	None Units: Count
PutPartnerEventsFailedEntriesCount	The number of event entries contained in a PutPartnerEvents request that failed to be ingested.	None Units: Count
PutPartnerEventsLatency	The time taken per PutPartnerEvents API operation (visible in Partner’s account)	None Units: Milliseconds
InvocationsCreated	Number of times a target is invoked by a rule in response to an event. One invocation attempt represents a single count for this metric.	None Units: Count
InvocationAttempts	Number of times EventBridge attempted invoking a target.	None Units: Count
SuccessfulInvocationAttempts	Number of times target was successfully invoked.	None Units: Count
RetryInvocationAttempts	The number of times a target invocation has been retried.	None Units: Count
IngestiontoInvocationStartLatency	The time to process events, measured from when an event is ingested by EventBridge to the first invocation of a target.	None, RuleName, EventBusName Units: Milliseconds
IngestiontoInvocationCompleteLatency	The time taken from event Ingestion to completion of the first successful invocation attempt	None, RuleName, EventBusName Units: Milliseconds

Use-cases for these metrics

These new metrics help you improve observability and monitoring of your event-driven applications. You can proactively monitor metrics that help you understand the event flow, invocations, latency, and service utilization. You can also set up alerts on specific metrics and take necessary actions, which help improve your application performance, proactively manage quotas, and improve resiliency.

Monitor service usage based on Service Quotas

The PutEventsApproximateCallCount metric in the events family helps you identify the approximate number of events published on the event bus using the PutEvents API action. The PutEventsApproximateSuccessfulCount metric shows the approximate number of successful events published on the event bus.

Similarly, you can monitor throttled and failed events count with PutEventsApproximateThrottledCount and PutEventsApproximateFailedCount respectively. These metrics allow you to monitor if you are reaching your quota for PutEvents. You can use a CloudWatch alarm and set a threshold close to your account quotas. If that is triggered, send notifications using Amazon SNS to your operations team. They can work to increase the Service Quotas.

You can also set an alarm on the PutEvents throttle limit in transactions per second service quota.

Navigate to the Service Quotas console. On the left pane, choose AWS services, search for EventBridge, and select Amazon EventBridge (CloudWatch Events).
In the Monitoring section, you can monitor the percentage utilization of the PutEvents throttle limit in transactions per second.
Go to the Alarms tab, and choose Create alarm. In Alarm threshold, choose 80% of the applied quota value from the dropdown. Set the Alarm name to PutEventsThrottleAlarm, and choose Create.
To be notified if this threshold is breached, navigate to Amazon CloudWatch Alarms console and choose PutEventsThrottleAlarm.
Select the Actions dropdown from the top right corner, and choose Edit.
On the Specify metric and conditions page, under Conditions, make sure that the Threshold type is selected as Static and the % Utilization selected as Greater/Equal than 80. Choose Next.
Configure actions to send notifications to an Amazon SNS topic and choose Next.
The Alarm name should be already set to PutEventsThrottleAlarm. Choose Next, then choose Update alarm.

This helps you get notified when the percentage utilization of PutEvents throttle limit in transactions per second reaches close to the threshold set. You can then request Service Quota increases if required.

Similarly, you can also create CloudWatch alarms on percentage utilization of Invocations throttle limit in transactions per second against the service quota.

Enhanced observability

The PutEventsLatency metric shows the time taken per PutEvents API operation. There are two additional metrics, IngestiontoInvocationStartLatency metric and IngestiontoInvocationCompleteLatency metric. The first metric shows the time to process events measured from when the events are first ingested by EventBridge to the first invocation of a target. The second shows the time taken from event ingestion to completion of the first successful invocation attempt.

This helps identify latency-related issues from the time of ingestion until the time it reaches the target based on the RuleName. If there is high latency, these two metrics give you visibility into this issue, allowing you to take appropriate action.

You can set a threshold around these metrics, and if the threshold is triggered, the defined actions can help recover from potential failures. One of the defined actions here can be to send events generated later to EventBridge in the secondary Region using EventBridge global endpoints.

Sometimes, events are not delivered to the target specified in the rule. This can be because the target resource is unavailable, you don’t have permission to invoke the target, or there are network issues. In such scenarios, EventBridge retries to send these events to the target for 24 hours or up to 185 times, both of which are configurable.

The new RetryInvocationAttempts metric shows the number of times the EventBridge has retried to invoke the target. The retries are done when requests are throttled, target service having availability issues, network issues, and service failures. This provides additional observability to the customers and can be used to trigger a CloudWatch alarm to notify teams if the desired threshold is crossed. If the retries are exhausted, store the failed events in the Amazon SQS dead-letter queues to process failed events for the later time.

In addition to these, EventBridge supports additional dimensions like DetailType, Source, and RuleName to MatchedEvents metrics. This helps you monitor the number of matched events coming from different sources.

Navigate to the Amazon CloudWatch. On the left pane, choose Metrics, and All metrics.
In the Browse section, select Events, and Source.
From the Graphed metrics tab, you can monitor matched events coming from different sources.

Failover events to secondary Region

The PutEventsFailedEntriesCount metric shows the number of events that failed ingestion. Monitor this metric and set a CloudWatch alarm. If it crosses a defined threshold, you can then take appropriate action.

Also, set an alarm on the PutEventsApproximateThrottledCount metric, which shows the number of events that are rejected because of throttling constraints. For these event ingestion failures, the client must resend the failed events to the event bus again, allowing you to process every single event critical for your application.

Alternatively, send events to EventBridge service in the secondary Region using Amazon EventBridge global endpoints to improve resiliency of your event-driven applications.

Conclusion

This blog shows how to use these new metrics to improve the visibility of event flows in your event-driven applications. It helps you monitor the events more effectively, from invocation until the delivery to the target. This improves observability by proactively alerting on key metrics.

For more serverless learning resources, visit Serverless Land.

Introducing the Amazon Linux 2023 runtime for AWS Lambda

2023-11-10 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/introducing-the-amazon-linux-2023-runtime-for-aws-lambda/

This post is written by Rakshith Rao, Senior Solutions Architect.

AWS Lambda now supports Amazon Linux 2023 (AL2023) as a managed runtime and container base image. Named provided.al2023, this runtime provides an OS-only environment to run your Lambda functions.

It is based on the Amazon Linux 2023 minimal container image release and has several improvements over Amazon Linux 2 (AL2), such as a smaller deployment footprint, updated versions of libraries like glibc, and a new package manager.

What are OS-only Lambda runtimes?

Lambda runtimes define the execution environment where your function runs. They provide the OS, language support, and additional settings such as environment variables and certificates.

Lambda provides managed runtimes for Java, Python, Node.js, .NET, and Ruby. However, if you want to develop your Lambda functions in programming languages that are not supported by Lambda’s managed language runtimes, the ‘provided’ runtime family provides an OS-only environment in which you can run code written in any language. This release extends the provided runtime family to support Amazon Linux 2023.

Customers use these OS-only runtimes in three common scenarios. First, they are used with languages that compile to native code, such as Go, Rust, C++, .NET Native AOT and Java GraalVM Native. Since you only upload the compiled binary to Lambda, these languages do not require a dedicated language runtime, they only require an OS environment in which the binary can run.

Second, the OS-only runtimes also enable building third-party language runtimes that you can use off the shelf. For example, you can write Lambda functions in PHP using Bref, or Swift using the Swift AWS Lambda Runtime.

Third, you can use the OS-only runtime to deploy custom runtimes, which you build for a language or language version which Lambda does not provide a managed runtime. For example, Node.js 19 – Lambda only provides managed runtimes for LTS releases, which for Node.js are the even-numbered releases.

New in Amazon Linux 2023 base image for Lambda

Updated packages

AL2023 base image for Lambda is based on the AL2023-minimal container image and includes various package updates and changes compared with provided.al2.

The version of glibc in the AL2023 base image has been upgraded to 2.34, from 2.26 that was bundled in the AL2 base image. Some libraries that developers wanted to use in provided runtimes required newer versions of glibc. With this launch, you can now use an up-to-date version of glibc with your Lambda function.

The AL2 base image for Lambda came pre-installed with Python 2.7. This was needed because Python was a required dependency for some of the packages that were bundled in the base image. The AL2023 base image for Lambda has removed this dependency on Python 2.7 and does not come with any pre-installed language runtime. You are free to choose and install any compatible Python version that you need.

Since the AL2023 base image for Lambda is based on the AL2023-minimal distribution, you also benefit from a significantly smaller deployment footprint. The new image is less than 40MB compared to the AL2-based base image, which is over 100MB in size. You can find the full list of packages available in the AL2023 base image for Lambda in the “minimal container” column of the AL2023 package list documentation.

Package manager

Amazon Linux 2023 uses dnf as the package manager, replacing yum, which was the default package manager in Amazon Linux 2. AL2023 base image for Lambda uses microdnf as the package manager, which is a standalone implementation of dnf based on libdnf and does not require extra dependencies such as Python. microdnf in provided.al2023 is symlinked as dnf. Note that microdnf does not support all options of dnf. For example, you cannot install a remote rpm using the rpm’s URL or install a local rpm file. Instead, you can use the rpm command directly to install such packages.

This example Dockerfile shows how you can install packages using dnf while building a container-based Lambda function:

# Use the Amazon Linux 2023 Lambda base image
FROM public.ecr.aws/lambda/provided.al2023

# Install the required Python version
RUN dnf install -y python3

Runtime support

With the launch of provided.al2023 you can migrate your AL2 custom runtime-based Lambda functions right away. It also sets the foundation of future Lambda managed runtimes. The future releases of managed language runtimes such as Node.js 20, Python 3.12, Java 21, and .NET 8 are based on Amazon Linux 2023 and will use provided.al2023 as the base image.

Changing runtimes and using other compute services

Previously, the provided.al2 base image was built as a custom image that used a selection of packages from AL2. It included packages like curl and yum that were needed to build functions using custom runtime. Also, each managed language runtime used different packages based on the use case.

Since future releases of managed runtimes use provided.al2023 as the base image, they contain the same set of base packages that come with AL2023-minimal. This simplifies migrating your Lambda function from a custom runtime to a managed language runtime. It also makes it easier to switch to other compute services like AWS Fargate or Amazon Elastic Container Services (ECS) to run your application.

Upgrading from AL1-based runtimes

For more information on Lambda runtime deprecation, see Lambda runtimes.

AL1 end of support is scheduled for December 31, 2023. The AL1-based runtimes go1.x, java8 and provided will be deprecated from this date. You should migrate your Go based Lambda functions to the provided runtime family, such as provided.al2 or provided.al2023. Using a provided runtime offers several benefits over the go1.x runtime. First, you can run your Lambda functions on AWS Graviton2 processors that offer up to 34% better price-performance compared to functions running on x86_64 processors. Second, it offers a smaller deployment package and faster function invoke path. And third, it aligns Go with other languages that also compile to native code and run on the provided runtime family.

The deprecation of the Amazon Linux 1 (AL1) base image for Lambda is also scheduled for December 31, 2023. With provided.al2023 now generally available, you should start planning the migration of your go1.x and AL1 based Lambda functions to provided.al2023.

Using the AL2023 base image for Lambda

To build Lambda functions using a custom runtime, follow these steps using the provided.al2023 runtime.

AWS Management Console

Navigate to the Create function page in the Lambda console. To use the AL2023 custom runtime, select Provide your own bootstrap on Amazon Linux 2023 as the Runtime value:

AWS Serverless Application Model (AWS SAM) template

If you use the AWS SAM template to build and deploy your Lambda function, use the provided.al2023 as the value of the Runtime:

Resources:
  HelloWorldFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: hello-world/
      Handler: my.bootstrap.file
      Runtime: provided.al2023

Building Lambda functions that compile natively

Lambda’s custom runtime simplifies the experience to build functions in languages that compile to native code, broadening the range of languages you can use. Lambda provides the Runtime API, an HTTP API that custom runtimes can use to interact with the Lambda service. Implementations of this API, called Runtime Interface Client (RIC), allow your function to receive invocation events from Lambda, send the response back to Lambda, and report errors to the Lambda service. RICs are available as language-specific libraries for several popular programming langauges such as Go, Rust, Python, and Java.

For example, you can build functions using Go as shown in the Building with Go section of the Lambda developer documentation. Note that the name of the executable file of your function should always be bootstrap in provided.al2023 when using the zip deployment model. To use AL2023 in this example, use provided.al2023 as the runtime for your Lambda function.

If you are using CLI set the --runtime option to provided.al2023:

aws lambda create-function --function-name myFunction \
--runtime provided.al2023 --handler bootstrap \
--role arn:aws:iam::111122223333:role/service-role/my-lambda-role \
--zip-file fileb://myFunction.zip

If you are using AWS Serverless Application Model, use provided.al2023 as the value of the Runtime in your AWS SAM template file:

AWSTemplateFormatVersion: '2010-09-09'
Transform: 'AWS::Serverless-2016-10-31'
Resources:
  HelloWorldFunction:
    Type: AWS::Serverless::Function
    Metadata:
      BuildMethod: go1.x
    Properties:
      CodeUri: hello-world/ # folder where your main program resides
      Handler: bootstrap
      Runtime: provided.al2023
      Architectures: [arm64]

If you run your function as a container image as shown in the Deploy container image example, use this Dockerfile. You can use any name for the executable file of your function when using container images. You need to specify the name of the executable as the ENTRYPOINT in your Dockerfile:

FROM golang:1.20 as build
WORKDIR /helloworld

# Copy dependencies list
COPY go.mod go.sum ./

# Build with optional lambda.norpc tag
COPY main.go .
RUN go build -tags lambda.norpc -o main main.go

# Copy artifacts to a clean image
FROM public.ecr.aws/lambda/provided:al2023
COPY --from=build /helloworld/main ./main
ENTRYPOINT [ "./main" ]

Conclusion

With this launch, you can now build your Lambda functions using Amazon Linux 2023 as the custom runtime or use it as the base image to run your container-based Lambda functions. You benefit from the updated versions of libraries such as glibc, new package manager, and smaller deployment size than Amazon Linux 2 based runtimes. Lambda also uses Amazon Linux 2023-minimal as the basis for future Lambda runtime releases.

For more serverless learning resources, visit Serverless Land.

Introducing faster polling scale-up for AWS Lambda functions configured with Amazon SQS

2023-11-06 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/introducing-faster-polling-scale-up-for-aws-lambda-functions-configured-with-amazon-sqs/

This post was written by Anton Aleksandrov, Principal Solutions Architect, and Tarun Rai Madan, Senior Product Manager.

Today, AWS is announcing that AWS Lambda supports up to five times faster polling scale-up rate for spiky Lambda workloads configured with Amazon Simple Queue Service (Amazon SQS) as an event source.

This feature enables customers building event-driven applications using Lambda and SQS to achieve more responsive scaling during a sudden burst of messages in their SQS queues, and reduces the need to duplicate Lambda functions or SQS queues to achieve faster message processing.

Overview

Customers building modern event-driven and messaging applications with AWS Lambda use the Amazon SQS as a fundamental building block for creating decoupled architectures. Amazon SQS is a fully managed message queueing service for microservices, distributed systems, and serverless applications. When a Lambda function subscribes to an SQS queue as an event source, Lambda polls the queue, retrieves the messages, and sends retrieved messages in batches to the function handler for processing. To consume messages efficiently, Lambda detects the increase in queue depth, and increases the number of poller processes to process the queued messages.

Up until today, the Lambda was adding up to 60 concurrent executions per minute for Lambda functions subscribed to SQS queues, scaling up to a maximum of 1,250 concurrent executions in approximately 20 minutes. However, customers tell us that some of the modern event-driven applications they build using Lambda and SQS are sensitive to sudden spikes in messages, which may cause noticeable delay in processing of messages for end users. In order to harness the power of Lambda for applications that experience a burst of messages in SQS queues, these customers needed Lambda message polling to scale up faster.

With today’s announcement, Lambda functions that subscribe to an SQS queue can scale up to five times faster for queues that see a spike in message backlog, adding up to 300 concurrent executions per minute, and scaling up to a maximum of 1,250 concurrent executions. This scaling improvement helps to use the simplicity of Lambda and SQS integration to build event-driven applications that scale faster during a surge of incoming messages, particularly for real-time systems. It also offers customers the benefit of faster processing during spikes of messages in SQS queues, while continuing to offer the flexibility to limit the maximum concurrent Lambda invocations per SQS event source.

Controlling the maximum concurrent Lambda invocations by SQS

The new improved scaling rates are automatically applied to all AWS accounts using Lambda and SQS as an event source. There is no explicit action that you must take, and there’s no additional cost. This scaling improvement helps customers to build more performant Lambda applications where they need faster SQS polling scale-up. To prevent potentially overloading the downstream dependencies, Lambda provides customers the control to set the maximum number of concurrent executions at a function level with reserved concurrency, and event source level with maximum concurrency.

The following diagram illustrates settings that you can use to control the flow rate of an SQS event-source. You use reserved concurrency to control function-level scaling, and maximum concurrency to control event source scaling.

Reserved concurrency is the maximum concurrency that you want to allocate to a function. When a function has reserved concurrency allocated, no other functions can use that concurrency.

AWS recommends using reserved concurrency when you want to ensure a function has enough concurrency to scale up. When an SQS event source is attempting to scale up concurrent Lambda invocations, but the function has already reached the threshold defined by the reserved concurrency, the Lambda service throttles further function invocations.

This may result in SQS event source attempting to scale down, reducing the number of concurrently processed messages. Depending on the queue configuration, the throttled messages are returned to the queue for retrying, expire based on the retention policy, or sent to a dead-letter queue (DLQ) or on-failure destination.

The maximum concurrency setting allows you to control concurrency at the event source level. It allows you to define the maximum number of concurrent invocations the event source attempts to send to the Lambda function. For scenarios where a single function has multiple SQS event sources configured, you can define maximum concurrency for each event source separately, providing more granular control. When trying to add rate control to SQS event sources, AWS recommends you start evaluating maximum concurrency control first, as it provides greater flexibility.

Reserved concurrency and maximum concurrency are complementary capabilities, and can be used together. Maximum concurrency can help to prevent overwhelming downstream systems and throttled invocations. Reserved concurrency helps to ensure available concurrency for the function.

Example scenario

Consider your business must process large volumes of documents from storage. Once every few hours, your business partners upload large volumes of documents to S3 buckets in your account.

For resiliency, you’ve designed your application to send a message to an SQS queue for each of the uploaded documents, so you can efficiently process them without accidentally skipping any. The documents are processed using a Lambda function, which takes around two seconds to process a single document.

Processing these documents is a CPU-intensive operation, so you decide to process a single document per invocation. You want to use the power of Lambda to fan out the parallel processing to as many concurrent execution environments as possible. You want the Lambda function to scale up rapidly to process those documents in parallel as fast as possible, and scale-down to zero once all documents are processed to save costs.

When a business partner uploads 200,000 documents, 200,000 messages are sent to the SQS queue. The Lambda function is configured with an SQS event source, and it starts consuming the messages from the queue.

This diagram shows the results of running the test scenario before the SQS event source scaling improvements. As expected, you can see that concurrent executions grow by 60 per minute. It takes approximately 16 minutes to scale up to 900 concurrent executions gradually and process all the messages in the queue.

The following diagram shows the results of running the same test scenario after the SQS event source scaling improvements. The timeframe used for both charts is the same, but the performance on the second chart is better. Concurrent executions grow by 300 per minute. It only takes 4 minutes to scale up to 1,250 concurrent executions, and all the messages in the queue are processed in approximately 8 minutes.

Deploying this example

Use the example project to replicate this performance test in your own AWS account. Follow the instructions in README.md for provisioning the sample project in your AWS accounts using the AWS Cloud Development Kit (CDK).

This example project is configured to demonstrate a large-scale workload processing 200,000 messages. Running this sample project in your account may incur charges. See AWS Lambda pricing and Amazon SQS pricing.

Once deployed, use the application under the “sqs-cannon” directory to send 200,000 messages to the SQS queue (or reconfigure to any other number). It takes several minutes to populate the SQS queue with messages. After all messages are sent, enable the SQS event source, as described in the README.md, and monitor the charts in the provisioned CloudWatch dashboard.

The default concurrency quota for new AWS accounts is 1000. If you haven’t requested an increase in this quota, the number of concurrent executions is capped at this number. Use Service Quotas or contact your account team to request a concurrency increase.

Security best practices

Always use the least privileged permissions when granting your Lambda functions access to SQS queues. This reduces potential attack surface by ensuring that only specific functions have permissions to perform specific actions on specific queues. For example, in case your function only polls from the queue, grant it permission to read messages, but not to send new messages. A function execution role defines which actions your function is allowed to perform on other resources. A queue access policy defines the principals that can access this queue, and the actions that are allowed.

Use server-side encryption (SSE) to store sensitive data in encrypted SQS queues. With SSE, your messages are always stored in encrypted form, and SQS only decrypts them for sending to an authorized consumer. SSE protects the contents of messages in queues using SQS-managed encryption keys (SSE-SQS) or keys managed in the AWS Key Management Service (SSE-KMS).

Conclusion

The improved Lambda SQS event source polling scale-up capability enables up to five times faster scale-up performance for spiky event-driven workloads using SQS queues, at no additional cost. This improvement offers customers the benefit of faster processing during spikes of messages in SQS queues, while continuing to offer the flexibility to limit the maximum concurrent invokes by SQS as an event source.

For more serverless learning resources, visit Serverless Land.

Orchestrating dependent file uploads with AWS Step Functions

2023-11-02 Benjamin Smith

Post Syndicated from Benjamin Smith original https://aws.amazon.com/blogs/compute/orchestrating-dependent-file-uploads-with-aws-step-functions/

This post is written by Nelson Assis, Enterprise Support Lead, Serverless and Jevon Liburd, Technical Account Manager, Serverless

Amazon S3 is an object storage service that many customers use for file storage. With the use of Amazon S3 Event Notifications or Amazon EventBridge customers can create workloads with event-driven architecture (EDA). This architecture responds to events produced when changes occur to objects in S3 buckets.

EDA involves asynchronous communication between system components. This serves to decouple the components allowing each component to be autonomous.

Some scenarios may introduce coupling in the architecture due to dependency between events. This blog post presents a common example of this coupling and how it can be handled using AWS Step Functions.

Overview

In this example, an organization has two distributed autonomous teams, the Sales team and the Warehouse team. Each team is responsible for uploading a monthly data file to an S3 bucket so it can be processed.

The files generate events when they are uploaded, initiating downstream processes. The processing of the Warehouse file cleans the data and joins it with data from the Shipping team. The processing of the Sales file correlates the data with the combined Warehouse and Shipping data. This enables analysts to perform forecasting and gain other insights.

For this correlation to happen, the Warehouse file must be processed before the Sales file. As the two teams are autonomous, there is no coordination among the teams. This means that the files can be uploaded at any time with no assurance that the Warehouse file is processed before the Sales file.

For scenarios like these, the Aggregator pattern can be used. The pattern collects and stores the events, and triggers a new event based on the combined events. In the described scenario, the combined events are the processed Warehouse file and the uploaded Sales file.

The requirements of the aggregator pattern are:

Correlation – A way to group the related events. This is fulfilled by a unique identifier in the file name.
Event aggregator – A stateful store for the events.
Completion check and trigger – A condition when the combined events have been received and a way to publish the resulting event.

Architecture overview

The architecture uses the following AWS services:

Amazon DynamoDB as the event aggregator.
Step Functions to orchestrate the workflow.
AWS Lambda to parse the file name and extract the correlation identifier.
AWS Serverless Application Model (AWS SAM) for infrastructure as code and deployment.

File upload: The Sales and Warehouse teams upload their respective files to S3.
EventBridge: The ObjectCreated event is sent to EventBridge where there is a rule with a target of the main workflow.
Main state machine: This state machine orchestrates the aggregator operations and the processing of the files. It encapsulates the workflows for each file to separate the aggregator logic from the files’ workflow logic.
File parser and correlation: The business logic to identify the file and its type is run in this Lambda function.
Stateful store: A DynamoDB table stores information about the file such as the name, type, and processing status. The state machine reads from and writes to the DynamoDB table. Task tokens are also stored in this table.
File processing: Depending on the file type and any pre-conditions, state machines corresponding to the file type are run. These state machines contain the logic to process the specific file.
Task Token & Callback: The task token is generated when the dependent file tries to be processed before the independent file. The Step Functions “Wait for a Callback” pattern continues the execution of the dependent file after the independent file is processed.

Walkthrough

You need the following prerequisites:

AWS CLI and AWS SAM CLI installed.
An AWS account.
Sufficient permissions to manage the AWS resources.
Git installed.

To deploy the example, follow the instructions in the GitHub repo.

This walkthrough shows what happens if the dependent file (Sales file) is uploaded before the independent one (Warehouse file).

The workflow starts with the uploading of the Sales file to the dedicated Sales S3 bucket. The example uses separate S3 buckets for the two files as it assumes that the Sales and Warehouse teams are distributed and autonomous. You can find sample files in the code repository.

Uploading the file to S3 sends an event to EventBridge, which the aggregator state machine acts on. The event pattern used in the EventBridge rule is:

{
  "detail-type": ["Object Created"],
  "source": ["aws.s3"],
  "detail": {
    "bucket": {
      "name": ["sales-mfu-eda-09092023", "warehouse-mfu-eda-09092023"]
    },
    "reason": ["PutObject"]
  }
}

The aggregator state machine starts by invoking the file parser Lambda function. This function parses the file type and uses the identifier to correlate the files. In this example, the name of the file contains the file type and the correlation identifier (the year_month). To use other ways of representing the file type and correlation identifier, you can modify this function to parse that information.
The next step in the state machine inserts a record for the event in the event aggregator DynamoDB table. The table has a composite primary key with the correlation identifier as the partition key and the file type as the sort key. The processing status of the file is tracked to give feedback on the state of the workflow.
Based on the file type, the state machine determines which branch to follow. In the example, the Sales branch is run. The state machine tries to get the status of the (dependent) Warehouse file from DynamoDB using the correlation identifier. Using the result of this query, the state machine determines if the corresponding Warehouse file has already been processed.
Since the Warehouse file is not processed yet, the waitForTaskToken integration pattern is used. The state machine waits at this step and creates a task token, which the external services use to trigger the state machine to continue its execution. The Sales record in the DynamoDB table is updated with the Task Token.
Navigate to the S3 console and upload the sample Warehouse file to the Warehouse S3 bucket. This invokes a new instance of the Step Functions workflow, which flows through the other branch after the file type choice step. In this branch, the Warehouse state machine is run and the processing status of the file is updated in DynamoDB.

When the status of the Warehouse file is changed to “Completed”, the Warehouse state machine checks DynamoDB for a pending Sales file. If there is one, it retrieves the task token and calls the SendTaskSuccess method. This triggers the Sales state machine, which is in a waiting state to continue. The Sales state machine is started and the processing status is updated.

Conclusion

This blog post shows how to handle file dependencies in event driven architectures. You can customize the sample provided in the code repository for your own use case.

This solution is specific to file dependencies in event driven architectures. For more information on solving event dependencies and aggregators read the blog post: Moving to event-driven architectures with serverless event aggregators.

To learn more about event driven architectures, visit the event driven architecture section on Serverless Land.

Sending and receiving webhooks on AWS: Innovate with event notifications

2023-10-30 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/sending-and-receiving-webhooks-on-aws-innovate-with-event-notifications/

This post is written by Daniel Wirjo, Solutions Architect, and Justin Plock, Principal Solutions Architect.

Commonly known as reverse APIs or push APIs, webhooks provide a way for applications to integrate to each other and communicate in near real-time. It enables integration for business and system events.

Whether you’re building a software as a service (SaaS) application integrating with your customer workflows, or transaction notifications from a vendor, webhooks play a critical role in unlocking innovation, enhancing user experience, and streamlining operations.

This post explains how to build with webhooks on AWS and covers two scenarios:

Webhooks Provider: A SaaS application that sends webhooks to an external API.
Webhooks Consumer: An API that receives webhooks with capacity to handle large payloads.

It includes high-level reference architectures with considerations, best practices and code sample to guide your implementation.

Sending webhooks

To send webhooks, you generate events, and deliver them to third-party APIs. These events facilitate updates, workflows, and actions in the third-party system. For example, a payments platform (provider) can send notifications for payment statuses, allowing ecommerce stores (consumers) to ship goods upon confirmation.

AWS reference architecture for a webhook provider

The architecture consists of two services:

Webhook delivery: An application that delivers webhooks to an external endpoint specified by the consumer.
Subscription management: A management API enabling the consumer to manage their configuration, including specifying endpoints for delivery, and which events for subscription.

Considerations and best practices for sending webhooks

When building an application to send webhooks, consider the following factors:

Event generation: Consider how you generate events. This example uses Amazon DynamoDB as the data source. Events are generated by change data capture for DynamoDB Streams and sent to Amazon EventBridge Pipes. You then simplify the DynamoDB response format by using an input transformer.

With EventBridge, you send events in near real time. If events are not time-sensitive, you can send multiple events in a batch. This can be done by polling for new events at a specified frequency using EventBridge Scheduler. To generate events from other data sources, consider similar approaches with Amazon Simple Storage Service (S3) Event Notifications or Amazon Kinesis.

Filtering: EventBridge Pipes support filtering by matching event patterns, before the event is routed to the target destination. For example, you can filter for events in relation to status update operations in the payments DynamoDB table to the relevant subscriber API endpoint.

Delivery: EventBridge API Destinations deliver events outside of AWS using REST API calls. To protect the external endpoint from surges in traffic, you set an invocation rate limit. In addition, retries with exponential backoff are handled automatically depending on the error. An Amazon Simple Queue Service (SQS) dead-letter queue retains messages that cannot be delivered. These can provide scalable and resilient delivery.

Payload Structure: Consider how consumers process event payloads. This example uses an input transformer to create a structured payload, aligned to the CloudEvents specification. CloudEvents provides an industry standard format and common payload structure, with developer tools and SDKs for consumers.

Payload Size: For fast and reliable delivery, keep payload size to a minimum. Consider delivering only necessary details, such as identifiers and status. For additional information, you can provide consumers with a separate API. Consumers can then separately call this API to retrieve the additional information.

Security and Authorization: To deliver events securely, you establish a connection using an authorization method such as OAuth. Under the hood, the connection stores the credentials in AWS Secrets Manager, which securely encrypts the credentials.

Subscription Management: Consider how consumers can manage their subscription, such as specifying HTTPS endpoints and event types to subscribe. DynamoDB stores this configuration. Amazon API Gateway, Amazon Cognito, and AWS Lambda provide a management API for operations.

Costs: In practice, sending webhooks incurs cost, which may become significant as you grow and generate more events. Consider implementing usage policies, quotas, and allowing consumers to subscribe only to the event types that they need.

Monetization: Consider billing consumers based on their usage volume or tier. For example, you can offer a free tier to provide a low-friction access to webhooks, but only up to a certain volume. For additional volume, you charge a usage fee that is aligned to the business value that your webhooks provide. At high volumes, you offer a premium tier where you provide dedicated infrastructure for certain consumers.

Monitoring and troubleshooting: Beyond the architecture, consider processes for day-to-day operations. As endpoints are managed by external parties, consider enabling self-service. For example, allow consumers to view statuses, replay events, and search for past webhook logs to diagnose issues.

Advanced Scenarios: This example is designed for popular use cases. For advanced scenarios, consider alternative application integration services noting their Service Quotas. For example, Amazon Simple Notification Service (SNS) for fan-out to a larger number of consumers, Lambda for flexibility to customize payloads and authentication, and AWS Step Functions for orchestrating a circuit breaker pattern to deactivate unreliable subscribers.

Receiving webhooks

To receive webhooks, you require an API to provide to the webhook provider. For example, an ecommerce store (consumer) may rely on notifications provided by their payment platform (provider) to ensure that goods are shipped in a timely manner. Webhooks present a unique scenario as the consumer must be scalable, resilient, and ensure that all requests are received.

AWS reference architecture for a webhook consumer

In this scenario, consider an advanced use case that can handle large payloads by using the claim-check pattern.

At a high-level, the architecture consists of:

API: An API endpoint to receive webhooks. An event-driven system then authorizes and processes the received webhooks.
Payload Store: S3 provides scalable storage for large payloads.
Webhook Processing: EventBridge Pipes provide an extensible architecture for processing. It can batch, filter, enrich, and send events to a range of processing services as targets.

Considerations and best practices for receiving webhooks

When building an application to receive webhooks, consider the following factors:

Scalability: Providers typically send events as they occur. API Gateway provides a scalable managed endpoint to receive events. If unavailable or throttled, providers may retry the request, however, this is not guaranteed. Therefore, it is important to configure appropriate rate and burst limits. Throttling requests at the entry point mitigates impact on downstream services, where each service has its own quotas and limits. In many cases, providers are also aware of impact on downstream systems. As such, they send events at a threshold rate limit, typically up to 500 transactions per second (TPS).

In addition, API Gateway allows you to validate requests, monitor for any errors, and protect against distributed denial of service (DDoS). This includes Layer 7 and Layer 3 attacks, which are common threats to webhook consumers given public exposure.

Authorization and Verification: Providers can support different authorization methods. Consider a common scenario with Hash-based Message Authentication Code (HMAC), where a shared secret is established and stored in Secrets Manager. A Lambda function then verifies integrity of the message, processing a signature in the request header. Typically, the signature contains a timestamped nonce with an expiry to mitigate replay attacks, where events are sent multiple times by an attacker. Alternatively, if the provider supports OAuth, consider securing the API with Amazon Cognito.

Payload Size: Providers may send a variety of payload sizes. Events can be batched to a single larger request, or they may contain significant information. Consider payload size limits in your event-driven system. API Gateway and Lambda have limits of 10 Mb and 6 Mb. However, DynamoDB and SQS are limited to 400kb and 256kb (with extension for large messages) which can represent a bottleneck.

Instead of processing the entire payload, S3 stores the payload. It is then referenced in DynamoDB, via its bucket name and object key. This is known as the claim-check pattern. With this approach, the architecture supports payloads of up to 6mb, as per the Lambda invocation payload quota.

Idempotency: For reliability, many providers prioritize delivering at-least-once, even if it means not guaranteeing exactly once delivery. They can transmit the same request multiple times, resulting in duplicates. To handle this, a Lambda function checks against the event’s unique identifier against previous records in DynamoDB. If not already processed, you create a DynamoDB item.

Ordering: Consider processing requests in its intended order. As most providers prioritize at-least-once delivery, events can be out of order. To indicate order, events may include a timestamp or a sequence identifier in the payload. If not, ordering may be on a best-efforts basis based on when the webhook is received. To handle ordering reliably, select event-driven services that ensure ordering. This example uses DynamoDB Streams and EventBridge Pipes.

Flexible Processing: EventBridge Pipes provide integrations to a range of event-driven services as targets. You can route events to different targets based on filters. Different event types may require different processors. For example, you can use Step Functions for orchestrating complex workflows, Lambda for compute operations with less than 15-minute execution time, SQS to buffer requests, and Amazon Elastic Container Service (ECS) for long-running compute jobs. EventBridge Pipes provide transformation to ensure only necessary payloads are sent, and enrichment if additional information is required.

Costs: This example considers a use case that can handle large payloads. However, if you can ensure that providers send minimal payloads, consider a simpler architecture without the claim-check pattern to minimize cost.

Conclusion

Webhooks are a popular method for applications to communicate, and for businesses to collaborate and integrate with customers and partners.

This post shows how you can build applications to send and receive webhooks on AWS. It uses serverless services such as EventBridge and Lambda, which are well-suited for event-driven use cases. It covers high-level reference architectures, considerations, best practices and code sample to assist in building your solution.

For standards and best practices on webhooks, visit the open-source community resources Webhooks.fyi and CloudEvents.io.

For more serverless learning resources, visit Serverless Land.

Archiving and replaying messages with Amazon SNS FIFO

2023-10-27 Benjamin Smith

Post Syndicated from Benjamin Smith original https://aws.amazon.com/blogs/compute/archiving-and-replaying-messages-with-amazon-sns-fifo/

This post is written by A Mohammed Atiq, Solutions Architect and Mithun Mallick, Principal Solutions Architect, Serverless

Amazon Simple Notification Service (SNS) offers a flexible, fully managed messaging service, allowing applications to send and receive messages. SNS acts as a channel, delivering events from publishers to subscribers.

Today, AWS is announcing a new capability that enables you to archive and replay messages published to SNS FIFO (first-in first-Out) topics. Now, when enabled with an archive policy, SNS FIFO topics automatically:

Archives events, with a no-code, in-place message archive that doesn’t require any external resources. You only need to define an archive policy on your topic, including the required retention period (from 1 to 365 days).
Replays events: subscribers benefit from a managed, no-code message replay functionality, with built-in progress reporting and message filtering capabilities. To initiate a replay, subscribers simply apply a replay policy to their subscription, defining a starting point and an ending point using timestamps.

This feature can be useful in failure recovery and state replication scenarios.

Failure recovery

In failure recovery scenarios, developers can use this to reprocess a subset of messages and recover from a downstream application failure or a dependency issue.

Consider a situation where a search application needs to reprocess messages because the search engine’s index has been erased. To initiate recovery, the search application would update the ReplayPolicy attribute in its existing subscription using the SetSubscriptionAttributes API action, to start receiving messages from a specific point in time, rather than from when the Archive policy was applied to the topic.

State replication

For state replication scenarios, this feature enables new applications to duplicate the state of previously subscribed applications.

Consider an internal data warehouse application that must replicate the state of an external search application to make the data indexed in the search engine available to product managers and other internal staff. The data warehouse application subscribes its newly created endpoint (for example, an Amazon SQS FIFO queue) to the topic using the Subscribe API action and sets the ReplayPolicy subscription attribute.

If it opts to replicate the full state of the search engine, it might set the timestamp in its ReplayPolicy to coincide with the search engine’s subscription’s creation date and time, ensuring all data ever delivered to the search engine is also delivered to the data warehouse tool.

Enabling the archive policy via the SNS console

When creating a new SNS FIFO topic, you see an option for the archive policy. This policy determines how long SNS stores your messages, making them available for potential resending to a subscription if necessary. The Archive policy does not activate by default – you must enable it for each topic manually or automate the operation.

For instance, the retention period for this FIFO topic is set at 30 days. However, you can adjust this duration anywhere from 1 to 365 days. Once you activate the archive policy, messages sent to this topic are archived for the defined period.

To confirm that the archive policy is in effect after creating the topic, check the topic details. Next to the retention policy, and its status is displayed as Active.

By subscribing an SQS FIFO queue to an SNS FIFO topic, you can replay messages, and the Replay status shows Not running. You can subscribe both FIFO and standard SQS queues to their SNS FIFO topics, providing flexibility for various use-case requirements. To initiate a replay, navigate to the SNS console, choose Replay, and then choose Start replay.

When you initiate a replay, a window appears, allowing you to specify the start and end dates, as well as the exact time from which messages are archived. This feature affords the flexibility to replay only messages of interest, instead of every archived message, by allowing you to set on a specific schedule. When you choose Start replay, the service begins sending messages to the subscriber.

You can also define settings for the SNS FIFO archive and replay features with both AWS CloudFormation and the AWS Serverless Application Model (AWS SAM).

Use Cases

Replaying events for error recovery in microservices

In a scenario where an insurance application uses multiple microservices, consider one claims processing microservice encounters an error and drops a claim. Such an oversight can cause the workload to be out of sync.

With the archive and replay feature, you can revisit and replay events from the time the error was detected. This allows the microservice to recognize the missed event and complete the necessary actions, ensuring the system remains updated and accurate.

Messages are published to an SNS FIFO topic from an application.
Messages are delivered to an SQS FIFO queue containing claim details to be processed by a downstream microservice.
The microservice fails to process a series of messages due to an exception and discards all of the messages.
The user then initiates a replay from the SNS FIFO topic, specifies the time frame of messages to replay based on when the failure occurred.
The microservice is now able to successfully process the replayed messages and persists data into a DynamoDB table.

Replicating state across Regions

In situations where an application spans multiple Regions, and a microservice encounters difficulties in its primary Region, you can replicate the infrastructure to another Region using an active/standby setup.

You can reroute traffic to the standby microservice in the secondary Region, maintaining synchronization through event replays. You can set an end time in the SNS replay policy but if this isn’t defined, replaying continues until all the most recent messages are sent.

After, the SNS subscription resumes normal functioning, capturing all new messages. This approach is suitable for many state replication scenarios, such as cross-Region backup strategies, as it helps minimize downtime and prevent message loss.

Messages are published to an SNS FIFO topic from an application.
Messages are delivered to an SQS FIFO queue containing claim details to be processed by a downstream microservice.
The microservice failed to process a series of messages due to an exception and discarded all of the messages.
The user then subscribes a new SQS FIFO queue in another region, initiates a replay from the SNS FIFO topic and specifies the time frame of messages to replay based on when the failure occurred.
The microservice in a different region is able to retrieve the replayed messages from the new SQS FIFO queue, successfully processes the series of messages and persists data into a DynamoDB table.

Configuring SNS FIFO archive and replay for auto insurance processing

Managing auto insurance claims requires timely coordination. This walkthrough shows the combined benefits of SNS FIFO and SQS FIFO to process claims in the correct sequence.

Both SQS FIFO and SQS standard queues can be subscribed to the SNS FIFO topic, offering versatility in handling claims. The archive and replay functionality of SNS FIFO is paramount; disruptions in downstream microservices don’t compromise claim integrity due to the replay capability.

This walkthrough guides you through deploying an auto insurance claims processing example using the AWS CLI. You create an SNS FIFO topic for claim submissions and two SQS FIFO queues. The first queue is for primary processing of the claims, while the second is specifically for message replays to support application state replication across various system instances.

Prerequisites

AWS Command Line Interface (CLI): You can download and set it up using the documentation.
JQ: To install, follow installation instructions.

Step 1 – Creating resources using the AWS CLI and storing variables

Run the following commands in the terminal.

REGION=$(aws configure get region)

# Create an SNS FIFO topic for auto insurance claims
AUTO_INSURANCE_TOPIC_ARN=$(aws sns create-topic --name "AutoInsuranceClaimsTopic.fifo" --attributes "FifoTopic=true,ContentBasedDeduplication=true,DisplayName=Auto Insurance Claims Topic" --region $REGION | jq -r '.TopicArn')

# Create primary and replay SQS FIFO queues
AUTO_INSURANCE_QUEUE_URL=$(aws sqs create-queue --queue-name "AutoInsuranceClaimsQueue.fifo" --attributes "FifoQueue=true" --region $REGION | jq -r '.QueueUrl')
AUTO_INSURANCE_REPLAY_QUEUE_URL=$(aws sqs create-queue --queue-name "AutoInsuranceReplayQueue.fifo" --attributes "FifoQueue=true" --region $REGION | jq -r '.QueueUrl')

# Get ARNs for both SQS queues
AUTO_INSURANCE_QUEUE_ARN=$(aws sqs get-queue-attributes --queue-url $AUTO_INSURANCE_QUEUE_URL --attribute-names QueueArn --output text --query 'Attributes.QueueArn')
AUTO_INSURANCE_REPLAY_QUEUE_ARN=$(aws sqs get-queue-attributes --queue-url $AUTO_INSURANCE_REPLAY_QUEUE_URL --attribute-names QueueArn --region $REGION | jq -r '.Attributes.QueueArn')

# Define a policy allowing the topic to publish to both queues
SQS_POLICY_TEMPLATE="{\"Policy\" : \"{ \\\"Version\\\": \\\"2012-10-17\\\", \\\"Statement\\\": [ { \\\"Sid\\\": \\\"1\\\", \\\"Effect\\\": \\\"Allow\\\", \\\"Principal\\\": { \\\"Service\\\": \\\"sns.amazonaws.com\\\" }, \\\"Action\\\": [\\\"sqs:SendMessage\\\"], \\\"Resource\\\": [\\\"$AUTO_INSURANCE_QUEUE_ARN\\\", \\\"$AUTO_INSURANCE_REPLAY_QUEUE_ARN\\\"], \\\"Condition\\\": { \\\"ArnLike\\\": { \\\"aws:SourceArn\\\": [\\\"$AUTO_INSURANCE_TOPIC_ARN\\\"] } } } ]}\"}"

# Apply the access policy to the queues
aws sqs set-queue-attributes --queue-url $AUTO_INSURANCE_QUEUE_URL --attributes file://<(echo $SQS_POLICY_TEMPLATE)
aws sqs set-queue-attributes --queue-url $AUTO_INSURANCE_REPLAY_QUEUE_URL --attributes file://<(echo $SQS_POLICY_TEMPLATE)

# Subscribe the primary queue to the created SNS FIFO topic
aws sns subscribe --topic-arn $AUTO_INSURANCE_TOPIC_ARN --protocol sqs --notification-endpoint $AUTO_INSURANCE_QUEUE_ARN --region $REGION

Step 2 – Setting the archive policy on the SNS FIFO topic

Modify the attributes of the SNS FIFO topic to set a retention period. This determines how long a message is retained in the topic archive. This example uses 30 days.

# Set a 30-day retention period for the SNS FIFO topic

aws sns set-topic-attributes --region $REGION --topic-arn $AUTO_INSURANCE_TOPIC_ARN --attribute-name ArchivePolicy --attribute-value "{\"MessageRetentionPeriod\":\"30\"}"

Step 3- Publishing auto insurance claim details

Publish a sample claim to the SNS FIFO topic. This step mimics a real-world scenario where an insurance claim must be processed by subscribers of the topic.

# Get the current timestamp and publish a sample insurance claim
TIMESTAMP_START=$(date -u +%FT%T.000Z)
aws sns publish --region $REGION --topic-arn $AUTO_INSURANCE_TOPIC_ARN --message "{ \"claim_type\": \"collision\", \"registration\": \"AB123CDE\" }" --message-group-id "group1"

Step 4 – Reading auto insurance claim details

Retrieve the insurance claim details from the primary SQS FIFO queue. This simulates a process reading the insurance claim to take action. After reading the message, the claim is deleted from the queue to avoid reprocessing.

# Fetch the claim details from the primary queue, then delete to avoid redundancy
MESSAGE=$(aws sqs receive-message --region $REGION --queue-url $AUTO_INSURANCE_QUEUE_URL --output json)
MESSAGE_TEXT=$(echo "$MESSAGE" | jq -r '.Messages[0].Body')
MESSAGE_RECEIPT=$(echo "$MESSAGE" | jq -r '.Messages[0].ReceiptHandle')
aws sqs delete-message --region $REGION --queue-url $AUTO_INSURANCE_QUEUE_URL --receipt-handle $MESSAGE_RECEIPT
echo "Received claim details: ${MESSAGE_TEXT}"

Step 5 – Subscribing the replay SQS queue to the SNS FIFO topic

To ensure no claims are lost, configure a replay policy for your SQS FIFO queue subscription. This policy sets the schedule from which messages are replayed to the SQS FIFO queue. Here, you subscribe a replay queue with a replay policy and then monitor the status of the replay queue. Once complete, read the replayed claim details from the secondary SQS FIFO queue. If any processing issues occurred initially, there is a second chance to process the claim.

Subscribe replay queue to SNS FIFO topic:

# Subscribe the replay queue to the topic and define its replay policy
NEW_SUBSCRIPTION_ARN=$(aws sns subscribe --region $REGION --topic-arn $AUTO_INSURANCE_TOPIC_ARN --protocol sqs --return-subscription-arn --notification-endpoint $AUTO_INSURANCE_REPLAY_QUEUE_ARN --attributes "{\"ReplayPolicy\":\"{\\\"PointType\\\":\\\"Timestamp\\\",\\\"StartingPoint\\\":\\\"$TIMESTAMP_START\\\"}\"}" --output json | jq -r '.SubscriptionArn')

To monitor the replay status:

# Wait for the replay to complete
while [[ $(aws sns get-subscription-attributes --region $REGION --subscription-arn $NEW_SUBSCRIPTION_ARN --output text | awk 'END{print $9}') != 'Completed' ]]; do printf "."; sleep 5; done; echo "Replay complete";

To read the replayed message and delete the message from the queue:

# Fetch the replayed message and then remove it from the queue
REPLAYED_MESSAGE=$(aws sqs receive-message --region $REGION --queue-url $AUTO_INSURANCE_REPLAY_QUEUE_URL --output json)
REPLAYED_MESSAGE_TEXT=$(echo "$REPLAYED_MESSAGE" | jq -r '.Messages[0].Body')
REPLAYED_MESSAGE_RECEIPT=$(echo "$REPLAYED_MESSAGE" | jq -r '.Messages[0].ReceiptHandle')
aws sqs delete-message --region $REGION --queue-url $AUTO_INSURANCE_REPLAY_QUEUE_URL --receipt-handle $REPLAYED_MESSAGE_RECEIPT
echo "Received replayed claim details: ${REPLAYED_MESSAGE_TEXT}"

Cleaning up

To avoid incurring unnecessary costs, clean up the resources created in this walkthrough:

# Delete the primary SQS FIFO queue
aws sqs delete-queue --queue-url $AUTO_INSURANCE_QUEUE_URL --region $REGION

# Delete the replay SQS FIFO queue
aws sqs delete-queue --queue-url $AUTO_INSURANCE_REPLAY_QUEUE_URL --region $REGION

# Unset the 'ArchivePolicy' attribute
aws sns set-topic-attributes --region $REGION --topic-arn $AUTO_INSURANCE_TOPIC_ARN --attribute-name ArchivePolicy --attribute-value "{}"

# Delete the SNS FIFO topic
aws sns delete-topic --topic-arn $AUTO_INSURANCE_TOPIC_ARN --region $REGION

Conclusion

The new SNS FIFO archive and replay feature provides a substantial foundation for event-driven applications, emphasizing failure recovery and application state replication. These features allow developers to efficiently manage and recover from disruptions, and ensure state replication across different application instances or environments.

Get started with this new SNS FIFO capability by using the AWS Management Console, AWS CLI, AWS Software Development Kit (SDK), or AWS CloudFormation. For information on cost, see SNS pricing and SQS pricing.

For more serverless learning resources, visit Serverless Land.

Enhancing runtime security and governance with the AWS Lambda Runtime API proxy extension

2023-10-27 Julian Wood

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/enhancing-runtime-security-and-governance-with-the-aws-lambda-runtime-api-proxy-extension/

This post is written by Anton Aleksandrov, Principal Serverless Solutions Architect, and Shridhar Pandey, Senior AWS Lambda Product Manager.

AWS Lambda runtimes use the Lambda Runtime API to communicate with the Lambda service. Runtimes use it to retrieve inbound events to be processed by the function handler, return successful handler responses to the Lambda service, and report failures. This post shows how to intercept inbound events and outbound responses without changing function code, using the Runtime API proxy pattern.

This approach enables security vendors and engineering teams to provide enhanced, non-invasive security and governance tools for Lambda functions. Use cases include sanitizing event payload, blocking malicious events, and auditing and augmenting payloads.

Overview

The Lambda Runtime API is an HTTP endpoint available within the Lambda execution environment. It allows the Lambda runtime that executes the function code to communicate with the Lambda service. It is used by managed Lambda runtimes, such as Node.js or Python, as well as custom runtime, which enables developers to create their own Lambda runtime in any programming language of their choice.

Lambda runtimes use the Runtime API to retrieve the next incoming event to be processed by the function handler and return the handler response to the Lambda service.

Lambda Extensions enable you to integrate Lambda functions with your organization’s preferred tools for monitoring, observability, security, and governance. You can use extensions from AWS, AWS Lambda Ready partners, and open-source projects for a wide range of use cases. Extensions allow adding functionality, such as pre-fetching configurations or dispatching telemetry, without making intrusive changes to function code. Lambda Extensions are packaged as Lambda layers and run as a separate process in the execution environment.

This is how runtimes and extensions communicate with the Lambda service via the Runtime API and Extensions API endpoints:

AWS Lambda Runtime API and Extensions API endpoints

When you register your extension with the Lambda service, you can specify you want to receive the INVOKE event. The Lambda service sends this event to the extension asynchronously when a function is invoked.

The information supplied contains the function invocation metadata, but not the event payload. This makes the event useful for observability, but limited for application security and governance use cases, such as inspecting payloads for vulnerabilities, sanitizing inputs, and blocking malicious events.

The Lambda Runtime API proxy is a pattern that enables you to hook into the function invocation request and response lifecycle. It allows you to use extensions to implement advanced security, compliance, governance, and observability scenarios without changes to function code. You can add runtime security mechanisms, implement audit procedures for data flowing in and out of the function, enhance observability by auto-injecting tracing headers, and many more.

Understanding the Lambda Runtime API workflow

This is how the Lambda Runtime consumes the Lambda Runtime API:

How the Lambda Runtime consumes the Lambda Runtime API

Lambda runtimes use the value of the AWS_LAMBDA_RUNTIME_API environment variable to make Runtime API requests. The two primary endpoints are /next, which is used to retrieve the next event to process, and /response, which is used to return event processing results to the Lambda service. In addition, the Runtime API also provides endpoints for reporting failures. See the full protocol and schema definitions of the Runtime API.

How the Runtime API proxy approach works

The Runtime API proxy is a component that you can build to hook into the invocation workflow. It proxies requests and responses, allowing you to augment them, and control the workflow:

Runtime API proxy hooks

When the Lambda service creates a new execution environment, it starts by initializing the extensions attached to the function. The execution environment waits for all extensions to register with the Lambda service by calling the Extensions API /register endpoint, then proceeds to initialize the runtime. This allows you to start the Runtime API proxy HTTP listener during extension initialization, making it ready to serve the runtime requests.

Runtime API proxy flow

By default, the value of the AWS_LAMBDA_RUNTIME_API environment variable in the runtime process points to the Lambda Runtime API endpoint 127.0.0.1:9001. You can use a wrapper script to change that value to point to the Runtime API proxy endpoint instead.

A wrapper script enables you to customize the runtime startup behavior of your Lambda function by letting you set configuration parameters that cannot be set through language-specific environment variables. You can add a wrapper script to your function by setting the AWS_LAMBDA_EXEC_WRAPPER environment variable. The following wrapper script assumes that the Runtime API Proxy is listening on port 9009.

#!/bin/bash
export AWS_LAMBDA_RUNTIME_API="127.0.0.1:9009"
exec "$@"

You can either add this export line to an existing wrapper script or create a new one.

Runtime API proxy example

The Runtime API Proxy is bootstrapped by the Lambda service when a new execution environment is created and it’s ready to proxy requests from the Lambda runtime to the Runtime API before first invocation.

Implementing the Runtime API proxy logic

AWS recommends you implement extensions using a programming language that compiles to a binary executable, such as Golang or Rust. This allows you to use the extension with any Lambda runtime. Extensions implemented in interpreted languages, such as JavaScript and Python, or languages that require additional virtual machines, such as Java and C#, can only be used with that specific runtime.

This diagram shows a scenario where both incoming events and outbound responses are processed by the extension. You can use this workflow for auditing event or response payloads, sanitizing them, or injecting additional properties. You can use it for scenarios like masking account numbers, stripping personally identifiable information (PII), or injecting observability headers.

Runtime API proxy logic

This diagram demonstrates an advanced scenario, where the first inbound event is identified as malicious, and rejected by the Runtime API proxy. The function handler is not invoked. The second event is not flagged as malicious, and is therefore forwarded to the handler for processing. You can use this workflow for security scenarios like runtime application protection.

Runtime API proxy security scenario

AWS Partners using the Runtime API Proxy solution

“Using Lambda Runtime API proxy solution is a game-changing approach for us. It enables us to support multiple Lambda runtimes with a single implementation, provides comprehensive visibility into Lambda execution, and allows to detect attackers targeting serverless applications,” says Julio Guerra, Engineering Manager, Application Security Management, Datadog.

“Lambda Runtime API proxy is a simple solution that gives us a pluggable way to protect Lambda Function URLs. We can implement request authorization and enrichment with no changes to function code,” says Ilya Zilber, Senior Manager, Solutions Engineering, Okta Inc.

Security best practices

Extensions run within the same execution environment as the function, so they have the same level of access to resources such as file system, networking, and environment variables. IAM permissions assigned to the function are shared with extensions. Our guidance is to assign the least required privileges to your functions.

Always install extensions from a trusted source only. Use Infrastructure as Code (IaC) tools, such as AWS CloudFormation, to simplify the task of attaching the same extension configuration, including AWS Identity and Access Management (IAM) permissions, to multiple functions. Additionally, IaC tools allow you to have an audit record of extensions and versions you’ve used previously.

When building extensions, do not log sensitive data. Sanitize payloads and metadata before logging or persisting them for audit purposes.

Considerations

The Runtime API proxy approach allows you to hook into the Lambda request/response workflow, enabling new security and observability use cases. There are several important considerations:

This requires you to have a good understanding of the Lambda execution environment lifecycle and the Lambda Runtime API. You must implement proxying for all Runtime API endpoints and handle potential runtime failures.
Prepare your extension for composability for scenarions in which more than one extension implements the Runtime API proxy pattern. Allow your extension consumers to configure the extension via environment variables using at least two parameters – the port your proxy listens on and the Runtime API endpoint your proxy forwards requests to. The latter should default to the original value of the AWS_LAMBDA_RUNTIME_API environment variable. See sample implementations below for details.
Proxying API requests with default buffered responses requires additional work to support functions with response payload streaming.
Proxying API requests adds latency. The added overhead depends on your implementation. AWS recommends using programming languages that can be compiled to an executable binary, such as Rust and Golang, and keeping your extensions lightweight and optimized.

Samples

You can find sample extensions implementing the Runtime API Proxy at https://github.com/aws-samples/aws-lambda-extensions/. See Golang, Rust, and Node.js samples.

Follow the instructions described in README.md for a step-by-step tutorial on running the extension.

Conclusion

This post introduces and illustrates the Lambda Runtime API proxy pattern. You can use this pattern to hook into the Lambda function request and response workflow to intercept, process, audit, modify, and block inbound events and handler responses.

You can use this pattern to implement enhanced runtime security and governance scenarios, as well as scenarios from other domains.. AWS customers and partners can use this advanced solution approach to add enhanced security and observability to Lambda functions without requiring code changes.

For more serverless learning resources, visit Serverless Land.

Serverless ICYMI Q3 2023

2023-10-17 Benjamin Smith

Post Syndicated from Benjamin Smith original https://aws.amazon.com/blogs/compute/serverless-icymi-q3-2023/

Welcome to the 23rd edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all the most recent product launches, feature enhancements, blog posts, webinars, live streams, and other interesting things that you might have missed!

In case you missed our last ICYMI, check out what happened last quarter here.

AWS announces the general availability of Amazon Bedrock

Amazon Web Services (AWS) unveils five generative artificial intelligence (AI) innovations to democratize generative AI applications. Amazon Bedrock, now generally available, enables experimentation with top foundation models (FMs) and allows customization with proprietary data.

It supports creating managed agents for complex tasks without code and ensures security and privacy. Amazon Titan Embeddings, another FM, is generally available for various language-related use cases. Meta’s Llama 2, coming soon, enhances dialogue scenarios.

The upcoming Amazon CodeWhisperer customization capability enables secure customization using private code bases. Generative BI authoring capabilities in Amazon QuickSight simplify visualization creation for business analysts.

AWS Lambda

AWS Lambda now detects and stops recursive loops in Lambda functions. AWS Lambda now detects and halts functions caught in recursive or infinite loops, guarding against unexpected costs. Lambda identifies recursive behavior, discontinuing requests after 16 invocations. The feature addresses pitfalls stemming from misconfiguration or coding bugs, introducing detailed error messaging, and allowing users to set maximum limits on retry intervals. Notifications about recursive occurrences are relayed through the AWS Health Dashboard, emails, and CloudWatch Alarms for streamlined troubleshooting. Lambda uses AWS X-Ray trace headers for invocation tracking, requiring supported AWS SDK versions.

AWS simplifies writing .NET 6 Lambda functions. The Lambda Annotations Framework for .NET. A new programming model makes the experience of writing Lambda functions in C# feel more natural for .NET developers by using C# source generator technology. This streamlines the development workflow for .NET developers, making it easier to create serverless applications using the latest version of the .NET framework.

AWS Lambda and Amazon EventBridge Pipes now support enhanced filtering. Additional filtering capabilities include the ability to match against characters at the end of a value (suffix filtering), ignore case sensitivity (equals-ignore-case), and have a single rule match if any conditions across multiple separate fields are true (OR matching).

AWS Lambda Functions powered by AWS Graviton2 are now available in 6 additional Regions. Graviton2 processors are known for their performance benefits, and this expansion provides users with more choices for running serverless workloads.

AWS Lambda adds support for Python 3.11 allowing developers to take advantage of the latest features and improvements in the Python programming language for their serverless functions.

AWS Step Functions

AWS Step Functions enhances Workflow Studio, focusing on an Advanced Starter Template and Code Mode for efficient AWS Step Functions workflow creation. Users benefit from streamlined design-to-code transitions, pasting Amazon States Language (ASL) definitions directly into Workflow Studio, speeding up adjustments. Enhanced workflow execution and configuration allow direct execution and setting adjustments within Workflow Studio, improving user experience.

AWS Step Functions launches enhanced error handling This update helps users to identify errors with precision and refine retry strategies. Step Functions now enables detailed error messages in Fail states and precise control over retry intervals. Use the new maximum limits and jitter functionality to ensure efficient and controlled retries, preventing service overload in recovery scenarios.

AWS Step Functions distributed map is now available in the AWS GovCloud (US) Regions. This release highlights the availability of the distributed map feature in Step Functions specifically tailored for the AWS GovCloud (US) Regions. The distributed map feature is a powerful capability for orchestrating parallel and distributed processing in serverless workflows.

AWS SAM

AWS SAM CLI announces local testing and debugging support on Terraform projects.

Developers can now use AWS SAM CLI to locally test and debug AWS Lambda functions and Amazon API Gateway defined in their Terraform projects. AWS SAM CLI reads infrastructure resource information from the Terraform application, allowing users to start Lambda functions and API Gateway endpoints locally in a Docker container.

This update enables faster development cycles for Terraform users, who can use AWS SAM CLI commands like `AWS SAM local start-api`, `sam local start-lambda`, and `sam local invoke`, along with `sam local generate` for generating mock test events.

Amazon EventBridge

Amazon EventBridge Scheduler adds schedule deletion after completion. This feature offers enhanced functionality by supporting the automatic deletion of schedules upon completion of their last invocation. It is applicable to various scheduling types, including one-time, cron, and rate schedules with an end date. Amazon EventBridge Scheduler, a centralized and highly scalable service, enables the creation, execution, and management of schedules.

With the ability to schedule millions of tasks invoking over 270 AWS services and 6,000 API operations. This update streamlines the process of managing completed schedules. The automatic deletion feature reduces the need for manual intervention or custom code, saving time and simplifying scalability for users leveraging EventBridge Scheduler.

Amazon EventBridge Pipes now available in three additional Regions. This update extends the availability of Amazon EventBridge Pipes, a powerful event-routing service, to three additional Regions.

Amazon EventBridge API Destinations is now available in additional Regions. Providing users with more options for building scalable and decoupled applications.

Amazon EventBridge Schema Registry and Schema Discovery now in additional Regions. This expansion allows you to discover and store event structure – or schema – in a shared, central location. You can download code bindings for those schemas for Java, Python, TypeScript, and Golang so it’s easier to use events as objects in your code.

Amazon SNS

To enhance message privacy and security, Amazon Simple Notification Service (SNS) implemented Message Data Protection, allowing users to de-identify outbound messages via redaction or masking. Amazon SNS FIFO topics now support message delivery to Amazon SQS Standard queues. This provides users with increased flexibility in managing message delivery and ordering.

Expanding its monitoring capabilities, Amazon SNS introduced Additional Usage Metrics in Amazon CloudWatch. This enhancement allows users to gain more comprehensive insights into the performance and utilization of their SNS resources. SNS extended its global SMS sending capabilities to Israel (Tel Aviv), providing users in that Region with additional options for SMS notifications. SNS also expanded its reach by supporting Mobile Push Notifications in twelve new AWS Regions. This expansion aligns with the growing demand for mobile notification capabilities, offering a broader coverage for users across diverse Regions.

Amazon SQS

Amazon Simple Queue Service (SQS) introduced a number of updates. Attribute-Based Access Control (ABAC) was implemented for scalable access permissions, while message data protection can now de-identify outbound messages via redaction or masking. SQS FIFO topics now support message delivery to Amazon SQS Standard queues, providing enhanced flexibility. Addressing throughput demands, SQS increased the quota for FIFO High Throughput mode. JSON protocol support was previewed, offering improved message format flexibility. These updates underscore SQS’s commitment to advanced security and flexibility.

Amazon API Gateway

Amazon API Gateway undergoes a console refresh, aligning with Cloudscape Design System guidelines. Notable enhancements include improved usability, sortable tables, enhanced API key management, and direct API deployment from the Resource view. The update introduces dark mode, accessibility improvements, and visual alignment with HTTP APIs and AWS Services.

GOTO EDA day Nashville 2023

Join GOTO EDA Day in Nashville on October 26 for insights on event-driven architectures. Learn from industry leaders at Music City Center with talks, panels, and Hands-On Labs. Limited tickets available.

Serverless blog posts

July 2023

July 5- Implementing AWS Lambda error handling patterns

July 6 – Implementing AWS Lambda error handling patterns

July 7 – Understanding AWS Lambda’s invoke throttling limits

July 10 – Detecting and stopping recursive loops in AWS Lambda functions

July 11 – Implementing patterns that exit early out of a parallel state in AWS Step Functions

July 26 – Migrating AWS Lambda functions from the Go1.x runtime to the custom runtime on Amazon Linux 2

July 27 – Python 3.11 runtime now available in AWS Lambda

August 2023

August 2 – Automatically delete schedules upon completion with Amazon EventBridge Scheduler

August 7 – Using response streaming with AWS Lambda Web Adapter to optimize performance

August 15 – Integrating IBM MQ with Amazon SQS and Amazon SNS using Apache Camel

August 15 – Implementing the transactional outbox pattern with Amazon EventBridge Pipes

August 23 – Protecting an AWS Lambda function URL with Amazon CloudFront and Lambda@Edge

August 29 – Enhancing file sharing using Amazon S3 and AWS Step Functions

August 31 – Enhancing Workflow Studio with new features for streamlined authoring

September 2023

September 5 – AWS SAM support for HashiCorp Terraform now generally available

September 14 – Building a secure webhook forwarder using an AWS Lambda extension and Tailscale

September 18 – Building resilient serverless applications using chaos engineering

September 19 – Implementing idempotent AWS Lambda functions with Powertools for AWS Lambda (TypeScript)

September 19 – Centralizing management of AWS Lambda layers across multiple AWS Accounts

September 26 – Architecting for scale with Amazon API Gateway private integrations

September 26 – Visually design your application with AWS Application Composer

Videos

Serverless Office Hours – Tues 10AM PT

FooBar Serverless YouTube channel

July 2023

July 27 – Generative AI and Serverless to create a new story everyday

August 2023

August 3 – Getting started with Data Streaming

August 10 – Amazon Kinesis Data Streams – Shards? Provisioned? On-demand? What does all this mean?

August 17 – Put and consume events with AWS Lambda, Amazon Kinesis Data Stream and Event Source Mapping

August 24 – Create powerful data pipelines with Amazon Kinesis and EventBridge Pipes

August 31 – New Step Functions versions and alias!

September 2023

September 7 – Amazon Kinesis Data Firehose – What is this service for?

September 14 – Kinesis Data Firehose with AWS CDK – Lambda transformations

September 21 – Advanced Event Source Mapping configuration | AWS Lambda and Amazon Kinesis Data Streams

September 28 – Data Streaming Patterns

Still looking for more?

The Serverless landing page has more information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials.

You can also follow the Serverless Developer Advocacy team on Twitter to see the latest news, follow conversations, and interact with the team.

Eric Johnson: @edjgeek
James Beswick: @jbesw
Ben Smith: @benjamin_l_s
Julian Wood: @julian_wood
Marcia Villalba: @mavi888uy
David Boyne: @boyney123

Filtering events in Amazon EventBridge with wildcard pattern matching

2023-10-12 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/filtering-events-in-amazon-eventbridge-with-wildcard-pattern-matching/

This post is written by Rajdeep Banerjee, Sr PSA, and Brian Krygsman, Sr. Solutions Architect.

Amazon EventBridge recently announced support for wildcard filters in rule event patterns. An EventBridge event bus is a serverless event router that helps you decouple your event-driven systems. You can route events between your systems, AWS services, or third-party SaaS services. You attach a rule to your event bus to define logic for routing events from producers to consumers.

You set an event pattern on the rule to filter incoming events to specific consumers. The new wildcard filter lets you build more flexible event matching patterns to reduce rule management and optimize your event consumers. This shows how these EventBridge attributes work together.

Wildcard filters use the wildcard character (*) to match zero, single, or multiple characters in a string value. For example, a filter string like "*.png" matches strings that end with ".png".

You can also use multiple wildcard characters in a filter. For example, a filter string like "*Title*" matches string values that include "Title" in the middle. When using wildcard filters, be careful to avoid matching more events than you intend.

This blog post describes how you can use wildcard filters in example scenarios. For more information about event-driven architectures, visit Serverless Land.

Wildcard pattern matching in S3 Event Notifications

Applications must often perform an action when new data is available. One example can be to process trading data uploaded to your Amazon S3 bucket. The data may be stored in individual folders depending on the date, time, and stock symbol. Business rules may dictate that when stock XYZ receives a file, it must send a notification to a downstream system.

This is the typical folder structure in an S3 bucket:

S3 can send an event to EventBridge when an object is written to a bucket. The S3 event includes the object key (for example, 2023-10-01/T13:22:22Z/XYZ/filename.ext). When any object is uploaded to the XYZ folder, you can use an EventBridge rule to send these events to a downstream service like an Amazon SQS.

Before this launch, you would first send the event to an AWS Lambda function. Existing prefix and suffix filters alone are insufficient because of the extra date and time folders. The function would run your code to inspect the object path for the stock symbol. Your code would then forward events to SQS when they matched.

With the new wildcard patterns in EventBridge rules, the logic is simpler. You no longer need to create a Lambda function to run custom matching code. You can instead use wildcard characters in the rule’s filter pattern, matching against portions of the S3 object key.

To use this, start with creating a new rule in the EventBridge console:

Choose Next. Keep the standard parameters and move to the Event pattern section. Here you can use a JSON-based event pattern.

{
  "source": ["aws.s3"],
  "detail": {
    "bucket": {
      "name": ["intraday-trading-data"]
    },
    "object": {
      "key": [{
        "wildcard": "*/XYZ/*"
      }]
    }
  }
}

This pattern looks for Event Notifications from a specific bucket. The pattern then filters the events further by the object keys that match "*/XYZ/*". The rule filters out notifications from other stock symbols, listening to only “XYZ“ data, irrespective of date and time of the data feed.
To use an SQS queue for the filtered event target, you must provide resource-based policies for EventBridge to send messages to the queue.
Choose Next and review the rule details before saving.
Before testing, enable S3 event notifications to EventBridge in the S3 console:
To test the new wildcard pattern, upload any sample CSV file in the XYZ folder to launch the Event Notifications.
You can monitor EventBridge CloudWatch metrics to check if the rule is invoked from the S3 upload. The SQS CloudWatch metrics show if messages are received from the EventBridge rule.

Filtering based on Amazon Resource Name (ARN)

Customers often need to perform actions when AWS Identity and Access Management (IAM) policies are added to specific roles. You can achieve this by creating custom EventBridge rules, which filter the event to match or create multiple rules to achieve the same effect. With the newly introduced wildcard filter, the task to invoke an action is simplified.

Consider an IAM role with fine-grained IAM policies attached. You may need to ensure any new policy attached to this role must be from a specific ARNs. This action can be implemented like this.

When you attach a new IAM policy to a role, it generates an event like this:

{
    "version": "0",
    "id": "0b85984e-ec53-84ba-140e-9e0cff7f05b4",
    "detail-type": "AWS API Call via CloudTrail",
    "source": "aws.iam",
    "account": "123456789012",
    "time": "2023-10-07T20:23:28Z",
    "region": "us-east-1",
    "resources": [],
    "detail": {
        "eventVersion": "1.08",
        "userIdentity": {
            "arn": "arn:aws:sts::123456789012:assumed-role/Admin/UserName",
            // ... additional detail fields
        },
        "eventTime": "2023-10-07T20:23:28Z",
        "eventSource": "iam.amazonaws.com",
        "eventName": "AttachRolePolicy",
        // ... additional detail fields

    }
}

You can create a rule matching against a combination of these event properties. You can filter detail.userIdentity.arn with a wildcard to catch events that come from a particular ARN. You can then route these events to a target like an Amazon CloudWatch Logs stream to record the change. You can also route them to Amazon Simple Notification Service (SNS). You can use the SNS notification to start a review and ensure that the newly attached policies are well-crafted as part of your reconciliation and audit process. The filter looks like this:

{
  "source": ["aws.iam"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventSource": ["iam.amazonaws.com"],
    "eventName": ["AttachRolePolicy"],
    "userIdentity": {
      "arn": [{
        "wildcard": "arn:aws:sts::123456789012:assumed-role/*/*"
      }]
    }
  }
}

Filtering custom events

You can use EventBridge to build your own event-driven systems with loosely coupled, scalable application services. When building event-driven applications in AWS, you can publish events to the default event bus, or create a custom event bus. You define the structure of events emitted from your services.

This structure is known as the event schema. When you attach rules to your bus to route events from producers to consumers, you match against values from properties in your event schema. Wildcard filters allow you to match property values that are unknown ahead of time, or across multiple value variants.

Consider an ecommerce application as an example. You may have several decoupled services working together, like a shopping cart service, an inventory service, and others. Each of these services emits events onto your event bus as your customers shop.

Events may include errors, to record problems customers encounter using your system. You can use a single rule with a wildcard filter to match all error events and send them to a common target. This allows you to simplify observability across your services.

This is the event flow:

Your shopping cart service may emit a timeout error event:

{
  "version": "0",
  "id": "24a4b957-570d-590b-c213-2a72e5dc4c66",
  "detail-type": "shopping.cart.error.timeout",
  "source": "com.mybusiness.shopping.cart",
  "account": "123456789012",
  "time": "2023-10-06T03:28:44Z",
  "region": "us-west-2",
  "resources": [],
  "detail": {
    "message": "Operation timed out.",
    "related-entity": {
      "entity-type": "order",
      "id": "123"
    },
    // ... additional detail fields
  }
}

The detail-type property of the example event determines what type of event this is. Other services may emit error events with different prefixes in detail-type. Other error types might have different suffixes in detail-type.

For example, an inventory service may emit an out-of-stock error event like this:

{
  "version": "0",
  "id": "e456f480-cc1e-47fa-8399-ab2e54116958",
  "detail-type": "shopping.inventory.error.outofstock",
  "source": "com.mybusiness.shopping.inventory",
  "account": "123456789012",
  "time": "2023-10-06T03:28:44Z",
  "region": "us-west-2",
  "resources": [],
  "detail": {
    "message": "Product cannot be added to a cart. Out of stock.",
    "related-entity": {
      "entity-type": "product",
      "id": "456"
    }
    // ... additional detail fields
  }
}

To route these events to a common target like an Amazon CloudWatch Logs stream, you can create a rule with a wildcard filter matching against detail-type. You can combine this with a prefix filter on source that filters events down to only services from your shopping system. The filter looks like this:

{
  "source": [{
    "prefix": "com.mybusiness.shopping."
  }],
  "detail-type": [{
    "wildcard": "*.error.*"
  }]
}

Without a wildcard filter you would need to create a more complex matching pattern, possibly across multiple rules.

Conclusion

Wildcard filters in EventBridge rules help simplify your event driven applications by ensuring the correct events are passed on to your targets. The new feature reduces the need for custom code, which was required previously. Try EventBridge rules with wildcard filters and experience the benefits of this new feature in your event-driven serverless applications.

For more serverless learning resources, visit Serverless Land.

Building a serverless document chat with AWS Lambda and Amazon Bedrock

2023-10-04 Pascal Vogel

Post Syndicated from Pascal Vogel original https://aws.amazon.com/blogs/compute/building-a-serverless-document-chat-with-aws-lambda-and-amazon-bedrock/

This post is written by Pascal Vogel, Solutions Architect, and Martin Sakowski, Senior Solutions Architect.

Large language models (LLMs) are proving to be highly effective at solving general-purpose tasks such as text generation, analysis and summarization, translation, and much more. Because they are trained on large datasets, they can use a broad generalist knowledge base. However, as training takes place offline and uses publicly available data, their ability to access specialized, private, and up-to-date knowledge is limited.

One way to improve LLM knowledge in a specific domain is fine-tuning them on domain-specific datasets. However, this is time and resource intensive, requires specialized knowledge, and may not be appropriate for some tasks. For example, fine-tuning won’t allow an LLM to access information with daily accuracy.

To address these shortcomings, Retrieval Augmented Generation (RAG) is proving to be an effective approach. With RAG, data external to the LLM is used to augment prompts by adding relevant retrieved data in the context. This allows for integrating disparate data sources and the complete separation of data sources from the machine learning model entirely.

Tools such as LangChain or LlamaIndex are gaining popularity because of their ability to flexibly integrate with a variety of data sources such as (vector) databases, search engines, and current public data sources.

In the context of LLMs, semantic search is an effective search approach, as it considers the context and intent of user-provided prompts as opposed to a traditional literal search. Semantic search relies on word embeddings, which represent words, sentences, or documents as vectors. Consequently, documents must be transformed into embeddings using an embedding model as the basis for semantic search. Because this embedding process only needs to happen when a document is first ingested or updated, it’s a great fit for event-driven compute with AWS Lambda.

This blog post presents a solution that allows you to ask natural language questions of any PDF document you upload. It combines the text generation and analysis capabilities of an LLM with a vector search on the document content. The solution uses serverless services such as AWS Lambda to run LangChain and Amazon DynamoDB for conversational memory.

Amazon Bedrock is used to provide serverless access to foundational models such as Amazon Titan and models developed by leading AI startups, such as AI21 Labs, Anthropic, and Cohere. See the GitHub repository for a full list of available LLMs and deployment instructions.

You learn how the solution works, what design choices were made, and how you can use it as a blueprint to build your own custom serverless solutions based on LangChain that go beyond prompting individual documents. The solution code and deployment instructions are available on GitHub.

Solution overview

Let’s look at how the solution works at a high level before diving deeper into specific elements and the AWS services used in the following sections. The following diagram provides a simplified view of the solution architecture and highlights key elements:

The process of interacting with the web application looks like this:

A user uploads a PDF document into an Amazon Simple Storage Service (Amazon S3) bucket through a static web application frontend.
This upload triggers a metadata extraction and document embedding process. The process converts the text in the document into vectors. The vectors are loaded into a vector index and stored in S3 for later use.
When a user chats with a PDF document and sends a prompt to the backend, a Lambda function retrieves the index from S3 and searches for information related to the prompt.
An LLM then uses the results of this vector search, previous messages in the conversation, and its general-purpose capabilities to formulate a response to the user.

As can be seen on the following screenshot, the web application deployed as part of the solution allows you to upload documents and list uploaded documents and their associated metadata, such as number of pages, file size, and upload date. The document status indicates if a document is successfully uploaded, is being processed, or is ready for a conversation.

By clicking on one of the processed documents, you can access a chat interface, which allows you to send prompts to the backend. It is possible to have multiple independent conversations with each document with separate message history.

Embedding documents

When a new document is uploaded to the S3 bucket, an S3 event notification triggers a Lambda function that extracts metadata, such as file size and number of pages, from the PDF file and stores it in a DynamoDB table. Once the extraction is complete, a message containing the document location is placed on an Amazon Simple Queue Service (Amazon SQS) queue. Another Lambda function polls this queue using Lambda event source mapping. Applying the decouple messaging pattern to the metadata extraction and document embedding functions ensures loose coupling and protects the more compute-intensive downstream embedding function.

The embedding function loads the PDF file from S3 and uses a text embedding model to generate a vector representation of the contained text. LangChain integrates with text embedding models for a variety of LLM providers. The resulting vector representation of the text is loaded into a FAISS index. FAISS is an open source vector store that can run inside the Lambda function memory using the faiss-cpu Python package. Finally, a dump of this FAISS index is stored in the S3 bucket besides the original PDF document.

Generating responses

When a prompt for a specific document is submitted via the Amazon API Gateway REST API endpoint, it is proxied to a Lambda function that:

Loads the FAISS index dump of the corresponding PDF file from S3 and into function memory.
Performs a similarity search of the FAISS vector store based on the prompt.
If available, retrieves a record of previous messages in the same conversation via the DynamoDBChatMessageHistory integration. This integration can store message history in DynamoDB. Each conversation is identified by a unique ID.
Finally, a LangChain ConversationalRetrievalChain passes the combination of the prompt submitted by the user, the result of the vector search, and the message history to an LLM to generate a response.

Web application and file uploads

A static web application serves as the frontend for this solution. It’s built with React, TypeScript, Vite, and TailwindCSS and deployed via AWS Amplify Hosting, a fully managed CI/CD and hosting service for fast, secure, and reliable static and server-side rendered applications. To protect the application from unauthorized access, it integrates with an Amazon Cognito user pool. The API Gateway uses an Amazon Cognito authorizer to authenticate requests.

Users upload PDF files directly to the S3 bucket using S3 presigned URLs obtained via the REST API. Several Lambda functions implement API endpoints used to create, read, and update document metadata in a DynamoDB table.

Extending and adapting the solution

The solution provided serves as a blueprint that can be enhanced and extended to develop your own use cases based on LLMs. For example, you can extend the solution so that users can ask questions across multiple PDF documents or other types of data sources. LangChain makes it easy to load different types of data into vector stores, which you can then use for semantic search.

Once your use case involves searching across multiple documents, consider moving from loading vectors into memory with FAISS to a dedicated vector database. There are several options for vector databases on AWS. One serverless option is Amazon Aurora Serverless v2 with the pgvector extension for PostgreSQL. Alternatively, vector databases developed by AWS Partners such as Pinecone or MongoDB Atlas Vector Search can be integrated with LangChain. Besides vector search, LangChain also integrates with traditional external data sources, such as the enterprise search service Amazon Kendra, Amazon OpenSearch, and many other data sources.

The solution presented in this blog post uses similarity search to find information in the vector database that closely matches the user-supplied prompt. While this works well in the presented use case, you can also use other approaches, such as maximal marginal relevance, to find the most relevant information to provide to the LLM. When searching across many documents and receiving many results, techniques such as MapReduce can improve the quality of the LLM responses.

Depending on your use case, you may also want to select a different LLM to achieve an ideal balance between quality of results and cost. Amazon Bedrock is a fully managed service that makes foundational models (FMs) from leading AI startups and Amazon available via an API, so you can choose from a wide range of FMs to find the model that’s best suited for your use case. You can use models such as Amazon Titan, Jurassic-2 from AI21 Labs, or Anthropic Claude.

To further optimize the user experience of your generative AI application, consider streaming LLM responses to your frontend in real-time using Lambda response streaming and implementing real-time data updates using AWS AppSync subscriptions or Amazon API Gateway WebSocket APIs.

Conclusion

AWS serverless services make it easier to focus on building generative AI applications by providing automatic scaling, built-in high availability, and a pay-for-use billing model. Event-driven compute with AWS Lambda is a good fit for compute-intensive, on-demand tasks such as document embedding and flexible LLM orchestration.

The solution in this blog post combines the capabilities of LLMs and semantic search to answer natural language questions directed at PDF documents. It serves as a blueprint that can be extended and adapted to fit further generative AI use cases.

Deploy the solution by following the instructions in the associated GitHub repository.

For more serverless learning resources, visit Serverless Land.

Visually design your application with AWS Application Composer

2023-09-26 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/visually-design-your-application-with-aws-application-composer/

This post is written by Paras Jain, Senior Technical Account Manager and Curtis Darst, Senior Solutions Architect.

AWS Application Composer allows you to design and build applications visually using 13 AWS CloudFormation resource types. Today, the service expands the support to all available CloudFormation resource types.

Overview

AWS Application Composer provides you with an interactive canvas for visually designing your applications. You use a drag-and-drop interface to create an application design from scratch or import an existing application definition to edit it.

Modern event-driven applications are built on many services. Visualizing an architecture helps you better understand the relationship between those services and identify gaps and areas of improvements.

You can use AWS Application Composer in local sync mode to connect to your local file system. That way your changes are updated to your file system. This way, you can integrate with existing version control systems and development and deployment workflow.

AWS Application Composer provides a drag-and-drop canvas view and a code editor template view. Changes made to one view reflect on the other view. Similarly, changes made in AWS Application Composer are reflected in your local code editor and vice versa.

What is AWS releasing today?

AWS Application Composer already supports 13 serverless resource types. For these resource types, AWS Application Composer provides enhanced component cards.

Enhanced component cards allow you to configure and join components together. Today’s release gives you the ability to drag and drop 1,134 resource types to the canvas and configure these using resource configuration pane.

This blog post shows how you can create a fault tolerant compute architecture involving an Application Load Balancer, two Amazon Elastic Compute Cloud (EC2) instances in different Availability Zones, and an Amazon Relational Database Service (RDS) instance.

Conceptually, this is the application design:

Designing a scalable and fault tolerant compute stack

For this blog post, you create a fault tolerant compute stack consisting of an ALB, two EC2 instances in two different Availability Zones with automatic scaling capabilities and an RDS instance.

Navigate to the AWS Application Composer service in the AWS Management Console. Create a new project by choosing Create Project.
If you are using one of the browsers that support local sync (Google Chrome and Microsoft Edge at this time), you can connect the project to the local file system and edit using command line interface or integrated development environment. To do so:
1. Choose Menu, and Local sync.
2. Select a folder on your file system and allow the necessary permissions from the browser when prompted.

Some components in architecture diagrams, like security groups, can be visualized in the canvas but you don’t necessarily want to represent them as prominent part of architectures. Therefore, for brevity, instead of dragging and dropping, you only configure them in the template mode.

Choose Template to switch to the template view.

Paste the following code in the template editor:

Resources:
  DBEC2SecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Open database for access
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: '3306'
          ToPort: '3306'
          SourceSecurityGroupId: !Ref WebServerSecurityGroup
      VpcId:
        ParameterId: VpcId
        Format: AWS::EC2::VPC::Id
  WebServerSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Enable HTTP access via port 80 locked down to the load balancer + SSH access.
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: '80'
          ToPort: '80'
          SourceSecurityGroupId: !Select
            - 0
            - !GetAtt LoadBalancer.SecurityGroups
        - IpProtocol: tcp
          FromPort: '22'
          ToPort: '22'
          CidrIp:
            ParameterId: SSHLocation
            Format: String
            Default: 0.0.0.0/0
      VpcId:
        ParameterId: VpcId
        Format: AWS::EC2::VPC::Id
  WebServerGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      VPCZoneIdentifier:
        ParameterId: Subnets
        Format: List<AWS::EC2::Subnet::Id>
      LaunchConfigurationName: !Ref LaunchConfiguration
      MinSize: '1'
      MaxSize: '5'
      DesiredCapacity:
        ParameterId: WebServerCapacity
        Format: Number
        Default: '1'
      TargetGroupARNs:
        - !Ref TargetGroup

Switch back to canvas view.

Add an Application Load Balancer, Load Balancer Listener, Load Balancer Target Group, Auto Scaling Launch Configuration and an RDS DB instance.
1. Under the resources pane on the left, enter loadbalancer in the search bar.
2. Drag and drop AWS::ElasticLoadBalancingV2::LoadBalancer from the resources pane to the canvas.
Repeat these steps for other four resource types. Choose Arrange. Your canvas now appears as follows:
Start configuring the remaining component cards. You can connect two cards visually by connecting the right connection port of one card to the left connection port of another card. At the moment, not all component cards support visual connectivity. For those cards you can establish connectivity using the resource configuration pane. You can also update the template code directly. Either way, the connectivity is reflected in the canvas.
You configure the components in the architecture using the Resource configuration pane. First, configure the Application Load Balancer listener:
1. Choose the Listener Card in the canvas.
2. Choose Details.
3. Paste the following code in the Resource Configuration Section:
```
DefaultActions:
     Type: forward
TargetGroupArn: !Ref TargetGroup
LoadBalancerArn: !Ref LoadBalancer
Port: '80'
Protocol: HTTP
```
4. Choose Save.
Repeat the same for remaining resource types with the following code. The code for the Load Balancer Card is:
```
Subnets:
ParameterId: Subnets
Format: List<AWS::EC2::Subnet::Id>
```

The code for the Target Group card is:

HealthCheckPath: /
HealthCheckIntervalSeconds: 10
HealthCheckTimeoutSeconds: 5
HealthyThresholdCount: 2
Port: 80
Protocol: HTTP
UnhealthyThresholdCount: 5
VpcId:
  ParameterId: VpcId
  Format: AWS::EC2::VPC::Id
TargetGroupAttributes:
  - Key: stickiness.enabled
    Value: 'true'
  - Key: stickiness.type
    Value: lb_cookie
  - Key: stickiness.lb_cookie.duration_seconds
    Value: '30'

This is the code for the Launch Configuration. Replace <image-id>with the right image id for your Region.
```
ImageId: <image-id>
InstanceType: t2.small
SecurityGroups: !Ref WebServerSecurityGroup
```

The code for DBInstance is:

DBName:
  ParameterId: DBName
  Format: String
  Default: wordpressdb
Engine: MySQL
MultiAZ:
  ParameterId: MultiAZDatabase
  Format: String
  Default: 'false'
MasterUsername:
  ParameterId: DBUser
  Format: String
MasterUserPassword:
  ParameterId: DBPassword
  Format: String
DBInstanceClass:
  ParameterId: DBClass
  Format: String
  Default: db.t2.small
AllocatedStorage:
  ParameterId: DBAllocatedStorage
  Format: Number
  Default: '5'
VPCSecurityGroups:
  - !GetAtt DBEC2SecurityGroup.GroupId

Choose Arrange. Your canvas looks like this:
This completes the visualization portion of the application architecture. You can export this visualization by using the Export Canvas option in the menu.

Adding observability

After adding the core application components, you now add observability to your application. Observability enables you to collect and analyze important events and metrics for your applications.

To be notified of any changes to the RDS database configuration, use a serverless design pattern to avoid running instances when they are not needed. Conceptually, your observability stack looks like:

Amazon EventBridge captures the events emitted by Amazon RDS.
For any event matching the EventBridge rule, EventBridge invokes AWS Lambda.
Lambda runs the custom logic and send an email to an Amazon Simple Notification Service(SNS) topic. You can subscribe interested parties to this SNS topic.

There are now two distinct sets of components in the architecture. One set of components comprises the core application while another comprises the observability logic.

AWS Application Composer allows you to organize different components in groups. This allows you and your team to focus on one portion of the architecture at a time. Before adding observability components, first create a group of the existing components.

Select a component card.
While holding the ‘shift’ key, select the other cards. Once all resources are selected, select Group action.

Once the group is created, follow these steps to rename the group.

Select the Group card.
Rename the group to Application Stack.
Choose Save.

Now add the observability components. Repeat the process of searching then dragging and dropping of the following components from the Resources pane to the canvas outside the Application Stack group.

1. EventBridge Event rule
2. Lambda Function
3. SNS Topic
4. SNS Subscription

Repeat the process for grouping these 4 components in a group with the name Observability.

Some of the components have a small circle on their sides. These are connector ports. A port on the right side of a card indicates an opportunity for the card to invoke another card. A port on the left side indicates an opportunity for a card to be invoked by another card. You can connect two cards by clicking the right port of a card and dragging to the left port of another card.

Create the observability stack by following the following steps:

Connect the right port of EventBridge Event Rule card to the left port of Lambda Function Card. This makes the Lambda function a target for the EventBridge rule.
Connect the right port of the Lambda function to the left port of the SNS topic. This adds the necessary AWS Identity and Management(IAM) permissions policies and environment variable to the Lambda function to provide it the ability to interact with the SNS topic.
Select the EventBridge event rule card and replace the event pattern code in the resource properties pane with the following code. This event pattern monitors the RDS instance for an instance change event and pushes this event to Lambda.
```
source:
  - aws.rds
detail-type:
  - RDS DB Instance Event
```
Select the SNS subscription to see the resource configuration pane. Add the following code to the resource configuration. Replace [email protected] with your email address.
```
    Endpoint: [email protected]
    Protocol: email
    TopicArn: !Ref Topic
```
Repeat the group creating steps to create an observability group comprising an EventBridge event rule, Lambda function, SNS topic, and SNS subscription. Name the group Observability. Your group appears as follows:

Deploying your AWS Architecture

Before you can provision the resources for your architecture, you must make the configuration changes as per development and deployment best practices for your organization.

For example, you must provide a strong DB password, name the resources as per the naming conventions of your organization. You must also add the Lambda code with your custom logic.

AWS Application Composer provides you the ability to configure each resource via resource configuration panel. This enables you to always stay in-context while creating a complex architecture. You can quickly find the resource you want to edit instead of scrolling through a large template file. If you prefer to edit the template file directly, you can use the Template View of AWS Application Composer.

Alternatively, if you have enabled the local sync, you can edit the file directly in your integrated development environment (IDE) where changes made in AWS Application Composer are saved in real-time. If you have not enabled the local sync, you can export the template using the Save Template File option in the menu. After concluding your changes, you can provision the AWS infrastructure either by using AWS CloudFormation Console or by command line interface.

Pricing

AWS Application Composer does not provision any AWS resources. Using AWS Application Composer to design your application architecture is free. You are only charged when you provision AWS Resources using the template file created by AWS Application Composer.

Conclusion

This blog post shows how to use AWS Application Composer to create and update an application architecture using any of the 1,134 CloudFormation resource types. It covers how to configure local sync mode to integrate the AWS Application to your development workflow. The post demonstrates how to organize your architecture into two distinct groups. Changes made in Canvas view are reflected in the template view and vice versa.

To learn more about AWS Application Composer visit https://aws.amazon.com/application-composer/.

For more serverless learning resources, visit Serverless Land.

Architecting for scale with Amazon API Gateway private integrations

2023-09-26 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/architecting-for-scale-with-amazon-api-gateway-private-integrations/

This post is written by Lior Sadan, Sr. Solutions Architect, and Anandprasanna Gaitonde,
Sr. Solutions Architect.

Organizations use Amazon API Gateway to build secure, robust APIs that expose internal services to other applications and external users. When the environment evolves to many microservices, customers must ensure that the API layer can handle the scale without compromising security and performance. API Gateway provides various API types and integration options, and builders must consider how each option impacts the ability to scale the API layer securely and performantly as the microservices environment grows.

This blog post compares architecture options for building scalable, private integrations with API Gateway for microservices. It covers REST and HTTP APIs and their use of private integrations, and shows how to develop secure, scalable microservices architectures.

Overview

Here is a typical API Gateway implementation with backend integrations to various microservices:

API Gateway handles the API layer, while integrating with backend microservices running on Amazon EC2, Amazon Elastic Container Service (ECS), or Amazon Elastic Kubernetes Service (EKS). This blog focuses on containerized microservices that expose internal endpoints that the API layer then exposes externally.

To keep microservices secure and protected from external traffic, they are typically implemented within an Amazon Virtual Private Cloud (VPC) in a private subnet, which is not accessible from the internet. API Gateway offers a way to expose these resources securely beyond the VPC through private integrations using VPC link. Private integration forwards external traffic sent to APIs to private resources, without exposing the services to the internet and without leaving the AWS network. For more information, read Best Practices for Designing Amazon API Gateway Private APIs and Private Integration.

The example scenario has four microservices that could be hosted in one or more VPCs. It shows the patterns integrating the microservices with front-end load balancers and API Gateway via VPC links.

While VPC links enable private connections to microservices, customers may have additional needs:

Increase scale: Support a larger number of microservices behind API Gateway.
Independent deployments: Dedicated load balancers per microservice enable teams to perform blue/green deployments independently without impacting other teams.
Reduce complexity: Ability to use existing microservice load balancers instead of introducing additional ones to achieve API Gateway integration
Low latency: Ensure minimal latency in API request/response flow.

API Gateway offers HTTP APIs and REST APIs (see Choosing between REST APIs and HTTP APIs) to build RESTful APIs. For large microservices architectures, the API type influences integration considerations:

	VPC link supported integrations	Quota on VPC links per account per Region
REST API	Network Load Balancer (NLB)	20
HTTP API	Network Load Balancer (NLB), Application Load Balancer (ALB), and AWS Cloud Map	10

This post presents four private integration options taking into account the different capabilities and quotas of VPC link for REST and HTTP APIs:

Option 1: HTTP API using VPC link to multiple NLBs or ALBs.
Option 2: REST API using multiple VPC links.
Option 3: REST API using VPC link with NLB.
Option 4: REST API using VPC link with NLB and ALB targets.

Option 1: HTTP API using VPC link to multiple NLBs or ALBs

HTTP APIs allow connecting a single VPC link to multiple ALBs, NLBs, or resources registered with an AWS Cloud Map service. This provides a fan out approach to connect with multiple backend microservices. However, load balancers integrated with a particular VPC link should reside in the same VPC.

Two microservices are in a single VPC, each with its own dedicated ALB. The ALB listeners direct HTTPS traffic to the respective backend microservice target group. A single VPC link is connected to two ALBs in that VPC. API Gateway uses path-based routing rules to forward requests to the appropriate load balancer and associated microservice. This approach is covered in Best Practices for Designing Amazon API Gateway Private APIs and Private Integration – HTTP API. Sample CloudFormation templates to deploy this solution are available on GitHub.

You can add additional ALBs and microservices within VPC IP space limits. Use the Network Address Usage (NAU) to design the distribution of microservices across VPCs. Scale beyond one VPC by adding VPC links to connect more VPCs, within VPC link quotas. You can further scale this by using routing rules like path-based routing at the ALB to connect more services behind a single ALB (see Quotas for your Application Load Balancers). This architecture can also be built using an NLB.

Benefits:

High degree of scalability. Fanning out to multiple microservices using single VPC link and/or multiplexing capabilities of ALB/NLB.
Direct integration with existing microservices load balancers eliminates the need for introducing new components and reducing operational burden.
Lower latency for API request/response thanks to direct integration.
Dedicated load balancers per microservice enable independent deployments for microservices teams.

Option 2: REST API using multiple VPC links

For REST APIs, the architecture to support multiple microservices may differ due to these considerations:

NLB is the only supported private integration for REST APIs.
VPC links for REST APIs can have only one target NLB.

A VPC link is required for each NLB, even if the NLBs are in the same VPC. Each NLB serves one microservice, with a listener to route API Gateway traffic to the target group. API Gateway path-based routing sends requests to the appropriate NLB and corresponding microservice. The setup required for this private integration is similar to the example described in Tutorial: Build a REST API with API Gateway private integration.

To scale further, add additional VPC link and NLB integration for each microservice, either in the same or different VPCs based on your needs and isolation requirements. This approach is limited by the VPC links quota per account per Region.

Benefits:

Single NLB in the request path reduces operational complexity.
Dedicated NLBs for each enable independent microservice deployments.
No additional hops in the API request path results in lower latency.

Considerations:

Limits scalability due to a one-to-one mapping of VPC links to NLBs and microservices limited by VPC links quota per account per Region.

Option 3: REST API using VPC link with NLB

The one-to-one mapping of VPC links to NLBs and microservices in option 2 has scalability limits due to VPC link quotas. An alternative is to use multiple microservices per NLB.

A single NLB fronts multiple microservices in a VPC by using multiple listeners, with each listener on a separate port per microservice. Here, NLB1 fronts two microservices in one VPC. NLB2 fronts two other microservices in a second VPC. With multiple microservices per NLB, routing is defined for the REST API when choosing the integration point for a method. You define each service using a combination of selecting the VPC Link, which is integrated with a specific NLB, and a specific port that is assigned for each microservice at the NLB Listener and addressed from the Endpoint URL.

To scale out further, add additional listeners to existing NLBs, limited by Quotas for your Network Load Balancers. In cases where each microservice has its dedicated load balancer or access point, those are configured as targets to the NLB. Alternatively, integrate additional microservices by adding additional VPC links.

Benefits:

Larger scalability – limited by NLB listener quotas and VPC link quotas.
Managing fewer NLBs supporting multiple microservices reduces operational complexity.
Low latency with a single NLB in the request path.

Considerations:

Shared NLB configuration limits independent deployments for individual microservices teams.

Option 4: REST API using VPC link with NLB and ALB targets

Customers often build microservices with ALB as their access point. To expose these via API Gateway REST APIs, you can take advantage of ALB as a target for NLB. This pattern also increases the number of microservices supported compared to the option 3 architecture.

A VPC link (VPCLink1) is created with NLB1 in a VPC1. ALB1 and ALB2 front-end the microservices mS1 and mS2, added as NLB targets on separate listeners. VPC2 has a similar configuration. Your isolation needs and IP space determine if microservices can reside in one or multiple VPCs.

To scale out further:

Create additional VPC links to integrate new NLBs.
Add NLB listeners to support more ALB targets.
Configure ALB with path-based rules to route requests to multiple microservices.

Benefits:

High scalability integrating services using NLBs and ALBs.
Independent deployments per team is possible when each ALB is dedicated to a single microservice.

Considerations:

Multiple load balancers in the request path can increase latency.

Considerations and best practices

Beyond the scaling considerations of scale with VPC link integration discussed in this blog, there are other considerations:

Evaluate REST APIs and HTTP APIs capabilities to meet your requirements.
Choose the optimal load balancer type for your application needs.
For multi-account architecture reference Building private cross-account APIs using Amazon API Gateway and AWS PrivateLink.
Avoid exceeding default quotas rapidly and request a quota increase for higher limit requirements.
Monitor Service Quotas to plan proactively and mitigate risks as your architecture evolves. Consider the use of the Quota Monitor solution for monitoring.
See Best Practices for Designing Amazon API Gateway Private APIs and Private Integration – Rest API.

Conclusion

This blog explores building scalable API Gateway integrations for microservices using VPC links. VPC links enable forwarding external traffic to backend microservices without exposing them to the internet or leaving the AWS network. The post covers scaling considerations based on using REST APIs versus HTTP APIs and how they integrate with NLBs or ALBs across VPCs.

While API type and load balancer selection have other design factors, it’s important to keep the scaling considerations discussed in this blog in mind when designing your API layer architecture. By optimizing API Gateway implementation for performance, latency, and operational needs, you can build a robust, secure API to expose microservices at scale.

For more serverless learning resources, visit Serverless Land.

Centralizing management of AWS Lambda layers across multiple AWS Accounts

2023-09-19 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/centralizing-management-of-aws-lambda-layers-across-multiple-aws-accounts/

This post is written by Debasis Rath, Sr. Specialist SA-Serverless, Kanwar Bajwa, Enterprise Support Lead, and Xiaoxue Xu, Solutions Architect (FSI).

Enterprise customers often manage an inventory of AWS Lambda layers, which provide shared code and libraries to Lambda functions. These Lambda layers are then shared across AWS accounts and AWS Organizations to promote code uniformity, reusability, and efficiency. However, as enterprises scale on AWS, managing shared Lambda layers across an increasing number of functions and accounts is best handled with automation.

This blog post centralizes the management of Lambda layers to ensure compliance with your enterprise’s governance standards, and promotes consistency across your infrastructure. This centralized management uses a detective configuration approach to identify non-compliant Lambda functions systematically using outdated Lambda layer versions, and corrective measures to remediate these Lambda functions by updating them with the right layer version.

This solution uses AWS services such as AWS Config, Amazon EventBridge Scheduler, AWS Systems Manager (SSM) Automation, and AWS CloudFormation StackSets.

Solution overview

This solution offers two parts for layers management:

On-demand visibility into outdated Lambda functions.
Automated remediation of the affected Lambda functions.

This is the architecture for the first part. Users with the necessary permissions can use AWS Config advanced queries to obtain a list of outdated Lambda functions.

The current configuration state of any Lambda function is captured by the configuration recorder within the member account. This data is then aggregated by the AWS Config Aggregator within the management account. The aggregated data can be accessed using queries.

This diagram depicts the architecture for the second part. Administrators must manually deploy CloudFormation StackSets to initiate the automatic remediation of outdated Lambda functions.

The manual remediation trigger is used instead of a fully automated solution. Administrators schedule this manual trigger as part of a change request to minimize disruptions to the business. All business stakeholders owning affected Lambda functions should receive this change request notification and have adequate time to perform unit tests to assess the impact.

Upon receiving confirmation from the business stakeholders, the administrator deploys the CloudFormation StackSets, which in turn deploy the CloudFormation stack to the designated member account and Region. After the CloudFormation stack deployment, the EventBridge scheduler invokes an AWS Config custom rule evaluation. This rule identifies the non-compliant Lambda functions, and later updates them using SSM Automation runbooks.

The following walkthrough deploys the two-part architecture described, using a centralized approach to layer management as in the preceding diagram. A decentralized approach scatters management and updates of Lambda layers across accounts, making enforcement more difficult and error-prone.

This solution is also available on GitHub.

Prerequisites

For the solution walkthrough, you should have the following prerequisites:

CloudFormation StackSets enabled for your AWS Organizations. Refer to the documentation to enable AWS CloudFormation StackSets at the organizational level.
AWS Config enabled for your AWS Organizations. Refer to the provided documentation to enable AWS Config at the organizational level.
An AWS Config Aggregator set up to collect recorded configuration data from all accounts across all AWS Regions within your AWS Organizations. Refer to the provided documentation to create an aggregator.
The necessary permissions to deploy CloudFormation StackSets and to query AWS Config.
AWS CloudShell to run scripts with the AWS Command Line Interface (CLI).

Writing an on-demand query for outdated Lambda functions

First, you write and run an AWS Config advanced query to identify the accounts and Regions where the outdated Lambda functions reside. This is helpful for end users to determine the scope of impact, and identify the responsible groups to inform based on the affected Lambda resources.

Follow these procedures to understand the scope of impact using the AWS CLI:

Open CloudShell in your AWS account.

Run the following AWS CLI command. Replace YOUR_AGGREGATOR_NAME with the name of your AWS Config aggregator, and YOUR_LAYER_ARN with the outdated Lambda layer Amazon Resource Name (ARN).

aws configservice select-aggregate-resource-config \
--expression "SELECT accountId, awsRegion, configuration.functionName, configuration.version WHERE resourceType = 'AWS::Lambda::Function' AND configuration.layers.arn = 'YOUR_LAYER_ARN'" \
--configuration-aggregator-name 'YOUR_AGGREGATOR_NAME' \
--query "Results" \
--output json | \
jq -r '.[] | fromjson | [.accountId, .awsRegion, .configuration.functionName, .configuration.version] | @csv' > output.csv

The results are saved to a CSV file named output.csv in the current working directory. This file contains the account IDs, Regions, names, and versions of the Lambda functions that are currently using the specified Lambda layer ARN. Refer to the documentation on how to download a file from AWS CloudShell.

To explore more configuration data and further improve visualization using services like Amazon Athena and Amazon QuickSight, refer to Visualizing AWS Config data using Amazon Athena and Amazon QuickSight.

Deploying automatic remediation to update outdated Lambda functions

Next, you deploy the automatic remediation CloudFormation StackSets to the affected accounts and Regions where the outdated Lambda functions reside. You can use the query outlined in the previous section to obtain the account IDs and Regions.

Updating Lambda layers may affect the functionality of existing Lambda functions. It is essential to notify affected development groups, and coordinate unit tests to prevent unintended disruptions before remediation.

To create and deploy CloudFormation StackSets from your management account for automatic remediation:

Run the following command in CloudShell to clone the GitHub repository:
```
git clone https://github.com/aws-samples/lambda-layer-management.git
```

Run the following CLI command to upload your template and create the stack set container.

aws cloudformation create-stack-set \
  --stack-set-name layers-remediation-stackset \
  --template-body file://lambda-layer-management/layer_manager.yaml

Run the following CLI command to add stack instances in the desired accounts and Regions to your CloudFormation StackSets. Replace the account IDs, Regions, and parameters before you run this command. You can refer to the syntax in the AWS CLI Command Reference. “NewLayerArn” is the ARN for your updated Lambda layer, while “OldLayerArn” is the original Lambda layer ARN.
```
aws cloudformation create-stack-instances \
--stack-set-name layers-remediation-stackset \
--accounts <LIST_OF_ACCOUNTS> \
--regions <YOUR_REGIONS> \
--parameter-overrides ParameterKey=NewLayerArn,ParameterValue='<NEW_LAYER_ARN>' ParameterKey=OldLayerArn,ParameterValue='=<OLD_LAYER_ARN>'
```
Run the following CLI command to verify that the stack instances are created successfully. The operation ID is returned as part of the output from step 3.
```
aws cloudformation describe-stack-set-operation \
  --stack-set-name layers-remediation-stackset \
  --operation-id <OPERATION_ID>
```

This CloudFormation StackSet deploys an EventBridge Scheduler that immediately triggers the AWS Config custom rule for evaluation. This rule, written in AWS CloudFormation Guard, detects all the Lambda functions in the member accounts currently using the outdated Lambda layer version. By using the Auto Remediation feature of AWS Config, the SSM automation document is run against each non-compliant Lambda function to update them with the new layer version.

Other considerations

The provided remediation CloudFormation StackSet uses the UpdateFunctionConfiguration API to modify your Lambda functions’ configurations directly. This method of updating may lead to drift from your original infrastructure as code (IaC) service, such as the CloudFormation stack that you used to provision the outdated Lambda functions. In this case, you might need to add an additional step to resolve drift from your original IaC service.

Alternatively, you might want to update your IaC code directly, referencing the latest version of the Lambda layer, instead of deploying the remediation CloudFormation StackSet as described in the previous section.

Cleaning up

Refer to the documentation for instructions on deleting all the created stack instances from your account. After, proceed to delete the CloudFormation StackSet.

Conclusion

Managing Lambda layers across multiple accounts and Regions can be challenging at scale. By using a combination of AWS Config, EventBridge Scheduler, AWS Systems Manager (SSM) Automation, and CloudFormation StackSets, it is possible to streamline the process.

The example provides on-demand visibility into affected Lambda functions and allows scheduled remediation of impacted functions. AWS SSM Automation further simplifies maintenance, deployment, and remediation tasks. With this architecture, you can efficiently manage updates to your Lambda layers and ensure compliance with your organization’s policies, saving time and reducing errors in your serverless applications.

To learn more about using Lambda layer, visit the AWS documentation. For more serverless learning resources, visit Serverless Land.