Building a Serverless Streaming Pipeline to Deliver Reliable Messaging

Post Syndicated from Chris McPeek original https://aws.amazon.com/blogs/compute/building-a-serverless-streaming-pipeline-to-deliver-reliable-messaging/

This post is written by Jeff Harman, Senior Prototyping Architect, Vaibhav Shah, Senior Solutions Architect and Erik Olsen, Senior Technical Account Manager.

Many industries are required to provide audit trails for decision and transactional systems. AI-assisted decision making requires monitoring the full inputs to the decision system in near real time to prevent fraud and to detect model drift and discrimination. Modern systems often use a much wider array of inputs for decision making, including images, unstructured text, historical values, and other large data elements. These large data elements pose a challenge to traditional audit systems that deal with relatively small, structured text messages. This blog shows how to use serverless technology to create a reliable, performant, traceable, and durable streaming pipeline for audit processing.

Overview

Consider the following four requirements to develop an architecture for audit record ingestion:

  1. Audit record size: Store and manage large payloads (256 KB – 6 MB in size) that may be heterogeneous, including text, binary data, and references to other storage systems.
  2. Audit traceability: The stored data provides full traceability of the payload, and external processes can monitor processing via subscription-based events.
  3. High Performance: The time required for blocking writes to the system is limited to the time it takes to transmit the audit record over the network.
  4. High data durability: Once the system sends a payload receipt, the payload is at very low risk of loss because of system failures.

The following diagram shows an architecture that meets these requirements and models the flow of the audit record through the system.

The primary source of latency is the time it takes for an audit record to be transmitted across the network. Applications sending audit records make an API call to an Amazon API Gateway endpoint. An AWS Lambda function receives the message and an Amazon ElastiCache for Redis cluster provides a low latency initial storage mechanism for the audit record. Once the data is stored in ElastiCache, the AWS Step Functions workflow then orchestrates the communication and persistence functions.

Subscribers receive four Amazon Simple Notification Service (Amazon SNS) notifications pertaining to arrival and storage of the audit record payload, storage of the audit record metadata, and audit record archive completion. Users can subscribe an Amazon Simple Queue Service (Amazon SQS) queue to the SNS topic and use fan-out mechanisms to achieve high reliability.

  1. The Ingest Message Lambda function sends an initial receipt notification.
  2. The Message Archive Handler Lambda function notifies on storage of the audit record from ElastiCache to Amazon Simple Storage Service (Amazon S3).
  3. The Message Metadata Handler Lambda function notifies on storage of the message metadata into Amazon DynamoDB.
  4. The Final State Aggregation Lambda function notifies that the audit record has been archived.

Any failure in the three fundamental processing steps (Ingestion, Data Archive, and Metadata Archive) triggers a message to an SQS dead-letter queue (DLQ), which contains the original request and an explanation of the failure reason. Any failure in the Ingest Message function invokes the Ingest Message Failure function, which stores the original parameters to the S3 Failed Message Storage bucket for later analysis.

The Step Functions workflow provides orchestration and parallel path execution for the system. The detailed workflow below shows the execution flow and notification actions. The transformer steps convert the internal data structures into the format required for consumers.

Data structures

There are three types of events and messages managed by this system:

  1. Incoming message: This is the message the producer sends to an API Gateway endpoint.
  2. Internal message: This event contains the message metadata that allows subsequent systems to understand the originating message producer context (a sketch of a possible shape follows this list).
  3. Notification message: Messages that allow downstream subscribers to act based on the message.
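For illustration, the internal message might carry fields like the following. This is a hypothetical sketch; the actual field names are defined by the implementation in the GitHub repository.

# Hypothetical shape of the internal message passed to the Step Functions workflow.
# Field names are illustrative, not taken from the reference implementation.
internal_message = {
    "messageID": "generated-unique-id",      # tracks the payload through its lifecycle
    "receivedAt": "2024-01-25T16:58:37Z",    # ingestion timestamp
    "producer": "example-producer",          # originating message producer context
    "cacheKey": "generated-unique-id",       # location of the full payload in ElastiCache
    "securityContext": {},                   # security header created by API Gateway
}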

Solution walkthrough

The message producer calls the API Gateway endpoint, which enforces the security requirements defined by the business. In this implementation, API Gateway uses an API key to provide more robust access control. API Gateway also creates a security header for consumption by the Ingest Message Lambda function. API Gateway can be configured to enforce message format standards; see Use request validation in API Gateway for more information.

The Ingest Message Lambda function generates a message ID that tracks the message payload throughout its lifecycle. It then stores the full message in the ElastiCache for Redis cache. The Ingest Message Lambda function generates an internal message with all the elements described above. Finally, the Lambda function handler starts the Step Functions workflow with the internal message payload.
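A minimal sketch of such a handler in Python, assuming the Redis endpoint and state machine ARN arrive via environment variables. The names and event fields here are illustrative, not taken from the sample repository.

import json
import os
import uuid

import boto3
import redis

# Illustrative environment variable names; the sample repository may use different ones.
cache = redis.Redis(host=os.environ["REDIS_HOST"], port=6379, ssl=True)
sfn = boto3.client("stepfunctions")

def handler(event, context):
    message_id = str(uuid.uuid4())

    # Store the full payload in ElastiCache for low-latency initial persistence.
    # The TTL lets ElastiCache expire stale entries if downstream processing never removes them.
    cache.set(message_id, event["body"], ex=3600)

    # Build the internal message and start the Step Functions workflow.
    internal_message = {
        "messageID": message_id,
        "producer": event["headers"].get("x-producer", "unknown"),
    }
    sfn.start_execution(
        stateMachineArn=os.environ["STATE_MACHINE_ARN"],
        input=json.dumps(internal_message),
    )

    return {"statusCode": 200, "body": json.dumps({"messageID": message_id})}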

If the Ingest Message Lambda function fails for any reason, the Lambda function invokes the Ingestion Failure Handler Lambda function. This Lambda function writes any recoverable incoming message data to an S3 bucket and sends a notification on the Ingest Message dead letter queue.
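A minimal sketch of that failure path follows, with the bucket name, queue URL, and event fields supplied as hypothetical values.

import json
import os

import boto3

s3 = boto3.client("s3")
sqs = boto3.client("sqs")

def handler(event, context):
    # 'event' carries the original request parameters passed by the Ingest Message function.
    key = f"failed/{event.get('messageID', 'unknown')}.json"

    # Preserve any recoverable data for later analysis.
    s3.put_object(Bucket=os.environ["FAILED_MESSAGE_BUCKET"], Key=key, Body=json.dumps(event))

    # Notify operators via the dead-letter queue, including the failure reason.
    sqs.send_message(
        QueueUrl=os.environ["INGEST_DLQ_URL"],
        MessageBody=json.dumps({"original": event, "reason": event.get("failureReason", "unknown")}),
    )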

The Step Functions workflow then runs three processes in parallel.

  • The Step Functions workflow triggers the Message Archive Data Handler Lambda function to persist message data from the ElastiCache cache to an S3 bucket. Once stored, the Lambda function returns the S3 bucket reference and state information. There are two options for removing the internal message from the cache: remove the message from the cache immediately before sending the internal message and updating the ElastiCache cache flag, or wait for the ElastiCache lifecycle to remove the stale message from the cache. This solution waits for the ElastiCache lifecycle to remove the message. A sketch of this handler follows this list.
  • The workflow triggers the Message Metadata Handler Lambda function to write all message metadata and security information to DynamoDB. The Lambda function replies with the DynamoDB reference information.
  • Finally, the Step Functions workflow sends a message to the SNS topic to inform subscribers that the message has arrived and the data persistence processes have started.
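The following is a minimal sketch of the Message Archive Data Handler, assuming the same illustrative environment variable names as the ingest sketch above and the YEAR/MONTH/DAY/HOUR/MINUTE key layout described in the testing section later in this post.

import os
from datetime import datetime, timezone

import boto3
import redis

# Illustrative environment variable names.
cache = redis.Redis(host=os.environ["REDIS_HOST"], port=6379, ssl=True)
s3 = boto3.client("s3")

def handler(event, context):
    message_id = event["messageID"]

    # Read the full payload previously stored by the Ingest Message function.
    payload = cache.get(message_id)

    # Persist the payload under a date/time prefix with the messageID as the object name.
    key = datetime.now(timezone.utc).strftime("%Y/%m/%d/%H/%M") + "/" + message_id
    s3.put_object(Bucket=os.environ["MESSAGE_ARCHIVE_BUCKET"], Key=key, Body=payload)

    # Return the S3 reference so the workflow can aggregate it into the metadata record.
    return {"messageID": message_id, "s3Bucket": os.environ["MESSAGE_ARCHIVE_BUCKET"], "s3Key": key}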

As each Lambda function completes its processing, it sends a notification to the SNS topic to alert subscribers that the action is complete. When both the Message Metadata and Message Archive Lambda functions are done, the Final Aggregation function makes a final update to the metadata in DynamoDB to include the S3 reference information and to remove the ElastiCache for Redis reference.
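As a sketch, that final update could look like the following, assuming messageID is the table's partition key and that the attribute names are illustrative rather than those used in the sample repository.

import os

import boto3

table = boto3.resource("dynamodb").Table(os.environ["METADATA_TABLE"])

def handler(event, context):
    # 'event' aggregates the outputs of the archive and metadata branches of the workflow.
    table.update_item(
        Key={"messageID": event["messageID"]},
        UpdateExpression="SET s3Bucket = :b, s3Key = :k REMOVE cacheReference",
        ExpressionAttributeValues={":b": event["s3Bucket"], ":k": event["s3Key"]},
    )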

Deploying the solution

Prerequisites:

  1. AWS Serverless Application Model (AWS SAM) is installed (see Getting started with AWS SAM)
  2. AWS User/Credentials with appropriate permissions to run AWS CloudFormation templates in the target AWS account
  3. Python 3.8 – 3.10
  4. The AWS SDK for Python (Boto3) is installed
  5. The requests Python library is installed

The source code for this implementation can be found at https://github.com/aws-samples/blog-serverless-reliable-messaging

Installing the Solution:

  1. Clone the git repository to a local directory
  2. git clone https://github.com/aws-samples/blog-serverless-reliable-messaging.git
  3. Change into the directory that was created by the clone operation, usually blog_serverless_reliable_messaging
  4. Execute the command: sam build
  5. Execute the command: sam deploy --guided. You are asked to supply the following parameters:
    1. Stack Name: Name given to this deployment (example: serverless-streaming)
    2. AWS Region: Where to deploy (example: us-east-1)
    3. ElasticacheInstanceClass: The ElastiCache node instance type to use (example: cache.t3.small)
    4. ElasticReplicaCount: How many replicas should be used with ElastiCache (recommended minimum: 2)
    5. ProjectName: Used for naming resources in account (example: serverless-streaming)
    6. MultiAZ: True/False if multiple Availability Zones should be used (recommend: True)
    7. The default parameters can be selected for the remainder of the questions

Testing:

Once you have deployed the stack, you can test it through the API Gateway endpoint with the API key that is referenced in the deployment output. There are two methods for retrieving the API key: via the AWS Management Console (from the link provided in the output – ApiKeyConsole) or via the AWS CLI (from the AWS CLI reference in the output – APIKeyCLI).

You can test directly in the Lambda service console by invoking the ingest message function.

A test message is available at the root of the project test_message.json for direct Lambda function testing of the Ingest function.

  1. In the console navigate to the Lambda service
  2. From the list of available functions, select the “<project name> -IngestMessageFunction-xxxxx” function
  3. Under the “Function overview” select the “Test” tab
  4. Enter an event name of your choosing
  5. Copy and paste the contents of test_message.json into the “Event JSON” box
  6. Choose “Save”, then after it has saved, choose “Test”
  7. If successful, you should see something similar to the following in the details:
    {
      "isBase64Encoded": false,
      "statusCode": 200,
      "headers": {
        "Access-Control-Allow-Headers": "Content-Type",
        "Access-Control-Allow-Origin": "*",
        "Access-Control-Allow-Methods": "OPTIONS,POST"
      },
      "body": "{\"messageID\": \"XXXXXXXXXXXXXX\"}"
    }
  8. In the S3 bucket “<project name>-s3messagearchive-xxxxxx”, find the payload of the original JSON with a key based on the date and time of the test execution, e.g.: YEAR/MONTH/DAY/HOUR/MINUTE, with a file name of the messageID
  9. In a DynamoDB table named metaDataTable, you should find a record with a messageID equal to the messageID from above that contains all of the metadata related to the payload

A Python test script is included with the code in the test_client folder:

  1. In the test_client.py file, replace the <Your API key here> and <Your API Gateway URL here (IngestMessageApi)> values with the correct ones for your environment
  2. Execute the test script with Python 3.8 or higher with the requests package installed
    Example execution (from main directory of git clone):
    python3 -m pip install -r ./test_client/requirements.txt
    python3 ./test_client/test_client.py
  3. Successful output shows the messageID and the header JSON payload:
    {
    "messageID": " XXXXXXXXXXXXXX"
    }
  4. In the S3 bucket “<project name>-s3messagearchive-xxxxxx”, you should be able to find the payload of the original JSON with a key based on the date and time of the script execution, e.g.: YEAR/MONTH/DAY/HOUR/MINUTE, with a file name of the messageID
  5. In a DynamoDB table named metaDataTable, you should find a record with a messageID equal to the messageID from above that contains all of the metadata related to the payload

Conclusion

This blog describes architectural patterns, messaging patterns, and data structures that support a highly reliable messaging system for large messages. The use of serverless services including Lambda functions, Step Functions, ElastiCache, DynamoDB, and S3 meets the requirements of modern audit systems to be scalable and reliable. The architecture shared in this blog post is suitable for a highly regulated environment to store and track messages that are larger than typical logging systems support, with records sized between 256 KB and 6 MB. The architecture serves as a blueprint that can be extended and adapted to fit further serverless use cases.

For serverless learning resources, visit Serverless Land.

Comparing design approaches for building serverless microservices

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/comparing-design-approaches-for-building-serverless-microservices/

This post is written by Luca Mezzalira, Principal SA, and Matt Diamond, Principal, SA.

Designing a workload with AWS Lambda creates questions for developers due to the modularity that can be expressed either at the code or infrastructure level. Using serverless for running code requires additional planning to extract the business logic from the underlying functional components. This deliberate separation of concerns ensures a robust modularity, paving the way for evolutionary architectures.

This post focuses on synchronous workloads, but similar considerations are applicable in other workload types. After identifying the bounded context of your API and agreeing on API contracts with consumers, it’s time to structure the architecture of your bounded context and the associated infrastructure.

The two most common ways to structure an API using Lambda functions are single responsibility and Lambda-lith. However, this blog post explores an alternative to these approaches, which can provide the best of both.

Single responsibility Lambda functions

Single responsibility Lambda functions are designed to run a specific task or handle a particular event-triggered operation within a serverless architecture:


This approach provides a strong separation of concerns between business logic and capabilities. You can test specific capabilities in isolation, deploy a Lambda function independently, reduce the surface area for introducing bugs, and debug issues more easily in Amazon CloudWatch.

Additionally, single-purpose functions enable efficient resource allocation as Lambda automatically scales based on demand, optimizing resource consumption and minimizing costs. This means you can modify the memory size, architecture, and any other configuration available per function. Moreover, requesting an increase in concurrent function executions via a support ticket becomes easier because you are not aggregating the traffic of every request into a single Lambda function; you can request a specific increase based on the traffic of a single task.

Another advantage is rapid execution time. Because the business logic of a single-purpose Lambda function is designed for a single task, you can optimize the size of the function more easily, without the additional libraries required in other approaches. This helps reduce cold start time due to a smaller bundle size.
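For example, a single-purpose function that only deletes users might be as small as this sketch. The table name, key schema, and event shape are hypothetical.

import os

import boto3

table = boto3.resource("dynamodb").Table(os.environ["USERS_TABLE"])

def handler(event, context):
    # API Gateway proxy integration: the user id arrives as a path parameter.
    user_id = event["pathParameters"]["id"]
    table.delete_item(Key={"id": user_id})
    return {"statusCode": 204}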

Despite these benefits, some issues exist when solely relying on single-purpose Lambda functions. While the cold start time is mitigated, you might experience a higher number of cold starts, particularly for functions with sporadic or infrequent invocations. For example, a function that deletes users in an Amazon DynamoDB table likely won’t be triggered as often as one that reads user data. Also, relying heavily on single-purpose Lambda functions can lead to increased system complexity, especially as the number of functions grows.

A good separation of concerns helps maintain your code base, at the cost of a lack of cohesion. In functions with similar tasks, such as write operations of an API (POST, PUT, DELETE), you might duplicate code and behaviors across multiple functions. Moreover, updating common libraries shared via Lambda Layers, or other dependency management systems, requires multiple changes across every function instead of an atomic change on a single file. This is also true for any other change across multiple functions, for instance, updating the runtime version.

Lambda-lith: Using one single Lambda function

When many workloads use single-purpose Lambda functions, developers end up with a proliferation of Lambda functions across an AWS account. One of the main challenges developers face is updating common dependencies or function configurations. Unless there is a clear governance strategy implemented for addressing this problem (such as using Dependabot to enforce dependency updates, or parameters that are retrieved at provisioning time), developers may opt for a different strategy.

As a result, many development teams move in the opposite direction, aggregating all code related to an API inside the same Lambda function.

This approach is often referred to as a Lambda-lith, because it gathers all the HTTP verbs that compose an API and sometimes multiple APIs in the same function.

This allows you to have higher code cohesion and colocation across the different parts of the application. Modularity in this case is expressed at the code level, where patterns like single responsibility, dependency injection, and façade are applied to structure your code. The discipline and code best practices applied by the development teams are crucial for maintaining large code bases.
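A minimal sketch of a Lambda-lith handler that routes API Gateway proxy events by HTTP method and resource path is shown below; the routes and handlers are illustrative.

import json

def list_users(event):
    return {"statusCode": 200, "body": json.dumps([])}

def create_user(event):
    return {"statusCode": 201, "body": event["body"]}

def delete_user(event):
    return {"statusCode": 204}

# Single function, single entry point: modularity lives at the code level.
ROUTES = {
    ("GET", "/users"): list_users,
    ("POST", "/users"): create_user,
    ("DELETE", "/users/{id}"): delete_user,
}

def handler(event, context):
    route = (event["httpMethod"], event["resource"])
    target = ROUTES.get(route)
    if target is None:
        return {"statusCode": 404, "body": json.dumps({"message": "Not found"})}
    return target(event)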

However, considering the reduced number of Lambda functions, updating a configuration or implementing a new standard across multiple APIs can be achieved more easily compared with the single responsibility approach.

Moreover, since every request invokes the same Lambda function for every HTTP verb, it’s more likely that little-used parts of your code have a better response time because an execution environment is more likely to be available to fulfill the request.

Another factor to consider is the function size. This increases when collocating verbs in the same function with all the dependencies and business logic of an API. This may affect the cold start of your Lambda functions with spiky workloads. Customers should evaluate the benefits of this approach, especially when applications have restrictive SLAs, which would be impacted by cold starts. Developers can mitigate this problem by paying attention to the dependencies used and implementing techniques like tree-shaking, minification, and dead code elimination, where the programming language allows.

This coarse-grained approach does not allow you to tune your function configurations individually. Instead, you must find a single configuration that covers all the code capabilities, which may mean a higher memory size and looser security permissions that clash with the requirements defined by the security team.

Read and write functions

These two approaches both have trade-offs, but there is a third option that can combine their benefits.

Often, API traffic leans towards more reads or writes and that forces developers to optimize code and configurations more on one side over the other.

For example, consider building a user API that allows consumers to create, update, and delete a user but also to find a user or a list of users. In this scenario, you can change one user at a time with no bulk operations available, but you can get one or more users per API request. Dividing the design of the API into read and write operations results in this architecture:

Read and write functions

The cohesion of code for write operations (create, update, and delete) is beneficial for many reasons. For instance, you may need to validate the request body, ensuring it contains all the mandatory parameters. If the workload is heavy on writes, the less-used operations (for instance, Delete) benefit from warm execution environments. The code colocation enables reusability of code on similar actions, reducing the cognitive load to structure your projects with shared libraries or Lambda layers, for instance.

When looking at the read operations side, you can reduce the code bundled with this function, resulting in a faster cold start, and heavily optimize the performance compared to a write operation. You can also store partial or full query results in memory of an execution environment to improve the execution time of the Lambda function.
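As a sketch of this read-side optimization, module-scope state can cache query results across invocations that reuse the same execution environment. The table name and key schema are hypothetical, and a real implementation would also need an expiry strategy for cached entries.

import json
import os

import boto3

table = boto3.resource("dynamodb").Table(os.environ["USERS_TABLE"])

# Module-level cache survives across invocations that reuse this execution environment.
_cache = {}

def handler(event, context):
    user_id = event["pathParameters"]["id"]

    if user_id not in _cache:
        item = table.get_item(Key={"id": user_id}).get("Item")
        if item is None:
            return {"statusCode": 404}
        _cache[user_id] = item

    # default=str handles DynamoDB Decimal values during serialization.
    return {"statusCode": 200, "body": json.dumps(_cache[user_id], default=str)}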

This approach helps you further with its evolutionary nature. Imagine if this platform becomes much more popular. Now, you must optimize the API even further by improving reads and adding a cache-aside pattern with Amazon ElastiCache for Redis. Moreover, you decide to optimize the read queries with a second database that is optimized for reads, used when the cache misses.

On the write side, you have agreed with the API consumers that receiving and acknowledging user creation or deletion is adequate, considering they fully embraced the eventual consistency nature of distributed systems.

Now, you can improve the response time of write operations by adding an SQS queue before the Lambda function. You can update the write database in batches to reduce the number of invocations needed for handling write operations, instead of dealing with every request individually.
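The following sketches the write side after introducing the queue: one invocation now receives a batch of SQS records and persists them together. The queue wiring and table name are assumptions, and the record bodies are assumed to already be DynamoDB-compatible JSON objects.

import json
import os

import boto3

table = boto3.resource("dynamodb").Table(os.environ["USERS_TABLE"])

def handler(event, context):
    # One invocation handles a batch of queued write requests instead of one per API call.
    with table.batch_writer() as batch:
        for record in event["Records"]:
            user = json.loads(record["body"])
            batch.put_item(Item=user)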

CQRS pattern

Command query responsibility segregation (CQRS) is a well-established pattern that separates the data mutation, or the command part of a system, from the query part. You can use the CQRS pattern to separate updates and queries if they have different requirements for throughput, latency, or consistency.

While it’s not mandatory to start with a full CQRS pattern, you can evolve from the infrastructure highlighted more easily in the initial read and write implementation, without massive refactoring of your API.

Comparison of the three approaches

Here is a comparison of the three approaches:

 

Single responsibility

Benefits:
  • Strong separation of concerns
  • Granular configuration
  • Better debugging
  • Rapid execution time

Issues:
  • Code duplication
  • Complex maintenance
  • Higher number of cold start invocations

Lambda-lith

Benefits:
  • Fewer cold start invocations
  • Higher code cohesion
  • Simpler maintenance

Issues:
  • Coarse-grained configuration
  • Higher cold start time

Read and write

Benefits:
  • Code cohesion where needed
  • Evolutionary architecture
  • Optimization of read and write operations

Issues:
  • Using CQRS with two data models
  • CQRS adds eventual consistency to your system

Conclusion

Developers often move from single responsibility functions to the Lambda-lith as their architectures evolve, but both approaches have relative trade-offs. This post shows how it’s possible to have the best of both approaches by dividing your workloads per read and write operations.

All three approaches are viable for designing serverless APIs, and understanding what you are optimizing for is the key to making the best decision. Remember, understanding your context and the business requirements your applications must express leads you towards the acceptable trade-offs for a specific workload. Keep an open mind and find the solution that solves the problem and balances security, developer experience, cost, and maintainability.

For more serverless learning resources, visit Serverless Land.

Introducing the .NET 8 runtime for AWS Lambda

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/introducing-the-net-8-runtime-for-aws-lambda/

This post is written by Beau Gosse, Senior Software Engineer and Paras Jain, Senior Technical Account Manager.

AWS Lambda now supports .NET 8 as both a managed runtime and container base image. With this release, Lambda developers can benefit from .NET 8 features including API enhancements, improved Native Ahead of Time (Native AOT) support, and improved performance. .NET 8 supports C# 12, F# 8, and PowerShell 7.4. You can develop Lambda functions in .NET 8 using the AWS Toolkit for Visual Studio, the AWS Extensions for .NET CLI, AWS Serverless Application Model (AWS SAM), AWS CDK, and other infrastructure as code tools.

Creating .NET 8 function in the console

What’s new

Upgraded operating system

The .NET 8 runtime is built on the Amazon Linux 2023 (AL2023) minimal container image. This provides a smaller deployment footprint than earlier Amazon Linux 2 (AL2) based runtimes, along with updated versions of common libraries such as glibc 2.34 and OpenSSL 3.

The new image also uses microdnf as a package manager, symlinked as dnf. This replaces the yum package manager used in earlier AL2-based images. If you deploy your Lambda functions as container images, you must update your Dockerfiles to use dnf instead of yum when upgrading to the .NET 8 base image. For more information, see Introducing the Amazon Linux 2023 runtime for AWS Lambda.

Performance

There are a number of language performance improvements available as part of .NET 8. Initialization time can impact performance, as Lambda creates new execution environments to scale your function automatically. There are a number of ways to optimize performance for Lambda-based .NET workloads, including using source generators in System.Text.Json or using Native AOT.

Lambda has increased the default memory size from 256 MB to 512 MB in the blueprints and templates for improved performance with .NET 8. Perform your own functional and performance tests on your .NET 8 applications. You can use AWS Compute Optimizer or AWS Lambda Power Tuning for performance profiling.

At launch, new Lambda runtimes receive less usage than existing established runtimes. This can result in longer cold start times due to reduced cache residency within internal Lambda subsystems. Cold start times typically improve in the weeks following launch as usage increases. As a result, AWS recommends not drawing performance comparison conclusions with other Lambda runtimes until the performance has stabilized.

Native AOT

Lambda introduced .NET Native AOT support in November 2022. Benchmarks show up to 86% improvement in cold start times by eliminating the JIT compilation. Deploying .NET 8 Native AOT functions using the managed dotnet8 runtime rather than the OS-only provided.al2023 runtime gives your function access to .NET system libraries. For example, libicu, which is used for globalization, is not included by default in the provided.al2023 runtime but is in the dotnet8 runtime.

While Native AOT is not suitable for all .NET functions, .NET 8 has improved trimming support. This allows you to more easily run ASP.NET APIs. Improved trimming support helps eliminate build time trimming warnings, which highlight possible runtime errors. This can give you confidence that your Native AOT function behaves like a JIT-compiled function. Trimming support has been added to the Lambda runtime libraries, AWS .NET SDK, .NET Lambda Annotations, and .NET 8 itself.

Using .NET 8 with Lambda

To use .NET 8 with Lambda, you must update your tools.

  1. Install or update the .NET 8 SDK.
  2. If you are using AWS SAM, install or update to the latest version.
  3. If you are using Visual Studio, install or update the AWS Toolkit for Visual Studio.
  4. If you use the .NET Lambda Global Tools extension (Amazon.Lambda.Tools), install the CLI extension and templates. You can upgrade existing tools with dotnet tool update -g Amazon.Lambda.Tools and existing templates with dotnet new install Amazon.Lambda.Templates.

You can also use .NET 8 with Powertools for AWS Lambda (.NET), a developer toolkit to implement serverless best practices such as observability, batch processing, retrieving parameters, idempotency, and feature flags.

Building new .NET 8 functions

Using AWS SAM

  1. Run sam init.
  2. Choose 1- AWS Quick Start Templates.
  3. Choose one of the available templates such as Hello World Example.
  4. Select N for Use the most popular runtime and package type?
  5. Select dotnet8 as the runtime. The dotnet8 Hello World Example also includes a Native AOT template option.
  6. Follow the rest of the prompts to create the .NET 8 function.
AWS SAM .NET 8 init options

You can amend the generated function code and use sam deploy --guided to deploy the function.

Using AWS Toolkit for Visual Studio

  1. From the Create a new project wizard, filter the templates to either the Lambda or Serverless project type and select a template. Use Lambda for deploying a single function. Use Serverless for deploying a collection of functions using AWS CloudFormation.
  2. Continue with the steps to finish creating your project.
  3. You can amend the generated function code.
  4. To deploy, right click on the project in the Solution Explorer and select Publish to AWS Lambda.

Using AWS extensions for the .NET CLI

  1. Run dotnet new list --tag Lambda to get a list of available Lambda templates.
  2. Choose a template and run dotnet new <template name>. To build a function using Native AOT, use dotnet new lambda.NativeAOT or dotnet new serverless.NativeAOT when using the .NET Lambda Annotations Framework.
  3. Locate the generated Lambda function in the directory under src which contains the .csproj file. You can amend the generated function code.
  4. To deploy, run dotnet lambda deploy-function and follow the prompts.
  5. You can test the function in the cloud using dotnet lambda invoke-function or by using the test functionality in the Lambda console.

You can build and deploy .NET Lambda functions using container images. Follow the instructions in the documentation.

Migrating from .NET 6 to .NET 8 without Native AOT

Using AWS SAM

  1. Open the template.yaml file.
  2. Update Runtime to dotnet8.
  3. Open a terminal window and rebuild the code using sam build.
  4. Run sam deploy to deploy the changes.

Using AWS Toolkit for Visual Studio

  1. Open the .csproj project file and update the TargetFramework to net8.0. Update NuGet packages for your Lambda functions to the latest version to pull in .NET 8 updates.
  2. Verify that the build command you are using is targeting the .NET 8 runtime.
  3. There may be additional steps depending on what build/deploy tool you’re using. Updating the function runtime may be sufficient.

.NET function in AWS Toolkit for Visual Studio

Using AWS extensions for the .NET CLI or AWS Toolkit for Visual Studio

  1. Open the aws-lambda-tools-defaults.json file if it exists.
    1. Set the framework field to net8.0. If unspecified, the value is inferred from the project file.
    2. Set the function-runtime field to dotnet8.
  2. Open the serverless.template file if it exists. For any AWS::Lambda::Function or AWS::Serverless::Function resources, set the Runtime property to dotnet8.
  3. Open the .csproj project file if it exists and update the TargetFramework to net8.0. Update NuGet packages for your Lambda functions to the latest version to pull in .NET 8 updates.

Migrating from .NET 6 to .NET 8 Native AOT

The following example migrates a .NET 6 class library function to a .NET 8 Native AOT executable function. This uses the optional Lambda Annotations framework which provides idiomatic .NET coding patterns.

Update your project file

  1. Open the project file.
  2. Set TargetFramework to net8.0.
  3. Set OutputType to exe.
  4. Remove PublishReadyToRun if it exists.
  5. Add PublishAot and set to true.
  6. Add or update NuGet package references to Amazon.Lambda.Annotations and Amazon.Lambda.RuntimeSupport. You can update using the NuGet UI in your IDE, manually, or by running dotnet add package Amazon.Lambda.RuntimeSupport and dotnet add package Amazon.Lambda.Annotations from your project directory.

Your project file should look similar to the following:

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>exe</OutputType>
    <TargetFramework>net8.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
    <AWSProjectType>Lambda</AWSProjectType>
    <CopyLocalLockFileAssemblies>true</CopyLocalLockFileAssemblies>
    <!-- Generate native aot images during publishing to improve cold start time. -->
    <PublishAot>true</PublishAot>
	  <!-- StripSymbols tells the compiler to strip debugging symbols from the final executable if we're on Linux and put them into their own file. 
		This will greatly reduce the final executable's size.-->
	  <StripSymbols>true</StripSymbols>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="Amazon.Lambda.Core" Version="2.2.0" />
    <PackageReference Include="Amazon.Lambda.RuntimeSupport" Version="1.10.0" />
    <PackageReference Include="Amazon.Lambda.Serialization.SystemTextJson" Version="2.4.0" />
  </ItemGroup>
</Project>

Updating your function code

    1. Reference the annotations library with using Amazon.Lambda.Annotations;
    2. Add [assembly:LambdaGlobalProperties(GenerateMain = true)] to allow the annotations framework to create the main method. This is required as the project is now an executable instead of a library.
    3. Add the following partial class and include a JsonSerializable attribute for each type that you need to serialize, including your function input and output types. This partial class is used at build time to generate reflection-free code dedicated to serializing the listed types. The following is an example:

      /// <summary>
      /// This class is used to register the input event and return type for the FunctionHandler method with the System.Text.Json source generator.
      /// There must be a JsonSerializable attribute for each type used as the input and return type or a runtime error will occur
      /// from the JSON serializer unable to find the serialization information for unknown types.
      /// </summary>
      [JsonSerializable(typeof(APIGatewayHttpApiV2ProxyRequest))]
      [JsonSerializable(typeof(APIGatewayHttpApiV2ProxyResponse))]
      public partial class MyCustomJsonSerializerContext : JsonSerializerContext
      {
          // By using this partial class derived from JsonSerializerContext, we can generate reflection free JSON Serializer code at compile time
          // which can deserialize our class and properties. However, we must attribute this class to tell it what types to generate serialization code for
          // See https://docs.microsoft.com/en-us/dotnet/standard/serialization/system-text-json-source-generation
      }

    4. After the using statements, add the following to specify the serializer to use: [assembly: LambdaSerializer(typeof(SourceGeneratorLambdaJsonSerializer<LambdaFunctionJsonSerializerContext>))]

    Swap LambdaFunctionJsonSerializerContext for your context if you are not using the partial class from the previous step.

    Updating your function configuration

    If you are using aws-lambda-tools-defaults.json.

    1. Set function-runtime to dotnet8.
    2. Set function-architecture to match your build machine – either x86_64 or arm64.
    3. Set (or update) environment-variables to include ANNOTATIONS_HANDLER=<YourFunctionHandler>. Replace <YourFunctionHandler> with the method name of your function handler, so the annotations framework knows which method to call from the generated main method.
    4. Set function-handler to the name of the executable assembly in your bin directory. By default, this is your project name, which tells the .NET Lambda bootstrap script to run your native binary instead of starting the .NET runtime. If your project file has AssemblyName then use that value for the function handler.
    {
      "function-architecture": "x86_64",
      "function-runtime": "dotnet8",
      "function-handler": "<your-assembly-name>",
      "environment-variables": "ANNOTATIONS_HANDLER=<your-function-handler>"
    }

    Deploy and test

    1. Deploy your function. If you are using Amazon.Lambda.Tools, run dotnet lambda deploy-function. Check for trim warnings during build and refactor to eliminate them.
    2. Test your function to ensure that the native calls into AL2023 are working correctly. By default, running local unit tests on your development machine won’t run natively and will still use the JIT compiler. Running with the JIT compiler does not allow you to catch native AOT specific runtime errors.

    Conclusion

    Lambda is introducing the new .NET 8 managed runtime. This post highlights new features in .NET 8. You can create new Lambda functions or migrate existing functions to .NET 8 or .NET 8 Native AOT.

    For more information, see the AWS Lambda for .NET repository, documentation, and .NET on Serverless Land.

    For more serverless learning resources, visit Serverless Land.

Re-platforming Java applications using the updated AWS Serverless Java Container

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/re-platforming-java-applications-using-the-updated-aws-serverless-java-container/

This post is written by Dennis Kieselhorst, Principal Solutions Architect.

The combination of portability, efficiency, community, and breadth of features has made Java a popular choice for businesses to build their applications for over 25 years. The introduction of serverless functions, pioneered by AWS Lambda, changed what you need in a programming language and runtime environment. Functions are often short-lived, single-purpose, and do not require extensive infrastructure configuration.

This blog post shows how you can modernize a legacy Java application to run on Lambda with minimal code changes using the updated AWS Serverless Java Container.

Deployment model comparison

Classic Java enterprise applications often run on application servers such as JBoss/WildFly, Oracle WebLogic, and IBM WebSphere, or servlet containers like Apache Tomcat. The underlying Java virtual machine typically runs 24/7 and serves multiple requests using its multithreading capabilities.

Typical long running Java application server

When building Lambda functions with Java, an HTTP server is no longer required and there are other considerations for running code in a Lambda environment. Code runs in an execution environment, which processes a single invocation at a time. Functions can run for up to 15 minutes with a maximum of 10 GB of allocated memory.

Functions are triggered by events such as an HTTP request with a corresponding payload. An Amazon API Gateway HTTP request invokes the function with the following JSON payload:

Amazon API Gateway HTTP request payload

The code to process these events is different from how you implement it in a traditional application.

AWS Serverless Java Container

The AWS Serverless Java Container makes it easier to run Java applications written with frameworks such as Spring, Spring Boot, or JAX-RS/Jersey in Lambda.

The container provides adapter logic to minimize code changes. Incoming events are translated to the Servlet specification so that frameworks work as before.

AWS Serverless Java Container adapter

Version 1 of this library was released in 2018. Today, AWS is announcing the release of version 2, which supports the latest Jakarta EE specification, along with Spring Framework 6.x, Spring Boot 3.x and Jersey 3.x.

Example: Modifying a Spring Boot application

This following example illustrates how to migrate a Spring Boot 3 application. You can find the full example for Spring and other frameworks in the GitHub repository.

  1. Add the AWS Serverless Java dependency to your Maven POM build file (or Gradle accordingly):
  2. <dependency>
        <groupId>com.amazonaws.serverless</groupId>
        <artifactId>aws-serverless-java-container-springboot3</artifactId>
        <version>2.0.0</version>
    </dependency>
  3. Spring Boot, by default, embeds Apache Tomcat to deal with HTTP requests. The examples use Amazon API Gateway to handle inbound HTTP requests so you can exclude the dependency.
  4. <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <configuration>
                    <createDependencyReducedPom>false</createDependencyReducedPom>
                </configuration>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <artifactSet>
                                <excludes>
                                    <exclude>org.apache.tomcat.embed:*</exclude>
                                </excludes>
                            </artifactSet>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

    The AWS Serverless Java Container accepts API Gateway proxy requests and transforms them into a plain Java object. The library also transforms outputs into a suitable API Gateway response object.

    Once you run your build process, Maven’s Shade-plugin now produces an Uber-JAR that bundles all dependencies, which you can upload to Lambda.

  5. The Lambda runtime must know which handler method to invoke. You can configure and use the SpringDelegatingLambdaContainerHandler implementation or implement your own handler Java class that delegates to AWS Serverless Java Container. This is useful if you want to add additional functionality.
  6. Configure the handler name in the runtime settings of your function.

    Configure the handler name

  7. Configure an environment variable named MAIN_CLASS to let the generic handler know where to find your original application main class, which is usually annotated with @SpringBootApplication.

    Configure MAIN_CLASS environment variable

    You can also configure these settings using infrastructure as code (IaC) tools such as AWS CloudFormation, the AWS Cloud Development Kit (AWS CDK), or the AWS Serverless Application Model (AWS SAM).

    In an AWS SAM template, the related changes are as follows. Full templates are part of the GitHub repository.

    Handler: com.amazonaws.serverless.proxy.spring.SpringDelegatingLambdaContainerHandler 
    Environment:
      Variables:
        MAIN_CLASS: com.amazonaws.serverless.sample.springboot3.Application

    Optimizing memory configuration

    When running Lambda functions, start-up time and memory footprint are important considerations. The amount of memory you configure for your Lambda function also determines the amount of virtual CPU available. Adding more memory proportionally increases the amount of CPU, and therefore increases the overall computational power available. If a function is CPU-, network- or memory-bound, adding more memory can improve performance.

    Lambda charges for the total amount of gigabyte-seconds consumed by a function. Gigabyte-seconds are a combination of total memory (in gigabytes) and duration (in seconds). Increasing memory incurs additional cost. However, in many cases, increasing the memory available causes a decrease in the function duration due to the additional CPU available. As a result, the overall cost increase may be negligible for additional performance, or may even decrease.

    Choosing the memory allocated to your Lambda functions is an optimization process that balances speed (duration) and cost. You can manually test functions by selecting different memory allocations and measuring the completion time. AWS Lambda Power Tuning is a tool to simplify and automate the process, which you can use to optimize your configuration.

    Power Tuning uses AWS Step Functions to run multiple concurrent versions of a Lambda function at different memory allocations and measures the performance. The function runs in your AWS account, performing live HTTP calls and SDK interactions, to measure performance in a production scenario.

    Improving cold-start time with AWS Lambda SnapStart

    Traditional applications often have a large tree of dependencies. Lambda loads the function code and initializes dependencies during Lambda lifecycle initialization phase. With many dependencies, this initialization time may be too long for your requirements. AWS Lambda SnapStart for Java based functions can deliver up to 10 times faster startup performance.

    Instead of running the function initialization phase on every cold-start, Lambda SnapStart runs the function initialization process at deployment time. Lambda takes a snapshot of the initialized execution environment. This snapshot is encrypted and persisted in a tiered cache for low latency access. When the function is invoked and scales, Lambda resumes the execution environment from the persisted snapshot instead of running the full initialization process. This results in lower startup latency.

    To enable Lambda SnapStart you must first enable the configuration setting, and also publish a function version.

    Enabling SnapStart

    Enabling SnapStart

    Point your API Gateway endpoint to the published version or an alias to ensure that you are using the SnapStart-enabled function.

    The corresponding settings in an AWS SAM template contain the following:

    SnapStart: 
      ApplyOn: PublishedVersions
    AutoPublishAlias: my-function-alias

    Read the Lambda SnapStart compatibility considerations in the documentation as your application may contain specific code that requires attention.

    Conclusion

    When building serverless applications with Lambda, you can deliver features faster, but your language and runtime must work within the serverless architectural model. AWS Serverless Java Container helps to bridge between traditional Java Enterprise applications and modern cloud-native serverless functions.

    You can optimize the memory configuration of your Java Lambda function using AWS Lambda Power Tuning tool and enable SnapStart to optimize the initial cold-start time.

    The self-paced Java on AWS Lambda workshop shows how to build cloud-native Java applications and migrate existing Java application to Lambda.

    Explore the AWS Serverless Java Container GitHub repo where you can report related issues and feature requests.

    For more serverless learning resources, visit Serverless Land.

Build real-time applications with Amazon EventBridge and AWS AppSync

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/build-real-time-applications-with-amazon-eventbridge-and-aws-appsync/

This post is written by Josh Kahn, Tech Leader, Serverless.

Amazon EventBridge now supports publishing events to AWS AppSync GraphQL APIs as native targets. The new integration enables builders to publish events easily to a wider variety of consumers and simplifies updating clients with near real-time data. You can use EventBridge and AWS AppSync to build resilient, subscription-based event-driven architectures across consumers.

To illustrate using EventBridge with AWS AppSync, consider a simplified airport operations scenario. In this example, airlines publish flight events (for example, boarding, push back, gate changes, and delays) to a service that maintains flight status on in-airport displays. Airlines also publish events that are useful for other entities at the airport, such as baggage handlers and maintenance, but not to passengers. This depicts a conceptual view of the system:

Conceptual view of the system

Passengers want the in-airport displays to be up-to-date and accurate. There are a number of ways to design the display application so that data remains up-to-date. Broadly, these include the application polling some API or the application subscribing to data changes.

Subscriptions are better for this scenario, as the data changes are small and incremental relative to the large amount of information displayed. For a delay, for example, the display updates the status and departure time of a single flight among a larger list of flights, but no other details.

Flight board

AWS AppSync can enable clients to listen for real-time data changes through the use of GraphQL subscriptions. These are implemented using a WebSocket connection between the client and the AWS AppSync service. The display application client invokes the GraphQL subscription operation to establish a secure connection. AWS AppSync will automatically push data changes (or mutations) via the GraphQL API to subscribers using that connection.

Previously, builders could use EventBridge API Destinations to wire events published and routed through EventBridge to AWS AppSync, as described in an earlier blog post, and available in Serverless Land patterns (API Key, OAuth). The approach is useful for dealing with “out-of-band” updates in which data changes outside of an AWS AppSync mutation. Out-of-band updates generally require a NONE data source in AWS AppSync to notify subscribers of changes, as described in the AWS re:Post Knowledge Center. The addition of AWS AppSync as a target for EventBridge simplifies these use cases as you can now trigger a mutation in response to an event without additional code.

Airport Operations Events

Expanding the scenario, airport operations events look like this:

{
  "flightNum": 123,
  "carrierCode": "JK",
  "date": "2024-01-25",
  "event": "FlightDelayed",
  "message": "Delayed 15 minutes, late aircraft",
  "info": "{ \"newDepTime\": \"2024-01-25T13:15:00Z\", \"delayMinutes\": 15 }"
}

The event field identifies the type of event and whether it is relevant to passengers. The event details provide further information, which varies based on the type of event. The airport publishes a variety of events, but the airport displays only need a subset of those changes.

AWS AppSync GraphQL APIs start with a GraphQL schema that defines the types, fields, and operations available in that API. AWS AppSync documentation provides an overview of schema and other GraphQL essentials. The partial GraphQL schema for the airport scenario is as follows:


type DelayEventInfo implements EventInfo {
	message: String
	delayMinutes: Int
	newDepTime: AWSDateTime
}

interface EventInfo {
	message: String
}

enum StatusEvent {
	FlightArrived
	FlightBoarding
	FlightCancelled
	FlightDelayed
	FlightGateChanged
	FlightLanded
	FlightPushBack
	FlightTookOff
}

type StatusUpdate {
	num: Int!
	carrier: String!
	date: AWSDate!
	event: StatusEvent!
	info: EventInfo
}

input StatusUpdateInput {
	num: Int!
	carrier: String!
	date: AWSDate!
	event: StatusEvent!
	message: String
	extra: AWSJSON
}

type Mutation {
	updateFlightStatus(input: StatusUpdateInput!): StatusUpdate!
}

type Query {
	listStatusUpdates(by: String): [StatusUpdate]
}

type Subscription {
	onFlightStatusUpdate(date: AWSDate, carrier: String): StatusUpdate
		@aws_subscribe(mutations: ["updateFlightStatus"])
}

schema {
	query: Query
	mutation: Mutation
	subscription: Subscription
}

Connect EventBridge to AWS AppSync

EventBridge allows you to filter, transform, and route events to a number of targets. The airport display service only needs events that directly impact passengers. You can define a rule in EventBridge that routes only those events (included in the preceding GraphQL schema) to the AWS AppSync target. Other events are routed elsewhere, as defined by other rules, or dropped. Details on creating EventBridge rules and the event matching pattern format can be found in EventBridge documentation.

The previous flight delayed event would be delivered using EventBridge as follows:

{
  "id": "b051312994104931b0980d1ad1c5340f",
  "detail-type": "Operations: Flight delayed",
  "source": "airport-operations",
  "time": "2024-01-25T16:58:37Z",
  "detail": {
    "flightNum": 123,
    "carrierCode": "JK",
    "date": "2024-01-25",
    "event": "FlightDelayed",
    "message": "Delayed 15 minutes, late aircraft",
    "info": "{ \"newDepTime\": \"2024-01-25T13:15:00Z\", \"delayMinutes\": 15 }"
  }
}

In this scenario, there is a specific list of events of interest, but EventBridge provides a flexible set of operations to match patterns, inspect arrays, and filter by content using prefix, numerical, or other matching. Some organizations will also allow subscribers to define their own rules on an EventBridge event bus, allowing targets to subscribe to events via self-service.

The following event pattern matches on the events needed for the airport display service:

{
  "source": [ "airport-operations" ],
  "detail": {
    "event": [ "FlightArrived", "FlightBoarding", "FlightCancelled", ... ]
  }
}

To create a new EventBridge rule, you can use the AWS Management Console or infrastructure as code. You can find the CloudFormation definition for the completed rule, with the AWS AppSync target, later in this post.

Console view

Create the AWS AppSync target

Now that EventBridge is configured to route selected events, define AWS AppSync as the target for the rule. The AWS AppSync API must support IAM authorization to be used as an EventBridge target. AWS AppSync supports multiple authorization types on a single GraphQL type, so you can also use OpenID Connect, Amazon Cognito User Pools, or other authorization methods as needed.

To configure AWS AppSync as an EventBridge target, define the target using the AWS Management Console or infrastructure as code. In the console, select the Target Type as “AWS Service” and Target as “AppSync.” Select your API. EventBridge parses the GraphQL schema and allows you to select the mutation to invoke when the rule is triggered.

When using the AWS Management Console, EventBridge will also configure the necessary AWS IAM role to invoke the selected mutation. Remember to create and associate a role with an appropriate trust policy when configuring with IaC.

EventBridge target types

EventBridge supports input transformation to customize the contents of an event before passing the information as input to the target. Configure the input transformer to extract needed values from the event using JSON path and a template in the input format expected by the AWS AppSync API. EventBridge provides a handy utility in the console to test the transformer output against a sample event.

Target input transformer

Finally, configure the selection set to include the response from the AWS AppSync API. These are the fields that will be returned to EventBridge when the mutation is invoked. While the result returned to EventBridge is not overly useful (aside from troubleshooting), the mutation selection set will also determine the fields available to subscribers to the onFlightStatusUpdate subscription.

Configuring the selection set

Define the EventBridge to AWS AppSync rule in CloudFormation

Infrastructure as code templates, including AWS CloudFormation and AWS CDK, are useful for codifying infrastructure definitions to deploy across Regions and accounts. While you can write CloudFormation by hand, EventBridge provides a useful CloudFormation export in the AWS Management Console. You can use this feature to export the definition for a defined rule.

Export definition

This is the CloudFormation for the previous configured rule and AWS AppSync target. This snippet includes both the rule definition and the target configuration.

PassengerEventsToDisplayServiceRule:
    Type: AWS::Events::Rule
    Properties:
      Description: Route passenger related events to the display service endpoint
      EventBusName: eb-to-appsync
      EventPattern:
        source:
          - airport-operations
        detail:
          event:
            - FlightArrived
            - FlightBoarding
            - FlightCancelled
            - FlightDelayed
            - FlightGateChanged
            - FlightLanded
            - FlightPushBack
            - FlightTookOff
      Name: passenger-events-to-display-service
      State: ENABLED
      Targets:
        - Id: 12344535353263463
          Arn: <AppSync API GraphQL API ARN>
          RoleArn: <EventBridge Role ARN (defined elsewhere)>
          InputTransformer:
            InputPathsMap:
              carrier: $.detail.carrierCode
              date: $.detail.date
              event: $.detail.event
              extra: $.detail.info
              message: $.detail.message
              num: $.detail.flightNum
            InputTemplate: |-
              {
                "input": {
                  "num": <num>,
                  "carrier": <carrier>,
                  "date": <date>,
                  "event": <event>,
                  "message": "<message>",
                  "extra": <extra>
                }
              }
          AppSyncParameters:
            GraphQLOperation: >-
              mutation
              UpdateFlightStatus($input:StatusUpdateInput!){updateFlightStatus(input:$input){
                event
                date
                carrier
                num
                info {
                  __typename
                  ... on DelayEventInfo {
                    message
                    delayMinutes
                    newDepTime
                  }
                }
              }}

The ARN of the AWS AppSync API follows the form arn:aws:appsync:<AWS_REGION>:<ACCOUNT_ID>:endpoints/graphql-api/<GRAPHQL_ENDPOINT_ID>. The ARN is available in CloudFormation (see the GraphQLEndpointArn return value) or can be constructed using the identifier found in the AWS AppSync GraphQL endpoint URL. The ARN included in the EventBridge execution role policy is the AWS AppSync API ARN (a different ARN).

The AppSyncParameters field includes the GraphQL operation for EventBridge to invoke on the AWS AppSync API. This must be well formatted and match the GraphQL schema. Include any fields that must be available to subscribers in the selection set.

Testing subscriptions

AWS AppSync is now configured as a target for the EventBridge rule. The real-life display application would use a GraphQL library, such as AWS Amplify, to subscribe to real-time data changes. The AWS Management Console provides a useful utility to test. Navigate to the AWS AppSync console and select Queries in the menu for your API. Enter the following query and choose Run to subscribe for data changes:

subscription MySubscription {
  onFlightStatusUpdate {
    carrier
    date
    event
    num
    info {
      __typename
      ... on DelayEventInfo {
        message
        delayMinutes
        newDepTime
      }
    }
  }
}

In a separate browser tab, navigate to the EventBridge console, and choose Send events. On the Send events page, select the required event bus and set the Event source to “airport-operations.” Then enter a detail type of your choice. Finally, paste the following as the Event detail, then choose Send.

{
  "id": "b051312994104931b0980d1ad1c5340f",
  "detail-type": "Operations: Flight delayed",
  "source": "airport-operations",
  "time": "2024-01-25T16:58:37Z",
  "detail": {
    "flightNum": 123,
    "carrierCode": "JK",
    "date": "2024-01-25",
    "event": "FlightDelayed",
    "message": "Delayed 15 minutes, late aircraft",
    "info": "{ \"newDepTime\": \"2024-01-25T13:15:00Z\", \"delayMinutes\": 15 }"
  }
}

Return to the AWS AppSync tab in your browser to see the changed data in the result pane:

Result pane

Conclusion

Directly invoking AWS AppSync GraphQL API targets from EventBridge simplifies and streamlines integration between these two services, making it ideal for notifying a variety of subscribers of data changes in event-driven workloads. You can also take advantage of other features available from the two services. For example, use AWS AppSync enhanced subscription filtering to update only airport displays in the terminal in which they are located.

To learn more about serverless, visit Serverless Land for a wide array of reusable patterns, tutorials, and learning materials. Newly added to the pattern library is an EventBridge to AWS AppSync pattern similar to the one described in this post. Visit EventBridge documentation for more details.

For more serverless learning resources, visit Serverless Land.

Invoking on-premises resources interactively using AWS Step Functions and MQTT

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/invoking-on-premises-resources-interactively-using-aws-step-functions-and-mqtt/

This post is written by Alex Paramonov, Sr. Solutions Architect, ISV, and Pieter Prinsloo, Customer Solutions Manager.

Workloads in AWS sometimes require access to data stored in on-premises databases and storage locations. Traditional solutions to establish connectivity to the on-premises resources require inbound rules to firewalls, a VPN tunnel, or public endpoints.

This blog post demonstrates how to use the MQTT protocol (AWS IoT Core) with AWS Step Functions to dispatch jobs to on-premises workers to access or retrieve data stored on-premises. The state machine can communicate with the on-premises workers without opening inbound ports or the need for public endpoints on on-premises resources. Workers can run behind Network Address Translation (NAT) routers while keeping bidirectional connectivity with the AWS Cloud. This provides a more secure and cost-effective way to access data stored on-premises.

Overview

By using Step Functions with AWS Lambda and AWS IoT Core, you can access data stored on-premises securely without altering the existing network configuration.

AWS IoT Core lets you connect IoT devices and route messages to AWS services without managing infrastructure. By using a Docker container image running on-premises as a proxy IoT Thing, you can take advantage of AWS IoT Core’s fully managed MQTT message broker for non-IoT use cases.

MQTT subscribers receive information via MQTT topics. An MQTT topic acts as a matching mechanism between publishers and subscribers. Conceptually, an MQTT topic behaves like an ephemeral notification channel. You can create topics at scale with virtually no limit to the number of topics. In SaaS applications, for example, you can create topics per tenant. Learn more about MQTT topic design here.
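For example, a dispatcher could publish each tenant's jobs to its own topic. The following Python sketch uses the boto3 iot-data client; the topic layout and tenant identifier are illustrative and not taken from the sample repository.

import json

import boto3

# "iot-data" is the data-plane client that publishes to the AWS IoT Core message broker
iot_data = boto3.client("iot-data")

def dispatch_job(tenant_id: str, job: dict) -> None:
    # One topic per tenant keeps publishers and subscribers isolated
    topic = f"jobs/{tenant_id}/dispatch"
    iot_data.publish(topic=topic, qos=1, payload=json.dumps(job).encode("utf-8"))

dispatch_job("tenant-1", {"a": 15, "b": 42})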

The following reference architecture shown uses the AWS Serverless Application Model (AWS SAM) for deployment, Step Functions to orchestrate the workflow, AWS Lambda to send and receive on-premises messages, and AWS IoT Core to provide the MQTT message broker, certificate and policy management, and publish/subscribe topics.

Reference architecture

  1. Start the state machine, either “on demand” or on a schedule.
  2. The “Lambda: Invoke Dispatch Job to On-Premises” state publishes a message to the MQTT message broker in AWS IoT Core.
  3. The message broker sends the message to the topic corresponding to the worker (tenant) in the on-premises container that runs the job.
  4. The on-premises container receives the message and starts executing the job. Authentication uses client certificates, and the attached policy limits the worker’s access to only the tenant’s topic.
  5. The worker in the on-premises container can access local resources like DBs or storage locations.
  6. The on-premises container sends the results and job status back to another MQTT topic.
  7. The AWS IoT Core rule invokes the “TaskToken Done” Lambda function.
  8. The Lambda function submits the results to Step Functions via SendTaskSuccess or SendTaskFailure API.

Deploying and testing the sample

Ensure you can manage AWS resources from your terminal and that:

  • Latest versions of AWS CLI and AWS SAM CLI are installed.
  • You have an AWS account. If not, visit this page.
  • Your user has sufficient permissions to manage AWS resources.
  • Git is installed.
  • Python version 3.11 or greater is installed.
  • Docker is installed.

You can access the GitHub repository here and follow these steps to deploy the sample.

The aws-resources directory contains the required AWS resources including the state machine, Lambda functions, topics, and policies. The directory on-prem-worker contains the Docker container image artifacts. Use it to run the on-premises worker locally.

In this example, the worker container adds two numbers, provided as an input in the following format:

{
  "a": 15,
  "b": 42
}

In a real-world scenario, you can substitute this operation with business logic. For example, retrieving data from on-premises databases, generating aggregates, and then submitting the results back to your state machine.
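As a rough sketch of what such a worker could look like, the following Python example subscribes to a request topic over mutual TLS and publishes the result. The endpoint, topic names, certificate paths, and message fields are illustrative; the sample repository wires these up differently.

import json
import ssl

import paho.mqtt.client as mqtt

IOT_ENDPOINT = "example-ats.iot.eu-west-1.amazonaws.com"  # illustrative endpoint
REQUEST_TOPIC = "jobs/tenant-1/dispatch"
RESPONSE_TOPIC = "jobs/tenant-1/results"

def on_message(client, userdata, msg):
    # The dispatch message carries the operands and the Step Functions task token
    job = json.loads(msg.payload)
    result = {"sum": job["a"] + job["b"], "taskToken": job.get("taskToken"), "status": "SUCCESS"}
    client.publish(RESPONSE_TOPIC, json.dumps(result), qos=1)

# paho-mqtt 1.x style constructor; 2.x additionally requires a CallbackAPIVersion argument
client = mqtt.Client()
# Mutual TLS: the X.509 client certificate and key authenticate the worker to AWS IoT Core
client.tls_set(ca_certs="AmazonRootCA1.pem", certfile="client.pem.crt",
               keyfile="client.private.key", tls_version=ssl.PROTOCOL_TLSv1_2)
client.on_message = on_message
client.connect(IOT_ENDPOINT, 8883)
client.subscribe(REQUEST_TOPIC, qos=1)
client.loop_forever()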

Follow these steps to test the sample end-to-end.

Using AWS IoT Core without IoT devices

There are no IoT devices in the example use case. However, the fully managed MQTT message broker in AWS IoT Core lets you route messages to AWS services without managing infrastructure.

AWS IoT Core authenticates clients using X.509 client certificates. You can attach a policy to a client certificate allowing the client to publish and subscribe only to certain topics. This approach does not require IAM credentials inside the worker container on-premises.
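For illustration, an AWS IoT Core policy scoping a worker to a single tenant's topics might look like the following sketch (Region, account ID, client ID, and topic names are placeholders):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "iot:Connect",
      "Resource": "arn:aws:iot:eu-west-1:111122223333:client/tenant-1-worker"
    },
    {
      "Effect": "Allow",
      "Action": "iot:Subscribe",
      "Resource": "arn:aws:iot:eu-west-1:111122223333:topicfilter/jobs/tenant-1/*"
    },
    {
      "Effect": "Allow",
      "Action": ["iot:Publish", "iot:Receive"],
      "Resource": "arn:aws:iot:eu-west-1:111122223333:topic/jobs/tenant-1/*"
    }
  ]
}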

AWS IoT Core’s security, cost efficiency, managed infrastructure, and scalability make it a good fit for many hybrid applications beyond typical IoT use cases.

Dispatching jobs from Step Functions and waiting for a response

When the state machine reaches the state that dispatches the job to an on-premises worker, the execution pauses and waits until the job finishes. Step Functions supports three service integration patterns: Request Response, Run a Job (.sync), and Wait for a Callback with Task Token. The sample uses the Wait for a Callback with Task Token integration, which allows the state machine to pause and wait for a callback for up to one year.

When the on-premises worker completes the job, it publishes a message to the topic in AWS IoT Core. A rule in AWS IoT Core then invokes a Lambda function, which sends the result back to the state machine by calling either SendTaskSuccess or SendTaskFailure API in Step Functions.
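The Lambda function invoked by the IoT rule might look like the following minimal sketch. The event shape (taskToken, status, result, and message fields) is illustrative and depends on what the worker publishes; send_task_success and send_task_failure are the Step Functions API calls named above.

import json

import boto3

sfn = boto3.client("stepfunctions")

def lambda_handler(event, context):
    # Invoked by the AWS IoT Core rule; the worker echoes back the task token it received
    task_token = event["taskToken"]
    if event.get("status") == "SUCCESS":
        sfn.send_task_success(taskToken=task_token,
                              output=json.dumps(event.get("result", {})))
    else:
        sfn.send_task_failure(taskToken=task_token,
                              error="JobFailed",
                              cause=event.get("message", "Worker reported a failure"))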

You can prevent the state machine from timing out by adding HeartbeatSeconds to the task in the Amazon States Language (ASL). Timeouts happen if the job freezes and the SendTaskFailure API is never called. With HeartbeatSeconds set, the worker sends periodic heartbeats via the SendTaskHeartbeat API call; the heartbeat interval should be less than the specified TimeoutSeconds.

To create a task in ASL for your state machine, which waits for a callback token, use the following code:

{
  "Type": "Task",
  "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
  "Parameters": {
    "FunctionName": "${LambdaNotifierToWorkerArn}",
    "Payload": {
      "Input.$": "$",
      "TaskToken.$": "$$.Task.Token"
    }
  }
}

The .waitForTaskToken suffix indicates that the task must wait for the callback. The state machine generates a unique callback token, accessible via the $$.Task.Token built-in variable, and passes it as an input to the Lambda function defined in FunctionName.

The Lambda function then sends the token to the on-premises worker via an AWS IoT Core topic.

Lambda is not the only service that supports Wait for Callback integration – see the full list of supported services here.

In addition to dispatching tasks and getting the result back, you can implement progress tracking and shut down mechanisms. To track progress, the worker sends metrics via a separate topic.

Depending on your current implementation, you have several options:

  1. Storing progress data from the worker in Amazon DynamoDB and visualizing it via REST API calls to a Lambda function, which reads from the DynamoDB table. Refer to this tutorial on how to store data in DynamoDB directly from the topic.
  2. For a reactive user experience, create a rule to invoke a Lambda function when new progress data arrives. Open a WebSocket connection to your backend. The Lambda function sends progress data via WebSocket directly to the frontend.

To implement a shutdown mechanism, you can run jobs in separate threads on your worker and subscribe to the topic, to which your state machine publishes the shutdown messages. If a shutdown message arrives, end the job thread on the worker and send back the status including the callback token of the task.

Using AWS IoT Core Rules and Lambda Functions

A message with job results from the worker does not arrive to the Step Functions API directly. Instead, an AWS IoT Core Rule and a dedicated Lambda function forward the status message to Step Functions. This allows for more granular permissions in AWS IoT Core policies, which result in improved security because the worker container can only publish and subscribe to specific topics. No IAM credentials exist on-premises.

The Lambda function’s execution role contains the permissions for SendTaskSuccess, SendTaskHeartbeat, and SendTaskFailure API calls only.

Alternatively, a worker can run API calls in Step Functions workflows directly, which replaces the need for a topic in AWS IoT Core, a rule, and a Lambda function to invoke the Step Functions API. This approach requires IAM credentials inside the worker’s container. You can use AWS Identity and Access Management Roles Anywhere to obtain temporary security credentials. As your worker’s functionality evolves over time, you can add further AWS API calls while adding permissions to the IAM execution role.

Cleaning up

The services used in this solution are eligible for the AWS Free Tier. To clean up the resources in the aws-resources/ directory of the repository, run:

sam delete

This removes all resources provisioned by the template.yml file.

To remove the client certificate from AWS, navigate to AWS IoT Core Certificates and delete the certificate, which you added during the manual deployment steps.

Lastly, stop the Docker container on-premises and remove it:

docker rm --force mqtt-local-client

Finally, remove the container image:

docker rmi mqtt-client-waitfortoken

Conclusion

Accessing on-premises resources with workers controlled via Step Functions using MQTT and AWS IoT Core is a secure, reactive, and cost effective way to run on-premises jobs. Consider updating your hybrid workloads from using inefficient polling or schedulers to the reactive approach described in this post. This offers an improved user experience with fast dispatching and tracking of jobs outside of cloud.

For more serverless learning resources, visit Serverless Land.

Consuming private Amazon API Gateway APIs using mutual TLS

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/consuming-private-amazon-api-gateway-apis-using-mutual-tls/

This post is written by Thomas Moore, Senior Solutions Architect and Josh Hart, Senior Solutions Architect.

A previous blog post explores using Amazon API Gateway to create private REST APIs that can be consumed across different AWS accounts inside a virtual private cloud (VPC). Private cross-account APIs are useful for independent software vendors (ISVs) and SaaS companies providing secure connectivity for customers, and for organizations building internal APIs and backend microservices.

Mutual TLS (mTLS) is an advanced security protocol that provides two-way authentication via certificates between a client and server. mTLS requires the client to send an X.509 certificate to prove its identity when making a request, together with the default server certificate verification process. This ensures that both parties are who they claim to be.

mTLS connection process

The mTLS connection process, illustrated in the preceding diagram, works as follows:

  1. Client connects to the server.
  2. Server presents its certificate, which is verified by the client.
  3. Client presents its certificate, which is verified by the server.
  4. Encrypted TLS connection established.

Customers use mTLS because it offers stronger security and identity verification than standard TLS connections. mTLS helps prevent man-in-the-middle attacks and protects against threats such as impersonation attempts, data interception, and tampering. As threats become more advanced, mTLS provides an extra layer of defense to validate connections.

Implementing mTLS increases overhead for certificate management, but for applications transmitting valuable or sensitive data, the extra security is important. If security is a priority for your systems and users, you should consider deploying mTLS.

Regional API Gateway endpoints have native support for mTLS but private API Gateway endpoints do not support mTLS, so you must terminate mTLS before API Gateway. The previous blog post shows how to build private mTLS APIs using a self-managed verification process inside a container running an NGINX proxy. Since then, Application Load Balancer (ALB) now supports mTLS natively, simplifying the architecture.

This post explores building mTLS private APIs using this new feature.

Application Load Balancer mTLS configuration

You can enable mutual authentication (mTLS) on a new or existing Application Load Balancer. By enabling mTLS on the load balancer listener, clients are required to present trusted certificates to connect. The load balancer validates the certificates before allowing requests to the backends.

Application Load Balancer mTLS configuration

There are two options available when configuring mTLS on the Application Load Balancer: Passthrough mode and Verify with trust store mode.

In Passthrough mode, the client certificate chain is passed as an X-Amzn-Mtls-Clientcert HTTP header for the application to inspect for authorization. In this scenario, there is still a backend verification process. The benefit in adding the ALB to the architecture is that you can perform application (layer 7) routing, such as path-based routing, allowing more complex application routing configurations.
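For example, a backend receiving the passthrough header could decode and inspect the certificate along these lines. This sketch assumes a Lambda target receiving HTTP-style headers and uses the cryptography package; the header handling and authorization check are illustrative.

from urllib.parse import unquote

from cryptography import x509

def lambda_handler(event, context):
    # The ALB passes the URL-encoded PEM client certificate in this header
    encoded_cert = event["headers"].get("x-amzn-mtls-clientcert", "")
    cert = x509.load_pem_x509_certificate(unquote(encoded_cert).encode("utf-8"))
    subject = cert.subject.rfc4514_string()
    # Application-specific authorization decision based on the certificate subject
    allowed = subject.endswith("O=Example Corp")
    return {"statusCode": 200 if allowed else 403, "body": subject}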

In Verify with trust store mode, the load balancer validates the client certificate and only allows clients providing trusted certificates to connect. This simplifies the management and reduces load on backend applications.

This example uses AWS Private Certificate Authority but the steps are similar for third-party certificate authorities (CA).

To configure the certificate Trust Store for the ALB:

  1. Create an AWS Private Certificate Authority. Specify the Common Name (CN) as the domain where you host the application (for example, api.example.com).
  2. Export the CA using either the CLI or the Console and upload the resulting Certificate.pem to an Amazon S3 bucket.
  3. Create a Trust Store, point this at the certificate uploaded in the previous step.
  4. Update the listener of your Application Load Balancer to use this trust store and select the required mTLS verification behavior.
  5. Generate certificates for the client application against the private certificate authority, for example using the following commands:
openssl req -new -newkey rsa:2048 -days 365 -keyout my_client.key -out my_client.csr

aws acm-pca issue-certificate --certificate-authority-arn arn:aws:acm-pca:us-east-1:111122223333:certificate-authority/certificate_authority_id --csr fileb://my_client.csr --signing-algorithm "SHA256WITHRSA" --validity Value=365,Type="DAYS" --template-arn arn:aws:acm-pca:::template/EndEntityCertificate/V1

aws acm-pca get-certificate --certificate-authority-arn arn:aws:acm-pca:us-east-1:111122223333:certificate-authority/certificate_authority_id --certificate-arn arn:aws:acm-pca:us-east-1:account_id:certificate-authority/certificate_authority_id/certificate/certificate_id --output text

For more details on this part of the process, see Use ACM Private CA for Amazon API Gateway Mutual TLS.

Private API Gateway mTLS verification using an ALB

Using the ALB Verify with trust store mode together with API Gateway can enable private APIs with mTLS, without the operational burden of a self-managed proxy service.

You can use this pattern to access API Gateway in the same AWS account, or cross-account.

Private API Gateway mTLS verification using an ALB

The same account pattern allows clients inside the VPC to consume the private API Gateway by calling the Application Load Balancer URL. The ALB is configured to verify the provided client certificate against the trust store before passing the request to the API Gateway.

If the certificate is invalid, the API never receives the request. A resource policy on the API Gateway ensures that requests are only allowed via the VPC endpoint, and a security group on the VPC endpoint ensures that it can only receive requests from the ALB. This prevents the client from bypassing mTLS by invoking the API Gateway or VPC endpoints directly.
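A resource policy implementing this restriction commonly follows the pattern below; the VPC endpoint ID is a placeholder and the policy shown is a sketch rather than the exact policy from the sample code.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "execute-api:Invoke",
      "Resource": "execute-api:/*"
    },
    {
      "Effect": "Deny",
      "Principal": "*",
      "Action": "execute-api:Invoke",
      "Resource": "execute-api:/*",
      "Condition": {
        "StringNotEquals": {
          "aws:SourceVpce": "vpce-0123456789abcdef0"
        }
      }
    }
  ]
}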

Cross-account private API Gateway mTLS using AWS PrivateLink.

The cross-account pattern using AWS PrivateLink provides the ability to connect to the ALB endpoint securely across accounts and across VPCs. It avoids the need to connect VPCs together using VPC Peering or AWS Transit Gateway and enables software vendors to deliver SaaS services to be consumed by their end customers. This pattern is available to deploy as sample code in the GitHub repository.

The flow of a client request through the cross-account architecture is as follows:

  1. A client in the consumer application sends a request to the producer API endpoint.
  2. The request is routed via AWS PrivateLink to a Network Load Balancer in the consumer account. The Network Load Balancer is a requirement of AWS PrivateLink services.
  3. The Network Load Balancer uses an Application Load Balancer-type Target Group.
  4. The Application Load Balancer listener is configured for mTLS in verify with trust store mode.
  5. An authorization decision is made comparing the client certificate to the chain in the certificate trust store.
  6. If the client certificate is allowed the request is routed to the API Gateway via the execute-api VPC Endpoint. An API Gateway resource policy is used to allow connections only via the VPC endpoint.
  7. Any additional API Gateway authentication and authorization is performed, such as using a Lambda authorizer to validate a JSON Web Token (JWT).

Using the example deployed from the GitHub repo, this is the expected response from a successful request with a valid certificate:

curl --key my_client.key --cert my_client.pem https://api.example.com/widgets

{"id":"1","value":"4.99"}

When passing an invalid certificate, the following response is received:

curl: (35) Recv failure: Connection reset by peer

Custom domain names

An additional benefit to implementing the mTLS solution with an Application Load Balancer is support for private custom domain names. Private API Gateway endpoints do not support custom domain names currently. But in this case, clients first connect to an ALB endpoint, which does support a custom domain. The sample code implements private custom domains using a public AWS Certificate Manager (ACM) certificate on the internal ALB, and an Amazon Route 53 hosted DNS zone. This allows you to provide a static URL to consumers so that if the API Gateway is replaced the consumer does not need to update their code.

Certificate revocation list

Optionally, as another layer of security, you can also configure a certificate revocation list for a trust store on the ALB. Revocation lists allow you to revoke and invalidate issued certificates before their expiry date. You can use this feature to off-board customers or deny compromised credentials, for example.

You can add the certificate revocation list to a new or existing trust store. The list is provided via an Amazon S3 URI as a PEM formatted file.
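If your AWS CLI version includes the Elastic Load Balancing trust store revocation commands, attaching a CRL stored in S3 looks roughly like the following; the command name, bucket, and key shown here are assumptions to verify against the current CLI reference:

aws elbv2 add-trust-store-revocations \
  --trust-store-arn <trust store ARN> \
  --revocation-contents S3Bucket=my-crl-bucket,S3Key=crl.pem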

Conclusion

This post explores ways to provide mutual TLS authentication for private API Gateway endpoints. A previous post shows how to achieve this using a self-managed NGINX proxy. This post simplifies the architecture by using the native mTLS support now available for Application Load Balancers.

This new pattern centralizes authentication at the edge, streamlines deployment, and minimizes operational overhead compared to self-managed verification. AWS Private Certificate Authority and certificate revocation lists integrate with managed credentials and security policies. This makes it easier to expose private APIs safely across accounts and VPCs.

Mutual authentication and progressive security controls are growing in importance when architecting secure cloud-based workloads. To get started, visit the GitHub repository.

For more serverless learning resources, visit Serverless Land.

Using generative infrastructure as code with Application Composer

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/using-generative-infrastructure-as-code-with-application-composer/

This post is written by Anna Spysz, Frontend Engineer, AWS Application Composer

AWS Application Composer launched in the AWS Management Console one year ago, and has now expanded to the VS Code IDE as part of the AWS Toolkit. This includes access to a generative AI partner that helps you write infrastructure as code (IaC) for all 1100+ AWS CloudFormation resources that Application Composer now supports.

Overview

Application Composer lets you create IaC templates by dragging and dropping cards on a virtual canvas. These represent CloudFormation resources, which you can wire together to create permissions and references. With support for all 1100+ resources that CloudFormation allows, you can now build with everything from AWS Amplify to AWS X-Ray.

Previously, standard CloudFormation resources came only with a basic configuration. Adding an Amplify App resource resulted in the following configuration by default:

  MyAmplifyApp:
    Type: AWS::Amplify::App
    Properties:
      Name: <String>

And in the console:

AWS App Composer in the console

Now, Application Composer in the IDE uses generative AI to generate resource-specific configurations with safeguards such as validation against the CloudFormation schema to ensure valid values.

When working on a CloudFormation or AWS Serverless Application Model (AWS SAM) template in VS Code, you can sign in with your Builder ID and generate multiple suggested configurations in Application Composer. Here is an example of an AI generated configuration for the AWS::Amplify::App type:

AI generated configuration for the Amplify App type

These suggestions are specific to the resource type, and are safeguarded by a check against the CloudFormation schema to ensure valid values or helpful placeholders. You can then select, use, and modify the suggestions to fit your needs.

You now know how to generate a basic example with one resource, but let’s look at building a full application with the help of AI-generated suggestions. This example recreates a serverless application from a Serverless Land tutorial, “Use GenAI capabilities to build a chatbot,” using Application Composer and generative AI-powered code suggestions.

Getting started with the AWS Toolkit in VS Code

If you don’t yet have the AWS Toolkit extension, you can find it under the Extensions tab in VS Code. Install or update it to at least version 2.1.0, so that the screen shows Amazon Q and Application Composer:

Amazon Q and Application Composer

Next, to enable gen AI-powered code suggestions, you must enable Amazon CodeWhisperer using your Builder ID. The easiest way is to open Amazon Q chat, and select Authenticate. On the next screen, select the Builder ID option, then sign in with your Builder ID.

Enable Amazon CodeWhisperer using your Builder ID

After sign-in, your connection appears in the VS Code toolkit panel:

Connection in VS Code toolkit panel

Building with Application Composer

With the toolkit installed and connected with your Builder ID, you are ready to start building.

  1. In a new workspace, create a folder for the application and a blank template.yaml file.
  2. Open this file and initiate Application Composer by choosing the icon in the top right.
Initiate Application Composer

The original tutorial includes this architecture diagram:

Original architecture diagram

First, add the services in the diagram to sketch out the application architecture, which simultaneously creates a deployable CloudFormation template:

  1. From the Enhanced components list, drag in a Lambda function and a Lambda layer.
  2. Double-click the Function resource to edit its properties. Rename the Lambda function’s Logical ID to LexGenAIBotLambda.
  3. Change the Source path to src/LexGenAIBotLambda, and the runtime to Python.
  4. Change the handler value to TextGeneration.lambda_handler, and choose Save.
  5. Double-click the Layer resource to edit its properties. Rename the layer Boto3Layer and change its build method to Python. Change its Source path to src/Boto3PillowPyshorteners.zip.
  6. Finally, connect the layer to the function to add a reference between them. Your canvas looks like this:
Your App Composer canvas

The template.yaml file is now updated to include those resources. In the source directory, you can see some generated function files. You will replace them with the tutorial function and layers later.

In the first step, you added some resources and Application Composer generated IaC that includes best practices defaults. Next, you will use standard CloudFormation components.

Using AI for standard components

Start by using the search bar to search for and add several of the Standard components needed for your application.

Search for and add Standard components

  1. In the Resources search bar, enter “lambda” and add the resource type AWS::Lambda::Permission to the canvas.
  2. Enter “iam” in the search bar, and add type AWS::IAM::Policy.
  3. Add two resources of the type AWS::IAM::Role.

Your application now looks like this:

Updated canvas

Some standard resources have all the defaults you need. For example, when you add the AWS::Lambda::Permission resource, replace the placeholder values with:

FunctionName: !Ref LexGenAIBotLambda
Action: lambda:InvokeFunction
Principal: lexv2.amazonaws.com

Other resources, such as the IAM roles and IAM policy, have a vanilla configuration. This is where you can use the AI assistant. Select an IAM Role resource and choose Generate suggestions to see what the generative AI suggests.

Generate suggestions

Because these suggestions are generated by a Large Language Model (LLM), they may differ between each generation. These are checked against the CloudFormation schema, ensuring validity and providing a range of configurations for your needs.

Generating different configurations gives you an idea of what a resource’s policy should look like, and often gives you keys that you can then fill in with the values you need. Use the following settings for each resource, replacing the generated values where applicable.

  1. Double-click the “Permission” resource to edit its settings. Change its Logical ID to LexGenAIBotLambdaInvoke and replace its Resource configuration with the following, then choose Save:
    Action: lambda:InvokeFunction
    FunctionName: !GetAtt LexGenAIBotLambda.Arn
    Principal: lexv2.amazonaws.com
  2. Double-click the “Role” resource to edit its settings. Change its Logical ID to CfnLexGenAIDemoRole and replace its Resource configuration with the following, then choose Save:
    AssumeRolePolicyDocument:
      Statement:
        - Action: sts:AssumeRole
          Effect: Allow
          Principal:
            Service: lexv2.amazonaws.com
      Version: '2012-10-17'
    ManagedPolicyArns:
      - !Join
        - ''
        - - 'arn:'
          - !Ref AWS::Partition
          - ':iam::aws:policy/AWSLambdaExecute'
  3. Double-click the “Role2” resource to edit its settings. Change its Logical ID to LexGenAIBotLambdaServiceRole and replace its Resource configuration with the following, then choose Save:
    AssumeRolePolicyDocument:
      Statement:
        - Action: sts:AssumeRole
          Effect: Allow
          Principal:
            Service: lambda.amazonaws.com
      Version: '2012-10-17'
    ManagedPolicyArns:
      - !Join
        - ''
        - - 'arn:'
          - !Ref AWS::Partition
          - ':iam::aws:policy/service-role/AWSLambdaBasicExecutionRole'
  4. Double-click the “Policy” resource to edit its settings. Change its Logical ID to LexGenAIBotLambdaServiceRoleDefaultPolicy and replace its Resource configuration with the following, then choose Save:
PolicyDocument:
  Statement:
    - Action:
        - lex:*
        - logs:*
        - s3:DeleteObject
        - s3:GetObject
        - s3:ListBucket
        - s3:PutObject
      Effect: Allow
      Resource: '*'
    - Action: bedrock:InvokeModel
      Effect: Allow
      Resource: !Join
        - ''
        - - 'arn:aws:bedrock:'
          - !Ref AWS::Region
          - '::foundation-model/anthropic.claude-v2'
  Version: '2012-10-17'
PolicyName: LexGenAIBotLambdaServiceRoleDefaultPolicy
Roles:
  - !Ref LexGenAIBotLambdaServiceRole

Once you have updated the properties of each resource, you see the connections and groupings automatically made between them:

Connections and automatic groupings

To add the Amazon Lex bot:

  1. In the resource picker, search for and add the type AWS::Lex::Bot. Here’s another chance to see what configuration the AI suggests.
  2. Change the Amazon Lex bot’s Logical ID to LexGenAIBot and update its configuration to the following:
    DataPrivacy:
      ChildDirected: false
    IdleSessionTTLInSeconds: 300
    Name: LexGenAIBot
    RoleArn: !GetAtt CfnLexGenAIDemoRole.Arn
    AutoBuildBotLocales: true
    BotLocales:
      - Intents:
          - InitialResponseSetting:
              CodeHook:
                EnableCodeHookInvocation: true
                IsActive: true
                PostCodeHookSpecification: {}
            IntentClosingSetting:
              ClosingResponse:
                MessageGroupsList:
                  - Message:
                      PlainTextMessage:
                        Value: Hi there, I'm a GenAI Bot. How can I help you?
            Name: WelcomeIntent
            SampleUtterances:
              - Utterance: Hi
              - Utterance: Hey there
              - Utterance: Hello
              - Utterance: I need some help
              - Utterance: Help needed
              - Utterance: Can I get some help?
          - FulfillmentCodeHook:
              Enabled: true
              IsActive: true
              PostFulfillmentStatusSpecification: {}
            InitialResponseSetting:
              CodeHook:
                EnableCodeHookInvocation: true
                IsActive: true
                PostCodeHookSpecification: {}
            Name: GenerateTextIntent
            SampleUtterances:
              - Utterance: Generate content for
              - Utterance: 'Create text '
              - Utterance: 'Create a response for '
              - Utterance: Text to be generated for
          - FulfillmentCodeHook:
              Enabled: true
              IsActive: true
              PostFulfillmentStatusSpecification: {}
            InitialResponseSetting:
              CodeHook:
                EnableCodeHookInvocation: true
                IsActive: true
                PostCodeHookSpecification: {}
            Name: FallbackIntent
            ParentIntentSignature: AMAZON.FallbackIntent
        LocaleId: en_US
        NluConfidenceThreshold: 0.4
    Description: Bot created demonstration of GenAI capabilities.
    TestBotAliasSettings:
      BotAliasLocaleSettings:
        - BotAliasLocaleSetting:
            CodeHookSpecification:
              LambdaCodeHook:
                CodeHookInterfaceVersion: '1.0'
                LambdaArn: !GetAtt LexGenAIBotLambda.Arn
            Enabled: true
          LocaleId: en_US
  3. Choose Save on the resource.

Once all of your resources are configured, your application looks like this:

New AI generated canvas

Adding function code and deployment

Once your architecture is defined, review and refine your template.yaml file. For a detailed reference and to ensure all your values are correct, visit the GitHub repository and check against the template.yaml file.

  1. Copy the Lambda layer directly from the repository, and add it to ./src/Boto3PillowPyshorteners.zip.
  2. In the ./src/ directory, rename the generated handler.py to TextGeneration.py. You can also delete any unnecessary files.
  3. Open TextGeneration.py and replace the placeholder code with the following:
    import json
    import boto3
    import os
    import logging
    from botocore.exceptions import ClientError
    
    LOG = logging.getLogger()
    LOG.setLevel(logging.INFO)
    
    region_name = os.getenv("region", "us-east-1")
    s3_bucket = os.getenv("bucket")
    model_id = os.getenv("model_id", "anthropic.claude-v2")
    
    # Bedrock client used to interact with APIs around models
    bedrock = boto3.client(service_name="bedrock", region_name=region_name)
    
    # Bedrock Runtime client used to invoke and question the models
    bedrock_runtime = boto3.client(service_name="bedrock-runtime", region_name=region_name)
    
    
    def get_session_attributes(intent_request):
        session_state = intent_request["sessionState"]
        if "sessionAttributes" in session_state:
            return session_state["sessionAttributes"]
    
        return {}
    
    def close(intent_request, session_attributes, fulfillment_state, message):
        intent_request["sessionState"]["intent"]["state"] = fulfillment_state
        return {
            "sessionState": {
                "sessionAttributes": session_attributes,
                "dialogAction": {"type": "Close"},
                "intent": intent_request["sessionState"]["intent"],
            },
            "messages": [message],
            "sessionId": intent_request["sessionId"],
            "requestAttributes": intent_request["requestAttributes"]
            if "requestAttributes" in intent_request
            else None,
        }
    
    def lambda_handler(event, context):
        LOG.info(f"Event is {event}")
        accept = "application/json"
        content_type = "application/json"
        prompt = event["inputTranscript"]
    
        try:
            request = json.dumps(
                {
                    "prompt": "\n\nHuman:" + prompt + "\n\nAssistant:",
                    "max_tokens_to_sample": 4096,
                    "temperature": 0.5,
                    "top_k": 250,
                    "top_p": 1,
                    "stop_sequences": ["\\n\\nHuman:"],
                }
            )
    
            response = bedrock_runtime.invoke_model(
                body=request,
                modelId=model_id,
                accept=accept,
                contentType=content_type,
            )
    
            response_body = json.loads(response.get("body").read())
            LOG.info(f"Response body: {response_body}")
            response_message = {
                "contentType": "PlainText",
                "content": response_body["completion"],
            }
            session_attributes = get_session_attributes(event)
            fulfillment_state = "Fulfilled"
    
            return close(event, session_attributes, fulfillment_state, response_message)
    
        except ClientError as e:
            LOG.error(f"Exception raised while execution and the error is {e}")
  4. To deploy the infrastructure, go back to the App Composer extension, and choose the Sync icon. Follow the guided AWS SAM instructions to complete the deployment.
App Composer Sync

After the message SAM Sync succeeded, navigate to CloudFormation in the AWS Management Console to see the newly created resources. To continue building the chatbot, follow the rest of the original tutorial.

Conclusion

This guide demonstrates how AI-generated CloudFormation can streamline your workflow in Application Composer, enhance your understanding of resource configurations, and speed up the development process. As always, adhere to the AWS Responsible AI Policy when using these features.

Introducing Amazon MQ cross-Region data replication for ActiveMQ brokers

Post Syndicated from Pascal Vogel original https://aws.amazon.com/blogs/compute/introducing-amazon-mq-cross-region-data-replication-for-activemq-brokers/

This post is written by Dominic Gagné, Senior Software Development Engineer, and Vinodh Kannan Sadayamuthu, Senior Solutions Architect

Amazon MQ now supports cross-Region data replication for ActiveMQ brokers. This feature enables you to build regionally resilient messaging applications and makes it easier to set up cross-Region message replication between ActiveMQ brokers in Amazon MQ. This blog post explains how cross-Region data replication works in Amazon MQ, how to setup cross-Region replica brokers for ActiveMQ, and how to test promoting a replica broker.

Amazon MQ is a managed message broker service for Apache ActiveMQ and RabbitMQ that simplifies setting up and operating message brokers on AWS.

Cross-Region replication improves the resilience and disaster recovery capabilities of your systems. This new Amazon MQ feature makes it easier to increase resilience of your ActiveMQ messaging systems across AWS Regions.

How cross-Region data replication works in Amazon MQ for ActiveMQ

The Amazon MQ for ActiveMQ cross-Region data replication feature replicates broker state from the primary broker in one AWS Region to the replica broker in another Region. Broker state consists of messages that have been sent to a broker by a message producer. Additionally, message acknowledgments and transactions are replicated. Scheduled messages and broker XML configuration are not replicated from the primary to the replica broker.

State replication occurs asynchronously and runs in the background. When a message is sent to a cross-Region data replication enabled broker, the data is persisted both to the primary data store and also on a queue used to replicate data. The replica broker acts as a client of this queue and consumes data that represents broker state from the primary broker.

At any given moment, only the primary broker is available for client connections. The replica broker is a hot standby and passively replicates the primary broker’s state. However, it does not accept client connections. The following diagram shows a simplified version of a cross-Region data replication broker pair. All replication traffic is encrypted using TLS and remains within AWS’ private backbone.

Amazon MQ for ActiveMQ cross-region data replication architecture

Configuring cross-Region replica brokers for Amazon MQ for ActiveMQ

To set up a cross-Region replica broker, your Amazon MQ for ActiveMQ primary broker must meet the following eligibility criteria:

  • ActiveMQ version 5.17.6 or above
  • Instance size m5.large or higher
  • Active/standby broker deployment enabled
  • Be in the Running state

If you do not have an ActiveMQ broker that meets these criteria, see Creating and configuring an ActiveMQ broker for instructions on how to create a primary broker.

To configure cross-Region replication

  1. Navigate to the Amazon MQ console and choose Create replica broker.
    Amazon MQ console create replica broker
  2. Select a primary broker from the list of eligible primary brokers and choose Next.
    Amazon MQ console choose primary broker
  3. Under Replica broker details, select the Region for your replica broker and enter a Replica broker name.
    Amazon MQ console configure replica broker
  4. In the ActiveMQ console user for replica broker panel, enter a Username and Password for broker access.
    Amazon MQ console user for replica broker
  5. In the Data replication user to bridge access between brokers panel, enter a replication user Username and Password.
    Amazon MQ console user for replica broker
  6. In the Additional settings panel, keep the defaults and choose Next.
  7. Review the settings and choose Create replica broker.
    Note: The broker access type is automatically set based on the primary broker access type.
    Amazon MQ console create replica broker setting summary
  8. The creation process takes up to 25 minutes. Once the replica broker creation is complete, begin replication between the primary and the replica brokers by rebooting the primary broker.
  9. Once the primary broker is rebooted and its status is Running, you can see the replica details in the Data replication panel of the primary broker.
    Amazon MQ console broker replication details

Both brokers now synchronize with each other to establish an inter-Region network and connection through which broker state is replicated. Once both brokers are in the Running state, the primary broker accepts client connections and passes all broker state changes (messages, acknowledgments, transactions, etc.) to the replica broker.

The replica broker now asynchronously mirrors the state of the primary broker. However, it does not become available for client connections until it is promoted via a switchover or a failover. These operations are covered in the following section.

Testing data replication and promoting the replica broker

There are two ways to promote a replica broker: initiating a switchover or a failover.

Switchover:

  • Prioritizes consistency over availability.
  • Brokers are guaranteed to have identical states.
  • Brokers may not be available immediately to serve client traffic.

Failover:

  • Prioritizes availability over consistency.
  • Brokers are not guaranteed to be in identical states.
  • Replica broker is immediately available to serve client traffic.

To initiate a failover or switchover

    1. Navigate to the Amazon MQ console, choose your primary broker, and log in to the ActiveMQ Web Console using the URLs located in the Connections panel.
    2. In the top menu, select Queues. You should be able to see four ActiveMQ.Plugin.Replication queues used by the replication feature.
      Active MQ console queues
    3. To test message replication from the primary to a replica broker, create a queue and send messages. To create the queue:
      • For Queue Name, enter TestQueue.
      • Choose Create.

      ActiveMQ console create queue

    4. Under Operations for the TestQueue, choose Send To and perform the following steps:
      • For Number of messages to send, enter 10 and keep the other defaults.
      • Under Message body, enter a test message.
      • Choose Send.

      ActiveMQ console send test message

    5. To promote the replica broker, navigate to the Amazon MQ console and change the Region to the AWS Region where the replica broker is located.
    6. Select the replica broker (in this example called Secondarybroker) and choose Promote replica.
      Amazon MQ console promote broker
    7. In the Promote replica broker pop-up window:
      • Select Failover or Switchover.
      • Enter confirm in text box.
      • Choose Confirm.

      Amazon MQ console confirm broker promotion

    8. While a replica broker is being promoted, its replication status changes to Promotion in progress. The corresponding primary broker’s replication status changes to Demotion in progress.

Replica Secondarybroker status – Promotion in progress:

Replica Secondarybroker status - Promotion in progress

Primary broker status – Demotion in progress:

Primary broker status - Demotion in progress

Secondarybroker status – Promoted to new primary broker:

Secondarybroker status – Promoted to new primary broker

  9. Once the Secondarybroker status is Running, log in to the ActiveMQ Web Console from the URLs located in the Connections panel. You can see the replicated messages sent from the former primary broker in Step 4 in the TestQueue:
    Replicated message from primary broker in TestQueue

Monitoring cross-Region data replication

To monitor cross-Region data replication progress, you can use the Amazon CloudWatch metrics TotalReplicationLag and ReplicationLag.

Amazon CloudWatch metrics TotalReplicationLag and ReplicationLag

You can use these two metrics to monitor the progress of a switchover. When their values reach zero, the switchover completes because the broker states have been synchronized and the replica broker begins accepting client connections. If the switchover does not progress fast enough, or if you need the replica broker to be immediately available to serve client traffic, you can request a failover at any time.
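For example, you can retrieve TotalReplicationLag for the primary broker with the AWS CLI. The broker name, time window, and Region below are illustrative, and the Broker dimension name is the one used by Amazon MQ metrics:

aws cloudwatch get-metric-statistics \
  --namespace AWS/AmazonMQ \
  --metric-name TotalReplicationLag \
  --dimensions Name=Broker,Value=<primary broker name> \
  --statistics Average \
  --period 60 \
  --start-time 2024-01-25T16:00:00Z \
  --end-time 2024-01-25T17:00:00Z \
  --region us-east-1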

Note: A failover can interrupt an ongoing switchover. However, a switchover cannot interrupt an ongoing failover.

Issuing a failover request causes the replica broker to become immediately available, but does not provide any guarantees about what data has been replicated to the replica broker. This means that a failover can make data tracking and reconciliation more challenging for your client application than a switchover.

For this reason, we recommend that you always start with a switchover and interrupt it with a failover if necessary. To interrupt an ongoing switchover, follow the same steps as for promoting a replica broker, select the failover option, and confirm.

Note: If you fail back to the original primary broker, messages that are not replicated from the primary to the replica broker during the failover will still exist on the primary broker. Therefore, consumers must manage these messages. We recommend tracking the processed message IDs in a data store such as Amazon DynamoDB global tables and comparing the message to the processed message IDs.

If you no longer need to replicate broker data across Regions or if you need to delete a primary or replica broker, you must unpair the replica broker and reboot the primary broker. You can unpair the replica broker in the Amazon MQ console by following Delete a CRDR broker.

To unpair the broker using the AWS Command Line Interface (AWS CLI), run the following command, replacing the --broker-id with your primary broker ID:

aws mq update-broker --broker-id <primary broker ID> \
--data-replication-mode "NONE" \
--region us-east-1

Conclusion

Using the cross-Region data replication feature for Amazon MQ for ActiveMQ provides a straightforward way to implement cross-Region replication to improve the resilience of your architecture and meet your business continuity and disaster recovery requirements. This post explains how cross-Region data replication works in Amazon MQ, how to set up a cross-Region replica broker, and how to test and promote the replica broker.

For more details, see the Amazon MQ documentation.

For more serverless learning resources, visit Serverless Land.

Python 3.12 runtime now available in AWS Lambda

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/python-3-12-runtime-now-available-in-aws-lambda/

This post is written by Jeff Gebhart, Sr. Specialist TAM, Serverless.

AWS Lambda now supports Python 3.12 as both a managed runtime and container base image. Python 3.12 builds on the performance enhancements that were first released with Python 3.11, and adds a number of performance and language readability features in the interpreter. With this release, Python developers can now take advantage of these new features and enhancements when creating serverless applications on AWS Lambda.

You can use Python 3.12 with Powertools for AWS Lambda (Python), a developer toolkit to implement Serverless best practices such as observability, batch processing, Parameter Store integration, idempotency, feature flags, CloudWatch Metrics, and structured logging among other features.

You can also use Python 3.12 with Lambda@Edge, allowing you to customize low-latency content delivered through Amazon CloudFront.

Python is a popular language for building serverless applications. The Python 3.12 release has a number of interpreter and syntactic improvements.

At launch, new Lambda runtimes receive less usage than existing, established runtimes. This can result in longer cold start times due to reduced cache residency within internal Lambda sub-systems. Cold start times typically improve in the weeks following launch as usage increases. As a result, AWS recommends not drawing conclusions from side-by-side performance comparisons with other Lambda runtimes until the performance has stabilized. Since performance is highly dependent on workload, customers with performance-sensitive workloads should conduct their own testing, instead of relying on generic test benchmarks.

Lambda runtime changes

Amazon Linux 2023

The Python 3.12 runtime is based on the provided.al2023 runtime, which is based on the Amazon Linux 2023 minimal container image. This OS update brings several improvements over the Amazon Linux 2 (AL2)-based OS used for Lambda Python runtimes from Python 3.8 to Python 3.11.

provided.al2023 contains only the essential components necessary to install other packages and offers a smaller deployment footprint of less than 40MB compared to over 100MB for Lambda’s AL2-based images.

With glibc version 2.34, customers have access to a modern version of glibc, updated from version 2.26 in AL2-based images.

The Amazon Linux 2023 minimal image uses microdnf as a package manager, symlinked as dnf. This replaces the yum package manager used in earlier AL2-based images. If you deploy your Lambda functions as container images, you must update your Dockerfiles to use dnf instead of yum when upgrading to the Python 3.12 base image.
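For example, a Dockerfile installing an extra OS package on the Python 3.12 base image might look like the following sketch (the gcc package and requirements file are placeholders):

FROM public.ecr.aws/lambda/python:3.12

# Amazon Linux 2023 minimal image: use dnf (microdnf) rather than yum
RUN dnf install -y gcc

COPY requirements.txt .
RUN pip install -r requirements.txt

COPY lambda_handler.py ${LAMBDA_TASK_ROOT}
CMD ["lambda_handler.handler"]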

Additionally, curl and gnupg2 are also included as their minimal versions curl-minimal and gnupg2-minimal.

Learn more about the provided.al2023 runtime in the blog post Introducing the Amazon Linux 2023 runtime for AWS Lambda and the Amazon Linux 2023 launch blog post.

Response format change

Starting with the Python 3.12 runtime, functions return Unicode characters as part of their JSON response. Previous versions return escaped sequences for Unicode characters in responses.

For example, in Python 3.11, if you return a Unicode string such as “こんにちは”, it escapes the Unicode characters and returns “\u3053\u3093\u306b\u3061\u306f”. The Python 3.12 runtime returns the original “こんにちは”.

This change reduces the size of the payload returned by Lambda. In the previous example, the escaped version is 32 bytes compared to 17 bytes with the Unicode string. Using Unicode responses reduces the size of Lambda responses, making it easier to fit larger responses into the 6MB Lambda response (synchronous) limit.

When upgrading to Python 3.12, you may need to adjust your code in other modules to account for this new behavior. If the caller expects escaped Unicode based on the previous runtime behavior, you must either add code to the returning function to escape the Unicode manually, or adjust the caller to handle the Unicode return.
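If a caller still expects the escaped form, or you want to compare payload sizes, the standard library's json.dumps and its ensure_ascii flag reproduce both behaviors, as in this small sketch:

import json

payload = "こんにちは"

escaped = json.dumps(payload)                        # behavior of Python 3.11 and earlier runtimes
unescaped = json.dumps(payload, ensure_ascii=False)  # behavior of the Python 3.12 runtime

print(len(escaped.encode("utf-8")), len(unescaped.encode("utf-8")))  # 32 17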

Extensions processing for graceful shutdown

Lambda functions with external extensions can now benefit from improved graceful shutdown capabilities. When the Lambda service is about to shut down the runtime, it sends a SIGTERM signal to the runtime and then a SHUTDOWN event to each registered external extension.

These events are sent each time an execution environment shuts down. This allows you to catch the SIGTERM signal in your Lambda function and clean up resources, such as database connections, which were created by the function.
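A minimal sketch of the pattern, assuming at least one external extension is registered so that the runtime receives SIGTERM (the connection shown is illustrative):

import signal
import sqlite3
import sys

# Illustrative resource created outside the handler; any connection or open handle applies
connection = sqlite3.connect("/tmp/cache.db")

def handle_sigterm(signum, frame):
    # Runs when the Lambda service shuts down the execution environment
    connection.close()
    sys.exit(0)

signal.signal(signal.SIGTERM, handle_sigterm)

def lambda_handler(event, context):
    return {"statusCode": 200}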

To learn more about the Lambda execution environment lifecycle, see Lambda execution environment. More details and examples of how to use graceful shutdown with extensions are available in the AWS Samples GitHub repository.

New Python features

Comprehension inlining

With the implementation of PEP 709, dictionary, list, and set comprehensions are now inlined. Prior versions create a single-use function to execute such comprehensions. Removing this overhead results in faster comprehension execution by a factor of two.

There are some behavior changes to comprehensions because of this update. For example, a call to the ‘locals()’ function from within the comprehension now includes objects from the containing scope, not just within the comprehension itself as in prior versions. You should test functions you are migrating from an earlier version of Python to Python 3.12.

Typing changes

Python 3.12 continues the evolution of type annotations in Python. PEP 695 includes a new, more compact syntax for generic classes and functions, and adds a new “type” statement to allow for type alias creation. Type aliases are evaluated on demand. This permits aliases to refer to other types defined later.

Type parameters are visible within the scope of the declaration and any nested scopes, but not in the outer scope.
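For example, the following snippet (with illustrative names) shows the new generic function syntax and the type statement:

# New generic syntax (PEP 695): the type parameter T is scoped to the function
def first[T](items: list[T]) -> T:
    return items[0]

# The new "type" statement creates a lazily evaluated type alias
type FlightRecord = dict[str, str | int]

def describe(record: FlightRecord) -> str:
    return f"{record['carrier']} {record['num']}"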

Formalization of f-strings

One of the largest changes in Python 3.12, the formalization of f-strings syntax, is covered under PEP 701. Any valid expression can now be contained within an f-string, including other f-strings.

In prior versions of Python, reusing quotes within an f-string results in errors. With Python 3.12, quote reuse is fully supported in nested f-strings such as the following example:

>>> songs = ['Take me back to Eden', 'Alkaline', 'Ascensionism']

>>> f"This is the playlist: {", ".join(songs)}"

'This is the playlist: Take me back to Eden, Alkaline, Ascensionism'

Additionally, any valid Python expression can be contained within an f-string. This includes multi-line expressions and the ability to embed comments within an f-string.

Before Python 3.12, the “\” character was not permitted within an f-string. This prevented the use of “\N” syntax for defining escaped Unicode characters within the body of an f-string. Python 3.12 lifts this restriction.

Asyncio improvements

There are a number of improvements to the asyncio module. These include performance improvements to writing of sockets and a new implementation of asyncio.current_task() that can yield a 4–6 times performance improvement. Event loops now optimize their child watchers for their underlying environment.

Using Python 3.12 in Lambda

AWS Management Console

To use the Python 3.12 runtime to develop your Lambda functions, specify a runtime parameter value of Python 3.12 when creating or updating a function. The Python 3.12 version is now available in the Runtime dropdown on the Create function page:

To update an existing Lambda function to Python 3.12, navigate to the function in the Lambda console and choose Edit in the Runtime settings panel. The new version of Python is available in the Runtime dropdown:

AWS Lambda container image

Change the Python base image version by modifying the FROM statement in your Dockerfile:

FROM public.ecr.aws/lambda/python:3.12
# Copy function code
COPY lambda_handler.py ${LAMBDA_TASK_ROOT}

Customers running the Python 3.12 Docker images locally, including customers using AWS SAM, must upgrade their Docker install to version 20.10.10 or later.

AWS Serverless Application Model (AWS SAM)

In AWS SAM, set the Runtime attribute to python3.12 to use this version.

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Simple Lambda Function
Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Description: My Python Lambda Function
      CodeUri: my_function/
      Handler: lambda_function.lambda_handler
      Runtime: python3.12

AWS SAM supports generating this template with Python 3.12 for new serverless applications using the `sam init` command. Refer to the AWS SAM documentation.

AWS Cloud Development Kit (AWS CDK)

In AWS CDK, set the runtime attribute to Runtime.PYTHON_3_12 to use this version. In Python CDK:

from constructs import Construct
from aws_cdk import (App, Stack, aws_lambda as _lambda)

class SampleLambdaStack(Stack):
    def __init__(self, scope: Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)

        base_lambda = _lambda.Function(
            self,
            'SampleLambda',
            handler='lambda_handler.handler',
            runtime=_lambda.Runtime.PYTHON_3_12,
            code=_lambda.Code.from_asset('lambda'),
        )

In TypeScript CDK:

import * as cdk from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda'
import * as path from 'path';
import { Construct } from 'constructs';

export class CdkStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // The code that defines your stack goes here

    // The python3.12 enabled Lambda Function
    const lambdaFunction = new lambda.Function(this, 'python312LambdaFunction', {
      runtime: lambda.Runtime.PYTHON_3_12,
      memorySize: 512,
      code: lambda.Code.fromAsset(path.join(__dirname, '/../lambda')),
      handler: 'lambda_handler.handler'
    })
  }
}

Conclusion

Lambda now supports Python 3.12. This release uses the Amazon Linux 2023 OS and adds support for Unicode responses, graceful shutdown for functions with external extensions, and the new Python 3.12 language features.

You can build and deploy functions using Python 3.12 using the AWS Management Console, AWS CLI, AWS SDK, AWS SAM, AWS CDK, or your choice of Infrastructure as Code (IaC) tool. You can also use the Python 3.12 container base image if you prefer to build and deploy your functions using container images.

Python 3.12 runtime support helps developers to build more efficient, powerful, and scalable serverless applications. Try the Python 3.12 runtime in Lambda today and experience the benefits of this updated language version.

For more serverless learning resources, visit Serverless Land.

Introducing support for read-only management events in Amazon EventBridge

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/introducing-support-for-read-only-management-events-in-amazon-eventbridge/

This post is written by Pawan Puthran, Principal Specialist TAM, Serverless and Heeki Park, Principal Solutions Architect, Serverless

Today, AWS is announcing support for read-only management events in Amazon EventBridge. This feature enables customers to build rich event-driven responses from any action taken on AWS infrastructure to detect security vulnerabilities or identify suspicious activity in near real-time. You can now gain insight into all activity across all your AWS accounts and respond to those events as is appropriate.

Overview

EventBridge is a serverless event bus used to decouple event producers and consumers. Event producers publish events onto an event bus, which then uses rules to determine where to send those events. The rules determine the downstream targets that receive and process the events, and EventBridge routes the events accordingly.

EventBridge allows customers to monitor, audit, and react, in near real-time, to changes in their AWS environments through events generated by AWS CloudTrail for AWS API calls. CloudTrail records actions taken by a user, role, or an AWS service as events in a trail. Events include actions taken in the AWS Management Console, AWS Command Line Interface (CLI), and AWS SDKs and APIs.

Previously, only mutating API calls were published from CloudTrail to EventBridge for control plane changes. Events for mutating API calls include those that create, update, or delete resources. Control plane changes are also referred to as management events. EventBridge now also supports non-mutating, or read-only, API calls for management events at no additional cost. These include calls that list, get, or describe resources.

CloudTrail events in EventBridge enable you to build rich event-driven responses from any action taken on AWS infrastructure in real time. Previously, customers and partners often used a polling model to iterate over a batch of CloudTrail logs from Amazon S3 buckets to detect issues, making it slower to respond. The launch of read-only management events enables you to detect and remediate issues in near real-time and thus improve the overall security posture.

Enabling read-only management events

You can start receiving read-only management events if a CloudTrail trail is configured in your account and the event selector for that trail is configured with a ReadWriteType of either All or ReadOnly. This ensures that read-only management events are logged in the trail and then passed to EventBridge.
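
As a quick check, the following sketch uses boto3 to set the event selector on an existing trail. The trail name "management-events" is a placeholder for your own trail:

import boto3

cloudtrail = boto3.client("cloudtrail")

# "management-events" is a placeholder; use the name of the trail configured
# in your account.
cloudtrail.put_event_selectors(
    TrailName="management-events",
    EventSelectors=[
        {
            # "All" logs both read-only and mutating management events, so
            # they are available to EventBridge.
            "ReadWriteType": "All",
            "IncludeManagementEvents": True,
        }
    ],
)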

For example, you can receive an alert if a production account lists resources from an IP address outside of your VPC. Another example could be if an entity, such as a principal (AWS account root user, IAM role, or IAM user), calls the ListInstanceProfilesForRole or DescribeInstances APIs without any prior record of doing so. A malicious actor could use stolen credentials for conducting this reconnaissance to find more valuable credentials or determine the capabilities of the credentials they have.

To enable read-only management events with an EventBridge rule, in addition to the existing mutating events, use the new ENABLED_WITH_ALL_CLOUDTRAIL_MANAGEMENT_EVENTS state when creating the rule.

Resources:
  SampleMgmtRule:
    Type: 'AWS::Events::Rule'
    Properties:
      Description: 'Example for enabling read-only management events'
      State: ENABLED_WITH_ALL_CLOUDTRAIL_MANAGEMENT_EVENTS
      EventPattern:
        source:
          - aws.s3
        detail-type:
          - AWS API Call via CloudTrail
      Targets:
        - Arn: 'arn:aws:sns:us-east-1:123456789012:notificationTopic'
          Id: 'NotificationTarget'

You can also create a rule on an event bus to specify a particular API action by using the eventName attribute under the detail key:

aws events put-rule --name "SampleTestRule" \
--event-pattern '{"detail": {"eventName": ["ListBuckets"]}}' \
--state ENABLED_WITH_ALL_CLOUDTRAIL_MANAGEMENT_EVENTS --region us-east-1

Common scenarios and use-cases

The following section describes a couple of the scenarios where you can set up EventBridge rules and take actions on non-mutating or read-only management events.

Detecting anomalous Secrets Manager GetSecretValue API Calls

Consider a security team that wants to be notified whenever GetSecretValue API calls for AWS Secrets Manager are made through the CLI. The team can then investigate whether these calls were made by entities outside their organization or by an unauthorized user, and take corrective action to deny such requests.

When the application calls the GetSecretValue API to retrieve the secrets via a CLI, it generates an event like this:

{
    "version": "0",
    "id": "d3368cc1-e6d6-e4bf-e58e-030f03b6eae3",
    "detail-type": "AWS API Call via CloudTrail",
    "source": "aws.secretsmanager",
    "account": "111111111111",
    "time": "2023-11-08T19:58:38Z",
    "region": "us-east-1",
    "resources": [],
    "detail": {
        "eventVersion": "1.08",
        "userIdentity": {
            "type": "IAMUser",
            "principalId": "AAAAAAAAAAAAA",
            "arn": "arn:aws:iam:: 111111111111:user/USERNAME"
            // ... additional detail fields
        },
        "eventTime": "2023-11-08T19:58:38Z",
        "eventSource": "secretsmanager.amazonaws.com",
        "eventName": "GetSecretValue",
        "awsRegion": "us-east-1",
        "userAgent": "aws-cli/2.13.15 Python/3.11.4 Darwin/22.6.0 exe/x86_64 prompt/off command/secretsmanager.get-secret-value"
        // ... additional detail fields
    }
}

You set the following event pattern on the rule to filter incoming events to specific consumers. This example also uses the recently launched wildcard filter for event matching.

{
    "source": [
        "aws.secretsmanager"
    ],
    "detail-type": [
        "AWS API Call via CloudTrail"
    ],
    "detail": {
        "eventName": [
            "GetSecretValue"
        ],
        "userAgent": [
            {
                "wildcard": "aws-cli/*"
            }
        ]
    }
}

You can create a rule matching a combination of these event properties. In this case, you are matching for aws.secretsmanager as source, AWS API Call via CloudTrail as detail-type, GetSecretValue as detail.eventName, and a wildcard pattern on detail.userAgent for aws-cli/*. You can filter detail.userAgent with a wildcard to catch events that come from a particular application or user.

You can then route these events to a target like an Amazon CloudWatch Logs stream to record the change. You can also route them to Amazon SNS to get notified via email subscription. You can alternatively route them to an AWS Lambda function in which you perform custom business logic.
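
For example, the following is a minimal sketch of such a Lambda target. It inspects the CloudTrail detail and publishes a notification only when the call did not originate from a trusted network range; the ALERT_TOPIC_ARN environment variable and the 10.0.0.0/8 range are assumptions of this example:

import ipaddress
import json
import os

import boto3

sns = boto3.client("sns")

# Assumed environment variable pointing to the security team's SNS topic.
ALERT_TOPIC_ARN = os.environ["ALERT_TOPIC_ARN"]

# Assumed corporate network range; calls from outside it are flagged.
TRUSTED_NETWORK = ipaddress.ip_network("10.0.0.0/8")

def lambda_handler(event, context):
    detail = event.get("detail", {})
    source_ip = detail.get("sourceIPAddress", "")

    try:
        trusted = ipaddress.ip_address(source_ip) in TRUSTED_NETWORK
    except ValueError:
        # Calls made by AWS services report a service name, not an IP address.
        trusted = True

    if not trusted:
        # Notify the security team about the matched read-only event.
        sns.publish(
            TopicArn=ALERT_TOPIC_ARN,
            Subject=f"Suspicious {detail.get('eventName')} call",
            Message=json.dumps(detail, indent=2),
        )

    return {"trusted": trusted}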

Creating an EventBridge rule for read-only management events

  1. Create a rule on the default event bus using the new state ENABLED_WITH_ALL_CLOUDTRAIL_MANAGEMENT_EVENTS.
    aws events put-rule --name "monitor-secretsmanager" \
    --event-pattern '{"source": ["aws.secretsmanager"], "detail-type": ["AWS API Call via CloudTrail"], "detail": {"eventName": ["GetSecretValue"], "userAgent": [{ "wildcard": "aws-cli/*"} ]}}' \
    --state ENABLED_WITH_ALL_CLOUDTRAIL_MANAGEMENT_EVENTS --region us-east-1

    Rule details

  2. Configure a target. In this case, the target service is CloudWatch Logs but you can configure any of the supported targets.
    aws events put-targets --rule monitor-secretsmanager --targets Id=1,Arn=arn:aws:logs:us-east-1:ACCOUNT_ID:log-group:/aws/events/getsecretvaluelogs --region us-east-1

    Target details

You can then use CloudWatch Log Insights to search and analyze log data using the CloudWatch Log Insights query syntax where you can retrieve the user who performed these calls.

Identifying suspicious data exfiltration behavior

Consider the security or data perimeter team who wants to secure data residing in Amazon S3 buckets. The team requires notifications whenever API calls to list S3 buckets or to list S3 objects are made.

When a user or application calls the ListBuckets API to discover the available buckets, it generates the following CloudTrail event:

{
    "version": "0",
    "id": "345ca690-6510-85b2-ff02-090493a33cf1",
    "detail-type": "AWS API Call via CloudTrail",
    "source": "aws.s3",
    "account": "111111111111",
    "time": "2023-11-14T17:25:30Z",
    "region": "us-east-1",
    "resources": [],
    "detail": {
        "eventVersion": "1.09",
        "userIdentity": {
            "type": "IAMUser",
            "principalId": "principal-identity-uuid",
            "arn": "arn:aws:iam::111111111111:user/exploited-user",
            "accountId": "111111111111",
            "accessKeyId": "AAAABBBBCCCCDDDDEEEE",
            "userName": "exploited-user"
        },
        "eventTime": "2023-11-14T17:25:30Z",
        "eventSource": "s3.amazonaws.com",
        "eventName": "ListBuckets",
        "awsRegion": "us-east-1",
        "sourceIPAddress": "11.22.33.44",
        "userAgent": "[aws-cli/2.13.29 Python/3.11.6 Darwin/22.6.0 exe/x86_64 prompt/off command/s3api.list-buckets]",
        "requestParameters": {
            "Host": "s3.us-east-1.amazonaws.com"
        },
        "readOnly": true,
        "eventType": "AwsApiCall",
        "managementEvent": true
        // additional detail fields
    }
}

In this scenario, you can create an EventBridge rule matching for aws.s3 for the source field, and ListBuckets for the eventName.

{
    "source": [
        "aws.s3"
    ],
    "detail-type": [
        "AWS API Call via CloudTrail"
    ],
    "detail": {
        "eventName": [
            "ListBuckets "
        ]
    }
}

However, listing buckets alone might only be the beginning of a potential data exfiltration attempt. You may also want to check for ListObjects or ListObjectsV2 as the next action, followed by a large number of GetObject API calls. You can create the following rule to match those actions.

{
    "source": [
        "aws.s3"
    ],
    "detail-type": [
        "AWS API Call via CloudTrail"
    ],
    "detail": {
        "eventName": [
            "ListObjects",
            "ListObjectsV2",
            "GetObject"
        ]
    }
}

You could forward this log information to your central security logging solution, or use anomaly detection machine learning models to evaluate these events and determine the appropriate response.

Configuring cross-account and cross-Region event routing

You can also create rules to route read-only events across accounts or Regions, centralizing your AWS events into one Region or one AWS account for auditing and monitoring purposes. For example, you can capture all workload events from multiple Regions in eu-west-1 for compliance reporting.

Cross Account example for default event bus and custom event bus

To do this, create a rule using the new ENABLED_WITH_ALL_CLOUDTRAIL_MANAGEMENT_EVENTS state on the default bus of the source account or Region, targeting either the default event bus or a custom event bus of the target account or Region. You must also configure a rule with ENABLED_WITH_ALL_CLOUDTRAIL_MANAGEMENT_EVENTS in the destination account or Region so that the targets there are invoked.

Cross-Region setup for CloudTrail read-only events

Conclusion

This blog shows how customers can build rich event-driven responses with the newly launched support for read-only events. You can now observe events as potential signals of reconnaissance and data exfiltration activities from any action taken on AWS infrastructure in near real time. You can also use the cross-Region and cross-account functionality to deliver the read-only events to a centralized AWS account or Region, enhancing the capability for auditing and monitoring across all your AWS environments.

For more serverless learning resources, visit Serverless Land.

Introducing advanced logging controls for AWS Lambda functions

Post Syndicated from David Boyne original https://aws.amazon.com/blogs/compute/introducing-advanced-logging-controls-for-aws-lambda-functions/

This post is written by Nati Goldberg, Senior Solutions Architect and Shridhar Pandey, Senior Product Manager, AWS Lambda

Today, AWS is launching advanced logging controls for AWS Lambda, giving developers and operators greater control over how function logs are captured, processed, and consumed.

This launch introduces three new capabilities to provide a simplified and enhanced default logging experience on Lambda.

First, you can capture Lambda function logs in JSON structured format without having to use your own logging libraries. JSON structured logs make it easier to search, filter, and analyze large volumes of log entries.

Second, you can control the log level granularity of Lambda function logs without making any code changes, enabling more effective debugging and troubleshooting.

Third, you can also set which Amazon CloudWatch log group Lambda sends logs to, making it easier to aggregate and manage logs at scale.

Overview

Being able to identify and filter relevant log messages is essential to troubleshoot and fix critical issues. To help developers and operators monitor and troubleshoot failures, the Lambda service automatically captures and sends logs to CloudWatch Logs.

Previously, Lambda emitted logs in plaintext format, also known as unstructured log format. This unstructured format could make the logs challenging to query or filter. For example, you had to search and correlate logs manually using well-known string identifiers such as “START”, “END”, “REPORT” or the request id of the function invocation. Without a native way to enrich application logs, you needed custom work to extract data from logs for automated analysis or to build analytics dashboards.

Previously, operators could not control the level of log detail generated by functions. They relied on application development teams to make code changes to emit logs with the required granularity level, such as INFO, DEBUG, or ERROR.

Lambda-based applications often comprise microservices, where a single microservice is composed of multiple single-purpose Lambda functions. Before this launch, Lambda sent logs to a default CloudWatch log group created with the Lambda function with no option to select a log group. Now you can aggregate logs from multiple functions in one place so you can uniformly apply security, governance, and retention policies to your logs.

Capturing Lambda logs in JSON structured format

Lambda now natively supports capturing structured logs in JSON format as a series of key-value pairs, making it easier to search and filter logs.

JSON also enables you to add custom tags and contextual information to logs, enabling automated analysis of large volumes of logs to help understand the function performance. The format adheres to the OpenTelemetry (OTel) Logs Data Model, a popular open-source logging standard, enabling you to use open-source tools to monitor functions.

To set the log format in the Lambda console, select the Configuration tab, choose Monitoring and operations tools on the left pane, then change the log format property:

Currently, Lambda natively supports capturing application logs (logs generated by the function code) and system logs (logs generated by the Lambda service) in JSON structured format.

This applies to functions that use non-deprecated versions of the Python, Node.js, and Java Lambda managed runtimes, when using Lambda's recommended logging methods: the logging library for Python, the console object for Node.js, and LambdaLogger or Log4j for Java.
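
For example, a Python function can emit application logs through the standard logging module; when the function's log format is set to JSON, each record is captured as a structured JSON entry containing the timestamp, level, request ID, and message. A minimal sketch:

import logging

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

def lambda_handler(event, context):
    # Captured as structured JSON log entries when the function's log format
    # is set to JSON.
    logger.debug("received event: %s", event)
    logger.info("processing started")
    return {"status": "accepted"}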

For other managed runtimes, Lambda currently only supports capturing system logs in JSON structured format. However, you can still capture application logs in JSON structured format for these runtimes by manually configuring logging libraries. See the configuring advanced logging controls section in the Lambda Developer Guide to learn more. You can also use Powertools for AWS Lambda to capture logs in JSON structured format.

Changing log format from text to JSON can be a breaking change if you parse logs in a telemetry pipeline. AWS recommends testing any existing telemetry pipelines after switching log format to JSON.

Working with JSON structured format for Node.js Lambda functions

You can use JSON structured format with CloudWatch Embedded Metric Format (EMF) to embed custom metrics alongside JSON structured log messages, and CloudWatch automatically extracts the custom metrics for visualization and alarming. However, to use JSON log format along with EMF libraries for Node.js Lambda functions, you must use the latest version of the EMF client library for Node.js or the latest version of Powertools for AWS Lambda (TypeScript) library.

Configuring log level granularity for Lambda function

You can now filter Lambda logs by log level, such as ERROR, DEBUG, or INFO, without code changes. Simplified log level filtering enables you to choose the required logging granularity level for Lambda functions, without sifting through large volumes of logs to debug errors.

You can specify separate log level filters for application logs (which are logs generated by the function code) and system logs (which are logs generated by the Lambda service, such as START and REPORT log messages). Note that log level controls are only available if the log format of the function is set to JSON.

The Lambda console allows setting both the Application log level and System log level properties:

You can define the granularity level of each log event in your function code. The following statement prints out the event input of the function, emitted as a DEBUG log message:

console.debug(event);

Once configured, log events emitted with a lower log level than the one selected are not published to the function’s CloudWatch log stream. For example, setting the function’s log level to INFO results in DEBUG log events being ignored.

This capability allows you to choose the appropriate amount of logs emitted by functions. For example, you can set a higher log level to improve the signal-to-noise ratio in production logs, or set a lower log level to capture detailed log events for testing or troubleshooting purposes.

Customizing Lambda function’s CloudWatch log group

Previously, you could not specify a custom CloudWatch log group for functions, so you could not stream logs from multiple functions into a shared log group. Also, to set a custom retention policy for multiple log groups, you had to create each log group separately using a pre-defined name (for example, /aws/lambda/<function name>).

Now you can select a custom CloudWatch log group to aggregate logs from multiple functions automatically within an application in one place. You can apply security, governance, and retention policies at the application level instead of individually to every function.

To distinguish between logs from different functions in a shared log group, each log stream contains the Lambda function name and version.

You can share the same log group between multiple functions to aggregate logs together. The function's IAM policy must include the logs:CreateLogStream and logs:PutLogEvents permissions for Lambda to create logs in the specified log group. The Lambda service can optionally create these permissions when you configure functions in the Lambda console.

You can set the custom log group in the Lambda console by entering the destination log group name. If the entered log group does not exist, Lambda creates it automatically.

Advanced logging controls for Lambda can be configured using the Lambda API, AWS Management Console, AWS Command Line Interface (CLI), and infrastructure as code (IaC) tools such as AWS Serverless Application Model (AWS SAM) and AWS CloudFormation.

Example of Lambda advanced logging controls

This section demonstrates how to use the new advanced logging controls for Lambda using AWS SAM to build and deploy the resources in your AWS account.

Overview

The following diagram shows Lambda functions processing newly created objects inside an Amazon S3 bucket, where both functions emit logs into the same CloudWatch log group:

The architecture includes the following steps:

  1. A new object is created inside an S3 bucket.
  2. S3 publishes an event using S3 Event Notifications to Amazon EventBridge.
  3. EventBridge triggers two Lambda functions asynchronously.
  4. Each function processes the object to extract labels and text, using Amazon Rekognition and Amazon Textract.
  5. Both functions then emit logs into the same CloudWatch log group.

This uses AWS SAM to define the Lambda functions and configure the required logging controls. The IAM policy allows the function to create a log stream and emit logs to the selected log group:

DetectLabelsFunction:
    Type: AWS::Serverless::Function 
    Properties:
      CodeUri: detect-labels/
      Handler: app.lambdaHandler
      Runtime: nodejs18.x
      Policies:
        ...
        - Version: 2012-10-17
          Statement:
            - Sid: CloudWatchLogGroup
              Action: 
                - logs:CreateLogStream
                - logs:PutLogEvents
              Resource: !GetAtt CloudWatchLogGroup.Arn
              Effect: Allow
      LoggingConfig:
        LogFormat: JSON 
        ApplicationLogLevel: DEBUG 
        SystemLogLevel: INFO 
        LogGroup: !Ref CloudWatchLogGroup 

Deploying the example

To deploy the example:

  1. Clone the GitHub repository and explore the application.
    git clone https://github.com/aws-samples/advanced-logging-controls-lambda/
    
    cd advanced-logging-controls-lambda
  2. Use AWS SAM to build and deploy the resources to your AWS account. This compiles and builds the application using npm, and then populates the template required to deploy the resources:
    sam build
  3. Deploy the solution to your AWS account with a guided deployment, using AWS SAM CLI interactive flow:
    sam deploy --guided
  4. Enter the following values:
    • Stack Name: advanced-logging-controls-lambda
    • Region: your preferred Region (for example, us-east-1)
    • Parameter UploadsBucketName: enter a unique bucket name.
    • Accept the rest of the initial defaults.
  5. To test the application, use the AWS CLI to copy the sample image into the S3 bucket you created.
    aws s3 cp samples/skateboard.jpg s3://example-s3-images-bucket

Explore CloudWatch Logs to view the logs emitted into the log group created, AggregatedLabelsLogGroup:

The DetectLabels Lambda function emits DEBUG log events in JSON format to the log stream. Log events with the same log level from the ExtractText Lambda function are omitted. This is a result of the different application log level settings for each function (DEBUG and INFO).

You can also use CloudWatch Logs Insights to search, filter, and analyze the logs in JSON format using this sample query:

You can see the results:

Conclusion

Advanced logging controls for Lambda give you greater control over logging. Use advanced logging controls to control your Lambda function’s log level and format, allowing you to search, query, and filter logs to troubleshoot issues more effectively.

You can also choose the CloudWatch log group where Lambda sends your logs. This enables you to aggregate logs from multiple functions into a single log group, apply retention, security, governance policies, and easily manage logs at scale.

To get started, specify the required settings in the Logging Configuration for any new or existing Lambda functions.

Advanced logging controls for Lambda are available in all AWS Regions where Lambda is available at no additional cost. Learn more about AWS Lambda Advanced Logging Controls.

For more serverless learning resources, visit Serverless Land.

Introducing the AWS Integrated Application Test Kit (IATK)

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/aws-integrated-application-test-kit/

This post is written by Dan Fox, Principal Specialist Solutions Architect, and Brian Krygsman, Senior Solutions Architect.

Today, AWS announced the public preview launch of the AWS Integrated Application Test Kit (IATK). AWS IATK is a software library that helps you write automated tests for cloud-based applications. This blog post presents several initial features of AWS IATK, and then shows working examples using an example video processing application. If you are getting started with serverless testing, learn more at serverlessland.com/testing.

Overview

When you create applications composed of serverless services like AWS Lambda, Amazon EventBridge, or AWS Step Functions, many of your architecture components cannot be deployed to your desktop, but instead only exist in the AWS Cloud. In contrast to working with applications deployed locally, these types of applications benefit from cloud-based strategies for performing automated tests. For its public preview launch, AWS IATK helps you implement some of these strategies for Python applications. AWS IATK will support other languages in future launches.

Locating resources for tests

When you write automated tests for cloud resources, you need the physical IDs of your resources. The physical ID is the name AWS assigns to a resource after creation. For example, to send requests to Amazon API Gateway you need the physical ID, which forms the API endpoint.

If you deploy cloud resources in separate infrastructure as code stacks, you might have difficulty locating physical IDs. In CloudFormation, you create the logical IDs of the resources in your template, as well as the stack name. With IATK, you can get the physical ID of a resource if you provide the logical ID and stack name. You can also get stack outputs by providing the stack name. These convenient methods simplify locating resources for the tests that you write.

Creating test harnesses for event driven architectures

To write integration tests for event driven architectures, establish logical boundaries by breaking your application into subsystems. Your subsystems should be simple enough to reason about, and contain understandable inputs and outputs. One useful technique for testing subsystems is to create test harnesses. Test harnesses are resources that you create specifically for testing subsystems.

For example, an integration test can begin a subsystem process by passing an input test event to it. IATK can create a test harness for you that listens to Amazon EventBridge for output events. (Under the hood, the harness is composed of an EventBridge Rule that forwards the output event to Amazon Simple Queue Service.) Your integration test then queries the test harness to examine the output and determine if the test passes or fails. These harnesses help you create integration tests in the cloud for event driven architectures.

Establishing service level agreements to test asynchronous features

If you write a synchronous service, your automated tests make requests and expect immediate responses. When your architecture is asynchronous, your service accepts a request and then performs a set of actions at a later time. How can you test for the success of an activity if it does not have a specified duration?

Consider creating reasonable timeouts for your asynchronous systems. Document timeouts as service level agreements (SLAs). You may decide to publish your SLAs externally or to document them as internal standards. IATK contains a polling feature that allows you to establish timeouts. This feature helps you to test that your asynchronous systems complete tasks in a timely manner.

Using AWS X-Ray for detailed testing

If you want to gain more visibility into the interior details of your application, instrument with AWS X-Ray. With AWS X-Ray, you trace the path of an event through multiple services. IATK provides conveniences that help you set the AWS X-Ray sampling rate, get trace trees, and assert for trace durations. These features help you observe and test your distributed systems in greater detail.

Learn more about testing asynchronous architectures at aws-samples/serverless-test-samples.

Overview of the example application

To demonstrate the features of IATK, this post uses a portion of a serverless video application designed with a plugin architecture. A core development team creates the primary application. Distributed development teams throughout the organization create the plugins. One AWS CloudFormation stack deploys the primary application. Separate stacks deploy each plugin.

Communications between the primary application and the plugins are managed by an EventBridge bus. Plugins pull application lifecycle events off the bus and must put completion notification events back on the bus within 20 seconds. For testing, the core team has created an AWS Step Functions workflow that mimics the production process by emitting properly formatted example lifecycle events. Developers run this test workflow in development and test environments to verify that their plugins are communicating properly with the event bus.

The following demonstration shows an integration test for the example application that validates plugin behavior. In the integration test, IATK locates the Step Functions workflow. It creates a test harness to listen for the event completion notification to be sent by the plugin. The test then runs the workflow to begin the lifecycle process and start plugin actions. Then IATK uses a polling mechanism with a timeout to verify that the plugin complies with the 20 second service level agreement. This is the sequence of processing:

  1. The integration test starts an execution of the test workflow.
  2. The workflow puts a lifecycle event onto the bus.
  3. The plugin pulls the lifecycle event from the bus.
  4. When the plugin is complete, it puts a completion event onto the bus.
  5. The integration test polls for the completion event to determine if the test passes within the SLA.

Deploying and testing the example application

Follow these steps to review this application, build it locally, deploy it in your AWS account, and test it.

Downloading the example application

  1. Open your terminal and clone the example application from GitHub with the following command or download the code. This repository also includes other example patterns for testing serverless applications.
    git clone https://github.com/aws-samples/serverless-test-samples
  2. The root of the IATK example application is in python-test-samples/integrated-application-test-kit. Change to this directory:
    cd serverless-test-samples/python-test-samples/integrated-application-test-kit

Reviewing the integration test

Before deploying the application, review how the integration test uses the IATK by opening plugins/2-postvalidate-plugins/python-minimal-plugin/tests/integration/test_by_polling.py in your text editor. The test class instantiates the IATK at the top of the file.

iatk_client = aws_iatk.AwsIatk(region=aws_region)

In the setUp() method, the test class uses IATK to fetch CloudFormation stack outputs. These outputs are references to deployed cloud components like the plugin tester AWS Step Functions workflow:

stack_outputs = self.iatk_client.get_stack_outputs(
    stack_name=self.plugin_tester_stack_name,
    output_names=[
        "PluginLifecycleWorkflow",
        "PluginSuccessEventRuleName"
    ],
)

The test class attaches a listener to the default event bus using an Event Rule provided in the stack outputs. The test uses this listener later to poll for events.

add_listener_output = self.iatk_client.add_listener(
    event_bus_name="default",
    rule_name=self.existing_rule_name
)

The test class cleans up the listener in the tearDown() method.

self.iatk_client.remove_listeners(
    ids=[self.listener_id]
)

Once the configurations are complete, the method test_minimal_plugin_event_published_polling() implements the actual test.

The test first initializes the trigger event.

trigger_event = {
    "eventHook": "postValidate",
    "pluginTitle": "PythonMinimalPlugin"
}

Next, the test starts an execution of the plugin tester Step Functions workflow. It uses the plugin_tester_arn that was fetched during setUp.

self.step_functions_client.start_execution(
    stateMachineArn=self.plugin_tester_arn,
    input=json.dumps(trigger_event)
)

The test polls the listener, waiting for the plugin to emit events. It stops polling once it hits the SLA timeout or receives the maximum number of messages.

poll_output = self.iatk_client.poll_events(
    listener_id=self.listener_id,
    wait_time_seconds=self.SLA_TIMEOUT_SECONDS,
    max_number_of_messages=1,
)

Finally, the test asserts that it receives the right number of events, and that they are well-formed.

self.assertEqual(len(poll_output.events), 1)
self.assertEqual(received_event["source"], "video.plugin.PythonMinimalPlugin")
self.assertEqual(received_event["detail-type"], "plugin-complete")

Installing prerequisites

You need the following prerequisites to build this example:

Build and deploy the example application components

  1. Use AWS SAM to build and deploy the plugin tester to your AWS account. The plugin tester is the Step Functions workflow shown in the preceding diagram. During the build process, you can add the --use-container flag to the build command to instruct AWS SAM to create the application in a provided container. You can accept or override the default values during the deploy process. You will use “Stack Name” and “AWS Region” later to run the integration test.
    cd plugins/plugin_tester # Move to the plugin tester directory
    
    sam build --use-container # Build the plugin tester
    


  2. Deploy the tester:
    sam deploy --guided # Deploy the plugin tester

    Deploy the tester

  3. Once the plugin tester is deployed, use AWS SAM to deploy the plugin.
    cd ../2-postvalidate-plugins/python-minimal-plugin # Move to the plugin directory
    
    sam build --use-container # Build the plugin

    Deploy the plugin

  4. Deploy the plugin:
    sam deploy --guided # Deploy the plugin

Running the test

You can run tests written with IATK using standard Python test runners like unittest and pytest. The example application test uses unittest.

    1. Use a virtual environment to organize your dependencies. From the root of the example application, run:
      python3 -m venv .venv # Create the virtual environment
      source .venv/bin/activate # Activate the virtual environment
      
    2. Install the dependencies, including the IATK:
      cd tests 
      pip3 install -r requirements.txt
      
    3. Run the test, providing the required environment variables from the earlier deployments. You can find correct values in the samconfig.toml file of the plugin_tester directory.

      cd integration
      
      PLUGIN_TESTER_STACK_NAME=video-plugin-tester \
      AWS_REGION=us-west-2 \
      python3 -m unittest ./test_by_polling.py

You should see output as unittest runs the test.

Open the Step Functions console in your AWS account, then choose the PluginLifecycleWorkflow-<random value> workflow to validate that the plugin tester successfully ran. A recent execution shows a Succeeded status:

Recent execution status

Review other IATK features

The example application includes examples of other IATK features like generating mock events and retrieving AWS X-Ray traces.

Cleaning up

Use AWS SAM to clean up both the plugin and the plugin tester resources from your AWS account.

  1. Delete the plugin resources:
    cd ../.. # Move to the plugin directory
    sam delete # Delete the plugin

    Deleting resources

  2. Delete the plugin tester resources:
    cd ../../plugin_tester # Move to the plugin tester directory
    sam delete # Delete the plugin tester

    Deleting the tester

The temporary test harness resources that IATK created during the test are cleaned up when the tearDown method runs. If there are problems during teardown, some resources may not be deleted. IATK adds tags to all resources that it creates. You can use these tags to locate the resources then manually remove them. You can also add your own tags.

Conclusion

The AWS Integrated Application Test Kit is a software library that provides conveniences to help you write automated tests for your cloud applications. This blog post shows some of the features of the initial Python version of the IATK.

To learn more about automated testing for serverless applications, visit serverlessland.com/testing. You can also view code examples at serverlessland.com/testing/patterns or at the AWS serverless-test-samples repository on GitHub.

For more serverless learning resources, visit Serverless Land.

Triggering AWS Lambda function from a cross-account Amazon Managed Streaming for Apache Kafka

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/compute/triggering-aws-lambda-function-from-a-cross-account-amazon-managed-streaming-for-apache-kafka/

This post is written by Subham Rakshit, Senior Specialist Solutions Architect, and Ismail Makhlouf, Senior Specialist Solutions Architect.

Many organizations use a multi-account strategy for stream processing applications. This involves decomposing the overall architecture into a single producer account and many consumer accounts. Within AWS, in the producer account, you can use Amazon Managed Streaming for Apache Kafka (Amazon MSK), and in their consumer accounts have AWS Lambda functions for event consumption. This blog post explains how you can trigger Lambda functions from a cross-account Amazon MSK cluster.

The Lambda event source mapping (ESM) for Amazon MSK continuously polls for new events from the Amazon MSK cluster, aggregates them into batches, and then triggers the target Lambda function. The ESM for Amazon MSK functions as a serverless set of Kafka consumers that ensures each event is processed at least once. Additionally, events are processed in the same order they are received within each Kafka partition. The ESM also filters events based on configured filter criteria before invoking the function.

Overview

Amazon MSK supports two different deployment types: provisioned and serverless. Triggering a Lambda function from a cross-account Amazon MSK cluster is only supported with a provisioned cluster deployed within the same Region. To facilitate this functionality, Amazon MSK uses multi-VPC private connectivity, powered by AWS PrivateLink, which simplifies connecting Kafka consumers hosted in different AWS accounts to an Amazon MSK cluster.

The following diagram illustrates the architecture of this example:

Architecture Diagram

The architecture is divided in two parts: the producer and the consumer.

In the producer account, you have the Amazon MSK cluster with multi-VPC connectivity enabled. Multi-VPC connectivity is only available for authenticated Amazon MSK clusters. Cluster policies are required to grant permissions to other AWS accounts, allowing them to establish private connectivity to the Amazon MSK cluster. You can delegate permissions to relevant roles or users. When combined with AWS Identity and Access Management (IAM) client authentication, cluster policies offer fine-grained control over Kafka data plane permissions for connecting applications.

In the consumer account, you have the Lambda ESM for Amazon MSK and the managed VPC connection deployed within the same VPC. The managed VPC connection allows private connectivity from the consumer application VPC to the Amazon MSK cluster. The Lambda ESM for Amazon MSK connects to the cross-account Amazon MSK cluster via IAM authentication. It also supports SASL/SCRAM and mutual TLS (mTLS) authenticated clusters. The ESM receives the event from the Kafka topic and invokes the Lambda function to process it.
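
The event delivered to the function groups records by topic and partition, with each record value base64 encoded. The following is a minimal sketch of a consumer handler; the print call stands in for your own processing logic, such as writing the records to S3 as this example application does:

import base64

def lambda_handler(event, context):
    processed = 0
    # Records are grouped under "topic-partition" keys; each record value is
    # base64 encoded by the event source mapping.
    for topic_partition, records in event.get("records", {}).items():
        for record in records:
            payload = base64.b64decode(record["value"]).decode("utf-8")
            print(f"{topic_partition} offset={record['offset']}: {payload}")
            processed += 1
    return {"recordsProcessed": processed}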

Deploying the example application

To set up the Lambda function trigger from a cross-account Amazon MSK cluster as the event source, follow these steps. The AWS CloudFormation templates for deploying the example are accessible in the GitHub repository.

As a part of this example, some sample data is published using the Kafka console producer and Lambda processes these events and writes to Amazon S3.

Pre-requisites

For this example, you need two AWS accounts. This post uses the following naming conventions:

  • Producer (for example, account No: 1111 1111 1111): Account that hosts the Amazon MSK cluster and Kafka client instance.
  • Consumer (for example, account No: 2222 2222 2222): Account that hosts the Lambda function and consumes events from Amazon MSK.

To get started:

  1. Clone the repository locally:
    git clone https://github.com/aws-samples/lambda-cross-account-msk.git
  2. Set up the producer account: configure the VPC networking and deploy the Amazon MSK cluster and a Kafka client instance to publish data. To do this, deploy the CloudFormation template producer-account.yaml from the AWS console and take note of the MSKClusterARN from the CloudFormation outputs tab.
  3. Set up the consumer account: you need the Lambda function, the IAM role used by the Lambda function, and the S3 bucket that receives the data. To do this, deploy the CloudFormation template consumer-account.yaml from the AWS console with the input parameter MSKAccountId, which is the producer AWS account ID (for example, account Id: 1111 1111 1111). Note the LambdaRoleArn from the CloudFormation outputs tab.

Setting up multi-VPC connectivity in the Amazon MSK cluster

Once the accounts are created, you must enable connectivity between them. By enabling multi-VPC private connectivity in the Amazon MSK cluster, you set up the network connection to allow the cross-account consumers to connect to the cluster.

  1. In the producer account, navigate to the Amazon MSK console.
  2. Choose producer-cluster, and go to the Properties tab.
  3. Scroll to Networking settings, choose Edit, and select Turn on multi-VPC connectivity. This takes some time to complete, and then appears as follows.
    Networking settings
  4. Add the necessary cluster policy to allow cross-account consumers to connect to Amazon MSK. In the producer account, deploy the CloudFormation template producer-msk-cluster-policy.yaml from the AWS console with the following input parameters:
    • MSKClusterArn – Amazon Resource Name (ARN) of the Amazon MSK cluster in the producer account. Find this information in the CloudFormation output of producer-account.yaml.
    • LambdaRoleArn – ARN of the IAM role attached to the Lambda function in the consumer account. Find this information in the CloudFormation output of consumer-account.yaml.
    • LambdaAccountId – Consumer AWS account ID (for example, account Id: 2222 2222 2222).

Creating a Kafka topic in Amazon MSK and publishing events

In the producer account, navigate to the Amazon MSK console. Choose the Amazon MSK cluster named producer-cluster. Choose View client information to show the bootstrap server.

Client information

The CloudFormation template also deploys a Kafka client instance to create topics and publish events.

To access the client, go to the Amazon EC2 console and choose the instance producer-KafkaClientInstance1. Connect to the EC2 instance using Session Manager:

sudo su - ec2-user
#Set MSK Broker IAM endpoint
export BS=<<Provide IAM bootstrap address here>>

You must use the single-VPC private endpoint for the Amazon MSK cluster, not the multi-VPC private endpoint, because you publish events from a Kafka console producer within the producer account.

Run these scripts to create the customer topic and publish sample events in the topic:

./kafka_create_topic.sh
./kafka_produce_events.sh

Creating a managed VPC connection in the consumer account

To establish a connection to the Amazon MSK cluster in the producer account, you must create a managed VPC connection in the consumer account. Lambda communicates with cross-account Amazon MSK through this managed VPC connection.

For detailed setup steps, read the Amazon MSK managed VPC connection documentation.

Configuring the Lambda ESM for Amazon MSK

The final step is to set up the Lambda ESM for Amazon MSK. Setting up the ESM enables you to connect to the Amazon MSK cluster in the producer account via the managed VPC endpoint. This allows you to trigger the Lambda function to process the data produced from the Kafka topic:

  1. In the consumer account, go to the Lambda console.
  2. Open the Lambda function msk-lambda-cross-account-iam.
  3. Go to the Configuration tab, select Triggers, and choose Add Trigger.
  4. For Trigger configuration, select Amazon MSK.

Lambda trigger

To configure this trigger:

  1. Select the shared Amazon MSK cluster. This automatically defaults to the IAM authentication that is used to connect to the cluster.
    MSK Lambda trigger
  2. By default, the Active trigger check box is enabled. This ensures that the trigger is in the active state after creation. For the other values:
    1. Keep the Batch size default to 100.
    2. Change the Starting Position to Trim horizon.
    3. Set the Topic name as customer.
    4. Set the Consumer Group ID as msk-lambda-iam.

Trigger configuration

Scroll to the bottom and choose Add. This starts creating the Amazon MSK trigger, which takes several minutes. After creation, the state of the trigger shows as Enabled.

Verifying the output on the consumer side

The Lambda function receives the events and writes them in an S3 bucket.

To validate that the function is working, go to the consumer account and navigate to the S3 console. Search for the cross-account-lambda-consumer-data-<<REGION>>-<<AWS Account Id>> bucket. In the bucket, you see the customer-data-<<datetime>>.csv files.

S3 bucket objects

Cleaning up

You must empty and delete the S3 bucket, managed VPC connection, and the Lambda ESM for Amazon MSK manually from the consumer account. Next, delete the CloudFormation stacks from the AWS console from both the producer and consumer accounts to remove all other resources created as a part of the example.

Conclusion

With Lambda and Amazon MSK, you can now build a decentralized application distributed across multiple AWS accounts. This post shows how you can set up Amazon MSK as an event source for cross-account Lambda functions and also walks you through the configuration required in both producer and consumer accounts.

For further reading on AWS Lambda with Amazon MSK as an event source, visit the documentation.

For more serverless learning resources, visit Serverless Land.

Node.js 20.x runtime now available in AWS Lambda

Post Syndicated from Pascal Vogel original https://aws.amazon.com/blogs/compute/node-js-20-x-runtime-now-available-in-aws-lambda/

This post is written by Pascal Vogel, Solutions Architect, and Andrea Amorosi, Senior Solutions Architect.

You can now develop AWS Lambda functions using the Node.js 20 runtime. This Node.js version is in active LTS status and ready for general use. To use this new version, specify a runtime parameter value of nodejs20.x when creating or updating functions or by using the appropriate container base image.

You can use Node.js 20 with Powertools for AWS Lambda (TypeScript), a developer toolkit to implement serverless best practices and increase developer velocity. Powertools for AWS Lambda includes proven libraries to support common patterns such as observability, Parameter Store integration, idempotency, batch processing, and more.

You can also use Node.js 20 with Lambda@Edge, allowing you to customize low-latency content delivered through Amazon CloudFront.

This blog post highlights important changes to the Node.js runtime, notable Node.js language updates, and how you can use the new Node.js 20 runtime in your serverless applications.

Node.js 20 runtime updates

Changes to Root CA certificate loading

By default, Node.js includes root certificate authority (CA) certificates from well-known certificate providers. Earlier Lambda Node.js runtimes up to Node.js 18 augmented these certificates with Amazon-specific CA certificates, making it easier to create functions accessing other AWS services. For example, it included the Amazon RDS certificates necessary for validating the server identity certificate installed on your Amazon RDS database.

However, loading these additional certificates has a performance impact during cold start. Starting with Node.js 20, Lambda no longer loads these additional CA certificates by default. The Node.js 20 runtime contains a certificate file with all Amazon CA certificates located at /var/runtime/ca-cert.pem. By setting the NODE_EXTRA_CA_CERTS environment variable to /var/runtime/ca-cert.pem, you can restore the behavior from Node.js 18 and earlier runtimes.

This causes Node.js to validate and load all Amazon CA certificates during a cold start. It can take longer compared to loading only specific certificates. For the best performance, we recommend bundling only the certificates that you need with your deployment package and loading them via NODE_EXTRA_CA_CERTS. The certificates file should consist of one or more trusted root or intermediate CA certificates in PEM format.

For example, for RDS, include the required certificates alongside your code as certificates/rds.pem and then load it as follows:

NODE_EXTRA_CA_CERTS=/var/task/certificates/rds.pem

See Using Lambda environment variables in the AWS Lambda Developer Guide for detailed instructions for setting environment variables.

Amazon Linux 2023

The Node.js 20 runtime is based on the provided.al2023 runtime. The provided.al2023 runtime in turn is based on the Amazon Linux 2023 minimal container image release and brings several improvements over Amazon Linux 2 (AL2).

provided.al2023 contains only the essential components necessary to install other packages and offers a smaller deployment footprint with a compressed image size of less than 40MB compared to the over 100MB AL2-based base image.

With glibc version 2.34, customers have access to a more recent version of glibc, updated from version 2.26 in AL2-based images.

The Amazon Linux 2023 minimal image uses microdnf as its package manager, symlinked as dnf, replacing yum in AL2-based images. Additionally, curl and gnupg2 are included as their minimal versions, curl-minimal and gnupg2-minimal.

Learn more about the provided.al2023 runtime in the blog post Introducing the Amazon Linux 2023 runtime for AWS Lambda and the Amazon Linux 2023 launch blog post.

Runtime Interface Client

The Node.js 20 runtime uses the open source AWS Lambda NodeJS Runtime Interface Client (RIC). You can now use the same RIC version in your Open Container Initiative (OCI) Lambda container images as the one used by the managed Node.js 20 runtime.

The Node.js 20 runtime supports Lambda response streaming which enables you to send response payload data to callers as it becomes available. Response streaming can improve application performance by reducing time-to-first-byte, can indicate progress during long-running tasks, and allows you to build functions that return payloads larger than 6MB, which is the Lambda limit for buffered responses.

Setting Node.js heap memory size

Node.js allows you to configure the heap size of the v8 engine via the --max-old-space-size and --max-semi-space-size options. By default, Lambda overrides the Node.js default values with values derived from the memory size configured for the function. If you need control over your runtime’s memory allocation, you can now set both of these options using the NODE_OPTIONS environment variable, without needing an exec wrapper script. See Using Lambda environment variables in the AWS Lambda Developer Guide for details.

Use the --max-old-space-size option to set the max memory size of V8’s old memory section, and the --max-semi-space-size option to set the maximum semispace size for V8’s garbage collector. See the Node.js documentation for more details on these options.

Node.js 20 language updates

Language features

With this release, Lambda customers can take advantage of new Node.js 20 language features, including:

  • HTTP(S)/1.1 default keepAlive: Node.js now sets keepAlive to true by default. Any outgoing HTTPS connections use HTTP 1.1 keep-alive with a default waiting window of 5 seconds. This can deliver improved throughput as connections are reused by default.
  • Fetch API is enabled by default: The global Node.js Fetch API is enabled by default. However, it is still an experimental module.
  • Faster URL parsing: Node.js 20 comes with the Ada 2.0 URL parser which brings performance improvements to URL parsing. This has also been back-ported to Node.js 18.7.0.
  • Web Crypto API now stable: The Node.js implementation of the standard Web Crypto API has been marked as stable. You can access the provided cryptographic primitives through globalThis.crypto.
  • Web assembly support: Node.js 20 enables the experimental WebAssembly System Interface (WASI) API by default without the need to set an experimental flag.
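
The following sketch exercises two of these features in a Lambda handler: the global Fetch API and the Web Crypto API. The URL is a placeholder and the hashing step is purely illustrative.

// index.mjs: illustrative use of the global fetch and Web Crypto APIs in Node.js 20
export const handler = async () => {
  // Fetch a document with the built-in Fetch API (no need for node-fetch or axios)
  const response = await fetch("https://example.com/data.json"); // placeholder URL
  const body = await response.text();

  // Hash the payload with the standard Web Crypto API, available via globalThis.crypto
  const digest = await crypto.subtle.digest("SHA-256", new TextEncoder().encode(body));
  const hashHex = Buffer.from(digest).toString("hex");

  return { statusCode: 200, body: JSON.stringify({ hash: hashHex }) };
};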

For a detailed overview of Node.js 20 language features, see the Node.js 20 release blog post and the Node.js 20 changelog.

Performance considerations

Node.js 19.3 introduced a change to how non-essential modules are lazy-loaded during process startup. For Lambda functions, this reduces the work performed during initialization of each execution environment; if your code uses these modules, they are loaded during the first function invocation instead. This change remains in Node.js 20.

Builders should continue to measure and test function performance and optimize function code and configuration for any impact. To learn more about how to optimize Node.js performance in Lambda, see Performance optimization in the Lambda Operator Guide, and our blog post Optimizing Node.js dependencies in AWS Lambda.

Migration from earlier Node.js runtimes

Migration from Node.js 16

Lambda occasionally delays deprecation of a Lambda runtime for a limited period beyond the end of support date of the language version that the runtime supports. During this period, Lambda only applies security patches to the runtime OS. Lambda doesn’t apply security patches to programming language runtimes after they reach their end of support date.

In the case of Node.js 16, we have delayed deprecation from the community end of support date on September 11, 2023, to June 12, 2024. This gives customers the opportunity to migrate directly from Node.js 16 to Node.js 20, skipping Node.js 18.

AWS SDK for JavaScript

Up until Node.js 16, Lambda’s Node.js runtimes included the AWS SDK for JavaScript version 2. This has since been superseded by the AWS SDK for JavaScript version 3, which was released in December 2020. Starting with Node.js 18, and continuing with Node.js 20, the Lambda Node.js runtimes have upgraded the version of the AWS SDK for JavaScript included in the runtime from v2 to v3. Customers upgrading from Node.js 16 or earlier runtimes who are using the included AWS SDK for JavaScript v2 should upgrade their code to use the v3 SDK.

For optimal performance, and to have full control over your code dependencies, we recommend bundling and minifying the AWS SDK in your deployment package, rather than using the SDK included in the runtime. For more information, see Optimizing Node.js dependencies in AWS Lambda.
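
As an illustration of the v3 programming model, the following sketch reads an item from Amazon DynamoDB using the modular client and command pattern. The table name and key are placeholders.

// index.mjs: reading an item with the AWS SDK for JavaScript v3
import { DynamoDBClient, GetItemCommand } from "@aws-sdk/client-dynamodb";

// Instantiate the client once, outside the handler, so it is reused across invocations
const client = new DynamoDBClient({});

export const handler = async () => {
  // v3 uses a command object per operation instead of the v2 service object methods
  const result = await client.send(
    new GetItemCommand({
      TableName: "my-table",             // placeholder table name
      Key: { pk: { S: "example-key" } }, // placeholder key
    })
  );
  return result.Item;
};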

Using the Node.js 20 runtime in AWS Lambda

AWS Management Console

To use the Node.js 20 runtime to develop your Lambda functions, specify a runtime parameter value Node.js 20.x when creating or updating a function. The Node.js 20 runtime version is now available in the Runtime dropdown on the Create function page in the AWS Lambda console:

Select Node.js 20.x when creating a new AWS Lambda function in the AWS Management Console

To update an existing Lambda function to Node.js 20, navigate to the function in the Lambda console, then choose Edit in the Runtime settings panel. The new version of Node.js is available in the Runtime dropdown:

Select Node.js 20.x when updating an existing AWS Lambda function in the AWS Management Console

AWS Lambda – Container Image

Change the Node.js base image version by modifying the FROM statement in your Dockerfile:


FROM public.ecr.aws/lambda/nodejs:20
# Copy function code (index.js here is an example handler file name)
COPY index.js ${LAMBDA_TASK_ROOT}
# Set the handler in the format fileName.exportedFunction
CMD ["index.handler"]

Customers running Node.js 20 Docker images locally, including customers using AWS SAM, will need to upgrade their Docker install to version 20.10.10 or later.

AWS Serverless Application Model (AWS SAM)

In AWS SAM, set the Runtime attribute to node20.x to use this version:


AWSTemplateFormatVersion: "2010-09-09"
Transform: AWS::Serverless-2016-10-31

Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: index.handler # exported handler in index.js/index.mjs
      Runtime: nodejs20.x
      CodeUri: my_function/
      Description: My Node.js Lambda Function

AWS Cloud Development Kit (AWS CDK)

In the AWS CDK, set the runtime attribute to Runtime.NODEJS_20_X to use this version:


import * as cdk from "aws-cdk-lib";
import * as lambda from "aws-cdk-lib/aws-lambda";
import * as path from "path";
import { Construct } from "constructs";

export class CdkStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // The code that defines your stack goes here

    // The Node.js 20 enabled Lambda Function
    const lambdaFunction = new lambda.Function(this, "node20LambdaFunction", {
      runtime: lambda.Runtime.NODEJS_20_X,
      code: lambda.Code.fromAsset(path.join(__dirname, "/../lambda")),
      handler: "index.handler",
    });
  }
}

Conclusion

Lambda now supports Node.js 20. This release uses the Amazon Linux 2023 OS, supports configurable CA certificate loading for faster cold starts, as well as other improvements detailed in this blog post.

You can build and deploy functions using the Node.js 20 runtime using the AWS Management Console, AWS CLI, AWS SDK, AWS SAM, AWS CDK, or your choice of Infrastructure as Code (IaC). You can also use the Node.js 20 container base image if you prefer to build and deploy your functions using container images.

The Node.js 20 runtime empowers developers to build more efficient, powerful, and scalable serverless applications. Try the Node.js runtime in Lambda today and read about the Node.js programming model in the Lambda documentation to learn more about writing functions in Node.js 20.

For more serverless learning resources, visit Serverless Land.

Converting Apache Kafka events from Avro to JSON using EventBridge Pipes

Post Syndicated from Pascal Vogel original https://aws.amazon.com/blogs/compute/converting-apache-kafka-events-from-avro-to-json-using-eventbridge-pipes/

This post is written by Pascal Vogel, Solutions Architect, and Philipp Klose, Global Solutions Architect.

Event streaming with Apache Kafka has become an important element of modern data-oriented and event-driven architectures (EDAs), unlocking use cases such as real-time analytics of user behavior, anomaly and fraud detection, and Internet of Things event processing. Stream producers and consumers in Kafka often use schema registries to ensure that all components follow agreed-upon event structures when sending (serializing) and processing (deserializing) events to avoid application bugs and crashes.

A common schema format in Kafka is Apache Avro, which supports rich data structures in a compact binary format. To integrate Kafka with other AWS and third-party services more easily, AWS offers Amazon EventBridge Pipes, a serverless point-to-point integration service. However, many downstream services expect JSON-encoded events, requiring custom and repetitive schema validation and conversion logic from Avro to JSON in each downstream service.

This blog post shows how to reliably consume, validate, convert, and send Avro events from Kafka to AWS and third-party services using EventBridge Pipes, allowing you to reduce custom deserialization logic in downstream services. You can also use EventBridge event buses as targets in Pipes to filter and distribute events from Pipes to multiple targets, including cross-account and cross-Region delivery.

This blog describes two scenarios:

  1. Using Amazon Managed Streaming for Apache Kafka (Amazon MSK) and AWS Glue Schema Registry.
  2. Using Confluent Cloud and the Confluent Schema Registry.

See the associated GitHub repositories for Glue Schema Registry or Confluent Schema Registry for full source code and detailed deployment instructions.

Kafka event streaming and schema validation on AWS

To build event streaming applications with Kafka on AWS, you can use Amazon MSK, offerings such as Confluent Cloud, or self-hosted Kafka on Amazon Elastic Compute Cloud (Amazon EC2) instances.

To avoid common issues in event streaming and event-driven architectures, such as data inconsistencies and incompatibilities, it is a recommended practice to define and share event schemas between event producers and consumers. In Kafka, schema registries are used to manage, evolve, and enforce schemas for event producers and consumers. The AWS Glue Schema Registry provides a central location to discover, manage, and evolve schemas. In the case of Confluent Cloud, the Confluent Schema Registry serves the same role. Both the Glue Schema Registry and the Confluent Schema Registry support common schema formats such as Avro, Protobuf, and JSON.

To integrate Kafka with AWS services, third-party services, and your own applications, you can use EventBridge Pipes. EventBridge Pipes helps you create point-to-point integrations between event sources and targets with optional filtering, transformation, and enrichment. EventBridge Pipes reduces the amount of integration code that you have to write and maintain when building EDAs.

Many AWS and third-party services expect JSON-encoded payloads (events) as input, meaning they cannot directly consume Avro or Protobuf payloads. To replace repetitive Avro-to-JSON validation and conversion logic in each consumer, you can use the EventBridge Pipes enrichment step. This solution uses an AWS Lambda function in the enrichment step to deserialize and validate Kafka events with a schema registry, including error handling with dead-letter queues, and convert events to JSON before passing them to downstream services.

Solution overview

Architecture overview of the solution

The solution presented in this blog post consists of the following key elements:

  1. The source of the pipe is a Kafka cluster deployed using MSK or Confluent Cloud. EventBridge Pipes reads events from the Kafka stream in batches and sends them to the enrichment function (see here for an example event).
  2. The enrichment step (Lambda function) deserializes and validates the events against the configured schema registry (Glue or Confluent), converts events from Avro to JSON with integrated error handling, and returns them to the pipe.
  3. The target of this example solution is an EventBridge custom event bus that is invoked by EventBridge Pipes with JSON-encoded events returned by the enrichment Lambda function. EventBridge Pipes supports a variety of other targets, including Lambda, AWS Step Functions, Amazon API Gateway, API destinations, and more, enabling you to build EDAs without writing integration code.
  4. In this sample solution, the event bus sends all events to Amazon CloudWatch Logs via an EventBridge rule. You can extend the example to invoke additional EventBridge targets.

Optionally, you can add OpenAPI 3 or JSONSchema Draft 4 schemas for your events in the EventBridge schema registry, either by generating them manually from the Avro schema or by using EventBridge schema discovery. This allows you to download code bindings for the JSON-converted events for various programming languages, such as JavaScript, Python, and Java, to correctly use them in your EventBridge targets.

The remainder of this blog post describes this solution for the Glue and Confluent schema registries with code examples.

EventBridge Pipes with the Glue Schema Registry

This section describes how to implement event schema validation and conversion from Avro to JSON using EventBridge Pipes and the Glue Schema Registry. You can find the source code and detailed deployment instructions on GitHub.

Prerequisites

You need an Amazon MSK Serverless cluster running and the Glue Schema Registry configured. This example includes an Avro schema and a Glue Schema Registry. See the following AWS blog post for an introduction to schema validation with the Glue Schema Registry: Validate, evolve, and control schemas in Amazon MSK and Amazon Kinesis Data Streams with AWS Glue Schema Registry.

EventBridge Pipes configuration

Use the AWS Cloud Development Kit (AWS CDK) template provided in the GitHub repository to deploy:

  1. An EventBridge pipe that connects to your existing Amazon MSK Serverless Kafka topic as the source via AWS Identity and Access Management (IAM) authentication.
  2. EventBridge Pipes reads events from your Kafka topic using the Amazon MSK source type.
  3. An enrichment Lambda function in Java to perform event deserialization, validation, and conversion from Avro to JSON.
  4. An Amazon Simple Queue Service (Amazon SQS) dead letter queue to hold events for which deserialization failed.
  5. An EventBridge custom event bus as the pipe target. An EventBridge rule sends all incoming events into a CloudWatch Logs log group.

For MSK-based sources, EventBridge supports configuration parameters, such as batch window, batch size, and starting position, which you can set using the parameters of the CfnPipe class in the example CDK stack.

The example EventBridge pipe consumes events from Kafka in batches of 10 because it is targeting an EventBridge event bus, which has a max batch size of 10. See batching and concurrency in the EventBridge Pipes User Guide to choose an optimal configuration for other targets.

EventBridge Pipes with the Confluent Schema Registry

This section describes how to implement event schema validation and conversion from Avro to JSON using EventBridge Pipes and the Confluent Schema Registry. You can find the source code and detailed deployment instructions on GitHub.

Prerequisites

To set up this solution, you need a Kafka stream running on Confluent Cloud as well as the Confluent Schema Registry set up. See the corresponding Schema Registry tutorial for Confluent Cloud to set up a schema registry for your Confluent Kafka stream.

To connect to your Confluent Cloud Kafka cluster, you need an API key for Confluent Cloud and Confluent Schema Registry. AWS Secrets Manager is used to securely store your Confluent secrets.

EventBridge Pipes configuration

Use the AWS CDK template provided in the GitHub repository to deploy:

  1. An EventBridge pipe that connects to your existing Confluent Kafka topic as the source via an API secret stored in Secrets Manager.
  2. EventBridge Pipes reads events from your Confluent Kafka topic using the self-managed Apache Kafka stream source type, which includes all non-MSK Kafka clusters.
  3. An enrichment Lambda function in Python to perform event deserialization, validation, and conversion from Avro to JSON.
  4. An SQS dead letter queue to hold events for which deserialization failed.
  5. An EventBridge custom event bus as the pipe target. An EventBridge rule writes all incoming events into a CloudWatch Logs log group.

For self-managed Kafka sources, EventBridge supports configuration parameters, such as batch window, batch size, and starting position, which you can set using the parameters of the CfnPipe class in the example CDK stack.

The example EventBridge pipe consumes events from Kafka in batches of 10 because it is targeting an EventBridge event bus, which has a max batch size of 10. See batching and concurrency in the EventBridge Pipes User Guide to choose an optimal configuration for other targets.

Enrichment Lambda functions

Both of the solutions described previously include an enrichment Lambda function for schema validation and conversion from Avro to JSON.

The Java Lambda function integrates with the Glue Schema Registry using the AWS Glue Schema Registry Library. The Python Lambda function integrates with the Confluent Schema Registry using the confluent-kafka library and uses Powertools for AWS Lambda (Python) to implement Serverless best practices such as logging and tracing.

The enrichment Lambda functions perform the following tasks (a simplified sketch follows the list):

  1. In the events polled from the Kafka stream by the EventBridge pipe, the key and value of the event are base64 encoded. Therefore, for each event in the batch passed to the function, the key and the value are decoded.
  2. The event key is assumed to be serialized by the producer as a string type.
  3. The event value is deserialized using the Glue Schema registry Serde (Java) or the confluent-kafka AvroDeserializer (Python).
  4. The function then returns the successfully converted JSON events to the EventBridge pipe, which then invokes the target for each of them.
  5. Events for which Avro deserialization failed are sent to the SQS dead letter queue.
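
The sample repositories implement this logic in Java (Glue Schema Registry) and Python (Confluent Schema Registry). As a simplified illustration of the enrichment contract only, the following Node.js sketch assumes the batch arrives as an array of Kafka records, uses the avsc npm package with a hard-coded Avro schema instead of a registry lookup, and omits the dead-letter queue handling of the real functions.

// index.mjs: simplified enrichment sketch (not the Java or Python implementations from the repos)
import avro from "avsc";

// Placeholder schema; the sample solutions resolve the schema from Glue or Confluent instead
const orderType = avro.Type.forSchema({
  type: "record",
  name: "Order",
  fields: [
    { name: "orderId", type: "string" },
    { name: "amount", type: "double" },
  ],
});

export const handler = async (records) => {
  // EventBridge Pipes passes a batch of Kafka records; keys and values are base64 encoded
  return records.map((record) => {
    const key = Buffer.from(record.key ?? "", "base64").toString("utf8");

    // The registry wire formats prefix the Avro payload with schema metadata;
    // for simplicity this sketch assumes a plain Avro-encoded value
    const valueBuffer = Buffer.from(record.value, "base64");
    const decoded = orderType.fromBuffer(valueBuffer);

    // Return the JSON-converted event to the pipe, which forwards it to the target
    return { key, topic: record.topic, payload: decoded };
  });
};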

Conclusion

This blog post shows how to implement event consumption, Avro schema validation, and conversion to JSON using Amazon EventBridge Pipes, Glue Schema Registry, and Confluent Schema Registry.

The source code for the presented example is available in the AWS Samples GitHub repository for Glue Schema Registry and Confluent Schema Registry. For more patterns, visit the Serverless Patterns Collection.

For more serverless learning resources, visit Serverless Land.

Enhanced Amazon CloudWatch metrics for Amazon EventBridge

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/enhanced-amazon-cloudwatch-metrics-for-amazon-eventbridge/

This post is written by Vaibhav Shah, Sr. Solutions Architect.

Customers use event-driven architectures to orchestrate and automate their event flows from producers to consumers. Amazon EventBridge acts as a serverless event router for various targets based on event rules. It decouples the producers and consumers, allowing customers to build asynchronous architectures.

EventBridge provides metrics to enable you to monitor your events. These include the number of partner events ingested, the number of invocations that failed permanently, the number of times a target is invoked by a rule in response to an event, and the number of events that matched any rule.

In response to customer requests, EventBridge has added additional metrics that allow customers to monitor their events and provide additional visibility. This blog post explains these new capabilities.

What’s new?

EventBridge has added new metrics around APIs, events, and invocations. These metrics give you insight into the total number of events published, successful and failed events, the number of events matched with any or a specific rule, events rejected because of throttling, latency, and invocations.

This allows you to track the entire span of event flow within EventBridge and quickly identify and resolve issues as they arise.

EventBridge now has the following metrics:

Each metric is listed with its description, dimensions, and units.

  • PutEventsLatency: The time taken per PutEvents API operation. Dimensions: None. Units: Milliseconds.
  • PutEventsRequestSize: The size of the PutEvents API request in bytes. Dimensions: None. Units: Bytes.
  • MatchedEvents: The number of events that matched with any rule, or with a specific rule. Dimensions: None, RuleName, EventBusName, EventSourceName. Units: Count.
  • ThrottledRules: The number of times rule execution was throttled. Dimensions: None, RuleName. Units: Count.
  • PutEventsApproximateCallCount: The approximate total number of PutEvents API calls. Dimensions: None. Units: Count.
  • PutEventsApproximateThrottledCount: The approximate number of throttled PutEvents API calls. Dimensions: None. Units: Count.
  • PutEventsApproximateFailedCount: The approximate number of failed PutEvents API calls. Dimensions: None. Units: Count.
  • PutEventsApproximateSuccessCount: The approximate number of successful PutEvents API calls. Dimensions: None. Units: Count.
  • PutEventsEntriesCount: The number of event entries contained in a PutEvents request. Dimensions: None. Units: Count.
  • PutEventsFailedEntriesCount: The number of event entries in a PutEvents request that failed to be ingested. Dimensions: None. Units: Count.
  • PutPartnerEventsApproximateCallCount: The approximate total number of PutPartnerEvents API calls (visible in the partner’s account). Dimensions: None. Units: Count.
  • PutPartnerEventsApproximateThrottledCount: The approximate number of throttled PutPartnerEvents API calls (visible in the partner’s account). Dimensions: None. Units: Count.
  • PutPartnerEventsApproximateFailedCount: The approximate number of failed PutPartnerEvents API calls (visible in the partner’s account). Dimensions: None. Units: Count.
  • PutPartnerEventsApproximateSuccessCount: The approximate number of successful PutPartnerEvents API calls (visible in the partner’s account). Dimensions: None. Units: Count.
  • PutPartnerEventsEntriesCount: The number of event entries contained in a PutPartnerEvents request. Dimensions: None. Units: Count.
  • PutPartnerEventsFailedEntriesCount: The number of event entries in a PutPartnerEvents request that failed to be ingested. Dimensions: None. Units: Count.
  • PutPartnerEventsLatency: The time taken per PutPartnerEvents API operation (visible in the partner’s account). Dimensions: None. Units: Milliseconds.
  • InvocationsCreated: The number of times a target is invoked by a rule in response to an event. One invocation attempt represents a single count for this metric. Dimensions: None. Units: Count.
  • InvocationAttempts: The number of times EventBridge attempted invoking a target. Dimensions: None. Units: Count.
  • SuccessfulInvocationAttempts: The number of times a target was successfully invoked. Dimensions: None. Units: Count.
  • RetryInvocationAttempts: The number of times a target invocation has been retried. Dimensions: None. Units: Count.
  • IngestiontoInvocationStartLatency: The time to process events, measured from when an event is ingested by EventBridge to the first invocation of a target. Dimensions: None, RuleName, EventBusName. Units: Milliseconds.
  • IngestiontoInvocationCompleteLatency: The time taken from event ingestion to completion of the first successful invocation attempt. Dimensions: None, RuleName, EventBusName. Units: Milliseconds.

Use-cases for these metrics

These new metrics help you improve observability and monitoring of your event-driven applications. You can proactively monitor metrics that help you understand the event flow, invocations, latency, and service utilization. You can also set up alerts on specific metrics and take necessary actions, which help improve your application performance, proactively manage quotas, and improve resiliency.

Monitor service usage based on Service Quotas

The PutEventsApproximateCallCount metric in the events family helps you identify the approximate number of events published on the event bus using the PutEvents API action. The PutEventsApproximateSuccessCount metric shows the approximate number of successful events published on the event bus.

Similarly, you can monitor throttled and failed event counts with PutEventsApproximateThrottledCount and PutEventsApproximateFailedCount respectively. These metrics allow you to monitor whether you are approaching your PutEvents quota. You can create a CloudWatch alarm with a threshold close to your account quota and, if it is triggered, send notifications to your operations team using Amazon SNS so that they can request a Service Quotas increase.
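
As an illustration, the following sketch creates such an alarm on the PutEventsApproximateThrottledCount metric using the AWS SDK for JavaScript v3. The threshold, period, and SNS topic ARN are placeholders to adjust for your account.

// create-throttle-alarm.mjs: illustrative CloudWatch alarm on throttled PutEvents calls
import { CloudWatchClient, PutMetricAlarmCommand } from "@aws-sdk/client-cloudwatch";

const cloudwatch = new CloudWatchClient({});

await cloudwatch.send(
  new PutMetricAlarmCommand({
    AlarmName: "PutEventsApproximateThrottledCountAlarm",
    Namespace: "AWS/Events",                          // EventBridge metrics namespace
    MetricName: "PutEventsApproximateThrottledCount",
    Statistic: "Sum",
    Period: 60,                                       // evaluate one-minute periods
    EvaluationPeriods: 1,
    Threshold: 100,                                   // placeholder threshold
    ComparisonOperator: "GreaterThanOrEqualToThreshold",
    AlarmActions: ["arn:aws:sns:us-east-1:111122223333:ops-notifications"], // placeholder SNS topic
  })
);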

You can also set an alarm on the PutEvents throttle limit in transactions per second service quota.

  1. Navigate to the Service Quotas console. On the left pane, choose AWS services, search for EventBridge, and select Amazon EventBridge (CloudWatch Events).
  2. In the Monitoring section, you can monitor the percentage utilization of the PutEvents throttle limit in transactions per second.
    Monitor the percentage utilization of PutEvents
  3. Go to the Alarms tab, and choose Create alarm. In Alarm threshold, choose 80% of the applied quota value from the dropdown. Set the Alarm name to PutEventsThrottleAlarm, and choose Create.
    Create alarm
  4. To be notified if this threshold is breached, navigate to the Amazon CloudWatch Alarms console and choose PutEventsThrottleAlarm.
  5. Select the Actions dropdown from the top right corner, and choose Edit.
  6. On the Specify metric and conditions page, under Conditions, make sure that the Threshold type is set to Static and the condition is Greater/Equal than 80 (% Utilization). Choose Next.
    Specify metrics and conditions
  7. Configure actions to send notifications to an Amazon SNS topic and choose Next.
    Configure actions to send notifications
  8. The Alarm name should be already set to PutEventsThrottleAlarm. Choose Next, then choose Update alarm.
    Add name and description

This helps you get notified when the percentage utilization of PutEvents throttle limit in transactions per second reaches close to the threshold set. You can then request Service Quota increases if required.

Similarly, you can also create CloudWatch alarms on percentage utilization of Invocations throttle limit in transactions per second against the service quota.

Invocations throttle limit in transactions per second

Enhanced observability

The PutEventsLatency metric shows the time taken per PutEvents API operation. There are two additional latency metrics: IngestiontoInvocationStartLatency and IngestiontoInvocationCompleteLatency. The first shows the time to process events, measured from when the events are first ingested by EventBridge to the first invocation of a target. The second shows the time taken from event ingestion to completion of the first successful invocation attempt.

This helps identify latency-related issues from the time of ingestion until the time it reaches the target based on the RuleName. If there is high latency, these two metrics give you visibility into this issue, allowing you to take appropriate action.


You can set thresholds on these metrics and, if a threshold is breached, trigger actions that help recover from potential failures. One such action is to send subsequent events to EventBridge in a secondary Region using EventBridge global endpoints.

Sometimes, events are not delivered to the target specified in the rule. This can happen when the target resource is unavailable, when you don’t have permission to invoke the target, or when there are network issues. In such scenarios, EventBridge retries sending the events to the target for up to 24 hours or up to 185 times, both of which are configurable.

The new RetryInvocationAttempts metric shows the number of times EventBridge has retried invoking a target. Retries occur when requests are throttled, or when the target has availability issues, network issues, or service failures. This provides additional observability and can be used to trigger a CloudWatch alarm to notify teams if a defined threshold is crossed. If the retries are exhausted, store the failed events in an Amazon SQS dead-letter queue to process them later.

In addition to these, EventBridge supports additional dimensions such as DetailType, Source, and RuleName for the MatchedEvents metric. This helps you monitor the number of matched events coming from different sources.

  1. Navigate to the Amazon CloudWatch console. On the left pane, choose Metrics, and then All metrics.
  2. In the Browse section, select Events, and then Source.
  3. From the Graphed metrics tab, you can monitor matched events coming from different sources.
    Graphed metrics tab

Failover events to secondary Region

The PutEventsFailedEntriesCount metric shows the number of events that failed ingestion. Monitor this metric and set a CloudWatch alarm. If it crosses a defined threshold, you can then take appropriate action.

Also, set an alarm on the PutEventsApproximateThrottledCount metric, which shows the number of events rejected because of throttling constraints. For these ingestion failures, the client must resend the failed events to the event bus, ensuring that every event critical to your application is processed.
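
A minimal sketch of this client-side retry with the AWS SDK for JavaScript v3 is shown below. The event bus name and event contents are placeholders, and a production implementation would add exponential backoff and a retry limit.

// put-events-with-retry.mjs: resend entries that failed ingestion (illustrative sketch)
import { EventBridgeClient, PutEventsCommand } from "@aws-sdk/client-eventbridge";

const client = new EventBridgeClient({});

export async function publishWithRetry(entries) {
  let pending = entries;
  while (pending.length > 0) {
    const response = await client.send(new PutEventsCommand({ Entries: pending }));
    if (!response.FailedEntryCount) {
      return; // every entry was ingested
    }
    // The result entries are returned in request order; keep only the ones that failed
    pending = pending.filter((_, index) => response.Entries[index].ErrorCode);
  }
}

// Example usage with a placeholder event bus and event
await publishWithRetry([
  {
    EventBusName: "my-event-bus",
    Source: "com.example.orders",
    DetailType: "OrderCreated",
    Detail: JSON.stringify({ orderId: "123" }),
  },
]);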

Alternatively, send events to EventBridge service in the secondary Region using Amazon EventBridge global endpoints to improve resiliency of your event-driven applications.

Conclusion

This blog shows how to use these new metrics to improve the visibility of event flows in your event-driven applications. It helps you monitor events more effectively, from ingestion through delivery to the target, and improves observability by proactively alerting on key metrics.

For more serverless learning resources, visit Serverless Land.

Introducing the Amazon Linux 2023 runtime for AWS Lambda

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/introducing-the-amazon-linux-2023-runtime-for-aws-lambda/

This post is written by Rakshith Rao, Senior Solutions Architect.

AWS Lambda now supports Amazon Linux 2023 (AL2023) as a managed runtime and container base image. Named provided.al2023, this runtime provides an OS-only environment to run your Lambda functions.

It is based on the Amazon Linux 2023 minimal container image release and has several improvements over Amazon Linux 2 (AL2), such as a smaller deployment footprint, updated versions of libraries like glibc, and a new package manager.

What are OS-only Lambda runtimes?

Lambda runtimes define the execution environment where your function runs. They provide the OS, language support, and additional settings such as environment variables and certificates.

Lambda provides managed runtimes for Java, Python, Node.js, .NET, and Ruby. However, if you want to develop your Lambda functions in programming languages that are not supported by Lambda’s managed language runtimes, the ‘provided’ runtime family provides an OS-only environment in which you can run code written in any language. This release extends the provided runtime family to support Amazon Linux 2023.

Customers use these OS-only runtimes in three common scenarios. First, they are used with languages that compile to native code, such as Go, Rust, C++, .NET Native AOT, and Java GraalVM Native. Since you only upload the compiled binary to Lambda, these languages do not require a dedicated language runtime; they only require an OS environment in which the binary can run.

Second, the OS-only runtimes also enable building third-party language runtimes that you can use off the shelf. For example, you can write Lambda functions in PHP using Bref, or Swift using the Swift AWS Lambda Runtime.

Third, you can use the OS-only runtime to deploy custom runtimes, which you build for a language or language version for which Lambda does not provide a managed runtime. For example, Node.js 19: Lambda only provides managed runtimes for LTS releases, which for Node.js are the even-numbered releases.

New in Amazon Linux 2023 base image for Lambda

Updated packages

AL2023 base image for Lambda is based on the AL2023-minimal container image and includes various package updates and changes compared with provided.al2.

The version of glibc in the AL2023 base image has been upgraded to 2.34, from 2.26 that was bundled in the AL2 base image. Some libraries that developers wanted to use in provided runtimes required newer versions of glibc. With this launch, you can now use an up-to-date version of glibc with your Lambda function.

The AL2 base image for Lambda came pre-installed with Python 2.7. This was needed because Python was a required dependency for some of the packages that were bundled in the base image. The AL2023 base image for Lambda has removed this dependency on Python 2.7 and does not come with any pre-installed language runtime. You are free to choose and install any compatible Python version that you need.

Since the AL2023 base image for Lambda is based on the AL2023-minimal distribution, you also benefit from a significantly smaller deployment footprint. The new image is less than 40MB compared to the AL2-based base image, which is over 100MB in size. You can find the full list of packages available in the AL2023 base image for Lambda in the “minimal container” column of the AL2023 package list documentation.

Package manager

Amazon Linux 2023 uses dnf as the package manager, replacing yum, which was the default package manager in Amazon Linux 2. AL2023 base image for Lambda uses microdnf as the package manager, which is a standalone implementation of dnf based on libdnf and does not require extra dependencies such as Python. microdnf in provided.al2023 is symlinked as dnf. Note that microdnf does not support all options of dnf. For example, you cannot install a remote rpm using the rpm’s URL or install a local rpm file. Instead, you can use the rpm command directly to install such packages.

This example Dockerfile shows how you can install packages using dnf while building a container-based Lambda function:

# Use the Amazon Linux 2023 Lambda base image
FROM public.ecr.aws/lambda/provided.al2023

# Install the required Python version
RUN dnf install -y python3

Runtime support

With the launch of provided.al2023 you can migrate your AL2 custom runtime-based Lambda functions right away. It also sets the foundation for future Lambda managed runtimes. Future releases of managed language runtimes, such as Node.js 20, Python 3.12, Java 21, and .NET 8, are based on Amazon Linux 2023 and use provided.al2023 as the base image.

Changing runtimes and using other compute services

Previously, the provided.al2 base image was built as a custom image that used a selection of packages from AL2. It included packages like curl and yum that were needed to build functions using a custom runtime. Also, each managed language runtime used different packages based on the use case.

Since future releases of managed runtimes use provided.al2023 as the base image, they contain the same set of base packages that come with AL2023-minimal. This simplifies migrating your Lambda function from a custom runtime to a managed language runtime. It also makes it easier to switch to other compute services like AWS Fargate or Amazon Elastic Container Service (Amazon ECS) to run your application.

Upgrading from AL1-based runtimes

For more information on Lambda runtime deprecation, see Lambda runtimes.

AL1 end of support is scheduled for December 31, 2023. The AL1-based runtimes go1.x, java8, and provided will be deprecated from this date. You should migrate your Go-based Lambda functions to the provided runtime family, such as provided.al2 or provided.al2023. Using a provided runtime offers several benefits over the go1.x runtime. First, you can run your Lambda functions on AWS Graviton2 processors, which offer up to 34% better price-performance compared to functions running on x86_64 processors. Second, it offers a smaller deployment package and a faster function invoke path. And third, it aligns Go with other languages that also compile to native code and run on the provided runtime family.

The deprecation of the Amazon Linux 1 (AL1) base image for Lambda is also scheduled for December 31, 2023. With provided.al2023 now generally available, you should start planning the migration of your go1.x and AL1 based Lambda functions to provided.al2023.

Using the AL2023 base image for Lambda

To build Lambda functions using a custom runtime, follow these steps using the provided.al2023 runtime.

AWS Management Console

Navigate to the Create function page in the Lambda console. To use the AL2023 custom runtime, select Provide your own bootstrap on Amazon Linux 2023 as the Runtime value:

Runtime value

AWS Serverless Application Model (AWS SAM) template

If you use the AWS SAM template to build and deploy your Lambda function, use the provided.al2023 as the value of the Runtime:

Resources:
  HelloWorldFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: hello-world/
      Handler: my.bootstrap.file
      Runtime: provided.al2023

Building Lambda functions that compile natively

Lambda’s custom runtime simplifies the experience to build functions in languages that compile to native code, broadening the range of languages you can use. Lambda provides the Runtime API, an HTTP API that custom runtimes can use to interact with the Lambda service. Implementations of this API, called Runtime Interface Clients (RICs), allow your function to receive invocation events from Lambda, send the response back to Lambda, and report errors to the Lambda service. RICs are available as language-specific libraries for several popular programming languages such as Go, Rust, Python, and Java.
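
To make the Runtime API interaction concrete, the following is a minimal sketch of the processing loop that a custom runtime or RIC implements. It is written in Node.js purely for illustration; a compiled-language function would implement the same loop natively, and the business logic shown is a placeholder.

// bootstrap sketch: the processing loop a custom runtime implements against the Lambda Runtime API
const runtimeApi = process.env.AWS_LAMBDA_RUNTIME_API;
const base = `http://${runtimeApi}/2018-06-01/runtime`;

while (true) {
  // 1. Long-poll for the next invocation event
  const next = await fetch(`${base}/invocation/next`);
  const requestId = next.headers.get("lambda-runtime-aws-request-id");
  const event = await next.json();

  // 2. Run the function's business logic (placeholder)
  const result = { echoed: event };

  // 3. Post the result for this invocation back to the Lambda service
  await fetch(`${base}/invocation/${requestId}/response`, {
    method: "POST",
    body: JSON.stringify(result),
  });
}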

For example, you can build functions using Go as shown in the Building with Go section of the Lambda developer documentation. Note that the name of the executable file of your function should always be bootstrap in provided.al2023 when using the zip deployment model. To use AL2023 in this example, use provided.al2023 as the runtime for your Lambda function.

If you are using the AWS CLI, set the --runtime option to provided.al2023:

aws lambda create-function --function-name myFunction \
--runtime provided.al2023 --handler bootstrap \
--role arn:aws:iam::111122223333:role/service-role/my-lambda-role \
--zip-file fileb://myFunction.zip

If you are using AWS Serverless Application Model, use provided.al2023 as the value of the Runtime in your AWS SAM template file:

AWSTemplateFormatVersion: '2010-09-09'
Transform: 'AWS::Serverless-2016-10-31'
Resources:
  HelloWorldFunction:
    Type: AWS::Serverless::Function
    Metadata:
      BuildMethod: go1.x
    Properties:
      CodeUri: hello-world/ # folder where your main program resides
      Handler: bootstrap
      Runtime: provided.al2023
      Architectures: [arm64]

If you run your function as a container image as shown in the Deploy container image example, use this Dockerfile. You can use any name for the executable file of your function when using container images. You need to specify the name of the executable as the ENTRYPOINT in your Dockerfile:

FROM golang:1.20 as build
WORKDIR /helloworld

# Copy dependencies list
COPY go.mod go.sum ./

# Build with optional lambda.norpc tag
COPY main.go .
RUN go build -tags lambda.norpc -o main main.go

# Copy artifacts to a clean image
FROM public.ecr.aws/lambda/provided:al2023
COPY --from=build /helloworld/main ./main
ENTRYPOINT [ "./main" ]

Conclusion

With this launch, you can now build your Lambda functions using Amazon Linux 2023 as the custom runtime or use it as the base image to run your container-based Lambda functions. You benefit from updated versions of libraries such as glibc, a new package manager, and a smaller deployment size than Amazon Linux 2-based runtimes. Lambda also uses Amazon Linux 2023-minimal as the basis for future Lambda runtime releases.

For more serverless learning resources, visit Serverless Land.

Introducing faster polling scale-up for AWS Lambda functions configured with Amazon SQS

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/introducing-faster-polling-scale-up-for-aws-lambda-functions-configured-with-amazon-sqs/

This post was written by Anton Aleksandrov, Principal Solutions Architect, and Tarun Rai Madan, Senior Product Manager.

Today, AWS is announcing that AWS Lambda supports up to five times faster polling scale-up rate for spiky Lambda workloads configured with Amazon Simple Queue Service (Amazon SQS) as an event source.

This feature enables customers building event-driven applications using Lambda and SQS to achieve more responsive scaling during a sudden burst of messages in their SQS queues, and reduces the need to duplicate Lambda functions or SQS queues to achieve faster message processing.

Overview

Customers building modern event-driven and messaging applications with AWS Lambda use the Amazon SQS as a fundamental building block for creating decoupled architectures. Amazon SQS is a fully managed message queueing service for microservices, distributed systems, and serverless applications. When a Lambda function subscribes to an SQS queue as an event source, Lambda polls the queue, retrieves the messages, and sends retrieved messages in batches to the function handler for processing. To consume messages efficiently, Lambda detects the increase in queue depth, and increases the number of poller processes to process the queued messages.

Up until today, Lambda added up to 60 concurrent executions per minute for Lambda functions subscribed to SQS queues, scaling up to a maximum of 1,250 concurrent executions in approximately 20 minutes. However, customers tell us that some of the modern event-driven applications they build using Lambda and SQS are sensitive to sudden spikes in messages, which may cause noticeable delays in processing messages for end users. To harness the power of Lambda for applications that experience a burst of messages in SQS queues, these customers needed Lambda message polling to scale up faster.

With today’s announcement, Lambda functions that subscribe to an SQS queue can scale up to five times faster for queues that see a spike in message backlog, adding up to 300 concurrent executions per minute, and scaling up to a maximum of 1,250 concurrent executions. This scaling improvement helps to use the simplicity of Lambda and SQS integration to build event-driven applications that scale faster during a surge of incoming messages, particularly for real-time systems. It also offers customers the benefit of faster processing during spikes of messages in SQS queues, while continuing to offer the flexibility to limit the maximum concurrent Lambda invocations per SQS event source.

Controlling the maximum concurrent Lambda invocations by SQS

The new improved scaling rates are automatically applied to all AWS accounts using Lambda and SQS as an event source. There is no explicit action that you must take, and there’s no additional cost. This scaling improvement helps customers to build more performant Lambda applications where they need faster SQS polling scale-up. To prevent potentially overloading the downstream dependencies, Lambda provides customers the control to set the maximum number of concurrent executions at a function level with reserved concurrency, and event source level with maximum concurrency.

The following diagram illustrates settings that you can use to control the flow rate of an SQS event-source. You use reserved concurrency to control function-level scaling, and maximum concurrency to control event source scaling.

Control the flow rate of an SQS event-source

Reserved concurrency is the maximum concurrency that you want to allocate to a function. When a function has reserved concurrency allocated, no other functions can use that concurrency.

AWS recommends using reserved concurrency when you want to ensure a function has enough concurrency to scale up. When an SQS event source is attempting to scale up concurrent Lambda invocations, but the function has already reached the threshold defined by the reserved concurrency, the Lambda service throttles further function invocations.

This may result in the SQS event source attempting to scale down, reducing the number of concurrently processed messages. Depending on the queue configuration, the throttled messages are returned to the queue for retrying, expire based on the retention policy, or are sent to a dead-letter queue (DLQ) or on-failure destination.

The maximum concurrency setting allows you to control concurrency at the event source level. It allows you to define the maximum number of concurrent invocations the event source attempts to send to the Lambda function. For scenarios where a single function has multiple SQS event sources configured, you can define maximum concurrency for each event source separately, providing more granular control. When trying to add rate control to SQS event sources, AWS recommends you start evaluating maximum concurrency control first, as it provides greater flexibility.

Reserved concurrency and maximum concurrency are complementary capabilities, and can be used together. Maximum concurrency can help to prevent overwhelming downstream systems and throttled invocations. Reserved concurrency helps to ensure available concurrency for the function.
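
As an illustration, the following sketch applies both controls with the AWS SDK for JavaScript v3: maximum concurrency on the SQS event source mapping and reserved concurrency on the function. The mapping UUID, function name, and limits are placeholders.

// configure-concurrency.mjs: illustrative concurrency controls for an SQS-triggered function
import {
  LambdaClient,
  UpdateEventSourceMappingCommand,
  PutFunctionConcurrencyCommand,
} from "@aws-sdk/client-lambda";

const lambda = new LambdaClient({});

// Limit how many concurrent invocations this SQS event source can drive
await lambda.send(
  new UpdateEventSourceMappingCommand({
    UUID: "a1b2c3d4-5678-90ab-cdef-EXAMPLE11111", // placeholder event source mapping ID
    ScalingConfig: { MaximumConcurrency: 100 },
  })
);

// Reserve concurrency so the function always has capacity available to scale into
await lambda.send(
  new PutFunctionConcurrencyCommand({
    FunctionName: "document-processor",           // placeholder function name
    ReservedConcurrentExecutions: 500,
  })
);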

Example scenario

Consider a business that must process large volumes of documents from storage. Once every few hours, business partners upload large volumes of documents to S3 buckets in your account.

For resiliency, you’ve designed your application to send a message to an SQS queue for each of the uploaded documents, so you can efficiently process them without accidentally skipping any. The documents are processed using a Lambda function, which takes around two seconds to process a single document.

Processing these documents is a CPU-intensive operation, so you decide to process a single document per invocation. You want to use the power of Lambda to fan out the parallel processing to as many concurrent execution environments as possible. You want the Lambda function to scale up rapidly to process those documents in parallel as fast as possible, and scale down to zero once all documents are processed to save costs.

When a business partner uploads 200,000 documents, 200,000 messages are sent to the SQS queue. The Lambda function is configured with an SQS event source, and it starts consuming the messages from the queue.

This diagram shows the results of running the test scenario before the SQS event source scaling improvements. As expected, you can see that concurrent executions grow by 60 per minute. It takes approximately 16 minutes to scale up to 900 concurrent executions gradually and process all the messages in the queue.

Results of running the test scenario before the SQS event source scaling improvements

The following diagram shows the results of running the same test scenario after the SQS event source scaling improvements. The timeframe used for both charts is the same, but the performance on the second chart is better. Concurrent executions grow by 300 per minute. It only takes 4 minutes to scale up to 1,250 concurrent executions, and all the messages in the queue are processed in approximately 8 minutes.

The results of running the same test scenario after the SQS event source scaling improvements

Deploying this example

Use the example project to replicate this performance test in your own AWS account. Follow the instructions in README.md for provisioning the sample project in your AWS accounts using the AWS Cloud Development Kit (CDK).

This example project is configured to demonstrate a large-scale workload processing 200,000 messages. Running this sample project in your account may incur charges. See AWS Lambda pricing and Amazon SQS pricing.

Once deployed, use the application under the “sqs-cannon” directory to send 200,000 messages to the SQS queue (or reconfigure to any other number). It takes several minutes to populate the SQS queue with messages. After all messages are sent, enable the SQS event source, as described in the README.md, and monitor the charts in the provisioned CloudWatch dashboard.

The default concurrency quota for new AWS accounts is 1000. If you haven’t requested an increase in this quota, the number of concurrent executions is capped at this number. Use Service Quotas or contact your account team to request a concurrency increase.

Security best practices

Always use the least privileged permissions when granting your Lambda functions access to SQS queues. This reduces the potential attack surface by ensuring that only specific functions have permissions to perform specific actions on specific queues. For example, if your function only polls from the queue, grant it permission to read messages, but not to send new messages. A function execution role defines which actions your function is allowed to perform on other resources. A queue access policy defines the principals that can access this queue, and the actions that are allowed.

Use server-side encryption (SSE) to store sensitive data in encrypted SQS queues. With SSE, your messages are always stored in encrypted form, and SQS only decrypts them for sending to an authorized consumer. SSE protects the contents of messages in queues using SQS-managed encryption keys (SSE-SQS) or keys managed in the AWS Key Management Service (SSE-KMS).

Conclusion

The improved Lambda SQS event source polling scale-up capability enables up to five times faster scale-up performance for spiky event-driven workloads using SQS queues, at no additional cost. This improvement offers customers the benefit of faster processing during spikes of messages in SQS queues, while continuing to offer the flexibility to limit the maximum concurrent invokes by SQS as an event source.

For more serverless learning resources, visit Serverless Land.

Orchestrating dependent file uploads with AWS Step Functions

Post Syndicated from Benjamin Smith original https://aws.amazon.com/blogs/compute/orchestrating-dependent-file-uploads-with-aws-step-functions/

This post is written by Nelson Assis, Enterprise Support Lead, Serverless and Jevon Liburd, Technical Account Manager, Serverless

Amazon S3 is an object storage service that many customers use for file storage. With Amazon S3 Event Notifications or Amazon EventBridge, customers can create workloads with an event-driven architecture (EDA). This architecture responds to events produced when changes occur to objects in S3 buckets.

EDA involves asynchronous communication between system components. This serves to decouple the components allowing each component to be autonomous.

Some scenarios may introduce coupling in the architecture due to dependency between events. This blog post presents a common example of this coupling and how it can be handled using AWS Step Functions.

Overview

In this example, an organization has two distributed autonomous teams, the Sales team and the Warehouse team. Each team is responsible for uploading a monthly data file to an S3 bucket so it can be processed.

The files generate events when they are uploaded, initiating downstream processes. The processing of the Warehouse file cleans the data and joins it with data from the Shipping team. The processing of the Sales file correlates the data with the combined Warehouse and Shipping data. This enables analysts to perform forecasting and gain other insights.

For this correlation to happen, the Warehouse file must be processed before the Sales file. As the two teams are autonomous, there is no coordination among the teams. This means that the files can be uploaded at any time with no assurance that the Warehouse file is processed before the Sales file.

For scenarios like these, the Aggregator pattern can be used. The pattern collects and stores the events, and triggers a new event based on the combined events. In the described scenario, the combined events are the processed Warehouse file and the uploaded Sales file.

The requirements of the aggregator pattern are:

  1. Correlation – A way to group the related events. This is fulfilled by a unique identifier in the file name.
  2. Event aggregator – A stateful store for the events.
  3. Completion check and trigger – A condition when the combined events have been received and a way to publish the resulting event.

Architecture overview

The architecture uses the following AWS services:

  1. File upload: The Sales and Warehouse teams upload their respective files to S3.
  2. EventBridge: The ObjectCreated event is sent to EventBridge where there is a rule with a target of the main workflow.
  3. Main state machine: This state machine orchestrates the aggregator operations and the processing of the files. It encapsulates the workflows for each file to separate the aggregator logic from the files’ workflow logic.
  4. File parser and correlation: The business logic to identify the file and its type is run in this Lambda function.
  5. Stateful store: A DynamoDB table stores information about the file such as the name, type, and processing status. The state machine reads from and writes to the DynamoDB table. Task tokens are also stored in this table.
  6. File processing: Depending on the file type and any pre-conditions, state machines corresponding to the file type are run. These state machines contain the logic to process the specific file.
  7. Task Token & Callback: The task token is generated when the dependent file arrives for processing before the independent file has been processed. The Step Functions "Wait for a Callback" pattern continues the execution of the dependent file after the independent file is processed.

Walkthrough

You need the following prerequisites:

  • AWS CLI and AWS SAM CLI installed.
  • An AWS account.
  • Sufficient permissions to manage the AWS resources.
  • Git installed.

To deploy the example, follow the instructions in the GitHub repo.

This walkthrough shows what happens if the dependent file (Sales file) is uploaded before the independent one (Warehouse file).

  1. The workflow starts with the uploading of the Sales file to the dedicated Sales S3 bucket. The example uses separate S3 buckets for the two files as it assumes that the Sales and Warehouse teams are distributed and autonomous. You can find sample files in the code repository.
  2. Uploading the file to S3 sends an event to EventBridge, which the aggregator state machine acts on. The event pattern used in the EventBridge rule is:
    {
      "detail-type": ["Object Created"],
      "source": ["aws.s3"],
      "detail": {
        "bucket": {
          "name": ["sales-mfu-eda-09092023", "warehouse-mfu-eda-09092023"]
        },
        "reason": ["PutObject"]
      }
    }
  3. The aggregator state machine starts by invoking the file parser Lambda function. This function parses the file type and uses the identifier to correlate the files. In this example, the name of the file contains the file type and the correlation identifier (the year_month). To use other ways of representing the file type and correlation identifier, you can modify this function to parse that information.
  4. The next step in the state machine inserts a record for the event in the event aggregator DynamoDB table. The table has a composite primary key with the correlation identifier as the partition key and the file type as the sort key. The processing status of the file is tracked to give feedback on the state of the workflow.
  5. Based on the file type, the state machine determines which branch to follow. In the example, the Sales branch is run. The state machine tries to get the status of the (dependent) Warehouse file from DynamoDB using the correlation identifier. Using the result of this query, the state machine determines if the corresponding Warehouse file has already been processed.
  6. Since the Warehouse file is not processed yet, the waitForTaskToken integration pattern is used. The state machine waits at this step and creates a task token, which the external services use to trigger the state machine to continue its execution. The Sales record in the DynamoDB table is updated with the Task Token.
  7. Navigate to the S3 console and upload the sample Warehouse file to the Warehouse S3 bucket. This invokes a new instance of the Step Functions workflow, which flows through the other branch after the file type choice step. In this branch, the Warehouse state machine is run and the processing status of the file is updated in DynamoDB.

When the status of the Warehouse file is changed to “Completed”, the Warehouse state machine checks DynamoDB for a pending Sales file. If there is one, it retrieves the task token and calls the SendTaskSuccess method. This triggers the Sales state machine, which is in a waiting state to continue. The Sales state machine is started and the processing status is updated.
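
A minimal sketch of that callback with the AWS SDK for JavaScript v3 is shown below. The task token is the one stored in the DynamoDB aggregator table, and the output payload is illustrative.

// resume-sales-workflow.mjs: illustrative callback that resumes the waiting Sales workflow
import { SFNClient, SendTaskSuccessCommand } from "@aws-sdk/client-sfn";

const sfn = new SFNClient({});

export async function resumeSalesWorkflow(taskToken, correlationId) {
  // The task token was written to the aggregator table while the Sales workflow
  // waited for the Warehouse file to finish processing
  await sfn.send(
    new SendTaskSuccessCommand({
      taskToken,
      output: JSON.stringify({ correlationId, warehouseStatus: "Completed" }),
    })
  );
}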

Conclusion

This blog post shows how to handle file dependencies in event-driven architectures. You can customize the sample provided in the code repository for your own use case.

This solution is specific to file dependencies in event-driven architectures. For more information on solving event dependencies and aggregators, read the blog post: Moving to event-driven architectures with serverless event aggregators.

To learn more about event-driven architectures, visit the event-driven architecture section on Serverless Land.