All posts by James Beswick

Automating chaos experiments with AWS Fault Injection Service and AWS Lambda

2024-03-22 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/automating-chaos-experiments-with-aws-fault-injection-service-and-aws-lambda/

This post is written by André Stoll, Solution Architect.

Chaos engineering is a popular practice for building confidence in system resilience. However, many existing tools assume the ability to alter infrastructure configurations, and cannot be easily applied to the serverless application paradigm. Due to the stateless, ephemeral, and distributed nature of serverless architectures, you must evolve the traditional technique when running chaos experiments on these systems.

This blog post explains a technique for running chaos engineering experiments on AWS Lambda functions. The approach uses Lambda extensions to induce failures in a runtime-agnostic way requiring no function code changes. It shows how you can use the AWS Fault Injection Service (FIS) to automate and manage chaos experiments across different Lambda functions to provide a reusable testing method.

Overview

Chaos experiments are commonly applied to cloud applications to uncover latent issues and prevent service disruptions. IT teams use chaos experiments to build confidence in the robustness of their systems. However, the traditional methods used in server-based chaos engineering do not easily translate to the serverless world since many existing tools are based on altering the underlying infrastructure configurations, such as cluster nodes or server instances of your applications.

In serverless applications, AWS handles the undifferentiated heavy lifting of managing infrastructure, so you can focus on delivering business value. But this also means that engineering teams have limited control over the infrastructure, and must rely on application-level tooling to run chaos experiments. Two techniques commonly used in the serverless community for conducting chaos experiments on Lambda functions are modifying the function configuration or using runtime-specific libraries.

Changing the configuration of a Lambda function allows you to induce rudimentary failures. For example, you can set the reserved concurrency of a Lambda function to simulate invocation throttling. Alternatively, you might change the function execution role permissions or the function policy to simulate IAM access denial. These types of failures are easy to implement, but the range of possible fault injection types is limited.

The other technique—injecting chaos into Lambda functions through purpose-built, runtime-specific libraries—is more flexible. There are various open-source libraries that allow you to inject failures, such as added latency, exceptions, or disk exhaustion. Examples of such libraries are Python’s chaos_lambda and failure-lambda for Node.js. The downside is that you must change the function code for every function you want to run chaos experiments on. In addition, those libraries are runtime-specific and each library comes with a set of different capabilities and configurations. This reduces the reusability of your chaos experiments across Lambda functions implemented in different languages.

Injecting chaos using Lambda extensions

Implementing chaos experiments using Lambda extensions allows you to address all of the previous concerns. Lambda extensions augment your functions by adding functionality, such as capturing diagnostic information or automatically instrumenting your code. You can integrate your preferred monitoring, observability, or security tooling deeply into the Lambda environment without complex installation or configuration management. Lambda extensions are generally packaged as Lambda layers and run as a separate process in the Lambda execution environment. You may use extensions from AWS, AWS Lambda partners, or build your own custom functionality.

With Lambda extensions, you can implement a chaos extension to inject the desired failures into your Lambda environments. This chaos extension uses the Runtime API proxy pattern that enables you to hook into the function invocation request and response lifecycle. Lambda runtimes use the Lambda Runtime API to retrieve the next incoming event to be processed by the function handler and return the handler response to the Lambda service.

The Runtime API HTTP endpoint is available within the Lambda execution environment. Runtimes get the API endpoint from the environment variable AWS_LAMBDA_RUNTIME_API. During the initialization of the execution environment, you can modify the runtime startup behavior. This lets you point AWS_LAMBDA_RUNTIME_API at the local address and port the chaos extension process is listening on. Now, all requests to the Runtime API go through the chaos extension proxy. You can use this workflow for blocking malicious events, auditing payloads, or injecting failures.
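One way to implement the wrapper step is a small script registered through Lambda's AWS_LAMBDA_EXEC_WRAPPER environment variable. Lambda runs that script with the original runtime command as its arguments. The following Python sketch is illustrative only. The proxy port 9100 is a hypothetical value and must match whatever port your chaos extension actually listens on. It also assumes a Python interpreter is available in the execution environment; a shell script is the more portable choice in practice.

```python
#!/usr/bin/env python3
"""Sketch of a Lambda wrapper script that routes Runtime API calls through a proxy.

Assumption: AWS_LAMBDA_EXEC_WRAPPER points at this file, so Lambda starts it
with the original runtime command line in sys.argv[1:].
"""
import os
import sys

# Hypothetical port; it must match the port the chaos extension listens on.
CHAOS_PROXY_PORT = 9100


def build_runtime_env(env: dict, proxy_port: int) -> dict:
    """Return a copy of env whose Runtime API address points at the local proxy."""
    new_env = dict(env)
    new_env["AWS_LAMBDA_RUNTIME_API"] = f"127.0.0.1:{proxy_port}"
    return new_env


def main(runtime_cmd: list) -> None:
    env = build_runtime_env(dict(os.environ), CHAOS_PROXY_PORT)
    # Replace this process with the runtime so the runtime inherits the modified env.
    os.execvpe(runtime_cmd[0], runtime_cmd, env)


# Only exec when Lambda passed a runtime command to wrap.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

The extension process itself is not started through the wrapper, so it still sees the original AWS_LAMBDA_RUNTIME_API value and can forward proxied requests to the real Runtime API.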

The chaos extension intercepts incoming events and outbound responses, and injects failures according to the chaos experiment configuration.
The extension accesses environment variables to read the chaos experiment configuration.
A wrapper script configures the runtime to proxy requests through the chaos extension.

When intercepting incoming events and outbound responses to the Lambda Runtime API, you can simulate failures such as introducing artificial delay or generating an error response to return to the Lambda service. This workflow adds latency to your function calls:
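The decision logic inside such a proxy can be small. This Python sketch assumes hypothetical environment variables (CHAOS_LATENCY_MS, CHAOS_LATENCY_PROBABILITY, CHAOS_ERROR_PROBABILITY). Real extensions such as chaos-lambda-extension define their own variable names. The sketch only shows how a per-invocation fault plan might be derived and applied to a handler response before the proxy forwards it. It does not implement the HTTP proxying itself.

```python
import json
import random
import time
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass
class FaultPlan:
    delay_seconds: float
    inject_error: bool


def read_plan(env: Mapping[str, str], rng: random.Random) -> FaultPlan:
    """Decide which faults to inject for one invocation.

    The variable names are hypothetical; real chaos extensions define their own.
    """
    latency_ms = int(env.get("CHAOS_LATENCY_MS", "0"))
    latency_p = float(env.get("CHAOS_LATENCY_PROBABILITY", "0"))
    error_p = float(env.get("CHAOS_ERROR_PROBABILITY", "0"))
    delay = latency_ms / 1000 if rng.random() < latency_p else 0.0
    return FaultPlan(delay_seconds=delay, inject_error=rng.random() < error_p)


def intercept_response(handler_body: str, plan: FaultPlan,
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Apply the plan to a handler response before it is forwarded to Lambda."""
    if plan.delay_seconds > 0:
        sleep(plan.delay_seconds)  # artificial latency
    if plan.inject_error:
        # Replace the real response with an error payload.
        return json.dumps({"errorType": "ChaosInjectedError",
                           "errorMessage": "Fault injected by chaos extension"})
    return handler_body
```

Passing the random generator and the sleep function as parameters keeps the logic deterministic in tests while using real delays in the extension.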

All Lambda runtimes support extensions. Since extensions run as a separate process, you can implement them in a language other than the function code. AWS recommends you implement extensions using a programming language that compiles to a binary executable, such as Golang or Rust. This allows you to use the extension with any Lambda runtime.

Open source projects that follow this technique include the chaos-lambda-extension, implemented in Rust, and the serverless-chaos-extension, implemented in Python.

Extensions provide you with a flexible and reusable method to run your chaos experiments on Lambda functions. You can reuse the chaos extension for all runtimes without having to change function code. Add the extension to any Lambda function where you want to run chaos experiments.

Automating with AWS FIS experiment templates

According to the Principles of Chaos Engineering, you should “automate your experiments to run continuously”. To achieve this, you can use the AWS Fault Injection Service (FIS).

This service allows you to create reusable experiment templates. The template specifies the targets and the actions to run on them during the experiment, and an optional stop condition that prevents the experiment from going out of bounds. You can also execute AWS Systems Manager Automation runbooks, which support custom fault types. You can write your own custom Systems Manager documents to define the individual steps involved in the automation. To carry out the actions of the experiment, you define scripts in the document that manage your Lambda function and set it up for the chaos experiment.
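As an illustration, the following Python sketch builds the keyword arguments for the FIS CreateExperimentTemplate API. It uses a single aws:ssm:start-automation-execution action and a CloudWatch alarm stop condition. The role, runbook, and alarm ARNs, the action name runChaosRunbook, and the ten-minute maxDuration are placeholders you would replace.

```python
import json


def build_fis_template(role_arn: str, document_arn: str, document_params: dict,
                       alarm_arn: str, max_duration: str = "PT10M") -> dict:
    """Return keyword arguments for the FIS CreateExperimentTemplate API."""
    return {
        "description": "Inject faults into a Lambda function through an SSM runbook",
        "roleArn": role_arn,
        # Stop the experiment when this CloudWatch alarm enters the ALARM state.
        "stopConditions": [{"source": "aws:cloudwatch:alarm", "value": alarm_arn}],
        "actions": {
            "runChaosRunbook": {  # hypothetical action name
                "actionId": "aws:ssm:start-automation-execution",
                "parameters": {
                    "documentArn": document_arn,
                    # FIS passes runbook parameters as a JSON string.
                    "documentParameters": json.dumps(document_params),
                    # Upper bound on how long the automation may run (ISO 8601 duration).
                    "maxDuration": max_duration,
                },
            }
        },
    }
```

You could then pass the result to boto3, for example `boto3.client("fis").create_experiment_template(**kwargs)`. Check the AWS FIS documentation for the permissions the experiment role needs to start the automation.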

To use the chaos extension for your serverless chaos experiments:

Set up the Lambda function for the experiment. Add the chaos extension as a layer and configure the experiment, for example, by adding environment variables specifying the fault type and its corresponding value.
Pause the automation and conduct the experiment. To do this, use the aws:sleep automation action. During this period, you conduct the experiment and observe and measure the outcome.
Clean up the experiment. The script removes the layer again and also resets the environment variables.
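The set-up and clean-up steps could look roughly like the following Python sketch. It separates the pure configuration changes from the boto3 calls (get_function_configuration and update_function_configuration) so the logic is easy to check. The layer ARN and variable names are hypothetical. The clean-up assumes the chaos variables did not exist on the function before the experiment.

```python
from typing import Dict, List, Tuple

Config = Tuple[List[str], Dict[str, str]]


def with_chaos(layers: List[str], env: Dict[str, str],
               chaos_layer_arn: str, chaos_env: Dict[str, str]) -> Config:
    """Set-up step: add the chaos layer and the experiment variables."""
    new_layers = layers if chaos_layer_arn in layers else layers + [chaos_layer_arn]
    return new_layers, {**env, **chaos_env}


def without_chaos(layers: List[str], env: Dict[str, str],
                  chaos_layer_arn: str, chaos_env: Dict[str, str]) -> Config:
    """Clean-up step: remove the chaos layer and the experiment variables.

    Assumes none of the chaos variables existed before the experiment.
    """
    new_layers = [arn for arn in layers if arn != chaos_layer_arn]
    new_env = {k: v for k, v in env.items() if k not in chaos_env}
    return new_layers, new_env


def read_config(client, function_name: str) -> Config:
    """Read the current layer ARNs and environment variables of a function."""
    cfg = client.get_function_configuration(FunctionName=function_name)
    layers = [layer["Arn"] for layer in cfg.get("Layers", [])]
    env = cfg.get("Environment", {}).get("Variables", {})
    return layers, env


def write_config(client, function_name: str, config: Config) -> None:
    """Write layers and variables back; both replace the current values wholesale."""
    layers, env = config
    client.update_function_configuration(
        FunctionName=function_name, Layers=layers, Environment={"Variables": env})
```

With `client = boto3.client("lambda")`, the set-up step becomes `write_config(client, name, with_chaos(*read_config(client, name), chaos_layer_arn, chaos_env))`, and the clean-up step uses `without_chaos` in the same way.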

Running your first serverless chaos experiment

This sample repository provides you with the necessary code to run your first serverless chaos experiment in AWS. The experiment uses the chaos-lambda-extension to inject chaos.

The sample deploys the AWS FIS experiment template, the necessary SSM Automation runbooks, and the IAM role the runbooks use to configure the Lambda functions. The sample also provisions a Lambda function for testing and an Amazon CloudWatch alarm used to roll back the experiment.

Prerequisites

You have the chaos extension already deployed as a Lambda layer in your AWS account.
You have installed the AWS Cloud Development Kit (CDK) CLI. See the Getting started with the AWS CDK guide for details.
Clone the sample repository locally by running:
git clone git@github.com:aws-samples/serverless-chaos-experiments.git
Ensure you have sufficient permissions to interact with AWS FIS, Lambda, and CloudWatch alarms.

Running the experiment

Follow the steps outlined in the repository to conduct your first experiment. Starting the experiment triggers the automation execution.

The automation adds the extension and configures the experiment, pauses the execution while you observe the system, and then reverts all changes to the initial state.

If you invoke the targeted Lambda function during the second step, failures (in this case, artificial latency) are simulated.
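To observe the injected latency, you might time a series of invocations during the sleep window. This Python sketch takes the invoke call as a parameter so it can wrap boto3 or any other client. The function name in the usage note is hypothetical.

```python
import statistics
import time
from typing import Callable, List


def measure_latencies(invoke: Callable[[], object], count: int,
                      clock: Callable[[], float] = time.perf_counter) -> List[float]:
    """Call `invoke` `count` times and return each call's duration in milliseconds."""
    durations = []
    for _ in range(count):
        start = clock()
        invoke()
        durations.append((clock() - start) * 1000)
    return durations


def summarize(durations: List[float]) -> dict:
    """Reduce raw durations to a median and a maximum."""
    return {"p50_ms": statistics.median(durations), "max_ms": max(durations)}
```

For example, `summarize(measure_latencies(lambda: client.invoke(FunctionName="chaos-test-function", Payload=b"{}"), 20))` with a boto3 Lambda client. Comparing the result with a run taken before the experiment shows how much latency the extension added.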

Security best practices

Extensions run within the same execution environment as the function, so they have the same level of access to resources such as file system, networking, and environment variables. IAM permissions assigned to the function are shared with extensions. AWS recommends you assign the least required privileges to your functions.

Always install extensions from a trusted source only. Use Infrastructure as Code (IaC) and automation tools, such as CloudFormation or AWS Systems Manager, to simplify attaching the same extension configuration, including AWS Identity and Access Management (IAM) permissions, to multiple functions. IaC and automation tools allow you to have an audit record of extensions and versions used previously.

When building extensions, do not log sensitive data. Sanitize payloads and metadata before logging or persisting them for audit purposes.

Conclusion

This blog post details how to run chaos experiments for serverless applications built using Lambda. The described approach uses a Lambda extension to inject faults into the execution environment. This allows you to use the same method regardless of the runtime or configuration of the Lambda function.

To automate and successfully conduct the experiment, you can use the AWS Fault Injection Service. By creating an experiment template, you can specify the actions to run on the defined targets, such as adding the extension during the experiment. Since the extension can be used for any runtime, you can reuse the experiment template to inject failures into different Lambda functions.

Visit this repository to deploy your first serverless chaos experiment, or watch this video guide to learn more about building extensions. Explore the AWS FIS documentation to learn how to create your own experiments.

For more serverless learning resources, visit Serverless Land.

Comparing design approaches for building serverless microservices

2024-03-04 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/comparing-design-approaches-for-building-serverless-microservices/

This post is written by Luca Mezzalira, Principal SA, and Matt Diamond, Principal SA.

Designing a workload with AWS Lambda raises questions for developers, because modularity can be expressed at either the code level or the infrastructure level. Using serverless for running code requires additional planning to extract the business logic from the underlying functional components. This deliberate separation of concerns ensures robust modularity, paving the way for evolutionary architectures.

This post focuses on synchronous workloads, but similar considerations are applicable in other workload types. After identifying the bounded context of your API and agreeing on API contracts with consumers, it’s time to structure the architecture of your bounded context and the associated infrastructure.

The two most common ways to structure an API using Lambda functions are single responsibility and Lambda-lith. However, this blog post explores an alternative to these approaches, which can provide the best of both.

Single responsibility Lambda functions

Single responsibility Lambda functions are designed to run a specific task or handle a particular event-triggered operation within a serverless architecture:


This approach provides a strong separation of concerns between business logic and capabilities. You can test specific capabilities in isolation, deploy a Lambda function independently, reduce the surface for introducing bugs, and debug issues more easily in Amazon CloudWatch.

Additionally, single-purpose functions enable efficient resource allocation: Lambda scales each function automatically based on its own demand, which optimizes resource consumption and minimizes costs, and you can tune the memory size, architecture, and any other available configuration per function. Requesting a concurrency increase through a support ticket also becomes easier, because traffic is not aggregated in a single Lambda function that handles every request; you can request a specific increase based on the traffic of a single task.

Another advantage is rapid execution time. Because a single-purpose Lambda function contains only the business logic for one task, you can optimize its size more easily, without the additional libraries required by other approaches. The smaller bundle size helps reduce cold start time.

Despite these benefits, some issues exist when solely relying on single-purpose Lambda functions. While the cold start time is mitigated, you might experience a higher number of cold starts, particularly for functions with sporadic or infrequent invocations. For example, a function that deletes users in an Amazon DynamoDB table likely won’t be triggered as often as one that reads user data. Also, relying heavily on single-purpose Lambda functions can lead to increased system complexity, especially as the number of functions grows.

A good separation of concerns helps maintain your code base, at the cost of a lack of cohesion. In functions with similar tasks, such as write operations of an API (POST, PUT, DELETE), you might duplicate code and behaviors across multiple functions. Moreover, updating common libraries shared via Lambda Layers, or other dependency management systems, requires multiple changes across every function instead of an atomic change on a single file. This is also true for any other change across multiple functions, for instance, updating the runtime version.

Lambda-lith: Using one single Lambda function

When many workloads use single-purpose Lambda functions, developers end up with a proliferation of Lambda functions across an AWS account. One of the main challenges developers face is updating common dependencies or function configurations. Unless a clear governance strategy addresses this problem (such as using Dependabot to enforce dependency updates, or parameters that are retrieved at provisioning time), developers may opt for a different strategy.

As a result, many development teams move in the opposite direction, aggregating all code related to an API inside the same Lambda function.

This approach is often referred to as a Lambda-lith, because it gathers all the HTTP verbs that compose an API and sometimes multiple APIs in the same function.

This gives you higher code cohesion and colocation across the different parts of the application. Modularity in this case is expressed at the code level, where patterns like single responsibility, dependency injection, and façade are applied to structure your code. The discipline and coding best practices applied by the development team are crucial for maintaining large code bases.

On the other hand, because there are fewer Lambda functions, updating a configuration or implementing a new standard across multiple APIs is easier than with the single responsibility approach.

Moreover, since every request invokes the same Lambda function for every HTTP verb, it’s more likely that little-used parts of your code have a better response time because an execution environment is more likely to be available to fulfill the request.

Another factor to consider is function size, which grows when you colocate all the verbs of an API in the same function, together with their dependencies and business logic. This may affect the cold start of Lambda functions with spiky workloads. Customers should evaluate the benefits of this approach, especially when applications have restrictive SLAs that cold starts would impact. Developers can mitigate this problem by paying attention to the dependencies used and applying techniques like tree-shaking, minification, and dead code elimination, where the programming language allows.

This coarse-grained approach doesn't allow you to tune function configurations individually. Instead, you must find a single configuration that matches all the code capabilities, possibly with a higher memory size and looser security permissions that might clash with the requirements defined by the security team.

Read and write functions

These two approaches both have trade-offs, but there is a third option that can combine their benefits.

Often, API traffic leans towards more reads or writes and that forces developers to optimize code and configurations more on one side over the other.

For example, consider building a user API that allows consumers to create, update, and delete a user but also to find a user or a list of users. In this scenario, you can change one user at a time with no bulk operations available, but you can get one or more users per API request. Dividing the design of the API into read and write operations results in this architecture:

The cohesion of code for write operations (create, update, and delete) is beneficial for many reasons. For instance, you may need to validate the request body, ensuring it contains all the mandatory parameters. If the workload is heavy on writes, the less-used operations (for instance, Delete) benefit from warm execution environments. The code colocation enables reusability of code on similar actions, reducing the cognitive load to structure your projects with shared libraries or Lambda layers, for instance.
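A write function that groups POST, PUT, and DELETE could share its validation like this. The sketch assumes API Gateway REST proxy events (httpMethod, body, pathParameters) and a hypothetical user schema with name and email fields. The persistence steps are left as comments.

```python
import json

REQUIRED_FIELDS = {"email", "name"}  # hypothetical user schema


def validate_user(body: dict) -> None:
    """Shared validation reused by every write operation."""
    missing = REQUIRED_FIELDS - body.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")


def write_handler(event, context=None):
    """Single Lambda handler for POST, PUT, and DELETE on the users resource."""
    method = event["httpMethod"]
    if method in ("POST", "PUT"):
        body = json.loads(event.get("body") or "{}")
        try:
            validate_user(body)
        except ValueError as err:
            return {"statusCode": 400, "body": json.dumps({"error": str(err)})}
        # Persist the user here (create for POST, update for PUT).
        status = 201 if method == "POST" else 200
        return {"statusCode": status, "body": json.dumps(body)}
    if method == "DELETE":
        # Delete the user identified by event["pathParameters"]["id"] here.
        return {"statusCode": 204, "body": ""}
    return {"statusCode": 405, "body": ""}
```

The read operations would live in a separate function with its own memory size, permissions, and dependencies.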

On the read side, you can reduce the code bundled with the function to get a faster cold start, and heavily optimize its performance independently of the write operations. You can also store partial or full query results in the memory of an execution environment to improve the execution time of a Lambda function.
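In-memory caching relies on module-level state surviving between invocations in the same execution environment. The following is a minimal Python sketch, assuming a hypothetical 60-second freshness window. Each execution environment keeps its own copy, so cached results can differ between environments.

```python
import time
from typing import Callable, Dict, Tuple

# Module-level state survives across invocations in the same execution environment.
_CACHE: Dict[str, Tuple[float, object]] = {}
TTL_SECONDS = 60.0  # hypothetical freshness window


def cached_query(key: str, run_query: Callable[[], object],
                 now: Callable[[], float] = time.monotonic) -> object:
    """Return a fresh cached result for key, or run the query and cache it."""
    hit = _CACHE.get(key)
    if hit is not None and now() - hit[0] < TTL_SECONDS:
        return hit[1]
    result = run_query()
    _CACHE[key] = (now(), result)
    return result
```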

This approach also supports evolution. Imagine that the platform becomes much more popular. Now you must optimize the API even further by improving reads and adding a cache-aside pattern with Amazon ElastiCache for Redis. You also decide to optimize the read queries with a second database, optimized for reads, that serves requests when the cache misses.
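The cache-aside pattern itself can be sketched in a few lines of Python. The cache parameter follows the subset of the redis-py client used here: get, and set with an ex expiry in seconds. read_replica stands in for a hypothetical query against the read-optimized database. The key format and five-minute TTL are assumptions.

```python
import json
from typing import Callable, Optional, Protocol


class CacheClient(Protocol):
    """Subset of the redis-py client interface used by this sketch."""

    def get(self, name: str) -> Optional[bytes]: ...

    def set(self, name: str, value: str, ex: Optional[int] = None) -> object: ...


def get_user(user_id: str, cache: CacheClient,
             read_replica: Callable[[str], dict], ttl_seconds: int = 300) -> dict:
    """Cache-aside read: try the cache, fall back to the read database, then populate."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit
    user = read_replica(user_id)  # cache miss: query the read-optimized store
    cache.set(key, json.dumps(user), ex=ttl_seconds)  # populate for later reads
    return user
```

With redis-py you could pass `redis.Redis(host=..., port=6379)` as the cache. In tests, any object with the same two methods works.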

On the write side, you have agreed with the API consumers that receiving and acknowledging user creation or deletion is adequate, considering they fully embraced the eventual consistency nature of distributed systems.

Now, you can improve the response time of write operations by adding an Amazon SQS queue in front of the Lambda function. The function can then update the write database in batches, reducing the number of invocations needed to handle write operations instead of dealing with every request individually.
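On the consumer side, a function that receives write requests from SQS can process them as one batch and report only the failures back to Lambda. This Python sketch assumes ReportBatchItemFailures is enabled on the event source mapping. write_items is a hypothetical callable that persists a list of items, for example through a DynamoDB batch writer.

```python
import json
from typing import Callable, Dict, List


def handle_write_batch(event: Dict, write_items: Callable[[List[dict]], None]) -> Dict:
    """Parse an SQS batch, write it in one call, and report failed message IDs."""
    failures = []
    items, ids = [], []
    for record in event["Records"]:
        try:
            items.append(json.loads(record["body"]))
            ids.append(record["messageId"])
        except json.JSONDecodeError:
            # Malformed messages are reported individually.
            failures.append({"itemIdentifier": record["messageId"]})
    if items:
        try:
            write_items(items)  # one batched write instead of one per request
        except Exception:
            # If the batched write fails, let SQS redeliver every parsed message.
            failures.extend({"itemIdentifier": i} for i in ids)
    return {"batchItemFailures": failures}
```

Messages listed in batchItemFailures become visible again in the queue, while the rest of the batch is deleted.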

Command query responsibility segregation (CQRS) is a well-established pattern that separates the data mutation, or the command part of a system, from the query part. You can use the CQRS pattern to separate updates and queries if they have different requirements for throughput, latency, or consistency.

While it’s not mandatory to start with a full CQRS pattern, the initial read and write implementation makes it easier to evolve toward one without massive refactoring of your API.

Comparison of the three approaches

Here is a comparison of the three approaches:

	Single responsibility	Lambda-lith	Read and write
Benefits	Strong separation of concerns; granular configuration; easier debugging; rapid execution time	Fewer cold start invocations; higher code cohesion; simpler maintenance	Code cohesion where needed; evolutionary architecture; optimization of read and write operations
Issues	Code duplication; complex maintenance; more cold start invocations	Coarse-grained configuration; longer cold start time	Two data models when using CQRS; CQRS adds eventual consistency to your system

Conclusion

Developers often move from single responsibility functions to the Lambda-lith as their architectures evolve, but both approaches have relative trade-offs. This post shows how it’s possible to have the best of both approaches by dividing your workloads per read and write operations.

All three approaches are viable for designing serverless APIs, and understanding what you are optimizing for is the key to making the best decision. Understanding your context and the business requirements your application must express leads you toward acceptable trade-offs for a specific workload. Keep an open mind and find the solution that solves the problem while balancing security, developer experience, cost, and maintainability.

For more serverless learning resources, visit Serverless Land.

Build real-time applications with Amazon EventBridge and AWS AppSync

2024-01-30 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/build-real-time-applications-with-amazon-eventbridge-and-aws-appsync/

This post is written by Josh Kahn, Tech Leader, Serverless.

Amazon EventBridge now supports publishing events to AWS AppSync GraphQL APIs as native targets. The new integration enables builders to publish events easily to a wider variety of consumers and simplifies updating clients with near real-time data. You can use EventBridge and AWS AppSync to build resilient, subscription-based event-driven architectures across consumers.

To illustrate using EventBridge with AWS AppSync, consider a simplified airport operations scenario. In this example, airlines publish flight events (for example, boarding, push back, gate changes, and delays) to a service that maintains flight status on in-airport displays. Airlines also publish events that are useful to other entities at the airport, such as baggage handlers and maintenance, but not to passengers. This depicts a conceptual view of the system:

Passengers want the in-airport displays to be up-to-date and accurate. There are a number of ways to design the display application so that data remains up-to-date. Broadly, these include the application polling some API or the application subscribing to data changes.

Subscriptions are a better fit for this scenario because the data changes are small and incremental relative to the large amount of information displayed. For a delay, for example, the display updates only the status and departure time of a single flight within a larger list of flight information.

AWS AppSync can enable clients to listen for real-time data changes through the use of GraphQL subscriptions. These are implemented using a WebSocket connection between the client and the AWS AppSync service. The display application client invokes the GraphQL subscription operation to establish a secure connection. AWS AppSync will automatically push data changes (or mutations) via the GraphQL API to subscribers using that connection.

Previously, builders could use EventBridge API Destinations to wire events published and routed through EventBridge to AWS AppSync, as described in an earlier blog post, and available in Serverless Land patterns (API Key, OAuth). The approach is useful for dealing with “out-of-band” updates in which data changes outside of an AWS AppSync mutation. Out-of-band updates generally require a NONE data source in AWS AppSync to notify subscribers of changes, as described in the AWS re:Post Knowledge Center. The addition of AWS AppSync as a target for EventBridge simplifies these use cases as you can now trigger a mutation in response to an event without additional code.

Airport Operations Events

Expanding the scenario, airport operations events look like this:

{
  "flightNum": 123,
  "carrierCode": "JK",
  "date": "2024-01-25",
  "event": "FlightDelayed",
  "message": "Delayed 15 minutes, late aircraft",
  "info": "{ \"newDepTime\": \"2024-01-25T13:15:00Z\", \"delayMinutes\": 15 }"
}

The event field identifies the type of event and whether it is relevant to passengers. The message and info fields provide further details about the event, which vary by event type; note that info is itself a JSON-encoded string. The airport publishes a variety of events, but the airport displays only need a subset of those changes.
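Because info arrives as a JSON-encoded string rather than a nested object, consumers need to decode it before use. The following is a small Python sketch using the FlightDelayed sample above. The helper names are illustrative.

```python
import json
from typing import Optional


def parse_info(event: dict) -> dict:
    """Decode the JSON-encoded info field of an airport operations event."""
    return json.loads(event.get("info") or "{}")


def delay_minutes(event: dict) -> Optional[int]:
    """Return the delay for FlightDelayed events, or None for other event types."""
    if event.get("event") != "FlightDelayed":
        return None
    return parse_info(event).get("delayMinutes")
```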

AWS AppSync GraphQL APIs start with a GraphQL schema that defines the types, fields, and operations available in that API. AWS AppSync documentation provides an overview of schema and other GraphQL essentials. The partial GraphQL schema for the airport scenario is as follows:


type DelayEventInfo implements EventInfo {
	message: String
	delayMinutes: Int
	newDepTime: AWSDateTime
}

interface EventInfo {
	message: String
}

enum StatusEvent {
	FlightArrived
	FlightBoarding
	FlightCancelled
	FlightDelayed
	FlightGateChanged
	FlightLanded
	FlightPushBack
	FlightTookOff
}

type StatusUpdate {
	num: Int!
	carrier: String!
	date: AWSDate!
	event: StatusEvent!
	info: EventInfo
}

input StatusUpdateInput {
	num: Int!
	carrier: String!
	date: AWSDate!
	event: StatusEvent!
	message: String
	extra: AWSJSON
}

type Mutation {
	updateFlightStatus(input: StatusUpdateInput!): StatusUpdate!
}

type Query {
	listStatusUpdates(by: String): [StatusUpdate]
}

type Subscription {
	onFlightStatusUpdate(date: AWSDate, carrier: String): StatusUpdate
		@aws_subscribe(mutations: ["updateFlightStatus"])
}

schema {
	query: Query
	mutation: Mutation
	subscription: Subscription
}

Connect EventBridge to AWS AppSync

EventBridge allows you to filter, transform, and route events to a number of targets. The airport display service only needs events that directly impact passengers. You can define a rule in EventBridge that routes only those events (included in the preceding GraphQL schema) to the AWS AppSync target. Other events are routed elsewhere, as defined by other rules, or dropped. Details on creating EventBridge rules and the event pattern format can be found in the EventBridge documentation.

The previous flight delayed event would be delivered using EventBridge as follows:

{
  "id": "b051312994104931b0980d1ad1c5340f",
  "detail-type": "Operations: Flight delayed",
  "source": "airport-operations",
  "time": "2024-01-25T16:58:37Z",
  "detail": {
    "flightNum": 123,
    "carrierCode": "JK",
    "date": "2024-01-25",
    "event": "FlightDelayed",
    "message": "Delayed 15 minutes, late aircraft",
    "info": "{ \"newDepTime\": \"2024-01-25T13:15:00Z\", \"delayMinutes\": 15 }"
  }
}

In this scenario, there is a specific list of events of interest, but EventBridge provides a flexible set of operations to match patterns, inspect arrays, and filter by content using prefix, numerical, or other matching. Some organizations will also allow subscribers to define their own rules on an EventBridge event bus, allowing targets to subscribe to events via self-service.

The following event pattern matches on the events needed for the airport display service:

{
  "source": [ "airport-operations" ],
  "detail": {
    "event": [ "FlightArrived", "FlightBoarding", "FlightCancelled", ... ]
  }
}
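To reason about which events the rule forwards, it can help to model the matching in code. The following Python sketch implements only the subset of EventBridge pattern semantics this rule uses: nested objects are AND-ed, and each leaf list matches any of its exact values. It ignores prefix, numeric, and other content filters, as well as array-valued event fields. The event list is taken from the CloudFormation rule later in this post.

```python
PASSENGER_PATTERN = {
    "source": ["airport-operations"],
    "detail": {
        "event": ["FlightArrived", "FlightBoarding", "FlightCancelled", "FlightDelayed",
                  "FlightGateChanged", "FlightLanded", "FlightPushBack", "FlightTookOff"],
    },
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified EventBridge matching: every key must match; leaf lists are OR-ed."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True
```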

To create a new EventBridge rule, you can use the AWS Management Console or infrastructure as code. You can find the CloudFormation definition for the completed rule, with the AWS AppSync target, later in this post.

Create the AWS AppSync target

Now that EventBridge is configured to route selected events, define AWS AppSync as the target for the rule. The AWS AppSync API must support IAM authorization to be used as an EventBridge target. AWS AppSync supports multiple authorization types on a single GraphQL type, so you can also use OpenID Connect, Amazon Cognito User Pools, or other authorization methods as needed.

To configure AWS AppSync as an EventBridge target, define the target using the AWS Management Console or infrastructure as code. In the console, select the Target Type as “AWS Service” and Target as “AppSync.” Select your API. EventBridge parses the GraphQL schema and allows you to select the mutation to invoke when the rule is triggered.

When using the AWS Management Console, EventBridge will also configure the necessary AWS IAM role to invoke the selected mutation. Remember to create and associate a role with an appropriate trust policy when configuring with IaC.
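With IaC, the role needs a trust policy that lets EventBridge assume it and a permissions policy that allows the GraphQL operation. The following Python sketch builds both documents as dictionaries. It assumes the appsync:GraphQL action scoped to the updateFlightStatus mutation field. The Region, account ID, and API ID are placeholders.

```python
from typing import Tuple


def eventbridge_appsync_role_policies(region: str, account_id: str, api_id: str,
                                      mutation: str = "updateFlightStatus") -> Tuple[dict, dict]:
    """Return (trust policy, permissions policy) for the EventBridge target role."""
    trust = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "events.amazonaws.com"},  # EventBridge assumes the role
            "Action": "sts:AssumeRole",
        }],
    }
    permissions = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "appsync:GraphQL",
            # Uses the API ARN (apis/...), not the endpoint ARN used as the rule target.
            "Resource": f"arn:aws:appsync:{region}:{account_id}:apis/{api_id}"
                        f"/types/Mutation/fields/{mutation}",
        }],
    }
    return trust, permissions
```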

EventBridge supports input transformation to customize the contents of an event before passing the information as input to the target. Configure the input transformer to extract needed values from the event using JSON path and a template in the input format expected by the AWS AppSync API. EventBridge provides a handy utility in the Console to pass and test the output of a sample event.
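To check what the input transformer should produce, you might express the same mapping in Python. This sketch mirrors the InputPathsMap and InputTemplate of the CloudFormation rule later in this post. It does not reproduce how EventBridge itself evaluates templates.

```python
import json


def to_mutation_variables(eb_event: dict) -> dict:
    """Map an EventBridge event to the updateFlightStatus mutation variables."""
    detail = eb_event["detail"]
    info = detail["info"]
    json.loads(info)  # raises ValueError early if info is not valid JSON
    return {
        "input": {
            "num": detail["flightNum"],
            "carrier": detail["carrierCode"],
            "date": detail["date"],
            "event": detail["event"],
            "message": detail["message"],
            # info is already a JSON string, which suits the AWSJSON scalar.
            "extra": info,
        }
    }
```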

Finally, configure the selection set to include the response from the AWS AppSync API. These are the fields that will be returned to EventBridge when the mutation is invoked. While the result returned to EventBridge is not overly useful (aside from troubleshooting), the mutation selection set will also determine the fields available to subscribers to the onFlightStatusUpdate subscription.

Define the EventBridge to AWS AppSync rule in CloudFormation

Infrastructure as code templates, including AWS CloudFormation and AWS CDK, are useful for codifying infrastructure definitions to deploy across Regions and accounts. While you can write CloudFormation by hand, EventBridge provides a useful CloudFormation export in the AWS Management Console. You can use this feature to export the definition for a defined rule.

This is the CloudFormation for the previously configured rule and AWS AppSync target. This snippet includes both the rule definition and the target configuration.

PassengerEventsToDisplayServiceRule:
    Type: AWS::Events::Rule
    Properties:
      Description: Route passenger related events to the display service endpoint
      EventBusName: eb-to-appsync
      EventPattern:
        source:
          - airport-operations
        detail:
          event:
            - FlightArrived
            - FlightBoarding
            - FlightCancelled
            - FlightDelayed
            - FlightGateChanged
            - FlightLanded
            - FlightPushBack
            - FlightTookOff
      Name: passenger-events-to-display-service
      State: ENABLED
      Targets:
        - Id: 12344535353263463
          Arn: <AppSync API GraphQL API ARN>
          RoleArn: <EventBridge Role ARN (defined elsewhere)>
          InputTransformer:
            InputPathsMap:
              carrier: $.detail.carrierCode
              date: $.detail.date
              event: $.detail.event
              extra: $.detail.info
              message: $.detail.message
              num: $.detail.flightNum
            InputTemplate: |-
              {
                "input": {
                  "num": <num>,
                  "carrier": <carrier>,
                  "date": <date>,
                  "event": <event>,
                  "message": "<message>",
                  "extra": <extra>
                }
              }
          AppSyncParameters:
            GraphQLOperation: >-
              mutation
              UpdateFlightStatus($input:StatusUpdateInput!){updateFlightStatus(input:$input){
                event
                date
                carrier
                num
                info {
                  __typename
                  ... on DelayEventInfo {
                    message
                    delayMinutes
                    newDepTime
                  }
                }
              }}
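To see what this rule does with an incoming event, the following is a simplified local emulation in Python of the event pattern and input transformer above, applied to the flight-delay sample event used later in this post. It is a sketch for illustration, not EventBridge's implementation: it handles only exact-value matching and the simple dotted paths used here. The function names `matches`, `resolve`, and `transform` are hypothetical.

```python
import json

# Event pattern from the rule: each leaf lists the accepted values.
EVENT_PATTERN = {
    "source": ["airport-operations"],
    "detail": {"event": ["FlightArrived", "FlightBoarding", "FlightCancelled",
                         "FlightDelayed", "FlightGateChanged", "FlightLanded",
                         "FlightPushBack", "FlightTookOff"]},
}

# InputPathsMap and InputTemplate copied from the CloudFormation target.
INPUT_PATHS = {
    "carrier": "$.detail.carrierCode",
    "date": "$.detail.date",
    "event": "$.detail.event",
    "extra": "$.detail.info",
    "message": "$.detail.message",
    "num": "$.detail.flightNum",
}

INPUT_TEMPLATE = """{
  "input": {
    "num": <num>,
    "carrier": <carrier>,
    "date": <date>,
    "event": <event>,
    "message": "<message>",
    "extra": <extra>
  }
}"""


def matches(pattern: dict, event: dict) -> bool:
    """Return True if every field in the pattern matches the event (exact values only)."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


def resolve(path: str, event: dict):
    """Resolve a simple dotted JSON path such as $.detail.date."""
    assert path.startswith("$.")
    value = event
    for part in path[2:].split("."):
        value = value[part]
    return value


def transform(event: dict) -> dict:
    """Fill the template placeholders, then parse the result as JSON."""
    text = INPUT_TEMPLATE
    for name, path in INPUT_PATHS.items():
        value = resolve(path, event)
        # Quoted placeholder ("<message>"): insert the value as a JSON string.
        text = text.replace(f'"<{name}>"', json.dumps(str(value)))
        # Unquoted placeholder (<num>): insert the JSON-encoded value.
        text = text.replace(f"<{name}>", json.dumps(value))
    return json.loads(text)


sample_event = {
    "source": "airport-operations",
    "detail-type": "Operations: Flight delayed",
    "detail": {
        "flightNum": 123,
        "carrierCode": "JK",
        "date": "2024-01-25",
        "event": "FlightDelayed",
        "message": "Delayed 15 minutes, late aircraft",
        "info": "{ \"newDepTime\": \"2024-01-25T13:15:00Z\", \"delayMinutes\": 15 }",
    },
}

if matches(EVENT_PATTERN, sample_event):
    print(json.dumps(transform(sample_event), indent=2))
```

The printed object is the `$input` variable that the mutation in AppSyncParameters receives. Note that `extra` stays a JSON-encoded string, which is why the schema must accept it as a string-like type.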

The target ARN is the ARN of the AWS AppSync GraphQL endpoint, which follows the form arn:aws:appsync:<AWS_REGION>:<ACCOUNT_ID>:endpoints/graphql-api/<GRAPHQL_ENDPOINT_ID>. This ARN is available in CloudFormation (see the GraphQLEndpointArn return value), or you can construct it using the identifier found in the AWS AppSync GraphQL endpoint URL. Note that the ARN included in the EventBridge execution role policy is the AWS AppSync API ARN, which is a different ARN.
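As a sketch of constructing the target ARN by hand, the following Python helper extracts the endpoint identifier and Region from a GraphQL endpoint URL. It assumes the default endpoint format `https://<GRAPHQL_ENDPOINT_ID>.appsync-api.<AWS_REGION>.amazonaws.com/graphql` and does not handle custom domain names; the function name is hypothetical.

```python
from urllib.parse import urlparse


def appsync_endpoint_arn(graphql_url: str, account_id: str) -> str:
    """Build the EventBridge target ARN from a default AppSync GraphQL endpoint URL."""
    host = urlparse(graphql_url).hostname or ""
    parts = host.split(".")
    # Expected host: <endpoint-id>.appsync-api.<region>.amazonaws.com
    if len(parts) < 4 or parts[1] != "appsync-api":
        raise ValueError(f"not a default AppSync GraphQL endpoint: {graphql_url}")
    endpoint_id, region = parts[0], parts[2]
    return f"arn:aws:appsync:{region}:{account_id}:endpoints/graphql-api/{endpoint_id}"


print(appsync_endpoint_arn(
    "https://abc123example.appsync-api.us-east-1.amazonaws.com/graphql",
    "111122223333",
))
# → arn:aws:appsync:us-east-1:111122223333:endpoints/graphql-api/abc123example
```

When the API is defined in CloudFormation, reading the GraphQLEndpointArn attribute is simpler and avoids parsing the URL at all.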

The AppSyncParameters field includes the GraphQL operation for EventBridge to invoke on the AWS AppSync API. This must be well formatted and match the GraphQL schema. Include any fields that must be available to subscribers in the selection set.

Testing subscriptions

AWS AppSync is now configured as a target for the EventBridge rule. The real-life display application would use a GraphQL library, such as AWS Amplify, to subscribe to real-time data changes. The AWS Management Console provides a useful utility to test. Navigate to the AWS AppSync console and select Queries in the menu for your API. Enter the following query and choose Run to subscribe for data changes:

subscription MySubscription {
  onFlightStatusUpdate {
    carrier
    date
    event
    num
    info {
      __typename
      ... on DelayEventInfo {
        message
        delayMinutes
        newDepTime
      }
    }
  }
}

In a separate browser tab, navigate to the EventBridge console and choose Send events. On the Send events page, select the required event bus and set the Event source to “airport-operations”. Then enter a detail type of your choice, for example “Operations: Flight delayed”. Finally, paste the value of the detail field (the JSON object) from the following example event as the Event detail, then choose Send.

{
  "id": "b051312994104931b0980d1ad1c5340f",
  "detail-type": "Operations: Flight delayed",
  "source": "airport-operations",
  "time": "2024-01-25T16:58:37Z",
  "detail": {
    "flightNum": 123,
    "carrierCode": "JK",
    "date": "2024-01-25",
    "event": "FlightDelayed",
    "message": "Delayed 15 minutes, late aircraft",
    "info": "{ \"newDepTime\": \"2024-01-25T13:15:00Z\", \"delayMinutes\": 15 }"
  }
}
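You can also send the same event from code instead of the console. The following Python sketch builds a PutEvents entry from the detail object above and sends it with the AWS SDK for Python (Boto3). It assumes the bus name eb-to-appsync from the CloudFormation rule and configured AWS credentials; `build_entry` and `send` are hypothetical helper names.

```python
import json


def build_entry(detail: dict, bus_name: str = "eb-to-appsync") -> dict:
    """Build one PutEvents entry equivalent to the console Send events form."""
    return {
        "EventBusName": bus_name,
        "Source": "airport-operations",
        "DetailType": "Operations: Flight delayed",
        # PutEvents expects Detail as a JSON-encoded string, not a dict.
        "Detail": json.dumps(detail),
    }


def send(detail: dict) -> int:
    """Send the event with Boto3 and return the number of failed entries."""
    import boto3  # deferred so build_entry works without the SDK installed

    response = boto3.client("events").put_events(Entries=[build_entry(detail)])
    return response["FailedEntryCount"]


delay_detail = {
    "flightNum": 123,
    "carrierCode": "JK",
    "date": "2024-01-25",
    "event": "FlightDelayed",
    "message": "Delayed 15 minutes, late aircraft",
    "info": "{ \"newDepTime\": \"2024-01-25T13:15:00Z\", \"delayMinutes\": 15 }",
}
```

Calling `send(delay_detail)` should produce the same subscription update as the console, and a return value of 0 means EventBridge accepted the entry.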

Return to the AWS AppSync tab in your browser to see the changed data in the result pane.

Conclusion

Directly invoking AWS AppSync GraphQL API targets from EventBridge simplifies and streamlines integration between these two services, ideal for notifying a variety of subscribers of data changes in event-driven workloads. You can also take advantage of other features available from the two services. For example, use AWS AppSync enhanced subscription filtering to update only airport displays in the terminal in which they are located.

To learn more about serverless, visit Serverless Land for a wide array of reusable patterns, tutorials, and learning materials. Newly added to the pattern library is an EventBridge to AWS AppSync pattern similar to the one described in this post. Visit the EventBridge documentation for more details.

For more serverless learning resources, visit Serverless Land.

Invoking on-premises resources interactively using AWS Step Functions and MQTT

2024-01-30 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/invoking-on-premises-resources-interactively-using-aws-step-functions-and-mqtt/

This post is written by Alex Paramonov, Sr. Solutions Architect, ISV, and Pieter Prinsloo, Customer Solutions Manager.

Workloads in AWS sometimes require access to data stored in on-premises databases and storage locations. Traditional solutions to establish connectivity to the on-premises resources require inbound rules to firewalls, a VPN tunnel, or public endpoints.

This blog post demonstrates how to use the MQTT protocol (AWS IoT Core) with AWS Step Functions to dispatch jobs to on-premises workers to access or retrieve data stored on-premises. The state machine can communicate with the on-premises workers without opening inbound ports or the need for public endpoints on on-premises resources. Workers can run behind Network Address Translation (NAT) routers while keeping bidirectional connectivity with the AWS Cloud. This provides a more secure and cost-effective way to access data stored on-premises.

Overview

By using Step Functions with AWS Lambda and AWS IoT Core, you can access data stored on-premises securely without altering the existing network configuration.

AWS IoT Core lets you connect IoT devices and route messages to AWS services without managing infrastructure. By using a Docker container image running on-premises as a proxy IoT Thing, you can take advantage of AWS IoT Core’s fully managed MQTT message broker for non-IoT use cases.

MQTT subscribers receive information via MQTT topics. An MQTT topic acts as a matching mechanism between publishers and subscribers. Conceptually, an MQTT topic behaves like an ephemeral notification channel. You can create topics at scale with virtually no limit to the number of topics. In SaaS applications, for example, you can create topics per tenant. To learn more, see the AWS whitepaper on designing MQTT topics for AWS IoT Core.
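To make the matching mechanism concrete, the following Python sketch implements the standard MQTT topic filter rules: `+` matches exactly one topic level, and `#`, valid only as the last level, matches the parent level and everything below it. The per-tenant layout `tenants/<tenant-id>/jobs` is a hypothetical example, and special cases such as topics starting with `$` are not handled.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic name matches a subscription filter."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # '#' matches the remaining levels, but only as the final filter level.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False  # the filter is longer than the topic
        if level != "+" and level != topic_levels[i]:
            return False  # literal level mismatch ('+' accepts any single level)
    return len(filter_levels) == len(topic_levels)


# Hypothetical per-tenant layout: each worker subscribes only to its own subtree.
print(topic_matches("tenants/acme/jobs/#", "tenants/acme/jobs/42"))    # → True
print(topic_matches("tenants/acme/jobs/#", "tenants/globex/jobs/42"))  # → False
print(topic_matches("tenants/+/jobs", "tenants/globex/jobs"))          # → True
```

Because each tenant's filter only covers its own subtree, combining a per-tenant topic layout with a per-tenant policy keeps one tenant's worker from seeing another tenant's jobs.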

The following reference architecture uses the AWS Serverless Application Model (AWS SAM) for deployment, Step Functions to orchestrate the workflow, AWS Lambda to send and receive on-premises messages, and AWS IoT Core to provide the MQTT message broker, certificate and policy management, and publish/subscribe topics. The workflow runs as follows:

Start the state machine, either “on demand” or on a schedule.
The state: “Lambda: Invoke Dispatch Job to On-Premises” publishes a message to an MQTT message broker in AWS IoT Core.
The message broker sends the message to the topic corresponding to the worker (tenant) in the on-premises container that runs the job.
The on-premises container receives the message and starts work execution. Authentication is done using client certificates and the attached policy limits the worker access to only the tenant’s topic.
The worker in the on-premises container can access local resources such as databases or storage locations.
The on-premises container sends the results and job status back to another MQTT topic.
The AWS IoT Core rule invokes the “TaskToken Done” Lambda function.
The Lambda function submits the results to Step Functions via the SendTaskSuccess or SendTaskFailure API.

Deploying and testing the sample

Ensure you can manage AWS resources from your terminal and that:

Latest versions of AWS CLI and AWS SAM CLI are installed.
You have an AWS account. If not, visit this page.
Your user has sufficient permissions to manage AWS resources.
Git is installed.
Python version 3.11 or greater is installed.
Docker is installed.

You can access the GitHub repository here and follow these steps to deploy the sample.

The aws-resources directory contains the required AWS resources including the state machine, Lambda functions, topics, and policies. The directory on-prem-worker contains the Docker container image artifacts. Use it to run the on-premises worker locally.

In this example, the worker container adds two numbers, provided as an input in the following format:

{
  "a": 15,
  "b": 42
}

In a real-world scenario, you can substitute this operation with business logic. For example, retrieving data from on-premises databases, generating aggregates, and then submitting the results back to your state machine.
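The following Python sketch shows the shape of such a worker-side job handler. It assumes the job message carries the same "Input" and "TaskToken" keys as the Payload in the state machine task and that the dispatching Lambda function forwards them unchanged; the result message layout (`taskToken`, `status`, `output`) and the function names are hypothetical. Replace `add_numbers` with your own business logic.

```python
import json


def add_numbers(job_input: dict) -> int:
    """Stand-in business logic: add the two numbers from the job input."""
    return job_input["a"] + job_input["b"]


def handle_job_message(raw_message: bytes) -> dict:
    """Run one job and build the status message the worker publishes back.

    The task token is always echoed so the cloud side can resume the
    correct state machine execution, whether the job succeeds or fails.
    """
    message = json.loads(raw_message)
    token = message["TaskToken"]
    try:
        result = add_numbers(message["Input"])
        return {"taskToken": token, "status": "SUCCESS", "output": {"sum": result}}
    except (KeyError, TypeError) as err:
        return {"taskToken": token, "status": "FAILURE",
                "error": type(err).__name__, "cause": str(err)}


job = json.dumps({"Input": {"a": 15, "b": 42}, "TaskToken": "example-token"}).encode()
print(handle_job_message(job))
```

Echoing the token on both the success and failure paths matters: without it, the state machine execution has no way to resume and waits until its timeout.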

Follow these steps to test the sample end-to-end.

Using AWS IoT Core without IoT devices

There are no IoT devices in the example use case. However, the fully managed MQTT message broker in AWS IoT Core lets you route messages to AWS services without managing infrastructure.

AWS IoT Core authenticates clients using X.509 client certificates. You can attach a policy to a client certificate allowing the client to publish and subscribe only to certain topics. This approach does not require IAM credentials inside the worker container on-premises.
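As an illustration of such a policy, the following Python sketch builds an AWS IoT Core policy document that lets a worker connect with a client ID equal to its tenant ID, subscribe to and receive from its own job topic, and publish only to its own result topic. The topic names `tenants/<tenant-id>/jobs` and `tenants/<tenant-id>/results` and the function name are hypothetical; note that iot:Subscribe is granted on a `topicfilter` resource, while iot:Receive and iot:Publish use `topic` resources.

```python
import json


def worker_policy(region: str, account_id: str, tenant_id: str) -> dict:
    """Least-privilege AWS IoT Core policy for one tenant's worker (sketch)."""
    arn = f"arn:aws:iot:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Connect only with a client ID equal to the tenant ID.
                "Effect": "Allow",
                "Action": "iot:Connect",
                "Resource": f"{arn}:client/{tenant_id}",
            },
            {   # Subscribe to the tenant's job topic filter ...
                "Effect": "Allow",
                "Action": "iot:Subscribe",
                "Resource": f"{arn}:topicfilter/tenants/{tenant_id}/jobs",
            },
            {   # ... and receive messages published on that topic.
                "Effect": "Allow",
                "Action": "iot:Receive",
                "Resource": f"{arn}:topic/tenants/{tenant_id}/jobs",
            },
            {   # Publish results only to the tenant's result topic.
                "Effect": "Allow",
                "Action": "iot:Publish",
                "Resource": f"{arn}:topic/tenants/{tenant_id}/results",
            },
        ],
    }


print(json.dumps(worker_policy("us-east-1", "111122223333", "acme"), indent=2))
```

You attach a document like this to the worker's X.509 certificate, so the certificate alone determines which topics the on-premises container can use.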

AWS IoT Core’s security, cost efficiency, managed infrastructure, and scalability make it a good fit for many hybrid applications beyond typical IoT use cases.

Dispatching jobs from Step Functions and waiting for a response

When a state machine reaches the state that dispatches the job to an on-premises worker, the execution pauses and waits until the job finishes. Step Functions supports three integration patterns: Request-Response, Run a Job (.sync), and Wait for a Callback with Task Token. The sample uses the “Wait for a Callback with Task Token” integration, which allows the state machine to pause and wait for a callback for up to 1 year.

When the on-premises worker completes the job, it publishes a message to the topic in AWS IoT Core. A rule in AWS IoT Core then invokes a Lambda function, which sends the result back to the state machine by calling either SendTaskSuccess or SendTaskFailure API in Step Functions.

If the job freezes and the SendTaskFailure API is never called, the task keeps waiting until its TimeoutSeconds elapses. To detect a stalled job sooner, add HeartbeatSeconds to the task in the Amazon States Language (ASL). Heartbeats from the worker then reach Step Functions through the SendTaskHeartbeat API call; if no heartbeat arrives within HeartbeatSeconds, the task fails with a States.HeartbeatTimeout error. HeartbeatSeconds must be less than the specified TimeoutSeconds.
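The following Python sketch builds a callback Task state with both timeouts set and rejects a heartbeat interval that is not smaller than the overall timeout. The function name is hypothetical, and "End": true is added only so the state is complete on its own; in a real workflow you might use "Next" instead.

```python
import json


def callback_task_state(function_arn: str, timeout_s: int, heartbeat_s: int) -> dict:
    """Build an ASL Task state that waits for a task token, with timeouts."""
    if not 0 < heartbeat_s < timeout_s:
        raise ValueError("HeartbeatSeconds must be positive and less than TimeoutSeconds")
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
        "Parameters": {
            "FunctionName": function_arn,
            "Payload": {"Input.$": "$", "TaskToken.$": "$$.Task.Token"},
        },
        # Fail with States.Timeout if the whole job takes longer than this ...
        "TimeoutSeconds": timeout_s,
        # ... or with States.HeartbeatTimeout if heartbeats stop arriving.
        "HeartbeatSeconds": heartbeat_s,
        "End": True,
    }


state = callback_task_state("${LambdaNotifierToWorkerArn}", timeout_s=3600, heartbeat_s=300)
print(json.dumps(state, indent=2))
```

With these example values, a worker that stops responding is detected within about five minutes instead of after the full hour.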

To create a task in ASL for your state machine, which waits for a callback token, use the following code:

{
  "Type": "Task",
  "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
  "Parameters": {
    "FunctionName": "${LambdaNotifierToWorkerArn}",
    "Payload": {
      "Input.$": "$",
      "TaskToken.$": "$$.Task.Token"
    }
  }
}

The .waitForTaskToken suffix indicates that the task must wait for the callback. The state machine generates a unique callback token, accessible via the $$.Task.Token built-in variable, and passes it as an input to the Lambda function defined in FunctionName.

The Lambda function then sends the token to the on-premises worker via an AWS IoT Core topic.
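A minimal sketch of such a notifier function in Python follows. It receives the task Payload ("Input" and "TaskToken"), and publishes both to the worker's topic with the Boto3 `iot-data` client's `publish` call. The topic name `tenants/acme/jobs` and the `JOB_TOPIC` environment variable are hypothetical, and the optional `iot_client` parameter exists only so the handler can be exercised locally with a stand-in client.

```python
import json
import os

# Hypothetical topic name; the sample repository may use a different layout.
JOB_TOPIC = os.environ.get("JOB_TOPIC", "tenants/acme/jobs")


def lambda_handler(event, context, iot_client=None):
    """Publish the state input and task token to the worker's job topic.

    `event` is the Payload from the Task state: {"Input": ..., "TaskToken": ...}.
    """
    if iot_client is None:
        import boto3  # included in the Lambda Python runtime
        # Depending on Region and SDK version, you may need to pass
        # endpoint_url with your account's AWS IoT data endpoint.
        iot_client = boto3.client("iot-data")
    message = {"Input": event["Input"], "TaskToken": event["TaskToken"]}
    iot_client.publish(topic=JOB_TOPIC, qos=1, payload=json.dumps(message).encode("utf-8"))
    # Returning does not resume the state machine; only the callback does.
    return {"dispatched": True, "topic": JOB_TOPIC}
```

Because the task uses .waitForTaskToken, the function's own return value is ignored for the state's output; the execution resumes only when a later SendTaskSuccess or SendTaskFailure call presents the same token.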

Lambda is not the only service that supports the Wait for Callback integration; see the Step Functions documentation for the full list of supported services.

In addition to dispatching tasks and getting the result back, you can implement progress tracking and shutdown mechanisms. To track progress, the worker sends metrics via a separate topic.

Depending on your current implementation, you have several options:

Storing progress data from the worker in Amazon DynamoDB and visualizing it via REST API calls to a Lambda function, which reads from the DynamoDB table. The AWS IoT Core tutorial on storing device data in a DynamoDB table shows how to write data to DynamoDB directly from the topic.
For a reactive user experience, create a rule to invoke a Lambda function when new progress data arrives. Open a WebSocket connection to your backend. The Lambda function sends progress data via WebSocket directly to the frontend.
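For the DynamoDB option, the following Python sketch builds the payload for an AWS IoT Core topic rule that writes every progress message to a table through the `dynamoDBv2` action. It assumes a hypothetical `tenants/<tenant-id>/progress` topic layout and a table whose partition key is `tenantId` and sort key is `ts`; the SQL adds both attributes with the `topic()` and `timestamp()` functions. The rule name and helper names are also hypothetical, and the role must allow dynamodb:PutItem on the table.

```python
def progress_rule_payload(table_name: str, role_arn: str) -> dict:
    """topicRulePayload for an AWS IoT rule that stores progress messages in DynamoDB."""
    return {
        # topic(2) extracts the tenant segment from tenants/<tenant-id>/progress;
        # timestamp() adds the broker's receive time in milliseconds.
        "sql": "SELECT *, topic(2) AS tenantId, timestamp() AS ts "
               "FROM 'tenants/+/progress'",
        "awsIotSqlVersion": "2016-03-23",
        "ruleDisabled": False,
        "actions": [
            # dynamoDBv2 writes each top-level attribute of the result as a column.
            {"dynamoDBv2": {"roleArn": role_arn, "putItem": {"tableName": table_name}}}
        ],
    }


def create_progress_rule(table_name: str, role_arn: str) -> None:
    """Create the rule with Boto3 (requires AWS credentials)."""
    import boto3  # deferred so the payload builder works without the SDK

    boto3.client("iot").create_topic_rule(
        ruleName="job_progress_to_dynamodb",
        topicRulePayload=progress_rule_payload(table_name, role_arn),
    )
```

Writing directly from the rule avoids running a Lambda function for every progress message; a Lambda function is then needed only on the read path.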

To implement a shutdown mechanism, you can run jobs in separate threads on your worker and subscribe to the topic to which your state machine publishes shutdown messages. If a shutdown message arrives, end the job thread on the worker and send back the status, including the callback token of the task.
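Python threads cannot be killed from outside, so one way to "end the job thread" is cooperative cancellation: the job checks a `threading.Event` between units of work. The following sketch assumes a hypothetical `JobRunner` class whose `on_shutdown_message` method you would call from the MQTT callback for the shutdown topic; the returned status includes the task token so it can be published back.

```python
import threading
import time


class JobRunner:
    """Run one job in a background thread and stop it on a shutdown message."""

    def __init__(self, task_token: str):
        self.task_token = task_token
        self.stop_requested = threading.Event()
        self.status = None
        self.thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Stand-in for a long job: small units of work with a shutdown check between them.
        for _ in range(100):
            if self.stop_requested.is_set():
                self.status = {"taskToken": self.task_token, "status": "CANCELLED"}
                return
            time.sleep(0.05)
        self.status = {"taskToken": self.task_token, "status": "SUCCESS"}

    def start(self):
        self.thread.start()

    def on_shutdown_message(self) -> dict:
        """Call from the MQTT callback for the shutdown topic."""
        self.stop_requested.set()
        self.thread.join()   # wait for the job to reach its next checkpoint
        return self.status   # publish this back, including the task token


runner = JobRunner("example-token")
runner.start()
print(runner.on_shutdown_message())
```

How quickly a job stops depends on how often it checks the event, so long-running steps (such as a database query) should be split into smaller units or given their own timeouts.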

Using AWS IoT Core Rules and Lambda Functions

A message with job results from the worker does not go to the Step Functions API directly. Instead, an AWS IoT Core rule and a dedicated Lambda function forward the status message to Step Functions. This allows for more granular permissions in AWS IoT Core policies, which improves security because the worker container can only publish and subscribe to specific topics. No IAM credentials exist on-premises.

The Lambda function’s execution role contains the permissions for SendTaskSuccess, SendTaskHeartbeat, and SendTaskFailure API calls only.
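A minimal Python sketch of this relay function follows. It assumes the AWS IoT Core rule passes the worker's JSON status message as the event and that the message has hypothetical `taskToken`, `status`, `output`, `error`, and `cause` fields; it then calls only the three Step Functions APIs named above. The optional `sfn_client` parameter exists so the handler can be tested locally with a stand-in client.

```python
import json


def lambda_handler(event, context, sfn_client=None):
    """Relay a worker status message (delivered by an AWS IoT Core rule) to Step Functions."""
    if sfn_client is None:
        import boto3  # included in the Lambda Python runtime
        sfn_client = boto3.client("stepfunctions")
    token = event["taskToken"]
    status = event.get("status")
    if status == "SUCCESS":
        # output must be a JSON-encoded string; it becomes the task's result.
        sfn_client.send_task_success(taskToken=token, output=json.dumps(event.get("output", {})))
    elif status == "HEARTBEAT":
        # Resets the HeartbeatSeconds timer without completing the task.
        sfn_client.send_task_heartbeat(taskToken=token)
    else:
        sfn_client.send_task_failure(
            taskToken=token,
            error=event.get("error", "WorkerError"),
            cause=event.get("cause", f"Unexpected status: {status}"),
        )
    return {"relayed": status}
```

Treating any unknown status as a failure is a deliberate choice in this sketch: it surfaces malformed worker messages as errors in the state machine instead of leaving the execution waiting for a timeout.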

Alternatively, a worker can call the Step Functions API directly, which removes the need for an AWS IoT Core topic, a rule, and a Lambda function to relay results. This approach requires IAM credentials inside the worker’s container. You can use AWS Identity and Access Management Roles Anywhere to obtain temporary security credentials. As your worker’s functionality evolves over time, you can add further AWS API calls by adding permissions to the IAM role.

Cleaning up

The services used in this solution are eligible for the AWS Free Tier. To clean up the resources, run the following command in the aws-resources/ directory of the repository:

sam delete

This removes all resources provisioned by the template.yml file.

To remove the client certificate from AWS, navigate to AWS IoT Core Certificates and delete the certificate, which you added during the manual deployment steps.

Lastly, stop the Docker container on-premises and remove it:

docker rm --force mqtt-local-client

Finally, remove the container image:

docker rmi mqtt-client-waitfortoken

Conclusion

Accessing on-premises resources with workers controlled via Step Functions using MQTT and AWS IoT Core is a secure, reactive, and cost-effective way to run on-premises jobs. Consider moving your hybrid workloads from inefficient polling or schedulers to the reactive approach described in this post. This offers an improved user experience with fast dispatching and tracking of jobs outside the cloud.

For more serverless learning resources, visit Serverless Land.

Consuming private Amazon API Gateway APIs using mutual TLS

2024-01-23 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/consuming-private-amazon-api-gateway-apis-using-mutual-tls/

This post is written by Thomas Moore, Senior Solutions Architect and Josh Hart, Senior Solutions Architect.

A previous blog post explores using Amazon API Gateway to create private REST APIs that can be consumed across different AWS accounts inside a virtual private cloud (VPC). Private cross-account APIs are useful for independent software vendors (ISVs) and SaaS companies providing secure connectivity for customers, and for organizations building internal APIs and backend microservices.

Mutual TLS (mTLS) is an advanced security protocol that provides two-way authentication via certificates between a client and server. mTLS requires the client to send an X.509 certificate to prove its identity when making a request, together with the default server certificate verification process. This ensures that both parties are who they claim to be.

The mTLS connection process works as follows:

Client connects to the server.
Server presents its certificate, which is verified by the client.
Client presents its certificate, which is verified by the server.
Encrypted TLS connection established.

Customers use mTLS because it offers stronger security and identity verification than standard TLS connections. mTLS helps prevent man-in-the-middle attacks and protects against threats such as impersonation attempts, data interception, and tampering. As threats become more advanced, mTLS provides an extra layer of defense to validate connections.

Implementing mTLS increases overhead for certificate management, but for applications transmitting valuable or sensitive data, the extra security is important. If security is a priority for your systems and users, you should consider deploying mTLS.

Regional API Gateway endpoints have native support for mTLS, but private API Gateway endpoints do not, so you must terminate mTLS before the request reaches API Gateway. The previous blog post shows how to build private mTLS APIs using a self-managed verification process inside a container running an NGINX proxy. Since then, Application Load Balancer (ALB) has added native mTLS support, simplifying the architecture.

This post explores building mTLS private APIs using this new feature.

Application Load Balancer mTLS configuration

You can enable mutual authentication (mTLS) on a new or existing Application Load Balancer. By enabling mTLS on the load balancer listener, clients are required to present trusted certificates to connect. The load balancer validates the certificates before allowing requests to the backends.

There are two options available when configuring mTLS on the Application Load Balancer: Passthrough mode and Verify with trust store mode.

In Passthrough mode, the client certificate chain is passed as an X-Amzn-Mtls-Clientcert HTTP header for the application to inspect for authorization. In this scenario, there is still a backend verification process. The benefit in adding the ALB to the architecture is that you can perform application (layer 7) routing, such as path-based routing, allowing more complex application routing configurations.
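In Passthrough mode the backend must decode the header itself. The following Python sketch assumes the header carries the client certificate chain as URL-encoded PEM (check the Elastic Load Balancing mTLS documentation for the exact encoding) and converts each certificate to DER bytes using only the standard library. It uses `unquote` rather than `unquote_plus` so that `+` characters in the base64 text survive. The function name is hypothetical, and the sketch performs no validation: your application still has to verify the chain before making an authorization decision.

```python
import re
import ssl
from typing import List
from urllib.parse import unquote

# Matches each PEM certificate block in the decoded chain.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.+?-----END CERTIFICATE-----", re.DOTALL
)


def certs_from_header(header_value: str) -> List[bytes]:
    """Decode an X-Amzn-Mtls-Clientcert value into DER-encoded certificates (leaf first)."""
    pem_chain = unquote(header_value)  # not unquote_plus: '+' is part of base64
    return [ssl.PEM_cert_to_DER_cert(block) for block in PEM_BLOCK.findall(pem_chain)]
```

To inspect fields such as the subject or serial number, you could pass each DER value to a certificate parser, for example `x509.load_der_x509_certificate` from the third-party cryptography package.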

In Verify with trust store mode, the load balancer validates the client certificate and only allows clients providing trusted certificates to connect. This simplifies the management and reduces load on backend applications.

This example uses AWS Private Certificate Authority but the steps are similar for third-party certificate authorities (CA).

To configure the certificate Trust Store for the ALB:

Create an AWS Private Certificate Authority. Set the Common Name (CN) to the domain that hosts the application (for example, api.example.com).
Export the CA using either the CLI or the Console and upload the resulting Certificate.pem to an Amazon S3 bucket.
Create a trust store and point it at the certificate uploaded in the previous step.
Update the listener of your Application Load Balancer to use this trust store and select the required mTLS verification behavior.
Generate certificates for the client application against the private certificate authority, for example using the following commands:

openssl req -new -newkey rsa:2048 -days 365 -keyout my_client.key -out my_client.csr

aws acm-pca issue-certificate --certificate-authority-arn arn:aws:acm-pca:us-east-1:111122223333:certificate-authority/certificate_authority_id --csr fileb://my_client.csr --signing-algorithm "SHA256WITHRSA" --validity Value=365,Type="DAYS" --template-arn arn:aws:acm-pca:::template/EndEntityCertificate/V1

aws acm-pca get-certificate --certificate-authority-arn arn:aws:acm-pca:us-east-1:111122223333:certificate-authority/certificate_authority_id --certificate-arn arn:aws:acm-pca:us-east-1:account_id:certificate-authority/certificate_authority_id/certificate/certificate_id --output text

For more details on this part of the process, see Use ACM Private CA for Amazon API Gateway Mutual TLS.

Private API Gateway mTLS verification using an ALB

Using the ALB Verify with trust store mode together with API Gateway can enable private APIs with mTLS, without the operational burden of a self-managed proxy service.

You can use this pattern to access API Gateway in the same AWS account, or cross-account.

The same account pattern allows clients inside the VPC to consume the private API Gateway by calling the Application Load Balancer URL. The ALB is configured to verify the provided client certificate against the trust store before passing the request to the API Gateway.

If the certificate is invalid, the API never receives the request. A resource policy on the API Gateway ensures that requests are only allowed via the VPC endpoint, and a security group on the VPC endpoint ensures that it only receives requests from the ALB. This prevents the client from bypassing mTLS by invoking the API Gateway or VPC endpoints directly.
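A resource policy of this kind typically pairs an explicit Deny for any request whose `aws:SourceVpce` does not match the expected endpoint with a general Allow. The following Python sketch builds such a policy document; the endpoint ID and the function name are hypothetical placeholders, and the security group restriction to the ALB is configured separately on the VPC endpoint.

```python
import json


def private_api_policy(vpce_id: str) -> dict:
    """API Gateway resource policy that only admits requests through one VPC endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Explicit deny wins over the allow below for any other source.
                "Effect": "Deny",
                "Principal": "*",
                "Action": "execute-api:Invoke",
                "Resource": "execute-api:/*",
                "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
            },
            {
                "Effect": "Allow",
                "Principal": "*",
                "Action": "execute-api:Invoke",
                "Resource": "execute-api:/*",
            },
        ],
    }


print(json.dumps(private_api_policy("vpce-0123456789abcdef0"), indent=2))
```

Because an explicit Deny always overrides an Allow in IAM policy evaluation, the broad Allow statement does not open the API to callers outside the endpoint.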

The cross-account pattern using AWS PrivateLink provides the ability to connect to the ALB endpoint securely across accounts and across VPCs. It avoids the need to connect VPCs together using VPC Peering or AWS Transit Gateway and enables software vendors to deliver SaaS services to be consumed by their end customers. This pattern is available to deploy as sample code in the GitHub repository.

The flow of a client request through the cross-account architecture is as follows:

A client in the consumer application sends a request to the producer API endpoint.
The request is routed via AWS PrivateLink to a Network Load Balancer in the producer account. AWS PrivateLink endpoint services require a Network Load Balancer.
The Network Load Balancer uses an Application Load Balancer-type Target Group.
The Application Load Balancer listener is configured for mTLS in verify with trust store mode.
An authorization decision is made comparing the client certificate to the chain in the certificate trust store.
If the client certificate is allowed, the request is routed to the API Gateway via the execute-api VPC endpoint. An API Gateway resource policy allows connections only via the VPC endpoint.
Any additional API Gateway authentication and authorization is performed, such as using a Lambda authorizer to validate a JSON Web Token (JWT).

Using the example deployed from the GitHub repo, this is the expected response from a successful request with a valid certificate:

curl --key my_client.key --cert my_client.pem https://api.example.com/widgets

{"id":"1","value":"4.99"}

When passing an invalid certificate, the following response is received:

curl: (35) Recv failure: Connection reset by peer
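For clients written in Python, the following standard-library sketch makes the same request as the curl command. It assumes the post's placeholder domain api.example.com and the my_client.pem and my_client.key files generated earlier; the function names are hypothetical. Server certificate verification stays enabled, and the default system trust store applies because the sample's internal ALB uses a public ACM certificate.

```python
import http.client
import ssl
from typing import Optional


def mtls_context(cert_file: str, key_file: str, ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a client TLS context that verifies the server and presents a client certificate."""
    # create_default_context keeps certificate and hostname verification on;
    # ca_file=None uses the system trust store.
    context = ssl.create_default_context(cafile=ca_file)
    # The client certificate and key issued by the private CA (curl --cert / --key).
    context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context


def get_widgets(host: str = "api.example.com") -> str:
    """Equivalent of the curl request; api.example.com is a placeholder domain."""
    conn = http.client.HTTPSConnection(host, context=mtls_context("my_client.pem", "my_client.key"))
    try:
        conn.request("GET", "/widgets")
        return conn.getresponse().read().decode("utf-8")
    finally:
        conn.close()
```

When the ALB rejects the client certificate, the failure typically surfaces in Python as a `ConnectionResetError` or `ssl.SSLError` rather than an HTTP status code, mirroring curl's connection reset.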

Custom domain names

An additional benefit to implementing the mTLS solution with an Application Load Balancer is support for private custom domain names. Private API Gateway endpoints do not support custom domain names currently. But in this case, clients first connect to an ALB endpoint, which does support a custom domain. The sample code implements private custom domains using a public AWS Certificate Manager (ACM) certificate on the internal ALB, and an Amazon Route 53 hosted DNS zone. This allows you to provide a static URL to consumers so that if the API Gateway is replaced the consumer does not need to update their code.

Certificate revocation list

Optionally, as another layer of security, you can also configure a certificate revocation list for a trust store on the ALB. Revocation lists allow you to revoke and invalidate issued certificates before their expiry date. You can use this feature to off-board customers or deny compromised credentials, for example.

You can add the certificate revocation list to a new or existing trust store. The list is provided via an Amazon S3 URI as a PEM formatted file.

Conclusion

This post explores ways to provide mutual TLS authentication for private API Gateway endpoints. A previous post shows how to achieve this using a self-managed NGINX proxy. This post simplifies the architecture by using the native mTLS support now available for Application Load Balancers.

This new pattern centralizes authentication at the edge, streamlines deployment, and minimizes operational overhead compared to self-managed verification. AWS Private Certificate Authority and certificate revocation lists integrate with managed credentials and security policies. This makes it easier to expose private APIs safely across accounts and VPCs.

Mutual authentication and progressive security controls are growing in importance when architecting secure cloud-based workloads. To get started, visit the GitHub repository.

For more serverless learning resources, visit Serverless Land.

Python 3.12 runtime now available in AWS Lambda

2023-12-14 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/python-3-12-runtime-now-available-in-aws-lambda/

This post is written by Jeff Gebhart, Sr. Specialist TAM, Serverless.

AWS Lambda now supports Python 3.12 as both a managed runtime and container base image. Python 3.12 builds on the performance enhancements that were first released with Python 3.11, and adds a number of performance and language readability features in the interpreter. With this release, Python developers can now take advantage of these new features and enhancements when creating serverless applications on AWS Lambda.

You can use Python 3.12 with Powertools for AWS Lambda (Python), a developer toolkit to implement Serverless best practices such as observability, batch processing, Parameter Store integration, idempotency, feature flags, CloudWatch Metrics, and structured logging among other features.

You can also use Python 3.12 with Lambda@Edge, allowing you to customize low-latency content delivered through Amazon CloudFront.

Python is a popular language for building serverless applications. The Python 3.12 release has a number of interpreter and syntactic improvements.

At launch, new Lambda runtimes receive less usage than existing, established runtimes. This can result in longer cold start times due to reduced cache residency within internal Lambda sub-systems. Cold start times typically improve in the weeks following launch as usage increases. As a result, AWS recommends not drawing conclusions from side-by-side performance comparisons with other Lambda runtimes until the performance has stabilized. Since performance is highly dependent on workload, customers with performance-sensitive workloads should conduct their own testing, instead of relying on generic test benchmarks.

Lambda runtime changes

Amazon Linux 2023

The Python 3.12 runtime is based on the provided.al2023 runtime, which is based on the Amazon Linux 2023 minimal container image. This OS update brings several improvements over the Amazon Linux 2 (AL2)-based OS used for Lambda Python runtimes from Python 3.8 to Python 3.11.

provided.al2023 contains only the essential components necessary to install other packages and offers a smaller deployment footprint of less than 40MB compared to over 100MB for Lambda’s AL2-based images.

With glibc version 2.34, customers have access to a modern version of glibc, updated from version 2.26 in AL2-based images.

The Amazon Linux 2023 minimal image uses microdnf as a package manager, symlinked as dnf. This replaces the yum package manager used in earlier AL2-based images. If you deploy your Lambda functions as container images, you must update your Dockerfiles to use dnf instead of yum when upgrading to the Python 3.12 base image.

curl and gnupg2 are also included, as their minimal versions curl-minimal and gnupg2-minimal.

Learn more about the provided.al2023 runtime in the blog post Introducing the Amazon Linux 2023 runtime for AWS Lambda and the Amazon Linux 2023 launch blog post.

Response format change

Starting with the Python 3.12 runtime, functions return Unicode characters as part of their JSON response. Previous versions return escaped sequences for Unicode characters in responses.

For example, in Python 3.11, if you return a Unicode string such as “こんにちは”, it escapes the Unicode characters and returns “\u3053\u3093\u306b\u3061\u306f”. The Python 3.12 runtime returns the original “こんにちは”.

This change reduces the size of the payload returned by Lambda. In the previous example, the escaped version is 32 bytes compared to 17 bytes with the Unicode string. Using Unicode responses reduces the size of Lambda responses, making it easier to fit larger responses into the 6MB Lambda response (synchronous) limit.
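The byte counts above include the surrounding JSON quotes. You can reproduce them with the standard json module, which escapes non-ASCII characters by default (ensure_ascii=True) and emits them as UTF-8 text with ensure_ascii=False. This only illustrates the arithmetic; it does not show how the runtime serializes responses internally.

```python
import json

greeting = "こんにちは"

escaped = json.dumps(greeting)                       # "\u3053\u3093\u306b\u3061\u306f"
unescaped = json.dumps(greeting, ensure_ascii=False)  # "こんにちは"

print(len(escaped.encode("utf-8")))    # → 32 (5 × 6-character escapes + 2 quotes)
print(len(unescaped.encode("utf-8")))  # → 17 (5 × 3 UTF-8 bytes + 2 quotes)

# Both forms decode to the same string, so callers that parse the JSON are unaffected.
print(json.loads(escaped) == json.loads(unescaped))  # → True
```

As the last line shows, only callers that compare or process the raw response bytes, rather than parsing them as JSON, are likely to need changes.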

When upgrading to Python 3.12, you may need to adjust your code in other modules to account for this new behavior. If the caller expects escaped Unicode based on the previous runtime behavior, you must either add code to the returning function to escape the Unicode manually, or adjust the caller to handle the Unicode return.

Extensions processing for graceful shutdown

Lambda functions with external extensions can now benefit from improved graceful shutdown capabilities. When the Lambda service is about to shut down the runtime, it sends a SIGTERM signal to the runtime and then a SHUTDOWN event to each registered external extension.

These events are sent each time an execution environment shuts down. This allows you to catch the SIGTERM signal in your Lambda function and clean up resources, such as database connections, which were created by the function.
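The following Python sketch registers a SIGTERM handler at module load so it is installed once per execution environment. It assumes, as described above, that the function has at least one external extension registered so that Lambda delivers SIGTERM. `FakeConnection` is a hypothetical stand-in for a real resource such as a database connection.

```python
import signal
import sys


class FakeConnection:
    """Hypothetical stand-in for a database connection reused across invocations."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


# Created outside the handler so warm invocations reuse it.
connection = FakeConnection()


def on_sigterm(signum, frame):
    """Release resources before the execution environment shuts down."""
    print("[runtime] SIGTERM received, cleaning up")
    connection.close()
    sys.exit(0)


# Registered at import time, once per execution environment.
signal.signal(signal.SIGTERM, on_sigterm)


def lambda_handler(event, context):
    return {"statusCode": 200}
```

Keep the cleanup short: the runtime and extensions share a limited shutdown window, so the handler should close connections and flush buffers rather than start new work.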

To learn more about the Lambda execution environment lifecycle, see Lambda execution environment. More details and examples of how to use graceful shutdown with extensions are available in the AWS Samples GitHub repository.

New Python features

Comprehension inlining

With the implementation of PEP 709, dictionary, list, and set comprehensions are now inlined. Prior versions create a single-use function to execute each comprehension. Removing this overhead makes comprehension execution up to two times faster.

There are some behavior changes to comprehensions because of this update. For example, a call to the locals() function from within a comprehension now includes objects from the containing scope, not just those within the comprehension itself as in prior versions. You should test functions you are migrating from an earlier version of Python to Python 3.12.

Typing changes

Python 3.12 continues the evolution of type annotations in Python. PEP 695 introduces a new, more compact syntax for generic classes and functions, and adds a new “type” statement for creating type aliases. Type aliases are evaluated lazily, which permits aliases to refer to types defined later.

Type parameters are visible within the scope of the declaration and any nested scopes, but not in the outer scope.

Formalization of f-strings

One of the largest changes in Python 3.12, the formalization of f-strings syntax, is covered under PEP 701. Any valid expression can now be contained within an f-string, including other f-strings.

In prior versions of Python, reusing the enclosing quote character inside an f-string expression results in a syntax error. With Python 3.12, quote reuse is fully supported, including in nested f-strings, as in the following example:

>>> songs = ['Take me back to Eden', 'Alkaline', 'Ascensionism']

>>> f"This is the playlist: {", ".join(songs)}"

'This is the playlist: Take me back to Eden, Alkaline, Ascensionism'

Additionally, any valid Python expression can be contained within an f-string. This includes multi-line expressions and the ability to embed comments within an f-string.

Before Python 3.12, the backslash (“\”) character is not permitted inside the expression part (the braces) of an f-string. This prevented the use of “\N” escapes for named Unicode characters within f-string expressions.

Asyncio improvements

There are a number of improvements to the asyncio module. These include faster socket writes and a new C implementation of asyncio.current_task() that can yield a 4–6 times performance improvement. Event loops now optimize their child watchers for their underlying environment.

Using Python 3.12 in Lambda

AWS Management Console

To use the Python 3.12 runtime to develop your Lambda functions, specify a runtime parameter value of python3.12 when creating or updating a function. Python 3.12 is now available in the Runtime dropdown on the Create Function page.

To update an existing Lambda function to Python 3.12, navigate to the function in the Lambda console and choose Edit in the Runtime settings panel. The new version of Python is available in the Runtime dropdown.

AWS Lambda container image

Change the Python base image version by modifying the FROM statement in your Dockerfile:

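FROM public.ecr.aws/lambda/python:3.12
# Copy function code
COPY lambda_handler.py ${LAMBDA_TASK_ROOT}
# Set the handler (assumes a function named handler in lambda_handler.py)
CMD [ "lambda_handler.handler" ]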

Customers running the Python 3.12 Docker images locally, including customers using AWS SAM, must upgrade their Docker install to version 20.10.10 or later.

AWS Serverless Application Model (AWS SAM)

In AWS SAM set the Runtime attribute to python3.12 to use this version.

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Simple Lambda Function
Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Description: My Python Lambda Function
      CodeUri: my_function/
      Handler: lambda_function.lambda_handler
      Runtime: python3.12

AWS SAM supports generating this template with Python 3.12 for new serverless applications using the `sam init` command. Refer to the AWS SAM documentation.

AWS Cloud Development Kit (AWS CDK)

In AWS CDK, set the runtime attribute to Runtime.PYTHON_3_12 to use this version. In Python CDK:

```
from constructs import Construct
from aws_cdk import App, Stack, aws_lambda as _lambda


class SampleLambdaStack(Stack):
    def __init__(self, scope: Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)

        # Python 3.12 function; code is bundled from the local "lambda" directory
        base_lambda = _lambda.Function(
            self, 'SampleLambda',
            handler='lambda_handler.handler',
            runtime=_lambda.Runtime.PYTHON_3_12,
            code=_lambda.Code.from_asset('lambda'),
        )


app = App()
SampleLambdaStack(app, 'SampleLambdaStack')
app.synth()
```

In TypeScript CDK:

```
import * as cdk from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as path from 'path';
import { Construct } from 'constructs';

export class CdkStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // The Python 3.12 Lambda function
    const lambdaFunction = new lambda.Function(this, 'python312LambdaFunction', {
      runtime: lambda.Runtime.PYTHON_3_12,
      memorySize: 512,
      code: lambda.Code.fromAsset(path.join(__dirname, '../lambda')),
      handler: 'lambda_handler.handler',
    });
  }
}
```

Conclusion

Lambda now supports Python 3.12. This release is based on the Amazon Linux 2023 OS and supports Unicode responses, graceful shutdown for functions with external extensions, and Python 3.12 language features.

You can build and deploy Python 3.12 functions with the AWS Management Console, AWS CLI, AWS SDK, AWS SAM, AWS CDK, or your choice of Infrastructure as Code (IaC) tool. You can also use the Python 3.12 container base image if you prefer to build and deploy your functions using container images.

Python 3.12 runtime support helps developers to build more efficient, powerful, and scalable serverless applications. Try the Python 3.12 runtime in Lambda today and experience the benefits of this updated language version.

For more serverless learning resources, visit Serverless Land.

Introducing the AWS Integrated Application Test Kit (IATK)

2023-11-16 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/aws-integrated-application-test-kit/

This post is written by Dan Fox, Principal Specialist Solutions Architect, and Brian Krygsman, Senior Solutions Architect.

Today, AWS announced the public preview launch of the AWS Integrated Application Test Kit (IATK). AWS IATK is a software library that helps you write automated tests for cloud-based applications. This blog post presents several initial features of AWS IATK, and then shows working examples using an example video processing application. If you are getting started with serverless testing, learn more at serverlessland.com/testing.

Overview

When you create applications composed of serverless services like AWS Lambda, Amazon EventBridge, or AWS Step Functions, many of your architecture components cannot be deployed to your desktop, but instead only exist in the AWS Cloud. In contrast to working with applications deployed locally, these types of applications benefit from cloud-based strategies for performing automated tests. For its public preview launch, AWS IATK helps you implement some of these strategies for Python applications. AWS IATK will support other languages in future launches.

Locating resources for tests

When you write automated tests for cloud resources, you need the physical IDs of your resources. The physical ID is the name AWS assigns to a resource after creation. For example, to send requests to Amazon API Gateway you need the physical ID, which forms the API endpoint.

If you deploy cloud resources in separate infrastructure as code stacks, you might have difficulty locating physical IDs. In CloudFormation, you define the logical IDs of the resources in your template, as well as the stack name. With IATK, you can get the physical ID of a resource if you provide the logical ID and stack name. You can also get stack outputs by providing the stack name. These convenience methods simplify locating resources for the tests that you write.
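To show what such a lookup involves, here is a sketch written directly against the CloudFormation API with boto3. It is not IATK's implementation; the client is passed in so the functions can be tested without AWS access, and the stack and resource names in the usage comment are hypothetical.

```python
def get_physical_id(cfn_client, stack_name: str, logical_id: str) -> str:
    """Resolve a template's logical ID to the physical ID CloudFormation assigned."""
    response = cfn_client.describe_stack_resource(
        StackName=stack_name,
        LogicalResourceId=logical_id,
    )
    return response["StackResourceDetail"]["PhysicalResourceId"]


def get_stack_outputs(cfn_client, stack_name: str) -> dict:
    """Return a stack's outputs as an {OutputKey: OutputValue} dictionary."""
    stack = cfn_client.describe_stacks(StackName=stack_name)["Stacks"][0]
    # Stacks without outputs omit the "Outputs" key entirely.
    return {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}


# Example usage (requires boto3 and AWS credentials; names are hypothetical):
#   import boto3
#   cfn = boto3.client("cloudformation")
#   get_physical_id(cfn, "my-stack", "MyApi")
```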

Creating test harnesses for event driven architectures

To write integration tests for event driven architectures, establish logical boundaries by breaking your application into subsystems. Your subsystems should be simple enough to reason about, and contain understandable inputs and outputs. One useful technique for testing subsystems is to create test harnesses. Test harnesses are resources that you create specifically for testing subsystems.

For example, an integration test can begin a subsystem process by passing an input test event to it. IATK can create a test harness for you that listens to Amazon EventBridge for output events. (Under the hood, the harness is composed of an EventBridge rule that forwards the output event to an Amazon Simple Queue Service (Amazon SQS) queue.) Your integration test then queries the test harness to examine the output and determine if the test passes or fails. These harnesses help you create integration tests in the cloud for event driven architectures.
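IATK builds and queries the harness for you. To make the query step concrete, here is a sketch written directly against Amazon SQS with boto3. It assumes a queue that already receives events from an EventBridge rule with no input transformer, and it is not IATK's implementation; read_harness_events is a hypothetical helper name.

```python
import json


def read_harness_events(sqs_client, queue_url: str, wait_seconds: int = 20,
                        max_messages: int = 10) -> list:
    """Receive events that the harness rule forwarded to SQS and return them as dicts."""
    response = sqs_client.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=max_messages,  # SQS allows at most 10 per call
        WaitTimeSeconds=wait_seconds,      # long polling, at most 20 seconds
    )
    events = []
    for message in response.get("Messages", []):
        # With an SQS target and no input transformer, the body is the event as JSON.
        events.append(json.loads(message["Body"]))
        # Delete the message so later polls don't see the same event again.
        sqs_client.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
    return events
```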

Establishing service level agreements to test asynchronous features

If you write a synchronous service, your automated tests make requests and expect immediate responses. When your architecture is asynchronous, your service accepts a request and then performs a set of actions at a later time. How can you test for the success of an activity if it does not have a specified duration?

Consider creating reasonable timeouts for your asynchronous systems. Document timeouts as service level agreements (SLAs). You may decide to publish your SLAs externally or to document them as internal standards. IATK contains a polling feature that allows you to establish timeouts. This feature helps you to test that your asynchronous systems complete tasks in a timely manner.
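The idea behind an SLA-based poll can be sketched in plain Python: keep checking for a result until it arrives or the documented timeout elapses. This is a generic illustration, not IATK's polling API; poll_until and its parameters are hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until(check: Callable[[], Optional[T]], timeout_seconds: float,
               interval_seconds: float = 1.0) -> Optional[T]:
    """Call check() until it returns a non-None result or the SLA timeout elapses."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        result = check()
        if result is not None:
            return result
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None  # SLA missed: nothing arrived in time
        # Never sleep past the deadline.
        time.sleep(min(interval_seconds, remaining))
```

A test then treats a None result as an SLA violation and fails.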

Using AWS X-Ray for detailed testing

If you want to gain more visibility into the interior details of your application, instrument with AWS X-Ray. With AWS X-Ray, you trace the path of an event through multiple services. IATK provides conveniences that help you set the AWS X-Ray sampling rate, get trace trees, and assert for trace durations. These features help you observe and test your distributed systems in greater detail.
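As an illustration of asserting on trace durations, this sketch reads a trace with the X-Ray BatchGetTraces API through boto3 and compares its Duration (in seconds) to a limit. It is not IATK's implementation; the helper names are hypothetical, and the client is passed in so the code can be tested without AWS access.

```python
def trace_duration_seconds(xray_client, trace_id: str) -> float:
    """Return the end-to-end duration of one X-Ray trace, in seconds."""
    response = xray_client.batch_get_traces(TraceIds=[trace_id])
    traces = response.get("Traces", [])
    if not traces:
        # Traces can take a few seconds to become available after a request completes.
        raise LookupError(f"Trace {trace_id} is not available yet")
    return traces[0]["Duration"]


def assert_trace_within(xray_client, trace_id: str, limit_seconds: float) -> None:
    """Fail if the traced request took longer than the allowed duration."""
    duration = trace_duration_seconds(xray_client, trace_id)
    if duration > limit_seconds:
        raise AssertionError(f"Trace took {duration:.2f}s (limit {limit_seconds}s)")
```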

Learn more about testing asynchronous architectures at aws-samples/serverless-test-samples.

Overview of the example application

To demonstrate the features of IATK, this post uses a portion of a serverless video application designed with a plugin architecture. A core development team creates the primary application. Distributed development teams throughout the organization create the plugins. One AWS CloudFormation stack deploys the primary application. Separate stacks deploy each plugin.

Communications between the primary application and the plugins are managed by an EventBridge bus. Plugins pull application lifecycle events off the bus and must put completion notification events back on the bus within 20 seconds. For testing, the core team has created an AWS Step Functions workflow that mimics the production process by emitting properly formatted example lifecycle events. Developers run this test workflow in development and test environments to verify that their plugins are communicating properly with the event bus.

The following demonstration shows an integration test for the example application that validates plugin behavior. In the integration test, IATK locates the Step Functions workflow. It creates a test harness to listen for the event completion notification to be sent by the plugin. The test then runs the workflow to begin the lifecycle process and start plugin actions. Then IATK uses a polling mechanism with a timeout to verify that the plugin complies with the 20 second service level agreement. This is the sequence of processing:

The integration test starts an execution of the test workflow.
The workflow puts a lifecycle event onto the bus.
The plugin pulls the lifecycle event from the bus.
When the plugin is complete, it puts a completion event onto the bus.
The integration test polls for the completion event to determine if the test passes within the SLA.

Deploying and testing the example application

Follow these steps to review this application, build it locally, deploy it in your AWS account, and test it.

Downloading the example application

Open your terminal and clone the example application from GitHub with the following command or download the code. This repository also includes other example patterns for testing serverless applications.
```
git clone https://github.com/aws-samples/serverless-test-samples
```
The root of the IATK example application is in python-test-samples/integrated-application-test-kit. Change to this directory:
```
cd serverless-test-samples/python-test-samples/integrated-application-test-kit
```

Reviewing the integration test

Before deploying the application, review how the integration test uses the IATK by opening plugins/2-postvalidate-plugins/python-minimal-plugin/tests/integration/test_by_polling.py in your text editor. The test class instantiates the IATK at the top of the file.

```
iatk_client = aws_iatk.AwsIatk(region=aws_region)
```

In the setUp() method, the test class uses IATK to fetch CloudFormation stack outputs. These outputs are references to deployed cloud components like the plugin tester AWS Step Functions workflow:

```
stack_outputs = self.iatk_client.get_stack_outputs(
    stack_name=self.plugin_tester_stack_name,
    output_names=[
        "PluginLifecycleWorkflow",
        "PluginSuccessEventRuleName"
    ],
)
```

The test class attaches a listener to the default event bus using an Event Rule provided in the stack outputs. The test uses this listener later to poll for events.

```
add_listener_output = self.iatk_client.add_listener(
    event_bus_name="default",
    rule_name=self.existing_rule_name
)
```

The test class cleans up the listener in the tearDown() method.

```
self.iatk_client.remove_listeners(
    ids=[self.listener_id]
)
```

Once the configurations are complete, the method test_minimal_plugin_event_published_polling() implements the actual test.

The test first initializes the trigger event.

```
trigger_event = {
    "eventHook": "postValidate",
    "pluginTitle": "PythonMinimalPlugin"
}
```

Next, the test starts an execution of the plugin tester Step Functions workflow. It uses the plugin_tester_arn that was fetched during setUp.

```
self.step_functions_client.start_execution(
    stateMachineArn=self.plugin_tester_arn,
    input=json.dumps(trigger_event)
)
```

The test polls the listener, waiting for the plugin to emit events. It stops polling once it hits the SLA timeout or receives the maximum number of messages.

```
poll_output = self.iatk_client.poll_events(
    listener_id=self.listener_id,
    wait_time_seconds=self.SLA_TIMEOUT_SECONDS,
    max_number_of_messages=1,
)
```

Finally, the test asserts that it receives the right number of events, and that they are well-formed.

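```
self.assertEqual(len(poll_output.events), 1)
# Parse the polled event (returned as a JSON string) before checking its fields
received_event = json.loads(poll_output.events[0])
self.assertEqual(received_event["source"], "video.plugin.PythonMinimalPlugin")
self.assertEqual(received_event["detail-type"], "plugin-complete")
```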

Installing prerequisites

You need the following prerequisites to build this example:

An AWS account
The AWS Serverless Application Model (AWS SAM) CLI with credentials that can manage AWS resources
Python 3.11
Node.js 18.x
Docker (optional, but recommended for building Python applications with AWS SAM)

Build and deploy the example application components

Use AWS SAM to build and deploy the plugin tester to your AWS account. The plugin tester is the Step Functions workflow shown in the preceding diagram. During the build process, you can add the --use-container flag to the build command to instruct AWS SAM to create the application in a provided container. You can accept or override the default values during the deploy process. You will use “Stack Name” and “AWS Region” later to run the integration test.
```
cd plugins/plugin_tester # Move to the plugin tester directory

sam build --use-container # Build the plugin tester
```

Deploy the tester:

```
sam deploy --guided # Deploy the plugin tester
```

Once the plugin tester is deployed, use AWS SAM to deploy the plugin.

```
cd ../2-postvalidate-plugins/python-minimal-plugin # Move to the plugin directory

sam build --use-container # Build the plugin
```

Deploy the plugin:
```
sam deploy --guided # Deploy the plugin
```

Running the test

You can run tests written with IATK using standard Python test runners like unittest and pytest. The example application test uses unittest.

1. Use a virtual environment to organize your dependencies. From the root of the example application, run:
```
python3 -m venv .venv # Create the virtual environment
source .venv/bin/activate # Activate the virtual environment
```
2. Install the dependencies, including the IATK:
```
cd tests 
pip3 install -r requirements.txt
```
3. Run the test, providing the required environment variables from the earlier deployments. You can find correct values in the samconfig.toml file of the plugin_tester directory.
```
cd integration

PLUGIN_TESTER_STACK_NAME=video-plugin-tester \
AWS_REGION=us-west-2 \
python3 -m unittest ./test_by_polling.py
```

You should see output as unittest runs the test.

Open the Step Functions console in your AWS account, then choose the PluginLifecycleWorkflow-<random value> workflow to validate that the plugin tester ran successfully. A recent execution shows a Succeeded status.

Review other IATK features

The example application includes examples of other IATK features like generating mock events and retrieving AWS X-Ray traces.

Cleaning up

Use AWS SAM to clean up both the plugin and the plugin tester resources from your AWS account.

Delete the plugin resources:

```
cd ../.. # Move to the plugin directory
sam delete # Delete the plugin
```

Delete the plugin tester resources:

```
cd ../../plugin_tester # Move to the plugin tester directory
sam delete # Delete the plugin tester
```

The temporary test harness resources that IATK created during the test are cleaned up when the tearDown method runs. If there are problems during teardown, some resources may not be deleted. IATK adds tags to all resources that it creates. You can use these tags to locate the resources then manually remove them. You can also add your own tags.

Conclusion

The AWS Integrated Application Test Kit is a software library that provides conveniences to help you write automated tests for your cloud applications. This blog post shows some of the features of the initial Python version of the IATK.

To learn more about automated testing for serverless applications, visit serverlessland.com/testing. You can also view code examples at serverlessland.com/testing/patterns or at the AWS serverless-test-samples repository on GitHub.

For more serverless learning resources, visit Serverless Land.

Enhanced Amazon CloudWatch metrics for Amazon EventBridge

2023-11-11 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/enhanced-amazon-cloudwatch-metrics-for-amazon-eventbridge/

This post is written by Vaibhav Shah, Sr. Solutions Architect.

Customers use event-driven architectures to orchestrate and automate their event flows from producers to consumers. Amazon EventBridge acts as a serverless event router for various targets based on event rules. It decouples the producers and consumers, allowing customers to build asynchronous architectures.

EventBridge provides metrics to help you monitor your events. Existing metrics include the number of partner events ingested, the number of invocations that failed permanently, the number of times a target is invoked by a rule in response to an event, and the number of events that matched any rule.

In response to customer requests, EventBridge has added new metrics that give you more visibility into your events. This post explains these new capabilities.

What’s new?

The new EventBridge metrics cover three areas: API calls, events, and invocations. They give you insight into the total number of events published, successful and failed events, events matched with any rule or a specific rule, events rejected because of throttling, latency, and invocation behavior.

This allows you to track the entire span of event flow within EventBridge and quickly identify and resolve issues as they arise.

EventBridge now has the following metrics:

Metric	Description	Dimensions	Units
PutEventsLatency	The time taken per PutEvents API operation.	None	Milliseconds
PutEventsRequestSize	The size of the PutEvents API request.	None	Bytes
MatchedEvents	Number of events that matched any rule, or a specific rule.	None, RuleName, EventBusName, EventSourceName	Count
ThrottledRules	The number of times rule execution was throttled.	None, RuleName	Count
PutEventsApproximateCallCount	Approximate total number of PutEvents API calls.	None	Count
PutEventsApproximateThrottledCount	Approximate number of throttled PutEvents API requests.	None	Count
PutEventsApproximateFailedCount	Approximate number of failed PutEvents API calls.	None	Count
PutEventsApproximateSuccessCount	Approximate number of successful PutEvents API calls.	None	Count
PutEventsEntriesCount	The number of event entries contained in a PutEvents request.	None	Count
PutEventsFailedEntriesCount	The number of event entries contained in a PutEvents request that failed to be ingested.	None	Count
PutPartnerEventsApproximateCallCount	Approximate total number of PutPartnerEvents API calls (visible in the partner’s account).	None	Count
PutPartnerEventsApproximateThrottledCount	Approximate number of throttled PutPartnerEvents API requests (visible in the partner’s account).	None	Count
PutPartnerEventsApproximateFailedCount	Approximate number of failed PutPartnerEvents API calls (visible in the partner’s account).	None	Count
PutPartnerEventsApproximateSuccessCount	Approximate number of successful PutPartnerEvents API calls (visible in the partner’s account).	None	Count
PutPartnerEventsEntriesCount	The number of event entries contained in a PutPartnerEvents request.	None	Count
PutPartnerEventsFailedEntriesCount	The number of event entries contained in a PutPartnerEvents request that failed to be ingested.	None	Count
PutPartnerEventsLatency	The time taken per PutPartnerEvents API operation (visible in the partner’s account).	None	Milliseconds
InvocationsCreated	Number of times a target is invoked by a rule in response to an event. One invocation attempt represents a single count for this metric.	None	Count
InvocationAttempts	Number of times EventBridge attempted to invoke a target.	None	Count
SuccessfulInvocationAttempts	Number of times a target was successfully invoked.	None	Count
RetryInvocationAttempts	The number of times a target invocation has been retried.	None	Count
IngestionToInvocationStartLatency	The time to process events, measured from when an event is ingested by EventBridge to the first invocation of a target.	None, RuleName, EventBusName	Milliseconds
IngestionToInvocationCompleteLatency	The time taken from event ingestion to completion of the first successful invocation attempt.	None, RuleName, EventBusName	Milliseconds

Use-cases for these metrics

These new metrics help you improve observability and monitoring of your event-driven applications. You can proactively monitor metrics that help you understand the event flow, invocations, latency, and service utilization. You can also set up alerts on specific metrics and take necessary actions, which help improve your application performance, proactively manage quotas, and improve resiliency.

Monitor service usage based on Service Quotas

The PutEventsApproximateCallCount metric shows the approximate number of PutEvents API calls made to publish events to the event bus. The PutEventsApproximateSuccessCount metric shows the approximate number of those calls that succeeded.

Similarly, you can monitor throttled and failed calls with PutEventsApproximateThrottledCount and PutEventsApproximateFailedCount, respectively. These metrics show whether you are approaching your PutEvents quota. Create a CloudWatch alarm with a threshold close to your account quota; when the alarm triggers, it can notify your operations team through Amazon SNS so that they can request a Service Quotas increase.
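Such an alarm can also be created programmatically. This is a minimal sketch using the CloudWatch PutMetricAlarm call through boto3, assuming the metric is published under the AWS/Events namespace with no dimensions (as the table lists); the alarm name, period, threshold, and SNS topic ARN are hypothetical choices.

```python
def create_putevents_throttle_alarm(cloudwatch_client, topic_arn: str,
                                    threshold: float = 1.0) -> None:
    """Alarm, and notify an SNS topic, when PutEvents requests are being throttled."""
    cloudwatch_client.put_metric_alarm(
        AlarmName="PutEventsThrottledAlarm",       # hypothetical name
        Namespace="AWS/Events",                    # EventBridge metric namespace
        MetricName="PutEventsApproximateThrottledCount",
        Statistic="Sum",
        Period=60,                                 # one-minute evaluation windows
        EvaluationPeriods=1,
        Threshold=threshold,
        ComparisonOperator="GreaterThanOrEqualToThreshold",
        TreatMissingData="notBreaching",           # no data points means no throttling
        AlarmActions=[topic_arn],                  # notify the operations team via SNS
    )
```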

You can also set an alarm on the PutEvents throttle limit in transactions per second service quota.

Navigate to the Service Quotas console. On the left pane, choose AWS services, search for EventBridge, and select Amazon EventBridge (CloudWatch Events).
In the Monitoring section, you can monitor the percentage utilization of the PutEvents throttle limit in transactions per second.
Go to the Alarms tab, and choose Create alarm. In Alarm threshold, choose 80% of the applied quota value from the dropdown. Set the Alarm name to PutEventsThrottleAlarm, and choose Create.
To be notified if this threshold is breached, navigate to the Amazon CloudWatch Alarms console and choose PutEventsThrottleAlarm.
Select the Actions dropdown from the top right corner, and choose Edit.
On the Specify metric and conditions page, under Conditions, make sure that the Threshold type is selected as Static and the % Utilization selected as Greater/Equal than 80. Choose Next.
Configure actions to send notifications to an Amazon SNS topic and choose Next.
The Alarm name should be already set to PutEventsThrottleAlarm. Choose Next, then choose Update alarm.

You are then notified when the percentage utilization of the PutEvents throttle limit in transactions per second approaches the threshold you set, and you can request a Service Quotas increase if required.

Similarly, you can also create CloudWatch alarms on percentage utilization of Invocations throttle limit in transactions per second against the service quota.

Enhanced observability

The PutEventsLatency metric shows the time taken per PutEvents API operation. Two additional metrics cover delivery latency: IngestionToInvocationStartLatency measures the time from when EventBridge first ingests an event to the first invocation of a target, and IngestionToInvocationCompleteLatency measures the time from event ingestion to the completion of the first successful invocation attempt.

Because both metrics support the RuleName dimension, you can pinpoint latency issues between ingestion and target delivery for each rule, and take appropriate action when latency is high.

You can set alarm thresholds on these metrics and define actions that help you recover from potential failures when a threshold is breached. For example, one such action is to route subsequent events to EventBridge in a secondary Region using EventBridge global endpoints.
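To inspect these latencies for a single rule, you can query CloudWatch directly. This sketch uses the GetMetricStatistics call through boto3 and is written under stated assumptions: the metric name is spelled IngestionToInvocationStartLatency (metric names are case-sensitive, so confirm it under the AWS/Events namespace in the CloudWatch console), and rules on custom buses also carry an EventBusName dimension. The helper name and default window are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def rule_ingestion_latency(cloudwatch_client, rule_name: str, event_bus_name=None,
                           minutes: int = 60, period_seconds: int = 300) -> list:
    """Return (timestamp, average_ms, maximum_ms) tuples for one rule, oldest first."""
    dimensions = [{"Name": "RuleName", "Value": rule_name}]
    if event_bus_name:
        # Assumption: rules on custom buses also carry an EventBusName dimension.
        dimensions.append({"Name": "EventBusName", "Value": event_bus_name})
    end = datetime.now(timezone.utc)
    response = cloudwatch_client.get_metric_statistics(
        Namespace="AWS/Events",
        MetricName="IngestionToInvocationStartLatency",
        Dimensions=dimensions,
        StartTime=end - timedelta(minutes=minutes),
        EndTime=end,
        Period=period_seconds,
        Statistics=["Average", "Maximum"],
    )
    # CloudWatch does not guarantee datapoint order, so sort by timestamp.
    points = sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
    return [(p["Timestamp"], p["Average"], p["Maximum"]) for p in points]
```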

Sometimes, events are not delivered to the target specified in the rule. This can happen because the target resource is unavailable, EventBridge lacks permission to invoke the target, or there are network issues. In such scenarios, EventBridge retries delivery for up to 24 hours and up to 185 times by default; both limits are configurable.

The new RetryInvocationAttempts metric shows the number of times EventBridge has retried invoking the target. EventBridge retries when requests are throttled, when the target service has availability issues, or when there are network issues or service failures. This metric provides additional observability, and you can use it to trigger a CloudWatch alarm that notifies your team when a threshold is crossed. If the retries are exhausted, you can send the failed events to an Amazon SQS dead-letter queue and process them later.
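Both retry limits and the dead-letter queue are configured per target. This sketch sets them with the EventBridge PutTargets call through boto3; the rule name, target ID, ARNs, and default limits are hypothetical, and the dead-letter queue's access policy must separately allow EventBridge to send messages to it (not shown).

```python
def configure_target_retries(events_client, rule_name: str, target_id: str,
                             target_arn: str, dlq_arn: str,
                             max_retries: int = 10, max_event_age_seconds: int = 3600,
                             event_bus_name: str = "default") -> None:
    """Attach a target with a custom retry policy and an SQS dead-letter queue."""
    response = events_client.put_targets(
        Rule=rule_name,
        EventBusName=event_bus_name,
        Targets=[{
            "Id": target_id,
            "Arn": target_arn,
            "RetryPolicy": {
                "MaximumRetryAttempts": max_retries,                # 0 to 185
                "MaximumEventAgeInSeconds": max_event_age_seconds,  # 60 to 86,400
            },
            # Events that exhaust their retries are sent here for later processing.
            "DeadLetterConfig": {"Arn": dlq_arn},
        }],
    )
    if response.get("FailedEntryCount", 0):
        raise RuntimeError(f"PutTargets failed: {response.get('FailedEntries')}")
```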

In addition, EventBridge now supports additional dimensions, such as DetailType, Source, and RuleName, for the MatchedEvents metric. This helps you monitor the number of matched events coming from different sources.

Navigate to the Amazon CloudWatch console. In the left pane, choose Metrics, then All metrics.
In the Browse section, select Events, and Source.
From the Graphed metrics tab, you can monitor matched events coming from different sources.

Failover events to secondary Region

The PutEventsFailedEntriesCount metric shows the number of events that failed ingestion. Monitor this metric and set a CloudWatch alarm. If it crosses a defined threshold, you can then take appropriate action.

Also, set an alarm on the PutEventsApproximateThrottledCount metric, which shows the number of requests rejected because of throttling. For these ingestion failures, the client must resend the failed events to the event bus so that every event critical to your application is processed.
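Resending only the failed entries is possible because the PutEvents response lists one result per request entry, in the same order, and failed results carry an ErrorCode instead of an EventId. This is a minimal sketch of that retry pattern with boto3 and exponential backoff; the helper name, attempt count, and backoff values are hypothetical, and a single PutEvents request accepts at most 10 entries.

```python
import time


def put_events_with_retry(events_client, entries: list, max_attempts: int = 3,
                          backoff_seconds: float = 0.5) -> list:
    """Send entries with PutEvents, resending only failed ones. Returns entries that never succeeded."""
    pending = list(entries)
    for attempt in range(1, max_attempts + 1):
        response = events_client.put_events(Entries=pending)
        if response.get("FailedEntryCount", 0) == 0:
            return []
        # Results are in request order; failed results carry an ErrorCode.
        pending = [entry for entry, result in zip(pending, response["Entries"])
                   if "ErrorCode" in result]
        if attempt < max_attempts:
            # Exponential backoff gives throttled requests time to clear.
            time.sleep(backoff_seconds * 2 ** (attempt - 1))
    return pending
```

Entries still returned after the final attempt can be logged or stored for later replay instead of being dropped.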

Alternatively, use Amazon EventBridge global endpoints to send events to EventBridge in a secondary Region and improve the resiliency of your event-driven applications.

Conclusion

This blog shows how to use these new metrics to improve the visibility of event flows in your event-driven applications. It helps you monitor the events more effectively, from invocation until the delivery to the target. This improves observability by proactively alerting on key metrics.

For more serverless learning resources, visit Serverless Land.

Introducing the Amazon Linux 2023 runtime for AWS Lambda

2023-11-10 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/introducing-the-amazon-linux-2023-runtime-for-aws-lambda/

This post is written by Rakshith Rao, Senior Solutions Architect.

AWS Lambda now supports Amazon Linux 2023 (AL2023) as a managed runtime and container base image. Named provided.al2023, this runtime provides an OS-only environment to run your Lambda functions.

It is based on the Amazon Linux 2023 minimal container image release and has several improvements over Amazon Linux 2 (AL2), such as a smaller deployment footprint, updated versions of libraries like glibc, and a new package manager.

What are OS-only Lambda runtimes?

Lambda runtimes define the execution environment where your function runs. They provide the OS, language support, and additional settings such as environment variables and certificates.

Lambda provides managed runtimes for Java, Python, Node.js, .NET, and Ruby. However, if you want to develop your Lambda functions in programming languages that are not supported by Lambda’s managed language runtimes, the ‘provided’ runtime family provides an OS-only environment in which you can run code written in any language. This release extends the provided runtime family to support Amazon Linux 2023.

Customers use these OS-only runtimes in three common scenarios. First, they are used with languages that compile to native code, such as Go, Rust, C++, .NET Native AOT, and Java GraalVM Native. Since you only upload the compiled binary to Lambda, these languages do not require a dedicated language runtime; they only require an OS environment in which the binary can run.

Second, the OS-only runtimes also enable building third-party language runtimes that you can use off the shelf. For example, you can write Lambda functions in PHP using Bref, or Swift using the Swift AWS Lambda Runtime.

Third, you can use the OS-only runtime to deploy custom runtimes that you build for a language or language version for which Lambda does not provide a managed runtime. For example, you could build a custom runtime for Node.js 19, because Lambda only provides managed runtimes for Node.js LTS releases, which are the even-numbered releases.

New in Amazon Linux 2023 base image for Lambda

Updated packages

AL2023 base image for Lambda is based on the AL2023-minimal container image and includes various package updates and changes compared with provided.al2.

The version of glibc in the AL2023 base image has been upgraded to 2.34, from 2.26 that was bundled in the AL2 base image. Some libraries that developers wanted to use in provided runtimes required newer versions of glibc. With this launch, you can now use an up-to-date version of glibc with your Lambda function.

The AL2 base image for Lambda came pre-installed with Python 2.7. This was needed because Python was a required dependency for some of the packages that were bundled in the base image. The AL2023 base image for Lambda has removed this dependency on Python 2.7 and does not come with any pre-installed language runtime. You are free to choose and install any compatible Python version that you need.

Since the AL2023 base image for Lambda is based on the AL2023-minimal distribution, you also benefit from a significantly smaller deployment footprint. The new image is less than 40MB compared to the AL2-based base image, which is over 100MB in size. You can find the full list of packages available in the AL2023 base image for Lambda in the “minimal container” column of the AL2023 package list documentation.

Package manager

Amazon Linux 2023 uses dnf as the package manager, replacing yum, which was the default package manager in Amazon Linux 2. AL2023 base image for Lambda uses microdnf as the package manager, which is a standalone implementation of dnf based on libdnf and does not require extra dependencies such as Python. microdnf in provided.al2023 is symlinked as dnf. Note that microdnf does not support all options of dnf. For example, you cannot install a remote rpm using the rpm’s URL or install a local rpm file. Instead, you can use the rpm command directly to install such packages.

This example Dockerfile shows how you can install packages using dnf while building a container-based Lambda function:

```
# Use the Amazon Linux 2023 Lambda base image
FROM public.ecr.aws/lambda/provided.al2023

# Install the required Python version
RUN dnf install -y python3
```

Runtime support

With the launch of provided.al2023, you can migrate your AL2 custom runtime-based Lambda functions right away. It also lays the foundation for future Lambda managed runtimes: upcoming managed language runtimes such as Node.js 20, Python 3.12, Java 21, and .NET 8 are based on Amazon Linux 2023 and use provided.al2023 as their base image.

Changing runtimes and using other compute services

Previously, the provided.al2 base image was built as a custom image that used a selection of packages from AL2. It included packages like curl and yum that were needed to build functions using custom runtime. Also, each managed language runtime used different packages based on the use case.

Since future releases of managed runtimes use provided.al2023 as the base image, they contain the same set of base packages that come with AL2023-minimal. This simplifies migrating your Lambda function from a custom runtime to a managed language runtime. It also makes it easier to switch to other compute services like AWS Fargate or Amazon Elastic Container Service (Amazon ECS) to run your application.

Upgrading from AL1-based runtimes

For more information on Lambda runtime deprecation, see Lambda runtimes.

AL1 end of support is scheduled for December 31, 2023. The AL1-based runtimes go1.x, java8 and provided will be deprecated from this date. You should migrate your Go based Lambda functions to the provided runtime family, such as provided.al2 or provided.al2023. Using a provided runtime offers several benefits over the go1.x runtime. First, you can run your Lambda functions on AWS Graviton2 processors that offer up to 34% better price-performance compared to functions running on x86_64 processors. Second, it offers a smaller deployment package and faster function invoke path. And third, it aligns Go with other languages that also compile to native code and run on the provided runtime family.

The deprecation of the Amazon Linux 1 (AL1) base image for Lambda is also scheduled for December 31, 2023. With provided.al2023 now generally available, you should start planning the migration of your go1.x and other AL1-based Lambda functions to provided.al2023.

Using the AL2023 base image for Lambda

To build Lambda functions using a custom runtime, follow these steps using the provided.al2023 runtime.

AWS Management Console

Navigate to the Create function page in the Lambda console. To use the AL2023 custom runtime, select Provide your own bootstrap on Amazon Linux 2023 as the Runtime value.

AWS Serverless Application Model (AWS SAM) template

If you use an AWS SAM template to build and deploy your Lambda function, set the Runtime property to provided.al2023:

Transform: AWS::Serverless-2016-10-31
Resources:
  HelloWorldFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: hello-world/
      Handler: my.bootstrap.file
      Runtime: provided.al2023

Building Lambda functions that compile natively

Lambda’s custom runtime simplifies building functions in languages that compile to native code, broadening the range of languages you can use. Lambda provides the Runtime API, an HTTP API that custom runtimes use to interact with the Lambda service. Implementations of this API, called runtime interface clients (RICs), allow your function to receive invocation events from Lambda, send responses back to Lambda, and report errors to the Lambda service. RICs are available as language-specific libraries for several popular programming languages such as Go, Rust, Python, and Java.
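
To make the Runtime API concrete, here is a minimal sketch in Python of the loop a RIC runs. It assumes the python3 interpreter installed by the Dockerfile at the top of this section and uses only the standard library. The endpoint paths, the AWS_LAMBDA_RUNTIME_API environment variable, and the Lambda-Runtime-Aws-Request-Id header are those documented for the Runtime API; the echo handler is hypothetical. In practice you would use an official RIC rather than this code.

```python
import json
import os
import urllib.request

# Version segment used in Lambda Runtime API paths.
API_VERSION = "2018-06-01"


def runtime_url(api: str, path: str) -> str:
    """Build a Runtime API URL from the host:port in AWS_LAMBDA_RUNTIME_API."""
    return f"http://{api}/{API_VERSION}/runtime/invocation/{path}"


def process_next(api, handler, open_url=urllib.request.urlopen):
    """Fetch one event, run the handler, and post the result or error back."""
    # This request blocks until Lambda has an invocation for this environment.
    with open_url(runtime_url(api, "next")) as resp:
        request_id = resp.headers["Lambda-Runtime-Aws-Request-Id"]
        event = json.loads(resp.read())
    try:
        body = json.dumps(handler(event)).encode()
        target = runtime_url(api, f"{request_id}/response")
    except Exception as exc:  # report handler failures to Lambda instead of crashing
        body = json.dumps({"errorMessage": str(exc),
                           "errorType": type(exc).__name__}).encode()
        target = runtime_url(api, f"{request_id}/error")
    with open_url(urllib.request.Request(target, data=body, method="POST")):
        pass
    return request_id


def handler(event):
    # Hypothetical business logic: echo the event back.
    return {"echo": event}


# Only start polling when running inside a Lambda execution environment.
if __name__ == "__main__" and "AWS_LAMBDA_RUNTIME_API" in os.environ:
    runtime_api = os.environ["AWS_LAMBDA_RUNTIME_API"]
    while True:  # one execution environment serves many invocations
        process_next(runtime_api, handler)
```

Injecting open_url keeps the loop testable without a live Runtime API endpoint, and the guard at the bottom means the file only starts polling when Lambda has set AWS_LAMBDA_RUNTIME_API.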

For example, you can build functions using Go as shown in the Building with Go section of the Lambda developer documentation. Note that the name of the executable file of your function should always be bootstrap in provided.al2023 when using the zip deployment model. To use AL2023 in this example, use provided.al2023 as the runtime for your Lambda function.

If you are using the AWS CLI, set the --runtime option to provided.al2023:

aws lambda create-function --function-name myFunction \
--runtime provided.al2023 --handler bootstrap \
--role arn:aws:iam::111122223333:role/service-role/my-lambda-role \
--zip-file fileb://myFunction.zip

If you are using AWS SAM, set the Runtime property in your AWS SAM template file to provided.al2023:

AWSTemplateFormatVersion: '2010-09-09'
Transform: 'AWS::Serverless-2016-10-31'
Resources:
  HelloWorldFunction:
    Type: AWS::Serverless::Function
    Metadata:
      BuildMethod: go1.x
    Properties:
      CodeUri: hello-world/ # folder where your main program resides
      Handler: bootstrap
      Runtime: provided.al2023
      Architectures: [arm64]

If you run your function as a container image as shown in the Deploy container image example, use this Dockerfile. You can use any name for the executable file of your function when using container images. You need to specify the name of the executable as the ENTRYPOINT in your Dockerfile:

FROM golang:1.20 as build
WORKDIR /helloworld

# Copy dependencies list
COPY go.mod go.sum ./

# Build with optional lambda.norpc tag
COPY main.go .
RUN go build -tags lambda.norpc -o main main.go

# Copy artifacts to a clean image
FROM public.ecr.aws/lambda/provided:al2023
COPY --from=build /helloworld/main ./main
ENTRYPOINT [ "./main" ]

Conclusion

With this launch, you can now build your Lambda functions using Amazon Linux 2023 as the custom runtime, or use it as the base image for your container-based Lambda functions. You benefit from updated library versions such as glibc, a new package manager, and a smaller deployment size than Amazon Linux 2-based runtimes. Lambda also uses Amazon Linux 2023-minimal as the basis for future Lambda runtime releases.

For more serverless learning resources, visit Serverless Land.

Introducing faster polling scale-up for AWS Lambda functions configured with Amazon SQS

2023-11-06 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/introducing-faster-polling-scale-up-for-aws-lambda-functions-configured-with-amazon-sqs/

This post was written by Anton Aleksandrov, Principal Solutions Architect, and Tarun Rai Madan, Senior Product Manager.

Today, AWS is announcing that AWS Lambda supports up to five times faster polling scale-up rate for spiky Lambda workloads configured with Amazon Simple Queue Service (Amazon SQS) as an event source.

This feature enables customers building event-driven applications using Lambda and SQS to achieve more responsive scaling during a sudden burst of messages in their SQS queues, and reduces the need to duplicate Lambda functions or SQS queues to achieve faster message processing.

Overview

Customers building modern event-driven and messaging applications with AWS Lambda use Amazon SQS as a fundamental building block for creating decoupled architectures. Amazon SQS is a fully managed message queueing service for microservices, distributed systems, and serverless applications. When a Lambda function subscribes to an SQS queue as an event source, Lambda polls the queue, retrieves the messages, and sends them in batches to the function handler for processing. To consume messages efficiently, Lambda detects increases in queue depth and increases the number of poller processes to process the queued messages.

Until today, Lambda added up to 60 concurrent executions per minute for Lambda functions subscribed to SQS queues, scaling up to a maximum of 1,250 concurrent executions in approximately 20 minutes. However, customers tell us that some of the modern event-driven applications they build using Lambda and SQS are sensitive to sudden spikes in messages, which can noticeably delay message processing for end users. To harness the power of Lambda for applications that experience bursts of messages in SQS queues, these customers needed Lambda message polling to scale up faster.

With today’s announcement, Lambda functions that subscribe to an SQS queue can scale up to five times faster for queues that see a spike in message backlog, adding up to 300 concurrent executions per minute and scaling up to a maximum of 1,250 concurrent executions. This improvement lets you use the simple Lambda and SQS integration to build event-driven applications that scale faster during a surge of incoming messages, which matters particularly for real-time systems. It also gives customers faster processing during message spikes in SQS queues, while continuing to offer the flexibility to limit the maximum concurrent Lambda invocations per SQS event source.

Controlling the maximum concurrent Lambda invocations by SQS

The improved scaling rates are automatically applied to all AWS accounts using Lambda with SQS as an event source. You don't need to take any action, and there's no additional cost. This scaling improvement helps customers build more performant Lambda applications where they need faster SQS polling scale-up. To avoid overloading downstream dependencies, Lambda lets you set the maximum number of concurrent executions at the function level with reserved concurrency, and at the event source level with maximum concurrency.

The following diagram illustrates settings that you can use to control the flow rate of an SQS event source. You use reserved concurrency to control function-level scaling, and maximum concurrency to control event source scaling.

Reserved concurrency is the maximum concurrency that you want to allocate to a function. When a function has reserved concurrency allocated, no other functions can use that concurrency.

AWS recommends using reserved concurrency when you want to ensure a function has enough concurrency to scale up. When an SQS event source is attempting to scale up concurrent Lambda invocations, but the function has already reached the threshold defined by the reserved concurrency, the Lambda service throttles further function invocations.

This may cause the SQS event source to scale down, reducing the number of concurrently processed messages. Depending on the queue configuration, throttled messages are returned to the queue for retry, expire according to the retention policy, or are sent to a dead-letter queue (DLQ) or on-failure destination.

The maximum concurrency setting allows you to control concurrency at the event source level. It allows you to define the maximum number of concurrent invocations the event source attempts to send to the Lambda function. For scenarios where a single function has multiple SQS event sources configured, you can define maximum concurrency for each event source separately, providing more granular control. When trying to add rate control to SQS event sources, AWS recommends you start evaluating maximum concurrency control first, as it provides greater flexibility.

Reserved concurrency and maximum concurrency are complementary capabilities, and can be used together. Maximum concurrency can help to prevent overwhelming downstream systems and throttled invocations. Reserved concurrency helps to ensure available concurrency for the function.
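
As a sketch of how the two controls fit together, the following Python function applies both through the AWS SDK. It assumes a client that behaves like boto3.client("lambda") (put_function_concurrency, and create_event_source_mapping with ScalingConfig); the function name, queue ARN, and limits are hypothetical. The check that maximum concurrency is at least 2 reflects the service's documented lower bound, while rejecting a maximum concurrency above the reserved concurrency is a design choice of this sketch, meant to keep the poller from generating avoidable throttles, not a rule enforced by Lambda.

```python
def configure_concurrency(client, function_name, queue_arn,
                          reserved=None, max_concurrency=None, batch_size=1):
    """Set function-level and event-source-level concurrency limits.

    `client` is assumed to behave like boto3.client("lambda").
    """
    if max_concurrency is not None and max_concurrency < 2:
        raise ValueError("SQS maximum concurrency must be at least 2")
    if reserved is not None and max_concurrency is not None and max_concurrency > reserved:
        # Design choice of this sketch: keep the event source cap at or below
        # the function cap so the poller does not cause avoidable throttles.
        raise ValueError("max_concurrency should not exceed reserved concurrency")
    if reserved is not None:
        # Function level: reserve (and cap) concurrency for this function.
        client.put_function_concurrency(
            FunctionName=function_name, ReservedConcurrentExecutions=reserved)
    mapping = {"EventSourceArn": queue_arn, "FunctionName": function_name,
               "BatchSize": batch_size}  # batch size 1: one document per invocation
    if max_concurrency is not None:
        # Event source level: cap concurrent invocations driven by this queue only.
        mapping["ScalingConfig"] = {"MaximumConcurrency": max_concurrency}
    return client.create_event_source_mapping(**mapping)
```

Passing the client in, rather than creating it inside the function, lets you exercise the logic with a stub before touching a real account.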

Example scenario

Suppose your business must process large volumes of documents from storage. Every few hours, your business partners upload large volumes of documents to S3 buckets in your account.

For resiliency, you’ve designed your application to send a message to an SQS queue for each of the uploaded documents, so you can efficiently process them without accidentally skipping any. The documents are processed using a Lambda function, which takes around two seconds to process a single document.

Processing these documents is CPU-intensive, so you decide to process a single document per invocation. You want to use the power of Lambda to fan out the processing to as many concurrent execution environments as possible: the function should scale up rapidly to process the documents in parallel, then scale down to zero once all documents are processed to save costs.

When a business partner uploads 200,000 documents, 200,000 messages are sent to the SQS queue. The Lambda function is configured with an SQS event source, and it starts consuming the messages from the queue.

This diagram shows the results of running the test scenario before the SQS event source scaling improvements. As expected, concurrent executions grow by 60 per minute. Concurrency gradually scales up to 900 concurrent executions, and processing all the messages in the queue takes approximately 16 minutes.

The following diagram shows the results of running the same test scenario after the SQS event source scaling improvements. The timeframe used for both charts is the same, but the performance on the second chart is better. Concurrent executions grow by 300 per minute. It only takes 4 minutes to scale up to 1,250 concurrent executions, and all the messages in the queue are processed in approximately 8 minutes.
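
A back-of-the-envelope model helps explain the two charts. The Python sketch below assumes that concurrency grows linearly by a fixed amount each minute up to the 1,250 cap, and that every execution processes one 2-second document after another with no polling, batching, or cold-start overhead. Under those idealized assumptions it drains 200,000 messages in 15 minutes at a peak of 900 concurrent executions with the old 60-per-minute ramp, and in 7 minutes with the new 300-per-minute ramp, close to the roughly 16 and 8 minutes observed.

```python
def minutes_to_drain(messages, seconds_per_message, ramp_per_minute, max_concurrency):
    """Idealized model of SQS event source scale-up.

    Concurrency grows by `ramp_per_minute` each minute up to `max_concurrency`,
    and every execution processes one message at a time with no idle gaps.
    Returns (minutes until the queue is empty, peak concurrency reached).
    """
    if ramp_per_minute <= 0 or max_concurrency <= 0 or seconds_per_message <= 0:
        raise ValueError("rates, caps, and durations must be positive")
    remaining = messages
    minute = 0
    concurrency = 0
    while remaining > 0:
        minute += 1
        concurrency = min(concurrency + ramp_per_minute, max_concurrency)
        # Messages one execution finishes in 60 seconds, times concurrent executions.
        remaining -= concurrency * 60 // seconds_per_message
    return minute, concurrency


print(minutes_to_drain(200_000, 2, 60, 1250))   # old ramp rate
print(minutes_to_drain(200_000, 2, 300, 1250))  # new ramp rate
```

The gap between the model and the charts comes from the overheads it ignores; the point is that, for a backlog this size, most of the speedup comes from reaching high concurrency sooner.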

Deploying this example

Use the example project to replicate this performance test in your own AWS account. Follow the instructions in README.md for provisioning the sample project in your AWS accounts using the AWS Cloud Development Kit (CDK).

This example project is configured to demonstrate a large-scale workload processing 200,000 messages. Running this sample project in your account may incur charges. See AWS Lambda pricing and Amazon SQS pricing.

Once deployed, use the application under the “sqs-cannon” directory to send 200,000 messages to the SQS queue (or reconfigure to any other number). It takes several minutes to populate the SQS queue with messages. After all messages are sent, enable the SQS event source, as described in the README.md, and monitor the charts in the provisioned CloudWatch dashboard.

The default Regional concurrency quota is 1,000 concurrent executions, and newly created AWS accounts may start with a lower quota. If you haven't requested an increase, the number of concurrent executions is capped at your quota. Use Service Quotas or contact your account team to request a concurrency increase.

Security best practices

Always grant least-privilege permissions when giving your Lambda functions access to SQS queues. This reduces the potential attack surface by ensuring that only specific functions can perform specific actions on specific queues. For example, if your function only polls from the queue, grant it permission to read messages but not to send new messages. A function execution role defines which actions your function is allowed to perform on other resources. A queue access policy defines the principals that can access the queue, and the actions they are allowed to perform.
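
For the polling-only case, a least-privilege execution role statement might look like the following Python sketch, which builds the policy document as a dictionary. The three actions are, as far as we know, the SQS permissions a Lambda SQS event source mapping needs (the same ones granted by the AWSLambdaSQSQueueExecutionRole managed policy). The queue ARN is hypothetical, and the CloudWatch Logs permissions the function also needs are omitted.

```python
import json

# SQS actions an event source mapping needs to poll and delete messages.
# sqs:SendMessage is deliberately absent: this function only consumes.
CONSUMER_ACTIONS = ["sqs:ReceiveMessage", "sqs:DeleteMessage", "sqs:GetQueueAttributes"]


def consumer_policy(queue_arn: str) -> dict:
    """Execution-role policy document scoped to a single queue."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": CONSUMER_ACTIONS,
            "Resource": queue_arn,  # one specific queue, not "*"
        }],
    }


print(json.dumps(consumer_policy("arn:aws:sqs:us-east-1:111122223333:documents"), indent=2))
```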

Use server-side encryption (SSE) to store sensitive data in encrypted SQS queues. With SSE, your messages are always stored in encrypted form, and SQS only decrypts them for sending to an authorized consumer. SSE protects the contents of messages in queues using SQS-managed encryption keys (SSE-SQS) or keys managed in the AWS Key Management Service (SSE-KMS).

Conclusion

The improved Lambda SQS event source polling scale-up capability enables up to five times faster scale-up performance for spiky event-driven workloads using SQS queues, at no additional cost. This improvement offers customers the benefit of faster processing during spikes of messages in SQS queues, while continuing to offer the flexibility to limit the maximum concurrent invokes by SQS as an event source.

For more serverless learning resources, visit Serverless Land.

Sending and receiving webhooks on AWS: Innovate with event notifications

2023-10-30 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/sending-and-receiving-webhooks-on-aws-innovate-with-event-notifications/

This post is written by Daniel Wirjo, Solutions Architect, and Justin Plock, Principal Solutions Architect.

Commonly known as reverse APIs or push APIs, webhooks provide a way for applications to integrate with each other and communicate in near real time. They enable integration for business and system events.

Whether you’re building a software as a service (SaaS) application that integrates with your customers' workflows, or consuming transaction notifications from a vendor, webhooks play a critical role in unlocking innovation, enhancing user experience, and streamlining operations.

This post explains how to build with webhooks on AWS and covers two scenarios:

Webhooks Provider: A SaaS application that sends webhooks to an external API.
Webhooks Consumer: An API that receives webhooks with capacity to handle large payloads.

It includes high-level reference architectures with considerations, best practices, and a code sample to guide your implementation.

Sending webhooks

To send webhooks, you generate events, and deliver them to third-party APIs. These events facilitate updates, workflows, and actions in the third-party system. For example, a payments platform (provider) can send notifications for payment statuses, allowing ecommerce stores (consumers) to ship goods upon confirmation.

AWS reference architecture for a webhook provider

The architecture consists of two services:

Webhook delivery: An application that delivers webhooks to an external endpoint specified by the consumer.
Subscription management: A management API that lets consumers manage their configuration, including the endpoints for delivery and the events they subscribe to.

Considerations and best practices for sending webhooks

When building an application to send webhooks, consider the following factors:

Event generation: Consider how you generate events. This example uses Amazon DynamoDB as the data source. Events are generated through change data capture with DynamoDB Streams and sent to Amazon EventBridge Pipes. You then simplify the DynamoDB response format by using an input transformer.
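
The input transformer is configured on the pipe rather than written as code. To illustrate what simplifying the DynamoDB format means, this Python sketch performs an equivalent flattening of a DynamoDB Streams record, turning typed attribute values such as {"S": "CONFIRMED"} into plain values. It handles only common types (binary and set types are omitted), and the paymentId, status, and amount attributes are hypothetical.

```python
from decimal import Decimal


def from_dynamodb(value: dict):
    """Convert one DynamoDB-typed attribute value to a plain Python value.

    Covers S, N, BOOL, NULL, M, and L only; binary and set types are omitted.
    """
    (kind, raw), = value.items()  # each typed value has exactly one type key
    if kind == "S":
        return raw
    if kind == "N":
        return Decimal(raw)  # numbers arrive as strings; Decimal avoids float rounding
    if kind == "BOOL":
        return raw
    if kind == "NULL":
        return None
    if kind == "M":
        return {k: from_dynamodb(v) for k, v in raw.items()}
    if kind == "L":
        return [from_dynamodb(v) for v in raw]
    raise ValueError(f"unsupported DynamoDB type: {kind}")


def simplify_stream_record(record: dict) -> dict:
    """Keep the event name and a flattened NewImage from a DynamoDB Streams record."""
    image = record["dynamodb"].get("NewImage", {})
    return {"eventName": record["eventName"],
            "item": {k: from_dynamodb(v) for k, v in image.items()}}
```

If you do this conversion in a Lambda function rather than a pipe, boto3 also ships a TypeDeserializer in boto3.dynamodb.types for the same purpose.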

With EventBridge, you send events in near real time. If events are not time-sensitive, you can send multiple events in a batch. This can be done by polling for new events at a specified frequency using EventBridge Scheduler. To generate events from other data sources, consider similar approaches with Amazon Simple Storage Service (S3) Event Notifications or Amazon Kinesis.

Filtering: EventBridge Pipes support filtering by matching event patterns before the event is routed to the target destination. For example, you can select only status update operations on the payments DynamoDB table and route them to the relevant subscriber API endpoint.

Delivery: EventBridge API Destinations deliver events outside of AWS using REST API calls. To protect the external endpoint from surges in traffic, you set an invocation rate limit. In addition, retries with exponential backoff are handled automatically, depending on the error. An Amazon Simple Queue Service (SQS) dead-letter queue retains messages that cannot be delivered. Together, these provide scalable and resilient delivery.

Payload Structure: Consider how consumers process event payloads. This example uses an input transformer to create a structured payload, aligned to the CloudEvents specification. CloudEvents provides an industry standard format and common payload structure, with developer tools and SDKs for consumers.

Payload Size: For fast and reliable delivery, keep payload size to a minimum. Consider delivering only necessary details, such as identifiers and status. For additional information, you can provide consumers with a separate API. Consumers can then separately call this API to retrieve the additional information.
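
Combining the payload structure and payload size points, the following Python sketch builds a CloudEvents 1.0 structured-mode payload that carries only identifiers and a status. specversion, id, source, and type are the attributes CloudEvents requires; the event type name, source path, and data fields are hypothetical.

```python
import json
import uuid
from datetime import datetime, timezone


def payment_status_event(payment_id: str, status: str, source: str) -> dict:
    """Build a CloudEvents 1.0 structured-mode event with a minimal payload."""
    return {
        # Required CloudEvents attributes.
        "specversion": "1.0",
        "id": str(uuid.uuid4()),  # unique per event; consumers can deduplicate on it
        "source": source,         # URI-reference identifying the producer
        "type": "com.example.payment.status.updated",  # hypothetical event type
        # Optional attributes.
        "time": datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        "subject": payment_id,
        # Thin payload: consumers call a separate API for full payment details.
        "data": {"paymentId": payment_id, "status": status},
    }


print(json.dumps(payment_status_event("p-123", "CONFIRMED", "/payments"), indent=2))
```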

Security and Authorization: To deliver events securely, you establish a connection using an authorization method such as OAuth. Under the hood, the connection stores the credentials in AWS Secrets Manager, which securely encrypts the credentials.

Subscription Management: Consider how consumers can manage their subscriptions, such as specifying HTTPS endpoints and the event types to subscribe to. DynamoDB stores this configuration. Amazon API Gateway, Amazon Cognito, and AWS Lambda provide a management API for these operations.

Costs: In practice, sending webhooks incurs cost, which may become significant as you grow and generate more events. Consider implementing usage policies, quotas, and allowing consumers to subscribe only to the event types that they need.

Monetization: Consider billing consumers based on their usage volume or tier. For example, you can offer a free tier that provides low-friction access to webhooks, but only up to a certain volume. For additional volume, you charge a usage fee that is aligned to the business value your webhooks provide. At high volumes, you offer a premium tier with dedicated infrastructure for certain consumers.

Monitoring and troubleshooting: Beyond the architecture, consider processes for day-to-day operations. As endpoints are managed by external parties, consider enabling self-service. For example, allow consumers to view statuses, replay events, and search for past webhook logs to diagnose issues.

Advanced Scenarios: This example is designed for popular use cases. For advanced scenarios, consider alternative application integration services, noting their service quotas. For example, use Amazon Simple Notification Service (SNS) to fan out to a larger number of consumers, Lambda for flexibility to customize payloads and authentication, and AWS Step Functions to orchestrate a circuit breaker pattern that deactivates unreliable subscribers.

Receiving webhooks

To receive webhooks, you require an API to provide to the webhook provider. For example, an ecommerce store (consumer) may rely on notifications provided by their payment platform (provider) to ensure that goods are shipped in a timely manner. Webhooks present a unique scenario as the consumer must be scalable, resilient, and ensure that all requests are received.

AWS reference architecture for a webhook consumer

In this scenario, consider an advanced use case that can handle large payloads by using the claim-check pattern.

At a high-level, the architecture consists of:

API: An API endpoint to receive webhooks. An event-driven system then authorizes and processes the received webhooks.
Payload Store: S3 provides scalable storage for large payloads.
Webhook Processing: EventBridge Pipes provide an extensible architecture for processing. It can batch, filter, enrich, and send events to a range of processing services as targets.

Considerations and best practices for receiving webhooks

When building an application to receive webhooks, consider the following factors:

Scalability: Providers typically send events as they occur. API Gateway provides a scalable managed endpoint to receive events. If your endpoint is unavailable or throttles requests, providers may retry; however, retries are not guaranteed. Therefore, it is important to configure appropriate rate and burst limits. Throttling requests at the entry point limits the impact on downstream services, each of which has its own quotas and limits. In many cases, providers are also aware of the impact on downstream systems, so they send events at a threshold rate limit, typically up to 500 transactions per second (TPS).
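
API Gateway describes its throttling in terms of a token bucket, where the rate limit is the refill rate and the burst limit is the bucket size. This minimal Python model, an illustration rather than API Gateway's implementation, shows how those two settings interact. The 500 requests per second matches the typical provider rate mentioned above; the burst size of 1,000 is a hypothetical value.

```python
class TokenBucket:
    """Token bucket: tokens refill at `rate` per second up to `burst`; each request spends one."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start full, so an initial burst is admitted
        self.last = 0.0

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` (seconds) is admitted."""
        # Refill for the elapsed time, never exceeding the bucket size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # API Gateway would answer 429 Too Many Requests here


bucket = TokenBucket(rate=500, burst=1000)
admitted = sum(bucket.allow(0.0) for _ in range(1200))  # a sudden spike of 1,200 requests
print(admitted)
```

Passing the clock value in as `now` keeps the model deterministic; the burst absorbs the first 1,000 requests of the spike, after which requests are admitted only as fast as tokens refill.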

In addition, API Gateway allows you to validate requests, monitor for any errors, and protect against distributed denial of service (DDoS). This includes Layer 7 and Layer 3 attacks, which are common threats to webhook consumers given public exposure.

Authorization and Verification: Providers can support different authorization methods. Consider a common scenario with Hash-based Message Authentication Code (HMAC), where a shared secret is established and stored in Secrets Manager. A Lambda function then verifies the integrity of the message by checking a signature in the request header. Typically, the signature includes a timestamped nonce with an expiry to mitigate replay attacks, in which an attacker resends previously captured events. Alternatively, if the provider supports OAuth, consider securing the API with Amazon Cognito.
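
A minimal Python sketch of the verification step follows. Signature schemes vary by provider, so it assumes a hypothetical scheme in which the provider sends a Unix timestamp and a hex-encoded HMAC-SHA256 over "<timestamp>.<body>" in request headers; in the Lambda function, the secret would come from Secrets Manager. The timestamp check bounds the replay window, and the constant-time comparison avoids leaking how much of a forged signature matched.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_webhook(secret: bytes, body: bytes, timestamp: str, signature: str,
                   tolerance_s: int = 300, now: Optional[float] = None) -> bool:
    """Verify a hex HMAC-SHA256 signature over b"<timestamp>.<body>" (assumed scheme)."""
    current = time.time() if now is None else now
    try:
        sent_at = int(timestamp)
    except ValueError:
        return False  # malformed timestamp header
    # Reject stale or future-dated requests to bound the replay window.
    if abs(current - sent_at) > tolerance_s:
        return False
    expected = hmac.new(secret, timestamp.encode() + b"." + body,
                        hashlib.sha256).hexdigest()
    # Constant-time comparison so response timing does not reveal partial matches.
    return hmac.compare_digest(expected, signature)
```

A timestamp tolerance is only part of replay protection; deduplicating by event identifier, as described under Idempotency, covers resends that arrive within the window.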

Payload Size: Providers may send a variety of payload sizes. Events can be batched into a single larger request, or a single event may contain significant information. Consider payload size limits across your event-driven system. API Gateway and Lambda have payload limits of 10 MB and 6 MB, respectively. However, DynamoDB items and SQS messages are limited to 400 KB and 256 KB (SQS supports larger messages through an extended client library), which can become a bottleneck.

Instead of passing the entire payload through the system, you store it in S3 and reference it in DynamoDB by its bucket name and object key. This is known as the claim-check pattern. With this approach, the architecture supports payloads of up to 6 MB, per the Lambda invocation payload quota.
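
The claim-check flow can be sketched in Python as follows. To stay self-contained, it uses in-memory dictionaries as hypothetical stand-ins for the S3 bucket and the DynamoDB table, with comments noting where the S3 and DynamoDB calls would go. The bucket name, key layout, and item attributes are assumptions.

```python
import hashlib

BUCKET = "webhook-payloads"  # hypothetical bucket name

# Hypothetical in-memory stand-ins for the S3 bucket and the DynamoDB table.
payload_store = {}  # object key -> raw payload bytes
webhook_table = {}  # event id -> small reference item


def store_with_claim_check(event_id: str, body: bytes) -> dict:
    """Park the full payload in object storage and record only a small reference."""
    key = f"webhooks/{event_id}.json"
    payload_store[key] = body  # in AWS: s3.put_object(Bucket=BUCKET, Key=key, Body=body)
    item = {
        "eventId": event_id,
        "bucket": BUCKET,
        "key": key,
        "sha256": hashlib.sha256(body).hexdigest(),  # lets consumers check integrity
        "size": len(body),
    }
    webhook_table[event_id] = item  # in AWS: a DynamoDB put_item with this item
    return item


def redeem_claim_check(item: dict) -> bytes:
    """Downstream processors fetch the full payload only when they need it."""
    return payload_store[item["key"]]
```

Only the small reference item flows through DynamoDB Streams and EventBridge Pipes, so their size limits no longer constrain the webhook body.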

Idempotency: For reliability, many providers prioritize at-least-once delivery, even if it means not guaranteeing exactly-once delivery. They can transmit the same request multiple times, resulting in duplicates. To handle this, a Lambda function checks the event’s unique identifier against previous records in DynamoDB. If the event has not already been processed, you create a DynamoDB item for it.
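
A sketch of that check in Python, using a hypothetical in-memory table in place of DynamoDB. The important detail is that the existence check and the write happen as one atomic conditional put (in DynamoDB, a put_item with a ConditionExpression such as attribute_not_exists), so two concurrent deliveries of the same event cannot both pass a separate read-then-write check.

```python
class ConditionalCheckFailed(Exception):
    """Stand-in for DynamoDB's ConditionalCheckFailedException."""


class InMemoryTable:
    """Hypothetical stand-in for a DynamoDB table keyed by eventId."""

    def __init__(self):
        self.items = {}

    def put_if_absent(self, item: dict) -> None:
        # Mirrors put_item with ConditionExpression="attribute_not_exists(eventId)".
        if item["eventId"] in self.items:
            raise ConditionalCheckFailed(item["eventId"])
        self.items[item["eventId"]] = item


def handle_once(table: InMemoryTable, event: dict, process) -> bool:
    """Process an event only the first time its id is seen; return True if processed."""
    try:
        table.put_if_absent({"eventId": event["id"], "status": "RECEIVED"})
    except ConditionalCheckFailed:
        return False  # duplicate delivery: acknowledge without reprocessing
    process(event)
    return True
```

Recording the identifier before processing means a crash mid-processing would cause the retry to be skipped; a production version might store a processing status and expire stale in-progress records.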

Ordering: Consider processing requests in their intended order. Because most providers prioritize at-least-once delivery, events can arrive out of order. To indicate order, events may include a timestamp or a sequence identifier in the payload. If they don't, ordering may be best-effort, based on when each webhook is received. To handle ordering reliably, select event-driven services that preserve ordering. This example uses DynamoDB Streams and EventBridge Pipes.
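
If a provider does include a sequence identifier, a consumer can hold early arrivals until the gaps before them are filled. This Python sketch assumes a hypothetical gap-free integer sequence starting at 1 and keeps its state in memory; a real deployment would need durable state and a timeout for gaps that never fill.

```python
class ReorderBuffer:
    """Release events in sequence order, holding early arrivals until gaps fill."""

    def __init__(self, next_seq: int = 1):
        self.next_seq = next_seq
        self.pending = {}  # sequence number -> event that arrived early

    def add(self, seq: int, event) -> list:
        """Buffer one event; return every event that can now be released, in order."""
        if seq < self.next_seq or seq in self.pending:
            return []  # duplicate delivery: already released or already buffered
        self.pending[seq] = event
        released = []
        while self.next_seq in self.pending:
            released.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return released
```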

Flexible Processing: EventBridge Pipes integrate with a range of event-driven services as targets. You can route events to different targets based on filters, because different event types may require different processors. For example, you can use Step Functions to orchestrate complex workflows, Lambda for compute operations that complete within 15 minutes, SQS to buffer requests, and Amazon Elastic Container Service (ECS) for long-running compute jobs. EventBridge Pipes also provide transformation, so that only necessary payloads are sent, and enrichment if additional information is required.

Costs: This example considers a use case that can handle large payloads. However, if you can ensure that providers send minimal payloads, consider a simpler architecture without the claim-check pattern to minimize cost.

Conclusion

Webhooks are a popular method for applications to communicate, and for businesses to collaborate and integrate with customers and partners.

This post shows how you can build applications to send and receive webhooks on AWS. It uses serverless services such as EventBridge and Lambda, which are well-suited for event-driven use cases. It covers high-level reference architectures, considerations, best practices, and a code sample to assist in building your solution.

For standards and best practices on webhooks, visit the open-source community resources Webhooks.fyi and CloudEvents.io.

For more serverless learning resources, visit Serverless Land.

Filtering events in Amazon EventBridge with wildcard pattern matching

2023-10-12 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/filtering-events-in-amazon-eventbridge-with-wildcard-pattern-matching/

This post is written by Rajdeep Banerjee, Sr PSA, and Brian Krygsman, Sr. Solutions Architect.

Amazon EventBridge recently announced support for wildcard filters in rule event patterns. An EventBridge event bus is a serverless event router that helps you decouple your event-driven systems. You can route events between your systems, AWS services, or third-party SaaS services. You attach a rule to your event bus to define logic for routing events from producers to consumers.

You set an event pattern on the rule to filter incoming events to specific consumers. The new wildcard filter lets you build more flexible event matching patterns to reduce rule management and optimize your event consumers. This shows how these EventBridge attributes work together.

Wildcard filters use the wildcard character (*) to match zero or more characters in a string value. For example, a filter string like "*.png" matches strings that end with ".png".

You can also use multiple wildcard characters in a filter. For example, a filter string like "*Title*" matches string values that contain "Title". When using wildcard filters, be careful to avoid matching more events than you intend.
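
To reason about which strings a wildcard filter matches, the following Python sketch approximates the semantics described above by translating each * into the regular expression .* and requiring a full match. It is a simplification, not EventBridge's matching engine: it ignores escaped asterisks and any other EventBridge-specific rules.

```python
import re


def wildcard_match(pattern: str, value: str) -> bool:
    """Approximate EventBridge wildcard semantics: '*' matches zero or more characters.

    Everything other than '*' is matched literally and case-sensitively.
    """
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


print(wildcard_match("*.png", "photos/cat.png"))  # ends with .png
print(wildcard_match("*Title*", "Title"))         # '*' can match zero characters
```

Because * also matches zero characters, "*Title*" matches a value that is exactly "Title", which is one way a pattern can match more events than intended.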

This blog post describes how you can use wildcard filters in example scenarios. For more information about event-driven architectures, visit Serverless Land.

Wildcard pattern matching in S3 Event Notifications

Applications must often perform an action when new data is available. One example is processing trading data uploaded to your Amazon S3 bucket. The data may be stored in individual folders by date, time, and stock symbol. Business rules may dictate that when stock XYZ receives a file, a notification must be sent to a downstream system.

In a typical folder structure for this data, each object is nested under a date folder, then a time folder, then a stock symbol folder.

S3 can send an event to EventBridge when an object is written to a bucket. The S3 event includes the object key (for example, 2023-10-01/T13:22:22Z/XYZ/filename.ext). When any object is uploaded to an XYZ folder, you can use an EventBridge rule to send these events to a downstream service such as an Amazon SQS queue.

Before this launch, you would first send the event to an AWS Lambda function. Existing prefix and suffix filters alone are insufficient because of the extra date and time folders. The function would run your code to inspect the object path for the stock symbol. Your code would then forward events to SQS when they matched.

With the new wildcard patterns in EventBridge rules, the logic is simpler. You no longer need to create a Lambda function to run custom matching code. You can instead use wildcard characters in the rule’s filter pattern, matching against portions of the S3 object key.

To use this, start by creating a new rule in the EventBridge console.

Choose Next. Keep the standard parameters and move to the Event pattern section. Here you can use a JSON-based event pattern.

{
  "source": ["aws.s3"],
  "detail": {
    "bucket": {
      "name": ["intraday-trading-data"]
    },
    "object": {
      "key": [{
        "wildcard": "*/XYZ/*"
      }]
    }
  }
}

This pattern looks for Event Notifications from a specific bucket. The pattern then filters the events further to object keys that match "*/XYZ/*". The rule filters out notifications for other stock symbols and listens only to "XYZ" data, irrespective of the date and time of the data feed.

To use an SQS queue as the target for the filtered events, you must add a resource-based policy to the queue that allows EventBridge to send messages to it.

Choose Next and review the rule details before saving.

Before testing, enable S3 event notifications to EventBridge in the S3 console.

To test the new wildcard pattern, upload any sample CSV file to an XYZ folder to trigger an event notification.

You can monitor the EventBridge CloudWatch metrics to check whether the S3 upload invokes the rule. The SQS CloudWatch metrics show whether messages are received from the EventBridge rule.
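
Before deploying, you can reason about the pattern offline. This Python sketch implements a simplified subset of EventBridge matching (nested objects, exact values, and wildcard filters only) and runs the pattern above against trimmed S3 events that contain only the fields the pattern inspects. For an authoritative answer, EventBridge also offers the TestEventPattern API, which evaluates a pattern against a sample event in the service itself.

```python
import re


def _wildcard(pattern: str, value: str) -> bool:
    """'*' matches zero or more characters; everything else is literal."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def _option_matches(option, actual) -> bool:
    if isinstance(option, dict):  # only {"wildcard": ...} filters are supported here
        return "wildcard" in option and isinstance(actual, str) and _wildcard(option["wildcard"], actual)
    return option == actual       # exact value match


def matches(pattern: dict, event: dict) -> bool:
    """Simplified EventBridge matching: every pattern field must be present and match."""
    for field, expected in pattern.items():
        if field not in event:
            return False
        actual = event[field]
        if isinstance(expected, dict):  # nested object: recurse
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif not any(_option_matches(option, actual) for option in expected):
            return False
    return True


PATTERN = {
    "source": ["aws.s3"],
    "detail": {
        "bucket": {"name": ["intraday-trading-data"]},
        "object": {"key": [{"wildcard": "*/XYZ/*"}]},
    },
}


def s3_event(key: str, bucket: str = "intraday-trading-data") -> dict:
    """Trimmed S3 event containing only the fields the pattern inspects."""
    return {"source": "aws.s3", "detail": {"bucket": {"name": bucket}, "object": {"key": key}}}


print(matches(PATTERN, s3_event("2023-10-01/T13:22:22Z/XYZ/filename.ext")))
print(matches(PATTERN, s3_event("2023-10-01/T13:22:22Z/ABC/filename.ext")))
```

The slashes in "*/XYZ/*" matter: a key under a folder such as XYZW does not match, because the pattern requires the exact path segment /XYZ/.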

Filtering based on Amazon Resource Name (ARN)

Customers often need to perform actions when AWS Identity and Access Management (IAM) policies are attached to specific roles. Without wildcards, you would achieve this with custom EventBridge rules that filter for exact matches, or by creating multiple rules to achieve the same effect. The new wildcard filter simplifies the task of invoking such an action.

Consider an IAM role with fine-grained IAM policies attached. You may need to ensure that any new policy attached to this role is attached by principals with specific ARNs. You can implement this check as follows.

When you attach a new IAM policy to a role, it generates an event like this:

{
    "version": "0",
    "id": "0b85984e-ec53-84ba-140e-9e0cff7f05b4",
    "detail-type": "AWS API Call via CloudTrail",
    "source": "aws.iam",
    "account": "123456789012",
    "time": "2023-10-07T20:23:28Z",
    "region": "us-east-1",
    "resources": [],
    "detail": {
        "eventVersion": "1.08",
        "userIdentity": {
            "arn": "arn:aws:sts::123456789012:assumed-role/Admin/UserName",
            // ... additional detail fields
        },
        "eventTime": "2023-10-07T20:23:28Z",
        "eventSource": "iam.amazonaws.com",
        "eventName": "AttachRolePolicy",
        // ... additional detail fields

    }
}

You can create a rule matching against a combination of these event properties. You can filter detail.userIdentity.arn with a wildcard to catch events that come from a particular ARN. You can then route these events to a target like an Amazon CloudWatch Logs stream to record the change. You can also route them to Amazon Simple Notification Service (SNS). You can use the SNS notification to start a review and ensure that the newly attached policies are well-crafted as part of your reconciliation and audit process. The filter looks like this:

{
  "source": ["aws.iam"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventSource": ["iam.amazonaws.com"],
    "eventName": ["AttachRolePolicy"],
    "userIdentity": {
      "arn": [{
        "wildcard": "arn:aws:sts::123456789012:assumed-role/*/*"
      }]
    }
  }
}
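
To create this rule programmatically, a Python sketch using the AWS SDK might look like the following. It assumes a client that behaves like boto3.client("events") (put_rule and put_targets); the rule name and SNS topic ARN are hypothetical. Because IAM is a global service, its CloudTrail events are delivered to EventBridge in us-east-1, so create the rule there. The SNS topic also needs a resource-based policy that allows EventBridge to publish to it.

```python
import json

# The filter from the text: AttachRolePolicy calls made through any assumed role in the account.
IAM_PATTERN = {
    "source": ["aws.iam"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["iam.amazonaws.com"],
        "eventName": ["AttachRolePolicy"],
        "userIdentity": {"arn": [{"wildcard": "arn:aws:sts::123456789012:assumed-role/*/*"}]},
    },
}


def create_audit_rule(events_client, rule_name: str, topic_arn: str) -> None:
    """Create the rule and route matching events to an SNS topic for review.

    `events_client` is assumed to behave like boto3.client("events", region_name="us-east-1").
    """
    events_client.put_rule(Name=rule_name, EventPattern=json.dumps(IAM_PATTERN), State="ENABLED")
    events_client.put_targets(Rule=rule_name, Targets=[{"Id": "iam-audit-sns", "Arn": topic_arn}])
```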

Filtering custom events

You can use EventBridge to build your own event-driven systems with loosely coupled, scalable application services. When building event-driven applications in AWS, you can publish events to the default event bus, or create a custom event bus. You define the structure of events emitted from your services.

This structure is known as the event schema. When you attach rules to your bus to route events from producers to consumers, you match against values from properties in your event schema. Wildcard filters allow you to match property values that are unknown ahead of time, or across multiple value variants.

Consider an ecommerce application as an example. You may have several decoupled services working together, like a shopping cart service, an inventory service, and others. Each of these services emits events onto your event bus as your customers shop.

Events may include errors, to record problems customers encounter using your system. You can use a single rule with a wildcard filter to match all error events and send them to a common target. This allows you to simplify observability across your services.

This is the event flow:

Your shopping cart service may emit a timeout error event:

```
{
  "version": "0",
  "id": "24a4b957-570d-590b-c213-2a72e5dc4c66",
  "detail-type": "shopping.cart.error.timeout",
  "source": "com.mybusiness.shopping.cart",
  "account": "123456789012",
  "time": "2023-10-06T03:28:44Z",
  "region": "us-west-2",
  "resources": [],
  "detail": {
    "message": "Operation timed out.",
    "related-entity": {
      "entity-type": "order",
      "id": "123"
    },
    // ... additional detail fields
  }
}
```

The detail-type property identifies the type of event. Error events from other services may use different prefixes in detail-type, and other error types may use different suffixes.

For example, an inventory service may emit an out-of-stock error event like this:

```
{
  "version": "0",
  "id": "e456f480-cc1e-47fa-8399-ab2e54116958",
  "detail-type": "shopping.inventory.error.outofstock",
  "source": "com.mybusiness.shopping.inventory",
  "account": "123456789012",
  "time": "2023-10-06T03:28:44Z",
  "region": "us-west-2",
  "resources": [],
  "detail": {
    "message": "Product cannot be added to a cart. Out of stock.",
    "related-entity": {
      "entity-type": "product",
      "id": "456"
    },
    // ... additional detail fields
  }
}
```
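Both example events use `com.mybusiness.shopping.<service>` as the source and `shopping.<service>.error.<type>` as the detail-type. As a minimal sketch, assuming a producer written in Python, the helper below builds a PutEvents entry in that shape. The helper name and the custom bus name `shopping-bus` are hypothetical; `Source`, `DetailType`, `Detail`, and `EventBusName` are the PutEvents entry fields.

```python
import json


def build_error_entry(service: str, error_type: str, message: str,
                      entity_type: str, entity_id: str,
                      bus_name: str = "shopping-bus") -> dict:
    """Build one PutEvents entry using the naming scheme of the examples.

    "shopping-bus" is a hypothetical custom event bus name.
    """
    return {
        "Source": f"com.mybusiness.shopping.{service}",
        "DetailType": f"shopping.{service}.error.{error_type}",
        # PutEvents takes Detail as a JSON-encoded string, not a dict.
        "Detail": json.dumps({
            "message": message,
            "related-entity": {"entity-type": entity_type, "id": entity_id},
        }),
        "EventBusName": bus_name,
    }


if __name__ == "__main__":
    entry = build_error_entry("inventory", "outofstock",
                              "Product cannot be added to a cart. Out of stock.",
                              "product", "456")
    print(entry["DetailType"])  # shopping.inventory.error.outofstock
    # With AWS credentials configured, you could then send it:
    # import boto3; boto3.client("events").put_events(Entries=[entry])
```

Keeping the naming scheme in one helper makes it harder for a single service to drift from the convention that the wildcard rule relies on.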

To route these events to a common target such as an Amazon CloudWatch Logs log group, you can create a rule with a wildcard filter matching against detail-type. You can combine this with a prefix filter on source that narrows events down to services from your shopping system. The filter looks like this:

```
{
  "source": [{
    "prefix": "com.mybusiness.shopping."
  }],
  "detail-type": [{
    "wildcard": "*.error.*"
  }]
}
```
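As a local approximation of how this rule treats the example events, the Python sketch below applies the prefix test to source and the wildcard test to detail-type. It models only these two operators, assumes `*` is the only special character, and is not EventBridge's matching engine; the function names are hypothetical.

```python
import re


def wildcard_match(pattern: str, value: str) -> bool:
    """'*' matches any run of characters (including none); all else is literal."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, value) is not None


def matches_error_rule(event: dict) -> bool:
    """Approximates the shopping error rule: a prefix filter on source
    AND a wildcard filter on detail-type (both must match)."""
    return (
        event.get("source", "").startswith("com.mybusiness.shopping.")
        and wildcard_match("*.error.*", event.get("detail-type", ""))
    )


if __name__ == "__main__":
    events = [
        {"source": "com.mybusiness.shopping.cart", "detail-type": "shopping.cart.error.timeout"},
        {"source": "com.mybusiness.shopping.inventory", "detail-type": "shopping.inventory.error.outofstock"},
        {"source": "com.mybusiness.shopping.cart", "detail-type": "shopping.cart.item.added"},
    ]
    for event in events:
        print(event["detail-type"], matches_error_rule(event))
```

Both example error events match, while a non-error event such as `shopping.cart.item.added` from the same source does not, and neither does an error event whose source falls outside the `com.mybusiness.shopping.` prefix.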

Without a wildcard filter, you would need a more complex matching pattern, possibly spread across multiple rules.

Conclusion

Wildcard filters in EventBridge rules help simplify your event-driven applications by ensuring the correct events are passed on to your targets. The new feature reduces the need for the custom filtering code that was previously required. Try EventBridge rules with wildcard filters and experience the benefits of this new feature in your event-driven serverless applications.

For more serverless learning resources, visit Serverless Land.

Visually design your application with AWS Application Composer

2023-09-26 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/visually-design-your-application-with-aws-application-composer/

This post is written by Paras Jain, Senior Technical Account Manager and Curtis Darst, Senior Solutions Architect.

AWS Application Composer allows you to design and build applications visually using 13 AWS CloudFormation resource types. Today, the service expands support to all available CloudFormation resource types.

Overview

AWS Application Composer provides you with an interactive canvas for visually designing your applications. You use a drag-and-drop interface to create an application design from scratch or import an existing application definition to edit it.

Modern event-driven applications are built on many services. Visualizing an architecture helps you better understand the relationships between those services and identify gaps and areas for improvement.

You can use AWS Application Composer in local sync mode to connect to your local file system, so your changes are saved to your files. This way, you can integrate with your existing version control systems and development and deployment workflows.

AWS Application Composer provides a drag-and-drop canvas view and a code editor template view. Changes made to one view reflect on the other view. Similarly, changes made in AWS Application Composer are reflected in your local code editor and vice versa.

What is AWS releasing today?

AWS Application Composer already supports 13 serverless resource types. For these resource types, AWS Application Composer provides enhanced component cards.

Enhanced component cards allow you to configure and join components together. Today’s release gives you the ability to drag and drop 1,134 resource types onto the canvas and configure them using the resource configuration pane.

This blog post shows how you can create a fault tolerant compute architecture involving an Application Load Balancer, two Amazon Elastic Compute Cloud (EC2) instances in different Availability Zones, and an Amazon Relational Database Service (RDS) instance.

Conceptually, this is the application design:

Designing a scalable and fault tolerant compute stack

For this blog post, you create a fault tolerant compute stack consisting of an ALB, two EC2 instances in two different Availability Zones with automatic scaling capabilities, and an RDS instance.

Navigate to the AWS Application Composer service in the AWS Management Console. Create a new project by choosing Create Project.
If you are using a browser that supports local sync (Google Chrome and Microsoft Edge at this time), you can connect the project to your local file system and edit it using a command line interface or an integrated development environment. To do so:
1. Choose Menu, and Local sync.
2. Select a folder on your file system and allow the necessary permissions from the browser when prompted.

Some components in architecture diagrams, like security groups, can be visualized in the canvas, but you don’t necessarily want to represent them as a prominent part of the architecture. Therefore, for brevity, you configure them only in the template view instead of dragging and dropping them.

Choose Template to switch to the template view.

Paste the following code in the template editor:

```
Resources:
  DBEC2SecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Open database for access
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: '3306'
          ToPort: '3306'
          SourceSecurityGroupId: !Ref WebServerSecurityGroup
      VpcId:
        ParameterId: VpcId
        Format: AWS::EC2::VPC::Id
  WebServerSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Enable HTTP access via port 80 locked down to the load balancer + SSH access.
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: '80'
          ToPort: '80'
          SourceSecurityGroupId: !Select
            - 0
            - !GetAtt LoadBalancer.SecurityGroups
        - IpProtocol: tcp
          FromPort: '22'
          ToPort: '22'
          CidrIp:
            ParameterId: SSHLocation
            Format: String
            Default: 0.0.0.0/0
      VpcId:
        ParameterId: VpcId
        Format: AWS::EC2::VPC::Id
  WebServerGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      VPCZoneIdentifier:
        ParameterId: Subnets
        Format: List<AWS::EC2::Subnet::Id>
      LaunchConfigurationName: !Ref LaunchConfiguration
      MinSize: '1'
      MaxSize: '5'
      DesiredCapacity:
        ParameterId: WebServerCapacity
        Format: Number
        Default: '1'
      TargetGroupARNs:
        - !Ref TargetGroup
```
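The template chains the security groups: the load balancer reaches the web servers on port 80, and only the web servers reach the database on port 3306. The Python sketch below models that chain locally so you can reason about who can reach what. It is a simplification of security group semantics, the rules are copied by hand from the template, and `LoadBalancerSG` is a hypothetical stand-in for the first security group returned by `!GetAtt LoadBalancer.SecurityGroups`.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Ingress:
    port: int
    source: str  # a security group name or an IPv4 CIDR block


# Ingress rules copied by hand from the template above. "LoadBalancerSG" is a
# hypothetical name for the first security group attached to LoadBalancer.
RULES = {
    "WebServerSecurityGroup": [
        Ingress(80, "LoadBalancerSG"),
        Ingress(22, "0.0.0.0/0"),  # default value of the SSHLocation parameter
    ],
    "DBEC2SecurityGroup": [
        Ingress(3306, "WebServerSecurityGroup"),
    ],
}


def _source_matches(rule_source: str, caller: str) -> bool:
    if "/" in rule_source:  # CIDR rule: the caller must be an IP inside the block
        try:
            return ipaddress.ip_address(caller) in ipaddress.ip_network(rule_source)
        except ValueError:  # caller is a security group name, not an IP address
            return False
    return rule_source == caller  # security group reference


def allowed(caller: str, target_group: str, port: int) -> bool:
    """True if target_group's ingress rules admit caller on port."""
    return any(
        rule.port == port and _source_matches(rule.source, caller)
        for rule in RULES.get(target_group, [])
    )
```

For example, `allowed("LoadBalancerSG", "DBEC2SecurityGroup", 3306)` is False, so the load balancer cannot reach the database directly. Note also that `allowed("203.0.113.10", "WebServerSecurityGroup", 22)` is True: with the default SSHLocation of 0.0.0.0/0, port 22 is open to any IPv4 address, so you may want to narrow that parameter before deploying.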

Switch back to canvas view.

Add an Application Load Balancer, a Load Balancer Listener, a Load Balancer Target Group, an Auto Scaling Launch Configuration, and an RDS DB instance:
1. Under the resources pane on the left, enter loadbalancer in the search bar.
2. Drag and drop AWS::ElasticLoadBalancingV2::LoadBalancer from the resources pane to the canvas.
Repeat these steps for the other four resource types, then choose Arrange. Your canvas now appears as follows:
Start configuring the remaining component cards. You can connect two cards visually by dragging from the right connection port of one card to the left connection port of another card. At the moment, not all component cards support visual connectivity; for those cards, you can establish connectivity using the resource configuration pane or by updating the template code directly. Either way, the connectivity is reflected in the canvas.
You configure the components in the architecture using the Resource configuration pane. First, configure the Application Load Balancer listener:
1. Choose the Listener Card in the canvas.
2. Choose Details.
3. Paste the following code in the Resource Configuration Section:
```
DefaultActions:
  - Type: forward
    TargetGroupArn: !Ref TargetGroup
LoadBalancerArn: !Ref LoadBalancer
Port: '80'
Protocol: HTTP
```
4. Choose Save.
Repeat these steps for the remaining resource types using the following code. The code for the Load Balancer card is:
```
Subnets:
  ParameterId: Subnets
  Format: List<AWS::EC2::Subnet::Id>
```

The code for the Target Group card is:

```
HealthCheckPath: /
HealthCheckIntervalSeconds: 10
HealthCheckTimeoutSeconds: 5
HealthyThresholdCount: 2
Port: 80
Protocol: HTTP
UnhealthyThresholdCount: 5
VpcId:
  ParameterId: VpcId
  Format: AWS::EC2::VPC::Id
TargetGroupAttributes:
  - Key: stickiness.enabled
    Value: 'true'
  - Key: stickiness.type
    Value: lb_cookie
  - Key: stickiness.lb_cookie.duration_seconds
    Value: '30'
```
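These health check settings determine roughly how quickly the load balancer reacts when an instance starts or fails. As a back-of-the-envelope sketch (the estimate ignores check timeouts, jitter, and registration delays), the time to change a target's state is about the check interval multiplied by the relevant threshold count:

```python
def approx_detection_seconds(interval_s: int, threshold: int) -> int:
    """Rough time to flip a target's health state: `threshold` consecutive
    checks spaced `interval_s` seconds apart. An estimate only."""
    return interval_s * threshold


INTERVAL_S = 10           # HealthCheckIntervalSeconds
HEALTHY_THRESHOLD = 2     # HealthyThresholdCount
UNHEALTHY_THRESHOLD = 5   # UnhealthyThresholdCount

if __name__ == "__main__":
    # prints: marked healthy after ~20 s
    print(f"marked healthy after ~{approx_detection_seconds(INTERVAL_S, HEALTHY_THRESHOLD)} s")
    # prints: marked unhealthy after ~50 s
    print(f"marked unhealthy after ~{approx_detection_seconds(INTERVAL_S, UNHEALTHY_THRESHOLD)} s")
```

With these values, a new instance starts receiving traffic after about 20 seconds of passing checks, while a failing instance keeps receiving traffic for up to about 50 seconds. Lowering UnhealthyThresholdCount shortens that window at the cost of more sensitivity to transient failures.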

This is the code for the Launch Configuration. Replace <image-id> with the ID of an Amazon Machine Image (AMI) that is valid in your Region.
```
ImageId: <image-id>
InstanceType: t2.small
SecurityGroups:
  - !Ref WebServerSecurityGroup
```

The code for DBInstance is:

```
DBName:
  ParameterId: DBName
  Format: String
  Default: wordpressdb
Engine: MySQL
MultiAZ:
  ParameterId: MultiAZDatabase
  Format: String
  Default: 'false'
MasterUsername:
  ParameterId: DBUser
  Format: String
MasterUserPassword:
  ParameterId: DBPassword
  Format: String
DBInstanceClass:
  ParameterId: DBClass
  Format: String
  Default: db.t2.small
AllocatedStorage:
  ParameterId: DBAllocatedStorage
  Format: Number
  Default: '5'
VPCSecurityGroups:
  - !GetAtt DBEC2SecurityGroup.GroupId
```

Choose Arrange. Your canvas looks like this:
This completes the visualization portion of the application architecture. You can export this visualization by using the Export Canvas option in the menu.

Adding observability

After adding the core application components, you now add observability to your application. Observability enables you to collect and analyze important events and metrics for your applications.

To be notified of any changes to the RDS database configuration, use a serverless design pattern to avoid running instances when they are not needed. Conceptually, your observability stack looks like:

Amazon EventBridge captures the events emitted by Amazon RDS.
For any event matching the EventBridge rule, EventBridge invokes AWS Lambda.
Lambda runs the custom logic and publishes a message to an Amazon Simple Notification Service (SNS) topic. You can subscribe interested parties to this SNS topic, for example by email.
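To make the Lambda step concrete, here is a minimal sketch of the function's custom logic, assuming a Python runtime. It turns the RDS event into an SNS subject and message and publishes them. The `detail` field names (SourceIdentifier, EventID, Message, EventCategories) are assumptions about the RDS event payload, and the `TOPIC_ARN` environment variable name is hypothetical: use the variable that the Lambda-to-SNS connection adds to your function.

```python
from __future__ import annotations

import json
import os


def build_notification(event: dict) -> tuple[str, str]:
    """Turn an 'RDS DB Instance Event' into an SNS subject and message.

    The detail field names are assumptions; .get() keeps this tolerant
    of events that lack them.
    """
    detail = event.get("detail", {})
    source_id = detail.get("SourceIdentifier", "unknown-instance")
    event_id = detail.get("EventID", "unknown-event")
    # SNS limits subject length (under 100 characters), so trim defensively.
    subject = f"RDS event {event_id} on {source_id}"[:99]
    body = json.dumps({
        "time": event.get("time"),
        "region": event.get("region"),
        "message": detail.get("Message", ""),
        "categories": detail.get("EventCategories", []),
    }, indent=2)
    return subject, body


def handler(event, context):
    # boto3 ships with the Lambda Python runtime; it is imported lazily so
    # build_notification can be tested without it.
    import boto3

    subject, body = build_notification(event)
    # TOPIC_ARN is a hypothetical environment variable name.
    boto3.client("sns").publish(
        TopicArn=os.environ["TOPIC_ARN"], Subject=subject, Message=body
    )
    return {"published": subject}
```

Keeping the formatting logic separate from the boto3 call lets you unit test it without AWS credentials.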

There are now two distinct sets of components in the architecture. One set of components comprises the core application while another comprises the observability logic.

AWS Application Composer allows you to organize different components in groups. This allows you and your team to focus on one portion of the architecture at a time. Before adding observability components, first create a group of the existing components.

Select a component card.
While holding the Shift key, select the other cards. Once all resources are selected, choose the Group action.

Once the group is created, follow these steps to rename the group.

Select the Group card.
Rename the group to Application Stack.
Choose Save.

Now add the observability components. Search for each of the following components in the Resources pane and drag it onto the canvas, outside the Application Stack group:

1. EventBridge Event rule
2. Lambda Function
3. SNS Topic
4. SNS Subscription

Following the same process, group these four components and name the group Observability.

Some of the components have a small circle on their sides. These are connector ports. A port on the right side of a card indicates an opportunity for the card to invoke another card. A port on the left side indicates an opportunity for a card to be invoked by another card. You can connect two cards by clicking the right port of a card and dragging to the left port of another card.

Create the observability stack by completing the following steps:

Connect the right port of EventBridge Event Rule card to the left port of Lambda Function Card. This makes the Lambda function a target for the EventBridge rule.
Connect the right port of the Lambda function to the left port of the SNS topic. This adds the necessary AWS Identity and Access Management (IAM) permissions policies and an environment variable to the Lambda function so that it can interact with the SNS topic.
Select the EventBridge event rule card and replace the event pattern code in the resource properties pane with the following code. This event pattern monitors the RDS instance for an instance change event and pushes this event to Lambda.
```
source:
  - aws.rds
detail-type:
  - RDS DB Instance Event
```
Select the SNS subscription to see the resource configuration pane. Add the following code to the resource configuration. Replace <your-email-address> with your email address.
```
Endpoint: <your-email-address>
Protocol: email
TopicArn: !Ref Topic
```
Your Observability group, comprising an EventBridge event rule, Lambda function, SNS topic, and SNS subscription, now appears as follows:

Deploying your AWS Architecture

Before you can provision the resources for your architecture, you must make configuration changes according to your organization’s development and deployment best practices.

For example, you must provide a strong database password and name the resources according to your organization’s naming conventions. You must also add the Lambda function code with your custom logic.

AWS Application Composer lets you configure each resource through the resource configuration pane. This keeps you in context while creating a complex architecture: you can quickly find the resource you want to edit instead of scrolling through a large template file. If you prefer to edit the template file directly, you can use the Template view of AWS Application Composer.

Alternatively, if you have enabled local sync, you can edit the file directly in your integrated development environment (IDE), where changes made in AWS Application Composer are saved in real time. If you have not enabled local sync, you can export the template using the Save Template File option in the menu. After completing your changes, you can provision the AWS infrastructure using either the AWS CloudFormation console or the AWS Command Line Interface (AWS CLI).
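If you provision from the command line, one option is `aws cloudformation deploy`. The Python sketch below only assembles that command so you can review it before running it. The stack name, parameter values, and the choice of `CAPABILITY_IAM` (assuming the template creates IAM resources, such as the policies added when you connected Lambda to SNS) are assumptions to adapt. If your template references local Lambda code, package it first, for example with `aws cloudformation package` or the AWS SAM CLI.

```python
from __future__ import annotations

import shlex


def build_deploy_command(template: str, stack: str,
                         params: dict[str, str]) -> list[str]:
    """Assemble an `aws cloudformation deploy` command as an argument list."""
    cmd = [
        "aws", "cloudformation", "deploy",
        "--template-file", template,
        "--stack-name", stack,
        # Assumption: the template creates IAM resources (for example, the
        # policies added by the Lambda-to-SNS connection).
        "--capabilities", "CAPABILITY_IAM",
    ]
    if params:
        # List parameters such as Subnets take comma-separated values.
        cmd.append("--parameter-overrides")
        cmd.extend(f"{key}={value}" for key, value in params.items())
    return cmd


if __name__ == "__main__":
    # Hypothetical values. DBPassword and other required parameters are
    # omitted here and should come from a secure source.
    command = build_deploy_command(
        "template.yaml",
        "composer-app-stack",
        {"VpcId": "vpc-0123456789abcdef0", "Subnets": "subnet-aaaa,subnet-bbbb"},
    )
    print(shlex.join(command))
```

Building the command as a list avoids shell quoting problems if you later run it with `subprocess.run(command, check=True)`.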

Pricing

AWS Application Composer does not provision any AWS resources. Using AWS Application Composer to design your application architecture is free. You are only charged when you provision AWS Resources using the template file created by AWS Application Composer.

Conclusion

This blog post shows how to use AWS Application Composer to create and update an application architecture using any of the 1,134 CloudFormation resource types. It covers how to configure local sync mode to integrate AWS Application Composer with your development workflow, and demonstrates how to organize your architecture into two distinct groups. Changes made in the canvas view are reflected in the template view and vice versa.

To learn more about AWS Application Composer visit https://aws.amazon.com/application-composer/.

For more serverless learning resources, visit Serverless Land.

Architecting for scale with Amazon API Gateway private integrations

2023-09-26 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/architecting-for-scale-with-amazon-api-gateway-private-integrations/

This post is written by Lior Sadan, Sr. Solutions Architect, and Anandprasanna Gaitonde, Sr. Solutions Architect.

Organizations use Amazon API Gateway to build secure, robust APIs that expose internal services to other applications and external users. When the environment evolves to many microservices, customers must ensure that the API layer can handle the scale without compromising security and performance. API Gateway provides various API types and integration options, and builders must consider how each option impacts the ability to scale the API layer securely and performantly as the microservices environment grows.

This blog post compares architecture options for building scalable, private integrations with API Gateway for microservices. It covers REST and HTTP APIs and their use of private integrations, and shows how to develop secure, scalable microservices architectures.

Overview

Here is a typical API Gateway implementation with backend integrations to various microservices:

API Gateway handles the API layer, while integrating with backend microservices running on Amazon EC2, Amazon Elastic Container Service (ECS), or Amazon Elastic Kubernetes Service (EKS). This blog focuses on containerized microservices that expose internal endpoints that the API layer then exposes externally.

To keep microservices secure and protected from external traffic, they are typically implemented within an Amazon Virtual Private Cloud (VPC) in a private subnet, which is not accessible from the internet. API Gateway offers a way to expose these resources securely beyond the VPC through private integrations using VPC link. Private integration forwards external traffic sent to APIs to private resources, without exposing the services to the internet and without leaving the AWS network. For more information, read Best Practices for Designing Amazon API Gateway Private APIs and Private Integration.

The example scenario has four microservices that could be hosted in one or more VPCs. It shows the patterns integrating the microservices with front-end load balancers and API Gateway via VPC links.

While VPC links enable private connections to microservices, customers may have additional needs:

Increase scale: Support a larger number of microservices behind API Gateway.
Independent deployments: Dedicated load balancers per microservice enable teams to perform blue/green deployments independently without impacting other teams.
Reduce complexity: Ability to use existing microservice load balancers instead of introducing additional ones to achieve API Gateway integration.
Low latency: Ensure minimal latency in API request/response flow.

API Gateway offers HTTP APIs and REST APIs (see Choosing between REST APIs and HTTP APIs) to build RESTful APIs. For large microservices architectures, the API type influences integration considerations:

| API type | VPC link supported integrations | Quota on VPC links per account per Region |
|---|---|---|
| REST API | Network Load Balancer (NLB) | 20 |
| HTTP API | Network Load Balancer (NLB), Application Load Balancer (ALB), and AWS Cloud Map | 10 |

This post presents four private integration options taking into account the different capabilities and quotas of VPC link for REST and HTTP APIs:

Option 1: HTTP API using VPC link to multiple NLBs or ALBs.
Option 2: REST API using multiple VPC links.
Option 3: REST API using VPC link with NLB.
Option 4: REST API using VPC link with NLB and ALB targets.

Option 1: HTTP API using VPC link to multiple NLBs or ALBs

HTTP APIs allow connecting a single VPC link to multiple ALBs, NLBs, or resources registered with an AWS Cloud Map service. This provides a fan-out approach to connect with multiple backend microservices. However, load balancers integrated with a particular VPC link should reside in the same VPC.

Two microservices are in a single VPC, each with its own dedicated ALB. The ALB listeners direct HTTPS traffic to the respective backend microservice target group. A single VPC link is connected to two ALBs in that VPC. API Gateway uses path-based routing rules to forward requests to the appropriate load balancer and associated microservice. This approach is covered in Best Practices for Designing Amazon API Gateway Private APIs and Private Integration – HTTP API. Sample CloudFormation templates to deploy this solution are available on GitHub.
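To illustrate path-based routing in this option, the Python sketch below models a simplified HTTP API route table in which each route forwards to an ALB reached through the same VPC link. The route keys mimic HTTP API greedy path variables (`{proxy+}`); the ALB names are hypothetical, and the longest-prefix rule is a simplification rather than API Gateway's exact route-selection algorithm.

```python
from __future__ import annotations

GREEDY = "{proxy+}"

# Hypothetical routes; both ALBs sit in the VPC served by a single VPC link.
ROUTES = {
    "/orders/" + GREEDY: "alb-orders",
    "/payments/" + GREEDY: "alb-payments",
}


def resolve(path: str) -> str | None:
    """Return the ALB that would receive `path`, or None if no route matches.

    A route ending in {proxy+} matches any non-empty path below its prefix;
    other routes must match exactly. The longest matching prefix wins.
    """
    best_target, best_len = None, -1
    for route, target in ROUTES.items():
        if route.endswith(GREEDY):
            prefix = route[: -len(GREEDY)]
            matched = path.startswith(prefix) and len(path) > len(prefix)
        else:
            prefix = route
            matched = path == route
        if matched and len(prefix) > best_len:
            best_target, best_len = target, len(prefix)
    return best_target
```

Adding a microservice in this model means adding one route and one ALB behind the existing VPC link, which is the fan-out property described above.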

You can add additional ALBs and microservices within VPC IP space limits. Use Network Address Usage (NAU) to plan the distribution of microservices across VPCs. Scale beyond one VPC by adding VPC links to connect more VPCs, within VPC link quotas. You can scale further by using routing rules like path-based routing at the ALB to connect more services behind a single ALB (see Quotas for your Application Load Balancers). This architecture can also be built using an NLB.

Benefits:

High degree of scalability, by fanning out to multiple microservices using a single VPC link and/or the multiplexing capabilities of ALB/NLB.
Direct integration with existing microservice load balancers eliminates the need to introduce new components and reduces operational burden.
Lower latency for API request/response thanks to direct integration.
Dedicated load balancers per microservice enable independent deployments for microservices teams.

Option 2: REST API using multiple VPC links

For REST APIs, the architecture to support multiple microservices may differ due to these considerations:

NLB is the only supported private integration for REST APIs.
VPC links for REST APIs can have only one target NLB.

A VPC link is required for each NLB, even if the NLBs are in the same VPC. Each NLB serves one microservice, with a listener to route API Gateway traffic to the target group. API Gateway path-based routing sends requests to the appropriate NLB and corresponding microservice. The setup required for this private integration is similar to the example described in Tutorial: Build a REST API with API Gateway private integration.

To scale further, add a VPC link and NLB integration for each additional microservice, in the same or a different VPC based on your needs and isolation requirements. This approach is limited by the VPC links quota per account per Region.

Benefits:

Single NLB in the request path reduces operational complexity.
Dedicated NLBs for each microservice enable independent deployments.
No additional hops in the API request path, which results in lower latency.

Considerations:

Scalability is limited by the one-to-one mapping of VPC links to NLBs and microservices, which is bounded by the VPC links quota per account per Region.

Option 3: REST API using VPC link with NLB

The one-to-one mapping of VPC links to NLBs and microservices in option 2 has scalability limits due to VPC link quotas. An alternative is to use multiple microservices per NLB.

A single NLB fronts multiple microservices in a VPC by using multiple listeners, with a separate listener port for each microservice. Here, NLB1 fronts two microservices in one VPC, and NLB2 fronts two other microservices in a second VPC. With multiple microservices per NLB, you define the routing for the REST API when you choose the integration point for a method: you identify each service by selecting the VPC link, which is integrated with a specific NLB, and the port assigned to that microservice on the NLB listener, which you address in the endpoint URL.
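The VPC link plus port combination can be sketched as data. The Python below builds the integration settings for each microservice in the shape of the REST API `PutIntegration` parameters and checks that no two microservices claim the same NLB listener port. The service table, VPC link IDs, and NLB DNS names are hypothetical, and the `{proxy}` path mapping assumes each method sits on a `{proxy+}` resource.

```python
from __future__ import annotations

# Hypothetical mapping: microservice -> (VPC link ID, NLB DNS name, listener port).
SERVICES = {
    "mS1": ("vpclink-1", "nlb1.example.internal", 8081),
    "mS2": ("vpclink-1", "nlb1.example.internal", 8082),
    "mS3": ("vpclink-2", "nlb2.example.internal", 8081),
    "mS4": ("vpclink-2", "nlb2.example.internal", 8082),
}


def integration_for(service: str) -> dict:
    """Settings for a REST API HTTP_PROXY private integration."""
    vpc_link_id, nlb_dns, port = SERVICES[service]
    return {
        "type": "HTTP_PROXY",
        "integrationHttpMethod": "ANY",
        "connectionType": "VPC_LINK",
        "connectionId": vpc_link_id,                   # selects the NLB
        "uri": f"http://{nlb_dns}:{port}/{{proxy}}",   # port selects the listener
        # Forward the path captured by the method's {proxy+} resource.
        "requestParameters": {
            "integration.request.path.proxy": "method.request.path.proxy"
        },
    }


def listener_conflicts(services: dict) -> list[tuple[str, int]]:
    """Return (NLB, port) pairs claimed by more than one microservice."""
    seen: set[tuple[str, int]] = set()
    clashes: list[tuple[str, int]] = []
    for _vpc_link, nlb_dns, port in services.values():
        key = (nlb_dns, port)
        if key in seen:
            clashes.append(key)
        seen.add(key)
    return clashes
```

With boto3, you could pass these settings as keyword arguments, for example `client.put_integration(restApiId=..., resourceId=..., httpMethod="ANY", **integration_for("mS1"))`.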

To scale out further, add additional listeners to existing NLBs, limited by Quotas for your Network Load Balancers. In cases where each microservice has its dedicated load balancer or access point, those are configured as targets to the NLB. Alternatively, integrate additional microservices by adding additional VPC links.

Benefits:

Larger scalability – limited by NLB listener quotas and VPC link quotas.
Managing fewer NLBs supporting multiple microservices reduces operational complexity.
Low latency with a single NLB in the request path.

Considerations:

Shared NLB configuration limits independent deployments for individual microservices teams.

Option 4: REST API using VPC link with NLB and ALB targets

Customers often build microservices with ALB as their access point. To expose these via API Gateway REST APIs, you can take advantage of ALB as a target for NLB. This pattern also increases the number of microservices supported compared to the option 3 architecture.

A VPC link (VPCLink1) is created with NLB1 in a VPC1. ALB1 and ALB2 front-end the microservices mS1 and mS2, added as NLB targets on separate listeners. VPC2 has a similar configuration. Your isolation needs and IP space determine if microservices can reside in one or multiple VPCs.

To scale out further:

Create additional VPC links to integrate new NLBs.
Add NLB listeners to support more ALB targets.
Configure ALB with path-based rules to route requests to multiple microservices.

Benefits:

High scalability integrating services using NLBs and ALBs.
Independent deployments per team is possible when each ALB is dedicated to a single microservice.

Considerations:

Multiple load balancers in the request path can increase latency.
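To compare how far each REST API option can grow, the quota arithmetic can be sketched in Python. This is a rough upper bound under stated assumptions: the VPC link quota of 20 comes from the table above, the NLB listener quota of 50 is an assumed default that you should verify in Service Quotas, and `services_per_alb` is a design input. It ignores IP space, target quotas, and every other limit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quotas:
    vpc_links: int = 20          # REST API VPC links per account per Region (table above)
    listeners_per_nlb: int = 50  # assumed default NLB listener quota; verify in Service Quotas
    services_per_alb: int = 1    # microservices routed by ALB path rules (design choice)


def max_microservices(option: int, q: Quotas = Quotas()) -> int:
    """Upper bound on microservices for REST API options 2 to 4."""
    if option == 2:  # one VPC link -> one NLB -> one microservice
        return q.vpc_links
    if option == 3:  # one VPC link -> one NLB -> one listener port per microservice
        return q.vpc_links * q.listeners_per_nlb
    if option == 4:  # ... -> one ALB per listener -> path rules per ALB
        return q.vpc_links * q.listeners_per_nlb * q.services_per_alb
    raise ValueError("only REST API options 2 to 4 are modeled")
```

With these defaults, option 2 tops out at 20 microservices and option 3 at 1,000 (20 × 50); option 4 multiplies that by however many microservices each ALB routes.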

Considerations and best practices

Beyond the VPC link scaling considerations discussed in this blog, keep the following in mind:

Evaluate REST APIs and HTTP APIs capabilities to meet your requirements.
Choose the optimal load balancer type for your application needs.
For multi-account architectures, refer to Building private cross-account APIs using Amazon API Gateway and AWS PrivateLink.
Plan for default quotas, and request a quota increase if you need higher limits.
Monitor Service Quotas to plan proactively and mitigate risks as your architecture evolves. Consider the use of the Quota Monitor solution for monitoring.
See Best Practices for Designing Amazon API Gateway Private APIs and Private Integration – REST API.

Conclusion

This blog explores building scalable API Gateway integrations for microservices using VPC links. VPC links enable forwarding external traffic to backend microservices without exposing them to the internet or leaving the AWS network. The post covers scaling considerations based on using REST APIs versus HTTP APIs and how they integrate with NLBs or ALBs across VPCs.

While API type and load balancer selection have other design factors, it’s important to keep the scaling considerations discussed in this blog in mind when designing your API layer architecture. By optimizing API Gateway implementation for performance, latency, and operational needs, you can build a robust, secure API to expose microservices at scale.

For more serverless learning resources, visit Serverless Land.

Centralizing management of AWS Lambda layers across multiple AWS Accounts

2023-09-19 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/centralizing-management-of-aws-lambda-layers-across-multiple-aws-accounts/

This post is written by Debasis Rath, Sr. Specialist SA-Serverless, Kanwar Bajwa, Enterprise Support Lead, and Xiaoxue Xu, Solutions Architect (FSI).

Enterprise customers often manage an inventory of AWS Lambda layers, which provide shared code and libraries to Lambda functions. These Lambda layers are then shared across AWS accounts and AWS Organizations to promote code uniformity, reusability, and efficiency. However, as enterprises scale on AWS, managing shared Lambda layers across an increasing number of functions and accounts is best handled with automation.

This blog post shows how to centralize the management of Lambda layers to ensure compliance with your enterprise’s governance standards and promote consistency across your infrastructure. This centralized management uses a detective approach to systematically identify non-compliant Lambda functions that use outdated Lambda layer versions, and corrective measures that remediate these functions by updating them to the right layer version.

This solution uses AWS services such as AWS Config, Amazon EventBridge Scheduler, AWS Systems Manager (SSM) Automation, and AWS CloudFormation StackSets.

Solution overview

This solution offers two parts for layers management:

On-demand visibility into outdated Lambda functions.
Automated remediation of the affected Lambda functions.

This is the architecture for the first part. Users with the necessary permissions can use AWS Config advanced queries to obtain a list of outdated Lambda functions.

The current configuration state of any Lambda function is captured by the configuration recorder within the member account. This data is then aggregated by the AWS Config Aggregator within the management account. The aggregated data can be accessed using queries.

This diagram depicts the architecture for the second part. Administrators must manually deploy CloudFormation StackSets to initiate the automatic remediation of outdated Lambda functions.

The manual remediation trigger is used instead of a fully automated solution. Administrators schedule this manual trigger as part of a change request to minimize disruptions to the business. All business stakeholders owning affected Lambda functions should receive this change request notification and have adequate time to perform unit tests to assess the impact.

Upon receiving confirmation from the business stakeholders, the administrator deploys the CloudFormation StackSets, which in turn deploy the CloudFormation stack to the designated member account and Region. After the CloudFormation stack is deployed, EventBridge Scheduler invokes an evaluation of an AWS Config custom rule. This rule identifies the non-compliant Lambda functions, which SSM Automation runbooks then update.
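The evaluation and remediation logic can be sketched as follows. This is a minimal illustration, not the code in the GitHub repository: it assumes layer version ARNs of the form `arn:aws:lambda:<region>:<account>:layer:<name>:<version>`, and the function names are hypothetical. It shows how a custom rule might classify a function and how the remediation might compute the function's new layer list.

```python
from __future__ import annotations


def split_layer_arn(arn: str) -> tuple[str, int]:
    """Split a layer version ARN into (layer ARN without version, version)."""
    base, version = arn.rsplit(":", 1)
    return base, int(version)


def evaluate(function_layers: list[str], latest_layer_arn: str) -> str:
    """Return an AWS Config-style compliance verdict for one function."""
    latest_base, latest_version = split_layer_arn(latest_layer_arn)
    versions = [v for b, v in map(split_layer_arn, function_layers) if b == latest_base]
    if not versions:
        return "NOT_APPLICABLE"  # the function does not use the managed layer
    return "NON_COMPLIANT" if min(versions) < latest_version else "COMPLIANT"


def remediated_layers(function_layers: list[str], latest_layer_arn: str) -> list[str]:
    """Swap any version of the managed layer for the latest one, keeping
    every other layer unchanged and in its original order."""
    latest_base, _ = split_layer_arn(latest_layer_arn)
    return [
        latest_layer_arn if split_layer_arn(arn)[0] == latest_base else arn
        for arn in function_layers
    ]
```

The remediation must pass the full list because the Layers parameter of the Lambda UpdateFunctionConfiguration API replaces the function's existing layers rather than appending to them.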

The following walkthrough deploys the two-part architecture described, using a centralized approach to layer management as in the preceding diagram. A decentralized approach scatters management and updates of Lambda layers across accounts, making enforcement more difficult and error-prone.

This solution is also available on GitHub.

Prerequisites

For the solution walkthrough, you should have the following prerequisites:

CloudFormation StackSets enabled for your AWS Organizations. Refer to the documentation to enable AWS CloudFormation StackSets at the organizational level.
AWS Config enabled for your AWS Organizations. Refer to the provided documentation to enable AWS Config at the organizational level.
An AWS Config Aggregator set up to collect recorded configuration data from all accounts across all AWS Regions within your AWS Organizations. Refer to the provided documentation to create an aggregator.
The necessary permissions to deploy CloudFormation StackSets and to query AWS Config.
AWS CloudShell to run scripts with the AWS Command Line Interface (CLI).

Writing an on-demand query for outdated Lambda functions

First, you write and run an AWS Config advanced query to identify the accounts and Regions where the outdated Lambda functions reside. This is helpful for end users to determine the scope of impact, and identify the responsible groups to inform based on the affected Lambda resources.

Follow these procedures to understand the scope of impact using the AWS CLI:

Open CloudShell in your AWS account.

Run the following AWS CLI command. Replace YOUR_AGGREGATOR_NAME with the name of your AWS Config aggregator, and YOUR_LAYER_ARN with the outdated Lambda layer Amazon Resource Name (ARN).

aws configservice select-aggregate-resource-config \
--expression "SELECT accountId, awsRegion, configuration.functionName, configuration.version WHERE resourceType = 'AWS::Lambda::Function' AND configuration.layers.arn = 'YOUR_LAYER_ARN'" \
--configuration-aggregator-name 'YOUR_AGGREGATOR_NAME' \
--query "Results" \
--output json | \
jq -r '.[] | fromjson | [.accountId, .awsRegion, .configuration.functionName, .configuration.version] | @csv' > output.csv

The results are saved to a CSV file named output.csv in the current working directory. This file contains the account IDs, Regions, names, and versions of the Lambda functions that are currently using the specified Lambda layer ARN. Refer to the documentation on how to download a file from AWS CloudShell.
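If you would rather post-process the query results in code than in jq, the following TypeScript sketch reproduces the jq filter's CSV output. It assumes you already have the `Results` array from the CLI output and that all four selected fields are strings. `LayerUsageRow`, `csvField`, and `resultsToCsv` are hypothetical names for illustration.

```typescript
// Hypothetical shape of one row returned by the advanced query above.
// Assumption: all four selected fields are strings.
interface LayerUsageRow {
  accountId: string;
  awsRegion: string;
  configuration: { functionName: string; version: string };
}

// Quote a field the way jq's @csv filter quotes strings: wrap it in
// double quotes and double any embedded double quotes.
function csvField(value: string): string {
  return '"' + value.replace(/"/g, '""') + '"';
}

// Each element of "Results" is itself a JSON-encoded string, which is why
// the jq pipeline applies `fromjson` before selecting fields.
function resultsToCsv(results: string[]): string {
  return results
    .map((raw) => JSON.parse(raw) as LayerUsageRow)
    .map((row) =>
      [
        row.accountId,
        row.awsRegion,
        row.configuration.functionName,
        row.configuration.version
      ]
        .map(csvField)
        .join(",")
    )
    .join("\n");
}
```

Grouping the parsed rows by `accountId` before writing them out is a small extension that makes it easier to see which team to notify for each account.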

To explore more configuration data and further improve visualization using services like Amazon Athena and Amazon QuickSight, refer to Visualizing AWS Config data using Amazon Athena and Amazon QuickSight.

Deploying automatic remediation to update outdated Lambda functions

Next, you deploy the automatic remediation CloudFormation StackSets to the affected accounts and Regions where the outdated Lambda functions reside. You can use the query outlined in the previous section to obtain the account IDs and Regions.

Updating Lambda layers may affect the functionality of existing Lambda functions. Before remediation, it is essential to notify the affected development groups and coordinate unit tests to prevent unintended disruptions.

To create and deploy CloudFormation StackSets from your management account for automatic remediation:

Run the following command in CloudShell to clone the GitHub repository:
```
git clone https://github.com/aws-samples/lambda-layer-management.git
```

Run the following CLI command to upload your template and create the stack set container.

aws cloudformation create-stack-set \
  --stack-set-name layers-remediation-stackset \
  --template-body file://lambda-layer-management/layer_manager.yaml

Run the following CLI command to add stack instances in the desired accounts and Regions to your CloudFormation StackSets. Replace the account IDs, Regions, and parameters before you run this command. You can refer to the syntax in the AWS CLI Command Reference. “NewLayerArn” is the ARN for your updated Lambda layer, while “OldLayerArn” is the original Lambda layer ARN.
```
aws cloudformation create-stack-instances \
--stack-set-name layers-remediation-stackset \
--accounts <LIST_OF_ACCOUNTS> \
--regions <YOUR_REGIONS> \
--parameter-overrides ParameterKey=NewLayerArn,ParameterValue='<NEW_LAYER_ARN>' ParameterKey=OldLayerArn,ParameterValue='<OLD_LAYER_ARN>'
```
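A typo in either parameter value, such as a stray leading `=`, may mean the AWS Config rule compares functions against an ARN that no function actually uses. It can therefore help to sanity-check both ARNs before running the command. The following TypeScript sketch assumes the standard layer version ARN layout (`arn:partition:lambda:region:account-id:layer:layer-name:version`). `parseLayerVersionArn` and `checkLayerArnPair` are hypothetical helpers and are not part of the sample repository.

```typescript
// Components of a Lambda layer version ARN, for example
// arn:aws:lambda:us-east-1:123456789012:layer:shared-utils:7
interface LayerVersionArn {
  partition: string;
  region: string;
  accountId: string;
  layerName: string;
  version: number;
}

// Assumed pattern: partition, Region, 12-digit account ID, layer name
// (letters, digits, hyphens, underscores), and a numeric version.
const LAYER_VERSION_ARN =
  /^arn:(aws[a-z-]*):lambda:([a-z0-9-]+):(\d{12}):layer:([A-Za-z0-9_-]+):(\d+)$/;

function parseLayerVersionArn(arn: string): LayerVersionArn | null {
  const match = LAYER_VERSION_ARN.exec(arn);
  if (!match) return null;
  return {
    partition: match[1],
    region: match[2],
    accountId: match[3],
    layerName: match[4],
    version: parseInt(match[5], 10)
  };
}

// Returns a list of problems with the NewLayerArn/OldLayerArn pair;
// an empty list means the pair looks usable.
function checkLayerArnPair(newArn: string, oldArn: string): string[] {
  const problems: string[] = [];
  if (!parseLayerVersionArn(newArn)) {
    problems.push("NewLayerArn is not a layer version ARN: " + newArn);
  }
  if (!parseLayerVersionArn(oldArn)) {
    problems.push("OldLayerArn is not a layer version ARN: " + oldArn);
  }
  if (newArn === oldArn) {
    problems.push("NewLayerArn and OldLayerArn are identical");
  }
  return problems;
}
```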
Run the following CLI command to verify that the stack instances are created successfully. The operation ID is returned as part of the output of the create-stack-instances command in the previous step.
```
aws cloudformation describe-stack-set-operation \
  --stack-set-name layers-remediation-stackset \
  --operation-id <OPERATION_ID>
```

This CloudFormation StackSet deploys an EventBridge Scheduler schedule that immediately triggers the AWS Config custom rule for evaluation. This rule, written in AWS CloudFormation Guard, detects all the Lambda functions in the member accounts that currently use the outdated Lambda layer version. Through the auto remediation feature of AWS Config, the SSM Automation document then runs against each non-compliant Lambda function to update it with the new layer version.
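At its core, each remediation is a list substitution. UpdateFunctionConfiguration replaces a function's entire Layers list, so the new list must keep every other layer the function uses. It must also keep them in their original order, because layer order determines which files win when layers contain the same paths. The TypeScript sketch below illustrates only that logic. `remediateLayers` is a hypothetical name, and the SSM Automation document in the sample may implement it differently.

```typescript
// Hypothetical remediation step: given a function's current layer ARNs,
// return the list to pass as Layers to UpdateFunctionConfiguration, or
// null when the function does not use the outdated layer (compliant).
function remediateLayers(
  currentLayers: string[],
  oldLayerArn: string,
  newLayerArn: string
): string[] | null {
  if (currentLayers.indexOf(oldLayerArn) === -1) {
    return null;
  }
  // Layers is replaced as a whole, so keep every other layer and keep the
  // outdated layer's replacement in the same position.
  return currentLayers.map((arn) => (arn === oldLayerArn ? newLayerArn : arn));
}
```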

Other considerations

The provided remediation CloudFormation StackSet uses the UpdateFunctionConfiguration API to modify your Lambda functions’ configurations directly. This method of updating may lead to drift from your original infrastructure as code (IaC) service, such as the CloudFormation stack that you used to provision the outdated Lambda functions. In this case, you might need to add an additional step to resolve drift from your original IaC service.

Alternatively, you might want to update your IaC code directly, referencing the latest version of the Lambda layer, instead of deploying the remediation CloudFormation StackSet as described in the previous section.

Cleaning up

Refer to the documentation for instructions on deleting all the created stack instances from your accounts. Then delete the CloudFormation StackSet.

Conclusion

Managing Lambda layers across multiple accounts and Regions can be challenging at scale. By using a combination of AWS Config, EventBridge Scheduler, AWS Systems Manager (SSM) Automation, and CloudFormation StackSets, it is possible to streamline the process.

The example provides on-demand visibility into affected Lambda functions and allows scheduled remediation of impacted functions. AWS SSM Automation further simplifies maintenance, deployment, and remediation tasks. With this architecture, you can efficiently manage updates to your Lambda layers and ensure compliance with your organization’s policies, saving time and reducing errors in your serverless applications.

To learn more about using Lambda layers, visit the AWS documentation. For more serverless learning resources, visit Serverless Land.

Building a secure webhook forwarder using an AWS Lambda extension and Tailscale

2023-09-14 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/building-a-secure-webhook-forwarder-using-an-aws-lambda-extension-and-tailscale/

This post is written by Duncan Parsons, Enterprise Architect, and Simon Kok, Sr. Consultant.

Webhooks can help developers integrate with third-party systems or devices when building event-driven architectures.

However, there are times when control over the target’s network environment is restricted, or targets change IP addresses. Additionally, some endpoints lack sufficient security hardening and require a reverse proxy and additional security checks on inbound traffic from the internet.

It can be complex to set up and maintain highly available, secure reverse proxies that inspect and forward events to these backend systems for multiple endpoints. This blog post shows how to use AWS Lambda extensions to build a cloud-native serverless webhook forwarder that meets this need with minimal maintenance and running costs.

The custom Lambda extension forms a secure WireGuard VPN connection to a target in a private subnet behind a stateful firewall and NAT Gateway. This example sets up a public HTTPS endpoint to receive events, selectively filters them, and proxies requests over the WireGuard connection. It uses a serverless architecture to minimize maintenance overhead and running costs.

Example overview

The sample code to deploy the following architecture is available on GitHub. This example uses AWS CodePipeline and AWS CodeBuild to build the code artifacts and deploys this using AWS CloudFormation via the AWS Cloud Development Kit (CDK). It uses Amazon API Gateway to manage the HTTPS endpoint and the Lambda service to perform the application functions. AWS Secrets Manager stores the credentials for Tailscale.

To orchestrate the WireGuard connections, you can use a free account on the Tailscale service. Alternatively, set up your own coordination layer using the open source Headscale example.

The event producer sends an HTTP request to the API Gateway URL.
API Gateway sends the request to the Lambda authorizer function, which returns an authorization decision based on the source IP of the request.
API Gateway proxies the request to the Secure Webhook Forwarder Lambda function running the Tailscale extension.
On initial invocation, the Lambda extension retrieves the Tailscale Auth key from Secrets Manager and uses that to establish a connection to the appropriate Tailscale network. The extension then exposes the connection as a local SOCKS5 port to the Lambda function.
The Lambda extension maintains a connection to the Tailscale network via the Tailscale coordination server. Through this coordination server, all other devices on the network can be made aware of the running Lambda function and vice versa. The Lambda function is configured to refuse incoming WireGuard connections; read more about the --shields-up flag in the Tailscale documentation.
Once the connection to the Tailscale network is established, the Secure Webhook Forwarder Lambda function proxies the request over the internet to the target using a WireGuard connection. The connection is established via the Tailscale Coordination server, traversing the NAT Gateway to reach the Amazon EC2 instance inside a private subnet. The EC2 instance responds with an HTML response from a local Python webserver.
On deployment and every 60 days thereafter, Secrets Manager automatically rotates the Tailscale Auth Key. It uses the Credential Rotation Lambda function, which retrieves the OAuth credentials from Secrets Manager, uses them to create a new Tailscale Auth Key through the Tailscale API, and stores the new key in Secrets Manager.

To logically separate the network connection layer from the application code layer, a Lambda extension encapsulates the code required to form the Tailscale VPN connection and makes it available to the Lambda function application code via a local SOCKS5 port. You can reuse this connectivity across multiple Lambda functions for numerous use cases by attaching the extension.

To deploy the example, follow the instructions in the repository’s README. Deployment may take 20–30 minutes.

How the Lambda extension works

The Lambda extension creates the network tunnel and exposes it to the Lambda function as a SOCKS5 server running on port 1055. There are three stages of the Lambda lifecycle: init, invoke, and shutdown.

With the Tailscale Lambda extension, the majority of the work is performed in the init phase. The webhook forwarder Lambda function has the following lifecycle:

Init phase:
1. Extension Init – Extension connects to Tailscale network and exposes WireGuard tunnel via local SOCKS5 port.
2. Runtime Init – Bootstraps the Node.js runtime.
3. Function Init – Imports required Node.js modules.
Invoke phase:
1. The extension intentionally doesn’t register to receive any invoke events. The Tailscale network is kept online until the function is instructed to shut down.
2. The Node.js handler function receives the request from API Gateway in payload format version 2.0 and proxies it to the SOCKS5 port, which sends the request over the WireGuard connection to the target. The invoke phase ends once the function receives a response from the target EC2 instance and, optionally, returns it to API Gateway for onward forwarding to the original event source.
Shutdown phase:
1. The extension logs out of the Tailscale network and logs the receipt of the shutdown event.
2. The runtime is shut down along with the Lambda function’s execution environment.
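During the invoke phase, the handler reaches the target through the extension's local SOCKS5 listener on port 1055. The sample presumably relies on a SOCKS client library for this. The TypeScript sketch below only shows, per RFC 1928, the bytes such a client sends to that port before any HTTP traffic flows: a greeting that offers "no authentication", then a CONNECT request that names the target by hostname. The function names and the ASCII-only hostname restriction are assumptions for illustration.

```typescript
// Byte values from the SOCKS5 specification (RFC 1928).
const SOCKS_VERSION = 0x05;
const METHOD_NO_AUTH = 0x00;
const CMD_CONNECT = 0x01;
const RESERVED = 0x00;
const ATYP_DOMAINNAME = 0x03;

// Client greeting: version, number of methods offered, then the methods.
function socksGreeting(): number[] {
  return [SOCKS_VERSION, 0x01, METHOD_NO_AUTH];
}

// CONNECT request addressed by hostname:
// VER | CMD | RSV | ATYP | LEN | HOST bytes | PORT (big-endian, 2 bytes)
function socksConnectRequest(host: string, port: number): number[] {
  if (host.length === 0 || host.length > 255) {
    throw new Error("hostname must be between 1 and 255 bytes");
  }
  if (Math.floor(port) !== port || port < 1 || port > 65535) {
    throw new Error("port must be an integer between 1 and 65535");
  }
  const hostBytes: number[] = [];
  for (let i = 0; i < host.length; i++) {
    const code = host.charCodeAt(i);
    if (code > 0x7f) {
      throw new Error("this sketch only supports ASCII hostnames");
    }
    hostBytes.push(code);
  }
  return [SOCKS_VERSION, CMD_CONNECT, RESERVED, ATYP_DOMAINNAME, hostBytes.length]
    .concat(hostBytes)
    .concat([(port >> 8) & 0xff, port & 0xff]);
}
```

A server that accepts the "no authentication" method answers the greeting with `[0x05, 0x00]`, and it answers the CONNECT request with a reply whose second byte is `0x00` on success. After that, the client writes the HTTP request over the same socket.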

Extension file structure

The extension code exists as a zip file along with some metadata set at the time the extension is published as an AWS Lambda layer. The zip file holds three folders:

/extensions – contains the extension code and is the directory in which the Lambda service looks for code to run when the Lambda extension is initialized.
/bin – includes the executable dependencies. For example, the tsextension.sh script runs the tailscale, tailscaled, curl, jq, and OpenSSL binaries.
/ssl – stores the certificate authority (CA) trust store, which contains the trusted root CA certificates. OpenSSL uses these to verify SSL and TLS certificates.

The tsextension.sh file is the core of the extension. Most of the code is run in the Lambda function’s init phase. The extension code is split into three stages. The first two stages relate to the Lambda function init lifecycle phase, with the third stage covering invoke and shutdown lifecycle phases.

Extension phase 1: Initialization

In this phase, the extension initializes the Tailscale connection and waits for the connection to become available.

The first step retrieves the Tailscale auth key from Secrets Manager. To keep the extension small, it makes the SigV4 requests to Secrets Manager with a series of Bash commands instead of packaging the AWS CLI.

The Lambda execution environment makes the function’s temporary credentials available as environment variables, which the extension uses to authenticate the SigV4 request. The CDK code adds the IAM permissions to retrieve the secret to the Lambda execution role. To optimize security, the secret’s policy restricts read permissions to (1) this Lambda function and (2) the Lambda function that rotates it every 60 days.

The Tailscale agent then starts and authenticates with the Tailscale Auth key. Both the tailscaled and tailscale binaries start in userspace networking mode, as each Lambda function runs in its own container on its own virtual machine. More information about userspace networking mode can be found in the Tailscale documentation.

With the Tailscale processes running, the extension must wait for the connection to the Tailnet (the name of a Tailscale network) to be established and for the SOCKS5 port to accept connections. To accomplish this, the extension polls the ‘tailscale status’ command until its output no longer contains ‘stopped’, and then moves on to phase 2.

Extension phase 2: Registration

The extension now registers itself as initialized with the Lambda service. This is performed by sending a POST request to the Lambda service extension API with the events that should be forwarded to the extension.

The runtime init starts next (this initializes the Node.js runtime of the Lambda function itself), followed by the function init (the code outside the event handler). The Tailscale Lambda extension registers only for ‘SHUTDOWN’ events: once the SOCKS5 service is up and available, the extension has no action to take on subsequent invocations of the function.

Extension phase 3: Event processing

To signal that it is ready to receive an event, the extension sends a GET request to the ‘next’ endpoint of the Lambda Extensions API. This blocks the extension script until a SHUTDOWN event is sent (as that is the only event registered for this Lambda extension).

When the SHUTDOWN event arrives, the extension logs out of the Tailscale service and the Lambda function shuts down. If INVOKE events were also registered, the extension would process each one, then signal that it is ready for another event by sending another GET request to the ‘next’ endpoint.
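The tsextension.sh script makes these calls with curl. To make the request shapes of phases 2 and 3 explicit, the following TypeScript sketch describes them as plain data without performing any network I/O. `runtimeApi` stands for the value of the `AWS_LAMBDA_RUNTIME_API` environment variable. Lambda expects the extension's file name in the `Lambda-Extension-Name` header and returns an ID in the `Lambda-Extension-Identifier` response header. `RequestSpec`, `registerRequest`, `nextEventRequest`, and `actionFor` are hypothetical names.

```typescript
// Description of an HTTP call, kept separate from any HTTP client.
interface RequestSpec {
  method: "GET" | "POST";
  url: string;
  headers: { [name: string]: string };
  body?: string;
}

const EXTENSION_API = "2020-01-01/extension";

// Phase 2: register for SHUTDOWN events only, as the Tailscale extension does.
function registerRequest(runtimeApi: string, extensionName: string): RequestSpec {
  return {
    method: "POST",
    url: "http://" + runtimeApi + "/" + EXTENSION_API + "/register",
    headers: { "Lambda-Extension-Name": extensionName },
    body: JSON.stringify({ events: ["SHUTDOWN"] })
  };
}

// Phase 3: long-poll for the next event; the call blocks until one arrives.
function nextEventRequest(runtimeApi: string, extensionId: string): RequestSpec {
  return {
    method: "GET",
    url: "http://" + runtimeApi + "/" + EXTENSION_API + "/event/next",
    headers: { "Lambda-Extension-Identifier": extensionId }
  };
}

// Decide what to do with an event body returned by the 'next' endpoint.
function actionFor(eventBody: string): "logout" | "wait-again" {
  const event = JSON.parse(eventBody) as { eventType: string };
  return event.eventType === "SHUTDOWN" ? "logout" : "wait-again";
}
```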

Access control

A sample Lambda authorizer is included in this example. We recommend using AWS WAF (AWS Web Application Firewall) to add protection to your public API endpoint, and hardening the sample code for production use.

For the purposes of this demo, the implementation applies a basic source IP CIDR range restriction, though you can base authorization decisions on any property of the request. Read more about Lambda authorizers for HTTP APIs in the Amazon API Gateway documentation. To use the source IP restriction, set the AUTHD_SOURCE_CIDR environment variable of the Lambda authorizer function to the CIDR range of the IPs you want to accept.
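A source IP check of this kind fits in a few lines. The following TypeScript sketch assumes an HTTP API authorizer using payload format 2.0 with simple responses, where the caller's address is in `requestContext.http.sourceIp` and the authorizer returns `{ isAuthorized }`. It supports a single IPv4 CIDR only. The allowed range is passed as a parameter to keep the sketch self-contained, although a real handler would read it from `AUTHD_SOURCE_CIDR`. All function names are hypothetical, and this is not the sample's code.

```typescript
// Minimal shape of an HTTP API (payload format 2.0) authorizer event.
interface AuthorizerEvent {
  requestContext: { http: { sourceIp: string } };
}

// Parse dotted-quad IPv4 into a number in [0, 2^32); null if malformed.
function ipv4ToNumber(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (let i = 0; i < parts.length; i++) {
    if (!/^\d{1,3}$/.test(parts[i])) return null;
    const octet = parseInt(parts[i], 10);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

// True when ip lies inside an IPv4 CIDR block such as "203.0.113.0/24".
// Arithmetic instead of bitwise operators avoids 32-bit signed overflow.
function ipInCidr(ip: string, cidr: string): boolean {
  const pieces = cidr.split("/");
  if (pieces.length !== 2 || !/^\d{1,2}$/.test(pieces[1])) return false;
  const prefix = parseInt(pieces[1], 10);
  const ipValue = ipv4ToNumber(ip);
  const baseValue = ipv4ToNumber(pieces[0]);
  if (prefix > 32 || ipValue === null || baseValue === null) return false;
  const blockSize = Math.pow(2, 32 - prefix);
  return Math.floor(ipValue / blockSize) === Math.floor(baseValue / blockSize);
}

// Simple-response authorizer: allow only callers inside allowedCidr.
function authorize(event: AuthorizerEvent, allowedCidr: string): { isAuthorized: boolean } {
  return { isAuthorized: ipInCidr(event.requestContext.http.sourceIp, allowedCidr) };
}
```

Malformed addresses or ranges fail closed, which is usually the safer default for an authorizer.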

Costs

You are charged for all the resources used by this project. To minimize costs, the pipeline destroys the NAT Gateway and EC2 instance once the final pipeline step is manually released. The AWS Lambda Power Tuning tool can help find the balance between performance and cost by invoking the function while it polls the demo EC2 instance through the Tailscale network.

The following result shows that 256 MB of memory is the optimum for the lowest cost of execution. The cost is estimated at under $3 for 1 million requests per month, once the demo stack is destroyed.
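To see how memory size, duration, and request volume combine into a monthly figure, the following TypeScript sketch computes Lambda compute and request charges. The pricing values are inputs and must be taken from the current AWS Lambda pricing page for your Region and architecture. The sketch ignores the free tier and every other service in the stack (API Gateway, Secrets Manager, data transfer). The function names are hypothetical.

```typescript
// Assumed pricing inputs; check current AWS Lambda pricing for your
// Region and architecture before relying on the numbers.
interface LambdaPricing {
  perGbSecond: number;
  perMillionRequests: number;
}

// Monthly compute + request cost for Lambda only (no free tier, no other
// services). Memory is converted to GB using 1 GB = 1024 MB.
function monthlyLambdaCost(
  requests: number,
  memoryMb: number,
  avgDurationMs: number,
  pricing: LambdaPricing
): number {
  const gbSeconds = requests * (memoryMb / 1024) * (avgDurationMs / 1000);
  return gbSeconds * pricing.perGbSecond + (requests / 1e6) * pricing.perMillionRequests;
}

// Inverse: the longest average duration (ms) that keeps cost within budget.
function maxAvgDurationMs(
  budget: number,
  requests: number,
  memoryMb: number,
  pricing: LambdaPricing
): number {
  const requestCost = (requests / 1e6) * pricing.perMillionRequests;
  const costPerMs = (requests * (memoryMb / 1024) * pricing.perGbSecond) / 1000;
  return (budget - requestCost) / costPerMs;
}
```

For example, with illustrative rates of $0.0000166667 per GB-second and $0.20 per million requests, 1 million requests at 256 MB and a hypothetical 500 ms average duration come to about $2.28. Under the same rates, a $3 budget allows an average duration of roughly 672 ms.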

Conclusion

Using Lambda extensions can open up a wide range of options to extend the capability of serverless architectures. This blog shows a Lambda extension that creates a secure VPN tunnel using the WireGuard protocol and the Tailscale service to proxy events through to an EC2 instance inaccessible from the internet.

This is set up to minimize operational overhead with an automated deployment pipeline. A Lambda authorizer secures the endpoint, providing the ability to implement custom logic on the basis of the request contents and context.

For more serverless learning resources, visit Serverless Land.

Enhancing file sharing using Amazon S3 and AWS Step Functions

2023-08-29 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/enhancing-file-sharing-using-amazon-s3-and-aws-step-functions/

This post is written by Islam Elhamaky, Senior Solutions Architect and Adrian Tadros, Senior Solutions Architect.

Amazon S3 is a cloud storage service that many customers use for secure file storage. S3 offers a feature called presigned URLs to generate temporary download links, which are an effective and secure way for authorized users to upload and download data.

There are times when customers need more control over how data is accessed. For example, they may want to limit downloads based on IAM roles instead of presigned URLs, or limit the number of downloads per object to control data access costs. Additionally, it can be useful to track which individuals access those download URLs.

This blog post presents an example application that can provide this extra functionality, using AWS serverless services.

Overview

The code included in this example uses a variety of serverless services:

Amazon API Gateway receives all incoming requests from users and authorizes access using Amazon Cognito.
AWS Step Functions coordinates file sharing and downloading activities such as user validation, checking download eligibility, recording events, request routing, and response formatting.
AWS Lambda implements admin activities such as retrieving metadata, listing files, and deleting files.
Amazon DynamoDB stores permissions to ensure users only have access to files that have been shared with them.
Amazon S3 provides durable storage for users to upload and download files.
Amazon Athena provides an efficient way to query S3 Access Logs to extract download and bandwidth usage.
Amazon QuickSight provides a visual dashboard to view download and bandwidth analytics.

AWS Cloud Development Kit (AWS CDK) deploys the AWS resources and can plug into your preferred CI/CD process.

Architecture Overview

User Interface: The front end is a static React single page application hosted on S3 and served via Amazon CloudFront. The UI uses AWS NorthStar and Cloudscape design components. Amplify UI simplifies interactions with Amazon Cognito such as providing the ability to log in, sign up, and perform email verification.
API Gateway: Users interact via an API Gateway REST API.
Authentication: Amazon Cognito manages user identities and access. Users sign up using their email address and then verify their email address. Requests to the API include an access token, which is verified using an Amazon Cognito authorizer.
Microservices: The core operations are built with Lambda. The primary workflows allow users to share and download files, and Step Functions orchestrates the multiple steps in each process. These can include validating requests, authorizing that users have the correct permissions to access files, sending notifications, auditing, and keeping track of who is accessing files.
Permission store: DynamoDB stores essential information about files such as ownership details and permissions for sharing. It tracks who owns a file and who has been granted access to download it.
File store: An S3 bucket is the central file repository. Each user has a dedicated folder within the S3 bucket to store files.
Notifications: The solution uses Amazon Simple Notification Service (SNS) to send email notifications to recipients when a file is shared.
Analytics: S3 Access Logs are generated whenever users download or upload files to the file storage bucket. Amazon Athena filters these logs to generate a download report, extracting key information (such as the identity of the users who downloaded files and the total bandwidth consumed during the downloads).
Reporting: Amazon QuickSight provides an interface for administrators to view download reports and dashboards.

Walkthrough

As prerequisites, you need:

Node.js version 16+.
AWS CLI version 2+.
An AWS account and a profile set up on your computer.

Follow the instructions in the code repository to deploy the example to your AWS account. Once the application is deployed, you can access the user interface.

In this example, you walk through the steps to upload a file and share it with a recipient:

The example requires users to identify themselves using an email address. Choose Create Account then Sign In with your credentials.
Select Share a file.
Select Choose file to browse for and select the file to share. Choose Next.
You must add at least one recipient. Choose Add recipient to add more recipients. Choose Next.
Set Expire date and Limit downloads to configure the share's expiry date and limit the number of allowed downloads. Choose Next.
Review the share request details. You can navigate to previous screens to make changes. Choose Submit once done.
Choose My files to view your shared file.

Extending the solution

The example uses Step Functions to allow you to extend and customize the workflows. This implements a default workflow, providing you with the ability to override logic or introduce new steps to meet your requirements.

This section walks through the default behavior of the Share File and Download File Step Functions workflows.

The Share File workflow

The share file workflow consists of the following steps:

Validate: check that the share request contains all mandatory fields.
Get User Info: retrieve the logged in user’s information such as name and email address from Amazon Cognito.
Authorize: check the permissions stored in DynamoDB to verify if the user owns the file and has permission to share the file.
Audit: record the share attempt for auditing purposes.
Process: update the permission store in DynamoDB.
Send notifications: send email notifications to recipients to let them know that a new file has been shared with them.
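The Validate step can be as simple as checking for required fields. This TypeScript sketch uses the same field names as the workflow input that the API mapping constructs (userId, fileId, recipients). Treating recipients as email addresses and the loose email pattern are assumptions, and `validateShareRequest` is a hypothetical name rather than the sample's code.

```typescript
// Fields mirror the workflow input (userId, fileId, recipients).
// Assumption: recipients are email addresses.
interface ShareRequest {
  userId?: string;
  fileId?: string;
  recipients?: string[];
}

// Deliberately loose check: something@something.something, no spaces.
const LOOSE_EMAIL = /^[^@\s]+@[^@\s]+\.[^@\s]+$/;

// Returns validation errors; an empty array means the request is valid.
function validateShareRequest(request: ShareRequest): string[] {
  const errors: string[] = [];
  if (!request.userId) errors.push("userId is required");
  if (!request.fileId) errors.push("fileId is required");
  if (!request.recipients || request.recipients.length === 0) {
    errors.push("at least one recipient is required");
  } else {
    request.recipients.forEach((recipient) => {
      if (!LOOSE_EMAIL.test(recipient)) {
        errors.push("invalid recipient: " + recipient);
      }
    });
  }
  return errors;
}
```

Returning all errors at once, instead of failing on the first one, lets the UI show every problem in a single round trip.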

The Download File workflow

The download file workflow consists of the following steps:

Validate: check that the download request contains the required fields (for example, user ID and file ID).
Get user info: retrieve the user’s information from Amazon Cognito such as their name and email address.
Authorize: query the permission store in DynamoDB to verify that the user owns the file or is a valid recipient with permission to download the file.
Audit: record the download attempt.
Process: generate a short-lived S3 presigned download URL and return it to the user.
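The Authorize step of the download workflow combines ownership, recipient membership, and the expiry date and download limit chosen when sharing. The following TypeScript sketch shows one way to express that decision. The record shape is hypothetical (the sample's DynamoDB schema may differ), as is the choice to let owners bypass the expiry date and download limit.

```typescript
// Hypothetical permission record; the sample's DynamoDB schema may differ.
interface FilePermission {
  ownerId: string;
  recipientIds: string[];
  expiresAtMs?: number;   // share expiry, epoch milliseconds (optional)
  downloadLimit?: number; // maximum downloads for this share (optional)
  downloadCount: number;  // downloads recorded so far
}

interface Decision {
  allowed: boolean;
  reason: string;
}

function authorizeDownload(p: FilePermission, userId: string, nowMs: number): Decision {
  // Design choice (assumed): owners can always download their own files.
  if (p.ownerId === userId) return { allowed: true, reason: "owner" };
  if (p.recipientIds.indexOf(userId) === -1) {
    return { allowed: false, reason: "not shared with user" };
  }
  if (p.expiresAtMs !== undefined && nowMs >= p.expiresAtMs) {
    return { allowed: false, reason: "share expired" };
  }
  if (p.downloadLimit !== undefined && p.downloadCount >= p.downloadLimit) {
    return { allowed: false, reason: "download limit reached" };
  }
  return { allowed: true, reason: "recipient" };
}
```

The download count must be incremented atomically when the URL is issued, for example with a conditional DynamoDB update. Otherwise, concurrent requests could exceed the limit.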

Step Functions API data mapping

The example uses API Gateway request and response data mappings to allow the REST API to communicate directly with Step Functions. This section shows how to customize the mapping based on your use case.

Request data mapping

The API Gateway REST API uses Apache VTL templates to transform and construct requests to the underlying service. This solution abstracts the construction of these templates using a CDK construct:

api.root
.addResource('share')
.addResource('{fileId}')
.addMethod(
  'POST',
   StepFunctionApiIntegration(shareStepFunction, [
      { name: 'fileId', sourceType: 'params' },
      { name: 'recipients', sourceType: 'body' },
      /* your custom input fields */
   ]),
   authorizerSettings,
);

The StepFunctionApiIntegration construct handles the request mapping, allowing you to extract fields from the incoming API request and pass them as inputs to a Step Functions workflow. This generates the following VTL template:

{
  "name": "$context.requestId",
  "input": "{\"userId\":\"$context.authorizer.claims.sub\",\"fileId\":\"$util.escapeJavaScript($input.params('fileId'))\",\"recipients\":$util.escapeJavaScript($input.json('$.recipients'))}",
  "stateMachineArn": "...stateMachineArn"
}

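In this scenario, fileId and recipients are extracted from the API request path parameters and body, and userId comes from the Amazon Cognito authorizer claims ($context.authorizer.claims.sub). All three are passed to the workflow. You can customize the configuration to meet your requirements.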
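One detail of the template is easy to miss: `input` is a JSON document encoded as a string inside the outer JSON, which is why the template escapes the embedded quotes. The following TypeScript sketch builds the same request body Step Functions receives (StartExecution and StartSyncExecution both accept these three fields). In this sketch, JSON.stringify plays the role of the VTL escaping. `buildStartExecutionBody` is a hypothetical name.

```typescript
// Request body accepted by StartExecution / StartSyncExecution.
interface StartExecutionBody {
  name: string;            // execution name; the template uses the request ID
  input: string;           // JSON document encoded as a string
  stateMachineArn: string;
}

function buildStartExecutionBody(
  requestId: string,
  userSub: string,
  fileId: string,
  recipients: string[],
  stateMachineArn: string
): StartExecutionBody {
  return {
    name: requestId,
    // Serializing the inner object handles quotes and special characters,
    // just as the VTL template escapes them by hand.
    input: JSON.stringify({ userId: userSub, fileId: fileId, recipients: recipients }),
    stateMachineArn: stateMachineArn
  };
}
```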

Response data mapping

The example has response mapping templates using Apache VTL. The output of the last step in a workflow is mapped as a JSON response and returned to the user through API Gateway. The response also includes CORS headers:

#set($context.responseOverride.header.Access-Control-Allow-Headers = '*')
#set($context.responseOverride.header.Access-Control-Allow-Origin = '*')
#set($context.responseOverride.header.Access-Control-Allow-Methods = '*')
#if($input.path('$.status').toString().equals("FAILED"))
#set($context.responseOverride.status = 500)
{
  "error": "$input.path('$.error')",
  "cause": "$input.path('$.cause')"
}
#else
  $input.path('$.output')
#end

You can customize this response template to meet your requirements. For example, you may provide custom behavior for different response codes.
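Expressed in code, the template's branching looks like the following TypeScript sketch. It is not part of the sample. It assumes the synchronous execution result fields that the template reads (status, output, error, cause) and a default success status of 200. `toApiResponse` is a hypothetical name.

```typescript
// Fields read by the VTL template from a synchronous execution result.
interface SyncExecutionResult {
  status: string; // e.g. "SUCCEEDED", "FAILED", "TIMED_OUT"
  output?: string;
  error?: string;
  cause?: string;
}

interface ApiResponse {
  statusCode: number;
  headers: { [name: string]: string };
  body: string;
}

const CORS_HEADERS: { [name: string]: string } = {
  "Access-Control-Allow-Headers": "*",
  "Access-Control-Allow-Origin": "*",
  "Access-Control-Allow-Methods": "*"
};

// Same decision as the template: FAILED becomes a 500 carrying error and
// cause; any other status returns the workflow output unchanged.
function toApiResponse(result: SyncExecutionResult): ApiResponse {
  if (result.status === "FAILED") {
    return {
      statusCode: 500,
      headers: CORS_HEADERS,
      body: JSON.stringify({ error: result.error, cause: result.cause })
    };
  }
  return { statusCode: 200, headers: CORS_HEADERS, body: result.output || "" };
}
```

Like the template, this sketch sends TIMED_OUT down the success branch with an empty body. Mapping it to a distinct code such as 504 is one example of the custom behavior mentioned above.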

Conclusion

In this blog post, you learned how to securely share files with authorized external parties and track their access using AWS serverless services. The sample application uses Step Functions so that you can extend and customize the workflows to meet your use case requirements.

To learn more about the concepts discussed, visit:

The example application’s GitHub repo
Sharing objects with presigned URLs
Using AWS Step Functions with other services
Cloudscape design system for the cloud
Mapping template utility reference

For more serverless learning resources, visit Serverless Land. Learn about data processing in Step Functions by reading the guide: Introduction to Distributed Map for Serverless Data Processing.

Protecting an AWS Lambda function URL with Amazon CloudFront and Lambda@Edge

2023-08-23 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/protecting-an-aws-lambda-function-url-with-amazon-cloudfront-and-lambdaedge/

This post is written by Jerome Van Der Linden, Senior Solutions Architect Builder.

A Lambda function URL is a dedicated HTTPS endpoint for an AWS Lambda function. When it is configured, you can invoke the function directly with an HTTP request. You can make it public by setting the authentication type to NONE for an open API, or you can protect it with AWS IAM by setting the authentication type to AWS_IAM. In that case, only authenticated users and roles are able to invoke the function via the function URL.

Lambda@Edge is a feature of Amazon CloudFront that can run code closer to the end user of an application. It is generally used to manipulate incoming HTTP requests or outgoing HTTP responses between the user client and the application’s origin. In particular, it can add extra headers to the request (‘Authorization’, for example).

This blog post shows how to use CloudFront and Lambda@Edge to protect a Lambda function URL configured with the AWS_IAM authentication type by adding the appropriate headers to the request before it reaches the origin.

Overview

There are four main components in this example:

Lambda functions with function URLs enabled: This is the heart of the ‘application’, the functions that contain the business code exposed to the frontend. The function URL is configured with AWS_IAM authentication type, so that only authenticated users/roles can invoke it.
A CloudFront distribution: CloudFront is a content delivery network (CDN) service used to deliver content to users with low latency. It also improves security with traffic encryption and built-in DDoS protection. In this example, placing CloudFront in front of the Lambda function URL adds this layer of security and can cache content closer to the users.
A Lambda function at the edge: CloudFront also provides the ability to run Lambda functions close to the users, called Lambda@Edge. This example uses it to sign the request made to the Lambda function URL, adding the appropriate headers so that invocation of the URL is authenticated with IAM.
A web application that invokes the Lambda function URLs: The example also contains a single page application built with React, from which the users make requests to one or more Lambda function URLs. The static assets (for example, HTML and JavaScript files) are stored in Amazon S3 and also exposed and cached by CloudFront.

This is the example architecture:

The request flow is:

The user performs requests via the client to reach static assets from the React application or Lambda function URLs.
For a static asset, CloudFront retrieves it from S3 or its cache and returns it to the client.
If the request is for a Lambda function URL, it first goes to a Lambda@Edge function. The Lambda@Edge function has the lambda:InvokeFunctionUrl permission on the target Lambda function URL and uses its credentials to sign the request with Signature Version 4 (SigV4). It adds the Authorization, X-Amz-Security-Token, and X-Amz-Date headers to the request.
After the request is properly signed, CloudFront forwards it to the Lambda function URL.
Lambda runs the function, which performs the business logic. In this example, the functions handle books (create, get, update, delete).
Lambda returns the response of the function to CloudFront.
Finally, CloudFront returns the response to the client.
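The signing step in this flow follows SigV4. The parts that need no cryptography are the canonical headers, the signed-headers list, and the credential scope, and they can be sketched as plain string building. This TypeScript sketch omits hashing and HMAC key derivation and does not merge duplicate header names. `canonicalizeHeaders` and `credentialScope` are hypothetical names. In the sample, the SDK's V4 signer does this work.

```typescript
// SigV4 canonical headers: names lowercased and sorted, values trimmed with
// runs of whitespace collapsed, each entry terminated by "\n".
// Simplification: headers differing only in case overwrite each other.
function canonicalizeHeaders(headers: { [name: string]: string }): {
  canonicalHeaders: string;
  signedHeaders: string;
} {
  const lowered: { [name: string]: string } = {};
  Object.keys(headers).forEach((name) => {
    lowered[name.toLowerCase()] = headers[name].trim().replace(/\s+/g, " ");
  });
  const names = Object.keys(lowered).sort();
  return {
    canonicalHeaders: names.map((n) => n + ":" + lowered[n] + "\n").join(""),
    signedHeaders: names.join(";")
  };
}

// Credential scope: <yyyymmdd>/<region>/<service>/aws4_request, with the
// date taken from the X-Amz-Date value (for example 20230823T120000Z).
function credentialScope(amzDate: string, region: string, service: string): string {
  return amzDate.substring(0, 8) + "/" + region + "/" + service + "/aws4_request";
}
```

For a function URL, the service name in the scope is `lambda`, matching the service passed to the signer in the function code.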

There are several types of events where a Lambda@Edge function can be triggered:

Viewer request: After CloudFront receives a request from the client.
Origin request: Before the request is forwarded to the origin.
Origin response: After CloudFront receives the response from the origin.
Viewer response: Before the response is sent back to the client.

This example uses the “Origin Request” type to update the request before it is sent to the origin (the Lambda function URL).

You can find the complete example, based on the AWS Cloud Development Kit (CDK), on GitHub.

Backend stack

The backend contains the different Lambda functions and Lambda function URLs. It uses the AWS_IAM auth type and the CORS (Cross Origin Resource Sharing) definition when adding the function URL to the Lambda function. Use a more restrictive allowedOrigins for a real application.

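import * as path from 'path';
import { Duration } from 'aws-cdk-lib';
import { FunctionUrlAuthType, HttpMethod, Runtime } from 'aws-cdk-lib/aws-lambda';
import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs';
// Inside the backend stack's constructor; bookTable is the DynamoDB table defined elsewhere in the stack.
const getBookFunction = new NodejsFunction(this, 'GetBookFunction', {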
    runtime: Runtime.NODEJS_18_X,  
    memorySize: 256,
    timeout: Duration.seconds(30),
    entry: path.join(__dirname, '../functions/books/books.ts'),
    environment: {
      TABLE_NAME: bookTable.tableName
    },
    handler: 'getBookHandler',
    description: 'Retrieve one book by id',
});
bookTable.grantReadData(getBookFunction);
const getBookUrl = getBookFunction.addFunctionUrl({
    authType: FunctionUrlAuthType.AWS_IAM,
    cors: {
        allowedOrigins: ['*'],
        allowedMethods: [HttpMethod.GET],
        allowedHeaders: ['*'],
        allowCredentials: true,
    }
});

Frontend stack

The Frontend stack contains the CloudFront distribution and the Lambda@Edge function. This is the Lambda@Edge definition:

const authFunction = new cloudfront.experimental.EdgeFunction(this, 'AuthFunctionAtEdge', {
    handler: 'auth.handler',
    runtime: Runtime.NODEJS_16_X,  
    code: Code.fromAsset(path.join(__dirname, '../functions/auth')),
 });

The following policy allows the Lambda@Edge function to sign the request with the appropriate permission and to invoke the function URLs:

authFunction.addToRolePolicy(new PolicyStatement({
    sid: 'AllowInvokeFunctionUrl',
    effect: Effect.ALLOW,
    actions: ['lambda:InvokeFunctionUrl'],
    resources: [getBookArn, getBooksArn, createBookArn, updateBookArn, deleteBookArn],
    conditions: {
        "StringEquals": {"lambda:FunctionUrlAuthType": "AWS_IAM"}
    }
}));

The function code uses the AWS SDK for JavaScript (v2), specifically its Signature Version 4 signer. Two things are important here:

The service for which we want to sign the request: Lambda
The credentials of the function (with the InvokeFunctionUrl permission)

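```
const AWS = require('aws-sdk'); // AWS SDK for JavaScript v2
// host, path and region are defined earlier in the function
const request = new AWS.HttpRequest(new AWS.Endpoint(`https://${host}${path}`), region);
// ... set the headers, body and method ...
// Sign for the 'lambda' service with the function's own credentials
const signer = new AWS.Signers.V4(request, 'lambda', true);
signer.addAuthorization(AWS.config.credentials, AWS.util.date.getDate());
```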

The full code of the function is available in the GitHub repo.
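To show why the service name matters, the following TypeScript sketch builds two of the strings that SigV4 signs: the canonical request and the credential scope. The `'lambda'` service name ends up in the credential scope, so the resulting signature is only valid for Lambda. This is an illustrative sketch, not the SDK's implementation: the helper names are hypothetical, it assumes an already URI-encoded path and no query string, and it omits the SHA-256 and HMAC steps that the SDK performs.

```typescript
// Hypothetical helpers that build two of the strings AWS Signature Version 4 signs.
// Not the SDK's implementation: hashing and HMAC steps are omitted.

type SignableHeaders = Record<string, string>;

// Canonical headers: lowercase names, trimmed values with inner whitespace
// collapsed, sorted by name, one "name:value\n" line per header.
function canonicalHeaders(headers: SignableHeaders): { canonical: string; signed: string } {
  const entries = Object.entries(headers)
    .map(([name, value]) => [name.toLowerCase(), value.trim().replace(/\s+/g, " ")] as const)
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return {
    canonical: entries.map(([name, value]) => `${name}:${value}\n`).join(""),
    signed: entries.map(([name]) => name).join(";"),
  };
}

// Canonical request, assuming an already URI-encoded path and no query string.
function canonicalRequest(method: string, path: string, headers: SignableHeaders, payloadSha256Hex: string): string {
  const { canonical, signed } = canonicalHeaders(headers);
  return [method.toUpperCase(), path || "/", "", canonical, signed, payloadSha256Hex].join("\n");
}

// Credential scope: the service name ("lambda" for function URLs) is part of what gets signed.
function credentialScope(yyyymmdd: string, region: string, service: string): string {
  return `${yyyymmdd}/${region}/${service}/aws4_request`;
}

console.log(credentialScope("20240322", "us-east-1", "lambda")); // 20240322/us-east-1/lambda/aws4_request
```

The SDK then hashes the canonical request, combines it with the timestamp and credential scope into a string to sign, and derives the signature from the function's credentials; that result is what appears in the Authorization header.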

CloudFront distribution and behaviors definition

The CloudFront distribution has a default behavior with an S3 origin for the static assets of the React application.

It also has one behavior per function URL, as defined in the following code. Note the Lambda@Edge function attached with the ORIGIN_REQUEST event type, and the behavior that uses the function URL as its origin:

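```
import { Fn } from 'aws-cdk-lib';
import {
    AddBehaviorOptions, AllowedMethods, CachePolicy, LambdaEdgeEventType,
    OriginRequestPolicy, ResponseHeadersPolicy, ViewerProtocolPolicy,
} from 'aws-cdk-lib/aws-cloudfront';
import { HttpOrigin } from 'aws-cdk-lib/aws-cloudfront-origins';

const getBehaviorOptions: AddBehaviorOptions = {
    viewerProtocolPolicy: ViewerProtocolPolicy.HTTPS_ONLY,
    cachePolicy: CachePolicy.CACHING_DISABLED,
    originRequestPolicy: OriginRequestPolicy.CORS_CUSTOM_ORIGIN,
    responseHeadersPolicy: ResponseHeadersPolicy.CORS_ALLOW_ALL_ORIGINS_WITH_PREFLIGHT,
    edgeLambdas: [{
        functionVersion: authFunction.currentVersion,
        eventType: LambdaEdgeEventType.ORIGIN_REQUEST,
        includeBody: false, // GET, no body
    }],
    allowedMethods: AllowedMethods.ALLOW_GET_HEAD_OPTIONS,
};
// Extract the domain name from the function URL for the origin
this.distribution.addBehavior('/getBook/*', new HttpOrigin(Fn.select(2, Fn.split('/', getBookUrl))), getBehaviorOptions);
```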
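`HttpOrigin` takes a domain name rather than a full URL, so the behavior definition extracts the host from the function URL with `Fn.split` and `Fn.select`, which CloudFormation resolves at deployment time. The following sketch reproduces that transformation in plain TypeScript to show why index 2 is used; the function name is hypothetical.

```typescript
// Hypothetical plain-TypeScript equivalent of Fn.select(2, Fn.split('/', functionUrl)),
// evaluated here at run time instead of by CloudFormation at deployment time.
function originDomainFromFunctionUrl(functionUrl: string): string {
  // "https://abc.lambda-url.eu-west-1.on.aws/".split("/")
  //   -> ["https:", "", "abc.lambda-url.eu-west-1.on.aws", ""]
  const parts = functionUrl.split("/");
  const host = parts[2];
  if (!host) {
    throw new Error(`Unexpected function URL format: ${functionUrl}`);
  }
  return host;
}

console.log(originDomainFromFunctionUrl("https://qwertyuiop1234567890.lambda-url.eu-west-1.on.aws/"));
// prints qwertyuiop1234567890.lambda-url.eu-west-1.on.aws
```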

Regional consideration

The Lambda@Edge function must be in the us-east-1 Region (N. Virginia), and so must the frontend stack. If you deploy the backend stack in another Region, you must pass the Lambda function URLs (and ARNs) to the frontend. Using a custom resource in CDK, you can create parameters in AWS Systems Manager Parameter Store in the us-east-1 Region that contain this information. For more details, review the code in the GitHub repo.

Walkthrough

Before deploying the solution, follow the README in the GitHub repo and make sure to meet the prerequisites.

Deploying the solution

From the solution directory, install the dependencies:
```
npm install
```
Start the deployment of the solution (it can take up to 15 minutes):
```
cdk deploy --all
```
Once the deployment succeeds, the outputs contain both the Lambda function URLs and the URLs “protected” behind the CloudFront distribution.

Testing the solution

Using cURL, query the Lambda Function URL to retrieve all books (GetBooksFunctionURL in the CDK outputs):
```
curl -v https://qwertyuiop1234567890.lambda-url.eu-west-1.on.aws/
```
You should get an HTTP 403 Forbidden response. As expected, you cannot access the Lambda function URL directly without proper IAM authentication.

Now query the “protected” URL to retrieve all books (GetBooksURL in the CDK outputs):
```
curl -v https://q1w2e3r4t5y6u.cloudfront.net/getBooks
```
This time you should get an HTTP 200 OK response with an empty list as the result.

The logs of the Lambda@Edge function (search for “AuthFunctionAtEdge” in CloudWatch Logs in the closest Region) show:

The incoming request.
The signed request, with the additional headers (Authorization, X-Amz-Security-Token, and X-Amz-Date). These headers are what allow the Lambda function URL to validate the request with IAM when it receives it.

You can test the complete solution through the frontend, using the FrontendURL in the CDK outputs.

Cleaning up

The Lambda@Edge function is replicated to Regions close to your users. The function cannot be deleted until CloudFront has removed these replicas, which happens after the function is no longer associated with the distribution.

To delete the deployed resources, run the cdk destroy --all command from the solution directory.

Conclusion

This blog post shows how to protect a Lambda function URL configured with IAM authentication by using a CloudFront distribution and Lambda@Edge. CloudFront helps protect against DDoS attacks, and the function at the edge adds the headers that Lambda needs to authenticate the request.

Lambda function URLs provide a simpler way to invoke your function using HTTP calls. However, if you need more advanced features like user authentication with Amazon Cognito, request validation or rate throttling, consider using Amazon API Gateway.

For more serverless learning resources, visit Serverless Land.

Implementing the transactional outbox pattern with Amazon EventBridge Pipes

2023-08-16 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/implementing-the-transactional-outbox-pattern-with-amazon-eventbridge-pipes/

This post is written by Sayan Moitra, Associate Solutions Architect, and Sangram Sonawane, Senior Solutions Architect.

Microservice architecture is an architectural style that structures an application as a collection of loosely coupled and independently deployable services. Services must communicate with each other to exchange messages and perform business operations. Ensuring message reliability while maintaining loose coupling between services is crucial for building robust and scalable systems.

This blog demonstrates how to use Amazon DynamoDB, a fully managed serverless key-value NoSQL database, and Amazon EventBridge, a managed serverless event bus, to implement reliable messaging for microservices using the transactional outbox pattern.

Business operations can span multiple systems or databases that must stay consistent and synchronized. One approach often used in distributed systems, or in architectures where data must be replicated across multiple locations or components, is dual writes. In a dual write scenario, when a write operation is performed on one system or database, the same data or event is also written to another system in real time or near real time. The goal is for both systems to hold the same data, minimizing inconsistencies.

Dual writes can also introduce data integrity challenges in distributed systems. Failure to update the database or to send events to other downstream systems after an initial system update can lead to data loss and leave the application in an inconsistent state. One design approach to overcome this challenge is to combine dual writes with the transactional outbox pattern.

Challenges with dual writes

Consider an online food ordering application to illustrate the challenges with dual writes. Once the user submits the order, the order service updates the order status in a persistent data store. The order status update should also be sent to notify_restaurant and order_tracking services using a message bus for asynchronous communication. After successfully updating the order status in the database, the order service writes the event to the message bus. The order_service performs a dual write operation of updating the database and publishing the event details on the message bus for other services to read.

This approach works until publishing the event to the message bus fails. Publishing can fail for multiple reasons, such as a network error or a message bus outage. When that happens, the notify_restaurant and order_tracking services are not notified of the order update even though the database was updated, leaving the system in an inconsistent state. Implementing the transactional outbox pattern with dual writes can help ensure reliable messaging between systems after a database update.

This illustration shows a sequence diagram for an online food ordering application and the challenges with dual writes:

Overview of the transactional outbox pattern

In the transactional outbox pattern, a second persistent data store is introduced to store the outgoing messages. In the online food order example, updating the database with order details and storing the event information in the outbox table becomes a single atomic transaction.

The transaction succeeds only when both the database write and the outbox write succeed. Any failure to write to the outbox table rolls back the whole transaction. A separate process then reads the event from the outbox table and publishes it on the message bus. Once the message is available on the message bus, the notify_restaurant and order_tracking services can read it. Combining the transactional outbox pattern with dual writes provides data consistency across systems and reliable message delivery within the transactional context.
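To make the atomicity requirement concrete, here is a minimal in-memory TypeScript sketch of the pattern, independent of any AWS service. The `OrderStore` class, its `orders` and `outbox` tables, the `simulateOutboxFailure` flag, and the `publish` callback are all hypothetical: `placeOrder` stages both writes and commits them together or not at all, and a separate `relay` step drains the outbox to the message bus, removing an event only after it has been published.

```typescript
type Order = { id: string; status: string };
type OutboxEvent = { eventId: number; type: string; payload: Order };

// Hypothetical in-memory store with an orders table and an outbox table
// that are only ever updated together.
class OrderStore {
  orders = new Map<string, Order>();
  outbox: OutboxEvent[] = [];
  private nextEventId = 1;

  // "Transaction": stage both writes, then commit both or neither.
  placeOrder(order: Order, simulateOutboxFailure = false): void {
    const stagedOrders = new Map(this.orders);
    stagedOrders.set(order.id, { ...order });
    if (simulateOutboxFailure) {
      // Nothing has been committed yet, so the order write is discarded too.
      throw new Error("outbox write failed; transaction rolled back");
    }
    const stagedOutbox: OutboxEvent[] = [
      ...this.outbox,
      { eventId: this.nextEventId, type: "OrderStatusChanged", payload: { ...order } },
    ];
    // Commit point: both tables change together.
    this.orders = stagedOrders;
    this.outbox = stagedOutbox;
    this.nextEventId++;
  }

  // Separate relay process: publishes pending events in order and removes
  // each one only after it has been published successfully.
  relay(publish: (event: OutboxEvent) => void): number {
    let sent = 0;
    for (const event of [...this.outbox]) {
      publish(event); // if this throws, this event and later ones stay in the outbox
      this.outbox.shift();
      sent++;
    }
    return sent;
  }
}
```

Because an event leaves the outbox only after it is published, a crash between those two steps can cause it to be published again, so consumers should be prepared to handle duplicates.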

The following illustration shows a sequence diagram for an online food ordering application with transactional outbox pattern for reliable message delivery.

Implementing the transactional outbox pattern

DynamoDB includes a feature called DynamoDB Streams that captures a time-ordered sequence of item-level modifications in a DynamoDB table and stores this information in a log for up to 24 hours. Applications can access this log and view the data items as they appeared before and after they were modified, in near real time.

Whenever an application creates, updates, or deletes items in the table, DynamoDB Streams writes a stream record with the primary key attributes of the items that were modified. A stream record contains information about a data modification to a single item in a DynamoDB table. DynamoDB Streams writes stream records in near real time and these can be consumed for processing based on the contents. Enabling this feature removes the need to maintain a separate outbox table and lowers the management and operational overhead.
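Because each stream record already carries the modified item, the stream itself can play the role of the outbox. The following TypeScript sketch shows the kind of transformation involved in turning a stream record into an event payload. The `StreamRecord` type is a hand-written subset of the DynamoDB Streams record shape (`eventName`, `dynamodb.Keys`, `dynamodb.NewImage`, `dynamodb.SequenceNumber`); `toOrderEvent`, `unmarshall`, and the `orderId`/`Status` attribute names are hypothetical, and the unmarshalling handles only the `S`, `N`, and `BOOL` attribute types. In this solution, EventBridge Pipes performs the delivery without custom code.

```typescript
// Subset of DynamoDB attribute-value JSON handled by this sketch.
type AttributeValue = { S: string } | { N: string } | { BOOL: boolean };
type Item = Record<string, AttributeValue>;

// Hand-written subset of a DynamoDB Streams record.
interface StreamRecord {
  eventName: "INSERT" | "MODIFY" | "REMOVE";
  dynamodb: {
    Keys: Item;
    NewImage?: Item; // present only with the NEW_IMAGE or NEW_AND_OLD_IMAGES view type
    SequenceNumber: string;
  };
}

// Converts attribute-value JSON into plain values (S, N, BOOL only).
function unmarshall(item: Item): Record<string, string | number | boolean> {
  const out: Record<string, string | number | boolean> = {};
  for (const [name, value] of Object.entries(item)) {
    if ("S" in value) out[name] = value.S;
    else if ("N" in value) out[name] = Number(value.N);
    else out[name] = value.BOOL;
  }
  return out;
}

// Hypothetical mapping from a stream record to an order event payload.
function toOrderEvent(record: StreamRecord) {
  return {
    eventName: record.eventName,
    sequenceNumber: record.dynamodb.SequenceNumber,
    keys: unmarshall(record.dynamodb.Keys),
    newImage: record.dynamodb.NewImage ? unmarshall(record.dynamodb.NewImage) : undefined,
  };
}
```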

EventBridge Pipes connects event producers to consumers with options to transform, filter, and enrich messages. EventBridge Pipes can integrate with DynamoDB Streams to capture table events without writing any code. There is no need to write and maintain a separate process to read from the stream. EventBridge Pipes also supports retries, and any failed events can be routed to a dead-letter queue (DLQ) for further analysis and reprocessing.

EventBridge Pipes polls the shards of the DynamoDB stream for records and invokes the pipe as soon as records are available. You can instead configure the pipe to read records only once it has gathered a specified batch size or the batching window expires. Pipes maintains the order of records from the data stream when sending that data to the destination. You can optionally filter or enrich these records before sending them to a target for processing.
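The interaction between batch size and batching window can be illustrated with a small TypeScript model. This is a simplified sketch, not how EventBridge Pipes is implemented: `batchRecords`, `TimedRecord`, `batchSize`, and `windowMs` are hypothetical names. A batch is flushed as soon as it holds `batchSize` records, or when a record arrives after the window opened by the batch's first record has expired; arrival order is preserved throughout.

```typescript
interface TimedRecord<T> {
  at: number; // arrival time in milliseconds
  value: T;
}

// Simplified model: flush when the batch is full, or when a record arrives
// after the batching window opened by the batch's first record has expired.
function batchRecords<T>(records: TimedRecord<T>[], batchSize: number, windowMs: number): T[][] {
  const batches: T[][] = [];
  let pending: T[] = [];
  let windowStart = 0;
  const flush = (): void => {
    if (pending.length > 0) batches.push(pending);
    pending = [];
  };
  for (const record of records) {
    if (pending.length > 0 && record.at - windowStart >= windowMs) flush();
    if (pending.length === 0) windowStart = record.at; // a new window opens
    pending.push(record.value);
    if (pending.length >= batchSize) flush();
  }
  flush(); // the last partial batch is sent when its window expires
  return batches;
}
```

For example, with `batchSize` 3 and `windowMs` 10, records arriving at 0, 1, 2, 3, 10, and 25 ms form the batches `[0, 1, 2]` (full), `[3, 10]` (window expired), and `[25]`.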

Example overview

The following diagram illustrates the implementation of the transactional outbox pattern with DynamoDB Streams and EventBridge Pipes. Amazon API Gateway triggers a DynamoDB operation in response to a POST request. EventBridge Pipes delivers the resulting change in the DynamoDB table to an EventBridge event bus. Depending on the filters applied, the event bus invokes the Lambda functions through an Amazon SQS queue.

In this sample implementation, Amazon API Gateway receives a POST request and writes the update to the DynamoDB table. API Gateway can perform CRUD operations on DynamoDB directly, without a compute layer for the database calls.
DynamoDB Streams is enabled on the table, which captures a time-ordered sequence of item-level modifications in the DynamoDB table in near real time.
EventBridge Pipes integrates with DynamoDB Streams to capture the events and can optionally filter and enrich the data before it is sent to a supported target. In this example, events are sent to Amazon EventBridge, which acts as a message bus. This can be replaced with any of the supported targets listed in Amazon EventBridge Pipes targets. A DLQ can be configured to capture any failed events so that they can be analyzed and retried.
Consumers listening to the event bus receive messages. You can optionally fan out and deliver the events to multiple consumers and apply filters. You can configure a DLQ to handle any failures and retries.
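Consumers typically subscribe through EventBridge rules whose event patterns select the events they care about. The following TypeScript sketch models only the simplest form of pattern matching, where each leaf field lists allowed exact values and nested objects are matched recursively. It is a hypothetical subset for illustration: real EventBridge patterns also support operators such as `prefix`, `anything-but`, and `numeric`, which are not modeled here, and the `OrderStatusChanged` detail type and `Status` field are assumptions.

```typescript
type Pattern = { [key: string]: Array<string | number | boolean> | Pattern };
type EventLike = { [key: string]: unknown };

// Exact-value subset of EventBridge event-pattern matching.
function matches(pattern: Pattern, event: EventLike): boolean {
  return Object.entries(pattern).every(([key, expected]) => {
    const actual = event[key];
    if (Array.isArray(expected)) {
      // Leaf: the event value must equal one of the listed values.
      return expected.some((v) => v === actual);
    }
    // Nested pattern: the event must contain an object at this key.
    return typeof actual === "object" && actual !== null && matches(expected, actual as EventLike);
  });
}

const createdOrders: Pattern = {
  "detail-type": ["OrderStatusChanged"],
  detail: { Status: ["Created"] },
};
```

With this pattern, a consumer such as notify_restaurant would receive only events whose `detail.Status` is `Created`, while other consumers can use their own patterns on the same bus.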

Prerequisites

AWS SAM CLI, version 1.85.0 or higher
Python 3.10

Deploying the example application

Clone the repository:

```
git clone https://github.com/aws-samples/amazon-eventbridge-pipes-dynamodb-stream-transactional-outbox.git
```

Change to the root directory of the project and run the following AWS SAM CLI commands:

```
cd amazon-eventbridge-pipes-dynamodb-stream-transactional-outbox
sam build
sam deploy --guided
```

Enter the name for your stack during guided deployment. During the deploy process, select the default option for all the additional steps.
The resources are deployed.

Testing the application

Once the deployment is complete, the output includes the API Gateway URL, which you can use for testing. For example, use Postman to make a POST call to the API Gateway prod URL.

You can also test using the curl command:

```
curl -s --header "Content-Type: application/json" \
  --request POST \
  --data '{"Status":"Created"}' \
  <API_ENDPOINT>
```

This produces the following output:

To verify that the order details were written to the DynamoDB table, run a scan operation on the table:

```
aws dynamodb scan \
    --table-name <DynamoDB Table Name>
```

Handling failures

DynamoDB Streams captures a time-ordered sequence of item-level modifications in the DynamoDB table and stores this information in a log for up to 24 hours. If EventBridge Pipes cannot read from the DynamoDB stream, for example because of a misconfiguration, the records remain available in the log for 24 hours. Once the pipe is working again, it retrieves the undelivered records that are still within that 24-hour window. For integration issues between EventBridge Pipes and the target application, all failed messages can be sent to the DLQ for reprocessing at a later time.
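The retry-then-DLQ behavior can be modeled in a few lines of TypeScript. In this sketch, `deliverWithDlq`, `maxAttempts`, and the `send` callback are hypothetical, and it does not model backoff timing or the actual retry settings of EventBridge Pipes: each event is attempted up to `maxAttempts` times, and an event that still fails is moved to a dead-letter list for later reprocessing instead of being dropped.

```typescript
interface DeliveryResult<T> {
  delivered: T[];
  deadLetters: { event: T; error: string }[];
}

// Tries each event up to maxAttempts times; events that never succeed go to the DLQ.
function deliverWithDlq<T>(events: T[], send: (event: T) => void, maxAttempts: number): DeliveryResult<T> {
  const result: DeliveryResult<T> = { delivered: [], deadLetters: [] };
  for (const event of events) {
    let ok = false;
    let lastError = "";
    for (let attempt = 1; attempt <= maxAttempts && !ok; attempt++) {
      try {
        send(event);
        ok = true;
      } catch (err) {
        lastError = err instanceof Error ? err.message : String(err);
      }
    }
    if (ok) result.delivered.push(event);
    else result.deadLetters.push({ event, error: lastError }); // kept for later reprocessing
  }
  return result;
}
```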

Cleaning up

To clean up your AWS resources, run the following AWS SAM CLI command, answering “y” to all questions:

```
sam delete --stack-name <stack_name>
```

Conclusion

Reliable interservice communication is an important consideration in microservice design, especially when faced with dual writes. Combining the transactional outbox pattern with dual writes provides a robust way of improving message reliability.

This blog demonstrates an architecture pattern to tackle the challenge of dual writes by combining it with the transactional outbox pattern using DynamoDB and EventBridge Pipes. This solution provides a no-code approach with AWS Managed Services, reducing management and operational overhead.

For more serverless learning resources, visit Serverless Land.

Using response streaming with AWS Lambda Web Adapter to optimize performance

2023-08-07 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/using-response-streaming-with-aws-lambda-web-adapter-to-optimize-performance/

This post is written by Harold Sun, Senior Serverless SSA, AWS GCR, Xue Jiaqing, Solutions Architect, AWS GCR, and Su Jie, Associate Solution Architect, AWS GCR.

AWS Lambda now supports Lambda response streaming, which introduces a new invocation mode accessible through Lambda function URLs. This feature enables Lambda functions to send response content to the client in sequential chunks. It is available for the Node.js managed runtimes and for custom runtimes, and can also be accessed using the InvokeWithResponseStream API in Lambda.

The Lambda Web Adapter, written in Rust, serves as a universal adapter between the Lambda Runtime API and HTTP. It allows developers to package familiar HTTP 1.1/1.0 web applications, such as Express.js, Next.js, Flask, Spring Boot, or Laravel, and deploy them on AWS Lambda. This removes the need to modify the web application to accommodate Lambda’s input and output formats, reducing the complexity of adapting code to meet Lambda’s requirements.

When using other managed runtimes such as Java, Go, Python, or Ruby, developers can use the Lambda Web Adapter to build applications that support Lambda response streaming more easily.

Implementing response streaming with Lambda Web Adapter

In general, you can regard Lambda Web Adapter as an extension of Lambda, which is integrated into Lambda’s runtime environment using the Lambda Extension API. It operates within an independent process space when the Lambda function is invoked and serves as a custom runtime. When the function is run, the Web Adapter starts alongside the packaged web application.

After initialization, it performs a readiness check on the configured web application’s port every 10ms (the default port is 8080, but you can configure other ports using environment variables). Once it receives an HTTP response with a “200 OK” status from the web application, it converts the received Lambda invocation parameters into an HTTP request and sends that request to the running web application.
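The readiness check amounts to a polling loop. The following TypeScript sketch is a simplified, synchronous model of it (the Web Adapter itself is written in Rust): the hypothetical `probe` callback stands in for an HTTP GET to the application's port and returns a status code, `sleep` is injected so the loop can be tested without real delays, and `maxAttempts` is an assumption added so the sketch terminates.

```typescript
// Simplified model of the Web Adapter readiness check.
function waitUntilReady(
  probe: () => number,         // hypothetical: GET http://127.0.0.1:<port>/, 0 = connection refused
  sleep: (ms: number) => void, // injected delay
  intervalMs = 10,
  maxAttempts = 1000,
): number {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (probe() === 200) return attempt; // the application is ready to receive requests
    sleep(intervalMs);
  }
  throw new Error(`application not ready after ${maxAttempts} attempts`);
}
```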

Once the web application responds to this request, the Web Adapter formats the response content according to the function’s response format and sends it to the client, completing one invocation of the function.
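The translation from a Lambda invocation to an HTTP request can be sketched in TypeScript for function URL events. `FunctionUrlEvent` below is a hand-written subset of the function URL payload (format 2.0: `rawPath`, `rawQueryString`, `headers`, `cookies`, `body`, `isBase64Encoded`, `requestContext.http.method`), while `toHttpRequest`, `LocalHttpRequest`, and the `127.0.0.1` target address are hypothetical. Base64 decoding of the body is not performed; the flag is passed through instead.

```typescript
// Hand-written subset of the Lambda function URL event (payload format 2.0).
interface FunctionUrlEvent {
  rawPath: string;
  rawQueryString: string;
  headers: Record<string, string>;
  cookies?: string[];
  body?: string;
  isBase64Encoded: boolean;
  requestContext: { http: { method: string } };
}

interface LocalHttpRequest {
  method: string;
  url: string;
  headers: Record<string, string>;
  body?: string;
  bodyIsBase64: boolean; // a real adapter would decode the body before sending it
}

// Hypothetical translation of an invocation event into a request for the local web app.
function toHttpRequest(event: FunctionUrlEvent, port = 8080): LocalHttpRequest {
  const query = event.rawQueryString ? `?${event.rawQueryString}` : "";
  const headers = { ...event.headers };
  if (event.cookies && event.cookies.length > 0) {
    headers["cookie"] = event.cookies.join("; "); // function URLs deliver cookies separately
  }
  return {
    method: event.requestContext.http.method,
    url: `http://127.0.0.1:${port}${event.rawPath}${query}`,
    headers,
    body: event.body,
    bodyIsBase64: event.isBase64Encoded,
  };
}
```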

Similarly, the Lambda Web Adapter uses the Custom Runtime API to implement response streaming. When implementing a function using response streaming:

The Web Adapter sends a POST request to the Lambda Runtime’s Response API, including the Lambda-Runtime-Function-Response-Mode HTTP header with the value streaming and the Transfer-Encoding HTTP header with the value chunked:
```
POST http://${AWS_LAMBDA_RUNTIME_API}/runtime/invocation/${AwsRequestId}/response
Lambda-Runtime-Function-Response-Mode: streaming
Transfer-Encoding: chunked
```
It encodes the response data according to the HTTP/1.1 Chunked Transfer Encoding protocol specification and sends it as the “Body” to the Lambda Runtime’s Response API.
After assembling the response and completing the data transmission, the Web Adapter closes the underlying network connection.
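HTTP/1.1 chunked transfer encoding frames each piece of the body as its size in hexadecimal, CRLF, the data, and CRLF, and ends the body with a zero-size chunk. This TypeScript sketch shows that framing for string chunks; `encodeChunked` and `utf8Length` are hypothetical helper names. Sizes are computed in UTF-8 bytes because the chunk size counts octets, not characters, and the sketch omits chunk extensions and trailers.

```typescript
// Number of bytes a string occupies when encoded as UTF-8.
function utf8Length(s: string): number {
  let bytes = 0;
  for (const ch of s) {
    const cp = ch.codePointAt(0)!;
    bytes += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
  }
  return bytes;
}

// Frames each chunk as "<size in hex>\r\n<data>\r\n" and appends the last chunk "0\r\n\r\n".
function encodeChunked(chunks: string[]): string {
  let body = "";
  for (const chunk of chunks) {
    if (chunk.length === 0) continue; // a zero-size chunk would end the body early
    body += `${utf8Length(chunk).toString(16)}\r\n${chunk}\r\n`;
  }
  return body + "0\r\n\r\n";
}
```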

Under normal circumstances, completing these steps enables Lambda response streaming in a function. However, this is not sufficient for web application scenarios. Web applications must often send custom HTTP response status codes, custom HTTP headers, and some cookie data to the client. The previous steps only achieve streaming of the response body, and cannot add content to the response’s HTTP headers.

To add these, when sending the response content to the Response API, you must:

Add a Content-Type HTTP header that sets the media type (MIME type) of the response to application/vnd.awslambda.http-integration-response.
Send the custom response metadata, such as the HTTP status code, custom headers, and cookies, in JSON format.
Send 8 NULL characters as separators.
Send the response content encoded using the HTTP 1.1 Chunked Transfer Encoding protocol.

Here is an example of the response format:

```
POST http://${AWS_LAMBDA_RUNTIME_API}/runtime/invocation/${AwsRequestId}/response
Lambda-Runtime-Function-Response-Mode: streaming
Transfer-Encoding: chunked
Content-Type: application/vnd.awslambda.http-integration-response

{
    "statusCode":200,
    "headers":{
        "Content-Type":"text/html",
        "custom-header":"outer space"
    },
    "cookies":[
        "language=xxx",
        "theme=abc"
    ]
}
<8 NULL characters>
<chunked response body>
```
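The metadata prelude is plain JSON followed by the 8-NUL separator. The following TypeScript sketch builds that prelude and splits it back apart; `buildPrelude`, `splitPrelude`, and `ResponseMetadata` are hypothetical names, and the fields mirror the example response. Because `JSON.stringify` escapes control characters (a NUL becomes `\u0000` in the output text), the JSON itself can never contain the raw separator, so splitting on its first occurrence is safe.

```typescript
interface ResponseMetadata {
  statusCode: number;
  headers?: Record<string, string>; // single-value headers only
  cookies?: string[];
}

const PRELUDE_SEPARATOR = "\u0000".repeat(8);

// JSON metadata followed by 8 NUL characters that separate it from the body.
function buildPrelude(meta: ResponseMetadata): string {
  return JSON.stringify(meta) + PRELUDE_SEPARATOR;
}

// Splits a received payload back into metadata and body (useful for testing).
function splitPrelude(payload: string): { meta: ResponseMetadata; body: string } {
  const idx = payload.indexOf(PRELUDE_SEPARATOR);
  if (idx < 0) throw new Error("separator not found");
  return { meta: JSON.parse(payload.slice(0, idx)), body: payload.slice(idx + PRELUDE_SEPARATOR.length) };
}
```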

In Lambda Function URLs, multi-value HTTP headers are not supported. As a result, you cannot implement responses with multi-value HTTP headers in the Lambda Web Adapter.

Using response streaming with Lambda Web Adapter

When packaging Lambda functions using the zip format, you must attach the Lambda Web Adapter as a layer and configure the environment variable AWS_LAMBDA_EXEC_WRAPPER with the value /opt/bootstrap.

After that, you can configure the startup script of the web application as the Lambda function’s handler. By doing this, the function is able to use the Lambda Web Adapter, and the web application can be launched and run within the Lambda runtime environment.

The AWS_LAMBDA_EXEC_WRAPPER environment variable points to the bootstrap script provided by the Web Adapter to ensure the proper execution of the web application.

When using a Docker Image or OCI Image to package the Lambda function, you only need to include the Lambda Web Adapter binary package in the Dockerfile by copying it to the /opt/extensions directory within the image. Additionally, you should specify the port on which the web application listens by setting the PORT environment variable:

```
COPY --from=public.ecr.aws/awsguru/aws-lambda-adapter:0.7.0 /lambda-adapter /opt/extensions/lambda-adapter
ENV PORT=3000
```

By default, the Web Adapter is invoked using the buffered mode. To use response streaming as the invocation mode in the function, you must configure an environment variable. Specify the function’s Web Adapter invocation mode as response_stream:

```
ENV AWS_LWA_INVOKE_MODE=response_stream
```

Due to the different data formats between the buffered and response stream invocation modes, you must configure the AWS_LWA_INVOKE_MODE to have the same behavior as the InvokeMode specified in the Lambda Function URLs. Otherwise, the client may not process the response content correctly.
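A deploy-time guard can catch a mismatch between the two settings. In this TypeScript sketch, `checkInvokeMode` is hypothetical; it maps the Web Adapter's `AWS_LWA_INVOKE_MODE` values (`buffered`, the default, and `response_stream`) to the function URL `InvokeMode` values (`BUFFERED` and `RESPONSE_STREAM`) and returns an error message when they disagree.

```typescript
type FunctionUrlInvokeMode = "BUFFERED" | "RESPONSE_STREAM";

const expectedUrlMode: Record<string, FunctionUrlInvokeMode | undefined> = {
  buffered: "BUFFERED",
  response_stream: "RESPONSE_STREAM",
};

// Returns an error message if the settings disagree, or undefined if they match.
function checkInvokeMode(lwaInvokeMode: string | undefined, urlInvokeMode: FunctionUrlInvokeMode): string | undefined {
  const adapterMode = lwaInvokeMode ?? "buffered"; // the Web Adapter defaults to buffered mode
  const expected = expectedUrlMode[adapterMode];
  if (expected === undefined) return `unknown AWS_LWA_INVOKE_MODE: ${adapterMode}`;
  if (expected !== urlInvokeMode) {
    return `AWS_LWA_INVOKE_MODE=${adapterMode} requires function URL InvokeMode ${expected}, got ${urlInvokeMode}`;
  }
  return undefined;
}
```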

Lambda response streaming example

Server-side rendering (SSR) can accelerate the loading time of a React application. With SSR, the server generates the HTML pages and sends them to the client, which renders the content. The browser then runs the hydration process, which “wakes up” the static components in the received HTML and mounts them into the React application. This allows for a faster response to user interactions and improves the overall user experience.

By using Lambda response streaming, your application can achieve a faster time to first byte (TTFB) by sending response content in sequential chunks. This reduces the time it takes for the initial data to be sent from the server to the client, enhancing overall performance.

The hydration process can introduce delays as the client-side JavaScript must re-render and rehydrate the page after the initial load. Lambda response streaming minimizes the need for full page hydration, leading to an improved user experience.

Next.js 13’s support for streaming with suspense complements the Lambda response streaming feature, allowing you to use both SSR and selective hydration. This combination can lead to greater improvements in performance and user experience for your Next.js applications.

This GitHub repo demonstrates a Next.js application that supports Lambda response streaming using the Web Adapter and the streaming with suspense feature. Use AWS Serverless Application Model (AWS SAM) to deploy the application to test these optimizations:

```
git clone git@github.com:aws-samples/lwa-nextjs-response-streaming-example.git
cd lwa-nextjs-response-streaming-example

sam build
sam deploy -g --stack-name lambda-web-adapter-nextjs-response-streaming-example
```

After the sam deploy process is completed, you can access the Lambda Function URLs endpoint provided in the output. Here is the output of the Lambda response streaming Next.js application demo:

Example output

Quotas and pricing

Web Adapter is an enhancement to Lambda and does not incur additional costs. You are only charged for the Lambda function usage based on the resources consumed.

However, response streaming may result in additional network costs. You are billed for any part of the response that exceeds 6MB. For more information, refer to the pricing page.

There is a maximum response size limit of 20MB for Lambda response streaming. This is a soft limit, and you can request to increase this limit by creating a support ticket.

The response speed of Lambda response streaming depends on the size of the response body. The transfer rate for the first 6MB is not limited, but any part of the response beyond 6MB has a maximum throughput of 2MB/s. For more detailed information, refer to Bandwidth limits for response streaming.
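As a rough worked example of these limits, the following TypeScript sketch takes a response size in MB and returns how much of it falls beyond the first 6MB (the part that is billed and capped at 2MB/s) and the minimum time needed to transfer that part. The function name is hypothetical, the constants come from the figures in this section, and real transfer times also depend on network conditions and how fast the function produces data.

```typescript
const UNTHROTTLED_MB = 6;     // first 6MB: no bandwidth cap
const THROTTLED_MB_PER_S = 2; // maximum throughput beyond the first 6MB
const DEFAULT_MAX_MB = 20;    // default (soft) maximum streamed response size

function streamingEstimate(responseMb: number, maxMb = DEFAULT_MAX_MB) {
  if (responseMb > maxMb) {
    throw new Error(`response of ${responseMb}MB exceeds the ${maxMb}MB limit`);
  }
  const beyondMb = Math.max(0, responseMb - UNTHROTTLED_MB);
  return {
    billedBeyond6Mb: beyondMb,                            // part of the response billed for streaming
    minSecondsForThrottledPart: beyondMb / THROTTLED_MB_PER_S,
  };
}
```

For a 20MB response, 14MB fall beyond the first 6MB, which takes at least 7 seconds at 2MB/s; a 4MB response stays entirely within the unthrottled portion.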

Conclusion

Lambda response streaming can improve the TTFB for web pages. With the support of AWS Lambda Web Adapter, developers can more easily package web applications that support Lambda response streaming, enhancing the user experience and performance metrics of their web applications.

For more serverless learning resources, visit Serverless Land.