Tag Archives: contributed

AWS Lambda: Resilience under-the-hood

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/compute/aws-lambda-resilience-under-the-hood/

This post is written by Adrian Hornsby (Principal System Dev Engineer) and Marcia Villalba (Principal Developer Advocate).

AWS Lambda comprises over 80 services working together to provide the serverless compute service that it offers to customers. Under the hood, many of these services are built on top of Amazon Elastic Compute Cloud (Amazon EC2) instances, provisioned within Availability Zones. However, AWS Lambda is a Regional service. This means that customers consume Lambda at the Region level, and its underlying services are designed to be resilient to impairments in the Availability Zones beneath them.

This blog post discusses how a Regional service such as Lambda takes advantage of Availability Zones and static stability to achieve its high availability target, and shows how Lambda teams verify their service’s static stability using AWS Fault Injection Simulator (AWS FIS). It also outlines a solution you can build with FIS, Amazon CloudWatch, and Amazon Route 53 Application Recovery Controller (Route 53 ARC) to apply the same resiliency strategy to your own applications.

The role of Availability Zones

Availability Zones are physically isolated sections of an AWS Region, designed to operate but also fail independently. They are separated by a meaningful distance from each other, up to 100 kilometers (60 miles), to prevent correlated failures, but close enough to use synchronous replication with single-digit millisecond latency.

Customers and AWS services have been using Availability Zones for years to build highly available, fault-tolerant, and scalable applications. In particular, AWS Regional services such as AWS Lambda, Amazon DynamoDB, Amazon Simple Queue Service (Amazon SQS), and Amazon Simple Storage Service (Amazon S3) have achieved their high availability promises by spreading multiple independent replicas of their services across multiple Availability Zones. These services use the independence and redundancy of Availability Zones to maximize their overall availability.

Each replica is called a zonal replica. The system is designed so that any of the replicas can fail at any time. When a replica fails, it can be temporarily removed from the system until everything works as expected again. When that happens, the load is shared between the remaining zonal replicas.

Designing for failures

One lesson we have learned at AWS when building services is that, when there is an Availability Zone impairment, it is better not to rely on control plane operations to remediate the failure. A control plane operation can, for example, be provisioning more capacity in an Availability Zone that is not affected by the impairment.

This principle is called static stability, and it describes the capability for a system to keep its original steady-state (or behavior) even when subjected to disruptive events without having to make any changes. A statically stable service should have as few dependencies as possible for its recovery process.

For a Regional service like AWS Lambda, this means that the remaining capacity in the healthy Availability Zones can absorb the traffic from a potentially impaired Availability Zone without having to scale up. This implies over-provisioning resources in all Availability Zones. Having that extra capacity pre-provisioned helps Lambda achieve its static stability. It is a tradeoff between the cost of over-provisioning resources and service availability. Since AWS Lambda promises high availability to its customers, with a monthly uptime service commitment of 99.95%, that tradeoff falls towards service availability.

How to prepare for failures

Preparing for an Availability Zone impairment is difficult because the symptoms and size of the impact can vary widely. An Availability Zone may be anywhere from partially accessible to totally unreachable. Causes of an impairment include fiber cuts, power issues, overheating, hardware malfunctions, networking problems, capacity issues, and other unexpected situations. While these events do happen, they happen rarely. The most common categories of failures are bad deployments and bad configurations.

While some of these failures can be difficult to infer or reproduce, common symptoms include disruption of connectivity, increased latency, increased traffic due to retry storms, increased CPU and memory usage, and slow I/O.

At AWS, we learned to expect the unexpected and plan for failure. This means injecting faults into the system to reproduce some of the common symptoms of Availability Zone impairments, observing how the system responds, and implementing improvements. In addition, injecting faults into the system helps uncover potential monitoring and alarming blind spots, and gives teams an opportunity to practice and improve their response to events with a focus on reducing time to recovery.

How Lambda tests its response to an Availability Zone impairment

Lambda’s approach to being resilient to Availability Zone impairments is to rely on static stability and automated systems. Humans are slower than machines at detecting and mitigating issues. Therefore, Lambda must ensure that its services can detect issues within a zonal replica and remediate them automatically within minutes, with no operator intervention. This auto-remediation is done by shifting customer traffic away from the affected Availability Zone to healthy ones, and it is called Availability Zone evacuation.

To do this, Lambda built a tool that detects failures and performs the Availability Zone evacuation when needed. This tool does a statistical comparison of metrics between different Availability Zones and EC2 instances in order to identify unhealthy Availability Zones. If an Availability Zone is found to have issues, the tool starts the evacuation out of the unhealthy Availability Zone automatically. This automation cuts the time to the first action from 30 minutes to less than 3 minutes.

How AWS Lambda uses AWS FIS

To verify that the automation continues to work as expected, Lambda performs a wide variety of tests, including Availability Zone failure testing in its pre-production environment. The main objective of these tests is to verify that the services are statically stable in the presence of Availability Zone impairments, and that the Availability Zone evacuation can be successfully initiated. The benefit of an automated test is that teams can repeat it regularly and don’t need special skills. One click is all it takes to launch the test.

For these tests, Lambda uses AWS FIS to inject faults into its large fleet of EC2 instances. It uses AWS FIS with support of the AWS Systems Manager (SSM) agent and resource filters to target its fleet of EC2 instances in a particular Availability Zone. This is a versatile approach that can inject resource faults, such as CPU and memory exhaustion, and networking faults, such as packet latency, loss, or drop.

Injecting packet loss or latency is very important, since these symptoms can have a serious impact on application and network performance. Indeed, latency and loss, even in small quantities, can create inefficiencies and prevent applications from running at their peak performance. For Lambda, being able to detect increased latency or loss before it affects customers is critical.
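
To reproduce this kind of fault yourself, you can create an FIS experiment template that injects network latency on EC2 instances in a single Availability Zone through the AWSFIS-Run-Network-Latency SSM document. The following is a minimal sketch: the tag, Availability Zone, account ID, and role ARN are placeholder assumptions for illustration, not values used by the Lambda team.

# Create an experiment template that adds 200 ms of latency for 5 minutes
# to all instances of one zonal replica in us-east-1a (illustrative values).
cat > latency-template.json <<'EOF'
{
  "description": "Inject network latency on instances in one Availability Zone",
  "targets": {
    "az-instances": {
      "resourceType": "aws:ec2:instance",
      "resourceTags": { "Service": "my-zonal-replica" },
      "filters": [
        { "path": "Placement.AvailabilityZone", "values": ["us-east-1a"] }
      ],
      "selectionMode": "ALL"
    }
  },
  "actions": {
    "inject-latency": {
      "actionId": "aws:ssm:send-command",
      "parameters": {
        "documentArn": "arn:aws:ssm:us-east-1::document/AWSFIS-Run-Network-Latency",
        "documentParameters": "{\"DurationSeconds\": \"300\", \"DelayMilliseconds\": \"200\", \"Interface\": \"eth0\"}",
        "duration": "PT5M"
      },
      "targets": { "Instances": "az-instances" }
    }
  },
  "stopConditions": [{ "source": "none" }],
  "roleArn": "arn:aws:iam::123456789012:role/fis-experiment-role"
}
EOF
aws fis create-experiment-template --cli-input-json file://latency-template.json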

How to recover your applications rapidly from Availability Zones failures

You can build a similar solution to rapidly recover your applications from a zonal failure. The solution must have a mechanism to evacuate an impaired Availability Zone, a monitoring system that allows you to detect when a zonal replica is impaired, and a way to test the static stability of your system. AWS provides many tools and services that can help you build this solution to achieve Lambda’s resiliency strategy.

For performing Availability Zone evacuation, you can use the new zonal shift capability from Route 53 ARC, which at the time of writing is in preview. Zonal shift lets you evacuate an Availability Zone for applications that use Elastic Load Balancing. If you find that a zonal replica is impaired or unhealthy, you can use zonal shift to evacuate the Availability Zone for a period of time, while the issue is fixed.
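
For example, starting a zonal shift for an Application Load Balancer with the AWS CLI could look like the following sketch; the load balancer ARN and zone ID are placeholders:

# Shift traffic away from one Availability Zone for 30 minutes
aws arc-zonal-shift start-zonal-shift \
  --resource-identifier arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/app/my-alb/1234567890abcdef \
  --away-from use1-az1 \
  --expires-in 30m \
  --comment "Shifting traffic away from impaired AZ"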

For performing the zonal shift, you must detect when a zonal replica is unhealthy. Your application must provide a signal of its health per Availability Zone. There are two common ways to capture this signal. The first is passive: check metrics such as response times, HTTP status codes, and other indicators that can help track fatal errors in your applications. The second is active: use synthetic monitoring, which creates synthetic requests against your production application to provide a more complete view of the customer experience.

Amazon CloudWatch Synthetics provides canaries, which are scripts that run on a schedule and perform synthetic requests in your application endpoints and APIs. Canaries perform the same actions as customers and continuously verify the customer experience. You can create a canary for each zonal replica of your application and monitor the results independently.
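
As a sketch, the following command creates one canary; you would create one per zonal replica, with the endpoint for that Availability Zone baked into the canary script. The bucket names, role ARN, and runtime version here are illustrative assumptions:

# Create a canary that probes the us-east-1a replica every minute
aws synthetics create-canary \
  --name my-app-use1-az1 \
  --code S3Bucket=my-canary-code,S3Key=canary.zip,Handler=canary.handler \
  --artifact-s3-location s3://my-canary-artifacts/use1-az1 \
  --execution-role-arn arn:aws:iam::123456789012:role/my-canary-role \
  --schedule Expression="rate(1 minute)" \
  --runtime-version syn-nodejs-puppeteer-6.2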

With this information, if the user experience degrades in one of the replicas, you can start an Availability Zone evacuation using zonal shift and minimize the impact on users while you find and fix the source of the failure.

To ensure that you can successfully recover from a failure, you must test the solution in advance. Without testing, it is just an assumption. To prove or disprove your assumptions about your system’s capability to handle disruptive events, such as issues within an Availability Zone, you can use FIS.

With FIS, you can inject faults simultaneously in multiple resources within the same failure domain, such as Availability Zones. FIS currently integrates with several AWS services including EC2, Amazon Elastic Kubernetes Service (Amazon EKS), Amazon Elastic Container Service (Amazon ECS), Amazon Relational Database Service (Amazon RDS), AWS Networking, and CloudWatch.

Typical use cases for testing a workload’s resilience to Availability Zone impairment include terminating all compute resources and databases within a particular Availability Zone, injecting latency or packet loss, increasing resource consumption (CPU, memory, and I/O) in compute resources in a particular Availability Zone, or impacting network communication within or between Availability Zones.

For more information and a step-by-step example of how to recover rapidly from application failures in a single Availability Zone and testing it with AWS FIS, read this blog post.

Conclusion

This article discusses static stability, a mechanism that is used by AWS services such as Lambda to build resilient Regional services. It also discusses how AWS takes advantage of the same services and infrastructure as customers. It shows how Lambda uses multiple Availability Zones and services like AWS FIS to build highly available services and improve its recovery time from unexpected failures to only a few minutes without human intervention. Finally, it shows a solution that you can implement for your applications to achieve Lambda’s resilience strategy.

To learn more about AWS FIS, there are many tutorials and a workshop you can check out.

For more serverless learning resources, visit Serverless Land.

Processing geospatial IoT data with AWS IoT Core and the Amazon Location Service

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/compute/processing-geospatial-iot-data-with-aws-iot-core-and-the-amazon-location-service/

This post is written by Swarna Kunnath (Cloud Application Architect), and Anand Komandooru (Sr. Cloud Application Architect).

This blog post shows how to republish messages that arrive from Internet of Things (IoT) devices across AWS accounts using a replatforming approach. A replatforming approach minimizes changes to the core application architecture, allowing an organization to reduce risk and meet business needs more quickly. In this post, you also learn how to track an IoT device’s location using the Amazon Location Service.

The example used in this post relates to an aviation company that has airplanes equipped with line-replaceable unit devices, or transponders. Transponders are IoT devices that send airplane geospatial data (location and altitude) to the AWS IoT Core service. The company’s airplane transponders send location data to the AWS IoT Core service provisioned in an existing AWS account (source account). The existing solution required manual intervention to track the airplane locations sent by the transponders.

The company must rearchitect the application due to an internal reorganization. As part of the rearchitecture, the business decides to enhance the application to process the transponder messages in another AWS account (destination account). In addition, the business needs full automation of the airplane location tracking process, to minimize the risk of the application changes, and to deliver the changes quickly.

Solution overview

The high-level solution republishes the IoT messages from the source account to the destination account using AWS IoT Core, Amazon SQS, and AWS Lambda, and integrates the application with Amazon Location Service. IoT messages are replicated to an IoT topic in the destination account for downstream processing, minimizing changes to the original application architecture. Integration with Amazon Location Service automates the process of device location tracking and alert generation.

The AWS IoT platform allows you to connect your internet-enabled devices to the AWS Cloud via MQTT, HTTP, or WebSocket protocol. Once connected, the devices send data to the MQTT topics. Data ingested on MQTT topics is routed into AWS services (Amazon S3, SQS, Amazon DynamoDB, and Lambda) by configuring rules in the AWS IoT Rules Engine. The AWS IoT Rules Engine offers ways to define queries to format and filter messages published by these devices, and supports integration with several other AWS services as targets.
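
For example, a rule that forwards every message from a transponder topic to an SQS queue in another account could look like this sketch; the topic name, queue URL, and role ARN are placeholders, and the role must allow sqs:SendMessage on the destination queue:

# Route all messages from the transponder topic to a cross-account SQS queue
aws iot create-topic-rule \
  --rule-name RepublishToDestinationSqs \
  --topic-rule-payload '{
    "sql": "SELECT * FROM '\''device/transponder/data'\''",
    "awsIotSqlVersion": "2016-03-23",
    "actions": [{
      "sqs": {
        "queueUrl": "https://sqs.us-east-1.amazonaws.com/222222222222/destination-queue",
        "roleArn": "arn:aws:iam::111111111111:role/iot-to-sqs-role",
        "useBase64": false
      }
    }]
  }'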

The Amazon Location Service lets you add geospatial capabilities to your applications, such as maps, points of interest, geocoding, routing, geofences, and tracking. A tracker with a linked geofence collection tracks the location of the device based on the geospatial data in the published IoT messages. Amazon Location Service generates enter and exit events and integrates with Amazon EventBridge and Amazon Simple Notification Service (Amazon SNS) to generate alerts based on defined filters in EventBridge rules.

The solution in this post delivers high availability, scalability, and cost efficiency by using serverless and managed services. The serverless services used by this solution also provide automatic scaling and built-in high availability. Integrating Amazon Location Service with AWS IoT and EventBridge helps to automate the auditing and processing of geospatial messages.

Solution architecture

These steps describe an end-to-end sequence of events:

  1. An IoT device (a transponder in an airplane) publishes a message to the AWS IoT Core service in the source account.
  2. The message arrives at an AWS IoT Core topic in the source account.
  3. AWS IoT Rules Engine receives the message and processes it, using IoT rules attached to the corresponding topic in the source account.
  4. An AWS IoT rule replicates the message to an SQS queue in the destination account.
  5. A Lambda function in the destination account polls the SQS queue and publishes received messages in batches to the destination account IoT topic.
  6. A Location action configured on the destination IoT rule sends the messages from the IoT topic to the Amazon Location Service tracker.
  7. An Amazon Location tracker sends events when an IoT device enters or exits a linked geofence.
  8. EventBridge receives these events and, via the configured event rule, sends out SNS notifications for the configured devices.

Prerequisites

This example has the following prerequisites:

  1. Access to the AWS services mentioned in this blog post within two AWS accounts.
  2. A local install of AWS SAM CLI to build and deploy the sample code.

Solution walkthrough

To deploy this solution, first deploy the IoT components via the AWS Serverless Application Model (AWS SAM) in the source and destination accounts. Afterwards, configure the Amazon Location Service resources in the destination account. To learn more, visit the AWS SAM deployment documentation.

Deploying the code

Deploy the AWS SAM templates from the sample repository in order: first in the source account, then in the destination account.

To build and deploy the code, run:

sam build --template <TemplateName>.yaml
sam deploy --guided

Configuring a tracker

Amazon Location trackers receive device location updates and store them, so that you can retrieve current and historical locations for your devices.

Using Amazon Location trackers and Amazon Location geofences together, you can automatically evaluate the location updates from your IoT devices against your geofences and generate geofence events. You can then act on these events to generate alerts for your areas of interest.

  1. Follow the instructions in the documentation to create the tracker resource from the AWS Management Console. Use this information for the new tracker:
    • Name: Enter a unique name that has a maximum of 100 characters. For example, FlightTracker.
    • Description: Enter an optional description. For example, Tracker for storing device positions.
  2. Configure a Location action on the destination IoT rule that receives messages from the destination IoT topic and publishes them in batches to the configured tracker (for example, FlightTracker). The parameters that the Location action reads from the JSON message payload can be configured via substitution templates, as sketched below.
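
A sketch of such a Location action in the rule payload, with hypothetical names, where the substitution templates read the device ID and coordinates from the message:

{
  "location": {
    "roleArn": "arn:aws:iam::222222222222:role/iot-to-location-role",
    "trackerName": "FlightTracker",
    "deviceId": "${deviceId}",
    "latitude": "${latitude}",
    "longitude": "${longitude}",
    "timestamp": { "unit": "MILLISECONDS", "value": "${timestamp()}" }
  }
}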

Geofence collection

A geofence is a closed boundary, formed by a set of vertices, that defines an area of interest, such as flight origin and destination areas. You can use tools, such as GeoJSON.io, to draw geofences and save the output as a GeoJSON file. Follow the instructions in the documentation to create the GeoJSON file and link it to the geofence collection.

  1. Create the geofence collection with a GeoJSON file and link it to the tracker you just created.
  2. Link the tracker to the geofence collection by following these instructions and start tracking the device’s location updates. Once linked, location updates are automatically evaluated against all of the geofences in the collection. You can also evaluate device positions against geofences on demand, as in the CLI sketch after this list.
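
With the AWS CLI, creating the collection, adding a geofence from GeoJSON coordinates, and linking the tracker could look like this sketch (resource names match the examples above; the polygon coordinates are illustrative):

# Create the geofence collection and add one polygon geofence
aws location create-geofence-collection --collection-name FlightGeofences
aws location batch-put-geofence \
  --collection-name FlightGeofences \
  --entries '[{"GeofenceId": "DestinationAirport", "Geometry": {"Polygon": [[[-122.41,37.77],[-122.39,37.77],[-122.39,37.79],[-122.41,37.79],[-122.41,37.77]]]}}]'

# Link the tracker to the collection so updates are evaluated automatically
aws location associate-tracker-consumer \
  --tracker-name FlightTracker \
  --consumer-arn arn:aws:geo:us-east-1:222222222222:geofence-collection/FlightGeofences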

When device positions are evaluated against geofences, they generate events, for example, when a plane enters or exits an area specified in the geofence.

You can configure EventBridge with rules to react to these events. You can set up SNS to notify your clients when a specific tracker device location changes. Follow the instructions in the documentation on how to set up EventBridge rules to integrate with Amazon Location Service events.
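
A sketch of such a rule and target with the AWS CLI; the SNS topic ARN is a placeholder:

# Match Amazon Location geofence events and fan them out to SNS
aws events put-rule \
  --name geofence-events \
  --event-pattern '{"source": ["aws.geo"], "detail-type": ["Location Geofence Event"]}'
aws events put-targets \
  --rule geofence-events \
  --targets 'Id=sns-alerts,Arn=arn:aws:sns:us-east-1:222222222222:geofence-alerts'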

Testing the solution

You can test the first part of the solution by sending an IoT message with location details in the JSON format from the source account and verify that the message arrives at the destination account SQS queue. Detailed instructions to publish a test message from the source account that includes location information (latitude and longitude) can be found here.

Messages from the destination account SQS queue are published to the Amazon Location Service Tracker. When the location in the test message matches the criteria provided in the geofence, Amazon Location Service generates an event. EventBridge has a rule configured that gets matched when an Amazon Location tracker event arrives, and the rule target is an SNS topic that sends an email or text message to the client.

Cleaning up

To avoid incurring future charges, delete the CloudFormation stacks, location tracker, and geofence collection created as part of the solution walk-through. Replace the resource identifiers in the following commands with the ID/name of the resources.

  1. Delete the SAM application stack:
    aws cloudformation delete-stack --stack-name <StackName>
    

    Refer to this documentation for further information.

  2. Delete the location tracker:
    aws location delete-tracker --tracker-name <TrackerName>
  3. Delete the geofence collection:
    aws location delete-geofence-collection --collection-name <GeoCollectionName>

Conclusion

This blog post shows how to create a serverless solution for cross-account IoT message publishing and tracking device location updates using Amazon Location Service.

It describes the process of how to publish AWS IoT messages across multiple accounts. Integration with the Amazon Location Service shows how to track IoT device location updates and generate alerts, alleviating the need for manual device location tracking.

For more serverless learning resources, visit Serverless Land.

Introducing maximum concurrency of AWS Lambda functions when using Amazon SQS as an event source

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/introducing-maximum-concurrency-of-aws-lambda-functions-when-using-amazon-sqs-as-an-event-source/

This blog post is written by Solutions Architects John Lee and Jeetendra Vaidya.

AWS Lambda now provides a way to control the maximum number of concurrent functions invoked by Amazon SQS as an event source. You can use this feature to control the concurrency of Lambda functions processing messages in individual SQS queues.

This post describes how to set the maximum concurrency of SQS triggers when using SQS as an event source with Lambda. It also provides an overview of the scaling behavior of Lambda using this architectural pattern, challenges this feature helps address, and a demo of the maximum concurrency feature.

Overview

Lambda uses an event source mapping to process items from a stream or queue. The event source mapping reads from an event source, such as an SQS queue, optionally filters the messages, batches them, and invokes the mapped Lambda function.

The scaling behavior for Lambda integration with SQS FIFO queues is simple. A single Lambda function processes batches of messages within a single message group to ensure that messages are processed in order.

For SQS standard queues, the event source mapping polls the queue to consume incoming messages, starting at five concurrent batches with five functions at a time. As messages are added to the SQS queue, Lambda continues to scale out to meet demand, adding up to 60 functions per minute, up to 1,000 functions, to consume those messages. To learn more about Lambda scaling behavior, read ”Understanding how AWS Lambda scales with Amazon SQS standard queues.”

Lambda processing standard SQS queues

Challenges

When a large number of messages are in the SQS queue, Lambda scales out, adding additional functions to process the messages. The scale out can consume the concurrency quota in the account. To prevent this from happening, you can set reserved concurrency for individual Lambda functions. This ensures that the specified Lambda function can always scale to that much concurrency, but it also cannot exceed this number.

When the Lambda function concurrency reaches the reserved concurrency limit, the queue configuration specifies the subsequent behavior. The message is returned to the queue and retried based on the redrive policy, expires based on its retention policy, or is sent to an SQS dead-letter queue (DLQ). While sending unprocessed messages to a DLQ is a good option to preserve messages, it requires a separate mechanism to inspect and process messages from the DLQ.
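
For example, to send messages to a DLQ after a single failed receive, you could configure the queue’s redrive policy as in this sketch (the queue URL and DLQ ARN are placeholders):

# Route messages to the DLQ after one failed receive
aws sqs set-queue-attributes \
  --queue-url https://sqs.us-east-1.amazonaws.com/123456789012/my-queue \
  --attributes '{"RedrivePolicy": "{\"deadLetterTargetArn\": \"arn:aws:sqs:us-east-1:123456789012:my-dlq\", \"maxReceiveCount\": \"1\"}"}'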

The following example shows a Lambda function reaching its reserved concurrency quota of 10.

Lambda reaching reserved concurrency of 10.

Maximum Lambda concurrency with SQS as an event source

The launch of maximum concurrency for SQS as an event source allows you to control Lambda function concurrency per source. You set the maximum concurrency on the event source mapping, not on the Lambda function.

This event source mapping setting does not change the scaling or batching behavior of Lambda with SQS. You can continue to batch messages with a customized batch size and window. Rather, it sets a limit on the maximum number of concurrent function invocations per SQS event source. Once Lambda scales and reaches the maximum concurrency configured on the event source, Lambda stops reading more messages from the queue. This feature also provides you with the flexibility to define the maximum concurrency for individual event sources when the Lambda function has multiple event sources.

Maximum concurrency is set to 10 for the SQS queue.

This feature can help prevent a Lambda function from consuming all available Lambda concurrency of the account and avoids messages returning to the queue unnecessarily because of Lambda functions being throttled. It provides an easier way to control and consume messages at a desired pace, controlled by the maximum number of concurrent Lambda functions.

The maximum concurrency setting does not replace the existing reserved concurrency feature. Both serve distinct purposes and the two features can be used together. Maximum concurrency can help prevent overwhelming downstream systems and unnecessary throttled invocations. Reserved concurrency guarantees a maximum number of concurrent instances for the function.

When used together, the Lambda function can have its own allocated capacity (reserved concurrency), while being able to control the throughput for each event source (maximum concurrency). When using the two features together, you must set the function reserved concurrency higher than the maximum concurrency on the SQS event source mapping to prevent throttling.
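
A sketch of configuring both together with the AWS CLI, assuming a hypothetical function name and an existing event source mapping UUID:

# Reserve concurrency for the function (its hard ceiling)
aws lambda put-function-concurrency \
  --function-name my-function \
  --reserved-concurrent-executions 10

# Cap this SQS event source below the reserved value
aws lambda update-event-source-mapping \
  --uuid a1b2c3d4-5678-90ab-cdef-11111EXAMPLE \
  --scaling-config MaximumConcurrency=5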

Setting maximum concurrency for SQS as an event source

You can configure the maximum concurrency for an SQS event source through the AWS Management Console, AWS Command Line Interface (CLI), or infrastructure as code tools such as AWS Serverless Application Model (AWS SAM). The minimum supported value is 2 and the maximum value is 1000. Refer to the Lambda quotas documentation for the latest limits.

Configuring the maximum concurrency for an SQS trigger in the console

You can set the maximum concurrency through the create-event-source-mapping AWS CLI command.

aws lambda create-event-source-mapping --function-name my-function --event-source-arn arn:aws:sqs:us-east-2:123456789012:my-queue --scaling-config MaximumConcurrency=2

Seeing the maximum concurrency setting in action

The following demo compares how Lambda receives and processes messages when using maximum concurrency versus reserved concurrency.

This GitHub repository contains an AWS SAM template that deploys the following resources:

  • ReservedConcurrencyQueue (SQS queue)
  • ReservedConcurrencyDeadLetterQueue (SQS queue)
  • ReservedConcurrencyFunction (Lambda function)
  • MaxConcurrencyQueue (SQS queue)
  • MaxConcurrencyDeadLetterQueue (SQS queue)
  • MaxConcurrencyFunction (Lambda function)
  • CloudWatchDashboard (CloudWatch dashboard)

The AWS SAM template provisions two sets of identical architectures and an Amazon CloudWatch dashboard to monitor the resources. Each architecture comprises a Lambda function receiving messages from an SQS queue, and a DLQ for the SQS queue.

The maxReceiveCount is set to 1 for the SQS queues, which sends any returned messages directly to the DLQ. The ReservedConcurrencyFunction has its reserved concurrency set to 5, and the MaxConcurrencyFunction has the maximum concurrency for the SQS event source set to 5.

Prerequisites

Running this demo requires the AWS CLI and the AWS SAM CLI. After installing both CLIs, clone this GitHub repository and navigate to the root of the directory:

git clone https://github.com/aws-samples/aws-lambda-amazon-sqs-max-concurrency
cd aws-lambda-amazon-sqs-max-concurrency

Deploying the AWS SAM template

  1. Build the AWS SAM template to prepare for deployment to your AWS environment:
    sam build
  2. Use the guided deploy command to deploy the resources in your account:
    sam deploy --guided
  3. Give the stack a name and accept the remaining default values. Once deployed, you can track the progress through the CLI or by navigating to the AWS CloudFormation page in the AWS Management Console.
  4. Note the queue URLs from the Outputs tab in the AWS SAM CLI, CloudFormation console, or navigate to the SQS console to find the queue URLs.
The Outputs tab of the launched AWS SAM template provides URLs to the CloudWatch dashboard and SQS queues.

Running the demo

The deployed Lambda function code simulates processing by sleeping for 10 seconds before returning a 200 response. This allows the function to reach a high function concurrency number with only a small number of messages.

To add 25 messages to the Reserved Concurrency queue, run the following commands. Replace <ReservedConcurrencyQueueURL> with your queue URL from the AWS SAM Outputs.

for i in {1..25}; do aws sqs send-message --queue-url <ReservedConcurrencyQueueURL> --message-body testing; done 

To add 25 messages to the Maximum Concurrency queue, run the following commands. Replace <MaxConcurrencyQueueURL> with your queue URL from the AWS SAM Outputs.

for i in {1..25}; do aws sqs send-message --queue-url <MaxConcurrencyQueueURL> --message-body testing; done 

After sending messages to both queues, navigate to the dashboard URL available in the Outputs tab to view the CloudWatch dashboard.

Validating results

Both Lambda functions have the same number of invocations and the same concurrent invocations fixed at 5. The CloudWatch dashboard shows the ReservedConcurrencyFunction experienced throttling and 9 messages, as seen in the top-right metric, were sent to the corresponding DLQ. The MaxConcurrencyFunction did not experience any throttling and messages were not delivered to the DLQ.

CloudWatch dashboard showing throttling and DLQs.

Clean up

To remove all the resources created in this demo, use the delete command and follow the prompts:

sam delete

Conclusion

You can now control the maximum number of concurrent functions invoked by SQS as a Lambda event source. This post explains the scaling behavior of Lambda using this architectural pattern, challenges this feature helps address, and a demo of maximum concurrency in action.

There are no additional charges to use this feature besides the standard SQS and Lambda charges. You can start using maximum concurrency for SQS as an event source with new or existing event source mappings. This feature is available in all Regions where Lambda and SQS are available.

For more serverless learning resources, visit Serverless Land.

Running Next.js applications with serverless services on AWS

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/running-next-js-applications-with-serverless-services-on-aws/

This is written by Julian Bonilla, Senior Solutions Architect, and Matthew de Anda, Startup Solutions Architect.

React is a popular JavaScript library used to create single-page applications (SPAs). React focuses on helping to build UIs, but leaves it up to developers to decide how to accomplish other aspects involved with developing a SPA.

Next.js is a React framework to help provide more structure and solve common application requirements such as routing and data fetching. Next.js also provides multiple types of rendering methods – Static Site Generation (SSG), Server-Side Rendering (SSR), Incremental Static Regeneration (ISR), and Client-Side Rendering (CSR).

This post demonstrates how to build a Next.js application with Serverless services on AWS and explains Next.js Server-Side rendering. To deploy this solution and to provision the AWS resources, you can use either AWS Serverless Application Model (AWS SAM) or AWS Cloud Development Kit (CDK). Both are open-source frameworks to automate AWS deployment. AWS SAM is a declarative framework for building serverless applications and CDK is an imperative framework to define cloud application resources using familiar programming languages.

Overview

To render a Next.js application, you use Amazon S3, Amazon CloudFront, Amazon API Gateway, and AWS Lambda. Static resources are hosted in a private S3 bucket with a CloudFront distribution. Since static resources are generated at build time, CloudFront can serve these files from caches at the network edge instead of from the server.

The Next.js application components that use server-side rendering are rendered by a Lambda function using the AWS Lambda Web Adapter. The CloudFront distribution is configured to forward requests to the API Gateway endpoint, which then calls the Lambda function.

  1. Static files (for example, CSS, JavaScript, and HTML) are mapped to /_next/static/* and /public/*.
  2. Server-side rendering is mapped with default behavior (*).
  3. The AWS Lambda Web Adapter runs the Next.js server produced by Output File Tracing.

What’s Next.js?

Next.js is a React framework that creates a more opinionated approach to building web applications while providing additional structure and features such as Server-Side Rendering and Static Site Generation.

These additional rendering options provide more flexibility over the typical ways a React application is built, which is to render in the client’s browser with JavaScript. This can help in scenarios where customers have JavaScript disabled and can improve search engine optimization (SEO). While you can implement SSR in React applications, Next.js makes it simpler for developers.

Next.js rendering strategies

These are the different rendering strategies offered by Next.js:

  • Static Site Generation generates static resources at build time and is a good rendering strategy for static content that rarely changes and for SEO.
  • Server-Side Rendering generates each page on-demand at request time and is good for pages that are dynamic. Since, from the browser’s perspective, the page is still pre-rendered, like Static Site Generation, it is also good for SEO.
  • Incremental Static Regeneration is a newer rendering strategy that is good for apps with many pages where build times are high. With Incremental Static Regeneration, you can build pages individually without needing to rebuild the entire app.
  • Client-Side Rendering is the typical rendering strategy where the application is rendered in the browser with JavaScript.

Next.js lets you choose the appropriate rendering method page-by-page. When a Next.js application is built, Next.js transforms the application into production-optimized files. You have HTML for statically generated pages, JavaScript for rendering on the server, JavaScript for rendering on the client, and CSS files.

Next.js also supports static HTML export, which has no server-side component. Features that require a server are not supported with this approach. These apps can be hosted from S3 and CloudFront.

The remainder of this post focuses on Static Site Generation and Server-Side Rendering.

Next.js application project structure

Understanding how Next.js structures projects can give insight into how you deploy the application. A page is a React Component exported from files in the “pages” directory. These files are also used for routing, where pages/index.js maps to the / route.

By default, these pages are pre-rendered. Static assets, such as images, are stored under the “public” directory and can be referenced from /. Since these files are best stored in persistent storage and backed by a content delivery network (CDN), you can add a prefix in the implementation to distinguish these static files.

To create dynamic routes, add brackets to a page file – for example, pages/user/[id].js. This creates a statically generated page with the path /user/<id> where <id> can be dynamic.

API routes provide a way to create an API endpoint and are located in the pages/api directory. When building, Next.js generates an optimized version of your application under the .next directory. Static files not stored in the public directory are in .next/static. These static files are expected to be uploaded as _next/static to a CDN.

No other code in the .next directory should be uploaded to a CDN because that would expose server code and other configuration.
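
For example, after a build you might upload the static output to the S3 origin like this sketch; the bucket name is a placeholder:

# Build the app, then sync only the static assets to the CDN origin bucket
npm run build
aws s3 sync .next/static s3://my-nextjs-assets/_next/static
aws s3 sync public/static s3://my-nextjs-assets/public/static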

Implementation

Next.js pre-renders HTML for every page using Static Site Generation or Server-side Rendering. For Static Site Generation, pages are rendered at build time and can be cached in CloudFront. Server-side rendered pages are rendered at request time, and typically fetch data from downstream resources on each request.

Clients connect to a CloudFront distribution, which is configured to forward requests for static resources to S3 and all other requests to API Gateway. API Gateway forwards requests to the Next.js application running on Lambda, which performs the server-side rendering.

At build time, Next.js Output File Tracing determines the minimal set of files needed for deploying to Lambda. The files are automatically copied to a standalone directory when standalone output is enabled in next.config.js.

// next.config.js - enable standalone output so the build emits .next/standalone
const nextConfig = {
  reactStrictMode: true,
  output: 'standalone',
}
module.exports = nextConfig

Since the Next.js application is essentially a webserver, this example uses the AWS Lambda Web Adapter as a Lambda layer to convert incoming events from API Gateway to HTTP requests that Next.js can process.

Once processed, the AWS Lambda Web Adapter converts the HTTP response back to a Lambda event response. The Lambda handler is configured to run the minimal server.js file created by the standalone build step.
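
A sketch of creating such a function with the AWS CLI, where the deployment package contains the standalone build plus a run.sh script that executes node server.js. The layer ARN is a placeholder; use the adapter layer and wrapper setting documented in the AWS Lambda Web Adapter repository:

# Create the SSR function; run.sh inside standalone.zip starts server.js
aws lambda create-function \
  --function-name nextjs-ssr \
  --runtime nodejs18.x \
  --handler run.sh \
  --zip-file fileb://standalone.zip \
  --role arn:aws:iam::123456789012:role/nextjs-lambda-role \
  --layers <lambda-web-adapter-layer-arn> \
  --environment 'Variables={AWS_LAMBDA_EXEC_WRAPPER=/opt/bootstrap,PORT=3000}'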

The CloudFront distribution has two origins: one for the S3 bucket and another for the API Gateway. Two behaviors are created to specify path patterns to route static content produced by Next.js and static resources stored under the public/static directory. Next.js uses the public directory under root to serve static assets such as images.

These assets are then served under /, so if you add public/me.png, it would be served at /me.png. This makes it harder to create a CloudFront behavior for these assets. One workaround is to create a static directory under the public directory and then map it to the CloudFront behavior. The default (*) path pattern behavior has the origin set to API Gateway with caching disabled.

Prerequisites and deployment

Refer to the project in its GitHub repository for instructions to deploy the solution using AWS SAM or AWS CDK. Multiple resources are provisioned for you as part of the deployment, and it takes several minutes to complete. The example Next.js application deployed is created using Create Next App.

Understanding the Next.js Application

To create a new page, you create a file under the pages directory and that creates a route based on the name (e.g. pages/hello.js creates route /hello). To create dynamic routes, create a file following the project’s example of pages/posts/[id].js to produce routes for posts/1, posts/2, and so forth.

For API routes, any file added to the directory pages/api is mapped to /api/* and becomes an API endpoint. These are server-side only bundles hosted by API Gateway and Lambda.

Conclusion

This blog shows how to run Next.js applications using S3, CloudFront, API Gateway, and Lambda. This architecture supports building Next.js applications that can use static-site generation, server-side rendering, and client-side rendering. The blog also covers how you can use open-source frameworks, AWS SAM and CDK, to build and deploy your Next.js applications.

If your organization is looking for a fully managed hosting of your Next.js applications, AWS Amplify Hosting supports Next.js. If interested in learning more about server-side rendering and micro-frontends, see Server-side rendering micro-frontends – the architecture.

For more serverless learning resources, visit Serverless Land.

Architecture patterns for consuming private APIs cross-account

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/architecture-patterns-for-consuming-private-apis-cross-account/

This blog written by Thomas Moore, Senior Solutions Architect and Josh Hart, Senior Solutions Architect.

Amazon API Gateway allows developers to create private REST APIs that are only accessible from a virtual private cloud (VPC). Traffic to the private API uses secure connections and does not leave the AWS network, meaning AWS isolates it from the public internet. This makes private API Gateway endpoints a good fit for publishing internal APIs, such as those used by backend microservice communication.

In microservice architectures, where multiple teams build and manage components, different AWS accounts often consume private API endpoints.

This blog post shows how a service can consume a private API Gateway endpoint that is published in another AWS account securely over AWS PrivateLink.

Consuming API Gateway private endpoint cross-account via AWS PrivateLink.

This blog covers consuming API Gateway endpoints cross-account. For exposing cross-account resources behind an API Gateway, read this existing blog post.

Overview

To access API Gateway private endpoints, you must create an interface VPC endpoint (named execute-api) inside your VPC. This creates an AWS PrivateLink connection between your AWS account VPC and the API Gateway service VPC. The PrivateLink connection allows traffic to flow over private IP address space without traversing the internet.
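
For example, creating the execute-api interface endpoint with the AWS CLI could look like this sketch; the VPC, subnet, and security group IDs are placeholders:

# Create the execute-api interface VPC endpoint for private API access
aws ec2 create-vpc-endpoint \
  --vpc-endpoint-type Interface \
  --vpc-id vpc-0123456789abcdef0 \
  --service-name com.amazonaws.us-east-1.execute-api \
  --subnet-ids subnet-0123456789abcdef0 \
  --security-group-ids sg-0123456789abcdef0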

PrivateLink allows access to private API Gateway endpoints in different AWS accounts, without VPC peering, VPN connections, or AWS Transit Gateway. A single execute-api endpoint is used to connect to any API Gateway, regardless of which AWS account the destination API Gateway is in. Resource policies control which VPC endpoints have access to the API Gateway private endpoint. This makes the cross-account architecture simpler, with no complex routing or inter-VPC connectivity.

The following diagram shows how interface VPC endpoints in a consumer account create a PrivateLink connection back to the API Gateway service account VPC. The resource policy applied to the private API determines which VPC endpoint can access the API. For this reason, it is critical to ensure that the resource policy is correct to prevent unintentional access from other AWS account VPC endpoints.

Access to private API Gateway endpoints requires an AWS PrivateLink connection to an AWS service account VPC.

In this example, the resource policy denies all connections to the private API endpoint unless the aws:SourceVpce condition matches vpce-1a2b3c4d in account A. This means that connections from other execute-api VPC endpoints are denied. To allow access from account B, add vpce-9z8y7x6w to the resource policy. Refer to the documentation to learn about other condition keys you can use in API Gateway resource policies.
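
A sketch of such a resource policy, using the example endpoint IDs above (the Region, account, and API identifiers are placeholders):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "execute-api:Invoke",
      "Resource": "arn:aws:execute-api:us-east-1:111111111111:a1b2c3d4e5/*"
    },
    {
      "Effect": "Deny",
      "Principal": "*",
      "Action": "execute-api:Invoke",
      "Resource": "arn:aws:execute-api:us-east-1:111111111111:a1b2c3d4e5/*",
      "Condition": {
        "StringNotEquals": {
          "aws:SourceVpce": ["vpce-1a2b3c4d", "vpce-9z8y7x6w"]
        }
      }
    }
  ]
}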

For more detail on how VPC links work, read Understanding VPC links in Amazon API Gateway private integrations.

The following sections cover three architecture patterns to consume API Gateway private endpoints cross-account:

  1. Regional API Gateway to private API Gateway
  2. Lambda function calling API Gateway in another account
  3. Container microservice calling API Gateway in another account using mTLS

Regional API Gateway to private API Gateway cross-account

When building microservices in different AWS accounts, private API Gateway endpoints are often used to allow service-to-service communication. Sometimes a portion of these endpoints must be exposed publicly for end user consumption. One pattern for this is to have a central public API Gateway, which acts as the front-door to multiple private API Gateway endpoints. This allows for central governance of authentication, logging and monitoring.

The following diagram shows how to achieve this using a VPC link. VPC links enable you to connect API Gateway integrations to private resources inside a VPC. The API Gateway VPC interface endpoint is the VPC resource that you want to connect to, as this is routing traffic to the private API Gateway endpoints in different AWS accounts.

API Gateway Regional endpoint consuming API Gateway private endpoints cross-account

VPC link requires the use of a Network Load Balancer (NLB). The target group of the NLB points to the private IP addresses of the VPC endpoint, normally one for each Availability Zone. The target group health check must validate the API Gateway service is online. You can use the API Gateway reserved /ping path for this, which returns an HTTP status code of 200 when the service is healthy.
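
For example, the target group and health check could be created like this sketch; the VPC ID and endpoint IP addresses are placeholders:

# Target group for the VPC endpoint IPs, health-checked via /ping
aws elbv2 create-target-group \
  --name execute-api-endpoint-targets \
  --protocol TLS \
  --port 443 \
  --vpc-id vpc-0123456789abcdef0 \
  --target-type ip \
  --health-check-protocol HTTPS \
  --health-check-path /ping

# Register the endpoint's private IP addresses (one per Availability Zone)
aws elbv2 register-targets \
  --target-group-arn <target-group-arn> \
  --targets Id=10.0.1.10 Id=10.0.2.10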

You can deploy this pattern in your own account using the example CDK code found on GitHub.

Lambda function calling private API Gateway cross-account

Another popular requirement is for AWS Lambda functions to invoke private API Gateway endpoints cross-account. This enables service-to-service communication in microservice architectures.

The following diagram shows how to achieve this using interface endpoints for Lambda, which allows access to private resources inside your VPC. This allows Lambda to access the API Gateway VPC endpoint and, therefore, the private API Gateway endpoints in another account.

Consuming API Gateway private endpoints from Lambda cross-account

Unlike the previous example, there is no NLB or VPC link required. The resource policy on the private API Gateway must allow access from the VPC endpoint in the account where the consuming Lambda function is.

As the Lambda function has a VPC attachment, it uses DNS resolution from inside the VPC. This means that if you selected the Enable Private DNS Name option when creating the interface VPC endpoint for API Gateway, the https://{restapi-id}.execute-api.{region}.amazonaws.com endpoint automatically resolves to private IP addresses. Note that this DNS configuration can block access to Regional and edge-optimized API endpoints from inside the VPC. For more information, refer to the knowledge center article.

You can deploy this pattern in your own account using the sample CDK code found on GitHub.

Calling private API Gateway cross-account with mutual TLS (mTLS)

Customers that operate in regulated industries, such as open banking, must often implement mutual TLS (mTLS) for securely accessing their APIs. It is also great for Internet of Things (IoT) applications to authenticate devices using digital certificates.

Mutual TLS (mTLS) verifies both the client and server via certificates with TLS

Regional API Gateway has native support for mTLS but, currently, private API Gateway does not support mTLS, so you must terminate mTLS before the API Gateway. One pattern is to implement a proxy service in the producer account that resolves the mTLS handshake, terminates mTLS, and proxies the request to the private API Gateway over regular HTTPS.

The following diagram shows how to use a combination of PrivateLink, an NGINX-based proxy, and private API Gateway to implement mTLS and consume the private API across accounts.

Consuming API Gateway private endpoints cross-account with mTLS

In this architecture diagram, Amazon ECS Fargate is used to host the container task running the NGINX proxy server. This proxy validates the certificate passed by the connecting client before passing the connection to API Gateway via the execute-api VPC endpoint. The following sample NGINX configuration shows how the mTLS proxy service works by using the ssl_verify_client and ssl_client_certificate settings to verify the connecting client’s certificate, and proxy_pass to forward the request onto API Gateway.

server {
    listen 443 ssl;

    # Server certificate presented to connecting clients
    ssl_certificate     /etc/ssl/server.crt;
    ssl_certificate_key /etc/ssl/server.key;
    ssl_protocols       TLSv1.2;
    ssl_prefer_server_ciphers on;
    ssl_ciphers ECDH+AESGCM:ECDH+AES256:ECDH+AES128:DH+3DES:!ADH:!AECDH:!MD5;

    # Trusted certificate used to verify connecting clients (mTLS)
    ssl_client_certificate /etc/ssl/client.crt;
    ssl_verify_client      on;

    # Forward verified requests to the private API Gateway endpoint
    location / {
        proxy_pass https://{api-gateway-endpoint-api};
    }
}

The connecting client must supply the client certificate when connecting to the API via the VPC endpoint service:

curl --key client.key --cert client.crt --cacert server.crt https://{vpc-endpoint-service-url}

Use VPC security group rules on both the VPC endpoint and the NGINX proxy to prevent clients bypassing the mTLS endpoint and connecting directly to the API Gateway endpoint.

There is an example NGINX config and Dockerfile to configure this solution in the GitHub repository.

Conclusion

This post explores three solutions to consume private API Gateway endpoints across AWS accounts. A key component of all the solutions is the VPC interface endpoint. Using VPC endpoints and PrivateLink, you can securely consume resources, and even your own microservices, across AWS accounts. For more details, read Enabling New SaaS Strategies with AWS PrivateLink. Visit the GitHub repository to get started implementing one of these solutions today.

For more serverless learning resources, visit Serverless Land.

Chaos experiments using AWS Step Functions and AWS Fault Injection Simulator

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/chaos-experiments-using-aws-step-functions-and-aws-fault-injection-simulator/

This post is written by Arunsingh Jeyasingh Jacob, Senior Solutions Architect, and Sindhura Palakodety, Senior Solutions Architect.

To run business-critical applications at scale, it is important to determine the resiliency of the application. Chaos experiments induce controlled failures into a distributed application to gain confidence in the application behavior. The learnings from these experiments can be fed into a continuous feedback cycle to improve the resiliency.

In 2021, AWS launched AWS Fault Injection Simulator (FIS). This is a fully managed service for running Fault Injection experiments on AWS. It makes it easier to improve an application’s performance, observability, and resiliency. With Fault Injection Simulator, AWS customers can quickly set up experiments using pre-built templates that generate the desired disruptions.

Overview

This post uses AWS Step Functions to create and run AWS Fault Injection Simulator (FIS) experiments. You are encouraged to perform these experiments in a test account. Do not use the example in a production environment without making appropriate code changes.

The demo-fis-stepfunctions code deployed in this post is used to build the Step Functions state machine for FIS experiments.

There are two Step Functions workflows that are deployed. One is for chaos testing Amazon EC2 workloads and the other is for Amazon ECS workloads.

The Step Functions workflow for EC2 workloads

The following Step Functions workflow shows an EC2 chaos experiment to stress CPU utilization, and stop and terminate EC2 instances.

Step Functions workflow for Amazon EC2 Fault Injection Experiments

EC2 experiment templates

This workflow runs through FIS experiment template creation followed by the execution of the FIS experiment. The FIS experiment template contains one or more actions to run on specified targets during an experiment. By creating a template, you are not running experiments against any workload, but creating a definition for the experiment.

The EC2 experiment templates created in this workflow are:

  • EC2CPUStressExperimentTemplate
  • EC2StopExperimentTemplate
  • EC2TerminateExperimentTemplate

These are the terms used in the FIS experiment template:

  • Actions: The activities that AWS FIS performs on target resources during an experiment. You define one or more actions when creating a template.
  • Targets: A target is one or more AWS resources on which an action is performed by AWS Fault Injection Simulator (AWS FIS) during an experiment. For example, defining which instances to stress CPUs on, based on their tags.
  • Filters: Resource filters are queries that identify target resources according to specific attributes.
  • IAM role: This IAM role is assumed by FIS to perform the actions mentioned in the template. The Step Functions role must be allowed to pass this FIS role (iam:PassRole).
  • Client token: A unique identifier that ensures idempotency of the StartExperiment call. The Step Functions execution fails if a client token is not passed.
  • Selection mode: Run experiments on all the resources matching the target criteria, or specify the number of resources. For example, the EC2CPUStressExperimentTemplate targets one resource at random.

The ‘EC2CPUStressExperimentTemplate’ code defines the target EC2 instances for the CPU stress action, selected by the tag ‘FISAction: CPUStress’:

{
   "Targets": {
      "CPUStressInstances": {
         "ResourceType": "aws:ec2:instance",
         "ResourceTags": {
            "FISAction": "CPUStress"
         },
         "Filters": [
            {
               "Path": "VpcId",
               "Values": [
                  "${VpcID}"
               ]
            }
         ],
         "SelectionMode": "COUNT(1)"
      }
   }
}

Targets can also be filtered using parameters like Amazon VPC ID. You can change the Step Functions definition by modifying the targets, actions, and filters.

EC2 FIS experiments

The FIS experiment uses the experiment template definition during the state execution, and targets the appropriate resources.

There are three EC2 FIS experiments created as a part of the workflow:

  1. CPUStressInstances: This runs after the ‘EC2CPUStressExperimentTemplate’ state. In this state, AWS Systems Manager (SSM) attempts to add CPU stress on the target instance with the tag “FISAction: CPUStress”. You can monitor the metric in the Amazon CloudWatch dashboard, and take actions using Amazon CloudWatch alarms.
    Monitoring the CPU utilization using Amazon CloudWatch
  2. StopInstances: Here, the target instances enter the ‘stopping’ state. Based on the template definition, all the EC2 instances with the tag “FISAction: Stop” in the filtered VPC are stopped.
  3. TerminateInstances: This terminates the target instances with the tag “FISAction: Terminate” in the filtered VPC.

The Step Functions workflow for Amazon ECS workloads

The following Step Functions workflow shows an FIS experiment to stop ECS tasks:

Step Functions workflow for Amazon ECS Fault Injection experiment

  • Amazon ECS experiment template: The ECSStopTaskExperimentTemplate state is created in this workflow. This FIS template defines the action to be run during the experiment.
  • Amazon ECS experiments: After creating the experiment template, the ECSStopTask state runs the FIS experiment.
  • ECSStopTask: FIS targets all the Amazon ECS tasks with the tag “FISAction: StopECSTask” and stops the tasks.

After the FIS experiment state is initiated, the status of the experiment can be polled before proceeding to the next state. An FIS experiment moves through multiple states, such as pending, initiating, running, completed, stopping, stopped, and failed.

The choice state in the workflow checks for the ‘running’ state by polling the status using the FIS GetExperiment API. A ‘failed’ status results in the workflow failing. You can also design the workflow by introducing wait times between the experiments or by including flow states such as Parallel.
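
Outside of the workflow, you can check the same status with the AWS CLI; the experiment ID below is a placeholder:

# Retrieve the current state of a running experiment
aws fis get-experiment \
  --id EXPa1b2c3d4e5f6example \
  --query 'experiment.state'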

Deploying with the AWS Serverless Application Model

This example has the following prerequisites:

  1. Create an AWS account if you don’t have one already.
  2. A valid existing VPC with subnets.
  3. A local install of Git CLI.
  4. A local install of AWS SAM CLI to build and deploy the sample code.

After installation, follow these steps to deploy the example:

  1. Clone the sample code:
    git clone https://github.com/aws-samples/aws-stepfunctions-examples.git
    cd sam/demo-fis-stepfunctions/
    
  2. Modify the templates as needed. You can also edit the state machine code from the AWS Management Console after you deploy this code.
  3. Build and deploy the code:
    sam build 
    sam deploy --guided
    

This launches an AWS CloudFormation stack that creates the state machine, AWS IAM roles, and a CloudWatch log group. To learn more, visit the AWS SAM deployment documentation. The next step is to run the state machines.

Running the Step Functions workflows

The AWS SAM deployment creates two Step Functions state machines in the deployed AWS Region: FISTest-aws-region-StateMachineFIS and FISTest-aws-region-StateMachineECSFIS.

Before running the state machine FISTest-aws-region-StateMachineFIS, create three EC2 instances with the tags “FISAction: CPUStress”, “FISAction: Stop”, and “FISAction: Terminate” respectively.
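
If you prefer to create a tagged instance from the AWS CLI, the following sketch shows one of the three; the AMI ID and subnet ID are placeholders you must supply, and the subnet must belong to the VPC used by the experiment template filter:

aws ec2 run-instances \
    --image-id <ami-id> \
    --instance-type t2.micro \
    --subnet-id <subnet-id> \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=FISAction,Value=CPUStress}]'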

  1. Navigate to the Step Functions console.
  2. Choose FISTest-aws-region-StateMachineFIS then Start execution.

As the workflow progresses, CPU Utilization spikes in one Amazon EC2 instance. The other Amazon EC2 instances are stopped and terminated.

To run chaos experiments on an ECS cluster, you can either use an existing ECS cluster or create a new cluster. To create an ECS cluster:

  1. Navigate to the ECS console.
  2. Choose Get Started.
  3. Choose sample-app and follow the instructions to deploy. Wait for the cluster to be created.
  4. Choose the sample cluster, then choose Tasks.
  5. Choose the running tasks and add the tag “FISAction: StopECSTask”.

In a browser, navigating to the public IP assigned to the task takes you to the sample application.

Amazon ECS Sample Application

  1. Navigate to the Step Functions console.
  2. Choose FISTest-aws-region-StateMachineECSFIS, then Start execution.
  3. The workflow transitions to Wait during the execution.

Execution - FIS Step Functions workflow for ECS experiment

Once the execution is complete, the webpage momentarily becomes unavailable until a new task comes up. The public IP address and ARN of the task change. The task status of the stopped tasks now shows “Task stopped by AWS FIS”.

ECS Task stopped by AWS FIS Experiment

To perform an FIS experiment against an existing ECS cluster, add the resource tag “FISAction: StopECSTask” to your ECS tasks before running the workflows. The experiment template targets tasks as follows:

{
   "Targets": {
      "ecsfargatetask": {
         "ResourceType": "aws:ecs:task",
         "ResourceTags": {
            "FISAction": "StopECSTask"
         },
         "SelectionMode": "ALL"
      }
   }
}

Cleanup

If you have deployed the code using AWS SAM, delete the resources:

sam delete --stack-name <STACK_NAME>

Refer to this documentation for further information.

Conclusion

This blog post describes how to use Step Functions to orchestrate Fault Injection Simulator (FIS) experiments for EC2 and ECS workloads. Using the workflow in this post as an example, you can build state machines for more AWS FIS experiments. Step Functions, AWS FIS, and other services can be combined to build resiliency workflows, and test your application against your resiliency goals.

To learn more about AWS FIS and Step Functions, visit their documentation pages. For more Step Functions resources, visit the Serverless Workflows Collection.

Securing Lambda Function URLs using Amazon Cognito, Amazon CloudFront and AWS WAF

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/compute/securing-lambda-function-urls-using-amazon-cognito-amazon-cloudfront-and-aws-waf/

This post is written by Madhu Singh (Solutions Architect), and Krupanidhi Jay (Solutions Architect).

A Lambda function URL is a dedicated HTTPS endpoint for an AWS Lambda function. You can configure a function URL with one of two methods of authentication: IAM and NONE. IAM authentication means that you are restricting access to the function URL (and, in turn, access to invoke the Lambda function) to certain AWS principals (such as roles or users). An authentication type of NONE means that the Lambda function URL has no authentication and is open for anyone to invoke the function.

This blog shows how to use Lambda function URLs with an authentication type of NONE, implement custom authorization logic as part of the function code, and only allow requests that present valid Amazon Cognito credentials when invoking the function. You also learn ways to protect a Lambda function URL against common security threats, like DDoS, using AWS WAF and Amazon CloudFront.

Lambda function URLs provide a simpler way to invoke your function using HTTP calls. However, they are not a replacement for Amazon API Gateway, which provides advanced features like request validation and rate throttling.

Solution overview

There are four core components in the example.

1. A Lambda function with function URLs enabled

At the core of the example is a Lambda function with the function URLs feature enabled with the authentication type of NONE. This function responds with a success message if a valid authorization code is passed during invocation. If not, it responds with a failure message.
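
This example deploys with the AWS CDK, but to illustrate the configuration, the following is a minimal AWS SAM sketch of a function with a function URL and AuthType NONE; the resource name, code location, and runtime are placeholders:

  DemoFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: src/
      Handler: index.handler
      Runtime: nodejs16.x
      FunctionUrlConfig:
        AuthType: NONE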

2. Amazon Cognito User Pool

Amazon Cognito user pools enable user authentication on websites and mobile apps. You can also enable publicly accessible login and sign-up pages in your applications using the Amazon Cognito user pools feature called the hosted UI.

In this example, you use a user pool and the associated hosted UI to enable user login and sign-up on the website used as the entry point. The Lambda function validates the authorization code against this Amazon Cognito user pool.

3. CloudFront distribution using AWS WAF

CloudFront is a content delivery network (CDN) service that helps deliver content to end users with low latency, while also improving the security posture for your applications.

AWS WAF is a web application firewall that helps protect your web applications or APIs against common web exploits and bots, and AWS Shield is a managed distributed denial of service (DDoS) protection service that safeguards applications running on AWS. AWS WAF inspects incoming requests according to the configured web access control list (web ACL) rules.

Adding CloudFront in front of your Lambda function URL helps to cache content closer to the viewer, and activating AWS WAF and AWS Shield helps in increasing security posture against multiple types of attacks, including network and application layer DDoS attacks.
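
For illustration, a geo-based rule in an AWS WAF web ACL (JSON view) that blocks requests from outside the US and Canada might look like the following sketch; the rule name, priority, and metric name are assumptions:

{
   "Name": "BlockOutsideUSAndCanada",
   "Priority": 0,
   "Statement": {
      "NotStatement": {
         "Statement": {
            "GeoMatchStatement": {
               "CountryCodes": ["US", "CA"]
            }
         }
      }
   },
   "Action": {
      "Block": {}
   },
   "VisibilityConfig": {
      "SampledRequestsEnabled": true,
      "CloudWatchMetricsEnabled": true,
      "MetricName": "BlockOutsideUSAndCanada"
   }
}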

4. Public website that invokes the Lambda function

The example also creates a public website built on React JS and hosted in AWS Amplify as the entry point for the demo. This website works both in authenticated mode and in guest mode. For authentication, the website uses Amazon Cognito user pools hosted UI.

Solution architecture

This shows the architecture of the example and the information flow for user requests.

In the request flow:

  1. The entry point is the website hosted in AWS Amplify. In the home page, when you choose “sign in”, you are redirected to the Amazon Cognito hosted UI for the user pool.
  2. Upon successful login, Amazon Cognito returns the authorization code, which is stored as a cookie with the name “code”. The user is redirected back to the website, which has an “execute Lambda” button.
  3. When the user chooses “execute Lambda”, the value from the “code” cookie is passed in the request body to the CloudFront distribution endpoint.
  4. The AWS WAF web ACL rules determine whether the request originates from US or Canada IP addresses, and whether it should be allowed to reach the Lambda function URL origin.
  5. Requests allowed by AWS WAF are processed by the CloudFront distribution.
  6. CloudFront is configured to allow CORS headers and has the origin set to the Lambda function URL. The request that CloudFront receives is passed to the function URL.
  7. This invokes the Lambda function associated with the function URL, which validates the token.
  8. The function code does the following in order (a sketch of the token exchange follows this list):
    1. Exchanges the authorization code in the request body (passed in the event object to the Lambda function) for an access token using Amazon Cognito’s token endpoint (check the documentation for more details).
      1. The Amazon Cognito user pool’s attributes, such as the user pool URL, client ID, and secret, are retrieved from AWS Systems Manager Parameter Store (SSM Parameters).
      2. These values are stored in SSM Parameter Store when the resources are deployed via the AWS CDK (see the “how to deploy” section).
    2. The access token is then verified to determine its authenticity.
    3. If valid, the Lambda function returns a message stating that the user is authenticated as <username> and that the execution was successful.
    4. If the authorization code is not present (for example, the user was in “guest mode” on the website), or if the code is invalid or expired, the Lambda function returns a message stating that the user is not authorized to execute the function.
  9. The webpage displays the Lambda function return message as an alert.
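
The following is a minimal Java sketch of the token exchange step against Amazon Cognito’s /oauth2/token endpoint. It is illustrative only; the domain, client ID, client secret, and redirect URI are hypothetical placeholders, which the sample retrieves from SSM Parameter Store:

import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class CognitoTokenExchange {

    // Hypothetical values; the example stores these in SSM Parameter Store
    private static final String DOMAIN = "https://example.auth.us-east-1.amazoncognito.com";
    private static final String CLIENT_ID = "your-app-client-id";
    private static final String CLIENT_SECRET = "your-app-client-secret";
    private static final String REDIRECT_URI = "https://example.com/callback";

    public static String exchangeCode(String authorizationCode) throws Exception {
        // Cognito's token endpoint authenticates confidential clients with HTTP Basic auth
        String basicAuth = Base64.getEncoder()
                .encodeToString((CLIENT_ID + ":" + CLIENT_SECRET).getBytes(StandardCharsets.UTF_8));
        String body = "grant_type=authorization_code"
                + "&client_id=" + CLIENT_ID
                + "&code=" + URLEncoder.encode(authorizationCode, StandardCharsets.UTF_8)
                + "&redirect_uri=" + URLEncoder.encode(REDIRECT_URI, StandardCharsets.UTF_8);
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(DOMAIN + "/oauth2/token"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .header("Authorization", "Basic " + basicAuth)
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // On success, the JSON body contains access_token, id_token, and refresh_token
        return response.body();
    }
}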

Getting started

Prerequisites:

Before deploying the solution, follow the README from the GitHub repository and take the necessary steps to fulfill the prerequisites.

Deploy the sample solution

1. From the code directory, download the dependencies:

$ npm install

2. Start the deployment of the AWS resources required for the solution:

$ cdk deploy

Note:

  • Optionally, pass the --profile argument.
  • The deployment can take up to 15 minutes.

3. Once the deployment completes, the stack outputs are displayed, including the amplifyAppUrl.

Open the amplifyAppUrl from the output in your browser. This is the URL for the demo website. If you don’t see the “Welcome to Compute Blog” page, the Amplify app is still building and the website is not available yet. Retry in a few minutes. This website works either in an authenticated or unauthenticated state.

Test the authenticated flow

1. To test the authenticated flow, choose “Sign In”.

2. In the sign-in page, choose sign-up (for the first time) and create a user name and password.

3. To use an existing user name and password, enter those credentials and choose login.

4. Upon successful sign-in or sign-up, you are redirected back to the webpage with the “Execute Lambda” button.

5. Choose this button. In a few seconds, an alert pop-up shows the logged-in user and that the Lambda execution is successful.

Testing the unauthenticated flow

1. To test the unauthenticated flow, from the Home page, choose “Continue”.

2. Choose “Execute Lambda” and in a few seconds, you see a message that you are not authorized to execute the Lambda function.

Testing the geo-block feature of AWS WAF

1. Access the website from a location outside the US and Canada. If you are physically in the US or Canada, you can use a VPN service to connect from a location outside these countries.

2. Choose the “Execute Lambda” button. In the browser’s network trace, you can see that the call to invoke the Lambda function was blocked with a Forbidden response.

3. To try either the authenticated or unauthenticated flow again, choose “Return to Home Page” to go back to the home page with “Sign In” and “Continue” buttons.

Cleaning up

To delete the resources provisioned, run the cdk destroy command from the AWS CDK CLI:
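
$ cdk destroy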

Conclusion

In this blog, you created a Lambda function with function URLs enabled using NONE as the authentication type, and implemented a custom authentication mechanism as part of the Lambda function code. You also increased the security of the Lambda function URL by setting it as the origin for a CloudFront distribution, and by using AWS WAF geo and IP limiting rules for protection against common web threats, like DDoS.

For more serverless learning resources, visit Serverless Land.

Visualize and create your serverless workloads with AWS Application Composer

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/visualize-and-create-your-serverless-workloads-with-aws-application-composer/

This post is written by Luca Mezzalira, Principal Specialist Solutions Architect.

Today, AWS is launching a preview of AWS Application Composer, a visual designer that you can use to build your serverless applications from multiple AWS services.

In distributed systems, empowering teams is a cultural shift needed to enable developers to translate business capabilities into code.

This doesn’t mean every team works in isolation. Different teams or even new joiners must understand what they are building to contribute to a project. The best way to understand architecture quickly is with diagrams. Unfortunately, architectural diagrams are often outdated: by the time a workload is released to production, the implementation frequently diverges from the initial design.

Developers new to building serverless applications can face a learning curve when composing applications from multiple AWS services. They must understand how to configure each service, and then learn and write infrastructure as code (IaC) to deploy their application.

Example scenario

Emma is a cloud architect working for a video on-demand platform where every user can access the content after subscribing to the service. In the next few months, the marketing team wants to start a campaign to increase the user base using discount codes for new users only.

She collaborates with a team of developers who are new to building serverless applications. They must design a discount code service that can scale to thousands of transactions per second. There are many requirements to implement this service:

  • Gathering the gift code from a user.
  • Verifying the discount code is available.
  • Applying the discount code to the invoice at the end of the month.

Based on these requirements and default SLAs available for all the platform services, Emma designs a high-level architecture with the key elements needed for building this microservice.

Discount code service high-level architecture

Discount code service high-level architecture

Her idea is to receive a request from clients with a discount code in the payload, and validate the availability of the discount code in a database. The service then asynchronously processes different discount codes in batches to reduce traffic to downstream dependencies and reduce the cost of the overall infrastructure.

This approach ensures that the service can scale in the future beyond the initial traffic volume. It simplifies the management and implementation of the discount code service and other parts of the system with a loosely coupled architecture.

After discussing the architecture with her developers, she opens Application Composer in the AWS Management Console and starts building the implementation using serverless services.

Application Composer initial screen

Application Composer initial screen

To start, she selects New blank project and selects a local file system folder to save the project files.

Application Composer create blank project

Application Composer create blank project

Granting Application Composer access to your local project files allows near real time bidirectional syncing of changes between the console interface and locally stored project files. When you update a property with the Application Composer interface, it’s reflected in the files stored locally. When you change a local file in your IDE, it automatically reflects in the Application Composer canvas.

After creating the project, Emma drags the AWS resources she needs from the left sidebar for expressing the initial design agreed with the team.

Using Application Composer, you can drag serverless resources on the canvas and connect them together. In the background, Application Composer generates the infrastructure as code AWS CloudFormation template for you.

Application Composer canvas

Application Composer canvas

For example, this is the default configuration generated when you drag a Lambda function onto the canvas. The following code is present in the template view:

  Function:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: !Sub ${AWS::StackName}-Function
      Description: !Sub
        - Stack ${AWS::StackName} Function ${ResourceName}
        - ResourceName: Function
      CodeUri: src/Function
      Handler: index.handler
      Runtime: nodejs14.x
      MemorySize: 3008
      Timeout: 30
      Tracing: Active

Application Composer incorporates some helpful default property values, which are sometimes overlooked by developers new to serverless workloads. These include activating tracing using AWS X-Ray or increasing a function timeout, for instance.

You can change these parameters either in the CloudFormation template inside Application Composer or by visually selecting a resource. In the previous example, you can update the Lambda function parameters by opening the resource properties panel.

Application Composer resource panel

Application Composer resource panel

When you synchronize an Application Composer project with the local system, you can change the CloudFormation template from a code editor. This reflects the change in the Application Composer interface automatically.

When you connect two elements in the canvas, Application Composer sets default IAM policies, environment variables for Lambda functions, and event subscriptions where applicable.

For instance, if you have a Lambda function that interacts with an Amazon DynamoDB table and Amazon SQS queue, Application Composer generates the following configuration for the Lambda function.

Function:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: !Sub ${AWS::StackName}-Function
      Description: !Sub
        - Stack ${AWS::StackName} Function ${ResourceName}
        - ResourceName: Function
      CodeUri: src/Function
      Handler: index.handler
      […]
      Environment:
        Variables:
          QUEUE_NAME: !GetAtt Queue.QueueName
          QUEUE_ARN: !GetAtt Queue.Arn
          QUEUE_URL: !Ref Queue
          TABLE_NAME: !Ref Table
          TABLE_ARN: !GetAtt Table.Arn
      Policies:
        - SQSSendMessagePolicy:
            QueueName: !GetAtt Queue.QueueName
        - DynamoDBCrudPolicy:
            TableName: !Ref Table

This helps new builders when designing their first serverless applications and provides an initial configuration, which more advanced builders can amend. This allows you to include good operational practices when designing a serverless application.

Emma’s team continues to add together the different services needed to express the discount code architecture. This is the final result in Application Composer:

Discount code architecture in Application Composer

Discount code architecture in Application Composer

  1. The application includes an Amazon API Gateway endpoint that exposes the API needed for submitting a discount code to the system.
  2. The POST API triggers a Lambda function that first validates that the discount code is still available.
  3. The discount codes are stored in a DynamoDB table.
  4. After successfully validating the discount code, the function adds a message to an SQS queue and returns a successful response to the client.
  5. Another Lambda function retrieves the message from the SQS queue and sends an invoice.

Using this approach optimizes the Lambda function invocation for speed, as the remaining operations are handled asynchronously. This also reduces the complexity and cost of the architecture because you can aggregate multiple discount codes per user using SQS batching, rather than scaling the service as requests arrive from users.

The team agrees to use this as the initial design of their service. In the future, they plan to integrate with their authentication mechanism. They add Lambda Powertools for observability, and additional libraries developed internally to make the project compliant with company standards.

Application Composer has created all the files needed to start the project in Emma’s local file system including the CloudFormation template .yaml file and the Lambda functions’ handlers.

Application Composer generated files

Application Composer generated files

Emma can now upload the outline of this service to a version control system and share the artifacts with other developers who can start coding the business logic.

Additional features

Application Composer includes a resource list tab within the left-side panel that allows you to quickly browse available resources.

Application Composer browse available resources

Application Composer browse available resources

You can also group resources semantically for simplifying the visualization inside the canvas. This helps when you have a large application in the canvas and you want to select an element quickly without dragging the canvas around to find the resource. This feature doesn’t impact the infrastructure generated.

Application Composer grouping

Application Composer grouping

Application Composer adds some metadata to the CloudFormation template to allow the canvas to group resources together when the project is loaded again.

Metadata:
  AWS::Composer::Groups:
    Group:
      Label: Group
      Members:
        - CodesQueue
        - CodesTable

You can use Application Composer beyond building new serverless workloads. You can load existing CloudFormation templates by selecting Load existing project in the Create project dialog.

Application Composer load existing project

Application Composer load existing project

You can use this to define your blueprints with organizational best practices and then visualize them within Application Composer. This helps teams collaborate when starting new serverless services. You can add resources from an existing base template to build serverless microservices or event-driven architectures.

Integration with AWS SAM

AWS Serverless Application Model (AWS SAM) recently announced the general availability of AWS SAM Accelerate, which shortens the feedback loop when testing your code and cloud infrastructure by synchronizing only project changes. You can use Application Composer together with AWS SAM Accelerate to visually build and then test your serverless applications in the cloud more easily.

To learn more about AWS SAM Accelerate, watch this live demo.

Where Application Composer fits into the development process

Emma used Application Composer to help her team with this project, but has ideas on further ways to use it:

  • Rapid prototyping.
  • Reviewing and collaboratively evolving existing serverless projects.
  • Generating diagrams for documentation or Wikis.
  • On-boarding new team members to a project.
  • Reducing the first steps to deploy something in an AWS Cloud account.

Application Composer availability

Application Composer is currently available as a public preview in the following Regions: Frankfurt (eu-central-1), Ireland (eu-west-1), Ohio (us-east-2), Oregon (us-west-2), North Virginia (us-east-1) and Tokyo (ap-northeast-1).

Application Composer is available at no additional cost and can be accessed via the AWS Management Console.

Conclusion

Application Composer is a visual designer to help developers and architects express and build their application architecture. They can iterate on their ideas with colleagues and create documentation for others working on the application for the first time. You can use Application Composer during multiple stages of your software development lifecycle, reducing the friction in getting your project started and into production.

Currently, Application Composer supports a limited number of services that we plan to add to in the future. Let us know which services you would like to see included.

As a public preview, we are looking for suggestions and ideas to evolve the tool. We are looking for ways to help you and your teams to speed up the adoption of serverless workloads inside your organization. Add a comment to this post or tweet with the tag #AWSAppComposerWishlist.

For more serverless learning resources, visit Serverless Land.

Introducing new AWS Serverless digital learning badges

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/introducing-new-aws-serverless-digital-learning-badges/

This post is written by Josh Kahn, Tech Leader, Serverless.

Today, we are excited to announce an all-new way to demonstrate your AWS Serverless knowledge and skills: a verifiable, digital badge. The new digital badge is aligned with our Serverless Learning Plan now available in AWS Skill Builder.

You can earn the digital badge by scoring at least 80 percent on the assessment associated with the Learning Plan. The badge proves your knowledge and skills for AWS Lambda, Amazon API Gateway, and designing serverless applications. You can celebrate your achievement on your resume, social media, and AWS re:Post with the verifiable badge distributed and managed by Credly. The badge includes metadata to verify the issuer and skills demonstrated by the holder. The Serverless Learning Plan and digital badge assessment are now available, for free.

Ready to get started or want to jump immediately to the assessment? Start here. Continue reading to learn more about the details of AWS Skill Builder and our Serverless Learning Plan.

Serverless Digital Learning Badge

The Serverless Learning Plan

Our Serverless Learning Plan has been designed to help you get started building with Serverless technology. AWS experts designed the content to provide a clear learning path to help you develop the skills you need quickly.

The Learning Plan starts with an introduction to the “Serverless Mindset” and introduces key concepts to help you design architectures and applications. It discusses how to best take advantage of the event-driven orientation of serverless computing.

Next, the course “AWS Lambda Foundations” covers the fundamentals of AWS Lambda, an event-driven compute service that lets you run code without provisioning or managing servers. You’ll learn foundational concepts, including how Lambda works, security and permission models, and best practices for writing Lambda functions.

The Learning Plan also includes four courses that span the lifecycle of building Lambda-based applications. In “Architecting Serverless Applications,” you learn about common architectures and patterns for serverless applications. We explore how to build microservices, data processing workloads, Alexa skills, mobile backends, and automate tasks in your AWS account. The course also discusses the trade-offs in selecting from the various compute options available to you.

The “Scaling Serverless Architectures” course discusses concepts such as Lambda concurrency and how Lambda-based applications scale. We briefly explore optimization opportunities for Lambda functions and trade-offs. While this course is not a deep dive in optimization across all supported runtimes, it offers a starting point.

In “Security and Observability for Serverless Applications,” you’ll learn how to use services such as AWS CloudTrail, AWS Config, and AWS X-Ray in concert with Lambda-based applications. We also discuss the built-in logging to Amazon CloudWatch and considerations. This course also touches on how the Lambda service creates isolation and a security boundary between functions.

There are a number of popular options for deploying and managing serverless applications. In “Deploying Serverless Applications,” we explore the AWS Serverless Application Model (AWS SAM) and the AWS suite of developer tools. You’ll learn best practices for deployment, including how to automate deployment using a CI/CD pipeline. This course also covers concepts such as Lambda versions and aliases, Lambda environment variables, and other deployment features.

Serverless is more than Lambda. During the Learning Plan, you also learn how to use Amazon API Gateway to create and deploy serverless APIs. “Amazon API Gateway for Serverless Applications” discusses REST and WebSocket options available from API Gateway and how to integrate with Lambda and other backends. The course also discusses the rich set of API Gateway features available, including caching, various authorization modes, usage plans, API keys, and deployment stages.

To complete the Learning Plan, we also provide an introduction to event-driven architectures built using services such as AWS Step Functions and Amazon Simple Queue Service (SQS). This course complements the “Serverless Mindset” course to help you think about how asynchronous processing can improve the resiliency and scalability of your serverless applications.

All courses are available in a variety of languages.

After completing the Learning Plan, take the online assessment and score over 80 percent to earn the digital badge. Our badge assessments are linked to curriculum standards and have been developed by field subject matter experts (SMEs) and content/curriculum SMEs. If you are already familiar with AWS Serverless, you can also jump right to the assessment. If you don’t pass, you’ll be guided on how to fill knowledge gaps and can retake the assessment after 24 hours.

We’re also working to add more courses on topics such as Amazon EventBridge, and more extensive course work on event-driven architectures next year. Stay tuned.

Our Learning Plan has been designed for you to move at your own pace, from wherever you are. It’s a great opportunity to build new skills or refresh your knowledge. Employers seeking to build knowledge in Serverless can also use the Learning Plan and digital badge to build critical knowledge in the space.

AWS Skill Builder

Beyond our recommended Serverless curriculum, Skill Builder offers a bevy of digital courses developed for different roles (e.g., developer, architect, data engineer) and domains (e.g., storage, databases). Skill Builder offers free learning content as well as subscription plans for individuals and teams. Skill Builder is a great way to advance your skills in areas that often touch serverless applications, including security, observability / monitoring, and DevOps.

We encourage you to check out these other expert-designed courses to help advance your knowledge of AWS. Subscription plans include hands-on labs and certification practice exams. The free content includes over 500 courses and learning plans, all available on-demand so that you can learn at your own pace.

Dive deeper with the AWS Serverless Ramp-up Guide

If you want to dive deeper after completing the AWS Serverless Learning Plan, download the AWS Ramp-Up Guide for Serverless. The guide includes a listing of courses, hands-on workshops, classroom training, and other resources to enrich your serverless knowledge.

Think of the Ramp-Up Guide as a menu of options. Pick and choose the topics that are most interesting to you and move at your own pace. We’ve included digital courses, reading, videos, and workshops to help you learn however is most effective for you.

We’re working to continually update the Ramp-up Guide so that you can easily find up-to-date content to deepen your skills. Check back for updates.

Conclusion

We’re excited to share the newly updated Serverless Learning Plan and all-new digital badge with you. To our knowledge, this is one of the first ways (if not the first) that Serverless builders can verifiably demonstrate their knowledge to the community and employers. Our team of SMEs across AWS Serverless and Training & Certification are excited to hear your feedback on the Learning Plan as well as where you would like to see us develop training next.

The AWS Serverless Learning Plan and digital badge are available now. All courses are available on-demand. Both the learning plan courses and the assessment are free for everyone.

Share your accomplishment by posting on social media with the hashtag #AWSTraining! Get started today at https://aws.amazon.com/training/learn-about/serverless/.

For more serverless learning resources, visit Serverless Land.

Reducing Java cold starts on AWS Lambda functions with SnapStart

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/reducing-java-cold-starts-on-aws-lambda-functions-with-snapstart/

Written by Mark Sailes, Senior Serverless Solutions Architect, AWS.

At AWS re:Invent 2022, AWS announced SnapStart for AWS Lambda functions running on Java Corretto 11. This feature enables customers to achieve up to 10x faster function startup performance for Java functions, at no additional cost, and typically with minimal or no code changes.

Overview

For Lambda function invocations, the largest contributor to startup latency is the time spent initializing a function. This includes loading the function’s code and initializing dependencies. For interactive workloads that are sensitive to start-up latencies, this can cause a suboptimal end user experience.

To address this challenge, customers either provision resources ahead of time, or spend effort building relatively complex performance optimizations, such as compiling with GraalVM native-image. Although these workarounds help reduce the startup latency, users must spend time on some heavy lifting instead of focusing on delivering business value. SnapStart addresses this concern directly for Java-based Lambda functions.

How SnapStart works

With SnapStart, when a customer publishes a function version, the Lambda service initializes the function’s code. It takes an encrypted snapshot of the initialized execution environment, and persists the snapshot in a tiered cache for low latency access.

When the function is first invoked and then scaled, Lambda resumes the execution environment from the persisted snapshot instead of initializing from scratch. This results in a lower startup latency.

Lambda function lifecycle

Lambda function lifecycle

A function version activated with SnapStart transitions to an inactive state if it remains idle for 14 days, after which Lambda deletes the snapshot. When you try to invoke a function version that is inactive, the invocation fails. Lambda sends a SnapStartNotReadyException and begins initializing a new snapshot in the background, during which the function version remains in Pending state. Wait until the function reaches the Active state, and then invoke it again. To learn more about this process and the function states, read the documentation.

Using SnapStart

Application frameworks such as Spring give developers an enormous productivity gain by reducing the amount of boilerplate code they write to accomplish common tasks. When first created, frameworks didn’t have to consider startup time because they run on application servers, which run for long periods of time. The startup time is minimal compared to the running duration. You often only restart them when there is an application version change.

If the functionality that these frameworks bring is implemented at runtime, then they often contribute to latency in startup time. SnapStart allows you to use frameworks like Spring and not compromise tail latency.

To demonstrate SnapStart, I use a sample application that saves records into Amazon DynamoDB. This Spring Boot application uses a REST controller to handle CRUD requests. This sample includes infrastructure as code to deploy the application using the AWS Serverless Application Model (AWS SAM). You must install the AWS SAM CLI to deploy this example.

To deploy:

  1. Clone the git repository and change to project directory:
    git clone https://github.com/aws-samples/serverless-patterns.git
    cd serverless-patterns/apigw-lambda-snapstart
  2. Use the AWS SAM CLI to build the application:
    sam build
  3. Use the AWS SAM CLI to deploy the resources to your AWS account:
    sam deploy -g

This project deploys with SnapStart already enabled. To enable or disable this functionality in the AWS Management Console:

  1. Navigate to your Lambda function.
  2. Select the Configuration tab.
  3. Choose Edit and change the SnapStart attribute to PublishedVersions.
  4. Choose Save.

    Lambda Console configuration

    Lambda Console configuration

  5. Select the Versions tab and choose Publish new.
  6. Choose Publish.

Once you’ve enabled SnapStart, Lambda publishes all subsequent versions with snapshots. The time taken to publish a version depends on your initialization code; with SnapStart, initialization code can run for up to 15 minutes.
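
You can also enable SnapStart in infrastructure as code. The following is a minimal AWS SAM sketch; the resource name, handler, and code location are placeholders, and the alias is included because SnapStart applies to published versions:

  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: target/function.zip
      Handler: com.example.App::handleRequest
      Runtime: java11
      AutoPublishAlias: live
      SnapStart:
        ApplyOn: PublishedVersions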

Considerations

Stale credentials

Using SnapStart and restoring from a snapshot often changes how you create functions. With on-demand functions, you might access one-time data in the init phase, and then reuse it during future invokes. If this data is ephemeral, a database password for example, then the password may have changed between fetching the secret and using it, leading to an error. You must write code to handle this error case.

With SnapStart, if you follow the same approach, your database password is persisted in an encrypted snapshot. All future execution environments have the same state. This can be days, weeks, or longer after the snapshot is taken. This makes it more likely that your function has the incorrect password stored. To improve this, you could move the functionality to fetch the password to the post-snapshot hook. With each approach, it is important to understand your application’s needs and handle errors when they occur.

Demo application architecture

Demo application architecture

A second challenge in sharing the initial state is with randomness and uniqueness. If random seeds are stored in the snapshot during the initialization phase, then it may cause random numbers to be predictable.

Cryptography

AWS has changed the managed runtime to help customers handle the effects of uniqueness and randomness when restoring functions.

Lambda has already incorporated updates to Amazon Linux 2 and one of the commonly used cryptographic libraries, OpenSSL (1.0.2), to make them resilient to snapshot operations. AWS has also validated that Java runtime’s built-in RNG java.security.SecureRandom maintains uniqueness when resuming from a snapshot.

Software that always gets random numbers from the operating system (for example, from /dev/random or /dev/urandom) is already resilient to snapshot operations. It does not need updates to restore uniqueness. However, customers who prefer to implement uniqueness using custom code for their Lambda functions must verify that their code restores uniqueness when using SnapStart.

For more details, read Starting up faster with AWS Lambda SnapStart and refer to Lambda documentation on SnapStart uniqueness.

Runtime hooks

These pre- and post-hooks give developers a way to react to the snapshotting process.

For example, a function that must always preload large amounts of data from Amazon S3 should do this before Lambda takes the snapshot. This embeds the data in the snapshot so that it does not need fetching repeatedly. However, in some cases, you may not want to keep ephemeral data. A password to a database may be rotated frequently and cause unnecessary errors. I discuss this in greater detail in a later section.

The Java managed runtime uses the open-source Coordinated Restore at Checkpoint (CRaC) project to provide hook support. The managed Java runtime contains a customized CRaC context implementation that calls your Lambda function’s runtime hooks before completing snapshot creation and after restoring the execution environment from a snapshot.

The following example shows how you can create a function handler with runtime hooks. The handler implements the CRaC Resource and the Lambda RequestHandler interfaces.

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import org.crac.Core;
import org.crac.Resource;

public class HelloHandler implements RequestHandler<String, String>, Resource {

    public HelloHandler() {
        // Register this instance so the runtime calls the CRaC hooks
        Core.getGlobalContext().register(this);
    }

    @Override
    public String handleRequest(String name, Context context) {
        System.out.println("Handler execution");
        return "Hello " + name;
    }

    @Override
    public void beforeCheckpoint(org.crac.Context<? extends Resource> context) throws Exception {
        // Runs immediately before Lambda takes the snapshot
        System.out.println("Before Checkpoint");
    }

    @Override
    public void afterRestore(org.crac.Context<? extends Resource> context) throws Exception {
        // Runs immediately after the execution environment is restored
        System.out.println("After Restore");
    }
}

For the classes required to write runtime hooks, add the following dependency to your project:

Maven

<dependency>
  <groupId>io.github.crac</groupId>
  <artifactId>org-crac</artifactId>
  <version>0.1.3</version>
</dependency>

Gradle

implementation 'io.github.crac:org-crac:0.1.3'

Priming

SnapStart and runtime hooks give you new ways to build your Lambda functions for low startup latency. You can use the pre-snapshot hook to make your Java application as ready as possible for the first invoke. Do as much as possible within your function before the snapshot is taken. This is called priming.

When you upload your zip file of Java code to Lambda, the zip contains .class files of bytecode. This can be run on any machine with a JVM. When the JVM executes your bytecode, it is initially interpreted, then compiled into native machine code. This compilation stage is relatively CPU intensive and happens just in time (JIT Compiler).

You can use the before snapshot hook to run code paths before the snapshot is taken. The JVM compiles these code paths and the optimization is kept for future restores. For example, if you have a function that integrates with DynamoDB, you can make a read operation in your before snapshot hook.

This means that your function code, the AWS SDK for Java, and any other libraries used in that action are compiled and kept within the snapshot. The JVM then doesn’t need to compile this code when your function is invoked, reducing latency the first time an execution environment is used.

Priming requires that you understand your application code and the consequences of executing it. The sample application includes a before-snapshot hook, which primes the application by making a read operation from DynamoDB.
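
The following is a minimal sketch of such a priming hook (not the sample application’s actual code); the table name and key are hypothetical:

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import org.crac.Core;
import org.crac.Resource;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.GetItemRequest;

import java.util.Map;

public class PrimedHandler implements RequestHandler<Map<String, String>, String>, Resource {

    // Created during the init phase, so the initialized client is part of the snapshot
    private final DynamoDbClient dynamoDb = DynamoDbClient.create();

    public PrimedHandler() {
        Core.getGlobalContext().register(this);
    }

    @Override
    public String handleRequest(Map<String, String> event, Context context) {
        // Real request handling would go here
        return "ok";
    }

    @Override
    public void beforeCheckpoint(org.crac.Context<? extends Resource> context) throws Exception {
        // Exercise the read path so the JVM JIT-compiles the handler code, the
        // AWS SDK, and its dependencies before Lambda takes the snapshot
        dynamoDb.getItem(GetItemRequest.builder()
                .tableName("PrimingTable") // hypothetical table name
                .key(Map.of("id", AttributeValue.builder().s("priming-key").build()))
                .build());
    }

    @Override
    public void afterRestore(org.crac.Context<? extends Resource> context) throws Exception {
        // Nothing to refresh in this sketch
    }
}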

Metrics

The following results reflect invoking the sample application Lambda function 100 times per second for 10 minutes. This test is based on this function, both with and without SnapStart.

            p50      p99.9
On-demand   7.87ms   5,114ms
SnapStart   7.87ms   488ms

Conclusion

This blog shows how SnapStart reduces startup (cold-start) latency for Java-based Lambda functions. You can configure SnapStart using the AWS SDK, AWS CloudFormation, AWS SAM, and the AWS CDK.

To learn more, see Configuring function options in the AWS documentation. This functionality may require some minimal code changes; in most cases, the existing code is already compatible with SnapStart. You can now bring your latency-sensitive Java-based workloads to Lambda and run them with improved tail latencies.

This feature allows developers to use the on-demand model in Lambda with low-latency response times, without incurring extra cost. To read more about how to use SnapStart with partner frameworks, find out more from Quarkus and Micronaut. To read more about this and other features, visit Serverless Land.

Starting up faster with AWS Lambda SnapStart

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/starting-up-faster-with-aws-lambda-snapstart/

This blog written by Tarun Rai Madan, Sr. Product Manager, AWS Lambda, and Mike Danilov, Sr. Principal Engineer, AWS Lambda.

AWS Lambda SnapStart is a new performance optimization developed by AWS that can significantly improve the startup time for applications. Announced at AWS re:Invent 2022, the first capability to feature SnapStart is Lambda SnapStart for Java. This feature delivers up to 10x faster function startup times for latency-sensitive Java applications at no extra cost, and with minimal or no code changes.

Overview

When applications start up, whether it’s an app on your phone, or a serverless Lambda function, they go through initialization. The initialization process can vary based on the application and the programming language, but even the smallest applications written in the most efficient programming languages require some kind of initialization before they can do anything useful. For a Lambda function, the initialization phase involves downloading the function’s code, starting the runtime and any external dependencies, and running the function’s initialization code. Ordinarily, for a Lambda function, this initialization happens every time your application scales up to create a new execution environment.

With SnapStart, the function’s initialization is done ahead of time when you publish a function version. Lambda takes a Firecracker microVM snapshot of the memory and disk state of the initialized execution environment, encrypts the snapshot, and caches it for low-latency access. When your application starts up and scales to handle traffic, Lambda resumes new execution environments from the cached snapshot instead of initializing them from scratch, improving startup performance.

The following diagram compares a cold start request lifecycle for a non-SnapStart function and a SnapStart function. The time it takes to initialize the function, which is the predominant contributor to high startup latency, is replaced by a faster resume phase with SnapStart.

Diagram of a non-SnapStart function versus a SnapStart function

Diagram of a non-SnapStart function versus a SnapStart function

Request lifecycle for a non-SnapStart function versus a SnapStart function

Front loading the initialization phase can significantly improve the startup performance for latency-sensitive Lambda functions, such as synchronous microservices that are sensitive to initialization time. Because Java is a dynamic language with its own runtime and garbage collector, Lambda functions written in Java can be amongst the slowest to initialize. For applications that require frequent scaling, the delay introduced by initialization, commonly referred to as a cold start, can lead to a suboptimal experience for end users. Such applications can now start up faster with SnapStart.

AWS’ work in Firecracker makes it simple to use SnapStart. Because SnapStart uses micro Virtual Machine (microVM) snapshots to checkpoint and restore full applications, the approach is adaptable and general purpose. It can be used to speed up many kinds of application starts. While microVMs have long been used for strong secure isolation between applications and environments, the ability to front-load initialization with SnapStart means that microVMs can also augment performance savings at scale.

SnapStart and uniqueness

Lambda SnapStart speeds up applications by re-using a single initialized snapshot to resume multiple execution environments. As a result, unique content included in the snapshot during initialization is reused across execution environments, and so may no longer remain unique. A class of applications where uniqueness of state is a key consideration is cryptographic software, which assumes that the random numbers are truly random (both random and unpredictable). If content such as a random seed is saved in the snapshot during initialization, it is re-used when multiple execution environments resume and may produce predictable random sequences.

To maintain uniqueness, you must verify before using SnapStart that any unique content previously generated during the initialization now gets generated after that initialization. This includes unique IDs, unique secrets, and entropy used to generate pseudo-randomness.

Multiple execution environments resumed from a shared snapshot

SnapStart life cycle

SnapStart life cycle

However, we have implemented a few things to make it easier for customers to maintain uniqueness.

First, it is not common or a best practice for applications to generate these unique items directly. Still, it’s worth confirming that your application handles uniqueness correctly. That’s usually a matter of checking for any unique IDs, keys, timestamps, or “homemade” entropy in the initializer methods for your function.

Lambda offers a SnapStart scanning tool that checks for certain categories of code that assume uniqueness, so customers can make changes as required. The SnapStart scanning tool is an open-source SpotBugs plugin that runs static analysis against a set of rules and reports “potential SnapStart bugs”. We are committed to engaging with the community to expand this set of rules against which the scanning tool checks the code.

As an example, the following Lambda function creates a unique log stream for each execution environment during initialization. This unique value is re-used across execution environments when they re-use a snapshot.

public class LambdaUsingUUID {

    private AWSLogsClient logs;
    private final UUID sandboxId;

    public LambdaUsingUUID() {
       sandboxId = UUID.randomUUID(); // <-- unique content created
       logs = new AWSLogsClient();
    }
    @Override
    public String handleRequest(Map<String,String> event, Context context) {
       CreateLogStreamRequest request = new CreateLogStreamRequest(
         "myLogGroup", sandboxId + ".log9.txt");
       logs.createLogStream(request);
         return "Hello world!";
    }
} 

When you run the scanning tool on the previous code, the following message helps identify a potential implementation that assumes uniqueness. One way to address such cases is to move the generation of the unique ID inside your function’s handler method.

H C SNAP_START: Detected a potential SnapStart bug in Lambda function initialization code. At LambdaUsingUUID.java: [line 7]
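
As a sketch of that fix (the class name and imports are illustrative), generating the value inside the handler ensures each invocation, and each resumed execution environment, produces its own ID:

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

import java.util.Map;
import java.util.UUID;

public class LambdaUsingUuidFixed implements RequestHandler<Map<String, String>, String> {

    @Override
    public String handleRequest(Map<String, String> event, Context context) {
        // Created after restore, so it is not shared across execution
        // environments resumed from the same snapshot
        UUID sandboxId = UUID.randomUUID();
        return "Log stream suffix: " + sandboxId;
    }
}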

A best practice used by many applications is to rely on the system libraries and kernel for uniqueness. These have long-handled other cases where keys and IDs may be inadvertently duplicated, such as when forking or cloning processes. AWS has worked with upstream kernel maintainers and open source developers so that the existing protection mechanisms use the open standard VM Generation ID (vmgenid) that SnapStart supports. vmgenid is an emulated device, which exposes a 128-bit, cryptographically random integer value identifier to the kernel, and is statistically unique across all resumed microVMs.

Lambda’s included versions of Amazon Linux 2, OpenSSL (1.0.2), and java.security.SecureRandom all automatically re-initialize their randomness and secrets after a SnapStart. Software that always gets random numbers from the operating system (for example, from /dev/random or /dev/urandom) does not need any updates to maintain randomness. Because Lambda always reseeds /dev/random and /dev/urandom when restoring a snapshot, random numbers are not repeated even when multiple execution environments resume from the same snapshot.

Lambda’s request IDs are already unique for each invocation and are available using the getAwsRequestId() method of the Lambda request object. Most Lambda functions should require no modification to run with SnapStart enabled. It’s generally recommended that for SnapStart, you do not include unique state in the function’s initialization code, and use cryptographically secure random number generators (CSPRNGs) when needed.

Second, if you do want to create unique data directly in a Lambda function initialization phase, Lambda supports two new runtime hooks. Runtime hooks are available as part of the open-source Coordinated Restore at Checkpoint (CRaC) project. You can use the beforeCheckpoint hook to run code immediately before a snapshot is taken, and use the afterRestore hook to run code immediately after restoring a snapshot. This helps you delete any unique content before the snapshot is created, and restore any unique content after the snapshot is restored. For an example of how to use CRaC with a reference application, see the CRaC GitHub repository.

Conclusion

This blog describes how SnapStart optimizes startup performance under the hood, and outlines considerations around uniqueness. We also introduce the new interfaces that AWS Lambda provides (via scanning tool and runtime hooks) to customers to maintain uniqueness for their SnapStart functions.

SnapStart is made possible by several pieces of open-source work, including Firecracker, Linux, CRaC, OpenSSL, and more. AWS is grateful to the maintainers and developers who have made this possible. With this work, we’re excited to launch Lambda SnapStart for Java as what we hope is the first amongst many other capabilities to benefit from the performance savings and enhanced security that SnapStart microVMs provide.

For more serverless learning resources, visit Serverless Land.

Introducing payload-based message filtering for Amazon SNS

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/introducing-payload-based-message-filtering-for-amazon-sns/

This post is written by Prachi Sharma (Software Development Manager, Amazon SNS), Mithun Mallick (Principal Solutions Architect, AWS Integration Services), and Otavio Ferreira (Sr. Software Development Manager, Amazon SNS).

Amazon Simple Notification Service (SNS) is a messaging service for Application-to-Application (A2A) and Application-to-Person (A2P) communication. The A2A functionality provides high-throughput, push-based, many-to-many messaging between distributed systems, microservices, and event-driven serverless applications. These applications include Amazon Simple Queue Service (SQS), Amazon Kinesis Data Firehose, AWS Lambda, and HTTP/S endpoints. The A2P functionality enables you to communicate with your customers via mobile text messages (SMS), mobile push notifications, and email notifications.

Today, we’re introducing the payload-based message filtering option of SNS, which augments the existing attribute-based option, enabling you to offload additional filtering logic to SNS and further reduce your application integration costs. For more information, see Amazon SNS Message Filtering.

Overview

You use SNS topics to fan out messages from publisher systems to subscriber systems, addressing your application integration needs in a loosely-coupled way. Without message filtering, subscribers receive every message published to the topic, and require custom logic to determine whether an incoming message needs to be processed or filtered out. This results in undifferentiating code, as well as unnecessary infrastructure costs. With message filtering, subscribers set a filter policy to their SNS subscription, describing the characteristics of the messages in which they are interested. Thus, when a message is published to the topic, SNS can verify the incoming message against the subscription filter policy, and only deliver the message to the subscriber upon a match. For more information, see Amazon SNS Subscription Filter Policies.

However, up until now, the message characteristics that subscribers could express in subscription filter policies were limited to metadata in message attributes. As a result, subscribers could not benefit from message filtering when the messages were published without attributes. Examples of such messages include AWS events published to SNS from 60+ other AWS services, like Amazon Simple Storage Service (S3), Amazon CloudWatch, and Amazon CloudFront. For more information, see Amazon SNS Event Sources.

The new payload-based message filtering option in SNS empowers subscribers to express their SNS subscription filter policies in terms of the contents of the message. This new capability further enables you to use SNS message filtering for your event-driven architectures (EDA) and cross-account workloads, specifically where subscribers may not be able to influence a given publisher to have its events sent with attributes. With payload-based message filtering, you have a simple, no-code option to further prevent unwanted data from being delivered to and processed by subscriber systems, thereby simplifying the subscribers’ code as well as reducing costs associated with downstream compute infrastructure. This new message filtering option is available across SNS Standard and SNS FIFO topics, for JSON message payloads.

Applying payload-based filtering in a use case

Consider an insurance company moving their lead generation platform to a serverless architecture based on microservices, adopting enterprise integration patterns to help them develop and scale these microservices independently. The company offers a variety of insurance types to its customers, including auto and home insurance. The lead generation and processing workflow for each insurance type is different, and entails notifying different backend microservices, each designed to handle a specific type of insurance request.

Payload filtering example

The company uses multiple frontend apps to interact with customers and receive leads from them, including a web app, a mobile app, and a call center app. These apps submit the customer-generated leads to an internal lead storage microservice, which then uploads the leads as XML documents to an S3 bucket. Next, the S3 bucket publishes events to an SNS topic to notify that lead documents have been created. Based on the contents of each lead document, the SNS topic forks the workflow by delivering the auto insurance leads to an SQS queue and the home insurance leads to another SQS queue. These SQS queues are respectively polled by the auto insurance and the home insurance lead processing microservices. Each processing microservice applies its business logic to validate the incoming leads.

The following S3 event, in JSON format, refers to a lead document uploaded with key auto-insurance-2314.xml to the S3 bucket. S3 automatically publishes this event to SNS, which in turn matches the S3 event payload against the filter policy of each subscription in the SNS topic. If the event matches the subscription filter policy, SNS delivers the event to the subscribed SQS queue. Otherwise, SNS filters the event out.

{
  "Records": [{
    "eventVersion": "2.1",
    "eventSource": "aws:s3",
    "awsRegion": "sa-east-1",
    "eventTime": "2022-11-21T03:41:29.743Z",
    "eventName": "ObjectCreated:Put",
    "userIdentity": {
      "principalId": "AWS:AROAJ7PQSU42LKEHOQNIC:demo-user"
    },
    "requestParameters": {
      "sourceIPAddress": "177.72.241.11"
    },
    "responseElements": {
      "x-amz-request-id": "SQCC55WT60XABW8CF",
      "x-amz-id-2": "FRaO+XDBrXtx0VGU1eb5QaIXH26tlpynsgaoJrtGYAWYRhfVMtq/...dKZ4"
    },
    "s3": {
      "s3SchemaVersion": "1.0",
      "configurationId": "insurance-lead-created",
      "bucket": {
        "name": "insurance-bucket-demo",
        "ownerIdentity": {
          "principalId": "A1ATLOAF34GO2I"
        },
        "arn": "arn:aws:s3:::insurance-bucket-demo"
      },
      "object": {
        "key": "auto-insurance-2314.xml",
        "size": 17,
        "eTag": "1530accf30cab891d759fa3bb8322211",
        "sequencer": "00737AF379B2683D6C"
      }
    }
  }]
}

To express its interest in auto insurance leads only, the SNS subscription for the auto insurance lead processing microservice sets the following filter policy. Note that, unlike attribute-based policies, payload-based policies support property nesting.

{
  "Records": {
    "s3": {
      "object": {
        "key": [{
          "prefix": "auto-"
        }]
      }
    },
    "eventName": [{
      "prefix": "ObjectCreated:"
    }]
  }
}

Likewise, to express its interest in home insurance leads only, the SNS subscription for the home insurance lead processing microservice sets the following filter policy.

{
  "Records": {
    "s3": {
      "object": {
        "key": [{
          "prefix": "home-"
        }]
      }
    },
    "eventName": [{
      "prefix": "ObjectCreated:"
    }]
  }
}

Note that each filter policy uses the string prefix matching capability of SNS message filtering. In this use case, this matching capability enables the filter policy to match only the S3 objects whose key property value starts with the insurance type it’s interested in (either auto- or home-). Note as well that each filter policy matches only the S3 events whose eventName property value starts with ObjectCreated, as opposed to ObjectRemoved. For more information, see Amazon S3 Event Notifications.
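If you want to apply such a filter policy programmatically, outside of the SAM template shown in the next section, you can set it through the SNS Subscribe API. The following Python sketch is illustrative only; the topic and queue ARNs are hypothetical placeholders.

import json
import boto3

sns = boto3.client("sns")

# Hypothetical ARNs for illustration
topic_arn = "arn:aws:sns:sa-east-1:123456789012:insurance-events-topic"
queue_arn = "arn:aws:sqs:sa-east-1:123456789012:auto-insurance-events-queue"

# The same auto insurance filter policy shown above, scoped to the message body
filter_policy = {
    "Records": {
        "s3": {"object": {"key": [{"prefix": "auto-"}]}},
        "eventName": [{"prefix": "ObjectCreated:"}]
    }
}

sns.subscribe(
    TopicArn=topic_arn,
    Protocol="sqs",
    Endpoint=queue_arn,
    Attributes={
        "FilterPolicy": json.dumps(filter_policy),
        "FilterPolicyScope": "MessageBody"
    }
)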

Deploying the resources and filter policies

To deploy the AWS resources for this use case, you need an AWS account with permissions to use SNS, SQS, and S3. On your development machine, install the AWS Serverless Application Model (SAM) Command Line Interface (CLI). You can find the complete SAM template for this use case in the aws-sns-samples repository in GitHub.

The SAM template has a set of resource definitions, as presented below. The first resource definition creates the SNS topic that receives events from S3.

InsuranceEventsTopic:
    Type: AWS::SNS::Topic
    Properties:
      TopicName: insurance-events-topic

The next resource definition creates the S3 bucket where the insurance lead documents are stored. This S3 bucket publishes an event to the SNS topic whenever a new lead document is created.

InsuranceEventsBucket:
    Type: AWS::S3::Bucket
    DeletionPolicy: Retain
    DependsOn: InsuranceEventsTopicPolicy
    Properties:
      BucketName: insurance-doc-events
      NotificationConfiguration:
        TopicConfigurations:
          - Topic: !Ref InsuranceEventsTopic
            Event: 's3:ObjectCreated:*'

The next resource definitions create the SQS queues to be subscribed to the SNS topic. As presented in the architecture diagram, there’s one queue for auto insurance leads, and another queue for home insurance leads.

AutoInsuranceEventsQueue:
    Type: AWS::SQS::Queue
    Properties:
      QueueName: auto-insurance-events-queue
      
HomeInsuranceEventsQueue:
    Type: AWS::SQS::Queue
    Properties:
      QueueName: home-insurance-events-queue

The next resource definitions create the SNS subscriptions and their respective filter policies. Note that, in addition to setting the FilterPolicy property, you need to set the FilterPolicyScope property to MessageBody in order to enable the new payload-based message filtering option for each subscription. The default value for the FilterPolicyScope property is MessageAttributes.

AutoInsuranceEventsSubscription:
    Type: AWS::SNS::Subscription
    Properties:
      Protocol: sqs
      Endpoint: !GetAtt AutoInsuranceEventsQueue.Arn
      TopicArn: !Ref InsuranceEventsTopic
      FilterPolicyScope: MessageBody
      FilterPolicy:
        '{"Records":{"s3":{"object":{"key":[{"prefix":"auto-"}]}}
        ,"eventName":[{"prefix":"ObjectCreated:"}]}}'
  
HomeInsuranceEventsSubscription:
    Type: AWS::SNS::Subscription
    Properties:
      Protocol: sqs
      Endpoint: !GetAtt HomeInsuranceEventsQueue.Arn
      TopicArn: !Ref InsuranceEventsTopic
      FilterPolicyScope: MessageBody
      FilterPolicy:
        '{"Records":{"s3":{"object":{"key":[{"prefix":"home-"}]}}
        ,"eventName":[{"prefix":"ObjectCreated:"}]}}'

Once you download the full SAM template from GitHub to your local development machine, run the following command in your terminal to build the deployment artifacts.

sam build -t SNS-Payload-Based-Filtering-SAM.template

Once SAM has finished building the deployment artifacts, run the following command to deploy the AWS resources and the SNS filter policies. The command guides you through the process of setting deployment preferences, which you can answer based on your requirements. For more information, refer to the SAM Developer Guide.

sam deploy --guided

Once SAM has finished deploying the resources, you can start testing the solution in the AWS Management Console.

Testing the filter policies

Go to the AWS CloudFormation console, choose the stack created by the SAM template, then choose the Outputs tab. Note the name of the S3 bucket created.

S3 bucket name

Now switch to the S3 console, and choose the bucket with the corresponding name. Once on the bucket details page, upload a test file whose name starts with the auto- prefix. For example, you can name your test file auto-insurance-7156.xml. The upload triggers an S3 event, typed as ObjectCreated, which is then routed through the SNS topic to the SQS queue that stores auto insurance leads.

Insurance bucket contents

Now switch to the SQS console, and poll for messages on the SQS queue that stores auto insurance leads. Note that the SQS queue for home insurance leads remains empty.

SQS home insurance queue empty

If you want to check the filter policy configured, you may switch to the SNS console, choose the SNS topic created by the SAM template, and choose the SNS subscription for auto insurance leads. Once on the subscription details page, you can view the filter policy, in JSON format, alongside the filter policy scope set to “Message body”.

SNS filter policy

You may repeat the testing steps above, now with another file whose name starts with the home- prefix, and see how the S3 event is routed through the SNS topic to the SQS queue that stores home insurance leads.
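You can also run this test programmatically. The following Python sketch, with hypothetical bucket and queue names standing in for your stack outputs, uploads a home- prefixed lead document and polls the home insurance queue for the resulting event.

import boto3

s3 = boto3.client("s3")
sqs = boto3.client("sqs")

# Hypothetical names; use the bucket and queue names from the stack outputs
bucket_name = "insurance-bucket-demo"
queue_url = sqs.get_queue_url(QueueName="home-insurance-events-queue")["QueueUrl"]

# Uploading a home- prefixed object triggers an ObjectCreated:Put event
s3.put_object(Bucket=bucket_name, Key="home-insurance-8262.xml", Body=b"<lead/>")

# The S3 event should be routed only to the home insurance queue
response = sqs.receive_message(QueueUrl=queue_url, WaitTimeSeconds=10)
for message in response.get("Messages", []):
    print(message["Body"])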

Monitoring the filtering activity

CloudWatch provides visibility into your SNS message filtering activity with dedicated metrics, which also enable you to create alarms. You can use the NumberOfNotificationsFilteredOut-MessageBody metric to monitor the number of messages filtered out due to payload-based filtering, as opposed to attribute-based filtering. For more information, see Monitoring Amazon SNS topics using CloudWatch.

Moreover, you can use the NumberOfNotificationsFilteredOut-InvalidMessageBody metric to monitor the number of messages filtered out due to having malformed JSON payloads. You can have these messages with malformed JSON payloads moved to a dead-letter queue (DLQ) for troubleshooting purposes. For more information, see Designing Durable Serverless Applications with DLQ for Amazon SNS.
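As an illustration, the following Python sketch retrieves the payload-based filtering metric for the last hour. The topic name is a hypothetical placeholder; AWS/SNS metrics are dimensioned by TopicName.

from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client("cloudwatch")

response = cloudwatch.get_metric_statistics(
    Namespace="AWS/SNS",
    MetricName="NumberOfNotificationsFilteredOut-MessageBody",
    Dimensions=[{"Name": "TopicName", "Value": "insurance-events-topic"}],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Sum"],
)
for datapoint in sorted(response["Datapoints"], key=lambda d: d["Timestamp"]):
    print(datapoint["Timestamp"], datapoint["Sum"])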

Cleaning up

To delete all the AWS resources that you created as part of this use case, run the following command from the project root directory.

sam delete

Conclusion

In this blog post, we introduced payload-based message filtering for SNS, which provides event routing for JSON-formatted messages. This enables you to write filter policies based on the contents of the messages published to SNS. It also removes the message parsing overhead from your subscriber systems, as well as any custom logic from your publisher systems to move message properties from the payload to the set of attributes. Lastly, payload-based filtering can facilitate your event-driven architectures (EDA) by enabling you to filter events published to SNS from 60+ other AWS event sources.

For more information, see Amazon SNS Message Filtering, Amazon SNS Event Sources, and Amazon SNS Pricing. For more serverless learning resources, visit Serverless Land.

Introducing container, database, and queue utilization metrics for the Amazon MWAA environment

Post Syndicated from David Boyne original https://aws.amazon.com/blogs/compute/introducing-container-database-and-queue-utilization-metrics-for-the-amazon-mwaa-environment/

This post is written by Uma Ramadoss (Senior Specialist Solutions Architect), and Jeetendra Vaidya (Senior Solutions Architect).

Today, AWS is announcing the availability of container, database, and queue utilization metrics for Amazon Managed Workflows for Apache Airflow (Amazon MWAA). This is a new collection of metrics published by Amazon MWAA in addition to the existing Apache Airflow metrics in Amazon CloudWatch. With these new metrics, you can better understand the performance of your Amazon MWAA environment, troubleshoot issues related to capacity and delays, and get insights on right-sizing your Amazon MWAA environment.

Previously, customers were limited to Apache Airflow metrics such as DAG processing parse times, pool running slots, and scheduler heartbeat to measure the performance of the Amazon MWAA environment. While these metrics are often effective in diagnosing Airflow behavior, they lack the ability to provide complete visibility into the utilization of the various Apache Airflow components in the Amazon MWAA environment. This could limit the ability for some customers to monitor the performance and health of the environment effectively.

Overview

Amazon MWAA is a managed service for Apache Airflow. There are a variety of deployment techniques with Apache Airflow. The Amazon MWAA deployment architecture of Apache Airflow is carefully chosen to allow customers to run workflows in production at scale.

Amazon MWAA has a distributed architecture with multiple schedulers, auto-scaled workers, and a load-balanced web server. These components are deployed in their own Amazon Elastic Container Service (ECS) clusters using the AWS Fargate compute engine. An Amazon Simple Queue Service (SQS) queue is used to decouple Airflow workers and schedulers as part of the Celery Executor architecture. Amazon Aurora PostgreSQL-Compatible Edition serves as the Apache Airflow metadata database. From today, you can get complete visibility into the scheduler, worker, web server, database, and queue metrics.

In this post, you can learn about the new metrics published for the Amazon MWAA environment, build a sample application with a pre-built workflow, and explore the metrics using a CloudWatch dashboard.

Container, database, and queue utilization metrics

  1. In the CloudWatch console, in Metrics, select All metrics.
  2. From the metrics console that appears on the right, expand AWS namespaces and select the MWAA tile.

    MWAA metrics

  3. You can see a tile of dimensions, each corresponding to the container (cluster), database, and queue metrics.

    MWAA metrics drilldown

Cluster metrics

The base Amazon MWAA environment comes with three Amazon ECS clusters – a scheduler, one worker (BaseWorker), and a web server. Workers can be configured with minimum and maximum counts. When you configure a minimum of more than one worker, Amazon MWAA creates another ECS cluster (AdditionalWorker) to host workers 2 through n, where n is the maximum number of workers configured in your environment.

When you select Cluster from the console, you can see the list of metrics for all the clusters. To learn more about the metrics, visit the Amazon ECS product documentation.

MWAA metrics list

CPU usage is the most important factor for schedulers due to DAG file processing. When you have many DAGs, CPU usage can be higher. You can improve the performance by setting min_file_process_interval higher. Similarly, you can apply other techniques described in the Apache Airflow Scheduler page to fine tune the performance.
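For example, you can raise this setting through the environment's Airflow configuration options. The following Python sketch uses the Amazon MWAA UpdateEnvironment API; the environment name and the 60-second interval are assumptions for illustration.

import boto3

mwaa = boto3.client("mwaa")

# Hypothetical environment name; min_file_process_interval is in seconds
mwaa.update_environment(
    Name="my-mwaa-environment",
    AirflowConfigurationOptions={
        "scheduler.min_file_process_interval": "60"
    },
)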

Higher CPU or memory utilization in the worker can be due to moving large files or doing computation on the worker itself. This can be resolved by offloading the compute to purpose-built services such as Amazon ECS, Amazon EMR, and AWS Glue.

Database metrics

The Amazon Aurora DB clusters used by Amazon MWAA come with a primary DB instance and a read replica to support read operations. Amazon MWAA publishes database metrics for both READER and WRITER instances. When you select the Database tile, you can view the list of metrics available for the database cluster.

Database metrics

Amazon MWAA uses a connection pooling technique, so the database connections from the scheduler, workers, and web servers are taken from a connection pool. If you have many DAGs scheduled to start at the same time, this can overload the scheduler and increase the number of database connections at a high frequency. You can minimize this by staggering the DAG schedules.
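As an illustration of staggering, the following sketch offsets two hourly DAGs by five minutes so that they do not open database connections at the same instant. The DAG IDs are hypothetical, and EmptyOperator assumes Airflow 2.3 or later (older versions use DummyOperator).

from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator

# Staggered cron schedules: the second DAG starts five minutes after the first
with DAG(
    dag_id="hourly_report_a",
    start_date=datetime(2022, 1, 1),
    schedule_interval="0 * * * *",
    catchup=False,
):
    EmptyOperator(task_id="run")

with DAG(
    dag_id="hourly_report_b",
    start_date=datetime(2022, 1, 1),
    schedule_interval="5 * * * *",
    catchup=False,
):
    EmptyOperator(task_id="run")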

SQS metrics

An SQS queue helps decouple the scheduler and workers so they can scale independently. When workers read messages, the messages are considered in flight and are not available to other workers. Messages become available for other workers to read if they are not deleted before the 12-hour visibility timeout expires. Amazon MWAA publishes the in-flight message count (RunningTasks), the count of messages available for reading (QueuedTasks), and the approximate age of the earliest non-deleted message (ApproximateAgeOfOldestTask).

SQS metrics
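To explore these metrics programmatically, you can enumerate what your environment publishes. The following Python sketch lists metric names and dimensions; it assumes the new utilization metrics appear under the AWS/MWAA namespace, which you can verify in the console metric browser.

import boto3

cloudwatch = boto3.client("cloudwatch")

# Assumed namespace for the new utilization metrics; verify it in the console
paginator = cloudwatch.get_paginator("list_metrics")
for page in paginator.paginate(Namespace="AWS/MWAA"):
    for metric in page["Metrics"]:
        dimensions = {d["Name"]: d["Value"] for d in metric["Dimensions"]}
        print(metric["MetricName"], dimensions)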

Getting started with container, database and queue utilization metrics for Amazon MWAA

The following sample project explores some key metrics using an Amazon CloudWatch dashboard to help you find the number of workers running in your environment at any given moment.

The sample project deploys the following resources:

  • Amazon Virtual Private Cloud (Amazon VPC).
  • Amazon MWAA environment of size small with 2 minimum workers and 10 maximum workers.
  • A sample DAG that fetches NOAA Global Historical Climatology Network Daily (GHCN-D) data, uses an AWS Glue crawler to create tables, and runs an AWS Glue job to produce an output dataset in Apache Parquet format that contains the details of precipitation readings for the US between 2010 and 2022.
  • Amazon MWAA execution role.
  • Two Amazon S3 buckets – one for Amazon MWAA DAGs, one for AWS Glue job scripts and weather data.
  • AWSGlueServiceRole to be used by AWS Glue Crawler and AWS Glue job.

Prerequisites

There are a few tools required to deploy the sample application. Ensure that you have each of the following in your working environment:

  • An AWS account.
  • Git.
  • Make.
  • Terraform (the makefile applies a Terraform template from the project).

Setting up the Amazon MWAA environment and associated resources

  1. From your local machine, clone the project from the GitHub repository.

    git clone https://github.com/aws-samples/amazon-mwaa-examples

  2. Navigate to the mwaa_utilization_cw_metric directory.

    cd usecases/mwaa_utilization_cw_metric

  3. Run the makefile.

    make deploy

    The makefile runs the Terraform template from the infra/terraform directory. While the template is being applied, you are prompted to confirm that you want to perform these actions.

    MWAA utilization terminal

This provisions the resources and copies the necessary files and variables for the DAG to run. This process can take approximately 30 minutes to complete.

Generating metric data and exploring the metrics

  1. Log in to your AWS account through the AWS Management Console.
  2. In the Amazon MWAA environment console, you can see your environment with the Airflow UI link on the right of the console.

    MWAA environment console

  3. Select the link Open Airflow UI. This loads the Apache Airflow UI.

    Apache Airflow UI

  4. From the Apache Airflow UI, enable the DAG using the Pause/Unpause DAG toggle button, and run the DAG using the Trigger DAG link.
  5. You can see the Tree view of the DAG run with the tasks running.
  6. Navigate to the Amazon CloudWatch dashboard in another browser tab. You can see a dashboard named MWAA_Metric_Environment_env_health_metric_dashboard.
  7. Access the dashboard to view different key metrics across the cluster, database, and queue.

    MWAA dashboard

  8. After the DAG run is complete, you can look into the dashboard for worker count metrics. The worker count started at 2 and increased to 4.

When you trigger the DAG, it runs 13 tasks in parallel to fetch weather data from 2010-2022. With two small-size workers, the environment can run 10 parallel tasks. The remaining tasks wait for either the running tasks to complete or automatic scaling to start. As the tasks take more than a few minutes to finish, MWAA automatic scaling adds additional workers to handle the workload. The worker count graph now plots higher, with the AdditionalWorker count increasing from 1 to 3.

Cleanup

To delete the sample application infrastructure, use the following command from the usecases/mwaa_utilization_cw_metric directory.

make undeploy

Conclusion

This post introduces the new Amazon MWAA container, database, and queue utilization metrics. The example shows the key metrics and how you can use the metrics to solve a common question of finding the Amazon MWAA worker counts. These metrics are available to you from today for all versions supported by Amazon MWAA at no additional cost.

Start using this feature in your account to monitor the health and performance of your Amazon MWAA environment, troubleshoot issues related to capacity and delays, and get insights into right-sizing the environment.

Build your own CloudWatch dashboard using the metrics data JSON and Airflow metrics. To deploy more solutions in Amazon MWAA, explore the Amazon MWAA samples GitHub repo.

For more serverless learning resources, visit Serverless Land.

Apache, Apache Airflow, and Airflow are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.

Using the AWS Parameter and Secrets Lambda extension to cache parameters and secrets

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/using-the-aws-parameter-and-secrets-lambda-extension-to-cache-parameters-and-secrets/

This post is written by Pal Patel, Solutions Architect, and Saud ul Khalid, Sr. Cloud Support Engineer.

Serverless applications often rely on AWS Systems Manager Parameter Store or AWS Secrets Manager to store configuration data, encrypted passwords, or connection details for a database or API service.

Previously, you had to make runtime API calls to AWS Parameter Store or AWS Secrets Manager every time you wanted to retrieve a parameter or a secret inside the execution environment of an AWS Lambda function. This involved configuring and initializing the AWS SDK client and managing when to store values in memory to optimize the function duration, and avoid unnecessary latency and cost.

The new AWS Parameters and Secrets Lambda extension provides a managed parameters and secrets cache for Lambda functions. The extension is distributed as a Lambda layer that provides an in-memory cache for parameters and secrets. It allows functions to persist values through the Lambda execution lifecycle, and provides a configurable time-to-live (TTL) setting.

When you request a parameter or secret in your Lambda function code, the extension retrieves the data from the local in-memory cache, if it is available. If the data is not in the cache or it is stale, the extension fetches the requested parameter or secret from the respective service. This helps to reduce external API calls, which can improve application performance and reduce cost. This blog post shows how to use the extension.

Overview

The following diagram provides a high-level view of the components involved.

High-level architecture showing how parameters or secrets are retrieved when using the Lambda extension

The extension can be added to new or existing Lambda functions. It works by exposing a local HTTP endpoint to the Lambda environment, which provides the in-memory cache for parameters and secrets. When retrieving a parameter or secret, the extension first queries the cache for a relevant entry. If an entry exists, the query checks how much time has elapsed since the entry was first put into the cache, and returns the entry if the elapsed time is less than the configured cache TTL. If the entry is stale, it is invalidated, and fresh data is retrieved from either Parameter Store or Secrets Manager.

The extension uses the same Lambda IAM execution role permissions to access Parameter Store and Secrets Manager, so you must ensure that the IAM policy is configured with the appropriate access. Permissions may also be required for AWS Key Management Service (AWS KMS) if you are using this service. You can find an example policy in the example’s AWS SAM template.

Example walkthrough

Consider a basic serverless application with a Lambda function connecting to an Amazon Relational Database Service (Amazon RDS) database. The application loads a configuration stored in Parameter Store and connects to the database. The database connection string (including user name and password) is stored in Secrets Manager.

This example walkthrough is composed of:

  • A Lambda function.
  • An Amazon Virtual Private Cloud (VPC).
  • A Multi-AZ Amazon RDS instance running MySQL.
  • An AWS Secrets Manager database secret that holds the database connection details.
  • An AWS Systems Manager Parameter Store parameter that holds the application configuration.
  • An AWS Identity and Access Management (IAM) role that the Lambda function uses.

Lambda function

This Python code shows how to retrieve the secrets and parameters using the extension:

import pymysql
import urllib3
import os
import json

### Load in Lambda environment variables
port = os.environ['PARAMETERS_SECRETS_EXTENSION_HTTP_PORT']
aws_session_token = os.environ['AWS_SESSION_TOKEN']
env = os.environ['ENV']
app_config_path = os.environ['APP_CONFIG_PATH']
creds_path = os.environ['CREDS_PATH']
full_config_path = '/' + env + '/' + app_config_path

### Define function to retrieve values from the extension's local HTTP server cache
def retrieve_extension_value(url):
    http = urllib3.PoolManager()
    url = ('http://localhost:' + port + url)
    # The session token header is required; the extension uses it to authenticate the caller
    headers = { "X-Aws-Parameters-Secrets-Token": aws_session_token }
    response = http.request("GET", url, headers=headers)
    response = json.loads(response.data)
    return response

def lambda_handler(event, context):
       
    ### Load Parameter Store values from extension
    print("Loading AWS Systems Manager Parameter Store values from " + full_config_path)
    parameter_url = ('/systemsmanager/parameters/get/?name=' + full_config_path)
    config_values = retrieve_extension_value(parameter_url)['Parameter']['Value']
    print("Found config values: " + json.dumps(config_values))

    ### Load Secrets Manager values from extension
    print("Loading AWS Secrets Manager values from " + creds_path)
    secrets_url = ('/secretsmanager/get?secretId=' + creds_path)
    secret_string = json.loads(retrieve_extension_value(secrets_url)['SecretString'])
    #print("Found secret values: " + json.dumps(secret_string))

    rds_host =  secret_string['host']
    rds_db_name = secret_string['dbname']
    rds_username = secret_string['username']
    rds_password = secret_string['password']
    
    
    ### Connect to RDS MySQL database
    try:
        conn = pymysql.connect(host=rds_host, user=rds_username, passwd=rds_password, db=rds_db_name, connect_timeout=5)
    except pymysql.MySQLError:
        raise Exception("An error occurred when connecting to the database!")

    return "DemoApp sucessfully loaded config " + config_values + " and connected to RDS database " + rds_db_name + "!"

In the global scope the environment variable PARAMETERS_SECRETS_EXTENSION_HTTP_PORT is retrieved, which defines the port the extension HTTP server is running on. This defaults to 2773.

The retrieve_extension_value function calls the extension’s local HTTP server, passing in the X-Aws-Parameters-Secrets-Token as a header. This is a required header that uses the AWS_SESSION_TOKEN value, which is present in the Lambda execution environment by default.

The Lambda handler code uses the extension cache on every Lambda invoke to obtain configuration data from Parameter Store and secret data from Secrets Manager. This data is used to make a connection to the RDS MySQL database.

Prerequisites

  1. Git installed.
  2. AWS SAM CLI version 1.58.0 or greater.

Deploying the resources

  1. Clone the repository and navigate to the solution directory:

    git clone https://github.com/aws-samples/parameters-secrets-lambda-extension-sample.git

  2. Build and deploy the application using the following commands:
    sam build
    sam deploy --guided

This template takes the following parameters:

  • pVpcCIDR — IP range (CIDR notation) for the VPC. The default is 172.31.0.0/16.
  • pPublicSubnetCIDR — IP range (CIDR notation) for the public subnet. The default is 172.31.3.0/24.
  • pPrivateSubnetACIDR — IP range (CIDR notation) for private subnet A. The default is 172.31.2.0/24.
  • pPrivateSubnetBCIDR — IP range (CIDR notation) for private subnet B. The default is 172.31.1.0/24.
  • pDatabaseName — Database name for the DEV environment. The default is devDB.
  • pDatabaseUsername — Database user name for the DEV environment. The default is myadmin.
  • pDBEngineVersion — The version number of the SQL database engine to use. The default is 5.7.

Adding the Parameter Store and Secrets Manager Lambda extension

To add the extension:

  1. Navigate to the Lambda console, and open the Lambda function you created.
  2. In the Function Overview pane, select Layers, and then select Add a layer.
  3. In the Choose a layer pane, keep the default selection of AWS layers, and in the dropdown choose AWS Parameters and Secrets Lambda Extension.
  4. Select the latest version available and choose Add.

The extension supports several configurable options that can be set up as Lambda environment variables.

This example explicitly sets an extension port and TTL value:

Lambda environment variables from the Lambda console
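As a sketch, the equivalent configuration can be applied with the UpdateFunctionConfiguration API. The function name is hypothetical, and SSM_PARAMETER_STORE_TTL is our assumption for the Parameter Store TTL variable name; confirm the exact variable names in the extension documentation. Note that this call replaces the function's entire environment variable map.

import boto3

lambda_client = boto3.client("lambda")

# Replaces the whole environment variable map, so include every variable you need
lambda_client.update_function_configuration(
    FunctionName="my-extension-demo-function",  # hypothetical name
    Environment={
        "Variables": {
            "PARAMETERS_SECRETS_EXTENSION_HTTP_PORT": "2773",
            "SSM_PARAMETER_STORE_TTL": "300",  # assumed variable name; TTL in seconds
        }
    },
)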

Testing the example application

To test:

  1. Navigate to the function created in the Lambda console and select the Test tab.
  2. Give the test event a name, keep the default values and then choose Create.
  3. Choose Test. The function runs successfully:

Lambda execution results visible from Lambda console after successful invocation.

To evaluate the performance benefits of the Lambda extension cache, three tests were run using the open-source tool Artillery to load test the Lambda function, invoking it through its Lambda function URL. The following Artillery configuration snippet shows the duration and requests per second for the test:

config:
  target: "https://lambda.us-east-1.amazonaws.com"
  phases:
    -
      duration: 60
      arrivalRate: 10
      rampTo: 40

scenarios:
  -
    flow:
      -
        post:
          url: "https://abcdefghijjklmnopqrst.lambda-url.us-east-1.on.aws/"

  • Test 1: The extension cache is disabled by setting the TTL environment variable to 0. This results in 1650 GetParameter API calls to Parameter Store over 60 seconds.
  • Test 2: The extension cache is enabled with a TTL of 1 second. This results in 106 GetParameter API calls over 60 seconds.
  • Test 3: The extension is enabled with a TTL value of 300 seconds. This results in only 18 GetParameter API calls over 60 seconds.

In test 3, the TTL value is longer than the test duration. The 18 GetParameter calls correspond to the number of Lambda execution environments created by Lambda to run requests in parallel. Each execution environment has its own in-memory cache and so each one needs to make the GetParameter API call.

In this test, using the extension reduced API calls by ~98%. Fewer API calls result in reduced function execution time, and therefore reduced cost.

Cleanup

After you test this example, delete the resources created by the template using the following command from the same project directory, to avoid continuing charges to your account.

sam delete

Conclusion

Caching data retrieved from external services is an effective way to improve the performance of your Lambda function and reduce costs. Implementing a caching layer has been made simpler with this AWS-managed Lambda extension.

For more information on Parameter Store, Secrets Manager, and Lambda extensions, refer to their documentation.

For more serverless learning resources, visit Serverless Land.

Introducing cross-account access capabilities for AWS Step Functions

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/introducing-cross-account-access-capabilities-for-aws-step-functions/

This post is written by Siarhei Kazhura, Senior Solutions Architect, Serverless.

AWS Step Functions allows you to integrate with more than 220 AWS services by using optimized integrations (for services such as AWS Lambda), and AWS SDK integrations. These capabilities provide the ability to build robust solutions using AWS Step Functions as the engine behind the solution.

Many customers are using multiple AWS accounts for application development. Until today, customers had to rely on resource-based policies to make cross-account access for Step Functions possible. With resource-based policies, you can specify who has access to the resource and what actions they can perform on it.

Not all AWS services support resource-based policies. For example, it is possible to enable cross-account access via resource-based policies with services like AWS Lambda, Amazon SQS, or Amazon SNS. However, services such as Amazon DynamoDB do not support resource-based policies, so your workflows can only use Step Functions' direct integration with them for resources in the same account.

Now, customers can take advantage of identity-based policies in Step Functions so your workflow can directly invoke resources in other AWS accounts, thus allowing cross-account service API integrations.

Overview

This example demonstrates how to use cross-account capability using two AWS accounts:

  • A trusted AWS account (account ID 111111111111) with a Step Functions workflow named SecretCacheConsumerWfw, and an IAM role named TrustedAccountRl.
  • A trusting AWS account (account ID 222222222222) with a Step Functions workflow named SecretCacheWfw, and two IAM roles named TrustingAccountRl, and SecretCacheWfwRl.

AWS Step Functions cross-account workflow example

At a high level:

  1. The SecretCacheConsumerWfw workflow runs under TrustedAccountRl role in the account 111111111111. The TrustedAccountRl role has permissions to assume the TrustingAccountRl role from the account 222222222222.
  2. The FetchConfiguration Step Functions task fetches the TrustingAccountRl role ARN, the SecretCacheWfw workflow ARN, and the secret ARN (all these resources belong to the Trusting AWS account).
  3. The GetSecretCrossAccount Step Functions task has a Credentials field with the TrustingAccountRl role ARN specified (fetched in the step 2).
  4. The GetSecretCrossAccount task assumes the TrustingAccountRl role during the SecretCacheConsumerWfw workflow execution.
  5. The SecretCacheWfw workflow (that belongs to the account 222222222222) is invoked by the SecretCacheConsumerWfw workflow under the TrustingAccountRl role.
  6. The results are returned to the SecretCacheConsumerWfw workflow that belongs to the account 111111111111.

The SecretCacheConsumerWfw workflow definition specifies the Credentials field and the RoleArn. This allows the GetSecretCrossAccount step to assume an IAM role that belongs to a separate AWS account:

{
  "StartAt": "FetchConfiguration",
  "States": {
    "FetchConfiguration": {
      "Type": "Task",
      "Next": "GetSecretCrossAccount",
      "Parameters": {
        "Name": "<ConfigurationParameterName>"
      },
      "Resource": "arn:aws:states:::aws-sdk:ssm:getParameter",
      "ResultPath": "$.Configuration",
      "ResultSelector": {
        "Params.$": "States.StringToJson($.Parameter.Value)"
      }
    },
    "GetSecretCrossAccount": {
      "End": true,
      "Type": "Task",
      "ResultSelector": {
        "Secret.$": "States.StringToJson($.Output)"
      },
      "Resource": "arn:aws:states:::aws-sdk:sfn:startSyncExecution",
      "Credentials": {
        "RoleArn.$": "$.Configuration.Params.trustingAccountRoleArn"
      },
      "Parameters": {
        "Input.$": "$.Configuration.Params.secret",
        "StateMachineArn.$": "$.Configuration.Params.trustingAccountWorkflowArn"
      }
    }
  }
}

Permissions

AWS Step Functions cross-account permissions setup example

At a high level:

  1. The TrustedAccountRl role belongs to the account 111111111111.
  2. The TrustingAccountRl role belongs to the account 222222222222.
  3. A trust relationship is set up between the TrustedAccountRl and the TrustingAccountRl roles.
  4. The SecretCacheConsumerWfw workflow is executed under the TrustedAccountRl role in the account 111111111111.
  5. The SecretCacheWfw is executed under the SecretCacheWfwRl role in the account 222222222222.

The TrustedAccountRl role (1) has the following trust policy setup that allows the SecretCacheConsumerWfw workflow to assume (4) the role.

{
  "RoleName": "<TRUSTED_ACCOUNT_ROLE_NAME>",
  "AssumeRolePolicyDocument": {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Principal": {
          "Service": "states.<REGION>.amazonaws.com"
        },
        "Action": "sts:AssumeRole"
      }
    ]
  }
}

The TrustedAccountRl role (1) has the following permissions configured that allow it to assume (3) the TrustingAccountRl role (2).

{
  "RoleName": "<TRUSTED_ACCOUNT_ROLE_NAME>",
  "PolicyDocument": {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Action": "sts:AssumeRole",
        "Resource":  "arn:aws:iam::<TRUSTING_ACCOUNT>:role/<TRUSTING_ACCOUNT_ROLE_NAME>",
        "Effect": "Allow"
      }
    ]
  }
}

The TrustedAccountRl role (1) has the following permissions setup that allow it to access Parameter Store, a capability of AWS Systems Manager, and fetch the required configuration.

{
  "RoleName": "<TRUSTED_ACCOUNT_ROLE_NAME>",
  "PolicyDocument": {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Action": [
          "ssm:DescribeParameters",
          "ssm:GetParameter",
          "ssm:GetParameterHistory",
          "ssm:GetParameters"
        ],
        "Resource": "arn:aws:ssm:<REGION>:<TRUSTED_ACCOUNT>:parameter/<CONFIGURATION_PARAM_NAME>",
        "Effect": "Allow"
      }
    ]
  }
}

The TrustingAccountRl role (2) has the following trust policy that allows it to be assumed (3) by the TrustedAccountRl role (1). Notice the Condition field setup. This field allows us to further control which account and state machine can assume the TrustingAccountRl role, preventing the confused deputy problem.

{
  "RoleName": "<TRUSTING_ACCOUNT_ROLE_NAME>",
  "AssumeRolePolicyDocument": {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Principal": {
          "AWS": "arn:aws:iam::<TRUSTED_ACCOUNT>:role/<TRUSTED_ACCOUNT_ROLE_NAME>"
        },
        "Action": "sts:AssumeRole",
        "Condition": {
          "StringEquals": {
            "sts:ExternalId": "arn:aws:states:<REGION>:<TRUSTED_ACCOUNT>:stateMachine:<CACHE_CONSUMER_WORKFLOW_NAME>"
          }
        }
      }
    ]
  }
}

The TrustingAccountRl role (2) has the following permissions configured that allow it to start Step Functions Express Workflows execution synchronously. This capability is needed because the SecretCacheWfw workflow is invoked by the SecretCacheConsumerWfw workflow under the TrustingAccountRl role via a StartSyncExecution API call.

{
  "RoleName": "<TRUSTING_ACCOUNT_ROLE_NAME>",
  "PolicyDocument": {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Action": "states:StartSyncExecution",
        "Resource": "arn:aws:states:<REGION>:<TRUSTING_ACCOUNT>:stateMachine:<SECRET_CACHE_WORKFLOW_NAME>",
        "Effect": "Allow"
      }
    ]
  }
}
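The Credentials field makes Step Functions perform this role assumption for you at execution time. As a rough sketch of the equivalent manual pattern, the following Python code assumes the trusting role, passing the consumer state machine ARN as the external ID to satisfy the trust policy above, and then starts the Express Workflow synchronously. The Region, session name, and input are assumptions, and note that StartSyncExecution is served from a dedicated sync-states endpoint, so your SDK setup may need to account for that.

import boto3

sts = boto3.client("sts")

# Assume the trusting account's role; the external ID must match the trust policy condition
assumed = sts.assume_role(
    RoleArn="arn:aws:iam::222222222222:role/TrustingAccountRl",
    RoleSessionName="cross-account-demo",
    ExternalId="arn:aws:states:us-east-1:111111111111:stateMachine:SecretCacheConsumerWfw",
)

credentials = assumed["Credentials"]
sfn = boto3.client(
    "stepfunctions",
    aws_access_key_id=credentials["AccessKeyId"],
    aws_secret_access_key=credentials["SecretAccessKey"],
    aws_session_token=credentials["SessionToken"],
)

# Start the Express Workflow in the trusting account and wait for its result
response = sfn.start_sync_execution(
    stateMachineArn="arn:aws:states:us-east-1:222222222222:stateMachine:SecretCacheWfw",
    input='{"secretId": "my-secret"}',
)
print(response["output"])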

The SecretCacheWfw workflow runs under a separate identity – the SecretCacheWfwRl role. This role has permissions that allow it to get secrets from AWS Secrets Manager, read from the DynamoDB table, and invoke Lambda functions.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "secretsmanager:getSecretValue",
            ],
            "Resource": "arn:aws:secretsmanager:<REGION>:<TRUSTING_ACCOUNT>:secret:*",
            "Effect": "Allow"
        },
        {
            "Action": "dynamodb:GetItem",
            "Resource": "arn:aws:dynamodb:<REGION>:<TRUSTING_ACCOUNT>:table/<SECRET_CACHE_DDB_TABLE_NAME>",
            "Effect": "Allow"
        },
        {
            "Action": "lambda:InvokeFunction",
            "Resource": [
"arn:aws:lambda:<REGION>:<TRUSTING_ACCOUNT>:function:<CACHE_SECRET_FUNCTION_NAME>",
"arn:aws:lambda:<REGION>:<TRUSTING_ACCOUNT>:function:<CACHE_SECRET_FUNCTION_NAME>:*"
            ],
            "Effect": "Allow"
        }
    ]
}

Comparing with resource-based policies

To implement the solution above using resource-based policies, you must front the SecretCacheWfw with a resource that supports resource-based policies. You can use Lambda for this purpose. A Lambda function has a resource-based permissions policy that allows access by the SecretCacheConsumerWfw workflow.

The function proxies the call to the SecretCacheWfw, waits for the workflow to finish (synchronous call), and yields the result back to the SecretCacheConsumerWfw. However, this approach has a few disadvantages:

  • Extra cost: With Lambda you are charged based on the number of requests for your function, and the duration it takes for your code to run.
  • Additional code to maintain: The code must take the payload from the SecretCacheConsumerWfw workflow and pass it to the SecretCacheWfw workflow.
  • No out-of-the-box error handling: The code must handle errors correctly, retry the request in case of a transient error, provide the ability to do exponential backoff, and provide a circuit breaker in case of persistent errors. Error handling capabilities are provided natively by Step Functions.

AWS Step Functions cross-account setup using resource-based policies

The identity-based policy permission solution provides multiple advantages over the resource-based policy permission solution in this case.

However, resource-based policy permissions provide some advantages and can be used in conjunction with identity-based policies. Identity-based policies and resource-based policies are both permissions policies and are evaluated together:

  • Single point of entry: Resource-based policies are attached to a resource. With resource-based permissions policies, you control what identities that do not belong to your AWS account have access to the resource at the resource level. This allows for easier reasoning about what identity has access to the resource. AWS Identity and Access Management Access Analyzer can help with the resource-based policies, providing an ability to identify resources that are shared with an external identity.
  • The principal that accesses a resource via a resource-based policy keeps working in the trusted account and does not have to exchange its own permissions for the cross-account role's permissions. In this example, SecretCacheConsumerWfw still runs under the TrustedAccountRl role, and does not need to assume an IAM role in the trusting AWS account to access the Lambda function.

Refer to the how IAM roles differ from resource-based policies article for more information.

Solution walkthrough

To follow the solution walkthrough, visit the solution repository. The walkthrough explains:

  1. Prerequisites required.
  2. Detailed solution deployment walkthrough.
  3. Solution testing.
  4. Cleanup process.
  5. Cost considerations.

Conclusion

This post demonstrates how to create a Step Functions Express Workflow in one account and call it from a Step Functions Standard Workflow in another account using a new credentials capability of AWS Step Functions. It provides an example of a cross-account IAM roles setup that allows for the access. It also provides a walk-through on how to use AWS CDK for TypeScript to deploy the example.

For more serverless learning resources, visit Serverless Land.

Node.js 18.x runtime now available in AWS Lambda

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/node-js-18-x-runtime-now-available-in-aws-lambda/

This post is written by Suraj Tripathi, Cloud Consultant, AppDev.

You can now develop AWS Lambda functions using the Node.js 18 runtime. This version is in active LTS status and considered ready for general use. When creating or updating functions, specify a runtime parameter value of nodejs18.x or use the appropriate container base image to use this new runtime.

This runtime version is supported by functions running on either Arm-based AWS Graviton2 processors or x86-based processors. Using the Graviton2 processor architecture option allows you to get up to 34% better price performance.

This blog post explains the major changes available with the Node.js 18 runtime in Lambda.

AWS SDK for JavaScript upgrade to v3

Lambda’s Node.js runtimes include the AWS SDK for JavaScript. This enables customers to use the AWS SDK to connect to other AWS services from their function code, without having to include the AWS SDK in their function deployment. This is especially useful when creating functions in the AWS Management Console. It’s also useful for Lambda functions deployed as inline code in CloudFormation templates.

Up until Node.js 16, Lambda’s Node.js runtimes have included the AWS SDK for JavaScript version 2. This has since been superseded by the AWS SDK for JavaScript version 3, which was released in December 2020. With this release, Lambda has upgraded the version of the AWS SDK for JavaScript included with the runtime from v2 to v3.

If your existing Lambda functions are using the included SDK v2, then you must update your function code to use the SDK v3 when upgrading to the Node.js 18 runtime. This is the recommended approach when upgrading existing functions to Node.js 18. Alternatively, you can use the Node.js 18 runtime without updating your existing code if you deploy the SDK v2 together with your function code.

Version 3 of the SDK for JavaScript offers many benefits over version 2. Most importantly, it is modular, so your code only loads the modules it needs. Modularity also reduces your function size if you choose to deploy the SDK with your function code rather than using the version built into the Lambda runtime. Learn more about optimizing Node.js dependencies in Lambda here.

For example, for a function interacting with Amazon S3 using the v2 SDK, you import the entire SDK, even though you don’t use most of it:

const AWS = require("aws-sdk");

With the v3 SDK, you only import the modules you need, such as ListBucketsCommand, and a service client like S3Client.

import { S3Client, ListBucketsCommand } from "@aws-sdk/client-s3";

Another difference between SDK v2 and SDK v3 is the default settings for TCP connection re-use. In the SDK v2, connection re-use is disabled by default. In SDK v3, it is enabled by default. In most cases, enabling connection re-use improves function performance. To stop TCP connection reuse, set the AWS_NODEJS_CONNECTION_REUSE_ENABLED environment variable to false. You can also stop keeping the connections alive on a per-service client basis.

For more information, see Why and how you should use AWS SDK for JavaScript (v3) on Node.js 18.

Support for ES module resolution using NODE_PATH

Another change in the Node.js 18 runtime is added support for ES module resolution via the NODE_PATH environment variable.

ES modules are supported by Lambda’s Node.js 14 and Node.js 16 runtimes. They enable top-level await, which can lower cold start latency when used with Provisioned Concurrency. However, by default Node.js does not search the folders in the NODE_PATH environment variable when importing ES modules. This makes it difficult to import ES modules from folders outside of the /var/task/ folder in which the function code is deployed. For example, to load the AWS SDK included in the runtime as an ES module, or to load ES modules from Lambda layers.

The Node.js 18.x runtime for Lambda searches the folders listed in NODE_PATH when loading ES modules. This makes it easier to include the AWS SDK as an ES module or load ES modules from Lambda layers.

Node.js 18 language updates

The Lambda Node.js 18 runtime also enables you to take advantage of new Node.js 18 language features. This includes improved performance for class fields and private class methods, JSON import assertions, and experimental features such as the Fetch API, Test Runner module, and Web Streams API.

JSON import assertion

The import assertions feature allows module import statements to include additional information alongside the module specifier. Now the following code is valid:

// index.mjs

// static import
import fooData from './foo.json' assert { type: 'json' };

// dynamic import
const { default: barData } = await import('./bar.json', { assert: { type: 'json' } });

export const handler = async(event) => {

    console.log(fooData)
    // logs data in foo.json file
    console.log(barData)
    // logs data in bar.json file

    const response = {
        statusCode: 200,
        body: JSON.stringify('Hello from Lambda!'),
    };
    return response;
};

foo.json

{
  "foo1" : "1234",
  "foo2" : "4678"
}

bar.json

{
  "bar1" : "0001",
  "bar2" : "0002"
}

Experimental features

While still experimental, the global fetch API is available by default in Node.js 18. The API includes a fetch function, making fetch polyfills and third-party HTTP packages redundant.

// index.mjs 

export const handler = async(event) => {
    
    const res = await fetch('https://nodejs.org/api/documentation.json');
    if (res.ok) {
      const data = await res.json();
      console.log(data);
    }

    const response = {
        statusCode: 200,
        body: JSON.stringify('Hello from Lambda!'),
    };
    return response;
};

Experimental features in Node.js can be enabled or disabled via the NODE_OPTIONS environment variable. For example, to disable the experimental fetch API you can create a Lambda environment variable NODE_OPTIONS and set the value to --no-experimental-fetch.

With this change, if you run the previous code for the fetch API in your Lambda function, it throws a reference error because the experimental fetch API is now disabled.

Conclusion

Node.js 18 is now supported by Lambda. When building your Lambda functions using the zip archive packaging style, use a runtime parameter value of nodejs18.x to get started building with Node.js 18.

You can also build Lambda functions in Node.js 18 by deploying your function code as a container image using the Node.js 18 AWS base image for Lambda. You may learn more about writing functions in Node.js 18 by reading about the Node.js programming model in the Lambda documentation.

For existing Node.js functions, review your code for compatibility with Node.js 18, including deprecations, then migrate to the new runtime by changing the function’s runtime configuration to nodejs18.x.

For more serverless learning resources, visit Serverless Land.

Introducing attribute-based access controls (ABAC) for Amazon SQS

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/introducing-attribute-based-access-controls-abac-for-amazon-sqs/

This post is written by Vikas Panghal (Principal Product Manager), and Hardik Vasa (Senior Solutions Architect).

Amazon Simple Queue Service (SQS) is a fully managed message queuing service that makes it easier to decouple and scale microservices, distributed systems, and serverless applications. SQS queues enable asynchronous communication between different application components and ensure that each of these components can keep functioning independently without losing data.

Today we’re announcing support for attribute-based access control (ABAC) using queue tags with the SQS service. As an AWS customer, if you use multiple SQS queues to achieve better application decoupling, it is often challenging to manage access to individual queues. In such cases, using tags can enable you to classify these resources in different ways, such as by owner, category, or environment.

This blog post demonstrates how to use tags to allow conditional access to SQS queues. You can use attribute-based access control (ABAC) policies to grant access rights to users through policies that combine attributes together. ABAC can be helpful in rapidly growing environments, where policy management for each individual resource can become cumbersome.

ABAC for SQS is supported in all Regions where SQS is currently available.

Overview

SQS supports tagging of queues. Each tag is a label comprising a customer-defined key and an optional value that can make it easier to manage, search for, and filter resources. Tags allow you to assign metadata to your SQS resources. This can help you track and manage the costs associated with your queues, provide enhanced security in your AWS Identity and Access Management (IAM) policies, and let you easily filter through thousands of queues.

SQS queue options in the console

The preceding image shows an SQS queue in the AWS Management Console with two tags: auto-delete with a value of no, and environment with a value of prod.

Attribute-based access control (ABAC) is an authorization strategy that defines permissions based on tags attached to users and AWS resources. With ABAC, you can use tags to configure IAM access permissions and policies for your queues, which enables you to scale your permissions management easily. You can author a single permissions policy in IAM using tags that you create per business role, and you no longer need to update the policy when adding each new resource.

You can also attach tags to AWS Identity and Access Management (IAM) principals to create an ABAC policy. These ABAC policies can be designed to allow SQS operations when the tag on the IAM user or role making the call matches the SQS queue tag.

ABAC provides granular and flexible access control based on attributes and values, reduces security risk because of misconfigured role-based policy, and easily centralizes auditing and access policy management.

ABAC enables two key use cases:

  1. Tag-based Access Control: You can use tags to control access to your SQS queues, including control plane and data plane API calls.
  2. Tag-on-Create: You can enforce tags during the creation of SQS queues and deny the creation of SQS resources without tags.

Tagging for access control

Let’s take a look at a couple of examples of using tags for access control.

Suppose you want to deny an IAM user all SQS actions for any queue that has a resource tag with the key environment and the value prod. The following IAM policy fulfills that requirement.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyAccessForProd",
            "Effect": "Deny",
            "Action": "sqs:*",
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "aws:ResourceTag/environment": "prod"
                }
            }
        }
    ]
}

Now suppose you need to deny any operation that passes a tag with the key environment and the value production as an argument within the API call. The following IAM policy fulfills that requirement.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyAccessForStageProduction",
            "Effect": "Deny",
            "Action": "sqs:*",
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "aws:RequestTag/environment": "production"
                }
            }
        }
    ]
}

Creating IAM user and SQS queue using AWS Management Console

Configuration of ABAC on SQS resources is a two-step process. The first step is to tag your SQS resources. You can use the AWS API, the AWS CLI, or the AWS Management Console to tag your resources. Once you have tagged the resources, create an IAM policy that allows or denies access to SQS resources based on their tags.
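As a minimal sketch of the first step, the following Python code tags an existing queue and confirms the result; the queue name is a hypothetical placeholder.

import boto3

sqs = boto3.client("sqs")

# Hypothetical queue name; tag the queue, then confirm the tags were applied
queue_url = sqs.get_queue_url(QueueName="orders-queue")["QueueUrl"]
sqs.tag_queue(QueueUrl=queue_url, Tags={"environment": "beta"})
print(sqs.list_queue_tags(QueueUrl=queue_url)["Tags"])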

This post reviews the step-by-step process of creating ABAC policies for controlling access to SQS queues.

Create an IAM user

  1. Navigate to the AWS IAM console and choose Users from the left navigation pane.
  2. Choose Add users and provide a name in the User name text box.
  3. Check the Access key – Programmatic access box and choose Next:Permissions.
  4. Choose Next:Tags.
  5. Add a tag with the key environment and the value beta.
  6. Select Next:Review and then choose Create user.
  7. Copy the access key ID and secret access key and store them in a secure location.

IAM configuration

Add IAM user permissions

  1. Select the IAM user you created.
  2. Choose Add inline policy.
  3. In the JSON tab, paste the following policy:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowAccessForSameResTag",
                "Effect": "Allow",
                "Action": [
                    "sqs:SendMessage",
                    "sqs:ReceiveMessage",
                    "sqs:DeleteMessage"
                ],
                "Resource": "*",
                "Condition": {
                    "StringEquals": {
                        "aws:ResourceTag/environment": "${aws:PrincipalTag/environment}"
                    }
                }
            },
            {
                "Sid": "AllowAccessForSameReqTag",
                "Effect": "Allow",
                "Action": [
                    "sqs:CreateQueue",
                    "sqs:DeleteQueue",
                    "sqs:SetQueueAttributes",
                    "sqs:tagqueue"
                ],
                "Resource": "*",
                "Condition": {
                    "StringEquals": {
                        "aws:RequestTag/environment": "${aws:PrincipalTag/environment}"
                    }
                }
            },
            {
                "Sid": "DenyAccessForProd",
                "Effect": "Deny",
                "Action": "sqs:*",
                "Resource": "*",
                "Condition": {
                    "StringEquals": {
                        "aws:ResourceTag/stage": "prod"
                    }
                }
            }
        ]
    }
    
  4. Choose Review policy.
  5. Choose Create policy.

The preceding permissions policy ensures that the IAM user can call SQS APIs only if the value of the request tag within the API call matches the value of the environment tag on the IAM principal. It also makes sure that the resource tag applied to the SQS queue matches the IAM tag applied on the user.

Creating IAM user and SQS queue using AWS CloudFormation

Here is the sample CloudFormation template to create an IAM user with an inline policy attached and an SQS queue.

AWSTemplateFormatVersion: "2010-09-09"
Description: "CloudFormation template to create IAM user with custom in-line policy"
Resources:
    IAMPolicy:
        Type: "AWS::IAM::Policy"
        Properties:
            PolicyDocument: |
                {
                    "Version": "2012-10-17",
                    "Statement": [
                        {
                            "Sid": "AllowAccessForSameResTag",
                            "Effect": "Allow",
                            "Action": [
                                "sqs:SendMessage",
                                "sqs:ReceiveMessage",
                                "sqs:DeleteMessage"
                            ],
                            "Resource": "*",
                            "Condition": {
                                "StringEquals": {
                                    "aws:ResourceTag/environment": "${aws:PrincipalTag/environment}"
                                }
                            }
                        },
                        {
                            "Sid": "AllowAccessForSameReqTag",
                            "Effect": "Allow",
                            "Action": [
                                "sqs:CreateQueue",
                                "sqs:DeleteQueue",
                                "sqs:SetQueueAttributes",
                                "sqs:tagqueue"
                            ],
                            "Resource": "*",
                            "Condition": {
                                "StringEquals": {
                                    "aws:RequestTag/environment": "${aws:PrincipalTag/environment}"
                                }
                            }
                        },
                        {
                            "Sid": "DenyAccessForProd",
                            "Effect": "Deny",
                            "Action": "sqs:*",
                            "Resource": "*",
                            "Condition": {
                                "StringEquals": {
                                    "aws:ResourceTag/stage": "prod"
                                }
                            }
                        }
                    ]
                }
                
            Users: 
              - "testUser"
            PolicyName: tagQueuePolicy

    IAMUser:
        Type: "AWS::IAM::User"
        Properties:
            Path: "/"
            UserName: "testUser"
            Tags: 
              - 
                Key: "environment"
                Value: "beta"

Testing tag-based access control

Create queue with tag key as environment and tag value as prod

This example uses the AWS CLI to demonstrate the permission model. If you do not have the AWS CLI, you can download and configure it for your machine.

Run this AWS CLI command to create the queue:

aws sqs create-queue --queue-name prodQueue --region us-east-1 --tags "environment=prod"

You receive an AccessDenied error from the SQS endpoint:

An error occurred (AccessDenied) when calling the CreateQueue operation: Access to the resource <queueUrl> is denied.

This is because the tag value on the IAM user does not match the tag passed in the CreateQueue API call. Remember that we applied a tag to the IAM user with the key ‘environment’ and the value ‘beta’.
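
You can confirm the tag applied to the IAM user with the AWS CLI:

aws iam list-user-tags --user-name testUser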

Create queue with tag key as environment and tag value as beta

aws sqs create-queue --queue-name betaQueue --region us-east-1 --tags "environment=beta"

You see a response similar to the following, which shows the successful creation of the queue.

{
"QueueUrl": "<queueUrl>“
}

Sending message to the queue

aws sqs send-message --queue-url <queueUrl> --message-body testMessage

You get a successful response from the SQS endpoint. The response includes the MD5OfMessageBody and MessageId of the message.

{
"MD5OfMessageBody": "<MD5OfMessageBody>",
"MessageId": "<MessageId>"
}

The response shows successful message delivery to the SQS queue, since the IAM user’s permissions allow sending messages to queues tagged with environment=beta.
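
As a further check, receiving from the queue also succeeds, since the sqs:ReceiveMessage statement applies when the resource tag matches the principal tag:

aws sqs receive-message --queue-url <queueUrl>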

Benefits of attribute-based access controls

The following are benefits of using attribute-based access controls (ABAC) in Amazon SQS:

  • ABAC for SQS requires fewer permissions policies – You do not have to create different policies for different job functions. You can use the resource and request tags that apply to more than one queue. This reduces the operational overhead.
  • Using ABAC, teams can scale quickly – Permissions for new resources are automatically granted based on tags when resources are appropriately tagged upon creation.
  • Use permissions on the IAM principal to restrict resource access – You can create tags for the IAM principal and allow specific actions only when the request or resource tags match the tags on that principal. This helps automate the granting of request permissions.
  • Track who is accessing resources – Easily determine the identity of a session by looking at the user attributes in AWS CloudTrail to track user activity in AWS.

Conclusion

In this post, we have seen how attribute-based access control (ABAC) policies allow you to grant access rights to users through IAM policies based on tags defined on SQS queues.

ABAC for SQS supports all SQS API actions. Managing access permissions via tags can save engineering time otherwise spent creating complex access permissions as your applications and resources grow. With the flexibility of using multiple resource tags in security policies, data and compliance teams can now easily set more granular access permissions based on resource attributes.

For additional details on pricing, see Amazon SQS pricing. For additional details on programmatic access to SQS, see Actions in the Amazon SQS API Reference. For more information on SQS security, see the SQS security documentation. To get started with attribute-based access control for SQS, navigate to the SQS console.

For more serverless learning resources, visit Serverless Land.

Building serverless .NET applications on AWS Lambda using .NET 7

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/building-serverless-net-applications-on-aws-lambda-using-net-7/

This post is written by James Eastham, Senior Cloud Architect, Beau Gosse, Senior Software Engineer, and Samiullah Mohammed, Senior Software Engineer

Today, AWS is announcing tooling support to enable applications running .NET 7 to be built and deployed on AWS Lambda. This includes applications compiled using .NET 7 native AOT. .NET 7 is the latest version of .NET and brings many performance improvements and optimizations.

Native AOT enables .NET code to be ahead-of-time compiled to native binaries for up to 86% faster cold starts when compared to the .NET 6 managed runtime. The fast execution and lower memory consumption of native AOT can also result in reduced Lambda costs. This post walks through how to get started running .NET 7 applications on AWS Lambda with native AOT.

Overview

Customers can use .NET 7 with Lambda in two ways. First, Lambda has released a base container image for .NET 7, enabling customers to build and deploy .NET 7 functions as container images. Second, you can use Lambda’s custom runtime support to run functions compiled to native code using .NET 7 native AOT. Lambda has not released a managed runtime for .NET 7, since it is not a long-term support (LTS) release.

Native AOT allows .NET applications to be pre-compiled to a single binary, removing the need for JIT (Just In Time compilation) and the .NET runtime. To use this binary in a custom runtime, it needs to include the Lambda runtime client. The runtime client integrates your application code with the Lambda runtime API, which enables your application code to be invoked by Lambda.

The enhanced tooling announced today streamlines the tasks of building .NET applications using .NET 7 native AOT and deploying them to Lambda using a custom runtime. This tooling comprises three tools. The AWS Lambda extension to the ‘dotnet’ CLI (Amazon.Lambda.Tools) contains the commands to build and deploy Lambda functions using .NET. The dotnet CLI can be used directly, and is also used by the AWS Toolkit for Visual Studio, and the AWS Serverless Application Model (AWS SAM), an open-source framework for building serverless applications.

Native AOT compiles code for a specific OS version. If you run the dotnet publish command on your machine, the compiled code only runs on the OS version and processor architecture of your machine. For your application code to run in Lambda using native AOT, the code must be compiled on the Amazon Linux 2 (AL2) OS. The new tooling supports compiling your Lambda functions within an AL2-based Docker image, with the compiled application stored on your local hard drive.

Develop Lambda functions with .NET 7 native AOT

In this section, we’ll discuss how to develop your Lambda function code to be compatible with .NET 7 native AOT. This is the first GA version of native AOT Microsoft has released. It may not suit all workloads, since it does come with trade-offs. For example, dynamic assembly loading and the System.Reflection.Emit library are not available. Native AOT also trims your application code, resulting in a small binary that contains the essential components for your application to run.

Prerequisites

Getting Started

To get started, create a new Lambda function project using a custom runtime from the .NET CLI.

dotnet new lambda.NativeAOT -n LambdaNativeAot
cd ./LambdaNativeAot/src/LambdaNativeAot/
dotnet add package Amazon.Lambda.APIGatewayEvents
dotnet add package AWSSDK.Core

To review the project settings, open the LambdaNativeAot.csproj file. The target framework in this template is set to net7.0. To enable native AOT, add a new property named PublishAot, with value true. This PublishAot flag is an MSBuild property required by the .NET SDK so that the compiler performs native AOT compilation.

When using Lambda with a custom runtime, the Lambda service looks for an executable file named bootstrap within the packaged ZIP file. To enable this, the OutputType is set to exe and the AssemblyName to bootstrap.

The correctly configured LambdaNativeAot.csproj file looks like this:

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net7.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
    <AWSProjectType>Lambda</AWSProjectType>
    <AssemblyName>bootstrap</AssemblyName>
    <PublishAot>true</PublishAot>
  </PropertyGroup> 
  …
</Project>

Function code

Running .NET with a custom runtime uses the executable assembly feature of .NET. To do this, your function code must define a static Main method. Within the Main method, you must initialize the Lambda runtime client, and configure the function handler and the JSON serializer to use when processing Lambda events.

The Amazon.Lambda.RuntimeSupport NuGet package is added to the project to enable this runtime initialization. The LambdaBootstrapBuilder.Create() method is used to configure the handler and the ILambdaSerializer implementation to use for (de)serialization.

private static async Task Main(string[] args)
{
    Func<string, ILambdaContext, string> handler = FunctionHandler;
    await LambdaBootstrapBuilder.Create(handler, new DefaultLambdaJsonSerializer())
        .Build()
        .RunAsync();
}

Assembly trimming

Native AOT trims application code to optimize the compiled binary, which can cause two issues. The first is with de/serialization. Common .NET libraries for working with JSON, such as Newtonsoft.Json and System.Text.Json, rely on reflection. The second is with any third-party libraries not yet updated to be trim-friendly. The compiler may trim out parts of a library that are required for it to function. However, there are solutions for both issues.

Working with JSON

Source generated serialization is a language feature introduced in .NET 6. It allows the code required for de/serialization to be generated at compile time instead of relying on reflection at runtime. One drawback of native AOT is that the ability to use the System.Reflection.Emit library is lost. Source generated serialization enables developers to work with JSON while also using native AOT.

To use the source generator, you must define a new empty partial class that inherits from System.Text.Json.JsonSerializerContext. On the empty partial class, add the JsonSerializable attribute for any .NET type that your application must de/serialize.

In this example, the Lambda function needs to receive events from API Gateway. Create a new class in the project named HttpApiJsonSerializerContext and copy the code below:

[JsonSerializable(typeof(APIGatewayHttpApiV2ProxyRequest))]
[JsonSerializable(typeof(APIGatewayHttpApiV2ProxyResponse))]
public partial class HttpApiJsonSerializerContext : JsonSerializerContext
{
}

When the application is compiled, static classes, properties, and methods are generated to perform the de/serialization.
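
As an illustration (not part of the Lambda template), the generated context can also be used directly with System.Text.Json; the typed properties on HttpApiJsonSerializerContext.Default are produced by the source generator:

using System.Text.Json;
using Amazon.Lambda.APIGatewayEvents;

// Serialize without reflection by passing the generated type metadata
var response = new APIGatewayHttpApiV2ProxyResponse { StatusCode = 200, Body = "OK" };
string json = JsonSerializer.Serialize(
    response,
    HttpApiJsonSerializerContext.Default.APIGatewayHttpApiV2ProxyResponse);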

This custom serializer must now also be passed in to the Lambda runtime to ensure that event inputs and outputs are serialized and deserialized correctly. To do this, pass a new instance of the serializer context into the runtime when bootstrapped. Here is an example of a Lambda function using API Gateway as a source:

using System.Text.Json.Serialization;
using Amazon.Lambda.APIGatewayEvents;
using Amazon.Lambda.Core;
using Amazon.Lambda.RuntimeSupport;
using Amazon.Lambda.Serialization.SystemTextJson;
namespace LambdaNativeAot;
public class Function
{
    /// <summary>
    /// The main entry point for the custom runtime.
    /// </summary>
    private static async Task Main()
    {
        Func<APIGatewayHttpApiV2ProxyRequest, ILambdaContext, Task<APIGatewayHttpApiV2ProxyResponse>> handler = FunctionHandler;
        await LambdaBootstrapBuilder.Create(handler, new SourceGeneratorLambdaJsonSerializer<HttpApiJsonSerializerContext>())
            .Build()
            .RunAsync();
    }

    public static async Task<APIGatewayHttpApiV2ProxyResponse> FunctionHandler(APIGatewayHttpApiV2ProxyRequest apigProxyEvent, ILambdaContext context)
    {
        // API Handling logic here
        return new APIGatewayHttpApiV2ProxyResponse()
        {
            StatusCode = 200,
            Body = "OK"
        };
    }
}

Third party libraries

The .NET compiler provides the capability to control how applications are trimmed. For native AOT compilation, this enables us to exclude specific assemblies from trimming. For any libraries used in your applications that may not yet be trim-friendly, this is a powerful way to still use native AOT. This is important for any of the Lambda event source NuGet packages, such as Amazon.Lambda.APIGatewayEvents. Without controlling this, the C# objects for the Amazon API Gateway event sources are trimmed, leading to serialization errors at runtime.

Currently, the AWSSDK.Core library used by all the .NET AWS SDKs must also be excluded from trimming.

To control the assembly trimming, create a new file in the project root named rd.xml. Full details on the rd.xml format are found in the Microsoft documentation. Adding assemblies to the rd.xml file excludes them from trimming.

The following example shows how to exclude the AWSSDK.Core library, the API Gateway events library, and the function assembly from trimming:

<Directives xmlns="http://schemas.microsoft.com/netfx/2013/01/metadata">
	<Application>
		<Assembly Name="AWSSDK.Core" Dynamic="Required All"></Assembly>
		<Assembly Name="Amazon.Lambda.APIGatewayEvents" Dynamic="Required All"></Assembly>
		<Assembly Name="bootstrap" Dynamic="Required All"></Assembly>
	</Application>
</Directives>

Once added, the csproj file must be updated to reference the rd.xml file. Edit the csproj file for the Lambda project and add this ItemGroup:

<ItemGroup>
  <RdXmlFile Include="rd.xml" />
</ItemGroup>

When the function is compiled, assembly trimming skips the three libraries specified. If you are using .NET 7 native AOT with Lambda, we recommend excluding both the AWSSDK.Core library and the specific libraries for any event sources your Lambda function uses. If you are using the AWS X-Ray SDK for .NET to trace your serverless application, this must also be excluded.

Deploying .NET 7 native AOT applications

We’ll now explain how to build and deploy .NET 7 native AOT functions on Lambda, using each of the three deployment tools.

Using the dotnet CLI

Prerequisites

  • Docker (if compiling on a non-Amazon Linux 2 based machine)

Build and deploy

To package and deploy your Native AOT compiled Lambda function, run:

dotnet lambda deploy-function

When compiling and packaging your Lambda function code using the Lambda tools CLI, the tooling checks for the PublishAot flag in your project. If set to true, the tooling pulls an AL2-based Docker image and compiles your code inside a container. It mounts your local file system to the running container, allowing the compiled binary to be stored back to your local file system ready for deployment. By default, the generated ZIP file is output to the bin/Release directory.

Once the deployment completes, run the following command to invoke the created function, replacing FUNCTION_NAME with the name of the function chosen during deployment.

dotnet lambda invoke-function FUNCTION_NAME
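
To pass a test event, the command also accepts a payload; the JSON body here is an arbitrary example field from the API Gateway HTTP API event format:

dotnet lambda invoke-function FUNCTION_NAME --payload '{"rawPath": "/"}'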

Using the Visual Studio Extension

AWS is also announcing support for compiling and deploying native AOT-based Lambda functions from within Visual Studio using the AWS Toolkit for Visual Studio.

Prerequisites

Getting Started

As part of this release, templates are available in Visual Studio 2022 to get started using native AOT with AWS Lambda. From within Visual Studio, select File -> New Project. Search for Lambda .NET 7 native AOT to start a new project pre-configured for native AOT.

Create a new project

Build and deploy

Once the project is created, right-click the project in Visual Studio and choose Publish to AWS Lambda.

Solution Explorer

Complete the steps in the publish wizard and press Upload. The log messages created by Docker appear in the publish window as it compiles your function code for native AOT.

Uploading function

You can now invoke the deployed function from within Visual Studio by setting the Example request dropdown to API Gateway AWS Proxy and pressing the Invoke button.

Invoke example

Using the AWS SAM CLI

Prerequisites

  • Docker (If compiling on a non-AL2 based machine)
  • AWS SAM v1.6.4 or later

Getting started

Support for compiling and deploying .NET 7 native AOT is built into the AWS SAM CLI. To get started, initialize a new AWS SAM project:

sam init

In the new project wizard, choose:

  1. What template source would you like to use? 1 – AWS Quick Start Template
  2. Choose an AWS Quick start application template. 1 – Hello World example
  3. Use the most popular runtime and package type? – N
  4. Which runtime would you like to use? aot.dotnet7 (provided.al2)
  5. Enable X-Ray Tracing? N
  6. Choose a project name

The cloned project includes the configuration to deploy to Lambda.

One new AWS SAM metadata property called ‘BuildMethod’ is required in the AWS SAM template:

HelloWorldFunction:
  Type: AWS::Serverless::Function
  Properties:
    Runtime: 'provided.al2' # Use the provided runtime to deploy .NET 7 native AOT to AWS Lambda
    Architectures:
      - x86_64
  Metadata:
    BuildMethod: 'dotnet7' # Build with the new .NET 7 build method, which calls into Amazon.Lambda.Tools

Build and deploy

Build and deploy your serverless application, completing the guided deployment steps:

sam build
sam deploy --guided

The AWS SAM CLI uses the Amazon.Lambda.Tools CLI to pull an AL2-based Docker image and compile your application code inside a container. You can use AWS SAM accelerate to speed up the update of serverless applications during development. It uses direct API calls instead of deploying changes through AWS CloudFormation, automating updates whenever you change your local code base. Learn more in the AWS SAM development documentation.

Conclusion

AWS now supports .NET 7 native AOT on Lambda. Read the Lambda Developer Guide for more getting started information. For more details on the performance improvements from using .NET 7 native AOT on Lambda, see the serverless-dotnet-demo repository on GitHub.

To provide feedback for .NET on AWS Lambda, contact the AWS .NET team on the .NET Lambda GitHub repository.

For more serverless learning resources, visit Serverless Land.

Better together: AWS SAM CLI and HashiCorp Terraform

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/better-together-aws-sam-cli-and-hashicorp-terraform/

This post is written by Suresh Poopandi, Senior Solutions Architect and Seb Kasprzak, Senior Solutions Architect.

Today, AWS is announcing the public preview of AWS Serverless Application Model CLI (AWS SAM CLI) support for local development, testing, and debugging of serverless applications defined using HashiCorp Terraform configuration.

AWS SAM and Terraform are open-source frameworks for building applications using infrastructure as code (IaC). Both frameworks allow building, changing, and managing cloud infrastructure in a repeatable way by defining resource configurations.

Previously, you could use the AWS SAM CLI to build, test, and debug applications defined by AWS SAM templates or through the AWS Cloud Development Kit (CDK). With this preview release, you can also use AWS SAM CLI to test and debug serverless applications defined using Terraform configurations.

Walkthrough of Terraform support

This blog post contains a sample Terraform template, which shows how developers can use the AWS SAM CLI to locally build, test, and debug AWS Lambda functions defined in Terraform. The sample application has a Lambda function that stores a book review score and review text in an Amazon DynamoDB table. An Amazon API Gateway book review API uses Lambda proxy integration to invoke the book review Lambda function.

Demo application architecture

Prerequisites

Before running this example:

  • Install the AWS CLI.
    • Configure with valid AWS credentials.
    • Note that the AWS CLI requires a Python runtime.
  • Install HashiCorp Terraform.
  • Install the AWS SAM CLI.
  • Install Docker (required to run AWS Lambda function locally).

Since Terraform support is currently in public preview, you must provide a --beta-features flag while executing AWS SAM commands. Alternatively, set this flag in the samconfig.toml file by adding beta_features = true.
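
For example, a minimal samconfig.toml entry (assuming the default environment) could look like this:

version = 0.1
[default.global.parameters]
beta_features = true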

Deploying the example application

This Lambda function interacts with DynamoDB. For the example to work, it requires an existing DynamoDB table in an AWS account. Deploying the sample application creates all the required resources for local testing and debugging of the Lambda function.

To deploy:

  1. Clone the aws-sam-terraform-examples repository locally:
    git clone https://github.com/aws-samples/aws-sam-terraform-examples
  2. Change to the project directory:
    cd aws-sam-terraform-examples/zip_based_lambda_functions/api-lambda-dynamodb-example/

    Terraform must store the state of the infrastructure and configuration it creates. Terraform uses this state to map cloud resources to configuration and track changes. This example uses a local backend to store the state file on the local filesystem.

  3. Open the main.tf file and review its contents. Locate the provider section of the code, and update the region field with the target deployment Region of this sample solution:
    provider "aws" {
        region = "<AWS region>" # e.g. us-east-1
    }
  4. Initialize a working directory containing Terraform configuration files:
    terraform init
  5. Deploy the application using Terraform CLI. When prompted by “Do you want to perform these actions?”, enter Yes.
    terraform apply

Terraform deploys the application, as shown in the terminal output.

Terminal output

After completing the deployment process, the AWS account is ready for use by the Lambda function with all the required resources.

Terraform configuration for local testing

Lambda functions require application dependencies bundled together with function code as a deployment package (typically a .zip file) to run correctly. Terraform does not natively create the deployment package; a separate build process handles this package creation.

This sample application uses Terraform’s null_resource and local-exec provisioner to trigger a build process script. This installs Python dependencies in a temporary folder and creates a .zip file with the dependencies and function code. This logic is contained within the main.tf file of the example application.

To explain each code segment in more detail:

Terraform example

  1. aws_lambda_function: This sample defines a Lambda function resource. It contains properties such as environment variables (in this example, the DynamoDB table_id) and the depends_on argument, which ensures that the .zip package is created before the Lambda function is deployed.

    Terraform example

  2. null_resource: When the AWS SAM CLI build command runs, AWS SAM reviews the Terraform code for any null_resource whose name starts with sam_metadata_ and uses the information contained within this resource block to gather the location of the Lambda function source code and .zip package. This information allows the AWS SAM CLI to start the local execution of the Lambda function. This special resource should contain the following attributes (a sketch follows this list):
    • resource_name: The Lambda function address as defined in the current module (aws_lambda_function.publish_book_review)
    • resource_type: Packaging type of the Lambda function (ZIP_LAMBDA_FUNCTION)
    • original_source_code: Location of Lambda function code
    • built_output_path: Location of .zip deployment package
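
A minimal sketch of such a resource; the resource name and paths are illustrative, following the attributes above:

resource "null_resource" "sam_metadata_aws_lambda_function_publish_book_review" {
  triggers = {
    resource_name        = "aws_lambda_function.publish_book_review"
    resource_type        = "ZIP_LAMBDA_FUNCTION"
    original_source_code = "${path.module}/src"
    built_output_path    = "${path.module}/build/publish_book_review.zip"
  }
}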

Local testing

With the backend services now deployed, run local tests to see if everything is working. The locally running sample Lambda function interacts with the services deployed in the AWS account. Run sam build after each code update so that local testing reflects your changes.

  1. Local Build: To create a local build of the Lambda function for testing, use the sam build command:
    sam build --hook-name terraform --beta-features
  2. Local invoke: The first test is to invoke the Lambda function with a mocked event payload from the API Gateway. These events are in the events directory. Run this command, passing in a mocked event:
    AWS_DEFAULT_REGION=<Your Region Name> sam local invoke aws_lambda_function.publish_book_review -e events/new-review.json --beta-features

    AWS SAM mounts the Lambda function runtime and code and runs it locally. The function makes a request to the DynamoDB table in the cloud to store the information provided via the API. It returns a 200 response code, signaling the successful completion of the function.

  3. Local invoke from AWS CLI
    Another test is to run a local emulation of the Lambda service using “sam local start-lambda” and invoke the function directly using an AWS SDK or the AWS CLI. Start the local emulator with the following command:

    sam local start-lambda
    Terminal output

    AWS SAM starts the emulator and exposes a local endpoint for the AWS CLI or a software development kit (SDK) to call. With the start-lambda command still running, run the following command to invoke this function locally with the AWS CLI:

    aws lambda invoke --function-name aws_lambda_function.publish_book_review --endpoint-url http://127.0.0.1:3001/ response.json --cli-binary-format raw-in-base64-out --payload file://events/new-review.json

    The AWS CLI invokes the local function and returns a status report of the service to the screen. The response from the function itself is in the response.json file. The window shows the following messages:

    Invocation results

  4. Debugging the Lambda function

Developers can use AWS SAM with a variety of AWS toolkits and debuggers to test and debug serverless applications locally. For example, developers can perform local step-through debugging of Lambda functions by setting breakpoints, inspecting variables, and running function code one line at a time.

AWS Toolkits make it easier to develop, debug, and deploy serverless applications defined using AWS SAM, providing an integrated experience for building, testing, debugging, deploying, and invoking Lambda functions from within the IDE. Refer to the AWS SAM documentation for the list of common IDE and runtime combinations that support step-through debugging of AWS SAM applications.

Visual Studio Code keeps debugging configuration information in a launch.json file in a workspace .vscode folder. Here is a sample launch configuration file to debug Lambda code locally using AWS SAM and Visual Studio Code.

{
    "version": "0.2.0",
    "configurations": [
          {
            "name": "Attach to SAM CLI",
            "type": "python",
            "request": "attach",
            "address": "localhost",
            "port": 9999,
            "localRoot": "${workspaceRoot}/sam-terraform/book-reviews",
            "remoteRoot": "/var/task",
            "protocol": "inspector",
            "stopOnEntry": false
          }
    ]
}

After adding the launch configuration, start a debug session in Visual Studio Code.

Step 1: Uncomment the following two lines in zip_based_lambda_functions/api-lambda-dynamodb-example/src/index.py

Enable debugging in the Lambda function

Step 2: Run the Lambda function in debug mode and wait for Visual Studio Code to attach to this debugging session:

sam local invoke aws_lambda_function.publish_book_review -e events/new-review.json -d 9999

Step 3: Select the Run and Debug icon in the Activity Bar on the side of VS Code. In the Run and Debug view, select “Attach to SAM CLI” and choose Run.

For this example, set a breakpoint at the first line of lambda_handler. This breakpoint allows viewing the input data coming into the Lambda function. It also helps debug code issues before deploying to the AWS Cloud.

Debugging in the IDE

Lambda Terraform module

A community-supported Terraform module for Lambda (terraform-aws-lambda) has added support for the SAM metadata null_resource. When using the latest version of this module, the AWS SAM CLI automatically supports local invocation of the Lambda function, without requiring additional resource blocks.

Conclusion

This blog post shows how to use the AWS SAM CLI together with HashiCorp Terraform to develop and test serverless applications in a local environment. With AWS SAM CLI’s support for HashiCorp Terraform, developers can now use the AWS SAM CLI to test their serverless functions locally while choosing their preferred infrastructure as code tooling.

For more information about the features supported by AWS SAM, visit AWS SAM. For more information about the Metadata resource, visit HashiCorp Terraform.

Support for Terraform configurations is currently in preview, and the team is asking for feedback and feature request submissions. The goal is for both communities to help improve the local development process using the AWS SAM CLI. Submit your feedback by creating a GitHub issue.

For more serverless learning resources, visit Serverless Land.

Introducing the AWS Lambda Telemetry API

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/introducing-the-aws-lambda-telemetry-api/

This blog post is written by Anton Aleksandrov, Principal Solution Architect and Shridhar Pandey, Senior Product Manager

Today AWS is announcing the AWS Lambda Telemetry API. This provides an easier way to receive enhanced function telemetry directly from the Lambda service and send it to custom destinations. Developers and operators can now more easily monitor and observe their Lambda functions using Lambda extensions from their preferred observability tool providers.

Extensions can use the Lambda Logs API to collect logs generated by the Lambda service and code running in their Lambda function. While the Logs API provides extensions with access to logs, it does not provide a way to collect additional telemetry, such as traces and metrics, which the Lambda service generates during initialization and invocation of your Lambda function.

Previously, observability tools retrieved traces from AWS X-Ray using the AWS X-Ray API or built their own custom tracing libraries to generate traces during Lambda function invocation. Tools required customers to modify AWS Identity and Access Management (IAM) policies to grant access to the traces from X-Ray. This caused additional complexity for tools to collect traces and metrics from multiple sources and introduced latency in seeing Lambda function traces in observability tool dashboards.

The Lambda Telemetry API is a new API that enhances the existing Lambda Logs API functionality. With the new Telemetry API, observability tools can receive function and extension logs, and also events, traces, and metrics directly from within the Lambda service. You do not need to install additional tracing libraries. This reduces latency and simplifies access permissions, as the extension does not require additional access to X-Ray.

Today you can use Telemetry API-enabled extensions to send telemetry data to Coralogix, Datadog, Dynatrace, Lumigo, New Relic, Sedai, Site24x7, Serverless.com, Sumo Logic, Sysdig, Thundra, or your own custom destinations.

Overview

To receive telemetry, extensions subscribe using the new Lambda Telemetry API.

Lambda Telemetry API

The Lambda service then streams the telemetry events directly to the extension. The events include platform events, trace spans, function and extension logs, and additional Lambda platform metrics. The extension can then process, filter, and route them to any preferred destination.

You can add an extension from the tooling provider of your choice to your Lambda function. You can deploy extensions, including ones that use the Telemetry API, as Lambda layers, with the AWS Management Console and AWS Command Line Interface (AWS CLI). You can also use infrastructure as code tools such as AWS CloudFormation, the AWS Serverless Application Model (AWS SAM), Serverless Framework, and Terraform.

Lambda Extensions from the AWS Partner Network (APN) available at launch

Today, you can use Lambda extensions that use Telemetry API from the following Lambda partners:

  • The Coralogix AWS Lambda Telemetry Exporter extension now offers improved monitoring and alerting for Lambda functions by further streamlining collection and correlation of logs, metrics, and traces.
  • The Datadog extension further simplifies how you visualize the impact of cold starts, and monitor and alert on latency, duration, and payload size of your Lambda functions by collecting logs, traces, and real-time metrics from your function in a simple and cost-effective way.
  • Dynatrace now provides a simplified observability configuration for AWS Lambda through a seamless integration. The new solution delivers low-latency telemetry, enables monitoring at scale, and helps reduce monitoring costs for your serverless workloads.
  • The Lumigo lambda-log-shipper extension simplifies aggregating and forwarding Lambda logs to third-party tools. It now also makes it easy for you to detect Lambda function timeouts.
  • The New Relic extension now provides a unified observability view for your Lambda functions with insights that help you better understand and optimize the performance of your functions.
  • Sedai now uses the Telemetry API to help you improve the performance and availability of your Lambda functions by gathering insights about your function and providing recommendations for manual and autonomous remediation in a cost-effective manner.
  • The Site24x7 extension now offers new metrics, which enable you to get deeper insights into the different phases of the Lambda function lifecycle, such as initialization and invocation.
  • Serverless.com now uses the Telemetry API to provide real-time performance details for your Lambda function through the Dev Mode feature of their new Serverless Console V.2 offering, which simplifies debugging in the AWS Cloud.
  • Sumo Logic now makes it easier, faster, and more cost-effective for you to get your mission-critical Lambda function telemetry sent directly to Sumo Logic so you could quickly analyze and remediate errors and exceptions.
  • The Sysdig Monitor extension generates and collects real-time metrics directly from the Lambda platform. The simplified instrumentation offers lower latency, reduced MTTR (mean time to resolution) for critical issues, and cost benefits while monitoring your serverless applications.
  • The Thundra extension enables you to export logs, metrics, and events for Lambda execution environment lifecycle events emitted by the Telemetry API to a destination of your choice such as an S3 bucket, a database, or a monitoring backend.

Seeing example Telemetry API extensions in action

This demo shows an example of using a telemetry extension to receive telemetry data, batch it, and send it to a desired destination.

To set up the example, visit the GitHub repo for the extension implemented in the language of your choice and follow the instructions in the README.md file.

To configure the batching behavior, which controls when the extension sends the data, set the Lambda environment variable DISPATCH_MIN_BATCH_SIZE. When the batch size reaches this threshold, the extension POSTs the batch of telemetry events to the destination specified in the DISPATCH_POST_URI environment variable.

You can configure an example DISPATCH_POST_URI for the extension to deliver the telemetry data using https://webhook.site/.
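
For example, you could set both variables on the function with the AWS CLI; the function name and webhook URL are placeholders:

aws lambda update-function-configuration \
    --function-name my-telemetry-demo-function \
    --environment "Variables={DISPATCH_MIN_BATCH_SIZE=10,DISPATCH_POST_URI=https://webhook.site/<your-unique-id>}"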

Lambda environment variables

Telemetry events for one invoke may be received and processed during the next invocation. Events for the last invoke may be processed during the SHUTDOWN event.

Test and invoke the function from the Lambda console or the AWS CLI. You can see that the webhook receives the telemetry data.

Webhook receiving telemetry data

You can also view the function and extension logs in CloudWatch Logs. The example extension includes verbose logging to understand the extension lifecycle.

CloudWatch Logs showing extension verbose logging

Sample Telemetry API events

When the extension receives telemetry data, each event contains a JSON dictionary with additional information, such as related metrics or trace spans. The following example shows the function initialization events. You can see that the function initializes with on-demand concurrency. The runtime version is Node.js 14, the initialization is successful, and the initialization duration is 123 milliseconds.

{
  "time": "2022-08-02T12:01:23.521Z",
  "type": "platform.initStart",
  "record": {
    "initializationType": "on-demand",
    "phase":"init",
    "runtimeVersion": "nodejs-14.v3",
    "runtimeVersionArn": "arn"
  }
}

{
  "time": "2022-08-02T12:01:23.521Z",
  "type": "platform.initRuntimeDone",
  "record": {
    "initializationType": "on-demand",
    "status": "success"
  }
}

{
  "time": "2022-08-02T12:01:23.521Z",
  "type": "platform.initReport",
  "record": {
    "initializationType": "on-demand",
    "phase":"init",
    "metrics": {
      "durationMs": 123.0,
    }
  }
}

Function invocation events include the associated requestId and tracing information connecting this invocation with the X-Ray tracing context, and platform spans showing response latency and response duration as well as invocation metrics such as duration in milliseconds.

{
  "time": "2022-08-02T12:01:23.521Z",
  "type": "platform.start",
  "record": {
    "requestId": "e6b761a9-c52d-415d-b040-7ba94b9452f3",
    "version": "$LATEST",
    "tracing": {
      "spanId": "54565fb41ac79632",
      "type": "X-Amzn-Trace-Id",
      "value": "Root=1-62e900b2-710d76f009d6e7785905449a;Parent=0efbd19962d95b05;Sampled=1"
    }
  }
}

{
  "time": "2022-08-02T12:01:23.521Z",
  "type": "platform.runtimeDone",
  "record": {
    "requestId": "e6b761a9-c52d-415d-b040-7ba94b9452f3",
    "status": "success",
    "tracing": {
      "spanId": "54565fb41ac79632",
      "type": "X-Amzn-Trace-Id",
      "value": "Root=1-62e900b2-710d76f009d6e7785905449a;Parent=0efbd19962d95b05;Sampled=1"
    },
    "spans": [
      {
        "name": "responseLatency",
        "start": "2022-08-02T12:01:23.521Z",
        "durationMs": 23.02
      },
      {
        "name": "responseDuration",
        "start": "2022-08-02T12:01:23.521Z",
        "durationMs": 20
      }
    ],
    "metrics": {
      "durationMs": 200.0,
      "producedBytes": 15
    }
  }
}

{
  "time": "2022-08-02T12:01:23.521Z",
  "type": "platform.report",
  "record": {
    "requestId": "e6b761a9-c52d-415d-b040-7ba94b9452f3",
    "metrics": {
      "durationMs": 220.0,
      "billedDurationMs": 300,
      "memorySizeMB": 128,
      "maxMemoryUsedMB": 90,
      "initDurationMs": 200.0
    },
    "tracing": {
      "spanId": "54565fb41ac79632",
      "type": "X-Amzn-Trace-Id",
      "value": "Root=1-62e900b2-710d76f009d6e7785905449a;Parent=0efbd19962d95b05;Sampled=1"
    }
  }
}

Building a Telemetry API extension

Lambda extensions run as independent processes in the execution environment and continue to run after the function invocation is fully processed. Because extensions run as separate processes, you can write them in a language different from the function code. We recommend implementing extensions using a compiled language as a self-contained binary. This makes the extension compatible with all the supported runtimes.

Extensions that use the Telemetry API have the following lifecycle.

Telemetry API lifecycle

  1. The extension registers itself using the Lambda Extension API and subscribes to receive INVOKE and SHUTDOWN events. With the Telemetry API, the registration response body contains additional information, such as function name, function version, and account ID.
  2. The extension starts a telemetry listener. This is a local HTTP or TCP endpoint. We recommend using HTTP rather than TCP.
  3. The extension uses the Telemetry API to subscribe to the desired telemetry event streams (see the sketch after this list).
  4. The Lambda service POSTs telemetry stream data to your telemetry listener. We recommend batching the telemetry data as it arrives at the listener. You can perform any custom processing on this data and send it on to an S3 bucket, another custom destination, or an external observability service.
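
As an illustration of step 3, here is a minimal Python sketch of the subscription request; the buffering values and listener URI are example choices, and the extension ID comes from the registration step:

import json
import os
import urllib.request

def subscribe_to_telemetry(extension_id, listener_uri="http://sandbox.localdomain:8080"):
    # The Telemetry API endpoint is exposed through the runtime API host
    api = os.environ["AWS_LAMBDA_RUNTIME_API"]
    body = {
        "schemaVersion": "2022-07-01",
        "types": ["platform", "function", "extension"],
        "buffering": {"maxItems": 1000, "maxBytes": 262144, "timeoutMs": 100},
        "destination": {"protocol": "HTTP", "URI": listener_uri},
    }
    req = urllib.request.Request(
        f"http://{api}/2022-07-01/telemetry",
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            "Lambda-Extension-Identifier": extension_id,
        },
        method="PUT",
    )
    with urllib.request.urlopen(req) as res:
        return res.status  # 200 indicates a successful subscription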

See the Telemetry API documentation and sample extensions for additional details.

The Lambda Telemetry API supersedes the Lambda Logs API. While the Logs API remains fully functional, AWS recommends using the Telemetry API, as new functionality is only added to the Telemetry API. Extensions can only subscribe to either the Logs API or the Telemetry API. After subscribing to one of them, any attempt to subscribe to the other returns an error.

Mapping Telemetry API schema to OpenTelemetry spans

The Lambda Telemetry API schema is semantically compatible with OpenTelemetry (OTEL). You can use events received from the Telemetry API to build and report OTEL spans. Three Telemetry API lifecycle events represent a single function invocation: platform.start, platform.runtimeDone, and platform.report. You should represent this as a single OTEL span. You can add additional details to your spans using information available in platform.runtimeDone events under the event.spans property.
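
For instance, here is a sketch, using the field names from the sample events above, of folding a platform.start and platform.runtimeDone pair for the same requestId into one span-like record:

def to_span(start_event, runtime_done_event):
    # Both events carry the same requestId and X-Ray tracing context
    record = runtime_done_event["record"]
    return {
        "name": "lambda-invoke",
        "span_id": record["tracing"]["spanId"],
        "trace_header": record["tracing"]["value"],  # X-Amzn-Trace-Id
        "start_time": start_event["time"],
        "duration_ms": record["metrics"]["durationMs"],
        "status": record["status"],
        "child_spans": record.get("spans", []),
    }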

Mapping of Telemetry API events to OTEL spans is described in the Telemetry API documentation.

Metrics and pricing

The Telemetry API introduces new per-invoke metrics to help you understand the impact of extensions on your function’s performance. The metrics are available within the platform.runtimeDone event.

  • durationMs measures the time taken by the Lambda runtime to run your function handler code.
  • producedBytes measures the number of bytes returned during the invoke phase.

There are also two new trace spans available within the platform.runtimeDone event:

  • responseLatency measures the time taken by the runtime to start sending a response.
  • responseDuration measures the time taken by the runtime to finish sending the response from when it starts streaming it.

Extensions using Telemetry API, like other extensions, share the same billing model as Lambda functions. When using Lambda functions with extensions, you pay for requests served, and the combined compute time used to run your code and all extensions, in 1-ms increments. To learn more about the billing for extensions, visit the Lambda pricing page.

Useful links

Conclusion

The Lambda Telemetry API allows you to receive enhanced telemetry data more easily using your preferred monitoring and observability tools. The Telemetry API enhances the functionality of the Logs API to receive logs, metrics, and traces directly from the Lambda service. Developers and operators can send telemetry to destinations without custom libraries, with reduced latency, and simplified permissions.

To see how the Telemetry API works, try the demos in the GitHub repository.

Build your own extensions using the Telemetry API today, or use extensions provided by the Lambda observability partners.

For more serverless learning resources, visit Serverless Land.