Tag Archives: AWS Lambda

Amazon Q brings generative AI-powered assistance to IT pros and developers (preview)

Post Syndicated from Donnie Prakoso original https://aws.amazon.com/blogs/aws/amazon-q-brings-generative-ai-powered-assistance-to-it-pros-and-developers-preview/

Today, we are announcing the preview of Amazon Q, a new type of generative artificial intelligence (AI) powered assistant that is specifically for work and can be tailored to a customer’s business.

Amazon Q brings a set of capabilities to support developers and IT professionals. Now you can use Amazon Q to get started building applications on AWS, research best practices, resolve errors, and get assistance in coding new features for your applications. For example, Amazon Q Code Transformation can perform Java application upgrades now, from version 8 and 11 to version 17.

Amazon Q is available in multiple areas of AWS to provide quick access to answers and ideas wherever you work. Here’s a quick look at Amazon Q, including in integrated development environment (IDE):

Building applications together with Amazon Q
Application development is a journey. It involves a continuous cycle of researching, developing, deploying, optimizing, and maintaining. At each stage, there are many questions—from figuring out the right AWS services to use, to troubleshooting issues in the application code.

Trained on 17 years of AWS knowledge and best practices, Amazon Q is designed to help you at each stage of development with a new experience for building applications on AWS. With Amazon Q, you minimize the time and effort you need to gain the knowledge required to answer AWS questions, explore new AWS capabilities, learn unfamiliar technologies, and architect solutions that fuel innovation.

Let us show you some capabilities of Amazon Q.

1. Conversational Q&A capability
You can interact with the Amazon Q conversational Q&A capability to get started, learn new things, research best practices, and iterate on how to build applications on AWS without needing to shift focus away from the AWS console.

To start using this feature, you can select the Amazon Q icon on the right-hand side of the AWS Management Console.

For example, you can ask, “What are AWS serverless services to build serverless APIs?” Amazon Q provides concise explanations along with references you can use to follow up on your questions and validate the guidance. You can also use Amazon Q to follow up on and iterate your questions. Amazon Q will show more deep-dive answers for you with references.

There are times when we have questions for a use case with fairly specific requirements. With Amazon Q, you can elaborate on your use cases in more detail to provide context.

For example, you can ask Amazon Q, “I’m planning to create serverless APIs with 100k requests/day. Each request needs to lookup into the database. What are the best services for this workload?” Amazon Q responds with a list of AWS services you can use and tries to limit the answer results to those that are accurately referenceable and verified with best practices.

Here is some additional information that you might want to note:

2. Optimize Amazon EC2 instance selection
Choosing the right Amazon Elastic Compute Cloud (Amazon EC2) instance type for your workload can be challenging with all the options available. Amazon Q aims to make this easier by providing personalized recommendations.

To use this feature, you can ask Amazon Q, “Which instance families should I use to deploy a Web App Server for hosting an application?” This feature is also available when you choose to launch an instance in the Amazon EC2 console. In Instance type, you can select Get advice on instance type selection. This will show a dialog to define your requirements.

Your requirements are automatically translated into a prompt on the Amazon Q chat panel. Amazon Q returns with a list of suggestions of EC2 instances that are suitable for your use cases. This capability helps you pick the right instance type and settings so your workloads will run smoothly and more cost-efficiently.

This capability to provide EC2 instance type recommendations based on your use case is available in preview in all commercial AWS Regions.

3. Troubleshoot and solve errors directly in the console
Amazon Q can also help you to solve errors for various AWS services directly in the console. With Amazon Q proposed solutions, you can avoid slow manual log checks or research.

Let’s say that you have an AWS Lambda function that tries to interact with an Amazon DynamoDB table. But, for an unknown reason (yet), it fails to run. Now, with Amazon Q, you can troubleshoot and resolve this issue faster by selecting Troubleshoot with Amazon Q.

Amazon Q provides concise analysis of the error which helps you to understand the root cause of the problem and the proposed resolution. With this information, you can follow the steps described by Amazon Q to fix the issue.

In just a few minutes, you will have the solution to solve your issues, saving significant time without disrupting your development workflow. The Amazon Q capability to help you troubleshoot errors in the console is available in preview in the US West (Oregon) for Amazon Elastic Compute Cloud (Amazon EC2), Amazon Simple Storage Service (Amazon S3), Amazon ECS, and AWS Lambda.

4. Network troubleshooting assistance
You can also ask Amazon Q to assist you in troubleshooting network connectivity issues caused by network misconfiguration in your current AWS account. For this capability, Amazon Q works with Amazon VPC Reachability Analyzer to check your connections and inspect your network configuration to identify potential issues.

This makes it easy to diagnose and resolve AWS networking problems, such as “Why can’t I SSH to my EC2 instance?” or “Why can’t I reach my web server from the Internet?” which you can ask Amazon Q.

Then, on the response text, you can select preview experience here, which will provide explanations to help you to troubleshoot network connectivity-related issues.

Here are a few things you need to know:

5. Integration and conversational capabilities within your IDEs
As we mentioned, Amazon Q is also available in supported IDEs. This allows you to ask questions and get help within your IDE by chatting with Amazon Q or invoking actions by typing / in the chat box.

To get started, you need to install or update the latest AWS Toolkit and sign in to Amazon CodeWhisperer. Once you’re signed in to Amazon CodeWhisperer, it will automatically activate the Amazon Q conversational capability in the IDE. With Amazon Q enabled, you can now start chatting to get coding assistance.

You can ask Amazon Q to describe your source code file.

From here, you can improve your application, for example, by integrating it with Amazon DynamoDB. You can ask Amazon Q, “Generate code to save data into DynamoDB table called save_data() accepting data parameter and return boolean status if the operation successfully runs.”

Once you’ve reviewed the generated code, you can do a manual copy and paste into the editor. You can also select Insert at cursor to place the generated code into the source code directly.

This feature makes it really easy to help you focus on building applications because you don’t have to leave your IDE to get answers and context-specific coding guidance. You can try the preview of this feature in Visual Studio Code and JetBrains IDEs.

6. Feature development capability
Another exciting feature that Amazon Q provides is guiding you interactively from idea to building new features within your IDE and Amazon CodeCatalyst. You can go from a natural language prompt to application features in minutes, with interactive step-by-step instructions and best practices, right from your IDE. With a prompt, Amazon Q will attempt to understand your application structure and break down your prompt into logical, atomic implementation steps.

To use this capability, you can start by invoking an action command /dev in Amazon Q and describe the task you need Amazon Q to process.

Then, from here, you can review, collaborate and guide Amazon Q in the chat for specific areas that need to be implemented.

Additional capabilities to help you ship features faster with complete pull requests are available if you’re using Amazon CodeCatalyst. In Amazon CodeCatalyst, you can assign a new or an existing issue to Amazon Q, and it will process an end-to-end development workflow for you. Amazon Q will review the existing code, propose a solution approach, seek feedback from you on the approach, generate merge-ready code, and publish a pull request for review. All you need to do after is to review the proposed solutions from Amazon Q.

The following screenshots show a pull request created by Amazon Q in Amazon CodeCatalyst.

Here are a couple of things that you should know:

  • Amazon Q feature development capability is currently in preview in Visual Studio Code and Amazon CodeCatalyst
  • To use this capability in IDE, you need to have the Amazon CodeWhisperer Professional tier. Learn more on the Amazon CodeWhisperer pricing page.

7. Upgrade applications with Amazon Q Code Transformation
With Amazon Q, you can now upgrade an entire application within a few hours by starting a guided code transformation. This capability, called Amazon Q Code Transformation, simplifies maintaining, migrating, and upgrading your existing applications.

To start, navigate to the CodeWhisperer section and then select Transform. Amazon Q Code Transformation automatically analyzes your existing codebase, generates a transformation plan, and completes the key transformation tasks suggested by the plan.

Some additional information about this feature:

  • Amazon Q Code Transformation is available in preview today in the AWS Toolkit for IntelliJ IDEA and the AWS Toolkit for Visual Studio Code.
  • To use this capability, you need to have the Amazon CodeWhisperer Professional tier during the preview.
  • During preview, you can can upgrade Java 8 and 11 applications to version 17, a Java Long-Term Support (LTS) release.

Get started with Amazon Q today
With Amazon Q, you have an AI expert by your side to answer questions, write code faster, troubleshoot issues, optimize workloads, and even help you code new features. These capabilities simplify every phase of building applications on AWS.

Amazon Q lets you engage with AWS Support agents directly from the Q interface if additional assistance is required, eliminating any dead ends in the customer’s self-service experience. The integration with AWS Support is available in the console and will honor the entitlements of your AWS Support plan.

Learn more

— Donnie & Channy

AWS Lambda functions now scale 12 times faster when handling high-volume requests

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/aws/aws-lambda-functions-now-scale-12-times-faster-when-handling-high-volume-requests/

Now AWS Lambda scales up to 12 times faster. Each synchronously invoked Lambda function now scales by 1,000 concurrent executions every 10 seconds until the aggregate concurrency across all functions reaches the account’s concurrency limit. In addition, each function within an account now scales independently from each other, no matter how the functions are invoked. These improvements come at no additional cost, and you don’t need to do any configuration in your existing functions.

Building scalable and high-performing applications can be challenging with traditional architectures, often requiring over-provisioning of compute resources or complex caching solutions for peak demands and unpredictable traffic. Many developers choose Lambda because it scales on-demand when applications face unpredictable traffic.

Before this update, Lambda functions could initially scale at the account level by 500–3,000 concurrent executions (depending on the Region) in the first minute, followed by 500 concurrent executions every minute until the account’s concurrency limit is reached. Because this scaling limit was shared between all the functions in the same account and Region, if one function experienced an influx of traffic, it could affect the throughput of other functions in the same account. This increased engineering efforts to monitor a few functions that could burst beyond the account limits, causing a noisy neighbor scenario and reducing the overall concurrency of other functions in the same account.

Now, with these scaling improvements, customers with highly variable traffic can reach concurrency targets faster than before. For instance, a news site publishing a breaking news story or an online store running a flash sale would experience a significant influx of visitors. Thanks to these improvements, they can now scale 12 times faster than before.

In addition, customers that use services such as Amazon Athena and Amazon Redshift with scalar Lambda-based UDFs to perform data enrichment or data transformations will see benefits from these improvements. These services rely on batching data and passing it in chunks to Lambda, simultaneously invoking multiple parallel functions. The enhanced concurrency scaling behavior ensures Lambda can rapidly scale and service level agreement (SLA) requirements are met.

How does this work in practice?
The following graph shows a function receiving requests and processing them every 10 seconds. The account concurrency limit is set to 7,000 concurrent requests and is shared between all the functions in the same account. Each function scaling-up rate is fixed to 1,000 concurrent executions every 10 seconds. This rate is independent from other functions in the same account, making it easier for you to predict how this function will scale and throttle the requests if needed.

  • 09:00:00 – The function has been running for a while, and there are already 1,000 concurrent executions that are being processed.
  • 09:00:10 – Ten seconds later, there is a new burst of 1,000 new requests. This function can process them with no problem because the function can scale up to 1,000 concurrent executions every 10 seconds.
  • 09:00:20 – The same happens here: a thousand new requests.
  • 09:00:30 – The function now receives 1,500 new requests. Because the maximum scale-up capacity for a function is 1,000 requests per 10 seconds, 500 of those requests will get throttled.
  • 09:01:00 – At this time, the function is already processing 4,500 concurrent requests. But there is a burst of 3,000 new requests. Lambda processes 1,000 of the new requests and throttles 2,000 because the function can scale up to 1,000 requests every 10 seconds.
  • 09:01:10 – After 10 seconds, there is another burst of 2,000 requests, and the function can now process 1,000 more requests. However, the remaining 1,000 requests get throttled because the function can scale to 1,000 requests every 10 seconds.
  • 09:01:20 – Now the function is processing 6,500 concurrent requests, and there are 1,000 incoming requests. The first 500 of those requests get processed, but the other 500 get throttled because the function reached the account concurrency limit of 7,000 requests. It’s important to remember that you can raise the account concurrency limit by creating a support ticket in the AWS Management Console.

Example of a function scaling

In the case of having more than one function in your account, the functions scale independently until the total account concurrency limit is reached. After that, all new invocations will be throttled.

Availability
These scaling improvements will be enabled by default for all functions. Starting on November 26 through mid-December, AWS is gradually rolling out these scaling improvements to all AWS Regions except China and GovCloud Regions.

If you want to learn more about Lambda’s new scaling behavior, read the Lambda scaling behavior documentation page.

Marcia

AWS Weekly Roundup – EC2 DL2q instances, PartyRock, Amplify’s 6th birthday, and more – November 20, 2023

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-ec2-dl2q-instances-partyrock-amplifys-6th-birthday-and-more-november-20-2023/

Last week I saw an astonishing 160+ new service launches. There were so many updates that we decided to publish a weekly roundup again. This continues the same innovative pace of the previous week as we are getting closer to AWS re:Invent 2023.

Our News Blog team is also finalizing new blog posts for re:Invent to introduce awesome launches with service teams for your reading pleasure. Jeff Barr shared The Road to AWS re:Invent 2023 to explain our blogging journey and process. Please stay tuned in the next week!

Last week’s launches
Here are some of the launches that caught my attention last week:

Amazon EC2 DL2q instances – New DL2q instances are powered by Qualcomm AI 100 Standard accelerators and are the first to feature Qualcomm’s AI technology in the public cloud. With eight Qualcomm AI 100 Standard accelerators and 128 GiB of total accelerator memory, you can run popular generative artificial intelligence (AI) applications and extend to edge devices across smartphones, autonomous driving, personal compute, and extended reality headsets to develop and validate these AI workloads before deploying.

PartyRock for Amazon Bedrock – We introduced PartyRock, a fun and intuitive hands-on, generative AI app-building playground powered by Amazon Bedrock. You can experiment, learn all about prompt engineering, build mini-apps, and share them with your friends—all without writing any code or creating an AWS account.

You also can now access the Meta Llama 2 Chat 13B foundation model and Cohere Command Light, Embed English, and multilingual models for Amazon Bedrock.

AWS Amplify celebrates its sixth birthday – We announced six new launches; a new documentation site, support for Next.js 14 with our hosting and JavaScript library, added custom token providers and an automatic React Native social sign-in update to Amplify Auth, new ChangePassword and DeleteUser account settings components, and updated all Amplify UI packages to use new Amplify JavaScript v6. You can also use wildcard subdomains when using a custom domain with your Amplify application deployed to AWS Amplify Hosting.

Amplify docs site UI

Also check out other News Blog posts about major launches published in the past week:

Other AWS service launches
Here are some other bundled feature launches per AWS service:

Amazon Athena  – You can use a new cost-based optimizer (CBO) to enhance query performance based on table and column statistics, collected by AWS Glue Data Catalog and Athena JDBC 3.x driver, a new alternative that supports almost all authentication plugins. You can also use Amazon EMR Studio to develop and run interactive queries on Amazon Athena.

Amazon CloudWatch – You can use a new CloudWatch metric called EBS Stalled I/O Check to monitor the health of your Amazon EBS volumes, the regular expression for Amazon CloudWatch Logs Live Tail filter pattern syntax to search and match relevant log events, observability of SAP Sybase ASE database in CloudWatch Application Insights, and up to two stats commands in a Log Insights query to perform aggregations on the results.

Amazon CodeCatalyst – You can connect to a Amazon Virtual Private Cloud (Amazon VPC) from CodeCatalyst Workflows, provision infrastructure using Terraform within CodeCatalyst Workflows, access CodeCatalyst with your workforce identities configured in IAM Identity Center, and create teams made up of members of the CodeCatalyst space.

Amazon Connect – You can use a pre-built queue performance dashboard and Contact Lens conversational analytics dashboard to view and compare real-time and historical aggregated queue performance. You can use quick responses for chats, previously written formats such as typing in ‘/#greet’ to insert a personalized response, and scanning attachments to detect malware or other unwanted content.

AWS Glue – AWS Glue for Apache Spark added new six database connectors: Teradata, SAP HANA, Azure SQL, Azure Cosmos DB, Vertica, and MongoDB, as well as the native connectivity to Amazon OpenSearch Service.

AWS Lambda – You can see single pane view of metrics, logs, and traces in the AWS Lambda console and advanced logging controls to natively capture logs in JSON structured format. You can view the SAM template on the Lambda console and export the function’s configuration to AWS Application Composer. AWS Lambda also supports Java 21 and NodeJS 20 versions built on the new Amazon Linux 2023 runtime.

AWS Local Zones in Dallas – You can enable the new Local Zone in Dallas, Texas, us-east-1-dfw-2a, with Amazon EC2 C6i, M6i, R6i, C6gn, and M6g instances and Amazon EBS volume types gp2, gp3, io1, sc1, and st1. You can also access Amazon ECS, Amazon EKS, Application Load Balancer, and AWS Direct Connect in this new Local Zone to support a broad set of workloads at the edge.

Amazon Managed Streaming for Apache Kafka (Amazon MSK) – You can standardize access control to Kafka resources using AWS Identity and Access Management (IAM) and build Kafka clients for Amazon MSK Serverless written in all programming languages. These are open source client helper libraries and code samples for popular languages, including Java, Python, Go, and JavaScript. Also, Amazon MSK now supports an enhanced version of Apache Kafka 3.6.0 that offers generally available Tiered Storage and automatically sends you storage capacity alerts when you are at risk of exhausting your storage.

Amazon OpenSearch Service Ingestion – You can migrate your data from Elasticsearch version 7.x clusters to the latest versions of Amazon OpenSearch Service and use persistent buffering to protect the durability of incoming data.

Amazon RDS –Amazon RDS for MySQL now supports creating active-active clusters using the Group Replication plugin, upgrading MySQL 5.7 snapshots to MySQL 8.0, and Innovation Release version of MySQL 8.1.

Amazon RDS Custom for SQL Server extends point-in-time recovery support for up to 1,000 databases, supports Service Master Key Retention to use transparent data encryption (TDE), table- and column-level encryption, DBMail and linked servers, and use SQL Server Developer edition with the bring your own media (BYOM).

Additionally, Amazon RDS Multi-AZ deployments with two readable standbys now supports minor version upgrades and system maintenance updates with typically less than one second of downtime when using Amazon RDS Proxy.

AWS Partner Central – You can use an improved user experience in AWS Partner Central to build and promote your offerings and the new Investments tab in the Partner Analytics Dashboard to gain actionable insights. You can now link accounts and associated users between Partner Central and AWS Marketplace and use an enhanced co-sell experience with APN Customer Engagements (ACE) manager.

Amazon QuickSight – You can programmatically manage user access and custom permissions support for roles to restrict QuickSight functionality to the QuickSight account for IAM Identity Center and Active Directory using APIs. You can also use shared restricted folders, a Contributor role and support for data source asset types in folders and the Custom Week Start feature, an addition designed to enhance the data analysis experience for customers across diverse industries and social contexts.

AWS Trusted Advisor – You can use new APIs to programmatically access Trusted Advisor best practices checks, recommendations, and prioritized recommendations and 37 new Amazon RDS checks that provide best practices guidance by analyzing DB instance configuration, usage, and performance data.

There’s a lot more launch news that I haven’t covered. See AWS What’s New for more details.

See you virtually in AWS re:Invent
AWS re:Invent 2023Next week we’ll hear the latest from AWS, learn from experts, and connect with the global cloud community in Las Vegas. If you come, check out the agenda, session catalog, and attendee guides before your departure.

If you’re not able to attend re:Invent in person this year, we’re offering the option to livestream our Keynotes and Innovation Talks. With the registration for online pass, you will have access to on-demand keynote, Innovation Talks, and selected breakout sessions after the event.

Channy

Introducing advanced logging controls for AWS Lambda functions

Post Syndicated from David Boyne original https://aws.amazon.com/blogs/compute/introducing-advanced-logging-controls-for-aws-lambda-functions/

This post is written by Nati Goldberg, Senior Solutions Architect and Shridhar Pandey, Senior Product Manager, AWS Lambda

Today, AWS is launching advanced logging controls for AWS Lambda, giving developers and operators greater control over how function logs are captured, processed, and consumed.

This launch introduces three new capabilities to provide a simplified and enhanced default logging experience on Lambda.

First, you can capture Lambda function logs in JSON structured format without having to use your own logging libraries. JSON structured logs make it easier to search, filter, and analyze large volumes of log entries.

Second, you can control the log level granularity of Lambda function logs without making any code changes, enabling more effective debugging and troubleshooting.

Third, you can also set which Amazon CloudWatch log group Lambda sends logs to, making it easier to aggregate and manage logs at scale.

Overview

Being able to identify and filter relevant log messages is essential to troubleshoot and fix critical issues. To help developers and operators monitor and troubleshoot failures, the Lambda service automatically captures and sends logs to CloudWatch Logs.

Previously, Lambda emitted logs in plaintext format, also known as unstructured log format. This unstructured format could make the logs challenging to query or filter. For example, you had to search and correlate logs manually using well-known string identifiers such as “START”, “END”, “REPORT” or the request id of the function invocation. Without a native way to enrich application logs, you needed custom work to extract data from logs for automated analysis or to build analytics dashboards.

Previously, operators could not control the level of log detail generated by functions. They relied on application development teams to make code changes to emit logs with the required granularity level, such as INFO, DEBUG, or ERROR.

Lambda-based applications often comprise microservices, where a single microservice is composed of multiple single-purpose Lambda functions. Before this launch, Lambda sent logs to a default CloudWatch log group created with the Lambda function with no option to select a log group. Now you can aggregate logs from multiple functions in one place so you can uniformly apply security, governance, and retention policies to your logs.

Capturing Lambda logs in JSON structured format

Lambda now natively supports capturing structured logs in JSON format as a series of key-value pairs, making it easier to search and filter logs more easily.

JSON also enables you to add custom tags and contextual information to logs, enabling automated analysis of large volumes of logs to help understand the function performance. The format adheres to the OpenTelemetry (OTel) Logs Data Model, a popular open-source logging standard, enabling you to use open-source tools to monitor functions.

To set the log format in the Lambda console, select the Configuration tab, choose Monitoring and operations tools on the left pane, then change the log format property:

Currently, Lambda natively supports capturing application logs (logs generated by the function code) and system logs (logs generated by the Lambda service) in JSON structured format.

This is for functions that use non-deprecated versions of Python, Node.js, and Java Lambda managed runtimes, when using Lambda recommended logging methods such as using logging library for Python, console object for Node.js, and LambdaLogger or Log4j for Java.

For other managed runtimes, Lambda currently only supports capturing system logs in JSON structured format. However, you can still capture application logs in JSON structured format for these runtimes by manually configuring logging libraries. See configuring advanced logging controls section in the Lambda Developer Guide to learn more. You can also use Powertools for AWS Lambda to capture logs in JSON structured format.

Changing log format from text to JSON can be a breaking change if you parse logs in a telemetry pipeline. AWS recommends testing any existing telemetry pipelines after switching log format to JSON.

Working with JSON structured format for Node.js Lambda functions

You can use JSON structured format with CloudWatch Embedded Metric Format (EMF) to embed custom metrics alongside JSON structured log messages, and CloudWatch automatically extracts the custom metrics for visualization and alarming. However, to use JSON log format along with EMF libraries for Node.js Lambda functions, you must use the latest version of the EMF client library for Node.js or the latest version of Powertools for AWS Lambda (TypeScript) library.

Configuring log level granularity for Lambda function

You can now filter Lambda logs by log level, such as ERROR, DEBUG, or INFO, without code changes. Simplified log level filtering enables you to choose the required logging granularity level for Lambda functions, without sifting through large volumes of logs to debug errors.

You can specify separate log level filters for application logs (which are logs generated by the function code) and system logs (which are logs generated by the Lambda service, such as START and REPORT log messages). Note that log level controls are only available if the log format of the function is set to JSON.

The Lambda console allows setting both the Application log level and System log level properties:

You can define the granularity level of each log event in your function code. The following statement prints out the event input of the function, emitted as a DEBUG log message:

console.debug(event);

Once configured, log events emitted with a lower log level than the one selected are not published to the function’s CloudWatch log stream. For example, setting the function’s log level to INFO results in DEBUG log events being ignored.

This capability allows you to choose the appropriate amount of logs emitted by functions. For example, you can set a higher log level to improve the signal-to-noise ratio in production logs, or set a lower log level to capture detailed log events for testing or troubleshooting purposes.

Customizing Lambda function’s CloudWatch log group

Previously, you could not specify a custom CloudWatch log group for functions, so you could not stream logs from multiple functions into a shared log group. Also, to set custom retention policy for multiple log groups, you had to create each log group separately using a pre-defined name (for example, /aws/lambda/<function name>).

Now you can select a custom CloudWatch log group to aggregate logs from multiple functions automatically within an application in one place. You can apply security, governance, and retention policies at the application level instead of individually to every function.

To distinguish between logs from different functions in a shared log group, each log stream contains the Lambda function name and version.

You can share the same log group between multiple functions to aggregate logs together. The function’s IAM policy must include the logs:CreateLogStream and logs:PutLogEvents permissions for Lambda to create logs in the specified log group. The Lambda service can optionally create these permissions, when you configure functions in the Lambda console.

You can set the custom log group in the Lambda console by entering the destination log group name. If the entered log group does not exist, Lambda creates it automatically.

Advanced logging controls for Lambda can be configured using Lambda APIAWS Management ConsoleAWS Command Line Interface (CLI), and infrastructure as code (IaC) tools such as AWS Serverless Application Model (AWS SAM) and AWS CloudFormation.

Example of Lambda advanced logging controls

This section demonstrates how to use the new advanced logging controls for Lambda using AWS SAM to build and deploy the resources in your AWS account.

Overview

The following diagram shows Lambda functions processing newly created objects inside an Amazon S3 bucket, where both functions emit logs into the same CloudWatch log group:

The architecture includes the following steps:

  1. A new object is created inside an S3 bucket.
  2. S3 publishes an event using S3 Event Notifications to Amazon EventBridge.
  3. EventBridge triggers two Lambda functions asynchronously.
  4. Each function processes the object to extract labels and text, using Amazon Rekognition and Amazon Textract.
  5. Both functions then emit logs into the same CloudWatch log group.

This uses AWS SAM to define the Lambda functions and configure the required logging controls. The IAM policy allows the function to create a log stream and emit logs to the selected log group:

DetectLabelsFunction:
    Type: AWS::Serverless::Function 
    Properties:
      CodeUri: detect-labels/
      Handler: app.lambdaHandler
      Runtime: nodejs18.x
      Policies:
        ...
        - Version: 2012-10-17
          Statement:
            - Sid: CloudWatchLogGroup
              Action: 
                - logs:CreateLogStream
                - logs:PutLogEvents
              Resource: !GetAtt CloudWatchLogGroup.Arn
              Effect: Allow
      LoggingConfig:
        LogFormat: JSON 
        ApplicationLogLevel: DEBUG 
        SystemLogLevel: INFO 
        LogGroup: !Ref CloudWatchLogGroup 

Deploying the example

To deploy the example:

  1. Clone the GitHub repository and explore the application.
    git clone https://github.com/aws-samples/advanced-logging-controls-lambda/
    
    cd advanced-logging-controls-lambda
  2. Use AWS SAM to build and deploy the resources to your AWS account. This compiles and builds the application using npm, and then populate the template required to deploy the resources:
    sam build
  3. Deploy the solution to your AWS account with a guided deployment, using AWS SAM CLI interactive flow:
    sam deploy --guided
  4. Enter the following values:
    • Stack Name: advanced-logging-controls-lambda
    • Region: your preferred Region (for example, us-east-1)
    • Parameter UploadsBucketName: enter a unique bucket name.
    • Accept the rest of the initial defaults.
  5. To test the application, use the AWS CLI to copy the sample image into the S3 bucket you created.
    aws s3 cp samples/skateboard.jpg s3://example-s3-images-bucket

Explore CloudWatch Logs to view the logs emitted into the log group created, AggregatedLabelsLogGroup:

The DetectLabels Lambda function emits DEBUG log events in JSON format to the log stream. Log events with the same log level from the ExtractText Lambda function are omitted. This is a result of the different application log level settings for each function (DEBUG and INFO).

You can also use CloudWatch Logs Insights to search, filter, and analyze the logs in JSON format using this sample query:

You can see the results:

Conclusion

Advanced logging controls for Lambda give you greater control over logging. Use advanced logging controls to control your Lambda function’s log level and format, allowing you to search, query, and filter logs to troubleshoot issues more effectively.

You can also choose the CloudWatch log group where Lambda sends your logs. This enables you to aggregate logs from multiple functions into a single log group, apply retention, security, governance policies, and easily manage logs at scale.

To get started, specify the required settings in the Logging Configuration for any new or existing Lambda functions.

Advanced logging controls for Lambda are available in all AWS Regions where Lambda is available at no additional cost. Learn more about AWS Lambda Advanced Logging Controls.

For more serverless learning resources, visit Serverless Land.

Triggering AWS Lambda function from a cross-account Amazon Managed Streaming for Apache Kafka

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/compute/triggering-aws-lambda-function-from-a-cross-account-amazon-managed-streaming-for-apache-kafka/

This post is written by Subham Rakshit, Senior Specialist Solutions Architect, and Ismail Makhlouf, Senior Specialist Solutions Architect.

Many organizations use a multi-account strategy for stream processing applications. This involves decomposing the overall architecture into a single producer account and many consumer accounts. Within AWS, in the producer account, you can use Amazon Managed Streaming for Apache Kafka (Amazon MSK), and in their consumer accounts have AWS Lambda functions for event consumption. This blog post explains how you can trigger Lambda functions from a cross-account Amazon MSK cluster.

The Lambda event sourcing mapping (ESM) for Amazon MSK continuously polls for new events from the Amazon MSK cluster, aggregates them into batches, and then triggers the target Lambda function. The ESM for Amazon MSK functions as a serverless set of Kafka consumers that ensures that each event is processed at least once. Additionally, events are processed in the same order they are received within each Kafka partition. In addition, the ESM batches the stream of data and filters the events based on configured logic.

Overview

Amazon MSK supports two different deployment types: provisioned and serverless. Triggering a Lambda function from a cross-account Amazon MSK cluster is only supported with a provisioned cluster deployed within the same Region. To facilitate this functionality, Amazon MSK uses multi-VPC private connectivity, powered by AWS PrivateLink, which simplifies connecting Kafka consumers hosted in different AWS accounts to an Amazon MSK cluster.

The following diagram illustrates the architecture of this example:

Architecture Diagram

The architecture is divided in two parts: the producer and the consumer.

In the producer account, you have the Amazon MSK cluster with multi-VPC connectivity enabled. Multi-VPC connectivity is only available for authenticated Amazon MSK clusters. Cluster policies are required to grant permissions to other AWS accounts, allowing them to establish private connectivity to the Amazon MSK cluster. You can delegate permissions to relevant roles or users. When combined with AWS Identity and Access Management (IAM) client authentication, cluster policies offer fine-grained control over Kafka data plane permissions for connecting applications.

In the consumer account, you have the Lambda ESM for Amazon MSK and the managed VPC connection deployed within the same VPC. The managed VPC connection allows private connectivity from the consumer application VPC to the Amazon MSK cluster. The Lambda ESM for Amazon MSK connects to the cross-account Amazon MSK cluster via IAM authentication. It also supports SASL/SCRAM, and mutual TLS (mTLS) authenticated clusters. The ESM receives the event from the Kafka topic and invokes the Lambda function to process it.

Deploying the example application

To set up the Lambda function trigger from a cross-account Amazon MSK cluster as the event source, follow these steps. The AWS CloudFormation templates for deploying the example are accessible in the GitHub repository.

As a part of this example, some sample data is published using the Kafka console producer and Lambda processes these events and writes to Amazon S3.

Pre-requisites

For this example, you need two AWS accounts. This post uses the following naming conventions:

  • Producer (for example, account No: 1111 1111 1111): Account that hosts the Amazon MSK cluster and Kafka client instance.
  • Consumer (for example, account No: 2222 2222 2222): Account that hosts the Lambda function and consumes events from Amazon MSK.

To get started:

  1. Clone the repository locally:
    git clone https://github.com/aws-samples/lambda-cross-account-msk.git
  2. Set up the producer account: you must configure the VPC networking, deploy the Amazon MSK cluster, and a Kafka client instance to publish data. To do this, deploy the CloudFormation template producer-account.yaml from the AWS console and take note of the MSKClusterARN from the CloudFormation outputs tab.
  3. Set up the consumer account: To set up the consumer account, you need the Lambda function, IAM role used by the Lambda function, and S3 bucket receiving the data. For this, deploy the CloudFormation template consumer-account.yaml from the AWS console with the input parameter MSKAccountId, that is the producer AWS account ID (for example, account Id: 1111 1111 1111). Note the LambdaRoleArn from the CloudFormation outputs tab.

Setting up multi-VPC connectivity in the Amazon MSK cluster

Once the accounts are created, you must enable connectivity between them. By enabling multi-VPC private connectivity in the Amazon MSK cluster, you set up the network connection to allow the cross-account consumers to connect to the cluster.

  1. In the producer account, navigate to the Amazon MSK console.
  2. Choose producer-cluster, and go to the Properties tab.
  3. Scroll to Networking settings, choose Edit, and select Turn on multi-VPC connectivity. This takes some time, then appears as follows.Networking settings
  4. Add the necessary cluster policy to allow cross-account consumers to connect to Amazon MSK. In the producer account, deploy the CloudFormation template producer-msk-cluster-policy.yaml from the AWS console with the following input parameters:
    • MSKClusterArnAmazon Resource Name (ARN) of the Amazon MSK cluster in producer account. Find this information in the CloudFormation output of producer-account.yaml.
    • LambdaRoleArn – ARN of the IAM role attached to the Lambda function in the consumer account. Find this information in the CloudFormation output of consumer-account.yaml.
    • LambdaAccountId – Consumer AWS account ID (for example, account Id: 2222 2222 2222).

Creating a Kafka topic in Amazon MSK and publishing events

In the producer account, navigate to the Amazon MSK console. Choose the Amazon MSK cluster named producer-cluster. Choose View client information to show the bootstrap server.

Client information

The CloudFormation template also deploys a Kafka client instance to create topics and publish events.

To access the client, go to the Amazon EC2 console and choose the instance producer-KafkaClientInstance1. Connect to EC2 instance with Session Manager:

sudo su - ec2-user
#Set MSK Broker IAM endpoint
export BS=<<Provide IAM bootstrap address here>>

You must use the single-VPC Private endpoint for the Amazon MSK cluster and not the multi-VPC private endpoint, as you are going to publish events from a Kafka console producer from the producer account.

Run these scripts to create the customer topic and publish sample events in the topic:

./kafka_create_topic.sh
./kafka_produce_events.sh

Creating a managed VPC connection in the consumer account

To establish a connection to the Amazon MSK cluster in the producer account, you must create a managed VPC connection in the consumer account. Lambda communicates with cross-account Amazon MSK through this managed VPC connection.

For detailed setup steps, read the Amazon MSK managed VPC connection documentation.

Configuring the Lambda ESM for Amazon MSK

The final step is to set up the Lambda ESM for Amazon MSK. Setting up the ESM enables you to connect to the Amazon MSK cluster in the producer account via the managed VPC endpoint. This allows you to trigger the Lambda function to process the data produced from the Kafka topic:

  1. In the consumer account, go to the Lambda console.
  2. Open the Lambda function msk-lambda-cross-account-iam.
  3. Go to the Configuration tab, select Triggers, and choose Add Trigger.
  4. For Trigger configuration, select Amazon MSK.

Lambda trigger

To configure this trigger:

  1. Select the shared Amazon MSK cluster. This automatically defaults to the IAM authentication that is used to connect to the cluster.
    MSK Lambda trigger
  2. By default, the Active trigger check box is enabled. This ensures that the trigger is in the active state after creation. For the other values:
    1. Keep the Batch size default to 100.
    2. Change the Starting Position to Trim horizon.
    3. Set the Topic name as customer.
    4. Set the Consumer Group ID as msk-lambda-iam.

Trigger configuration

Scroll to the bottom and choose Add. This starts creating the Amazon MSK trigger, which takes several minutes. After creation, the state of the trigger shows as Enabled.

Verifying the output on the consumer side

The Lambda function receives the events and writes them in an S3 bucket.

To validate that the function is working, go to the consumer account and navigate to the S3 console. Search for the cross-account-lambda-consumer-data-<<REGION>>-<<AWS Account Id>> bucket. In the bucket, you see the customer-data-<<datetime>>.csv files.

S3 bucket objects

Cleaning up

You must empty and delete the S3 bucket, managed VPC connection, and the Lambda ESM for Amazon MSK manually from the consumer account. Next, delete the CloudFormation stacks from the AWS console from both the producer and consumer accounts to remove all other resources created as a part of the example.

Conclusion

With Lambda and Amazon MSK, you can now build a decentralized application distributed across multiple AWS accounts. This post shows how you can set up Amazon MSK as an event source for cross-account Lambda functions and also walks you through the configuration required in both producer and consumer accounts.

For further reading on AWS Lambda with Amazon MSK as an event source, visit the documentation.

For more serverless learning resources, visit Serverless Land.

Introducing shared VPC support on Amazon MWAA

Post Syndicated from John Jackson original https://aws.amazon.com/blogs/big-data/introducing-shared-vpc-support-on-amazon-mwaa/

In this post, we demonstrate automating deployment of Amazon Managed Workflows for Apache Airflow (Amazon MWAA) using customer-managed endpoints in a VPC, providing compatibility with shared, or otherwise restricted, VPCs.

Data scientists and engineers have made Apache Airflow a leading open source tool to create data pipelines due to its active open source community, familiar Python development as Directed Acyclic Graph (DAG) workflows, and extensive library of pre-built integrations. Amazon MWAA is a managed service for Airflow that makes it easy to run Airflow on AWS without the operational burden of having to manage the underlying infrastructure. For each Airflow environment, Amazon MWAA creates a single-tenant service VPC, which hosts the metadatabase that stores states and the web server that provides the user interface. Amazon MWAA further manages Airflow scheduler and worker instances in a customer-owned and managed VPC, in order to schedule and run tasks that interact with customer resources. Those Airflow containers in the customer VPC access resources in the service VPC via a VPC endpoint.

Many organizations choose to centrally manage their VPC using AWS Organizations, allowing a VPC in an owner account to be shared with resources in a different participant account. However, because creating a new route outside of a VPC is considered a privileged operation, participant accounts can’t create endpoints in owner VPCs. Furthermore, many customers don’t want to extend the security privileges required to create VPC endpoints to all users provisioning Amazon MWAA environments. In addition to VPC endpoints, customers also wish to restrict data egress via Amazon Simple Queue Service (Amazon SQS) queues, and Amazon SQS access is a requirement in the Amazon MWAA architecture.

Shared VPC support for Amazon MWAA adds the ability for you to manage your own endpoints within your VPCs, adding compatibility to shared and otherwise restricted VPCs. Specifying customer-managed endpoints also provides the ability to meet strict security policies by explicitly restricting VPC resource access to just those needed by your Amazon MWAA environments. This post demonstrates how customer-managed endpoints work with Amazon MWAA and provides examples of how to automate the provisioning of those endpoints.

Solution overview

Shared VPC support for Amazon MWAA allows multiple AWS accounts to create their Airflow environments into shared, centrally managed VPCs. The account that owns the VPC (owner) shares the two private subnets required by Amazon MWAA with other accounts (participants) that belong to the same organization from AWS Organizations. After the subnets are shared, the participants can view, create, modify, and delete Amazon MWAA environments in the subnets shared with them.

When users specify the need for a shared, or otherwise policy-restricted, VPC during environment creation, Amazon MWAA will first create the service VPC resources, then enter a pending state for up to 72 hours, with an Amazon EventBridge notification of the change in state. This allows owners to create the required endpoints on behalf of participants based on endpoint service information from the Amazon MWAA console or API, or programmatically via an AWS Lambda function and EventBridge rule, as in the example in this post.

After those endpoints are created on the owner account, the endpoint service in the single-tenant Amazon MWAA VPC will detect the endpoint connection event and resume environment creation. Should there be an issue, you can cancel environment creation by deleting the environment during this pending state.

This feature also allows you to remove the create, modify, and delete VPCE privileges from the AWS Identity and Access Management (IAM) principal creating Amazon MWAA environments, even when not using a shared VPC, because that permission will instead be imposed on the IAM principal creating the endpoint (the Lambda function in our example). Furthermore, the Amazon MWAA environment will provide the SQS queue Amazon Resource Name (ARN) used by the Airflow Celery Executor to queue tasks (the Celery Executor Queue), allowing you to explicitly enter those resources into your network policy rather than having to provide a more open and generalized permission.

In this example, we create the VPC and Amazon MWAA environment in the same account. For shared VPCs across accounts, the EventBridge rule and Lambda function would exist in the owner account, and the Amazon MWAA environment would be created in the participant account. See Sending and receiving Amazon EventBridge events between AWS accounts for more information.

Prerequisites

You should have the following prerequisites:

  • An AWS account
  • An AWS user in that account, with permissions to create VPCs, VPC endpoints, and Amazon MWAA environments
  • An Amazon Simple Storage Service (Amazon S3) bucket in that account, with a folder called dags

Create the VPC

We begin by creating a restrictive VPC using an AWS CloudFormation template, in order to simulate creating the necessary VPC endpoint and modifying the SQS endpoint policy. If you want to use an existing VPC, you can proceed to the next section.

  1. On the AWS CloudFormation console, choose Create stack and choose With new resources (standard).
  2. Under Specify template, choose Upload a template file.
  3. Now we edit our CloudFormation template to restrict access to Amazon SQS. In cfn-vpc-private-bjs.yml, edit the SqsVpcEndoint section to appear as follows:
   SqsVpcEndoint:
     Type: AWS::EC2::VPCEndpoint
     Properties:
       ServiceName: !Sub "com.amazonaws.${AWS::Region}.sqs"
       VpcEndpointType: Interface
       VpcId: !Ref VPC
       PrivateDnsEnabled: true
       SubnetIds:
        - !Ref PrivateSubnet1
        - !Ref PrivateSubnet2
       SecurityGroupIds:
        - !Ref SecurityGroup
       PolicyDocument:
        Statement:
         - Effect: Allow
           Principal: '*'
           Action: '*'
           Resource: []

This additional policy document entry prevents Amazon SQS egress to any resource not explicitly listed.

Now we can create our CloudFormation stack.

  1. On the AWS CloudFormation console, choose Create stack.
  2. Select Upload a template file.
  3. Choose Choose file.
  4. Browse to the file you modified.
  5. Choose Next.
  6. For Stack name, enter MWAA-Environment-VPC.
  7. Choose Next until you reach the review page.
  8. Choose Submit.

Create the Lambda function

We have two options for self-managing our endpoints: manual and automated. In this example, we create a Lambda function that responds to the Amazon MWAA EventBridge notification. You could also use the EventBridge notification to send an Amazon Simple Notification Service (Amazon SNS) message, such as an email, to someone with permission to create the VPC endpoint manually.

First, we create a Lambda function to respond to the EventBridge event that Amazon MWAA will emit.

  1. On the Lambda console, choose Create function.
  2. For Name, enter mwaa-create-lambda.
  3. For Runtime, choose Python 3.11.
  4. Choose Create function.
  5. For Code, in the Code source section, for lambda_function, enter the following code:
    import boto3
    import json
    import logging
    
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)
    
    def lambda_handler(event, context):
        if event['detail']['status']=="PENDING":
            detail=event['detail']
            name=detail['name']
            celeryExecutorQueue=detail['celeryExecutorQueue']
            subnetIds=detail['networkConfiguration']['subnetIds']
            securityGroupIds=detail['networkConfiguration']['securityGroupIds']
            databaseVpcEndpointService=detail['databaseVpcEndpointService']
    
            # MWAA does not need to store the VPC ID, but we can get it from the subnets
            client = boto3.client('ec2')
            response = client.describe_subnets(SubnetIds=subnetIds)
            logger.info(response['Subnets'][0]['VpcId'])  
            vpcId=response['Subnets'][0]['VpcId']
            logger.info("vpcId: " + vpcId)       
            
            webserverVpcEndpointService=None
            if detail['webserverAccessMode']=="PRIVATE_ONLY":
                webserverVpcEndpointService=event['detail']['webserverVpcEndpointService']
            
            response = client.describe_vpc_endpoints(
                VpcEndpointIds=[],
                Filters=[
                    {"Name": "vpc-id", "Values": [vpcId]},
                    {"Name": "service-name", "Values": ["*.sqs"]},
                    ],
                MaxResults=1000
            )
            sqsVpcEndpoint=None
            for r in response['VpcEndpoints']:
                if subnetIds[0] in r['SubnetIds'] or subnetIds[0] in r['SubnetIds']:
                    # We are filtering describe by service name, so this must be SQS
                    sqsVpcEndpoint=r
                    break
            
            if sqsVpcEndpoint:
                logger.info("Found SQS endpoint: " + sqsVpcEndpoint['VpcEndpointId'])
    
                logger.info(sqsVpcEndpoint)
                pd = json.loads(sqsVpcEndpoint['PolicyDocument'])
                for s in pd['Statement']:
                    if s['Effect']=='Allow':
                        resource = s['Resource']
                        logger.info(resource)
                        if '*' in resource:
                            logger.info("'*' already allowed")
                        elif celeryExecutorQueue in resource: 
                            logger.info("'"+celeryExecutorQueue+"' already allowed")                
                        else:
                            s['Resource'].append(celeryExecutorQueue)
                            logger.info("Updating SQS policy to " + str(pd))
            
                            client.modify_vpc_endpoint(
                                VpcEndpointId=sqsVpcEndpoint['VpcEndpointId'],
                                PolicyDocument=json.dumps(pd)
                                )
                        break
            
            # create MWAA database endpoint
            logger.info("creating endpoint to " + databaseVpcEndpointService)
            endpointName=name+"-database"
            response = client.create_vpc_endpoint(
                VpcEndpointType='Interface',
                VpcId=vpcId,
                ServiceName=databaseVpcEndpointService,
                SubnetIds=subnetIds,
                SecurityGroupIds=securityGroupIds,
                TagSpecifications=[
                    {
                        "ResourceType": "vpc-endpoint",
                        "Tags": [
                            {
                                "Key": "Name",
                                "Value": endpointName
                            },
                        ]
                    },
                ],           
            )
            logger.info("created VPCE: " + response['VpcEndpoint']['VpcEndpointId'])
                
            # create MWAA web server endpoint (if private)
            if webserverVpcEndpointService:
                endpointName=name+"-webserver"
                logger.info("creating endpoint to " + webserverVpcEndpointService)
                response = client.create_vpc_endpoint(
                    VpcEndpointType='Interface',
                    VpcId=vpcId,
                    ServiceName=webserverVpcEndpointService,
                    SubnetIds=subnetIds,
                    SecurityGroupIds=securityGroupIds,
                    TagSpecifications=[
                        {
                            "ResourceType": "vpc-endpoint",
                            "Tags": [
                                {
                                    "Key": "Name",
                                    "Value": endpointName
                                },
                            ]
                        },
                    ],                  
                )
                logger.info("created VPCE: " + response['VpcEndpoint']['VpcEndpointId'])
    
        return {
            'statusCode': 200,
            'body': json.dumps(event['detail']['status'])
        }

  6. Choose Deploy.
  7. On the Configuration tab of the Lambda function, in the General configuration section, choose Edit.
  8. For Timeout, increate to 5 minutes, 0 seconds.
  9. Choose Save.
  10. In the Permissions section, under Execution role, choose the role name to edit the permissions of this function.
  11. For Permission policies, choose the link under Policy name.
  12. Choose Edit and add a comma and the following statement:
    {
    		"Sid": "Statement1",
    		"Effect": "Allow",
    		"Action": 
    		[
    			"ec2:DescribeVpcEndpoints",
    			"ec2:CreateVpcEndpoint",
    			"ec2:ModifyVpcEndpoint",
                "ec2:DescribeSubnets",
    			"ec2:CreateTags"
    		],
    		"Resource": 
    		[
    			"*"
    		]
    }

The complete policy should look similar to the following:

{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Effect": "Allow",
			"Action": "logs:CreateLogGroup",
			"Resource": "arn:aws:logs:us-east-1:112233445566:*"
		},
		{
			"Effect": "Allow",
			"Action": [
				"logs:CreateLogStream",
				"logs:PutLogEvents"
			],
			"Resource": [
				"arn:aws:logs:us-east-1:112233445566:log-group:/aws/lambda/mwaa-create-lambda:*"
			]
		},
		{
			"Sid": "Statement1",
			"Effect": "Allow",
			"Action": [
				"ec2:DescribeVpcEndpoints",
				"ec2:CreateVpcEndpoint",
				"ec2:ModifyVpcEndpoint",
               	"ec2:DescribeSubnets",
				"ec2:CreateTags"
			],
			"Resource": [
				"*"
			]
		}
	]
}
  1. Choose Next until you reach the review page.
  2. Choose Save changes.

Create an EventBridge rule

Next, we configure EventBridge to send the Amazon MWAA notifications to our Lambda function.

  1. On the EventBridge console, choose Create rule.
  2. For Name, enter mwaa-create.
  3. Select Rule with an event pattern.
  4. Choose Next.
  5. For Creation method, choose User pattern form.
  6. Choose Edit pattern.
  7. For Event pattern, enter the following:
    {
      "source": ["aws.airflow"],
      "detail-type": ["MWAA Environment Status Change"]
    }

  8. Choose Next.
  9. For Select a target, choose Lambda function.

You may also specify an SNS notification in order to receive a message when the environment state changes.

  1. For Function, choose mwaa-create-lambda.
  2. Choose Next until you reach the final section, then choose Create rule.

Create an Amazon MWAA environment

Finally, we create an Amazon MWAA environment with customer-managed endpoints.

  1. On the Amazon MWAA console, choose Create environment.
  2. For Name, enter a unique name for your environment.
  3. For Airflow version, choose the latest Airflow version.
  4. For S3 bucket, choose Browse S3 and choose your S3 bucket, or enter the Amazon S3 URI.
  5. For DAGs folder, choose Browse S3 and choose the dags/ folder in your S3 bucket, or enter the Amazon S3 URI.
  6. Choose Next.
  7. For Virtual Private Cloud, choose the VPC you created earlier.
  8. For Web server access, choose Public network (Internet accessible).
  9. For Security groups, deselect Create new security group.
  10. Choose the shared VPC security group created by the CloudFormation template.

Because the security groups of the AWS PrivateLink endpoints from the earlier step are self-referencing, you must choose the same security group for your Amazon MWAA environment.

  1. For Endpoint management, choose Customer managed endpoints.
  2. Keep the remaining settings as default and choose Next.
  3. Choose Create environment.

When your environment is available, you can access it via the Open Airflow UI link on the Amazon MWAA console.

Clean up

Cleaning up resources that are not actively being used reduces costs and is a best practice. If you don’t delete your resources, you can incur additional charges. To clean up your resources, complete the following steps:

  1. Delete your Amazon MWAA environment, EventBridge rule, and Lambda function.
  2. Delete the VPC endpoints created by the Lambda function.
  3. Delete any security groups created, if applicable.
  4. After the above resources have completed deletion, delete the CloudFormation stack to ensure that you have removed all of the remaining resources.

Summary

This post described how to automate environment creation with shared VPC support in Amazon MWAA. This gives you the ability to manage your own endpoints within your VPC, adding compatibility to shared, or otherwise restricted, VPCs. Specifying customer-managed endpoints also provides the ability to meet strict security policies by explicitly restricting VPC resource access to just those needed by their Amazon MWAA environments. To learn more about Amazon MWAA, refer to the Amazon MWAA User Guide. For more posts about Amazon MWAA, visit the Amazon MWAA resources page.


About the author

John Jackson has over 25 years of software experience as a developer, systems architect, and product manager in both startups and large corporations and is the AWS Principal Product Manager responsible for Amazon MWAA.

Managing AWS Lambda runtime upgrades

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/managing-aws-lambda-runtime-upgrades/

This post is written by Julian Wood, Principal Developer Advocate, and Dan Fox, Principal Specialist Serverless Solutions Architect.

AWS Lambda supports multiple programming languages through the use of runtimes. A Lambda runtime provides a language-specific execution environment, which provides the OS, language support, and additional settings, such as environment variables and certificates that you can access from your function code.

You can use managed runtimes that Lambda provides or build your own. Each major programming language release has a separate managed runtime, with a unique runtime identifier, such as python3.11 or nodejs20.x.

Lambda automatically applies patches and security updates to all managed runtimes and their corresponding container base images. Automatic runtime patching is one of the features customers love most about Lambda. When these patches are no longer available, Lambda ends support for the runtime. Over the next few months, Lambda is deprecating a number of popular runtimes, triggered by end of life of upstream language versions and of Amazon Linux 1.

Runtime Deprecation
Node.js 14 Nov 27, 2023
Node.js 16 Mar 11, 2024
Python 3.7 Nov 27, 2023
Java 8 (Amazon Linux 1) Dec 31, 2023
Go 1.x Dec 31, 2023
Ruby 2.7 Dec 07, 2023
Custom Runtime (provided) Dec 31, 2023

Runtime deprecation is not unique to Lambda. You must upgrade code using Python 3.7 or Node.js 14 when those language versions reach end of life, regardless of which compute service your code is running on. Lambda can help make this easier by tracking which runtimes you are using and providing deprecation notifications.

This post contains considerations and best practices for managing runtime deprecations and upgrades when using Lambda. Adopting these techniques makes managing runtime upgrades easier, especially when working with a large number of functions.

Specifying Lambda runtimes

When you deploy your function as a .zip file archive, you choose a runtime when you create the function. To change the runtime, you can update your function’s configuration.

Lambda keeps each managed runtime up to date by taking on the operational burden of patching the runtimes with security updates, bug fixes, new features, performance enhancements, and support for minor version releases. These runtime updates are published as runtime versions. Lambda applies runtime updates to functions by migrating the function from an earlier runtime version to a new runtime version.

You can control how your functions receive these updates using runtime management controls. Runtime versions and runtime updates apply to patch updates for a given Lambda runtime. Lambda does not automatically upgrade functions between major language runtime versions, for example, from nodejs14.x to nodejs18.x.

For a function defined as a container image, you choose a runtime and the Linux distribution when you create the container image. Most customers start with one of the Lambda base container images, although you can also build your own images from scratch. To change the runtime, you create a new container image from a different base container image.

Why does Lambda deprecate runtimes?

Lambda deprecates a runtime when upstream runtime language maintainers mark their language end-of-life or security updates are no longer available.

In almost all cases, the end-of-life date of a language version or operating system is published well in advance. The Lambda runtime deprecation policy gives end-of-life schedules for each language that Lambda supports. Lambda notifies you by email and via your Personal Health Dashboard if you are using a runtime that is scheduled for deprecation.

Lambda runtime deprecation happens in several stages. Lambda first blocks creating new functions that use a given runtime. Lambda later also blocks updating existing functions using the unsupported runtime, except to update to a supported runtime. Lambda does not block invocations of functions that use a deprecated runtime. Function invocations continue indefinitely after the runtime reaches end of support.

Lambda is extending the deprecation notification period from 60 days before deprecation to 180 days. Previously, blocking new function creation happened at deprecation and blocking updates to existing functions 30 days later. Blocking creation of new functions now happens 30 days after deprecation, and blocking updates to existing functions 60 days after.

Lambda occasionally delays deprecation of a Lambda runtime for a limited period beyond the end of support date of the language version that the runtime supports. During this period, Lambda only applies security patches to the runtime OS. Lambda doesn’t apply security patches to programming language runtimes after they reach their end of support date.

Can Lambda automatically upgrade my runtime?

Moving from one major version of the language runtime to another has a significant risk of being a breaking change. Some libraries and dependencies within a language have deprecation schedules and do not support versions of a language past a certain point. Moving functions to new runtimes could potentially impact large-scale production workloads that customers depend on.

Since Lambda cannot guarantee backward compatibility between major language versions, upgrading the Lambda runtime used by a function is a customer-driven operation.

Lambda function versions

You can use function versions to manage the deployment of your functions. In Lambda, you make code and configuration changes to the default function version, which is called $LATEST. When you publish a function version, Lambda takes a snapshot of the code, runtime, and function configuration to maintain a consistent experience for users of that function version. When you invoke a function, you can specify the version to use or invoke the $LATEST version. Lambda function versions are required when using Provisioned Concurrency or SnapStart.

Some developers use an auto-versioning process by creating a new function version each time they deploy a change. This results in many versions of a function, with only a single version actually in use.

While Lambda applies runtime updates to published function versions, you cannot update the runtime major version for a published function version, for example from Node.js 16 to Node.js 20. To update the runtime for a function, you must update the $LATEST version, then create a new published function version if necessary. This means that different versions of a function can use different runtimes. The following shows the same function with version 1 using Node.js 14.x and version 2 using Node.js 18.x.

Version 1 using Node.js 14.x

Version 1 using Node.js 14.x

Version 2 using Node.js 18.x

Version 2 using Node.js 18.x

Ensure you create a maintenance process for deleting unused function versions, which also impact your Lambda storage quota.

Managing function runtime upgrades

Managing function runtime upgrades should be part of your software delivery lifecycle, in a similar way to how you treat dependencies and security updates. You need to understand which functions are being actively used in your organization. Organizations can create prioritization based on security profiles and/or function usage. You can use the same communication mechanisms you may already be using for handling security vulnerabilities.

Implement preventative guardrails to ensure that developers can only create functions using supported runtimes. Using infrastructure as code, CI/CD pipelines, and robust testing practices makes updating runtimes easier.

Identifying impacted functions

There are tools available to check Lambda runtime configuration and to identify which functions and what published function versions are actually in use. Deleting a function or function version that is no longer in use is the simplest way to avoid runtime deprecations.

You can identify functions using deprecated or soon to be deprecated runtimes using AWS Trusted Advisor. Use the AWS Lambda Functions Using Deprecated Runtimes check, in the Security category that provides 120 days’ notice.

AWS Trusted Advisor Lambda functions using deprecated runtimes

AWS Trusted Advisor Lambda functions using deprecated runtimes

Trusted Advisor scans all versions of your functions, including $LATEST and published versions.

The AWS Command Line Interface (AWS CLI) can list all functions in a specific Region that are using a specific runtime. To find all functions in your account, repeat the following command for each AWS Region and account. Replace the <REGION> and <RUNTIME> parameters with your values. The --function-version ALL parameter causes all function versions to be returned; omit this parameter to return only the $LATEST version.

aws lambda list-functions --function-version ALL --region <REGION> --output text —query "Functions[?Runtime=='<RUNTIME>'].FunctionArn"

You can use AWS Config to create a view of the configuration of resources in your account and also store configuration snapshot data in Amazon S3. AWS Config queries do not support published function versions, they can only query the $LATEST version.

You can then use Amazon Athena and Amazon QuickSight to make dashboards to visualize AWS Config data. For more information, see the Implementing governance in depth for serverless applications learning guide.

Dashboard showing AWS Config data

Dashboard showing AWS Config data

There are a number of ways that you can track Lambda function usage.

You can use Amazon CloudWatch metrics explorer to view Lambda by runtime and track the Invocations metric within the default CloudWatch metrics retention period of 15 months.

Track invocations in Amazon CloudWatch metrics

Track invocations in Amazon CloudWatch metrics

You can turn on AWS CloudTrail data event logging to log an event every time Lambda functions are invoked. This helps you understand what identities are invoking functions and the frequency of their invocations.

AWS Cost and Usage Reports can show which functions are incurring cost and in use.

Limiting runtime usage

AWS CloudFormation Guard is an open-source evaluation tool to validate infrastructure as code templates. Create policy rules to ensure that developers only chose approved runtimes. For more information, see Preventative Controls with AWS CloudFormation Guard.

AWS Config rules allow you to check that Lambda function settings for the runtime match expected values. For more information on running these rules before deployment, see Preventative Controls with AWS Config. You can also reactively flag functions as non-compliant as your governance policies evolve. For more information, see Detective Controls with AWS Config.

Lambda does not currently have service control policies (SCP) to block function creation based on the runtime

Upgrade best practices

Use infrastructure as code tools to build and manage your Lambda functions, which can make it easier to manage upgrades.

Ensure you run tests against your functions when developing locally. Include automated tests as part of your CI/CD pipelines to provide confidence in your runtime upgrades. When rolling out function upgrades, you can use weighted aliases to shift traffic between two function versions as you monitor for errors and failures.

Using runtimes after deprecation

AWS strongly advises you to upgrade your functions to a supported runtime before deprecation to continue to benefit from security patches, bug-fixes, and the latest runtime features. While deprecation does not affect function invocations, you will be using an unsupported runtime, which may have unpatched security vulnerabilities. Your function may eventually stop working, for example, due to a certificate expiry.

Lambda blocks function creation and updates for functions using deprecated runtimes. To create or update functions after these operations are blocked, contact AWS Support.

Conclusion

Lambda is deprecating a number of popular runtimes over the next few months, reflecting the end-of-life of upstream language versions and Amazon Linux 1. This post covers considerations for managing Lambda function runtime upgrades.

For more serverless learning resources, visit Serverless Land.

Node.js 20.x runtime now available in AWS Lambda

Post Syndicated from Pascal Vogel original https://aws.amazon.com/blogs/compute/node-js-20-x-runtime-now-available-in-aws-lambda/

This post is written by Pascal Vogel, Solutions Architect, and Andrea Amorosi, Senior Solutions Architect.

You can now develop AWS Lambda functions using the Node.js 20 runtime. This Node.js version is in active LTS status and ready for general use. To use this new version, specify a runtime parameter value of nodejs20.x when creating or updating functions or by using the appropriate container base image.

You can use Node.js 20 with Powertools for AWS Lambda (TypeScript), a developer toolkit to implement serverless best practices and increase developer velocity. Powertools for AWS Lambda includes proven libraries to support common patterns such as observability, Parameter Store integration, idempotency, batch processing, and more.

You can also use Node.js 20 with Lambda@Edge, allowing you to customize low-latency content delivered through Amazon CloudFront.

This blog post highlights important changes to the Node.js runtime, notable Node.js language updates, and how you can use the new Node.js 20 runtime in your serverless applications.

Node.js 20 runtime updates

Changes to Root CA certificate loading

By default, Node.js includes root certificate authority (CA) certificates from well-known certificate providers. Earlier Lambda Node.js runtimes up to Node.js 18 augmented these certificates with Amazon-specific CA certificates, making it easier to create functions accessing other AWS services. For example, it included the Amazon RDS certificates necessary for validating the server identity certificate installed on your Amazon RDS database.

However, loading these additional certificates has a performance impact during cold start. Starting with Node.js 20, Lambda no longer loads these additional CA certificates by default. The Node.js 20 runtime contains a certificate file with all Amazon CA certificates located at /var/runtime/ca-cert.pem. By setting the NODE_EXTRA_CA_CERTS environment variable to /var/runtime/ca-cert.pem, you can restore the behavior from Node.js 18 and earlier runtimes.

This causes Node.js to validate and load all Amazon CA certificates during a cold start. It can take longer compared to loading only specific certificates. For the best performance, we recommend bundling only the certificates that you need with your deployment package and loading them via NODE_EXTRA_CA_CERTS. The certificates file should consist of one or more trusted root or intermediate CA certificates in PEM format.

For example, for RDS, include the required certificates alongside your code as certificates/rds.pem and then load it as follows:

NODE_EXTRA_CA_CERTS=/var/task/certificates/rds.pem

See Using Lambda environment variables in the AWS Lambda Developer Guide for detailed instructions for setting environment variables.

Amazon Linux 2023

The Node.js 20 runtime is based on the provided.al2023 runtime. The provided.al2023 runtime in turn is based on the Amazon Linux 2023 minimal container image release and brings several improvements over Amazon Linux 2 (AL2).

provided.al2023 contains only the essential components necessary to install other packages and offers a smaller deployment footprint with a compressed image size of less than 40MB compared to the over 100MB AL2-based base image.

With glibc version 2.34, customers have access to a more recent version of glibc, updated from version 2.26 in AL2-based images.

The Amazon Linux 2023 minimal image uses microdnf as package manager, symlinked as dnf, replacing yum in AL2-based images. Additionally, curl and gnupg2 are also included as their minimal versions curl-minimal and gnupg2-minimal.

Learn more about the provided.al2023 runtime in the blog post Introducing the Amazon Linux 2023 runtime for AWS Lambda and the Amazon Linux 2023 launch blog post.

Runtime Interface Client

The Node.js 20 runtime uses the open source AWS Lambda NodeJS Runtime Interface Client (RIC). You can now use the same RIC version in your Open Container Initiative (OCI) Lambda container images as the one used by the managed Node.js 20 runtime.

The Node.js 20 runtime supports Lambda response streaming which enables you to send response payload data to callers as it becomes available. Response streaming can improve application performance by reducing time-to-first-byte, can indicate progress during long-running tasks, and allows you to build functions that return payloads larger than 6MB, which is the Lambda limit for buffered responses.

Setting Node.js heap memory size

Node.js allows you to configure the heap size of the v8 engine via the --max-old-space-size and --max-semi-space-size options. By default, Lambda overrides the Node.js default values with values derived from the memory size configured for the function. If you need control over your runtime’s memory allocation, you can now set both of these options using the NODE_OPTIONS environment variable, without needing an exec wrapper script. See Using Lambda environment variables in the AWS Lambda Developer Guide for details.

Use the --max-old-space-size option to set the max memory size of V8’s old memory section, and the --max-semi-space-size option to set the maximum semispace size for V8’s garbage collector. See the Node.js documentation for more details on these options.

Node.js 20 language updates

Language features

With this release, Lambda customers can take advantage of new Node.js 20 language features, including:

  • HTTP(S)/1.1 default keepAlive: Node.js now sets keepAlive to true by default. Any outgoing HTTPs connections use HTTP 1.1 keep-alive with a default waiting window of 5 seconds. This can deliver improved throughput as connections are reused by default.
  • Fetch API is enabled by default: The global Node.js Fetch API is enabled by default. However, it is still an experimental module.
  • Faster URL parsing: Node.js 20 comes with the Ada 2.0 URL parser which brings performance improvements to URL parsing. This has also been back-ported to Node.js 18.7.0.
  • Web Crypto API now stable: The Node.js implementation of the standard Web Crypto API has been marked as stable. You can access the provided cryptographic primitives through globalThis.crypto.
  • Web assembly support: Node.js 20 enables the experimental WebAssembly System Interface (WASI) API by default without the need to set an experimental flag.

For a detailed overview of Node.js 20 language features, see the Node.js 20 release blog post and the Node.js 20 changelog.

Performance considerations

Node.js 19.3 introduced a change that impacts how non-essential modules are lazy-loaded during the Node.js process startup. In terms of the impact to your Lambda functions, this reduces the work during initialization of each execution environment, then if used, the modules will instead be loaded during the first function invoke. This change remains in Node.js 20.

Builders should continue to measure and test function performance and optimize function code and configuration for any impact. To learn more about how to optimize Node.js performance in Lambda, see Performance optimization in the Lambda Operator Guide, and our blog post Optimizing Node.js dependencies in AWS Lambda.

Migration from earlier Node.js runtimes

Migration from Node.js 16

Lambda occasionally delays deprecation of a Lambda runtime for a limited period beyond the end of support date of the language version that the runtime supports. During this period, Lambda only applies security patches to the runtime OS. Lambda doesn’t apply security patches to programming language runtimes after they reach their end of support date.

In the case of Node.js 16, we have delayed deprecation from the community end of support date on September 11, 2023, to June 12, 2024. This gives customers the opportunity to migrate directly from Node.js 16 to Node.js 20, skipping Node.js 18.

AWS SDK for JavaScript

Up until Node.js 16, Lambda’s Node.js runtimes included the AWS SDK for JavaScript version 2. This has since been superseded by the AWS SDK for JavaScript version 3, which was released in December 2020. Starting with Node.js 18, and continuing with Node.js 20, the Lambda Node.js runtimes have upgraded the version of the AWS SDK for JavaScript included in the runtime from v2 to v3. Customers upgrading from Node.js 16 or earlier runtimes who are using the included AWS SDK for JavaScript v2 should upgrade their code to use the v3 SDK.

For optimal performance, and to have full control over your code dependencies, we recommend bundling and minifying the AWS SDK in your deployment package, rather than using the SDK included in the runtime. For more information, see Optimizing Node.js dependencies in AWS Lambda.

Using the Node.js 20 runtime in AWS Lambda

AWS Management Console

To use the Node.js 20 runtime to develop your Lambda functions, specify a runtime parameter value Node.js 20.x when creating or updating a function. The Node.js 20 runtime version is now available in the Runtime dropdown on the Create function page in the AWS Lambda console:

Select Node.js 20.x when creating a new AWS Lambda function in the AWS Management Console

To update an existing Lambda function to Node.js 20, navigate to the function in the Lambda console, then choose Edit in the Runtime settings panel. The new version of Node.js is available in the Runtime dropdown:

Select Node.js 20.x when updating an existing AWS Lambda function in the AWS Management Console

AWS Lambda – Container Image

Change the Node.js base image version by modifying the FROM statement in your Dockerfile:


FROM public.ecr.aws/lambda/nodejs:20
# Copy function code
COPY lambda_handler.xx ${LAMBDA_TASK_ROOT}

Customers running Node.js 20 Docker images locally, including customers using AWS SAM, will need to upgrade their Docker install to version 20.10.10 or later.

AWS Serverless Application Model (AWS SAM)

In AWS SAM, set the Runtime attribute to node20.x to use this version:


AWSTemplateFormatVersion: "2010-09-09"
Transform: AWS::Serverless-2016-10-31

Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: lambda_function.lambda_handler
      Runtime: nodejs20.x
      CodeUri: my_function/.
      Description: My Node.js Lambda Function

AWS Cloud Development Kit (AWS CDK)

In the AWS CDK, set the runtime attribute to Runtime.NODEJS_20_X to use this version:


import * as cdk from "aws-cdk-lib";
import * as lambda from "aws-cdk-lib/aws-lambda";
import * as path from "path";
import { Construct } from "constructs";

export class CdkStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // The code that defines your stack goes here

    // The Node.js 20 enabled Lambda Function
    const lambdaFunction = new lambda.Function(this, "node20LambdaFunction", {
      runtime: lambda.Runtime.NODEJS_20_X,
      code: lambda.Code.fromAsset(path.join(__dirname, "/../lambda")),
      handler: "index.handler",
    });
  }
}

Conclusion

Lambda now supports Node.js 20. This release uses the Amazon Linux 2023 OS, supports configurable CA certificate loading for faster cold starts, as well as other improvements detailed in this blog post.

You can build and deploy functions using the Node.js 20 runtime using the AWS Management Console, AWS CLI, AWS SDK, AWS SAM, AWS CDK, or your choice of Infrastructure as Code (IaC). You can also use the Node.js 20 container base image if you prefer to build and deploy your functions using container images.

The Node.js 20 runtime empowers developers to build more efficient, powerful, and scalable serverless applications. Try the Node.js runtime in Lambda today and read about the Node.js programming model in the Lambda documentation to learn more about writing functions in Node.js 20.

For more serverless learning resources, visit Serverless Land.

AWS Weekly Roundup – CloudFront security dashboard, EBS snapshots improvements, and more – November 13, 2023

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-cloudfront-security-dashboard-ebs-snapshots-improvements-and-more-november-13-2023/

This week, it was really difficult to choose what to recap here because, as we’re getting closer to AWS re:Invent, service teams are delivering new capabilities at an incredible pace.

Last week’s launches
Here are some of the launches that caught my attention last week:

Amazon Aurora – Aurora MySQL zero-ETL integration with Amazon Redshift is now generally available. Get a walk-through in our AWS News Blog post. Here’s a recap of data integration innovations at AWS. Optimized reads for Aurora PostgreSQL provide up to 8x improved query latency and up to 30 percent cost savings for I/O-intensive applications. Here’s more of a deep dive from the AWS Database Blog.

Amazon EBS – You can now block public sharing of EBS snapshots. Read more about how that works in the launch post.

Amazon Data Lifecycle Manager – Support for pre- and post-script automation of EBS snapshots simplifies application-consistent snapshots. Here’s how to use it with Windows applications.

AWS Health – There’s now improved visibility into planned lifecycle events like end of standard support of a Kubernetes version in Amazon EKS, Amazon RDS certificate rotations, and end of support for other open source software. Here’s how it works.

Amazon CloudFront – Unified security dashboard to enable, monitor, and manage common security protections for your web applications directly from the CloudFront console. Read more at Introducing CloudFront Security Dashboard, a Unified CDN and Security Experience.

Amazon Connect – Reduced outbound telephony pricing across Europe and South America. It’s also easier now to deliver persistent chat experiences for end users.

AWS Lambda – Busy week for the Lambda team! There is now support for Amazon Linux 2023 as both a managed runtime and a container base image. More details in this Compute Blog post. There’s also enhanced auto scaling for Kafka event sources (the Compute Blog has a post with more details) and faster polling scale-up rate for Amazon SQS events when AWS Lambda functions are configured with SQS.

AWS CodeBuild – Now supports AWS Lambda compute to build and test software packages. Read about how it works in this post.

Amazon SQS – Now supports JSON protocol to reduce latency and client-side CPU usage. More in the launch post. There’s also a new integration for Amazon SQS in the Amazon EventBridge Pipes console (the week before that, Amazon Kinesis Data Streams was also integrated into the EventBridge Pipes console).

Amazon SNS –  FIFO topics now support 3,000 messages per second by default.

Amazon EventBridge – There are 22 additional Amazon CloudWatch metrics to help you monitor the performance of your event buses. More info in this post from the AWS Compute Blog.

Amazon OpenSearch ServiceNeural search makes it easier to create and manage semantic search applications.

Amazon Timestream – The UNLOAD statement simplifies exporting time-series data for additional insights.

Amazon Comprehend – New trust and safety features with toxicity detection and prompt safety classification. Read how to apply that to generative AI applications using LangChain.

AWS App Runner – Now available in London, Mumbai, and Paris AWS Regions.

AWS Application Migration Service – Support for AWS App2Container replatforming  of .NET and Java based applications.

Amazon FSx for OpenZFS – Now available in ten additional AWS Regions with support for additional deployment types in seven Regions.

AWS Global Accelerator – There’s now IPv6 support for Network Load Balancer (NLB) endpoints. It was already available for Application Load Balancers (ALBs) and Amazon Elastic Compute Cloud (Amazon EC2) instances.

Amazon GuardDuty – New machine learning (ML) capability enhances threat detection for Amazon EKS.

Other AWS news
Some other news and blog posts that you might have missed:

AWS Local Zones Credit Program – If you have low-latency or data residency requirements for your application, our Local Zones Credit Program can get you started. Fill out our form to receive $500 in AWS credits and apply it to a Local Zones workload.

Amazon CodeWhispererCustomizing coding companions for organizations and optimizing for sustainability.

Sharing what we have learned – Creating a correction of errors document to understand what went wrong and what would be done to prevent it from happening again.

Good tips for containers – Securing API endpoints using Amazon API Gateway and Amazon VPC Lattice.

Another post in this amazing series – Let’s Architect! Tools for developers.

A few highlights from Community.AWS:

Don’t miss the latest AWS open source newsletter by my colleague Ricardo.

Upcoming AWS events
Check your calendars and sign up for these AWS events:

AWS Community Days – Join a community-led conference run by AWS user group leaders in your region: Uruguay (November 14), Central Asia (Kazakhstan, Uzbekistan, Kyrgyzstan, and Mongolia on November 17–18), and Guatemala (November 18).

AWS re:Invent (November 27 – December 1) – Join us to hear the latest from AWS, learn from experts, and connect with the global cloud community. Browse the session catalog and attendee guides and check out the highlights for generative AI. In the AWS re:Invent Builder Hub you can find developer-focused sessions, events, competitions, and content.

Here you can browse all upcoming AWS-led in-person and virtual events and developer-focused events.

And that’s all from me for this week. We’re now taking a break. The next weekly roundup will be after re:Invent!

Danilo

This post is part of our Weekly Roundup series. Check back for a quick roundup of interesting news and announcements from AWS!

Introducing the Amazon Linux 2023 runtime for AWS Lambda

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/introducing-the-amazon-linux-2023-runtime-for-aws-lambda/

This post is written by Rakshith Rao, Senior Solutions Architect.

AWS Lambda now supports Amazon Linux 2023 (AL2023) as a managed runtime and container base image. Named provided.al2023, this runtime provides an OS-only environment to run your Lambda functions.

It is based on the Amazon Linux 2023 minimal container image release and has several improvements over Amazon Linux 2 (AL2), such as a smaller deployment footprint, updated versions of libraries like glibc, and a new package manager.

What are OS-only Lambda runtimes?

Lambda runtimes define the execution environment where your function runs. They provide the OS, language support, and additional settings such as environment variables and certificates.

Lambda provides managed runtimes for Java, Python, Node.js, .NET, and Ruby. However, if you want to develop your Lambda functions in programming languages that are not supported by Lambda’s managed language runtimes, the ‘provided’ runtime family provides an OS-only environment in which you can run code written in any language. This release extends the provided runtime family to support Amazon Linux 2023.

Customers use these OS-only runtimes in three common scenarios. First, they are used with languages that compile to native code, such as Go, Rust, C++, .NET Native AOT and Java GraalVM Native. Since you only upload the compiled binary to Lambda, these languages do not require a dedicated language runtime, they only require an OS environment in which the binary can run.

Second, the OS-only runtimes also enable building third-party language runtimes that you can use off the shelf. For example, you can write Lambda functions in PHP using Bref, or Swift using the Swift AWS Lambda Runtime.

Third, you can use the OS-only runtime to deploy custom runtimes, which you build for a language or language version which Lambda does not provide a managed runtime. For example, Node.js 19 – Lambda only provides managed runtimes for LTS releases, which for Node.js are the even-numbered releases.

New in Amazon Linux 2023 base image for Lambda

Updated packages

AL2023 base image for Lambda is based on the AL2023-minimal container image and includes various package updates and changes compared with provided.al2.

The version of glibc in the AL2023 base image has been upgraded to 2.34, from 2.26 that was bundled in the AL2 base image. Some libraries that developers wanted to use in provided runtimes required newer versions of glibc. With this launch, you can now use an up-to-date version of glibc with your Lambda function.

The AL2 base image for Lambda came pre-installed with Python 2.7. This was needed because Python was a required dependency for some of the packages that were bundled in the base image. The AL2023 base image for Lambda has removed this dependency on Python 2.7 and does not come with any pre-installed language runtime. You are free to choose and install any compatible Python version that you need.

Since the AL2023 base image for Lambda is based on the AL2023-minimal distribution, you also benefit from a significantly smaller deployment footprint. The new image is less than 40MB compared to the AL2-based base image, which is over 100MB in size. You can find the full list of packages available in the AL2023 base image for Lambda in the “minimal container” column of the AL2023 package list documentation.

Package manager

Amazon Linux 2023 uses dnf as the package manager, replacing yum, which was the default package manager in Amazon Linux 2. AL2023 base image for Lambda uses microdnf as the package manager, which is a standalone implementation of dnf based on libdnf and does not require extra dependencies such as Python. microdnf in provided.al2023 is symlinked as dnf. Note that microdnf does not support all options of dnf. For example, you cannot install a remote rpm using the rpm’s URL or install a local rpm file. Instead, you can use the rpm command directly to install such packages.

This example Dockerfile shows how you can install packages using dnf while building a container-based Lambda function:

# Use the Amazon Linux 2023 Lambda base image
FROM public.ecr.aws/lambda/provided.al2023

# Install the required Python version
RUN dnf install -y python3

Runtime support

With the launch of provided.al2023 you can migrate your AL2 custom runtime-based Lambda functions right away. It also sets the foundation of future Lambda managed runtimes. The future releases of managed language runtimes such as Node.js 20, Python 3.12, Java 21, and .NET 8 are based on Amazon Linux 2023 and will use provided.al2023 as the base image.

Changing runtimes and using other compute services

Previously, the provided.al2 base image was built as a custom image that used a selection of packages from AL2. It included packages like curl and yum that were needed to build functions using custom runtime. Also, each managed language runtime used different packages based on the use case.

Since future releases of managed runtimes use provided.al2023 as the base image, they contain the same set of base packages that come with AL2023-minimal. This simplifies migrating your Lambda function from a custom runtime to a managed language runtime. It also makes it easier to switch to other compute services like AWS Fargate or Amazon Elastic Container Services (ECS) to run your application.

Upgrading from AL1-based runtimes

For more information on Lambda runtime deprecation, see Lambda runtimes.

AL1 end of support is scheduled for December 31, 2023. The AL1-based runtimes go1.x, java8 and provided will be deprecated from this date. You should migrate your Go based Lambda functions to the provided runtime family, such as provided.al2 or provided.al2023. Using a provided runtime offers several benefits over the go1.x runtime. First, you can run your Lambda functions on AWS Graviton2 processors that offer up to 34% better price-performance compared to functions running on x86_64 processors. Second, it offers a smaller deployment package and faster function invoke path. And third, it aligns Go with other languages that also compile to native code and run on the provided runtime family.

The deprecation of the Amazon Linux 1 (AL1) base image for Lambda is also scheduled for December 31, 2023. With provided.al2023 now generally available, you should start planning the migration of your go1.x and AL1 based Lambda functions to provided.al2023.

Using the AL2023 base image for Lambda

To build Lambda functions using a custom runtime, follow these steps using the provided.al2023 runtime.

AWS Management Console

Navigate to the Create function page in the Lambda console. To use the AL2023 custom runtime, select Provide your own bootstrap on Amazon Linux 2023 as the Runtime value:

Runtime value

AWS Serverless Application Model (AWS SAM) template

If you use the AWS SAM template to build and deploy your Lambda function, use the provided.al2023 as the value of the Runtime:

Resources:
  HelloWorldFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: hello-world/
      Handler: my.bootstrap.file
      Runtime: provided.al2023

Building Lambda functions that compile natively

Lambda’s custom runtime simplifies the experience to build functions in languages that compile to native code, broadening the range of languages you can use. Lambda provides the Runtime API, an HTTP API that custom runtimes can use to interact with the Lambda service. Implementations of this API, called Runtime Interface Client (RIC), allow your function to receive invocation events from Lambda, send the response back to Lambda, and report errors to the Lambda service. RICs are available as language-specific libraries for several popular programming langauges such as Go, Rust, Python, and Java.

For example, you can build functions using Go as shown in the Building with Go section of the Lambda developer documentation. Note that the name of the executable file of your function should always be bootstrap in provided.al2023 when using the zip deployment model. To use AL2023 in this example, use provided.al2023 as the runtime for your Lambda function.

If you are using CLI set the --runtime option to provided.al2023:

aws lambda create-function --function-name myFunction \
--runtime provided.al2023 --handler bootstrap \
--role arn:aws:iam::111122223333:role/service-role/my-lambda-role \
--zip-file fileb://myFunction.zip

If you are using AWS Serverless Application Model, use provided.al2023 as the value of the Runtime in your AWS SAM template file:

AWSTemplateFormatVersion: '2010-09-09'
Transform: 'AWS::Serverless-2016-10-31'
Resources:
  HelloWorldFunction:
    Type: AWS::Serverless::Function
    Metadata:
      BuildMethod: go1.x
    Properties:
      CodeUri: hello-world/ # folder where your main program resides
      Handler: bootstrap
      Runtime: provided.al2023
      Architectures: [arm64]

If you run your function as a container image as shown in the Deploy container image example, use this Dockerfile. You can use any name for the executable file of your function when using container images. You need to specify the name of the executable as the ENTRYPOINT in your Dockerfile:

FROM golang:1.20 as build
WORKDIR /helloworld

# Copy dependencies list
COPY go.mod go.sum ./

# Build with optional lambda.norpc tag
COPY main.go .
RUN go build -tags lambda.norpc -o main main.go

# Copy artifacts to a clean image
FROM public.ecr.aws/lambda/provided:al2023
COPY --from=build /helloworld/main ./main
ENTRYPOINT [ "./main" ]

Conclusion

With this launch, you can now build your Lambda functions using Amazon Linux 2023 as the custom runtime or use it as the base image to run your container-based Lambda functions. You benefit from the updated versions of libraries such as glibc, new package manager, and smaller deployment size than Amazon Linux 2 based runtimes. Lambda also uses Amazon Linux 2023-minimal as the basis for future Lambda runtime releases.

For more serverless learning resources, visit Serverless Land.

Scaling improvements when processing Apache Kafka with AWS Lambda

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/scaling-improvements-when-processing-apache-kafka-with-aws-lambda/

AWS Lambda is improving the automatic scaling behavior when processing data from Apache Kafka event-sources. Lambda is increasing the default number of initial consumers, improving how quickly consumers scale up, and helping to ensure that consumers don’t scale down too quickly. There is no additional action that you must take, and there is no additional cost.

Running Kafka on AWS

Apache Kafka is a popular open-source platform for building real-time streaming data pipelines and applications. You can deploy and manage your own Kafka solution on-premises or in the cloud on Amazon EC2.

Amazon Managed Streaming for Apache Kafka (MSK) is a fully managed service that makes it easier to build and run applications that use Kafka to process streaming data. MSK Serverless is a cluster type for Amazon MSK that allows you to run Kafka without having to manage and scale cluster capacity. It automatically provisions and scales capacity while managing the partitions in your topic, so you can stream data without thinking about right-sizing or scaling clusters. MSK Serverless offers a throughput-based pricing model, so you pay only for what you use. For more information, see Using Kafka to build your streaming application.

Using Lambda to consume records from Kafka

Processing streaming data can be complex in traditional, server-based architectures, especially if you must react in real-time. Many organizations spend significant time and cost managing and scaling their streaming platforms. In order to react fast, they must provision for peak capacity, which adds complexity.

Lambda and serverless architectures remove the undifferentiated heavy lifting when processing Kafka streams. You don’t have to manage infrastructure, can reduce operational overhead, lower costs, and scale on-demand. This helps you focus more on building streaming applications. You can write Lambda functions in a number of programming languages, which provide flexibility when processing streaming data.

Lambda event source mapping

Lambda can integrate natively with your Kafka environments as a consumer to process stream data as soon as it’s generated.

To consume streaming data from Kafka, you configure an event source mapping (ESM) on your Lambda functions. This is a resource managed by the Lambda service, which is separate from your function. It continually polls records from the topics in the Kafka cluster. The ESM optionally filters records and batches them into a payload. Then, it calls the Lambda Invoke API to deliver the payload to your Lambda function synchronously for processing.

As Lambda manages the pollers, you don’t need to manage a fleet of consumers across multiple teams. Each team can create and configure their own ESM with Lambda handling the polling.

AWS Lambda event source mapping

AWS Lambda event source mapping

For more information on using Lambda to process Kafka streams, see the learning guide.

Scaling and throughput

Kafka uses partitions to increase throughput and spread the load of records to all brokers in a cluster.

The Lambda event source mapping resource includes pollers and processors. Pollers have consumers that read records from Kafka partitions. The poller assigners send them to processors which batch the records and invoke your function.

When you create a Kafka event source mapping, Lambda allocates consumers to process all partitions in the Kafka topic. Previously, Lambda allocated a minimum of one processor for a consumer.

Lambda previous initial scaling

Lambda previous initial scaling

With these scaling improvements, Lambda allocates multiple processors to improve processing. This reduces the possibility of a single invoke slowing down the entire processing stream.

Lambda updated initial scaling

Lambda updated initial scaling

Each consumer sends records to multiple processors running in parallel to handle increased workloads. Records in each partition are only assigned to a single processor to maintain order.

Lambda automatically scales up or down the number of consumers and processors based on workload. Lambda samples the consumer offset lag of all the partitions in the topic every minute. If the lag is increasing, this means Lambda can’t keep up with processing the records from the partition.

The scaling algorithm accounts for the current offset lag, and also the rate of new messages added to the topic. Lambda can reach the maximum number of consumers within three minutes to lower the offset lag as quickly as possible. Lambda is also reducing the scale down behavior to ensure records are processed more quickly and latency is reduced, particularly for bursty workloads.

Total processors for all pollers can only scale up to the total number of partitions in the topic.

After successful invokes, the poller periodically commits offsets to the respective brokers.

Lambda further scaling

Lambda further scaling

You can monitor the throughput of your Kafka topic using consumer metrics consumer_lag and consumer_offset.

To check how many function invocations occur in parallel, you can also monitor the concurrency metrics for your function. The concurrency is approximately equal to the total number of processors across all pollers, depending on processor activity. For example, if three pollers have five processors running for a given ESM, the function concurrency would be approximately 15 (5 + 5 + 5).

Seeing the improved scaling in action

There are a number of Serverless Patterns that you can use to process Kafka streams using Lambda. To set up Amazon MSK Serverless, follow the instructions in the GitHub repo:

  1. Create an example Amazon MSK Serverless topic with 1000 partitions.
  2. ./kafka-topics.sh --create --bootstrap-server "{bootstrap-server}" --command-config client.properties --replication-factor 3 --partitions 1000 --topic msk-1000p
  3. Add records to the topic using a UUID as a key to distribute records evenly across partitions. This example adds 13 million records.
  4. for x in {1..13000000}; do echo $(uuidgen -r),message_$x; done | ./kafka-console-producer.sh --broker-list "{bootstrap-server}" --topic msk-1000p --producer.config client.properties --property parse.key=true --property key.separator=, --producer-property acks=all
  5. Create a Python function based on this pattern, which logs the processed records.
  6. Amend the function code to insert a delay of 0.1 seconds to simulate record processing.
  7. import json
    import base64
    import time
    
    def lambda_handler(event, context):
        # Define a variable to keep track of the number of the message in the batch of messages
        i=1
        # Looping through the map for each key (combination of topic and partition)
        for record in event['records']:
            for messages in event['records'][record]:
                print("********************")
                print("Record number: " + str(i))
                print("Topic: " + str(messages['topic']))
                print("Partition: " + str(messages['partition']))
                print("Offset: " + str(messages['offset']))
                print("Timestamp: " + str(messages['timestamp']))
                print("TimestampType: " + str(messages['timestampType']))
                if None is not messages.get('key'):
                    b64decodedKey=base64.b64decode(messages['key'])
                    decodedKey=b64decodedKey.decode('ascii')
                else:
                    decodedKey="null"
                if None is not messages.get('value'):
                    b64decodedValue=base64.b64decode(messages['value'])
                    decodedValue=b64decodedValue.decode('ascii')
                else:
                    decodedValue="null"
                print("Key = " + str(decodedKey))
                print("Value = " + str(decodedValue))
                i=i+1
                time.sleep(0.1)
        return {
            'statusCode': 200,
        }
    
  8. Configure the ESM to point to the previously created cluster and topic.
  9. Use the default batch size of 100. Set the StartingPosition to TRIM_HORIZON to process from the beginning of the stream.
  10. Deploy the function, which also adds and configures the ESM.
  11. View the Amazon CloudWatch ConcurrentExecutions and OffsetLag metrics to view the processing.

With the scaling improvements, once the ESM is configured, the ESM and function scale up to handle the number of partitions.

Lambda automatic scaling improvement graph

Lambda automatic scaling improvement graph

Increasing data processing throughput

It is important that your function can keep pace with the rate of traffic. A growing offset lag means that the function processing cannot keep up. If the age is high in relation to the stream’s retention period, you can lose data as records expire from the stream.

This value should generally not exceed 50% of the stream’s retention period. When the value reaches 100% of the stream retention period, data is lost. One temporary solution is to increase the retention time of the stream. This gives you more time to resolve the issue before losing data.

There are several ways to improve processing throughput.

  1. Avoid processing unnecessary records by using content filtering to control which records Lambda sends to your function. This helps reduce traffic to your function, simplifies code, and reduces overall cost.
  2. Lambda allocates processors across all pollers based on the number of partitions up to a maximum of one concurrent Lambda function per partition. You can increase the number of processing Lambda functions by increasing the number of partitions.
  3. For compute intensive functions, you can increase the memory allocated to your function, which also increases the amount of virtual CPU available. This can help reduce the duration of a processing function.
  4. Lambda polls Kafka with a configurable batch size of records. You can increase the batch size to process more records in a single invocation. This can improve processing time and reduce costs, particularly if your function has an increased init time. A larger batch size increases the latency to process the first record in the batch, but potentially decreases the latency to process the last record in the batch. There is a tradeoff between cost and latency when optimizing a partition’s capacity and the decision depends on the needs of your workload.
  5. Ensure that your producers evenly distribute records across partitions using an effective partition key strategy. A workload is unbalanced when a single key dominates other keys, creating a hot partition, which impacts throughput.

See Increasing data processing throughput for some additional guidance.

Conclusion

Today, AWS Lambda is improving the automatic scaling behavior when processing data from Apache Kafka event-sources. Lambda is increasing the default number of initial consumers, improving how quickly they scale up, and ensuring they don’t scale down too quickly. There is no additional action that you must take, and there is no additional cost.

You can explore the scaling improvements with your existing workloads or deploy an Amazon MSK cluster and try one of the patterns to measure processing time.

To explore using Lambda to process Kafka streams, see the learning guide.

For more serverless learning resources, visit Serverless Land.

AWS CodeBuild adds support for AWS Lambda compute mode

Post Syndicated from Ryan Bachman original https://aws.amazon.com/blogs/devops/aws-codebuild-adds-support-for-aws-lambda-compute-mode/

AWS CodeBuild recently announced that it supports running projects on AWS Lambda. AWS CodeBuild is a fully managed continuous integration (CI) service that allows you to build and test your code without having to manage build servers. This new compute mode enables you to execute your CI process on the same AWS Lambda base images supported by the AWS Lambda service, providing consistency, performance, and cost benefits. If you are already building and deploying microservices or building code packages, its likely these light-weight and smaller code bases will benefit from the efficiencies gained by using Lambda compute for CI. In this post, I will explain the benefits of this new compute mode as well as provide an example CI workflow to demonstrate these benefits.

Consistency Benefits

One of the key benefits of running AWS CodeBuild projects using the AWS Lambda compute mode is consistency. By building and testing your serverless applications on the same base images AWS Lambda provides, you can ensure that your application works as intended before moving to production. This eliminates the possibility of compatibility issues that may arise when deploying to a different environment. Moreover, because the AWS Lambda runtime provides a standardized environment across all regions, you can build your serverless applications in Lambda on CodeBuild and have the confidence to deploy to any region where your customers are located.

Performance and Cost Benefits

Another significant advantage of running AWS CodeBuild projects on the AWS Lambda compute mode is performance. When you run your project with EC2 compute mode, there may be increased start-up latency that impacts the total CI process.  However, because AWS Lambda is designed to process events in near real-time, executing jobs on the AWS Lambda compute mode can result in significant time savings. As a result, by switching to this new compute mode, you save time within your CI process and increase your frequency of integration.

Related to performance, our customers are looking for opportunities to optimize cost and deliver business value at the lowest price point. Customers can see meaningful cost optimization when using Lambda. Lambda-based compute offers better price/performance than CodeBuild’s current Amazon Elastic Compute Cloud (Amazon EC2) based compute — with per-second billing, 6,000 seconds of free tier, and 10GB of ephemeral storage. CodeBuild supports both x86 and Arm using AWS Graviton Processors for the new compute mode. By using the Graviton option, customers can further achieve lower costs with better performance.

Considerations

There are a few things to consider before moving your projects to AWS Lambda compute mode. Because this mode utilizes the same AWS Lambda service used by customers, there is a maximum project duration of 15 minutes and custom build timeouts are not supported. Additionally, local caching, batch builds, and docker image builds are not supported with Lambda compute mode. For a full list of limitations please refer to the AWS CodeBuild User Guide.

Walk-Through

In this walk-through I used a simple React app to showcase the performance benefits of building using CodeBuild’s Lambda compute option. To get started, I created an AWS CodeCommit repository and used npm to create a sample application that I then pushed to my CodeCommit repo.

CodeCommit repository for react code

Figure 1: CodeCommit Repository

Once my sample application is stored in my code repository, I then navigated to the AWS CodeBuild console. Using the console, I created two different CodeBuild projects. I selected my CodeCommit repository for the projects’ source and selected EC2 compute for one and Lambda for the other.

CodeBuild compute mode options

Figure 2: CodeBuild compute mode options

The compute mode is independent from instructions included in the project’s buildspec.yaml, so in order to test and compare the CI job on the two different compute modes, I created a buildspec and pushed that to the CodeCommit repository where I stored my simple React app in the previous step.

buildspec.yaml
version: 0.2
phases:
  build:
    commands:
      - npm install -g yarn
      - yarn
      - npm run build
      - npm run test -- --coverage --watchAll=false --testResultsProcessor="jest-junit"
artifacts:
  name: "build-output"
  base-directory: 'react-app'
  files:
    - "**/*"
reports:
  test-report:
    base-directory: 'react-app'
    files:
      - 'junit.xml'
    file-format: 'JUNITXML'
  coverage-report:
    base-directory: 'react-app'
    files:
      - 'coverage/clover.xml'
    file-format: 'CLOVERXML'

Once my projects were ready, I manually started each project from the console. Below are screenshots of the details of the execution times for each.

EC2 compute mode details

Figure 3: EC2 compute mode details

Lambda Compute mode details

Figure 4: Lambda compute mode details

If you compare the provisioning and build times for this example you will see that the Lambda CodeBuild project executed significantly faster. In this example, the Lambda option was over 50% faster than the same build executed on EC2. If you are frequently building similar microservices that can take advantage of these efficiency gains, it is likely that CodeBuild on Lambda can save you both time and cost in your CI/CD pipelines.

Cleanup

If you followed along the walkthrough, you will want to remove the resources you created. First delete the two projects created in CodeBuild. This can be done by navigating to CodeBuild in the console, selecting each project and then hitting the “Delete build project” button. Next navigate to CodeCommit and delete the repository where you stored the code.

Conclusion

In conclusion, AWS CodeBuild using Lambda compute-mode provides significant benefits in terms of consistency, performance and cost. By building and testing your applications on the same runtime images executed by the AWS Lambda service, you can take advantage of benefits provided by the AWS Lambda service while reducing CI time and costs. We are excited to see how this new feature will help you build and deploy your applications faster and more efficiently. Get started today and take advantage of the benefits of running AWS CodeBuild projects on the AWS Lambda compute mode.

About the Author:
Ryan Bachman profile image

Ryan Bachman

Ryan Bachman is a Sr. Specialist Solutions Architect with Amazon Web Services (AWS) with a focus on DevOps. Ryan is passionate about helping customers adopt process and services that increase their efficiency developing applications for the cloud. He has over 20 years professional experience as a technologist, including roles in development, network architecture, and technical product management.

Introducing faster polling scale-up for AWS Lambda functions configured with Amazon SQS

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/introducing-faster-polling-scale-up-for-aws-lambda-functions-configured-with-amazon-sqs/

This post was written by Anton Aleksandrov, Principal Solutions Architect, and Tarun Rai Madan, Senior Product Manager.

Today, AWS is announcing that AWS Lambda supports up to five times faster polling scale-up rate for spiky Lambda workloads configured with Amazon Simple Queue Service (Amazon SQS) as an event source.

This feature enables customers building event-driven applications using Lambda and SQS to achieve more responsive scaling during a sudden burst of messages in their SQS queues, and reduces the need to duplicate Lambda functions or SQS queues to achieve faster message processing.

Overview

Customers building modern event-driven and messaging applications with AWS Lambda use the Amazon SQS as a fundamental building block for creating decoupled architectures. Amazon SQS is a fully managed message queueing service for microservices, distributed systems, and serverless applications. When a Lambda function subscribes to an SQS queue as an event source, Lambda polls the queue, retrieves the messages, and sends retrieved messages in batches to the function handler for processing. To consume messages efficiently, Lambda detects the increase in queue depth, and increases the number of poller processes to process the queued messages.

Up until today, the Lambda was adding up to 60 concurrent executions per minute for Lambda functions subscribed to SQS queues, scaling up to a maximum of 1,250 concurrent executions in approximately 20 minutes. However, customers tell us that some of the modern event-driven applications they build using Lambda and SQS are sensitive to sudden spikes in messages, which may cause noticeable delay in processing of messages for end users. In order to harness the power of Lambda for applications that experience a burst of messages in SQS queues, these customers needed Lambda message polling to scale up faster.

With today’s announcement, Lambda functions that subscribe to an SQS queue can scale up to five times faster for queues that see a spike in message backlog, adding up to 300 concurrent executions per minute, and scaling up to a maximum of 1,250 concurrent executions. This scaling improvement helps to use the simplicity of Lambda and SQS integration to build event-driven applications that scale faster during a surge of incoming messages, particularly for real-time systems. It also offers customers the benefit of faster processing during spikes of messages in SQS queues, while continuing to offer the flexibility to limit the maximum concurrent Lambda invocations per SQS event source.

Controlling the maximum concurrent Lambda invocations by SQS

The new improved scaling rates are automatically applied to all AWS accounts using Lambda and SQS as an event source. There is no explicit action that you must take, and there’s no additional cost. This scaling improvement helps customers to build more performant Lambda applications where they need faster SQS polling scale-up. To prevent potentially overloading the downstream dependencies, Lambda provides customers the control to set the maximum number of concurrent executions at a function level with reserved concurrency, and event source level with maximum concurrency.

The following diagram illustrates settings that you can use to control the flow rate of an SQS event-source. You use reserved concurrency to control function-level scaling, and maximum concurrency to control event source scaling.

Control the flow rate of an SQS event-source

Reserved concurrency is the maximum concurrency that you want to allocate to a function. When a function has reserved concurrency allocated, no other functions can use that concurrency.

AWS recommends using reserved concurrency when you want to ensure a function has enough concurrency to scale up. When an SQS event source is attempting to scale up concurrent Lambda invocations, but the function has already reached the threshold defined by the reserved concurrency, the Lambda service throttles further function invocations.

This may result in SQS event source attempting to scale down, reducing the number of concurrently processed messages. Depending on the queue configuration, the throttled messages are returned to the queue for retrying, expire based on the retention policy, or sent to a dead-letter queue (DLQ) or on-failure destination.

The maximum concurrency setting allows you to control concurrency at the event source level. It allows you to define the maximum number of concurrent invocations the event source attempts to send to the Lambda function. For scenarios where a single function has multiple SQS event sources configured, you can define maximum concurrency for each event source separately, providing more granular control. When trying to add rate control to SQS event sources, AWS recommends you start evaluating maximum concurrency control first, as it provides greater flexibility.

Reserved concurrency and maximum concurrency are complementary capabilities, and can be used together. Maximum concurrency can help to prevent overwhelming downstream systems and throttled invocations. Reserved concurrency helps to ensure available concurrency for the function.

Example scenario

Consider your business must process large volumes of documents from storage. Once every few hours, your business partners upload large volumes of documents to S3 buckets in your account.

For resiliency, you’ve designed your application to send a message to an SQS queue for each of the uploaded documents, so you can efficiently process them without accidentally skipping any. The documents are processed using a Lambda function, which takes around two seconds to process a single document.

Processing these documents is a CPU-intensive operation, so you decide to process a single document per invocation. You want to use the power of Lambda to fan out the parallel processing to as many concurrent execution environments as possible. You want the Lambda function to scale up rapidly to process those documents in parallel as fast as possible, and scale-down to zero once all documents are processed to save costs.

When a business partner uploads 200,000 documents, 200,000 messages are sent to the SQS queue. The Lambda function is configured with an SQS event source, and it starts consuming the messages from the queue.

This diagram shows the results of running the test scenario before the SQS event source scaling improvements. As expected, you can see that concurrent executions grow by 60 per minute. It takes approximately 16 minutes to scale up to 900 concurrent executions gradually and process all the messages in the queue.

Results of running the test scenario before the SQS event source scaling improvements

The following diagram shows the results of running the same test scenario after the SQS event source scaling improvements. The timeframe used for both charts is the same, but the performance on the second chart is better. Concurrent executions grow by 300 per minute. It only takes 4 minutes to scale up to 1,250 concurrent executions, and all the messages in the queue are processed in approximately 8 minutes.

The results of running the same test scenario after the SQS event source scaling improvements

Deploying this example

Use the example project to replicate this performance test in your own AWS account. Follow the instructions in README.md for provisioning the sample project in your AWS accounts using the AWS Cloud Development Kit (CDK).

This example project is configured to demonstrate a large-scale workload processing 200,000 messages. Running this sample project in your account may incur charges. See AWS Lambda pricing and Amazon SQS pricing.

Once deployed, use the application under the “sqs-cannon” directory to send 200,000 messages to the SQS queue (or reconfigure to any other number). It takes several minutes to populate the SQS queue with messages. After all messages are sent, enable the SQS event source, as described in the README.md, and monitor the charts in the provisioned CloudWatch dashboard.

The default concurrency quota for new AWS accounts is 1000. If you haven’t requested an increase in this quota, the number of concurrent executions is capped at this number. Use Service Quotas or contact your account team to request a concurrency increase.

Security best practices

Always use the least privileged permissions when granting your Lambda functions access to SQS queues. This reduces potential attack surface by ensuring that only specific functions have permissions to perform specific actions on specific queues. For example, in case your function only polls from the queue, grant it permission to read messages, but not to send new messages. A function execution role defines which actions your function is allowed to perform on other resources. A queue access policy defines the principals that can access this queue, and the actions that are allowed.

Use server-side encryption (SSE) to store sensitive data in encrypted SQS queues. With SSE, your messages are always stored in encrypted form, and SQS only decrypts them for sending to an authorized consumer. SSE protects the contents of messages in queues using SQS-managed encryption keys (SSE-SQS) or keys managed in the AWS Key Management Service (SSE-KMS).

Conclusion

The improved Lambda SQS event source polling scale-up capability enables up to five times faster scale-up performance for spiky event-driven workloads using SQS queues, at no additional cost. This improvement offers customers the benefit of faster processing during spikes of messages in SQS queues, while continuing to offer the flexibility to limit the maximum concurrent invokes by SQS as an event source.

For more serverless learning resources, visit Serverless Land.

Approaches for migrating users to Amazon Cognito user pools

Post Syndicated from Edward Sun original https://aws.amazon.com/blogs/security/approaches-for-migrating-users-to-amazon-cognito-user-pools/

Update: An earlier version of this post was published on September 14, 2017, on the Front-End Web and Mobile Blog.


Amazon Cognito user pools offer a fully managed OpenID Connect (OIDC) identity provider so you can quickly add authentication and control access to your mobile app or web application. User pools scale to millions of users and add layers of additional features for security, identity federation, app integration, and customization of the user experience. Amazon Cognito is available in regions around the globe, processing over 100 billion authentications each month. You can take advantage of security features when using user pools in Cognito, such as email and phone number verification, multi-factor authentication, and advanced security features, such as compromised credentials detection, and adaptive authentications.

Many customers ask about the best way to migrate their existing users to Amazon Cognito user pools. In this blog post, we describe several different recommended approaches and provide step-by-step instructions on how to implement them.

Key considerations

The main consideration when migrating users across identity providers is maintaining a consistent end-user experience. Ideally, users can continue to use their existing passwords so that their experience is seamless. However, security best practices dictate that passwords should never be stored directly as cleartext in a user store. Instead, passwords are used to compute cryptographic hashes and verifiers that can later be used to verify submitted passwords. This means that you cannot securely export passwords in cleartext form from an existing user store and import them into a Cognito user pool. You might ask your users to choose a new password during the migration. Or, if you want to retain the existing passwords, you need to retain access to the existing hashes and verifiers, at least during the migration period.

A secondary consideration is the migration timeline. For example, do you need a faster migration timeline because your current identity store’s license is expiring? Or do you prefer a slow and steady migration because you are modernizing your current application, and it takes time to connect your existing systems to the new identity provider?

The following two methods define our recommended approaches for migrating existing users into a user pool:

  • Bulk user import – Export your existing users into a comma-separated (.csv) file, and then upload this .csv file to import users into a user pool. Your desired user attributes (except passwords) can be included and mapped to attributes in the target user pool. This approach requires users to reset their passwords when they sign in with Cognito. You can choose to migrate your existing user store entirely in a single import job or split users into multiple jobs for parallel or incremental processing.
  • Just-in-time user migration – Migrate users just in time into a Cognito user pool as they sign in to your mobile or web app. This approach allows users to retain their current passwords, because the migration process captures and verifies the password during the sign-in process, seamlessly migrating them to the Cognito user pool.

In the following sections, we describe the bulk user import and just-in-time user migration methods in more detail and then walk through the steps of each approach.

Bulk user import

You perform bulk import of users into an Amazon Cognito user pool by uploading a .csv file that contains user profile data, including usernames, email addresses, phone numbers, and other attributes. You can download a template .csv file for your user pool from Cognito, with a user schema structured in the template header.

Following is an example of performing bulk user import.

To create an import job

  1. Open the Cognito user pool console and select the target user pool for migration.
  2. On the Users tab, navigate to the Import users section, and choose Create import job.
  3. Figure 1: Create import job

    Figure 1: Create import job

  4. In the Create import job dialog box, download the template.csv file for user import.
  5. Export your existing user data from your existing user directory or store your data into the .csv file
  6. Match the user attribute types with column headings in the template. Each user must have an email address or a phone number that is marked as verified in the .csv file, in order to receive the password reset confirmation code.
  7. Figure 2: Configure import job

    Figure 2: Configure import job

  8. Go back to the Create import job dialog box (as shown in Figure 2) and do the following:
    1. Enter a Job name.
    2. Choose to Create a new IAM role or Use an existing IAM role. This role grants Amazon Cognito permission to write to Amazon CloudWatch Logs in your account, so that Cognito can provide logs for successful imports and errors for skipped or failed transactions.
    3. Upload the .csv file that you have prepared, and choose Create and start job.

Depending on the size of the .csv file, the job can run for minutes or hours, and you can follow the status from that same page in the Amazon Cognito console.

Figure 3: Check import job status

Figure 3: Check import job status

Cognito runs through the import job and imports users with a RESET_REQUIRED state. When users attempt to sign in, Cognito will return PasswordResetRequiredException from the sign-in API, and the app should direct the user into the ForgotPassword flow.

Figure 4: View imported user

Figure 4: View imported user

The bulk import approach can also be used continuously to incrementally import users. You can set up an Extract-Transform-Load (ETL) batch job process to extract incremental changes to your existing user directories, such as the new sign-ups on the existing systems before you switch over to a Cognito user pool. Your batch job will transform the changes into a .csv file to map user attribute schemas, and load the .csv file as a Cognito import job through the CreateUserImportJob CLI or SDK operation. Then start the import job through the StartUserImportJob CLI or SDK operation. For more information, see Importing users into user pools in the Amazon Cognito Developer Guide.

Just-in-time user migration

The just-in-time (JIT) user migration method involves first attempting to sign in the user through the Amazon Cognito user pool. Then, if the user doesn’t exist in the Cognito user pool, Cognito calls your Migrate User Lambda trigger and sends the username and password to the Lambda trigger to sign the user in through the existing user store. If successful, the Migrate User Lambda trigger will also fetch user attributes and return them to Cognito. Then Cognito silently creates the user in the user pool with user attributes, as well as salts and password verifiers from the user-provided password. With the Migrate User Lambda trigger, your client app can start to use the Cognito user pool to sign in users who have already been migrated, and continue migrating users who are signing in for the first time towards the user pool. This just-in-time migration approach helps to create a seamless authentication experience for your users.

Cognito, by default, uses the USER_SRP_AUTH authentication flow with the Secure Remote Password (SRP) protocol. This flow doesn’t involve sending the password across the network, but rather allows the client to exchange a cryptographic proof with the Cognito service to prove the client’s knowledge of the password. For JIT user migration, Cognito needs to verify the username and password against the existing user store. Therefore, you need to enable a different Cognito authentication flow. You can choose to use either the USER_PASSWORD_AUTH flow for client-side authentication or the ADMIN_USER_PASSWORD_AUTH flow for server-side authentication. This will allow the password to be sent to Cognito over an encrypted TLS connection, and allow Cognito to pass the information to the Lambda function to perform user authentication against the original user store.

This JIT approach might not be compatible with existing identity providers that have multi-factor authentication (MFA) enabled, because the Lambda function cannot support multiple rounds of challenges. If the existing identity provider requires MFA, you might consider the alternative JIT migration approach discussed later in this blog post.

Figure 5 illustrates the steps for the JIT sign-in flow. The mobile or web app first tries to sign in the user in the user pool. If the user isn’t already in the user pool, Cognito handles user authentication and invokes the Migrate User Lambda trigger to migrate the user. This flow keeps the logic in the app simple and allows the app to use the Amazon Cognito SDK to sign in users in the standard way. The migration logic takes place in the Lambda function in the backend.

Figure 5: JIT migration user authentication flow

Figure 5: JIT migration user authentication flow

The flow in Figure 5 starts in the mobile or web app, which attempts to sign in the user by using the AWS SDK. If the user doesn’t exist in the user pool, the migration attempt starts. Cognito calls the Migrate User Lambda trigger with triggerSource set to UserMigration_Authentication, and passes the user’s username and password in the request in order to attempt to migrate the user.

This approach also works in the forgot password flow shown in Figure 6, where the user has forgotten their password and hasn’t been migrated yet. In this case, once the user makes a “Forgot Password” request, your mobile or web app will send a forgot password request to Cognito. Cognito invokes your Migrate User Lambda trigger with triggerSource set to UserMigration_ForgotPassword, and passes the username in the request in order to attempt user lookup, migrate the user profile, and facilitate the password reset process.

Figure 6: JIT migration forgot password flow

Figure 6: JIT migration forgot password flow

Just-in-time user migration sample code

In this section, we show sample source codes for a Migrate User Lambda trigger overall structure. We will fill in the commented sections with additional code, shown later in the section. When you set up your own Lambda function, configure a Lambda execution role to grant permissions for CloudWatch logs.

const handler = async (event) => {
    if (event.triggerSource == "UserMigration_Authentication") {
        //***********************************************************************
        // Attempt to sign in the user or verify the password with existing identity store
        // (shown in the Section A – Migrate User of this post)
        //***********************************************************************
    }
    else if (event.triggerSource == "UserMigration_ForgotPassword") {
       //***********************************************************************
       // Attempt to look up the user in your existing identity store
       // (shown in the section B – Forget Password of this post)
       //***********************************************************************
    }
    return event;
};
export { handler };

In the migration flow, the Lambda trigger will sign in the user and verify the user’s password in the existing user store. That may involve a sign-in attempt against your existing user store or a check of the password against a stored hash. You need to customize this step based on your existing setup. You can also create a function to fetch user attributes that you want to migrate. If your existing user store conforms to the OIDC specification, you can parse the ID Token claims to retrieve the user’s attributes. The following example shows how to set the username and attributes for the migrated user.

// Section A – Migrate User
if (event.triggerSource == "UserMigration_Authentication") {
// Attempt to sign in the user or verify the password with the existing user store.
// Add an authenticateUser() functionbased on your existing user store setup. 
    const user = await authenticateUser(event.userName, event.request.password);
    if (user) {
        // Migrating user attributes from the source user store. You can migrate additional attributes as needed.
        event.response.userAttributes = {
            // Setting username and email address
            username: event.userName,
            email: user.emailAddress,
            email_verified: "true",
        };
        // Setting user status to CONFIRMED to autoconfirm users so they can sign in to the user pool
        event.response.finalUserStatus = "CONFIRMED";
        // Setting messageAction to SUPPRESS to decline to send the welcome message that Cognito usually sends to new users
        event.response.messageAction = "SUPPRESS";
        }
    }

The user is now migrated from the existing user store to the user pool, as well as the user’s attributes. Users will also be redirected to your application with the authorization code or JSON Web Tokens, depending on the OAuth 2.0 grant types you configured in the user pool.

Let’s look at the forgot password flow. Your Lambda function calls the existing user store and migrates other attributes in the user’s profile first, and then Lambda sets user attributes in the response to the Cognito user pool. Cognito initiates the ForgotPassword flow and sends a confirmation code to the user to confirm the password reset process. The user needs to have a verified email address or phone number migrated from the existing user store to receive the forgot password confirmation code. The following sample code demonstrates how to complete the ForgotPassword flow.

// Section B – Forgot Password
else if (event.triggerSource == "UserMigration_ForgotPassword") {
        // Look up the user in your existing user store service.  
		// Add a lookupUser() function based on your existing user store setup. 
        const lookupResult = await lookupUser(event.userName);
        if (lookupResult) {
            // Setting user attributes from the source user store
            event.response.userAttributes = {
                username: event.userName,
                // Required to set verified communication to receive password recovery code
                email: lookupResult.emailAddress,
                email_verified: "true",
            };
            event.response.finalUserStatus = "RESET_REQUIRED";
            event.response.messageAction = "SUPPRESS";
        }
    }

Just-in-time user migration – alternative approach

Using the Migrate User Lambda trigger, we showed the JIT migration approach where the app switches to use the Cognito user pool at the beginning of the migration period, to interface with the user for signing in and migrating them from the existing user store. An alternative JIT approach is to maintain the existing systems and user store, but to silently create each user in the Cognito user pool in a backend process as users sign in, then switch over to use Cognito after enough users have been migrated.

Figure 7: JIT migration alternative approach with backend process

Figure 7: JIT migration alternative approach with backend process

Figure 7 shows this alternative approach in depth. When an end user signs in successfully in your mobile or web app, the backend migration process is initiated. This backend process first calls the Cognito admin API operation, AdminCreateUser, to create users and map user attributes in the destination user pool. The user will be created with a temporary password and be placed in FORCE_CHANGE_PASSWORD status. If you capture the user password during the sign-in process, you can also migrate the password by setting it permanently for the newly created user in the Cognito user pool using the AdminSetUserPassword API operation. This operation will also set the user status to CONFIRMED to allow the user to sign in to Cognito using the existing password.

Following is a code example for the AdminCreateUser function using the AWS SDK for JavaScript.

var params = {
    MessageAction: "SUPPRESS",
    UserAttributes: [{
        Name: "name",
        Value: "Nikki Wolf"
    },
    {
        Name: "email",
        Value: "[email protected]"
    },
    {
        Name: "email_verified",
        Value: "True"
    }
    ],
    UserPoolId: "us-east-1_EXAMPLE",
    Username: "nikki_wolf"
};
const cognito = new CognitoIdentityProviderClient();
const createUserCommand  = new AdminCreateUserCommand(params);
await cognito.send (createUserCommand);

The following is a code example for the AdminSetUserPassword function.

var params = {
    UserPoolId: 'us-east-1_EXAMPLE' ,
    Username: 'nikki_wolf' ,
    Password: 'ExamplePassword1$' ,
    Permanent: true
};
const cognito = new CognitoIdentityProviderClient();
const setUserPasswordCommand = new AdminSetUserPasswordCommand(params);
await cognito.send(setUserPasswordCommand);

This alternative approach does not require the app to update its authentication codebase until a majority of users are migrated, but you need to propagate user attribute changes and new user signups from the existing systems to Cognito. If you are capturing and migrating passwords, you should also build a similar logic to capture password changes in existing systems and set the new password in the user pool to keep it synchronized until you perform a full switchover from the existing identity store to the Cognito user pool.

Summary and best practices

In this post, we described our two recommended approaches for migrating users into an Amazon Cognito user pool. You can decide which approach is best suited for your use case. The bulk method is simpler to implement, but it doesn’t preserve user passwords like the just-in-time migration does. The just-in-time migration is transparent to users and mitigates the potential attrition of users that can occur when users need to reset their passwords.

You could also consider a hybrid approach, where you first apply JIT migration as users are actively signing in to your app, and perform bulk import for the remaining less-active users. This hybrid approach helps provide a good experience for your active user communities, while being able to decommission existing user stores in a manageable timeline because you don’t need to wait for every user to sign in and be migrated through JIT migration.

We hope you can use these explanations and code samples to set up the most suitable approach for your migration project.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Edward Sun

Edward Sun

Edward is a Security Specialist Solutions Architect focused on identity and access management. He loves helping customers throughout their cloud transformation journey with architecture design, security best practices, migration, and cost optimizations. Outside of work, Edward enjoys hiking, golfing, and cheering for his alma mater, the Georgia Bulldogs.

Sending and receiving webhooks on AWS: Innovate with event notifications

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/sending-and-receiving-webhooks-on-aws-innovate-with-event-notifications/

This post is written by Daniel Wirjo, Solutions Architect, and Justin Plock, Principal Solutions Architect.

Commonly known as reverse APIs or push APIs, webhooks provide a way for applications to integrate to each other and communicate in near real-time. It enables integration for business and system events.

Whether you’re building a software as a service (SaaS) application integrating with your customer workflows, or transaction notifications from a vendor, webhooks play a critical role in unlocking innovation, enhancing user experience, and streamlining operations.

This post explains how to build with webhooks on AWS and covers two scenarios:

  • Webhooks Provider: A SaaS application that sends webhooks to an external API.
  • Webhooks Consumer: An API that receives webhooks with capacity to handle large payloads.

It includes high-level reference architectures with considerations, best practices and code sample to guide your implementation.

Sending webhooks

To send webhooks, you generate events, and deliver them to third-party APIs. These events facilitate updates, workflows, and actions in the third-party system. For example, a payments platform (provider) can send notifications for payment statuses, allowing ecommerce stores (consumers) to ship goods upon confirmation.

AWS reference architecture for a webhook provider

The architecture consists of two services:

  • Webhook delivery: An application that delivers webhooks to an external endpoint specified by the consumer.
  • Subscription management: A management API enabling the consumer to manage their configuration, including specifying endpoints for delivery, and which events for subscription.

AWS reference architecture for a webhook provider

Considerations and best practices for sending webhooks

When building an application to send webhooks, consider the following factors:

Event generation: Consider how you generate events. This example uses Amazon DynamoDB as the data source. Events are generated by change data capture for DynamoDB Streams and sent to Amazon EventBridge Pipes. You then simplify the DynamoDB response format by using an input transformer.

With EventBridge, you send events in near real time. If events are not time-sensitive, you can send multiple events in a batch. This can be done by polling for new events at a specified frequency using EventBridge Scheduler. To generate events from other data sources, consider similar approaches with Amazon Simple Storage Service (S3) Event Notifications or Amazon Kinesis.

Filtering: EventBridge Pipes support filtering by matching event patterns, before the event is routed to the target destination. For example, you can filter for events in relation to status update operations in the payments DynamoDB table to the relevant subscriber API endpoint.

Delivery: EventBridge API Destinations deliver events outside of AWS using REST API calls. To protect the external endpoint from surges in traffic, you set an invocation rate limit. In addition, retries with exponential backoff are handled automatically depending on the error. An Amazon Simple Queue Service (SQS) dead-letter queue retains messages that cannot be delivered. These can provide scalable and resilient delivery.

Payload Structure: Consider how consumers process event payloads. This example uses an input transformer to create a structured payload, aligned to the CloudEvents specification. CloudEvents provides an industry standard format and common payload structure, with developer tools and SDKs for consumers.

Payload Size: For fast and reliable delivery, keep payload size to a minimum. Consider delivering only necessary details, such as identifiers and status. For additional information, you can provide consumers with a separate API. Consumers can then separately call this API to retrieve the additional information.

Security and Authorization: To deliver events securely, you establish a connection using an authorization method such as OAuth. Under the hood, the connection stores the credentials in AWS Secrets Manager, which securely encrypts the credentials.

Subscription Management: Consider how consumers can manage their subscription, such as specifying HTTPS endpoints and event types to subscribe. DynamoDB stores this configuration. Amazon API Gateway, Amazon Cognito, and AWS Lambda provide a management API for operations.

Costs: In practice, sending webhooks incurs cost, which may become significant as you grow and generate more events. Consider implementing usage policies, quotas, and allowing consumers to subscribe only to the event types that they need.

Monetization: Consider billing consumers based on their usage volume or tier. For example, you can offer a free tier to provide a low-friction access to webhooks, but only up to a certain volume. For additional volume, you charge a usage fee that is aligned to the business value that your webhooks provide. At high volumes, you offer a premium tier where you provide dedicated infrastructure for certain consumers.

Monitoring and troubleshooting: Beyond the architecture, consider processes for day-to-day operations. As endpoints are managed by external parties, consider enabling self-service. For example, allow consumers to view statuses, replay events, and search for past webhook logs to diagnose issues.

Advanced Scenarios: This example is designed for popular use cases. For advanced scenarios, consider alternative application integration services noting their Service Quotas. For example, Amazon Simple Notification Service (SNS) for fan-out to a larger number of consumers, Lambda for flexibility to customize payloads and authentication, and AWS Step Functions for orchestrating a circuit breaker pattern to deactivate unreliable subscribers.

Receiving webhooks

To receive webhooks, you require an API to provide to the webhook provider. For example, an ecommerce store (consumer) may rely on notifications provided by their payment platform (provider) to ensure that goods are shipped in a timely manner. Webhooks present a unique scenario as the consumer must be scalable, resilient, and ensure that all requests are received.

AWS reference architecture for a webhook consumer

In this scenario, consider an advanced use case that can handle large payloads by using the claim-check pattern.

AWS reference architecture for a webhook consumer

At a high-level, the architecture consists of:

  • API: An API endpoint to receive webhooks. An event-driven system then authorizes and processes the received webhooks.
  • Payload Store: S3 provides scalable storage for large payloads.
  • Webhook Processing: EventBridge Pipes provide an extensible architecture for processing. It can batch, filter, enrich, and send events to a range of processing services as targets.

Considerations and best practices for receiving webhooks

When building an application to receive webhooks, consider the following factors:

Scalability: Providers typically send events as they occur. API Gateway provides a scalable managed endpoint to receive events. If unavailable or throttled, providers may retry the request, however, this is not guaranteed. Therefore, it is important to configure appropriate rate and burst limits. Throttling requests at the entry point mitigates impact on downstream services, where each service has its own quotas and limits. In many cases, providers are also aware of impact on downstream systems. As such, they send events at a threshold rate limit, typically up to 500 transactions per second (TPS).

Considerations and best practices for receiving webhooks

In addition, API Gateway allows you to validate requests, monitor for any errors, and protect against distributed denial of service (DDoS). This includes Layer 7 and Layer 3 attacks, which are common threats to webhook consumers given public exposure.

Authorization and Verification: Providers can support different authorization methods. Consider a common scenario with Hash-based Message Authentication Code (HMAC), where a shared secret is established and stored in Secrets Manager. A Lambda function then verifies integrity of the message, processing a signature in the request header. Typically, the signature contains a timestamped nonce with an expiry to mitigate replay attacks, where events are sent multiple times by an attacker. Alternatively, if the provider supports OAuth, consider securing the API with Amazon Cognito.

Payload Size: Providers may send a variety of payload sizes. Events can be batched to a single larger request, or they may contain significant information. Consider payload size limits in your event-driven system. API Gateway and Lambda have limits of 10 Mb and 6 Mb. However, DynamoDB and SQS are limited to 400kb and 256kb (with extension for large messages) which can represent a bottleneck.

Instead of processing the entire payload, S3 stores the payload. It is then referenced in DynamoDB, via its bucket name and object key. This is known as the claim-check pattern. With this approach, the architecture supports payloads of up to 6mb, as per the Lambda invocation payload quota.

Considerations and best practices for receiving webhooks

Idempotency: For reliability, many providers prioritize delivering at-least-once, even if it means not guaranteeing exactly once delivery. They can transmit the same request multiple times, resulting in duplicates. To handle this, a Lambda function checks against the event’s unique identifier against previous records in DynamoDB. If not already processed, you create a DynamoDB item.

Ordering: Consider processing requests in its intended order. As most providers prioritize at-least-once delivery, events can be out of order. To indicate order, events may include a timestamp or a sequence identifier in the payload. If not, ordering may be on a best-efforts basis based on when the webhook is received. To handle ordering reliably, select event-driven services that ensure ordering. This example uses DynamoDB Streams and EventBridge Pipes.

Flexible Processing: EventBridge Pipes provide integrations to a range of event-driven services as targets. You can route events to different targets based on filters. Different event types may require different processors. For example, you can use Step Functions for orchestrating complex workflows, Lambda for compute operations with less than 15-minute execution time, SQS to buffer requests, and Amazon Elastic Container Service (ECS) for long-running compute jobs. EventBridge Pipes provide transformation to ensure only necessary payloads are sent, and enrichment if additional information is required.

Costs: This example considers a use case that can handle large payloads. However, if you can ensure that providers send minimal payloads, consider a simpler architecture without the claim-check pattern to minimize cost.

Conclusion

Webhooks are a popular method for applications to communicate, and for businesses to collaborate and integrate with customers and partners.

This post shows how you can build applications to send and receive webhooks on AWS. It uses serverless services such as EventBridge and Lambda, which are well-suited for event-driven use cases. It covers high-level reference architectures, considerations, best practices and code sample to assist in building your solution.

For standards and best practices on webhooks, visit the open-source community resources Webhooks.fyi and CloudEvents.io.

For more serverless learning resources, visit Serverless Land.

Spark on AWS Lambda: An Apache Spark runtime for AWS Lambda

Post Syndicated from John Cherian original https://aws.amazon.com/blogs/big-data/spark-on-aws-lambda-an-apache-spark-runtime-for-aws-lambda/

Spark on AWS Lambda (SoAL) is a framework that runs Apache Spark workloads on AWS Lambda. It’s designed for both batch and event-based workloads, handling data payload sizes from 10 KB to 400 MB. This framework is ideal for batch analytics workloads from Amazon Simple Storage Service (Amazon S3) and event-based streaming from Amazon Managed Streaming for Apache Kafka (Amazon MSK) and Amazon Kinesis. The framework seamlessly integrates data with platforms like Apache Iceberg, Apache Delta Lake, Apache HUDI, Amazon Redshift, and Snowflake, offering a low-cost and scalable data processing solution. SoAL provides a framework that enables you to run data-processing engines like Apache Spark and take advantage of the benefits of serverless architecture, like auto scaling and compute for analytics workloads.

This post highlights the SoAL architecture, provides infrastructure as code (IaC), offers step-by-step instructions for setting up the SoAL framework in your AWS account, and outlines SoAL architectural patterns for enterprises.

Solution overview

Apache Spark offers cluster mode and local mode deployments, with the former incurring latency due to the cluster initialization and warmup. Although Apache Spark’s cluster-based engines are commonly used for data processing, especially with ACID frameworks, they exhibit high resource overhead and slower performance for payloads under 50 MB compared to the more efficient Pandas framework for smaller datasets. When compared to Apache Spark cluster mode, local mode provides faster initialization and better performance for small analytics workloads. The Apache Spark local mode on the SoAL framework is optimized for small analytics workloads, and cluster mode is optimized for larger analytics workloads, making it a versatile framework for enterprises.

We provide an AWS Serverless Application Model (AWS SAM) template, available in the GitHub repo, to deploy the SoAL framework in an AWS account. The AWS SAM template builds the Docker image, pushes it to the Amazon Elastic Container Registry (Amazon ECR) repository, and then creates the Lambda function. The AWS SAM template expedites the setup and adoption of the SoAL framework for AWS customers.

SoAL architecture

The SoAL framework provides local mode and containerized Apache Spark running on Lambda. In the SoAL framework, Lambda runs in a Docker container with Apache Spark and AWS dependencies installed. On invocation, the SoAL framework’s Lambda handler fetches the PySpark script from an S3 folder and submits the Spark job on Lambda. The logs for the Spark jobs are recorded in Amazon CloudWatch.

For both streaming and batch tasks, the Lambda event is sent to the PySpark script as a named argument. Utilizing a container-based image cache along with the warm instance features of Lambda, it was found that the overall JVM warmup time reduced from approx. 70 seconds to under 30 seconds. It was observed that the framework performs well with batch payloads up to 400 MB and streaming data from Amazon MSK and Kinesis. The per-session costs for any given analytics workload depends on the number of requests, the run duration, and the memory configured for the Lambda functions.

The following diagram illustrates the SoAL architecture.

Enterprise architecture

The PySpark script is developed in standard Spark and is compatible with the SoAL framework, Amazon EMR, Amazon EMR Serverless, and AWS Glue. If needed, you can use the PySpark scripts in cluster mode on Amazon EMR, EMR Serverless, and AWS Glue. For analytics workloads with a size between a few KBs and 400 MB, you can use the SoAL framework on Lambda and in larger analytics workload scenarios over 400 MB, and run the same PySpark script on AWS cluster-based tools like Amazon EMR, EMR Serverless, and AWS Glue. The extensible script and architecture make SoAL a scalable framework for analytics workloads for enterprises. The following diagram illustrates this architecture.

Prerequisites

To implement this solution, you need an AWS Identity and Access Management (IAM) role with permission to AWS CloudFormation, Amazon ECR, Lambda, and AWS CodeBuild.

Set up the solution

To set up the solution in an AWS account, complete the following steps:

  1. Clone the GitHub repository to local storage and change the directory within the cloned folder to the CloudFormation folder:
    git clone https://github.com/aws-samples/spark-on-aws-lambda.git

  2. Run the AWS SAM template sam-imagebuilder.yaml using the following command with the stack name and framework of your choice. In this example, the framework is Apache HUDI:
    sam deploy --template-file sam-imagebuilder.yaml --stack-name spark-on-lambda-image-builder --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM --resolve-s3 --parameter-overrides 'ParameterKey=FRAMEWORK,ParameterValue=HUDI'

The command deploys a CloudFormation stack called spark-on-lambda-image-builder. The command runs a CodeBuild project that builds and pushes the Docker image with the latest tag to Amazon ECR. The command has a parameter called ParameterValue for each open-source framework (Apache Delta, Apache HUDI, and Apache Iceberg).

  1. After the stack has been successfully deployed, copy the ECR repository URI (spark-on-lambda-image-builder) that is displayed in the output of the stack.
  2. Run the AWS SAM Lambda package with the required Region and ECR repository:
    sam deploy --template-file sam-template.yaml --stack-name spark-on-lambda-stack --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM --resolve-s3 --image-repository <accountno>.dkr.ecr.us-east-1.amazonaws.com/sparkonlambda-spark-on-lambda-image-builder --parameter-overrides 'ParameterKey=ScriptBucket,ParameterValue=<Provide the s3 bcucket of the script> ParameterKey=SparkScript,ParameterValue=<provide s3 folder lcoation> ParameterKey=ImageUri,ParameterValue=<accountno>.dkr.ecr.us-east-1.amazonaws.com/sparkonlambda-spark-on-lambda-image-builder:latest'

This command creates the Lambda function with the container image from the ECR repository. An output file packaged-template.yaml is created in the local directory.

  1. Optionally, to publish the AWS SAM application to the AWS Serverless Application Repository, run the following command. This allows AWS SAM template sharing with the GUI interface using AWS Serverless Application Repository and other developers to use quick deployments in the future.
    sam publish --template packaged-template.yaml

After you run this command, a Lambda function is created using the SoAL framework runtime.

  1. To test it, use PySpark scripts from the spark-scripts folder. Place the sample script and accomodations.csv dataset in an S3 folder and provide the location via the Lambda environment variables SCRIPT_BUCKET and SCRIPT_LOCATION.

After Lambda is invoked, it uploads the PySpark script from the S3 folder to a container local storage and runs it on the SoAL framework container using SPARK-SUBMIT. The Lambda event is also passed to the PySpark script.

Clean up

Deploying an AWS SAM template incurs costs. Delete the Docker image from Amazon ECR, delete the Lambda function, and remove all the files or scripts from the S3 location. You can also use the following command to delete the stack:

sam delete --stack-name spark-on-lambda-stack

Conclusion

The SoAL framework enables you to run Apache Spark serverless tasks on AWS Lambda efficiently and cost-effectively. Beyond cost savings, it ensures swift processing times for small to medium files. As a holistic enterprise vision, SoAL seamlessly bridges the gap between big and small data processing, using the power of the Apache Spark runtime across both Lambda and other cluster-based AWS resources.

Follow the steps in this post to use the SoAL framework in your AWS account, and leave a comment if you have any questions.


About the authors

John Cherian is Senior Solutions Architect(SA) at Amazon Web Services helps customers with strategy and architecture for building solutions on AWS.

Emerson Antony is Senior Cloud Architect at Amazon Web Services helps customers with implementing AWS solutions.

Kiran Anand is Principal AWS Data Lab Architect at Amazon Web Services helps customers with Big data & Analytics architecture.

Enhancing runtime security and governance with the AWS Lambda Runtime API proxy extension

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/enhancing-runtime-security-and-governance-with-the-aws-lambda-runtime-api-proxy-extension/

This post is written by Anton Aleksandrov, Principal Serverless Solutions Architect,  and Shridhar Pandey, Senior AWS Lambda Product Manager.

AWS Lambda runtimes use the Lambda Runtime API to communicate with the Lambda service. Runtimes use it to retrieve inbound events to be processed by the function handler, return successful handler responses to the Lambda service, and report failures. This post shows how to intercept inbound events and outbound responses without changing function code, using the Runtime API proxy pattern.

This approach enables security vendors and engineering teams to provide enhanced, non-invasive security and governance tools for Lambda functions. Use cases include sanitizing event payload, blocking malicious events, and auditing and augmenting payloads.

Overview

The Lambda Runtime API is an HTTP endpoint available within the Lambda execution environment. It allows the Lambda runtime that executes the function code to communicate with the Lambda service. It is used by managed Lambda runtimes, such as Node.js or Python, as well as custom runtime, which enables developers to create their own Lambda runtime in any programming language of their choice.

Lambda runtimes use the Runtime API to retrieve the next incoming event to be processed by the function handler and return the handler response to the Lambda service.

Lambda Extensions enable you to integrate Lambda functions with your organization’s preferred tools for monitoring, observability, security, and governance. You can use extensions from AWS, AWS Lambda Ready partners, and open-source projects for a wide range of use cases. Extensions allow adding functionality, such as pre-fetching configurations or dispatching telemetry, without making intrusive changes to function code. Lambda Extensions are packaged as Lambda layers and run as a separate process in the execution environment.

This is how runtimes and extensions communicate with the Lambda service via the Runtime API and Extensions API endpoints:

AWS Lambda Runtime API and Extensions API endpoints

AWS Lambda Runtime API and Extensions API endpoints

When you register your extension with the Lambda service, you can specify you want to receive the INVOKE event. The Lambda service sends this event to the extension asynchronously when a function is invoked.

The information supplied contains the function invocation metadata, but not the event payload. This makes the event useful for observability, but limited for application security and governance use cases, such as inspecting payloads for vulnerabilities, sanitizing inputs, and blocking malicious events.

The Lambda Runtime API proxy is a pattern that enables you to hook into the function invocation request and response lifecycle. It allows you to use extensions to implement advanced security, compliance, governance, and observability scenarios without changes to function code. You can add runtime security mechanisms, implement audit procedures for data flowing in and out of the function, enhance observability by auto-injecting tracing headers, and many more.

Understanding the Lambda Runtime API workflow

This is how the Lambda Runtime consumes the Lambda Runtime API:

How the Lambda Runtime consumes the Lambda Runtime API

How the Lambda Runtime consumes the Lambda Runtime API

Lambda runtimes use the value of the AWS_LAMBDA_RUNTIME_API environment variable to make Runtime API requests. The two primary endpoints are /next, which is used to retrieve the next event to process, and /response, which is used to return event processing results to the Lambda service. In addition, the Runtime API also provides endpoints for reporting failures. See the full protocol and schema definitions of the Runtime API.

How the Runtime API proxy approach works

The Runtime API proxy is a component that you can build to hook into the invocation workflow. It proxies requests and responses, allowing you to augment them, and control the workflow:

Runtime API proxy hooks

Runtime API proxy hooks

When the Lambda service creates a new execution environment, it starts by initializing the extensions attached to the function. The execution environment waits for all extensions to register with the Lambda service by calling the Extensions API /register endpoint, then proceeds to initialize the runtime. This allows you to start the Runtime API proxy HTTP listener during extension initialization, making it ready to serve the runtime requests.

Runtime API proxy flow

Runtime API proxy flow

By default, the value of the AWS_LAMBDA_RUNTIME_API environment variable in the runtime process points to the Lambda Runtime API endpoint 127.0.0.1:9001. You can use a wrapper script to change that value to point to the Runtime API proxy endpoint instead.

A wrapper script enables you to customize the runtime startup behavior of your Lambda function by letting you set configuration parameters that cannot be set through language-specific environment variables. You can add a wrapper script to your function by setting the AWS_LAMBDA_EXEC_WRAPPER environment variable. The following wrapper script assumes that the Runtime API Proxy is listening on port 9009.

#!/bin/bash
export AWS_LAMBDA_RUNTIME_API="127.0.0.1:9009"
exec "$@"

You can either add this export line to an existing wrapper script or create a new one.

Runtime API proxy example

Runtime API proxy example

The Runtime API Proxy is bootstrapped by the Lambda service when a new execution environment is created and it’s ready to proxy requests from the Lambda runtime to the Runtime API before first invocation.

Implementing the Runtime API proxy logic

AWS recommends you implement extensions using a programming language that compiles to a binary executable, such as Golang or Rust. This allows you to use the extension with any Lambda runtime. Extensions implemented in interpreted languages, such as JavaScript and Python, or languages that require additional virtual machines, such as Java and C#, can only be used with that specific runtime.

This diagram shows a scenario where both incoming events and outbound responses are processed by the extension. You can use this workflow for auditing event or response payloads, sanitizing them, or injecting additional properties. You can use it for scenarios like masking account numbers, stripping personally identifiable information (PII), or injecting observability headers.

Runtime API proxy logic

Runtime API proxy logic

This diagram demonstrates an advanced scenario, where the first inbound event is identified as malicious, and rejected by the Runtime API proxy. The function handler is not invoked. The second event is not flagged as malicious, and is therefore forwarded to the handler for processing. You can use this workflow for security scenarios like runtime application protection.

Runtime API proxy security scenario

Runtime API proxy security scenario

AWS Partners using the Runtime API Proxy solution

“Using Lambda Runtime API proxy solution is a game-changing approach for us. It enables us to support multiple Lambda runtimes with a single implementation, provides comprehensive visibility into Lambda execution, and allows to detect attackers targeting serverless applications,” says Julio Guerra, Engineering Manager, Application Security Management, Datadog.

“Lambda Runtime API proxy is a simple solution that gives us a pluggable way to protect Lambda Function URLs. We can implement request authorization and enrichment with no changes to function code,” says Ilya Zilber, Senior Manager, Solutions Engineering, Okta Inc.

Security best practices

Extensions run within the same execution environment as the function, so they have the same level of access to resources such as file system, networking, and environment variables. IAM permissions assigned to the function are shared with extensions. Our guidance is to assign the least required privileges to your functions.

Always install extensions from a trusted source only. Use Infrastructure as Code (IaC) tools, such as AWS CloudFormation, to simplify the task of attaching the same extension configuration, including AWS Identity and Access Management (IAM) permissions, to multiple functions. Additionally, IaC tools allow you to have an audit record of extensions and versions you’ve used previously.

When building extensions, do not log sensitive data. Sanitize payloads and metadata before logging or persisting them for audit purposes.

Considerations

The Runtime API proxy approach allows you to hook into the Lambda request/response workflow, enabling new security and observability use cases. There are several important considerations:

  • This requires you to have a good understanding of the Lambda execution environment lifecycle and the Lambda Runtime API. You must implement proxying for all Runtime API endpoints and handle potential runtime failures.
  • Prepare your extension for composability for scenarions in which more than one extension implements the Runtime API proxy pattern. Allow your extension consumers to configure the extension via environment variables using at least two parameters – the port your proxy listens on and the Runtime API endpoint your proxy forwards requests to. The latter should default to the original value of the AWS_LAMBDA_RUNTIME_API environment variable. See sample implementations below for details.
  • Proxying API requests with default buffered responses requires additional work to support functions with response payload streaming.
  • Proxying API requests adds latency. The added overhead depends on your implementation. AWS recommends using programming languages that can be compiled to an executable binary, such as Rust and Golang, and keeping your extensions lightweight and optimized.

Samples

You can find sample extensions implementing the Runtime API Proxy at https://github.com/aws-samples/aws-lambda-extensions/. See Golang, Rust, and Node.js samples.

Follow the instructions described in README.md for a step-by-step tutorial on running the extension.

Conclusion

This post introduces and illustrates the Lambda Runtime API proxy pattern. You can use this pattern to hook into the Lambda function request and response workflow to intercept, process, audit, modify, and block inbound events and handler responses.

You can use this pattern to implement enhanced runtime security and governance scenarios, as well as scenarios from other domains.. AWS customers and partners can use this advanced solution approach to add enhanced security and observability to Lambda functions without requiring code changes.

For more serverless learning resources, visit Serverless Land.

Unstructured data management and governance using AWS AI/ML and analytics services

Post Syndicated from Sakti Mishra original https://aws.amazon.com/blogs/big-data/unstructured-data-management-and-governance-using-aws-ai-ml-and-analytics-services/

Unstructured data is information that doesn’t conform to a predefined schema or isn’t organized according to a preset data model. Unstructured information may have a little or a lot of structure but in ways that are unexpected or inconsistent. Text, images, audio, and videos are common examples of unstructured data. Most companies produce and consume unstructured data such as documents, emails, web pages, engagement center phone calls, and social media. By some estimates, unstructured data can make up to 80–90% of all new enterprise data and is growing many times faster than structured data. After decades of digitizing everything in your enterprise, you may have an enormous amount of data, but with dormant value. However, with the help of AI and machine learning (ML), new software tools are now available to unearth the value of unstructured data.

In this post, we discuss how AWS can help you successfully address the challenges of extracting insights from unstructured data. We discuss various design patterns and architectures for extracting and cataloging valuable insights from unstructured data using AWS. Additionally, we show how to use AWS AI/ML services for analyzing unstructured data.

Why it’s challenging to process and manage unstructured data

Unstructured data makes up a large proportion of the data in the enterprise that can’t be stored in a traditional relational database management systems (RDBMS). Understanding the data, categorizing it, storing it, and extracting insights from it can be challenging. In addition, identifying incremental changes requires specialized patterns and detecting sensitive data and meeting compliance requirements calls for sophisticated functions. It can be difficult to integrate unstructured data with structured data from existing information systems. Some view structured and unstructured data as apples and oranges, instead of being complementary. But most important of all, the assumed dormant value in the unstructured data is a question mark, which can only be answered after these sophisticated techniques have been applied. Therefore, there is a need to being able to analyze and extract value from the data economically and flexibly.

Solution overview

Data and metadata discovery is one of the primary requirements in data analytics, where data consumers explore what data is available and in what format, and then consume or query it for analysis. If you can apply a schema on top of the dataset, then it’s straightforward to query because you can load the data into a database or impose a virtual table schema for querying. But in the case of unstructured data, metadata discovery is challenging because the raw data isn’t easily readable.

You can integrate different technologies or tools to build a solution. In this post, we explain how to integrate different AWS services to provide an end-to-end solution that includes data extraction, management, and governance.

The solution integrates data in three tiers. The first is the raw input data that gets ingested by source systems, the second is the output data that gets extracted from input data using AI, and the third is the metadata layer that maintains a relationship between them for data discovery.

The following is a high-level architecture of the solution we can build to process the unstructured data, assuming the input data is being ingested to the raw input object store.

Unstructured Data Management - Block Level Architecture Diagram

The steps of the workflow are as follows:

  1. Integrated AI services extract data from the unstructured data.
  2. These services write the output to a data lake.
  3. A metadata layer helps build the relationship between the raw data and AI extracted output. When the data and metadata are available for end-users, we can break the user access pattern into additional steps.
  4. In the metadata catalog discovery step, we can use query engines to access the metadata for discovery and apply filters as per our analytics needs. Then we move to the next stage of accessing the actual data extracted from the raw unstructured data.
  5. The end-user accesses the output of the AI services and uses the query engines to query the structured data available in the data lake. We can optionally integrate additional tools that help control access and provide governance.
  6. There might be scenarios where, after accessing the AI extracted output, the end-user wants to access the original raw object (such as media files) for further analysis. Additionally, we need to make sure we have access control policies so the end-user has access only to the respective raw data they want to access.

Now that we understand the high-level architecture, let’s discuss what AWS services we can integrate in each step of the architecture to provide an end-to-end solution.

The following diagram is the enhanced version of our solution architecture, where we have integrated AWS services.

Unstructured Data Management - AWS Native Architecture

Let’s understand how these AWS services are integrated in detail. We have divided the steps into two broad user flows: data processing and metadata enrichment (Steps 1–3) and end-users accessing the data and metadata with fine-grained access control (Steps 4–6).

  1. Various AI services (which we discuss in the next section) extract data from the unstructured datasets.
  2. The output is written to an Amazon Simple Storage Service (Amazon S3) bucket (labeled Extracted JSON in the preceding diagram). Optionally, we can restructure the input raw objects for better partitioning, which can help while implementing fine-grained access control on the raw input data (labeled as the Partitioned bucket in the diagram).
  3. After the initial data extraction phase, we can apply additional transformations to enrich the datasets using AWS Glue. We also build an additional metadata layer, which maintains a relationship between the raw S3 object path, the AI extracted output path, the optional enriched version S3 path, and any other metadata that will help the end-user discover the data.
  4. In the metadata catalog discovery step, we use the AWS Glue Data Catalog as the technical catalog, Amazon Athena and Amazon Redshift Spectrum as query engines, AWS Lake Formation for fine-grained access control, and Amazon DataZone for additional governance.
  5. The AI extracted output is expected to be available as a delimited file or in JSON format. We can create an AWS Glue Data Catalog table for querying using Athena or Redshift Spectrum. Like the previous step, we can use Lake Formation policies for fine-grained access control.
  6. Lastly, the end-user accesses the raw unstructured data available in Amazon S3 for further analysis. We have proposed integrating Amazon S3 Access Points for access control at this layer. We explain this in detail later in this post.

Now let’s expand the following parts of the architecture to understand the implementation better:

  • Using AWS AI services to process unstructured data
  • Using S3 Access Points to integrate access control on raw S3 unstructured data

Process unstructured data with AWS AI services

As we discussed earlier, unstructured data can come in a variety of formats, such as text, audio, video, and images, and each type of data requires a different approach for extracting metadata. AWS AI services are designed to extract metadata from different types of unstructured data. The following are the most commonly used services for unstructured data processing:

  • Amazon Comprehend – This natural language processing (NLP) service uses ML to extract metadata from text data. It can analyze text in multiple languages, detect entities, extract key phrases, determine sentiment, and more. With Amazon Comprehend, you can easily gain insights from large volumes of text data such as extracting product entity, customer name, and sentiment from social media posts.
  • Amazon Transcribe – This speech-to-text service uses ML to convert speech to text and extract metadata from audio data. It can recognize multiple speakers, transcribe conversations, identify keywords, and more. With Amazon Transcribe, you can convert unstructured data such as customer support recordings into text and further derive insights from it.
  • Amazon Rekognition – This image and video analysis service uses ML to extract metadata from visual data. It can recognize objects, people, faces, and text, detect inappropriate content, and more. With Amazon Rekognition, you can easily analyze images and videos to gain insights such as identifying entity type (human or other) and identifying if the person is a known celebrity in an image.
  • Amazon Textract – You can use this ML service to extract metadata from scanned documents and images. It can extract text, tables, and forms from images, PDFs, and scanned documents. With Amazon Textract, you can digitize documents and extract data such as customer name, product name, product price, and date from an invoice.
  • Amazon SageMaker – This service enables you to build and deploy custom ML models for a wide range of use cases, including extracting metadata from unstructured data. With SageMaker, you can build custom models that are tailored to your specific needs, which can be particularly useful for extracting metadata from unstructured data that requires a high degree of accuracy or domain-specific knowledge.
  • Amazon Bedrock – This fully managed service offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon with a single API. It also offers a broad set of capabilities to build generative AI applications, simplifying development while maintaining privacy and security.

With these specialized AI services, you can efficiently extract metadata from unstructured data and use it for further analysis and insights. It’s important to note that each service has its own strengths and limitations, and choosing the right service for your specific use case is critical for achieving accurate and reliable results.

AWS AI services are available via various APIs, which enables you to integrate AI capabilities into your applications and workflows. AWS Step Functions is a serverless workflow service that allows you to coordinate and orchestrate multiple AWS services, including AI services, into a single workflow. This can be particularly useful when you need to process large amounts of unstructured data and perform multiple AI-related tasks, such as text analysis, image recognition, and NLP.

With Step Functions and AWS Lambda functions, you can create sophisticated workflows that include AI services and other AWS services. For instance, you can use Amazon S3 to store input data, invoke a Lambda function to trigger an Amazon Transcribe job to transcribe an audio file, and use the output to trigger an Amazon Comprehend analysis job to generate sentiment metadata for the transcribed text. This enables you to create complex, multi-step workflows that are straightforward to manage, scalable, and cost-effective.

The following is an example architecture that shows how Step Functions can help invoke AWS AI services using Lambda functions.

AWS AI Services - Lambda Event Workflow -Unstructured Data

The workflow steps are as follows:

  1. Unstructured data, such as text files, audio files, and video files, are ingested into the S3 raw bucket.
  2. A Lambda function is triggered to read the data from the S3 bucket and call Step Functions to orchestrate the workflow required to extract the metadata.
  3. The Step Functions workflow checks the type of file, calls the corresponding AWS AI service APIs, checks the job status, and performs any postprocessing required on the output.
  4. AWS AI services can be accessed via APIs and invoked as batch jobs. To extract metadata from different types of unstructured data, you can use multiple AI services in sequence, with each service processing the corresponding file type.
  5. After the Step Functions workflow completes the metadata extraction process and performs any required postprocessing, the resulting output is stored in an S3 bucket for cataloging.

Next, let’s understand how can we implement security or access control on both the extracted output as well as the raw input objects.

Implement access control on raw and processed data in Amazon S3

We just consider access controls for three types of data when managing unstructured data: the AI-extracted semi-structured output, the metadata, and the raw unstructured original files. When it comes to AI extracted output, it’s in JSON format and can be restricted via Lake Formation and Amazon DataZone. We recommend keeping the metadata (information that captures which unstructured datasets are already processed by the pipeline and available for analysis) open to your organization, which will enable metadata discovery across the organization.

To control access of raw unstructured data, you can integrate S3 Access Points and explore additional support in the future as AWS services evolve. S3 Access Points simplify data access for any AWS service or customer application that stores data in Amazon S3. Access points are named network endpoints that are attached to buckets that you can use to perform S3 object operations. Each access point has distinct permissions and network controls that Amazon S3 applies for any request that is made through that access point. Each access point enforces a customized access point policy that works in conjunction with the bucket policy that is attached to the underlying bucket. With S3 Access Points, you can create unique access control policies for each access point to easily control access to specific datasets within an S3 bucket. This works well in multi-tenant or shared bucket scenarios where users or teams are assigned to unique prefixes within one S3 bucket.

An access point can support a single user or application, or groups of users or applications within and across accounts, allowing separate management of each access point. Every access point is associated with a single bucket and contains a network origin control and a Block Public Access control. For example, you can create an access point with a network origin control that only permits storage access from your virtual private cloud (VPC), a logically isolated section of the AWS Cloud. You can also create an access point with the access point policy configured to only allow access to objects with a defined prefix or to objects with specific tags. You can also configure custom Block Public Access settings for each access point.

The following architecture provides an overview of how an end-user can get access to specific S3 objects by assuming a specific AWS Identity and Access Management (IAM) role. If you have a large number of S3 objects to control access, consider grouping the S3 objects, assigning them tags, and then defining access control by tags.

S3 Access Points - Unstructured Data Management - Access Control

If you are implementing a solution that integrates S3 data available in multiple AWS accounts, you can take advantage of cross-account support for S3 Access Points.

Conclusion

This post explained how you can use AWS AI services to extract readable data from unstructured datasets, build a metadata layer on top of them to allow data discovery, and build an access control mechanism on top of the raw S3 objects and extracted data using Lake Formation, Amazon DataZone, and S3 Access Points.

In addition to AWS AI services, you can also integrate large language models with vector databases to enable semantic or similarity search on top of unstructured datasets. To learn more about how to enable semantic search on unstructured data by integrating Amazon OpenSearch Service as a vector database, refer to Try semantic search with the Amazon OpenSearch Service vector engine.

As of writing this post, S3 Access Points is one of the best solutions to implement access control on raw S3 objects using tagging, but as AWS service features evolve in the future, you can explore alternative options as well.


About the Authors

Sakti Mishra is a Principal Solutions Architect at AWS, where he helps customers modernize their data architecture and define their end-to-end data strategy, including data security, accessibility, governance, and more. He is also the author of the book Simplify Big Data Analytics with Amazon EMR. Outside of work, Sakti enjoys learning new technologies, watching movies, and visiting places with family.

Bhavana Chirumamilla is a Senior Resident Architect at AWS with a strong passion for data and machine learning operations. She brings a wealth of experience and enthusiasm to help enterprises build effective data and ML strategies. In her spare time, Bhavana enjoys spending time with her family and engaging in various activities such as traveling, hiking, gardening, and watching documentaries.

Sheela Sonone is a Senior Resident Architect at AWS. She helps AWS customers make informed choices and trade-offs about accelerating their data, analytics, and AI/ML workloads and implementations. In her spare time, she enjoys spending time with her family—usually on tennis courts.

Daniel Bruno is a Principal Resident Architect at AWS. He had been building analytics and machine learning solutions for over 20 years and splits his time helping customers build data science programs and designing impactful ML products.

Why AWS is the Best Place to Run Rust

Post Syndicated from Deval Parikh original https://aws.amazon.com/blogs/devops/why-aws-is-the-best-place-to-run-rust/

Introduction

The Rust programming language was created by Mozilla Research in 2010 to be “a programming language empowering everyone to build reliable and efficient(fast) software”[1]. If you are a beginner level SDE or a DevOps engineer or a decision maker in your organization looking to adopt Rust for your specific use, you will find this blog helpful to get started with Rust on AWS. We will begin by explaining why Rust has gained a huge traction over programming languages like C, C++, Java, Python, and Go. We will then talk about why AWS is one of the best platforms for Rust. Finally, we will provide an example of how you can quickly run a Rust program using AWS Lambda function.

Why Rust?

Rust is an efficient and reliable programming language that addresses performance, reliability, and productivity all at once. It distinguishes itself from its peers by boasting memory safety and thread safety without a need for garbage collector.

Historically, C and C++ have held the title of being the most performant programming languages; however, their speeds have often come with a significant cost to their safety and maintainability. The biggest threat in using such languages range from corruption of valid data to the execution of arbitrary code. The frequency of these issues is even more obvious when you notice that from 2007 to 2019, 70 percent of all vulnerabilities addressed by Microsoft through security updates pertain to memory safety [2]. Languages like Java have come a long way in mitigating such vulnerabilities using garbage collector, however this has come with significant performance bottleneck. Rust seeks to marry performance and safety using its novel borrow-checker, which is a type of static analysis tool that can help check for errors in code such as null-pointer dereferences, data races, etc.

There are other ways programs may access invalid memory. Iterating through an array, for example, requires the iterator to know how many elements are in the array to create a stopping condition. Furthermore, without checking array out of bounds, how would an accessor method be sure it is not accessing an index that does not exist? Here, safety comes with a performance overhead. Typically, the safety benefits of languages like Java are worth the performance overhead. However, for situations where safety and speed are both an absolute necessity, developers may choose to run their mission critical applications in Rust. Here, Rust can be viewed as a memory-safe, fast, low-resource programming language that requires no runtime. This makes Rust also suitable to run on embedded or low-resource device applications.

Rust brings polished tooling, a robust package manager (Cargo), and perhaps most importantly – a fast-growing and passionate community of developers. As Rust gains in popularity, so does the number of high-profile organizations adopting it (including AWS!) for critical applications where performance and safety are top concerns. Did you know that Amazon S3 leverages Rust to attempt to return responses with single-digit millisecond latency? To name a few, AWS product components written in Rust include Amazon CloudFront, Amazon EC2, and AWS Lambda among others.

There are many great resources to learn Rust. Most Rust developers start with the official Rust book, which is available for free online.

[1]: Rust Language official website

[2]: https://www.zdnet.com/article/microsoft-70-percent-of-all-security-bugs-are-memory-safety-issues/

[3]: https://codilime.com/blog/why-is-rust-programming-language-so-popular/#:~:text=Rust%20is%20a%20statically%2Dtyped,developed%20originally%20at%20Mozilla%20Research

Why Rust on AWS?

Rust matters to AWS for two main reasons. First, our customers are choosing to use Rust for their mission critical workloads and adoption is growing, therefore it becomes imperative that AWS provides the best tools possible to run Rust on AWS. In the next section, I will provide an example to show how easy it is to interact with AWS services using Rust runtime on AWS Lambda.

Additionally, it is important that we are creating high performant, safe infrastructure and services for our customers to run their business critical workloads on AWS. In 2018, AWS first launched its open source microVM technology Firecracker written completely in Rust. Since then, AWS has delivered over two dozen open source projects developed in Rust. For instance, AWS uses Firecracker to run AWS Lambda and AWS Fargate. Today, AWS Lambda processes trillions of executions for hundreds of thousands of active customers every month. Its ability to fire up AWS Lambda or AWS Fargate in less than 125ms attributes to blazing fast speed of Rust. AWS also developed and launched Bottlerocket, a Linux-based open source container OS purpose built for running containers. Veeva Systems a leader in cloud based software for the life sciences industry runs a variety of microservices on Bottleneck securely, with enhanced resource efficiency, and decreased management overhead, thanks to Rust.

Here at AWS, our product development teams have leveraged Rust to deliver more than a dozen services. Besides services such as Amazon Simple Storage Service (Amazon S3), AWS developers uses Rust as the language of choice to develop product components for Amazon Elastic Compute Cloud (Amazon EC2), Amazon CloudFront, Amazon Route 53, and more. Our Amazon EC2 team uses Rust for new AWS Nitro System components, including sensitive applications such as Nitro Enclaves.

Not only is AWS using Rust for improving their product response times, we are actively contributing to and supporting Rust and the open source ecosystem around it. AWS employs a number of core open source contributors to the Rust project and popular Rust libraries like tokio, used for writing asynchronous applications with Rust. According to Marc Brooker, Distinguished Engineer and Vice President of Database and AI at AWS, “Hiring engineers to work directly on Rust allows us to improve it in ways that matter to us and to our customers, and help grow the overall Rust community.” AWS is an active member on the Board of Directors for the Rust Foundation and have generously donated infrastructure and technology services to the Rust Foundation. You can read more about how AWS is helping the Rust community here.

Getting Started with Rust on AWS

This demonstration will walk you through creating your first AWS Lambda + Rust App! We’ll bootstrap the development process by utilizing the AWS Serverless Application Model (SAM)—a tool designed for building, deploying, and managing serverless applications. AWS SAM streamlines the Rust development process by setting up AWS’s official Rust Lambda Runtime, Cargo Lambda. This runtime offers a specialized build tool command for direct deployment to AWS. Additionally, AWS SAM integrates both Amazon DynamoDB table and an Amazon API Gateway endpoint. The provided example serves as a foundational template for leveraging the AWS Rust SDK with Amazon DynamoDB.

architecture diagram

Prerequisites

Steps

1.  Open a terminal and navigate to your project directory.

2.  Initialize the project using sam init

3.  Choose “1 - AWS Quick Start Templates”, then “16 - DynamoDB Example”.

4.  Name the project (for demo: “rust-ddb-example-app“)

5.  Now navigate into the newly created directory with the SAM application code and execute sam build && sam deploy --guided.

a.  Accept prompts with “y” or defaults.

6.  After deployment concludes, record the Amazon CloudFormationPutApi” output URL. (i.e https://a1b2c3d4e5f6.execute-api.us-west-2.amazonaws.com/Prod/)

7.  Add an element to your table. (For the demo the id of our element will be foo and the payload will be bar). (e.g  curl -X PUT <PutApi URL>/foo -d "bar")

8.  Validate the addition via the AWS Console’s DynamoDB. Locate the table named after your AWS SAM app and verify the new item. You can do this by going to the AWS Console, clicking DynamoDB, then Tables, and then Explore Items.

dynamodb example

What Next?

This is a great starting point on your journey with Rust on AWS. For taking your development journey to the next level consider:

  1. Explore More Rust on AWS: AWS provides a plethora of examples and documentation. Explore the AWS Rust GitHub Repository for more intricate use cases and examples.
  2. Join a Rust Workshop: AWS often hosts workshops and webinars on various topics. Keep an eye on the AWS Events Page for an upcoming Rust-focused session.
  3. Deepen Your Rust Knowledge: If you’re new to Rust or want to delve deeper, the Rust Book is an excellent resource. We also highly recommend watching the videos on the Cargo Lambda documentation page.
  4. Engage with the Community: The Rust community is vibrant and welcoming. Join forums, attend meetups, and participate in discussions to grow your network and knowledge. Become a member of Rust Foundation to collaborate with other members of the community.
  5. Contribute to make Rust even better: Report on bugs or fix them, write documentation, and add new features. Here is how.

Conclusion

For those of us living in the safety net confines of an interpreter, Rust changes how we can still execute safely in a compiler generated world. Most importantly, Rust brings to the table blazing fast speed and performance without compromises to the security and stability of the system. It is a language of choice in embedded-systems programming, mission critical systems, blockchain and crypto development, and has found its place in 3D video gaming as well.

Rust on AWS is a game changer in that it makes it easy for developers to run code without having the need to setup extensive infrastructure to run it. It serves as an excellent backend service with zero administration. AWS Lambda‘s in-built Rust support further exemplifies AWS’s commitment to accommodating popularity of this language. In addition, the popularity of Rust has mandated an inbuilt handler be added to AWS Lambda for further support of Rust.

Additional Reading

About the Authors

Deval Parikh Photo

Deval Parikh

Deval Parikh is a Sr. Enterprise Solutions Architect at Amazon Web Services. Deval is passionate about helping enterprises reimagine their businesses in the cloud by leading them with strategic architectural guidance and building prototypes as an AWS expert. She is an active member of Containers and DevOps technical communities at AWS. She is also an active board member of the Women at AWS affinity group where she oversees university programs to educate students on cloud technology and careers. Outside of work, Deval is an avid hiker and a painter. You can see many of her paintings at here. You can reach Deval via her LinkedIn

Saahil Parikh Photo

Saahil Parikh

Saahil Parikh is a Software Development Engineer at Amazon Web Services, where he specializes in Elastic Map Reduce (EMR). A passionate maker at heart, Saahil thrives on harnessing the power of emerging technologies to create groundbreaking solutions. His commitment to innovation has led him to continuously push the boundaries of what’s possible. Outside of work, Saahil is an avid hiker, culinary enthusiast, and soccer player. Interested in finding out more about Saahil? Check out Saahil’s GitHub. You can reach Saahil at LinkedIn here.

Serverless ICYMI Q3 2023

Post Syndicated from Benjamin Smith original https://aws.amazon.com/blogs/compute/serverless-icymi-q3-2023/

Welcome to the 23rd edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all the most recent product launches, feature enhancements, blog posts, webinars, live streams, and other interesting things that you might have missed!

In case you missed our last ICYMI, check out what happened last quarter here.

AWS announces the general availability of Amazon Bedrock

Amazon Web Services (AWS) unveils five generative artificial intelligence (AI) innovations to democratize generative AI applications. Amazon Bedrock, now generally available, enables experimentation with top foundation models (FMs) and allows customization with proprietary data.

It supports creating managed agents for complex tasks without code and ensures security and privacy. Amazon Titan Embeddings, another FM, is generally available for various language-related use cases. Meta’s Llama 2, coming soon, enhances dialogue scenarios.

The upcoming Amazon CodeWhisperer customization capability enables secure customization using private code bases. Generative BI authoring capabilities in Amazon QuickSight simplify visualization creation for business analysts.

AWS Lambda

AWS Lambda now detects and stops recursive loops in Lambda functions. AWS Lambda now detects and halts functions caught in recursive or infinite loops, guarding against unexpected costs. Lambda identifies recursive behavior, discontinuing requests after 16 invocations. The feature addresses pitfalls stemming from misconfiguration or coding bugs, introducing detailed error messaging, and allowing users to set maximum limits on retry intervals. Notifications about recursive occurrences are relayed through the AWS Health Dashboard, emails, and CloudWatch Alarms for streamlined troubleshooting. Lambda uses AWS X-Ray trace headers for invocation tracking, requiring supported AWS SDK versions.

AWS simplifies writing .NET 6 Lambda functions. The Lambda Annotations Framework for .NET. A new programming model makes the experience of writing Lambda functions in C# feel more natural for .NET developers by using C# source generator technology. This streamlines the development workflow for .NET developers, making it easier to create serverless applications using the latest version of the .NET framework.

AWS Lambda and Amazon EventBridge Pipes now support enhanced filtering. Additional filtering capabilities include the ability to match against characters at the end of a value (suffix filtering), ignore case sensitivity (equals-ignore-case), and have a single rule match if any conditions across multiple separate fields are true (OR matching).

AWS Lambda Functions powered by AWS Graviton2 are now available in 6 additional Regions. Graviton2 processors are known for their performance benefits, and this expansion provides users with more choices for running serverless workloads.

AWS Lambda adds support for Python 3.11 allowing developers to take advantage of the latest features and improvements in the Python programming language for their serverless functions.

AWS Step Functions

AWS Step Functions enhances Workflow Studio, focusing on an Advanced Starter Template and Code Mode for efficient AWS Step Functions workflow creation. Users benefit from streamlined design-to-code transitions, pasting Amazon States Language (ASL) definitions directly into Workflow Studio, speeding up adjustments. Enhanced workflow execution and configuration allow direct execution and setting adjustments within Workflow Studio, improving user experience.

AWS Step Functions launches enhanced error handling This update helps users to identify errors with precision and refine retry strategies. Step Functions now enables detailed error messages in Fail states and precise control over retry intervals. Use the new maximum limits and jitter functionality to ensure efficient and controlled retries, preventing service overload in recovery scenarios.

AWS Step Functions distributed map is now available in the AWS GovCloud (US) Regions. This release highlights the availability of the distributed map feature in Step Functions specifically tailored for the AWS GovCloud (US) Regions. The distributed map feature is a powerful capability for orchestrating parallel and distributed processing in serverless workflows.

AWS SAM

AWS SAM CLI announces local testing and debugging support on Terraform projects.

Developers can now use AWS SAM CLI to locally test and debug AWS Lambda functions and Amazon API Gateway defined in their Terraform projects. AWS SAM CLI reads infrastructure resource information from the Terraform application, allowing users to start Lambda functions and API Gateway endpoints locally in a Docker container.

This update enables faster development cycles for Terraform users, who can use AWS SAM CLI commands like `AWS SAM local start-api`, `sam local start-lambda`, and `sam local invoke`, along with `sam local generate` for generating mock test events.

Amazon EventBridge

Amazon EventBridge Scheduler adds schedule deletion after completion. This feature offers enhanced functionality by supporting the automatic deletion of schedules upon completion of their last invocation. It is applicable to various scheduling types, including one-time, cron, and rate schedules with an end date. Amazon EventBridge Scheduler, a centralized and highly scalable service, enables the creation, execution, and management of schedules.

With the ability to schedule millions of tasks invoking over 270 AWS services and 6,000 API operations. This update streamlines the process of managing completed schedules. The automatic deletion feature reduces the need for manual intervention or custom code, saving time and simplifying scalability for users leveraging EventBridge Scheduler.

Amazon EventBridge Pipes now available in three additional Regions. This update extends the availability of Amazon EventBridge Pipes, a powerful event-routing service, to three additional Regions.

Amazon EventBridge API Destinations is now available in additional Regions. Providing users with more options for building scalable and decoupled applications.

Amazon EventBridge Schema Registry and Schema Discovery now in additional Regions. This expansion allows you to discover and store event structure – or schema – in a shared, central location. You can download code bindings for those schemas for Java, Python, TypeScript, and Golang so it’s easier to use events as objects in your code.

Amazon SNS

To enhance message privacy and security, Amazon Simple Notification Service (SNS) implemented Message Data Protection, allowing users to de-identify outbound messages via redaction or masking. Amazon SNS FIFO topics now support message delivery to Amazon SQS Standard queues. This provides users with increased flexibility in managing message delivery and ordering.

Expanding its monitoring capabilities, Amazon SNS introduced Additional Usage Metrics in Amazon CloudWatch. This enhancement allows users to gain more comprehensive insights into the performance and utilization of their SNS resources. SNS extended its global SMS sending capabilities to Israel (Tel Aviv), providing users in that Region with additional options for SMS notifications. SNS also expanded its reach by supporting Mobile Push Notifications in twelve new AWS Regions. This expansion aligns with the growing demand for mobile notification capabilities, offering a broader coverage for users across diverse Regions.

Amazon SQS

Amazon Simple Queue Service (SQS) introduced a number of updates. Attribute-Based Access Control (ABAC) was implemented for scalable access permissions, while message data protection can now de-identify outbound messages via redaction or masking. SQS FIFO topics now support message delivery to Amazon SQS Standard queues, providing enhanced flexibility. Addressing throughput demands, SQS increased the quota for FIFO High Throughput mode. JSON protocol support was previewed, offering improved message format flexibility. These updates underscore SQS’s commitment to advanced security and flexibility.

Amazon API Gateway

Amazon API Gateway undergoes a console refresh, aligning with Cloudscape Design System guidelines. Notable enhancements include improved usability, sortable tables, enhanced API key management, and direct API deployment from the Resource view. The update introduces dark mode, accessibility improvements, and visual alignment with HTTP APIs and AWS Services.

GOTO EDA day Nashville 2023

Join GOTO EDA Day in Nashville on October 26 for insights on event-driven architectures. Learn from industry leaders at Music City Center with talks, panels, and Hands-On Labs. Limited tickets available.

Serverless blog posts

July 2023

July 5- Implementing AWS Lambda error handling patterns

July 6 – Implementing AWS Lambda error handling patterns

July 7 – Understanding AWS Lambda’s invoke throttling limits

July 10 – Detecting and stopping recursive loops in AWS Lambda functions

July 11 – Implementing patterns that exit early out of a parallel state in AWS Step Functions

July 26 – Migrating AWS Lambda functions from the Go1.x runtime to the custom runtime on Amazon Linux 2

July 27 – Python 3.11 runtime now available in AWS Lambda

August 2023

August 2 – Automatically delete schedules upon completion with Amazon EventBridge Scheduler

August 7 – Using response streaming with AWS Lambda Web Adapter to optimize performance

August 15 – Integrating IBM MQ with Amazon SQS and Amazon SNS using Apache Camel

August 15 – Implementing the transactional outbox pattern with Amazon EventBridge Pipes

August 23 – Protecting an AWS Lambda function URL with Amazon CloudFront and Lambda@Edge

August 29 – Enhancing file sharing using Amazon S3 and AWS Step Functions

August 31 – Enhancing Workflow Studio with new features for streamlined authoring

September 2023

September 5 – AWS SAM support for HashiCorp Terraform now generally available

September 14 – Building a secure webhook forwarder using an AWS Lambda extension and Tailscale

September 18 – Building resilient serverless applications using chaos engineering

September 19 – Implementing idempotent AWS Lambda functions with Powertools for AWS Lambda (TypeScript)

September 19 – Centralizing management of AWS Lambda layers across multiple AWS Accounts

September 26 – Architecting for scale with Amazon API Gateway private integrations

September 26 – Visually design your application with AWS Application Composer

Videos

Serverless Office Hours – Tues 10AM PT

July 2023

July 4 – Benchmarking Lambda cold starts

July 11 – Lambda testing: AWS SAM remote invoke

July 18 – Using DynamoDB global tables

July 25 – Serverless observability with SLIC-watch

August 2023

August 1 – Step Functions versions and aliases

August 8 – Deploying Lambda with EKS and Crossplane / Managing Lambda with Kubernetes

August 15 – Serverless caching with Momento

September 2023

September 5 – Run any web app on Lambda

September 12 – Building an API platform on AWS

September 19 – Idempotency: exactly once processing

September 26 – AWS Amplify Studio + GraphQL

FooBar Serverless YouTube channel

July 2023

July 27 – Generative AI and Serverless to create a new story everyday

August 2023

August 3Getting started with Data Streaming

August 10 – Amazon Kinesis Data Streams – Shards? Provisioned? On-demand? What does all this mean?

August 17 – Put and consume events with AWS Lambda, Amazon Kinesis Data Stream and Event Source Mapping

August 24 – Create powerful data pipelines with Amazon Kinesis and EventBridge Pipes

August 31 – New Step Functions versions and alias!

September 2023

September 7 – Amazon Kinesis Data Firehose – What is this service for?

September 14 – Kinesis Data Firehose with AWS CDK – Lambda transformations

September 21 – Advanced Event Source Mapping configuration | AWS Lambda and Amazon Kinesis Data Streams

September 28 – Data Streaming Patterns

Still looking for more?

The Serverless landing page has more information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials.

You can also follow the Serverless Developer Advocacy team on Twitter to see the latest news, follow conversations, and interact with the team.