Tag Archives: serverless

Deploy serverless applications in a multicloud environment using Amazon CodeCatalyst

Post Syndicated from Deepak Kovvuri original https://aws.amazon.com/blogs/devops/deploy-serverless-applications-in-a-multicloud-environment-using-amazon-codecatalyst/

Amazon CodeCatalyst is an integrated service for software development teams adopting continuous integration and continuous delivery (CI/CD) practices in their software development process. CodeCatalyst puts the tools you need all in one place. You can plan work, collaborate on code, and build, test, and deploy applications using CodeCatalyst workflows.

Introduction

In the first post of this blog series, we showed you how organizations can deploy workloads to instances and virtual machines (VMs) across hybrid and multicloud environments. The second post of the series covered deploying a containerized application in a multicloud environment. Finally, in this post, we explore how organizations can deploy modern, cloud-native, serverless applications across multiple cloud platforms. Figure 1 shows the solution we walk through in this post.

Figure 1 – Architecture diagram

This post walks through how to develop, deploy, and test an HTTP RESTful API on Azure Functions using Amazon CodeCatalyst. The solution covers the following steps:

  • Set up a CodeCatalyst development environment and develop your application using the Serverless Framework.
  • Build a CodeCatalyst workflow to test and then deploy to Azure Functions using GitHub Actions in Amazon CodeCatalyst.

An Amazon CodeCatalyst workflow is an automated procedure that describes how to build, test, and deploy your code as part of a continuous integration and continuous delivery (CI/CD) system. You can use GitHub Actions alongside native CodeCatalyst actions in a CodeCatalyst workflow.
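For orientation, a workflow that embeds a GitHub action looks roughly like the following sketch. The action identifier and trigger syntax are assumptions based on our reading of the CodeCatalyst documentation, and the embedded steps use standard GitHub Actions YAML:

Name: DeployToAzureFunctions
SchemaVersion: "1.0"
Triggers:
  - Type: PUSH
    Branches:
      - main
Actions:
  DeployAction:
    Identifier: aws/github-actions-runner@v1
    Inputs:
      Sources:
        - WorkflowSource
    Configuration:
      Steps:
        # Standard GitHub Actions steps go here (see the snippets later in this post)
        - name: Deploy to Azure Functions
          uses: serverless/[email protected]

We build this workflow step by step in the sections that follow.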

Prerequisites

Walkthrough

In this post, we create a hello world RESTful API using the Serverless Framework. As we progress through the solution, we focus on building a CodeCatalyst workflow that deploys and tests the functionality of the application. At the end of the post, the workflow will look similar to the one shown in Figure 2.

Figure 2 – CodeCatalyst CI/CD workflow

Environment Setup

Before we start developing the application, we need to set up a CodeCatalyst project and then link a code repository to the project. The code repository can be a CodeCatalyst repository or GitHub. In this scenario, we use a GitHub repository. Once we finish developing the solution, the repository should look as shown below.

Figure 3 – Files in GitHub repository

In Amazon CodeCatalyst, there’s an option to create Dev Environments, which can be used to work on the code stored in the source repositories of a project. In this post, we create a Dev Environment, associate it with the source repository created above, and work from it. You may instead choose not to use a Dev Environment, run the following commands locally, and commit the changes to the repository. The /projects directory of a Dev Environment stores the files that are pulled from the source repository. In the Dev Environment, install the Serverless Framework using this command:

npm install -g serverless

and then initialize a serverless project in the source repository folder.
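One way to scaffold the project, assuming the Node.js Azure template (the template name is an assumption based on the handlers in this project), is:

serverless create --template azure-nodejs --path .

After scaffolding, the repository structure looks like this: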

├── README.md
├── host.json
├── package.json
├── serverless.yml
└── src
    └── handlers
        ├── goodbye.js
        └── hello.js
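
The generated serverless.yml wires each handler to an HTTP-triggered Azure Function. A sketch of what it might contain (the service name, region, runtime, and handler export names are assumptions; exact contents vary by template version):

service: azure-serverless-demo
provider:
  name: azure
  region: East US
  runtime: nodejs14.x
plugins:
  - serverless-azure-functions
functions:
  hello:
    handler: src/handlers/hello.sayHello
    events:
      - http: true
        methods:
          - GET
        authLevel: anonymous
  goodbye:
    handler: src/handlers/goodbye.sayGoodbye
    events:
      - http: true
        methods:
          - GET
        authLevel: anonymous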

We can push the code to the linked repository using git. With the code in CodeCatalyst, we can turn our focus to building the workflow using the CodeCatalyst console.

CI/CD Setup in CodeCatalyst

Configure access to the Azure Environment

We’ll use the GitHub action for Serverless to create and manage the Azure Function. For the action to access the Azure environment, it requires credentials associated with an Azure Service Principal, passed to the action as environment variables.

Service Principals in Azure are identified by the CLIENT_ID, CLIENT_SECRET, SUBSCRIPTION_ID, and TENANT_ID properties. Avoid storing these values in plaintext anywhere in your repository, because anyone with access to the repository can see them. Similarly, these values shouldn’t be hardcoded in workflow definitions, because workflow definitions are also visible as files in your repository. With CodeCatalyst, we can protect these values by storing them as secrets within the project and then referencing the secrets in the CI/CD workflow.

We can create a secret by choosing Secrets (1) under CI/CD and then selecting Create Secret (2), as shown in Figure 4. We can then enter the name and value for each of the identifiers described above.

Figure 4 – CodeCatalyst Secrets

Building the workflow

To create a new workflow, select CI/CD from the left navigation and then select Workflows (1). Then, select Create workflow (2), leave the default options, and select Create (3), as shown in Figure 5.

Figure 5 – Create CI/CD workflow

If the workflow editor opens in YAML mode, select Visual to open the visual designer. Now, we can start adding actions to the workflow.

Configure the Deploy action

We’ll begin by adding a GitHub action for deploying to Azure. Select “+ Actions” to open the actions list, choose GitHub from the dropdown menu, and then select “+” on the GitHub Actions action to add it to the workflow.

Next, configure the GitHub action from the configurations tab by adding the following snippet to the GitHub Actions YAML property:

- name: Deploy to Azure Functions
  uses: serverless/[email protected]
  with:
    args: -c "serverless plugin install --name serverless-azure-functions && serverless deploy"
    entrypoint: /bin/sh
  env:
    AZURE_SUBSCRIPTION_ID: ${Secrets.SUBSCRIPTION_ID}
    AZURE_TENANT_ID: ${Secrets.TENANT_ID}
    AZURE_CLIENT_ID: ${Secrets.CLIENT_ID}
    AZURE_CLIENT_SECRET: ${Secrets.CLIENT_SECRET}

The above workflow configuration uses the Serverless GitHub Action, which wraps the Serverless Framework to run serverless commands. The action is configured to install the serverless-azure-functions plugin and then package and deploy the source code to Azure Functions using the serverless deploy command.

Note how the secrets are passed to the GitHub action by referencing the secret identifiers in the above configuration.

Configure the Test action

Similar to the previous step, we add another GitHub action, which uses the Serverless Framework’s serverless invoke command to test the API deployed to Azure Functions.

- name: Test Function
  uses: serverless/[email protected]
  with:
    args: |
      -c "serverless plugin install --name serverless-azure-functions && \
          serverless invoke -f hello -d '{\"name\": \"CodeCatalyst\"}' && \
          serverless invoke -f goodbye -d '{\"name\": \"CodeCatalyst\"}'"
    entrypoint: /bin/sh
  env:
    AZURE_SUBSCRIPTION_ID: ${Secrets.SUBSCRIPTION_ID}
    AZURE_TENANT_ID: ${Secrets.TENANT_ID}
    AZURE_CLIENT_ID: ${Secrets.CLIENT_ID}
    AZURE_CLIENT_SECRET: ${Secrets.CLIENT_SECRET}

The workflow is now ready. It can be validated by choosing Validate and then saved to the repository by choosing Commit. The workflow kicks off automatically after the commit, and the application is deployed to Azure Functions.

The functionality of the API can now be verified from the logs of the workflow’s test action, as shown in Figure 6.

Figure 6 – CI/CD workflow Test action

Cleanup

If you have been following along with this walkthrough, you should delete the resources you deployed so you do not continue to incur charges. First, delete the Azure Function App (usually prefixed ‘sls’) using the Azure console. Second, delete the project from CodeCatalyst by navigating to Project settings and choosing Delete project; this step is optional, because there’s no cost associated with the CodeCatalyst project, and you can continue using it.

Conclusion

In summary, this post highlighted how Amazon CodeCatalyst can help organizations deploy cloud-native, serverless workloads into a multicloud environment. The post also walked through the solution in detail, showing how to set up Amazon CodeCatalyst to deploy a serverless application to Azure Functions by leveraging GitHub Actions. Although we showed an application deployment to Azure Functions, you can follow a similar process and leverage CodeCatalyst to deploy almost any type of application to almost any cloud platform. Learn more and get started with your Amazon CodeCatalyst journey!

We would love to hear your thoughts and experiences on deploying serverless applications to multiple cloud platforms. Reach out to us if you have any questions, or provide your feedback in the comments section.

About the Authors


Deepak Kovvuri

Deepak Kovvuri is a Senior Solutions Architect at AWS supporting enterprise customers in the US East area. He has over 6 years of experience helping customers architect a DevOps strategy for their cloud workloads. Deepak specializes in CI/CD, systems administration, infrastructure as code, and container services. He holds a Master’s in Computer Engineering from the University of Illinois at Chicago.


Amandeep Bajwa

Amandeep Bajwa is a Senior Solutions Architect at AWS supporting financial services enterprises. He helps organizations achieve their business outcomes by identifying the appropriate cloud transformation strategy based on industry trends and organizational priorities. Some of the areas Amandeep consults on are cloud migration, cloud strategy (including hybrid and multicloud), digital transformation, data and analytics, and technology in general.


Brian Beach

Brian Beach has over 20 years of experience as a Developer and Architect. He is currently a Principal Solutions Architect at Amazon Web Services. He holds a Computer Engineering degree from NYU Poly and an MBA from Rutgers Business School. He is the author of “Pro PowerShell for Amazon Web Services” from Apress. He is a regular author and has spoken at numerous events. Brian lives in North Carolina with his wife and three kids.


Pawan Shrivastava

Pawan Shrivastava is a Partner Solutions Architect at AWS in the WWPS team. He focuses on working with partners to provide technical guidance on AWS, collaborating with them to understand their technical requirements, and designing solutions to meet their specific needs. Pawan is passionate about DevOps, automation, and CI/CD pipelines. He enjoys watching MMA, playing cricket, and working out in the gym.

Python 3.11 runtime now available in AWS Lambda

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/python-3-11-runtime-now-available-in-aws-lambda/

This post is written by Ramesh Mathikumar, Senior DevOps Consultant and Francesco Vergona, Solutions Architect.

AWS Lambda now supports Python 3.11 as both a managed runtime and a container base image. Python 3.11 contains significant performance enhancements over Python 3.10. Features like reduced startup time, streamlined stack frames, and the CPython specializing adaptive interpreter help many workloads using Python 3.11 run faster and cheaper, thanks to Lambda’s per-millisecond billing model. With this release, Python developers can now take advantage of new features and improvements introduced in Python 3.11 when creating serverless applications on Lambda.

You can use Python 3.11 with Powertools for AWS Lambda (Python), a developer toolkit to implement serverless best practices and increase developer velocity. Powertools includes proven libraries to support common patterns such as observability, parameter store integration, idempotency, batch processing, feature flags, and more. Learn more about Powertools for AWS Lambda (Python) in the documentation.

You can also use Python 3.11 with Lambda@Edge, allowing you to customize low-latency content delivered through Amazon CloudFront.

Python is a popular language for building serverless applications. The Python 3.11 release includes both performance improvements and new language features. For customers who deploy their Lambda functions using container image, the base image for Python 3.11 also includes changes to make managing installed packages easier.

This blog post reviews these changes in turn, followed by an overview of how you can get started with Python 3.11 in Lambda.

Performance improvements

Optimizations to CPython introduced in Python 3.11 bring significant performance enhancements, making it an average of 25% faster than Python 3.10, based on Python community benchmark tests using the Python Performance Benchmark Suite.

This release focuses on two key areas:

  • Faster startup: Core modules essential for Python are now “frozen,” with statically allocated code objects, resulting in a 10–15% faster interpreter startup relative to Python 3.10.
  • Faster function execution: Improvements include streamlined frame creation, inlined Python function calls that reduce C stack usage, and a specializing adaptive interpreter, which specializes the interpreter for “hot code” (code that is executed multiple times), reducing overhead during execution.

These optimizations can improve performance by 10–60% depending on the workload. In the context of a Lambda function execution, this results in performance improvements for both “cold start” and “warm start” invocations.

In addition to faster CPython performance improvements, Python 3.11 also provides performance improvements across other areas. For example:

  • String formatting with printf-style % codes is now as fast as f-string expressions.
  • Integer division is around 20% faster on x86-64 for certain scenarios.
  • Operations like sum() and list resizing have seen notable speed enhancements.
  • Dictionaries save memory by not storing hash values when keys are Unicode objects.
  • Improvements to asyncio.DatagramProtocol introduce significantly faster large file transfers over UDP.
  • Math functions, statistics functions, and unicodedata.normalize() also benefit from substantial speed improvements.

Language features

Thanks to its simplicity, readability, and extensive community support, Python is a popular language for building serverless applications. The Python 3.11 release includes several new language features, including:

  • Variadic generics (PEP 646): Python 3.11 introduces TypeVarTuple, enabling parameterization with an arbitrary number of types.
  • Marking individual TypedDict items as required or not-required (PEP 655): The introduction of Required and NotRequired in TypedDict allows for explicit marking of individual item requirements, eliminating the need for inheritance.
  • Self type (PEP 673): The Self annotation simplifies the annotation of methods returning an instance of their class, similar to TypeVar in PEP 484.
  • Arbitrary literal string type (PEP 675): The LiteralString annotation allows a function parameter to accept any literal string type, including strings created from literals.
  • Data class transforms (PEP 681): The @dataclass_transform() decorator enables objects to utilize runtime transformations for dataclass-like functionalities.
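
The following short snippet (illustrative only) shows three of these typing features working together; it runs as-is on Python 3.11:

from typing import NotRequired, Required, Self, TypedDict

class Movie(TypedDict):
    title: Required[str]    # must always be present (PEP 655)
    year: NotRequired[int]  # may be omitted (PEP 655)

class QueryBuilder:
    def where(self, clause: str) -> Self:
        # Self (PEP 673): subclasses calling where() keep their own type
        self.clause = clause
        return self

movie: Movie = {"title": "Inception"}  # valid: year is NotRequired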

For the full list of Python 3.11 changes, see the Python 3.11 release notes.

Change in pre-installed modules location and search path

Previously, Lambda base container images for Python included the /var/runtime directory before the /var/lang/lib/python3.x directory in the search path. This meant that packages in /var/runtime were loaded in preference to packages installed via pip into /var/lang/lib/python3.x. Since the AWS SDK for Python (boto3/botocore) was pre-installed into /var/runtime, this made it harder for base container image customers to upgrade the SDK version.

With the Python 3.11 runtime, the AWS SDK and its dependencies are now pre-installed into the /var/lang/lib/python3.11 directory, and the search path has been modified so this directory has precedence over /var/runtime. This change means customers who build and deploy Lambda functions using the Python 3.11 base container image can now override the SDK simply by running pip install on a newer version. This change also enables pip to verify and track that the pre-installed SDK and its dependencies are compatible with any customer-installed packages.

This is the default sys.path before Python 3.11 (where X.Y is the Python major.minor version):

  • /var/task/: User Function
  • /opt/python/lib/pythonX.Y/site-packages/: User Layer
  • /opt/python/: User Layer
  • /var/runtime/: Pre-installed modules
  • /var/lang/lib/pythonX.Y/site-packages/: Default pip install location

Here is the default sys.path starting from Python 3.11:

  • /var/task/: User Function
  • /opt/python/lib/pythonX.Y/site-packages/: User Layer
  • /opt/python/: User Layer
  • /var/lang/lib/pythonX.Y/site-packages/: Pre-installed modules and default pip install location
  • /var/runtime/: No pre-installed modules
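
One way to confirm which copy of the SDK your function loads is to return the module path from a test function. The path in the comment below is what we would expect on the Python 3.11 runtime, not a guaranteed value:

import boto3

def handler(event, context):
    # On Python 3.11, this should resolve under /var/lang/lib/python3.11/site-packages
    # rather than /var/runtime
    return {"boto3_version": boto3.__version__, "boto3_path": boto3.__file__}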

Using Python 3.11 in Lambda

AWS Management Console

To use the Python 3.11 runtime to develop your Lambda functions, specify a runtime parameter value of Python 3.11 when creating or updating a function. The Python 3.11 version is now available in the Runtime dropdown on the Create function page.

Create function

To update an existing Lambda function to Python 3.11, navigate to the function in the Lambda console, then choose Edit in the Runtime settings panel. The new version of Python is available in the Runtime dropdown:

Edit function

AWS Lambda – Container Image

Change the Python base image version by modifying the FROM statement in your Dockerfile:

FROM public.ecr.aws/lambda/python:3.11
# Copy function code
COPY lambda_handler.py ${LAMBDA_TASK_ROOT}
# Set the handler (module.function) as the container command
CMD [ "lambda_handler.handler" ]

To learn more, refer to the usage tab on building functions as container images.

AWS Serverless Application Model (AWS SAM)

In AWS SAM, set the Runtime attribute to python3.11 to use this version.

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Simple Lambda Function
Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Description: My Python Lambda Function
      CodeUri: my_function/
      Handler: lambda_function.lambda_handler
      Runtime: python3.11

AWS SAM supports generating this template with Python 3.11 out of the box for new serverless applications using the sam init command. Refer to the AWS SAM documentation for more details.
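
For example, the following command (the application name is illustrative) scaffolds a Python 3.11 Hello World project:

sam init --name my-python-app --runtime python3.11 --dependency-manager pip --app-template hello-world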

AWS Cloud Development Kit (AWS CDK)

In the AWS CDK, set the runtime attribute to Runtime.PYTHON_3_11 to use this version. In Python:

from constructs import Construct
from aws_cdk import (
    App, Stack,
    aws_lambda as _lambda
)

class SampleLambdaStack(Stack):
    def __init__(self, scope: Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)

        base_lambda = _lambda.Function(
            self, 'SampleLambda',
            handler='lambda_handler.handler',
            runtime=_lambda.Runtime.PYTHON_3_11,
            code=_lambda.Code.from_asset('lambda')
        )

In TypeScript:

import * as cdk from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda'
import * as path from 'path';
import { Construct } from 'constructs';

export class CdkStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // The code that defines your stack goes here

    // The python3.11 enabled Lambda Function
    const lambdaFunction = new lambda.Function(this, 'python311LambdaFunction', {
      runtime: lambda.Runtime.PYTHON_3_11,
      memorySize: 512,
      code: lambda.Code.fromAsset(path.join(__dirname, '/../lambda')),
      handler: 'lambda_handler.handler'
    })
  }
}

Conclusion

You can build and deploy functions using Python 3.11 using the AWS Management Console, AWS CLI, AWS SDK, AWS SAM, AWS CDK, or your choice of Infrastructure as Code (IaC). You can also use the Python 3.11 container base image if you prefer to build and deploy your functions using container images.

We are excited to bring Python 3.11 runtime support to Lambda and empower developers to build more efficient, powerful, and scalable serverless applications. Try the Python 3.11 runtime in Lambda today to experience the benefits of this updated language version and take advantage of improved performance and new language features.

For more serverless learning resources, visit Serverless Land.

Introducing the vector engine for Amazon OpenSearch Serverless, now in preview

Post Syndicated from Pavani Baddepudi original https://aws.amazon.com/blogs/big-data/introducing-the-vector-engine-for-amazon-opensearch-serverless-now-in-preview/

We are pleased to announce the preview release of the vector engine for Amazon OpenSearch Serverless. The vector engine provides a simple, scalable, and high-performing similarity search capability in Amazon OpenSearch Serverless that makes it easy for you to build modern machine learning (ML) augmented search experiences and generative artificial intelligence (AI) applications without having to manage the underlying vector database infrastructure. This post summarizes the features and functionalities of our vector engine.

Using augmented ML search and generative AI with vector embeddings

Organizations across all verticals are rapidly adopting generative AI for its ability to handle vast datasets, generate automated content, and provide interactive, human-like responses. Customers are exploring ways to transform the end-user experience and interaction with their digital platform by integrating advanced conversational generative AI applications such as chatbots, question and answer systems, and personalized recommendations. These conversational applications enable you to search and query in natural language and generate responses that closely resemble human-like responses by accounting for the semantic meaning, user intent, and query context.

ML-augmented search applications and generative AI applications use vector embeddings, which are numerical representations of text, image, audio, and video data to generate dynamic and relevant content. The vector embeddings are trained on your private data and represent the semantic and contextual attributes of the information. Ideally, these embeddings can be stored and managed close to your domain-specific datasets, such as within your existing search engine or database. This enables you to process a user’s query to find the closest vectors and combine them with additional metadata without relying on external data sources or additional application code to integrate the results. Customers want a vector database option that is simple to build on and enables them to move quickly from prototyping to production so they can focus on creating differentiated applications. The vector engine for OpenSearch Serverless extends OpenSearch’s search capabilities by enabling you to store, search, and retrieve billions of vector embeddings in real time and perform accurate similarity matching and semantic searches without having to think about the underlying infrastructure.
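
To make similarity search concrete, the following minimal sketch (illustrative only; real embeddings typically have hundreds of dimensions and come from an embedding model) ranks documents by cosine similarity to a query vector:

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # 1.0 means the vectors point in the same direction (most similar)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.array([0.9, 0.1, 0.0])        # embedding of the user's query
docs = {
    "doc-a": np.array([0.8, 0.2, 0.1]),  # embeddings of stored documents
    "doc-b": np.array([0.0, 0.9, 0.4]),
}
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
print(ranked)  # ['doc-a', 'doc-b']: doc-a's embedding is closest to the query

The vector engine performs this kind of ranking at the scale of billions of vectors, using approximate kNN rather than the exhaustive comparison shown here.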

Exploring the vector engine’s capabilities

Built on OpenSearch Serverless, the vector engine inherits and benefits from its robust architecture. With the vector engine, you don’t have to worry about sizing, tuning, and scaling the backend infrastructure. The vector engine automatically adjusts resources by adapting to changing workload patterns and demand to provide consistently fast performance and scale. As the number of vectors grows from a few thousand during prototyping to hundreds of millions and beyond in production, the vector engine will scale seamlessly, without the need for reindexing or reloading your data to scale your infrastructure. Additionally, the vector engine has separate compute for indexing and search workloads, so you can seamlessly ingest, update, and delete vectors in real time while ensuring that the query performance your users experience remains unaffected. All the data is persisted in Amazon Simple Storage Service (Amazon S3), so you get the same data durability guarantees as Amazon S3 (eleven nines). Even though we are still in preview, the vector engine is designed for production workloads with redundancy for Availability Zone outages and infrastructure failures.

The vector engine for OpenSearch Serverless is powered by the k-nearest neighbor (kNN) search feature in the open-source OpenSearch Project, proven to deliver reliable and precise results. Many customers today use OpenSearch kNN search in managed clusters to offer semantic search and personalization in their applications. With the vector engine, you get the same functionality with the simplicity of a serverless environment. The vector engine supports popular distance metrics such as Euclidean, cosine similarity, and dot product, and can accommodate up to 16,000 dimensions, making it well-suited to support a wide range of foundational and other AI/ML models. You can also store diverse fields with various data types such as numeric, boolean, date, keyword, and geopoint for metadata, and text for descriptive information to add more context to the stored vectors. Colocating the data types reduces complexity and maintenance effort and avoids data duplication, version compatibility challenges, and licensing issues, effectively simplifying your application stack. Because the vector engine supports the same OpenSearch open-source suite APIs, you can take advantage of its rich query capabilities, such as full-text search, advanced filtering, aggregations, geospatial queries, and nested queries for faster retrieval of data and enhanced search results. For example, if your use case requires you to find results within 15 miles of the requestor, the vector engine can do this in a single query, eliminating the need to maintain two different systems and then combine the results through application logic. With its integrations with LangChain, Amazon Bedrock, and Amazon SageMaker, you can easily connect your preferred ML and AI systems to the vector engine.

The vector engine supports a wide range of use cases across various domains, including image search, document search, music retrieval, product recommendation, video search, location-based search, fraud detection, and anomaly detection. We also anticipate a growing trend for hybrid searches that combine lexical search methods with advanced ML and generative AI capabilities. For example, when a user searches for a “red shirt” on your e-commerce website, semantic search helps expand the scope by retrieving all shades of red, while preserving the tuning and boosting logic implemented on the lexical (BM25) search. With OpenSearch filtering, you can further enhance the relevance of your search results by providing users with options to refine their search based on size, brand, price range, and availability in nearby stores, allowing for a more personalized and precise experience. The hybrid search support in the vector engine enables you to query vector embeddings, metadata, and descriptive information within a single query call, making it easy to provide more accurate and contextually relevant search results without building complex application code.

You can get started in minutes with the vector engine by creating a specialized vector search collection under OpenSearch Serverless using the AWS Management Console, AWS Command Line Interface (AWS CLI), or the AWS software development kit (AWS SDK). Collections are a logical grouping of indexed data that works together to support a workload, while the physical resources are automatically managed in the backend. You don’t have to declare how much compute or storage is needed or monitor the system to make sure it’s running well. OpenSearch Serverless applies different sharding and indexing strategies for the three available collection types: time series, search, and vector search. The vector engine’s compute capacity used for data ingestion, search, and query is measured in OpenSearch Compute Units (OCUs). One OCU can handle 4 million vectors at 128 dimensions, or 500K vectors at 768 dimensions, at a 99% recall rate. The vector engine is built on OpenSearch Serverless, which is a highly available service and requires a minimum of 4 OCUs (two OCUs for ingest, including primary and standby, and two OCUs for search, with two active replicas across Availability Zones) for the first collection in an account. All subsequent collections using the same AWS Key Management Service (AWS KMS) key can share those OCUs.

Get started with vector embeddings

To get started using vector embeddings using the console, complete the following steps:

  1. Create a new collection on the OpenSearch Serverless console.
  2. Provide a name and optional description.
  3. Currently, vector embeddings are supported exclusively by vector search collections; therefore, for Collection type, select Vector search.
  4. Next, you must configure the security policies, which include encryption, network, and data access policies.

We are introducing the new Easy create option, which streamlines the security configuration for faster onboarding. All the data in the vector engine is encrypted in transit and at rest by default. You can choose to bring your own encryption key or use the one provided by the service that is dedicated for your collection or account. You can choose to host your collection on a public endpoint or within a VPC. The vector engine supports fine-grained AWS Identity and Access Management (IAM) permissions so that you can define who can create, update, and delete encryption, network, collections, and indexes, thereby enabling organizational alignment.
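
If you prefer scripting over the console, the AWS CLI can create the same type of collection (the collection name is illustrative); the encryption, network, and data access policies described above still need to be configured:

aws opensearchserverless create-collection --name property-listings --type VECTORSEARCH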

  5. With the security settings in place, you can finish creating the collection.

After the collection is successfully created, you can create the vector index. At this point, you can use the API or the console to create an index. An index is a collection of documents with a common data schema and provides a way for you to store, search, and retrieve your vector embeddings and other fields. The vector index supports up to 1,000 fields.

  6. To create the vector index, you must define the vector field name, dimensions, and the distance metric.

The vector index supports up to 16,000 dimensions and three types of distance metrics: Euclidean, cosine, and dot product.

Once you have successfully created the index, you can use OpenSearch’s powerful query capabilities to get comprehensive search results.

The following example shows how easily you can create a simple property-listing index with title, description, price, and location fields using the OpenSearch API. Using the query APIs, this index can efficiently return accurate results for search requests such as “Find me a two-bedroom apartment in Seattle that is under $3000.”
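
A sketch of such an index creation request (field names, dimension, and method parameters are illustrative; knn_vector is the OpenSearch k-NN field type) might look like this:

PUT /property-listings-index
{
  "settings": { "index.knn": true },
  "mappings": {
    "properties": {
      "title":       { "type": "text" },
      "description": { "type": "text" },
      "price":       { "type": "float" },
      "location":    { "type": "geo_point" },
      "description_vector": {
        "type": "knn_vector",
        "dimension": 768,
        "method": { "name": "hnsw", "space_type": "cosinesimil", "engine": "nmslib" }
      }
    }
  }
}

A query for the apartment example above would then combine a kNN clause on description_vector with range and geo filters on price and location in a single request.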

From preview to GA and beyond

Today, we are excited to announce the preview of the vector engine, making it available for you to begin testing it out immediately. As we noted earlier, OpenSearch Serverless was designed to provide a highly available service to power your enterprise applications, with independent compute resources for index and search and built-in redundancy.

We recognize that many of you are in the experimentation phase and would like a more economical option for dev-test. Prior to GA, we plan to offer two features that will enable us to reduce the cost of your first collection. The first is a new dev-test option that enables you to launch a collection with no active standby or replica, reducing the entry cost by 50%. The vector engine still provides durability guarantees because it persists all the data in Amazon S3. The second is to initially provision a 0.5 OCU footprint, which will scale up as needed to support your workload, further lowering costs if your initial workload is in the tens of thousands to low-hundreds of thousands of vectors (depending on the number of dimensions). Between these two features, we will reduce the minimum OCUs needed to power your first collection from 4 OCUs down to 1 OCU per hour.

We are also working on features that will allow us to achieve workload pause and resume capabilities in the coming months, which is particularly useful for the vector engine because many of these use cases don’t require continuous indexing of the data.

Lastly, we are diligently focused on optimizing the performance and memory usage of the vector graphs, including improvements to caching, merging, and more.

While we work on these cost reductions, we will be offering the first 1400 OCU-hours per month free on vector collections until the dev-test option is made available. This will enable you to test the vector engine preview for up to two weeks every month at no cost, based on your workload.

Summary

The vector engine for OpenSearch Serverless introduces a simple, scalable, and high-performing vector storage and search capability that makes it straightforward for you to quickly store and query billions of vector embeddings generated from a variety of ML models, such as those provided by Amazon Bedrock, with response times in milliseconds.

The preview release of vector engine for OpenSearch Serverless is now available in eight Regions globally: US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Europe (Frankfurt), and Europe (Ireland).

We are excited about the future ahead, and your feedback will play a vital role in guiding the progress of this product. We encourage you to try out the vector engine for OpenSearch Serverless and share your use cases, questions, and feedback in the comments section.

In the coming weeks, we will publish a series of posts to provide you with detailed guidance on how to integrate the vector engine with LangChain, Amazon Bedrock, and SageMaker. To learn more about the vector engine’s capabilities, refer to our Getting Started with Amazon OpenSearch Serverless documentation.


About the authors

Pavani Baddepudi is a Principal Product Manager for Search Services at AWS and the lead PM for OpenSearch Serverless. Her interests include distributed systems, networking, and security. When not working, she enjoys hiking and exploring new cuisines.

Carl Meadows is Director of Product Management at AWS and is responsible for Amazon Elasticsearch Service, OpenSearch, Open Distro for Elasticsearch, and Amazon CloudSearch. Carl has been with Amazon Elasticsearch Service since before it was launched in 2015. He has a long history of working in the enterprise software and cloud services spaces. When not working, Carl enjoys making and recording music.

Migrating AWS Lambda functions from the Go1.x runtime to the custom runtime on Amazon Linux 2

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/migrating-aws-lambda-functions-from-the-go1-x-runtime-to-the-custom-runtime-on-amazon-linux-2/

This post is written by Micah Walter, Senior Solutions Architect, Yanko Bolanos, Senior Solutions Architect, and Ramesh Mathikumar, Senior DevOps Consultant.

This blog post describes our plans to improve performance and streamline the user experience for customers writing AWS Lambda functions using Go.

Today, customers using Go with Lambda can either use the go1.x runtime, or use the provided.al2 runtime. Going forward, we plan to deprecate the go1.x runtime in line with the end-of-life of Amazon Linux 1, currently scheduled for December 31, 2023.

Customers using the go1.x runtime should migrate their functions to the provided.al2 runtime to continue to benefit from the latest runtime updates and security patches. Customers who deploy Go functions using container images who are currently using the go1.x base container image should similarly migrate to the provided.al2 base image.

Using the provided.al2 runtime offers several benefits over the go1.x runtime. First, it supports running Lambda functions on AWS Graviton2 processors, offering up to 34% better price-performance compared to functions running on x86_64 processors. Second, it offers a streamlined implementation with a smaller deployment package and faster function invoke path. Finally, this change aligns Go with other languages that also compile to native code such as Rust or C++, which also run on the provided.al2 runtime.

This migration does not require any code changes. The only changes relate to how you build your deployment package and configure your function. This blog post outlines the steps required to update your build scripts and tooling to use the provided.al2 runtime for your Go functions.

There is a difference in Lambda billing between the go1.x runtime and the provided.al2 runtime. With the go1.x runtime, Lambda does not bill for time spent during function initialization (cold start), whereas with the provided.al2 runtime Lambda includes function initialization time in the billed function duration. Since Go functions typically initialize very quickly, and since Lambda reduces the number of initializations by re-using function execution environments for multiple function invokes, in practice the difference in your Lambda bill should be very small.

Compiling for the provided.al2 runtime

In order to run a compiled Go application on Lambda, you must compile your code for Linux. While the go1.x runtime allows you to use any executable name, the provided.al2 runtime requires you to use bootstrap as the executable name. On macOS and Linux, here’s the simplest form of the build command:

GOARCH=amd64 GOOS=linux go build -o bootstrap main.go

This build command creates a Go binary file called bootstrap compatible with the x86_64 instruction set for Lambda. To compile for AWS Graviton processors, set GOARCH=arm64 in the preceding command.

The final step is to compress this binary into a ZIP file deployment package, ready to deploy to Lambda:

zip myFunction.zip bootstrap

For users compiling on Windows, Go supports compiling for Linux without using a Linux virtual machine or build container. However, Lambda uses POSIX file permissions, which must be set correctly. Lambda provides a helper tool which builds a deployment package that is valid for Lambda—see the Lambda documentation for details. Existing users of this tool should update to the latest version to make sure their build scripts are up-to-date.

Removing the RPC dependency

The go1.x runtime uses two processes within the Lambda execution environment to route requests to your handler function. The first process, which is included in the runtime, retrieves function invocation requests from the Lambda runtime API, and uses RPC to pass the invoke to the second process. This second process runs the executable which you deploy, and comprises the aws-lambda-go package and your function code. The aws-lambda-go package receives the RPC request and executes your function.

The following runtime architecture diagram for the go1.x runtime shows the runtime client process calling the runtime API to retrieve a function invocation and using RPC to call a separate process containing the function code.

Execution environment

Go functions deployed to the provided.al2 runtime use a simpler, single-process architecture. When building the Go executable, you include the same aws-lambda-go package as before. However, in this case the aws-lambda-go package acts as the runtime client, retrieving invocation requests from the runtime API directly.

The following runtime architecture diagram shows a Go function running on the provided.al2 runtime. A single process retrieves the function invocation from the runtime API and executes the function code.

Go running on provided.al2

Removing the additional process and RPC hop streamlines the function execution path, resulting in faster invokes. You can also remove the RPC component from the aws-lambda-go package, giving a smaller binary size and faster code loading during cold starts. To remove the RPC dependency, add the lambda.norpc tag to your build command:

GOARCH=amd64 GOOS=linux go build -tags lambda.norpc -o bootstrap main.go

Creating a new Lambda function

Once your deployment package is ready, you can create a new Lambda function with the provided.al2 runtime in the Lambda console:

Creating in the Lambda console

Migrating existing functions

If you have existing Lambda functions that use the go1.x runtime, you can migrate these functions by following these steps:

  1. Recompile your binary using the preceding commands, making sure to name your binary bootstrap.
  2. If you are using the same instruction set architecture, open the runtime settings and switch the runtime to “Provide your own bootstrap on Amazon Linux 2”.
  3. Upload the new version of your binary as a zip file.
    Edit runtime settings

Note: The handler value is not used by the provided.al2 runtime or the aws-lambda-go library, and may be set to any value. We recommend setting the value to bootstrap to help with migrating between go1.x and provided.al2.

To switch the instruction set architecture to Graviton (arm64), save your changes, and then re-open the runtime settings to make the architecture change.
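
If you manage functions with the AWS CLI, the same migration can be scripted. The following is a sketch (function and file names are assumptions):

# Recompile with the required executable name and repackage
GOARCH=amd64 GOOS=linux go build -tags lambda.norpc -o bootstrap main.go
zip myFunction.zip bootstrap

# Switch the runtime and handler, then upload the new package
aws lambda update-function-configuration --function-name my-function \
    --runtime provided.al2 --handler bootstrap
aws lambda update-function-code --function-name my-function \
    --zip-file fileb://myFunction.zip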

Migrating Go1.x Lambda container images

Lambda allows you to run your Go code as a container image. Customers that are using the go1.x base image for Lambda containers must migrate to the provided.al2 base image. The Lambda documentation includes instructions on how to build and deploy Go functions using the provided.al2 base image.

The following Dockerfile uses a two-stage build to avoid unnecessary layers and files in your final image. The first stage of the process builds the application. This stage installs Go, downloads the dependencies for the code, and compiles the binary. The second stage copies the executable to a new container without the dependencies of the build process.

  1. Create a Dockerfile in your project directory:
    FROM public.ecr.aws/lambda/provided:al2 as build
    # install compiler
    RUN yum install -y golang
    RUN go env -w GOPROXY=direct
    # cache dependencies
    ADD go.mod go.sum ./
    RUN go mod download
    # build
    ADD . .
    RUN go build -tags lambda.norpc -o /main
    # copy artifacts to a clean image
    FROM public.ecr.aws/lambda/provided:al2
    COPY --from=build /main /main
    ENTRYPOINT [ "/main" ]           
    
  2. Build your Docker image with the Docker build command:
    docker build -t hello-world .
  3. Authenticate the Docker CLI to your Amazon ECR registry:
    aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com 
  4. Tag and push your image to the Amazon ECR registry:
    docker tag hello-world:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/hello-world:latest
    
    docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/hello-world:latest

You can now create or update your Go Lambda function to use the new container image.

Changes to tooling

To migrate your functions from the go1.x runtime to the provided.al2 runtime, you must make configuration changes to your build scripts or CI/CD configurations. Here are some common examples.

Makefiles and build scripts

If you use Makefiles or custom build scripts to build Go functions, you must modify them to ensure the executable file is named bootstrap when deploying to the provided.al2 runtime.

Here is an example Makefile target which compiles the main.go file into an executable called bootstrap in the bin folder. It also creates a zip file, which you can deploy to Lambda using the console or via the AWS CLI.

build:
	GOARCH=arm64 GOOS=linux go build -tags lambda.norpc -o ./bin/bootstrap
	(cd bin && zip -FS bootstrap.zip bootstrap)

CloudFormation

If you deploy your Lambda functions using AWS CloudFormation templates, change the Handler and Runtime settings under Properties:

Resources:
  Function:
    Type: AWS::Serverless::Function
    Properties:
      Handler: bootstrap
      Runtime: provided.al2
      ... # Other required properties

AWS Serverless Application Model

If you use the AWS Serverless Application Model (AWS SAM) to build and deploy your Go functions, make the same changes to the Handler and Runtime settings as for CloudFormation. You must also add the BuildMethod:

Resources:
  HelloWorldFunction:
    Type: AWS::Serverless::Function
    Metadata:
      BuildMethod: go1.x 
    Properties:
      CodeUri: hello-world/ # folder where your main program resides
      Handler: bootstrap
      Runtime: provided.al2
      Architectures:
        - x86_64

Cloud Development Kit (CDK)

If you use the AWS Cloud Development Kit (AWS CDK), you can compile your Go executable and place it under a folder in your project. Next, specify the location by using awslambda.Code_FromAsset, and AWS CDK packages the binary into a zip file and uploads it.

// Go CDK
awslambda.NewFunction(stack, jsii.String("HelloHandler"), &awslambda.FunctionProps{
    Code:         awslambda.Code_FromAsset(jsii.String("lambda"), nil), //folder where bootstrap executable is located
    Runtime:      awslambda.Runtime_PROVIDED_AL2(),
    Handler:      jsii.String("bootstrap"), // Handler named bootstrap
    Architecture: awslambda.Architecture_ARM_64(),
})

Taking this further, AWS CDK can perform build commands as part of your AWS CDK build process by using the native AWS CDK bundling functionality. With the bundling parameter, AWS CDK can perform steps before staging the files in the cloud assembly. Instead of placing the binary file, place the Go code in a folder and use the Bundling option to compile the code in a Docker container.

This example uses the golang:1.20.1 Docker image. After compilation, AWS CDK creates a zip file with the binary and creates the Lambda function:

// Go CDK
awslambda.NewFunction(stack, jsii.String("HelloHandler"), &awslambda.FunctionProps{
        Code: awslambda.Code_FromAsset(jsii.String("go-lambda"), &awss3assets.AssetOptions{
                Bundling: &awscdk.BundlingOptions{
                        Image: awscdk.DockerImage_FromRegistry(jsii.String("golang:1.20.1")),
                        Command: &[]*string{
                                jsii.String("bash"),
                                jsii.String("-c"),
                                jsii.String("GOCACHE=/tmp go mod tidy && GOCACHE=/tmp GOARCH=arm64 GOOS=linux go build -tags lambda.norpc -o /asset-output/bootstrap"),
                        },
                },
        }),
        Runtime:      awslambda.Runtime_PROVIDED_AL2(),
        Handler:      jsii.String("bootstrap"),
        Architecture: awslambda.Architecture_ARM_64(),
})

Conclusion

Lambda is deprecating the go1.x runtime in line with Amazon Linux 1 end-of-life, scheduled for December 31, 2023. Customers using Go with Lambda should migrate their functions to the provided.al2 runtime. Benefits include support for AWS Graviton2 processors with better price-performance, and a streamlined invoke path with faster performance.

For more serverless learning resources, visit Serverless Land.

Alcion supports their multi-tenant platform with Amazon OpenSearch Serverless

Post Syndicated from Zack Rossman original https://aws.amazon.com/blogs/big-data/alcion-supports-their-multi-tenant-platform-with-amazon-opensearch-serverless/

This is a guest blog post co-written with Zack Rossman from Alcion.

Alcion, a security-first, AI-driven backup-as-a-service (BaaS) platform, helps Microsoft 365 administrators quickly and intuitively protect data from cyber threats and accidental data loss. In the event of data loss, Alcion customers need to search metadata for the backed-up items (files, emails, contacts, events, and so on) to select specific item versions to restore. Alcion uses Amazon OpenSearch Service to provide their customers with accurate, efficient, and reliable search capability across this backup catalog. The platform is multi-tenant, which means that Alcion requires data isolation and strong security to ensure that tenants can only search their own data.

OpenSearch Service is a fully managed service that makes it easy to deploy, scale, and operate OpenSearch in the AWS Cloud. OpenSearch is an Apache-2.0-licensed, open-source search and analytics suite, comprising OpenSearch (a search and analytics engine and vector database), OpenSearch Dashboards (a visualization and utility user interface), and plugins that provide advanced capabilities like enterprise-grade security, anomaly detection, observability, alerting, and much more. Amazon OpenSearch Serverless is a serverless deployment option that makes it simple to use OpenSearch without configuring, managing, and scaling OpenSearch Service domains.

In this post, we share how adopting OpenSearch Serverless enabled Alcion to meet their scale requirements, reduce their operational overhead, and secure their tenants’ data by enforcing tenant isolation within their multi-tenant environment.

OpenSearch Service managed domains

For the first iteration of their search architecture, Alcion chose the managed domains deployment option in OpenSearch Service and was able to launch their search functionality in production in less than a month. To meet their security, scale, and tenancy requirements, they stored data for each tenant in a dedicated index and used fine-grained access control in OpenSearch Service to prevent cross-tenant data leaks. As their workload evolved, Alcion engineers tracked OpenSearch domain utilization via the provided Amazon CloudWatch metrics, making changes to increase storage and optimize their compute resources.

The team at Alcion used several features of OpenSearch Service managed domains to improve their operational stance. They introduced index aliases, which provide a single alias name to access (read and write) multiple underlying indexes. They also configured Index State Management (ISM) policies to help them control their data lifecycle by rolling indexes over based on index size. Together, the ISM policies and index aliases were necessary to scale indexes for large tenants. Alcion also used index templates to define the shards per index (partitioning) of their data so as to automate their data lifecycle and improve the performance and stability of their domains.
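
For reference, an ISM rollover policy of the kind described here might look similar to the following sketch (the size threshold and index pattern are illustrative):

{
  "policy": {
    "description": "Roll over tenant indexes once they reach 30 GB",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [
          { "rollover": { "min_size": "30gb" } }
        ],
        "transitions": []
      }
    ],
    "ism_template": {
      "index_patterns": ["tenant-index-*"]
    }
  }
}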

The following architecture diagram shows how Alcion configured their OpenSearch managed domains.

The following diagram shows how Microsoft 365 data was indexed to and queried from tenant-specific indexes. Alcion implemented request authentication by providing the OpenSearch primary user credentials with each API request.

OpenSearch Serverless overview and tenancy model options

OpenSearch Service managed domains provided a stable foundation for Alcion’s search functionality, but the team needed to manually provision resources to the domains for their peak workload. This left room for cost optimizations because Alcion’s workload is bursty—there are large variations in the number of search and indexing transactions per second, both for a single customer and taken as a whole. To reduce costs and operational burden, the team turned to OpenSearch Serverless, which offers auto-scaling capability.

To use OpenSearch Serverless, the first step is to create a collection. A collection is a group of OpenSearch indexes that work together to support a specific workload or use case. The compute resources for a collection, called OpenSearch Compute Units (OCUs), are shared across all collections in an account that share an encryption key. The pool of OCUs is automatically scaled up and down to meet the demands of indexing and search traffic.

The level of effort required to migrate from an OpenSearch Service managed domain to OpenSearch Serverless was manageable thanks to the fact that OpenSearch Serverless collections support the same OpenSearch APIs and libraries as OpenSearch Service managed domains. This allowed Alcion to focus on optimizing the tenancy model for the new search architecture. Specifically, the team needed to decide how to partition tenant data within collections and indexes while balancing security and total cost of ownership. Alcion engineers, in collaboration with the OpenSearch Serverless team, considered three tenancy models:

  • Silo model: Create a collection for each tenant
  • Pool model: Create a single collection and use a single index for multiple tenants
  • Bridge model: Create a single collection and use a single index per tenant

All three design choices had benefits and trade-offs that had to be considered for designing the final solution.

Silo model: Create a collection for each tenant

In this model, Alcion would create a new collection whenever a new customer onboarded to their platform. Although tenant data would be cleanly separated between collections, this option was disqualified because the collection creation time meant that customers wouldn’t be able to back up and search data immediately after registration.

Pool model: Create a single collection and use a single index for multiple tenants

In this model, Alcion would create a single collection per AWS account and index tenant-specific data in one of many shared indexes belonging to that collection. Initially, pooling tenant data into shared indexes was attractive from a scale perspective because this led to the most efficient use of index resources. But after further analysis, Alcion found that they would be well within the per-collection index quota even if they allocated one index for each tenant. With that scalability concern resolved, Alcion pursued the third option because siloing tenant data into dedicated indexes results in stronger tenant isolation than the shared index model.

Bridge model: Create a single collection and use a single index per tenant

In this model, Alcion would create a single collection per AWS account and create an index for each of the hundreds of tenants managed by that account. By assigning each tenant to a dedicated index and pooling these indexes in a single collection, Alcion reduced onboarding time for new tenants and siloed tenant data into cleanly separated buckets.

Implementing role-based access control for supporting multi-tenancy

OpenSearch Serverless offers a multi-point, inheritable set of security controls, covering data access, network access, and encryption. Alcion took full advantage of OpenSearch Serverless data access policies to implement role-based access control (RBAC) for each tenant-specific index with the following details:

  • Allocate an index with a common prefix and the tenant ID (for example, index-v1-<tenantID>)
  • Create a dedicated AWS Identity and Access Management (IAM) role that is used to sign requests to the OpenSearch Serverless collection
  • Create an OpenSearch Serverless data access policy that grants document read/write permissions within a dedicated tenant index to the IAM role for that tenant
  • OpenSearch API requests to a tenant index are signed with temporary credentials belonging to the tenant-specific IAM role

The following is an example OpenSearch Serverless data access policy for a mock tenant with ID t-eca0acc1-12345678910. This policy grants the IAM role document read/write access to the dedicated tenant index.

[
    {
        "Rules": [
            {
                "Resource": [
                    "index/collection-searchable-entities/index-v1-t-eca0acc1-12345678910"
                ],
                "Permission": [
                    "aoss:ReadDocument",
                    "aoss:WriteDocument",
                ],
                "ResourceType": "index"
            }
        ],
        "Principal": [
            "arn:aws:iam::12345678910:role/OpenSearchAccess-t-eca0acc1-1b9f-4b3f-95d6-12345678910"
        ],
        "Description": "Allow document read/write access to OpenSearch index belonging to tenant t-eca0acc1-1b9f-4b3f-95d6-12345678910"
    }
] 

The following architecture diagram depicts how Alcion implemented indexing and searching for Microsoft 365 resources using the OpenSearch Serverless shared collection approach.

The following is a sample code snippet for sending a search request to an OpenSearch Serverless collection. Notice how the API client is initialized with a signer object that signs requests with the same IAM principal that is referenced in the OpenSearch Serverless data access policy from the previous code snippet.

package alcion

import (
    "context"
    "encoding/json"
    "strings"

    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/config"
    "github.com/aws/aws-sdk-go-v2/credentials/stscreds"
    "github.com/aws/aws-sdk-go-v2/service/sts"
    "github.com/opensearch-project/opensearch-go/v2"
    "github.com/opensearch-project/opensearch-go/v2/opensearchapi"
    "github.com/opensearch-project/opensearch-go/v2/signer"
    awssignerv2 "github.com/opensearch-project/opensearch-go/v2/signer/awsv2"
    "github.com/pkg/errors"
)

const (
    // Scope the API request to the AWS OpenSearch Serverless service
    aossService = "aoss"

    // Mock values
    indexPrefix        = "index-v1-"
    collectionEndpoint = "https://kfbr3928z4y6vot2mbpb.us-east-1.aoss.amazonaws.com"
    mockTenantID       = "t-eca0acc1-1b9f-4b3f-95d6-b0b96b8c03d0"
    roleARN            = "arn:aws:iam::1234567890:role/OpenSearchAccess-t-eca0acc1-1b9f-4b3f-95d6-b0b96b8c03d0"
)

// SearchTenantIndex runs a search against the tenant's dedicated index. The
// request is signed with the tenant-specific IAM role, so the data access
// policy only permits access to that tenant's index.
func SearchTenantIndex(ctx context.Context, tenantID string) (*opensearchapi.Response, error) {

    sig, err := createRequestSigner(ctx)
    if err != nil {
        return nil, errors.Wrapf(err, "error creating new signer for AWS OSS")
    }

    cfg := opensearch.Config{
        Addresses: []string{collectionEndpoint},
        Signer:    sig,
    }

    aossClient, err := opensearch.NewClient(cfg)
    if err != nil {
        return nil, errors.Wrapf(err, "error creating new OpenSearch API client")
    }

    body, err := getSearchBody()
    if err != nil {
        return nil, errors.Wrapf(err, "error getting search body")
    }

    req := opensearchapi.SearchRequest{
        Index: []string{indexPrefix + tenantID},
        Body:  body,
    }

    return req.Do(ctx, aossClient)
}

func createRequestSigner(ctx context.Context) (signer.Signer, error) {

    awsCfg, err := config.LoadDefaultConfig(ctx)
    if err != nil {
        return nil, errors.Wrapf(err, "error loading default config")
    }

    // Sign requests with temporary credentials for the tenant-specific role
    stsClient := sts.NewFromConfig(awsCfg)
    provider := stscreds.NewAssumeRoleProvider(stsClient, roleARN)

    awsCfg.Credentials = aws.NewCredentialsCache(provider)
    return awssignerv2.NewSignerWithService(awsCfg, aossService)
}

func getSearchBody() (*strings.Reader, error) {
    // No explicit query clause, so OpenSearch matches all documents; page size = 10
    query := map[string]interface{}{
        "size": 10,
    }

    queryJSON, err := json.Marshal(query)
    if err != nil {
        return nil, err
    }

    return strings.NewReader(string(queryJSON)), nil
}

Conclusion

In May of 2023, Alcion rolled out its search architecture based on the shared collection and dedicated index-per-tenant model in all production and pre-production environments. The team was able to tear out complex code and operational processes that had been dedicated to scaling OpenSearch Service managed domains. Furthermore, thanks to the auto scaling capabilities of OpenSearch Serverless, Alcion has reduced their OpenSearch costs by 30% and expects the cost profile to scale favorably.

In their journey from managed to serverless OpenSearch Service, Alcion benefited from their initial choice of OpenSearch Service managed domains: when migrating, they were able to reuse the same OpenSearch APIs and libraries for their OpenSearch Serverless collections that they had used for their managed domain. Additionally, they updated their tenancy model to take advantage of OpenSearch Serverless data access policies. With OpenSearch Serverless, they were able to effortlessly adapt to their customers’ scale needs while ensuring tenant isolation.

For more information about Alcion, visit their website.


About the Authors

Zack Rossman is a Member of Technical Staff at Alcion. He is the tech lead for the search and AI platforms. Prior to Alcion, Zack was a Senior Software Engineer at Okta, developing core workforce identity and access management products for the Directories team.

Niraj Jetly is a Software Development Manager for Amazon OpenSearch Serverless. Niraj leads several data plane teams responsible for launching Amazon OpenSearch Serverless. Prior to AWS, Niraj led several product and engineering teams as CTO, VP of Engineering, and Head of Product Management for over 15 years. Niraj is a recipient of over 15 innovation awards, including being named CIO of the year in 2014 and top 100 CIO in 2013 and 2016. A frequent speaker at several conferences, he has been quoted in NPR, WSJ, and The Boston Globe.

Jon Handler is a Senior Principal Solutions Architect at Amazon Web Services based in Palo Alto, CA. Jon works closely with OpenSearch and Amazon OpenSearch Service, providing help and guidance to a broad range of customers who have search and log analytics workloads that they want to move to the AWS Cloud. Prior to joining AWS, Jon’s career as a software developer included 4 years of coding a large-scale, ecommerce search engine. Jon holds a Bachelor of the Arts from the University of Pennsylvania and a Master of Science and a PhD in Computer Science and Artificial Intelligence from Northwestern University.

Configure monitoring, limits, and alarms in Amazon Redshift Serverless to keep costs predictable

Post Syndicated from Satesh Sonti original https://aws.amazon.com/blogs/big-data/configure-monitoring-limits-and-alarms-in-amazon-redshift-serverless-to-keep-costs-predictable/

Amazon Redshift Serverless makes it simple to run and scale analytics in seconds. It automatically provisions and intelligently scales data warehouse compute capacity to deliver fast performance, and you pay only for what you use. Just load your data and start querying right away in the Amazon Redshift Query Editor or in your favorite business intelligence (BI) tool. Redshift Serverless measures data warehouse capacity in Redshift Processing Units (RPUs), and you can configure the base capacity anywhere from 8 to 512 RPUs. You can start with your preferred RPU capacity or the default and adjust it anytime later.

In this post, we share how you can monitor your workloads running on Redshift Serverless through three approaches: the Redshift Serverless console, Amazon CloudWatch, and system views. We also show how to set up guardrails via alerts and limits for Redshift Serverless to keep your costs predictable.

Method 1: Monitor through the Redshift Serverless console

You can view all user queries, including Data Manipulation Language (DML), Data Definition Language (DDL), and Data Control Language (DCL) statements, through the Redshift Serverless console. You can view the RPU consumption to run these workloads on the same page, and apply filters based on time, database, users, and type of queries.

Prerequisites for monitoring access

A superuser has access to monitor all workloads and resource consumption by default. If other users need monitoring access through the Redshift Serverless console, then the superuser can provide necessary access by performing the following steps:

  1. Create a policy with necessary privileges and assign this policy to required users or roles.
  2. Grant query monitoring permission to the user or role (see the sketch below).

For more information, refer to Granting access to monitor queries.
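
As a sketch of step 2, the grant can also be issued through the Amazon Redshift Data API using the system-defined sys:monitor role; the workgroup, database, and user names below are placeholders:

import boto3

client = boto3.client("redshift-data")

# Grant query monitoring permission to a regular user (placeholder names)
client.execute_statement(
    WorkgroupName="my-serverless-workgroup",
    Database="dev",
    Sql="GRANT ROLE sys:monitor TO analyst_user;",
)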

Query monitoring

In this section, we walk through the Redshift Serverless console to see query history, database performance, and resource usage. We also go through monitoring options and how to set filters to narrow down results using filter attributes.

  1. On the Redshift Serverless console, under Monitoring in the navigation pane, choose Query and database monitoring.
  2. Open the workgroup you want to monitor.
  3. In the Metric filters section, expand Additional filtering options.
  4. You can set filters for time range, aggregation time interval, database, query category, SQL, and users.

Query and database monitoring

Two tabs are available, Query history and Database performance. Use the Query history tab to obtain details at a per-query level, and the Database performance tab to review performance aggregated across queries. Both tabs are filtered based on the selections you made.

Under Query history, you will see the Query runtime graph. Use this graph to look into query concurrency (queries that are running in the same time frame). You can choose a query to view more query run details, for example, queries that took longer to run than you expected.

Query runtime monitoring dashboard

In the Queries and loads section, you can see all queries by default, but you can also filter by status to view completed, running, and failed queries.

Query history screen

Navigate to the Database Performance tab in the Query and database monitoring section to view the following:

  • Queries completed per second – Average number of queries completed per second
  • Queries duration – Average amount of time to complete a query
  • Database connections – Number of active database connections
  • Running and Queued queries – Total number of running and queued queries at a given point in time

Resource monitoring

To monitor your resources, complete the following steps:

  1. On the Redshift Serverless console, choose Resource monitoring under Monitoring in the navigation pane.

The default workgroup will be selected by default, but you can choose the workgroup you would like to monitor.

  2. In the Metric filters section, expand Additional filtering options.
  3. Choose a 1-minute time interval (for example) and review the results.

You can also try different ranges to see the results.

Screen to apply metric filters

On the RPU capacity used graph, you can see how Redshift Serverless is able to scale RPUs in a matter of minutes. This gives a visual representation of peaks and lows in your consumption over your chosen period of time.

RPU capacity consumption

You also see the actual compute usage in terms of RPU-seconds for the workload you ran.
RPU Seconds consumed

Method 2: Monitor metrics in CloudWatch

Redshift Serverless publishes serverless endpoint performance metrics to CloudWatch. The Amazon Redshift CloudWatch metrics are data points for operational monitoring. These metrics enable you to monitor performance of your serverless workgroups (compute) and usage of namespaces (data). CloudWatch allows you to centrally monitor your serverless endpoints in a single AWS account, as well as across accounts and Regions.

  • On the CloudWatch console, under Metrics in the navigation pane, choose All metrics.
  • On the Browse tab, choose AWS/Redshift-Serverless to get to a collection of metrics for Redshift Serverless usage.

Redshift Serverless in Amazon CloudWatch

  • Choose Workgroup to view workgroup-related metrics.

Workgroups and Namespaces

From the list, you can select your particular workgroup and the metrics available (in this example, ComputeSeconds and ComputeCapacity). You should see the graph update and chart your data.

Redshift Serverless Workgroup Metrics

  • To name the graph, choose the pencil icon next to the graph title and enter a graph name (for example, dataanalytics-serverless), then choose Apply.

Rename CloudWatch Graph

  • On the Browse tab, choose AWS/Redshift-Serverless and choose Namespace this time.
  • Select the namespace you want to monitor and the metrics of interest.

Redshift Serverless Namespace Metrics

You can add additional metrics to your graph. To centralize monitoring, you can add these metrics to an existing CloudWatch dashboard or a new dashboard.

  • On the Actions menu, choose Add to dashboard.

Redshift Serverless Namespace Metrics

Method 3: Granular monitoring using system views

System views in Redshift Serverless are used to monitor workload performance and RPU usage at a granular level over a period of time. These query monitoring system views have been simplified to include monitoring for DDL, DML, COPY, and UNLOAD queries. For a complete list of system views and their uses, refer to Monitoring views.

SQL Notebook

You can download a SQL notebook containing the most commonly used system view queries. These queries help answer the most frequently asked monitoring questions, listed below.

  • How to monitor queries based on status?
  • How to monitor specific query elapsed time breakdown details?
  • How to monitor workload breakdown by query count, and percentile run time?
  • How to monitor detailed steps involved in query execution?
  • How to monitor Redshift Serverless usage cost by day?
  • How to monitor data loads (copy commands)?
  • How to monitor number of sessions, and connections?

You can import this notebook into Query Editor V2.0 and run the queries after connecting to the Redshift Serverless workgroup you would like to monitor.
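
If you prefer to run these system view queries outside the notebook, you can use the Amazon Redshift Data API. The following is a minimal sketch, assuming placeholder workgroup and database names:

import time

import boto3

client = boto3.client("redshift-data")

# List the 20 most recent queries with their status and elapsed time
resp = client.execute_statement(
    WorkgroupName="my-serverless-workgroup",
    Database="dev",
    Sql="SELECT query_id, status, elapsed_time FROM sys_query_history "
        "ORDER BY start_time DESC LIMIT 20;",
)

# The Data API is asynchronous: wait until the statement finishes
# (a FAILED statement raises an error when fetching the result)
while client.describe_statement(Id=resp["Id"])["Status"] not in ("FINISHED", "FAILED"):
    time.sleep(1)

for row in client.get_statement_result(Id=resp["Id"])["Records"]:
    print(row)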

Set limits to control costs

When you create your serverless endpoint, the base capacity defaults to 128 RPUs. However, you can change it at creation time or later via the Redshift Serverless console.

  1. On the details page of your serverless workgroup, choose the Limits tab.
  2. In the Base capacity section, choose Edit.
  3. You can specify Base capacity from 8–512 RPUs, in increments of 8.

Each RPU provides 16 GB of memory, so the lowest base of 8 RPUs gives you compute with 128 GB of memory, and the highest base of 512 RPUs gives you compute with 8 TB of memory.

Edit base RPU capacity

Usage limits

To configure usage capacity limits to limit your overall Redshift Serverless bill, complete the following steps:

  1. In the Usage limits section, choose Manage usage limits.
  2. To control RPU usage, set the maximum RPU-hours by frequency. You can set Frequency to Daily, Weekly, or Monthly.
  3. For Usage limit (RPU hours), enter your preferred value.
  4. For Action, choose Alert, Log to system table, or Turn off user queries.

Set RPU usage limit

Optionally, you can select an existing Amazon Simple Notification Service (Amazon SNS) topic or create a new SNS topic, and subscribe via email to this SNS topic to be notified when usage limits have been met.
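
These limits can also be set programmatically. The following is a minimal Boto3 sketch, assuming a placeholder workgroup ARN:

import boto3

client = boto3.client("redshift-serverless")

# Cap compute at 500 RPU-hours per month and log when the limit is reached
client.create_usage_limit(
    resourceArn="arn:aws:redshift-serverless:us-east-1:123456789012:workgroup/my-workgroup-id",
    usageType="serverless-compute",
    amount=500,
    period="monthly",
    breachAction="log",  # alternatives: "emit-metric", "deactivate"
)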

Query monitoring rules for Redshift Serverless

To prevent wasteful resource utilization and runaway costs caused by poorly written queries, you can implement query monitoring rules via query limits on your Redshift Serverless workgroup. For more information, refer to WLM query monitoring rules. The query monitoring rules in Redshift Serverless stop queries that exceed the limit defined in the rule. To receive notifications and automate notifications on Slack, refer to Automate notifications on Slack for Amazon Redshift query monitoring rule violations.

To set up query limits, complete the following steps:

  1. On the Redshift Serverless console, choose Workgroup configuration in the navigation pane.
  2. Choose a workgroup to monitor.
  3. On the workgroup details page, under Query monitoring rules, choose Manage query limits.

You can add up to 10 query monitoring rules to each serverless workgroup.

Set query limits

The serverless workgroup will go to a Modifying state each time you add or remove a limit.

Let’s take an example where you have to create a serverless workgroup for your dashboards. You know that dashboard queries typically complete in under a minute. If any dashboard query takes more than a minute, it could indicate a poorly written query or one that hasn’t been tested well and has incorrectly been released to production.

For this use case, we set a rule with Limit type as Query execution time and Limit (seconds) as 60.

Set required limit

The following screenshot shows the Redshift Serverless metrics available for setting up query monitoring rules.

Query Monitoring Metrics on CloudWatch

Configure alarms

Alarms are very useful because they enable you to make proactive decisions about your Redshift Serverless endpoint. Any usage limits that you set up will automatically show as alarms on the Redshift Serverless console, and are created as CloudWatch alarms.

Additionally, you can set up one or more CloudWatch alarms on any of the metrics listed in Amazon Redshift Serverless metrics.

For example, setting an alarm for DataStorage over a threshold value would keep track of the storage space that your serverless namespace is using for your data.
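
The following is a minimal Boto3 sketch of such an alarm. The NamespaceName dimension and the threshold's unit are assumptions based on the AWS/Redshift-Serverless namespace, and the SNS topic ARN is a placeholder:

import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm on the namespace-level DataStorage metric
cloudwatch.put_metric_alarm(
    AlarmName="redshift-serverless-data-storage-high",
    Namespace="AWS/Redshift-Serverless",
    MetricName="DataStorage",
    Dimensions=[{"Name": "NamespaceName", "Value": "my-namespace"}],  # dimension name assumed
    Statistic="Average",
    Period=300,
    EvaluationPeriods=1,
    Threshold=1_000_000,  # threshold in the metric's native unit (assumption)
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:redshift-alerts"],
)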

To create an alarm for your Redshift Serverless instance, complete the following steps:

  1. On the Redshift Serverless console, under Monitoring in the navigation pane, choose Alarms.
  2. Choose Create alarm.

Set Alarms from console

  3. Choose your level of metrics to monitor:
    • Workgroup
    • Namespace
    • Snapshot storage

If we select Workgroup, we can choose from the workgroup-level metrics shown in the following screenshot.

Workgroup Level Metrics

The following screenshot shows how we can set up alarms at the namespace level along with various metrics that are available to use.

Namespace Level Metrics

The following screenshot shows the metrics available at the snapshot storage level.

Snapshot level metrics

If you are starting out, begin with the most commonly used metrics listed below. Also consider creating a billing alarm to monitor your estimated AWS charges.

  • ComputeSeconds
  • ComputeCapacity
  • DatabaseConnections
  • EstimatedCharges
  • DataStorage
  • QueriesFailed

Notifications

After you define your alarm, provide a name and a description, and choose to enable notifications.

Amazon Redshift uses an SNS topic to send alarm notifications. For instructions to create an SNS topic, refer to Creating an Amazon SNS topic. You must subscribe to the topic to receive the messages published to it. For instructions, refer to Subscribing to an Amazon SNS topic.

You can also monitor event notifications to stay aware of changes in your Redshift Serverless data warehouse. Refer to Amazon Redshift Serverless event notifications with Amazon EventBridge for further details.

Clean up

To clean up your resources, delete the workgroup and namespace you used for trying the monitoring approaches discussed in this post.

Cleanup

Conclusion

In this post, we covered how to perform monitoring activities on Redshift Serverless through the Redshift Serverless console, system views, and CloudWatch, and how to keep costs predictable. Try the monitoring approaches discussed in this post and let us know your feedback in the comments.


About the Authors

Satesh Sonti is a Sr. Analytics Specialist Solutions Architect based out of Atlanta, specialized in building enterprise data platforms, data warehousing, and analytics solutions. He has over 17 years of experience in building data assets and leading complex data platform programs for banking and insurance clients across the globe.

Harshida Patel is a Specialist Principal Solutions Architect, Analytics with AWS.

Raghu Kuppala is an Analytics Specialist Solutions Architect experienced working in the databases, data warehousing, and analytics space. Outside of work, he enjoys trying different cuisines and spending time with his family and friends.

Ashish Agrawal is a Sr. Technical Product Manager with Amazon Redshift, building cloud-based data warehouses and analytics cloud services. Ashish has over 24 years of experience in IT. Ashish has expertise in data warehouses, data lakes, and platform as a service. Ashish has been a speaker at worldwide technical conferences.

Enable data analytics with Talend and Amazon Redshift Serverless

Post Syndicated from Tamara Astakhova original https://aws.amazon.com/blogs/big-data/enable-data-analytics-with-talend-and-amazon-redshift-serverless/

This is a guest post co-written with Cameron Davie from Talend.

Today, in order to accelerate and scale data analytics, companies are looking for an approach to minimize infrastructure management and predict computing needs for different types of workloads, including spikes and ad hoc analytics.

The integration of Talend Cloud and Talend Stitch with Amazon Redshift Serverless can help you achieve successful business outcomes without data warehouse infrastructure management.

In this post, we demonstrate how Talend easily integrates with Redshift Serverless to help you accelerate and scale data analytics with trusted data.

About Redshift Serverless

Redshift Serverless makes it simple to run and scale analytics without having to manage your data warehouse infrastructure. Data scientists, developers, and data analysts can access meaningful insights and build data-driven applications with zero maintenance. Redshift Serverless automatically provisions and intelligently scales data warehouse capacity to deliver fast performance for even the most demanding and unpredictable workloads, and you pay only for what you use. You can load your data and start querying in your favorite business intelligence (BI) tools, build machine learning (ML) models in SQL, or combine your data with third-party data for new insights because Redshift Serverless seamlessly integrates with your data landscape. Existing Amazon Redshift customers can migrate their Redshift clusters to Redshift Serverless using the Amazon Redshift console or API without making changes to their applications and have the advantage of using this capability.

About Talend

Talend is an AWS ISV Partner with the Amazon Redshift Ready Product designation and AWS Competencies in both Data and Analytics and Migration. Talend Cloud combines data integration, data integrity, and data governance in a single, unified platform that makes it easy to collect, transform, clean, govern, and share your data. Talend Stitch is a fully managed, scalable service that helps replicate data into your cloud data warehouse so you can quickly access analytics and make better, faster decisions.

Solution overview

The integration of Talend with Amazon Redshift adds new features and capabilities. As of this writing, Talend has 14 distinct native connectivity and configuration components for Amazon Redshift, which are fully documented in the Talend Help Center.

From the Talend Studio interface, there are no differences or changes required to support or access a Redshift Serverless instance or provisioned cluster.

In the following sections, we detail the steps to integrate the Talend Studio interface with Redshift Serverless.

Prerequisites

To complete the integration, you need a Redshift Serverless data warehouse. For setup instructions, see the Getting Started Guide. You also need a Talend Cloud account and Talend Studio. For setup instructions, see the Talend Cloud installation guide.

Integrate Talend Studio with Redshift Serverless

In the Talend Studio interface, you first create and establish a connection to Redshift Serverless. Then you add an output component to standard loading from your desired source into your Redshift Serverless data warehouse, using the established connection. The alternative step is to use a bulk loading component to load large amounts of data directly to your Redshift Serverless data warehouse, using the tRedshiftBulkExec component. Complete the following steps:

  1. Configure a tRedshiftConnection component to connect to Redshift Serverless:
    • For Database, choose Amazon Redshift.
    • Leave the values for Property Type and Driver version as default.
    • For Host, enter the Redshift Serverless endpoint’s host URL.
    • For Port, enter 5439.
    • For Database, enter your database name.
    • For Schema, enter your preferred schema.
    • For Username and Password, enter your user name and password, respectively.

Follow security best practices by using a strong password policy and regular password rotation to reduce the risk of password-based attacks or exploits.

For more information on how to connect to a database, refer to tDBConnection.
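
Before wiring the connection into a job, you can sanity-check the same parameters outside Talend. The following is a minimal sketch using the redshift_connector Python driver, with a placeholder endpoint and credentials:

import redshift_connector

# Same parameters you enter in the tRedshiftConnection component
conn = redshift_connector.connect(
    host="default.123456789012.us-east-1.redshift-serverless.amazonaws.com",
    database="dev",
    port=5439,
    user="admin",
    password="<your-password>",
)

cursor = conn.cursor()
cursor.execute("SELECT 1;")
print(cursor.fetchone())  # a (1,) result confirms connectivity
conn.close()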

After you create the connection object, you can add an output component to your Talend Studio job. The output component defines that the data being processed in the job’s workflow will land in Redshift Serverless. The following examples show standard output and bulk loading output.

  2. Add a tRedshiftOutput database component.

tRedshiftOutput database component

  3. Configure the tRedshiftOutput database component to write, update, or make changes to the connected Redshift Serverless data warehouse.
  4. When using the tRedshiftOutput component, select Use an existing connection and choose the connection you created.

This step makes sure that this component is pre-configured.

tDBOutput component

For more information on how to set up a tDBOutput component, see tDBOutput.

  5. Alternatively, you can configure a tRedshiftBulkExec database component to run the insert operations on the connected Redshift Serverless data warehouse.

Using the tRedshiftBulkExec database component allows you to mass load data files directly from Amazon Simple Storage Service (Amazon S3) into Redshift Serverless as tables. The following screenshot illustrates that Talend is able to use connection information in a job across multiple components, saving time and effort when establishing connections to both Amazon Redshift and Amazon S3.

  6. When using the tRedshiftBulkExec component, select Use an existing connection for Database settings and choose the connection you created.

This makes sure that this component is preconfigured.

  7. For S3 Setting, select Use an existing S3 connection and choose the S3 connection that you configure separately.

tDBBulkExec component

For more information on how to set up a tDBBulkExec component, see tDBBulkExec.

In addition to Talend Cloud for enterprise-level data transformation needs, you can also use Talend Stitch to handle data ingestion and data replication to Redshift Serverless. All configuration for ingesting or replicating data from your desired sources to Redshift Serverless is done in a single input screen.

  1. Provide the following parameters:
    • For Display Name, enter your preferred display name for this connection.
    • For Description, enter a description of the connection. This is optional.
    • For Host, enter the Redshift Serverless endpoint’s host URL.
    • For Port, enter 5439.
    • For Database, enter your database name.
    • For Username and Password, enter your user name and password, respectively.

All support documents and information (including diagrams, steps, and screenshots) can be found in the Talend Cloud and Talend Stitch documentation.

Summary

In this post, we demonstrated how the integration of Talend with Redshift Serverless helps you quickly integrate multiple data sources into a fully managed, secure platform and immediately enable business-wide analytics.

Check out AWS Marketplace and sign up for a free trial with Talend. For more information about Redshift Serverless, refer to the Getting Started Guide.


About the Authors

Tamara Astakhova is a Sr. Partner Solutions Architect in Data and Analytics at AWS. She has over 18 years of experience in the architecture and development of large-scale data analytics systems. Tamara is working with strategic partners helping them build complex AWS-optimized architectures.

Cameron Davie is a Principal Solutions Engineer for the Tech Alliances team. He oversees the technical responsibilities of Talend’s most strategic ISV partnerships. Cameron has been with Talend for 6 years in this role, working directly as the primary technical resource for partners such as AWS, Snowflake, and more. Cameron’s role at Talend is primarily focused on technical enablement and evangelism. This includes showcasing key capabilities of our partners’ solution internally as well as demonstrating Talend’s core technical capabilities with the technical sellers at Talend’s strategic ISV partners. Cameron is a veteran of ISV partnerships and enterprise software, with over 23 years of experience. Before Talend, he spent 14 years at SAP on their OEM/Embedded Solutions partnership team.

Maneesh Sharma is a Senior Database Engineer at AWS with more than a decade of experience designing and implementing large-scale data warehouse and analytics solutions. He collaborates with various Amazon Redshift Partners and customers to drive better integration.

Implementing patterns that exit early out of a parallel state in AWS Step Functions

Post Syndicated from Benjamin Smith original https://aws.amazon.com/blogs/compute/implementing-patterns-that-exit-early-out-of-a-parallel-state-in-aws-step-functions/

This post is written by Madhav Vishnubhatta, Senior Technical Account Manager, Enterprise Support.

This blog post explains how to implement patterns in AWS Step Functions that break out of a parallel state as soon as a minimum requirement is met. A parallel state normally completes only when all the parallel flows inside it have completed. If you do not want to wait for all of the parallel flows to finish before moving to the next step, this post provides patterns to implement that behavior.

You can use AWS Step Functions to set up visual workflows that orchestrate and coordinate multiple AWS services into a serverless application. This allows you to build complex, stateful, and scalable applications without managing the underlying infrastructure. In Step Functions, the individual steps are called states.

Step Functions offers multiple types of states. Some states help control the logic of the workflow. For example, the choice state enables conditional logic to control the flow to any one of the multiple possible next states, depending on the conditions defined in the state. The parallel state helps control the logic, but rather than choose one of multiple next states (as the choice state does), the parallel state allows all the branches to run as parallel flows concurrently. When all the parallel flows are complete, control moves on to the Parallel state’s next state.

Patterns that do not need to wait for all parallel flows to finish

Consider a scenario where the Step Functions workflow represents the process of an employee requesting a laptop in your organization. The process begins with a request from the employee as the first step, but the approval of this request could come from either of two IT managers.

In this case there could be two parallel flows, each waiting for an approval from one IT manager. But, as soon as one person provides approval, the workflow can move forward to the next step of actually issuing a laptop to the employee. This is an “either-or” pattern.

Consider a similar use-case but with a slightly different requirement. Instead of just one person’s approval being enough to issue a laptop, what if approval is needed from a minimum of two out of three IT managers before the laptop is issued. This is the “quorum” pattern.

The parallel state does not directly support these two patterns because the state waits for all the flows to complete. In this case, that means all the managers must provide an approval before a laptop can be issued.

Solution overview

Step Functions provides an error handling mechanism with the fail state, which can be used to fail the workflow with an error. This error can be caught downstream in the workflow and handled as needed. Both the either-or and the quorum patterns can be implemented with this fail state along with the error handling capability.

In the either-or case, as soon as either parallel flow finishes, the fail state can throw an error, which is caught outside the parallel state for further processing. Even though it is a fail state, it might not represent an error scenario in your use case.

The quorum pattern needs an additional mechanism to store the status of each parallel flow, using an Amazon DynamoDB table. The quorum pattern creates an item in the DynamoDB table at the beginning of the workflow that is updated by each parallel flow as soon as it has completed. Each parallel flow checks the DynamoDB table to look at the number of processes that have completed and compare it against the quorum. If the quorum is met, that flow raises an error with a fail state that can be caught outside the parallel step.
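
The following is a minimal Python sketch of that per-flow update and quorum check. The table name comes from the pattern's deployment; the key schema, attribute name, and quorum size are assumptions for illustration:

import boto3

QUORUM = 2  # two of three approvals needed (assumption)
table = boto3.resource("dynamodb").Table("QuorumWorkflowTable")

def record_completion(execution_id: str, flow_name: str) -> bool:
    """Mark one flow as complete and report whether the quorum is met."""
    response = table.update_item(
        Key={"executionId": execution_id},          # key name is an assumption
        UpdateExpression="ADD completedFlows :f",   # string set of finished flows
        ExpressionAttributeValues={":f": {flow_name}},
        ReturnValues="ALL_NEW",
    )
    return len(response["Attributes"]["completedFlows"]) >= QUORUM

In the deployed workflow, the equivalent update and check can be expressed with Step Functions' native DynamoDB integrations followed by a choice state.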

Prerequisites

Both of these patterns are published on Serverless Land:

To deploy and use these patterns, you need:

  1. An AWS Account
  2. Access to log in as a user or assume a role that can deploy the required resources.
  3. Familiarity with AWS Serverless Application Model (AWS SAM).
  4. AWS SAM Command Line Interface installed.

Example walkthrough

Either-or pattern

To deploy the Either-or pattern, follow the Deployment Instructions section in the GitHub repo. This deployment creates the following resources:

  1. A Step Functions workflow.
  2. An IAM role that is assumed by the Step Functions workflow during execution.

Navigate to the AWS CloudFormation page in the AWS Management Console and choose the stack with the name provided during deployment. Choose the State Machine resource in the Resources section of the CloudFormation stack to go to the Step Functions console. Choose Edit and then choose WorkflowStudio to see a visual representation of the workflow.

You can see the exported workflow in the GitHub repo. This is the logic of the workflow:

Either-or pattern. Conceptual flow.

  1. There are three (numbered) parallel flows in this workflow.
  2. Flows #1 and #2 are the main parallel flows; when either one completes, control should move to outside the parallel state.
  3. Flow #3 is the timeout flow, so that the workflow can exit after a set amount of time if neither of the other two parallel flows completes by then.
  4. Each of the two main parallel flows follows this logic:
    • Wait for the process to complete. This is a filler and can be replaced with your business logic on how to monitor process completion. This could be a human approval, or any other job that needs to finish.
    • Once the process is complete, throw a dummy error, which moves control to outside the parallel state.
  5. The dummy errors for the two flows are caught outside the parallel state with corresponding catch conditions.
  6. The errors from the two flows need not be caught separately. You might perform the same action no matter which parallel flow finished, but separate steps are shown here in case you need to do something different based on which flow finished.

To test the workflow, follow the instructions provided in the Testing section of the README file at the GitHub repo.

To clean up the resources created, run:

sam delete

Quorum pattern

To deploy the Quorum pattern, follow the Deployment Instructions section in the GitHub repo. This deployment creates the following resources:

  1. A Step Functions workflow.
  2. An IAM role that is assumed by the Step Functions workflow during execution.
  3. A DynamoDB Table called “QuorumWorkflowTable”.

Navigate to CloudFormation in the AWS Management Console and choose the stack with the name provided during deployment. Choose the state machine resource in the Resources section of the CloudFormation stack to go to the Step Functions console.

Choose Edit and then choose WorkflowStudio to see a visual representation of the workflow.

You can see the exported workflow in the GitHub repo. This is the logic of the workflow:

Quorum pattern. Conceptual flow.

  1. The first step creates an entry in the DynamoDB table with the execution ID of the workflow’s execution. This item in the table tracks the completion of processes.
  2. The next state is the parallel state, which has three parallel flows and a fourth time out flow. All the four flows are numbered.
  3. Flows #1, #2, and #3 are the main parallel flows; when any two of them complete, control should move to outside the parallel state.
  4. Flow #4 is the timeout flow, so that the workflow can exit after a set amount of time if the quorum is not met by then.
  5. Each of the three main parallel flows uses the following logic:
    • Wait for the process to complete.
    • Once complete, update the DynamoDB table entry to mark the completion of the process.
    • After the update, query the item from DynamoDB to get the list of processes that have completed and check if the quorum has been met.
    • If the quorum has been met, raise an “Error” (which is actually a success criterion in terms of the business case) to move control to outside the parallel state.

To test the workflow, follow the instructions provided in the Testing section of the README file at the GitHub repo.

To clean up the resources created, run:

sam delete

Conclusion

This blog post shows how you can implement patterns that must exit early out of a parallel state in an AWS Step Functions workflow.

The use-cases for this approach are not limited to these two patterns. More complicated use-cases like having different combinations of conditions to exit a parallel state can all be implemented using parallel and fail states.

Visit Serverless Land for more Step Functions workflow patterns.

Let’s Architect! DevOps Best Practices on AWS

Post Syndicated from Luca Mezzalira original https://aws.amazon.com/blogs/architecture/lets-architect-devops-best-practices-on-aws/

DevOps has revolutionized software development and operations by fostering collaboration, automation, and continuous improvement. By bringing together development and operations teams, organizations can accelerate software delivery, enhance reliability, and achieve faster time-to-market.

In this blog post, we will explore the best practices and architectural considerations for implementing DevOps with Amazon Web Services (AWS), enabling you to build efficient and scalable systems that align with DevOps principles. The Let’s Architect! team wants to share useful resources that help you to optimize your software development and operations.

DevOps revolution

Distributed systems are adopted by enterprises more frequently now. When an organization wants to leverage distributed systems’ characteristics, it requires a shift in mindset and approach, akin to a new model for the software development lifecycle.

In this re:Invent 2021 video, Emily Freeman, now Head of Community Engagement at AWS, shares the insights gained in the trenches while adapting to a new software development lifecycle that will help your organization thrive using distributed systems.

Take me to this re:Invent 2021 video!

Operationalizing the DevOps revolution

My CI/CD pipeline is my release captain

Designing effective DevOps workflows is necessary for achieving seamless collaboration between development and operations teams. The Amazon Builders’ Library offers a wealth of guidance on designing DevOps workflows that promote efficiency, scalability, and reliability. From continuous integration and deployment strategies to configuration management and observability, this resource covers various aspects of DevOps workflow design. By following the best practices outlined in the Builders’ Library, you can create robust and scalable DevOps workflows that facilitate rapid software delivery and smooth operations.

Take me to this resource!

A pipeline coordinates multiple inflight releases and promotes them through three stages

Using Cloud Fitness Functions to Drive Evolutionary Architecture

Cloud fitness functions provide a powerful mechanism for driving evolutionary architecture within your DevOps practices. By defining and measuring architectural fitness goals, you can continuously improve and evolve your systems over time.

This AWS Architecture Blog post delves into how AWS services, like AWS Lambda, AWS Step Functions, and Amazon CloudWatch can be leveraged to implement cloud fitness functions effectively. By integrating these services into your DevOps workflows, you can establish an architecture that evolves in alignment with changing business needs: improving system resilience, scalability, and maintainability.

Take me to this AWS Architecture Blog post!

Fitness functions provide feedback to engineers via metrics

Multi-Region Terraform Deployments with AWS CodePipeline using Terraform Built CI/CD

Achieving consistent deployments across multiple regions is a common challenge. This AWS DevOps Blog post demonstrates how to use Terraform, AWS CodePipeline, and infrastructure-as-code principles to automate Multi-Region deployments effectively. By adopting this approach, you can achieve consistent infrastructure and application deployments, improving the scalability, reliability, and availability of your DevOps practices.

The post also provides practical examples and step-by-step instructions for implementing Multi-Region deployments with Terraform and AWS services, enabling you to leverage the power of infrastructure-as-code to streamline DevOps workflows.

Take me to this AWS DevOps Blog post!

Multi-Region AWS deployment with IaC and CI/CD pipelines

See you next time!

Thanks for joining our discussion on DevOps best practices! Next time we’ll talk about how to create resilient workloads on AWS.

To find all the blogs from this series, check out the Let’s Architect! list of content on the AWS Architecture Blog. See you soon!

Access Amazon OpenSearch Serverless collections using a VPC endpoint

Post Syndicated from Raj Ramasubbu original https://aws.amazon.com/blogs/big-data/access-amazon-opensearch-serverless-collections-using-a-vpc-endpoint/

Amazon OpenSearch Serverless helps you index, analyze, and search your logs and data using OpenSearch APIs and dashboards. The OpenSearch Serverless collection is a group of indexes. API and dashboard clients can access the collections from public networks or one or more VPCs. For VPC access to collections and dashboards, you can create VPC endpoints. In this post, we demonstrate how you can create and use VPC endpoints and OpenSearch Serverless network policies to control access to your collections and OpenSearch dashboards from multiple network locations.

The demo in this post uses an AWS Lambda-based client in a VPC to ingest data into a collection via a VPC endpoint and a browser in a public network accessing the same collection.

Solution overview

To illustrate how you can ingest data into an OpenSearch Serverless collection from within a VPC, we use a Lambda function. We use a VPC-hosted Lambda function to create an index in an OpenSearch Serverless collection and add documents to the index using a VPC endpoint. We then use a publicly accessible OpenSearch Serverless dashboard to see the documents ingested by the Lambda function.

The following sections detail the steps to ingest data into the collection using Lambda and access the OpenSearch Serverless dashboard.

Prerequisites

This setup assumes that you have already created a VPC with private subnets.

Ingest data using Lambda and access the OpenSearch Serverless dashboard

To set up your solution, complete the following steps:

  1. On the OpenSearch Service console, create a private connection between your VPC and OpenSearch Serverless using a VPC endpoint. Use the private subnets and a security group from your VPC.
  2. Create an OpenSearch collection using the VPC endpoint created in the previous step.
  3. Create a network policy to enable VPC access to the OpenSearch endpoint so the Lambda function can ingest documents into the collection. You should also enable public access to the OpenSearch dashboard endpoint so you can see the ingested documents (see the sketch after this procedure for an automated version).
  4. After you create the collection, create a data access policy to grant ingestion access to the Lambda function’s AWS Identity and Access Management (IAM) role.
  5. Additionally, grant read access to the dashboard user’s IAM role.
  6. Add IAM permissions to the Lambda function’s IAM role and the dashboard user’s IAM role for the OpenSearch Serverless collection.
  7. Create a Lambda function in the same VPC and subnet that we used for the OpenSearch endpoint (see the following code). This function creates an index called sitcoms-eighties in the OpenSearch Serverless collection and adds a sample document to the index:
import datetime
import time
from opensearchpy import OpenSearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth
import boto3

host = '<Insert-OpenSearch-Serverless-Endpoint>'
region = 'us-east-1'
service = 'aoss'
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service,session_token=credentials.token)

# Build the OpenSearch client
client = OpenSearch(
    hosts=[{'host': host, 'port': 443}],
    http_auth=awsauth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
    timeout=300
)

def lambda_handler(event, context):

    # Create index
    response = client.indices.create('sitcoms-eighties')
    print('\nCreating index:')
    print(response)
    time.sleep(5)
    
    dt = datetime.datetime.now()
    # Add a document to the index.
    response = client.index(
        index='sitcoms-eighties',
        body={
            'title': 'Seinfeld',
            'creator': 'Larry David',
            'year': 1989,
            'createtime': dt
        },
        id='1',
    )
    print('\nDocument added:')
    print(response)
  8. Run the Lambda function, and you should see the output as shown in the following screenshot.
  9. You can now see the documents from this index through your publicly accessible OpenSearch Dashboards URL.
  10. Create the index pattern in OpenSearch Dashboards, and then you can see the documents as shown in the following screenshot.
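
Steps 1 through 3 can also be automated. The following is a minimal Boto3 sketch of the network policy from step 3; the collection and VPC endpoint names are placeholders:

import json

import boto3

aoss = boto3.client("opensearchserverless")

# VPC-only access to the collection endpoint; public access to its dashboard
network_policy = [
    {
        "Rules": [{"ResourceType": "collection", "Resource": ["collection/my-collection"]}],
        "AllowFromPublic": False,
        "SourceVPCEs": ["vpce-0123456789abcdef0"],
    },
    {
        "Rules": [{"ResourceType": "dashboard", "Resource": ["collection/my-collection"]}],
        "AllowFromPublic": True,
    },
]

aoss.create_security_policy(
    name="my-collection-network",
    type="network",
    policy=json.dumps(network_policy),
)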

Use a VPC DNS resolver from your network

A client in your VPN network can connect to the collection or dashboards over a VPC endpoint. The client needs to find the VPC endpoint’s IP address using an Amazon Route 53 inbound resolver endpoint. To learn more about Route 53 inbound resolver endpoints, refer to Resolving DNS queries between VPCs and your network. The following diagram shows a sample setup.

The flow for this architecture is as follows:

  1. The DNS query for the OpenSearch Serverless client is routed to a locally configured on-premises DNS server.
  2. The on-premises DNS server, as configured, performs conditional forwarding for the zone us-east-1.aoss.amazonaws.com (replace the Region name with your own) to a Route 53 inbound resolver endpoint IP address.
  3. The inbound resolver endpoint performs DNS resolution by forwarding the query to the private hosted zone that was created along with the OpenSearch Serverless VPC endpoint.
  4. The IP addresses returned by the DNS query are the private IP addresses of the interface VPC endpoint, which allow your on-premises host to establish private connectivity over AWS Site-to-Site VPN.
  5. The interface endpoint is a collection of one or more elastic network interfaces with a private IP address in your account that serves as an entry point for traffic going to an OpenSearch Serverless endpoint.
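
To verify this resolution path, you can resolve the collection endpoint from a host on your network. A small sketch, with a placeholder endpoint:

import socket

# Placeholder collection endpoint; replace with your own
host = "0123456789abcdefghij.us-east-1.aoss.amazonaws.com"

# When resolution goes through the inbound resolver, these should be the
# VPC endpoint's private IP addresses (for example, 10.x.x.x)
for info in socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP):
    print(info[4][0])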

Summary

OpenSearch Serverless allows you to set up and control access to the service using VPC endpoints and network policies. In this post, we explored how to access an OpenSearch Serverless collection API and dashboard from within a VPC, on premises, and public networks. If you have any questions or suggestions, please write to us in the comments section.


About the Authors

Raj Ramasubbu is a Senior Analytics Specialist Solutions Architect focused on big data and analytics and AI/ML with Amazon Web Services. He helps customers architect and build highly scalable, performant, and secure cloud-based solutions on AWS. Raj provided technical expertise and leadership in building data engineering, big data analytics, business intelligence, and data science solutions for over 18 years prior to joining AWS. He helped customers in various industry verticals like healthcare, medical devices, life science, retail, asset management, car insurance, residential REIT, agriculture, title insurance, supply chain, document management, and real estate.

Vivek Kansal works with the Amazon OpenSearch team. In his role as Principal Software Engineer, he uses his experience in the areas of security, policy engines, cloud-native solutions, and networking to help secure customer data in OpenSearch Service and OpenSearch Serverless in an evolving threat landscape.

Decoupling event publishing with Amazon EventBridge Pipes

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/decoupling-event-publishing-with-amazon-eventbridge-pipes/

This post is written by Gregor Hohpe, Sr. Principal Evangelist, Serverless.

Event-driven architectures (EDAs) help decouple applications or application components from one another. Consuming events makes a component less dependent on the sender’s location or implementation details. Because events are self-contained, they can be consumed asynchronously, which allows sender and receiver to follow independent timing considerations. Decoupling through events can improve development agility and operational resilience when building fine-grained distributed applications, which is the preferred style for serverless applications.

Many AWS services publish events through built-in mechanisms to support building event-driven architectures with a minimal amount of custom coding. Modern applications built on top of those services can also send and consume events based on their specific business logic. AWS application integration services like Amazon EventBridge or Amazon SNS, a managed publish-subscribe service, filter those events and route them to the intended destination, providing additional decoupling between event producer and consumer.

Publishing events

Custom applications that act as event producers often use the AWS SDK library, which is available for 12 programming languages, to send an event message. The application code constructs the event as a local data structure and specifies where to send it, for example to an EventBridge event bus.

The application code required to send an event to EventBridge is straightforward and only requires a few lines of code, as shown in this (simplified) helper method that publishes an order event generated by the application:

async function sendEvent(event) {
    const ebClient = new EventBridgeClient();
    const params = { Entries: [ {
          Detail: JSON.stringify(event),
          DetailType: "newOrder",
          Source: "orders",
          EventBusName: eventBusName
        } ] };
    return await ebClient.send(new PutEventsCommand(params));
}

An application most likely calls such a method in the context of another action, for example when persisting a received order to a data store. The code that performs those tasks might look as follows:

const order = { "orderId": "1234", "Items": ["One", "Two", "Three"] }
await writeToDb(order);
await sendEvent(order);

The code populates an order object with multiple line items (in reality, this would be based on data entered by a user or received via an API call), writes it to a database (via another helper method whose implementation isn’t shown), and then sends it to an EventBridge bus via the preceding method.

Code causes coupling

Although this code is not complex, it has drawbacks from an architectural perspective:

  • It interweaves application logic with the solution’s topology because the destination of the event, both in terms of service (EventBridge versus SNS, for example) and the instance (the service bus name in this case) are defined inside the application’s source code. If the event destination changes, you must change the source code, or at least know which string constant is passed to the function via an environment variable. Both aspects work against the EDA principle of minimizing coupling between components because changes in the communication structure propagate into the producer’s source code.
  • Sending the event after updating the database is brittle because it lacks transactional guarantees across both steps. You must implement error handling and retry logic to handle cases where sending the event fails, or even undo the database update. Writing such code can be tedious and error-prone.
  • Code is a liability. After all, that’s where bugs come from. In a real-life example, a helper method similar to the preceding code erroneously swapped day and month on the event date, which led to a challenging debugging cycle. Boilerplate code to send events is therefore best avoided.

Performing event routing inside EventBridge can lessen the first concern. You could reconfigure EventBridge’s rules and targets to route events with a specified type and source to a different destination, provided you keep the event bus name stable. However, the other issues would remain.

Serverless: Less infrastructure, less code

AWS serverless integration services can alleviate the need to write custom application code to publish events altogether.

A key benefit of serverless applications is that you can let the AWS Cloud do the undifferentiated heavy lifting for you. Traditionally, we associate serverless with provisioning, scaling, and operating compute infrastructure so that developers can focus on writing code that generates business value.

Serverless application integration services can also take care of application-level tasks for you, including publishing events. Most applications store data in AWS data stores like Amazon Simple Storage Service (S3) or Amazon DynamoDB, which can automatically emit events whenever an update takes place, without any application code.

EventBridge Pipes: Events without application code

EventBridge Pipes allows you to create point-to-point integrations between event producers and consumers with optional transformation, filtering, and enrichment steps. Serverless integration services combined with cloud automation allow “point-to-point” integrations to be more easily managed than in the past, which makes them a great fit for this use case.

This example takes advantage of EventBridge Pipes’ ability to fetch events actively from sources like DynamoDB Streams. DynamoDB Streams captures a time-ordered sequence of item-level modifications in any DynamoDB table and stores this information in a log for up to 24 hours. EventBridge Pipes picks up events from that log and pushes them to one of over 14 event targets, including an EventBridge bus, SNS, SQS, or API Destinations. It also accommodates batch sizes, timeouts, and rate limiting where needed.

EventBridge Pipes example

The integration through EventBridge Pipes can replace the custom application code that sends the event, including any retry or error logic. Only the following code remains:

const order = { "orderId": "1234", "Items": ["One", "Two", "Three"] }
await writeToDb(order);

Automation as code

EventBridge Pipes can be configured from the CLI, the AWS Management Console, or from automation code using AWS CloudFormation or AWS CDK. By using AWS CDK, you can use the same programming language that you use to write your application logic to also write your automation code.

For example, the following CDK snippet configures an EventBridge Pipe to read events from a DynamoDB Stream attached to the Orders table and passes them to an EventBridge event bus.

This code references the DynamoDB table via the ordersTable variable that would be set when the table is created:

const pipe = new CfnPipe(this, 'pipe', {
  roleArn: pipeRole.roleArn,
  source: ordersTable.tableStreamArn!,
  sourceParameters: {
    dynamoDbStreamParameters: {
      startingPosition: 'LATEST'
    },
  },
  target: ordersEventBus.eventBusArn,
  targetParameters: {
    eventBridgeEventBusParameters: {
      detailType: 'order-details',
      source: 'my-source',
    },
  },
}); 

The automation code cleanly defines the dependency between the DynamoDB table and the event destination, independent of application logic.

Decoupling with data transformation

Coupling is not limited to event sources and destinations. A source’s data format can determine the event format and require downstream changes in case the data format or the data source change. EventBridge Pipes can also alleviate that consideration.

Events emitted from the DynamoDB Stream use the native, marshaled DynamoDB format that includes type information, such as an “S” for strings or “L” for lists.

For example, the order event in the DynamoDB stream from this example looks as follows. Some fields are omitted for readability:

{
  "detail": {
    "eventID": "be1234f372dd12552323a2a3362f6bd2",
    "eventName": "INSERT",
    "eventSource": "aws:dynamodb",
    "dynamodb": {
      "Keys": { "orderId": { "S": "ABCDE" } },
      "NewImage": { 
        "orderId": { "S": "ABCDE" },
        "items": {
            "L": [ { "S": "One" }, { "S": "Two" }, { "S": "Three" } ]
        }
      }
    }
  }
} 

This format is not well suited for downstream processing because it would unnecessarily couple event consumers to the fact that this event originated from a DynamoDB Stream. EventBridge Pipes can convert this event into a more easily consumable format. The transformation is specified via an inputTemplate parameter using JSONPath expressions, and EventBridge Pipes’ support for list processing with wildcards proves to be perfect for this scenario.

In this example, add the following transformation template inside the target parameters to the preceding CDK code (the asterisk character matches a complete list of elements):

targetParameters: {
  inputTemplate: '{"orderId": <$.dynamodb.NewImage.orderId.S>,' + 
                 '"items": <$.dynamodb.NewImage.items.L[*].S>}'
}

This transformation formats the event published by EventBridge Pipes like a regular business event, decoupling any event consumer from the fact that it originated from a DynamoDB table:

{
  "time": "2023-06-01T06:18:10Z",
  "detail": {
    "orderId": "ABCDE",
    "items": ["One", "Two", "Three" ]
  }
}

Conclusion

When building event-driven applications, consider whether you can replace application code with serverless integration services to improve the resilience of your application and provide a clean separation between application logic and system dependencies.

EventBridge Pipes can be a helpful feature in these situations, for example to capture and publish events based on DynamoDB table updates.

Learn more about EventBridge Pipes at https://aws.amazon.com/eventbridge/pipes/ and discover additional ways to reduce serverless application code at https://serverlessland.com/refactoring-serverless. For a complete code example, see https://github.com/aws-samples/aws-refactoring-to-serverless.

Detecting and stopping recursive loops in AWS Lambda functions

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/detecting-and-stopping-recursive-loops-in-aws-lambda-functions/

This post is written by Pawan Puthran, Principal Serverless Specialist TAM, Aneel Murari, Senior Serverless Specialist Solution Architect, and Shree Shrikhande, Senior AWS Lambda Product Manager.

AWS Lambda is announcing a recursion control to detect and stop Lambda functions running in a recursive or infinite loop.

At launch, this feature is available for Lambda integrations with Amazon Simple Queue Service (Amazon SQS), Amazon SNS, or when invoking functions directly using the Lambda invoke API. Lambda now detects functions that appear to be running in a recursive loop and drops requests after exceeding 16 invocations.

This can help reduce costs from unexpected Lambda function invocations because of recursion. You receive notifications about this action through the AWS Health Dashboard, email, or by configuring Amazon CloudWatch Alarms.

Overview

You can invoke Lambda functions in multiple ways. AWS services generate events that invoke Lambda functions, and Lambda functions can send messages to other AWS services. In most architectures, the service or resource that invokes a Lambda function should be different from the service or resource that the function outputs to. Because of misconfiguration or coding bugs, a function can send a processed event to the same service or resource that invokes the Lambda function, causing a recursive loop.

Lambda now detects the function running in a recursive loop between supported services, after exceeding 16 invocations. It returns a RecursiveInvocationException to the caller. There is no additional charge for this feature. For asynchronous invokes, Lambda sends the event to a dead-letter queue or on-failure destination, if one is configured.
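
When you invoke a function directly, the exception surfaces to your caller. As a minimal sketch using the AWS SDK for JavaScript (v3), assuming a hypothetical function name and payload, the error can be detected by name:

import { LambdaClient, InvokeCommand } from '@aws-sdk/client-lambda';

const lambda = new LambdaClient({});

try {
  await lambda.send(new InvokeCommand({
    FunctionName: 'my-function', // placeholder name
    Payload: Buffer.from(JSON.stringify({ orderId: '1234' })),
  }));
} catch (error) {
  // Lambda rejects the next invocation once the recursion limit is exceeded
  if ((error as Error).name === 'RecursiveInvocationException') {
    console.error('Recursive loop detected; invocation dropped by Lambda.');
  } else {
    throw error;
  }
}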

The following is an example of an order processing system.

Order processing system

  1. A new order information message is sent to the source SQS queue.
  2. Lambda consumes the message from the source queue using an ESM.
  3. The Lambda function processes the message and sends the updated orders message to a destination SQS queue using the SQS SendMessage API.
  4. The source queue has a dead-letter queue (DLQ) configured for handling any failed or unprocessed messages.
  5. Because of a misconfiguration, the Lambda function sends the message to the source SQS queue instead of the destination queue. This causes a recursive loop of Lambda function invocations.

To explore sample code for this example, see the GitHub repo.

In the preceding example, after 16 invocations, Lambda throws a RecursiveInvocationException to the ESM. The ESM stops invoking the Lambda function and, once the maxReceiveCount is exceeded, SQS moves the message to the source queue’s configured DLQ.

You receive an AWS Health Dashboard notification with steps to troubleshoot the function.

AWS Health Dashboard notification

You also receive an email notification to the registered email address on the account.

Email notification

Lambda emits a RecursiveInvocationsDropped CloudWatch metric, which you can view in the CloudWatch console.

RecursiveInvocationsDropped CloudWatch metric

How does Lambda detect recursion?

For Lambda to detect recursive loops, your function must use one of the supported AWS SDK versions or higher.

Lambda uses an AWS X-Ray trace header primitive called “Lineage” to track the number of times a function has been invoked with an event. When your function code sends an event using a supported AWS SDK version, Lambda increments the counter in the lineage header. If your function is then invoked with the same triggering event more than 16 times, Lambda stops the next invocation for that event. You do not need to configure active X-Ray tracing for this feature to work.

An example lineage header looks like:

X-Amzn-Trace-Id:Root=1-645f7998-4b1e232810b0bb733dba2eab;Parent=5be88d12eefc1fc0;Sampled=1;Lineage=43e12f0f:5

43e12f0f is the hash of a resource, in this case a Lambda function. 5 is the number of times this function has been invoked with the same event. The logic of hash generation, encoding, and size of the lineage header may change in the future. You should not design any application functionality based on this.

When using an ESM to consume messages from SQS, after the maxReceiveCount value is exceeded, the message is sent to the source queue’s configured DLQ. When Lambda detects a recursive loop and drops subsequent invocations, it returns a RecursiveInvocationException to the ESM. This increments the maxReceiveCount value. When the ESM auto retries to process events, based on the error handling configuration, these retries are not considered recursive invocations.

When using SQS, you can also batch multiple messages into one Lambda event. Where the message batch size is greater than 1, Lambda uses the maximum lineage value within the batch of messages. It drops the entire batch if the value exceeds 16.

Recursion detection in action

You can deploy a sample application example in the GitHub repo to test Lambda recursive loop detection. The application includes a Lambda function that reads from an SQS queue and writes messages back to the same SQS queue.

As prerequisites, you must install the AWS CLI, the AWS SAM CLI, and Git.

To deploy the application:

  1. Set your AWS Region:
export REGION=<your AWS region>
  2. Clone the GitHub repository:
git clone https://github.com/aws-samples/aws-lambda-recursion-detection-sample.git
cd aws-lambda-recursion-detection-sample
  3. Use AWS SAM to build and deploy the resources to your AWS account. Enter a stack name, such as lambda-recursion, when prompted. Accept the remaining default values.
sam build --use-container
sam deploy --guided --region $REGION

To test the application:

  1. Save the name of the SQS queue in a local environment variable:
SOURCE_SQS_URL=$(aws cloudformation describe-stacks \
  --region $REGION \
  --stack-name lambda-recursion \
  --query 'Stacks[0].Outputs[?OutputKey==`SourceSQSqueueURL`].OutputValue' \
  --output text)
  2. Publish a message to the source SQS queue:
aws sqs send-message --queue-url $SOURCE_SQS_URL --message-body '{"orderId":"111","productName":"Bolt","orderStatus":"Submitted"}' --region $REGION

This invokes the Lambda function, which writes the message back to the queue.

To verify that Lambda has detected the recursion:

  1. Navigate to the CloudWatch console. Choose All Metrics under Metrics in the left-hand panel and search for RecursiveInvocationsDropped.

    Find RecursiveInvocationsDropped.

  2. Choose Lambda > By Function Name and choose RecursiveInvocationsDropped for the function you created. Under Graphed metrics, change the statistic to sum and Period to 1 minute. You see one record. Refresh if you don’t see the metric after a few seconds.

Metrics sum view

Actions to take when Lambda stops a recursive loop

When you receive a notification regarding recursion in your account, the following steps can help address the issue.

  • To stop further invoke attempts while you fix the underlying configuration issue, set the function concurrency to 0. This acts as an off switch for the Lambda function. You can choose the “Throttle” button in the Lambda console or use the PutFunctionConcurrency API to set the function concurrency to 0, as shown in the sketch after this list.
  • You can also disable or delete the event source mapping or trigger for the Lambda function.
  • Check your Lambda function code and configuration for any code defects that create loops. For example, check your environment variables to ensure you are not using the same SQS queue or SNS topic as source and target.
  • If an SQS Queue is the event source for your Lambda function, configure a DLQ on the source queue.
  • If an SNS topic is the event source, configure an On-Failure Destination for the Lambda function.
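
As a sketch of the off switch from the first item, the following AWS SDK for JavaScript (v3) call sets the reserved concurrency to 0; the function name is a placeholder:

import { LambdaClient, PutFunctionConcurrencyCommand } from '@aws-sdk/client-lambda';

const lambda = new LambdaClient({});

// Reserved concurrency of 0 throttles every new invocation, acting as
// an off switch while you fix the recursive configuration
await lambda.send(new PutFunctionConcurrencyCommand({
  FunctionName: 'my-function', // placeholder name
  ReservedConcurrentExecutions: 0,
}));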

Disabling recursion detection

You may have valid use cases where Lambda recursion is intentional as part of your design. In this case, use caution and implement suitable guardrails to prevent unexpected charges to your account. To learn more about best practices for using recursive invocation patterns, see Recursive patterns that cause run-away Lambda functions in the AWS Lambda Operator Guide.

This feature is turned on by default to stop recursive loops. To request turning it off for your account, reach out to AWS Support.

Conclusion

Lambda recursion control for SQS and SNS automatically detects and stops functions running in a recursive or infinite loop. This can be due to misconfiguration or coding errors. Recursion control helps reduce unexpected usage with Lambda and downstream services. The post also explains how Lambda detects and stops recursive loops and notifies you through AWS Health Dashboard to troubleshoot the function.

To learn more about the feature, visit the AWS Lambda Developer Guide.

For more serverless learning resources, visit Serverless Land.

Understanding AWS Lambda’s invoke throttling limits

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/understanding-aws-lambdas-invoke-throttle-limits/

This post is written by Archana Srikanta, Principal Engineer, AWS Lambda.

When you call AWS Lambda’s Invoke API, a series of throttle limits are evaluated to decide if your call is let through or throttled with a 429 “Too Many Requests” exception. This blog post explains the most common invoke throttle limits and the relationship between them, so you can better understand scaling workloads on Lambda.

Overview

Invoke call flow

The throttle limits exist to protect the following components of Lambda’s internal service architecture, and your workload, from noisy neighbors:

  • Execution environment: An execution environment is a Firecracker microVM where your function code runs. A given execution environment only hosts one invocation at a time, but it can be reused for subsequent invocations of the same function version.
  • Invoke data plane: These are a series of internal web services that, on an invoke, select (or create) a sandbox and route your request to it. This is also responsible for enforcing the throttle limits.

When you make an Invoke API call, it transits through some or all of the Invoke Data Plane services, before reaching an execution environment where your function code is downloaded and executed.

There are three distinct but related throttle limits which together decide if your invoke request is accepted by the data plane or throttled.

Concurrency

Concurrent means “existing, happening, or done at the same time”. Accordingly, the Lambda concurrency limit is a limit on the simultaneous in-flight invocations allowed at any given time. It is not a rate or transactions per second (TPS) limit in and of itself, but instead a limit on how many invocations can be inflight at the same time. This documentation visually explains the concept of concurrency.

Under the hood, the concurrency limit roughly translates to a limit on the maximum number of execution environments (and thus Firecracker microVMs) that your account can claim at any given point in time. Lambda runs a fleet of multi-tenant bare metal instances, on which Firecracker microVMs are carved out to serve as execution environments for your functions. AWS constantly monitors and scales this fleet based on incoming demand and shares the available capacity fairly among customers.

The concurrency limit helps protect Lambda from a single customer exhausting all the available capacity and causing a denial of service to other customers.

Transactions per second (TPS)

Customers often ask how their concurrency limit translates to TPS. The answer depends on how long your function invocations last.

Relation between concurrency and TPS

The diagram above considers three cases, each with a different function invocation duration, but a fixed concurrency limit of 1000. In the first case, invocations have a constant duration of 1 second. This means you can initiate 1000 invokes and claim all 1000 execution environments permitted by your concurrency limit. These execution environments remain busy for the entire second, and you cannot start any more invokes in that second because your concurrency limit prevents you from claiming any more execution environments. So, the TPS you can achieve with a concurrency limit of 1000 and a function duration of 1 second is 1000 TPS.

In case 2, the invocation duration is halved to 500ms, with the same concurrency limit of 1000. You can initiate 1000 concurrent invokes at the start of the second as before. These invokes keep the execution environments busy for the first half of the second. Once finished, you can start an additional 1000 invokes against the same execution environments while still being within your concurrency limit. So, by halving the function duration, you doubled your TPS to 2000.

Similarly, in case 3, if your function duration is 100ms, you can initiate 10 rounds of 1000 invokes each in a second, achieving a TPS of 10K.

Codifying this as an equation, the TPS you can achieve given a concurrency limit is:

TPS = concurrency / function duration in seconds

Taken to an extreme, for a function duration of only 1ms and at a concurrency limit of 1000 (the default limit), an account can drive an invoke TPS of one million. In other words, every additional unit of concurrency granted via a limit increase implicitly grants an additional 1000 TPS. The high TPS doesn’t require any additional execution environments (Firecracker microVMs), so it’s not problematic from a fleet capacity perspective. However, driving over a million TPS from a single account puts stress on the Invoke Data Plane services. They must be protected from noisy neighbor impact as well so all customers have a fair share of the services’ bandwidth. A concurrency limit alone isn’t sufficient to protect against this – the TPS limit provides this protection.

As of this writing, the invoke TPS is capped at 10 times your concurrency. Added to the previous equation:

TPS = min( 10 x concurrency, concurrency / function duration in seconds)

The concurrency factor is common across both terms in the min function, so the key comparison is:

min(10, 1 / function duration in seconds)

If the function duration is exactly 100ms (or 1/10th of a second), both terms in the min function are equal. If the function duration is over 100ms, the second term is lower and TPS is limited as per concurrency/function duration. If the function duration is under 100ms, the first term is lower and TPS is limited as per 10 x concurrency.

Limits for functions less than 100ms

To summarize, the TPS limit exists to protect the Invoke Data Plane from the high churn of short-lived invocations, for which the concurrency limit alone affords too high of a TPS. If you drive short invocations of under 100ms, your throughput is capped as though the function duration is 100ms (at 10 x concurrency) as shown in the diagram above. This implies that short-lived invocations may be TPS limited, rather than concurrency limited. However, if your function duration is over 100ms you can effectively ignore the 10 x concurrency TPS limit and calculate your available TPS as concurrency/function duration.
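
As an illustrative helper (not part of any AWS API), the combined cap can be computed directly from the two formulas above:

// Effective invoke TPS cap: min(10 x concurrency, concurrency / duration)
function effectiveTpsLimit(concurrency: number, durationSeconds: number): number {
  return Math.min(10 * concurrency, concurrency / durationSeconds);
}

console.log(effectiveTpsLimit(1000, 1));    // 1000: duration-bound
console.log(effectiveTpsLimit(1000, 0.1));  // 10000: both terms equal
console.log(effectiveTpsLimit(1000, 0.05)); // 10000: capped at 10 x concurrency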

Burst

The third throttle limit is the burst limit. Lambda does not keep execution environments provisioned for your entire concurrency limit at all times. That would be wasteful, especially if usage peaks are transient, as is the case with many workloads. Instead, the service spins up execution environments just-in-time as the invoke arrives, if one doesn’t already exist. Once an execution environment is spun up, it remains “warm” for some period of time and is available to host subsequent invocations of the same function version.

However, if an invoke doesn’t find a warm execution environment, it experiences a “cold start” while we provision a new execution environment. Cold starts involve certain additional operations over and above the warm invoke path, such as downloading your code or container and initializing your application within the execution environment. These initialization operations are typically computationally heavy and so have a lower throughput compared to the warm invoke path. If there are sudden and steep spikes in the number of cold starts, it can put pressure on the invoke services that handle these cold start operations, and also cause undesirable side effects for your application such as increased latencies, reduced cache efficiency and increased fan out on downstream dependencies. The burst limit exists to protect against such surges of cold starts, especially for accounts that have a high concurrency limit. It ensures that the climb up to a high concurrency limit is gradual so as to smooth out the number of cold starts in a burst.

The algorithm used to enforce the burst limit is the Token Bucket rate-limiting algorithm. Consider a bucket that holds tokens. The bucket has a maximum capacity of B tokens (burst). The bucket starts full. Each time you send an invoke request that requires an additional unit of concurrency, it costs a token from the bucket. If the token exists, you are granted the additional concurrency and the token is removed from the bucket. The bucket is refilled at a constant rate of r tokens per minute (rate) until it reaches its maximum capacity.

Token bucket

What this means is that the rate of climb of concurrency is limited to r tokens per minute. Even though the algorithm allows you to collect up to B tokens and burst, you must wait for the bucket to refill before you can burst again, effectively limiting your average rate to r per minute.
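
The following TypeScript sketch illustrates these token bucket mechanics; the capacity and refill numbers are illustrative, not actual Lambda limits:

// Illustrative token bucket: capacity B tokens, refilled at r tokens per minute
class TokenBucket {
  private tokens: number;

  constructor(private capacity: number, private refillPerMinute: number) {
    this.tokens = capacity; // the bucket starts full
  }

  // Called once per simulated minute to replenish the bucket up to capacity
  refill(): void {
    this.tokens = Math.min(this.capacity, this.tokens + this.refillPerMinute);
  }

  // Claiming an additional unit of concurrency costs one token
  tryClaim(): boolean {
    if (this.tokens < 1) return false; // burst throttled
    this.tokens -= 1;
    return true;
  }
}

const bucket = new TokenBucket(1000, 500); // B = 1000, r = 500 per minute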

Concurrency burst limit chart

The chart above shows the burst limit in action with a maximum concurrency limit of 3000, a maximum burst (B) of 1000, and a refill rate (r) of 500 per minute. The token bucket starts full with 1000 tokens, which is also the initial available burst headroom.

There is a burst activity between minute one and two, which consumes all tokens in the bucket and claims all 1000 concurrent execution environments allowed by the burst limit. At this point the bucket is empty and any attempt to claim additional concurrent execution environments is burst throttled, in spite of max concurrency not being reached yet.

The token bucket and the burst headroom are replenished at minutes two and three with 500 tokens each minute to bring it back up to its maximum capacity of 1000. At minute four, there is no additional refill because the bucket is at maximum capacity. Between minutes four and five, there is a second burst activity which empties the bucket again and claims an additional 1000 execution environments, bringing the total number of active execution environments to 2000.

The bucket continues to replenish at a rate of 500/minute at minutes five and six. At this point, sufficient tokens have been accumulated to cover the entire concurrency limit of 3000, and so the bucket isn’t refilled anymore even when you have the third burst activity at minute seven. At minute ten, when all the usage ramps down, the available burst headroom slowly stair-steps back down to the maximum initial burst of 1000.

The actual numbers for maximum burst and refill rate vary by Region and are subject to change; visit the Lambda burst limits page for specific values.

It is important to distinguish that the burst limit isn’t a rate limit on the invoke itself, but a rate limit on how quickly concurrency can rise. However, since invoke TPS is a function of concurrency, it also clamps how quickly TPS can rise (a rate limit for a rate limit). The following chart shows how the TPS burst headroom follows a similar stair step pattern as the concurrency burst headroom, only with a multiplier.

Burst limits chart

Conclusion

This blog explains three key throttle limits applied on Lambda invokes: the concurrency limit, TPS limit and burst limit. It outlines the relationship between these limits and how each one protects the system and your workload from noisy neighbors. Equipped with this knowledge you can better interpret any 429 throttling exceptions you may receive while scaling your applications on Lambda. For more information on getting started with Lambda visit the Developer Guide.

For more serverless learning resources, visit Serverless Land.

Implementing AWS Lambda error handling patterns

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/implementing-aws-lambda-error-handling-patterns/

This post is written by Jeff Chen, Principal Cloud Application Architect, and Jeff Li, Senior Cloud Application Architect

Event-driven architecture is an architectural style that can help you boost agility and build reliable, scalable applications. Splitting an application into loosely coupled services can help each service scale independently. A distributed, loosely coupled application depends on events to communicate application state changes. Each service consumes events from other services and emits events to notify other services of state changes.

Handling errors becomes even more important when designing distributed applications. A service may fail if it cannot handle an invalid payload, dependent resources may be unavailable, or the service may time out. There may be permission errors that can cause failures. AWS services provide many features to handle error conditions, which you can use to improve the resiliency of your applications.

This post explores three use-cases and design patterns for handling failures.

Overview

AWS Lambda, Amazon Simple Queue Service (Amazon SQS), Amazon Simple Notification Service (Amazon SNS), and Amazon EventBridge are core building blocks for building serverless event-driven applications.

The post Understanding the Different Ways to Invoke Lambda Functions lists the three different ways of invoking a Lambda function: synchronous, asynchronous, and poll-based invocation. For a list of services and which invocation method they use, see the documentation.

Lambda’s integration with Amazon API Gateway is an example of a synchronous invocation. A client makes a request to API Gateway, which sends the request to Lambda. API Gateway waits for the function response and returns the response to the client. There are no built-in retries or error handling. If the request fails, the client attempts the request again.

Lambda’s integration with SNS and EventBridge are examples of asynchronous invocations. SNS, for example, sends an event to Lambda for processing. When Lambda receives the event, it places it on an internal event queue and returns an acknowledgment to SNS that it has received the message. Another Lambda process reads events from the internal queue and invokes your Lambda function. If SNS cannot deliver an event to your Lambda function, the service automatically retries the same operation based on a retry policy.

Lambda’s integration with SQS uses poll-based invocations. Lambda runs a fleet of pollers that poll your SQS queue for messages. The pollers read the messages in batches and invoke your Lambda function once per batch.

You can apply this pattern in many scenarios. For example, your operational application can add sales orders to an operational data store. You may then want to load the sales orders to your data warehouse periodically so that the information is available for forecasting and analysis. The operational application can batch completed sales as events and place them on an SQS queue. A Lambda function can then process the events and load the completed sale records into your data warehouse.

If your function processes the batch successfully, the pollers delete the messages from the SQS queue. If the batch is not successfully processed, the pollers do not delete the messages from the queue. Once the visibility timeout expires, the messages are available again to be reprocessed. If the message retention period expires, SQS deletes the message from the queue.
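
As a minimal sketch of this poll-based model, assuming the sales order scenario above and a hypothetical loadIntoWarehouse helper, a handler receives each batch as a single event (the SQSEvent type comes from the @types/aws-lambda package):

import type { SQSEvent } from 'aws-lambda';

// The pollers invoke this function once per batch. A successful return
// lets the pollers delete the batch; a thrown error leaves the messages
// in the queue to reappear after the visibility timeout.
export const handler = async (event: SQSEvent): Promise<void> => {
  for (const record of event.Records) {
    const sale = JSON.parse(record.body); // a completed sales order
    await loadIntoWarehouse(sale);
  }
};

// Hypothetical placeholder for the data warehouse load step
async function loadIntoWarehouse(sale: unknown): Promise<void> {
  console.log('Loading sale record:', sale);
}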

The following table shows the invocation types and retry behavior of the AWS services mentioned.

AWS service example            | Invocation type | Retry behavior
Amazon API Gateway             | Synchronous     | No built-in retry; the client attempts retries.
Amazon SNS, Amazon EventBridge | Asynchronous    | Built-in retries with exponential backoff.
Amazon SQS                     | Poll-based      | Retries after the visibility timeout expires, until the message retention period expires.

There are a number of design patterns to use for poll-based and asynchronous invocation types to retain failed messages for additional processing. These patterns can help you recover from delivery or processing failures.

You can explore the patterns and test the scenarios by deploying the code from this repository which uses the AWS Cloud Development Kit (AWS CDK) using Python.

Lambda poll-based invocation pattern

When using Lambda with SQS, if Lambda isn’t able to process the message and the message retention period expires, SQS drops the message. Failure to process the message can be due to function processing failures, including time-outs or invalid payloads. Processing failures can also occur when the destination function does not exist, or has incorrect permissions.

You can configure a separate dead-letter queue (DLQ) on the source queue for SQS to retain the dropped message. A DLQ preserves the original message and is useful for analyzing root causes, handling error conditions properly, or sending notifications that require manual interventions. In the poll-based invocation scenario, the Lambda function itself does not maintain a DLQ. It relies on the external DLQ configured in SQS. For more information, see Using Lambda with Amazon SQS.

The following shows the design pattern when you configure Lambda to poll events from an SQS queue and invoke a Lambda function.

Lambda synchronously polling batches of messages from SQS

To explore this pattern, deploy the code in this repository. Once deployed, you can use these instructions to test the pattern with the happy and unhappy paths.

Lambda asynchronous invocation pattern

With asynchronous invokes, there are two failure aspects to consider when using Lambda: the event source cannot deliver the message to Lambda, or the Lambda function errors when processing the event.

Event sources vary in how they handle failures delivering messages to Lambda. If SNS or EventBridge cannot send the event to Lambda after exhausting all their retry attempts, the service drops the event. You can configure a DLQ on an SNS topic or EventBridge event bus to hold the dropped event. This works in the same way as the poll-based invocation pattern with SQS.

Lambda functions may then error due to input payload syntax errors, duration time-outs, or unhandled exceptions, such as a data resource not being available.

For asynchronous invokes, you can configure how long Lambda retains an event in its internal queue, up to 6 hours. You can also configure how many times Lambda retries when the function errors, between 0 and 2. Lambda discards the event when the maximum age passes or all retry attempts fail. To retain a copy of discarded events, you can configure either a DLQ or, preferably, a failed-event destination as part of your Lambda function configuration.

A Lambda destination enables you to specify what to do next if an asynchronous invocation succeeds or fails. You can configure a destination to send invocation records to SQS, SNS, EventBridge, or another Lambda function. Destinations are preferred for failure processing as they support additional targets and include additional information. A DLQ holds the original failed event. With a destination, Lambda also passes details of the function’s response in the invocation record. This includes stack traces, which can be useful for analyzing the root cause.
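
As a sketch using the AWS SDK for JavaScript (v3), the retry settings and an on-failure destination can be configured together; the function name and queue ARN are placeholders:

import { LambdaClient, PutFunctionEventInvokeConfigCommand } from '@aws-sdk/client-lambda';

const lambda = new LambdaClient({});

// Configure async invoke behavior: retries, maximum event age, and a
// destination that receives the invocation record when processing fails
await lambda.send(new PutFunctionEventInvokeConfigCommand({
  FunctionName: 'my-function', // placeholder
  MaximumRetryAttempts: 2, // between 0 and 2
  MaximumEventAgeInSeconds: 21600, // up to 6 hours
  DestinationConfig: {
    OnFailure: {
      Destination: 'arn:aws:sqs:us-east-1:123456789012:failed-events', // placeholder
    },
  },
}));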

Using both a DLQ and Lambda destinations

You can apply this pattern in many scenarios. For example, many of your applications may contain customer records. To comply with the California Consumer Privacy Act (CCPA), different organizations may need to delete records for a particular customer. You can set up a consumer delete SNS topic. Each organization creates a Lambda function, which processes the events published by the SNS topic and deletes customer records in its managed applications.

The following shows the design pattern when you configure an SNS topic as the event source for a Lambda function, which uses destination queues for success and failure processing.

SNS topic as event source for Lambda

You configure a DLQ on the SNS topic to capture messages that SNS cannot deliver to Lambda. When Lambda invokes the function, it sends details of the successfully processed messages to an on-success SQS destination. You can use this pattern to route an event to multiple services for simpler use cases. For orchestrating multiple services, AWS Step Functions is a better design choice.

Lambda can also send details of unsuccessfully processed messages to an on-failure SQS destination.

A variant of this pattern is to replace an SQS destination with an EventBridge destination so that multiple consumers can process an event based on the destination.

To explore how to use an SQS DLQ and Lambda destinations, deploy the code in this repository. Once deployed, you can use these instructions to test the pattern with the happy and unhappy paths.

Using a DLQ

Although destinations are the preferred method for handling function failures, you can also use DLQs.

The following shows the design pattern when you configure an SNS topic as the event source for a Lambda function, which uses SQS queues for failure processing.

Lambda invoked asynchronously

You configure a DLQ on the SNS topic to capture the messages that SNS cannot deliver to the Lambda function. You also configure a separate DLQ for the Lambda function. Lambda saves an unsuccessful event to this DLQ after it exhausts the maximum retry attempts.

To explore how to use a Lambda DLQ, deploy the code in this repository. Once deployed, you can use these instructions to test the pattern with the happy and unhappy paths.

Conclusion

This post explains three patterns that you can use to design resilient event-driven serverless applications. Error handling during event processing is an important part of designing serverless cloud applications.

You can deploy the code from the repository to explore how to use poll-based and asynchronous invocations. See how poll-based invocations can send failed messages to a DLQ. See how to use DLQs and Lambda destinations to route and handle unsuccessful events.

Learn more about event-driven architecture on Serverless Land.

Serverless ICYMI Q2 2023

Post Syndicated from Benjamin Smith original https://aws.amazon.com/blogs/compute/serverless-icymi-q2-2023/

Welcome to the 22nd edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all the most recent product launches, feature enhancements, blog posts, webinars, live streams, and other interesting things that you might have missed!

In case you missed our last ICYMI, check out what happened last quarter here.

Serverless Innovation Day

AWS recently hosted the Serverless Innovation Day, a day of live streams that showcased AWS serverless technologies such as AWS Lambda, Amazon ECS with AWS Fargate, Amazon EventBridge, and AWS Step Functions. The event included insights from AWS leaders such as Holly Mesrobian, Ajay Nair, and Usman Khalid, as well as prominent customers and our serverless Developer Advocate team. It provided insights into serverless modernization success stories, use cases, and best practices. If you missed the event, you can catch up on the recorded sessions here.

Serverless Land, your go-to resource for all things serverless, expanded to include a new Serverless Testing section. This provides valuable insights, patterns, and best practices for testing integrations using AWS SAM and CDK templates.

Serverless Land also launched a new learning page featuring a collection of resources, including blog posts, videos, workshops, and training materials, allowing users to choose a learning path from a variety of topics. “EventBridge Visuals”, a set of small, easily digestible visuals focused on EventBridge, has also been added.

AWS Lambda

Lambda introduced support for response payload streaming allowing functions to progressively stream response data to clients. This feature significantly improves performance by reducing the time to first byte (TTFB) latency, benefiting web and mobile applications.

Response streaming is particularly useful for applications with large payloads such as images, videos, documents, or database results. It eliminates the need to buffer the entire payload in memory and enables the transfer of responses larger than Lambda’s 6 MB limit, up to a soft limit of 20 MB.

By configuring the Function URL to use the InvokeWithResponseStream API, streaming responses can be accessed through an HTTP client that supports incremental response data. This enhancement expands Lambda’s capabilities, allowing developers to handle larger payloads more efficiently and enhance the overall performance and user experience of their web and mobile applications.

Lambda now supports Java 17 with Amazon Corretto distribution, providing long-term support and improved performance. Java 17 introduces new language features like records, sealed classes, and multi-line strings. The runtime uses ZGC and Shenandoah garbage collectors to reduce latency. Default JVM configuration changes optimize tiered compilation for reduced startup latency. Developers can use Java 17 in Lambda through AWS Management Console, AWS SAM, and AWS CDK. Popular frameworks like Spring Boot 3 and Micronaut 4 require Java 17 as a minimum. Micronaut provides a web service to generate example projects using Java 17 and AWS CDK infrastructure.

Lambda now supports the Ruby 3.2 runtime, enabling you to write serverless functions using the latest version of the Ruby programming language. This update enhances developer productivity and brings new features and improvements to your Ruby-based Lambda functions.

Lambda introduced support for Kafka and Amazon MQ event sources in four additional Regions. This expanded availability allows developers to build event-driven architectures using these messaging systems in more regions around the world, providing greater flexibility and scalability. It also supports Kafka and Amazon MQ event sources in AWS GovCloud (US) Regions, allowing government organizations to leverage the benefits of event-driven architectures in their cloud environments.

Lambda also added support for starting from a specific timestamp for Kafka event sources at no additional charge, allowing for precise message processing in scenarios such as disaster recovery.

Serverless Land has launched new learning paths for Lambda to help you level up your serverless skills:

  • The Java Replatforming learning path guides Java developers through the process of migrating existing Java applications to a serverless architecture.
  • The Lift and Shift to Serverless learning path provides guidance on migrating traditional applications to a serverless environment.
  • Lambda Fundamentals is a 23-part video series providing practical examples and tips to help you get started with serverless development using Lambda.

The new AWS Tech Talk, Best practices for building interactive applications with AWS Lambda, helps you learn best practices and architectural patterns for building web and mobile backends as well as API-driven microservices on Lambda. Explore how to take advantage of features in Lambda, Amazon API Gateway, Amazon DynamoDB, and more to easily build highly scalable serverless web applications.

AWS Step Functions

The latest update to AWS Step Functions introduces versions and aliases, allowing users to run specific state machine revisions. This ensures reliable deployments, reduces risk, and provides version visibility. Appending version numbers to the state machine ARN enables selection of desired versions, even after updates. Aliases distribute execution requests based on weights, supporting incremental deployment patterns.

This enhances confidence in state machine updates, improves observability, auditing, and can be managed through the Step Functions console or AWS CloudFormation. Versions and aliases are available in all supported AWS Regions at no extra cost.

AWS SAM

AWS SAM CLI has introduced a new feature called remote invoke that allows developers to test Lambda functions in the AWS Cloud. This feature enables developers to invoke Lambda functions from their local development environment and provides options for event payloads, output formats, and logging.

It can be used with or without AWS SAM and can be combined with AWS SAM Accelerate for streamlined development and testing. Overall, the remote invoke feature simplifies serverless application testing in the AWS Cloud.

Amazon EventBridge

EventBridge announced an open-source connector for Kafka Connect, providing seamless integration between EventBridge and Kafka Connect. This connector simplifies the process of streaming events from Kafka topics to EventBridge, enabling you to build event-driven architectures with ease.

EventBridge has improved end-to-end latencies for event buses, delivering events up to 80% faster. This enables broader use in latency-sensitive applications such as industrial and medical applications, with the lower latencies applied by default across all AWS Regions at no extra cost.

Amazon Aurora Serverless v2

Amazon Aurora Serverless v2 is now available in four additional Regions, expanding the reach of this scalable and cost-effective serverless database option. With Aurora Serverless v2, you can benefit from automatic scaling, pause-and-resume capability, and pay-per-use pricing, enabling you to optimize costs and manage your databases more efficiently.

Amazon SNS

Amazon SNS now supports message data protection in five additional Regions, ensuring the security and integrity of your message payloads. With this feature, you can encrypt sensitive message data at rest and in transit, meeting compliance requirements and safeguarding your data.

Serverless Blog Posts

April 2023

Apr 27 – AWS Lambda now supports Java 17

Apr 27 – Optimizing Amazon EC2 Spot Instances with Spot Placement Scores

Apr 26 – Building private serverless APIs with AWS Lambda and Amazon VPC Lattice

Apr 25 – Implementing error handling for AWS Lambda asynchronous invocations

Apr 20 – Understanding techniques to reduce AWS Lambda costs in serverless applications

Apr 18 – Python 3.10 runtime now available in AWS Lambda

Apr 13 – Optimizing AWS Lambda extensions in C# and Rust

Apr 7 – Introducing AWS Lambda response streaming

May 2023

May 24 – Developing a serverless Slack app using AWS Step Functions and AWS Lambda

May 11 – Automating stopping and starting Amazon MWAA environments to reduce cost

May 10 – Monitor Amazon SNS-based applications end-to-end with AWS X-Ray active tracing

May 10 – Debugging SnapStart-enabled Lambda functions made easy with AWS X-Ray

May 10 – Implementing cross-account CI/CD with AWS SAM for container-based Lambda functions

May 3 – Extending a serverless, event-driven architecture to existing container workloads

May 3 – Patterns for building an API to upload files to Amazon S3

June 2023

Jun 7 – Ruby 3.2 runtime now available in AWS Lambda

Jun 5 – Implementing custom domain names for Amazon API Gateway private endpoints using a reverse proxy

Jun 22 – Deploying state machines incrementally with versions and aliases in AWS Step Functions

Jun 22 – Testing AWS Lambda functions with AWS SAM remote invoke

Videos

Serverless Office Hours – Tues 10AM PT

Weekly live virtual office hours. In each session we talk about a specific topic or technology related to serverless and open it up to helping you with your real serverless challenges and issues.

YouTube: youtube.com/serverlessland
Twitch: twitch.tv/aws

LinkedIn: linkedin.com/company/serverlessland

April 2023

Apr 4 – Serverless AI with ChatGPT and DALL-E

Apr 11 – Building Java apps with AWS SAM

Apr 18 – Managing EventBridge with Kubernetes

Apr 25 – Lambda response streaming

May 2023

May 2 – Automating your life with serverless 

May 9 – Building real-life asynchronous architectures

May 16 – Testing Serverless Applications

May 23 – Build faster with Amazon CodeCatalyst 

May 30 – Serverless networking with VPC Lattice

June 2023

June 6 – AWS AppSync: Private APIs and Merged APIs 

June 13 – Integrating EventBridge and Kafka

June 20 – AWS Copilot for serverless containers

June 27 – Serverless high performance modeling

FooBar Serverless YouTube channel

April 2023

Apr 6 – Designing a DynamoDB Table in 4 Steps: From Entities to Access Patterns

Apr 14 – Amazon CodeWhisperer – Improve developer productivity using machine learning (ML)

Apr 20 – Beginner’s Guide to DynamoDB with AWS CDK: Step-by-Step Tutorial for provisioning NoSQL Databases

Apr 27 – Build a WebApp that uses DynamoDB in 6 steps | DynamoDB Expressions

May 2023

May 4 – How to Migrate Data to DynamoDB?

May 11 – Load Testing DynamoDB: Observability and Performance tuning

May 18 – DynamoDB Streams – THE most powerful feature from DynamoDB for event-driven applications

May 25 – Track Application Events with DynamoDB streams and Email Notifications using EventBridge Pipes

June 2023

Jun 1 – How to filter messages based on the payload using Amazon SNS

Jun 8 – Getting started with Amazon Kinesis

Still looking for more?

The Serverless landing page has more information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials.

You can also follow the Serverless Developer Advocacy team on Twitter to see the latest news, follow conversations, and interact with the team.

Retrieving parameters and secrets with Powertools for AWS Lambda (TypeScript)

Post Syndicated from Pascal Vogel original https://aws.amazon.com/blogs/compute/retrieving-parameters-and-secrets-with-powertools-for-aws-lambda-typescript/

This post is written by Andrea Amorosi, Senior Solutions Architect and Pascal Vogel, Solutions Architect.

When building serverless applications using AWS Lambda, you often need to retrieve parameters, such as database connection details, API secrets, or global configuration values at runtime. You can make these parameters available to your Lambda functions via secure, scalable, and highly available parameter stores, such as AWS Systems Manager Parameter Store or AWS Secrets Manager.

The Parameters utility for Powertools for AWS Lambda (TypeScript) simplifies the integration of these parameter stores inside your Lambda functions. The utility provides high-level functions for retrieving secrets and parameters, integrates caching and transformations, and reduces the amount of boilerplate code you must write.

The Parameters utility supports the following parameter stores:

  • AWS Systems Manager (SSM) Parameter Store
  • AWS Secrets Manager
  • AWS AppConfig
  • Amazon DynamoDB

The Parameters utility is part of the Powertools for AWS Lambda (TypeScript), which you can use in both JavaScript and TypeScript code bases. Implementing guidance from the Serverless Applications Lens of the AWS Well-Architected Framework, Powertools provides utilities to ease the adoption of best practices such as distributed tracing, structured logging, and asynchronous business and application metrics.

For more details, see the Powertools for AWS Lambda (TypeScript) documentation on GitHub and the introduction blog post.

This blog post shows how to use the new Parameters utility to retrieve parameters and secrets in your JavaScript and TypeScript Lambda functions securely.

Getting started with the Parameters utility

Initial setup

The Powertools toolkit is modular, meaning that you can install the Parameters utility independently from the Logger, Tracing, or Metrics packages. Install the Parameters utility library in your project via npm:

npm install @aws-lambda-powertools/parameters

In addition, you must add the AWS SDK client for the parameter store you are planning to use. The Parameters utility supports AWS SDK v3 for JavaScript only, which allows the utility to be modular. You install only the needed SDK packages to keep your bundle size small.

Next, assign appropriate AWS Identity and Access Management (IAM) permissions to your Lambda function’s execution role that allow retrieving parameters from the parameter store.

The following sections illustrate how to perform the previously mentioned steps for some typical parameter retrieval scenarios.

Retrieving a single parameter from SSM Parameter Store

To retrieve parameters from SSM Parameter Store, install the AWS SDK client for SSM in addition to the Parameters utility:

npm install @aws-sdk/client-ssm

To retrieve an individual parameter, the Parameters utility provides the getParameter function:

import { getParameter } from '@aws-lambda-powertools/parameters/ssm';

export const handler = async (): Promise<void> => {
  // Retrieve a single parameter
  const parameter = await getParameter('/my/parameter');
  console.log(parameter);
};

Finally, you need to assign an IAM policy with the ssm:GetParameter permission to your Lambda function execution role. Apply the principle of least privilege by scoping the permission to the specific parameter resource as shown in the following policy example:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ssm:GetParameter"
      ],
      "Resource": [
        "arn:aws:ssm:AWS_REGION:AWS_ACCOUNT_ID:my/parameter"
      ]
    }
  ]
}

Adjusting cache TTL

By default, the retrieved parameters are cached in-memory for 5 seconds. This cached value is used for further invocations of the Lambda function until it expires. If your application requires a different behavior, the Parameters utility allows you to adjust the time-to-live (TTL) via the maxAge argument.

Building on the previous example, if you want to cache your retrieved parameter for 30 instead of 5 seconds, you can adapt your function code as follows:

import { getParameter } from '@aws-lambda-powertools/parameters/ssm';

export const handler = async (): Promise<void> => {
  // Retrieve a single parameter with a 30 seconds cache TTL
  const parameter = await getParameter('/my/parameter', { maxAge: 30 });
  console.log(parameter);
};

In other cases, you may want to always retrieve the latest value from the parameter store and ignore any cached value. To achieve this, set the forceFetch parameter to true:

import { getParameter } from '@aws-lambda-powertools/parameters/ssm';

export const handler = async (): Promise<void> => {
  // Always retrieve the latest value of a single parameter
  const parameter = await getParameter('/my/parameter', { forceFetch: true });
  console.log(parameter);
};

For details, see Always fetching the latest in the Powertools for AWS Lambda (TypeScript) documentation.

Decoding parameters stored in JSON or base64 format

If some of your parameters are stored in base64 or JSON, you can deserialize them via the Parameters utility’s transform argument.

Considering a parameter stored in SSM as JSON, it can be retrieved and deserialized as follows:

import { Transform } from '@aws-lambda-powertools/parameters';
import { getParameter } from '@aws-lambda-powertools/parameters/ssm';

export const handler = async (): Promise<void> => {
  // Retrieve and deserialize a single JSON parameter
  const valueFromJson = await getParameter('/my/json/parameter', { transform: Transform.JSON });
  console.log(valueFromJson);
};

The Parameters utility supports the transform argument for all parameter store providers and high-level functions. For details, see Deserializing values with transform parameters.

Working with encrypted parameters in SSM Parameter Store

SSM Parameter Store supports encrypted secure string parameters via the AWS Key Management Service (AWS KMS). The Parameters utility allows you to retrieve these encrypted parameters by adding the decrypt argument to your request.

For example, you could retrieve an encrypted parameter as follows:

import { getParameter } from '@aws-lambda-powertools/parameters/ssm';

export const handler = async (): Promise<void> => {
  // Decrypt the parameter
  const decryptedParameter = await getParameter('/my/encrypted/parameter', { decrypt: true });
  console.log(decryptedParameter);
};

In this case, the Lambda function execution role needs to have the kms:Decrypt IAM permission in addition to ssm:GetParameter.

Retrieving multiple parameters from SSM Parameter Store

Besides retrieving a single parameter using getParameter, you can also use getParameters to recursively retrieve multiple parameters under a SSM Parameter Store path, or getParametersByName to retrieve multiple distinct parameters by their full name.
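
For example, a minimal sketch retrieving every parameter under a shared path with getParameters (the path is a placeholder):

import { getParameters } from '@aws-lambda-powertools/parameters/ssm';

export const handler = async (): Promise<void> => {
  // Recursively retrieve all parameters stored under the given path
  const parameters = await getParameters('/my/path/prefix');
  for (const [key, value] of Object.entries(parameters ?? {})) {
    console.log(`${key}: ${value}`);
  }
};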

You can also apply custom caching, transform, or decrypt configurations per parameter when using getParametersByName. The following example retrieves three distinct parameters from SSM Parameter Store with different caching and transform configurations:

import { getParametersByName } from '@aws-lambda-powertools/parameters/ssm';
import type {
  SSMGetParametersByNameOptionsInterface
} from '@aws-lambda-powertools/parameters/ssm/types';

const props: Record<string, SSMGetParametersByNameOptionsInterface> = {
  '/develop/service/commons/telemetry/config': { maxAge: 300, transform: 'json' },
  '/no_cache_param': { maxAge: 0 },
  '/develop/service/payment/api/capture/url': {}, // When empty or undefined, it uses default values
};

export const handler = async (): Promise<void> => {
  // This returns an object with the parameter name as key
  const parameters = await getParametersByName(props);
  for (const [ key, value ] of Object.entries(parameters)) {
    console.log(`${key}: ${value}`);
  }
};

Retrieving multiple parameters requires the ssm:GetParameter and ssm:GetParameters IAM permissions to be present in the Lambda function execution role.

Retrieving secrets from Secrets Manager

To securely store sensitive parameters such as passwords or API keys for external services, Secrets Manager is a suitable option. To retrieve secrets from Secrets Manager using the Parameters utility, install the AWS SDK client for Secrets Manager in addition to the Parameters utility:

npm install @aws-sdk/client-secrets-manager

Now you can access a secret using its key as follows:

import { getSecret } from '@aws-lambda-powertools/parameters/secrets';

export const handler = async (): Promise<void> => {
  // Retrieve a single secret
  const secret = await getSecret('my-secret');
  console.log(secret);
};

Getting a secret from Secrets Manager requires you to add the secretsmanager:GetSecretValue IAM permission to your Lambda function execution role.

Retrieving an application configuration from AppConfig

If you plan to leverage feature flags or dynamic application configurations in your applications built on Lambda, AppConfig is a suitable option. The Parameters utility makes it easy to fetch configurations from AppConfig while benefitting from utility features such as caching and transformations.

For example, considering an AppConfig application called my-app with an environment called my-env, you can retrieve its configuration profile my-configuration as follows:

import { getAppConfig } from '@aws-lambda-powertools/parameters/appconfig';

export const handler = async (): Promise<void> => {
  // Retrieve a configuration, latest version
  const config = await getAppConfig('my-configuration', {
    environment: 'my-env',
    application: 'my-app'
  });
  console.log(config);
};

Retrieving a configuration requires both the appconfig:GetLatestConfiguration and appconfig:StartConfigurationSession IAM permissions to be attached to the Lambda function execution role.

Retrieving a parameter from a DynamoDB table

DynamoDB’s low latency and high flexibility make it a great option for storing parameters. To use DynamoDB as a parameter store via the Parameters utility, install the DynamoDB AWS SDK client and utility package in addition to the Parameters utility.

npm install @aws-sdk/client-dynamodb @aws-sdk/util-dynamodb

By default, the Parameters utility expects the DynamoDB table containing the parameters to have a partition key of id and an attribute called value. For example, assuming an item with an id of my-parameter and a value of my-value stored in a DynamoDB table called my-table, you can retrieve it as follows:

import { DynamoDBProvider } from '@aws-lambda-powertools/parameters/dynamodb';

const dynamoDBProvider = new DynamoDBProvider({ tableName: 'my-table' });

export const handler = async (): Promise<void> => {
  // Retrieve a value from DynamoDB
  const value = await dynamoDBProvider.get('my-parameter');
  console.log(value);
};

In case of retrieving a single parameter from DynamoDB, the Lambda function execution role needs to have the dynamodb:GetItem IAM permission.

The Parameters utility DynamoDB provider can also retrieve multiple parameters from a table with a single request via a DynamoDB query. See DynamoDB provider in the Powertools for AWS Lambda (TypeScript) documentation for details.
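
As a sketch, assuming a table that follows the provider’s multiple-values schema (a shared id partition key plus an sk sort key), the query could look like this:

import { DynamoDBProvider } from '@aws-lambda-powertools/parameters/dynamodb';

const dynamoDBProvider = new DynamoDBProvider({ tableName: 'my-table' });

export const handler = async (): Promise<void> => {
  // Query all items sharing the partition key 'my-params'; each item's
  // sort key becomes a key in the returned object
  const values = await dynamoDBProvider.getMultiple('my-params');
  console.log(values);
};

Retrieving multiple parameters this way requires the dynamodb:Query IAM permission instead of dynamodb:GetItem.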

Conclusion

This blog post introduces the Powertools for AWS Lambda (TypeScript) Parameters utility and demonstrates how it is used with different parameter stores. The Parameters utility allows you to retrieve secrets and parameters in your Lambda function from SSM Parameter Store, Secrets Manager, AppConfig, DynamoDB, and custom parameter stores. By using the utility, you get access to functionality such as caching and transformation, and reduce the amount of boilerplate code you need to write for your Lambda functions.

To learn more about the Parameters utility and its full set of functionality, take a look at the Powertools for AWS Lambda (TypeScript) documentation.

Share your feedback for Powertools for AWS Lambda (TypeScript) by opening a GitHub issue.

For more serverless learning resources, visit Serverless Land.

Implementing AWS Well-Architected best practices for Amazon SQS – Part 3

Post Syndicated from Pascal Vogel original https://aws.amazon.com/blogs/compute/implementing-aws-well-architected-best-practices-for-amazon-sqs-part-3/

This blog is written by Chetan Makvana, Senior Solutions Architect and Hardik Vasa, Senior Solutions Architect.

This is the third part of a three-part blog post series that demonstrates best practices for Amazon Simple Queue Service (Amazon SQS) using the AWS Well-Architected Framework.

This blog post covers best practices using the Performance Efficiency Pillar, Cost Optimization Pillar, and Sustainability Pillar. The inventory management example introduced in part 1 of the series will continue to serve as an example.

See also the other two parts of the series:

Performance Efficiency Pillar

The Performance Efficiency Pillar includes the ability to use computing resources efficiently to meet system requirements, and to maintain that efficiency as demand changes and technologies evolve. It recommends best practices for using trade-offs to improve performance, such as learning about design patterns and services and identifying how those trade-offs impact customers and efficiency.

By adopting these best practices, you can optimize the performance of SQS by employing appropriate configurations and techniques while considering trade-offs for the specific use case.

Best practice: Use action batching or horizontal scaling or both to increase throughput

To achieve high throughput in SQS, optimizing the performance of your message processing is crucial. You can use two techniques: horizontal scaling and action batching.

When dealing with high message volume, consider horizontally scaling the message producers and consumers by increasing the number of threads per client, by adding more clients, or both. By distributing the load across multiple threads or clients, you can handle a high number of messages concurrently.

Action batching distributes the latency of the batch action over the multiple messages in a batch request, rather than accepting the entire latency for a single message. Because each round trip carries more work, batch requests make more efficient use of threads and connections, improving throughput. You can combine batching with horizontal scaling to provide throughput with fewer threads, connections, and requests than individual message requests.
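As a minimal sketch of action batching on the consumer side (assuming a boto3 client and a placeholder queue URL), a single ReceiveMessage call can return up to 10 messages instead of one:

import boto3

sqs_client = boto3.client("sqs")

# Placeholder queue URL; replace with your own queue
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/InventoryUpdatesQueue"

# Receive up to 10 messages per API call instead of one at a time
response = sqs_client.receive_message(
    QueueUrl=queue_url,
    MaxNumberOfMessages=10,
)

for message in response.get("Messages", []):
    print(message["Body"])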

In the inventory management example that we introduced in part 1, this scaling behavior is managed by AWS for the AWS Lambda function responsible for backend processing. When a Lambda function subscribes to an SQS queue, Lambda polls the queue as it waits for the inventory update requests to arrive. Lambda consumes messages in batches, starting at five concurrent batches with five functions at a time. If there are more messages in the queue, Lambda adds up to 60 functions per minute, up to 1,000 functions, to consume those messages.

This means that Lambda can scale up to 1,000 concurrent Lambda functions processing messages from the SQS queue. Batching enables the inventory management system to handle a high volume of inventory update messages efficiently. This ensures real-time visibility into inventory levels and enhances the accuracy and responsiveness of inventory management operations.

Best practice: Trade-off between SQS standard and First-In-First-Out (FIFO) queues

SQS supports two types of queues: standard queues and FIFO queues. Understanding the trade-offs between SQS standard and FIFO queues allows you to make an informed choice that aligns with your application’s requirements and priorities. While SQS standard queues support nearly unlimited throughput, they sacrifice strict message ordering and occasionally deliver messages in an order different from the one in which they were sent. If maintaining the exact order of events is not critical for your application, utilizing SQS standard queues can provide significant benefits in terms of throughput and scalability.

On the other hand, SQS FIFO queues guarantee message ordering and exactly-once processing. This makes them suitable for applications where maintaining the order of events is crucial, such as financial transactions or event-driven workflows. However, FIFO queues have a lower throughput compared to standard queues. They can handle up to 3,000 transactions per second (TPS) per API method with batching, and 300 TPS without batching. Consider using FIFO queues only when the order of events is important for the application, otherwise use standard queues.

In the inventory management example, since the order of inventory records is not crucial, the potential out-of-order message delivery that can occur with SQS standard queues is unlikely to impact the inventory processing. This allows you to take advantage of the benefits provided by SQS standard queues, including their ability to handle a high number of transactions per second.
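If the order of events did matter, a FIFO variant of the queue could be created instead. The following is a hedged AWS CDK sketch; the construct ID and queue name are illustrative, not part of the example application:

# Hypothetical FIFO variant, shown for comparison; the inventory example uses a standard queue
fifo_queue = sqs.Queue(
    self,
    "InventoryUpdatesFifoQueue",
    queue_name="InventoryUpdatesQueue.fifo",  # FIFO queue names must end in .fifo
    fifo=True,
    content_based_deduplication=True,  # Deduplicate using a hash of the message body
    visibility_timeout=Duration.seconds(300),
)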

Cost Optimization Pillar

The Cost Optimization Pillar includes the ability to run systems to deliver business value at the lowest price. It recommends best practices to build and operate cost-aware workloads that achieve business outcomes while minimizing costs and allowing your organization to maximize its return on investment.

Best practice: Configure cost allocation tags for SQS to organize and identify SQS for cost allocation

A well-defined tagging strategy plays a vital role in establishing accurate chargeback or showback models. By assigning appropriate tags to resources, such as SQS queues, you can precisely allocate costs to different teams or applications. This level of granularity ensures fair and transparent cost allocation, enabling better financial management and accountability.

In the inventory management example, tagging the SQS queue allows for specific cost tracking under the Inventory department, enabling a more accurate assessment of expenses. The following code snippet shows how to tag the SQS queue using the AWS Cloud Development Kit (AWS CDK).

# Create the SQS queue
queue = sqs.Queue(
    self,
    "InventoryUpdatesQueue",
    visibility_timeout=Duration.seconds(300),
)

Tags.of(queue).add("department", "inventory")

Best practice: Use long polling

SQS offers two methods for receiving messages from a queue: short polling and long polling. By default, queues use short polling, where the ReceiveMessage request queries a subset of servers to identify available messages. Even if the query found no messages, SQS sends the response right away.

In contrast, long polling queries all servers in the SQS infrastructure to check for available messages. SQS responds only after collecting at least one available message, up to the maximum number of messages specified in the request. If no messages are immediately available, the request is held open until a message becomes available or the polling wait time expires; only then does SQS send an empty response.

Short polling provides immediate responses, making it suitable for applications that require quick feedback or near-real-time processing. On the other hand, long polling is ideal when efficiency is prioritized over immediate feedback. It reduces API calls, minimizes network traffic, and improves resource utilization, leading to cost savings.

In the inventory management example, long polling enhances the efficiency of processing inventory updates. It collects and retrieves available inventory update messages in a batch of 10, reducing the frequency of API requests. This batching approach optimizes resource utilization, minimizes network traffic, and reduces excessive API consumption, resulting in cost savings. You can configure this behavior using batch size and batch window:

# Add the SQS queue as a trigger to the Lambda function
sqs_to_dynamodb_function.add_event_source_mapping(
    "MyQueueTrigger", event_source_arn=queue.queue_arn, batch_size=10
)
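The snippet above configures batching for the Lambda event source. One way to enable long polling itself for all consumers is to set the queue’s receive wait time. A minimal AWS CDK sketch, assuming a 20-second wait (the maximum allowed):

# Enable long polling: each ReceiveMessage call waits up to 20 seconds for messages
queue = sqs.Queue(
    self,
    "InventoryUpdatesQueue",
    visibility_timeout=Duration.seconds(300),
    receive_message_wait_time=Duration.seconds(20),
)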

Best practice: Use batching

Batching messages together allows you to send or retrieve multiple messages in a single API call. This reduces the number of API requests required to process or retrieve messages compared to sending or retrieving messages individually. Since SQS pricing is based on the number of API requests, reducing the number of requests can lead to cost savings.

To send, receive, and delete messages, and to change the message visibility timeout for multiple messages with a single action, use Amazon SQS batch API actions. This also helps with transferring less data, effectively reducing the associated data transfer costs, especially if you have many messages.

In the context of the inventory management example, the CSV processing Lambda function groups 10 inventory records together in each API call, forming a batch. By doing so, the number of API requests is reduced by a factor of 10 compared to sending each record separately. This approach optimizes the utilization of API resources, streamlines message processing, and ultimately contributes to cost efficiency. Following is the code snippet from the CSV processing Lambda function showcasing the use of SendMessageBatch to send 10 messages with a single action.

# Parse the CSV records and send them to SQS as batch messages
csv_reader = csv.DictReader(csv_content.splitlines())
message_batch = []
for row in csv_reader:
    # Convert the row to JSON
    json_message = json.dumps(row)

    # Add the message to the batch
    message_batch.append(
        {"Id": str(len(message_batch) + 1), "MessageBody": json_message}
    )

    # Send the batch of messages when it reaches the maximum batch size (10 messages)
    if len(message_batch) == 10:
        sqs_client.send_message_batch(QueueUrl=queue_url, Entries=message_batch)
        message_batch = []
        print("Sent messages in batch")

# Flush any remaining messages that did not fill a complete batch of 10
if message_batch:
    sqs_client.send_message_batch(QueueUrl=queue_url, Entries=message_batch)

Best practice: Use temporary queues

For short-lived, lightweight messaging with synchronous two-way communication, you can use temporary queues. Temporary queues make it easy to create and delete many temporary messaging destinations without inflating your AWS bill. The key concept behind this is the virtual queue. Virtual queues let you multiplex many low-traffic queues onto a single SQS queue. Creating a virtual queue only instantiates a local buffer to hold messages for consumers as they arrive; there is no API call to SQS, and no costs are associated with creating a virtual queue.

The inventory management example does not use temporary queues. However, in use cases that involve short-lived, lightweight messaging with synchronous two-way communication, adopting the best practice of using temporary queues and virtual queues can enhance the overall efficiency, reduce costs, and simplify the management of messaging destinations.

Sustainability Pillar

The Sustainability Pillar provides best practices to meet sustainability targets for your AWS workloads. It encompasses considerations related to energy efficiency and resource optimization.

Best practice: Use long polling

Besides its cost optimization benefits explained as part of the Cost Optimization Pillar, long polling also plays a crucial role in improving resource efficiency by reducing API requests, minimizing network traffic, and optimizing resource utilization.

By collecting and retrieving available messages in batches, long polling reduces the frequency of API requests and the network traffic they generate, making more effective use of system resources.

Fewer API calls also mean less data transfer and fewer infrastructure operations, which improves energy efficiency. This enables the inventory management system to handle high message volumes while operating in a cost-efficient and resource-efficient manner.

Conclusion

This blog post explores best practices for SQS using the Performance Efficiency Pillar, Cost Optimization Pillar, and Sustainability Pillar of the AWS Well-Architected Framework. We cover techniques such as batch processing, message batching, and scaling considerations. We also discuss important considerations, such as resource utilization, minimizing resource waste, and reducing cost.

This three-part blog post series covers a wide range of best practices, spanning the Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, and Sustainability Pillars of the AWS Well-Architected Framework. By following these guidelines and leveraging the power of the AWS Well-Architected Framework, you can build robust, secure, and efficient messaging systems using SQS.

For more serverless learning resources, visit Serverless Land.

Implementing AWS Well-Architected best practices for Amazon SQS – Part 2

Post Syndicated from Pascal Vogel original https://aws.amazon.com/blogs/compute/implementing-aws-well-architected-best-practices-for-amazon-sqs-part-2/

This blog is written by Chetan Makvana, Senior Solutions Architect and Hardik Vasa, Senior Solutions Architect.

This is the second part of a three-part blog post series that demonstrates implementing best practices for Amazon Simple Queue Service (Amazon SQS) using the AWS Well-Architected Framework.

This blog post covers best practices using the Security Pillar and Reliability Pillar of the AWS Well-Architected Framework. The inventory management example introduced in part 1 of the series will continue to serve as an example.

See also the other two parts of the series:

Security Pillar

The Security Pillar includes the ability to protect data, systems, and assets and to take advantage of cloud technologies to improve your security. This pillar recommends putting in place practices that influence security. Using these best practices, you can protect data in transit (as it travels to and from SQS) and at rest (while stored on disk in SQS), and control who can do what with SQS.

Best practice: Configure server-side encryption

If your application has a compliance requirement such as HIPAA, GDPR, or PCI-DSS mandating encryption at rest, if you are looking to improve data security to protect against unauthorized access, or if you are just looking for simplified key management for the messages sent to the SQS queue, you can leverage Server-Side Encryption (SSE) to protect the privacy and integrity of your data stored on SQS.

SQS and AWS Key Management Service (KMS) offer two options for configuring server-side encryption. SQS-managed encryption keys (SSE-SQS) provide automatic encryption of messages stored in SQS queues using AWS-managed keys. This feature is enabled by default when you create a queue. If you choose to use your own AWS KMS keys to encrypt and decrypt messages stored in SQS, you can use the SSE-KMS feature.

Amazon SQS Encryption Settings

SSE-KMS provides greater control and flexibility over encryption keys, while SSE-SQS simplifies the process by managing the encryption keys for you. Both options help you protect sensitive data and comply with regulatory requirements by encrypting data at rest in SQS queues. Note that SSE-SQS only encrypts the message body and not the message attributes.

In the inventory management example introduced in part 1, an AWS Lambda function responsible for CSV processing sends incoming messages to an SQS queue when an inventory updates file is dropped into the Amazon Simple Storage Service (Amazon S3) bucket. SQS encrypts these messages in the queue using SSE-SQS. When a backend processing Lambda function polls messages from the queue, the encrypted message is decrypted, and the function inserts inventory updates into Amazon DynamoDB.

The AWS Cloud Development Kit (AWS CDK) code uses SSE-SQS, the default encryption type. However, the following AWS CDK code shows how to encrypt the queue with SSE-KMS.

# Create the SQS queue encrypted with an AWS managed KMS key
queue = sqs.Queue(
    self,
    "InventoryUpdatesQueue",
    visibility_timeout=Duration.seconds(300),
    encryption=sqs.QueueEncryption.KMS_MANAGED,
)
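For full control over the key (rotation, key policy, cross-account access), you can instead supply a customer managed KMS key. A hedged sketch, assuming the aws_kms module is imported as kms; the key construct ID is illustrative:

# Customer managed key; enables custom rotation and key policy control
key = kms.Key(self, "InventoryUpdatesKey", enable_key_rotation=True)

queue = sqs.Queue(
    self,
    "InventoryUpdatesQueue",
    visibility_timeout=Duration.seconds(300),
    encryption=sqs.QueueEncryption.KMS,
    encryption_master_key=key,
)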

Best practice: Implement least-privilege access using access policy

For securing your resources in AWS, implementing least-privilege access is critical. This means granting users and services the minimum level of access required to perform their tasks. Least-privilege access provides better security, allows you to meet your compliance requirements, and offers accountability via a clear audit trail of who accessed what resources and when.

By implementing least-privilege access using access policies, you can help reduce the risk of security breaches and ensure that your resources are only accessed by authorized users and services. AWS Identity and Access Management (IAM) policies apply to users, groups, and roles, while resource-based policies apply to AWS resources such as SQS queues. To implement least-privilege access, it’s essential to start by defining what actions are required for each user or service to perform their tasks.

In the inventory management example, the CSV processing Lambda function doesn’t perform any other task beyond parsing the inventory updates file and sending the inventory records to the SQS queue for further processing. To ensure that the function can send messages to the SQS queue, add a queue policy that grants sqs:SendMessage only to the IAM role that the Lambda function assumes. By restricting the queue to the Lambda function’s IAM role, you establish a secure and controlled communication channel. The Lambda function can only interact with the SQS queue and doesn’t have unnecessary access or permissions that might compromise the system’s security.

# Create pre-processing Lambda function
csv_processing_to_sqs_function = _lambda.Function(
    self,
    "CSVProcessingToSQSFunction",
    runtime=_lambda.Runtime.PYTHON_3_8,
    code=_lambda.Code.from_asset("sqs_blog/lambda"),
    handler="CSVProcessingToSQSFunction.lambda_handler",
    role=role,
    tracing=Tracing.ACTIVE,
)

# Define the queue policy to allow messages from the Lambda function's role only
policy = iam.PolicyStatement(
    actions=["sqs:SendMessage"],
    effect=iam.Effect.ALLOW,
    principals=[iam.ArnPrincipal(role.role_arn)],
    resources=[queue.queue_arn],
)

queue.add_to_resource_policy(policy)

Best practice: Allow only encrypted connections over HTTPS using aws:SecureTransport

It is essential to have a secure and reliable method for transferring data between AWS services and on-premises environments or other external systems. With HTTPS, a network-based attacker cannot eavesdrop on network traffic or manipulate it using an attack such as man-in-the-middle.

With SQS, you can choose to allow only encrypted connections over HTTPS using the aws:SecureTransport condition key in the queue policy. With this condition in place, any requests made over non-secure HTTP receive a 400 InvalidSecurity error from SQS.

In the inventory management example, the CSV processing Lambda function sends inventory updates to the SQS queue. To ensure secure data transfer, the Lambda function uses the HTTPS endpoint provided by SQS. This guarantees that the communication between the Lambda function and the SQS queue remains encrypted and resistant to potential security threats.

# Create an IAM policy statement allowing only HTTPS access to the queue
secure_transport_policy = iam.PolicyStatement(
    effect=iam.Effect.DENY,
    actions=["sqs:*"],
    principals=[iam.AnyPrincipal()],  # queue policies require an explicit principal
    resources=[queue.queue_arn],
    conditions={
        "Bool": {
            "aws:SecureTransport": "false",
        },
    },
)

# Attach the statement to the queue's resource policy
queue.add_to_resource_policy(secure_transport_policy)

Best practice: Use attribute-based access controls (ABAC)

Some use-cases require granular access control. For example, authorizing a user based on user roles, environment, department, or location. Additionally, dynamic authorization is required based on changing user attributes. In this case, you need an access control mechanism based on user attributes.

Attribute-based access controls (ABAC) is an authorization strategy that defines permissions based on tags attached to users and AWS resources. With ABAC, you can use tags to configure IAM access permissions and policies for your queues. ABAC hence enables you to scale your permission management easily. You can author a single permission policy in IAM using tags created for each business role, and no longer need to update the policy when adding new resources.

ABAC for SQS queues enables two key use cases:

  • Tag-based access control: use tags to control access to your SQS queues, including control plane and data plane API calls.
  • Tag-on-create: enforce tags at the time of creation of an SQS queue and deny the creation of SQS resources without tags.
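As a minimal sketch of tag-based access control (the tag key, value, and resource scope are illustrative), an IAM policy statement can allow sending messages only to queues carrying a matching tag:

# Hypothetical ABAC statement: allow SendMessage only on queues tagged department=inventory
abac_statement = iam.PolicyStatement(
    effect=iam.Effect.ALLOW,
    actions=["sqs:SendMessage"],
    resources=["*"],
    conditions={
        "StringEquals": {"aws:ResourceTag/department": "inventory"}
    },
)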

Reliability Pillar

The Reliability Pillar encompasses the ability of a workload to perform its intended function correctly and consistently when it’s expected to. By leveraging the best practices outlined in this pillar, you can enhance the way you manage messages in SQS.

Best practice: Configure dead-letter queues

In a distributed system, when messages flow between sub-systems, there is a possibility that some messages may not be processed right away. This could be because of the message being corrupted or downstream processing being temporarily unavailable. In such situations, it is not ideal for the bad message to block other messages in the queue.

Dead Letter Queues (DLQs) in SQS can improve the reliability of your application by providing an additional layer of fault tolerance, simplifying debugging, providing a retry mechanism, and separating problematic messages from the main queue. By incorporating DLQs into your application architecture, you can build a more robust and reliable system that can handle errors and maintain high levels of performance and availability.

In the inventory management example, a DLQ plays a vital role in adding message resiliency and preventing situations where a single bad message blocks the processing of other messages. If the backend Lambda function fails after multiple attempts, the inventory update message is redirected to the DLQ. By inspecting these unconsumed messages, you can troubleshoot and redrive them to the primary queue or to a custom destination using the DLQ redrive feature. You can also automate redrive programmatically using the DLQ redrive APIs. This ensures accurate inventory updates and prevents data loss.
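As a sketch of programmatic redrive (both queue ARNs are placeholders), the SQS StartMessageMoveTask API moves messages from the DLQ back to a destination queue:

import boto3

sqs_client = boto3.client("sqs")

# Move messages from the DLQ back to the main queue; ARNs are placeholders
response = sqs_client.start_message_move_task(
    SourceArn="arn:aws:sqs:us-east-1:123456789012:InventoryUpdatesDlq",
    DestinationArn="arn:aws:sqs:us-east-1:123456789012:InventoryUpdatesQueue",
)
print(response["TaskHandle"])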

The following AWS CDK code snippet shows how to create a DLQ for the source queue and sets up a DLQ policy to only allow messages from the source SQS queue. It is recommended not to set the max_receive_count value to 1, especially when using a Lambda function as the consumer, to avoid accumulating many messages in the DLQ.

# Create the Dead Letter Queue (DLQ)
dlq = sqs.Queue(self, "InventoryUpdatesDlq", visibility_timeout=Duration.seconds(300))

# Create the SQS queue with DLQ setting
queue = sqs.Queue(
    self,
    "InventoryUpdatesQueue",
    visibility_timeout=Duration.seconds(300),
    dead_letter_queue=sqs.DeadLetterQueue(
        max_receive_count=3,  # Number of retries before sending the message to the DLQ
        queue=dlq,
    ),
)
# Create an SQS queue policy to allow the source queue to send messages to the DLQ
policy = iam.PolicyStatement(
    effect=iam.Effect.ALLOW,
    actions=["sqs:SendMessage"],
    principals=[iam.ServicePrincipal("sqs.amazonaws.com")],  # SQS moves the messages on your behalf
    resources=[dlq.queue_arn],
    conditions={"ArnEquals": {"aws:SourceArn": queue.queue_arn}},
)

# Attach the policy to the DLQ, which is the resource the statement protects
dlq.add_to_resource_policy(policy)

Best practice: Process messages in a timely manner by configuring the right visibility timeout

Setting the appropriate visibility timeout is crucial for efficient message processing in SQS. The visibility timeout is the period during which SQS prevents other consumers from receiving and processing a message after it has been polled from the queue.

To determine the ideal visibility timeout for your application, consider your specific use case. If your application typically processes messages within a few seconds, set the visibility timeout to a few minutes. This ensures that multiple consumers don’t process the message simultaneously. If your application requires more time to process messages, consider breaking them down into smaller units or batching them to improve performance.

If a message fails to process and is returned to the queue, it will not be available for processing again until the visibility timeout period has elapsed. Increasing the visibility timeout will increase the overall latency of your application. Therefore, it’s important to balance the tradeoff between reducing the likelihood of message duplication and maintaining a responsive application.

In the inventory management example, setting the right visibility timeout helps the application fail fast and improve the message processing times. Since the Lambda function typically processes messages within milliseconds, a visibility timeout of 30 seconds is set in the following AWS CDK code snippet.

queue = sqs.Queue(
    self,
    "InventoryUpdatesQueue",
    visibility_timeout=Duration.seconds(30),
)

It is recommended to keep the SQS queue visibility timeout at least six times the Lambda function timeout, plus the value of MaximumBatchingWindowInSeconds. This allows the Lambda function to retry the messages if the invocation fails.
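A sketch of that rule in AWS CDK, assuming a 30-second function timeout and a 5-second batching window (both illustrative values):

# 6 x function timeout (30s) + MaximumBatchingWindowInSeconds (5s) = 185 seconds
queue = sqs.Queue(
    self,
    "InventoryUpdatesQueue",
    visibility_timeout=Duration.seconds(6 * 30 + 5),
)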

Conclusion

This blog post explores best practices for SQS using the Security Pillar and Reliability Pillar of the AWS Well-Architected Framework. We discuss various best practices and considerations to ensure the security of SQS. By following these best practices, you can create a robust and secure messaging system using SQS. We also highlight fault tolerance and processing a message in a timely manner as important aspects of building reliable applications using SQS.

The next part of this blog post series focuses on the Performance Efficiency Pillar, Cost Optimization Pillar, and Sustainability Pillar of the AWS Well-Architected Framework and explore best practices for SQS.

For more serverless learning resources, visit Serverless Land.

Implementing AWS Well-Architected best practices for Amazon SQS – Part 1

Post Syndicated from Pascal Vogel original https://aws.amazon.com/blogs/compute/implementing-aws-well-architected-best-practices-for-amazon-sqs-part-1/

This blog is written by Chetan Makvana, Senior Solutions Architect and Hardik Vasa, Senior Solutions Architect.

Amazon Simple Queue Service (Amazon SQS) is a fully managed message queuing service that makes it easy to decouple and scale microservices, distributed systems, and serverless applications. AWS customers have constantly discovered powerful new ways to build more scalable, elastic, and reliable applications using SQS. You can leverage SQS in a variety of use-cases requiring loose coupling and high performance at any level of throughput, while reducing cost by only paying for value and remaining confident that no message is lost. When building applications with Amazon SQS, it is important to follow architectural best practices.

To help you identify and implement these best practices, AWS provides the AWS Well-Architected Framework for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems in the AWS Cloud. Built around six pillars (operational excellence, security, reliability, performance efficiency, cost optimization, and sustainability), AWS Well-Architected provides a consistent approach for customers and partners to evaluate architectures and implement scalable designs.

This three-part blog series covers each pillar of the AWS Well-Architected Framework to implement best practices for SQS. This blog post, part 1 of the series, discusses best practices using the Operational Excellence Pillar of the AWS Well-Architected Framework.

See also the other two parts of the series:

Solution overview

Solution architecture for Inventory Updates Process

This solution architecture shows an example of an inventory management system. The system leverages Amazon Simple Storage Service (Amazon S3), AWS Lambda, Amazon SQS, and Amazon DynamoDB to streamline inventory operations and ensure accurate inventory levels. The system handles frequent updates from multiple sources, such as suppliers, warehouses, and retail stores, which are received as CSV files.

These CSV files are then uploaded to an S3 bucket, consolidating and securing the inventory data for the inventory management system’s access. The system uses a Lambda function to read and parse the CSV file, extracting individual inventory update records. This function transforms each inventory update record into a message and sends it to an SQS queue. A backend Lambda function continually polls the SQS queue for new messages. Upon receiving a message, it retrieves the inventory update details and updates the inventory levels in DynamoDB accordingly.

This ensures that the inventory quantities for each product are accurate and reflect the latest changes. This way, the inventory management system provides real-time visibility into inventory levels across different locations and suppliers, enabling the company to monitor product availability with precision. Find the example code for this solution in the GitHub repository.

This example is used throughout this blog series to highlight how SQS best practices can be implemented based on the AWS Well-Architected Framework.

Operational Excellence Pillar

The Operational Excellence Pillar includes the ability to support development and run workloads effectively, gain insight into their operation, and continuously improve supporting processes and procedures to deliver business value. To achieve operational excellence, the pillar recommends best practices such as defining workload metrics and implementing transaction traceability. This enables organizations to gain valuable insights into their operations, identify potential issues, and optimize services accordingly to improve customer experience. Furthermore, understanding the health of an application is critical to ensuring that it is functioning as expected.

Best practice: Use infrastructure as code to deploy SQS

Infrastructure as Code (IaC) helps you model, provision, and manage your cloud resources. One of the primary advantages of IaC is that it simplifies infrastructure management. With IaC, you can quickly and easily replicate your environment to multiple AWS Regions with a single turnkey solution. This makes it easy to manage your infrastructure, regardless of where your resources are located. Additionally, IaC enables you to create, deploy, and maintain infrastructure repeatably, in a programmatic, descriptive, and declarative way. This reduces errors caused by manual processes, such as creating resources in the AWS Management Console. With IaC, you can easily control and track changes in your infrastructure, which makes it easier to maintain and troubleshoot your systems.

For managing SQS resources, you can use different IaC tools like AWS Serverless Application Model (AWS SAM), AWS CloudFormation, or AWS Cloud Development Kit (AWS CDK). There are also third-party solutions for creating SQS resources, such as the Serverless Framework. AWS CDK is a popular choice because it allows you to provision AWS resources using familiar programming languages such as Python, Java, TypeScript, Go, JavaScript, and C#/.Net.

This blog series showcases the use of AWS CDK with Python to demonstrate best practices for working with SQS. For example, the following AWS CDK code creates a new SQS queue:

from aws_cdk import (
    Duration,
    Stack,
    aws_sqs as sqs,
)
from constructs import Construct


class SqsCdBlogStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # The code that defines your stack goes here

        # example resource
        queue = sqs.Queue(
            self,
            "InventoryUpdatesQueue",
            visibility_timeout=Duration.seconds(300),
        )

Best practice: Configure CloudWatch alarms for ApproximateAgeofOldestMessage

It is important to understand Amazon CloudWatch metrics and dimensions for SQS, to have a plan in place to assess its behavior, and to add custom metrics where necessary. Once you have a good understanding of the metrics, it is essential to identify the key metrics that are most relevant to your use case and set up appropriate alerts to monitor them.

One of the key metrics that SQS provides is the ApproximateAgeOfOldestMessage metric. By monitoring this metric, you can determine the age of the oldest message in the queue, and take appropriate action to ensure that messages are processed in a timely manner. To set up alerts for the ApproximateAgeOfOldestMessage metric, you can use CloudWatch alarms. You configure these alarms to issue alerts when messages remain in the queue for extended periods of time. You can use these alerts to act, for instance by scaling up consumers to process messages more quickly or investigating potential issues with message processing.

In the inventory management example, leveraging the ApproximateAgeOfOldestMessage metric provides valuable insights into the health and performance of the SQS queue. By monitoring this metric, you can detect processing delays, optimize performance, and ensure that inventory updates are processed within the desired timeframe. This ensures that your inventory levels remain accurate and up-to-date. The following code creates an alarm that triggers if the oldest inventory update request has been in the queue for longer than the 600-second threshold.

# Create a CloudWatch alarm for the ApproximateAgeOfOldestMessage metric
alarm = cloudwatch.Alarm(
    self,
    "OldInventoryUpdatesAlarm",
    alarm_name="OldInventoryUpdatesAlarm",
    metric=queue.metric_approximate_age_of_oldest_message(),
    threshold=600,  # Specify your desired threshold value in seconds
    evaluation_periods=1,
    comparison_operator=cloudwatch.ComparisonOperator.GREATER_THAN_OR_EQUAL_TO_THRESHOLD,
)
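An alarm on its own only changes state. To act on it automatically, attach an alarm action; the following sketch assumes an SNS topic and the aws_sns and aws_cloudwatch_actions modules (the topic and aliases are illustrative):

# Assumes: from aws_cdk import aws_sns as sns, aws_cloudwatch_actions as cw_actions
topic = sns.Topic(self, "InventoryAlarmTopic")  # hypothetical notification topic
alarm.add_alarm_action(cw_actions.SnsAction(topic))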

Best practice: Add a tracing header while sending a message to the queue to provide distributed tracing capabilities for faster troubleshooting

By implementing distributed tracing, you can gain a clear understanding of the flow of messages in SQS queues, identify any bottlenecks or potential issues, and proactively react to any signals that indicate an unhealthy state. Tracing provides a wider continuous view of an application and helps to follow a user journey or transaction through the application.

AWS X-Ray is an example of a distributed tracing solution that integrates with Amazon SQS to trace messages that are passed through an SQS queue. When using the X-Ray SDK, SQS can propagate tracing headers to maintain trace continuity and enable tracking, analysis, and debugging throughout downstream services. SQS supports tracing headers through the Default HTTP header and the AWSTraceHeader System Attribute. AWSTraceHeader is available for use even when auto-instrumentation through the X-Ray SDK is not, for example, when building a tracing SDK for a new language. If you are using a Lambda downstream consumer, trace context propagation is automatic.
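As a hedged sketch of manual propagation (the queue URL, message body, and trace ID value are placeholders), a producer can set the AWSTraceHeader message system attribute when sending a message:

import boto3

sqs_client = boto3.client("sqs")

# Propagate a trace header explicitly when X-Ray auto-instrumentation is not available
sqs_client.send_message(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/InventoryUpdatesQueue",
    MessageBody='{"product_id": "123", "quantity": 10}',
    MessageSystemAttributes={
        "AWSTraceHeader": {
            "DataType": "String",
            "StringValue": "Root=1-5759e988-bd862e3fe1be46a994272793",
        }
    },
)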

In the inventory management example, by utilizing distributed tracing with X-Ray for SQS, you can gain deep insights into the performance, behavior, and dependencies of the inventory management system. This visibility enables you to optimize performance, troubleshoot issues more effectively, and ensure the smooth and efficient operation of the system. The following code sets up a CSV processing Lambda function and a backend processing Lambda function with active tracing enabled. The Lambda function automatically receives the X-Ray TraceId from SQS.

# Create pre-processing Lambda function
csv_processing_to_sqs_function = _lambda.Function(
    self,
    "CSVProcessingToSQSFunction",
    runtime=_lambda.Runtime.PYTHON_3_8,
    code=_lambda.Code.from_asset("sqs_blog/lambda"),
    handler="CSVProcessingToSQSFunction.lambda_handler",
    role=role,
    tracing=Tracing.ACTIVE,  # Enable active tracing with X-Ray
)

# Create a post-processing Lambda function with the specified role
sqs_to_dynamodb_function = _lambda.Function(
    self,
    "SQSToDynamoDBFunction",
    runtime=_lambda.Runtime.PYTHON_3_8,
    code=_lambda.Code.from_asset("sqs_blog/lambda"),
    handler="SQSToDynamoDBFunction.lambda_handler",
    role=role,
    tracing=Tracing.ACTIVE,  # Enable active tracing with X-Ray
)

Conclusion

This blog post explores best practices for SQS with a focus on the Operational Excellence Pillar of the AWS Well-Architected Framework. We explore key considerations for ensuring the smooth operation and optimal performance of applications using SQS. Additionally, we explore the advantages of infrastructure as code in simplifying infrastructure management and showcase how AWS CDK can be used to provision and manage SQS resources.

The next part of this blog post series addresses the Security Pillar and Reliability Pillar of the AWS Well-Architected Framework and explores best practices for SQS.

For more serverless learning resources, visit Serverless Land.

Testing AWS Lambda functions with AWS SAM remote invoke

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/testing-aws-lambda-functions-with-aws-sam-remote-invoke/

Developers are taking advantage of event driven architecture (EDA) to build large distributed applications. To build these applications, developers are using managed services like AWS Lambda, AWS Step Functions, and Amazon EventBridge to handle compute, orchestration, and choreography. Since these services run in the cloud, developers are also looking for ways to test in the cloud. With this in mind, AWS SAM is adding a new feature to the AWS SAM CLI called remote invoke.

AWS SAM remote invoke enables developers to invoke a Lambda function in the AWS Cloud from their development environment. The feature has several options for identifying the Lambda function to invoke, the payload event, and the output type.

Using remote invoke

To test the remote invoke feature, there is a small AWS SAM application that comprises two AWS Lambda functions. The TranslateFunction takes a text string and translates it to the target language using the AI/ML service Amazon Translate. The StreamFunction generates data in a streaming format. To run these demonstrations, be sure to install the latest AWS SAM CLI.

To deploy the application, follow these steps:

  1. Clone the repository:
    git clone https://github.com/aws-samples/aws-sam-remote-invoke-example
  2. Change to the root directory of the repository:
    cd aws-sam-remote-invoke-example
  3. Build the AWS Lambda artifacts (use the --use-container option to ensure Python 3.10 and Node 18 are present. If these are both set up on your machine, you can ignore this flag):
    sam build --use-container
  4. Deploy the application to your AWS account:
    sam deploy --guided
  5. Name the application “remote-test” and choose all defaults.

AWS SAM can now remotely invoke the Lambda functions deployed with this application. Use the following command to test the TranslateFunction:

sam remote invoke --stack-name remote-test --event '{"message":"I am testing the power of remote invocation", "target-language":"es"}' TranslateFunction

This is a quick way to test a small event. However, developers often deal with large complex payloads. The AWS SAM remote invoke function also allows an event to be passed as a file. Use the following command to test:

sam remote invoke --stack-name remote-test --event-file './events/translate-event.json' TranslateFunction

With either of these methods, AWS SAM returns the response from the Lambda function as if it were called from a service like Amazon API Gateway. However, AWS SAM also offers the ability to get the raw response as returned from the Python software development kit (SDK), boto3. This format provides additional information such as the version that you invoked, whether any retries were attempted, and more. To retrieve this output, run the invocation with the additional --output parameter with the value of json.

sam remote invoke --stack-name remote-test --event '{"message": "I am testing the power of remote invocation", "target-language": "es"}' --output json TranslateFunction

Full output from SDK

It is also possible to invoke Lambda functions that are not created in AWS SAM. Using the name of a Lambda function, AWS SAM can remotely invoke any Lambda function that you have permission to invoke. When you deployed the sample application, AWS SAM printed the names of the Lambda functions in the console. Use the following command to print the output again:

sam list stack-outputs --stack-name remote-test

Using the output for the TranslateFunctionName, run:

sam remote invoke --event '{"message": "Testing direct access of the function", "target-language": "fr"}' <TranslateFunctionName>

Lambda recently added support for streaming responses from Lambda functions. Streaming functions do not wait until the entire response is available before they respond to the client. To show this, the StreamFunction generates multiple chunks of text and sends them over a period of time.

To invoke the function, run:

sam remote invoke --stack-name remote-test StreamFunction

Extending remote invoke

The AWS SDKs offer different options when invoking Lambda functions via the Lambda service. Behind the scenes, AWS SAM uses boto3 to power the remote invoke functionality. To make full use of the SDK options for Lambda function invocation, AWS SAM offers a --parameter flag that can be used multiple times.

For example, you may want to run an invocation as a dry run only. This type of invocation validates parameter values and verifies that the caller has permission to invoke the function, without actually running it. The command looks like the following:

sam remote invoke --stack-name remote-test --event '{"message": "I am testing the power of remote invocation", "target-language": "es"}' --parameter InvocationType=DryRun --output json TranslateFunction

In a second example, I want to invoke a specific version of the Lambda function:

sam remote invoke --stack-name remote-test --event '{"message": "I am testing the power of remote invocation", "target-language": "es"}' --parameter Qualifier='$LATEST' TranslateFunction

If you need both options:

sam remote invoke --stack-name remote-test --event '{"message": "I am testing the power of remote invocation", "target-language": "es"}' --parameter InvocationType=DryRun --parameter Qualifier='$LATEST' --output json TranslateFunction
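Because remote invoke wraps boto3’s Invoke API, the equivalent direct SDK call looks roughly like the following sketch (the function name is a placeholder for the deployed TranslateFunction name):

import boto3

lambda_client = boto3.client("lambda")

# Roughly what sam remote invoke issues under the hood (simplified)
response = lambda_client.invoke(
    FunctionName="remote-test-TranslateFunction",  # placeholder; use your deployed name
    InvocationType="DryRun",
    Qualifier="$LATEST",
    Payload=b'{"message": "I am testing the power of remote invocation", "target-language": "es"}',
)
print(response["StatusCode"])  # 204 indicates a successful dry run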

Logging

When developing distributed applications, logging is a critical tool to trace the state of a request across decoupled microservices. AWS SAM offers the sam logs functionality to help view aggregated logs and traces from Amazon CloudWatch and AWS X-Ray, respectively. However, when testing individual functions, developers want contextual logs pinpointed to a specific invocation. The new remote invoke function provides these logs by default. Returning to the TranslateFunction, run the following command again:

sam remote invoke --stack-name remote-test --event '{"message": "I am testing the power of remote invocation", "target-language": "es"}' TranslateFunction

Logging response from remote invoke

The remote invocation returns the response from the Lambda function, any logging from within the Lambda function, followed by the final report from the Lambda service about the invocation itself.

Combining remote invoke with AWS SAM Accelerate

Developers are constantly striving to remove complexity and friction and improve speed and agility in the development pipeline. To help serverless developers towards this goal, the AWS SAM team released a feature called AWS SAM Accelerate. AWS SAM Accelerate is a series of features that move debugging and testing from the local machine to the cloud.

To show how AWS SAM Accelerate and remote invoke can work together, follow these steps:

  1. In a separate terminal, start the AWS SAM sync process with the watch option:
    sam sync --stack-name remote-test --use-container --watch
  2. In a second window or tab, run the remote invoke function:
    sam remote invoke --stack-name remote-test --event-file './events/translate-event.json' TranslateFunction

The combination of these two options provides a robust auto-deployment and testing environment. During iterations of code in the Lambda function, each time you save the file, AWS SAM syncs the code and any dependencies to the cloud. As needed, the remote invoke is then run to verify the code works as expected, with logging provided for each execution.

Conclusion

Serverless developers are looking for the most efficient way to test their applications in the AWS Cloud. They want to invoke an AWS Lambda function quickly without having to mock security, external services, or other environment variables. This blog shows how to use the new AWS SAM remote invoke feature to do just that.

This post shows how to invoke the Lambda function, change the payload type and location, and change the output format. It explains using this feature in conjunction with the AWS SAM Accelerate features to streamline the serverless development and testing process.

For more serverless learning resources, visit Serverless Land.