Tag Archives: serverless

AWS Week in Review – May 16, 2022

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/aws/aws-week-in-review-may-16-2022/

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

I had been on the road for the last five weeks and attended many of the AWS Summits in Europe. It was great to talk to so many of you in person. The Serverless Developer Advocates are going around many of the AWS Summits with the Serverlesspresso booth. If you attend an event that has the booth, say “Hi 👋” to my colleagues, and have a coffee while asking all your serverless questions. You can find all the upcoming AWS Summits in the events section at the end of this post.

Last week’s launches
Here are some launches that got my attention during the previous week.

AWS Step Functions announced a new console experience to debug your state machine executions – Now you can opt-in to the new console experience of Step Functions, which makes it easier to analyze, debug, and optimize Standard Workflows. The new page allows you to inspect executions using three different views: graph, table, and event view, and add many new features to enhance the navigation and analysis of the executions. To learn about all the features and how to use them, read Ben’s blog post.

Example on how the Graph View looks

Example on how the Graph View looks

AWS Lambda now supports Node.js 16.x runtime – Now you can start using the Node.js 16 runtime when you create a new function or update your existing functions to use it. You can also use the new container image base that supports this runtime. To learn more about this launch, check Dan’s blog post.

AWS Amplify announces its Android library designed for Kotlin – The Amplify Android library has been rewritten for Kotlin, and now it is available in preview. This new library provides better debugging capacities and visibility into underlying state management. And it is also using the new AWS SDK for Kotlin that was released last year in preview. Read the What’s New post for more information.

Three new APIs for batch data retrieval in AWS IoT SiteWise – With this new launch AWS IoT SiteWise now supports batch data retrieval from multiple asset properties. The new APIs allow you to retrieve current values, historical values, and aggregated values. Read the What’s New post to learn how you can start using the new APIs.

AWS Secrets Manager now publishes secret usage metrics to Amazon CloudWatch – This launch is very useful to see the number of secrets in your account and set alarms for any unexpected increase or decrease in the number of secrets. Read the documentation on Monitoring Secrets Manager with Amazon CloudWatch for more information.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
Some other launches and news that you may have missed:

IBM signed a deal with AWS to offer its software portfolio as a service on AWS. This allows customers using AWS to access IBM software for automation, data and artificial intelligence, and security that is built on Red Hat OpenShift Service on AWS.

Podcast Charlas Técnicas de AWS – If you understand Spanish, this podcast is for you. Podcast Charlas Técnicas is one of the official AWS podcasts in Spanish. This week’s episode introduces you to Amazon DynamoDB and shares stories on how different customers use this database service. You can listen to all the episodes directly from your favorite podcast app or the podcast web page.

AWS Open Source News and Updates – Ricardo Sueiras, my colleague from the AWS Developer Relation team, runs this newsletter. It brings you all the latest open-source projects, posts, and more. Read edition #112 here.

Upcoming AWS Events
It’s AWS Summits season and here are some virtual and in-person events that might be close to you:

You can register for re:MARS to get fresh ideas on topics such as machine learning, automation, robotics, and space. The conference will be in person in Las Vegas, June 21–24.

That’s all for this week. Check back next Monday for another Week in Review!

— Marcia

Send email using Workers with MailChannels

Post Syndicated from Erwin van der Koogh original https://blog.cloudflare.com/sending-email-from-workers-with-mailchannels/

Send email using Workers with MailChannels

Send email using Workers with MailChannels

Here at Cloudflare we often talk about HTTP and related protocols as we work to help build a better Internet. However, the Simple Mail Transfer Protocol (SMTP) — used to send emails — is still a massive part of the Internet too.

Even though SMTP is turning 40 years old this year, most businesses still rely on email to validate user accounts, send notifications, announce new features, and more.

Sending an email is simple from a technical standpoint, but getting an email actually delivered to an inbox can be extremely tricky. Because of the enormous amount of spam that is sent every single day, all major email providers are very wary of things like new domains and IP addresses that start sending emails.

That is why we are delighted to announce a partnership with MailChannels. MailChannels has created an email sending service specifically for Cloudflare Workers that removes all the friction associated with sending emails. To use their service, you do not need to validate a domain or create a separate account. MailChannels filters spam before sending out an email, so you can feel safe putting user-submitted content in an email and be confident that it won’t ruin your domain reputation with email providers. But the absolute best part? Thanks to our friends at MailChannels, it is completely free to send email.

In the words of their CEO Ken Simpson: “Cloudflare Workers and Pages are changing the game when it comes to ease of use and removing friction to get started. So when we sat down to see what friction we could remove from sending out emails, it turns out that with our incredible anti-spam and anti-phishing, the answer is “everything”. We can’t wait to see what applications the community is going to build on top of this.”

The only constraint currently is that the integration only works when the request comes from a Cloudflare IP address. So it won’t work yet when you are developing on your local machine or running a test on your build server.

First let’s walk you through how to send out your first email using a Worker.

export default {
  async fetch(request) {
    send_request = new Request('https://api.mailchannels.net/tx/v1/send', {
      method: 'POST',
      headers: {
        'content-type': 'application/json',
      body: JSON.stringify({
        personalizations: [
            to: [{ email: '[email protected]', name: 'Test Recipient' }],
        from: {
          email: '[email protected]',
          name: 'Workers - MailChannels integration',
        subject: 'Look! No servers',
        content: [
            type: 'text/plain',
            value: 'And no email service accounts and all for free too!',

That is all there is to it. You can modify the example to make it send whatever email you want.

The MailChannels integration makes it easy to send emails to and from anywhere with Workers. However, we also wanted to make it easier to send emails to yourself from a form on your website. This is perfect for quickly and painlessly setting up pages such as “Contact Us” forms, landing pages, and sales inquiries.

The Pages Plugin Framework that we announced earlier this week allows other people to email you without exposing your email address.

The only thing you need to do is copy and paste the following code snippet in your /functions/_middleware.ts file. Now, every form that has the data-static-form-name attribute will automatically be emailed to you.

import mailchannelsPlugin from "@cloudflare/pages-plugin-mailchannels";

export const onRequest = mailchannelsPlugin({
  personalizations: [
      to: [{ name: "ACME Support", email: "[email protected]" }],
  from: { name: "Enquiry", email: "[email protected]" },
  respondWith: () =>
    new Response(null, {
      status: 302,
      headers: { Location: "/thank-you" },

Here is an example of what such a form would look like. You can make the form as complex as you like, the only thing it needs is the data-static-form-name attribute. You can give it any name you like to be able to distinguish between different forms.

<!DOCTYPE html>
    <form data-static-form-name="contact">
        <label>Name<input type="text" name="name" /></label>
        <label>Email<input type="email" name="email" /></label>
        <label>Message<textarea name="message"></textarea></label>
      <button type="submit">Send!</button>

So as you can see there is no barrier left when it comes to sending out emails. You can copy and paste the above Worker or Pages code into your projects and immediately start to send email for free.

If you have any questions about using MailChannels in your Workers, or want to learn more about Workers in general, please join our Cloudflare Developer Discord server.

Node.js 16.x runtime now available in AWS Lambda

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/node-js-16-x-runtime-now-available-in-aws-lambda/

This post is written by Dan Fox, Principal Specialist Solutions Architect, Serverless.

You can now develop AWS Lambda functions using the Node.js 16 runtime. This version is in active LTS status and considered ready for general use. To use this new version, specify a runtime parameter value of nodejs16.x when creating or updating functions or by using the appropriate container base image.

The Node.js 16 runtime includes support for ES modules and top-level await that was added to the Node.js 14 runtime in January 2022. This is especially useful when used with Provisioned Concurrency to reduce cold start times.

This runtime version is supported by functions running on either Arm-based AWS Graviton2 processors or x86-based processors. Using the Graviton2 processor architecture option allows you to get up to 34% better price performance.

We recognize that customers have been waiting for some time for this runtime release. We hear your feedback and plan to release the next Node.js runtime version in a timelier manner.

AWS SDK for JavaScript

The Node.js 16 managed runtimes and container base images bundle the AWS JavaScript SDK version 2. Using the bundled SDK is convenient for a few use cases. For example, developers writing short functions via the Lambda console or inline functions via CloudFormation templates may find it useful to reference the bundled SDK.

In general, however, including the SDK in the function’s deployment package is good practice. Including the SDK pins a function to a specific minor version of the package, insulating code from SDK API or behavior changes. To include the SDK, refer to the SDK package and version in the dependency object of the package.json file. Use a package manager like npm or yarn to download and install the library locally before building and deploying your deployment package to the AWS Cloud.

Customers who take this route should consider using the JavaScript SDK, version 3. This version of the SDK contains modular packages. This can reduce the size of your deployment package, improving your function’s performance. Additionally, version 3 contains improved TypeScript compatibility, and using it will maximize compatibility with future runtime releases.

Language updates

With this release, Lambda customers can take advantage of new Node.js 16 language features, including:

Prebuilt binaries for Apple Silicon

Node.js 16 is the first runtime release to ship prebuilt binaries for Apple Silicon. Customers using M1 processors in Apple computers may now develop Lambda functions locally using this runtime version.

Stable timers promises API

The timers promises API offers timer functions that return promise objects, improving the functionality for managing timers. This feature is available for both ES modules and CommonJS.

You may designate your function as an ES module by changing the file name extension of your handler file to .mjs, or by specifying “type” as “module” in the function’s package.json file. Learn more about using Node.js ES modules in AWS Lambda.

  // index.mjs
  import { setTimeout } from 'timers/promises';

  export async function handler() {
    await setTimeout(2000);

RegExp match indices

The RegExp match indices feature allows developers to get an array of the start and end indices of the captured string in a regular expression. Use the “/d” flag in your regular expression to access this feature.

  // handler.js
  exports.lambdaHandler = async () => {
    const matcher = /(AWS )(Lambda)/d.exec('AWS Lambda');
    console.log("match: " + matcher.indices[0]) // 0,10
    console.log("first capture group: " + matcher.indices[1]) // 0,4
    console.log("second capture group: " + matcher.indices[2]) // 4,10

Working with TypeScript

Many developers using Node.js runtimes in Lambda develop their code using TypeScript. To better support TypeScript developers, we have recently published new documentation on using TypeScript with Lambda, and added beta TypeScript support to the AWS SAM CLI.

We are also working on a TypeScript version of Lambda PowerTools. This is a suite of utilities for Lambda developers to simplify the adoption of best practices, such as tracing, structured logging, custom metrics, and more. Currently, AWS Lambda Powertools for TypeScript is in beta developer preview.

Runtime updates

To help keep Lambda functions secure, AWS will update Node.js 16 with all minor updates released by the Node.js community when using the zip archive format. For Lambda functions packaged as a container image, pull, rebuild, and deploy the latest base image from DockerHub or the Amazon ECR Public Gallery.

Amazon Linux 2

The Node.js 16 managed runtime, like Node.js 14, Java 11, and Python 3.9, is based on an Amazon Linux 2 execution environment. Amazon Linux 2 provides a secure, stable, and high-performance execution environment to develop and run cloud and enterprise applications.


Lambda now supports Node.js 16. Get started building with Node.js 16 by specifying a runtime parameter value of nodejs16.x when creating your Lambda functions using the zip archive packaging format.

You can also build Lambda functions in Node.js 16 by deploying your function code as a container image using the Node.js 16 AWS base image for Lambda. You can read about the Node.js programming model in the AWS Lambda documentation to learn more about writing functions in Node.js 16.

For existing Node.js functions, review your code for compatibility with Node.js 16 including deprecations, then migrate to the new runtime by changing the function’s runtime configuration to nodejs16.x.

For more serverless learning resources, visit Serverless Land.

Orchestrate big data jobs on on-premises clusters with AWS Step Functions

Post Syndicated from Göksel SARIKAYA original https://aws.amazon.com/blogs/big-data/orchestrate-big-data-jobs-on-on-premises-clusters-with-aws-step-functions/

Customers with specific needs to run big data compute jobs on an on-premises infrastructure often require a scalable orchestration solution. For large-scale distributed compute clusters, the orchestration of jobs must be scalable to maximize their utilization, while at the same time remain resilient to any failures to prevent blocking the ever-growing influx of data and jobs. Moreover, on-premises compute resources can’t be extended on demand, therefore, the jobs may be competing for the same resources with different priorities.

This post showcases serverless building blocks for orchestrating big data jobs using AWS Step Functions, AWS Lambda, and Amazon DynamoDB with a focus on reliability, maintainability, and monitoring. In this solution, Step Functions enables thousands of workflows to run parallel. Additionally, Lambda provides flexibility implementing arbitrary interfaces to the on-premises infrastructure and its compute resources. With additional steps in the orchestration, the solution also allows operations to monitor thousands of parallel jobs in a visual interface for better debugging.


The proposed serverless solution consists of the following main components:

  • Job trigger – Requests new compute jobs to run on the on-premises cluster. For simplicity, in this architecture we assume that the trigger is a client calling Step Functions directly. However, you could extend this to include Amazon API Gateway to create a job API to interface with the orchestration solution or a rule engine to trigger jobs when relevant data becomes available.
  • Job manager – This Step Functions workflow runs once per compute job, with multiple workflows running in parallel. It tracks the status of a job from queueing, scheduling, running, retrying, all the way to its completion. Ideally, a job can be scheduled immediately, but workflows can run for days if a job is very low priority and compute resources are sparse. The job manager delegates the decision when or where to run the job to the job queue manager. Communication to the on-premises cluster is abstracted through a Lambda adapter.
  • Job queue manager – Maintains a queue of all jobs. With the given job properties (for example based on priority), the job queue manager decides the running time of jobs, and the cluster on which they run. To illustrate the concept, the architecture considers real-time information on the resource utilization of the compute clusters (memory, CPU) for scheduling. However, you could apply different scheduling algorithms as required given the flexibility of Lambda.
  • On-premises compute cluster – Provides the computing resources, data nodes, and tools to run compute jobs.

The following diagram illustrates the solution architecture.

Solution Architecture

The main process of the solution consists of seven steps:

  1. The job trigger runs a new Step Functions workflow to run a compute job on premises and provides the necessary information (such as priority and required resources).
  2. The job manager creates a new record in DynamoDB to add the job to the queue of the job queue manager, and the workflow waits for the job queue manager to call back.
  3. Amazon EventBridge triggers a scheduled Lambda function in the job queue manager periodically (for example, every 5 minutes), decoupled from the job requests.
  4. The job scheduler Lambda function retrieves real-time information from cluster metrics to see whether jobs can be scheduled at this point in time.
  5. The job scheduler function fetches all queued jobs from DynamoDB and tries to schedule as many of those jobs as possible to available compute resources based on priority, as well as memory and CPU demands.
  6. For each job that can be scheduled to the compute cluster, the job scheduler function communicates back to the job manager to signal the workflow to continue and that the job can be run.
  7. The job manager communicates with the on-premises cluster through the compute cluster adapter Lambda function to run the job, track its status periodically, and retry in case of errors.

On-premises compute cluster

In this post, we assume the on-premises compute cluster offers interfaces to interact with the compute resources. For example, customers could run a Spark compute cluster on premises that allows the following basic interactions through an API:

  • Upload and trigger a compute job on a cluster (for example, upload a Spark JAR file and submit)
  • Get the status of a compute job (such as running, stopped, or error)
  • Get error output in case of failures in the compute job (for example, the job failed due to access denied)

In addition, we assume the cluster can provide metrics on its current utilization. For example, the cluster could provide Prometheus metrics as aggregates over all resources within a compute cluster:

  • Memory utilization (for example, 2 TB with 80% utilization)
  • CPU utilization (for example, 5,000 cores with 50% utilization)

We use the terminology introduced here for the example in this post. Depending on the capabilities of the on-premises cluster, you can adjust these concepts. For example, the compute cluster could use Kubernetes or SLURM instead of Spark.

Job manager

The job manager is responsible for communicating with on-premises clusters to trigger big data jobs and query their status. It’s a Step Functions state machine that consists of three steps, as illustrated in the following figure.

The first step is JobQueueRequest, which makes a request to the job queue manager component and waits for the callback. When the job queue manager sends OK to the waiting step with a callback pattern, the second step StartJobRun runs.

The StartJobRun step communicates with the on-premises environment (for example, via HTTP post to a REST API endpoint) to trigger an on-premises job.

The third step GetJobStatus queries the job status from the on-premises cluster. If the job status is InProgress, the state machine waits for a configured time. When the Wait state is over, it returns to the GetJobStatus step to query the job status again in a loop. When the job returns a successful state from the on-premises cluster, the state machine completes its cycle with a Success state. If the job fails with a timeout or with an error, the state machine completes its cycle with a Fail state.

The following screenshot shows the details of the state machine on the Step Functions console.

Jpob Manager Step Function Inputs

Job queue manager

The job queue manager is responsible for managing job queues based on job priorities and cluster utilization. It consists of DynamoDB, Lambda, and EventBridge.

The JobQueue table keeps data of waiting jobs, including jobId as the primary key, priority as the sort key, needed memory and CPU consumptions, callbackId, and timestamp information. You can add further information to the table dynamically if required by the scheduling algorithm.

The following screenshot shows the attribute details of the JobQueue table.

EventBridge triggers the job scheduler Lambda function on a regular bases in a configured interval. First, the job scheduler function gets waiting jobs data from the JobQueue table in DynamoDB. Then it establishes a connection with the on-premises cluster to fetch cluster metrics such as memory and CPU utilization. Based on this information, the function decides which jobs are ready to be triggered on the on-premises cluster.

The scheduling algorithm proposed here follows a simple concept to maximize resource utilization, while respecting the job priority. Essentially, for an on-premises cluster (we could potentially have multiple in different geographies), the job scheduler Lambda function builds a queue of jobs according to their priority, while allocating the first job in the queue to compute resources on the cluster. If enough resources are available, the scheduler moves to the next job in the queue and repeats.

Due to the flexibility of Lambda functions, you can tailor the scheduling algorithm for a specific use case. Cluster scheduling algorithms are still an open research topic with different optimization goals, such as throughput, data location, fairness, deadlines, and more.

Get started

In this section, we provide a starting point for the solution described in this post. The steps walk you through creating a Step Functions state machine with the appropriate template, and the necessary Lambda and DynamoDB interactions to create the job manager and job queue manager building blocks. Example code for the Lambda functions is excluded from this post, because the communication with the on-premises cluster to trigger jobs can vary depending on your on-premises interface.

  1. On the Step Functions console, choose State machines.
  2. Choose Create state machine.
  3. Select Run a sample project.
  4. Select Job Poller.
    Job Poller State Machine Template
  5. Scroll down to see the sample projects, which are defined using Amazon States Language (ASL).
  6. Review the example definition, then choose Next.Job Manager Step Functions Template
  7. Choose Deploy resources.
    Deployment can take up to 10 minutes.Step Functions Deploy Resources
    The deployment creates the state machine that is responsible for job management. After you deploy the resources, you need to edit the sample ASL code to add the extra JobQueueRequest step in the state machine.
  8. Select the created state machine.
  9. Choose Edit to add ARNs of the three Lambda functions to make a request in the job queue manager (Job Queue Request), to submit a job to the on-premises cluster (Submit Job), and to poll the status of the jobs (Get Job Status).Job Manager Step Functions Definition
    Now you’re ready to create the job queue manager.
  10. On the DynamoDB console, create a table for storing job metadata.
  11. On the EventBridge console, create a scheduled rule that triggers the Lambda function at a configured interval.
  12. On the Lambda console, create the function that communicates with the on-premises cluster to fetch cluster metrics. It also gets jobs from the DynamoDB table to retrieve information including job priorities, required memory, and CPU to run the job on the on-premises cluster.


This solution uses Step Functions to track all jobs until completion, and therefore the Step Functions quotas must be considered for potential use cases. Mainly, a workflow can run for a maximum of 1 year (cannot be increased) and by default 1 million parallel runs can run in a single account (can be increased to millions). See Quotas for further details.


This post described how to orchestrate big data jobs running in parallel on on-premises clusters with a Step Functions workflow. To learn more about how to use Step Functions workflows for serverless orchestration, visit Serverless Land.

About the Authors

Göksel Sarikaya is a Senior Cloud Application Architect at AWS Professional Services. He enables customers to design scalable, high-performance, and cost effective applications using the AWS Cloud. He helps them to be more flexible and competitive during their digital transformation journey.

Nicolas Jacob Baer is a Senior Cloud Application Architect with a strong focus on data engineering and machine learning, based in Switzerland. He works closely with enterprise customers to design data platforms and build advanced analytics/ml use-cases.

Shukhrat Khodjaev is a Senior Engagement Manager at AWS ProServe, based out of Berlin. He focuses on delivering engagements in the field of Big Data and AI/ML that enable AWS customers to uncover and to maximize their value through efficient use of data.

Durable Objects Alarms — a wake-up call for your applications

Post Syndicated from Matt Alonso original https://blog.cloudflare.com/durable-objects-alarms/

Durable Objects Alarms — a wake-up call for your applications

Durable Objects Alarms — a wake-up call for your applications

Since we launched Durable Objects, developers have leveraged them as a novel building block for distributed applications.

Durable Objects provide globally unique instances of a JavaScript class a developer writes, accessed via a unique ID. The Durable Object associated with each ID implements some fundamental component of an application — a banking application might have a Durable Object representing each bank account, for example. The bank account object would then expose methods for incrementing a balance, transferring money or any other actions that the application needs to do on the bank account.

Durable Objects work well as a stateful backend for applications — while Workers can instantiate a new instance of your code in any of Cloudflare’s data centers in response to a request, Durable Objects guarantee that all requests for a given Durable Object will reach the same instance on Cloudflare’s network.

Each Durable Object is single-threaded and has access to a stateful storage API, making it easy to build consistent and highly-available distributed applications on top of them.

This system makes distributed systems’ development easier — we’ve seen some impressive applications launched atop Durable Objects, from collaborative whiteboarding tools to conflict-free replicated data type (CRDT) systems for coordinating distributed state launch.

However, up until now, there’s been a piece missing — how do you invoke a Durable Object when a client Worker is not making requests to it?

As with any distributed system, Durable Objects can become unavailable and stop running. Perhaps the machine you were running on was unplugged, or the datacenter burned down and is never coming back, or an individual object exceeded its memory limit and was reset. Before today, a subsequent request would reinitialize the Durable Object on another machine, but there was no way to programmatically wake up an Object.

Durable Objects Alarms are here to change that, unlocking new use cases for Durable Objects like queues and deferred processing.

What is a Durable Object Alarm?

Durable Object Alarms allow you, from within your Durable Object, to schedule the object to be woken up at a time in the future. When the alarm’s scheduled time comes, the Durable Object’s alarm() handler will be called. If this handler throws an exception, the alarm will be automatically retried using exponential backoff until it succeeds — alarms have guaranteed at-least-once execution.

How are Alarms different from Workers Cron Triggers?

Alarms are more fine-grained than Cron Triggers. While a Workers service can have up to three Cron Triggers configured at once, it can have an unlimited amount of Durable Objects, each of which can have a single alarm active at a time.

Alarms are directly scheduled from and invoke a function within your Durable Object. Cron Triggers, on the other hand, are not programmatic — they execute based on their schedules, which have to be configured via the Cloudflare Dashboard or centralized configuration APIs.

How do I use Alarms?

First, you’ll need to add the durable_object_alarms compatibility flag to your wrangler.toml.

compatibility_flags = ["durable_object_alarms"]

Next, implement an alarm() handler in your Durable Object that will be called when the alarm executes. From anywhere else in your Durable Object, call state.storage.setAlarm() and pass in a time for the alarm to run at. You can use state.storage.getAlarm() to retrieve the currently set alarm time.

In this example, we implemented an alarm handler that wakes the Durable Object up once every 10 seconds to batch requests to a single Durable Object, deferring processing until there is enough work in the queue for it to be worthwhile to process them.

export default {
  async fetch(request, env) {
    let id = env.BATCHER.idFromName("foo");
    return await env.BATCHER.get(id).fetch(request);

const SECONDS = 1000;

export class Batcher {
  constructor(state, env) {
    this.state = state;
    this.storage = state.storage;
    this.state.blockConcurrencyWhile(async () => {
      let vals = await this.storage.list({ reverse: true, limit: 1 });
      this.count = vals.size == 0 ? 0 : parseInt(vals.keys().next().value);
  async fetch(request) {

    // If there is no alarm currently set, set one for 10 seconds from now
    // Any further POSTs in the next 10 seconds will be part of this kh.
    let currentAlarm = await this.storage.getAlarm();
    if (currentAlarm == null) {
      this.storage.setAlarm(Date.now() + 10 * SECONDS);

    // Add the request to the batch.
    await this.storage.put(this.count, await request.text());
    return new Response(JSON.stringify({ queued: this.count }), {
      headers: {
        "content-type": "application/json;charset=UTF-8",
  async alarm() {
    let vals = await this.storage.list();
    await fetch("http://example.com/some-upstream-service", {
      method: "POST",
      body: Array.from(vals.values()),
    await this.storage.deleteAll();
    this.count = 0;

Once every 10 seconds, the alarm() handler will be called. In the event an unexpected error terminates the Durable Object, it will be re-instantiated on another machine, following a short delay, after which it can continue processing.

Under the hood, Alarms are implemented by making reads and writes to the storage layer. This means Alarm get and set operations follow the same rules as any other storage operation – writes are coalesced with other writes, and reads have a defined ordering. See our blog post on the caching layer we implemented for Durable Objects for more information.

Durable Objects Alarms guarantee fault-tolerance

Alarms are designed to have no single point of failure and to run entirely on our edge – every Cloudflare data center running Durable Objects is capable of running alarms, including migrating Durable Objects from unhealthy data centers to healthy ones as necessary to ensure that their Alarm executes. Single failures should resolve in under 30 seconds, while multiple failures may take slightly longer.

We achieve this by storing alarms in the same distributed datastore that backs the Durable Object storage API. This allows alarm reads and writes to behave identically to storage reads and writes and to be performed atomically with them, and ensures that alarms are replicated across multiple datacenters.

Within each data center capable of running Durable Objects, there are multiple processes responsible for tracking upcoming alarms and triggering them, providing fault tolerance and scalability within the data center. A single elected leader in each data center is responsible for detecting failure of other data centers and assigning responsibility of those alarms to healthy local processes in its own data center. In the event of leader failure, another leader will be elected and become responsible for executing Alarms in the data center. This allows us to guarantee at-least-once execution for all Alarms.

How do I get started?

Alarms are a great way to build new distributed primitives, like queues, atop Durable Objects. They also provide a method for guaranteeing work within a Durable Object will complete, without relying on a client request to “kick” the Object.

You can get started with Alarms now by enabling Durable Objects in the Cloudflare dashboard. For more info, check the developer docs or jump in our Discord.

Come join us at Cloudflare Connect New York this Thursday!

Post Syndicated from Jen Taylor original https://blog.cloudflare.com/cloudflare-connect-nyc-2022/

Come join us at Cloudflare Connect New York this Thursday!

Come join us at Cloudflare Connect New York this Thursday!

We take a break from Platform Week to share big news – we’re going to New York this week for our Cloudflare Connect customer event.

We’re packing our bags, getting on planes and heading to New York to do our first live customer event since 2019 and we could not be more excited.  It is time with you – the people building, delivering and securing the apps and networks we know and trust – that are the inspiration for the innovation we deliver.  We can’t wait to spend time with you.

Our co-founder and CEO Matthew Prince will kick off the day with his view from the top.  We’ll then be breaking out into focused conversations to dig in on our latest product news and roadmaps.

Excited about what we’re talking about for Platform Week?  Come chat with the Workers team in person and hear more about the roadmap.

Intrigued by the latest DDoS stats we posted and want to learn more?  Meet with the team analyzing the attacks and learn about where we go from here.

Not sure where to start your Zero Trust journey?  We’ll talk you through what we’re seeing and introduce you to other customers who are in the process of rolling out Zero Trust solutions for their teams so you can learn from each other.

Don’t miss it!  Register now – use the code BetterInternet to join us in-person for free.  Not in New York?  No worries – we’re coming to London, Sydney and San Francisco later this year.

Debugging AWS Step Functions executions with the new console experience

Post Syndicated from Benjamin Smith original https://aws.amazon.com/blogs/compute/debugging-aws-step-functions-executions-with-the-new-console-experience/

Today, AWS Step Functions introduces a new opt-in console experience that makes it easier to analyze, debug, and optimize Standard Workflows.

Builders create Step Functions workflows to orchestrate multiple services into business-critical applications with minimal code. Customers wanted better ways to debug workflow executions and analyze the payload as it passes through each state.

This blog post explains the new capabilities of the enhanced Step Functions executions page. It shows how to debug workflows quickly, sort and filter on state events, and view the input and output path processing for each state.


The new Executions Details page allows you to inspect executions using three different view types: Graph view, Table view, and Event view. It has multi-level navigation enhancements for analyzing the map state feature, the ability to search execution history based on unique attributes and improved events, and table navigation with filtering, sorting, and pagination. For a full list of all the feature enhancements, see the Step Functions documentation.

Getting started

To get started, go to the Step Functions state machines page in the AWS Management Console:

  1. Choose any standard workflow from the list.
  2. From the workflows executions list page, choose an execution to analyze.
  3. Choose the New executions page button:

An enhanced execution summary section at the top of the page contains some new information:

  1. You can copy the execution ARN to the clipboard by choosing the copy icon.
  2. This section now shows the number of state transitions for the execution. This is helpful in optimizing the cost of your workflows. With Step Functions, you pay for the number of state transitions used per month for each state transition over the AWS Free Tier.
  3. You can view the total execution duration.

The following execution summary is a new section that displays execution errors. This helps to find the root cause of any workflow faults or failures:

Choosing the between execution views

The following examples show how to use the new execution views to inspect a workflow execution. This example focuses on the Order Processing Workflow that powers Serverlesspresso. Serverlesspresso is an interactive serverless application showcased at AWS re:Invent and AWS Summits. It allows attendees to order coffee from their smartphones. Each order starts a workflow execution.

Graph view

The graph view provides a visual representation of the workflow execution path. It shows which states succeeded, failed, or are currently in progress, and any errors caught. The legend at the bottom of the graph helps to decode each color.

To access the Graph view, choose the Graph view from the view navigation, as shown in the following screenshot. There is a new option to render the graph vertically or horizontally, which you can choose from the Layout option.


The graph view shows that the workflow caught an error at the Emit–Workflow Started TT state. I choose this state from the graph to view more details about it.

The Events tab

Each state in a workflow execution moves through a sequence of events, from TaskStateEntered to TaskStateExited. The Events tab, shown in the following image, displays all the events for the selected state, with their corresponding timestamp.

You can drill down into an event to see its output. In this case, the TaskTimedOut event happens to the selected state with the error message “States.Timeout”. This corresponds with the Caught error shown in the Graph view.

The Input & Output tab

Amazon States Language (ASL) enables you to filter and manipulate data at various stages of a workflow state’s execution using paths. A path is a string beginning with $ that lets you identify and filter subsets of JSON text. Learning how to apply these filters helps to build efficient workflows with minimal state transitions.

Select the Input & Output tab, and then toggle the Advanced view option to display the payload after each path process is applied.

This is useful for checking what each JSON path evaluates to. For example, the following images show how I configure the state Parameters, along with the Task input. This is what the parameters evaluate to after Step Functions applies the JSON path processing:

State Details and Definition tabs

The new execution page lets you view a state’s definition and execution details in isolation from the other states. The task details section contains additional information such as the Duration, Heartbeat, Started After and Timeout values.

Table view

The Table view provides a tabular representation of each state. Use this view to access information quickly about a state’s duration, resources, or status. A new timeline column shows the relative duration that each state took to complete. You can configure which columns are displayed.

To search and filter the table based on unique attributes such as state name or error type, start typing into the search input. Use the predictive autocomplete to define the search criteria. You can also choose a relative or absolute time range to filter by.

Event view

Step Functions stores all changes to state as a sequence of events to give you more visibility into the execution. The event view allows you to find and investigate a particular state event quickly by using the search and sort options:

  1. Choose the Timestamp column header to list events in reverse order of occurrence.
  2. Choose the arrow in the left-most column to reveal more details about each event
  3. Use the search input to search by keyword, state name, event type, or attribute.
  4. The date button lets you filter events by date or time.

Map State

The map state (“Type”: “Map”) allows you to run a set of steps for each element of an input array. The Step Functions execution page now helps you investigate and debug workflows using the Map state with the following enhancements:

  1. A hierarchical table view of steps for each iteration.
  2. An integrations overview, showing the execution summary at a glance.
  3. A paginated list of every event across all iterations.

The following Serverlesspresso workflow processes orders in batches. The map state iterates over each order, sanitizes each item, and checks it is currently in stock. If the item is not in stock, the map state throws a failure. If it is in stock, the order is recorded to a database, and a new event is emitted onto the serverless event bus.

The workflow runs for a new batch of orders and produces the following executions results page:

Using the Graph view:

  1. You can step through each iteration using the Map iteration viewer
  2. See the summary status in the iterations overview section

The Table view shows a hierarchical list of each iteration and I can drill down into the execution that failed to investigate further.

Use the Event view to search for failed executions to quickly filter the results:


Today, Step Functions is launching a new opt-in console experience to help builders analyse, debug, and optimize Step Functions Standard Workflows. These enhancements include three different ways to view your workflow executions, better visibility into workflows that use the Map state, fast access to workflow faults and failures, and new tools to search and sort workflow executions. This blog post shows how to use the new views to debug workflows, sort and filter on state events, and view the input and output path processing for each state.

The new console executions experience is generally available in AWS Regions where Step Functions is available.

For more information on building applications with Step Functions, visit serverlessland.com.

Benefits of migrating to event-driven architecture

Post Syndicated from Talia Nassi original https://aws.amazon.com/blogs/compute/benefits-of-migrating-to-event-driven-architecture/

Two common options when building applications are request-response and event-driven architecture. In request-response architecture, an application’s components communicate via API calls. The client sends a request and expects a response before performing the next task. In event-driven architecture, the client generates an event and can immediately move on to its next task. Different parts of the application then respond to the event as needed.


In this post, you learn about reasons to consider moving from request-response architecture to an event-driven architecture.

Challenges with request-response architecture

When starting to a build a new application, many developers default to a request-response architecture. A request-response architecture may tightly integrate components and those components communicate via synchronous calls. While a request-response approach is often easier to get started with, it can become challenging as your application grows in complexity.

In this post, I review an example request-response ecommerce application and demonstrate the challenges of tightly coupled integrations. Then I show you how building the same application with an event-driven architecture can give you increased scalability, fault tolerance, and developer velocity.

Close coordination between microservices

In a typical ecommerce application that uses a synchronous API, the client makes a request to place an order and the order service sends the request downstream to an invoice service. If successful, the order service responds with a success message or confirmation number.

In this initial stage, this is a straightforward connection between the two services. The challenge comes when you add more services that integrate with the order service.


If you add a fulfillment service and a forecasting service, the order service has more responsibilities and more complexity. The order service must know how to call each service’s API, from the API call structure to the API’s retry semantics. If there are any backwards incompatible changes to the APIs, the order service team must update them. The system forwards heavy traffic spikes to the order service’s dependency, which may not have the same scaling capabilities. Also, dependent services may transmit errors back up the stack to the client.

Error handling and retries

Now, you add new downstream services for fulfillment and shipping orders to the ecommerce application.


In the happy path, everything works as expected: The order service triggers invoicing, payment systems, and updates forecasting. Once payment clears, this triggers the fulfillment and packing of the order, and then informs the shipping service to request tracking information.

However, if the fulfillment center cannot find the product because they are out of stock, then fulfillment might have to alert the invoice service, then reverse the payment or issue a refund. If fulfillment fails, then the system that triggers shipping might fail as well. Forecasting must also be updated to reflect the change. This remediation workflow is all just to address one of the many potential “unhappy paths” that can occur in this API-driven ecommerce application.

Close coordination between development teams

In a synchronously integrated application, teams must coordinate any new services that are added to the application. This can slow down each development team’s ability to release new features. Imagine your team works on the payment service but you weren’t told that another team added a new rewards service. What now happens when the fulfillment service errors?

Fulfillment may orchestrate all the other services. Your payments team gets a message and you undo the payment, but you may not know who handles retries and error logic. If the rewards service changes vendors and has a new API, and does not tell your team, you may not be aware of the new service.

Ultimately, it can be hard to coordinate these orchestrations and workflows as systems become more complex and management adds more services. This is one reason that it can be beneficial to migrate to event-driven architecture.

Benefits of event-driven architecture

Event-driven architecture can help solve the problems of the close coordination of microservices, error handling and retries, and coordination between development teams.

Close coordination between microservices

In event-driven architecture, the publisher emits an event, which is acknowledged by the event bus. The event bus routes events to subscribers, which process events with self-contained business logic. There is no direct communication between publishers and subscribers.

Decoupled applications enable teams to act more independently, which can increase their velocity. For example, with an API-based integration, if your team wants to know about a change that happened in another team’s microservice, you might have to ask that team to make an API call to your service. Consequently, you may have to account for authentication, coordination with the other team over the structure of the API call. This causes back and forth between teams, which slows down development time. With an event-driven application, you can subscribe to events sent from your microservice and the event bus (for example, Amazon EventBridge) takes care of routing the event and handling authentication.

Error handling and retries

Another reason to migrate to event-driven architecture is to handle unpredictable traffic. Ecommerce websites like Amazon.com have variable amounts of traffic depending on the day. Once you place an order, several things happen.

First, Amazon checks your credit card to make sure that funds are available. Then, Amazon has to pack the merchandise and load onto trucks. That all happens in an Amazon fulfillment center. There is no synchronous API call for the Amazon backend to package and ship products. After the system confirms your payment, the front end puts together some information describing the event and puts your account number, credit card info, and what you bought in a packaged event and put it into the cloud and onto a queue. Later, another piece of software removes the event from the queue and starts the packaging and shipping.

The key point about this process is that these processes can all run at different rates. Normally, the rate at which customers place orders and the rate at which the warehouses can get the boxes packed are roughly equivalent. However, on busy days like Prime Day, customers place orders much more quickly than the warehouses can operate.

Ecommerce applications, like Amazon.com, must be able to scale up to handle unpredictable traffic. When a customer places an order, an event bus like Amazon EventBridge receives the event and all of the downstream microservices are able to select the order event for processing. Because each of the microservices can fail independently, there are no single points of failure.

Loose coordination between development teams

Event-driven architectures promote development team independence due to loose coupling between publishers and subscribers. Applications can subscribe to events with routing requirements and business logic that are separate from the publisher and other subscribers. This allows publishers and subscribers to change independently of each other, providing more flexibility to the overall architecture.

Decoupled applications also allow you to build new features faster. Adding new features or extending existing ones can be simpler with event-driven architectures because you either add new events, or modify existing ones. This process removes complexity in your application.


In this post, you learn about the challenges of developing applications with request-response architecture. In request-response architecture, the client must send a request and wait for a response before moving on to its next task. As applications grow in complexity, this tightly coupled architecture can cause issues. Event-driven architectures can increase scalability, fault tolerance, and developer velocity by decoupling components of your application.

For more serverless content, go to serverlessland.com.

Orchestrating Amazon S3 Glacier Deep Archive object retrieval using AWS Step Functions

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/orchestrating-amazon-s3-glacier-deep-archive-object-retrieval-using-aws-step-functions/

This blog was written by Monica Cortes Sack, Solutions Architect, Oskar Neumann, Partner Solutions Architect, and Dhiraj Mahapatro, Principal Specialist SA, Serverless.

AWS Step Functions now support over 220 services and over 10,000 AWS API actions. This enables you to use the AWS SDK integration directly instead of writing an AWS Lambda function as a proxy.

One such service integration is with Amazon S3. Currently, you write scripts using AWS CLI S3 commands to achieve automation around running S3 tasks. For example, S3 integrates with AWS Transfer Family, builds a custom security check, takes action on S3 buckets on S3 object creations, or orchestrates a workflow around S3 Glacier Deep Archive object retrieval. These script executions do not provide an execution history or an easy way to validate the behavior.

Step Functions’ AWS SDK integration with S3 declaratively creates serverless workflows around S3 tasks without relying on those scripts. You can validate the execution history and behavior of a Step Functions workflow.

This blog highlights one of the S3 use cases. It shows how to orchestrate workflows around S3 Glacier Deep Archive object retrieval, cost estimation, and interaction with the requester using Step Functions. The demo application provides additional details on the entire architecture.

S3 Glacier Deep Archive is a storage class in S3 used for data that is rarely accessed. The service provides durable and secure long-term storage, trading immediate access for cost-effectiveness. You must restore archived objects before they are downloadable. It supports two options for object retrieval:

  1. Standard – Access objects within 12 hours of the start of the restoration process.
  2. Bulk – Access objects within 48 hours of the start of the restoration process.

Business use case

Consider a research institute that stores backups on S3 Glacier Deep Archive. The backups are maintained in S3 Glacier Deep Archive for redundancy. The institute has multiple researchers with one central IT team. When a researcher requests an object from S3 Glacier Deep Archive, the central IT team retrieves it and charges the corresponding research group for retrieval and data transfer costs.

Researchers are the end users and do not operate in the AWS Cloud. They run computing clusters on-premises and depend on the central IT team to provide them with the restored archive. A member of the research team requesting an object retrieval provides the following information to the central IT team:

  1. Object key to be restored.
  2. The number of days the researcher needs the object accessible for download.
  3. Researcher’s email address.
  4. Retrieve within 12 or 48 hours SLA. This determines whether “Standard” or “Bulk” retrieval respectively.

The following overall architecture explains the setup on AWS and the interaction between a researcher and the central IT team’s architecture.

Architecture overview

Architecture diagram

Architecture diagram

  1. The researcher uses a front-end application to request object retrieval from S3 Glacier Deep Archive.
  2. Amazon API Gateway synchronously invokes AWS Step Functions Express Workflow.
  3. Step Functions initiates RestoreObject from S3 Glacier Deep Archive.
  4. Step Functions stores the metadata of this retrieval in an Amazon DynamoDB table.
  5. Step Functions uses Amazon SES to email the researcher about archive retrieval initiation.
  6. Upon completion, S3 sends the RestoreComplete event to Amazon EventBridge.
  7. EventBridge rule triggers another Step Functions for post-processing after the restore is complete.
  8. A Lambda function inside the Step Functions calculates the estimated cost (retrieval and data transfer out) and updates existing metadata in the DynamoDB table.
  9. Sync data from DynamoDB table using Amazon Athena Federated Queries to generate reports dashboard in Amazon QuickSight.
  10. Step Functions uses SES to email the researcher with cost details.
  11. Once the researcher receives an email, the researcher uses the front-end application to call the /download API endpoint.
  12. API Gateway invokes a Lambda function that generates a pre-signed S3 URL of the retrieved object and returns it in the response.


  1. To run the sample application, you must install CDK v2, Node.js, and npm.
  2. To clone the repository, run:
    git clone https://github.com/aws-samples/aws-stepfunctions-examples.git
    cd cdk/app-glacier-deep-archive-retrieval
  3. To deploy the application, run:
    cdk deploy --all

Identifying workflow components

Starting the restore object workflow

The first component is accepting the researcher’s request to start the archive retrieval process. The sample application created from the demo provides a basic front-end app that shows the files from an S3 bucket that has objects stored in S3 Glacier Deep Archive. The researcher retrieves file requests from the front-end application reached by the sample application’s Amazon CloudFront URL.

Glacier Deep Archive Retrieval menu

Glacier Deep Archive Retrieval menu

The front-end app asks the researcher for an email address, the number of days the researcher wants the object to be available for download, and their ETA on retrieval speed. Based on the retrieval speed, the researcher accepts either Standard or Bulk object retrieval. To test this, put objects in the data bucket under the S3 Glacier Deep Archive storage class and use the front-end application to retrieve them.

Item retrieval prompt

Item retrieval prompt

The researcher then chooses the Retrieve file. This action invokes an API endpoint provided by API Gateway. The API Gateway endpoint synchronously invokes a Step Functions Express Workflow. This validates the restore object request, gets the object metadata, and starts to restore the object from S3 Glacier Deep Archive.

The state machine stores the metadata of the restore object AWS SDK call in a DynamoDB table for later use. You can use this metadata to build a dashboard in Amazon QuickSight for reporting and administration purposes. Finally, the state machine uses Amazon SES to email the researcher, notifying them about the restore object initiation process:

Restore object initiated

Restore object initiated

The following state machine shows the workflow:

Workflow diagram

Workflow diagram

The ability to use S3 APIs declaratively using AWS SDK from Step Functions makes it convenient to integrate with S3. This approach avoids writing a Lambda function to wrap the SDK calls. The following portion of the state machine definition shows the usage of S3 HeadObject and RestoreObject APIs:

"Get Object Metadata": {
    "Next": "Initiate Restore Object from Deep Archive",
    "Catch": [{
        "ErrorEquals": ["States.ALL"],
        "Next": "Bad Request"
    "Type": "Task",
    "ResultPath": "$.result.metadata",
    "Resource": "arn:aws:states:::aws-sdk:s3:headObject",
    "Parameters": {
        "Bucket": "glacierretrievalapp-databucket-abc123",
        "Key.$": "$.fileKey"
"Initiate Restore Object from Deep Archive": {
    "Next": "Update restore operation metadata",
    "Type": "Task",
    "ResultPath": null,
    "Resource": "arn:aws:states:::aws-sdk:s3:restoreObject",
    "Parameters": {
        "Bucket": "glacierretrievalapp-databucket-abc123",
        "Key.$": "$.fileKey",
        "RestoreRequest": {
            "Days.$": "$.requestedForDays"

You can extend the previous workflow and build your own Step Functions workflows to orchestrate other S3 related workflows.

Processing after object restoration completion

S3 RestoreObject is a long-running process for S3 Glacier Deep Archive objects. S3 emits a RestoreCompleted event notification on the object restore completion to EventBridge. You set up an EventBridge rule to trigger another Step Functions workflow as a target for this event. This workflow takes care of the object restoration post-processing.

cfnDataBucket.addPropertyOverride('NotificationConfiguration.EventBridgeConfiguration.EventBridgeEnabled', true);

An EventBridge rule triggers the following Step Functions workflow and passes the event payload as an input to the Step Functions execution:

new aws_events.Rule(this, 'invoke-post-processing-rule', {
  eventPattern: {
    source: ["aws.s3"],
    detailType: [
      "Object Restore Completed"
    detail: {
      bucket: {
        name: [props.dataBucket.bucketName]
  targets: [new aws_events_targets.SfnStateMachine(this.stateMachine, {
    input: aws_events.RuleTargetInput.fromObject({
      's3Event': aws_events.EventField.fromPath('$')

The Step Functions workflow gets object metadata from the DynamoDB table and then invokes a Lambda function to calculate the estimated cost. The Lambda function calculates the estimated retrieval and the data transfer costs using the contentLength of the retrieved object and the Price List API for the unit cost. The workflow then updates the calculated cost in the DynamoDB table.

The retrieval cost and the data transfer out cost are proportional to the size of the retrieved object. The Step Functions workflow also invokes a Lambda function to create the download API URL for object retrieval. Finally, it emails the researcher with the estimated cost and the download URL as a restoration completion notification.

Workflow studio diagram

Workflow studio diagram

The email notification to the researcher looks like:

Email example

Email example

Downloading the restored object

Once the object restoration is complete, the researcher can download the object from the front-end application.

Front end retrieval menu

Front end retrieval menu

The researcher chooses the Download action, which invokes another API Gateway endpoint. The endpoint integrates with a Lambda function as a backend that creates a pre-signed S3 URL sent as a response to the browser.

Administering object restoration usage

This architecture also provides a view for the central IT team to understand object restoration usage. You achieve this by creating reports and dashboards from the metadata stored in DynamoDB.

The sample application uses Amazon Athena Federated Queries and Amazon Athena DynamoDB Connector to generate a reports dashboard in Amazon QuickSight. You can also use Step Functions AWS SDK integration with Amazon Athena and visualize the workflows in the Athena console.

The following QuickSight visualization shows the count of restored S3 Glacier Deep Archive objects by their contentType:

QuickSite visualization

QuickSight visualization


With the preceding approach, you should consider that:

  1. You must start the object retrieval in the same Region as the Region of the archived object.
  2. S3 Glacier Deep Archive only supports standard and bulk retrievals.
  3. You must enable the “Object Restore Completed” event notification on the S3 bucket with the S3 Glacier Deep Archive object.
  4. The researcher’s email must be verified in SES.
  5. Use a Lambda function for the Price List GetProducts API as the service endpoints are available in specific Regions.


To clean up the infrastructure used in this sample application, run:

cdk destroy --all


Step Functions’ AWS SDK integration opens up different opportunities to orchestrate a workflow. Step Functions provides native support for retries and error handling, which offloads the heavy lifting of handling them manually in scripts.

This blog shows one example use case with S3 Glacier Deep Archive. With AWS SDK integration in Step Functions, you can build any workflow orchestration using S3 or S3 control APIs. For example, a workflow to enforce AWS Key Management Service encryption based on an S3 event, or create static website hosting on-demand in a few steps.

With different S3 API calls available in Step Functions’ Workflow Studio, you can declaratively build a Step Functions workflow instead of imperatively calling each S3 API from a shell script or command line. Refer to the demo application for more details.

For more serverless learning resources, visit Serverless Land.

Let’s Architect! Serverless architecture on AWS

Post Syndicated from Luca Mezzalira original https://aws.amazon.com/blogs/architecture/lets-architect-serverless-architecture-on-aws/

Serverless architecture and computing allow you and your teams to focus on delivering business value in place of investing time tweaking the infrastructure characteristics. AWS is not only providing serverless computing as a service, but share that half of our new applications built by Amazon are using AWS Lambda, as noted by Andy Jassy in his 2020 re:Invent keynote.

In this post, we share insights into reimagining a serverless environment.

I Build Applications – Event-driven Architecture

Event-driven architecture is common in modern applications built with microservices, and it is the cornerstone for designing serverless workloads. It uses events to trigger and communicate between decoupled services.

With this video, you can learn how to start with a prototype then scale to mass adoption using decoupled systems that run when responding to, without needing to redesign. Danilo Poccia, Chief Evangelist at AWS, begins the session with the APIs, then gives an example on how to build an event-driven architecture using Amazon EventBridge. The session closes with how to understand what is happening in this exchange of events.

Event-driven communication with asynchronous invocation

Event-driven communication with asynchronous invocation

Building modern cloud applications? Think integration

This re:Invent 2021 session explains modern cloud applications based on serverless or microservices, and how connections between components define important characteristics, like scalability, availability, and coupling.

How your systems are interconnected describes your system’s essential properties, such as resiliency and changeability. Gregor Hohpe, AWS Enterprise Strategist, shares tips on what to consider when integrating different services, such as lifecycle, level of control over the systems you are integrating, and how integration becomes an integral part of your software delivery cycle. The goal is to use the same method to integrate at the same speed as software deployment.

Integration approaches with Gregor Hohpe

Integration approaches with Gregor Hohpe

Serverless architectural patterns and best practices

Serverless architectures require a mindset shift: existing patterns need to be revisited, and new patterns created using the new architecture style. For each pattern created by AWS, we provide operational, security, and reliability best practices and discuss potential challenges. We also demonstrate some patterns in reference architecture diagrams.

This session helps you identify services and applications to create serverless architectures and understand areas of potential savings, increased agility, and reliability in your organization. Heitor Lessa, Principal Solutions Architect at AWS, starts the session identifying the benefits of Lambda Power Tuning: he details setting up memory when there are hundreds of functions, then follows with best practices for the pattern created.

Best practices for serverless architecture

Best practices for serverless architecture

Best practices of advanced serverless developers

This session is an overview of architectural best practices, optimizations, and handy codes that can be used to build secure, scalable, and high-performance serverless applications.

Julian Wood, Senior Developer Advocate at AWS, provides the recommended practices for implementing serverless applications inside your company, such as Lambda, to transform and not transport, avoid monolithic services and functions, orchestrate workflow with step functions, choreograph events. Julian also touches on understanding different ways you can invoke Lambda functions and what you should be aware of with each invocation model.

Three types of AWS Lambda invocation models

Three types of AWS Lambda invocation models

Building next-gen applications with event-driven architectures

Maintaining data consistency across multiple services can be challenging. It can also be difficult to work with large amounts of data in different data stores and locations. Teams building microservices architectures often find that integration with other applications and external services can make their workloads more monolithic and tightly coupled.

In this session, you can learn how to use event-based architectures to decouple and decentralize application components. Coupling is not one-dimensional, and it’s a trade-off to balance and optimize over time. This video demonstrates patterns based on message queues and events: for each pattern you can learn the advantages, the disadvantages, and the options for building it on AWS.

Sam Dengler, Principal Solutions Architect at AWS, explains the mental models to apply while designing choreography and orchestration in a scenario with microservices. The strategy adopted by Taco Bell for identifying their bounded contexts is also detailed, as well as the architecture built on Lambda for running the business logic and on AWS Step Functions for orchestration.

Choreography and orchestration are two modes of interaction in a microservices architecture

Choreography and orchestration are two modes of interaction in a microservices architecture

See you next time!

Thanks for joining our discussion on serverless architecting! If you want to deep dive into the topic, read all about Serverless on AWS!

See you in a couple of weeks when we discuss architecting for resilience!

Looking for more architecture content? AWS Architecture Center provides reference architecture diagrams, vetted architecture solutions, Well-Architected best practices, patterns, icons, and more!

Other posts in this series

Build a custom Java runtime for AWS Lambda

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/compute/build-a-custom-java-runtime-for-aws-lambda/

This post is written by Christian Müller, Principal AWS Solutions Architect and Maximilian Schellhorn, AWS Solutions Architect

When running applications on AWS Lambda, you have the option to use either one of the managed runtime versions that AWS provides or bring your own custom runtime. The following blog post provides a walkthrough of how you can create and optimize a custom runtime for Java based Lambda functions.

Builders might rely on customized or experimental runtime behavior when creating solutions in the cloud. The Java ecosystem fosters innovation and encourages experiments with the current six-month release schedule for the latest runtime versions.

However, Lambda focuses on providing stable long-term support (LTS) versions. The official Lambda runtimes are built around a combination of operating system, programming language, and software libraries that are subject to maintenance and security updates. For example, the Lambda runtime for Java supports the LTS versions Java 8 Corretto and Java 11 Corretto as of April 2022. The Java 17 Corretto version is pending. In addition, there is no provided runtime for non LTS versions like Java 15 Corretto, Java 16 Corretto, or Java 18 Corretto.

To use other language versions, Lambda allows you to create custom runtimes. Custom runtimes allow builders to provide and configure their own runtimes for running their application code. To enable communication between your custom runtime and Lambda, you can use the runtime interface client library in Java.

With the introduction of modular runtime images in Java 9 (JEP 220), it is possible to include only the Java runtime modules that your application depends on. This reduces the overall runtime size and increases performance, especially during cold-starts. In addition, there are other techniques in Java, like class data sharing and tiered compilation, which allow you to reduce the startup time of your application even further.

To combine those capabilities, this blog post provides an overview for creating and deploying a minified Java runtime on Lambda by using Java 18 Corretto. For step-by-step instructions and prerequisites, refer to the official GitHub example.

Overview of the example

In the following example, you build a custom runtime for a basic Java application that writes request headers to Amazon DynamoDB and is fronted by Amazon API Gateway.

Application architecture

The following diagram summarizes the steps to create the application and the custom runtime:

Steps to create the application custom runtime

  1. Download the preferred Java version and take advantage of jdeps, jlink and class data sharing to create a minified and optimized Java runtime based on the application code (function.jar).
  2. Create a bootstrap file with optimized starting instructions for the application.
  3. Package the application code, the optimized Java runtime, and the bootstrap file as a zip file.
  4. Deploy the runtime, including the app, to Lambda. For example, using the AWS Cloud Development Kit (CDK)

Steps 1–3 are automated and abstracted via Docker. The following section provides a high-level walkthrough of the build and deployment process. For the full version, see the Dockerfile in the GitHub example.

Creating the optimized Java runtime

1. Download the desired Java version and copy the local application code to the Docker environment and build it with Maven:

FROM amazonlinux:2


# Update packages and install Amazon Corretto 18, Maven and Zip
RUN yum -y update
RUN yum install -y java-18-amazon-corretto-devel maven zip


# Copy the software folder to the image and build the function
COPY software software
WORKDIR /software/example-function
RUN mvn clean package

2. This step results in an uber-jar (function.jar) that you can use as an input argument for jdeps. The output is a file containing all the Java modules that the function depends on:

RUN jdeps -q \
    --ignore-missing-deps \
    --multi-release 18 \
    --print-module-deps \
    target/function.jar > jre-deps.info

3. Create an optimized Java runtime based on those application modules with jlink. Remove unnecessary information from the runtime, for example header files or man-pages:

RUN jlink --verbose \
    --compress 2 \
    --strip-java-debug-attributes \
    --no-header-files \
    --no-man-pages \
    --output /jre18-slim \
    --add-modules $(cat jre-deps.info)

4. This creates your own custom Java 18 runtime in the /jre18-slim folder. You can apply additional optimization techniques such as Class-Data-Sharing (CDS) to generate a classes.jsa file to accelerate the class loading time of the JVM.

RUN /jre18-slim/bin/java -Xshare:dump

Adding optimized starting instructions

You must tell the Lambda execution environment how to start the application. You can achieve that with a bootstrap file that includes the necessary instructions. In addition, you can define parameters to improve the performance further. For example, you could use tiered compilation and SerialGC.

The following snippet represents an example of a bootstrap file:


$LAMBDA_TASK_ROOT/jre18-slim/bin/java \
    --add-opens java.base/java.util=ALL-UNNAMED \
    -XX:+TieredCompilation \
    -XX:TieredStopAtLevel=1 \
    -XX:+UseSerialGC \
    -jar function.jar "$_HANDLER"

Packaging the components

Combine the bootstrap file, the custom Java runtime, and the application code in a zip file for later use as the deployment package:

RUN zip -r runtime.zip \
    bootstrap \
    function.jar \

The GitHub example provides a build.sh script to run the above-mentioned process via Docker. This results in a runtime.zip that you can then use as a deployment package.

Deploying the application with the custom runtime

To deploy the custom runtime, use AWS CDK. This allows you to define the needed infrastructure as code more easily in your favorite programming language.

The following code snippet shows how to create a Lambda function from a custom runtime:

Function customJava18Function = new Function(this, "LambdaCustomRuntimeJava18", FunctionProps.builder()
        .environment(Map.of("TABLE_NAME", exampleTable.getTableName()))

To deploy the application and output the necessary API Gateway URL to invoke the Lambda function, use the following command or use the provided provision_infrastructure.sh script:

cdk deploy --outputs-file target/outputs.json

Testing the application and validating the example results

After deployment, you can load test the application with the open-source software project Artillery.

The following command creates 120 concurrent invocations of the Lambda function for a duration of 60 seconds. It uses the API Gateway URL that is exported after the AWS CDK successfully deployed the application:

artillery run -t $(cat infrastructure/target/outputs.json | jq -r '.LambdaCustomRuntimeMinimalJRE18InfrastructureStack.apiendpoint') -v '{ "url": "/custom-runtime" }' infrastructure/loadtest.yml

Use CloudWatch Log Insights to query the Lambda logs and gather information about the cold start (initDuration) and duration percentiles:

filter @type = "REPORT"
    | parse @log /\d+:\/aws\/lambda\/(?<function>.*)/
    | stats
    count(*) as invocations,
    pct(@duration+coalesce(@initDuration,0), 0) as p0,
    pct(@duration+coalesce(@initDuration,0), 25) as p25,
    pct(@duration+coalesce(@initDuration,0), 50) as p50,
    pct(@duration+coalesce(@initDuration,0), 75) as p75,
    pct(@duration+coalesce(@initDuration,0), 90) as p90,
    pct(@duration+coalesce(@initDuration,0), 95) as p95,
    pct(@duration+coalesce(@initDuration,0), 99) as p99,
    pct(@duration+coalesce(@initDuration,0), 100) as p100
    group by function, ispresent(@initDuration) as coldstart
    | sort by coldstart, function

The results provide an indication of how your application performs with the custom runtime. This is especially helpful when comparing different versions.

  • Invocation time (@duration) for both cold and warm starts plus function initialization time (@initDuration) if it is a cold start:

Invocation time

  • Function initialization time (@initDuration) only:

Function initialisation time


In this blog post, you learn how to create your own optimized Java runtime for AWS Lambda by using a variety of Java optimization techniques. This allows you to tailor your Java runtime to your application needs.

See the full example on GitHub and make use of your own preferred Java version. Add additional optimization steps in the Dockerfile or tune the parameters in the bootstrap file to optimize the start of the Java virtual machine.

In case you want to re-use your custom runtime in multiple Lambda functions, you can also distribute it via a Lambda layer.

For more serverless learning resources, visit Serverless Land.

Automatically Detect Operational Issues in Lambda Functions with Amazon DevOps Guru for Serverless

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/aws/automatically-detect-operational-issues-in-lambda-functions-with-amazon-devops-guru-for-serverless/

Today we are announcing Amazon DevOps Guru for Serverless, a new capability for Amazon DevOps Guru. It allows developers to improve the operational performance and availability of serverless applications.

AWS pioneered the serverless computing space with the launch of AWS Lambda in 2014. Today, hundreds of thousands of customers are using AWS Lambda. Lambda allows you to configure many parameters for your functions, like memory allocation, provisioned concurrency, and timeouts. For many customers, finding the right balance between all those parameters to optimize the performance and availability of their functions is challenging.

In December 2020, we announced DevOps Guru, a fully managed AIOps (Artificial Intelligence for IT operations) service that automatically detects and alerts customers about application issues and helps them to improve their applications’ availability. Today, we are announcing DevOps Guru for Serverless, a new capability for DevOps Guru, to help developers using Lambda automatically detect anomalous behavior at the function level and use ML-powered recommendations to remediate any issues that were detected.

DevOps Guru for Serverless uses ML to automatically identify and analyze a wide range of performance and availability-related issues for Lambda functions, such as low provisioned concurrency or underutilization of memory. To use this capability, you don’t need to be a serverless or ML expert.

The reactive insights of this capability help you troubleshoot ongoing issues affecting serverless applications efficiently with actionable recommendations that help you identify and fix the root cause in the shortest time possible.

DevOps Guru for Serverless also provides proactive insights that help you identify a wider range of operational anomalies long before your serverless application performance is affected. It also gives you recommendations on how to resolve the root cause of the issues.

When an issue is detected, DevOps Guru for Serverless displays the finding in the DevOps Guru console and sends notifications using Amazon EventBridge or Amazon Simple Notification Service (Amazon SNS). This allows developers to automatically manage and take real-time action on the discovered issues.

DevOps Guru for Serverless Proactive Insights
DevOps Guru for Serverless enables developers to proactively detect application issues before an event that affects the customer occurs. For example, if provision concurrency is set too low for a Lambda function and traffic for this application is growing, DevOps Guru will detect the growing traffic and the application latency degradation and generate a proactive insight showing the issue.

ML algorithms create these insights from operational data and application metrics. An insight provides high-level information, severity, status, and a recommendation for how to solve this issue.

Nowadays, DevOps Guru for Serverless provides proactive insights for Lambda and Amazon DynamoDB. These are the operational issues and the proactive insights available today:

  • Lambda concurrent executions reaching account limit – Triggered when concurrent executions reach an account limit for a continuous period.
  • Lambda Provisioned Concurrency function limit breached – Triggered when the reserved amount of provisioned concurrency is not enough over a period.
  • Lambda timeout high compared to SQS’s visibility timeout – Triggered when the duration of the lambda function exceeds the visibility timeout for the event source Amazon Simple Queue Service (Amazon SQS).
  • ­Lambda­ Provisioned Concurrency usage is lower than expected – Triggered when the utilization of the provisioned concurrency is too low.
  • Account read/write capacity for DynamoDB consumption reaching account limit – Triggered when the account consumed capacity is approaching account-level limits during a period of time.
  • DynamoDB table read/write consumed capacity reaching table limit – Triggered when the writes or reads in a table are reaching the ProvisionedWriteCapacityUnits or ProvisionedReadCapacityUnits limits for the table over a period.
  • DynamoDB table consumed capacity reaching AutoScaling Max parameter limit – Triggered when table consumed capacity is reaching AutoScaling Max parameters limit over a period.
  • DynamoDB read/write consumption lower than expected – Triggered when the value for ProvisionedWriteCapacityUnits or ProvisionedReadCapacityUnits is far from what is being consumed during a period of time.

Get started with DevOps Guru for Serverless
To get started, navigate to the DevOps Guru console to enable the service for your Lambda-based applications, other supported resources, or your entire account.

Configuring DevOps Guru

For this demo, create a new Lambda function with provisioned concurrency of 1. You can do this from the AWS console or programmatically. After you create it, you can check on the function overview page that the provisioned concurrency is set to 1.

Configuring Lambda provisioned concurrency

Add to the Lambda function a CloudWatch Event that triggers the function every minute. You can do that from the AWS console or programmatically. You can follow this tutorial to learn how to do it. Repeat that process five more times. Now the function will get triggered six times every minute from different events.

To trigger the proactive insight, you need to have six concurrent invocations of this Lambda function. To accomplish that, you need to ensure that the duration of each invocation is long enough. For this demo, you can make your function sleep for 30 seconds.

'use strict';

exports.handler = async (event) => {
    console.log('Sleep for 30 seconds')
    await new Promise(r => setTimeout(r, 30000));
    console.log('finish sleeping')


This configuration will trigger the proactive insight Lambda Provisioned Concurrency function limit breached for this function. You should see the insight in the console in three hours or less after the issue starts.

How to Check an Insight From the DevOps Guru Console
After a few hours, you can visit your DevOps Guru console, and you can verify that the proactive insight was triggered by exceeding the provisioned concurrency.

List of proactive insights

Select the Ongoing insight to see more details. The insight page opens, and it displays information relevant to the insight, metrics, events, and recommended actions for this issue.

Let’s examine this page in more detail. At the top of the page is the insight overview, with a description of what the insight is about and the severity of the issue. This is a proactive insight, so the user experience is not compromised by this issue. You also learn if the issue is ongoing and when it started. If the issue is not happening anymore, you can learn the end date for that insight. If you select the link for the affected applications, you can confirm all the Lambda functions that are affected by this insight.

Insight description information box

The next information box contains information about the CloudWatch metrics related to the proactive insight. This graph shows the metric ProvisionedConcurrecySpilloverInvocations with the summary of all the invocations in the last hours that the provisioned concurrency spilled.

Information about metrics

Relevant events are the next information box available on the page. These are AWS CloudTrail events that DevOps Guru uses combined with CloudWatch metrics and operational data to identify anomalous behavior that created the insight.

Relevant info about the insight

And finally on the page is the Recommendations information box, where DevOps Guru will output all the generated recommendations to help you address the issue. You can use the recommendations to learn the immediate steps you can take to remediate the issue.

Recommendations for the insights

In this proactive insight, DevOps Guru recommends you tune the provision concurrency of your Lambda function. It tells you to which value to set it, based on the past utilization of your function. You can also find the reasoning on why DevOps Guru recommends this insight.

Pricing and Availability
DevOps Guru for Serverless is offered to customers at no additional charge.

DevOps Guru for Serverless is available in all AWS Regions where DevOps Guru is available, US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm).

Learn more about DevOps Guru for Serverless and register for the hands-on workshop on May 10 to learn more about this new launch.


Handling Lambda functions idempotency with AWS Lambda Powertools

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/handling-lambda-functions-idempotency-with-aws-lambda-powertools/

This post is written by Jerome Van Der Linden, Solutions Architect Builder and Dariusz Osiennik, Sr Cloud Application Architect.

One of the advantages of using AWS Lambda is its integration with messaging services like Amazon SQS or Amazon EventBridge. The integration is managed and can also handle the retrying of failed messages. If there’s an error within the Lambda function, the failed message is sent again and the function is re-invoked.

This feature increases the resilience of the application but also means that a message can be processed multiple times by the function. This is important when managing orders, payments, or any kind of transaction that must be handled only once.

As mentioned in the design principles of Lambda, “since the same event may be received more than once, functions should be designed to be idempotent”. This article explains what idempotency is and how to implement it more easily with Lambda Powertools.

Understanding idempotency

Idempotency is the property of an operation whereby it can be applied multiple times without changing the result beyond the initial application. You can run an idempotent operation safely multiple times without any side effects like duplicates or inconsistent data. For example, this is a key principle for infrastructure as code, where you don’t want to double the number of resources each time you apply a template.

Applied to Lambda, a function is idempotent when it can be invoked multiple times with the same event with no risk of side effects. To make a function idempotent, it must first identify that an event has already been processed. Therefore, it must extract a unique identifier, called an “idempotency key”.

This identifier may be in the event payload (for example, orderId), a combination of multiple fields in the payload (for example, customerId, and orderId), or even a hash of the full payload. If using the full payload, fields such as dates, timestamps, or random elements may affect the hash and lead to changing values.

The function then checks in a persistence layer (for example, Amazon DynamoDB or Amazon ElastiCache):

  • If the key is not there, then the Lambda function can proceed normally, perform the transaction, and save the idempotency key in the persistence layer. You can potentially add the result of the function in the persistence layer too, so that subsequent calls can retrieve this result directly.
  • If the key is there, then the function can return and avoid applying the transaction again.

The following diagram shows the sequence of events with this idempotency scenario:

Sequence diagram

There are edge cases in this example:

  • You can invoke the function twice with the same event within a few milliseconds. In that case, each function acts as if it’s the first time this event is received and processes it, resulting in inconsistencies.
  • The function may perform several operations that are not idempotent. If the first operation is successful and then an error happens, the idempotency key won’t be saved. Subsequent calls redo the first operation, resulting in inconsistencies.

You can guard against these edge cases by inserting a lock as soon as the event is received:

Second sequence diagram

There are other questions and edge cases that you must consider when implementing idempotency on your Lambda functions. Read Making retries safe with idempotent APIs from the Builder’s Library to dive into the details. You can choose to implement idempotency by yourself or you can use a library that handles it and takes care of these edge cases for you. This is what Lambda Powertools (for Python and Java) proposes.

Idempotency with Lambda Powertools

Lambda Powertools is a library, available in Python, Java, and TypeScript. It provides utilities for Lambda functions to ease the adoption of best practices and to reduce the amount of code to perform recurring tasks. In particular, it provides a module to handle idempotency (in the Java and Python versions).

This post shows examples using the Java version. To get started with the Lambda Powertools idempotency module, you must install the library and configure it within your build process. For more details, follow AWS Lambda Powertools documentation.

Next, you must configure a persistence storage layer where the idempotency feature can store its state. You can use the built-in support for DynamoDB or you can create your own implementation for the database of your choice. This example creates a new table in DynamoDB.

The following AWS Serverless Application Model (AWS SAM) template creates a suitable table to store the state:

    Type: AWS::DynamoDB::Table
        - AttributeName: id
          AttributeType: S
        - AttributeName: id
          KeyType: HASH
        AttributeName: expiration
        Enabled: true
      BillingMode: PAY_PER_REQUEST

In this definition:

  • The table is multi-tenant and can be reused by multiple Lambda functions that use the Powertools idempotency module.
  • The DynamoDB time-to-live configuration helps keep idempotency limited in time. You can configure the duration, which is 1 hour by default.

Configure the idempotency module’s behavior in the init phase of the function’s lifecycle, before the handleRequest method gets called:

public class SubscriptionHandler implements RequestHandler<Subscription, SubscriptionResult> {

  public SubscriptionHandler() {

Lambda Powertools follows the paradigm of convention over configuration and provides default values for many parameters. The persistence store is the only required element. To use the DynamoDB implementation, you must specify a table name. In the previous sample, the name is provided by the environment variable TABLE_NAME.

Adding the @Idempotent annotation to the handleRequest method enables the idempotency functionality. It uses a hash of the Subscription event as the idempotency key.

  public SubscriptionResult handleRequest(final Subscription event, final Context context) {
    SubscriptionPayment payment = createSubscriptionPayment(

    return new SubscriptionResult(payment.getId(), "success", 200);

Creating orders

The example is about creating an order for a user for a list of products. Orders should not be duplicated if the client repeats the request. API consumers can safely retry a create order request in case of issues (such as a timeout or networking disruption). The application should also allow the user to buy the same products in a short period of time if that is the user’s intention.

The following architecture diagram consists of an Amazon API Gateway REST API, the idempotent Lambda function, and a DynamoDB table for storing orders.

Architecture diagram

The Orders API allows creating a new order by calling its POST /orders endpoint with the following sample payload:

  "requestToken": "260d2efe-af84-11ec-b909-0242ac120002",
  "userId": "user1",  
  "items": [
      "productId": "product1",      
      "price": 6.50,      
      "quantity": 5    
      "productId": "product2",      
      "price": 13.50,      
      "quantity": 2    
  "comment": "AWSome Order"

Lambda Powertools uses JMESPath to extract the important fields from the request that uniquely identify it. It then calculates a hash of these fields to constitute the idempotency key.

In the example, the important fields are the userId and the items, to avoid duplicated orders. But the user can also buy the same list of products in a short period of time. To allow this, the API consumer can generate a client-side token and assign its value to the requestToken field. For each unique order, the token has a different value. If a request is retried by the client, it uses the same token.

This leads to the following configuration for the idempotency key:


If the same request is sent more than once, only the first call results in a new order created in the DynamoDB table. The same order identifier is returned by the endpoint for all the subsequent calls. In this way, the API consumer can safely retry the requests without worrying about duplicating the order.

You can find the source code of the example on GitHub.

Processing payments

This example shows asynchronous batch processing of payment messages from a queue. Messages must not be processed more than once to avoid charging users multiple times for the same order. You must consider edge cases like at-least-once message delivery, an error response returned by the third party payment API or retrying the batch of messages.

The following architecture diagram shows an Amazon SQS queue, the idempotent Lambda function, and a third-party API that the function calls for payment.

Architecture diagram

This is the body of a single payment SQS message:

    "orderId": "order1",
    "userId": "user1",
    "amount": "50.25"

In this example, the process method is annotated as idempotent, not the handleRequest. The method is responsible for processing a single payment record from the SQS batch. It uses @IdempotencyKey annotation to specify which parameter to use as the idempotency key.

public List<String> handleRequest(SQSEvent sqsEvent, Context context) {
return sqsEvent.getRecords()
  .map(record -> process(record.getMessageId(),record.getBody()))

private String process(String messageId, @IdempotencyKey String messageBody) {
    logger.info("Processing messageId: {}", messageId);
    PaymentRequest request = 
    return paymentService.process(request);

If an SQS record with the same payload is received more than once, the third-party API is not called multiple times. All the subsequent calls return before calling the process method.

If an exception is thrown from the process method, the idempotency feature does not store the idempotency state in the persistence layer. The payment is treated as unprocessed and can be retried safely. This may happen if the third-party API returns a server-side error.

By default, if one message from a batch fails, all the messages in the batch are retried. Lambda Powertools also offers the SQS Batch Processing module which can help in handling partial failures.

You can find the source code of this example in the GitHub repo.


Idempotency is a critical piece of serverless architectures and can be difficult to implement. If not done correctly, it can lead to inconsistent data and other issues. This post shows how you can use Lambda Powertools to make Lambda functions idempotent and ensure that critical transactions are handled only once.

For more details about the Lambda Powertools idempotency feature and its configuration options, refer to the full documentation.

For more serverless learning resources, visit Serverless Land.

Working with events and the Amazon EventBridge schema registry

Post Syndicated from Talia Nassi original https://aws.amazon.com/blogs/compute/working-with-events-and-amazon-eventbridge-schema-registry/

Event-driven architecture, at its core, is driven by producers creating events and subscribers being made aware of those events and acting upon them. An event is a data representation of something that happened elsewhere in the application or from an outside producer. When building event-driven applications, it is critical to determine what events exist in the application, who produces them, and who subscribes and react to them.

The first step in identifying these events is to work through the process of event discovery. In this process, you decide the events that the event source produces, and what parts of the application must know about those events. Events schemas describe the structure of an event and fields included in the event. If the event’s contents match the event target’s requirements, the service sends the event to the target. If you have an existing application that you want to discover event schemas automatically for you, can enable EventBridge schema discovery. If you are building a new application, you can conduct an event discovery exercise.

An event bus connects the event to the subscriber. The event bus at AWS is Amazon EventBridge. Use EventBridge to choreograph interactions between event sources and event targets.

In this post, you learn how to perform an event discovery exercise with your team. It shows how to create a schema registry with EventBridge, and how to represent the event as an object in your code to use in your application.

Event discovery

In the event discovery phase, all business stakeholders of an application come together and write down all the possible events that can happen. For example, possible events for an ecommerce application include: Account Created, Item added to cart, Order Placed. Write events in Noun + Past Tense Verb format.

Account Created
Item Added
Order Placed

Notice that events are not technical or focused on implementation; rather they are real-world things that happened in your system. This is because everyone, from developers to product owners, must understand them. In the event discovery phase, events act as your business requirements. Later, developers then translate those requirements into code.

Once you have the events laid out, you decide who is interested in each event. For example, consider the events listed for the ecommerce application: Account Created, Item Added to cart, and Order Placed.

Events Subscribers
Account Created

Marketing team

Security team

Item Added Inventory team
Order Placed Fulfillment team

Whenever a user creates an account, the marketing team subscribes to that event to send the account holder promotions. The security team encrypts the account holder’s user name and password into a database to save the account credentials. Whenever a user places an order, the system sends an email notification with the order details to stakeholders. The system also sends a message to the fulfillment team to start packing and shipping the order. This approach decouples the interactions between all of these services. This decoupling is beneficial because it increases developer independence by reducing the dependencies on other teams to write integrations.

Once the initial event schema planning is complete, you want to ensure that developers can continue to build new features without needing to coordinate with other teams closely on event schemas. For this reason, \EventBridge’s schema registry and automated schema discovery can help developers quickly build new features based on their application’s events.

You use EventBridge schema registries to search for, find, and track different schemas generated by your application. You can also automatically find schemas with the automated schema registry.

EventBridge schema registry and discovery

Applications can have many different types of events. Events generated from AWS services, third-party SaaS applications, and your custom applications.

With so many event sources, it can be challenging to know what to expect when consuming events. A schema represents the structure of an event. It describes what happened in the event, where the event came from, and the timestamp. The event schema is important for developers as it shows what data contained in the event, and allows them to write code based on that data.

For example, an Order Placed event might always contain a list of items in the order as an array, and a user ID as an integer. EventBridge helps automate the manual process of finding and documenting schemas. There are two capabilities to highlight: Schema registry and schema discovery.

A schema registry is a repository that stores a collection of schemas. You can use a schema registry to search for, find, and track different schemas used and generated by your application. AWS automatically stores schemas for all AWS sources for EventBridge in your schema registry. SaaS partner and custom schemas can be generated and added to the registry using the schema discovery feature. A schema registry enables you to use events as objects in your code more easily.

Adding an event to the EventBridge schema registry

In this tutorial, you create an Account Created event, which includes a user’s name and email address.

  1. Navigate to the Amazon EventBridge console and choose Schemas from the left panel.step 1There are three types of schemas represented in the tabs: AWS event schema registry, discovered schema registry, and custom schema registry.When you choose AWS event schema registry, you can search for any AWS service or event that is supported by EventBridge. There, you can view the schema for that event.
    pic 2

  2. To create a custom schema registry for your application, navigate to the custom schema registry tab and choose Create registry.
  3. Enter a name for the registry and then choose Create.
  4. There are currently no schemas in the registry. Choose Create custom schema to create one.
  5. Choose your registry as the destination and call the schema “user”. You can choose to load the schema template using an OpenAPI format from the Load Template option. You can then manually enter data for each of the fields.
  6. Alternatively, you can have the service discover the schema from JSON. Remember that events are written in JSON. Choose the Discover from JSON tab and enter the following code:{"id": 1, "name": "Talia Nassi", "emailAddress": "[email protected]"}
  7. Choose Discover schema.
  8. EventBridge extrapolates the schema from this information. The schema shows that the ID is a number, the name is a string, and the email address is a string. Choose Create to create the schema.
  9. When you choose your schema from the schema registry, you can see the structure of the event you just created.

Representing events as objects in your code with code bindings

Once a schema is added to the registry, you can download a code binding, which allows you to represent the event as an object in your code. You can take advantage of IDE features such as validation and auto-complete. Code bindings are available for Java, Python, or TypeScript programming languages. You can download bindings from the Amazon EventBridge Console, or directly from your IDE with the AWS Toolkit plugin for IntelliJ and Visual Studio Code.

Choose the programming language you prefer, then choose Download. This downloads the code binding to your local machine.

You can also choose to download code bindings directly to your IDE with the AWS Toolkit. This tutorial uses VS Code but you can also use IntelliJ.

  1. Ensure you have VS Code installed.
  2. Navigate to the VS Code marketplace and search for AWS and install the AWS Toolkit. You may have to restart VS Code.
  3. Choose a profile to connect to AWS. Set the Region to the same Region that you created the schema in. You see this icon on the left panel when you access the AWS Explorer:
  4. Choose Schemas from the left panel, then choose your schema, myRegistry. Open user by right clicking and choosing View Schema.
  5. You can now use this event object in your code.


In this post, you learn about event discovery, schema registry, and schema discovery. Event discovery is essential when creating event-driven applications because it allows the team to see which events are created by your application, and who needs to subscribe to those events.

Events have specific structures, called schemas. Your schema registry includes all of the schemas for your events. You can use the schema registry to search for events produced by other teams, which can make development faster. You learn how to create a custom schema registry, and how to download code bindings to use events in your code.

For more information, visit Serverless Land.

Building a serverless cloud-native EDI solution with AWS

Post Syndicated from Ripunjaya Pattnaik original https://aws.amazon.com/blogs/architecture/building-a-serverless-cloud-native-edi-solution-with-aws/

Electronic data interchange (EDI) is a technology that exchanges information between organizations in a structured digital form based on regulated message formats and standards. EDI has been used in healthcare for decades on the payer side for determination of coverage and benefits verification. There are different standards for exchanging electronic business documents, like American National Standards Institute X12 (ANSI), Electronic Data Interchange for Administration, Commerce and Transport (EDIFACT), and Health Level 7 (HL7).

HL7 is the standard to exchange messages between enterprise applications, like a Patient Administration System and a Pathology Laboratory Information. However, HL7 messages are embedded in Health Insurance Portability and Accountability Act (HIPAA) X12 for transactions between enterprises, like hospital and insurance companies.

HIPAA is a federal law that required the creation of national standards to protect sensitive patient health information from being disclosed without the patient’s consent or knowledge. It also mandates healthcare organizations to follow a standardized mechanism of EDI to submit and process insurance claims.

In this blog post, we will discuss how you can build a serverless cloud-native EDI implementation on AWS using the Edifecs XEngine Server.

EDI implementation challenges

Due to its structured format, EDI facilitates the consistency of business information for all participants in the exchange process. The primary EDI software that is used processes the information and then translates it into a more readable format. This can be imported directly and automatically into your integration systems. Figure 1 shows a high-level transaction for a healthcare EDI process.

EDI Transaction Sets exchanges between healthcare provider and payer

Figure 1. EDI Transaction Sets exchanges between healthcare provider and payer

Along with the implementation itself, the following are some of the common challenges encountered in EDI system development:

  1. Scaling. Despite the standard protocols of EDI, the document types and business rules differ across healthcare providers. You must scale the scope of your EDI judiciously to handle a diverse set of data rules with multiple EDI protocols.
  2. Flexibility in EDI integration. As standards evolve, your EDI system development must reflect those changes.
  3. Data volumes and handling bad data. As the volume of data increases, so does the chance for errors. Your storage plans must adjust as well.
  4. Agility. In healthcare, EDI handles business documents promptly, as real-time document delivery is critical.
  5. Compliance. State Medicaid and Medicare rules and compliance can be difficult to manage. HIPAA compliance and CAQH CORE certifications can be difficult to acquire.

Solution overview and architecture data flow

Providers and Payers can send requests as enrollment inquiry, certification request, or claim encounter to one another. This architecture uses these as source data requests coming from the Providers and Payers as flat files (.txt and .csv), Active Message Queues, and API calls (submitters).

The steps for the solution shown in Figure 2 are as follows:

1. Flat, on-premises files are transferred to Amazon Simple Storage Service (S3) buckets using AWS Transfer Family (2).
3. AWS Fargate on Amazon Elastics Container Service (Amazon ECS) runs Python packages to convert the transactions into JSON messages, then queues it on Amazon MQ (4).
5. Java Message Service (JMS) Bridge, which runs Apache Camel on Fargate, pulls the messages from the on-premises messaging systems and queues them on Amazon MQ (6).
7. Fargate also runs programs to call the on-premises API or web services to get the transactions and queues it on Amazon MQ (8).
9. Amazon CloudWatch monitors the queue depth. If queue depth goes beyond a set threshold, CloudWatch sends notifications to the containers through Amazon Simple Notification Service (SNS) (10).
11. Amazon SNS triggers AWS Lambda, which adds tasks to Fargate (12), horizontally scaling it to handle the spike.
13. Fargate runs Python programs to read the messages on Amazon MQ and uses PYX12 packages to convert the JSON messages to EDI file formats, depending on the type of transactions.
14. The container also may queue the EDI requests on different queues, as the solution uses multiple trading partners for these requests.
15. The solution runs Edifecs XEngine Server on Fargate with Docker image. This polls the messages from the queues previously mentioned and converts them to EDI specification by the trading partners that are registered with Edifecs.
16. Python module running on Fargate converts the response from the trading partners to JSON.
17. Fargate sends JSON payload as a POST request using Amazon API Gateway, which updates requestors’ backend systems/databases (12) that are running microservices on Amazon ECS (11).
18. The solution also runs Elastic Load Balancing to balance the load across the Amazon ECS cluster to take care of any spikes.
19. Amazon ECS runs microservices that uses Amazon RDS (20) for domain specific data.

EDI transaction-processing system architecture on AWS

Figure 2. EDI transaction-processing system architecture on AWS

Handling PII/PHI data

The EDI request and response file includes protected health information (PHI)/personal identifiable information (PII) data related to members, claims, and financial transactions. The solution leverages all AWS services that are HIPAA eligible and encrypts data at rest and in-transit. The file transfers are through FTP, and the on-premises request/response files are Pretty Good Privacy (PGP) encrypted. The Amazon S3 buckets are secured through bucket access policies and are AES-256 encrypted.

Amazon ECS tasks that are hosted in Fargate use ephemeral storage that is encrypted with AES-256 encryption, using an encryption key managed by Fargate. User data stored in Amazon MQ is encrypted at rest. Amazon MQ encryption at rest provides enhanced security by encrypting data using encryption keys stored in the AWS Key Management Service. All connections between Amazon MQ brokers use Transport Layer Security to provide encryption in transit. All APIs are accessed through API gateways secured through Amazon Cognito. Only authorized users can access the application.

The architecture provides many benefits to EDI processing:

  • Scalability. Because the solution is highly scalable, it can speed integration of new partner/provider requirements.
  • Compliance. Use the architecture to run sensitive, HIPAA-regulated workloads. If you plan to include PHI (as defined by HIPAA) on AWS services, first accept the AWS Business Associate Addendum (AWS BAA). You can review, accept, and check the status of your AWS BAA through a self-service portal available in AWS Artifact. Any AWS service can be used with a healthcare application, but only services covered by the AWS BAA can be used to store, process, and transmit protected health information under HIPAA.
  • Cost effective. Though serverless cost is calculated by usage, with this architecture you save as your traffic grows.
  • Visibility. Visualize and understand the flow of your EDI processing using Amazon CloudWatch to monitor your databases, queues, and operation portals.
  • Ownership. Gain ownership of your EDI and custom or standard rules for rapid change management and partner onboarding.


In this healthcare use case, we demonstrated how a combination of AWS services can be used to increase efficiency and reduce cost. This architecture provides a scalable, reliable, and secure foundation to develop your EDI solution, while using dependent applications. We established how to simplify complex tasks in order to manage and scale your infrastructure for a high volume of data. Finally, the solution provides for monitoring your workflow, services, and alerts.

For further reading:

Orchestrating high performance computing with AWS Step Functions and AWS Batch

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/orchestrating-high-performance-computing-with-aws-step-functions-and-aws-batch/

This post is written by Dan Fox, Principal Specialist Solutions Architect; Sabha Parameswaran, Senior Solutions Architect.

High performance computing (HPC) workloads address challenges in a wide variety of industries, including genomics, financial services, oil and gas, weather modeling, and semiconductor design. These workloads frequently have complex orchestration requirements that may include large datasets and hundreds or thousands of compute tasks.

AWS Step Functions is a workflow automation service that can simplify orchestrating other AWS services. AWS Batch is a fully managed batch processing service that can dynamically scale to address computationally intensive workloads. Together, these services can orchestrate and run demanding HPC workloads.

This blog post identifies three common challenges when creating HPC workloads. It describes some features with Step Functions and AWS Batch that can help to solve these challenges. It then shows a sample project that performs complex task orchestration using Step Functions and AWS Batch.

Reaching service quotas with polling iterations

It’s common for HPC workflows to require that a step comprising multiple jobs completes before advancing to the next step. In these cases, it’s typical for developers to implement iterative polling patterns to track job completions.

To handle task orchestration for a workload like this, you may choose to use Step Functions. Step Functions has a hard limit of 25,000 history events. A single state transition may contain multiple history events. For example, an entry to the state and an exit from the state count as two steps. A workflow that iteratively polls many long-running processes may run into limits with this service quota.

Step Functions addresses this by providing synchronization capabilities with several integrated services, including AWS Batch. For integrated services, Step Functions can wait for a task to complete before progressing to the next state. This feature, called “Run a Job (.sync)” in the documentation, returns a Success or Failure message only when the compute service task is complete, reducing the number of entries in the event history log. View the Step Functions documentation for a complete list of supported service integrations.

“Run a Job (.sync)” task pattern

Supporting parallel and dynamic tasks

HPC workloads may require a changing number of compute tasks from execution to execution. For example, the number of compute tasks required in a workflow may depend on the complexity or length of an input dataset. For performance reasons, you may desire for these tasks to run in parallel.

Step Functions supports faster data processing with a fixed number of parallel executions through the parallel state type. If a workload has an unknown number of branches, the map state type can run a set of parallel steps for each element of an input array. We refer to this pattern as dynamic parallelism.

Dynamic parallelism using the map state

Limits and flow control with dynamic tasks

The map state may limit concurrent iterations. When this occurs, some iterations do not begin until previous iterations have completed. The likelihood of this occurring increases when an input array has over 40 items. If your HPC workload benefits from increased concurrency, you may use nested workflows. Step Functions allows you to orchestrate more complex processes by composing modular, reusable workflows.

For example, a map state may invoke secondary, nested workflows, which also contain map states. By nesting Step Functions workflows, you can build larger, more complex workflows out of smaller, simpler workflows.

As your workflow grows in complexity, you may use the callback task, which is an additional flow control feature. Callback tasks provide a way to pause a workflow pending the return of a unique task token. A callback task passes this token to an integrated service and then stops. Once the integrated service has completed, it may return the task token to the Step Functions workflow with a SendTaskSuccess or SendTaskFailure call. Once the callback task receives the task token, the workflow proceeds to the next state.

View the documentation for a list of integrated services that support this pattern.

Nested workflows using callback with task token

A sample project with several orchestration patterns

This sample project demonstrates orchestration of HPC workloads using Step Functions, AWS Batch, and Lambda. The nesting in this project is three layers deep. The primary state machine runs the second layer nested state machines, which in turn run the third layer nested state machines.

The primary state machine demonstrates dynamic parallelism. It receives an input payload that contains an array of items used as input for its map step. Each dynamic map branch runs a nested secondary state machine.

The secondary state machine demonstrates several workflow patterns including a Lambda function with callback, a third layer nested state machine in sync mode, an AWS Batch job in sync mode, and a Lambda function calling AWS Batch with a callback. The tertiary state machine only notifies its parent with Success when called in sync mode.

Explore the ASL in this project to review the code for these patterns.

Sample project architecture

Deploy the sample application

The README of the Github project contains more detailed instructions, but you may also follow these steps.


  1. AWS Account: If you don’t have an AWS account, navigate to https://aws.amazon.com/ and choose Create an AWS Account.
  2. VPC: A valid existing VPC with subnets (for execution of AWS Batch jobs). Refer to https://docs.aws.amazon.com/vpc/latest/userguide/vpc-getting-started.html for creating a VPC.
  3. AWS CLI: This project requires a local install of AWS CLI. Refer to https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html for installing AWS CLI.
  4. Git CLI: This project requires a local install of Git CLI. Refer to https://git-scm.com/book/en/v2/Getting-Started-Installing-Git
  5. AWS SAM CLI: This project requires a local install of AWS SAM CLI to build and deploy the sample application. Refer to https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-cli-install.html for instructions to install and use AWS SAM CLI.
  6. Python 3.8 and Docker: Required only if you plan for local development and testing with AWS SAM CLI.

Build and deploy in your account

Follow these steps to build this application locally and deploy it in your AWS account:

  1. Git clone the repository into a local folder.
    git clone https://github.com/aws-samples/aws-stepfunction-complex-orchestrator-app
    cd aws-stepfunction-complex-orchestrator-app
  2. Build the application using AWS SAM CLI.
    sam build
  3. Use AWS SAM deploy with the --guided flag.
    sam deploy --guided
    1. Parameter BatchScriptName (accept default: batch-notify-step-function.sh)
    2. Parameter VPCID (enter the id of your VPC)
    3. Parameter FargateSubnetAccessibility (choose Private or Public depending on subnet configuration)
    4. Parameter Subnets (enter ID for a single subnet)
    5. Parameter batchSleepDuration (accept default: 15)
    6. Accept all other defaults
  4. Note the name of the BucketName parameter in the Outputs of the deploy process. Save this S3 bucket name for the next step.
  5. Copy the batch script into the S3 bucket created in the prior step.
    aws s3 cp ./batch-script/batch-notify-step-function.sh s3://<S3-bucket-name>

Testing the sample application

Once the SAM CLI deploy is complete, navigate to Step Functions in the AWS Console.

  1. Note the three new state machines deployed to your account. Each state machine has a random suffix generated as part of the AWS SAM deploy process:
    1. ComplexOrchestratorStateMachine1-…
    2. SyncNestedStateMachine2-…
    3. CallbackNotifyStateMachine3-…
  2. Follow the link for the primary state machine: ComplexOrchestratorStateMachine1-…
  3. Choose Start execution.
  4. The sample payload for this state machine is located here: orchestrator-step-function-payload.json. This JSON document has 2 input elements within the entries array. Select and copy this JSON and paste it into the Input field in the console, overwriting the default value. This causes the map state to create and execute two nested state machines in parallel. You may modify this JSON to contain more elements to increase the number of parallel executions.
  5. View the “Execution event history” in the console for this execution. Under the Resource column, locate the links to the nested state machines. Select these links to follow their individual executions. You may also explore the links to Lambda, CloudWatch, and AWS Batch job.
  6. Navigate to AWS Batch in the console. Step Functions workflows are available to AWS Batch users within the AWS Batch console. Find the integration from the AWS Batch side navigation under “Related AWS services”. You can also read this blog post to learn more.
  7. Common troubleshooting. If the batch job fails or if the Step Functions workflow times out, make sure that you have correctly copied over the batch script into the S3 bucket as described in step 5 of the Build and Deploy in your account section of this post. Also make sure that the FargateSubnetAccessibility Parameter matches the configuration of your subnet (Public or Private).
  8. The state machine may take several minutes to complete. When successful, the Graph Inspector displays:Graph Inspector with Successful completion

Cleaning up

To clean up the deployment, run the following commands to delete the stack associated with the AWS SAM deployment:

aws cloudformation delete-stack --stack-name <stack-name>


This blog post describes several challenges common to orchestrating HPC workloads. I describe how Step Functions with AWS Batch can solve many of these challenges. I provide a project that contains several sample patterns and show how to deploy and test this in your account.

For more information on serverless patterns, visit Serverless Land.

Building an event-driven application with Amazon EventBridge

Post Syndicated from Talia Nassi original https://aws.amazon.com/blogs/compute/building-an-event-driven-application-with-amazon-eventbridge/

In event-driven architecture, services interact with each other through events. An event is something that happened in your application (for example, an item was put into a cart, a new order was placed). Events are JSON objects that tell you information about something that happened in your application. In event-driven architecture, each component of the application raises an event whenever anything changes. Other components listen and decide what to do with it and how they would like to react.

When you build applications with event-driven architecture, you decouple your event sources and event targets. This can enable teams to act more independently, because your services are loosely coupled. When you add new features to your applications, you raise new events and then decide on the event source and event target. The event source is what emits the event, and the event target is what subscribes to or receives the event. Decoupling event sources and event targets can greatly speed up development time, and it can simplify making changes to your application.

Decoupling your application can allow for more seamless cross-team collaboration. For example, let’s say you are a developer at an ecommerce company and you are building a serverless ecommerce application. Your team is in charge of the account creation and authentication process. You build the login workflow, and raise an event when a new user creates an account.

When the event is raised, other teams can be alerted. The marketing team can listen for the Account Created event and act on it (for example, send promotional emails). In this decoupled architecture, event producers and consumers don’t have to know about each other. They only have to listen to events and act accordingly when they are interested in an event. This can speed up development by reducing the complexity caused by building new features.

In AWS, events are choreographed through Amazon EventBridge rules. A rule matches incoming events from an event source and sends them to event targets for processing.

eventbridge architecture

EventBridge accepts events from many different event sources, including over 200 AWS services, custom events from your Lambda functions or applications, and third-party SaaS applications. You specify an action to take when EventBridge receives an event that matches the event pattern in the rule. When an event matches, Amazon EventBridge sends the event to the specified target and triggers the action defined in the rule.

To route events from these sources to the correct target, the events must be placed on a corresponding event bus. There are three types of event buses. The first type is the default bus, which is always available in every account, and it’s where AWS events are routed to. The second type is a custom event bus. You can create custom event buses for your own applications to meet your business needs. Lastly, you can also create SaaS event buses, which are created when you configure SaaS applications as an event source.

There are many potential event targets. Event targets are what the event bus route to once a corresponding event happens. Targets include AWS Lambda, Amazon Kinesis, AWS Step Functions, Amazon API Gateway, and even event buses in other accounts. This flexible design allows you to create a wide variety of integration patterns based on your specific needs.

Configuring events with Amazon EventBridge

This tutorial sends an event from Amazon S3 (the event source) to AWS Lambda (the event target) using an event rule.

In this tutorial, you learn how to configure events with Amazon EventBridge by deploying an AWS Serverless Application Model template. The AWS Serverless Application Model (AWS SAM) is an open-source framework for building serverless applications. It provides shorthand syntax to express functions, APIs, databases, and event source mappings. AWS SAM is an extension of AWS CloudFormation, which is the AWS infrastructure as code tool. You define resources using CloudFormation in your AWS SAM template and use the full suite of resources, intrinsic functions, and other template features that are available in AWS CloudFormation.

First, you upload images to an S3 bucket. This raises an event, which invokes a Lambda function, which resizes the image and places it in a different S3 bucket.


  1. AWS SAM CLI (If you use AWS Cloud9, this is installed for you)

To configure events with Amazon EventBridge:

  1. Navigate to the Serverlessland Patterns Collection and choose the pattern Amazon S3 to Amazon EventBridge to AWS Lambda. This AWS SAM template deploys an S3 bucket, a Lambda function, an EventBridge rule, and the IAM resources required to run the application.
  2. Copy and paste the cloning instructions in your terminal.
  3. Run sam deploy --guided to deploy the pattern.
  4. You see the success message:
  5. Navigate to the EventBridge console and choose Rules from the left panel. Then choose the rule that was created by AWS SAM (starting with sam-app)
    The event source is S3 and the rule is invoked when an image is put into the source bucket in S3. Next, notice that the event target is the Lambda function that you created from the AWS SAM template.
  6. Navigate to the S3 console and choose Buckets on the left panel. Then choose the bucket that was created for you (starting with sam-app). Choose the Properties tab, and note that the integration with EventBridge is on.
  7. From the Objects tab, choose Upload, and upload an image.

  8. Navigate to the Lambda console and choose your Lambda function (starting with sam-app). Select the Monitor tab, and choose View Logs in CloudWatch.
  9. You can see the event that triggered the Lambda function in the logs:

Adding more event rules to your application

In the previous example, you add an EventBridge rule that routes events from S3 (the event source) to Lambda (the event target) using an event bus. Now, add another rule:

  1. From the EventBridge console, choose Rules, and then choose Create rule.
  2. Enter a name and description for the rule.
  3. Define the event pattern that is used to invoke the event targets. For Service provider, choose AWS, and for Service name choose S3. For Event type, choose Amazon S3 event notification, and in the event dropdown choose Object created. You are configuring the event source to be an object created in your S3 bucket.
  4. Select either the default AWS event bus or a custom event bus.
  5. Select the event target. In this example, configure an Amazon CloudWatch log group. Enter any name for the log group, which is created automatically.
  6. Choose Create.
  7. Upload an image to the S3 bucket, as shown in step 7 above.
  8. Navigate to the Amazon CloudWatch console and choose Log groups from the left panel. Choose the log group, and then choose a log stream.
  9. The event is logged to CloudWatch Logs.

Adding a second event rule did not change the event source’s behavior or affect other event targets.


This post is a brief introduction of event-driven architecture, and walks through a tutorial where you create an event-driven application with the Serverlessland Patterns Collection. You also add two different event rules to your event bus.

For more serverless learning resources, visit Serverless Land.

Introducing global endpoints for Amazon EventBridge

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/introducing-global-endpoints-for-amazon-eventbridge/

This post is written by Stephen Liedig, Sr Serverless Specialist SA.

Last year, AWS announced two new features for Amazon EventBridge that allow you to route events from any commercial AWS Region, and across your AWS accounts. This supported a wide range of use cases, allowing you to implement easily global event delivery and replication scenarios.

From today, EventBridge extends this capability with global endpoints. Global endpoints provide a simpler and more reliable way for you to improve the availability and reliability of event-driven applications. The feature allows you to fail over event ingestion automatically to a secondary Region during service disruptions. Global endpoints also provide optional managed event replication, simplifying your event bus configuration and reducing the risk of event loss during any service disruption.

This blog post explains how to configure global endpoints in your AWS account, update your applications to publish events to the endpoint, and how to test endpoint failover.

How global endpoints work

Customers building multi-Region architectures today are building more resilience by using self-managed replication via EventBridge’s cross-Region capabilities. With this architecture, events are sent directly to an event bus in a primary Region and replicated to another event bus in a secondary Region.


This event flow can be interrupted if there is a service disruption. In this scenario, event producers in the primary Region cannot PutEvents to their event bus, and event replication to the secondary Region is impacted.

To put more resiliency around multi-Region architectures, you can now use global endpoints. Global endpoints solve these issues by introducing two core service capabilities:

  1. A global endpoint is a managed Amazon Route 53 DNS endpoint. It routes events to the event buses in either Region, depending on the health of the service in the primary Region.
  2. There is a new EventBridge metric called IngestionToInvocationStartLatency. This exposes the time to process events from the point at which they are ingested by EventBridge to the point the first invocation of a target in your rules is made. This is a service-level metric measured across all of your rules and provides an indication of the health of the EventBridge service. Any extended periods of high latency over 30 seconds may indicate a service disruption.

These two features provide you with the ability to failover event ingestion automatically to the event bus in the secondary Region. The failover is triggered via a Route 53 health check that monitors a CloudWatch alarm that observes the IngestionToInvocationStartLatency in the primary Region.

If the metric exceeds the configured threshold of 30 seconds consecutively for 5 minutes, the alarm state changes to “ALARM”. This causes the Route 53 health check state to become unhealthy, and updates the routing of the global endpoint. All events from that point on are delivered to the event bus in the secondary Region.

The diagram below illustrates how global endpoints reroutes events being delivered from the event bus in the primary Region to the event bus in the secondary Region when CloudWatch alarms trigger the failover of the Route 53 health check.

Rerouting events with global endpoints

Once events are routed to the secondary Region, you have a couple of options:

  1. Continue processing events by deploying the same solution that processes events in the primary Region to the secondary Region.
  2. Create an EventBridge archive to persist all events coming through the secondary event bus. EventBridge archives provide you with a type of “Active/Archive” architecture allowing you to replay events to the event bus once the primary Region is healthy again.

When the global endpoint alarm returns to a healthy state, the health check updates the endpoint configuration, and begins routing events back to the primary Region.

Global endpoints can be optionally configured to replicate events across Regions. When enabled, managed rules are created on your primary and secondary event buses that define the event bus in the other Region as the rule target.

Under normal operating conditions, events being sent to the primary event bus are also sent to the secondary in near-real-time to keep both Regions synchronized. When a failover occurs, consumers in the secondary Region have an up-to-date state of processed events, or the ability to replay messages delivered to the secondary Region, before the failover happens.

When the secondary Region is active, the replication rule attempts to send events back to the event bus in the primary Region. If the event bus in the primary Region is not available, EventBridge attempts to redeliver the events for up to 24 hours, per its default event retry policy. As this is a managed rule, you cannot change this. To manage for a longer period, you can archive events being ingested by the secondary event bus.

How do you identify if the event has been replicated from another Region? Events routed via global endpoints have identical resource fields, which contain the Amazon Resource Name (ARN) of the global endpoint that routed the event to the event bus. The region field shows the origin of the event. In the following example, the event is sent to the event bus in the primary Region and replicated to the event bus in the secondary Region. The event in the secondary Region is us-east-1, showing the source of the event was the event bus in the primary Region.

If there is a failover, events are routed to the secondary Region and replicated to the primary Region. Inspecting these events, you would expect to see us-west-2 as the source Region.

Replicated events

The two preceding events are identical except for the id. Event IDs can change across API calls so correlating events across Regions requires you to have an immutable, unique identifier. Consumers should also be designed with idempotency in mind. If you are replicating events, or replaying them from archives, this ensures that there are no side effects from duplicate processing.

Setting up a global endpoint

To configure a Global endpoint, define two event buses — one in the “primary” Region (this is the same Region you configure the endpoint in) and one in a “secondary” Region. To ensure that events are routed correctly, the event bus in the secondary Region must have the same name, in the same account, as the primary event bus.

  1. Create two event buses in different Regions with the same name. This is quickly set up using the AWS Command Line Interface (AWS CLI):Primary event bus:
    aws events create-event-bus --name orders-bus --region us-east-1Secondary event bus:
    aws events create-event-bus --name orders-bus --region us-west-2
  2. Open the Amazon EventBridge console in the Region where you want to create the global endpoint. This aligns with your primary Region. Navigate to the new global endpoints page and create a new endpoint.
    EventBridge console
  3. In the Endpoint details panel, specify a name for your global endpoint (for example, OrdersGlobalEndpoint) and enter a description.
  4. Select the event bus in the primary Region, orders-bus.
  5. Select the Region used when creating the secondary event bus previously. the secondary event bus by choosing the Region it was created in from the dropdown.Create global endpoint
  6. Select the Route 53 health check for triggering failover and recovery. If you have not created one before, choose New health check. This opens an AWS CloudFormation console to create the “LatencyFailuresHealthCheck” health check and CloudWatch alarm in your account. EventBridge provides a template with recommended defaults for a CloudWatch alarm that is triggered when the average latency exceeds 30 seconds for 5 minutes.Endpoint configuration
  7. Once the CloudFormation stack is deployed, return to the EventBridge console and refresh the dropdown list of health checks. Select the physical ID of the health check you created.Failover and recovery
  8. Ensure that event replication is enabled, and create the endpoint.
    Event replication
  9. Once the endpoint is created, it appears in the console. The global endpoint URL contains the EndpointId, which you must specify in PutEvents API calls to publish events to the endpoint.
    Endpoint listed in console

Testing failover

Once you have created an endpoint, you can test the configuration by creating “catch all” rules on the primary and secondary event buses. The simplest way to see events being processed is to create rules with CloudWatch log group target.

Testing global endpoint failure over is accomplished by inverting the Route 53 health check. This can be accomplished using any of the Route 53 APIs. Using the console, open the Route 53 health checks landing page and edit the “LatencyFailuresHealthCheck” associated with your global endpoint. Check “Invert health check status” and save to update the health check.

Within a few minutes, the health check changes state from “Healthy” to “Unhealthy” and you see events flowing to the event bus in the secondary.

Configure health check

Using the PutEvents API with global endpoints

To use global endpoints in your applications, you must update your current PutEvents API call. All AWS SDKs have been updated to include an optional EndpointId parameter that you must set when publishing events to a global endpoint. Even though you are no longer putting events directly on the event bus, the EventBusName must be defined to validate the endpoint configuration.

PutEvents SDK support for global endpoints requires the AWS Common Runtime (CRT) library, which is available for multiple programming languages, including Python, Node.js, and Java:


To install the awscrt module for Python using pip, run:

python3 -m pip install boto3 awscrt

This example shows how to send an event to a global endpoint using the Python SDK:

import json
import boto3
from datetime import datetime
import uuid
import random

client = session.client('events', config=my_config)

detail = {
    "order_date": datetime.now().isoformat(),
    "customer_id": str(uuid.uuid4()),
    "order_id": str(uuid.uuid4()),
    "order_total": round(random.uniform(1.0, 1000.0), 2)

put_response = client.put_events(
    EndpointId=" y6gho8g4kc.veo",
            'Source': 'com.aws.Orders',
            'DetailType': 'OrderCreated',
            'Detail': json.dumps(detail),
            'EventBusName': 'orders-bus'

Event producers can suffer data loss if the PutEvents API call fails, even if you are using global endpoints. Global endpoints allow you to automate the re-routing of events to another event-bus in another Region, but the health checks triggering the failover won’t be invoked for at least 5 minutes. It’s possible that your applications experience increased error rates for PutEvents operations before the failover occurs and events are routed to a healthy Region. To safeguard against message loss during this time, it’s best practice to use exponential retry and back-off patterns and durable store-and-forward capability at the producer level.


This blog shows how to create an EventBridge global endpoint to improve the availability and reliability of event ingestion of event-driven applications. This example shows how to use the PutEvents in the Python AWS SDK to publish events to a global endpoint.

To create a global endpoint using the API, see CreateEndpoint in the Amazon EventBridge API Reference. You can also create a global endpoint by using AWS CloudFormation using an AWS::Events:: Endpoints resource.

To learn more about EventBridge global endpoints, see the EventBridge Developer Guide. For more serverless learning resources, visit Serverless Land.

Announcing AWS Lambda Function URLs: Built-in HTTPS Endpoints for Single-Function Microservices

Post Syndicated from Alex Casalboni original https://aws.amazon.com/blogs/aws/announcing-aws-lambda-function-urls-built-in-https-endpoints-for-single-function-microservices/

Organizations are adopting microservices architectures to build resilient and scalable applications using AWS Lambda. These applications are composed of multiple serverless functions that implement the business logic. Each function is mapped to API endpoints, methods, and resources using services such as Amazon API Gateway and Application Load Balancer.

But sometimes all you need is a simple way to configure an HTTPS endpoint in front of your function without having to learn, configure, and operate additional services besides Lambda. For example, you might need to implement a webhook handler or a simple form validator that runs within an individual Lambda function.

Today, I’m happy to announce the general availability of Lambda Function URLs, a new feature that lets you add HTTPS endpoints to any Lambda function and optionally configure Cross-Origin Resource Sharing (CORS) headers.

This lets you focus on what matters while we take care of configuring and monitoring a highly available, scalable, and secure HTTPS service.

How Lambda Function URLs Work
Create a new function URL and map it to any function. Each function URL is globally unique and can be associated with a function’s alias or the function’s unqualified ARN, which implicitly invokes the $LATEST version.

For example, if you map a function URL to your $LATEST version, each code update will be available immediately via the function URL. On the other hand, I’d recommend mapping a function URL to an alias, so you can safely deploy new versions, perform some integration tests, and then update the alias when you’re ready. This also lets you implement weighted traffic shifting and safe deployments.

Function URLs are natively supported by the Lambda API, and you can start using it via the AWS Management Console or AWS SDKs, as well as infrastructure as code(IaC) tools such as AWS CloudFormation, AWS SAM, or AWS Cloud Development Kit (AWS CDK).

Lambda Function URLs in Action
You can configure a function URL for a new or an existing function. Let’s see how to implement a new function to handle a webhook.

When creating a new function, I check Enable function URL in Advanced Settings.

Here, I select Auth type: AWS_IAM or NONE. My webhook will use custom authorization logic based on a signature provided in the HTTP headers. Therefore, I’ll choose AuthType None, which means Lambda won’t check for any AWS IAM Sigv4 signatures before invoking my function. Instead, I’ll extract and validate a custom header in my function handler for authorization.

AWS Lambda URLs - Create Function

Please note that when using AuthType None, my function’s resource-based policy must still explicitly allow for public access. Otherwise, unauthenticated requests will be rejected. You can add permissions programmatically using the AddPermission API. In this case, the Lambda console automatically adds the necessary policy for me, as the IAM role I’m using is authorized to call the AddPermission API in my account.

With one click, I can also enable CORS. The default CORS configuration will allow all origins. Then, I’ll add more granular controls after creating the function. In case you’re not familiar with CORS, it’s a header-based security mechanism implemented by browsers to make sure that only certain hosts are allowed to load resources and invoke APIs. If a website is allowed to consume your API, you’ll need to include a few CORS headers that declare which origins, methods, and custom headers are allowed. The new function URLs take care of it for you, so you don’t have to implement all of this in your Lambda handler.

A few seconds later, the function URL is available. I can also easily find and copy it in the Lambda console.

AWS Lambda URLs - Console URL

The function code that handles my webhook in Node.js looks like this:

exports.handler = async (event) => {
    // (optional) fetch method and querystring
    const method = event.requestContext.http.method;
    const queryParam = event.queryStringParameters.myCustomParameter;
    console.log(`Received ${method} request with ${queryParam}`)
    // retrieve signature and payload
    const webhookSignature = event.headers.SignatureHeader;
    const webhookPayload = JSON.parse(event.body);
    try {
        validateSignature(webhookSignature); // throws if invalid signature
        handleEvent(webhookPayload); // throws if processing error
    } catch (error) {
        return {
            statusCode: 400,
            body: `Cannot process event: ${error}`,

    return {
        statusCode: 200, // default value
        body: JSON.stringify({
            received: true,

The code is extracting a few parameters from the request headers, query string, and body. If you’re already familiar with the event structure provided by API Gateway or Application Load Balancer, this should look very familiar.

After updating the code, I decide to test the function URL with an HTTP client.

For example, here’s how I’d do it with curl:

$ curl "https://4iykoi7jk2kp5hhd5irhbdprn40yxest.lambda-url.us-west-2.on.aws/?myCustomParameter=squirrel"
    -X POST
    -H "SignatureHeader: XYZ"
    -H "Content-type: application/json"
    -d '{"type": "payment-succeeded"}'

Or with a Python script:

import json
import requests

url = "https://4iykoi7jk2kp5hhd5irhbdprn40yxest.lambda-url.us-west-2.on.aws/"
headers = {'SignatureHeader': 'XYZ', 'Content-type': 'application/json'}
payload = json.dumps({'type': 'payment-succeeded'})
querystring = {'myCustomParameter': 'squirrel'}

r = requests.post(url=url, params=querystring, data=payload, headers=headers)

Don’t forget to set the request’s Content-type to application/json or text/* in your tests, otherwise, the body will be base64-encoded by default, and you’ll need to decode it in the Lambda handler.

Of course, in this case we’re talking about a webhook, so this function will receive requests directly from the external system that I’m integrating with. I only need to provide them with the public function URL and start receiving events.

For this specific use case, I don’t need any CORS configuration. In other cases where the function URL is called from the browser, I’d need to configure a few more CORS parameters such as Access-Control-Allow-Origin, Access-Control-Allow-Methods, and Access-Control-Expose-Headers. I can easily review and edit these CORS parameters in the Lambda console or in my IaC templates. Here’s what it looks like in the console:

AWS Lambda URLs - CORS

Also, keep in mind that each function URL is unique and mapped to a specific alias or the $LATEST version of your function. This lets you define multiple URLs for the same function. For example, you can define one for testing the $LATEST version during development and one for each stage or alias, such as staging, production, and so on.

Support for Infrastructure as Code (IaC)
You can start configuring Lambda Function URLs directly in your IaC templates today using AWS CloudFormation, AWS SAM, and AWS Cloud Development Kit (AWS CDK).

For example, here’s how to define a Lambda function and its public URL with AWS SAM, including the alias mapping:

    Type: AWS::Serverless::Function
      CodeUri: webhook/
      Handler: index.handler
      Runtime: nodejs14.x
      AutoPublishAlias: live
        AuthType: NONE
                - "https://example.com"

If you have existing Lambda functions in your IaC templates, you can define a new function URL with a few lines of code.

Function URL Pricing
Function URLs are included in Lambda’s request and duration pricing. For example, let’s imagine that you deploy a single Lambda function with 128 MB of memory and an average invocation time of 50 ms. The function receives five million requests every month, so the cost will be $1.00 for the requests, and $0.53 for the duration. The grand total is $1.53 per month, in the US East (N. Virginia) Region.

When to use Function URLs vs. Amazon API Gateway
Function URLs are best for use cases where you must implement a single-function microservice with a public endpoint that doesn’t require the advanced functionality of API Gateway, such as request validation, throttling, custom authorizers, custom domain names, usage plans, or caching. For example, when you are implementing webhook handlers, form validators, mobile payment processing, advertisement placement, machine learning inference, and so on. It is also the simplest way to invoke your Lambda functions during research and development without leaving the Lambda console or integrating additional services.

Amazon API Gateway is a fully managed service that makes it easy for you to create, publish, maintain, monitor, and secure APIs at any scale. Use API Gateway to take advantage of capabilities like JWT/custom authorizers, request/response validation and transformation, usage plans, built-in AWS WAF support, and so on.

Generally Available Today
Function URLs are generally available today in all AWS Regions where Lambda is available, except for the AWS China Regions. Support is also available through many AWS Lambda Partners such as Datadog, Lumigo, Pulumi, Serverless Framework, Thundra, and Dynatrace.

I’m looking forward to hearing how you’re using this new functionality to simplify your serverless architectures, especially in single-function use cases where you want to keep things simple and cost-optimized.

Check out the new Lambda Function URLs documentation.


Getting Started with Event-Driven Architecture

Post Syndicated from Talia Nassi original https://aws.amazon.com/blogs/compute/getting-started-with-event-driven-architecture/

In modern application development, event-driven architecture is becoming more prominent because it can make building applications in the cloud easier. Event-driven architecture can allow you to decouple your services, which increases developer velocity, and can make it easier for you to debug applications. It also can help remove the bottleneck that occurs when features expand across different teams, which allows teams to progress more independently.

One way to think about how an application works is as a system that reacts to events from other places, like from within your application. In this approach, you focus on the system’s interaction with its surroundings as a transmission of events. The application receives and creates events. Inputs to the application and outputs from the application act as events. At its core, this is event-driven architecture.

API-driven architecture vs. event-driven architecture

Commands/APIs Events
Synchronous Asynchronous

Has an intent

Directed to a target

It’s a fact

Happened in the past





A common way of making components of an application work together is through an API-driven, request-response architecture where you have requests and responses. For example, you query a list of orders from an Orders API, and the Orders API responds with a list of orders. This is an example of synchronous architecture. The system asking for the orders waits for the response. You cannot move on until the response comes back. In this approach, you send commands that are directed to a target (for example, “place this order” or “add this record to the database”).

sync vs async

In a synchronous model, the client makes a request to Service A. Service A calls Service B, but then Service A waits for Service B to respond before it continues on and eventually responds to the client.

In asynchronous, event-driven architecture, there is no response path. The service surfaces the event and then immediately moves forward. The trade-off here is that there’s no direct channel for Service B to pass back information to Service A, besides confirming it received the event. But in many cases, you don’t need that explicit coupling between the request and response channels.

An event is something that happened. For example, a new account is created, or an item is dropped into an Amazon S3 bucket. Events are immutable, which means you cannot change them. Once an event happens, you cannot undo it. For example, if there is an event raised when an order is placed, there can be another event for an order being cancelled. Events can come from various places such as messaging systems or databases.

Events are JSON objects that tell you information about something that happened in your application. In event-driven architecture, events represent facts. Each component of the application raises an event whenever anything changes. Other components listen and decide what to do with it and how they would like to react.


In the event above, S3 raises the event when you put the image into an Amazon S3 bucket. The event source is an S3 bucket named sam-app-sourcebucket. The object that is put into the bucket is called “brad.jpeg”.

Request-driven applications typically use directed commands to coordinate downstream functions to complete an activity and are often tightly coupled. This makes it harder to determine when errors occur in your application. Event-driven applications create events that are observable by other services and systems. However, the event producer is unaware of which consumers, if any, are listening. Typically, these are loosely coupled.

Events are observable. Any service that is authorized can watch an event. Consider a coffee shop example where there is a barista, who makes coffee, and a pastry chef, who makes pastries. When a customer enters the coffee shop and orders a cup of coffee, the barista starts to make the coffee, and the pastry chef takes no action.

However, if a customer comes in to the coffee shop and orders a chocolate croissant, then the pastry chef starts making the chocolate croissant, and the barista takes no action. The pastry chef is only interested in orders relating to pastries and the barista is only interested in events relating to coffee.

In an ecommerce application, like Amazon.com, there are different departments that respond to different events. You can place orders through Whole Foods, Amazon Fresh, and Amazon.com. When you place an order with Amazon Fresh, the subscribers to that event take action and fulfill your order.


Event-driven architecture and command-driven architecture also differ in the ways that they store state. In a typical command-driven architecture, you have only one component store a particular piece of data, and other components ask that component for the data when needed.

In event-driven architecture, every component stores all the data it needs and listens to update events for that data. In command-driven architecture, the component that stores the data is responsible for updating it. In event-driven architecture, all it has to do is ensure new events are raised on the updates.

Benefits of using event-driven architecture

Decoupling event sources and event targets

Many applications are built in a monolith, where the components are tightly coupled, and are highly dependent on each other. This proves to be problematic when there are bugs and you are trying to pinpoint exactly what part of the application is failing. Decoupled architectures are composed of components or services that are loosely coupled. In an event-driven, decoupled architecture, you broadcast events without caring who responds to them. This saves time because events can be queued and forwarded whenever the receiver is ready to process them. This allows for building scalable, highly modifiable systems.

Decoupled applications enable teams to act more independently, which increases their velocity. For example, with an API-based integration, if my team wants to know about some change that happened in another team’s microservice, I have to ask that team to make an API call to my service. That means I have to deal with authentication, coordination with the other team over the structure of the API call, etc. This causes back and forth between teams, which slows down development time. With an event-driven application, you can subscribe to events sent from a microservice and the event router (for example, Amazon EventBridge) takes care of routing the event and handling authentication.

Decoupled applications also allow you to build new features faster. Adding new features or extending existing ones is simpler with event-driven architectures. This is because you only have to choose the event you need to trigger your new feature, and subscribe to it. There’s no need to modify any of your existing services to add new functionality.

Write less code

When you build applications using event-driven architecture, often you write less code because you only need to consider new events, as well as which service is subscribed to those events. For example, if you are building new features for your application, all you have to do is consider the existing events and then add senders and receivers as necessary. In this way, you speed up development time because each functional unit is smaller and there is often less code.

Better extensibility

In the example above, you built a highly extensible application. Other teams can extend features and add functionality without impacting other microservices. By publishing events using EventBridge, this application integrates with existing systems, but also enables any future application to integrate as an event consumer. Producers of events have no knowledge of event consumers, which can help simplify the microservice logic.

Enhancing team collaboration

A common process to build applications is to work with your product managers and business stakeholders to gather requirements. Developers then translate those requirements into code. However, there may be a disconnect between the product requirements and the code. When you use events, everyone in the business understands the logic. You define the events in an application (for example, a customer adds an item to their shopping cart or a customer account is created) and that becomes your product requirements. Whenever that action happens, it produces an event, and whoever is interested can take action on that event.

For example, a marketing manager could be interested whenever a customer creates a new account. One way to choreograph this in event-driven architecture is to have a Marketing event bus that listens for the New Account event. There could also be other teams that are interested, such as the Analytics team, who also subscribe to that event. Each team/service can subscribe to events that are relevant to them. Event-driven architecture is a great way for businesses to describe their business problems and represent them.


This post introduces events, and then compares event-driven architecture to command-driven, request-response architecture. It also explains the benefits of event-driven architecture, including decoupling event sources and targets, writing less code, having better extensibility, and enhancing team collaboration.

For more serverless learning resources, visit Serverless Land.