Tag Archives: serverless

A Thanksgiving 2020 Reading List

Post Syndicated from Val Vesa original https://blog.cloudflare.com/a-thanksgiving-2020-reading-list/

A Thanksgiving 2020 Reading List

While our colleagues in the US are celebrating Thanksgiving this week and taking a long weekend off, there is a lot going on at Cloudflare. The EMEA team is having a full day on CloudflareTV with a series of live shows celebrating #CloudflareCareersDay.

So if you want to relax in an active and learning way this weekend, here are some of the topics we’ve covered on the Cloudflare blog this past week that you may find interesting.

Improving Performance and Search Rankings with Cloudflare for Fun and Profit

Making things fast is one of the things we do at Cloudflare. More responsive websites, apps, APIs, and networks directly translate into improved conversion and user experience. On November 10, Google announced that Google Search will directly take web performance and page experience data into account when ranking results on their search engine results pages (SERPs), beginning in May 2021.

Rustam Lalkaka and Rita Kozlov explain in this blog post how Google Search will prioritize results based on how pages score on Core Web Vitals, a measurement methodology Cloudflare has worked closely with Google to establish, and we have implemented support for in our analytics tools. Read the full blog post.

Getting to the Core: Benchmarking Cloudflare’s Latest Server Hardware

At the Cloudflare Core, we process logs to analyze attacks and compute analytics. In 2020, our Core servers were in need of a refresh, so we decided to redesign the hardware to be more in line with our Gen X edge servers. We designed two major server variants for the core. The first is Core Compute 2020, an AMD-based server for analytics and general-purpose compute paired with solid-state storage drives. The second is Core Storage 2020, an Intel-based server with twelve spinning disks to run database workloads. This is a refresh of the hardware that Cloudflare uses to run analytics provided big efficiency improvements.

Read the full blog post by Brian Bassett

Moving Quicksilver into production

We previously explained how and why we built Quicksilver. Quicksilver is the data store responsible for storing and distributing the billions of KV pairs used to configure the millions of sites and Internet services which use Cloudflare. This second blog post is about the long journey to production which culminates with Kyoto Tycoon removal from Cloudflare infrastructure and points to the first signs of obsolescence.

Geoffrey Plouviez takes you through the entire story of real-world engineering challenges and what it’s like to replace one of Cloudflare’s oldest critical components: read the full blog post here.

Building Black Friday e-commerce experiences with JAMstack and Cloudflare Workers

In this blog post, we explore how Cloudflare Workers continues to excel as a JAMstack deployment platform, and how it can be used to power e-commerce experiences, integrating with familiar tools like Stripe, as well as new technologies like Nuxt.js, and Sanity.io.

Read the full blog post and get all the details and open-source code from Kristian Freeman.

A Byzantine failure in the real world

When we review design documents at Cloudflare, we are always on the lookout for Single Points of Failure (SPOFs). In this post, we present a timeline of a real-world incident, and how an interesting failure mode known as a Byzantine fault played a role in a cascading series of events.

Tom Lianza and Chris Snook’s full blog post describes the consequences of a malfunctioning switch on a system built for reliability.

ASICs at the Edge

At Cloudflare, we pride ourselves in our global network that spans more than 200 cities in over 100 countries. To accelerate all that traffic through our network, there are multiple technologies at play. So let’s have a look at one of the cornerstones that makes all of this work.

Tom Strickx’ epic deep dive into ASICs is here.

Let us know your thoughts and comments below or feel free to also reach out to us via our social media channels. And because we talked about careers in the beginning of this blog post, check out our available jobs if you are interested to join Cloudflare.

ICYMI: Serverless pre:Invent 2020

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/icymi-serverless-preinvent-2020/

During the last few weeks, the AWS serverless team has been releasing a wave of new features in the build-up to AWS re:Invent 2020. This post recaps some of the most important releases for serverless developers.

re:Invent is virtual and free to all attendees in 2020 – register here. See the complete list of serverless sessions planned and join the serverless DA team live on Twitch. Also, follow your DAs on Twitter for live recaps and Q&A during the event.

AWS re:Invent 2020

AWS Lambda

We launched Lambda Extensions in preview, enabling you to more easily integrate monitoring, security, and governance tools into Lambda functions. You can also build your own extensions that run code during Lambda lifecycle events, and there is an example extensions repo for starting development.

You can now send logs from Lambda functions to custom destinations by using Lambda Extensions and the new Lambda Logs API. Previously, you could only forward logs after they were written to Amazon CloudWatch Logs. Now, logging tools can receive log streams directly from the Lambda execution environment. This makes it easier to use your preferred tools for log management and analysis, including Datadog, Lumigo, New Relic, Coralogix, Honeycomb, or Sumo Logic.

Lambda Extensions API

Lambda launched support for Amazon MQ as an event source. Amazon MQ is a managed broker service for Apache ActiveMQ that simplifies deploying and scaling queues. This integration increases the range of messaging services that customers can use to build serverless applications. The event source operates in a similar way to using Amazon SQS or Amazon Kinesis. In all cases, the Lambda service manages an internal poller to invoke the target Lambda function.

We also released a new layer to make it simpler to integrate Amazon CodeGuru Profiler. This service helps identify the most expensive lines of code in a function and provides recommendations to help reduce cost. With this update, you can enable the profiler by adding the new layer and setting environment variables. There are no changes needed to the custom code in the Lambda function.

Lambda announced support for AWS PrivateLink. This allows you to invoke Lambda functions from a VPC without traversing the public internet. It provides private connectivity between your VPCs and AWS services. By using VPC endpoints to access the Lambda API from your VPC, this can replace the need for an Internet Gateway or NAT Gateway.

For developers building machine learning inferencing, media processing, high performance computing (HPC), scientific simulations, and financial modeling in Lambda, you can now use AVX2 support to help reduce duration and lower cost. By using packages compiled for AVX2 or compiling libraries with the appropriate flags, your code can then benefit from using AVX2 instructions to accelerate computation. In the blog post’s example, enabling AVX2 for an image-processing function increased performance by 32-43%.

Lambda now supports batch windows of up to 5 minutes when using SQS as an event source. This is useful for workloads that are not time-sensitive, allowing developers to reduce the number of Lambda invocations from queues. Additionally, the batch size has been increased from 10 to 10,000. This is now the same as the batch size for Kinesis as an event source, helping Lambda-based applications process more data per invocation.

Code signing is now available for Lambda, using AWS Signer. This allows account administrators to ensure that Lambda functions only accept signed code for deployment. Using signing profiles for functions, this provides granular control over code execution within the Lambda service. You can learn more about using this new feature in the developer documentation.

Amazon EventBridge

You can now use event replay to archive and replay events with Amazon EventBridge. After configuring an archive, EventBridge automatically stores all events or filtered events, based upon event pattern matching logic. You can configure a retention policy for archives to delete events automatically after a specified number of days. Event replay can help with testing new features or changes in your code, or hydrating development or test environments.

EventBridge archived events

EventBridge also launched resource policies that simplify managing access to events across multiple AWS accounts. This expands the use of a policy associated with event buses to authorize API calls. Resource policies provide a powerful mechanism for modeling event buses across multiple account and providing fine-grained access control to EventBridge API actions.

EventBridge resource policies

EventBridge announced support for Server-Side Encryption (SSE). Events are encrypted using AES-256 at no additional cost for customers. EventBridge also increased PutEvent quotas to 10,000 transactions per second in US East (N. Virginia), US West (Oregon), and Europe (Ireland). This helps support workloads with high throughput.

AWS Step Functions

Synchronous Express Workflows have been launched for AWS Step Functions, providing a new way to run high-throughput Express Workflows. This feature allows developers to receive workflow responses without needing to poll services or build custom solutions. This is useful for high-volume microservice orchestration and fast compute tasks communicating via HTTPS.

The Step Functions service recently added support for other AWS services in workflows. You can now integrate API Gateway REST and HTTP APIs. This enables you to call API Gateway directly from a state machine as an asynchronous service integration.

Step Functions now also supports Amazon EKS service integration. This allows you to build workflows with steps that synchronously launch tasks in EKS and wait for a response. In October, the service also announced support for Amazon Athena, so workflows can now query data in your S3 data lakes.

These new integrations help minimize custom code and provide built-in error handling, parameter passing, and applying recommended security settings.

AWS SAM CLI

The AWS Serverless Application Model (AWS SAM) is an AWS CloudFormation extension that makes it easier to build, manage, and maintains serverless applications. On November 10, the AWS SAM CLI tool released version 1.9.0 with support for cached and parallel builds.

By using sam build --cached, AWS SAM no longer rebuilds functions and layers that have not changed since the last build. Additionally, you can use sam build --parallel to build functions in parallel, instead of sequentially. Both of these new features can substantially reduce the build time of larger applications defined with AWS SAM.

Amazon SNS

Amazon SNS announced support for First-In-First-Out (FIFO) topics. These are used with SQS FIFO queues for applications that require strict message ordering with exactly once processing and message deduplication. This is designed for workloads that perform tasks like bank transaction logging or inventory management. You can also use message filtering in FIFO topics to publish updates selectively.

SNS FIFO

AWS X-Ray

X-Ray now integrates with Amazon S3 to trace upstream requests. If a Lambda function uses the X-Ray SDK, S3 sends tracing headers to downstream event subscribers. With this, you can use the X-Ray service map to view connections between S3 and other services used to process an application request.

AWS CloudFormation

AWS CloudFormation announced support for nested stacks in change sets. This allows you to preview changes in your application and infrastructure across the entire nested stack hierarchy. You can then review those changes before confirming a deployment. This is available in all Regions supporting CloudFormation at no extra charge.

The new CloudFormation modules feature was released on November 24. This helps you develop building blocks with embedded best practices and common patterns that you can reuse in CloudFormation templates. Modules are available in the CloudFormation registry and can be used in the same way as any native resource.

Amazon DynamoDB

For customers using DynamoDB global tables, you can now use your own encryption keys. While all data in DynamoDB is encrypted by default, this feature enables you to use customer managed keys (CMKs). DynamoDB also announced support for global tables in the Europe (Milan) and Europe (Stockholm) Regions. This feature enables you to scale global applications for local access in workloads running in different Regions and replicate tables for higher availability and disaster recovery (DR).

The DynamoDB service announced the ability to export table data to data lakes in Amazon S3. This enables you to use services like Amazon Athena and AWS Lake Formation to analyze DynamoDB data with no custom code required. This feature does not consume table capacity and does not impact performance and availability. To learn how to use this feature, see this documentation.

AWS Amplify and AWS AppSync

You can now use existing Amazon Cognito user pools and identity pools for Amplify projects, making it easier to build new applications for an existing user base. AWS Amplify Console, which provides a fully managed static web hosting service, is now available in the Europe (Milan), Middle East (Bahrain), and Asia Pacific (Hong Kong) Regions. This service makes it simpler to bring automation to deploying and hosting single-page applications and static sites.

AWS AppSync enabled AWS WAF integration, making it easier to protect GraphQL APIs against common web exploits. You can also implement rate-based rules to help slow down brute force attacks. Using AWS Managed Rules for AWS WAF provides a faster way to configure application protection without creating the rules directly. AWS AppSync also recently expanded service availability to the Asia Pacific (Hong Kong), Middle East (Bahrain), and China (Ningxia) Regions, making the service now available in 21 Regions globally.

Still looking for more?

Join the AWS Serverless Developer Advocates on Twitch throughout re:Invent for live Q&A, session recaps, and more! See this page for the full schedule.

For more serverless learning resources, visit Serverless Land.

Building Black Friday e-commerce experiences with JAMstack and Cloudflare Workers

Post Syndicated from Kristian Freeman original https://blog.cloudflare.com/building-black-friday-e-commerce-experiences-with-jamstack-and-cloudflare-workers/

Building Black Friday e-commerce experiences with JAMstack and Cloudflare Workers

The idea of serverless is to allow developers to focus on writing code rather than operations — the hardest of which is scaling applications. A predictably great deal of traffic that flows through Cloudflare’s network every year is Black Friday. As John wrote at the end of last year, Black Friday is the Internet’s biggest online shopping day. In a past case study, we talked about how Cordial, a marketing automation platform, used Cloudflare Workers to reduce their API server latency and handle the busiest shopping day of the year without breaking a sweat.

The ability to handle immense scale is well-trodden territory for us on the Cloudflare blog, but scale is not always the first thing developers think about when building an application — developer experience is likely to come first. And developer experience is something Workers does just as well; through Wrangler and APIs like Workers KV, Workers is an awesome place to hack on new projects.

Over the past few weeks, I’ve been working on a sample open-source e-commerce app for selling software, educational products, and bundles. Inspired by Humble Bundle, it’s built entirely on Workers, and it integrates powerfully with all kinds of first-class modern tooling: Stripe, an API for accepting payments (both from customers and to authors, as we’ll see later), and Sanity.io, a headless CMS for data management.

This kind of project is perfectly suited for Workers. We can lean into Workers as a static site hosting platform (via Workers Sites), API server, and webhook consumer, all within a single codebase, and deployed instantly around the world on Cloudflare’s network.

If you want to see a deployed version of this template, check out ecommerce-example.signalnerve.workers.dev.

Building Black Friday e-commerce experiences with JAMstack and Cloudflare Workers
The frontend of the e-commerce Workers template.

In this blog post, I’ll dive deeper into the implementation details of the site, covering how Workers continues to excel as a JAMstack deployment platform. I’ll also cover some new territory in integrating Workers with Stripe. The project is open-source on GitHub, and I’m actively working on improving the documentation, so that you can take the codebase and build on it for your own e-commerce sites and use cases.

The frontend

As I wrote last year, Workers continues to be an amazing platform for JAMstack apps. When I started building this template, I wanted to use some things I already knew — Sanity.io for managing data, and of course, Workers Sites for deploying — but some new tools as well.

Workers Sites is incredibly simple to use: just point it at a directory of static assets, and you’re good to go. With this project, I decided to try out Nuxt.js, a Vue-based static site generator, to power the frontend for the application.

Using Sanity.io, the data representing the bundles (and the products inside of those bundles) is stored on Sanity.io’s own CDN, and retrieved client-side by the Nuxt.js application.

Building Black Friday e-commerce experiences with JAMstack and Cloudflare Workers
Managing data inside Sanity.io’s headless CMS interface.

When a potential customer visits a bundle, they’ll see a list of products from Sanity.io, and a checkout button provided by Stripe.

Responding to new checkout sessions and purchases

Making API requests with Stripe’s Node SDK isn’t currently supported in Workers (check out the GitHub issue where we’re discussing a fix), but because it’s just REST underneath, we can easily make REST requests using the library.

When a user clicks the checkout button on a bundle page, it makes a request to the Cloudflare Workers API, and securely generates a new session for the user to checkout with Stripe.

import { json, stripe } from '../helpers'

export default async (request) => {
  const body = await request.json()
  const { price_id } = body

  const session = await stripe('/checkout/sessions', {
    payment_method_types: ['card'],
    line_items: [{
        price: price_id,
        quantity: 1,
      }],
    mode: 'payment'
  }, 'POST')

  return json({ session_id: session.id })
}

This is where Workers excels as a JAMstack platform. Yes, it can do static site hosting, but with just a few extra lines of routing code, I can deploy a highly scalable API right alongside my Nuxt.js application.

Webhooks and working with external services

This idea extends throughout the rest of the checkout process. When a customer is successfully charged for their purchase, Stripe sends a webhook back to Cloudflare Workers. In order to complete the transaction on our end, the Workers application:

  • Validates the incoming data from Stripe to ensure that it’s legitimate. This means that every incoming webhook request is explicitly validated using your Stripe account details, and can be confirmed to be valid before the function acts on it.
  • Distributes payments to the authors using Stripe Connect. When a customer buys a bundle for $20, that $20 (minus Stripe fees) gets distributed evenly between the authors in that bundle — all of this calculation and the associated transfer requests happen inside the Worker.
  • Sends a unique download link to the customer. Using Workers KV, a unique token is set up that corresponds to the customer’s email, which can be used to retrieve the content the customer purchased. This integration uses Mailgun to construct an email and send it entirely over REST APIs.

By the time the purchase is complete, the Workers serverless API will have interfaced with four distinct APIs, persisting records, sending emails, and handling and distributing payments to everyone involved in the e-commerce transaction. With Workers, this all happens in a single codebase, with low latency and a superb developer experience. The entire API is type-checked and validated before it ever gets shipped to production, thanks to our TypeScript template.

Building Black Friday e-commerce experiences with JAMstack and Cloudflare Workers

Each of these tasks involves a pretty serious level of complexity, but by using Workers, we can abstract each of them into smaller pieces of functionality, and compose powerful, on-demand, and infinitely scalable webhooks directly on the serverless edge.

Conclusion

I’m really excited about the launch of this template and, of course, it wouldn’t have been possible to ship something like this in just a few weeks without using Cloudflare Workers. If you’re interested in digging into how any of the above stuff works, check out the project on GitHub!

With the recent announcement of our Workers KV free tier, this project is perfect to fork and build your own e-commerce products with. Let me know what you build and say hi on Twitter!

Serving Content Using a Fully Managed Reverse Proxy Architecture in AWS

Post Syndicated from Leonardo Machado original https://aws.amazon.com/blogs/architecture/serving-content-using-fully-managed-reverse-proxy-architecture/

With the trends to autonomous teams and microservice style architectures, web frontend tiers are challenged to become more flexible and integrate different components with independent architectures and technology stacks. Two scenarios are prominent:

  • Micro-Frontends, where there is a single page application and components within this page are owned by different teams
  • Web portals, where there is a landing page and subsections of the presence are owned by different teams. In the following we will refer to these as components as well.

What these scenarios have in common is that they consist of loosely coupled components that are seamlessly hidden to the end user behind a common interface. Often, a reverse proxy serves content from one single entry domain but retrieves the content from different origins. In the example in Figure 1 (below) we want to address one specific domain name, and depending on the path prefix, we retrieve the content from an on-premises webserver, from a webserver running on Amazon Elastic Cloud Compute (EC2), or from Amazon S3 Static Hosting, in the figure represented by the prefixes /hotels, /pets, and /cars, respectively. If we forward the path to the webserver without the path prefix, the component would not know what prefix it is run under and the prefix could be changed any time without impacting the component, thus making the component context-unaware.

Figure 1 - Architecture, AWS Amplify Console

Figure 1: Architecture, AWS Amplify Console

Some common requirements to these approaches are:

  • Components should be technology-agnostic, each component should be able to choose the technology stack independently.
  • Each component can be maintained by a dedicated autonomous team without depending on other teams.
  • All components are served from the same domain name. For example, this could have implications on search engine optimization.
  • Components should be unaware of the context where it is used.

The traditional approach would be to run a reverse proxy tier with rewrite rules to different origins. In this post we look into managed alternatives in AWS that take away the heavy lifting of running and scaling the proxy infrastructure.

Note: AWS Application Load Balancer can be used as a reverse proxy, but it only supports static targets (fixed IP address), no dynamic targets (domain name). Thus, we do not consider it here.

AWS Amplify Console

The AWS Amplify Console provides a Git-based workflow for hosting fullstack serverless web apps with continuous deployment. Amplify Console also offers a rewrites and redirects feature, which can be used for forwarding incoming requests with different path patterns to different origins (see Figure 2).

Figure 2 - Dashboard, AWS Amplify Console (rewrites and redirects feature)

Figure 2: Dashboard, AWS Amplify Console (rewrites and redirects feature)

Note: In Figure 2, <*> stands for a wildcard that matches any pattern. Target addresses must be HTTPS (no HTTP allowed).

This architectural option is the simplest to setup and manage and is the best approach for teams looking for the least management effort. AWS Amplify Console offers a simple interface for easily mapping incoming patterns to target addresses. It also makes it easy to serve additional static content if needed. Configuration options are limited and more complex scenarios cannot be implemented.

If you want to rewrite paths to remove the path prefix, you can accomplish this by using the wildcard pattern. The source address would contain the path prefix, but the target address would omit the prefix as seen in Figure 2.

When looking at pricing compared to the other approaches it is important to look at the outgoing traffic. With higher volumes, this can get expensive.

Amazon API Gateway

Amazon API Gateway is a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale. API Gateway’s REST API type allows users to setup HTTP proxy integrations, which can be used for forwarding incoming requests with different path patterns to different origin servers according to the API specifications (Figure 3).

Figure 3 - Dashboard, Amazon API Gateway (HTTP proxy integration)

Figure 3: Dashboard, Amazon API Gateway (HTTP proxy integration)

Note: In Figure 3, {proxy+} and {proxy} stand for the same wildcard pattern.

API Gateway, in comparison to Amplify Console, is better suited when looking for a higher customization degree. API Gateway offers multiple customization and monitoring features, such as custom gateway responses and dashboard monitoring.

Similar to Amplify Console, API Gateway provides a feature to rewrite paths and thus remove context from the path using the {proxy} wildcard.

API Gateway REST API pricing is based on the number of API calls as well as any external data transfers. External data transfers are charged at the EC2 data transfer rate.

Note: The HTTP integration type in API Gateway REST APIs does not support forwarding trailing slashes. If this is needed for your application, consider other integration types such as AWS Lambda integration or AWS service integration.

Amazon CloudFront and AWS [email protected]

Amazon CloudFront is a fast content delivery network (CDN) service that securely delivers data, videos, applications, and APIs to customers globally with low latency and high transfer speeds. CloudFront is able to route incoming requests with different path patterns to different origins or origin groups by configuring its cache behavior rules (Figure 4).

Figure 4 - Dashboard, CloudFront (Cache Behavior)

Figure 4: Dashboard, CloudFront (Cache Behavior)

Additionally, Amazon CloudFront allows for integration with AWS [email protected] functions. [email protected] runs your code in response to events generated by CloudFront. In this scenario we can use [email protected] to change the path pattern before forwarding a request to the origin and thus removing the context. For details on see this detailed re:Invent session.

This approach offers most control over caching behavior and customization. Being able to add your own custom code through a custom Lambda function adds an entire new range of possibilities when processing your request. This enables you to do everything from simple HTTP request and response processing at the edge to more advanced functionality, such as website security, real-time image transformation, intelligent bot mitigation, and search engine optimization.

Amazon CloudFront is charged by request and by [email protected] invocation. The data traffic out is charged with the CloudFront regional data transfer out pricing.

Conclusion

With AWS Amplify Console, Amazon API Gateway, and Amazon CloudFront, we have seen three approaches to implement a reverse proxy pattern using managed services from AWS. The easiest approach to start with is AWS Amplify Console. If you run into more complex scenarios consider API Gateway. For most flexibility and when data traffic cost becomes a factor look into Amazon CloudFront with [email protected]

Using Amazon SQS dead-letter queues to replay messages

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/using-amazon-sqs-dead-letter-queues-to-replay-messages/

Amazon Simple Queue Service (Amazon SQS) is a fully managed message queuing service. It enables you to decouple and scale microservices, distributed systems, and serverless applications. A commonly used feature of Amazon SQS is dead-letter queues. The DLQ (dead-letter queue) is used to store messages that can’t be processed (consumed) successfully.

This post describes how to add automated resilience to an existing SQS queue. It monitors the dead-letter queue and moves a message back to the main queue to see if it can be processed again. It also uses a specific algorithm to make sure this is not repeated forever. Each time it attempts to reprocess the message, the replay time increases until the message is finally considered dead.

I use Amazon SQS dead-letter queues, AWS Lambda, and a specific algorithm to decrease the rate of retries for failed messages. I then package and publish this serverless solution in the AWS Serverless Application Repository.

Dead-letter queues and message replay

The main task of a dead-letter queue (DLQ) is to handle message failure. It allows you to set aside and isolate non-processed messages to determine why processing failed. Often these failed messages are caused by application errors. For example, a consumer application fails to parse a message correctly and throws an unhandled exception. This exception then triggers an error response that sends the message to the DLQ. The AWS documentation contains a tutorial detailing the configuration of an Amazon SQS dead-letter queue.

To process the failed messages, I build a retry mechanism by implementing an exponential backoff algorithm. The idea behind exponential backoff is to use progressively longer waits between retries for consecutive error responses. Most exponential backoff algorithms use jitter (randomized delay) to prevent successive collisions. This spreads the message retries more evenly across time, allowing them to be processed more efficiently.

Solution overview

Solution architecture

The flow of the message sent by the producer to SQS is as follows:

  1. The producer application sends a message to an SQS queue
  2. The consumer application fails to process the message in the same SQS queue
  3. The message is moved from the main SQS queue to the default dead-letter queue as per the component settings.
  4. A Lambda function is configured with the SQS main dead-letter queue as an event source. It receives and sends back the message to the original queue adding a message timer.
  5. The message timer is defined by the exponential backoff and jitter algorithm.
  6. You can limit the number of retries. If the message exceeds this limit, the message is moved to a second DLQ where an operator processes it manually.

How the replay function works

Each time the SQS dead-letter queue receives a message, it triggers Lambda to run the replay function. The replay code uses an SQS message attribute `sqs-dlq-replay-nb` as a persistent counter for the current number of retries attempted. The number of retries is compared to the maximum number (defined in the application configuration file). If it exceeds the maximum, the message is moved to the human operated queue. If not, the function uses the AWS Lambda event data to build a new message for the Amazon SQS main queue. Finally it updates the retry counter, adds a new message timer to the message, and it sends the message back (replays) to the main queue.

def handler(event, context):
    """Lambda function handler."""
    for record in event['Records']:
        nbReplay = 0
        # number of replay
        if 'sqs-dlq-replay-nb' in record['messageAttributes']:
            nbReplay = int(record['messageAttributes']['sqs-dlq-replay-nb']["stringValue"])

        nbReplay += 1
        if nbReplay > config.MAX_ATTEMPS:
            raise MaxAttempsError(replay=nbReplay, max=config.MAX_ATTEMPS)

        # SQS attributes
        attributes = record['messageAttributes']
        attributes.update({'sqs-dlq-replay-nb': {'StringValue': str(nbReplay), 'DataType': 'Number'}})

        _sqs_attributes_cleaner(attributes)

        # Backoff
        b = backoff.ExpoBackoffFullJitter(base=config.BACKOFF_RATE, cap=config.MESSAGE_RETENTION_PERIOD)
        delaySeconds = b.backoff(n=int(nbReplay))

        # SQS
        SQS.send_message(
            QueueUrl=config.SQS_MAIN_URL,
            MessageBody=record['body'],
            DelaySeconds=int(delaySeconds),
            MessageAttributes=record['messageAttributes']
        )

How to use the application

You can use this serverless application via:

  • The Lambda console: choose the “Browse serverless app repository” option to create a function. Select “amazon-sqs-dlq-replay-backoff” application in the public applications repository. Then, configure the application with the default SQS parameters and the replay feature parameters.
  • The Serverless Framework, as described by Yan Cui in this blog post.
  • An AWS CloudFormation template by using the AWS::ServerlessRepo::Application resource, as described in the documentation.

Here is an example of a CloudFormation template using the AWS Serverless Application Repository application:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Resources:
  ReplaySqsQueue:
    Type: AWS::Serverless::Application
    Properties:
      Location: 
        ApplicationId: arn:aws:serverlessrepo:eu-west-1:1234123412:applications~sqs-dlq-replay
        SemanticVersion: 1.0.0
      Parameters:
        BackoffRate: "2"
        MaxAttempts: "3"

Conclusion

I describe how an exponential backoff algorithm (with jitter) enhances the message processing capabilities of an Amazon SQS queue. You can now find the amazon-sqs-dlq-replay-backoff application in the AWS Serverless Application Repository. Download the code from this GitHub repository.

To get started with dead-letter queues in Amazon SQS, read:

To implement replay mechanisms, see:

For more serverless learning resources, visit https://serverlessland.com.

New Synchronous Express Workflows for AWS Step Functions

Post Syndicated from Benjamin Smith original https://aws.amazon.com/blogs/compute/new-synchronous-express-workflows-for-aws-step-functions/

Today, AWS is introducing Synchronous Express Workflows for AWS Step Functions. This is a new way to run Express Workflows to orchestrate AWS services at high-throughput.

Developers have been using asynchronous Express Workflows since December 2019 for workloads that require higher event rates and shorter durations. Customers were looking for ways to receive an immediate response from their Express Workflows without having to write additional code or introduce additional services.

What’s new?

Synchronous Express Workflows allow developers to quickly receive the workflow response without needing to poll additional services or build a custom solution. This is useful for high-volume microservice orchestration and fast compute tasks that communicate via HTTPS.

Getting started

You can build and run Synchronous Express Workflows using the AWS Management Console, the AWS Serverless Application Model (AWS SAM), the AWS Cloud Development Kit (AWS CDK), AWS CLI, or AWS CloudFormation.

To create Synchronous Express Workflows from the AWS Management Console:

  1. Navigate to the Step Functions console and choose Create State machine.
  2. Choose Author with code snippets. Choose Express.
    This generates a sample workflow definition that you can change once the workflow is created.
  3. Choose Next, then choose Create state machine. It may take a moment for the workflow to deploy.

Starting Synchronous Express Workflows

When starting an Express Workflow, a new Type parameter is required. To start a synchronous workflow from the AWS Management Console:

  1. Navigate to the Step Functions console.
  2. Choose an Express Workflow from the list.
  3. Choose Start execution.

    Here you have an option to run the Express Workflow as a synchronous or asynchronous type.
  4. Choose Synchronous and choose Start execution.

  5. Expand Details in the results message to view the output.

Monitoring, logging and tracing

Enable logging to inspect and debug Synchronous Express Workflows. All execution history is sent to CloudWatch Logs. Use the Monitoring and Logging tabs in the Step Functions console to gain visibility into Express Workflow executions.

The Monitoring tab shows six graphs with CloudWatch metrics for Execution Errors, Execution Succeeded, Execution Duration, Billed Duration, Billed Memory, and Executions Started. The Logging tab shows recent logs and the logging configuration, with a link to CloudWatch Logs.

Enable X-Ray tracing to view trace maps and timelines of the underlying components that make up a workflow. This helps to discover performance issues, detect permission problems, and track requests made to and from other AWS services.

Creating an example workflow

The following example uses Amazon API Gateway HTTP APIs to start an Express Workflow synchronously. The workflow analyses web form submissions for negative sentiment. It generates a case reference number and saves the data in an Amazon DynamoDB table. The workflow returns the case reference number and message sentiment score.

  1. The API endpoint is generated by an API Gateway HTTP APIs. A POST request is made to the API which invokes the workflow. It contains the contact form’s message body.
  2. The message sentiment is analyzed by Amazon Comprehend.
  3. The Lambda function generates a case reference number, which is recorded in the DynamoDB table.
  4. The workflow choice state branches based on the detected sentiment.
  5. If a negative sentiment is detected, a notification is sent to an administrator via Amazon Simple Email Service (SES).
  6. When the workflow completes, it returns a ticketID to API Gateway.
  7. API Gateway returns the ticketID in the API response.

The code for this application can be found in this GitHub repository. Three important files define the application and its resources:

Deploying the application

Clone the GitHub repository and deploy with the AWS SAM CLI:

$ git clone https://github.com/aws-samples/contact-form-processing-with-synchronous-express-workflows.git
$ cd contact-form-processing-with-synchronous-express-workflows 
$ sam build 
$ sam deploy -g

This deploys 12 resources, including a Synchronous Express Workflow, three Lambda functions, an API Gateway HTTP API endpoint, and all the AWS Identity & Access Management (IAM) roles and permissions required for the application to run.

Note the HTTP APIs endpoint and workflow ARN outputs.

Testing Synchronous Express Workflows:

A new StartSyncExecution AWS CLI command is used to run the synchronous Express Workflow:

aws stepfunctions start-sync-execution \
--state-machine-arn <your-workflow-arn> \
--input "{\"message\" : \"This is bad service\"}"

The response is received once the workflow completes. It contains the workflow output (sentiment and ticketid), the executionARN, and some execution metadata.

Starting the workflow from HTTP API Gateway:

The application deploys an API Gateway HTTP API, with a Step Functions integration. This is configured in the api.yaml file. It starts the state machine with the POST body provided as the input payload.

Trigger the workflow with a POST request, using the API HTTP API endpoint generated from the deploy step. Enter the following CURL command into the terminal:

curl --location --request POST '<YOUR-HTTP-API-ENDPOINT>' \
--header 'Content-Type: application/json' \
--data-raw '{"message":" This is bad service"}'

The POST request returns a 200 status response. The output field of the response contains the sentiment results (negative) and the generated ticketId (jc4t8i).

Putting it all together

You can use this application to power a web form backend to help expedite customer complaints. In the following example, a frontend application submits form data via an AJAX POST request. The application waits for the response, and presents the user with a message appropriate to the detected sentiment, and a case reference number.

If a negative sentiment is returned in the API response, the user is informed of their case number:

Setting IAM permissions

Before a user or service can start a Synchronous Express Workflow, it must be granted permission to perform the states:StartSyncExecution API operation. This is a new state-machine level permission. Existing Express Workflows can be run synchronously once the correct IAM permissions for StartSyncExecution are granted.

The example application applies this to a policy within the HttpApiRole in the AWS SAM template. This role is added to the HTTP API integration within the api.yaml file.

Conclusion

Step Functions Synchronous Express Workflows allow developers to receive a workflow response without having to poll additional services. This helps developers orchestrate microservices without needing to write additional code to handle errors, retries, and run parallel tasks. They can be invoked in response to events such as HTTP requests via API Gateway, from a parent state machine, or by calling the StartSyncExecution API action.

This feature is available in all Regions where AWS Step Functions is available. View the AWS Regions table to learn more.

For more serverless learning resources, visit Serverless Land.

Creating faster AWS Lambda functions with AVX2

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/creating-faster-aws-lambda-functions-with-avx2/

Customers use AWS Lambda to build a wide range of applications, including mission-critical and compute-intensive applications. The most demanding workloads include machine learning inferencing, media processing, high performance computing (HPC), scientific simulations, and financial modeling. With the release of Advanced Vector Extensions 2 (AVX2) support for Lambda, builders can benefit from improved performance for these types of applications.

Overview

This blog post explains AVX2 and how you can take advantage of this instruction set in your Lambda functions. I walk through an example of how to enhance performance of a typical use case using AVX2 and measure the performance gain. This feature is available for new or existing Lambda-based workloads at no additional cost.

AVX2 provides extensions to the x86 instruction set architecture. This is a Single Instruction Multiple Data (SIMD) instruction set that enables running a set of highly parallelizable operations simultaneously. AVX2 allows CPUs to perform a higher number of integer and floating-point operations per clock cycle. For vectorizable algorithms, this can enhance performance resulting in lower latencies and higher throughput.

AVX2 for Lambda

Implementing AVX2 in an example application

Pillow is a popular Python-based imaging library. It provides powerful image manipulation functions that use computationally complex processes. Computer vision operations such as convolution resampling can benefit from parallelization. This is because the filters are applied on different windows that are independent and can be processed in parallel. In this section, I compare the performance of an image transformation after applying AVX2 instructions.

The following example downloads an original JPEG object from an Amazon S3 bucket, resizes the image, and then saves the result to another S3 bucket. There are three resizing filters used – bilinear, bicubic, and Lanczos.

import boto3
import os

from PIL import Image

# Download the image to /tmp
s3 = boto3.client('s3')
s3.download_file('my-input-bucket', 'photo.jpeg', '/tmp/photo.jpeg')

def lambda_handler(event, context):
    # Open image and perform resize
    image = Image.open('/tmp/photo.jpeg')

    # Select one of the three algorithms
    image = image.resize((256, 128), Image.BILINEAR) 
    # image = image.resize((256, 128), Image.BICUBIC) 
    # image = image.resize((256, 128), Image.LANCZOS) 
    
    # Save and upload to S3
    image.save('/tmp/thumbnail.jpeg', 'JPEG')
    s3.upload_file('/tmp/thumbnail.jpeg', 'my-output-bucket', 'thumbnail.jpeg')
    
    return "Success!"

To convert code to use AVX2, you must recompile the source code with the appropriate flags, or use packages and dependencies optimized for AVX2. In this example, you can use a production-ready fork of Pillow called pillow-simd. When compiled for AVX2, it uses the AVX2 instructions to accelerate many of the features in Pillow.

First, you must compile the library using the same Amazon Linux AMI and kernel version that is used by the Lambda service. To do this, use an EC2 or AWS Cloud9 instance running Amazon Linux 2, or using a Docker container with a Lambda-supported image. Compile the library using the following commands:

# Install dependencies
~ yum install -y \
    freetype-devel \
    gcc \
    ghostscript \
    lcms2-devel \
    libffi-devel \
    libimagequant-devel \
    libjpeg-devel \
    libraqm-devel \
    libtiff-devel \
    libwebp-devel \
    make \
    openjpeg2-devel \
    rh-python36 \
    rh-python36-python-virtualenv \
    sudo \
    tcl-devel \
    tk-devel \
    tkinter \
    which \
    xorg-x11-server-Xvfb \
    zlib-devel \
    && yum clean all

# Compile code with AVX2 flag
CC="cc -mavx2" pip install --force-reinstall --no-cache-dir -t . --compile  pillow-simd

You can then use this compiled, AVX2-compatible version of the Pillow library in a Lambda function. This can be bundled with the deployment code or you can deploy the library as a Lambda layer. Depending on the version, you may have to include the following binaries from the `/usr/lib64` directory with the function. If so, add this location to LD_LIBRARY_PATH so the binaries are discoverable:

cp /usr/lib64/libtiff.so.5 lib/libtiff.so.5
cp /usr/lib64/libjpeg.so.62 lib/libjpeg.so.62
cp /usr/lib64/libjbig.so.2.0 lib/libjbig.so.2.0

In this test, I compare the performance of both the original function and the AVX2-optimized version with 1024 MB of memory. The test uses the following image:

Resampled photo

Source: https://unsplash.com/photos/IMXhx6qhvf0. Photo credit: Daniel Seßler.

  1. Bilinear filter.
  2. Bicubic filter.
  3. Lanczos filter.

The timings exclude S3 transfers and only compare the image transformation operation. The results of the three resize operations are:

FilterWithout AVX2With AVX2Performance
improvement
Bilinear

105 ms

71 ms

32%

Bicubic

122 ms

73 ms

40%

Lanczos

136 ms

77 ms

43%

Using AVX2 in popular Lambda runtimes

This process involves recompiling the source code with appropriate flags, or by selecting packages and dependencies optimized for AVX2. For popular runtimes used in Lambda:

  • Python: Python developers frequently use scipy and numpy libraries to support scientific or computationally complex work. These libraries can be compiled with the AVX2 flag or linked with MKL to take advantage of AVX2.
  • Java: Java’s JIT compiler can auto-vectorize code to run with AVX2 instructions. To learn more, see this post on how to detect vectorization and potentially optimize code to take advantage of this.
  • Golang: the standard golang compiler does not currently support auto-vectorization. However, you can use the gcc compiler for Go, gccgo.
  • Node: for compute intensive workloads, use the AVX2-enabled or MKL-enabled versions of libraries.
  • Compiling from source: for C or C++ libraries for vectorizable work, compile with the appropriate flags to allow the compiler to automatically vectorize your code. See the documentation for additional details.

Enabling AVX2 for the Intel Math Kernel Library

The Intel Math Kernel Library (MKL) is a library of optimized math operations that implicitly use AVX2 instructions when available on the compute platform. Many popular frameworks, such as PyTorch, build with MKL by default so you don’t need to take additional actions. Some libraries, such as TensorFlow, provide options in their build process to specify MKL optimization (set –config=mkl as an option).

You can also build popular scientific Python libraries, such as SciPy and NumPy, with MKL. For instructions on building libraries with MKL, read Numpy/Scipy with Intel MKL and Intel Compilers. Intel also provides a Python distribution that includes SciPy and NumPy with MKL.

Performance improvements

After enabling AVX2 for your Lambda functions, you can compare the before-and-after performance. Lambda emits a latency metric in CloudWatch that you can use to measure the performance improvement. Other third-party production monitoring tools (for example, Datadog or New Relic) can also capture these metrics to profile the performance.

Pricing and availability

Starting today, customers can either compile their existing workloads or deploy new ones to target this instruction set at no additional cost. To learn more on how to build AVX2 compatible applications on AWS Lambda, read the Lambda Developer Guide.

Support for AVX2 is available in all Regions where Lambda is available, except for the Regions in China. For more information on availability, see the AWS Region table.

Conclusion

With the release of AVX2 for Lambda, customers can now run AVX2-optimized workloads while benefitting from the pay-for-use, reduced operational model of AWS Lambda. This feature is provided at no additional cost.

Developers can create highly scalable synchronous, asynchronous, or streaming applications. Compared to the x86 Intel baseline instruction set, AVX2 allows CPUs to perform more integer and floating-point operations per clock cycle. This speeds up compute-intensive applications with parallelizable operations to process data faster and improve throughput. Developers can schedule queues with data-intensive jobs and deliver performant end user experiences.

To learn more, read the Lambda documentation. For more serverless learning resources, visit Serverless Land.

 

Introducing Amazon API Gateway service integration for AWS Step Functions

Post Syndicated from Benjamin Smith original https://aws.amazon.com/blogs/compute/introducing-amazon-api-gateway-service-integration-for-aws-step-functions/

AWS Step Functions now integrates with Amazon API Gateway to enable backend orchestration with minimal code and built-in error handling.

API Gateway is a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale. These APIs enable applications to access data, business logic, or functionality from your backend services.

Step Functions allows you to build resilient serverless orchestration workflows with AWS services such as AWS Lambda, Amazon SNS, Amazon DynamoDB, and more. AWS Step Functions integrates with a number of services natively. Using Amazon States Language (ASL), you can coordinate these services directly from a task state.

What’s new?

The new Step Functions integration with API Gateway provides an additional resource type, arn:aws:states:::apigateway:invoke and can be used with both Standard and Express workflows. It allows customers to call API Gateway REST APIs and API Gateway HTTP APIs directly from a workflow, using one of two integration patterns:

  1. Request-Response: calling a service and let Step Functions progress to the next state immediately after it receives an HTTP response. This pattern is supported by Standard and Express Workflows.
  2. Wait-for-Callback: calling a service with a task token and have Step Functions wait until that token is returned with a payload. This pattern is supported by Standard Workflows.

The new integration is configured with the following Amazon States Language parameter fields:

  • ApiEndpoint: The API root endpoint.
  • Path: The API resource path.
  • Method: The HTTP request method.
  • HTTP headers: Custom HTTP headers.
  • RequestBody: The body for the API request.
  • Stage: The API Gateway deployment stage.
  • AuthType: The authentication type.

Refer to the documentation for more information on API Gateway fields and concepts.

Getting started

The API Gateway integration with Step Functions is configured using AWS Serverless Application Model (AWS SAM), the AWS Command Line Interface (AWS CLI), AWS CloudFormation or from within the AWS Management Console.

To get started with Step Functions and API Gateway using the AWS Management Console:

  1. Go to the Step Functions page of the AWS Management Console.
  2. Choose Run a sample project and choose Make a call to API Gateway.The Definition section shows the ASL that makes up the example workflow. The following example shows the new API Gateway resource and its parameters:
  3. Review example Definition, then choose Next.
  4. Choose Deploy resources.

This deploys a Step Functions standard workflow and a REST API with a /pets resource containing a GET and a POST method. It also deploys an IAM role with the required permissions to invoke the API endpoint from Step Functions.

The RequestBody field lets you customize the API’s request input. This can be a static input or a dynamic input taken from the workflow payload.

Running the workflow

  1. Choose the newly created state machine from the Step Functions page of the AWS Management Console
  2. Choose Start execution.
  3. Paste the following JSON into the input field:
    {
      "NewPet": {
        "type": "turtle",
        "price": 74.99
      }
    }
  4. Choose Start execution
  5. Choose the Retrieve Pet Store Data step, then choose the Step output tab.

This shows the successful responseBody output from the “Add to pet store” POST request and the response from the “Retrieve Pet Store Data” GET request.

Access control

The API Gateway integration supports AWS Identity and Access Management (IAM) authentication and authorization. This includes IAM roles, policies, and tags.

AWS IAM roles and policies offer flexible and robust access controls that can be applied to an entire API or individual methods. This controls who can create, manage, or invoke your REST API or HTTP API.

Tag-based access control allows you to set more fine-grained access control for all API Gateway resources. Specify tag key-value pairs to categorize API Gateway resources by purpose, owner, or other criteria. This can be used to manage access for both REST APIs and HTTP APIs.

API Gateway resource policies are JSON policy documents that control whether a specified principal (typically an IAM user or role) can invoke the API. Resource policies can be used to grant access to a REST API via AWS Step Functions. This could be for users in a different AWS account or only for specified source IP address ranges or CIDR blocks.

To configure access control for the API Gateway integration, set the AuthType parameter to one of the following:

  1. {“AuthType””: “NO_AUTH”}
    Call the API directly without any authorization. This is the default setting.
  2. {“AuthType””: “IAM_ROLE”}
    Step Functions assumes the state machine execution role and signs the request with credentials using Signature Version 4.
  3. {“AuthType””: “RESOURCE_POLICY”}
    Step Functions signs the request with the service principal and calls the API endpoint.

Orchestrating microservices

Customers are already using Step Functions’ built in failure handling, decision branching, and parallel processing to orchestrate application backends. Development teams are using API Gateway to manage access to their backend microservices. This helps to standardize request, response formats and decouple business logic from routing logic. It reduces complexity by allowing developers to offload responsibilities of authentication, throttling, load balancing and more. The new API Gateway integration enables developers to build robust workflows using API Gateway endpoints to orchestrate microservices. These microservices can be serverless or container-based.

The following example shows how to orchestrate a microservice with Step Functions using API Gateway to access AWS services. The example code for this application can be found in this GitHub repository.

To run the application:

  1. Clone the GitHub repository:
    $ git clone https://github.com/aws-samples/example-step-functions-integration-api-gateway.git
    $ cd example-step-functions-integration-api-gateway
  2. Deploy the application using AWS SAM CLI, accepting all the default parameter inputs:
    $ sam build && sam deploy -g

    This deploys 17 resources including a Step Functions standard workflow, an API Gateway REST API with three resource endpoints, 3 Lambda functions, and a DynamoDB table. Make a note of the StockTradingStateMachineArn value. You can find this in the command line output or in the Applications section of the AWS Lambda Console:

     

  3. Manually trigger the workflow from a terminal window:
    aws stepFunctions start-execution \
    --state-machine-arn <StockTradingStateMachineArnValue>

The response looks like:

 

When the workflow is run, a Lambda function is invoked via a GET request from API Gateway to the /check resource. This returns a random stock value between 1 and 100. This value is evaluated in the Buy or Sell choice step, depending on if it is less or more than 50. The Sell and Buy states use the API Gateway integration to invoke a Lambda function, with a POST method. A stock_value is provided in the POST request body. A transaction_result is returned in the ResponseBody and provided to the next state. The final state writes a log of the transition to a DynamoDB table.

Defining the resource with an AWS SAM template

The Step Functions resource is defined in this AWS SAM template. The DefinitionSubstitutions field is used to pass template parameters to the workflow definition.

StockTradingStateMachine:
    Type: AWS::Serverless::StateMachine # More info about State Machine Resource: https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/sam-resource-statemachine.html
    Properties:
      DefinitionUri: statemachine/stock_trader.asl.json
      DefinitionSubstitutions:
        StockCheckPath: !Ref CheckPath
        StockSellPath: !Ref SellPath
        StockBuyPath: !Ref BuyPath
        APIEndPoint: !Sub "${ServerlessRestApi}.execute-api.${AWS::Region}.amazonaws.com"
        DDBPutItem: !Sub arn:${AWS::Partition}:states:::dynamodb:putItem
        DDBTable: !Ref TransactionTable

The workflow is defined on a separate file (/statemachine/stock_trader.asl.json).

The following code block defines the Check Stock Value state. The new resource, arn:aws:states:::apigateway:invoke declares the API Gateway service integration type.

The parameters object holds the required fields to configure the service integration. The Path and ApiEndpoint values are provided by the DefinitionsSubstitutions field in the AWS SAM template. The RequestBody input is defined dynamically using Amazon States Language. The .$ at the end of the field name RequestBody specifies that the parameter use a path to reference a JSON node in the input.

"Check Stock Value": {
  "Type": "Task",
  "Resource": "arn:aws:states:::apigateway:invoke",
  "Parameters": {
      "ApiEndpoint":"${APIEndPoint}",
      "Method":"GET",
      "Stage":"Prod",
      "Path":"${StockCheckPath}",
      "RequestBody.$":"$",
      "AuthType":"NO_AUTH"
  },
  "Retry": [
      {
          "ErrorEquals": [
              "States.TaskFailed"
          ],
          "IntervalSeconds": 15,
          "MaxAttempts": 5,
          "BackoffRate": 1.5
      }
  ],
  "Next": "Buy or Sell?"
},

The deployment process validates the ApiEndpoint value. The service integration builds the API endpoint URL from the information provided in the parameters block in the format https://[APIendpoint]/[Stage]/[Path].

Conclusion

The Step Functions integration with API Gateway provides customers with the ability to call REST APIs and HTTP APIs directly from a Step Functions workflow.

Step Functions’ built in error handling helps developers reduce code and decouple business logic. Developers can combine this with API Gateway to offload responsibilities of authentication, throttling, load balancing and more. This enables developers to orchestrate microservices deployed on containers or Lambda functions via API Gateway without managing infrastructure.

This feature is available in all Regions where both AWS Step Functions and Amazon API Gateway are available. View the AWS Regions table to learn more. For pricing information, see Step Functions pricing. Normal service limits of API Gateway and service limits of Step Functions apply.

For more serverless learning resources, visit Serverless Land.

How to deploy the AWS Solution for Security Hub Automated Response and Remediation

Post Syndicated from Ramesh Venkataraman original https://aws.amazon.com/blogs/security/how-to-deploy-the-aws-solution-for-security-hub-automated-response-and-remediation/

In this blog post I show you how to deploy the Amazon Web Services (AWS) Solution for Security Hub Automated Response and Remediation. The first installment of this series was about how to create playbooks using Amazon CloudWatch Events, AWS Lambda functions, and AWS Security Hub custom actions that you can run manually based on triggers from Security Hub in a specific account. That solution requires an analyst to directly trigger an action using Security Hub custom actions and doesn’t work for customers who want to set up fully automated remediation based on findings across one or more accounts from their Security Hub master account.

The solution described in this post automates the cross-account response and remediation lifecycle from executing the remediation action to resolving the findings in Security Hub and notifying users of the remediation via Amazon Simple Notification Service (Amazon SNS). You can also deploy these automated playbooks as custom actions in Security Hub, which allows analysts to run them on-demand against specific findings. You can deploy these remediations as custom actions or as fully automated remediations.

Currently, the solution includes 10 playbooks aligned to the controls in the Center for Internet Security (CIS) AWS Foundations Benchmark standard in Security Hub, but playbooks for other standards such as AWS Foundational Security Best Practices (FSBP) will be added in the future.

Solution overview

Figure 1 shows the flow of events in the solution described in the following text.

Figure 1: Flow of events

Figure 1: Flow of events

Detect

Security Hub gives you a comprehensive view of your security alerts and security posture across your AWS accounts and automatically detects deviations from defined security standards and best practices.

Security Hub also collects findings from various AWS services and supported third-party partner products to consolidate security detection data across your accounts.

Ingest

All of the findings from Security Hub are automatically sent to CloudWatch Events and Amazon EventBridge and you can set up CloudWatch Events and EventBridge rules to be invoked on specific findings. You can also send findings to CloudWatch Events and EventBridge on demand via Security Hub custom actions.

Remediate

The CloudWatch Event and EventBridge rules can have AWS Lambda functions, AWS Systems Manager automation documents, or AWS Step Functions workflows as the targets of the rules. This solution uses automation documents and Lambda functions as response and remediation playbooks. Using cross-account AWS Identity and Access Management (IAM) roles, the playbook performs the tasks to remediate the findings using the AWS API when a rule is invoked.

Log

The playbook logs the results to the Amazon CloudWatch log group for the solution, sends a notification to an Amazon Simple Notification Service (Amazon SNS) topic, and updates the Security Hub finding. An audit trail of actions taken is maintained in the finding notes. The finding is updated as RESOLVED after the remediation is run. The security finding notes are updated to reflect the remediation performed.

Here are the steps to deploy the solution from this GitHub project.

  • In the Security Hub master account, you deploy the AWS CloudFormation template, which creates an AWS Service Catalog product along with some other resources. For a full set of what resources are deployed as part of an AWS CloudFormation stack deployment, you can find the full set of deployed resources in the Resources section of the deployed AWS CloudFormation stack. The solution uses the AWS Service Catalog to have the remediations available as a product that can be deployed after granting the users the required permissions to launch the product.
  • Add an IAM role that has administrator access to the AWS Service Catalog portfolio.
  • Deploy the CIS playbook from the AWS Service Catalog product list using the IAM role you added in the previous step.
  • Deploy the AWS Security Hub Automated Response and Remediation template in the master account in addition to the member accounts. This template establishes AssumeRole permissions to allow the playbook Lambda functions to perform remediations. Use AWS CloudFormation StackSets in the master account to have a centralized deployment approach across the master account and multiple member accounts.

Deployment steps for automated response and remediation

This section reviews the steps to implement the solution, including screenshots of the solution launched from an AWS account.

Launch AWS CloudFormation stack on the master account

As part of this AWS CloudFormation stack deployment, you create custom actions to configure Security Hub to send findings to CloudWatch Events. Lambda functions are used to provide remediation in response to actions sent to CloudWatch Events.

Note: In this solution, you create custom actions for the CIS standards. There will be more custom actions added for other security standards in the future.

To launch the AWS CloudFormation stack

  1. Deploy the AWS CloudFormation template in the Security Hub master account. In your AWS console, select CloudFormation and choose Create new stack and enter the S3 URL.
  2. Select Next to move to the Specify stack details tab, and then enter a Stack name as shown in Figure 2. In this example, I named the stack SO0111-SHARR, but you can use any name you want.
     
    Figure 2: Creating a CloudFormation stack

    Figure 2: Creating a CloudFormation stack

  3. Creating the stack automatically launches it, creating 21 new resources using AWS CloudFormation, as shown in Figure 3.
     
    Figure 3: Resources launched with AWS CloudFormation

    Figure 3: Resources launched with AWS CloudFormation

  4. An Amazon SNS topic is automatically created from the AWS CloudFormation stack.
  5. When you create a subscription, you’re prompted to enter an endpoint for receiving email notifications from Amazon SNS as shown in Figure 4. To subscribe to that topic that was created using CloudFormation, you must confirm the subscription from the email address you used to receive notifications.
     
    Figure 4: Subscribing to Amazon SNS topic

    Figure 4: Subscribing to Amazon SNS topic

Enable Security Hub

You should already have enabled Security Hub and AWS Config services on your master account and the associated member accounts. If you haven’t, you can refer to the documentation for setting up Security Hub on your master and member accounts. Figure 5 shows an AWS account that doesn’t have Security Hub enabled.
 

Figure 5: Enabling Security Hub for first time

Figure 5: Enabling Security Hub for first time

AWS Service Catalog product deployment

In this section, you use the AWS Service Catalog to deploy Service Catalog products.

To use the AWS Service Catalog for product deployment

  1. In the same master account, add roles that have administrator access and can deploy AWS Service Catalog products. To do this, from Services in the AWS Management Console, choose AWS Service Catalog. In AWS Service Catalog, select Administration, and then navigate to Portfolio details and select Groups, roles, and users as shown in Figure 6.
     
    Figure 6: AWS Service Catalog product

    Figure 6: AWS Service Catalog product

  2. After adding the role, you can see the products available for that role. You can switch roles on the console to assume the role that you granted access to for the product you added from the AWS Service Catalog. Select the three dots near the product name, and then select Launch product to launch the product, as shown in Figure 7.
     
    Figure 7: Launch the product

    Figure 7: Launch the product

  3. While launching the product, you can choose from the parameters to either enable or disable the automated remediation. Even if you do not enable fully automated remediation, you can still invoke a remediation action in the Security Hub console using a custom action. By default, it’s disabled, as highlighted in Figure 8.
     
    Figure 8: Enable or disable automated remediation

    Figure 8: Enable or disable automated remediation

  4. After launching the product, it can take from 3 to 5 minutes to deploy. When the product is deployed, it creates a new CloudFormation stack with a status of CREATE_COMPLETE as part of the provisioned product in the AWS CloudFormation console.

AssumeRole Lambda functions

Deploy the template that establishes AssumeRole permissions to allow the playbook Lambda functions to perform remediations. You must deploy this template in the master account in addition to any member accounts. Choose CloudFormation and create a new stack. In Specify stack details, go to Parameters and specify the Master account number as shown in Figure 9.
 

Figure 9: Deploy AssumeRole Lambda function

Figure 9: Deploy AssumeRole Lambda function

Test the automated remediation

Now that you’ve completed the steps to deploy the solution, you can test it to be sure that it works as expected.

To test the automated remediation

  1. To test the solution, verify that there are 10 actions listed in Custom actions tab in the Security Hub master account. From the Security Hub master account, open the Security Hub console and select Settings and then Custom actions. You should see 10 actions, as shown in Figure 10.
     
    Figure 10: Custom actions deployed

    Figure 10: Custom actions deployed

  2. Make sure you have member accounts available for testing the solution. If not, you can add member accounts to the master account as described in Adding and inviting member accounts.
  3. For testing purposes, you can use CIS 1.5 standard, which is to require that the IAM password policy requires at least one uppercase letter. Check the existing settings by navigating to IAM, and then to Account Settings. Under Password policy, you should see that there is no password policy set, as shown in Figure 11.
     
    Figure 11: Password policy not set

    Figure 11: Password policy not set

  4. To check the security settings, go to the Security Hub console and select Security standards. Choose CIS AWS Foundations Benchmark v1.2.0. Select CIS 1.5 from the list to see the Findings. You will see the Status as Failed. This means that the password policy to require at least one uppercase letter hasn’t been applied to either the master or the member account, as shown in Figure 12.
     
    Figure 12: CIS 1.5 finding

    Figure 12: CIS 1.5 finding

  5. Select CIS 1.5 – 1.11 from Actions on the top right dropdown of the Findings section from the previous step. You should see a notification with the heading Successfully sent findings to Amazon CloudWatch Events as shown in Figure 13.
     
    Figure 13: Sending findings to CloudWatch Events

    Figure 13: Sending findings to CloudWatch Events

  6. Return to Findings by selecting Security standards and then choosing CIS AWS Foundations Benchmark v1.2.0. Select CIS 1.5 to review Findings and verify that the Workflow status of CIS 1.5 is RESOLVED, as shown in Figure 14.
     
    Figure 14: Resolved findings

    Figure 14: Resolved findings

  7. After the remediation runs, you can verify that the Password policy is set on the master and the member accounts. To verify that the password policy is set, navigate to IAM, and then to Account Settings. Under Password policy, you should see that the account uses a password policy, as shown in Figure 15.
     
    Figure 15: Password policy set

    Figure 15: Password policy set

  8. To check the CloudWatch logs for the Lambda function, in the console, go to Services, and then select Lambda and choose the Lambda function and within the Lambda function, select View logs in CloudWatch. You can see the details of the function being run, including updating the password policy on both the master account and the member account, as shown in Figure 16.
     
    Figure 15: Lambda function log

    Figure 16: Lambda function log

Conclusion

In this post, you deployed the AWS Solution for Security Hub Automated Response and Remediation using Lambda and CloudWatch Events rules to remediate non-compliant CIS-related controls. With this solution, you can ensure that users in member accounts stay compliant with the CIS AWS Foundations Benchmark by automatically invoking guardrails whenever services move out of compliance. New or updated playbooks will be added to the existing AWS Service Catalog portfolio as they’re developed. You can choose when to take advantage of these new or updated playbooks.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS Security Hub forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Ramesh Venkataraman

Ramesh is a Solutions Architect who enjoys working with customers to solve their technical challenges using AWS services. Outside of work, Ramesh enjoys following stack overflow questions and answers them in any way he can.

Simplifying cross-account access with Amazon EventBridge resource policies

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/simplifying-cross-account-access-with-amazon-eventbridge-resource-policies/

This post is courtesy of Stephen Liedig, Sr Serverless Specialist SA.

Amazon EventBridge is a serverless event bus used to decouple event producers and consumers. Event producers publish events onto an event bus, which then uses rules to determine where to send those events. The rules determine the targets and EventBridge routes the events accordingly.

A common architectural approach adopted by customers is to isolate these application components or services by using separate AWS accounts. This “account-per-service” strategy limits the blast radius by providing a logical and physical separation of resources. It provides additional security boundaries and allows customers to easily track service costs without having to adopt a complex tagging strategy.

To enable the flow of events from one account to another, you must create a rule on one event bus that routes events to an event bus in another account. To enable this routing, you need to configure the resource policy for your event buses.

This blog post shows you how to use EventBridge resource policies to publish events and create rules on event buses in another account.

Overview

Today, EventBridge launches improvements to resource policies that make it easier to build applications that work across accounts. The service expands the use of the policy associated with an event bus to the authorization of API calls.

This means you can manage permissions for API calls that interact with the event bus, such as PutEventsPutRule, and PutTargets, directly from that event bus’ resource policy. This replaces the need to create different IAM roles that are assumed by each account that interacts with the event bus. It also provides a central resource to manage your permissions.

There is support for organizations and tags via IAM conditions. Now when you call an API, it considers both the user’s IAM policy and the event bus resource policy in the authorization process.

EventBridge APIs that accept an event bus name parameter (including PutRule, PutTargets, DeleteRule, RemoveTargets, DisableRule, and EnableRule) now also support an event bus ARN. This allows you to target cross-account event buses through the APIs. For example, you can call PutRule to create a rule on an event bus in another account, without needing to assume a role.

EventBridge now supports using policy conditions for the following authorization context keys in the APIs, to help scope down permissions.

Context key

APIs

Customer usage

events:detail-typePutEventsUsed to restrict PutEvents calls for events with a specific “detail-type” field.
events:sourcePutEventsUsed to restrict PutEvents calls for events with a specific “source” field.
events:creatorAccountPutRule,
PutTargets,
DeleteRule,
RemoveTargets,
DisableRule,
EnableRule,
TagResource,
UntagResource,
DescribeRule,
ListTargetsByRule,
ListTagsForResource

Used to restrict control plane API calls on rules belonging to a certain account.

This can be used to allow a customer to edit/disable only rules created by their own account.

events:eventBusInvocationPutEvents

Used to differentiate a PutEvents API call from a cross-account event bus target invocation. This context key is set to true during a cross-account event bus target invocation authorization. For example, when a rule matches an event and sends that event to another event bus.

For an API call of PutEvents, this context key is set to false.

Ecommerce example walkthrough

In this ecommerce example, there are multiple services distributed across different accounts. A web store publishes an event when a new order is created. The event is sent via a central event bus, which is in another account. The bus has two rules with target services in different AWS accounts.

Walkthrough architecture

The goal is to create fine-grained permissions that only allow:

  • The web store to publish events for a specific detail-type and source.
  • The invoice processing service to create and manage its own rules on the central bus.

To complete this walk through, you set up three accounts. For account A (Web Store), you deploy an AWS Lambda function that sends the “newOrderCreated” event directly to the “central event bus” in account B. The invoice processing Lambda function in account C creates a rule on the central event bus to process the event published by account A.

Create the central event bus in account B

Account B event bus

Create the central event bus in account B, adding the following resource policy. Be sure to substitute your account numbers for accounts A, B, and C.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "WebStoreCrossAccountPublish",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::[ACCOUNT-A]:root"
      },
      "Action": "events:PutEvents",
      "Resource": "arn:aws:events:us-east-1:[ACCOUNT-B]:event-bus/central-event-bus",
      "Condition": {
        "StringEquals": {
          "events:detail-type": "newOrderCreated",
          "events:source": "com.exampleCorp.webStore"
        }
      }
    },
    {
      "Sid": "InvoiceProcessingRuleCreation",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::[ACCOUNT-C]:root"
      },
      "Action": [
        "events:PutRule",
        "events:DeleteRule",
        "events:DescribeRule",
        "events:DisableRule",
        "events:EnableRule",
        "events:PutTargets",
        "events:RemoveTargets"
      ],
      "Resource": "arn:aws:events:us-east-1:[ACCOUNT-B]:rule/central-event-bus/*",
      "Condition": {
        "StringEqualsIfExists": {
          "events:creatorAccount": "${aws:PrincipalAccount}",
          "events:source": "com.exampleCorp.webStore"
        }
      }
    }
  ]
}

Create event bus

There are two statements in the resource policy: WebStoreCrossAccountPublish and InvoiceProcessingRuleCreation.

The WebStoreCrossAccountPublish statement allows the Lambda function in account A to publish events directly to the central event bus. There are two conditions in the statement that further restrict the types of events that can be sent to the event bus. The first restricts the event detail-type to equal “newOrderCreated” and the second condition requires that the event source equals “com.exampleCorp.webStore”.

The InvoiceProcessingRuleCreation statement allows the invoice processing function in account C to describe, add, update, enable, disable, and delete any rules created by account C. You apply this restriction by using the events:creatorAccount context key in the statements condition.

Importantly you should set the StringEqualsIfExists type for the events:creatorAccount condition. If you use StringEquals, it results in an AccessDeniedException. AWS CloudFormation calls DescribeRule to check if the rule already exists. However, as this is a new rule, and because you set a condition for events:creatorAccount for DescribeRule, this key is not populated and CloudFormation receives an AccessDeniedException instead of ResourceNotFoundException.

Here is how you create the event bus using AWS CloudFormation:

  CentralEventBus: 
      Type: AWS::Events::EventBus
      Properties: 
          Name: !Ref EventBusName

  WebStoreCrossAccountPublishStatement: 
      Type: AWS::Events::EventBusPolicy
      Properties: 
          EventBusName: !Ref CentralEventBus
          StatementId: "WebStoreCrossAccountPublish"
          Statement: 
              Effect: "Allow"
              Principal: 
                  AWS: !Sub arn:aws:iam::${AccountA}:root
              Action: "events:PutEvents"
              Resource: !GetAtt CentralEventBus.Arn
              Condition:
                  StringEquals:
                      "events:detail-type": "newOrderCreated"
                      "events:source" : "com.exampleCorp.webStore"
                      
  InvoiceProcessingRuleCreationStatement: 
      Type: AWS::Events::EventBusPolicy
      Properties: 
          EventBusName: !Ref CentralEventBus
          StatementId: "InvoiceProcessingRuleCreation"
          Statement: 
              Effect: "Allow"
              Principal: 
                  AWS: !Sub arn:aws:iam::${AccountC}:root
              Action: 
                  - "events:PutRule"
                  - "events:DeleteRule"
                  - "events:DescribeRule"
                  - "events:DisableRule"
                  - "events:EnableRule"
                  - "events:PutTargets"
                  - "events:RemoveTargets"
              Resource: 
                  - !Sub arn:aws:events:${AWS::Region}:${AWS::AccountId}:rule/${CentralEventBus.Name}/*
              Condition:
                  StringEqualsIfExists:
                      "events:creatorAccount" : "${aws:PrincipalAccount}"
                  StringEquals:
                      "events:source": "com.exampleCorp.webStore"

Now that you have a policy set up on the central event bus, configure the client applications to send and process events. The client application must also have permissions configured.

Create the web store order function in account A

Web Store order function

In account A, create a Lambda function to send the event to the central bus in account B. Set the EventBusName parameter to the central event bus ARN on the PutEvents API call. This allows you to target cross-account event buses directly.

import json
import boto3

EVENT_BUS_ARN = 'arn:aws:events:us-east-1:[ACCOUNT-B]:event-bus/central-event-bus'

# Create EventBridge client
events = boto3.client('events')

def lambda_handler(event, context):

  # new order created event datail
  eventDetail  = {
    "orderNo": "123",
    "orderDate": "2020-09-09T22:01:02Z",
    "customerId": "789",
    "lineItems": {
      "productCode": "P1",
      "quantityOrdered": 3,
      "unitPrice": 23.5,
      "currency": "USD"
    }
  }
  
  try:
    # Put an event
    response = events.put_events(
        Entries=[
            {
                'EventBusName': EVENT_BUS_ARN,
                'Source': 'com.exampleCorp.webStore',
                'DetailType': 'newOrderCreated',
                'Detail': json.dumps(eventDetail)
            }
        ]
    )
    
    print(response['Entries'])
    print('Event sent to the event bus ' + EVENT_BUS_ARN )
    print('EventID is ' + response['Entries'][0]['EventId'])
    
  except Exception as e:
      print(e)

Create the Invoice Processing service in account C

Invoice processing service in account C

Next, create the invoice processing function that processes the newOrderCreated event. You use the AWS Serverless Application Model (AWS SAM) to create the invoice processing function and other application resources. Before you can process any events from the central event bus, you must create a new event bus in account C to receive incoming events.

Next, you define the function that processes the events. Here, you define a Lambda event source that is an EventBridge rule. You set the EventBusName to the receiving invoice processing event bus. When this Lambda function is deployed, AWS SAM creates the rule on the event bus with the specified pattern and target. It configures the event source that triggers the function when an event is received.

Parameters:
  EventBusName:
    Description: Name of the central event bus
    Type: String
    Default: invoice-processing-event-bus
  CentralEventBusArn:
    Description: The ARN of the central event bus # e.g. arn:aws:events:us-east-1:[ACCOUNT-B]:event-bus/central-event-bus
    Type: String
Resources:
  # This is the receiving invoice processing event bus in account C.
  InvoiceProcessingEventBus: 
    Type: AWS::Events::EventBus
    Properties: 
        Name: !Ref EventBusName
# AWS Lambda function processes the newOrderCreated event
  InvoiceProcessingFunction:
    Type: AWS::Serverless::Function 
    Properties:
      CodeUri: invoice_processing
      Handler: invoice_processing_function/app.lambda_handler
      Runtime: python3.8
      Events:
        NewOrderCreatedRule:
          Type: EventBridgeRule
          Properties:
            EventBusName: !Ref InvoiceProcessingEventBus
            Pattern:
              source:
                - com.exampleCorp.webStore
              detail-type:
                - newOrderCreated

The next resource in the AWS SAM template is the rule that creates the target on the central event bus. It sends events to the invoice processing event bus. Though the rule is added to the central event bus, its definition is managed by the invoice processing service template. The rule definition sets EventBusName parameter to the ARN of the central event bus.

  # This is the rule that the invoice processing service creates on the central event bus
  InvoiceProcessingRule:
    Type: AWS::Events::Rule
    Properties:
      Name: InvoiceProcessingNewOrderCreatedSubscription
      Description: Cross account rule created by Invoice Processing service
      EventBusName: !Ref CentralEventBusArn # ARN of the central event bus
      EventPattern:
        source:
          - com.exampleCorp.webStore
        detail-type:
          - newOrderCreated
      State: ENABLED
      Targets: 
        - Id: SendEventsToInvoiceProcessingEventBus
          Arn: !GetAtt InvoiceProcessingEventBus.Arn
          RoleArn: !GetAtt CentralEventBusToInvoiceProcessingEventBusRole.Arn
          DeadLetterConfig:
            Arn: !GetAtt InvoiceProcessingTargetDLQ.Arn

For the central event bus target to send the event to the invoice processing event bus in account C, EventBridge needs the necessary permissions to invoke the PutEvents API. The CentralEventBusToInvoiceProcessingEventBusRole IAM role provides that permission. It is assumed by the central event bus in account B when it needs to send events to the invoice processing event bus, without you having to create an additional resource policy on the invoice processing event bus.

  CentralEventBusToInvoiceProcessingEventBusRole:
    Type: 'AWS::IAM::Role'
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service:
              - events.amazonaws.com
            Action:
              - 'sts:AssumeRole'
      Path: /
      Policies:
        - PolicyName: PutEventsOnInvoiceProcessingEventBus
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action: 'events:PutEvents'
                Resource: !GetAtt InvoiceProcessingEventBus.Arn

You can also set up a dead-letter queue (DLQ) configuration for the rule in account C. This allows the subscriber of the event to control where events that fail to get delivered to the invoice processing event bus get sent. All you need to do to make this happen is create an Amazon SQS queue in account C, and a queue policy that sets a resource policy to allow EventBridge to send failed events from account B to the queue in account C.

  # Invoice Processing Target Dead Letter Queue 
  InvoiceProcessingTargetDLQ:
    Type: AWS::SQS::Queue

  # SQS resource policy required to allow target on central bus to send failed messages to target DLQ
  InvoiceProcessingTargetDLQPolicy: 
    Type: AWS::SQS::QueuePolicy
    Properties: 
      Queues: 
        - !Ref InvoiceProcessingTargetDLQ
      PolicyDocument: 
        Statement: 
          - Action: 
              - "SQS:SendMessage" 
            Effect: "Allow"
            Resource: !GetAtt InvoiceProcessingTargetDLQ.Arn
            Principal:  
              Service: "events.amazonaws.com"
            Condition:
              ArnEquals:
                "aws:SourceArn": !GetAtt InvoiceProcessingRule.Arn 

Conclusion

This post shows you how to use the new features Amazon EventBridge resource policies that make it easier to build applications that work across accounts. Resource policies provide you with a powerful mechanism for modeling your event buses across multiple accounts, and give you fine-grained control over EventBridge API invocations.

Download the code in this blog from https://github.com/aws-samples/amazon-eventbridge-resource-policy-samples.

For more serverless learning resources, visit Serverless Land.

Tracking the latest server images in Amazon EC2 Image Builder pipelines

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/tracking-the-latest-server-images-in-amazon-ec2-image-builder-pipelines/

This post courtesy of Anoop Rachamadugu, Cloud Architect at AWS

The Amazon EC2 Image Builder service helps users to build and maintain server images. The images created by EC2 Image Builder can be used with Amazon Elastic Compute Cloud (EC2) and on-premises. Image Builder reduces the effort of keeping images up-to-date and secure by providing a graphical interface, built-in automation, and AWS-provided security settings. Customers have told us that they manage multiple server images and are looking for ways to track the latest server images created by the pipelines.

In this blog post, I walk through a solution that uses AWS Lambda and AWS Systems Manager (SSM) Parameter Store. It tracks and updates the latest Amazon Machine Image (AMI) IDs every time an Image Builder pipeline is run. With Lambda, you pay only for what you use. You are charged based on the number of requests for your functions and the time it takes for your code to run. In this case, the Lambda function is invoked upon the completion of the image builder pipeline. Standard SSM parameters are available at no additional charge.

Users can reference the SSM parameters in automation scripts and AWS CloudFormation templates providing access to the latest AMI ID for your EC2 infrastructure. Consider the use case of updating Amazon Machine Image (AMI) IDs for the EC2 instances in your CloudFormation templates. Normally, you might map AMI IDs to specific instance types and Regions. Then to update these, you would manually change them in each of your templates. With the SSM parameter integration, your code remains untouched and a CloudFormation stack update operation automatically fetches the latest Parameter Store value.

Overview

This solution uses a Lambda function written in Python that subscribes to an Amazon Simple Notification Service (SNS) topic. The Lambda function and the SNS topic are deployed using AWS SAM CLI. Once deployed, the SNS topic must be configured in an existing Image Builder pipeline. This results in the Lambda function being invoked at the completion of the Image Builder pipeline.

When a Lambda function subscribes to an SNS topic, it is invoked with the payload of the published messages. The Lambda function receives the message payload as an input parameter. The Lambda function first checks the message payload to see if the image status is available. If the image state is available, it retrieves the AMI ID from the message payload and updates the SSM parameter.

EC2 Image builder architecture diagram

EC2 Image builder architecture diagram

Prerequisites

To get started with this solution, the following is required:

Deploying the solution

The solution consists of two files, which can be downloaded from the amazon-ec2-image-builder GitHub repository.

  1. The Python file image-builder-lambda-update-ssm.py contains the code for the Lambda function. It first checks the SNS message payload to determine if the image is available. If it’s available, it extracts the AMI ID from the SNS message payload and updates the SSM parameter specified.The ‘ssm_parameter_name’ variable specifies the SSM parameter path where the AMI ID should be stored and updated. The Lambda function finishes by adding tags to the SSM parameter.
  2. The template.yaml file is an AWS SAM template. It deploys the Lambda function, SNS topic, and IAM role required for the Lambda function. I use Python 3.7 as the runtime and assign a memory of 256 MB for the Lambda function. The IAM policy gives the Lambda function permissions to retrieve and update SSM parameters. Deploy this application using the AWS SAM CLI guided deploy:
    sam deploy --guided

After deploying the application, note the ARN of the created SNS topic. Next, update the infrastructure settings of an existing Image Builder pipeline with this newly created SNS topic. This results in the Lambda function being invoked upon the completion of the image builder pipeline.

Configuration details

Configuration details

Verifying the solution

After the completion of the image builder pipeline, use the AWS CLI or check the AWS Management Console to verify the updated SSM parameter. To verify via AWS CLI, run the following commands to retrieve and list the tags attached to the SSM parameter:

aws ssm get-parameter --name ‘/ec2-imagebuilder/latest’
aws ssm list-tags-for-resource --resource-type "Parameter" --resource-id ‘/ec2-imagebuilder/latest’

To verify via the AWS Management Console, navigate to the Parameter Store under AWS Systems Manager. Search for the parameter /ec2-imagebuilder/latest:

AWS Systems Manager: Parameter Store

AWS Systems Manager: Parameter Store

Select the Tags tab to view the tags attached to the SSM parameter:

Image builder tags list

Image builder tags list

Referencing the SSM Parameter in CloudFormation templates

Users can reference the SSM parameters in automation scripts and AWS CloudFormation templates providing access to the latest AMI ID for your EC2 infrastructure. This sample code shows how to reference the SSM parameter in a CloudFormation template.

Parameters :
  LatestAmiId :
    Type : 'AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>'
    Default: ‘/ec2-imagebuilder/latest’

Resources :
  Instance :
    Type : 'AWS::EC2::Instance'
    Properties :
      ImageId : !Ref LatestAmiId

Conclusion

In this blog post, I demonstrate a solution that allows users to track and update the latest AMI ID created by the Image Builder pipelines. The Lambda function retrieves the AMI ID of the image created by a pipeline and update an AWS Systems Manager parameter. This Lambda function is triggered via an SNS topic configured in an Image Builder pipeline.

The solution is deployed using AWS SAM CLI. I also note how users can reference Systems Manager parameters in AWS CloudFormation templates providing access to the latest AMI ID for your EC2 infrastructure.

The amazon-ec2-image-builder-samples GitHub repository provides a number of examples for getting started with EC2 Image Builder. Image Builder can make it easier for you to build virtual machine (VM) images.

Performing canary deployments for service integrations with Amazon API Gateway

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/performing-canary-deployments-for-service-integrations-with-amazon-api-gateway/

This post authored by Dhiraj Thakur and Sameer Goel, Solutions Architects at AWS.

When building serverless web applications, it is common to use AWS Lambda functions as the compute layer for business logic. To manage canary releases, it’s best practice to use Lambda deployment preferences. However, if you use Amazon API Gateway service integrations instead of Lambda functions, it is necessary to manage the canary release at the API level. This post shows how to use canary releases in REST APIs to gradually deploy changes to serverless applications.

Overview

Modern applications frequently deploy updates to implement new features. But updating or changing a production application is often risky and may introduce bugs. Canary deployments are a popular strategy to help mitigate this risk.

In a canary deployment, you partially deploy a new software feature and shift some percentage of traffic to a new version of the application. This allows you to verify stability and reduce risk associated with the new release. After gaining confidence in the new version, you continually increment traffic until all traffic flows to the new release. Additionally, a canary deployment can be a cost-effective approach as there is no need to duplicate application resources, compared with other deployment strategies such as blue/green deployments.

In this example, there are two service versions deployed with API Gateway. The canary version receives 10% of traffic and the remaining 90% is routed to the stable version.

Canary deploy example

Canary deploy example

After deploying the new version, you can test the health and performance of the new version. Once you are confident that it is ready for release, you can promote the canary version and send 100% of traffic to this API version.

Promoted deployment example

Promoted deployment example

In this post, I show how to use AWS Serverless Application Model (AWS SAM) to build a canary release with a REST API in API Gateway. This is an open-source framework for building serverless applications. It enables developers to define and deploy canary releases and then shift the traffic programmatically. In this example, AWS SAM creates the canary settings necessary to divide traffic and the IAM role used by API Gateway.

API Gateway canary deployment example

For this tutorial, a REST API integrates directly with Amazon DynamoDB. This returns three data attributes from the DynamoDB table. In the canary version, the code is modified to provide additional information from the table.

Create Amazon REST API and other resources

Download the code from this post from https://github.com/aws-samples/amazon-api-gateway-canary-deployment. The template.yaml file is the AWS SAM configuration for the application, and the api.yaml is the OpenAPI configuration for the API. Deploy this application by following the instructions in the README.md file.

The deployment creates an empty DynamoDB table called “<sam-stack-name>-DataTable-*” and an API Gateway REST API called “Canary Deployment” with the stage “PROD”.

  1. Run the Amazon DynamoDB put-item command to create a new item in the DynamoDB table from the AWS CLI. Ensure you have configured AWS CLI – refer to the quickstart guide to learn more.Replace <tablename> with the DynamoDB table name.
    aws dynamodb put-item --table-name <tablename> --item "{""country"":{""S"":""Germany""},""runner-up"":{""S"":""France""},""winner"":{""S"":""Italy""},""year"":{""S"":""2006""}}" --return-consumed-capacity TOTAL

    It returns a success message:

    Update Amazon DynamoDB output

    Update Amazon DynamoDB output

    You can verify the record in the DynamoDB table in the AWS Management Console:

    Scan of Amazon DynamoDB table

    Scan of Amazon DynamoDB table

  2. Select the REST API “Canary Deployment” in Amazon API Gateway. Choose “GET” under the resource section. In the Integration Request, you see the Mapping Template:
    {
      "Key": {
        "year": {
          "S": "$input.params("year")"
        }
      },
      "TableName": "<stack-name>-DataTable-<random-string>"
    }

    The Integration Response is an HTTP response encapsulating the backend response and template looks like this:The TableName indicates which table is used in the REST API call. The value for year is extracted from the request URL using $input.params(‘year’)

    {
      "year": "$input.path('$.Item.year.S')",
      "country": "$input.path('$.Item.country.S')",
      "winner": "$input.path('$.Item.winner.S')"
    }

    It returns the “country”, “year”, “winner” attributes.

  3. You can also check the logs/tracing configuration in the API stage as per the following settings. You can see Amazon CloudWatch Logs are enabled for the API, which helps to check the health of the canary API version.For example, a response code of 2xx indicates that the operation was successful. Other error codes indicate either a client error (4xx) or a server error (5xx). See this link for status code details. Analyze the status of the API in the logs before promoting the canary.

    Enabling logs on the Amazon API Gateway console

    Enabling logs on the Amazon API Gateway console

If you invoke the API endpoint URL in your browser, you can see it returns “country”, “year” and “winner”, as expected from the DynamoDB table.

Invoking endpoint from browser example

Invoking endpoint from browser example

Next, set up the canary release deployment to create a new version of the deployed API and route 10% of the API traffic to it.

Canary deployment

You can now create a new version of the API using the AWS SAM template, which changes the number of attributes returned. With the new version of the API, the additional attribute “runner-up” is returned from the DynamoDB table. For the initial deployment, 10% of API traffic is routed to this API version.

  1. Go to the canary-stack directory and deploy the application. Be sure to use the same stack name that you used for the previous deployment:
    sam deploy -gAWS CloudFormation deploys the canary version and configures the API to route 10% of traffic the new version.You can validate this by checking the canary setting in the PROD stage. You can see “percentage of requests directed to canary” (new version) is “10%” and “percentage of requests directed to Prod” (previous version) is 90%.
  2. Check the Integration Response. The modified template looks like this:
    {
      "year": "$input.path('$.Item.year.S')",
      "country": "$input.path('$.Item.country.S')",
      "winner": "$input.path('$.Item.winner.S')",
      "runner-up": "$input.path('$.Item.runner-up.S')"
    }
  3. Now, test the canary deployment using the API endpoint URL. You can refresh the browser and see the “runner-up” results shown for a small percentage of requests. This demonstrates that 10% of the traffic is routed to the canary. If don’t see this new attribute, even after multiple refreshes, clear your browser cache.Reviewing the Integration Response, you can see that the template now includes the additional attribute “runner-up”. This returns “country”, “year”, “winner” and “runner-up”, as per the new canary release requirement.

    Testing response in browser after change

    Testing response in browser after change

Analyze Amazon CloudWatch Logs

You can analyze the health of the canary version via Amazon CloudWatch Logs. To ensure that there is data in CloudWatch Logs, refresh your browser several times when accessing the API URL.

  1. In the AWS Management Console, navigate to Services -> CloudWatch.
  2. Choose the Region that matches your API Gateway Region, then select Logs on the Left menu.
  3. The logs for API Gateway are named based on the ID of the API. The form is “API-Gateway-Execution-Logs_<api id>/<api stage>
    Viewing the logs, you can see a list of log streams with GUID identifiers. Use the Last Event Time column for a date/time stamp and find a recent execution.
  4. Analyze the canary log to confirm that the REST API call is successful.
Canary promotion options

Canary promotion options

Promote or delete the canary version

To roll back to the initial version, choose Delete Canary or set “Percentage of requests directed to Canary“ to 0. If the Amazon CloudWatch analysis shows that the canary version is operating successfully, you are ready to promote the canary to receive all API traffic.

  1. Navigate to the Canary tab and choose Promote Canary.

    Promoting the canary in the Amazon API Gateway console

    Promoting the canary in the Amazon API Gateway console

  2. Choose Update to accept the settings. This sends 100% traffic to the new version.

    Canary promotion options

    Canary promotion options

Cleanup

See the repo’s README.md for cleanup instructions.

Conclusion

Canary deployments are a recommended practice for testing new versions of applications. This blog post shows how to implement canary deployments for service integrations in API Gateway. I walk through how to analyze the logs generated for canary requests and promote the canary to complete the deployment. Using AWS SAM, you deploy a canary in API Gateway with a predefined routing configuration and strategy.

To learn more, read Building APIs with Amazon API Gateway and Implementing safe AWS Lambda deployments with AWS CodeDeploy.

Application integration patterns for microservices: Orchestration and coordination

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/application-integration-patterns-for-microservices-orchestration-and-coordination/

This post is courtesy of Stephen Liedig, Sr. Serverless Specialist SA.

This is the final blog post in the “Application Integration Patterns for Microservices” series. Previous posts cover asynchronous messaging for microservices, fan-out strategies, and scatter-gather design patterns.

In this post, I look at how to implement messaging patterns to help orchestrate and coordinate business workflows in our applications. Specifically, I cover two patterns:

  • Pipes and Filters, as presented in the book “Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions” (Hohpe and Woolf, 2004)
  • Saga Pattern, which is a design pattern for dealing with “long-lived transactions” (LLT), published by Garcia-Molina and Salem in 1987.

I discuss these patterns using the Wild Rydes example from this series.

Wild Rydes

Wild Rydes is a fictional technology start-up created to disrupt the transportation industry by replacing traditional taxis with unicorns. Several hands-on AWS workshops use the Wild Rydes scenario. It illustrates concepts such as serverless development, event-driven design, API management, and messaging in microservices.

Wild Rydes

This blog post explores the build process of the Wild Rydes workshop, to help you apply these concepts to your applications.

After completing a unicorn ride, the Wild Rydes customer application charges the customer. Once the driver submits a ride completion, an event triggers the following steps:

  • Registers the fare: registers the fare ride completion event.
  • Initiates the payment (via a payment service): calls a payment gateway for credit card pre-authorization. Using the pre-authorization code, it completes the payment transaction.
  • Updates customer accounting system: once the payment is processed, updates the Wild Rydes customer accounting system with the transaction detail.
  • Publishes “Fare Processed” event: sends a notification to interested components that the process is completed.

Each of the steps interfaces with separate systems – the Wild Rydes system, a third-party payment provider, and the customer accounting system. You could implement these steps inside a single component, but that would make it difficult to change and adapt. It’d also reduce the potential for components reuse within our application. Breaking down the steps into individual components allows you to build components with a single responsibility making it easier to manage each components dependencies and application lifecycle. You can be selective about how you implement the respective components, for example, different teams responsible for the development of the respective components may choose to use different languages. This is where the Pipes and Filters architectural pattern can help.

Pipes and filters

Hohpe and Woolf define Pipes and Filters as an “architectural style to divide a larger processing task into a sequence of smaller, independent processing steps (filters) that are connected by channels (pipes).”

Pipes and filters architecture

Pipes provide a communications channel that abstracts the consumer of messages sent through that channel. It decouples your filter from one another, so components only need to know the messaging channel, or endpoint, where they are sending messages. They do not know who, or what, is processing that message, or where the receiver is located on the network.

Amazon SQS provides a lightweight solution with the power and scale of messaging middleware. It is a simple, flexible, fully managed message queuing service for reliably and continuously exchanging large volume of messages. It has virtually limitless scalability and the ability to increase message throughput without pre-provisioning capacity.

You can create an SQS queue with this AWS CLI command:

aws sqs create-queue --queue-name MyQueue

For the fare processing scenario, you could implement a Pipes and Filters architectural pattern using AWS services. This uses two Amazon SQS queues and an Amazon SNS topic:

Pipes and filters pattern with AWS services

Amazon SQS provides a mechanism for decoupling the components. The filters only need to know to which queue to send the message, without knowing which component processes that message nor when it is processed. SQS does this in a secure, durable, and scalable way.

Despite the fact that none of the filters have a direct dependency on one another, there is still a degree of coupling at the pipe level. Changing execution order therefore forces you to update and redeploy your existing filters to point to a new pipe. In the Wild Rydes example, you can reduce the impact of this by defining an environment variable for the destination endpoint in AWS Lambda function configuration, rather than hardcoding this inside your implementations.

Dealing with failures and retries requires some consideration too. In Amazon SQS terms, this requires you to define configurations, such a message VisibilityTimeOut. The VisibilityTimeOut setting provides you with some transactional support. It ensures that the message is not removed from the queue until after you have finished processing the message and you explicitly delete it from the queue. Using Amazon SQS as an Event Source for AWS Lambda further simplifies that for you because the message polling implementation is managed by the service, so you don’t need to create an explicit implementation in your filter.

Amazon SQS helps deal with failures gracefully as it maintains a count of how many times a message is processed via ReceiveCount. By specifying a maxReceiveCount, you can limit the number of times a poisoned message gets processed. Combine this with a dead letter queue (DLQ), you can then move messages that have exceeded the maxReceiveCount number to the DLQ. Adding Amazon CloudWatch alarms on metrics such as ApproximateNumberOfMessagesVisible on the DLQ, you can proactively alert on system failures if the number of messages on the dead letter queue exceed and acceptable threshold.

Alternatively, you can model the fare payment scenario with AWS Step Functions. Step Functions externalizes the Pipes and Filters pattern. It extracts the coordination from the filter implementations into a state machine that orchestrates the sequence of events. Visual workflows allow you to change the sequence of execution without modifying code, reducing the amount of coupling between collaborating components.

Here is how you could model the fare processing scenario using Step Functions:

Fare processing with Step Functions

{
  "Comment": "StateMachine for Processing Fare Payments",
  "StartAt": "RegisterFare",
  "States": {
    "RegisterFare": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:RegisterFareFunction",
      "Next": "ProcessPayment"
    },
    "ProcessPayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:ChargeFareFunction",
      "Next": "UpdateCustomerAccount"
    },
    "UpdateCustomerAccount": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:UpdateCustomerAccountFunction",
      "Next": "PublishFareProcessedEvent"
    },
    "PublishFareProcessedEvent": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sns:publish",
      "Parameters": {
        "TopicArn": "arn:aws:sns:REGION:ACCOUNT_ID:myTopic",
        "Message": {
          "Input": "Hello from Step Functions!"
        }
      },
      "End": true
    }
  }
}

AWS Step Functions allows you to easily build more sophisticated workflows. These could include decision points, parallel processing, wait states to pause the state machine execution, error handling, and retry logic. Error and Retry states help you simplify your component implementation by providing a framework for error handling and implementation exponential backoff on retries. You can define alternate execution paths if failures cannot be handled.

In this implementation, each of these states is a discrete transaction. Some implement database transactions when registering the fare, others are calling the third-party payment provider APIs, and internal APIs or programming interfaces when updating the customer accounting system.

Dealing with each of these transactions independently is relatively straightforward. But what happens if you require consistency across all steps so that either all or none of the transactions complete? How can you deal with consistency across multiple, distributed transactions? How do we deal with the temporal aspects of coordinating these potentially long running heterogeneous integrations?

Consistency across multiple, distributed transactions.

Cloud providers do not support Distributed Transaction Coordinators (DTC) or two-phase commit protocols responsible for coordinating transactions across multiple cloud resources. Therefore, you need a mechanism to explicitly coordinate multiple local transactions. This is where the saga pattern and AWS Step Functions can help.

A saga is a design pattern for dealing with “long-lived transactions” (LLT), published by Garcia-Molina and Salem in 1987, they define the concept of a saga as:

“LLT is a saga if it can be written as a sequence of transactions that can be interleaved with other transactions.” (Garcia-Molina, Salem 1987)

Fundamentally, saga can provide a failure management pattern to establish consistency across all of your distributed applications, by implementing a compensating transaction for each step in a series of functions. Compensating transactions allow you to back out of the changes that were previously committed in your series of functions, so that if one of your steps fails you can “undo” what you did before, and leave your system in stable state, devoid of side-effects.

AWS Step Functions provides a mechanism for implementing a saga pattern with the ability to build fully managed state machines that allow you to catch custom business exceptions and manage and share data across state transitions.

Infrastructure with service integrations

Figure 1: Using Step Functions’ Service Integrations for Amazon DynamoDB and Amazon SNS, you can further reduce the need for a custom AWS Lambda implementation to persist data to the database, or send a notification.

By using these capabilities, you can expand on the previous Fare Processing state machine and implementing compensating transaction states. If Register Fare fails, you may want to emit an event that invokes an external support function or generates a notification informing operators of the system the error.

If payment processing failed, you would want to ensure that the status is updated to reflect state change and then notify operators of the failed event. You might decide to refund customers, update the fare status and notify support, until you have been able to resolve issues with the customer accounting system. Regardless of the approach, Step Functions allows you to model a failure scenario that aligns with a more business-centric view of consistency.

Step Functions workflow results

If you want to see the full state machine implementation in Lab4 of Wild Rydes Asynchronous Messaging Workshop. The workshop guides you through building your own state machine so you can see how to apply the pattern to your own scenarios. There are also three other workshops you can walk through that cover the other patterns in the series.

Conclusion

Using Wild Rydes, I show how to use Amazon SQS and AWS Step Functions to decouple your application components and services. I show you how these services help to coordinate and orchestrate distributed components to build resilient and fault tolerant microservices architectures.

Take part in the Wild Rydes Asynchronous Messaging Workshop and learn about the other messaging patterns you can apply to microservices architectures, including fan-out and message filtering, topic-queue-chaining and load balancing (blog post), and scatter-gather.

The Wild Rydes Asynchronous Messaging Workshop resources are hosted on our AWS Samples GitHub repository, including the sample code for this blog post under Lab-4: Choreography and orchestration.

For a deeper dive into queues and topics and how to use these in microservices architectures, read:

  1. The AWS whitepaper, Implementing Microservices on AWS.
  2. Implementing enterprise integration patterns with AWS messaging services: point-to-point channels.
  3. Implementing enterprise integration patterns with AWS messaging services: publish-subscribe channels.
  4. Building Scalable Applications and Microservices: Adding Messaging to Your Toolbox.

For more information on enterprise integration patterns, see:

Getting started with RPA using AWS Step Functions and Amazon Textract

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/getting-started-with-rpa-using-aws-step-functions-and-amazon-textract/

This post is courtesy of Joe Tringali, Solutions Architect.

Many organizations are using robotic process automation (RPA) to automate workflow, back-office processes that are labor-intensive. RPA, as software bots, can often handle many of these activities. Often RPA workflows contain repetitive manual tasks that must be done by humans, such as viewing invoices to find payment details.

AWS Step Functions is a serverless function orchestrator and workflow automation tool. Amazon Textract is a fully managed machine learning service that automatically extracts text and data from scanned documents. Combining these services, you can create an RPA bot to automate the workflow and enable employees to handle more complex tasks.

In this post, I show how you can use Step Functions and Amazon Textract to build a workflow that enables the processing of invoices. Download the code for this solution from https://github.com/aws-samples/aws-step-functions-rpa.

Overview

The following serverless architecture can process scanned invoices in PDF or image formats for submitting payment information to a database.

Example architecture
To implement this architecture, I use single-purpose Lambda functions and Step Functions to build the workflow:

  1. Invoices are scanned and loaded into an Amazon Simple Storage Service (S3) bucket.
  2. The loading of an invoice into Amazon S3 triggers an AWS Lambda function to be invoked.
  3. The Lambda function starts an asynchronous Amazon Textract job to analyze the text and data of the scanned invoice.
  4. The Amazon Textract job publishes a completion notification message with a status of “SUCCEEDED” or “FAILED” to an Amazon Simple Notification Service (SNS) topic.
  5. SNS sends the message to an Amazon Simple Queue Service (SQS) queue that is subscribed to the SNS topic.
  6. The message in the SQS queue triggers another Lambda function.
  7. The Lambda function initiates a Step Functions state machine to process the results of the Amazon Textract job.
  8. For an Amazon Textract job that completes successfully, a Lambda function saves the document analysis into an Amazon S3 bucket.
  9. The loading of the document analysis to Amazon S3 triggers another Lambda function.
  10. The Lambda function retrieves the text and data of the scanned invoice to find the payment information. It writes an item to an Amazon DynamoDB table with a status indicating if the invoice can be processed.
  11. If the DynamoDB item contains the payment information, another Lambda function is invoked.
  12. The Lambda function archives the processed invoice into another S3 bucket.
  13. If the DynamoDB item does not contain the payment information, a message is published to an Amazon SNS topic requesting that the invoice be reviewed.

Amazon Textract can extract information from the various invoice images and associate labels with the data. You must then handle the various labels that different invoices may associate with the payee name, due date, and payment amount.

Determining payee name, due date and payment amount

After the document analysis has been saved to S3, a Lambda function retrieves the text and data of the scanned invoice to find the information needed for payment. However, invoices can use a variety of labels for the same piece of data, such a payment’s due date.

In the example invoices included with this blog, the payment’s due date is associated with the labels “Pay On or Before”, “Payment Due Date” and “Payment Due”. Payment amounts can also have different labels, such as “Total Due”, “New Balance Total”, “Total Current Charges”, and “Please Pay”. To address this, I use a series of helper functions in the app.py file in the process_document_analysis folder of the GitHub repo.

In app.py, there is the following get_ky_map helper function:

def get_kv_map(blocks):
    key_map = {}
    value_map = {}
    block_map = {}
    for block in blocks:
        block_id = block['Id']
        block_map[block_id] = block
        if block['BlockType'] == "KEY_VALUE_SET":
            if 'KEY' in block['EntityTypes']:
                key_map[block_id] = block
            else:
                value_map[block_id] = block
    return key_map, value_map, block_map

The get_kv_map function is invoked by the Lambda function handler. It iterates over the “Blocks” element of the document analysis produced by Amazon Textract to create dictionaries of keys (labels) and values (data) associated with each block identified by Amazon Textract. It then invokes the following get_kv_relationship helper function:

def get_kv_relationship(key_map, value_map, block_map):
    kvs = {}
    for block_id, key_block in key_map.items():
        value_block = find_value_block(key_block, value_map)
        key = get_text(key_block, block_map)
        val = get_text(value_block, block_map)
        kvs[key] = val
    return kvs

The get_kv_relationship function merges the key and value dictionaries produced by the get_kv_map function to create a single Python key value dictionary where labels are the keys to the dictionary and the invoice’s data are the values. The handler then invokes the following get_line_list helper function:

def get_line_list(blocks):
    line_list = []
    for block in blocks:
        if block['BlockType'] == "LINE":
            if 'Text' in block: 
                line_list.append(block["Text"])
    return line_list

Extracting payee names is more complex because the data may not be labeled. The payee may often differ from the entity sending the invoice. With the Amazon Textract analysis in a format more easily consumable by Python, I use the following get_payee_name helper function to parse and extract the payee:

def get_payee_name(lines):
    payee_name = ""
    payable_to = "payable to"
    payee_lines = [line for line in lines if payable_to in line.lower()]
    if len(payee_lines) > 0:
        payee_line = payee_lines[0]
        payee_line = payee_line.strip()
        pos = payee_line.lower().find(payable_to)
        if pos > -1:
            payee_line = payee_line[pos + len(payable_to):]
            if payee_line[0:1] == ':':
                payee_line = payee_line[1:]
            payee_name = payee_line.strip()
    return payee_name

The get_amount helper function searches the key value dictionary produced by the get_kv_relationship function to retrieve the payment amount:

def get_amount(kvs, lines):
    amount = None
    amounts = [search_value(kvs, amount_tag) for amount_tag in amount_tags if search_value(kvs, amount_tag) is not None]
    if len(amounts) > 0:
        amount = amounts[0]
    else:
        for idx, line in enumerate(lines):
            if line.lower() in amount_tags:
                amount = lines[idx + 1]
                break
    if amount is not None:
        amount = amount.strip()
        if amount[0:1] == '$':
            amount = amount[1:]
    return amount

The amount_tags variable contains a list of possible labels associated with the payment amount:

amount_tags = ["total due", "new balance total", "total current charges", "please pay"]

Similarly, the get_due_date helper function searches the key value dictionary produced by the get_kv_relationship function to retrieve the payment due date:

def get_due_date(kvs):
    due_date = None
    due_dates = [search_value(kvs, due_date_tag) for due_date_tag in due_date_tags if search_value(kvs, due_date_tag) is not None]
    if len(due_dates) > 0:
        due_date = due_dates[0]
    if due_date is not None:
        date_parts = due_date.split('/')
        if len(date_parts) == 3:
            due_date = datetime(int(date_parts[2]), int(date_parts[0]), int(date_parts[1])).isoformat()
        else:
            date_parts = [date_part for date_part in re.split("\s+|,", due_date) if len(date_part) > 0]
            if len(date_parts) == 3:
                datetime_object = datetime.strptime(date_parts[0], "%b")
                month_number = datetime_object.month
                due_date = datetime(int(date_parts[2]), int(month_number), int(date_parts[1])).isoformat()
    else:
        due_date = datetime.now().isoformat()
    return due_date

The due_date_tag contains a list of possible labels associated with the payment due:

due_date_tags = ["pay on or before", "payment due date", "payment due"]

If all required elements needed to issue a payment are found, it adds an item to the DynamoDB table with a status attribute of “Approved for Payment”. If the Lambda function cannot determine the value of one or more required elements, it adds an item to the DynamoDB table with a status attribute of “Pending Review”.

Payment Processing

If the item in the DynamoDB table is marked “Approved for Payment”, the processed invoice is archived. If the item’s status attribute is marked “Pending Review”, an SNS message is published to an SNS Pending Review topic. You can subscribe to this topic so that you can add additional labels to the Python code for determining payment due dates and payment amounts.

Note that the Lambda functions are single-purpose functions, and all workflow logic is contained in the Step Functions state machine. This diagram shows the various tasks (states) of a successful workflow.

State machine workflow

For more information about this solution, download the code from the GitHub repo (https://github.com/aws-samples/aws-step-functions-rpa).

Prerequisites

Before deploying the solution, you must install the following prerequisites:

  1. Python.
  2. AWS Command Line Interface (AWS CLI) – for instructions, see Installing the AWS CLI.
  3. AWS Serverless Application Model Command Line Interface (AWS SAM CLI) – for instructions, see Installing the AWS SAM CLI.

Deploying the solution

The solution creates the following S3 buckets with names suffixed by your AWS account ID to prevent a global namespace collision of your S3 bucket names:

  • scanned-invoices-<YOUR AWS ACCOUNT ID>
  • invoice-analyses-<YOUR AWS ACCOUNT ID>
  • processed-invoices-<YOUR AWS ACCOUNT ID>

The following steps deploy the example solution in your AWS account. The solution deploys several components including a Step Functions state machine, Lambda functions, S3 buckets, a DynamoDB table for payment information, and SNS topics.

AWS CloudFormation requires an S3 bucket and stack name for deploying the solution. To deploy:

  1. Download code from GitHub repo (https://github.com/aws-samples/aws-step-functions-rpa).
  2. Run the following command to build the artifacts locally on your workstation:sam build
  3. Run the following command to create a CloudFormation stack and deploy your resources:sam deploy --guided --capabilities CAPABILITY_NAMED_IAM

Monitor the progress and wait for the completion of the stack creation process from the AWS CloudFormation console before proceeding.

Testing the solution

To test the solution, upload the PDF test invoices to the S3 bucket named scanned-invoices-<YOUR AWS ACCOUNT ID>.

A Step Functions state machine with the name <YOUR STACK NAME>-ProcessedScannedInvoiceWorkflow runs the workflow. Amazon Textract document analyses are stored in the S3 bucket named invoice-analyses-<YOUR AWS ACCOUNT ID>, and processed invoices are stored in the S3 bucket named processed-invoices-<YOUR AWS ACCOUNT ID>. Processed payments are found in the DynamoDB table named <YOUR STACK NAME>-invoices.

You can monitor the status of the workflows from the Step Functions console. Upon completion of the workflow executions, review the items added to DynamoDB from the Amazon DynamoDB console.

Cleanup

To avoid ongoing charges for any resources you created in this blog post, delete the stack:

  1. Empty the three S3 buckets created during deployment using the S3 console:
    – scanned-invoices-<YOUR AWS ACCOUNT ID>
    – invoice-analyses-<YOUR AWS ACCOUNT ID>
    – processed-invoices-<YOUR AWS ACCOUNT ID>
  2. Delete the CloudFormation stack created during deployment using the CloudFormation console.

Conclusion

In this post, I showed you how to use a Step Functions state machine and Amazon Textract to automatically extract data from a scanned invoice. This eliminates the need for a person to perform the manual step of reviewing an invoice to find payment information to be fed into a backend system. By replacing the manual steps of a workflow with automation, an organization can free up their human workforce to handle more value-added tasks.

To learn more, visit AWS Step Functions and Amazon Textract for more information. For more serverless learning resources, visit https://serverlessland.com.

 

Using AWS Lambda extensions to send logs to custom destinations

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/using-aws-lambda-extensions-to-send-logs-to-custom-destinations/

You can now send logs from AWS Lambda functions directly to a destination of your choice using AWS Lambda Extensions. Lambda Extensions are a new way for monitoring, observability, security, and governance tools to easily integrate with AWS Lambda. For more information, see “Introducing AWS Lambda Extensions – In preview”.

To help you troubleshoot failures in Lambda functions, AWS Lambda automatically captures and streams logs to Amazon CloudWatch Logs. This stream contains the logs that your function code and extensions generate, in addition to logs the Lambda service generates as part of the function invocation.

Previously, to send logs to a custom destination, you typically configure and operate a CloudWatch Log Group subscription. A different Lambda function forwards logs to the destination of your choice.

Logging tools, running as Lambda extensions, can now receive log streams directly from within the Lambda execution environment, and send them to any destination. This makes it even easier for you to use your preferred extensions for diagnostics.

Today, you can use extensions to send logs to Coralogix, Datadog, Honeycomb, Lumigo, New Relic, and Sumo Logic.

Overview

To receive logs, extensions subscribe using the new Lambda Logs API.

Lambda Logs API

Lambda Logs API

The Lambda service then streams the logs directly to the extension. The extension can then process, filter, and route them to any preferred destination. Lambda still sends the logs to CloudWatch Logs.

You deploy extensions, including ones that use the Logs API, as Lambda layers, with the AWS Management Console and AWS Command Line Interface (AWS CLI). You can also use infrastructure as code tools such as AWS CloudFormation, the AWS Serverless Application Model (AWS SAM), Serverless Framework, and Terraform.

Logging extensions from AWS Lambda Ready Partners and AWS Partners available at launch

Today, you can use logging extensions with the following tools:

  • The Datadog extension now makes it easier than ever to collect your serverless application logs for visualization, analysis, and archival. Paired with Datadog’s AWS integration, end-to-end distributed tracing, and real-time enhanced AWS Lambda metrics, you can proactively detect and resolve serverless issues at any scale.
  • Lumigo provides monitoring and debugging for modern cloud applications. With the open source extension from Lumigo, you can send Lambda function logs directly to an S3 bucket, unlocking new post processing use cases.
  • New Relic enables you to efficiently monitor, troubleshoot, and optimize your Lambda functions. New Relic’s extension allows you send your Lambda service platform logs directly to New Relic’s unified observability platform, allowing you to quickly visualize data with minimal latency and cost.
  • Coralogix is a log analytics and cloud security platform that empowers thousands of companies to improve security and accelerate software delivery, allowing you to get deep insights without paying for the noise. Coralogix can now read Lambda function logs and metrics directly, without using Cloudwatch or S3, reducing the latency, and cost of observability.
  • Honeycomb is a powerful observability tool that helps you debug your entire production app stack. Honeycomb’s extension decreases the overhead, latency, and cost of sending events to the Honeycomb service, while increasing reliability.
  • The Sumo Logic extension enables you to get instant visibility into the health and performance of your mission-critical applications using AWS Lambda. With this extension and Sumo Logic’s continuous intelligence platform, you can now ensure that all your Lambda functions are running as expected, by analyzing function, platform, and extension logs to quickly identify and remediate errors and exceptions.

You can also build and use your own logging extensions to integrate your organization’s tooling.

Showing a logging extension to send logs directly to S3

This demo shows an example of using a simple logging extension to send logs to Amazon Simple Storage Service (S3).

To set up the example, visit the GitHub repo and follow the instructions in the README.md file.

The example extension runs a local HTTP endpoint listening for HTTP POST events. Lambda delivers log batches to this endpoint. The example creates an S3 bucket to store the logs. A Lambda function is configured with an environment variable to specify the S3 bucket name. Lambda streams the logs to the extension. The extension copies the logs to the S3 bucket.

Lambda environment variable specifying S3 bucket

Lambda environment variable specifying S3 bucket

The extension uses the Extensions API to register for INVOKE and SHUTDOWN events. The extension, using the Logs API, then subscribes to receive platform and function logs, but not extension logs.

As the example is an asynchronous system, logs for one invoke may be processed during the next invocation. Logs for the last invoke may be processed during the SHUTDOWN event.

Testing the function from the Lambda console, Lambda sends logs to CloudWatch Logs. The logs stream shows logs from the platform, function, and extension.

Lambda logs visible in CloudWatch Logs

Lambda logs visible in CloudWatch Logs

The logging extension also receives the log stream directly from Lambda, and copies the logs to S3.

Browsing to the S3 bucket, the log files are available.

S3 bucket containing copied logs

S3 bucket containing copied logs.

Downloading the file shows the log lines. The log contains the same platform and function logs, but not the extension logs, as specified during the subscription.

[{'time': '2020-11-12T14:55:06.560Z', 'type': 'platform.start', 'record': {'requestId': '49e64413-fd42-47ef-b130-6fd16f30148d', 'version': '$LATEST'}},
{'time': '2020-11-12T14:55:06.774Z', 'type': 'platform.logsSubscription', 'record': {'name': 'logs_api_http_extension.py', 'state': 'Subscribed', 'types': ['platform', 'function']}},
{'time': '2020-11-12T14:55:06.774Z', 'type': 'platform.extension', 'record': {'name': 'logs_api_http_extension.py', 'state': 'Ready', 'events': ['INVOKE', 'SHUTDOWN']}},
{'time': '2020-11-12T14:55:06.776Z', 'type': 'function', 'record': 'Function: Logging something which logging extension will send to S3\n'}, {'time': '2020-11-12T14:55:06.780Z', 'type': 'platform.end', 'record': {'requestId': '49e64413-fd42-47ef-b130-6fd16f30148d'}}, {'time': '2020-11-12T14:55:06.780Z', 'type': 'platform.report', 'record': {'requestId': '49e64413-fd42-47ef-b130-6fd16f30148d', 'metrics': {'durationMs': 4.96, 'billedDurationMs': 100, 'memorySizeMB': 128, 'maxMemoryUsedMB': 87, 'initDurationMs': 792.41}, 'tracing': {'type': 'X-Amzn-Trace-Id', 'value': 'Root=1-5fad4cc9-70259536495de84a2a6282cd;Parent=67286c49275ac0ad;Sampled=1'}}}]

Lambda has sent specific logs directly to the subscribed extension. The extension has then copied them directly to S3.

For more example log extensions, see the Github repository.

How do extensions receive logs?

Extensions start a local listener endpoint to receive the logs using one of the following protocols:

  1. TCP – Logs are delivered to a TCP port in Newline delimited JSON format (NDJSON).
  2. HTTP – Logs are delivered to a local HTTP endpoint through PUT or POST, as an array of records in JSON format. http://sandbox:${PORT}/${PATH}. The $PATH parameter is optional.

AWS recommends using an HTTP endpoint over TCP because HTTP tracks successful delivery of the log messages to the local endpoint that the extension sets up.

Once the endpoint is running, extensions use the Logs API to subscribe to any of three different logs streams:

  • Function logs that are generated by the Lambda function.
  • Lambda service platform logs (such as the START, END, and REPORT logs in CloudWatch Logs).
  • Extension logs that are generated by extension code.

The Lambda service then sends logs to endpoint subscribers inside of the execution environment only.

Even if an extension subscribes to one or more log streams, Lambda continues to send all logs to CloudWatch.

Performance considerations

Extensions share resources with the function, such as CPU, memory, disk storage, and environment variables. They also share permissions, using the same AWS Identity and Access Management (IAM) role as the function.

Log subscriptions consume memory resources as each subscription opens a new memory buffer to store the logs. This memory usage counts towards memory consumed within the Lambda execution environment.

For more information on resources, security and performance with extensions, see “Introducing AWS Lambda Extensions – In preview”.

What happens if Lambda cannot deliver logs to an extension?

The Lambda service stores logs before sending to CloudWatch Logs and any subscribed extensions. If Lambda cannot deliver logs to the extension, it automatically retries with backoff. If the log subscriber crashes, Lambda restarts the execution environment. The logs extension re-subscribes, and continues to receive logs.

When using an HTTP endpoint, Lambda continues to deliver logs from the last acknowledged delivery. With TCP, the extension may lose logs if an extension or the execution environment fails.

The Lambda service buffers logs in memory before delivery. The buffer size is proportional to the buffering configuration used in the subscription request. If an extension cannot process the incoming logs quickly enough, the buffer fills up. To reduce the likelihood of an out of memory event due to a slow extension, the Lambda service drops records and adds a platform.logsDropped log record to the affected extension to indicate the number of dropped records.

Disabling logging to CloudWatch Logs

Lambda continues to send logs to CloudWatch Logs even if extensions subscribe to the logs stream.

To disable logging to CloudWatch Logs for a particular function, you can amend the Lambda execution role to remove access to CloudWatch Logs.

{
"Version": "2012-10-17",
"Statement": [
    {
        "Effect": "Deny",
        "Action": [
            "logs:CreateLogGroup",
            "logs:CreateLogStream",
            "logs:PutLogEvents"
        ],
        "Resource": [
            "arn:aws:logs:*:*:*"
        ]
    }
  ]
}

Logs are no longer delivered to CloudWatch Logs for functions using this role, but are still streamed to subscribed extensions. You are no longer billed for CloudWatch logging for these functions.

Pricing

Logging extensions, like other extensions, share the same billing model as Lambda functions. When using Lambda functions with extensions, you pay for requests served and the combined compute time used to run your code and all extensions, in 100 ms increments. To learn more about the billing for extensions, visit the Lambda FAQs page.

Conclusion

Lambda extensions enable you to extend the Lambda service to more easily integrate with your favorite tools for monitoring, observability, security, and governance.

Extensions can now subscribe to receive log streams directly from the Lambda service, in addition to CloudWatch Logs. Today, you can install a number of available logging extensions from AWS Lambda Ready Partners and AWS Partners. Extensions make it easier to use your existing tools with your serverless applications.

To try the S3 demo logging extension, follow the instructions in the README.md file in the GitHub repository.

Extensions are now available in preview in all commercial regions other than the China regions.

For more serverless learning resources, visit https://serverlessland.com.

Application integration patterns for microservices: Running distributed RFQs

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/application-integration-patterns-running-distributed-rfqs/

This post is courtesy of Dirk Fröhner, Principal Solutions Architect.

The first blog in this series introduces asynchronous messaging for building loosely coupled systems that can scale, operate, and evolve individually. It considers messaging as a communications model for microservices architectures. Part 2 dives into fan-out strategies and applies the respective patterns to a concrete use case.

In this post, I look at how to apply messaging patterns to help coordinate distributed requests and responses. Specifically, I focus on a composite pattern called scatter-gather, as presented in the book “Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions” (Hohpe and Woolf, 2004).

I also show how a client can communicate with a backend via synchronous REST API operations while asynchronous messaging is applied internally for processing.

Overview

The use case is for Wild Rydes, a fictional application that replaces traditional taxis with unicorns. It’s used in several hands-on AWS workshops that illustrate serverless development concepts.

Wild Rydes wants to allow customers to initiate requests for quotation (RFQs) for their rides. This allows unicorns to make special offers to potential customers within a defined schedule. A customer can send their ride details and ask for quotations from all unicorns that are within a certain vicinity. The customer can then choose the best offer.

Wild Rydes

The scatter-gather pattern

The scatter-gather pattern can be used to implement this use-case on the server side. This pattern is ideal for requesting responses from multiple parties, then aggregating and processing that data.

As presented by Hohpe and Woolf, the scatter-gather pattern is a composite pattern that illustrates how to “broadcast a message to multiple recipients and re-aggregate the responses back into a single message”. The pattern is illustrated in the following diagram.

Scatter-gather architecture

The flow starts with the Requester to initiate the broadcast to all potential Responders. This can be architected in a loosely coupled manner using pub-sub messaging with Amazon SNS or Amazon MQ, as shown in this blog post.

All responders must send their answers somewhere for aggregation and processing. This can also be architected in a loosely coupled manner using a message queue with Amazon SQS or Amazon MQ, as described in this blog post.

The Aggregator component consumes the individual responses from the response queue. It forwards the aggregate to the Processor component for final processing. Both Aggregator and Processor can be part of the same application or process. If separated, they can be decoupled through messaging. The Requester can also be part of the same application or process as Aggregator and Processor.

Explaining the architecture and API

In this section, I walk through the use-case and explain how it can be architected and implemented. I show how the scatter-gather pattern works in the backend, and the client-to-backend communication.

Submit instant ride RFQ

To initiate such an RFQ, the customer app communicates with the ride booking service on the backend. The ride booking service exposes a REST API. By default, an RFQ runs for five minutes, but Wild Rydes is working on a feature to let a customer individually set that value.

A request to submit an instant-ride RFQ contains start and destination locations for the ride and the customer ID:

POST /<submit-instant-ride-rfq-resource-path> HTTP/1.1
...

{
    "from": "...",
    "to": "...",
    "customer": "..."
}

The RFQ is a lengthy process so the client app should not expect an immediate response. Instead, the API accepts the RFQ, creates an RFQ task resource, and returns to the client. The response contains a URL to request an update for the status. It also provides an estimated time for the end of the RFQ:

HTTP/1.1 202 Accepted
...

{
    "links": {
        "self": "http://.../<rfq-task-resource-path>",
        "...": "..."
    },
    "status": "running",
    "eta": "..."
}

The following architecture shows this interaction, excluding the process after a new RFQ is submitted.

Client app interaction

Processing the RFQ

The backend uses the scatter-gather pattern to publish the RFQ to unicorns and collect responses for aggregation and processing.

Backend architecture

1. The ride booking service acts as the requester in the scatter-gather pattern. Following a new RFQ from the client app, it publishes the details into an SNS topic. This topic is related to the location of the ride’s starting point since customers need quotes from unicorns within the vicinity. These messages are the green request messages.

2. The unicorn management service maintains instances of unicorn management resources and subscribes them to RFQ topics related to their current location. These resources receive the RFQ request messages and handle the interaction with the Wild Rydes unicorn app.

3. The unicorns in the vicinity are notified through the Wild Rydes unicorn app about the new RFQ and can react if they are available. Notification options between the unicorn management service and the Wild Rydes unicorn app include push notifications and web sockets.

4. Every addressed unicorn can now submit their quote. All quotes go back through the unicorn management resources and the unicorn management service into the RFQ response queue. They act as the responders in the sense of the scatter-gather pattern.

5. The ride booking service also acts as aggregator and processor in the sense of the scatter-gather pattern. It uses SQS to consume messages from an RFQ response queue that eventually contains the RFQ responses from the involved unicorns. It starts doing so immediately after it publishes the details of a new RFQ into the RFQ topic. The messages from the RFQ response queue relate to the blue response messages.

The ride booking service consumes all incoming responses from that queue. This continues until the deadline or all participating unicorns have answered, whatever occurs first. The aggregator responsibility can be as simple as persisting the details of each incoming RFQ response into an Amazon DynamoDB table.

To match incoming responses to the right RFQ, it uses a fundamental integration pattern, correlation ID. In this pattern, a requester adds a unique ID to an outgoing message and each responder is asked to forward this ID in their response.

Also, responders must know where to send their responses to. To keep this dynamic, there is another fundamental integration pattern: return address. It suggests that a requester adds meta information into outgoing messages that indicate the address for their responses. In this architecture, this is the ARN of the SQS queue that acts as the RFQ response queue. This supports an option to simplify the response management: the RFQ response queue is a dedicated queue per customer.

Lastly, the processor responsibility in the ride booking service reads the RFQ responses from the DynamoDB table. It converts the data to JSON for the Wild Rydes customer app.

Check RFQ status

During the RFQ processing, a customer may want to know how many responses have already arrived, or if the results are already available. After submitting an instant ride RFQ, the client receives a representation of the running task. It can use the self-link to request an update:

GET /<rfq-task-resource-path> HTTP/1.1

While the task is running, a response from the ride booking service comes back with the respective status value and the count of responses that have already arrived:

HTTP/1.1 200 OK
...

{
    "links": {
        "self": "http://.../<rfq-task-resource-path>",
        "...": "..."
    },
    "status": "running",
    "responses-received": 2,
    "eta": "..."
}

After the RFQ is completed

An RFQ is completed if either the time is up or all unicorns have answered. The result of the RFQ is then available to the customer. If the client requests an update to the task representation, the response indicates this by redirecting to the RFQ result:

HTTP/1.1 303 See Other
Location: <url-of-rfq-result-resource>

Requesting a representation of the results resource, the client receives the quotes of all the participating unicorns. The frontend customer app can visualize these accordingly:

HTTP/1.1 200 OK
...

{
    "links": { ... },
    "from": "...",
    "to": "...",
    "customer": "...",
    "quotes": [ ... ]
}

The ride booking service can also use means of active notifications to make the customer app aware once the RFQ result is ready, including the link to the RFQ result. Examples for this include push notifications and web sockets.

Conclusion

In this blog, I present the scatter-gather pattern, which is a composite pattern based on pub-sub and point-to-point messaging channels. It also employs correlation ID and return address. I show how this is implemented in the Wild Rydes example application. You can use this integration pattern for communication in your microservices.

I cover how synchronous API communication between end user client and backend can work along with asynchronous messaging for request processing internally.

To learn more:

For more serverless learning resources, visit https://serverlessland.com.

Building Serverless Land: Part 2 – An auto-building static site

Post Syndicated from Benjamin Smith original https://aws.amazon.com/blogs/compute/building-serverless-land-part-2-an-auto-building-static-site/

In this two-part blog series, I show how serverlessland.com is built. This is a static website that brings together all the latest blogs, videos, and training for AWS serverless. It automatically aggregates content from a number of sources. The content exists in a static JSON file, which generates a new static site each time it is updated. The result is a low-maintenance, low-latency serverless website, with almost limitless scalability.

A companion blog post explains how to build an automated content aggregation workflow to create and update the site’s content. In this post, you learn how to build a static website with an automated deployment pipeline that re-builds on each GitHub commit. The site content is stored in JSON files in the same repository as the code base. The example code can be found in this GitHub repository.

The growing adoption of serverless technologies generates increasing amounts of helpful and insightful content from the developer community. This content can be difficult to discover. Serverless Land helps channel this into a single searchable location. By collating this into a static website, users can enjoy a browsing experience with fast page load speeds.

The serverless nature of the site means that developers don’t need to manage infrastructure or scalability. The use of AWS Amplify Console to automatically deploy directly from GitHub enables a regular release cadence with a fast transition from prototype to production.

Static websites

A static site is served to the user’s web browser exactly as stored. This contrasts to dynamic webpages, which are generated by a web application. Static websites often provide improved performance for end users and have fewer or no dependant systems, such as databases or application servers. They may also be more cost-effective and secure than dynamic websites by using cloud storage, instead of a hosted environment.

A static site generator is a tool that generates a static website from a website’s configuration and content. Content can come from a headless content management system, through a REST API, or from data referenced within the website’s file system. The output of a static site generator is a set of static files that form the website.

Serverless Land uses a static site generator for Vue.js called Nuxt.js. Each time content is updated, Nuxt.js regenerates the static site, building the HTML for each page route and storing it in a file.

The architecture

Serverless Land static website architecture

When the content.json file is committed to GitHub, a new build process is triggered in AWS Amplify Console.

Deploying AWS Amplify

AWS Amplify helps developers to build secure and scalable full stack cloud applications. AWS Amplify Console is a tool within Amplify that provides a user interface with a git-based workflow for hosting static sites. Deploy applications by connecting to an existing repository (GitHub, BitBucket Cloud, GitLab, and AWS CodeCommit) to set up a fully managed, nearly continuous deployment pipeline.

This means that any changes committed to the repository trigger the pipeline to build, test, and deploy the changes to the target environment. It also provides instant content delivery network (CDN) cache invalidation, atomic deploys, password protection, and redirects without the need to manage any servers.

Building the static website

  1. To get started, use the Nuxt.js scaffolding tool to deploy a boiler plate application. Make sure you have npx installed (npx is shipped by default with npm version 5.2.0 and above).
    $ npx create-nuxt-app content-aggregator

    The scaffolding tool asks some questions, answer as follows:Nuxt.js scaffolding tool inputs

  2. Navigate to the project directory and launch it with:
    $ cd content-aggregator
    $ npm run dev

    The application is now running on http://localhost:3000.The pages directory contains your application views and routes. Nuxt.js reads the .vue files inside this directory and automatically creates the router configuration.

  3. Create a new file in the /pages directory named blogs.vue:$ touch pages/blogs.vue
  4. Copy the contents of this file into pages/blogs.vue.
  5. Create a new file in /components directory named Post.vue :$ touch components/Post.vue
  6. Copy the contents of this file into components/Post.vue.
  7. Create a new file in /assets named content.json and copy the contents of this file into it.$ touch /assets/content.json

The blogs Vue component

The blogs page is a Vue component with some special attributes and functions added to make development of your application easier. The following code imports the content.json file into the variable blogPosts. This file stores the static website’s array of aggregated blog post content.

import blogPosts from '../assets/content.json'

An array named blogPosts is initialized:

data(){
    return{
      blogPosts: []
    }
  },

The array is then loaded with the contents of content.json.

 mounted(){
    this.blogPosts = blogPosts
  },

In the component template, the v-for directive renders a list of post items based on the blogPosts array. It requires a special syntax in the form of blog in blogPosts, where blogPosts is the source data array and blog is an alias for the array element being iterated on. The Post component is rendered for each iteration. Since components have isolated scopes of their own, a :post prop is used to pass the iterated data into the Post component:

<ul>
  <li v-for="blog in blogPosts" :key="blog">
     <Post :post="blog" />
  </li>
</ul>

The post data is then displayed by the following template in components/Post.vue.

<template>
    <div class="hello">
      <h3>{{ post.title }} </h3>
      <div class="img-holder">
          <img :src="post.image" />
      </div>
      <p>{{ post.intro }} </p>
      <p>Published on {{post.date}}, by {{ post.author }} p>
      <a :href="post.link"> Read article</a>
    </div>
</template>

This forms the framework for the static website. The /blogs page displays content from /assets/content.json via the Post component. To view this, go to http://localhost:3000/blogs in your browser:

The /blogs page

Add a new item to the content.json file and rebuild the static website to display new posts on the blogs page. The previous content was generated using the aggregation workflow explained in this companion blog post.

Connect to Amplify Console

Clone the web application to a GitHub repository and connect it to Amplify Console to automate the rebuild and deployment process:

  1. Upload the code to a new GitHub repository named ‘content-aggregator’.
  2. In the AWS Management Console, go to the Amplify Console and choose Connect app.
  3. Choose GitHub then Continue.
  4. Authorize to your GitHub account, then in the Recently updated repositories drop-down select the ‘content-aggregator’ repository.
  5. In the Branch field, leave the default as master and choose Next.
  6. In the Build and test settings choose edit.
  7. Replace - npm run build with – npm run generate.
  8. Replace baseDirectory: / with baseDirectory: dist

    This runs the nuxt generate command each time an application build process is triggered. The nuxt.config.js file has a target property with the value of static set. This generates the web application into static files. Nuxt.js creates a dist directory with everything inside ready to be deployed on a static hosting service.
  9. Choose Save then Next.
  10. Review the Repository details and App settings are correct. Choose Save and deploy.

    Amplify Console deployment

Once the deployment process has completed and is verified, choose the URL generated by Amplify Console. Append /blogs to the URL, to see the static website blogs page.

Any edits pushed to the repository’s content.json file trigger a new deployment in Amplify Console that regenerates the static website. This companion blog post explains how to set up an automated content aggregator to add new items to the content.json file from an RSS feed.

Conclusion

This blog post shows how to create a static website with vue.js using the nuxt.js static site generator. The site’s content is generated from a single JSON file, stored in the site’s assets directory. It is automatically deployed and re-generated by Amplify Console each time a new commit is pushed to the GitHub repository. By automating updates to the content.json file you can create low-maintenance, low-latency static websites with almost limitless scalability.

This application framework is used together with this automated content aggregator to pull together articles for http://serverlessland.com. Serverless Land brings together all the latest blogs, videos, and training for AWS Serverless. Download the code from this GitHub repository to start building your own automated content aggregation platform.

Archiving and replaying events with Amazon EventBridge

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/archiving-and-replaying-events-with-amazon-eventbridge/

Amazon EventBridge is a serverless event bus used to decouple event producers and consumers. Event producers publish events onto an event bus, which then uses rules to determine where to send those events. The rules determine the targets, and EventBridge routes the events accordingly.

In event-driven architectures, it can be useful for services to access past events. This has previously required manual logging and archiving, and creating a mechanism to parse files and put events back on the event bus. This can be complex, since you may not have access to the applications that are publishing the events.

With the announcement of event replay, EventBridge can now record any events processed by any type of event bus. Replay stores these recorded events in archives. You can choose to record all events, or filter events to be archived by using the same event pattern matching logic used in rules.

Architectural overview

You can also configure a retention policy for an archive to store data either indefinitely or for a defined number of days. You can now easily configure logging and replay options for events created by AWS services, your own applications, and integrated SaaS partners.

Event replay can be useful for a number of different use-cases:

  • Testing code fixes: after fixing bugs in microservices, being able to replay historical events provides a way to test the behavior of the code change.
  • Testing new features: using historical production data from event archives, you can measure the performance of new features under load.
  • Hydrating development or test environments: you can replay event archives to hydrate the state of test and development environments. This helps provide a more realistic state that approximates production.

This blog post shows you how to create event archives for an event bus, and then how to replay events. I also cover some of the important features and how you can use these in your serverless applications.

Creating event archives

To create an event archive for an event bus:

  1. Navigate to the EventBridge console and select Archives from the left-hand submenu. Choose the Create Archive button.Archives sub-menu
  2. In the Define archive details page:
    1. Enter ‘my-event-archive’ for Name and provide an optional description.
    2. Select a source bus from the dropdown (choose default if you want to archive AWS events).
    3. For retention period, enter ‘30’.
    4. Choose Next.Define archive details
  3. In the Filter events page, you can provide an event pattern to archive a subset of events. For this walkthrough, select No event filtering and choose Create archive.Filter events
  4. In the Archives page, you can see the new archive waiting to receive events.Archives page
  5. Choose the archive to open the details page. Over time, as more events are sent to the bus, the archive maintains statistics about the number and size of events stored.

You can also create archives using AWS CloudFormation. The following example creates an archive that filters for a subset of events with a retention period of 30 days:

Type: AWS::Events::Archive
Properties: 
  Description: My filtered archive.
  EventPattern:
    source:
      -	"my-app-worker-service"
  RetentionDays: 30
  SourceArn: arn:aws:events:us-east-1:123456789012:event-bus/my-custom-application

How this works

Archives are always sourced from a single event bus. Once you have created an archive, it appears on the event bus details page:

Event bus details

You can make changes to an archive definition once it is created. If you shorten the duration, this deletes any events in the archives that are earlier than the new retention period. This deletion process occurs after a period of time and is not immediate. If you extend the duration, this affects event collection from the current point, but does not restore older events.

Each time you create an archive, this automatically generates a rule on the event bus. This is called a managed rule, which is created, updated, and deleted by the EventBridge service automatically. This rule does not count towards the default 300 rules per event bus service quota.

Rules page

When you open a managed rule, the configuration is read-only.

Managed rule configuration

This configuration shows an event pattern that is applied to all incoming events, including those that may be replayed from archives. The event pattern excludes events containing a replay-name attribute, which prevents replayed events from being archived multiple times.

Replaying archived events

To replay an archive of events:

  1. Navigate to the EventBridge console and select Replays from the left-hand submenu. Choose the Create Archive button.Replays menu
  2. In the Start new replay page:
    1. Enter ‘my-event-replay’ for Name and provide an optional description.
    2. Select a source bus from the dropdown. This must match the source bus for the event archive.
    3. For Specify rule(s), select All rules.
    4. Enter a time frame for the replay. This is the ingestion time for the first and last events in the archive.
    5. Choose Start Replay.
  3. The Replays page shows the new replay in Starting status.New replay status

How this works

When a replay is started, the service sends the archived event back to the original event bus. It processes these as quickly as possible, with no ordering guarantees. The replay process adds a “replay-name” attribute to the original event. This is the flow of events:

Flow of archived events

  1. The original event is sent to the event bus. It is received by any existing rules and the managed rule creating the archive. The event is saved to the event archive.
  2. When the archived event is replayed, the JSON object includes the replay-name attribute. The existing rules process the event as in the first step. Since the managed rule does not match the replayed event, it is not forwarded to the archive.

Showing additional replay fields

In the Replays console, choose the preferences cog icon to open the Preferences dialog box.

Setting replay preferences

From here, you can add:

  • Event start time and end time: Timestamps for the earliest and latest events in the archive that was replayed.
  • Replay start time and end time: shows the time filtering parameters set for the listed replay.
  • Last replayed: a timestamp of when the final replay event occurred.

You can sort on any of these additional fields.

Sorting on replay fields

Advanced routing of replayed events

In this simple example, a replayed archive matches the same rules that the original events triggered. Additionally, replayed events must be sent to the original bus where they were archived from. As a result, a basic replay allows you to duplicate events and copy the rule matching behaviors that occurred originally.

However, you may want to trigger different rules for replayed events or send the events to another bus. You can make use of the replay-name attribute in your own rules to add this advanced routing functionality. By creating a rule that filters for the presence of the “replay-name” event, it ignores all events that are not replays. When you create the replay, instead of targeting all rules on the bus, only target this one rule.

Routing of replayed events

  1. The original event is put on the event bus. The replay rule is evaluated but does not match.
  2. The event is played from the archive, targeting only the replay rule. All other rules are excluded automatically by the replay service. The replay rule matches and forwards events onto the rule’s targets.

The target of the replay rule may be typical rule target, including an AWS Lambda function for customized processing, or another event bus.

Conclusion

In event-driven architectures, it can be useful for services to access past events. The new event replay feature in Amazon EventBridge enables you to automatically archive and replay events on an event bus. This can help for testing new features or new code, or hydrating services in development and test to more closely approximate a production environment.

This post shows how to create and replay event archives. It discusses how the archives work, and how you can implement these in your own applications. To learn more about using Amazon EventBridge, visit the learning path for videos, blogs, and other resources.

For more serverless learning resources, visit Serverless Land.

Using Amazon MQ as an event source for AWS Lambda

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/using-amazon-mq-as-an-event-source-for-aws-lambda/

Amazon MQ is a managed, highly available message broker for Apache ActiveMQ. The service manages the provisioning, setup, and maintenance of ActiveMQ. Now, with Amazon MQ as an event source for AWS Lambda, you can process messages from the service. This allows you to integrate Amazon MQ with downstream serverless workflows.

ActiveMQ is a popular open source message broker that uses open standard APIs and protocols. You can move from any message broker that uses these standards to Amazon MQ. With Amazon MQ, you can deploy production-ready broker configurations in minutes, and migrate existing messaging services to the AWS Cloud. Often, this does not require you to rewrite any code, and in many cases you only need to update the application endpoints.

In this blog post, I explain how to set up an Amazon MQ broker and networking configuration. I also show how to create a Lambda function that is invoked by messages from Amazon MQ queues.

Overview

Using Amazon MQ as an event source operates in a similar way to using Amazon SQS or Amazon Kinesis. In all cases, the Lambda service internally polls for new records or messages from the event source, and then synchronously invokes the target Lambda function. Lambda reads the messages in batches and provides these to your function as an event payload.

Lambda is a consumer application for your Amazon MQ queue. It processes records from one or more partitions and sends the payload to the target function. Lambda continues to process batches until there are no more messages in the topic.

The Lambda function’s event payload contains an array of records in the messages attribute. Each array item contains envelope details of the message, together with the base64 encoded message in the data attribute:

Base64 encoded data in payload

How to configure Amazon MQ as an event source for Lambda

Amazon MQ is a highly available service, so it must be configured to run in a minimum of two Availability Zones in your preferred Region. You can also run a single broker in one Availability Zone for development and test purposes. In this walk through, I show how to run a production, public broker and then configure an event source mapping for a Lambda function.

MQ broker architecture

There are four steps:

  • Configure the Amazon MQ broker and security group.
  • Create a queue on the broker.
  • Set up AWS Secrets Manager.
  • Build the Lambda function and associated permissions.

Configuring the Amazon MQ broker and security group

In this step, you create an Amazon MQ broker and then configure the broker’s security group to allow inbound access on ports 8162 and 61617.

  1. Navigate to the Amazon MQ console and choose Create brokers.
  2. In Step 1, keep the defaults and choose Next.
  3. In Configure settings, in the ActiveMQ Access panel, enter a user name and password for broker access.ActiveMQ access
  4. Expand the Additional settings panel, keep the defaults, and ensure that Public accessibility is set to Yes. Choose Create broker.Public accessibility setting
  5. The creation process takes up to 15 minutes. From the Brokers list, select the broker name. In the Details panel, choose the Security group.Security group setting
  6. On the Inbound rules tab, choose Edit inbound rules. Add rules to enable inbound TCP traffic on ports 61617 and 8162:Editing inbound rules
  • Port 8162 is used to access the ActiveMQ Web Console to configure the broker settings.
  • Port 61667 is used by the internal Lambda poller to connect with your broker, using the OpenWire endpoint.

Create a queue on the broker

The Lambda service subscribes to a queue on the broker. In this step, you create a new queue:

  1. Navigate to the Amazon MQ console and choose the newly created broker. In the Connections panel, locate the URLs for the web console.ActiveMQ web console URLs
  2. Only one endpoint is active at a time. Select both and one resolves to the ActiveMQ Web Console application. Enter the user name and password that you configured earlier.ActiveMQ Web Console
  3. In the top menu, select Queues. For Queue Name, enter myQueue and choose Create. The new queue appears in the Queues list.Creating a queue

Keep this webpage open, since you use this later for sending messages to the Lambda function.

Set up Secrets Manager

The Lambda service needs access to your Amazon MQ broker, using the user name and password you configured earlier. To avoid exposing secrets in plaintext in the Lambda function, it’s best practice to use a service like Secrets Manager. To create a secret, use the create-secret AWS CLI command. To do this, ensure you have the AWS CLI installed.

From a terminal window, enter this command, replacing the user name and password with your own values:

aws secretsmanager create-secret --name MQaccess --secret-string '{"username": "your-username", "password": "your-password"}'

The command responds with the ARN of the stored secret:

Secrets Manager CLI response

Build the Lambda function and associated permissions

The Lambda must have permission to access the Amazon MQ broker and stored secret. It must also be able to describe VPCs and security groups, and manage elastic network interfaces. These execution roles permissions are:

  • mq:DescribeBroker
  • secretsmanager:GetSecretValue
  • ec2:CreateNetworkInterface
  • ec2:DescribeNetworkInterfaces
  • ec2:DescribeVpcs
  • ec2:DeleteNetworkInterface
  • ec2:DescribeSubnets
  • ec2:DescribeSecurityGroups

If you are using an encrypted customer managed key, you must also add the kms:Decrypt permission.

To set up the Lambda function:

  1. Navigate to the Lambda console and choose Create Function.
  2. For function name, enter MQconsumer and choose Create Function.
  3. In the Permissions tab, choose the execution role to edit the permissions.Lambda function permissions tab
  4. Choose Attach policies then choose Create policy.
  5. Select the JSON tab and paste the following policy. Choose Review policy.
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "mq:DescribeBroker",
                    "secretsmanager:GetSecretValue",
                    "ec2:CreateNetworkInterface",
                    "ec2:DescribeNetworkInterfaces",
                    "ec2:DescribeVpcs",
                    "ec2:DeleteNetworkInterface",
                    "ec2:DescribeSubnets",
                    "ec2:DescribeSecurityGroups",
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents"
                ],
                "Resource": "*"
            }
        ]
    }
    

    Create IAM policy

  6. For name, enter ‘AWSLambdaMQExecutionRole’. Choose Create policy.
  7. In the IAM Summary panel, choose Attach policies. Search for AWSLambdaMQExecutionRole and choose Attach policy.Attaching the IAM policy to the role
  8. On the Lambda function page, in the Designer panel, choose Add trigger. Select MQ from the drop-down. Choose the broker name, enter ‘myQueue’ for Queue name, and choose the secret ARN. Choose Add.Add trigger to Lambda function
  9. The status of the Amazon MQ trigger changes from Creating to Enabled after a couple of minutes. The trigger configuration confirms the settings.Trigger configuration settings

Testing the event source mapping

  1. In the ActiveMQ Web Console, choose Active consumers to confirm that the Lambda service has been configured to consume events.Active consumers
  2. In the main dashboard, choose Send To on the queue. For Number of messages to send, enter 10 and keep the other defaults. Enter a test message then choose Send.Send message to queue
  3. In the MQconsumer Lambda function, select the Monitoring tab and then choose View logs in CloudWatch. The log streams show that the Lambda function has been invoked by Amazon MQ.Lambda function logs

A single Lambda function consumes messages from a single queue in an Amazon MQ broker. You control the rate of message processing using the Batch size property in the event source mapping. The Lambda service limits the concurrency to one execution environment per queue.

For example, in a queue with 100,000 messages and a batch size of 100 and function duration of 2000 ms, the Monitoring tab shows this behavior. The Concurrent executions graph remains at 1 as the internal Lambda poller fetches messages. It continues to invoke the Lambda function until the queue is empty.

CloudWatch metrics for consuming function

Using AWS SAM

In AWS SAM templates, you can configure a Lambda function with an Amazon MQ event source mapping and the necessary permissions. For example:

Resources:
  ProcessMSKfunction:
    Type: AWS::Serverless::Function 
    Properties:
      CodeUri: code/
      Timeout: 3
      Handler: app.lambdaHandler
      Runtime: nodejs12.x
      Events:
  MQEvent:
    Type: MQ
    Properties:
      BatchSize: 100
      Stream: arn:aws:mq:us-east-1:123456789012:broker:myMQbroker:b-bf02ad26-cc1a-4598-aa0d-82f2d88eb2ae
      QueueName:
        - myQueue
Policies:
  - Statement:
    - Effect: Allow
      Resource: '*'
      Action:
      - mq:DescribeBroker
      - secretsmanager:GetSecretValue
      - ec2:CreateNetworkInterface
      - ec2:DescribeNetworkInterfaces
      - ec2:DescribeVpcs
      - ec2:DeleteNetworkInterface
      - ec2:DescribeSubnets
      - ec2:DescribeSecurityGroups
      - logs:CreateLogGroup
      - logs:CreateLogStream
      - logs:PutLogEvents

Conclusion

Amazon MQ provide a fully managed, highly available message broker service for Apache ActiveMQ. Now Lambda supports Amazon MQ as an event source, you can invoke Lambda functions from messages in Amazon MQ queues to integrate into your downstream serverless workflows.

In this post, I give an overview of how to set up an Amazon MQ broker. I show how to configure the networking and create the event source mapping with Lambda. I also show how to set up a consumer Lambda function in the AWS Management Console, and refer to the equivalent AWS SAM syntax to simplify deployment.

To learn more about how to use this feature, read the documentation. For more serverless learning resources, visit https://serverlessland.com.

Using shared memory for low-latency, intra-node communication in AWS Batch

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/using-shared-memory-for-low-latency-intra-node-communication-in-aws-batch/

This post is courtesy of Dario La Porta, Senior Consultant, HPC.

AWS Batch enables developers, scientists, and engineers to run hundreds of thousands of HPC jobs in AWS. By managing the provisioning of computing resources, this allows you to focus on your core business. Shared memory support is a new feature that can help improve overall performance.

This post explains the shared memory paradigm and how it can help you improve the performance of your single and multi-node applications. Performance gains can also help you to reduce the total runtime of your jobs and therefore reduce the overall cost.

The second part of the post shows you how to use shared memory in AWS Batch both in the AWS Management Console and the AWS CLI. Finally, I show the performance gains that are made possible with shared memory usage by walking through a benchmarking analysis with OSU Micro-Benchmarks and GROMACS.

Shared memory paradigm

Advanced, compute-intensive workloads require high-performance hardware to use scalability to deliver results. The Amazon EC2 C5n instance type provides cost-efficient, high-performance hardware with a configurable number of cores.

HPC workloads use algorithms that require parallelization and a low latency communication between the different processes. The two main technologies used for the parallel communications are message-passing with distributed memory and shared memory.

Message Passing Interface (MPI) is a message-passing standard used for the communication in a parallel distributed environment. Elastic Fabric Adapter (EFA) enables your MPI applications to use low-latency, inter-node communication.

The shared memory paradigm allows multiple processors in the same system to communicate using a memory (RAM) portion that is shared between the processes. This method takes advantage of the high-speed memory bus.

Shared memory paradigm

MPI with intra-node shared memory communication

The two main MPI implementations, OpenMPI and Intel MPI, enable an intra-node shared memory communication in a distributed compute environment. When configured, you take advantage of the EFA libfabic implementation having consistent and reduced latency. This results in higher throughput than the TCP transport for the intra-node communication. From libfabric 1.9 onwards, the shared memory support has been directly added to the EFA provider. You no longer need to perform any modification to the OFI MTL.

MPI jobs in AWS Batch

AWS Batch enables the execution of MPI jobs using a multi-node configuration. First, a job definition is created that enables the execution of the job in multiple nodes. To learn how to create this definition, see Creating a Multi-node Parallel Job Definition.

To take advantage of the EFA capabilities, select a supported instance type and read Leveraging Elastic Fabric Adapter to run HPC and ML Workloads on AWS Batch. This post shows how to create the necessary resources in AWS Batch and run your first job with EFA.

Shared memory in AWS Batch

The new AWS Batch console interface enables you to configure the shared memory of the container inside the Job Definition. To see this, expand the Additional configuration in the Container properties section:

Container properties

The Linux parameters section contains the Shared memory size parameter in MB.

Linux parameters

You can set the same configuration in the AWS CLI by passing JSON parameters to the RegisterJobDefinition API:

"linuxParameters": {
    "sharedMemorySize": integer
}

When you run the job, it creates a shared memory area on each node that uses two or more processes. The shared memory area cannot be changed during the execution of the job. The size of the shared memory area is determined by the number of cores available in the node and the application requirements. For most jobs, a suggested initial value is 4096 MB.

Modern Linux kernels support a POSIX shared memory API. You can inspect the size of the container shared memory using the df -h /dev/shm command. The output can help you determine the shared memory space needed for your job.

Benchmarks

The following section compares the different performance using shared memory with Intel MPI 2019 update 7 and EFA.

The instance type used for the benchmark is the c5n.18xlarge and, for the multi-node use case, a cluster placement group. This compares the performance increase from shared memory versus using pure EFA communication. The first benchmark focuses on the latency of the communication in a single node use case.

OSU Micro-Benchmarks is a suite of benchmarks for measuring and evaluating the performance of MPI operations. The specific test case used is the osu_latency, measuring the minimum, maximum and average latency of a ping-pong communication between a sender and a receiver. Specifically, this is where the message sender waits for the reply from the receiver. The benchmark uses a variety of data sizes to report the average one-way latency.

OSU benchmark results

The chart shows the latency in μs on the horizontal axis and packet size in bits on the vertical axis. The result shows a decrease in the communication latency using shared memory for the intra-node communication compared with using only EFA. The following chart shows the latency improvement:

Latency improvement graph

The next benchmark demonstrates how shared memory can also increase the performance in a multi-node configuration. The test application is GROMACS, a versatile package to perform molecular dynamics. The overall performance of the application is susceptible to communication latency variance.

Gromacs performance

The code for the test has been downloaded from the Unified European Applications Benchmark Suite. The specific use case is named lignocellulose-rf and it uses the Reaction field for electrostatics. The details and the download link can be found in the UEABS repository.

The benchmark uses one thread per core and the following mdrun parameters:

-maxh 0.50 -resethway -noconfout -nsteps 10000 -dlb yes -nstlist 100 -pin on

The compilation options and the parameters configuration are explained in the README file. The test is run on a c5n.18xlarge instance, instead of a GPU instance, to focus on measuring the performance improvement caused specifically by increasing the number of total cores of the simulation. The chart explains the performance gain (measure in ns/day) that are achieved by increasing the number of cores. This is possible by using the shared memory for the intra-node communication during the simulation instead of using only EFA networking.

The following chart illustrates a significant percentage performance improvement by using the shared memory:

Shared memory performance improvement

Conclusion

In this post, I show how the new shared memory support in AWS Batch is able to improve performance while decreasing the latency of the intra-node communication. This performance gain can also lower the cost of running jobs overall.

I show how to enable the usage of the shared memory in AWS Batch from the AWS Management Console or the AWS CLI. I also highlight the performance gain from using shared memory with the high-speed memory bus of the c5n.18xlarge instance, using benchmarking analysis with OSU Micro-Benchmarks and GROMACS.

AWS Batch multi-node parallel jobs are now even more performant with EFA and shared memory configurations, enabling you to focus more on your applications and less on tuning. In addition, the Elastic Fabric Adapter (EFA) has a more consistent latency and higher throughput than the TCP transport for the inter-node communication.

To learn more about using this feature, visit the Getting Started guide.