All posts by James Beswick

Creating faster AWS Lambda functions with AVX2

2020-11-24 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/creating-faster-aws-lambda-functions-with-avx2/

Customers use AWS Lambda to build a wide range of applications, including mission-critical and compute-intensive applications. The most demanding workloads include machine learning inferencing, media processing, high performance computing (HPC), scientific simulations, and financial modeling. With the release of Advanced Vector Extensions 2 (AVX2) support for Lambda, builders can benefit from improved performance for these types of applications.

Overview

This blog post explains AVX2 and how you can take advantage of this instruction set in your Lambda functions. I walk through an example of how to enhance performance of a typical use case using AVX2 and measure the performance gain. This feature is available for new or existing Lambda-based workloads at no additional cost.

AVX2 provides extensions to the x86 instruction set architecture. This is a Single Instruction Multiple Data (SIMD) instruction set that enables running a set of highly parallelizable operations simultaneously. AVX2 allows CPUs to perform a higher number of integer and floating-point operations per clock cycle. For vectorizable algorithms, this can enhance performance resulting in lower latencies and higher throughput.

Implementing AVX2 in an example application

Pillow is a popular Python-based imaging library. It provides powerful image manipulation functions that use computationally complex processes. Computer vision operations such as convolution resampling can benefit from parallelization. This is because the filters are applied on different windows that are independent and can be processed in parallel. In this section, I compare the performance of an image transformation after applying AVX2 instructions.

The following example downloads an original JPEG object from an Amazon S3 bucket, resizes the image, and then saves the result to another S3 bucket. There are three resizing filters used – bilinear, bicubic, and Lanczos.

import boto3
import os

from PIL import Image

# Download the image to /tmp
s3 = boto3.client('s3')
s3.download_file('my-input-bucket', 'photo.jpeg', '/tmp/photo.jpeg')

def lambda_handler(event, context):
    # Open image and perform resize
    image = Image.open('/tmp/photo.jpeg')

    # Select one of the three algorithms
    image = image.resize((256, 128), Image.BILINEAR) 
    # image = image.resize((256, 128), Image.BICUBIC) 
    # image = image.resize((256, 128), Image.LANCZOS) 
    
    # Save and upload to S3
    image.save('/tmp/thumbnail.jpeg', 'JPEG')
    s3.upload_file('/tmp/thumbnail.jpeg', 'my-output-bucket', 'thumbnail.jpeg')
    
    return "Success!"

To convert code to use AVX2, you must recompile the source code with the appropriate flags, or use packages and dependencies optimized for AVX2. In this example, you can use a production-ready fork of Pillow called pillow-simd. When compiled for AVX2, it uses the AVX2 instructions to accelerate many of the features in Pillow.

First, you must compile the library using the same Amazon Linux AMI and kernel version that is used by the Lambda service. To do this, use an EC2 or AWS Cloud9 instance running Amazon Linux 2, or using a Docker container with a Lambda-supported image. Compile the library using the following commands:

# Install dependencies
~ yum install -y \
    freetype-devel \
    gcc \
    ghostscript \
    lcms2-devel \
    libffi-devel \
    libimagequant-devel \
    libjpeg-devel \
    libraqm-devel \
    libtiff-devel \
    libwebp-devel \
    make \
    openjpeg2-devel \
    rh-python36 \
    rh-python36-python-virtualenv \
    sudo \
    tcl-devel \
    tk-devel \
    tkinter \
    which \
    xorg-x11-server-Xvfb \
    zlib-devel \
    && yum clean all

# Compile code with AVX2 flag
CC="cc -mavx2" pip install --force-reinstall --no-cache-dir -t . --compile  pillow-simd

You can then use this compiled, AVX2-compatible version of the Pillow library in a Lambda function. This can be bundled with the deployment code or you can deploy the library as a Lambda layer. Depending on the version, you may have to include the following binaries from the `/usr/lib64` directory with the function. If so, add this location to LD_LIBRARY_PATH so the binaries are discoverable:

cp /usr/lib64/libtiff.so.5 lib/libtiff.so.5
cp /usr/lib64/libjpeg.so.62 lib/libjpeg.so.62
cp /usr/lib64/libjbig.so.2.0 lib/libjbig.so.2.0

In this test, I compare the performance of both the original function and the AVX2-optimized version with 1024 MB of memory. The test uses the following image:

Source: https://unsplash.com/photos/IMXhx6qhvf0. Photo credit: Daniel Seßler.

Bilinear filter.
Bicubic filter.
Lanczos filter.

The timings exclude S3 transfers and only compare the image transformation operation. The results of the three resize operations are:

Filter	Without AVX2	With AVX2	Performance improvement
Bilinear	105 ms	71 ms	32%
Bicubic	122 ms	73 ms	40%
Lanczos	136 ms	77 ms	43%

Using AVX2 in popular Lambda runtimes

This process involves recompiling the source code with appropriate flags, or by selecting packages and dependencies optimized for AVX2. For popular runtimes used in Lambda:

Python: Python developers frequently use scipy and numpy libraries to support scientific or computationally complex work. These libraries can be compiled with the AVX2 flag or linked with MKL to take advantage of AVX2.
Java: Java’s JIT compiler can auto-vectorize code to run with AVX2 instructions. To learn more, see this post on how to detect vectorization and potentially optimize code to take advantage of this.
Golang: the standard golang compiler does not currently support auto-vectorization. However, you can use the gcc compiler for Go, gccgo.
Node: for compute intensive workloads, use the AVX2-enabled or MKL-enabled versions of libraries.
Compiling from source: for C or C++ libraries for vectorizable work, compile with the appropriate flags to allow the compiler to automatically vectorize your code. See the documentation for additional details.

Enabling AVX2 for the Intel Math Kernel Library

The Intel Math Kernel Library (MKL) is a library of optimized math operations that implicitly use AVX2 instructions when available on the compute platform. Many popular frameworks, such as PyTorch, build with MKL by default so you don’t need to take additional actions. Some libraries, such as TensorFlow, provide options in their build process to specify MKL optimization (set –config=mkl as an option).

You can also build popular scientific Python libraries, such as SciPy and NumPy, with MKL. For instructions on building libraries with MKL, read Numpy/Scipy with Intel MKL and Intel Compilers. Intel also provides a Python distribution that includes SciPy and NumPy with MKL.

Performance improvements

After enabling AVX2 for your Lambda functions, you can compare the before-and-after performance. Lambda emits a latency metric in CloudWatch that you can use to measure the performance improvement. Other third-party production monitoring tools (for example, Datadog or New Relic) can also capture these metrics to profile the performance.

Pricing and availability

Starting today, customers can either compile their existing workloads or deploy new ones to target this instruction set at no additional cost. To learn more on how to build AVX2 compatible applications on AWS Lambda, read the Lambda Developer Guide.

Support for AVX2 is available in all Regions where Lambda is available, except for the Regions in China. For more information on availability, see the AWS Region table.

Conclusion

With the release of AVX2 for Lambda, customers can now run AVX2-optimized workloads while benefitting from the pay-for-use, reduced operational model of AWS Lambda. This feature is provided at no additional cost.

Developers can create highly scalable synchronous, asynchronous, or streaming applications. Compared to the x86 Intel baseline instruction set, AVX2 allows CPUs to perform more integer and floating-point operations per clock cycle. This speeds up compute-intensive applications with parallelizable operations to process data faster and improve throughput. Developers can schedule queues with data-intensive jobs and deliver performant end user experiences.

To learn more, read the Lambda documentation. For more serverless learning resources, visit Serverless Land.

Simplifying cross-account access with Amazon EventBridge resource policies

2020-11-20 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/simplifying-cross-account-access-with-amazon-eventbridge-resource-policies/

This post is courtesy of Stephen Liedig, Sr Serverless Specialist SA.

Amazon EventBridge is a serverless event bus used to decouple event producers and consumers. Event producers publish events onto an event bus, which then uses rules to determine where to send those events. The rules determine the targets and EventBridge routes the events accordingly.

A common architectural approach adopted by customers is to isolate these application components or services by using separate AWS accounts. This “account-per-service” strategy limits the blast radius by providing a logical and physical separation of resources. It provides additional security boundaries and allows customers to easily track service costs without having to adopt a complex tagging strategy.

To enable the flow of events from one account to another, you must create a rule on one event bus that routes events to an event bus in another account. To enable this routing, you need to configure the resource policy for your event buses.

This blog post shows you how to use EventBridge resource policies to publish events and create rules on event buses in another account.

Overview

Today, EventBridge launches improvements to resource policies that make it easier to build applications that work across accounts. The service expands the use of the policy associated with an event bus to the authorization of API calls.

This means you can manage permissions for API calls that interact with the event bus, such as PutEvents, PutRule, and PutTargets, directly from that event bus’ resource policy. This replaces the need to create different IAM roles that are assumed by each account that interacts with the event bus. It also provides a central resource to manage your permissions.

There is support for organizations and tags via IAM conditions. Now when you call an API, it considers both the user’s IAM policy and the event bus resource policy in the authorization process.

EventBridge APIs that accept an event bus name parameter (including PutRule, PutTargets, DeleteRule, RemoveTargets, DisableRule, and EnableRule) now also support an event bus ARN. This allows you to target cross-account event buses through the APIs. For example, you can call PutRule to create a rule on an event bus in another account, without needing to assume a role.

EventBridge now supports using policy conditions for the following authorization context keys in the APIs, to help scope down permissions.

Context key	APIs	Customer usage
events:detail-type	PutEvents	Used to restrict PutEvents calls for events with a specific “detail-type” field.
events:source	PutEvents	Used to restrict PutEvents calls for events with a specific “source” field.
events:creatorAccount	PutRule, PutTargets, DeleteRule, RemoveTargets, DisableRule, EnableRule, TagResource, UntagResource, DescribeRule, ListTargetsByRule, ListTagsForResource	Used to restrict control plane API calls on rules belonging to a certain account. This can be used to allow a customer to edit/disable only rules created by their own account.
events:eventBusInvocation	PutEvents	Used to differentiate a PutEvents API call from a cross-account event bus target invocation. This context key is set to true during a cross-account event bus target invocation authorization. For example, when a rule matches an event and sends that event to another event bus. For an API call of PutEvents, this context key is set to false.

Ecommerce example walkthrough

In this ecommerce example, there are multiple services distributed across different accounts. A web store publishes an event when a new order is created. The event is sent via a central event bus, which is in another account. The bus has two rules with target services in different AWS accounts.

The goal is to create fine-grained permissions that only allow:

The web store to publish events for a specific detail-type and source.
The invoice processing service to create and manage its own rules on the central bus.

To complete this walk through, you set up three accounts. For account A (Web Store), you deploy an AWS Lambda function that sends the “newOrderCreated” event directly to the “central event bus” in account B. The invoice processing Lambda function in account C creates a rule on the central event bus to process the event published by account A.

Create the central event bus in account B

Create the central event bus in account B, adding the following resource policy. Be sure to substitute your account numbers for accounts A, B, and C.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "WebStoreCrossAccountPublish",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::[ACCOUNT-A]:root"
      },
      "Action": "events:PutEvents",
      "Resource": "arn:aws:events:us-east-1:[ACCOUNT-B]:event-bus/central-event-bus",
      "Condition": {
        "StringEquals": {
          "events:detail-type": "newOrderCreated",
          "events:source": "com.exampleCorp.webStore"
        }
      }
    },
    {
      "Sid": "InvoiceProcessingRuleCreation",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::[ACCOUNT-C]:root"
      },
      "Action": [
        "events:PutRule",
        "events:DeleteRule",
        "events:DescribeRule",
        "events:DisableRule",
        "events:EnableRule",
        "events:PutTargets",
        "events:RemoveTargets"
      ],
      "Resource": "arn:aws:events:us-east-1:[ACCOUNT-B]:rule/central-event-bus/*",
      "Condition": {
        "StringEqualsIfExists": {
          "events:creatorAccount": "${aws:PrincipalAccount}",
          "events:source": "com.exampleCorp.webStore"
        }
      }
    }
  ]
}

There are two statements in the resource policy: WebStoreCrossAccountPublish and InvoiceProcessingRuleCreation.

The WebStoreCrossAccountPublish statement allows the Lambda function in account A to publish events directly to the central event bus. There are two conditions in the statement that further restrict the types of events that can be sent to the event bus. The first restricts the event detail-type to equal “newOrderCreated” and the second condition requires that the event source equals “com.exampleCorp.webStore”.

The InvoiceProcessingRuleCreation statement allows the invoice processing function in account C to describe, add, update, enable, disable, and delete any rules created by account C. You apply this restriction by using the events:creatorAccount context key in the statements condition.

Importantly you should set the StringEqualsIfExists type for the events:creatorAccount condition. If you use StringEquals, it results in an AccessDeniedException. AWS CloudFormation calls DescribeRule to check if the rule already exists. However, as this is a new rule, and because you set a condition for events:creatorAccount for DescribeRule, this key is not populated and CloudFormation receives an AccessDeniedException instead of ResourceNotFoundException.

Here is how you create the event bus using AWS CloudFormation:

  CentralEventBus: 
      Type: AWS::Events::EventBus
      Properties: 
          Name: !Ref EventBusName

  WebStoreCrossAccountPublishStatement: 
      Type: AWS::Events::EventBusPolicy
      Properties: 
          EventBusName: !Ref CentralEventBus
          StatementId: "WebStoreCrossAccountPublish"
          Statement: 
              Effect: "Allow"
              Principal: 
                  AWS: !Sub arn:aws:iam::${AccountA}:root
              Action: "events:PutEvents"
              Resource: !GetAtt CentralEventBus.Arn
              Condition:
                  StringEquals:
                      "events:detail-type": "newOrderCreated"
                      "events:source" : "com.exampleCorp.webStore"
                      
  InvoiceProcessingRuleCreationStatement: 
      Type: AWS::Events::EventBusPolicy
      Properties: 
          EventBusName: !Ref CentralEventBus
          StatementId: "InvoiceProcessingRuleCreation"
          Statement: 
              Effect: "Allow"
              Principal: 
                  AWS: !Sub arn:aws:iam::${AccountC}:root
              Action: 
                  - "events:PutRule"
                  - "events:DeleteRule"
                  - "events:DescribeRule"
                  - "events:DisableRule"
                  - "events:EnableRule"
                  - "events:PutTargets"
                  - "events:RemoveTargets"
              Resource: 
                  - !Sub arn:aws:events:${AWS::Region}:${AWS::AccountId}:rule/${CentralEventBus.Name}/*
              Condition:
                  StringEqualsIfExists:
                      "events:creatorAccount" : "${aws:PrincipalAccount}"
                  StringEquals:
                      "events:source": "com.exampleCorp.webStore"

Now that you have a policy set up on the central event bus, configure the client applications to send and process events. The client application must also have permissions configured.

Create the web store order function in account A

In account A, create a Lambda function to send the event to the central bus in account B. Set the EventBusName parameter to the central event bus ARN on the PutEvents API call. This allows you to target cross-account event buses directly.

import json
import boto3

EVENT_BUS_ARN = 'arn:aws:events:us-east-1:[ACCOUNT-B]:event-bus/central-event-bus'

# Create EventBridge client
events = boto3.client('events')

def lambda_handler(event, context):

  # new order created event datail
  eventDetail  = {
    "orderNo": "123",
    "orderDate": "2020-09-09T22:01:02Z",
    "customerId": "789",
    "lineItems": {
      "productCode": "P1",
      "quantityOrdered": 3,
      "unitPrice": 23.5,
      "currency": "USD"
    }
  }
  
  try:
    # Put an event
    response = events.put_events(
        Entries=[
            {
                'EventBusName': EVENT_BUS_ARN,
                'Source': 'com.exampleCorp.webStore',
                'DetailType': 'newOrderCreated',
                'Detail': json.dumps(eventDetail)
            }
        ]
    )
    
    print(response['Entries'])
    print('Event sent to the event bus ' + EVENT_BUS_ARN )
    print('EventID is ' + response['Entries'][0]['EventId'])
    
  except Exception as e:
      print(e)

Create the Invoice Processing service in account C

Next, create the invoice processing function that processes the newOrderCreated event. You use the AWS Serverless Application Model (AWS SAM) to create the invoice processing function and other application resources. Before you can process any events from the central event bus, you must create a new event bus in account C to receive incoming events.

Next, you define the function that processes the events. Here, you define a Lambda event source that is an EventBridge rule. You set the EventBusName to the receiving invoice processing event bus. When this Lambda function is deployed, AWS SAM creates the rule on the event bus with the specified pattern and target. It configures the event source that triggers the function when an event is received.

Parameters:
  EventBusName:
    Description: Name of the central event bus
    Type: String
    Default: invoice-processing-event-bus
  CentralEventBusArn:
    Description: The ARN of the central event bus # e.g. arn:aws:events:us-east-1:[ACCOUNT-B]:event-bus/central-event-bus
    Type: String
Resources:
  # This is the receiving invoice processing event bus in account C.
  InvoiceProcessingEventBus: 
    Type: AWS::Events::EventBus
    Properties: 
        Name: !Ref EventBusName
# AWS Lambda function processes the newOrderCreated event
  InvoiceProcessingFunction:
    Type: AWS::Serverless::Function 
    Properties:
      CodeUri: invoice_processing
      Handler: invoice_processing_function/app.lambda_handler
      Runtime: python3.8
      Events:
        NewOrderCreatedRule:
          Type: EventBridgeRule
          Properties:
            EventBusName: !Ref InvoiceProcessingEventBus
            Pattern:
              source:
                - com.exampleCorp.webStore
              detail-type:
                - newOrderCreated

The next resource in the AWS SAM template is the rule that creates the target on the central event bus. It sends events to the invoice processing event bus. Though the rule is added to the central event bus, its definition is managed by the invoice processing service template. The rule definition sets EventBusName parameter to the ARN of the central event bus.

  # This is the rule that the invoice processing service creates on the central event bus
  InvoiceProcessingRule:
    Type: AWS::Events::Rule
    Properties:
      Name: InvoiceProcessingNewOrderCreatedSubscription
      Description: Cross account rule created by Invoice Processing service
      EventBusName: !Ref CentralEventBusArn # ARN of the central event bus
      EventPattern:
        source:
          - com.exampleCorp.webStore
        detail-type:
          - newOrderCreated
      State: ENABLED
      Targets: 
        - Id: SendEventsToInvoiceProcessingEventBus
          Arn: !GetAtt InvoiceProcessingEventBus.Arn
          RoleArn: !GetAtt CentralEventBusToInvoiceProcessingEventBusRole.Arn
          DeadLetterConfig:
            Arn: !GetAtt InvoiceProcessingTargetDLQ.Arn

For the central event bus target to send the event to the invoice processing event bus in account C, EventBridge needs the necessary permissions to invoke the PutEvents API. The CentralEventBusToInvoiceProcessingEventBusRole IAM role provides that permission. It is assumed by the central event bus in account B when it needs to send events to the invoice processing event bus, without you having to create an additional resource policy on the invoice processing event bus.

  CentralEventBusToInvoiceProcessingEventBusRole:
    Type: 'AWS::IAM::Role'
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service:
              - events.amazonaws.com
            Action:
              - 'sts:AssumeRole'
      Path: /
      Policies:
        - PolicyName: PutEventsOnInvoiceProcessingEventBus
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action: 'events:PutEvents'
                Resource: !GetAtt InvoiceProcessingEventBus.Arn

You can also set up a dead-letter queue (DLQ) configuration for the rule in account C. This allows the subscriber of the event to control where events that fail to get delivered to the invoice processing event bus get sent. All you need to do to make this happen is create an Amazon SQS queue in account C, and a queue policy that sets a resource policy to allow EventBridge to send failed events from account B to the queue in account C.

  # Invoice Processing Target Dead Letter Queue 
  InvoiceProcessingTargetDLQ:
    Type: AWS::SQS::Queue

  # SQS resource policy required to allow target on central bus to send failed messages to target DLQ
  InvoiceProcessingTargetDLQPolicy: 
    Type: AWS::SQS::QueuePolicy
    Properties: 
      Queues: 
        - !Ref InvoiceProcessingTargetDLQ
      PolicyDocument: 
        Statement: 
          - Action: 
              - "SQS:SendMessage" 
            Effect: "Allow"
            Resource: !GetAtt InvoiceProcessingTargetDLQ.Arn
            Principal:  
              Service: "events.amazonaws.com"
            Condition:
              ArnEquals:
                "aws:SourceArn": !GetAtt InvoiceProcessingRule.Arn

Conclusion

This post shows you how to use the new features Amazon EventBridge resource policies that make it easier to build applications that work across accounts. Resource policies provide you with a powerful mechanism for modeling your event buses across multiple accounts, and give you fine-grained control over EventBridge API invocations.

Download the code in this blog from https://github.com/aws-samples/amazon-eventbridge-resource-policy-samples.

For more serverless learning resources, visit Serverless Land.

Application integration patterns for microservices: Orchestration and coordination

2020-11-18 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/application-integration-patterns-for-microservices-orchestration-and-coordination/

This post is courtesy of Stephen Liedig, Sr. Serverless Specialist SA.

This is the final blog post in the “Application Integration Patterns for Microservices” series. Previous posts cover asynchronous messaging for microservices, fan-out strategies, and scatter-gather design patterns.

In this post, I look at how to implement messaging patterns to help orchestrate and coordinate business workflows in our applications. Specifically, I cover two patterns:

Pipes and Filters, as presented in the book “Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions” (Hohpe and Woolf, 2004)
Saga Pattern, which is a design pattern for dealing with “long-lived transactions” (LLT), published by Garcia-Molina and Salem in 1987.

I discuss these patterns using the Wild Rydes example from this series.

Wild Rydes

Wild Rydes is a fictional technology start-up created to disrupt the transportation industry by replacing traditional taxis with unicorns. Several hands-on AWS workshops use the Wild Rydes scenario. It illustrates concepts such as serverless development, event-driven design, API management, and messaging in microservices.

This blog post explores the build process of the Wild Rydes workshop, to help you apply these concepts to your applications.

After completing a unicorn ride, the Wild Rydes customer application charges the customer. Once the driver submits a ride completion, an event triggers the following steps:

Registers the fare: registers the fare ride completion event.
Initiates the payment (via a payment service): calls a payment gateway for credit card pre-authorization. Using the pre-authorization code, it completes the payment transaction.
Updates customer accounting system: once the payment is processed, updates the Wild Rydes customer accounting system with the transaction detail.
Publishes “Fare Processed” event: sends a notification to interested components that the process is completed.

Each of the steps interfaces with separate systems – the Wild Rydes system, a third-party payment provider, and the customer accounting system. You could implement these steps inside a single component, but that would make it difficult to change and adapt. It’d also reduce the potential for components reuse within our application. Breaking down the steps into individual components allows you to build components with a single responsibility making it easier to manage each components dependencies and application lifecycle. You can be selective about how you implement the respective components, for example, different teams responsible for the development of the respective components may choose to use different languages. This is where the Pipes and Filters architectural pattern can help.

Pipes and filters

Hohpe and Woolf define Pipes and Filters as an “architectural style to divide a larger processing task into a sequence of smaller, independent processing steps (filters) that are connected by channels (pipes).”

Pipes provide a communications channel that abstracts the consumer of messages sent through that channel. It decouples your filter from one another, so components only need to know the messaging channel, or endpoint, where they are sending messages. They do not know who, or what, is processing that message, or where the receiver is located on the network.

Amazon SQS provides a lightweight solution with the power and scale of messaging middleware. It is a simple, flexible, fully managed message queuing service for reliably and continuously exchanging large volume of messages. It has virtually limitless scalability and the ability to increase message throughput without pre-provisioning capacity.

You can create an SQS queue with this AWS CLI command:

aws sqs create-queue --queue-name MyQueue

For the fare processing scenario, you could implement a Pipes and Filters architectural pattern using AWS services. This uses two Amazon SQS queues and an Amazon SNS topic:

Amazon SQS provides a mechanism for decoupling the components. The filters only need to know to which queue to send the message, without knowing which component processes that message nor when it is processed. SQS does this in a secure, durable, and scalable way.

Despite the fact that none of the filters have a direct dependency on one another, there is still a degree of coupling at the pipe level. Changing execution order therefore forces you to update and redeploy your existing filters to point to a new pipe. In the Wild Rydes example, you can reduce the impact of this by defining an environment variable for the destination endpoint in AWS Lambda function configuration, rather than hardcoding this inside your implementations.

Dealing with failures and retries requires some consideration too. In Amazon SQS terms, this requires you to define configurations, such a message VisibilityTimeOut. The VisibilityTimeOut setting provides you with some transactional support. It ensures that the message is not removed from the queue until after you have finished processing the message and you explicitly delete it from the queue. Using Amazon SQS as an Event Source for AWS Lambda further simplifies that for you because the message polling implementation is managed by the service, so you don’t need to create an explicit implementation in your filter.

Amazon SQS helps deal with failures gracefully as it maintains a count of how many times a message is processed via ReceiveCount. By specifying a maxReceiveCount, you can limit the number of times a poisoned message gets processed. Combine this with a dead letter queue (DLQ), you can then move messages that have exceeded the maxReceiveCount number to the DLQ. Adding Amazon CloudWatch alarms on metrics such as ApproximateNumberOfMessagesVisible on the DLQ, you can proactively alert on system failures if the number of messages on the dead letter queue exceed and acceptable threshold.

Alternatively, you can model the fare payment scenario with AWS Step Functions. Step Functions externalizes the Pipes and Filters pattern. It extracts the coordination from the filter implementations into a state machine that orchestrates the sequence of events. Visual workflows allow you to change the sequence of execution without modifying code, reducing the amount of coupling between collaborating components.

Here is how you could model the fare processing scenario using Step Functions:

{
  "Comment": "StateMachine for Processing Fare Payments",
  "StartAt": "RegisterFare",
  "States": {
    "RegisterFare": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:RegisterFareFunction",
      "Next": "ProcessPayment"
    },
    "ProcessPayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:ChargeFareFunction",
      "Next": "UpdateCustomerAccount"
    },
    "UpdateCustomerAccount": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:UpdateCustomerAccountFunction",
      "Next": "PublishFareProcessedEvent"
    },
    "PublishFareProcessedEvent": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sns:publish",
      "Parameters": {
        "TopicArn": "arn:aws:sns:REGION:ACCOUNT_ID:myTopic",
        "Message": {
          "Input": "Hello from Step Functions!"
        }
      },
      "End": true
    }
  }
}

AWS Step Functions allows you to easily build more sophisticated workflows. These could include decision points, parallel processing, wait states to pause the state machine execution, error handling, and retry logic. Error and Retry states help you simplify your component implementation by providing a framework for error handling and implementation exponential backoff on retries. You can define alternate execution paths if failures cannot be handled.

In this implementation, each of these states is a discrete transaction. Some implement database transactions when registering the fare, others are calling the third-party payment provider APIs, and internal APIs or programming interfaces when updating the customer accounting system.

Dealing with each of these transactions independently is relatively straightforward. But what happens if you require consistency across all steps so that either all or none of the transactions complete? How can you deal with consistency across multiple, distributed transactions? How do we deal with the temporal aspects of coordinating these potentially long running heterogeneous integrations?

Cloud providers do not support Distributed Transaction Coordinators (DTC) or two-phase commit protocols responsible for coordinating transactions across multiple cloud resources. Therefore, you need a mechanism to explicitly coordinate multiple local transactions. This is where the saga pattern and AWS Step Functions can help.

A saga is a design pattern for dealing with “long-lived transactions” (LLT), published by Garcia-Molina and Salem in 1987, they define the concept of a saga as:

“LLT is a saga if it can be written as a sequence of transactions that can be interleaved with other transactions.” (Garcia-Molina, Salem 1987)

Fundamentally, saga can provide a failure management pattern to establish consistency across all of your distributed applications, by implementing a compensating transaction for each step in a series of functions. Compensating transactions allow you to back out of the changes that were previously committed in your series of functions, so that if one of your steps fails you can “undo” what you did before, and leave your system in stable state, devoid of side-effects.

AWS Step Functions provides a mechanism for implementing a saga pattern with the ability to build fully managed state machines that allow you to catch custom business exceptions and manage and share data across state transitions.

Figure 1: Using Step Functions’ Service Integrations for Amazon DynamoDB and Amazon SNS, you can further reduce the need for a custom AWS Lambda implementation to persist data to the database, or send a notification.

By using these capabilities, you can expand on the previous Fare Processing state machine and implementing compensating transaction states. If Register Fare fails, you may want to emit an event that invokes an external support function or generates a notification informing operators of the system the error.

If payment processing failed, you would want to ensure that the status is updated to reflect state change and then notify operators of the failed event. You might decide to refund customers, update the fare status and notify support, until you have been able to resolve issues with the customer accounting system. Regardless of the approach, Step Functions allows you to model a failure scenario that aligns with a more business-centric view of consistency.

If you want to see the full state machine implementation in Lab4 of Wild Rydes Asynchronous Messaging Workshop. The workshop guides you through building your own state machine so you can see how to apply the pattern to your own scenarios. There are also three other workshops you can walk through that cover the other patterns in the series.

Conclusion

Using Wild Rydes, I show how to use Amazon SQS and AWS Step Functions to decouple your application components and services. I show you how these services help to coordinate and orchestrate distributed components to build resilient and fault tolerant microservices architectures.

Take part in the Wild Rydes Asynchronous Messaging Workshop and learn about the other messaging patterns you can apply to microservices architectures, including fan-out and message filtering, topic-queue-chaining and load balancing (blog post), and scatter-gather.

The Wild Rydes Asynchronous Messaging Workshop resources are hosted on our AWS Samples GitHub repository, including the sample code for this blog post under Lab-4: Choreography and orchestration.

For a deeper dive into queues and topics and how to use these in microservices architectures, read:

For more information on enterprise integration patterns, see:

Getting started with RPA using AWS Step Functions and Amazon Textract

2020-11-16 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/getting-started-with-rpa-using-aws-step-functions-and-amazon-textract/

This post is courtesy of Joe Tringali, Solutions Architect.

Many organizations are using robotic process automation (RPA) to automate workflow, back-office processes that are labor-intensive. RPA, as software bots, can often handle many of these activities. Often RPA workflows contain repetitive manual tasks that must be done by humans, such as viewing invoices to find payment details.

AWS Step Functions is a serverless function orchestrator and workflow automation tool. Amazon Textract is a fully managed machine learning service that automatically extracts text and data from scanned documents. Combining these services, you can create an RPA bot to automate the workflow and enable employees to handle more complex tasks.

In this post, I show how you can use Step Functions and Amazon Textract to build a workflow that enables the processing of invoices. Download the code for this solution from https://github.com/aws-samples/aws-step-functions-rpa.

Overview

The following serverless architecture can process scanned invoices in PDF or image formats for submitting payment information to a database.

To implement this architecture, I use single-purpose Lambda functions and Step Functions to build the workflow:

Invoices are scanned and loaded into an Amazon Simple Storage Service (S3) bucket.
The loading of an invoice into Amazon S3 triggers an AWS Lambda function to be invoked.
The Lambda function starts an asynchronous Amazon Textract job to analyze the text and data of the scanned invoice.
The Amazon Textract job publishes a completion notification message with a status of “SUCCEEDED” or “FAILED” to an Amazon Simple Notification Service (SNS) topic.
SNS sends the message to an Amazon Simple Queue Service (SQS) queue that is subscribed to the SNS topic.
The message in the SQS queue triggers another Lambda function.
The Lambda function initiates a Step Functions state machine to process the results of the Amazon Textract job.
For an Amazon Textract job that completes successfully, a Lambda function saves the document analysis into an Amazon S3 bucket.
The loading of the document analysis to Amazon S3 triggers another Lambda function.
The Lambda function retrieves the text and data of the scanned invoice to find the payment information. It writes an item to an Amazon DynamoDB table with a status indicating if the invoice can be processed.
If the DynamoDB item contains the payment information, another Lambda function is invoked.
The Lambda function archives the processed invoice into another S3 bucket.
If the DynamoDB item does not contain the payment information, a message is published to an Amazon SNS topic requesting that the invoice be reviewed.

Amazon Textract can extract information from the various invoice images and associate labels with the data. You must then handle the various labels that different invoices may associate with the payee name, due date, and payment amount.

Determining payee name, due date and payment amount

After the document analysis has been saved to S3, a Lambda function retrieves the text and data of the scanned invoice to find the information needed for payment. However, invoices can use a variety of labels for the same piece of data, such a payment’s due date.

In the example invoices included with this blog, the payment’s due date is associated with the labels “Pay On or Before”, “Payment Due Date” and “Payment Due”. Payment amounts can also have different labels, such as “Total Due”, “New Balance Total”, “Total Current Charges”, and “Please Pay”. To address this, I use a series of helper functions in the app.py file in the process_document_analysis folder of the GitHub repo.

In app.py, there is the following get_ky_map helper function:

def get_kv_map(blocks):
    key_map = {}
    value_map = {}
    block_map = {}
    for block in blocks:
        block_id = block['Id']
        block_map[block_id] = block
        if block['BlockType'] == "KEY_VALUE_SET":
            if 'KEY' in block['EntityTypes']:
                key_map[block_id] = block
            else:
                value_map[block_id] = block
    return key_map, value_map, block_map

The get_kv_map function is invoked by the Lambda function handler. It iterates over the “Blocks” element of the document analysis produced by Amazon Textract to create dictionaries of keys (labels) and values (data) associated with each block identified by Amazon Textract. It then invokes the following get_kv_relationship helper function:

def get_kv_relationship(key_map, value_map, block_map):
    kvs = {}
    for block_id, key_block in key_map.items():
        value_block = find_value_block(key_block, value_map)
        key = get_text(key_block, block_map)
        val = get_text(value_block, block_map)
        kvs[key] = val
    return kvs

The get_kv_relationship function merges the key and value dictionaries produced by the get_kv_map function to create a single Python key value dictionary where labels are the keys to the dictionary and the invoice’s data are the values. The handler then invokes the following get_line_list helper function:

def get_line_list(blocks):
    line_list = []
    for block in blocks:
        if block['BlockType'] == "LINE":
            if 'Text' in block: 
                line_list.append(block["Text"])
    return line_list

Extracting payee names is more complex because the data may not be labeled. The payee may often differ from the entity sending the invoice. With the Amazon Textract analysis in a format more easily consumable by Python, I use the following get_payee_name helper function to parse and extract the payee:

def get_payee_name(lines):
    payee_name = ""
    payable_to = "payable to"
    payee_lines = [line for line in lines if payable_to in line.lower()]
    if len(payee_lines) > 0:
        payee_line = payee_lines[0]
        payee_line = payee_line.strip()
        pos = payee_line.lower().find(payable_to)
        if pos > -1:
            payee_line = payee_line[pos + len(payable_to):]
            if payee_line[0:1] == ':':
                payee_line = payee_line[1:]
            payee_name = payee_line.strip()
    return payee_name

The get_amount helper function searches the key value dictionary produced by the get_kv_relationship function to retrieve the payment amount:

def get_amount(kvs, lines):
    amount = None
    amounts = [search_value(kvs, amount_tag) for amount_tag in amount_tags if search_value(kvs, amount_tag) is not None]
    if len(amounts) > 0:
        amount = amounts[0]
    else:
        for idx, line in enumerate(lines):
            if line.lower() in amount_tags:
                amount = lines[idx + 1]
                break
    if amount is not None:
        amount = amount.strip()
        if amount[0:1] == '$':
            amount = amount[1:]
    return amount

The amount_tags variable contains a list of possible labels associated with the payment amount:

amount_tags = ["total due", "new balance total", "total current charges", "please pay"]

Similarly, the get_due_date helper function searches the key value dictionary produced by the get_kv_relationship function to retrieve the payment due date:

def get_due_date(kvs):
    due_date = None
    due_dates = [search_value(kvs, due_date_tag) for due_date_tag in due_date_tags if search_value(kvs, due_date_tag) is not None]
    if len(due_dates) > 0:
        due_date = due_dates[0]
    if due_date is not None:
        date_parts = due_date.split('/')
        if len(date_parts) == 3:
            due_date = datetime(int(date_parts[2]), int(date_parts[0]), int(date_parts[1])).isoformat()
        else:
            date_parts = [date_part for date_part in re.split("\s+|,", due_date) if len(date_part) > 0]
            if len(date_parts) == 3:
                datetime_object = datetime.strptime(date_parts[0], "%b")
                month_number = datetime_object.month
                due_date = datetime(int(date_parts[2]), int(month_number), int(date_parts[1])).isoformat()
    else:
        due_date = datetime.now().isoformat()
    return due_date

The due_date_tag contains a list of possible labels associated with the payment due:

due_date_tags = ["pay on or before", "payment due date", "payment due"]

If all required elements needed to issue a payment are found, it adds an item to the DynamoDB table with a status attribute of “Approved for Payment”. If the Lambda function cannot determine the value of one or more required elements, it adds an item to the DynamoDB table with a status attribute of “Pending Review”.

Payment Processing

If the item in the DynamoDB table is marked “Approved for Payment”, the processed invoice is archived. If the item’s status attribute is marked “Pending Review”, an SNS message is published to an SNS Pending Review topic. You can subscribe to this topic so that you can add additional labels to the Python code for determining payment due dates and payment amounts.

Note that the Lambda functions are single-purpose functions, and all workflow logic is contained in the Step Functions state machine. This diagram shows the various tasks (states) of a successful workflow.

For more information about this solution, download the code from the GitHub repo (https://github.com/aws-samples/aws-step-functions-rpa).

Prerequisites

Before deploying the solution, you must install the following prerequisites:

Python.
AWS Command Line Interface (AWS CLI) – for instructions, see Installing the AWS CLI.
AWS Serverless Application Model Command Line Interface (AWS SAM CLI) – for instructions, see Installing the AWS SAM CLI.

Deploying the solution

The solution creates the following S3 buckets with names suffixed by your AWS account ID to prevent a global namespace collision of your S3 bucket names:

scanned-invoices-<YOUR AWS ACCOUNT ID>
invoice-analyses-<YOUR AWS ACCOUNT ID>
processed-invoices-<YOUR AWS ACCOUNT ID>

The following steps deploy the example solution in your AWS account. The solution deploys several components including a Step Functions state machine, Lambda functions, S3 buckets, a DynamoDB table for payment information, and SNS topics.

AWS CloudFormation requires an S3 bucket and stack name for deploying the solution. To deploy:

Download code from GitHub repo (https://github.com/aws-samples/aws-step-functions-rpa).
Run the following command to build the artifacts locally on your workstation:sam build
Run the following command to create a CloudFormation stack and deploy your resources:sam deploy --guided --capabilities CAPABILITY_NAMED_IAM

Monitor the progress and wait for the completion of the stack creation process from the AWS CloudFormation console before proceeding.

Testing the solution

To test the solution, upload the PDF test invoices to the S3 bucket named scanned-invoices-<YOUR AWS ACCOUNT ID>.

A Step Functions state machine with the name <YOUR STACK NAME>-ProcessedScannedInvoiceWorkflow runs the workflow. Amazon Textract document analyses are stored in the S3 bucket named invoice-analyses-<YOUR AWS ACCOUNT ID>, and processed invoices are stored in the S3 bucket named processed-invoices-<YOUR AWS ACCOUNT ID>. Processed payments are found in the DynamoDB table named <YOUR STACK NAME>-invoices.

You can monitor the status of the workflows from the Step Functions console. Upon completion of the workflow executions, review the items added to DynamoDB from the Amazon DynamoDB console.

Cleanup

To avoid ongoing charges for any resources you created in this blog post, delete the stack:

Empty the three S3 buckets created during deployment using the S3 console:
– scanned-invoices-<YOUR AWS ACCOUNT ID>
– invoice-analyses-<YOUR AWS ACCOUNT ID>
– processed-invoices-<YOUR AWS ACCOUNT ID>
Delete the CloudFormation stack created during deployment using the CloudFormation console.

Conclusion

In this post, I showed you how to use a Step Functions state machine and Amazon Textract to automatically extract data from a scanned invoice. This eliminates the need for a person to perform the manual step of reviewing an invoice to find payment information to be fed into a backend system. By replacing the manual steps of a workflow with automation, an organization can free up their human workforce to handle more value-added tasks.

To learn more, visit AWS Step Functions and Amazon Textract for more information. For more serverless learning resources, visit https://serverlessland.com.

Application integration patterns for microservices: Running distributed RFQs

2020-11-11 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/application-integration-patterns-running-distributed-rfqs/

This post is courtesy of Dirk Fröhner, Principal Solutions Architect.

The first blog in this series introduces asynchronous messaging for building loosely coupled systems that can scale, operate, and evolve individually. It considers messaging as a communications model for microservices architectures. Part 2 dives into fan-out strategies and applies the respective patterns to a concrete use case.

In this post, I look at how to apply messaging patterns to help coordinate distributed requests and responses. Specifically, I focus on a composite pattern called scatter-gather, as presented in the book “Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions” (Hohpe and Woolf, 2004).

I also show how a client can communicate with a backend via synchronous REST API operations while asynchronous messaging is applied internally for processing.

Overview

The use case is for Wild Rydes, a fictional application that replaces traditional taxis with unicorns. It’s used in several hands-on AWS workshops that illustrate serverless development concepts.

Wild Rydes wants to allow customers to initiate requests for quotation (RFQs) for their rides. This allows unicorns to make special offers to potential customers within a defined schedule. A customer can send their ride details and ask for quotations from all unicorns that are within a certain vicinity. The customer can then choose the best offer.

The scatter-gather pattern

The scatter-gather pattern can be used to implement this use-case on the server side. This pattern is ideal for requesting responses from multiple parties, then aggregating and processing that data.

As presented by Hohpe and Woolf, the scatter-gather pattern is a composite pattern that illustrates how to “broadcast a message to multiple recipients and re-aggregate the responses back into a single message”. The pattern is illustrated in the following diagram.

The flow starts with the Requester to initiate the broadcast to all potential Responders. This can be architected in a loosely coupled manner using pub-sub messaging with Amazon SNS or Amazon MQ, as shown in this blog post.

All responders must send their answers somewhere for aggregation and processing. This can also be architected in a loosely coupled manner using a message queue with Amazon SQS or Amazon MQ, as described in this blog post.

The Aggregator component consumes the individual responses from the response queue. It forwards the aggregate to the Processor component for final processing. Both Aggregator and Processor can be part of the same application or process. If separated, they can be decoupled through messaging. The Requester can also be part of the same application or process as Aggregator and Processor.

Explaining the architecture and API

In this section, I walk through the use-case and explain how it can be architected and implemented. I show how the scatter-gather pattern works in the backend, and the client-to-backend communication.

Submit instant ride RFQ

To initiate such an RFQ, the customer app communicates with the ride booking service on the backend. The ride booking service exposes a REST API. By default, an RFQ runs for five minutes, but Wild Rydes is working on a feature to let a customer individually set that value.

A request to submit an instant-ride RFQ contains start and destination locations for the ride and the customer ID:

POST /<submit-instant-ride-rfq-resource-path> HTTP/1.1
...

{
    "from": "...",
    "to": "...",
    "customer": "..."
}

The RFQ is a lengthy process so the client app should not expect an immediate response. Instead, the API accepts the RFQ, creates an RFQ task resource, and returns to the client. The response contains a URL to request an update for the status. It also provides an estimated time for the end of the RFQ:

HTTP/1.1 202 Accepted
...

{
    "links": {
        "self": "http://.../<rfq-task-resource-path>",
        "...": "..."
    },
    "status": "running",
    "eta": "..."
}

The following architecture shows this interaction, excluding the process after a new RFQ is submitted.

Processing the RFQ

The backend uses the scatter-gather pattern to publish the RFQ to unicorns and collect responses for aggregation and processing.

1. The ride booking service acts as the requester in the scatter-gather pattern. Following a new RFQ from the client app, it publishes the details into an SNS topic. This topic is related to the location of the ride’s starting point since customers need quotes from unicorns within the vicinity. These messages are the green request messages.

2. The unicorn management service maintains instances of unicorn management resources and subscribes them to RFQ topics related to their current location. These resources receive the RFQ request messages and handle the interaction with the Wild Rydes unicorn app.

3. The unicorns in the vicinity are notified through the Wild Rydes unicorn app about the new RFQ and can react if they are available. Notification options between the unicorn management service and the Wild Rydes unicorn app include push notifications and web sockets.

4. Every addressed unicorn can now submit their quote. All quotes go back through the unicorn management resources and the unicorn management service into the RFQ response queue. They act as the responders in the sense of the scatter-gather pattern.

5. The ride booking service also acts as aggregator and processor in the sense of the scatter-gather pattern. It uses SQS to consume messages from an RFQ response queue that eventually contains the RFQ responses from the involved unicorns. It starts doing so immediately after it publishes the details of a new RFQ into the RFQ topic. The messages from the RFQ response queue relate to the blue response messages.

The ride booking service consumes all incoming responses from that queue. This continues until the deadline or all participating unicorns have answered, whatever occurs first. The aggregator responsibility can be as simple as persisting the details of each incoming RFQ response into an Amazon DynamoDB table.

To match incoming responses to the right RFQ, it uses a fundamental integration pattern, correlation ID. In this pattern, a requester adds a unique ID to an outgoing message and each responder is asked to forward this ID in their response.

Also, responders must know where to send their responses to. To keep this dynamic, there is another fundamental integration pattern: return address. It suggests that a requester adds meta information into outgoing messages that indicate the address for their responses. In this architecture, this is the ARN of the SQS queue that acts as the RFQ response queue. This supports an option to simplify the response management: the RFQ response queue is a dedicated queue per customer.

Lastly, the processor responsibility in the ride booking service reads the RFQ responses from the DynamoDB table. It converts the data to JSON for the Wild Rydes customer app.

Check RFQ status

During the RFQ processing, a customer may want to know how many responses have already arrived, or if the results are already available. After submitting an instant ride RFQ, the client receives a representation of the running task. It can use the self-link to request an update:

GET /<rfq-task-resource-path> HTTP/1.1

While the task is running, a response from the ride booking service comes back with the respective status value and the count of responses that have already arrived:

HTTP/1.1 200 OK
...

{
    "links": {
        "self": "http://.../<rfq-task-resource-path>",
        "...": "..."
    },
    "status": "running",
    "responses-received": 2,
    "eta": "..."
}

After the RFQ is completed

An RFQ is completed if either the time is up or all unicorns have answered. The result of the RFQ is then available to the customer. If the client requests an update to the task representation, the response indicates this by redirecting to the RFQ result:

HTTP/1.1 303 See Other
Location: <url-of-rfq-result-resource>

Requesting a representation of the results resource, the client receives the quotes of all the participating unicorns. The frontend customer app can visualize these accordingly:

HTTP/1.1 200 OK
...

{
    "links": { ... },
    "from": "...",
    "to": "...",
    "customer": "...",
    "quotes": [ ... ]
}

The ride booking service can also use means of active notifications to make the customer app aware once the RFQ result is ready, including the link to the RFQ result. Examples for this include push notifications and web sockets.

Conclusion

In this blog, I present the scatter-gather pattern, which is a composite pattern based on pub-sub and point-to-point messaging channels. It also employs correlation ID and return address. I show how this is implemented in the Wild Rydes example application. You can use this integration pattern for communication in your microservices.

I cover how synchronous API communication between end user client and backend can work along with asynchronous messaging for request processing internally.

To learn more:

The use-cases of this blog series are discussed and implemented in our Decoupled Microservices hands-on workshop.
Watch the AWS re:Invent 2019 talk “Application integration patterns for microservices”. This explains both the fundamental integration patterns in addition to all the use-cases of this blog series.

For more serverless learning resources, visit https://serverlessland.com.

Archiving and replaying events with Amazon EventBridge

2020-11-09 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/archiving-and-replaying-events-with-amazon-eventbridge/

In event-driven architectures, it can be useful for services to access past events. This has previously required manual logging and archiving, and creating a mechanism to parse files and put events back on the event bus. This can be complex, since you may not have access to the applications that are publishing the events.

With the announcement of event replay, EventBridge can now record any events processed by any type of event bus. Replay stores these recorded events in archives. You can choose to record all events, or filter events to be archived by using the same event pattern matching logic used in rules.

You can also configure a retention policy for an archive to store data either indefinitely or for a defined number of days. You can now easily configure logging and replay options for events created by AWS services, your own applications, and integrated SaaS partners.

Event replay can be useful for a number of different use-cases:

Testing code fixes: after fixing bugs in microservices, being able to replay historical events provides a way to test the behavior of the code change.
Testing new features: using historical production data from event archives, you can measure the performance of new features under load.
Hydrating development or test environments: you can replay event archives to hydrate the state of test and development environments. This helps provide a more realistic state that approximates production.

This blog post shows you how to create event archives for an event bus, and then how to replay events. I also cover some of the important features and how you can use these in your serverless applications.

Creating event archives

To create an event archive for an event bus:

Navigate to the EventBridge console and select Archives from the left-hand submenu. Choose the Create Archive button.
In the Define archive details page:
1. Enter ‘my-event-archive’ for Name and provide an optional description.
2. Select a source bus from the dropdown (choose default if you want to archive AWS events).
3. For retention period, enter ‘30’.
4. Choose Next.
In the Filter events page, you can provide an event pattern to archive a subset of events. For this walkthrough, select No event filtering and choose Create archive.
In the Archives page, you can see the new archive waiting to receive events.
Choose the archive to open the details page. Over time, as more events are sent to the bus, the archive maintains statistics about the number and size of events stored.

You can also create archives using AWS CloudFormation. The following example creates an archive that filters for a subset of events with a retention period of 30 days:

Type: AWS::Events::Archive
Properties: 
  Description: My filtered archive.
  EventPattern:
    source:
      -	"my-app-worker-service"
  RetentionDays: 30
  SourceArn: arn:aws:events:us-east-1:123456789012:event-bus/my-custom-application

How this works

Archives are always sourced from a single event bus. Once you have created an archive, it appears on the event bus details page:

You can make changes to an archive definition once it is created. If you shorten the duration, this deletes any events in the archives that are earlier than the new retention period. This deletion process occurs after a period of time and is not immediate. If you extend the duration, this affects event collection from the current point, but does not restore older events.

Each time you create an archive, this automatically generates a rule on the event bus. This is called a managed rule, which is created, updated, and deleted by the EventBridge service automatically. This rule does not count towards the default 300 rules per event bus service quota.

When you open a managed rule, the configuration is read-only.

This configuration shows an event pattern that is applied to all incoming events, including those that may be replayed from archives. The event pattern excludes events containing a replay-name attribute, which prevents replayed events from being archived multiple times.

Replaying archived events

To replay an archive of events:

Navigate to the EventBridge console and select Replays from the left-hand submenu. Choose the Create Archive button.
In the Start new replay page:
1. Enter ‘my-event-replay’ for Name and provide an optional description.
2. Select a source bus from the dropdown. This must match the source bus for the event archive.
3. For Specify rule(s), select All rules.
4. Enter a time frame for the replay. This is the ingestion time for the first and last events in the archive.
5. Choose Start Replay.
The Replays page shows the new replay in Starting status.

How this works

When a replay is started, the service sends the archived event back to the original event bus. It processes these as quickly as possible, with no ordering guarantees. The replay process adds a “replay-name” attribute to the original event. This is the flow of events:

The original event is sent to the event bus. It is received by any existing rules and the managed rule creating the archive. The event is saved to the event archive.
When the archived event is replayed, the JSON object includes the replay-name attribute. The existing rules process the event as in the first step. Since the managed rule does not match the replayed event, it is not forwarded to the archive.

Showing additional replay fields

In the Replays console, choose the preferences cog icon to open the Preferences dialog box.

From here, you can add:

Event start time and end time: Timestamps for the earliest and latest events in the archive that was replayed.
Replay start time and end time: shows the time filtering parameters set for the listed replay.
Last replayed: a timestamp of when the final replay event occurred.

You can sort on any of these additional fields.

Advanced routing of replayed events

In this simple example, a replayed archive matches the same rules that the original events triggered. Additionally, replayed events must be sent to the original bus where they were archived from. As a result, a basic replay allows you to duplicate events and copy the rule matching behaviors that occurred originally.

However, you may want to trigger different rules for replayed events or send the events to another bus. You can make use of the replay-name attribute in your own rules to add this advanced routing functionality. By creating a rule that filters for the presence of the “replay-name” event, it ignores all events that are not replays. When you create the replay, instead of targeting all rules on the bus, only target this one rule.

The original event is put on the event bus. The replay rule is evaluated but does not match.
The event is played from the archive, targeting only the replay rule. All other rules are excluded automatically by the replay service. The replay rule matches and forwards events onto the rule’s targets.

The target of the replay rule may be typical rule target, including an AWS Lambda function for customized processing, or another event bus.

Conclusion

In event-driven architectures, it can be useful for services to access past events. The new event replay feature in Amazon EventBridge enables you to automatically archive and replay events on an event bus. This can help for testing new features or new code, or hydrating services in development and test to more closely approximate a production environment.

This post shows how to create and replay event archives. It discusses how the archives work, and how you can implement these in your own applications. To learn more about using Amazon EventBridge, visit the learning path for videos, blogs, and other resources.

For more serverless learning resources, visit Serverless Land.

Using Amazon MQ as an event source for AWS Lambda

2020-11-05 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/using-amazon-mq-as-an-event-source-for-aws-lambda/

Amazon MQ is a managed, highly available message broker for Apache ActiveMQ. The service manages the provisioning, setup, and maintenance of ActiveMQ. Now, with Amazon MQ as an event source for AWS Lambda, you can process messages from the service. This allows you to integrate Amazon MQ with downstream serverless workflows.

ActiveMQ is a popular open source message broker that uses open standard APIs and protocols. You can move from any message broker that uses these standards to Amazon MQ. With Amazon MQ, you can deploy production-ready broker configurations in minutes, and migrate existing messaging services to the AWS Cloud. Often, this does not require you to rewrite any code, and in many cases you only need to update the application endpoints.

In this blog post, I explain how to set up an Amazon MQ broker and networking configuration. I also show how to create a Lambda function that is invoked by messages from Amazon MQ queues.

Overview

Using Amazon MQ as an event source operates in a similar way to using Amazon SQS or Amazon Kinesis. In all cases, the Lambda service internally polls for new records or messages from the event source, and then synchronously invokes the target Lambda function. Lambda reads the messages in batches and provides these to your function as an event payload.

Lambda is a consumer application for your Amazon MQ queue. It processes records from one or more partitions and sends the payload to the target function. Lambda continues to process batches until there are no more messages in the topic.

The Lambda function’s event payload contains an array of records in the messages attribute. Each array item contains envelope details of the message, together with the base64 encoded message in the data attribute:

How to configure Amazon MQ as an event source for Lambda

Amazon MQ is a highly available service, so it must be configured to run in a minimum of two Availability Zones in your preferred Region. You can also run a single broker in one Availability Zone for development and test purposes. In this walk through, I show how to run a production, public broker and then configure an event source mapping for a Lambda function.

There are four steps:

Configure the Amazon MQ broker and security group.
Create a queue on the broker.
Set up AWS Secrets Manager.
Build the Lambda function and associated permissions.

Configuring the Amazon MQ broker and security group

In this step, you create an Amazon MQ broker and then configure the broker’s security group to allow inbound access on ports 8162 and 61617.

Navigate to the Amazon MQ console and choose Create brokers.
In Step 1, keep the defaults and choose Next.
In Configure settings, in the ActiveMQ Access panel, enter a user name and password for broker access.
Expand the Additional settings panel, keep the defaults, and ensure that Public accessibility is set to Yes. Choose Create broker.
The creation process takes up to 15 minutes. From the Brokers list, select the broker name. In the Details panel, choose the Security group.
On the Inbound rules tab, choose Edit inbound rules. Add rules to enable inbound TCP traffic on ports 61617 and 8162:

Port 8162 is used to access the ActiveMQ Web Console to configure the broker settings.
Port 61667 is used by the internal Lambda poller to connect with your broker, using the OpenWire endpoint.

Create a queue on the broker

The Lambda service subscribes to a queue on the broker. In this step, you create a new queue:

Navigate to the Amazon MQ console and choose the newly created broker. In the Connections panel, locate the URLs for the web console.
Only one endpoint is active at a time. Select both and one resolves to the ActiveMQ Web Console application. Enter the user name and password that you configured earlier.
In the top menu, select Queues. For Queue Name, enter myQueue and choose Create. The new queue appears in the Queues list.

Keep this webpage open, since you use this later for sending messages to the Lambda function.

Set up Secrets Manager

The Lambda service needs access to your Amazon MQ broker, using the user name and password you configured earlier. To avoid exposing secrets in plaintext in the Lambda function, it’s best practice to use a service like Secrets Manager. To create a secret, use the create-secret AWS CLI command. To do this, ensure you have the AWS CLI installed.

From a terminal window, enter this command, replacing the user name and password with your own values:

aws secretsmanager create-secret --name MQaccess --secret-string '{"username": "your-username", "password": "your-password"}'

The command responds with the ARN of the stored secret:

Build the Lambda function and associated permissions

The Lambda must have permission to access the Amazon MQ broker and stored secret. It must also be able to describe VPCs and security groups, and manage elastic network interfaces. These execution roles permissions are:

mq:DescribeBroker
secretsmanager:GetSecretValue
ec2:CreateNetworkInterface
ec2:DescribeNetworkInterfaces
ec2:DescribeVpcs
ec2:DeleteNetworkInterface
ec2:DescribeSubnets
ec2:DescribeSecurityGroups

If you are using an encrypted customer managed key, you must also add the kms:Decrypt permission.

To set up the Lambda function:

Navigate to the Lambda console and choose Create Function.
For function name, enter MQconsumer and choose Create Function.
In the Permissions tab, choose the execution role to edit the permissions.
Choose Attach policies then choose Create policy.

Select the JSON tab and paste the following policy. Choose Review policy.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "mq:DescribeBroker",
                "secretsmanager:GetSecretValue",
                "ec2:CreateNetworkInterface",
                "ec2:DescribeNetworkInterfaces",
                "ec2:DescribeVpcs",
                "ec2:DeleteNetworkInterface",
                "ec2:DescribeSubnets",
                "ec2:DescribeSecurityGroups",
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "*"
        }
    ]
}

For name, enter ‘AWSLambdaMQExecutionRole’. Choose Create policy.
In the IAM Summary panel, choose Attach policies. Search for AWSLambdaMQExecutionRole and choose Attach policy.
On the Lambda function page, in the Designer panel, choose Add trigger. Select MQ from the drop-down. Choose the broker name, enter ‘myQueue’ for Queue name, and choose the secret ARN. Choose Add.
The status of the Amazon MQ trigger changes from Creating to Enabled after a couple of minutes. The trigger configuration confirms the settings.

Testing the event source mapping

In the ActiveMQ Web Console, choose Active consumers to confirm that the Lambda service has been configured to consume events.
In the main dashboard, choose Send To on the queue. For Number of messages to send, enter 10 and keep the other defaults. Enter a test message then choose Send.
In the MQconsumer Lambda function, select the Monitoring tab and then choose View logs in CloudWatch. The log streams show that the Lambda function has been invoked by Amazon MQ.

A single Lambda function consumes messages from a single queue in an Amazon MQ broker. You control the rate of message processing using the Batch size property in the event source mapping. The Lambda service limits the concurrency to one execution environment per queue.

For example, in a queue with 100,000 messages and a batch size of 100 and function duration of 2000 ms, the Monitoring tab shows this behavior. The Concurrent executions graph remains at 1 as the internal Lambda poller fetches messages. It continues to invoke the Lambda function until the queue is empty.

Using AWS SAM

In AWS SAM templates, you can configure a Lambda function with an Amazon MQ event source mapping and the necessary permissions. For example:

Resources:
  ProcessMSKfunction:
    Type: AWS::Serverless::Function 
    Properties:
      CodeUri: code/
      Timeout: 3
      Handler: app.lambdaHandler
      Runtime: nodejs12.x
      Events:
  MQEvent:
    Type: MQ
    Properties:
      BatchSize: 100
      Stream: arn:aws:mq:us-east-1:123456789012:broker:myMQbroker:b-bf02ad26-cc1a-4598-aa0d-82f2d88eb2ae
      QueueName:
        - myQueue
Policies:
  - Statement:
    - Effect: Allow
      Resource: '*'
      Action:
      - mq:DescribeBroker
      - secretsmanager:GetSecretValue
      - ec2:CreateNetworkInterface
      - ec2:DescribeNetworkInterfaces
      - ec2:DescribeVpcs
      - ec2:DeleteNetworkInterface
      - ec2:DescribeSubnets
      - ec2:DescribeSecurityGroups
      - logs:CreateLogGroup
      - logs:CreateLogStream
      - logs:PutLogEvents

Conclusion

Amazon MQ provide a fully managed, highly available message broker service for Apache ActiveMQ. Now Lambda supports Amazon MQ as an event source, you can invoke Lambda functions from messages in Amazon MQ queues to integrate into your downstream serverless workflows.

In this post, I give an overview of how to set up an Amazon MQ broker. I show how to configure the networking and create the event source mapping with Lambda. I also show how to set up a consumer Lambda function in the AWS Management Console, and refer to the equivalent AWS SAM syntax to simplify deployment.

To learn more about how to use this feature, read the documentation. For more serverless learning resources, visit https://serverlessland.com.

Using shared memory for low-latency, intra-node communication in AWS Batch

2020-11-05 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/using-shared-memory-for-low-latency-intra-node-communication-in-aws-batch/

This post is courtesy of Dario La Porta, Senior Consultant, HPC.

AWS Batch enables developers, scientists, and engineers to run hundreds of thousands of HPC jobs in AWS. By managing the provisioning of computing resources, this allows you to focus on your core business. Shared memory support is a new feature that can help improve overall performance.

This post explains the shared memory paradigm and how it can help you improve the performance of your single and multi-node applications. Performance gains can also help you to reduce the total runtime of your jobs and therefore reduce the overall cost.

The second part of the post shows you how to use shared memory in AWS Batch both in the AWS Management Console and the AWS CLI. Finally, I show the performance gains that are made possible with shared memory usage by walking through a benchmarking analysis with OSU Micro-Benchmarks and GROMACS.

Shared memory paradigm

Advanced, compute-intensive workloads require high-performance hardware to use scalability to deliver results. The Amazon EC2 C5n instance type provides cost-efficient, high-performance hardware with a configurable number of cores.

HPC workloads use algorithms that require parallelization and a low latency communication between the different processes. The two main technologies used for the parallel communications are message-passing with distributed memory and shared memory.

Message Passing Interface (MPI) is a message-passing standard used for the communication in a parallel distributed environment. Elastic Fabric Adapter (EFA) enables your MPI applications to use low-latency, inter-node communication.

The shared memory paradigm allows multiple processors in the same system to communicate using a memory (RAM) portion that is shared between the processes. This method takes advantage of the high-speed memory bus.

MPI with intra-node shared memory communication

The two main MPI implementations, OpenMPI and Intel MPI, enable an intra-node shared memory communication in a distributed compute environment. When configured, you take advantage of the EFA libfabic implementation having consistent and reduced latency. This results in higher throughput than the TCP transport for the intra-node communication. From libfabric 1.9 onwards, the shared memory support has been directly added to the EFA provider. You no longer need to perform any modification to the OFI MTL.

MPI jobs in AWS Batch

AWS Batch enables the execution of MPI jobs using a multi-node configuration. First, a job definition is created that enables the execution of the job in multiple nodes. To learn how to create this definition, see Creating a Multi-node Parallel Job Definition.

To take advantage of the EFA capabilities, select a supported instance type and read Leveraging Elastic Fabric Adapter to run HPC and ML Workloads on AWS Batch. This post shows how to create the necessary resources in AWS Batch and run your first job with EFA.

Shared memory in AWS Batch

The new AWS Batch console interface enables you to configure the shared memory of the container inside the Job Definition. To see this, expand the Additional configuration in the Container properties section:

The Linux parameters section contains the Shared memory size parameter in MB.

You can set the same configuration in the AWS CLI by passing JSON parameters to the RegisterJobDefinition API:

"linuxParameters": {
    "sharedMemorySize": integer
}

When you run the job, it creates a shared memory area on each node that uses two or more processes. The shared memory area cannot be changed during the execution of the job. The size of the shared memory area is determined by the number of cores available in the node and the application requirements. For most jobs, a suggested initial value is 4096 MB.

Modern Linux kernels support a POSIX shared memory API. You can inspect the size of the container shared memory using the df -h /dev/shm command. The output can help you determine the shared memory space needed for your job.

Benchmarks

The following section compares the different performance using shared memory with Intel MPI 2019 update 7 and EFA.

The instance type used for the benchmark is the c5n.18xlarge and, for the multi-node use case, a cluster placement group. This compares the performance increase from shared memory versus using pure EFA communication. The first benchmark focuses on the latency of the communication in a single node use case.

OSU Micro-Benchmarks is a suite of benchmarks for measuring and evaluating the performance of MPI operations. The specific test case used is the osu_latency, measuring the minimum, maximum and average latency of a ping-pong communication between a sender and a receiver. Specifically, this is where the message sender waits for the reply from the receiver. The benchmark uses a variety of data sizes to report the average one-way latency.

The chart shows the latency in μs on the horizontal axis and packet size in bits on the vertical axis. The result shows a decrease in the communication latency using shared memory for the intra-node communication compared with using only EFA. The following chart shows the latency improvement:

The next benchmark demonstrates how shared memory can also increase the performance in a multi-node configuration. The test application is GROMACS, a versatile package to perform molecular dynamics. The overall performance of the application is susceptible to communication latency variance.

The code for the test has been downloaded from the Unified European Applications Benchmark Suite. The specific use case is named lignocellulose-rf and it uses the Reaction field for electrostatics. The details and the download link can be found in the UEABS repository.

The benchmark uses one thread per core and the following mdrun parameters:

-maxh 0.50 -resethway -noconfout -nsteps 10000 -dlb yes -nstlist 100 -pin on

The compilation options and the parameters configuration are explained in the README file. The test is run on a c5n.18xlarge instance, instead of a GPU instance, to focus on measuring the performance improvement caused specifically by increasing the number of total cores of the simulation. The chart explains the performance gain (measure in ns/day) that are achieved by increasing the number of cores. This is possible by using the shared memory for the intra-node communication during the simulation instead of using only EFA networking.

The following chart illustrates a significant percentage performance improvement by using the shared memory:

Conclusion

In this post, I show how the new shared memory support in AWS Batch is able to improve performance while decreasing the latency of the intra-node communication. This performance gain can also lower the cost of running jobs overall.

I show how to enable the usage of the shared memory in AWS Batch from the AWS Management Console or the AWS CLI. I also highlight the performance gain from using shared memory with the high-speed memory bus of the c5n.18xlarge instance, using benchmarking analysis with OSU Micro-Benchmarks and GROMACS.

AWS Batch multi-node parallel jobs are now even more performant with EFA and shared memory configurations, enabling you to focus more on your applications and less on tuning. In addition, the Elastic Fabric Adapter (EFA) has a more consistent latency and higher throughput than the TCP transport for the inter-node communication.

To learn more about using this feature, visit the Getting Started guide.

Choosing between AWS Lambda data storage options in web apps

2020-10-28 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/choosing-between-aws-lambda-data-storage-options-in-web-apps/

AWS Lambda is an on-demand compute service that powers many serverless applications. Lambda functions are ephemeral, with execution environments only existing for a brief time when the function is invoked. Many compute operations need access to external data for a variety of purposes. This includes importing third-party libraries, accessing machine learning models, or exporting the output of the compute operation.

Lambda provides a comprehensive range of storage options to meet the needs of web application developers. These include other AWS services such as Amazon S3 and Amazon EFS. There are also native storage options available, such as temporary storage or Lambda layers. In this blog post, I explain the differences between these options, and discuss common use-cases to help you choose for your own applications.

This post references the Happy Path web application series, and you can download the code for that application from the repository.

Amazon S3 – Object storage

Amazon S3 is an object storage service that scales elastically. It offers high availability and 11 9’s of durability. The service is ideal for storing unstructured data. This includes binary data, such as images or media, log files and sensor data.

There are certain characteristics of S3 object storage that are important to remember. While S3 objects can be versioned, you cannot append data as you could in a file system. You have to store an entirely new version of an object. S3 also has a flat storage hierarchy that’s different to a file system. Instead of directories, you use folders to logically organize objects, by prefixing ‘foldername/’ in the key name.

S3 has important event integrations for serverless developers. It has a native integration with Lambda, which allows you to invoke a function in response to an S3 event. This can provide a scalable way to trigger application workflows when objects are created or deleted in S3. In the Happy Path application, the image-processing workflows are initiated by this event integration. To learn more about using S3 to trigger automated serverless workflows, visit the learning path.

S3 is often an important repository for an organization’s data lake. If your application writes data to S3 buckets, this can be a useful staging area for downstream processing. For analytics workloads, you can use AWS Glue to perform extract, transform, and loan (ETL) operations. To create ad hoc visualizations and business analysis reports, Amazon QuickSight can connect to your S3 buckets and produce interactive dashboards. To learn how to build business intelligence dashboards for your web application, visit the Innovator Island workshop.

S3 also provides object lifecycle management. This allows you to automatically change storage classes when certain conditions are met. For example, an application for uploading expenses could automatically archive PDFs after 1 year to Amazon S3 Glacier to reduce storage costs. In the Happy Path application, the original high-resolution uploads are stored in a separate bucket from the optimized distribution assets. To reduce storage costs, lifecycle management could be configured to automatically delete these original photo assets after 30 days.

Temporary storage with /tmp

The Lambda execution environment provides a file system for your code to use at /tmp. This space has a fixed size of 512 MB. The same Lambda execution environment may be reused by multiple Lambda invocations to optimize performance. The /tmp area is preserved for the lifetime of the execution environment and provides a transient cache for data between invocations. Each time a new execution environment is created, this area is deleted.

Consequently, this is intended as an ephemeral storage area. While functions may cache data here between invocations, it should be used only for data needed by code in a single invocation. It’s not a place to store data permanently, and is better-used to support operations required by your code.

Operationally, working with files in /tmp is the same as your local hard disk, and offers fast I/O throughput. For example, to unzip a file into this space in Python, use:

import os, zipfile
os.chdir('/tmp')
with zipfile.ZipFile(myzipfile, 'r') as zip:
    zip.extractall()

Lambda layers

Your Lambda functions may use additional libraries as part of the deployment package. You can bundle these in the deployment archive or optionally move to a layer instead. A Lambda function can have up to five layers, and is subject to the maximum deployment size of 50 MB (zipped). Packages in layers are available in the /opt directory during invocations. While layers are private to you by default, you can also share layers with other AWS accounts, or make layers public.

There are many benefits to using layers throughout the functions in your serverless application. It’s best practice to include the AWS SDK instead of depending on the version bundled with the Lambda service. This enables you to pin the version of the SDK. By using a layer, you don’t need to bundle the package with each function, which can increase your deployment package size and slow down deployments. You can create an AWS SDK layer and then include a reference to the layer in each function.

Layers can be an effective way to bundle large dependencies, or share compiled libraries with binaries that vary by operating system. For example, the Happy Path application uses the Sharp npm graphics library to process images. Similarly, the Innovator Island workshop uses the OpenCV library to perform image manipulation, and this is imported using a shared layer.

Layers are static once they are deployed. You can only change the contents of a layer by deploying a new version. Any Lambda function using the layer binds to a specific version and must be updated to change layer versions. To learn more, see using Lambda layers to simplify your development process.

Amazon EFS for Lambda

Amazon EFS is a fully managed, elastic, shared file system that integrates with other AWS services. It is durable storage option that offers high availability. You can now mount EFS volumes in Lambda functions, which makes it simpler to share data across invocations. The file system grows and shrinks as you add or delete data, so you do not need to manage storage limits.

The Lambda service mounts EFS file systems when the execution environment is prepared. This happens in parallel with other initialization operations so typically does not impact cold start latency. If the execution environment is warm from previous invocations, the mount is already prepared. To use EFS, your Lambda function must be in the same VPC as the file system.

EFS enables new capabilities for serverless applications. The file system is a dynamic binding for Lambda functions, unlike layers. This makes it useful for deploying code libraries where you want to always use the latest version. You configure the mount path when integrating the file system with your function, and then include packages from this location. Additionally, you can use this to include packages that exceed the limits of layers.

Due to its speed and support of standard file operations, EFS is also useful for ingesting or writing large numbers files durably. This can be helpful for zipping or unzipping large archives, for example. For appending to existing files, EFS is also a preferred option to using S3.

To learn more, see using Amazon EFS for AWS Lambda in your serverless applications.

Comparing the different data storage options

This table compares the characteristics of these four different data storage options for Lambda:

	Amazon S3	/tmp	Lambda Layers	Amazon EFS
Maximum size	Elastic	512 MB	50 MB	Elastic
Persistence	Durable	Ephemeral	Durable	Durable
Content	Dynamic	Dynamic	Static	Dynamic
Storage type	Object	File system	Archive	File system
Lambda event source integration	Native	N/A	N/A	N/A
Operations supported	Atomic with versioning	Any file system operation	Immutable	Any file system operation
Object tagging	Y	N	N	N
Object metadata	Y	N	N	N
Pricing model	Storage + requests + data transfer	Included in Lambda	Included in Lambda	Storage + data transfer + throughput
Sharing/permissions model	IAM	Function-only	IAM	IAM + NFS
Source for AWS Glue	Y	N	N	N
Source for Amazon QuickSight	Y	N	N	N
Relative data access speed from Lambda	Fast	Fastest	Fastest	Very fast

Conclusion

Lambda is a flexible, on-demand compute service for serverless application. It supports a wide variety of workloads by providing a number of different data storage options.

In this post, I compare the capabilities and use-cases of S3, EFS, Lambda layers, and temporary storage for Lambda functions. There are benefits to each approach, as each type has different behaviors and characteristics. For web application developers, these storage types support different operations depending upon the needs of your serverless backend.

As the newest integration with Lambda, EFS now enables new workloads and capabilities. This includes sharing large code packages with Lambda, or durably operating on large numbers of files. It also opens up new possibilities for developers working on deep learning inference models.

To learn more about storage options available, visit the AWS Serverless homepage. For more serverless learning resources, visit https://serverlessland.com.

Building event-driven architectures with Amazon SNS FIFO

2020-10-23 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/building-event-driven-architectures-with-amazon-sns-fifo/

This post is courtesy of Christian Mueller, Principal Solutions Architect.

Developers increasingly adopt event-driven architectures to decouple their distributed applications. Often, these events must be propagated in a strictly ordered manner to all subscribed applications. Using Amazon SNS FIFO topics and Amazon SQS FIFO queues, you can address use cases that require end-to-end message ordering, deduplication, filtering, and encryption.

In this blog post, I introduce a sample event-driven architecture. I walk through an implementation based on Amazon SNS FIFO topics and Amazon SQS FIFO queues.

Common requirements in event-driven-architectures

In event-driven architectures, data consistency is a common business requirement. This is often translated into technical requirements such as zero message loss and strict message ordering. For example, if you update your domain object rapidly, you want to be sure that all events are received by each subscriber in exactly the order they occurred. This way, the current domain object state is what each subscriber received as the latest update event. Similarly, all update events should be received after the initial create event.

Before Amazon SNS FIFO, architects had to design applications to check if messages are received out of order before processing.

Another common challenge is preventing message duplicates when sending events to the messaging service. If an event publisher receives an error, such as a network timeout, the publisher does not know if the messaging service could receive and successfully process the message or not.

The client may retry, as this is the default behavior for some HTTP response codes in AWS SDKs. This can cause duplicate messages.

Before Amazon SNS FIFO, developers had to design receivers to be idempotent. In some cases, where the event cannot be idempotent, this requires the receiver to be implemented in an idempotent way. Often, this is done by adding a key-value store like Amazon DynamoDB or Amazon ElastiCache for Redis to the service. Using this approach, the receiver can track if the event has been seen before.

Exploring the recruiting agency example

This sample application models a recruitment agency with a job listings website. The application is composed of multiple services. I explain 3 of them in more detail.

A custom service, the anti-corruption service, receives a change data capture (CDC) event stream of changes from a relational database. This service translates the low-level technical database events into meaningful business events for the domain services for easy consumption. These business events are sent to the SNS FIFO “JobEvents.fifo“ topic. Here, interested services subscribe to these events and process them asynchronously.

In this domain, the analytics service is interested in all events. It has an SQS FIFO “AnalyticsJobEvents.fifo” queue subscribed to the SNS FIFO “JobEvents.fifo“ topic. It uses SQS FIFO as event source for AWS Lambda, which processes and stores these events in Amazon S3. S3 is object storage service with high scalability, data availability, durability, security, and performance. This allows you to use services like Amazon EMR, AWS Glue or Amazon Athena to get insights into your data to extract value.

The inventory service owns an SQS FIFO “InventoryJobEvents.fifo” queue, which is subscribed to the SNS FIFO “JobEvents.fifo“ topic. It is only interested in “JobCreated” and “JobDeleted” events, as it only tracks which jobs are currently available and stores this information in a DynamoDB table. Therefore, it uses an SNS filter policy to only receive these events, instead of receiving all events.

This sample application focuses on the SNS FIFO capabilities, so I do not explore other services subscribed to the SNS FIFO topic. This sample follows the SQS best practices and SNS redrive policy recommendations and configures dead-letter queues (DLQ). This is useful in case SNS cannot deliver an event to the subscribed SQS queue. It also helps if the function fails to process an event from the corresponding SQS FIFO queue multiple times. As a requirement in both cases, the attached SQS DLQ must be an SQS FIFO queue.

Deploying the application

To deploy the application using infrastructure as code, it uses the AWS Serverless Application Model (SAM). SAM provides shorthand syntax to express functions, APIs, databases, and event source mappings. It is expanded into AWS CloudFormation syntax during deployment.

To get started, clone the “event-driven-architecture-with-sns-fifo” repository, from here. Alternatively, download the repository as a ZIP file from here and extract it to a directory of your choice.

As a prerequisite, you must have SAM CLI, Python 3, and PIP installed. You must also have the AWS CLI configured properly.

Navigate to the root directory of this project and build the application with SAM. SAM downloads required dependencies and stores them locally. Execute the following commands in your terminal:

git clone https://github.com/aws-samples/event-driven-architecture-with-amazon-sns-fifo.git
cd event-driven-architecture-with-amazon-sns-fifo
sam build

You see the following output:

Now, deploy the application:

sam deploy --guided

Provide arguments for the deployments, such as the stack name and preferred AWS Region:

After a successful deployment, you see the following output:

Learning more about the implementation

I explore the three services forming this sample application, and how they use the features of SNS FIFO.

Anti-corruption service

The anti-corruption service owns the SNS FIFO “JobEvents.fifo” topic, where it publishes business events related to job postings. It uses an SNS FIFO topic, as end-to-end ordering per job ID is required. SNS FIFO is configured not to perform content-based deduplication, as I require a unique message deduplication ID for each event for deduplication. The corresponding definition in the SAM template looks like this:

  JobEventsTopic:
    Type: AWS::SNS::Topic
    Properties:
      TopicName: JobEvents.fifo
      FifoTopic: true
      ContentBasedDeduplication: false

For simplicity, the anti-corruption function in the sample application doesn’t consume an external database CDC stream. It uses Amazon CloudWatch Events as an event source to trigger the function every minute.

I provide the SNS FIFO topic Amazon Resource Name (ARN) as an environment variable in the function. This makes this function more portable to deploy in different environments and stages. The function’s AWS Identity and Access Management (IAM) policy grants permissions to publish messages to only this SNS topic:

  AntiCorruptionFunction:
    Type: AWS::Serverless::
    Properties:
      CodeUri: anti-corruption-service/
      Handler: app.lambda_handler
      Runtime: python3.7
      MemorySize: 256
      Environment:
        Variables:
          TOPIC_ARN: !Ref JobEventsTopic
      Policies:
        - SNSPublishMessagePolicy
            TopicName: !GetAtt JobEventsTopic.TopicName
      Events:
        Trigger:
          Type: 
          Properties:
            Schedule: 'rate(1 minute)'

The anti-corruption function uses features in the SNS publish API, which allows you to define a “MessageDeduplicationId” and a “MessageGroupId”. The “MessageDeduplicationId” is used to filter out duplicate messages, which are sent to SNS FIFO within in 5-minute deduplication interval. The “MessageGroupId” is required, as SNS FIFO processes all job events for the same message group in a strictly ordered manner, isolated from other message groups processed through the same topic.

Another important aspect in this implementation is the use of “MessageAttributes”. We define a message attribute with the name “eventType” and values like “JobCreated”, “JobSalaryUpdated”, and “JobDeleted”. This allows subscribers to define SNS filter policies to only receive certain events they are interested in:

import boto3
from datetime import datetime
import json
import os
import random
import uuid

TOPIC_ARN = os.environ['TOPIC_ARN']

sns = boto3.client('sns')

def lambda_handler(event, context):
    jobId = str(random.randrange(0, 1000))

    send_job_created_event(jobId)
    send_job_updated_event(jobId)
    send_job_deleted_event(jobId)
    return

def send_job_created_event(jobId):
    messageId = str(uuid.uuid4())

    response = sns.publish(
        TopicArn=TOPIC_ARN,
        Subject=f'Job {jobId} created',
        MessageDeduplicationId=messageId,
        MessageGroupId=f'JOB-{jobId}',
        Message={...},
        MessageAttributes = {
            'eventType': {
                'DataType': 'String',
                'StringValue': 'JobCreated'
            }
        }
    )
    print('sent message and received response: {}'.format(response))
    return

def send_job_updated_event(jobId):
    messageId = str(uuid.uuid4())

    response = sns.publish(...)
    print('sent message and received response: {}'.format(response))
    return

def send_job_deleted_event(jobId):
    messageId = str(uuid.uuid4())

    response = sns.publish(...)
    print('sent message and received response: {}'.format(response))
    return

Analytics service

The analytics service owns an SQS FIFO “AnalyticsJobEvents.fifo” queue which is subscribed to the SNS FIFO “JobEvents.fifo” topic. Following best practices, I define redrive policies for the SQS FIFO queue and the SNS FIFO subscription in the template:

  AnalyticsJobEventsQueue:
    Type: AWS::SQS::Queue
    Properties:
      QueueName: AnalyticsJobEvents.fifo
      FifoQueue: true
      RedrivePolicy:
        deadLetterTargetArn: !GetAtt AnalyticsJobEventsQueueDLQ.Arn
        maxReceiveCount: 3

  AnalyticsJobEventsQueueToJobEventsTopicSubscription:
    Type: AWS::SNS::Subscription
    Properties:
      Endpoint: !GetAtt AnalyticsJobEventsQueue.Arn
      Protocol: sqs
      RawMessageDelivery: true
      TopicArn: !Ref JobEventsTopic
      RedrivePolicy: !Sub '{"deadLetterTargetArn": "${AnalyticsJobEventsSubscriptionDLQ.Arn}"}'

The analytics function uses SQS FIFO as an event source for Lambda. The S3 bucket name is an environment variable for the function, which increases the code portability across environments and stages. The IAM policy for this function only grants permissions write objects to this S3 bucket:

  AnalyticsFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: analytics-service/
      Handler: app.lambda_handler
      Runtime: python3.7
      MemorySize: 256
      Environment:
        Variables:
          BUCKET_NAME: !Ref AnalyticsBucket
      Policies:
        - S3WritePolicy:
            BucketName: !Ref AnalyticsBucket
      Events:
        Trigger:
          Type: SQS
          Properties:
            Queue: !GetAtt AnalyticsJobEventsQueue.Arn
            BatchSize: 10

View the function implementation at the GitHub repo.

Inventory service

The inventory service also owns an SQS FIFO “InventoryJobEvents.fifo” queue which is subscribed to the SNS FIFO “JobEvents.fifo” topic. It uses redrive policies for the SQS FIFO queue and the SNS FIFO subscription as well. This service is only interested in certain events, so uses an SNS filter policy to specify these events:

  InventoryJobEventsQueue:
    Type: AWS::SQS::Queue
    Properties:
      QueueName: InventoryJobEvents.fifo
      FifoQueue: true
      RedrivePolicy:
        deadLetterTargetArn: !GetAtt InventoryJobEventsQueueDLQ.Arn
        maxReceiveCount: 3

  InventoryJobEventsQueueToJobEventsTopicSubscription:
    Type: AWS::SNS::Subscription
    Properties:
      Endpoint: !GetAtt InventoryJobEventsQueue.Arn
      Protocol: sqs
      RawMessageDelivery: true
      TopicArn: !Ref JobEventsTopic
      FilterPolicy: '{"eventType":["JobCreated", "JobDeleted"]}'
      RedrivePolicy: !Sub '{"deadLetterTargetArn": "${InventoryJobEventsQueueSubscriptionDLQ.Arn}"}'

The inventory function also uses SQS FIFO as event source for Lambda. The DynamoDB table name is set as an environment variable, so the function can look up the name during initialization. The IAM policy grants read/write permissions for only this table:

  InventoryFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: inventory-service/
      Handler: app.lambda_handler
      Runtime: python3.7
      MemorySize: 256
      Environment:
        Variables:
          TABLE_NAME: !Ref InventoryTable
      Policies:
        - DynamoDBCrudPolicy:
            TableName: !Ref InventoryTable
      Events:
        Trigger:
          Type: SQS
          Properties:
            Queue: !GetAtt InventoryJobEventsQueue.Arn
            BatchSize: 10

View the function implementation at the GitHub repo.

Conclusion

Amazon SNS FIFO topics can simplify the design of event-driven architectures and reduce custom code in building such applications.

By using the native integration with Amazon SQS FIFO queues, you can also build architectures that fan out to thousands of subscribers. This pattern helps achieve data consistency, deduplication, filtering, and encryption in near real time, using managed services.

For information on regional availability and service quotas, see SNS endpoints and quotas and SQS endpoints and quotas. For more information on the FIFO functionality, see SNS FIFO and SQS FIFO in their Developer Guides.

Optimizing the cost of serverless web applications

2020-10-19 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/optimizing-the-cost-of-serverless-web-applications/

Web application backends are one of the most frequent types of serverless use-case for customers. The pay-for-value model can make it cost-efficient to build web applications using serverless tools.

While serverless cost is generally correlated with level of usage, there are architectural decisions that impact cost efficiency. The impact of these choices is more significant as your traffic grows, so it’s important to consider the cost-effectiveness of different designs and patterns.

This blog post reviews some common areas in web applications where you may be able to optimize cost. It uses the Happy Path web application as a reference example, which you can read about in the introductory blog post.

Serverless web applications generally use a combination of the services in the following diagram. I cover each of these areas to highlight common areas for cost optimization.

The API management layer: Selecting the right API type

Most serverless web applications use an API between the frontend client and the backend architecture. Amazon API Gateway is a common choice since it is a fully managed service that scales automatically. There are three types of API offered by the service – REST APIs, WebSocket APIs, and the more recent HTTP APIs.

HTTP APIs offer many of the features in the REST APIs service, but the cost is often around 70% less. It supports Lambda service integration, JWT authorization, CORS, and custom domain names. It also has a simpler deployment model than REST APIs. This feature set tends to work well for web applications, many of which mainly use these capabilities. Additionally, HTTP APIs will gain feature parity with REST APIs over time.

The Happy Path application is designed for 100,000 monthly active users. It uses HTTP APIs, and you can inspect the backend/template.yaml to see how to define these in the AWS Serverless Application Model (AWS SAM). If you have existing AWS SAM templates that are using REST APIs, in many cases you can change these easily:

Content distribution layer: Optimizing assets

Amazon CloudFront is a content delivery network (CDN). It enables you to distribute content globally across 216 Points of Presence without deploying or managing any infrastructure. It reduces latency for users who are geographically dispersed and can also reduce load on other parts of your service.

A typical web application uses CDNs in a couple of different ways. First, there is the distribution of the application itself. For single-page application frameworks like React or Vue.js, the build processes create static assets that are ideal for serving over a CDN.

However, these builds may not be optimized and can be larger than necessary. Many frameworks offer optimization plugins, and the JavaScript community frequently uses Webpack to bundle modules and shrink deployment packages. Similarly, any media assets used in the application build should be optimized. You can use tools like Lighthouse to analyze your web apps to find images that can be resized or compressed.

The second common CDN use-case for web apps is for user-generated content (UGC). Many apps allow users to upload images, which are then shared with other users. A typical photo from a 12-megapixel smartphone is 3–9 MB in size. This high resolution is not necessary when photos are rendered within web apps. Displaying the high-resolution asset results in slower download performance and higher data transfer costs.

The Happy Path application uses a Resizer Lambda function to optimize these uploaded assets. This process creates two different optimized images depending upon which component loads the asset.

The upload S3 bucket shows the original size of the upload from the smartphone:

The distribution S3 bucket contains the two optimized images at different sizes:

The distribution file sizes are 98–99% smaller. For a busy web application, using optimized image assets can make a significant difference to data transfer and CloudFront costs.

Additionally, you can convert to highly optimized file formats such as WebP to reduce file size even further. Not all browsers support this format, but you can use CSS on the frontend to fall back to other types if needed:

<img src="myImage.webp" onerror="this.onerror=null; this.src='myImage.jpg'">

The data layer

AWS offers many different database and storage options that can be useful for web applications. Billing models vary by service and Region. By understanding the data access and storage requirements of your app, you can make informed decisions about the right service to use.

Generally, it’s more cost-effective to store binary data in S3 than a database. First, when the data is uploaded, you can upload directly to S3 with presigned URLs instead of proxying data via API Gateway or another service.

If you are using Amazon DynamoDB, it’s best practice to store larger items in S3 and include a reference token in a table item. Part of DynamoDB pricing is based on read capacity units (RCUs). For binary items such as images, it is usually more cost-efficient to use S3 for storage.

Many web developers who are new to serverless are familiar with using a relational database, so choose Amazon RDS for their database needs. Depending upon your use-case and data access patterns, it may be more cost effective to use DynamoDB instead. RDS is not a serverless service so there are monthly charges for the underlying compute instance. DynamoDB pricing is based upon usage and storage, so for many web apps may be a lower-cost choice.

Integration layer

This layer includes services like Amazon SQS, Amazon SNS, and Amazon EventBridge, which are essential for decoupling serverless applications. Each of these have a request-based pricing component, where 64 KB of a payload is billed as one request. For example, a single SQS message with a 256 KB payload is billed as four requests. There are two optimization methods common for web applications.

1. Combine messages

Many messages sent to these services are much smaller than 64 KB. In some applications, the publishing service can combine multiple messages to reduce the total number of publish actions to SNS. Additionally, by either eliminating unused attributes in the message or compressing the message, you can store more data in a single request.

For example, a publishing service may be able to combine multiple messages together in a single publish action to an SNS topic:

Before optimization, a publishing service sends 100,000,000 1KB-messages to an SNS topic. This is charged as 100 million messages for a total cost of $50.00.
After optimization, the publishing service combines messages to send 1,562,500 64KB-messages to an SNS topic. This is charged as 1,562,500 messages for a total cost of $0.78.

2. Filter messages

In many applications, not every message is useful for a consuming service. For example, an SNS topic may publish to a Lambda function, which checks the content and discards the message based on some criteria. In this case, it’s more cost effective to use the native filtering capabilities of SNS. The service can filter messages and only invoke the Lambda function if the criteria is met. This lowers the compute cost by only invoking Lambda when necessary.

For example, an SNS topic receives messages about customer orders and forwards these to a Lambda function subscriber. The function is only interested in canceled orders and discards all other messages:

Before optimization, the SNS topic sends all messages to a Lambda function. It evaluates the message for the presence of an order canceled attribute. On average, only 25% of the messages are processed further. While SNS does not charge for delivery to Lambda functions, you are charged each time the Lambda service is invoked, for 100% of the messages.
After optimization, using an SNS subscription filter policy, the SNS subscription filters for canceled orders and only forwards matching messages. Since the Lambda function is only invoked for 25% of the messages, this may reduce the total compute cost by up to 75%.

3. Choose a different messaging service

For complex filtering options based upon matching patterns, you can use EventBridge. The service can filter messages based upon prefix matching, numeric matching, and other patterns, combining several rules into a single filter. You can create branching logic within the EventBridge rule to invoke downstream targets.

EventBridge offers a broader range of targets than SNS destinations. In cases where you publish from an SNS topic to a Lambda function to invoke an EventBridge target, you could use EventBridge instead and eliminate the Lambda invocation. For example, instead of routing from SNS to Lambda to AWS Step Functions, instead create an EventBridge rule that routes events directly to a state machine.

Business logic layer

Step Functions allows you to orchestrate complex workflows in serverless applications while eliminating common boilerplate code. The Standard Workflow service charges per state transition. Express Workflows were introduced in December 2019, with pricing based on requests and duration, instead of transitions.

For workloads that are processing large numbers of events in shorter durations, Express Workflows can be more cost-effective. This is designed for high-volume event workloads, such as streaming data processing or IoT data ingestion. For these cases, compare the cost of the two workflow types to see if you can reduce cost by switching across.

Lambda is the on-demand compute layer in serverless applications, which is billed by requests and GB-seconds. GB-seconds is calculated by multiplying duration in seconds by memory allocated to the function. For a function with a 1-second duration, invoked 1 million times, here is how memory allocation affects the total cost in the US East (N. Virginia) Region:

Memory (MB)	GB/S	Compute cost	Total cost
128	125,000	$ 2.08	$ 2.28
512	500,000	$ 8.34	$ 8.54
1024	1,000,000	$ 16.67	$ 16.87
1536	1,500,000	$ 25.01	$ 25.21
2048	2,000,000	$ 33.34	$ 33.54
3008	2,937,500	$ 48.97	$ 49.17

There are many ways to optimize Lambda functions, but one of the most important choices is memory allocation. You can choose between 128 MB and 3008 MB, but this also impacts the amount of virtual CPU as memory increases. Since total cost is a combination of memory and duration, choosing more memory can often reduce duration and lower overall cost.

Instead of manually setting the memory for a Lambda function and running executions to compare duration, you can use the AWS Lambda Power Tuning tool. This uses Step Functions to run your function against varying memory configurations. It can produce a visualization to find the optimal memory setting, based upon cost or execution time.

Conclusion

Web application backends are one of the most popular workload types for serverless applications. The pay-per-value model works well for this type of workload. As traffic grows, it’s important to consider the design choices and service configurations used to optimize your cost.

Serverless web applications generally use a common range of services, which you can logically split into different layers. This post examines each layer and suggests common cost optimizations helpful for web app developers.

To learn more about building web apps with serverless, see the Happy Path series. For more serverless learning resources, visit https://serverlessland.com.

ICYMI: Serverless Q3 2020

2020-10-12 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/icymi-serverless-q3-2020/

Welcome to the 11th edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all of the most recent product launches, feature enhancements, blog posts, webinars, Twitch live streams, and other interesting things that you might have missed!

In case you missed our last ICYMI, checkout what happened last quarter here.

AWS Lambda

In August, we launched support for using Amazon Managed Streaming for Apache Kafka (Amazon MSK) as an event source for Lambda functions. Lambda has existing support for processing streams from Kinesis and DynamoDB. Now you can process data streams from Amazon MSK and easily integrate with downstream serverless workflows. This integration allows you to process batches of records, one per partition at a time, and scale concurrency by increasing the number of partitions in a topic.

We also announced support for Java 8 (Corretto) in Lambda, and you can now use Amazon Linux 2 for custom runtimes. Amazon Linux 2 is the latest generation of Amazon Linux and provides an application environment with access to the latest innovations in the Linux ecosystem.

Amazon API Gateway

API Gateway continued to launch new features for HTTP APIs, including new integrations for five AWS services. HTTP APIs can now route requests to AWS AppConfig, Amazon EventBridge, Amazon Kinesis Data Streams, Amazon SQS, and AWS Step Functions. This makes it easy to create webhooks for business logic hosted in these services. The service also expanded the authorization capabilities, adding Lambda and IAM authorizers, and enabled wildcards in custom domain names. Over time, we will continue to improve and migrate features from REST APIs to HTTP APIs.

In September, we launched mutual TLS for both regional REST APIs and HTTP APIs. This is a new method for client-to-server authentication to enhance the security of your API. It can protect your data from exploits such as client spoofing or man-in-the-middle. This enforces two-way TLS (or mTLS) which enables certificate-based authentication both ways from client-to-server and server-to-client.

Enhanced observability variables now make it easier to troubleshoot each phase of an API request. Each phase from AWS WAF through to integration adds latency to a request, returns a status code, or raises an error. Developers can use these variables to identify the cause of latency within the API request. You can configure these variables in AWS SAM templates – see the demo application to see how you can use these variables in your own application.

AWS Step Functions

We added X-Ray tracing support for Step Functions workflows, giving you full visibility across state machine executions, making it easier to analyze and debug distributed applications. Using the service map view, you can visually identify errors in resources and view error rates across workflow executions. You can then drill into the root cause of an error. You can enable X-Ray in existing workflows by a single-click in the console. Additionally, you can now also visualize Step Functions workflows directly in the Lambda console. To see this new feature, open the Step Functions state machines page in the Lambda console.

Step Functions also increased the payload size to 256 KB and added support for string manipulation, new comparison operators, and improved output processing. These updates were made to the Amazon States Languages (ASL), which is a JSON-based language for defining state machines. The new operators include comparison operators, detecting the existence of a field, wildcarding, and comparing two input fields.

AWS Serverless Application Model (AWS SAM)

AWS SAM is an open source framework for building serverless applications that converts a shorthand syntax into CloudFormation resources.

In July, the AWS SAM CLI became generally available (GA). This tool operates on SAM templates and provides developers with local tooling for building serverless applications. The AWS SAM CLI offers a rich set of tools that enable developers to build serverless applications quickly.

AWS X-Ray

X-Ray launched a public preview of X-Ray Insights, which can help produce actionable insights for anomalies within your applications. Designed to make it easier to analyze and debug distributed applications, it can proactively identify issues caused by increases in faults. Using the incident timeline, you can visualize when the issue started and how it developed. The service identifies a probable root cause along with any anomalous services. There is no additional instrumentation needed to use X-Ray Insights – you can enable this feature within X-Ray Groups.

Amazon Kinesis

In July, Kinesis announced support for data delivery to generic HTTP endpoints, and service providers like Datadog, New Relic, MongoDB, and Splunk. Use the Amazon Kinesis console to configure your data producers to send data to Amazon Kinesis Data Firehose and specify one of these new delivery targets. Additionally, Amazon Kinesis Data Firehose is now available in the Europe (Milan) and Africa (Cape Town) AWS Regions.

Serverless Posts

Our team is always working to build and write content to help our customers better understand all our serverless offerings. Here is a list of the latest posts published to the AWS Compute Blog this quarter.

Tech Talks & Events

We hold several AWS Online Tech Talks covering serverless tech talks throughout the year, so look out for them in the Serverless section of the AWS Online Tech Talks page. We also regularly deliver talks at conferences and events around the globe, regularly join in on podcasts, and record short videos you can find to learn in quick byte sized chunks.

Here are some from Q3:

Learning Paths

Learn How to Build and Deploy a Web App Backend that Supports Authentication, Geohashing, and Real-Time Messaging

Ask Around Me is an example web app that shows how to build authenticaton, geohashing and real-time messaging into your serverless applications. This learning path includes videos and learning resources to help walk you through the application.

Build a Serverless Web App for a Theme Park

This five-video learning path walks you through the Innovator Island workshop, and provides learning resources for building realtime serverless web applications.

Live streams

July

August

September

There are also a number of other helpful video series covering serverless available on the Serverless Land YouTube channel.

New AWS Serverless Heroes

We’re pleased to welcome Angela Timofte, Luca Bianchi, Matthieu Napoli, Peter Hanssens, Sheen Brisals, and Tom McLaughlin to the growing list of AWS Serverless Heroes.

The AWS Hero program is a selection of worldwide experts that have been recognized for their positive impact within the community. They share helpful knowledge and organize events and user groups. They’re also contributors to numerous open-source projects in and around serverless technologies.

New! The Serverless Land website

To help developers find serverless learning resources, we have curated a list of serverless blogs, videos, events and training programs at a new site, Serverless Land. This is regularly updated with new information – you can subscribe to the RSS feed for automatic updates, follow the LinkedIn page or subscribe to the YouTube channel.

Still looking for more?

The Serverless landing page has lots of information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials.

You can also follow all of us on Twitter to see the latest news, follow conversations, and interact with the team.

Chris Munns: @chrismunns
Eric Johnson: @edjgeek
James Beswick: @jbesw
Ben Smith: @benjamin_l_s
Rob Sutter: @rts_rob
Julian Wood: @julian_wood

Building resilient serverless patterns by combining messaging services

2020-10-05 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/building-resilient-no-code-serverless-patterns-by-combining-messaging-services/

In “Choosing between messaging services for serverless applications”, I explain the features and differences between the core AWS messaging services. Amazon SQS, Amazon SNS, and Amazon EventBridge provide queues, publish/subscribe, and event bus functionality for your applications. Individually, these are robust, scalable services that are fundamental building blocks of serverless architectures.

However, you can also combine these services to solve specific challenges in distributed architectures. By doing this, you can use specific features of each service to build sophisticated patterns with little code. These combinations can make your applications more resilient and scalable, and reduce the amount of custom logic and architecture in your workload.

In this blog post, I highlight several important patterns for serverless developers. I also show how you use and deploy these integrations with the AWS Serverless Application Model (AWS SAM).

Examples in this post refer to code that can be downloaded from this GitHub repo. The README.md file explains how to deploy and run each example.

SNS to SQS: Adding resilience and throttling to message throughput

SNS has a robust retry policy that results in up to 100,010 delivery attempts over 23 days. If a downstream service is unavailable, it may be overwhelmed by retries when it comes back online. You can solve this issue by adding an SQS queue.

Adding an SQS queue between the SNS topic and its subscriber has two benefits. First, it adds resilience to message delivery, since the messages are durably stored in a queue. Second, it throttles the rate of messages to the consumer, helping smooth out traffic bursts caused by the service catching up with missed messages.

To build this in an AWS SAM template, you first define the two resources, and the SNS subscription:

  MySqsQueue:
    Type: AWS::SQS::Queue

  MySnsTopic:
    Type: AWS::SNS::Topic
    Properties:
      Subscription:
        - Protocol: sqs
          Endpoint: !GetAtt MySqsQueue.Arn

Finally, you provide permission to the SNS topic to publish to the queue, using the AWS::SQS::QueuePolicy resource:

  SnsToSqsPolicy:
    Type: AWS::SQS::QueuePolicy
    Properties:
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Sid: "Allow SNS publish to SQS"
            Effect: Allow
            Principal: "*"
            Resource: !GetAtt MySqsQueue.Arn
            Action: SQS:SendMessage
            Condition:
              ArnEquals:
                aws:SourceArn: !Ref MySnsTopic
      Queues:
        - Ref: MySqsQueue

To test this, you can publish a message to the SNS topic and then inspect the SQS queue length using the AWS CLI:

aws sns publish --topic-arn "arn:aws:sns:us-east-1:123456789012:sns-sqs-MySnsTopic-ABC123ABC" --message "Test message"
aws sqs get-queue-attributes --queue-url "https://sqs.us-east-1.amazonaws.com/123456789012/sns-sqs-MySqsQueue- ABC123ABC " --attribute-names ApproximateNumberOfMessages

This results in the following output:

Another usage of this pattern is when you want to filter messages in architectures using an SQS queue. By placing the SNS topic in front of the queue, you can use the message filtering capabilities of SNS. This ensures that only the messages you need are published to the queue. To use message filtering in AWS SAM, use the AWS:SNS:Subcription resource:

  QueueSubcription:
    Type: 'AWS::SNS::Subscription'
    Properties:
      TopicArn: !Ref MySnsTopic
      Endpoint: !GetAtt MySqsQueue.Arn
      Protocol: sqs
      FilterPolicy:
        type:
        - orders
        - payments 
      RawMessageDelivery: 'true'

EventBridge to SNS: combining features of both services

Both SNS and EventBridge have different characteristics in terms of targets, and integration with broader features. This table compares the major differences between the two services:

	Amazon SNS	Amazon EventBridge
Number of targets	10 million (soft)	5
Limits	100,000 topics. 12,500,000 subscriptions per topic.	100 event buses. 300 rules per event bus.
Input transformation	No	Yes – see details.
Message filtering	Yes – see details.	Yes, including IP address matching – see details.
Format	Raw or JSON	JSON
Receive events from AWS CloudTrail	No	Yes
Targets	HTTP(S), SMS, SNS Mobile Push, Email/Email-JSON, SQS, Lambda functions	15 targets including AWS Lambda, Amazon SQS, Amazon SNS, AWS Step Functions, Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose.
SaaS integration	No	Yes – see integration partners.
Schema Registry integration	No	Yes – see details.
Dead-letter queues supported	Yes	No
Public visibility	Can create public topics	Cannot create public buses
Cross-Region	You can subscribe your AWS Lambda functions to an Amazon SNS topic in any Region.	Targets must be same Region. You can publish across Region to another event bus.

In this pattern, you configure an SNS topic as a target of an EventBridge rule:

In the AWS SAM template, you declare the resources in the preceding diagram as follows:

Resources:
  MySnsTopic:
    Type: AWS::SNS::Topic

  EventRule: 
    Type: AWS::Events::Rule
    Properties: 
      Description: "EventRule"
      EventPattern: 
        account: 
          - !Sub '${AWS::AccountId}'
        source:
          - "demo.cli"
      Targets: 
        - Arn: !Ref MySnsTopic
          Id: "SNStopic"

The default bus already exists in every AWS account, so there is no need to declare it. For the event bus to publish matching events to the SNS topic, you define permissions using the AWS::SNS::TopicPolicy resource:

  EventBridgeToToSnsPolicy:
    Type: AWS::SNS::TopicPolicy
    Properties: 
      PolicyDocument:
        Statement:
        - Effect: Allow
          Principal:
            Service: events.amazonaws.com
          Action: sns:Publish
          Resource: !Ref MySnsTopic
      Topics: 
        - !Ref MySnsTopic

EventBridge has a limit of five targets per rule. In cases where you must send events to hundreds or thousands of targets, publishing to SNS first and then subscribing those targets to the topic works around this limit. Both services have different targets, and this pattern allows you to deliver EventBridge events to SMS, HTTP(s), email and SNS mobile push.

You can transform and filter the message using these services, often without needing an AWS Lambda function. SNS does not support input transformation but you can do this in an EventBridge rule. Message filtering is possible in both services but EventBridge provides richer content filtering capabilities.

AWS CloudTrail can log and monitor activity across services in your AWS account. It can be a useful source for events, allowing you to respond dynamically to objects in Amazon S3 or react to changes in your environment, for example. This natively integrates with EventBridge, allowing you to ingest events at scale from dozens of services.

Using EventBridge enables you to source events from outside your AWS account, offering integrations with a list of software as a service (SaaS) providers. This capability allows you to receive events from your accounts with SaaS providers like Zendesk, PagerDuty, and Auth0. These events are delivered to a partner event bus in your account, and can then be filtered and routed to an SNS topic.

Additionally, this pattern allows you to deliver events to Lambda functions in other AWS accounts and in other AWS Regions. You can invoke Lambda from SNS topics in other Regions and accounts. It’s also possible to make SNS topics publicly read-only, making them extensible endpoints that other third parties can consume from. SNS has comprehensive access control, which you can incorporate into this pattern.

EventBridge to SQS: Building fault-tolerant microservices

EventBridge can route events to targets such as microservices. In the case of downstream failures, the service retries events for up to 24 hours. For workloads where you need a longer period of time to store and retry messages, you can deliver the events to an SQS queue in each microservice. This durably stores those events until the downstream service recovers. Additionally, this pattern protects the microservice from large bursts of traffic by throttling the delivery of messages.

The resources declared in the AWS SAM template are similar to the previous examples, but it uses the AWS::SQS::QueuePolicy resource to grant the appropriate permission to EventBridge:

  EventBridgeToToSqsPolicy:
    Type: AWS::SQS::QueuePolicy
    Properties:
      PolicyDocument:
        Statement:
        - Effect: Allow
          Principal:
            Service: events.amazonaws.com
          Action: SQS:SendMessage
          Resource:  !GetAtt MySqsQueue.Arn
      Queues:
        - Ref: MySqsQueue

Conclusion

You can combine these services in your architectures to implement patterns that solve complex challenges, often with little code required. This blog post shows three examples that implement message throttling and queueing, integrating SNS and EventBridge, and building fault tolerant microservices.

To learn more building decoupled architectures, see this Learning Path series on EventBridge. For more serverless learning resources, visit https://serverlessland.com.

Pay as you go machine learning inference with AWS Lambda

2020-10-01 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/pay-as-you-go-machine-learning-inference-with-aws-lambda/

This post is courtesy of Eitan Sela, Senior Startup Solutions Architect.

Many customers want to deploy machine learning models for real-time inference, and pay only for what they use. Using Amazon EC2 instances for real-time inference may not be cost effective to support sporadic inference requests throughout the day.

AWS Lambda is a serverless compute service with pay-per-use billing. However, ML frameworks like XGBoost are too large to fit into the 250 MB application artifact size limit, or the 512 MB /tmp space limit. While you can store the packages in Amazon S3 and download to Lambda (up to 3 GB), this can increase the cost.

To address this, Lambda functions can now mount an Amazon Elastic File System (EFS). This is a scalable and elastic NFS file system storing data within and across multiple Availability Zones (AZ) for high availability and durability.

With this new capability, it’s now easier to use Python packages in Lambda that require storage space to load models and other dependencies.

In this blog post, I walk through how to:

Create an EFS file system and an Access Point as an application-specific entry point.
Provision an EC2 instance, mount EFS using the Access Point, and train a breast cancer XGBoost ML model. XGBoost, Python packages, and the model are saved on the EFS file system.
Create a Lambda function that loads the Python packages and model from EFS, and performs the prediction based on a test event.

Create an Amazon EFS file system with an Access Point

Configuring EFS for Lambda is straight-forward. I show how to do this in the AWS CloudFormation but you can also use the AWS CLI, AWS SDK, and AWS Serverless Application Model (AWS SAM).

EFS file systems are created within a customer VPC, so Lambda functions using the EFS file system must have access to the same VPC.

You can deploy the AWS CloudFormation stack located on this GitHub repository.

The stack includes the following:

Create a VPC with public subnet.
Create an EFS file system
Create an EFS Access Point
Create an EC2 in the VPC

It can take up to 10 minutes for the CloudFormation stack to create the resources. After the resource creation is complete, navigate to the EFS console to see the new file system.

Navigate to the Access Points panel to see a new Access Point with the File system ID from the previous page.

Note the Access Point ID and File System ID for the following sections.

Launch an Amazon EC2 instance to train a breast cancer model

In this section, you install Python packages on the EFS file system, after mounting it to EC2. You then train the breast cancer model, and save the model in the EFS file system used by the Lambda function.

The machine learning framework you use for this function is XGBoost. This is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable. XGBoost is one of the most popular machine learning algorithms.

Navigate to the EC2 console to see the new EC2 instance created from the CloudFormation stack. This is an Amazon Linux 2 c5.large EC2 instance named ‘xgboost-for-serverless-inference-cfn-ec2’. In the instance details, you see that the security group is configured to allow inbound SSH access (for connecting to the instance).

Mount the EFS file system on the EC2

Connect to the instance using SSH and mount the EFS file system previously created by using the Access Point:

Install amazon-efs-utils tools:
sudo yum -y install amazon-efs-utils
Create a directory to mount EFS into:
mkdir efs
Mount the EFS file system using the Access Point:
sudo mount -t efs -o tls,accesspoint=<Access point ID> <File system ID>:/ efs

Install Python, pip and required packages

Install Python and pip:
sudo yum -y install python37
curl -O https://bootstrap.pypa.io/get-pip.py
python3 get-pip.py --user
Verify the installation:
python3 --version
pip3 --version
Create a requirements.txt file containing the dependencies:
xgboost==1.1.1
pandas
sklearn
joblib
Install the Python packages using the requirements file:
pip3 install -t efs/lib/ -r requirements.txt
Note: using bursting throughput mode with EFS File system, this action can take up to 10 minutes.
Set the Python path to refer to the installed packages directory of EFS file system:
export PYTHONPATH=/home/ec2-user/efs/lib/

Train the breast cancer model

The breast cancer model predicts whether the breast mass is a malignant tumor or benign by looking at features computed from a digitized image of a fine needle aspirate of a breast mass.

The data used to train the model consists of the diagnosis in addition to the 10 real-valued features that are computed for each cell nucleus. Such features include radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry, and fractal dimension. The prediction returned by the model is either “B” for benign or “M” for malignant. This sample project uses the public Breast Cancer Wisconsin (Diagnostic) dataset.

After installing the required Python packages, train a XGBoost model on the breast cancer dataset:

Create a bc_xgboost_train.py file containing the Python code needed to train a breast cancer XGBoost model. Download the code here.
Start the training of the model:python3 bc_xgboost_train.pyYou see the following message:The model file bc-xgboost-model is created in the root directory.
Create a new directory on the EFS file system and copy the XGBoost breast cancer model:
mkdir efs/model
cp bc-xgboost-model efs/model/
Check you have the required Python packages and the model on the EFS file system:
ls efs/model/ efs/lib/
You see all the Python packages installed previously in the lib directory, and the model file in the model directory.
Review the total size of lib Python packages directory:
du -sh efs/lib/

You can see that the total size of lib directory is 534 MB. This is a larger package size than was allowed before EFS for Lambda.

Building a serverless machine learning inference using Lambda

In this section, you use the EFS file system previously configured for the Lambda function to import the required libraries and load the model.

Using EFS with Lambda

The AWS SAM template creates the Lambda function, mount the EFS Access Point created earlier, and both IAM roles required.

It takes several minutes for the AWS SAM CLI to create the Lambda function. After, navigate to the Lambda console to see the created Lambda function.

In the Lambda function configuration, you see the environment variables, and basic settings, such as runtime, memory, and timeout.

Further down, you see that the Lambda function has the VPC access configured, and the file system is mounted.

Test your Lambda function

In the Lambda console, select Configure test events from the Test events dropdown.
For Event Name, enter InferenceTestEvent.
Copy the event JSON from here and paste in the dialog box.
Choose Create. After saving, you see InferenceTestEvent in the Test list. Now choose Test.

You see the Lambda function inference result, log output, and duration:

Conclusion

In this blog post, you train an XGBoost breast cancer model using Python packages installed on an Amazon EFS file system. You create an AWS Lambda function that loads the Python packages and the model from EFS file system, and perform the predictions.

Now you know how to call a machine learning model inference using a Lambda function. To learn more about other real-world examples, see:

Using AWS Lambda as a consumer for Amazon Kinesis

2020-09-30 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/using-aws-lambda-as-a-consumer-for-amazon-kinesis/

This post is courtesy of Prateek Mehrotra, Software Development Engineer.

AWS Lambda integrates natively with Amazon Kinesis as a consumer to process data ingested through a data stream. The polling, checkpointing, and error handling complexities are abstracted when you use this native integration. This allows the Lambda function code to focus on business logic processing.

This blog post describes how to operate and optimize this integration at high throughput with low system overhead time and processing latencies.

To learn more about Kinesis concepts and terminology, visit the documentation page.

Overview

You can attach a Lambda function to a Kinesis stream to process data. Multiple Lambda functions can consume from a single Kinesis stream for different kinds of processing independently. These can be used alongside other consumers such as Amazon Kinesis Data Firehose.

If a Kinesis stream has ‘n’ shards, then at least ‘n’ concurrency is required for a consuming Lambda function to process data without any induced delay. Less than ‘n’ available concurrency results in elevated iterator age in the Kinesis stream and elevated iterator age in the Lambda consumer. In a multi-consumer paradigm, if the Kinesis iterator age spikes then at least one of the stream consumers also reports a corresponding iterator age spike.

When the parallelization factor is greater than 1 for a Lambda consumer, the record processor polls up-to ‘parallelization-factor’ partition keys at a time while processing from a single shard. To learn more, read about handling traffic with a parallelization factor.

Kinesis shard level metrics

When using Kinesis streams, it’s best practice to enable enhanced shard level metrics. These metrics can help in detecting if the data distribution is happening uniformly within the shards of the stream, or not.

In a single-source, multiple-consumer use case, enhanced shard level metrics can help identify the cause of elevated iterator age. This could be due to a single shard receiving data too quickly, or at least one of the consumers failing to process the data.

To learn more about Kinesis monitoring, visit the documentation page. If per-partition processing is not a requirement, distribute data uniformly across shards. To learn more about Kinesis partition keys, visit the documentation page.

Processing delay caused by consumer misconfiguration

Kinesis reports an iterator age metric. If this value spikes, data processing from the stream is delayed. The metric value is set by the earliest record read from the stream measured over the specified time period.

This delay slows the data processing of the pipeline. This happens when a single shard is receiving data faster than the consumer can process it or the consumer is failing to complete processing due to errors.

In a single-source, multiple-consumer use case, at least one of the consumers shows a corresponding iterator age spike. If there are multiple Lambda consumers of the same data stream, then each Lambda consumer will report its own iterator age metric. This helps identify the problematic consumer for further analysis.

Tuning the configuration to optimize for iterator age

There are several tuning options available when the iterator age is increasing for the consumer Lambda function.

1. Increase the batch size

If the Lambda function operates at a low maximum duration, a single invocation may process less than a maximum batch size. Increase the batch size (up to a maximum of 10,000) to read more records from a shard in a single batch. This can help normalize the iterator age.

2. Change the parallelization factor

Increasing the parallelization factor in the Lambda function allows concurrent invocations to read a single shard. Multiple batches of records are created in the shard based on partition keys, resulting in faster data consumption.

Iterator age can spike when the batch size is set to 10,000 and the parallelization factor is set to 10. This can happen when data is produced faster than the consumer can process it, backing up the per-shard/per-partition queues. To mitigate this, subdivide the partition into multiple keys. This helps distribute the data for that partition key more evenly across shards.

3. Reduce the batch window

If data is distributed unequally across shards, or there is low write volume from producers, the Lambda poller may wait for an entire batch. You can reduce this wait time by reducing the batch window, which results in faster processing.

To learn more about Lambda poller batch window for Kinesis, visit the documentation page.

4. De-scale the Kinesis stream if overprovisioned

If the Kinesis stream metrics indicate that the stream is over-provisioned, de-scaling the stream helps increase data compaction within shards. This results in better throughput per Lambda invocation.

After reducing stream size, reduce the Lambda concurrency to maintain a 1:1 ratio of shard count to Lambda concurrency mapping. As load increases, increase the parallelization factor the keep the shard size constant. With this increase, the Lambda concurrency should be at least shard count * parallelization factor.

To learn more, read about handling traffic with a parallelization factor.

5. Enable enhanced fan-out for consumers

Enhanced fan-out allows developers to scale up the number of stream consumers by offering each stream consumer its own read throughput.

To learn more about Kinesis enhanced fan-out, visit the documentation page.

Conclusion

This blog post shows some of the best practices when using Lambda with Kinesis. It covers operational levers for high-throughput, low latency, single source data processing pipelines.

The enhanced Amazon Kinesis shard level metrics help monitor the maximum overhead processing delay per shard. When correlated with the Lambda consumer’s iterator age metrics, this shows each consumer’s performance. The effective combination of batch size, parallelization factor, batch window, and partition key can lead to more efficient stream processing.

To learn more about Amazon Kinesis, visit the Getting Started page.

Choosing between messaging services for serverless applications

2020-09-28 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/choosing-between-messaging-services-for-serverless-applications/

Most serverless application architectures use a combination of different AWS services, microservices, and AWS Lambda functions. Messaging services are important in allowing distributed applications to communicate with each other, and are fundamental to most production serverless workloads.

Messaging services can improve the resilience, availability, and scalability of applications, when used appropriately. They can also enable your applications to communicate beyond your workload or even the AWS Cloud, and provide extensibility for future service features and versions.

In this blog post, I compare the primary messaging services offered by AWS and how you can use these in your serverless application architectures. I also show how you use and deploy these integrations with the AWS Serverless Application Model (AWS SAM).

Examples in this post refer to code that can be downloaded from this GitHub repository. The README.md file explains how to deploy and run each example.

Overview

Three of the most useful messaging patterns for serverless developers are queues, publish/subscribe, and event buses. In AWS, these are provided by Amazon SQS, Amazon SNS, and Amazon EventBridge respectively. All of these services are fully managed and highly available, so there is no infrastructure to manage. All three integrate with Lambda, allowing you to publish messages via the AWS SDK and invoke functions as targets. Each of these services has an important role to play in serverless architectures.

SNS enables you to send messages reliably between parts of your infrastructure. It uses a robust retry mechanism for when downstream targets are unavailable. When the delivery policy is exhausted, it can optionally send those messages to a dead-letter queue for further processing. SNS uses topics to logically separate messages into channels, and your Lambda functions interact with these topics.

SQS provides queues for your serverless applications. You can use a queue to send, store, and receive messages between different services in your workload. Queues are an important mechanism for providing fault tolerance in distributed systems, and help decouple different parts of your application. SQS scales elastically, and there is no limit to the number of messages per queue. The service durably persists messages until they are processed by a downstream consumer.

EventBridge is a serverless event bus service, simplifying routing events between AWS services, software as a service (SaaS) providers, and your own applications. It logically separates routing using event buses, and you implement the routing logic using rules. You can filter and transform incoming messages at the service level, and route events to multiple targets, including Lambda functions.

Integrating an SQS queue with AWS SAM

The first example shows an AWS SAM template defining a serverless application with two Lambda functions and an SQS queue:

You can declare an SQS queue in an AWS SAM template with the AWS::SQS::Queue resource:

  MySqsQueue:
    Type: AWS::SQS::Queue

To publish to the queue, the publisher function must have permission to send messages. Using an AWS SAM policy template, you can apply policy that enables send messaging to one specific queue:

      Policies:
        - SQSSendMessagePolicy:
            QueueName: !GetAtt MySqsQueue.QueueName

The AWS SAM template passes the queue name into the Lambda function as an environment variable. The function uses the sendMessage method of the AWS.SQS class to publish the message:

const AWS = require('aws-sdk')
AWS.config.region = process.env.AWS_REGION 
const sqs = new AWS.SQS({apiVersion: '2012-11-05'})

// The Lambda handler
exports.handler = async (event) => {
  // Params object for SQS
  const params = {
    MessageBody: `Message at ${Date()}`,
    QueueUrl: process.env.SQSqueueName
  }
  
  // Send to SQS
  const result = await sqs.sendMessage(params).promise()
  console.log(result)
}

When the SQS queue receives the message, it publishes to the consuming Lambda function. To configure this integration in AWS SAM, the consumer function is granted the SQSPollerPolicy policy. The function’s event source is set to receive messages from the queue in batches of 10:

  QueueConsumerFunction:
    Type: AWS::Serverless::Function 
    Properties:
      CodeUri: code/
      Handler: consumer.handler
      Runtime: nodejs12.x
      Timeout: 3
      MemorySize: 128
      Policies:  
        - SQSPollerPolicy:
            QueueName: !GetAtt MySqsQueue.QueueName
      Events:
        MySQSEvent:
          Type: SQS
          Properties:
            Queue: !GetAtt MySqsQueue.Arn
            BatchSize: 10

The payload for the consumer function is the message from SQS. This is an array of messages up to the batch size, containing a body attribute with the publishing function’s MessageBody. You can see this in the CloudWatch log for the function:

Integrating an SNS topic with AWS SAM

The second example shows an AWS SAM template defining a serverless application with three Lambda functions and an SNS topic:

You declare an SNS topic and the subscribing Lambda functions with the AWS::SNS:Topic resource:

  MySnsTopic:
    Type: AWS::SNS::Topic
    Properties:
      Subscription:
        - Protocol: lambda
          Endpoint: !GetAtt TopicConsumerFunction1.Arn    
        - Protocol: lambda
          Endpoint: !GetAtt TopicConsumerFunction2.Arn

You provide the SNS service with permission to invoke the Lambda functions but defining an AWS::Lambda::Permission for each:

  TopicConsumerFunction1Permission:
    Type: 'AWS::Lambda::Permission'
    Properties:
      Action: 'lambda:InvokeFunction'
      FunctionName: !Ref TopicConsumerFunction1
      Principal: sns.amazonaws.com

The SNSPublishMessagePolicy policy template grants permission to the publishing function to send messages to the topic. In the function, the publish method of the AWS.SNS class handles publishing:

const AWS = require('aws-sdk')
AWS.config.region = process.env.AWS_REGION 
const sns = new AWS.SNS({apiVersion: '2012-11-05'})

// The Lambda handler
exports.handler = async (event) => {
  // Params object for SNS
  const params = {
    Message: `Message at ${Date()}`,
    Subject: 'New message from publisher',
    TopicArn: process.env.SNStopic
  }
  
  // Send to SQS
  const result = await sns.publish(params).promise()
  console.log(result)
}

The payload for the consumer functions is the message from SNS. This is an array of messages, containing subject and message attributes from the publishing function. You can see this in the CloudWatch log for the function:

Differences between SQS and SNS configurations

SQS queues and SNS topics offer different functionality, though both can publish to downstream Lambda functions.

An SQS message is stored on the queue for up to 14 days until it is successfully processed by a subscriber. SNS does not retain messages so if there are no subscribers for a topic, the message is discarded.

SNS topics may broadcast to multiple targets. This behavior is called fan-out. It can be used to parallelize work across Lambda functions or send messages to multiple environments (such as test or development). An SNS topic can have up to 12,500,000 subscribers, providing highly scalable fan-out capabilities. The targets may include HTTP/S endpoints, SMS text messaging, SNS mobile push, email, SQS, and Lambda functions.

In AWS SAM templates, you can retrieve properties such as ARNs and names of queues and topics, using the following intrinsic functions:

	Amazon SQS	Amazon SNS
Channel type	Queue	Topic
Get ARN	!GetAtt MySqsQueue.Arn	!Ref MySnsTopic
Get name	!GetAtt MySqsQueue.QueueName	!GetAtt MySnsTopic.TopicName

Integrating with EventBridge in AWS SAM

The third example shows the AWS SAM template defining a serverless application with two Lambda functions and an EventBridge rule:

The default event bus already exists in every AWS account. You declare a rule that filters events in the event bus using the AWS::Events::Rule resource:

  EventRule: 
    Type: AWS::Events::Rule
    Properties: 
      Description: "EventRule"
      EventPattern: 
        source: 
          - "demo.event"
        detail: 
          state: 
            - "new"
      State: "ENABLED"
      Targets: 
        - Arn: !GetAtt EventConsumerFunction.Arn
          Id: "ConsumerTarget"

The rule describes an event pattern specifying matching JSON attributes. Events that match this pattern are routed to the list of targets. You provide the EventBridge service with permission to invoke the Lambda functions in the target list:

  PermissionForEventsToInvokeLambda: 
    Type: AWS::Lambda::Permission
    Properties: 
      FunctionName: 
        Ref: "EventConsumerFunction"
      Action: "lambda:InvokeFunction"
      Principal: "events.amazonaws.com"
      SourceArn: !GetAtt EventRule.Arn

The AWS SAM template uses an IAM policy statement to grant permission to the publishing function to put events on the event bus:

  EventPublisherFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: code/
      Handler: publisher.handler
      Timeout: 3
      Runtime: nodejs12.x
      Policies:
        - Statement:
          - Effect: Allow
            Resource: '*'
            Action:
              - events:PutEvents

The publishing function then uses the putEvents method of the AWS.EventBridge class, which returns after the events have been durably stored in EventBridge:

const AWS = require('aws-sdk')
AWS.config.update({region: 'us-east-1'})
const eventbridge = new AWS.EventBridge()

exports.handler = async (event) => {
  const params = {
    Entries: [ 
      {
        Detail: JSON.stringify({
          "message": "Hello from publisher",
          "state": "new"
        }),
        DetailType: 'Message',
        EventBusName: 'default',
        Source: 'demo.event',
        Time: new Date 
      }
    ]
  }
  const result = await eventbridge.putEvents(params).promise()
  console.log(result)
}

The payload for the consumer function is the message from EventBridge. This is an array of messages, containing subject and message attributes from the publishing function. You can see this in the CloudWatch log for the function:

Comparing SNS with EventBridge

SNS and EventBridge have many similarities. Both can be used to decouple publishers and subscribers, filter messages or events, and provide fan-in or fan-out capabilities. However, there are differences in the list of targets and features for each service, and your choice of service depends on the needs of your use-case.

EventBridge offers two newer capabilities that are not available in SNS. The first is software as a service (SaaS) integration. This enables you to authorize supported SaaS providers to send events directly from their EventBridge event bus to partner event buses in your account. This replaces the need for polling or webhook configuration, and creates a highly scalable way to ingest SaaS events directly into your AWS account.

The second feature is the Schema Registry, which makes it easier to discover and manage OpenAPI schemas for events. EventBridge can infer schemas based on events routed through an event bus by using schema discovery. This can be used to generate code bindings directly to your IDE for type-safe languages like Python, Java, and TypeScript. This can help accelerate development by automating the generation of classes and code directly from events.

This table compares the major features of both services:

	Amazon SNS	Amazon EventBridge
Number of targets	10 million (soft)	5
Availability SLA	99.9%	99.99%
Limits	100,000 topics. 12,500,000 subscriptions per topic.	100 event buses. 300 rules per event bus.
Publish throughput	Varies by Region. Soft limits.	Varies by Region. Soft limits.
Input transformation	No	Yes – see details.
Message filtering	Yes – see details.	Yes, including IP address matching – see details.
Message size maximum	256 KB	256 KB
Billing	Per 64 KB
Format	Raw or JSON	JSON
Receive events from AWS CloudTrail	No	Yes
Targets	HTTP(S), SMS, SNS Mobile Push, Email/Email-JSON, SQS, Lambda functions.	15 targets including AWS Lambda, Amazon SQS, Amazon SNS, AWS Step Functions, Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose.
SaaS integration	No	Yes – see integrations.
Schema Registry integration	No	Yes – see details.
Dead-letter queues supported	Yes	No
FIFO ordering available	No	No
Public visibility	Can create public topics	Cannot create public buses
Pricing	$0.50/million requests + variable delivery cost + data transfer out cost. SMS varies.	$1.00/million events. Free for AWS events. No charge for delivery.
Billable request size	1 request = 64 KB	1 event = 64 KB
AWS Free Tier eligible	Yes	No
Cross-Region	You can subscribe your AWS Lambda functions to an Amazon SNS topic in any Region.	Targets must be in the same Region. You can publish across Regions to another event bus.
Retry policy	For SQS/Lambda, exponential backoff over 23 days. For SMTP, SMS and Mobile push, exponential backoff over 6 hours.	At-least-once event delivery to targets, including retry with exponential backoff for up to 24 hours.

Conclusion

Messaging is an important part of serverless applications and AWS services provide queues, publish/subscribe, and event routing capabilities. This post reviews the main features of SNS, SQS, and EventBridge and how they provide different capabilities for your workloads.

I show three example applications that publish and consume events from the three services. I walk through AWS SAM syntax for deploying these resources in your applications. Finally, I compare differences between the services.

To learn more building decoupled architectures, see this Learning Path series on EventBridge. For more serverless learning resources, visit https://serverlessland.com.

Introducing mutual TLS authentication for Amazon API Gateway

2020-09-17 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/introducing-mutual-tls-authentication-for-amazon-api-gateway/

This post is courtesy of Justin Pirtle, Principal Serverless Solutions Architect.

Today, AWS is introducing certificate-based mutual Transport Layer Security (TLS) authentication for Amazon API Gateway. This is a new method for client-to-server authentication that can be used with API Gateway’s existing authorization options.

By default, the TLS protocol only requires a server to authenticate itself to the client. The authentication of the client to the server is managed by the application layer. The TLS protocol also offers the ability for the server to request that the client send an X.509 certificate to prove its identity. This is called mutual TLS (mTLS) as both parties are authenticated via certificates with TLS.

Mutual TLS is commonly used for business-to-business (B2B) applications. It’s used in standards such as Open Banking, which enables secure open API integrations for financial institutions across the United Kingdom and Australia. It’s common for Internet of Things (IoT) applications to authenticate devices using digital certificates. Also, many companies authenticate their employees before granting access to data and services when used with a private certificate authority (CA).

API Gateway now provides integrated mutual TLS authentication at no additional cost. You can enable mutual TLS authentication on your custom domains to authenticate regional REST and HTTP APIs. You can still authorize requests with bearer or JSON Web Tokens (JWTs) or sign requests with IAM-based authorization.

To use mutual TLS with API Gateway, you upload a CA public key certificate bundle as an object containing public or private/self-signed CA certs. This is used for validation of client certificates. All existing API authorization options are available for use with mTLS authentication.

Getting started

To complete the following sample setup, you must first create an HTTP API with a valid custom domain name using the AWS Management Console. Mutual TLS is now available for both regional REST APIs and the newer HTTP APIs. You use HTTP APIs for the examples depicted in this post. More details on the pre-requisites to configure a custom domain name are available in the documentation.

Securing your API with mutual TLS

To configure mutual TLS, you first create the private certificate authority and client certificates. You need the public keys of the root certificate authority and any intermediate certificate authorities. These must be uploaded to API Gateway to authenticate certificates properly using mutual TLS. This example uses OpenSSL to create the certificate authority and client certificate. You can alternatively use a managed service such as AWS Certificate Manager Private Certificate Authority (ACM Private CA).

You first create a new certificate authority with signed client certificate using OpenSSL:

Create the private certificate authority (CA) private and public keys:
openssl genrsa -out RootCA.key 4096
openssl req -new -x509 -days 36500 -key RootCA.key -out RootCA.pem
Provide the requested inputs for the root certificate authority’s subject name, locality, organization, and organizational unit properties. Choose your own values for these prompts to customize your root CA.
You can optionally create any intermediary certificate authorities (CAs) using the previously issued root CA. The certificate chain length for certificates authenticated with mutual TLS in API Gateway can be up to four levels.
Once the CA certificates are created, you create the client certificate for use with authentication.
Create client certificate private key and certificate signing request (CSR):openssl genrsa -out my_client.key 2048
openssl req -new -key my_client.key -out my_client.csr
Enter the client’s subject name, locality, organization, and organizational unit properties of the client certificate. Keep the optional password challenge empty default.
Sign the newly created client cert by using your certificate authority you previously created:
openssl x509 -req -in my_client.csr -CA RootCA.pem -CAkey RootCA.key -set_serial 01 -out my_client.pem -days 36500 -sha256
You now have a minimum of five files in your directory (there are additional files if you are also using an intermediate CA):
- RootCA.key (root CA private key)
- RootCA.pem (root CA public key)
- my_client.csr (client certificate signing request)
- my_client.key (client certificate private key)
- my_client.pem (client certificate public key)
Prepare a PEM-encoded trust store file for all certificate authority public keys you want to use with mutual TLS:
1. If only using a single root CA (with no intermediary CAs), only the RootCA.pem file is required. Copy the existing root CA public key to a new truststore.pem file name for further clarity on which file is being used by API Gateway as the trust store:cp RootCA.pem truststore.pem
2. If using one or more intermediary CAs to sign certificates with a root of trust to your root CA previously created, you must bundle the respective PEM files of each CA into a single trust store PEM file. Use the cat command to build the bundle file:cat IntermediateCA_1.pem IntermediateCA_2.pem RootCA.pem > truststore.pem
  
  Note: The trust store CA bundle can contain up to 1,000 certificates authority PEM-encoded public key certificates up to 1 MB total object size.
Upload the trust store file to an Amazon S3 bucket in the same AWS account as our API Gateway API. It is also recommended to enable object versioning for the bucket you choose. You can perform these actions using the AWS Management Console, SDKs, or AWS CLI. Using the AWS CLI, create an S3 bucket, enable object versioning on the bucket, and upload the CA bundle file:aws s3 mb s3://your-name-ca-truststore --region us-east-1 #creates a new S3 bucket – skip if using existing bucket
aws s3api put-bucket-versioning --bucket your-name-ca-truststore --versioning-configuration Status=Enabled #enables versioning on S3 bucket
aws s3 cp truststore.pem s3://your-name-ca-truststore/truststore.pem #uploads object to S3 bucket

After uploading the new truststore CA bundle file, enable mutual TLS on the API Gateway custom domain name.

Enabling mutual TLS on a custom domain name

To configure mutual TLS within API Gateway:

Browse to the API Gateway console and choose Custom domain names:
Before changing settings, test a custom domain name with an API mapping to ensure that the API works without mutual TLS using curl. If your custom domain name and API configuration are correct, you receive a well-formed response and HTTP status code of 200.
After validation, enable mutual TLS for additional protection. Choose Edit to update the custom domain name configuration:
Enable the Mutual TLS authentication option and enter the path of the truststore PEM file, stored in an S3 bucket. You can optionally provide an S3 object version identifier to reference a specific version of the truststore CA bundle object:
Choose Save to enable mutual TLS for all APIs that the custom domain name maps to.
Wait for the custom domain status to show “Available”, indicating that the mutual TLS change is successfully deployed.
Test the HTTP request again using curl with the same custom domain name and without modifying the request. The request is now forbidden as the call cannot be properly authenticated with mutual TLS.
Test again with additional parameters in the curl command to include the local client certificate and negotiate the mutual TLS session for authentication. You can use curl with the —key and —cert parameters to send the client certificate as part of the request:curl --key my_client.key --cert my_client.pem https://api.yourdomain.com

The request is now properly authenticated and returns successfully.

Hardening the configuration

After setting up mutual TLS authentication for the API, harden the configuration with several additional capabilities.

Disabling access to the default API endpoint

Mutual TLS is successfully enabled on the custom domain name but the default API endpoint URL is still active. This default endpoint has the format https://{apiId}.execute-api.{region}.amazonaws.com. Since the default endpoint does not require mutual TLS, you may want to disable it. This helps to ensure that mutual TLS authentication is enforced for all traffic to the API.

To disable the endpoint:

Browse to the HTTP API in the API Gateway console.
Choose the API name in the menu:
In the API, choose Edit:
Disable the default endpoint toggle to force traffic to the custom domain name and use mutual TLS authentication. Choose Save.

Note: Disabling the default endpoint is only currently available for HTTP APIs.
Test invoking the default endpoint again. It is no longer active. The custom domain name continues to serve requests when authenticated using your client certificate.

Additional authorization capabilities

In addition to the initial mutual TLS authentication via client certificate, you can use all existing API Gateway authorizer options. This includes JSON Web Tokens (JWT)/Cognito user pool authorizers, Lambda authorizers, and IAM-based authorization.

For Lambda authorizers, the event payload is expanded to include additional certificate properties from the client’s authenticated certificate. These properties are found at requestContext.identity.clientCert with the Lambda authorizer v1 payload version or at requestContext.authentication.clientCert with the v2 payload version. These additional attributes include the PEM-encoded public key of the client cert and also the certificate subject distinguished name (DN), its issuer’s CA distinguished name, and the certificate’s valid from and to timestamps.

These additional context properties enable any custom validation of the calling certificate with any other request properties, such as bearer tokens in authorization headers, all with a unified authorizer response:

"requestContext": {
    "authentication": {
        "clientCert": {
            "clientCertPem": "-----BEGIN CERTIFICATE-----\nMIIEZTCCAk0CAQEwDQ...",
            "issuerDN": "C=US,ST=Washington,L=Seattle,O=Amazon Web Services,OU=Security,CN=My Private CA",
            "serialNumber": "1",
            "subjectDN": "C=US,ST=Washington,L=Seattle,O=Amazon Web Services,OU=Security,CN=My Client",
            "validity": {
                "notAfter": "Aug  5 00:28:21 2120 GMT",
                "notBefore": "Aug 29 00:28:21 2020 GMT"
            }
        }
    },
    ...

For Lambda authorizer blueprint samples, refer to https://github.com/awslabs/aws-apigateway-lambda-authorizer-blueprints.

Certificate revocation validation

You can validate certificates against any certificate revocation list (CRL) or by using the Online Certificate Status Protocol (OCSP) directly from a Lambda custom authorizer. A Lambda authorizer can locally cache a CRL for re-use across API authorization requests without downloading it each time.

For OCSP requests, the authorizer can make an API call to the OCSP server requesting validation that the certificate is still valid before returning the authorization response to API Gateway. Further enhancements supporting native certificate revocation verification capabilities are planned for future API Gateway releases.

Conclusion

Mutual TLS (mTLS) for API Gateway is generally available today at no additional cost. It’s available in all AWS commercial Regions, AWS GovCloud (US) Regions, and China Regions. It supports configuration via the API Gateway console, AWS CLI, SDKs, and AWS CloudFormation.

This post shows how to configure mutual TLS on a custom domain name and disable the default execute-api API endpoint. It also covers how to use Lambda authorizer extensions to further authorize client invocations or verify certificate revocation.

To learn more about Amazon API Gateway, visit the API Gateway developer guide documentation.

Uploading to Amazon S3 directly from a web or mobile application

2020-09-14 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/uploading-to-amazon-s3-directly-from-a-web-or-mobile-application/

In web and mobile applications, it’s common to provide users with the ability to upload data. Your application may allow users to upload PDFs and documents, or media such as photos or videos. Every modern web server technology has mechanisms to allow this functionality. Typically, in the server-based environment, the process follows this flow:

The user uploads the file to the application server.
The application server saves the upload to a temporary space for processing.
The application transfers the file to a database, file server, or object store for persistent storage.

While the process is simple, it can have significant side-effects on the performance of the web-server in busier applications. Media uploads are typically large, so transferring these can represent a large share of network I/O and server CPU time. You must also manage the state of the transfer to ensure that the entire object is successfully uploaded, and manage retries and errors.

This is challenging for applications with spiky traffic patterns. For example, in a web application that specializes in sending holiday greetings, it may experience most traffic only around holidays. If thousands of users attempt to upload media around the same time, this requires you to scale out the application server and ensure that there is sufficient network bandwidth available.

By directly uploading these files to Amazon S3, you can avoid proxying these requests through your application server. This can significantly reduce network traffic and server CPU usage, and enable your application server to handle other requests during busy periods. S3 also is highly available and durable, making it an ideal persistent store for user uploads.

In this blog post, I walk through how to implement serverless uploads and show the benefits of this approach. This pattern is used in the Happy Path web application. You can download the code from this blog post in this GitHub repo.

Overview of serverless uploading to S3

When you upload directly to an S3 bucket, you must first request a signed URL from the Amazon S3 service. You can then upload directly using the signed URL. This is two-step process for your application front end:

Call an Amazon API Gateway endpoint, which invokes the getSignedURL Lambda function. This gets a signed URL from the S3 bucket.
Directly upload the file from the application to the S3 bucket.

To deploy the S3 uploader example in your AWS account:

Navigate to the S3 uploader repo and install the prerequisites listed in the README.md.
In a terminal window, run:
git clone https://github.com/aws-samples/amazon-s3-presigned-urls-aws-sam
cd amazon-s3-presigned-urls-aws-sam
sam deploy --guided
At the prompts, enter s3uploader for Stack Name and select your preferred Region. Once the deployment is complete, note the APIendpoint output.

Testing the application

I show two ways to test this application. The first is with Postman, which allows you to directly call the API and upload a binary file with the signed URL. The second is with a basic frontend application that demonstrates how to integrate the API.

To test using Postman:

First, copy the API endpoint from the output of the deployment.
In the Postman interface, paste the API endpoint into the box labeled Enter request URL.
Choose Send.
After the request is complete, the Body section shows a JSON response. The uploadURL attribute contains the signed URL. Copy this attribute to the clipboard.
Select the + icon next to the tabs to create a new request.
Using the dropdown, change the method from GET to PUT. Paste the URL into the Enter request URL box.
Choose the Body tab, then the binary radio button.
Choose Select file and choose a JPG file to upload.
Choose Send. You see a 200 OK response after the file is uploaded.
Navigate to the S3 console, and open the S3 bucket created by the deployment. In the bucket, you see the JPG file uploaded via Postman.

To test with the sample frontend application:

Copy index.html from the example’s repo to an S3 bucket.
Update the object’s permissions to make it publicly readable.
In a browser, navigate to the public URL of index.html file.
Select Choose file and then select a JPG file to upload in the file picker. Choose Upload image. When the upload completes, a confirmation message is displayed.
Navigate to the S3 console, and open the S3 bucket created by the deployment. In the bucket, you see the second JPG file you uploaded from the browser.

Understanding the S3 uploading process

When uploading objects to S3 from a web application, you must configure S3 for Cross-Origin Resource Sharing (CORS). CORS rules are defined as an XML document on the bucket. Using AWS SAM, you can configure CORS as part of the resource definition in the AWS SAM template:

   S3UploadBucket:
    Type: AWS::S3::Bucket
    Properties:
      CorsConfiguration:
        CorsRules:
        - AllowedHeaders:
            - "*"
          AllowedMethods:
            - GET
            - PUT
            - HEAD
          AllowedOrigins:
            - "*"

The preceding policy allows all headers and origins – it’s recommended that you use a more restrictive policy for production workloads.

In the first step of the process, the API endpoint invokes the Lambda function to make the signed URL request. The Lambda function contains the following code:

const AWS = require('aws-sdk')
AWS.config.update({ region: process.env.AWS_REGION })
const s3 = new AWS.S3()
const URL_EXPIRATION_SECONDS = 300

// Main Lambda entry point
exports.handler = async (event) => {
  return await getUploadURL(event)
}

const getUploadURL = async function(event) {
  const randomID = parseInt(Math.random() * 10000000)
  const Key = `${randomID}.jpg`

  // Get signed URL from S3
  const s3Params = {
    Bucket: process.env.UploadBucket,
    Key,
    Expires: URL_EXPIRATION_SECONDS,
    ContentType: 'image/jpeg'
  }
  const uploadURL = await s3.getSignedUrlPromise('putObject', s3Params)
  return JSON.stringify({
    uploadURL: uploadURL,
    Key
  })
}

This function determines the name, or key, of the uploaded object, using a random number. The s3Params object defines the accepted content type and also specifies the expiration of the key. In this case, the key is valid for 300 seconds. The signed URL is returned as part of a JSON object including the key for the calling application.

The signed URL contains a security token with permissions to upload this single object to this bucket. To successfully generate this token, the code calling getSignedUrlPromise must have s3:putObject permissions for the bucket. This Lambda function is granted the S3WritePolicy policy to the bucket by the AWS SAM template.

The uploaded object must match the same file name and content type as defined in the parameters. An object matching the parameters may be uploaded multiple times, providing that the upload process starts before the token expires. The default expiration is 15 minutes but you may want to specify shorter expirations depending upon your use case.

Once the frontend application receives the API endpoint response, it has the signed URL. The frontend application then uses the PUT method to upload binary data directly to the signed URL:

let blobData = new Blob([new Uint8Array(array)], {type: 'image/jpeg'})
const result = await fetch(signedURL, {
  method: 'PUT',
  body: blobData
})

At this point, the caller application is interacting directly with the S3 service and not with your API endpoint or Lambda function. S3 returns a 200 HTML status code once the upload is complete.

For applications expecting a large number of user uploads, this provides a simple way to offload a large amount of network traffic to S3, away from your backend infrastructure.

Adding authentication to the upload process

The current API endpoint is open, available to any service on the internet. This means that anyone can upload a JPG file once they receive the signed URL. In most production systems, developers want to use authentication to control who has access to the API, and who can upload files to your S3 buckets.

You can restrict access to this API by using an authorizer. This sample uses HTTP APIs, which support JWT authorizers. This allows you to control access to the API via an identity provider, which could be a service such as Amazon Cognito or Auth0.

The Happy Path application only allows signed-in users to upload files, using Auth0 as the identity provider. The sample repo contains a second AWS SAM template, templateWithAuth.yaml, which shows how you can add an authorizer to the API:

  MyApi:
    Type: AWS::Serverless::HttpApi
    Properties:
      Auth:
        Authorizers:
          MyAuthorizer:
            JwtConfiguration:
              issuer: !Ref Auth0issuer
              audience:
                - https://auth0-jwt-authorizer
            IdentitySource: "$request.header.Authorization"
        DefaultAuthorizer: MyAuthorizer

Both the issuer and audience attributes are provided by the Auth0 configuration. By specifying this authorizer as the default authorizer, it is used automatically for all routes using this API. Read part 1 of the Ask Around Me series to learn more about configuring Auth0 and authorizers with HTTP APIs.

After authentication is added, the calling web application provides a JWT token in the headers of the request:

const response = await axios.get(API_ENDPOINT_URL, {
  headers: {
    Authorization: `Bearer ${token}`
        }
})

API Gateway evaluates this token before invoking the getUploadURL Lambda function. This ensures that only authenticated users can upload objects to the S3 bucket.

Modifying ACLs and creating publicly readable objects

In the current implementation, the uploaded object is not publicly accessible. To make an uploaded object publicly readable, you must set its access control list (ACL). There are preconfigured ACLs available in S3, including a public-read option, which makes an object readable by anyone on the internet. Set the appropriate ACL in the params object before calling s3.getSignedUrl:

const s3Params = {
  Bucket: process.env.UploadBucket,
  Key,
  Expires: URL_EXPIRATION_SECONDS,
  ContentType: 'image/jpeg',
  ACL: 'public-read'
}

Since the Lambda function must have the appropriate bucket permissions to sign the request, you must also ensure that the function has PutObjectAcl permission. In AWS SAM, you can add the permission to the Lambda function with this policy:

        - Statement:
          - Effect: Allow
            Resource: !Sub 'arn:aws:s3:::${S3UploadBucket}/'
            Action:
              - s3:putObjectAcl

Conclusion

Many web and mobile applications allow users to upload data, including large media files like images and videos. In a traditional server-based application, this can create heavy load on the application server, and also use a considerable amount of network bandwidth.

By enabling users to upload files to Amazon S3, this serverless pattern moves the network load away from your service. This can make your application much more scalable, and capable of handling spiky traffic.

This blog post walks through a sample application repo and explains the process for retrieving a signed URL from S3. It explains how to the test the URLs in both Postman and in a web application. Finally, I explain how to add authentication and make uploaded objects publicly accessible.

To learn more, see this video walkthrough that shows how to upload directly to S3 from a frontend web application. For more serverless learning resources, visit https://serverlessland.com.

Using Lambda layers to simplify your development process

2020-09-08 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/using-lambda-layers-to-simplify-your-development-process/

Serverless developers frequently import libraries and dependencies into their AWS Lambda functions. While you can zip these dependencies as part of the build and deployment process, in many cases it’s easier to use layers instead. In this post, I explain how layers work, and how you can build and include layers in your own applications.

This blog post references the Happy Path application, which shows how to build a flexible backend to a photo-processing web application. To learn more, refer to Using serverless backends to iterate quickly on web apps – part 1. This code in this post is available at this GitHub repo.

Overview of Lambda layers

A Lambda layer is an archive containing additional code, such as libraries, dependencies, or even custom runtimes. When you include a layer in a function, the contents are extracted to the /opt directory in the execution environment. You can include up to five layers per function, which count towards the standard Lambda deployment size limits.

Layers are deployed as immutable versions, and the version number increments each time you publish a new layer. When you include a layer in a function, you specify the layer version you want to use. Layers are automatically set as private, but they can be shared with other AWS accounts, or shared publicly. Permissions only apply to a single version of a layer.

Using layers can make it faster to deploy applications with the AWS Serverless Application Model (AWS SAM) or the Serverless framework. By moving runtime dependencies from your function code to a layer, this can help reduce the overall size of the archive uploaded during a deployment.

Creating a layer containing the AWS SDK

The AWS SDK allows you to interact programmatically with AWS services using one of the supported runtimes. The Lambda service includes the AWS SDK so you can use it without explicitly importing in your deployment package.

However, there is no guarantee of the version provided in the execution environment. The SDK is upgraded frequently to support new AWS services and features. As a result, the version may change at any time. You can see the current version used by Lambda by declaring an instance of the SDK and logging out the version method:

For production workloads, it’s best practice to lock the version of the AWS SDK used in your functions. You can achieve this by including the SDK with your code package. Once you include this library, your code always uses the version in the deployment package and not the version included in the Lambda service.

A serverless application may consist of many functions, which all use a common SDK version. Instead of bundling the SDK with each function deployment, you can create a layer containing the SDK. The effect of this is to reduce the size of the uploaded archive, which makes your deployments faster.

To create an AWS SDK layer:

First, clone this blog post’s GitHub repo. From a terminal window, execute:
git clone https://github.com/aws-samples/aws-lambda-layers-aws-sam-examples
cd ./aws-sdk-layer
This directory contains an AWS SAM template and Node.js package.json file. Install the package.json contents:
npm install
Create the layer directory defined in the AWS SAM template and the nodejs directory required by Lambda. Next, move the node_modules directory:
mkdir -p ./layer/nodejs
mv ./node_modules ./layer/nodejs
Next, deploy the AWS SAM template to create the layer:
sam deploy --guided
For the Stack name, enter “aws-sdk-layer”. Enter your preferred AWS Region and accept the other defaults.
After the deployment completes, the new Lambda layer is available to use. Run this command to see the available layers:aws lambda list-layers

After adding a layer to a function, you can use console.log to log out the AWS SDK version. This shows that the function is now using the SDK version in the layer instead of the version provided by the Lambda service:

Creating layers with OS-specific binaries

Many code libraries include binaries that are operating-system specific. When you build packages on your local development machine, by default the binaries for that operating system are used. These may not be the right binaries for Lambda, which runs on Amazon Linux. If you are not using a compatible operating system, you must ensure you include Linux binaries in the layer.

The simplest way to package these libraries correctly is to use AWS Cloud9. This is an IDE in the AWS Cloud, which runs on Amazon EC2. After creating an environment, you can clone a git repository directly to the local storage of the instance, and run the necessary build scripts.

The Happy Path application resizes images using the Sharp npm library. This library uses libvips, which is written in C, so the compilation is operating system-specific. By creating a layer containing this library, it simplifies the packaging and deployment of the consuming Lambda function.

To create a Sharp layer using AWS Cloud9:

Navigate to the AWS Cloud9 console.
Choose Create environment.
Enter the name “My IDE” and choose Next step.
Accept all the default and choose Next step.
Review the settings and choose Create environment.
In the terminal panel, enter:
git clone https://github.com/aws-samples/aws-lambda-layers-aws-sam-examples
cd ./aws-lambda-layers-aws-sam-examples/sharp-layer
npm install
From a terminal window, ensure you are in the directory where you cloned this post’s GitHub repo. Execute the following commands:cd ./sharp-layer
npm install
mkdir -p ./layer/nodejs
mv ./node_modules ./layer/nodejs
Next, deploy the AWS SAM template to create the layer:
sam deploy --guided
For the Stack name, enter “sharp-layer”. Enter your preferred AWS Region and accept the other defaults. After the deployment completes, the new Lambda layer is available to use.

In some runtimes, you can specify a local set of packages for development, and another set for production. For example, in Node.js, the package.json file allows you to specify two sections for dependencies. If your development machine uses a different operating system to Lambda, and therefore uses different binaries, you can use package.json to resolve this. In the Happy Path Resizer function, which uses the Sharp layer, the package.json refers to a local binary for development.

AWS SAM defines Lambda functions with the AWS::Serverless::Function resource. Layers are defined as a property of functions, as a list of layer ARNs including the version:

  MyLambdaFunction:
    Type: AWS::Serverless::Function 
    Properties:
      CodeUri: myFunction/
      Handler: app.handler
      MemorySize: 128
      Layers:
        - !Ref SharpLayerARN

Sharing a layer

Layers are private to your account by default but you can optionally share with other AWS accounts or make a layer public. You cannot share layers via the AWS Management Console but instead use the AWS CLI.

To share a layer, use add-layer-version-permission, specifying the layer name, version, AWS Region, and principal:

aws lambda add-layer-version-permission \
  --layer-name node-sharp \
  --principal '*' \
  --action lambda:GetLayerVersion \
  --version-number 3 
  --statement-id public 
  --region us-east-1

In the principal parameter, specify an individual account ID or use an asterisk to make the layer public. The CLI responds with a RevisionId containing the current revision of the policy:

You can check the permissions associated with a layer version by calling get-layer-version-policy with the layer name and version:

aws lambda get-layer-version-policy \
  --layer-name node-sharp \
  --version-number 3 \
  --region us-east-1

Similarly, you can delete permissions associated with a layer version by calling remove-layer-vesion-permission with the layer name, statement ID, and version:

aws lambda remove-layer-version-permission \
 -- layer-name node-sharp \
 -- statement-id public \
 -- version-number 3

Once the permissions are removed, calling get-layer-version-policy results in an error:

Conclusion

Lambda layers provide a convenient and effective way to package code libraries for sharing with Lambda functions in your account. Using layers can help reduce the size of uploaded archives and make it faster to deploy your code.

Layers can contain packages using OS-specific binaries, providing a convenient way to distribute these to developers. While layers are private by default, you can share with other accounts or make a layer public. Layers are published as immutable versions, and deleting a layer has no effect on deployed Lambda functions already using that layer.

To learn more about using Lambda layers, visit the documentation, or see how layers are used in the Happy Path web application.

Using serverless backends to iterate quickly on web apps – part 3

2020-08-31 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/using-serverless-backends-to-iterate-quickly-on-web-apps-part-3/

This series is about building flexible backends for web applications. The example is Happy Path, a web app that allows park visitors to upload and share maps and photos, to help replace printed materials.

In part 1, I show how the application works and walk through the backend architecture. In part 2, you deploy a basic image-processing workflow. To do this, you use an AWS Serverless Application Model (AWS SAM) template to create an AWS Step Functions state machine.

In this post, I show how to deploy progressively more complex workflow functionality while using minimal code. This solution in this web app is designed for 100,000 monthly active users. Using this approach, you can introduce more sophisticated workflows without affecting the existing backend application.

The code and instructions for this application are available in the GitHub repo.

Introducing image moderation

In the first version, users could upload any images to parks on the map. One of the most important feature requests is to enable moderation, to prevent inappropriate images from appearing on the app. To handle a large number of uploaded images, using human moderation would be slow and labor-intensive.

In this section, you use Amazon ML services to automate analyzing the images for unsafe content. Amazon Rekognition provides an API to detect if an image contains moderation labels. These labels are categorized into different types of unsafe content that would not be appropriate for this app.

Version 2 of the workflow uses this API to automate the process of checking images. To install version 2:

From a terminal window, delete the v1 workflow stack:
aws cloudformation delete-stack --stack-name happy-path-workflow-v1
Change directory to the version 2 AWS SAM template in the repo:
cd .\workflows\templates\v2
Build and deploy the solution:
sam build
sam deploy --guided
The deploy process prompts you for several parameters. Enter happy-path-workflow-v2 as the Stack Name. The other values are the outputs from the backend deployment process, detailed in the repo’s README. Enter these to complete the deployment.

From VS Code, open the v2 state machine in the repo from workflows/statemachines/v2.asl.json. Choose the Render graph option in the CodeLens to see the workflow visualization.

This new workflow introduces a Moderator step. This invokes a Moderator Lambda function that uses the Amazon Rekognition API. If this API identifies any unsafe content labels, it returns these as part of the function output.

The next step in the workflow is a Moderation result choice state. This evaluates the output of the previous function – if the image passes moderation, the process continues to the Resizer function. If it fails, execution moves to the RecordFailState step.

Step Functions integrates directly with some AWS services so that you can call and pass parameters into the APIs of those services. The RecordFailState uses an Amazon DynamoDB service integration to write the workflow failure to the application table, using the arn:aws:states:::dynamodb:updateItem resource.

Testing the workflow

To test moderation, I use an unsafe image with suggestive content. This is an image that is not considered appropriate for this application. To test the deployed v2 workflow:

Open the frontend application at https://localhost:8080 in your browser.
Select a park location, choose Show Details, and then choose Upload images.
Select an unsafe image to upload.
Navigate to the Step Functions console. This shows the v2StateMachine with one successful execution:
Select the state machine, and choose the execution to display more information. Scroll down to the Visual workflow panel.

This shows that the moderation failed and the path continued to log the failed state in the database. If you open the Output resource, this displays more details about why the image is considered unsafe.

Checking the image size and file type

The upload component in the frontend application limits file selection to JPG images but there is no check to reject images that are too small. It’s prudent to check and enforce image types and sizes on the backend API in addition to the frontend. This is because it’s possible to upload images via the API without using the frontend.

The next version of the workflow enforces image sizes and file types. To install version 3:

From a terminal window, delete the v2 workflow stack:
aws cloudformation delete-stack --stack-name happy-path-workflow-v2
Change directory to the version 3 AWS SAM template in the repo:
cd .\workflows\templates\v3
Build and deploy the solution:
sam build
sam deploy --guided
The deploy process prompts you for several parameters. Enter happy-path-workflow-v3 as the Stack Name. The other values are the outputs from the backend deployment process, detailed in the repo’s README. Enter these to complete the deployment.

From VS Code, open the v3 state machine in the repo from workflows/statemachines/v3.asl.json. Choose the Render graph option in the CodeLens to see the workflow visualization.

This workflow changes the starting point of the execution, introducing a Check Dimensions step. This invokes a Lambda function that checks the size and types of the Amazon S3 object using the image-size npm package. This function uses environment variables provided by the AWS SAM template to compare against a minimum size and allowed type array.

The output is evaluated by the Dimension Result choice state. If the image is larger than the minimum size allowed, execution continues to the Moderator function as before. If not, it passes to the RecordFailState step to log the result in the database.

Testing the workflow

To test, I use an image that’s narrower than the mixPixels value. To test the deployed v3 workflow:

Open the frontend application at https://localhost:8080 in your browser.
Select a park location, choose Show Details, and then choose Upload images.
Select an image with a width smaller than 800 pixels. After a few seconds, a rejection message appears:
Navigate to the Step Functions console. This shows the v3StateMachine with one successful execution. Choose the execution to show more detail.

The execution shows that the Check Dimension step added the image dimensions to the event object. Next, the Dimensions Result choice state rejected the image, and logged the result at the RecordFailState step. The application’s DynamoDB table now contains details about the failed upload:

Pivoting the application to a new concept

Until this point, the Happy Path web application is designed to help park visitors share maps and photos. This is the development team’s original idea behind the app. During the product-market fit stage of development, it’s common for applications to pivot substantially from the original idea. For startups, it can be critical to have the agility to modify solutions quickly to meet the needs of customers.

In this scenario, the original idea has been less successful than predicted, and park visitors are not adopting the app as expected. However, the business development team has identified a new opportunity. Restaurants would like an app that allows customers to upload menus and food photos. How can the development team create a new proof-of-concept app for restaurant customers to test this idea?

In this version, you modify the application to work for restaurants. While features continue to be added to the parks workflow, it now supports business logic specifically for the restaurant app.

To create the v4 workflow and update the frontend:

From a terminal window, delete the v3 workflow stack:
aws cloudformation delete-stack --stack-name happy-path-workflow-v3
Change directory to the version 4 AWS SAM template in the repo:
cd .\workflows\templates\v4
Build and deploy the solution:
sam build
sam deploy --guided
The deploy process prompts you for several parameters. Enter happy-path-workflow-v4 as the Stack Name. The other values are the outputs from the backend deployment process, detailed in the repo’s README. Enter these to complete the deployment.
Open frontend/src/main.js and update the businessType variable on line 63. Set this value to ‘restaurant’.
Start the local development server:
npm run serve
Open the application at http://localhost:8080. This now shows restaurants in the local area instead of parks.

In the Step Functions console, select the v4StateMachine to see the latest workflow, then open the Definition tab to see the visualization:

This workflow starts with steps that apply to both parks and restaurants – checking the image dimensions. Next, it determines the place type from the placeId record in DynamoDB. Depending on place type, it now follows a different execution path:

Parks continue to run the automated moderator process, then resizer and publish the result.
Restaurants now use Amazon Rekognition to determine the labels in the image. Any photos containing people are rejected. Next, the workflow continues to the resizer and publish process.
Other business types go to the RecordFailState step since they are not supported.

Testing the workflow

To test the deployed v4 workflow:

Open the frontend application at https://localhost:8080 in your browser.
Select a restaurant, choose Show Details, and then choose Upload images.
Select an image from the test photos dataset. After a few seconds, you see a message confirming the photo has been added.
Next, select an image that contains one or more people. The new restaurant workflow rejects this type of photo:
In the Step Functions console, select the last execution for the v4StateMachine to see how the Check for people step rejected the image:

If other business types are added later to the application, you can extend the Step Functions workflow accordingly. The cost of Step Functions is based on the number of transitions in a workflow, not the number of total steps. This means you can branch by business type in the Happy Path application. This doesn’t affect the overall cost of running an execution, if the total transitions are the same per execution.

Conclusion

Previously in this series, you deploy a simple workflow for processing image uploads in the Happy Path web application. In this post, you add progressively more complex functionality by deploying new versions of workflows.

The first iteration introduces image moderation using Amazon Rekognition, providing the ability to automate the evaluation of unsafe content. Next, the workflow is modified to check image size and file type. This allows you to reject any images that are too small or do not meet the type requirements. Finally, I show how to expand the logic further to accept other business types with their own custom workflows.

To learn more about building serverless web applications, see the Ask Around Me series.