All posts by James Beswick

Creating AWS Lambda environment variables from AWS Secrets Manager

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/creating-aws-lambda-environmental-variables-from-aws-secrets-manager/

This post is written by Andy Hall, Senior Solutions Architect.

AWS Lambda layers and extensions are used by third-party software providers to monitor Lambda functions. A monitoring solution needs environment variables to provide configuration information so that metric data can be sent to an endpoint.

Managing this information as environment variables across thousands of Lambda functions creates operational overhead. Instead, you can use the approach in this blog post to create environment variables dynamically from information hosted in AWS Secrets Manager.

This can help avoid managing secret rotation for individual functions. It ensures that values stay encrypted until runtime, and abstracts away the management of the environment variables.

Overview

This post shows how to create a Lambda layer for Node.js, Python, Ruby, Java, and .NET Core runtimes. It retrieves values from Secrets Manager and converts the secret into an environment variable that can be used by other layers and functions. The Lambda layer uses a wrapper script to fetch information from Secrets Manager and create environment variables.

Solution architecture

The steps in the process are as follows:

  1. The Lambda service responds to an event and initializes the Lambda context.
  2. The wrapper script is called as part of the Lambda init phase.
  3. The wrapper script calls a Golang executable passing in the ARN for the secret to retrieve.
  4. The Golang executable uses the Secrets Manager API to retrieve the decrypted secret.
  5. The wrapper script converts the information into environment variables and calls the next step in processing.

All of the code for this post is available from this GitHub repo.

The wrapper script

The wrapper script is the main entry point for the extension and is called by the Lambda service as part of the init phase. During this phase, the wrapper script reads basic information from the environment and calls the Golang executable. If there is an issue with the Golang executable, the wrapper script logs a statement and exits with an error.

# Get the secret value by calling the Go executable
values=$(${fullPath}/go-retrieve-secret -r "${region}" -s "${secretArn}" -a "${roleName}" -t ${timeout})
last_cmd=$?

# Verify that the last command was successful
if [[ ${last_cmd} -ne 0 ]]; then
    echo "Failed to setup environment for Secret ${secretArn}"
    exit 1
fi

Golang executable

This uses Golang to invoke the AWS APIs, since the Lambda execution environment does not natively provide the AWS Command Line Interface. The Golang executable can be included in a layer so that the layer works with a number of Lambda runtimes.

The Golang executable captures and validates the command line arguments to ensure that required parameters are supplied. If Lambda does not have permissions to read and decrypt the secret, you can supply an ARN for a role to assume.

The following code example shows how the Golang executable retrieves the necessary information to assume a role using the AWS Security Token Service:

client := sts.NewFromConfig(cfg)

return client.AssumeRole(ctx,
    &sts.AssumeRoleInput{
        RoleArn: &roleArn,
        RoleSessionName: &sessionName,
    },
)

After obtaining the necessary permissions, the secret can be retrieved using the Secrets Manager API. The following code example uses the new credentials to create a client connection to Secrets Manager and retrieve the secret:

client := secretsmanager.NewFromConfig(cfg, func(o *secretsmanager.Options) {
    o.Credentials = aws.NewCredentialsCache(credentials.NewStaticCredentialsProvider(*assumedRole.Credentials.AccessKeyId, *assumedRole.Credentials.SecretAccessKey, *assumedRole.Credentials.SessionToken))
})
return client.GetSecretValue(ctx, &secretsmanager.GetSecretValueInput{
    SecretId: aws.String(secretArn),
})

After retrieving the secret, the contents must be converted into a format that the wrapper script can use. The following sample code covers the conversion from a secret string to JSON by storing the data in a map. Once the data is in a map, a loop is used to output the information as key-value pairs.

// Map to hold the secret's key-value pairs
var dat map[string]interface{}

// Convert the secret string to JSON
if err := json.Unmarshal([]byte(*result.SecretString), &dat); err != nil {
    fmt.Println("Failed to convert Secret to JSON")
    fmt.Println(err)
    panic(err)
}

// Output each key-value pair in a format that the wrapper
// script can parse
for key, value := range dat {
    fmt.Printf("%s|%s\n", key, value)
}

Conversion to environment variables

After the Golang executable retrieves the secret information, the wrapper script loops over the output, populates a temporary file with export statements, and sources the temporary file. The following code covers these steps:

# Read the data line by line and export the key-value pairs
# as environment variables
echo "${values}" | while read -r line; do 
    
    # Split the line into a key and value
    ARRY=(${line//|/ })

    # Capture the key
    key="${ARRY[0]}"

    # Since the key has been captured, there is no need to keep it in the array
    unset ARRY[0]

    # Join the other parts of the array into a single value. There is a chance
    # that the split may have broken the data into multiple values. This forces
    # the data to be rejoined.
    value="${ARRY[@]}"
    
    # Save as an env var to the temp file for later processing
    echo "export ${key}=\"${value}\"" >> ${tempFile}
done

# Source the temp file to read in the env vars
. ${tempFile}

At this point, the information stored in the secret is available as environment variables to layers and the Lambda function.

Deployment

To deploy this solution, you must build on an instance that is running an Amazon Linux 2 AMI. This ensures that the compiled Golang executable is compatible with the Lambda execution environment.

The easiest way to deploy this solution is from an AWS Cloud9 environment, but you can also use an Amazon EC2 instance. To build and deploy the solution into your environment, you need the ARN of the secret that you want to use. A build script is provided to ease deployment and perform compilation, archival, and AWS CDK execution.

To deploy, run:

./build.sh <ARN of the secret to use>

Once the build is complete, the following resources are deployed into your AWS account:

  • A Lambda layer (called get-secrets-layer)
  • A second Lambda layer for testing (called second-example-layer)
  • A Lambda function (called example-get-secrets-lambda)

Testing

To test the deployment, create a test event to send to the new example-get-secrets-lambda Lambda function using the AWS Management Console. The test Lambda function uses both the get-secrets-layer and second-example-layer Lambda layers, and the secret specified from the build. The function logs the values of the environment variables that were created by the get-secrets-layer and second-example-layer layers.

The secret contains the following information:

{
  "EXAMPLE_CONNECTION_TOKEN": "EXAMPLE AUTH TOKEN",
  "EXAMPLE_CLUSTER_ID": "EXAMPLE CLUSTER ID",
  "EXAMPLE_CONNECTION_URL": "EXAMPLE CONNECTION URL",
  "EXAMPLE_TENANT": "EXAMPLE TENANT",
  "AWS_LAMBDA_EXEC_WRAPPER": "/opt/second-example-layer"
}

This is the Python code for the example-get-secrets-lambda function:

import os
import json
import sys

def lambda_handler(event, context):
    print(f"Got event in main lambda [{event}]",flush=True)
    
    # Return all of the data
    return {
        'statusCode': 200,
        'layer': {
            'EXAMPLE_AUTH_TOKEN': os.environ.get('EXAMPLE_AUTH_TOKEN', 'Not Set'),
            'EXAMPLE_CLUSTER_ID': os.environ.get('EXAMPLE_CLUSTER_ID', 'Not Set'),
            'EXAMPLE_CONNECTION_URL': os.environ.get('EXAMPLE_CONNECTION_URL', 'Not Set'),
            'EXAMPLE_TENANT': os.environ.get('EXAMPLE_TENANT', 'Not Set'),
            'AWS_LAMBDA_EXEC_WRAPPER': os.environ.get('AWS_LAMBDA_EXEC_WRAPPER', 'Not Set')
        },
        'secondLayer': {
            'SECOND_LAYER_EXECUTE': os.environ.get('SECOND_LAYER_EXECUTE', 'Not Set')
        }
    }

When running a test using the AWS Management Console, you see the following response returned from the Lambda function:

{
  "statusCode": 200,
  "layer": {
    "EXAMPLE_AUTH_TOKEN": "EXAMPLE AUTH TOKEN",
    "EXAMPLE_CLUSTER_ID": "EXAMPLE CLUSTER ID",
    "EXAMPLE_CONNECTION_URL": "EXAMPLE CONNECTION URL",
    "EXAMPLE_TENANT": "EXAMPLE TENANT",
    "AWS_LAMBDA_EXEC_WRAPPER": "/opt/second-example-layer"
  },
  "secondLayer": {
    "SECOND_LAYER_EXECUTE": "true"
  }
}

When the secret changes, there is a delay before those changes are available to the Lambda layers and function. This is because the layer only executes in the init phase of the Lambda lifecycle. After the Lambda execution environment is recreated and initialized, the layer executes and creates environment variables with the new secret information.

Conclusion

This solution provides a way to convert information from Secrets Manager into Lambda environment variables. By following this approach, you can centralize the management of information through Secrets Manager, instead of at the function level.

For more information about the Lambda lifecycle, see the Lambda execution environment lifecycle documentation.

The code for this post is available from this GitHub repo.

For more serverless learning resources, visit Serverless Land.

Monitoring and tuning federated GraphQL performance on AWS Lambda

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/monitoring-and-tuning-federated-graphql-performance-on-aws-lambda/

This post is written by Krzysztof Lis, Senior Software Development Engineer, IMDb.

Our federated GraphQL gateway at IMDb distributes requests across 19 subgraphs (graphlets). To ensure reliability for customers, IMDb monitors availability and performance across the whole stack. This article focuses on this challenge and concludes a three-part federated GraphQL series:

  • Part 1 presents the migration from a monolithic REST API to a federated GraphQL (GQL) endpoint running on AWS Lambda.
  • Part 2 describes schema management in federated GQL systems.

This post presents an approach to performance tuning. It compares graphlets with the same logic but different runtimes (for example, Java and Node.js) and shows best practices for AWS Lambda tuning.

The post describes IMDb’s test strategy that emphasizes the areas of ownership for the Gateway and Graphlet teams. In contrast to the legacy monolithic system described in part 1, the federated GQL gateway does not own any business logic. Consequently, the gateway integration tests focus solely on platform features, leaving the resolver logic entirely up to the graphlets.

Monitoring and alarming

Efficient monitoring of a distributed system requires you to track requests across all components. To correlate service issues with issues in the gateway or other services, you must pass and log the common request ID.

Capture both error and latency metrics for every network call. In Lambda, you cannot send a response to the client until all work for that request is complete, so publishing metrics synchronously as part of the request can add latency.

The recommended way to capture metrics is Amazon CloudWatch embedded metric format (EMF). This scales with Lambda and helps avoid throttling by the Amazon CloudWatch PutMetricData API. You can also search and analyze your metrics and logs more easily using CloudWatch Logs Insights.
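
As an illustration, a minimal EMF entry can be emitted by printing a structured JSON log line; CloudWatch extracts the metric asynchronously, so no PutMetricData call is made. This is a sketch in Python, and the namespace, dimension, and metric names are hypothetical:

import json
import time

def emit_emf_metric(metric_name, value, service):
    # CloudWatch parses this structured log line into a custom metric,
    # avoiding direct PutMetricData calls that can be throttled
    print(json.dumps({
        "_aws": {
            "Timestamp": int(time.time() * 1000),
            "CloudWatchMetrics": [{
                "Namespace": "GraphQLGateway",
                "Dimensions": [["Service"]],
                "Metrics": [{"Name": metric_name, "Unit": "Milliseconds"}]
            }]
        },
        "Service": service,
        metric_name: value
    }))

emit_emf_metric("DownstreamLatency", 42, "graphlet-a")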

Lambda-configured timeouts emit a standard Lambda invocation error metric, which can make it harder to separate timeouts from errors thrown during invocation. By enforcing a timeout in code, you can emit a custom metric to alarm on, and treat timeouts differently from unexpected errors. An in-code timeout also lets you flush EMF metrics before the function is stopped, which is not possible with the Lambda-configured timeout.
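
One possible way to enforce an in-code timeout is to compute a deadline from the remaining invocation time, leaving a buffer to flush metrics before Lambda stops the function. In this Python sketch, do_work is a hypothetical worker function that honors the deadline, and emit_emf_metric is the helper from the previous example:

def lambda_handler(event, context):
    # Reserve 500 ms to emit metrics before the Lambda-configured timeout
    deadline_ms = context.get_remaining_time_in_millis() - 500

    result = do_work(event, deadline_ms)  # hypothetical worker that checks the deadline
    if result is None:
        # Emit a custom timeout metric and fail fast, so timeouts can be
        # alarmed on separately from unexpected errors
        emit_emf_metric("HandlerTimeout", 1, "graphlet-a")
        raise TimeoutError("Work exceeded the in-code deadline")
    return result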

Running out of memory in a Lambda function also appears as a timeout. Use CloudWatch Logs Insights to see if there are Lambda invocations that are exceeding their memory limits.

You can enable AWS X-Ray tracing for Lambda with a small configuration change. You can also trace components such as SDK calls or add custom subsegments.
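
For a Python function, for example, instrumenting SDK calls and adding a custom subsegment with the AWS X-Ray SDK might look like this sketch, where resolve is a hypothetical resolver call:

from aws_xray_sdk.core import xray_recorder, patch_all

# Automatically instrument supported libraries, such as boto3
patch_all()

def lambda_handler(event, context):
    # Record a custom subsegment around application-specific work
    with xray_recorder.in_subsegment('resolve-graphlet'):
        return resolve(event)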

Gateway integration tests

The Gateway team wants tests to be independent from the underlying data served by the graphlets. At the same time, they must test platform features provided by the Gateway – such as graphlet caching.

To simulate the real gateway-graphlet integration, IMDb uses a synthetic test graphlet that serves mock data. Given the graphlet’s simplicity, this reduces the risk of unreliable graphlet data. Tests can assert only platform features with the assumption of a stable and functional graphlet, improving confidence that failing tests indicate issues with the platform itself.

This approach helps to reduce false-positive pipeline blockages and improves the continuous delivery rate. The gateway integration tests are run against the exposed endpoint (for example, a content delivery network) or by invoking the gateway Lambda function directly and passing the appropriate payload.

The former approach allows you to detect potential issues with the infrastructure setup. This is useful when you use infrastructure as code (IaC) tools like AWS CDK. The latter further narrows down the target of the tests to the gateway logic, which may be appropriate if you have extensive infrastructure monitoring and testing already in place.

Graphlet integration tests

The Graphlet team focuses only on graphlet-specific features. This usually means the resolver logic for the graph fields they own in the overall graph. All the platform features – including query federation and graphlet response caching – are already tested by the Gateway Team.

The best way to test a specific graphlet is to run the test suite by directly invoking the Lambda function. This way, an issue with the gateway itself does not cause a false-positive failure for the graphlet team.

Load tests

It’s important to determine the maximum traffic volume your system can handle before releasing to production. Before the initial launch and before any high traffic events (for example, the Oscars or Golden Globes), IMDb conducts thorough load testing of our systems.

To perform meaningful load testing, IMDb captures logs of traffic to IMDb pages and later replays the real customer traffic at the desired transactions-per-second (TPS) volume. This ensures that the tests approximate real-life usage and reduces the risk of skewing test results through over-caching and disproportionate graphlet usage. Vegeta is an example of a tool you can use to run a load test against your endpoint.
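
For example, a basic Vegeta run against a hypothetical endpoint at a fixed request rate might look like this:

echo "GET https://api.example.com/graphql" | vegeta attack -rate=100 -duration=60s | vegeta report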

Canary tests

Canary testing can also help ensure high availability of an endpoint. A canary is a configurable script that runs on a schedule and produces synthetic traffic. You configure the canary script to follow the same routes and perform the same actions as a user, which allows you to continually verify the user experience even without live traffic.

Canaries should emit success and failure metrics that you can alarm on. For example, if a canary runs 100 times per minute and the success rate drops below 90% in three consecutive data points, you may choose to notify a technician about a potential issue.

Compared with integration tests, canary tests run continuously and do not require code changes to trigger. They can be a useful tool for detecting issues introduced outside of a code change, for example, through manual resource modification in the AWS Management Console or an upstream service outage.

Performance tuning

There is a per-account limit on the number of concurrent Lambda invocations shared across all Lambda functions in a single account. You can help to manage concurrency by separating high-volume Lambda functions into different AWS accounts. If there is a traffic surge to any one of the Lambda functions, this isolates the concurrency used to a single AWS account.

Lambda compute power is controlled by the memory setting. With more memory comes more CPU. Even if a function does not require much memory, you can adjust this parameter to get more CPU power and improve processing time.

When serving real-time traffic, Provisioned Concurrency in Lambda functions can help to avoid cold start latency. (Note that you should use the maximum, not the average, for your auto scaling metric to keep it responsive to traffic increases.) For Java functions, code in static blocks runs before the function is invoked. Provisioned Concurrency is different from reserved concurrency, which sets a concurrency limit on the function and throttles invocations above that hard limit.

Use the maximum number of concurrent executions in a load test to determine the account concurrency limit for high-volume Lambda functions. Also, configure a CloudWatch alarm for when you are nearing the concurrency limit for the AWS account.
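
As a sketch, you could create such an alarm with the AWS SDK for Python. The threshold below assumes a hypothetical account concurrency limit of 1,000 and alarms at 80% of it; the alarm name and SNS topic ARN are also illustrative:

import boto3

cloudwatch = boto3.client('cloudwatch')

# Alarm when account-level concurrent executions approach the limit
cloudwatch.put_metric_alarm(
    AlarmName='lambda-concurrency-near-limit',
    Namespace='AWS/Lambda',
    MetricName='ConcurrentExecutions',
    Statistic='Maximum',
    Period=60,
    EvaluationPeriods=3,
    Threshold=800,  # 80% of an assumed 1,000 concurrency limit
    ComparisonOperator='GreaterThanOrEqualToThreshold',
    AlarmActions=['arn:aws:sns:us-east-1:123456789012:ops-alerts']
)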

There are concurrency limits and burst limits for Lambda function scaling. Both are per-account limits. When there is a traffic surge, Lambda creates new instances to handle the traffic. A burst limit of 3,000 means that the first 3,000 instances can be obtained at a much faster rate (invocations increase exponentially). The remaining instances are obtained at a linear rate of 500 per minute until reaching the concurrency limit.

An alternative way of thinking about this is that concurrency can increase at a rate of 500 per minute, with a burst pool of 3,000. For example, scaling from zero to 5,000 concurrent executions uses the 3,000-instance burst pool immediately, and then takes four more minutes at 500 instances per minute. The burst limit is fixed, but the concurrency limit can be increased by requesting a quota increase.

You can further reduce cold start latency by removing unused dependencies, selecting lightweight libraries for your project, and favoring compile-time over runtime dependency injection.

Impact of Lambda runtime on performance

Choice of runtime impacts the overall function performance. We migrated a graphlet from Java to Node.js with complete feature parity. The following graph shows the performance comparison between the two:

Performance graph

To illustrate the performance difference, the graph compares the slowest latencies for Node.js and Java – the P80 latency for Node.js was lower than the minimal latency we recorded for Java.

Conclusion

There are multiple factors to consider when tuning a federated GQL system. You must be aware of trade-offs when deciding on factors like the runtime environment of Lambda functions.

An extensive testing strategy can help you scale systems and narrow down issues quickly. Well-defined testing can also keep pipelines clean of false-positive blockages.

Using CloudWatch EMF helps to avoid PutMetricData API throttling and allows you to run CloudWatch Logs Insights queries against metric data.

For more serverless learning resources, visit Serverless Land.

Building a difference checker with Amazon S3 and AWS Lambda

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/building-a-difference-checker-with-amazon-s3-and-aws-lambda/

When saving different versions of files or objects, it can be useful to detect and log the differences between the versions automatically. A difference checker tool can detect changes in JSON files for configuration changes, or log changes in documents made by users.

This blog post shows how to build and deploy a scalable difference checker service using Amazon S3 and AWS Lambda. The example application uses the AWS Serverless Application Model (AWS SAM), enabling you to deploy the application more easily in your own AWS account.

This walkthrough creates resources covered in the AWS Free Tier but usage beyond the Free Tier allowance may incur cost. To set up the example, visit the GitHub repo and follow the instructions in the README.md file.

Overview

By default in S3, when you upload an object with the same name as an existing object, the new object overwrites the existing one. However, when you enable versioning in an S3 bucket, the service stores every version of an object. Versioning provides an effective way to recover objects in the event of accidental deletion or overwriting. It also provides a way to detect changes in objects, since you can compare the latest version to previous versions.

In the example application, the S3 bucket triggers a Lambda function every time an object version is uploaded. The Lambda function compares the latest version with the previous version and then writes the differences to Amazon CloudWatch Logs.

Additionally, the application uses a configurable environment variable to determine how many versions of the object to retain. By default, it keeps the latest three versions. The Lambda function deletes versions that are earlier than the configuration allows, providing an effective way to implement object life cycling.

This shows the application flow when multiple versions of an object are uploaded:

Application flow

  1. When v1 is uploaded, there is no previous version to compare against.
  2. When v2 is uploaded, the Lambda function logs the differences compared with v1.
  3. When v3 is uploaded, the Lambda function logs the differences compared with v2.
  4. When v4 is uploaded, the Lambda function logs the differences compared with v3. It then deletes v1 of the object, since it is earlier than the configured setting.

Understanding the AWS SAM template

The application’s AWS SAM template configures the bucket with versioning enabled using the VersioningConfiguration attribute:

  SourceBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Ref BucketName
      VersioningConfiguration:
        Status: Enabled      

It defines the Lambda function with an environment variable KEEP_VERSIONS, which determines how many versions of an object to retain:

  S3ProcessorFunction:
    Type: AWS::Serverless::Function 
    Properties:
      CodeUri: src/
      Handler: app.handler
      Runtime: nodejs14.x
      MemorySize: 128
      Environment:
        Variables:
          KEEP_VERSIONS: 3

The template uses an AWS SAM policy template to provide the Lambda function with an S3ReadPolicy to the objects in the bucket. The version handling logic requires s3:ListBucketVersions permission on the bucket and s3:DeleteObjectVersion permission on the objects in the bucket. It’s important to note which permissions apply to the bucket and which apply to the objects within the bucket. The template defines these three permission types in the function’s policy:

      Policies:
        - S3ReadPolicy:
            BucketName: !Ref BucketName      
        - Statement:
          - Sid: VersionsPermission
            Effect: Allow
            Action:
            - s3:ListBucketVersions
            Resource: !Sub "arn:${AWS::Partition}:s3:::${BucketName}" 
        - Statement:
          - Sid: DeletePermission
            Effect: Allow
            Action:
            - s3:DeleteObject
            - s3:DeleteObjectVersion
            Resource: !Sub "arn:${AWS::Partition}:s3:::${BucketName}/*" 

The example application only works for text files, but you can use the same logic to process other file types. The event definition ensures that only objects ending in ‘.txt’ invoke the Lambda function:

      Events:
        FileUpload:
          Type: S3
          Properties:
            Bucket: !Ref SourceBucket
            Events: s3:ObjectCreated:*
            Filter: 
              S3Key:
                Rules:
                  - Name: suffix
                    Value: '.txt'     

Processing events from the S3 bucket

S3 sends events to the Lambda function when objects are created. The event contains metadata about the objects but not the contents of the object. It’s good practice to separate the business logic of the function from the Lambda handler, so the generic handler in app.js iterates through the event’s records and calls the custom logic for each record:

const { processS3 } = require('./processS3')

exports.handler = async (event) => {
  console.log (JSON.stringify(event, null, 2))

  await Promise.all(
    event.Records.map(async (record) => {
      try {
        await processS3(record)
      } catch (err) {
        console.error(err)
      }
    })
  )
}

The processS3.js file contains a function that fetches the object versions in the bucket and sorts the event data received. The listObjectVersions method of the S3 API requires the s3:ListBucketVersions permission, as provided in the AWS SAM template:

    // Decode URL-encoded key
    const Key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "))

    // Get the list of object versions
    const data = await s3.listObjectVersions({
      Bucket: record.s3.bucket.name,
      Prefix: Key
    }).promise()

    // Sort versions by date (ascending by LastModified)
    const versions = data.Versions
    const sortedVersions = versions.sort((a,b) => new Date(a.LastModified) - new Date(b.LastModified))

Finally, the compareS3.js file contains a function that loads the latest two versions of the S3 object and uses the Diff npm library to compare:

const compareS3 = async (oldVersion, newVersion) => {
  try {
    console.log ({oldVersion, newVersion})

    // Get original text from objects 
    const oldObject = await s3.getObject({ Bucket: oldVersion.BucketName, Key: oldVersion.Key }).promise()
    const newObject = await s3.getObject({ Bucket: newVersion.BucketName, Key: newVersion.Key }).promise()

    // Convert buffers to strings
    const oldFile = oldObject.Body.toString()
    const newFile = newObject.Body.toString()

    // Use diff library to compare files (https://www.npmjs.com/package/diff)
    return Diff.diffWords(oldFile, newFile)

  } catch (err) {
    console.error('compareS3: ', err)
  }
}
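
For reference, the diff library returns an array of change parts. Comparing the strings “Hello world” and “Hello there” produces output shaped approximately like this (the exact fields depend on the library version):

[
  { "value": "Hello " },
  { "value": "world", "removed": true },
  { "value": "there", "added": true }
]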

Life-cycling earlier versions of an S3 object

You can use an S3 Lifecycle configuration to apply rules automatically based on object transition actions. Using this approach, you can expire objects based upon age, and the S3 service processes the deletions asynchronously. Lifecycling with rules is entirely managed by S3 and does not require any custom code. This implementation uses a different approach, using code to delete objects based on the number of retained versions instead of age.

When versioning is enabled on a bucket, S3 adds a VersionId attribute to an object when it is created. This identifier is a random string instead of a sequential identifier. Listing the versions of an object also returns a LastModified attribute, which can be used to determine the order of the versions. The length of the response array also indicates the number of versions available for an object:

[
  {
    Key: 'test.txt',
    VersionId: 'IX_tyuQrgKpMFfq5YmLOlrtaleRBQRE',
    IsLatest: false,
    LastModified: 2021-08-01T18:48:50.000Z,
  },
  {
    Key: 'test.txt',
    VersionId: 'XNpxNgUYhcZDcI9Q9gXCO9_VRLlx1i.',
    IsLatest: false,
    LastModified: 2021-08-01T18:52:58.000Z,
  },
  {
    Key: 'test.txt',
    VersionId: 'RBk2BUIKcYYt4hNA5hrTVdNit.MDNMZ',
    IsLatest: true,
    LastModified: 2021-08-01T18:53:26.000Z,
  }
]

For convenience, this code adds a sequential version number attribute, determined by sorting the array by date. The deleteS3 function uses the deleteObjects method in the S3 API to delete multiple objects in one action. It builds a params object containing the list of keys for deletion, using the sequential version number to flag versions for deletion:

const deleteS3 = async (versions) => {

  const params = {
    Bucket: versions[0].BucketName, 
    Delete: {
     Objects: [ ]
    }
  }

  try {
    // Add keys/versions from objects that are process.env.KEEP_VERSIONS behind
    versions.map((version) => {
      if ((versions.length - version.VersionNumber) >= process.env.KEEP_VERSIONS ) {
        console.log(`Delete version ${version.VersionNumber}: versionId = ${version.VersionId}`)
        params.Delete.Objects.push({ 
          Key: version.Key,
          VersionId: version.VersionId
        })
      }
    })

    // Delete versions
    const result = await s3.deleteObjects(params).promise()
    console.log('Delete object result: ', result)

  } catch (err) {
    console.error('deleteS3: ', err)
  }
}

Testing the application

To test this example, upload a sample text file to the S3 bucket by using the AWS Management Console or with the AWS CLI:

aws s3 cp sample.txt s3://myS3bucketname

Modify the test file and then upload again using the same command. This creates a second version in the bucket. Repeat this process multiple times to create more versions of the object. The Lambda function’s log file shows the differences between versions and any deletion activity for earlier versions:

Log activity

You can also test the function locally using test.js and supplying a test event. This can be useful for local debugging and testing.

Conclusion

This blog post shows how to create a scalable difference checking tool for objects stored in S3 buckets. The Lambda function is invoked when S3 writes new versions of an object to the bucket. This example also shows how to remove earlier versions of an object and define a set number of versions to retain.

I walk through the AWS SAM template for deploying this example application and highlight important S3 API methods in the SDK used in the implementation. I explain how version IDs work in S3 and how to use this in combination with the LastModified date attribute to implement sequential versioning.

To learn more about best practices when using S3 to Lambda, see the Lambda Operator Guide. For more serverless learning resources, visit Serverless Land.

Creating AWS Serverless batch processing architectures

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/creating-aws-serverless-batch-processing-architectures/

This post is written by Reagan Rosario, AWS Solutions Architect and Mark Curtis, Solutions Architect, WWPS.

Batch processes are foundational to many organizations and can help process large amounts of information in an efficient and automated way. Use cases include file intake processes, queue-based processing, and transactional jobs, in addition to heavy data processing jobs.

This post explains a serverless solution for batch processing to implement a file intake process. This example uses AWS Step Functions for orchestration, AWS Lambda functions for on-demand compute, Amazon S3 for storing the data, and Amazon SES for sending emails.

Overview

This post’s example takes a common use case of a business’s need to process data uploaded as a file. The test file has various data fields, such as item ID, order date, and order location. The data must be validated, processed, and enriched with related information such as unit price. Lastly, this enriched data may need to be sent to a third-party system.

Step Functions allows you to coordinate multiple AWS services in fully managed workflows to build and update applications quickly. You can also create larger workflows out of smaller workflows by using nesting. This post’s architecture creates a smaller and modular Chunk processor workflow, which is better for processing smaller files.

As the file size increases, the size of the payload passed between states increases. Executions that pass large payloads of data between states can be stopped if they exceed the maximum payload size of 262,144 bytes.

To process large files and to make the workflow modular, I split the processing between two workflows. One workflow is responsible for splitting up a larger file into chunks. A second nested workflow is responsible for processing records in individual chunk files. This separation of high-level workflow steps from low-level workflow steps also allows for easier monitoring and debugging.

Splitting the files in multiple chunks can also improve performance by processing each chunk in parallel. You can further improve the performance by using dynamic parallelism via the map state for each chunk.

Solution architecture

  1. The file upload to an S3 bucket triggers the S3 event notification. It invokes the Lambda function asynchronously with an event that contains details about the object.
  2. Lambda function calls the Main batch orchestrator workflow to start the processing of the file.
  3. Main batch orchestrator workflow reads the input file and splits it into multiple chunks and stores them in an S3 bucket.
  4. Main batch orchestrator then invokes the Chunk Processor workflow for each split file chunk.
  5. Each Chunk processor workflow execution reads and processes a single split chunk file.
  6. Chunk processor workflow writes the processed chunk file back to the S3 bucket.
  7. Chunk processor workflow writes the details about any validation errors in an Amazon DynamoDB table.
  8. Main batch orchestrator workflow then merges all the processed chunk files and saves it to an S3 bucket.
  9. Main batch orchestrator workflow then emails the consolidated files to the intended recipients using Amazon Simple Email Service.

Step Functions workflow

  1. The Main batch orchestrator workflow orchestrates the processing of the file.
    1. The first task state Split Input File into chunks calls a Lambda function. It splits the main file into multiple chunks based on the number of records and stores each chunk in an S3 bucket (see the sketch after this list).
    2. The next task state Call Step Functions for each chunk invokes a Lambda function. It triggers a workflow for each chunk of the file. It passes information such as the name of the bucket and the key where the chunk file to be processed is present.
    3. Then we wait for all the child workflow executions to complete.
    4. Once all the child workflows are processed successfully, the next task state is Merge all Files. This combines all the processed chunks into a single file and then stores the file back to the S3 bucket.
    5. The next task state Email the file takes the output file. It generates an S3 presigned URL for the file and sends an email with the S3 presigned URL.
    Chunk processor workflow
  2. The Chunk processor workflow is responsible for processing each row from the chunk file that was passed.
    1. The first task state Read reads the chunked file from S3 and converts it to an array of JSON objects. Each JSON object represents a row in the chunk file.
    2. The next state is a map state called Process messages (not shown in the preceding visual workflow). It runs a set of steps for each element of an input array. The input to the map state is an array of JSON objects passed by the previous task.
    3. Within the map state, Validate Data is the first state. It invokes a Lambda function that validates each JSON object using the rules that you have created. Records that fail validation are stored in an Amazon DynamoDB table.
    4. The next state Get Financial Data invokes Amazon API Gateway endpoints to enrich the data in the file with data from a DynamoDB table.
    5. When the map state iterations are complete, the Write output file state triggers a task. It calls a Lambda function, which converts the JSON data back to CSV and writes the output object to S3.
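
As a rough sketch of the splitting step described above (step 1 of the Main batch orchestrator), a Lambda function might read the input object, group records into chunks, and write each chunk back to S3. The event shape, default chunk size, and key layout here are illustrative assumptions, not the exact implementation from the repository:

import os
import boto3

s3 = boto3.client('s3')
CHUNK_SIZE = int(os.environ.get('FILE_CHUNK_SIZE', '600'))  # records per chunk

def lambda_handler(event, context):
    bucket, key = event['bucket'], event['key']
    body = s3.get_object(Bucket=bucket, Key=key)['Body'].read().decode('utf-8')
    header, *records = body.splitlines()

    chunk_keys = []
    for i in range(0, len(records), CHUNK_SIZE):
        # Repeat the header in every chunk so each one is a valid CSV
        chunk = '\n'.join([header] + records[i:i + CHUNK_SIZE])
        chunk_key = f"to_process/{key}.chunk{i // CHUNK_SIZE}.csv"
        s3.put_object(Bucket=bucket, Key=chunk_key, Body=chunk)
        chunk_keys.append(chunk_key)

    # The state machine passes these keys to the Chunk processor workflow
    return {'bucket': bucket, 'chunks': chunk_keys}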

Prerequisites

Deploying the application

  1. Clone the repository.
  2. Change to the directory and build the application source:
    sam build
  3. Package and deploy the application to AWS. When prompted, input the corresponding parameters as shown below:
    sam deploy --guided
    Note the template parameters:
    • SESSender: The sender email address for the output file email.
    • SESRecipient: The recipient email address for the output file email.
    • SESIdentityName: An email address or domain that Amazon SES uses to send email.
    • InputArchiveFolder: Amazon S3 folder where the input file will be archived after processing.
    • FileChunkSize: Size of each of the chunks, which is split from the input file.
    • FileDelimiter: Delimiter of the CSV file (for example, a comma).
  4. After the stack creation is complete, you see the source bucket created in Outputs.
    Outputs
  5. Review the deployed components in the AWS CloudFormation Console.
    CloudFormation console

Testing the solution

  1. Before you can send an email using Amazon SES, you must verify each identity that you’re going to use as a “From”, “Source”, “Sender”, or “Return-Path” address to prove that you own it. Refer to Verifying identities in Amazon SES for more information.
  2. Locate the S3 bucket (SourceBucket) in the Resources section of the CloudFormation stack. Choose the physical ID.
    s3 bucket ID
  3. In the S3 console for the SourceBucket, choose Create folder. Name the folder input and choose Create folder.
    Create folder
  4. The S3 event notification on the SourceBucket uses “input” as the prefix and “csv” as the suffix. This triggers the notification Lambda function. This is created as a part of the custom resource in the AWS SAM template.
    Event notification
  5. In the S3 console for the SourceBucket, choose the Upload button. Choose Add files and browse to the input file (testfile.csv). Choose Upload.
    Upload dialog
  6. Review the data in the input file testfile.csv.
    Testfile.csv
  7. After the object is uploaded, the event notification triggers the Lambda function. This starts the main orchestrator workflow. In the Step Functions console, you see the workflow is in a running state.
    Step Functions console
  8. Choose an individual state machine to see additional information.
    State machine
  9. After a few minutes, both BlogBatchMainOrchestrator and BlogBatchProcessChunk workflows have completed all executions. There is one execution for the BlogBatchMainOrchestrator workflow and multiple invocations of the BlogBatchProcessChunk workflow. This is because the BlogBatchMainOrchestrator invokes the BlogBatchProcessChunk for each of the chunked files.
    Workflow #1
    Workflow #2

Checking the output

  1. Open the S3 console and verify the folders created after the process has completed.
    S3 folders
    The following subfolders are created after the processing is complete:
    input_archive – Folder for archival of the input object.
    0a47ede5-4f9a-485e-874c-7ff19d8cadc5 – Subfolder with a unique UUID in the name. This is created for storing the objects generated during batch execution.
  2. Select the folder 0a47ede5-4f9a-485e-874c-7ff19d8cadc5.
    Folder contents
    output – This folder contains the completed output objects, some housekeeping files, and processed chunk objects.
    Output folder
    to_process – This folder contains all the split objects from the original input file.
    to_process folder
  3. Open the processed object from the output/completed folder.
    Processed object
    Inspect the output object testfile.csv. It is enriched with additional data (columns I through N) from the DynamoDB table, fetched through an API call.
    Output testfile.csv

Viewing a completed workflow

Open the Step Functions console and browse to the BlogBatchMainOrchestrator and BlogBatchProcessChunk state machines. Choose one of the executions of each to locate the Graph Inspector. This shows the execution results for each state.

BlogBatchMainOrchestrator:

BlogBatchMainOrchestrator

BlogBatchProcessChunk:

BlogBatchProcessChunk

Batch performance

For this use case, this is the time taken for the batch to complete, based on the number of input records:

No. of records    Time for batch completion
10,000            5 minutes
100,000           7 minutes

The performance of the batch depends on other factors such as the Lambda memory settings and data in the file. Read more about Profiling functions with AWS Lambda Power Tuning.

Conclusion

This blog post shows how to use Step Functions’ features and integrations to orchestrate a batch processing solution. You use two Step Functions workflows to implement batch processing, with one workflow splitting the original file and a second workflow processing each chunk file.

The overall performance of our batch processing application is improved by splitting the input file into multiple chunks. Each chunk is processed by a separate state machine. Map states further improve the performance and efficiency of workflows by processing individual rows in parallel.

Download the code from this repository to start building a serverless batch processing system.


For more serverless learning resources, visit Serverless Land.

Operating serverless at scale: Keeping control of resources – Part 3

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/operating-serverless-at-scale-keeping-control-of-resources-part-3/

This post is written by Jerome Van Der Linden, Solutions Architect.

In the previous part of this series, I provided application archetypes that help developers follow company best practices and include the libraries needed for compliance. But using these archetypes is optional and teams can still deploy resources without them. Even if they use them, the templates can be modified. Developers can remove a layer, over-permission functions, or allow access to APIs without appropriate authorization.

To avoid this, you must define guardrails. Templates are good for providing guidance and best practices, and for improving productivity. But they do not prevent actions the way guardrails do. There are two kinds of guardrails:

  • Proactive: you define rules and permissions that prevent specific actions.
  • Reactive: you define controls that detect when something happens, and trigger notifications or remediation actions.

This third part on serverless governance describes different guardrails and ways to implement them.

Implementing proactive guardrails

Proactive guardrails are often the most efficient but also the most restrictive. Be sure to apply them with caution as you could reduce developers’ agility and productivity. For example, test in a sandbox account before applying more broadly.

In this category, you typically find IAM policies and service control policies. This section explores some examples applied to serverless applications.

Controlling access through policies

Part 2 discusses using Lambda layers to include standard components and ensure the compliance of Lambda functions. You can enforce the use of a Lambda layer when creating or updating a function, using the following policy. The condition checks if a layer is configured with the appropriate layer ARN:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ConfigureFunctions",
            "Effect": "Allow",
            "Action": [
                "lambda:CreateFunction",
                "lambda:UpdateFunctionConfiguration"
            ],
            "Resource": "*",
            "Condition": {
                "ForAllValues:StringLike": {
                    "lambda:Layer": [
                        "arn:aws:lambda:*:123456789012:layer:my-company-layer:*"
                    ]
                }
            }
        }
    ]
}

When deploying Lambda functions, some companies also want to control the source code integrity and verify it has not been altered. Using code signing for AWS Lambda, you can sign the package and verify its signature at deployment time. If the signature is not valid, you can be warned or even block the deployment.

An administrator must first create a signing profile (you can see it as a trusted publisher) using AWS Signer. Then, a developer can reference this profile in their AWS SAM template to sign the Lambda function code:

Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: src/
      Handler: app.lambda_handler
      Runtime: python3.9
      CodeSigningConfigArn: !Ref MySignedFunctionCodeSigningConfig

  MySignedFunctionCodeSigningConfig:
    Type: AWS::Lambda::CodeSigningConfig
    Properties:
      AllowedPublishers:
        SigningProfileVersionArns:
          - arn:aws:signer:eu-central-1:123456789012:/signing-profiles/MySigningProfile
      CodeSigningPolicies:
        UntrustedArtifactOnDeployment: "Enforce"

Using the AWS SAM CLI and the --signing-profiles option, you can package and deploy the Lambda function using the appropriate configuration. Read the documentation for more details.
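
For example, assuming a function with the logical ID MyFunction and the signing profile created earlier, a deployment command might look like this sketch:

sam deploy --guided --signing-profiles MyFunction=MySigningProfile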

You can also enforce the use of code signing by using a policy so that every function must be signed before deployment. Use the following policy and a condition requiring a CodeSigningConfigArn:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ConfigureFunctions",
            "Effect": "Allow",
            "Action": [
                "lambda:CreateFunction"
            ],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "lambda:CodeSigningConfigArn": "arn:aws:lambda:eu-central-1:123456789012:code-signing-config:csc-0c44689353457652"
                }
            }
        }
    ]
}

When using Amazon API Gateway, you may want to use a standard authorization mechanism. For example, a Lambda authorizer to validate a JSON Web Token (JWT) issued by your company identity provider. You can do that using a policy like this:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyWithoutJWTLambdaAuthorizer",
      "Effect": "Deny",
      "Action": [
        "apigateway:PUT",
        "apigateway:POST",
        "apigateway:PATCH"
      ],
      "Resource": [
        "arn:aws:apigateway:eu-central-1::/apis",
        "arn:aws:apigateway:eu-central-1::/apis/??????????",
        "arn:aws:apigateway:eu-central-1::/apis/*/authorizers",
        "arn:aws:apigateway:eu-central-1::/apis/*/authorizers/*"
      ],
      "Condition": {
        "ForAllValues:StringNotEquals": {
          "apigateway:Request/AuthorizerUri": 
            "arn:aws:apigateway:eu-central-1:lambda:path/2015-03-31/functions/arn:aws:lambda:eu-central-1:123456789012:function:MyCompanyJWTAuthorizer/invocations"
        }
      }
    }
  ]
}

To enforce the use of mutual authentication (mTLS) and TLS version 1.2 for APIs, use the following policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "EnforceTLS12",
      "Effect": "Allow",
      "Action": [
        "apigateway:POST"
      ],
      "Resource": [
        "arn:aws:apigateway:eu-central-1::/domainnames",
        "arn:aws:apigateway:eu-central-1::/domainnames/*"
      ],
      "Condition": {
        "ForAllValues:StringEquals": {
            "apigateway:Request/SecurityPolicy": "TLS_1_2"
        }
      }
    }
  ]
}

You can apply other guardrails for Lambda, API Gateway, or another service. Read the available policies and conditions for your service here.

Securing self-service with permissions boundaries

When creating a Lambda function, developers must create a role that the function assumes when running. But giving developers the ability to create roles can allow them to elevate their permission level. In the following diagram, an admin gives developers the ability to create roles:

Securing self-service with permissions boundaries

Developer 1 creates a role for a function. This only allows Amazon DynamoDB read/write access and a basic execution role for Lambda (for Amazon CloudWatch Logs). But developer 2 is creating a role with administrator permission. Developer 2 cannot assume this role but can pass it to the Lambda function. This role could be used to create resources on Amazon EC2, delete an Amazon RDS database or an Amazon S3 bucket, for example.

To prevent users from elevating their permissions, define permissions boundaries. With these, you can limit the scope of a Lambda function’s permissions. In this example, the admin still gives developers the same ability to create roles, but this time with a permissions boundary attached. Now the function cannot perform actions that exceed this boundary:

Effect of permissions boundaries

The admin must first define the permissions boundaries within an IAM policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "LambdaDeveloperBoundary",
            "Effect": "Allow",
            "Action": [
                "s3:List*",
                "s3:Get*",
                "logs:*",
                "dynamodb:*",
                "lambda:*"
            ],
            "Resource": "*"
        }
    ]
}

Note that this boundary is still too permissive and you should reduce it, adopting a least-privilege approach. For example, you may not want to grant the dynamodb:DeleteTable permission, or you may restrict it to a specific table.

The admin can then provide the CreateRole permission with this boundary using a condition:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "CreateRole",
            "Effect": "Allow",
            "Action": [
                "iam:CreateRole"
            ],
            "Resource": "arn:aws:iam::123456789012:role/lambdaDev*",
            "Condition": {
                "StringEquals": {
                    "iam:PermissionsBoundary": "arn:aws:iam::123456789012:policy/lambda-dev-boundary"
                }
            }
        }
    ]
}

Developers can create roles named lambdaDev* for their Lambda functions, but these functions cannot have more permissions than those defined in the boundary.

Deploying reactive guardrails

The principle of least privilege is not always easy to accomplish. To achieve it without this permission management burden, you can use reactive guardrails. Actions are allowed, but they are detected and trigger a notification or a remediation action.

To accomplish this on AWS, use AWS Config. It continuously monitors your resources and their configurations, assesses them against compliance rules that you define, and can notify you or automatically remediate non-compliant resources.

AWS Config has more than 190 built-in rules and some are related to serverless services. For example, you can verify that an API Gateway REST API is configured with SSL or protected by a web application firewall (AWS WAF). You can ensure that a DynamoDB table has backups configured in AWS Backup or that data is encrypted.

Lambda also has a set of rules. For example, you can ensure that functions have a concurrency limit configured, which you should. Most of these rules are part of the “Operational Best Practices for Serverless” conformance pack to ease their deployment as a single entity. Otherwise, setting rules and remediation can be done in the AWS Management Console or AWS CLI.

If you cannot find a rule for your use case in the AWS Managed Rules, you can find additional ones on GitHub or write your own using the Rule Development Kit (RDK). Take the example of enforcing the use of a Lambda layer for functions: this is possible using a service control policy, but an SCP denies the creation or modification of the function if the layer is not provided. You can use such a policy in production, but you may only want to notify the developers in their sandbox accounts or test environments.

By using the RDK CLI, you can bootstrap a new rule:

rdk create LAMBDA_LAYER_CHECK --runtime python3.9 \
--resource-types AWS::Lambda::Function \
--input-parameters '{"LayerArn":"arn:aws:lambda:region:account:layer:layer_name", "MinLayerVersion":"1"}'

It generates a Lambda function, some tests, and a parameters.json file that contains the configuration for the rule. You can then edit the Lambda function code and the evaluate_compliance method. To check for a layer:

LAYER_REGEXP = 'arn:aws:lambda:[a-z]{2}((-gov)|(-iso(b?)))?-[a-z]+-\d{1}:\d{12}:layer:[a-zA-Z0-9-_]+'

def evaluate_compliance(event, configuration_item, valid_rule_parameters):
    pkg = configuration_item['configuration']['packageType']
    if not pkg or pkg != "Zip":
        return build_evaluation_from_config_item(configuration_item, 'NOT_APPLICABLE',
                                                 annotation='Layers can only be used with functions using Zip package type')

    layers = configuration_item['configuration']['layers']
    if not layers:
        return build_evaluation_from_config_item(configuration_item, 'NON_COMPLIANT',
                                                 annotation='No layer is configured for this Lambda function')

    regex = re.compile(LAYER_REGEXP + ':(.*)')
    annotation = 'Layer ' + valid_rule_parameters['LayerArn'] + ' not used for this Lambda function'
    for layer in layers:
        arn = layer['arn']
        version = regex.search(arn).group(5)
        arn = re.sub('\:' + version + '$', '', arn)
        if arn == valid_rule_parameters['LayerArn']:
            if version >= valid_rule_parameters['MinLayerVersion']:
                return build_evaluation_from_config_item(configuration_item, 'COMPLIANT')
            else:
                annotation = 'Wrong layer version (was ' + version + ', expected ' + valid_rule_parameters['MinLayerVersion'] + '+)'

    return build_evaluation_from_config_item(configuration_item, 'NON_COMPLIANT',
                                             annotation=annotation)

You can find the complete source of this AWS Config rule and its tests on GitHub.

Once the rule is ready, use the command rdk deploy to deploy it in your account. To deploy it across multiple accounts, see the documentation. You can then define remediation actions, for example, automatically adding the missing layer to the function or sending a notification to the developers using Amazon Simple Notification Service (SNS).

Conclusion

This post describes guardrails that you can set up in your accounts or across the organization to keep control over deployed resources. These guardrails can be more or less restrictive according to your requirements.

Use proactive guardrails with service control policies to define coarse-grained permissions and block everything that must not be used. Define reactive guardrails for everything else to aid agility and productivity but still be informed of the activity and potentially remediate.

This concludes this series on serverless governance:

  • Standardization is an important aspect of the governance to speed up teams and ensure that deployed applications are operable and compliant with your internal rules. Use templates, layers, and other mechanisms to create shareable archetypes to apply these standards and rules at the enterprise level.
  • It’s important to keep visibility of and control over your resources, to understand how your environment evolves, and to be able to operate and act if needed. Tags and guardrails help to achieve this, and they should evolve as your cloud maturity increases.

Find more SCP examples and all the AWS managed AWS Config rules in the documentation.

For more serverless learning resources, visit Serverless Land.

Building dynamic Amazon SNS subscriptions for auto scaling container workloads 

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/building-dynamic-amazon-sns-subscriptions-for-auto-scaling-container-workloads/

This post is written by Mithun Mallick, Senior Specialist Solutions Architect, App Integration.

Amazon Simple Notification Service (SNS) is a serverless publish-subscribe messaging service. It supports a push-based subscription model where subscribers must register an endpoint to receive messages. Amazon Simple Queue Service (SQS) is one such endpoint, which is used by applications to receive messages published on an SNS topic.

With containerized applications, the container instances poll the queue to receive messages. However, containerized applications can scale out for a variety of reasons. Creating an SQS queue for each new container instance creates maintenance overhead for customers. You must also clean up the SNS-SQS subscription when the instance scales in.

This blog walks through a dynamic subscription solution, which automates the creation, subscription, and deletion of SQS queues for an Auto Scaling group of containers running in Amazon Elastic Container Service (ECS).

Overview

The solution is based on the use of events to achieve the dynamic subscription pattern. ECS uses the concept of tasks to create an instance of a container. You can find more details on ECS tasks in the ECS documentation.

This solution uses the events generated by ECS to manage the complete lifecycle of an SNS-SQS subscription. It uses the task ID as the name of the queue that is used by the ECS instance for pulling messages. More details on the ECS task ID can be found in the task documentation.

This also uses Amazon EventBridge to apply rules on ECS events and trigger an AWS Lambda function. The first rule detects the running state of an ECS task and triggers a Lambda function, which creates the SQS queue with the task ID as queue name. It also grants permission to the queue and creates the SNS subscription on the topic.

As the container instance starts up, it can send a request to its metadata URL and retrieve the task ID. The task ID is used by the container instance to poll for messages. If the container instance terminates, ECS generates a task stopped event. This event matches a rule in Amazon EventBridge and triggers a Lambda function. The Lambda function retrieves the task ID, deletes the queue, and deletes the subscription from the SNS topic. The solution decouples the container instance from any overhead in maintaining queues, applying permissions, or managing subscriptions. The security permissions for all SNS-SQS management are handled by the Lambda functions.

This diagram shows the solution architecture:

Solution architecture

Events from ECS are sent to the default event bus. There are various events that are generated as part of the lifecycle of an ECS task. You can find more on the various ECS task states in the ECS task documentation. This solution uses ECS as the container orchestration service but you can also use Amazon Elastic Kubernetes Service (EKS). For EKS, you must apply the rules for EKS task state events.

Walkthrough of the implementation

The code snippets are shortened for brevity. The full source code of the solution is in the GitHub repository. The solution uses AWS Serverless Application Model (AWS SAM) for deployment.

SNS topic

The SNS topic is used to send notifications to the ECS tasks. The following snippet from the AWS SAM template shows the definition of the SNS topic:

  SNSDynamicSubsTopic:
    Type: AWS::SNS::Topic
    Properties:
      TopicName: !Ref DynamicSubTopicName

Container instance

The container instance subscribes to the SNS topic using an SQS queue. The container image runs a Java application that reads messages from an SQS queue and prints them in the logs. The following code shows some of the message processor implementation:

// SQS client used to receive and delete messages from the task's queue
AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
AmazonSQSResponder responder = AmazonSQSResponderClientBuilder.standard()
        .withAmazonSQS(sqs)
        .build();

// Poll the queue, print each message body, then delete the message
SQSMessageConsumer consumer = SQSMessageConsumerBuilder.standard()
        .withAmazonSQS(responder.getAmazonSQS())
        .withQueueUrl(queue_url)
        .withConsumer(message -> {
            System.out.println("The message is " + message.getBody());
            sqs.deleteMessage(queue_url, message.getReceiptHandle());
        }).build();
consumer.start();

The queue_url is derived from the task ID of the ECS task, which is retrieved in the constructor of the class:

// Location of the ECS task metadata endpoint
String metaDataURL = map.get("ECS_CONTAINER_METADATA_URI_V4");

HttpGet request = new HttpGet(metaDataURL);
CloseableHttpResponse response = httpClient.execute(request);

HttpEntity entity = response.getEntity();
if (entity != null) {
    String result = EntityUtils.toString(entity);
    // Extract the task ARN from the metadata and keep its last segment as the task ID
    String taskARN = JsonPath.read(result, "$['Labels']['com.amazonaws.ecs.task-arn']").toString();
    String[] arnTokens = taskARN.split("/");
    taskId = arnTokens[arnTokens.length-1];
    System.out.println("The task ID : " + taskId);
}

queue_url = sqs.getQueueUrl(taskId).getQueueUrl();

The queue URL is constructed from the task ID of the container. Each queue is dedicated to a single task, which is one instance of the container running in ECS.

EventBridge rules

The following event pattern on the default event bus captures events that match the start of the container instance. The rule triggers a Lambda function:

      EventPattern:
        source:
          - aws.ecs
        detail-type:
          - "ECS Task State Change"
        detail:
          desiredStatus:
            - "RUNNING"
          lastStatus:  
            - "RUNNING"

The start rule routes events to a Lambda function that creates a queue with the name as the task ID. It creates the subscription to the SNS topic and grants permission on the queue to receive messages from the topic.

This event pattern matches STOPPED events of the container task. It also triggers a Lambda function to delete the queue and the associated subscription:

      EventPattern:
        source:
          - aws.ecs
        detail-type:
          - "ECS Task State Change"
        detail:
          desiredStatus:
            - "STOPPED"
          lastStatus:  
            - "STOPPED"

Lambda functions

There are two Lambda functions that perform the queue creation, subscription, authorization, and deletion.

The SNS-SQS-Subscription-Service

The following code creates the queue based on the task ID, applies policies, and subscribes it to the topic. It also stores the subscription ARN in an Amazon DynamoDB table:

# Get the task ID from the event
taskArn = event['detail']['taskArn']
taskArnTokens = taskArn.split('/')
taskId = taskArnTokens[len(taskArnTokens)-1]

# The task ID is used as the queue name
queue_name = taskId

# Create the queue and look up its ARN
create_queue_resp = sqs_client.create_queue(QueueName=queue_name)
queue_url = create_queue_resp['QueueUrl']
queue_arn = sqs_client.get_queue_attributes(
    QueueUrl=queue_url,
    AttributeNames=['QueueArn'])['Attributes']['QueueArn']

# Subscribe the queue to the SNS topic
response = sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)
subscription_arn = response['SubscriptionArn']

# Store the subscription ARN in DynamoDB, keyed by the task ID
ddbresponse = dynamodb.update_item(
    TableName=SQS_CONTAINER_MAPPING_TABLE,
    Key={
        'id': {
            'S': taskId.strip()
        }
    },
    AttributeUpdates={
        'SubscriptionArn': {
            'Value': {
                'S': subscription_arn
            }
        }
    },
    ReturnValues="UPDATED_NEW"
)
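
The preceding snippet elides the permission grant mentioned earlier. As a minimal sketch, reusing the queue_url, queue_arn, and topic_arn variables from the snippet above (the statement ID is illustrative), the queue policy step might look like this:

import json

# Allow the SNS topic to send messages to the newly created queue
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowSNSTopicSendMessage",  # illustrative statement ID
        "Effect": "Allow",
        "Principal": {"Service": "sns.amazonaws.com"},
        "Action": "sqs:SendMessage",
        "Resource": queue_arn,
        "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}}
    }]
}

sqs_client.set_queue_attributes(
    QueueUrl=queue_url,
    Attributes={'Policy': json.dumps(policy)}
)

Scoping the policy with an aws:SourceArn condition ensures that only this specific topic can deliver messages to the queue.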

The cleanup service

The cleanup function is triggered when the container instance is stopped. It fetches the subscription ARN from the DynamoDB table based on the taskId. It deletes the subscription from the topic and deletes the queue. You can modify this code to include any other cleanup actions or trigger a workflow. The main part of the function code is:

# Get the task ID from the stopped task's ARN
taskId = taskArnTokens[len(taskArnTokens)-1]

# Look up the subscription ARN stored when the queue was created
ddbresponse = dynamodb.get_item(TableName=SQS_CONTAINER_MAPPING_TABLE, Key={'id': {'S': taskId}})
subscription_arn = ddbresponse['Item']['SubscriptionArn']['S']

# Remove the subscription from the topic, then delete the queue
snsresp = sns.unsubscribe(SubscriptionArn=subscription_arn)

queue_url = sqs_client.get_queue_url(QueueName=taskId)['QueueUrl']
queuedelresp = sqs_client.delete_queue(QueueUrl=queue_url)

Conclusion

This blog shows an event-driven approach to handling dynamic SNS subscription requirements. It relies on ECS service events to trigger appropriate Lambda functions. These create the subscription queue, subscribe it to a topic, and delete it once the container instance is terminated.

The approach also allows the container application logic to focus only on consuming and processing the messages from the queue. It does not need any additional permissions to subscribe or unsubscribe from the topic or apply any additional permissions on the queue. Although the solution has been presented using ECS as the container orchestration service, it can be applied for EKS by using its service events.

For more serverless learning resources, visit Serverless Land.

Using JSONPath effectively in AWS Step Functions

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/using-jsonpath-effectively-in-aws-step-functions/

This post is written by Dhiraj Mahapatro, Senior Serverless Specialist SA, Serverless.

AWS Step Functions uses Amazon States Language (ASL), which is a JSON-based, structured language used to define the state machine. ASL uses paths for input and output processing in between states. Paths follow JSONPath syntax.

JSONPath provides the capability to select parts of JSON structures similar to how XPath expressions select nodes of XML documents. Step Functions provides the data flow simulator, which helps in modeling input and output path processing using JSONPath.

This blog post explains how you can effectively use JSONPath in a Step Functions workflow. It shows how you can separate concerns between states by specifically identifying input to and output from each state. It also explains how you can use advanced JSONPath expressions for filtering and mapping JSON content.

Overview

The sample application in this blog is based on a use case in the insurance domain. A new potential customer signs up with an insurance company by creating an account. The customer provides their basic information and their interest in the types of insurance they may shop for later.

The information provided by the potential insurance customer is accepted by the insurance company’s new account application for processing. This application is built using Step Functions, which accepts provided input as a JSON payload and applies the following business logic:

Example application architecture

  1. Verify the identity of the user.
  2. Verify the address of the user.
  3. Approve the new account application if the checks pass.
  4. Upon approval, insert user information into the Amazon DynamoDB Accounts table.
  5. Collect home insurance interests and store in an Amazon SQS queue.
  6. Send email notification to the user about the application approval.
  7. Deny the new account application if the checks fail.
  8. Send an email notification to the user about the application denial.

Deploying the application

Before deploying the solution, you need:

To deploy:

  1. From a terminal window, clone the GitHub repo:
    git clone git@github.com:aws-samples/serverless-account-signup-service.git
  2. Change directory:
    cd ./serverless-account-signup-service
  3. Download and install dependencies:
    sam build
  4. Deploy the application to your AWS account:
    sam deploy --guided
  5. During the guided deployment process, enter a valid email address for the parameter “Email” to receive email notifications.
  6. Once deployed, a confirmation email is sent to the provided email address from SNS. Confirm the subscription by clicking the link in the email.
    Email confirmation

To run the application using the AWS CLI, replace the state machine ARN below with the value from the output of the deployment steps:

aws stepfunctions start-execution \
  --state-machine-arn <StepFunctionArnHere> \
  --input "{\"data\":{\"firstname\":\"Jane\",\"lastname\":\"Doe\",\"identity\":{\"email\":\"[email protected]\",\"ssn\":\"123-45-6789\"},\"address\":{\"street\":\"123 Main St\",\"city\":\"Columbus\",\"state\":\"OH\",\"zip\":\"43219\"},\"interests\":[{\"category\":\"home\",\"type\":\"own\",\"yearBuilt\":2004},{\"category\":\"auto\",\"type\":\"car\",\"yearBuilt\":2012},{\"category\":\"boat\",\"type\":\"snowmobile\",\"yearBuilt\":2020},{\"category\":\"auto\",\"type\":\"motorcycle\",\"yearBuilt\":2018},{\"category\":\"auto\",\"type\":\"RV\",\"yearBuilt\":2015},{\"category\":\"home\",\"type\":\"business\",\"yearBuilt\":2009}]}}"

Paths in Step Functions

Here is the sample payload structure:

{
  "data": {
    "firstname": "Jane",
    "lastname": "Doe",
    "identity": {
      "email": "[email protected]",
      "ssn": "123-45-6789"
    },
    "address": {
      "street": "123 Main St",
      "city": "Columbus",
      "state": "OH",
      "zip": "43219"
    },
    "interests": [
      {"category": "home", "type": "own", "yearBuilt": 2004},
      {"category": "auto", "type": "car", "yearBuilt": 2012},
      {"category": "boat", "type": "snowmobile", "yearBuilt": 2020},
      {"category": "auto", "type": "motorcycle", "yearBuilt": 2018},
      {"category": "auto", "type": "RV", "yearBuilt": 2015},
      {"category": "home", "type": "business", "yearBuilt": 2009}
    ]
  }
}

The payload has data about the new user (identity and address information) and the user’s interests in the types of insurance.

The Compute Blog post on using data flow simulator elaborates on how to use Step Functions paths. To summarize how paths work:

  1. InputPath – What input does a task need?
  2. Parameters – What structure does the task need the input to have?
  3. ResultSelector – What to choose from the task’s output?
  4. ResultPath – Where to put the chosen output?
  5. OutputPath – What output to send to the next state?

The key idea is that the input of downstream states depends on the output of previous states. JSONPath expressions help structure the input and output between states.

Using JSONPath inside paths

This is how paths are used in the sample application for each type.

InputPath

The first two main tasks in the Step Functions state machine validate the identity and the address of the user. Because the two validations are unrelated, they can run independently using a parallel state.

Each state needs the identity and address information provided by the input payload. There is no requirement to provide interests to those states, so InputPath can help answer “What input does a task need?”.

Inside the Check Identity state:

"InputPath": "$.data.identity"

Inside the Check Address state:

"InputPath": "$.data.address"

Parameters

What should the input of the underlying task look like? Check Identity and Check Address use their respective AWS Lambda functions. When Lambda functions or any other AWS service integration is used as a task, the state machine should follow the request syntax of the corresponding service.

For a Lambda function as a task, the state should provide the FunctionName and an optional Payload as parameters. For the Check Identity state, the parameters section looks like:

"Parameters": {
    "FunctionName": "${CheckIdentityFunctionArn}",
    "Payload.$": "$"
}

Here, Payload is the entire identity JSON object provided by InputPath.

ResultSelector

Once the Check Identity task is invoked, the Lambda function successfully validates the user’s identity and responds with an approval response:

{
  "ExecutedVersion": "$LATEST",
  "Payload": {
    "statusCode": "200",
    "body": "{\"approved\": true,\"message\": \"identity validation passed\"}"
  },
  "SdkHttpMetadata": {
    "HttpHeaders": {
      "Connection": "keep-alive",
      "Content-Length": "43",
      "Content-Type": "application/json",
      "Date": "Thu, 16 Apr 2020 17:58:15 GMT",
      "X-Amz-Executed-Version": "$LATEST",
      "x-amzn-Remapped-Content-Length": "0",
      "x-amzn-RequestId": "88fba57b-adbe-467f-abf4-daca36fc9028",
      "X-Amzn-Trace-Id": "root=1-5e989cb6-90039fd8971196666b022b62;sampled=0"
    },
    "HttpStatusCode": 200
  },
  "SdkResponseMetadata": {
    "RequestId": "88fba57b-adbe-467f-abf4-daca36fc9028"
  },
  "StatusCode": 200
}

The identity validation approval must be provided to the downstream states for additional processing. However, the downstream states only need the Payload.body from the preceding JSON.

You can use a combination of an intrinsic function and ResultSelector to choose attributes from the task’s output:

"ResultSelector": {
  "identity.$": "States.StringToJson($.Payload.body)"
}

ResultSelector takes the JSON string $.Payload.body and applies States.StringToJson to convert the string to JSON, storing the result in a new attribute named identity:

"identity": {
    "approved": true,
    "message": "identity validation passed"
}
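
In plain Python terms, States.StringToJson behaves like json.loads. The following is a sketch for illustration only, not how Step Functions evaluates intrinsic functions:

import json

# Illustrative equivalent of States.StringToJson($.Payload.body)
payload_body = '{"approved": true,"message": "identity validation passed"}'
identity = json.loads(payload_body)
print(identity["approved"])  # True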

When the Check Identity and Check Address states finish their work and exit, the step output from each state is captured as a JSON array. This JSON array is the step output of the parallel state. Reconcile the results from the JSON array using the ResultSelector that is available in the parallel state.

"ResultSelector": {
    "identityResult.$": "$[0].result.identity",
    "addressResult.$": "$[1].result.address"
}

ResultPath

After ResultSelector, where should the identity result go to in the initial payload? The downstream states need access to the actual input payload in addition to the results from the previous state. ResultPath provides the mechanism to extend the initial payload to add results from the previous state.

ResultPath: "$.result" informs the state machine that any result selected from the task output (the full output, if none is specified) should go under a result JSON attribute, and that result should be added to the incoming payload. The output from ResultPath looks like:

{
  "data": {
    "firstname": "Jane",
    "lastname": "Doe",
    "identity": {
      "email": "[email protected]",
      "ssn": "123-45-6789"
    },
    "address": {
      "street": "123 Main St",
      "city": "Columbus",
      "state": "OH",
      "zip": "43219"
    },
    "interests": [
      {"category":"home", "type":"own", "yearBuilt":2004},
      {"category":"auto", "type":"car", "yearBuilt":2012},
      {"category":"boat", "type":"snowmobile","yearBuilt":2020},
      {"category":"auto", "type":"motorcycle","yearBuilt":2018},
      {"category":"auto", "type":"RV", "yearBuilt":2015},
      {"category":"home", "type":"business", "yearBuilt":2009}
    ]
  },
  "result": {
    "identity": {
      "approved": true,
      "message": "identity validation passed"
    }
  }
}

The preceding JSON contains the result of the operation while keeping the incoming payload intact for business logic in downstream states.

This pattern ensures that the previous state keeps the payload hydrated for the next state. Use these combinations of paths across all states to make sure that each state has all the information needed.

As with the parallel state’s ResultSelector, an appropriate ResultPath is needed to hold both the results from Check Identity and Check Address, adding the following results JSON object to the payload:

"results": {
  "addressResult": {
    "approved": true,
    "message": "address validation passed"
  },
  "identityResult": {
    "approved": true,
    "message": "identity validation passed"
  }
}

With this approach for all of the downstream states, the input payload is still intact and the state machine has collected results from each state in results.

OutputPath

To return results from the state machine, ideally you do not send back the input payload to the caller of the Step Functions workflow. You can use OutputPath to select a portion of the state output as an end result. OutputPath determines what output to send to the next state.

In the sample application, the last states (Approved Message and Deny Message) define OutputPath as:

"OutputPath": "$.results"

The output from the state machine is:

{
  "addressResult": {
    "approved": true,
    "message": "address validation passed"
  },
  "identityResult": {
    "approved": true,
    "message": "identity validation passed"
  },
  "accountAddition": {
    "statusCode": 200
  },
  "homeInsuranceInterests": {
    "statusCode": 200
  },
  "sendApprovedNotification": {
    "statusCode": 200
  }
}

This response strategy is also effective when using a Synchronous Express Workflow for this business logic.

Advanced JSONPath

You can declaratively use advanced JSONPath expressions to apply logic without writing imperative code in utility functions.

Let’s focus on the interests that the new customer has asked for in the input payload. The Step Functions state machine has a state that focuses on interests in the “home” insurance category. Once the new account application is approved and added to the database successfully, the application captures home insurance interests. It adds home-related details to a HomeInterestsQueue SQS queue and transitions to the Approved Message state.

The interests JSON array has the information about insurance interests. An effective way to get home-related details is to filter the interests array on the category “home”. You can try this in the data flow simulator:

Data flow simulator

You can apply additional filter expressions to filter data according to your use case. To learn more, visit the data flow simulator blog.

Inside the state machine JSON, the Home Insurance Interests task has:

"InputPath": "$..interests[?(@.category==home)]"

It uses advanced JSONPath with $.. notation and [?(@.category==home)] filters.

Using advanced expressions on JSONPath is not limited to home insurance interests and can be extended to other categories and business logic.
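
To make the filter’s behavior concrete, here is a plain-Python equivalent of the expression, shown only to illustrate what it selects (Step Functions evaluates the JSONPath natively):

# Plain-Python equivalent of $..interests[?(@.category==home)]
payload = {
    "data": {
        "interests": [
            {"category": "home", "type": "own", "yearBuilt": 2004},
            {"category": "auto", "type": "car", "yearBuilt": 2012},
            {"category": "home", "type": "business", "yearBuilt": 2009}
        ]
    }
}

home_interests = [
    i for i in payload["data"]["interests"] if i["category"] == "home"
]
# [{'category': 'home', 'type': 'own', 'yearBuilt': 2004},
#  {'category': 'home', 'type': 'business', 'yearBuilt': 2009}]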

Cleanup

To delete the sample application, use the latest version of the AWS SAM CLI and run:

sam delete

Conclusion

This post uses a sample application to highlight effective use of JSONPath and data filtering strategies that can be used in Step Functions.

JSONPath provides the flexibility to work on JSON objects and arrays inside the Step Functions state machine while reducing the amount of utility code. It allows developers to build state machines by separating concerns for states’ input and output data. Advanced JSONPath expressions help you write declarative filtering logic without imperative utility code, optimizing both cost and performance.

For more serverless learning resources, visit Serverless Land.

Operating serverless at scale: Improving consistency – Part 2

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/operating-serverless-at-scale-improving-consistency-part-2/

This post is written by Jerome Van Der Linden, Solutions Architect.

Part one of this series describes how to maintain visibility on your AWS resources to operate them properly. This second part focuses on provisioning these resources.

It is relatively easy to create serverless resources. For example, you can set up and deploy a new AWS Lambda function in a few clicks. By using infrastructure as code, such as the AWS Serverless Application Model (AWS SAM), you can deploy hundreds of functions in a matter of minutes.

Before reaching these numbers, companies often want to standardize developments. Standardization is an effective way of reducing development time and costs, by using common tools, best practices, and patterns. It helps in meeting compliance objectives and lowering some risks, mainly around security and operations, by adopting enterprise standards. It can also reduce the scope of required skills, both in terms of development and operations. However, excessive standardization can reduce agility and innovation and sometimes even productivity.

You may want to provide application samples or archetypes, so that each team does not reinvent the wheel for every new project. These archetypes get a standard structure so that any developer joining the team is quickly up-to-speed. The archetypes also bundle everything that is required by your company:

  • Security library and setup to apply a common security layer to all your applications.
  • Observability framework and configuration, to collect and centralize logs, metrics and traces.
  • Analytics, to measure usage of the application.
  • Other capabilities or standard services you might have in your company.

This post describes different ways to create and share project archetypes.

Sharing AWS SAM templates

AWS SAM makes it easier to create and deploy serverless applications. It uses infrastructure as code (AWS SAM templates) and a command line tool (AWS SAM CLI). You can also create customizable templates that anyone can use to initialize a project. Using the sam init command and the --location option, you can bootstrap a serverless project based on the template available at this location.

For example, here is a CRUDL (Create/Read/Update/Delete/List) microservice archetype. It contains an Amazon DynamoDB table, an API Gateway, and five AWS Lambda functions written in Java (one for each operation):

CRUDL microservice

You must first create a template. This includes not only an AWS SAM template describing your resources, but also the source code, dependencies, and some config files. Using cookiecutter, you can parameterize this template. You add variables surrounded by {{ and }} in files and folders (for example, {{cookiecutter.my_variable}}). You also define a cookiecutter.json file that describes all the variables and their possible values.
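
For illustration, a minimal cookiecutter.json consistent with the prompts shown in the sam init example later in this section might look like the following (a sketch, not the repository’s actual file):

{
  "project_name": "Name of the project",
  "object_model": "Model to create / read / update / delete",
  "runtime": "java11"
}

Each key becomes a prompt when the template is initialized, and the value acts as the default shown to the developer.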

In this CRUD microservice, I add variables in the AWS SAM template file, the Java code, and in other project resources:

Variables in the project

You then need to share this template either in a Git/Mercurial repository or an HTTP/HTTPS endpoint. Here the template is available on GitHub.

After sharing, anyone within your company can use it and bootstrap a new project with the AWS SAM CLI and the init command:

$ sam init --location git+ssh://git@github.com/aws-samples/java-crud-microservice-template.git

project_name [Name of the project]: product-crud-microservice
object_model [Model to create / read / update / delete]: product
runtime [java11]:

The command prompts you to enter the variables and generates the following project structure and files.

sam init output files

Variable placeholders have been replaced with their values. The project can now be modified for your business requirements and deployed in the AWS Cloud.

There are alternatives to AWS SAM like AWS CloudFormation modules or AWS CDK constructs. These define high-level building blocks that can be shared across the company. Enterprises with multiple development teams and platform engineering teams can also use AWS Proton. This is a managed solution to create templates (see an example for serverless) and share them across the company.

Creating a base container image for Lambda functions

Defining complete application archetypes may be too much work for your company. Or you may want to let teams design their architecture and choose their programming languages and frameworks, while still applying a few patterns (centralized logging, standard security) plus some custom libraries. In that case, use Lambda layers if you deploy your functions as zip files, or provide a base image if you deploy them as container images.

When building a Lambda function as a container image, you must choose a base image. It contains the runtime interface client to manage the interaction between Lambda and your function code. AWS provides a set of open-source base images that you can use to create your container image.

Using Docker, you can also build your own base image on top of these, including any library, piece of code, configuration, and data that you want to apply to all Lambda functions. Developers can then use this base image containing standard components and benefit from them. This also reduces the risk of misconfiguration or forgetting to add something important.

First, create the base image. Using Docker and a Dockerfile, specify the files that you want to add to the image. The following example uses the Python base image and installs some libraries (security, logging) thanks to pip and the requirements.txt file. It also adds Python code and some config files:

Dockerfile and requirements.txt
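
As a hedged sketch of what such a Dockerfile might contain (the file and library names are illustrative, not the post’s actual sources):

# Build on the AWS-provided Python base image for Lambda
FROM public.ecr.aws/lambda/python:3.9

# Install the company's standard libraries (security, logging)
COPY requirements.txt /tmp/
RUN pip3 install -r /tmp/requirements.txt -t "${LAMBDA_TASK_ROOT}"

# Add shared Python code and configuration files
COPY company_logging.py company_security.py ${LAMBDA_TASK_ROOT}/
COPY config/ ${LAMBDA_TASK_ROOT}/config/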

It uses the /var/task folder, which is the working directory for Lambda functions, and where code should reside. Next you must create the image and push it to Amazon Elastic Container Registry (ECR):

# login to ECR with Docker
$ aws ecr get-login-password --region <region> | docker login --username AWS --password-stdin <account-number>.dkr.ecr.<region>.amazonaws.com

# if not already done, create a repository to store the image
$ aws ecr create-repository --repository-name <my-company-image-name> --image-tag-mutability IMMUTABLE --image-scanning-configuration scanOnPush=true

# build the image
$ docker build -t <my-company-image-name>:<version> .

# get the image hash (alphanumeric string, 12 chars, eg. "ece3a6b5894c")
$ docker images | grep my-company-image-name | awk '{print $3}'

# tag the image
$ docker tag <image-hash> <account-number>.dkr.ecr.<region>.amazonaws.com/<my-company-image-name>:<version>

# push the image to the registry
$ docker push <account-number>.dkr.ecr.<region>.amazonaws.com/<my-company-image-name>:<version>

The base image is now available to use for everyone with access to the ECR repository. To use this image, developers must use it in the FROM instruction in their Lambda Dockerfile. For example:

FROM <account-number>.dkr.ecr.<region>.amazonaws.com/<my-company-image-name>:<version>

COPY app.py ${LAMBDA_TASK_ROOT}

COPY requirements.txt .

RUN pip3 install -r requirements.txt -t "${LAMBDA_TASK_ROOT}"

CMD ["app.lambda_handler"]

Now all Lambda functions using this base image include your standard components and comply with your requirements. See this blog post for more details on building Lambda container-based functions with AWS SAM.

Conclusion

In this post, I show a number of solutions to create and share archetypes or layers across the company. With these archetypes, development teams can quickly bootstrap projects with company standards and best practices. It provides consistency across applications and helps meet compliance rules. From a developer standpoint, it’s a good accelerator and it also allows them to have some topics handled by the archetype.

In governance, companies generally want to enforce these rules. Part 3 will describe how to be more restrictive using different guardrails mechanisms.

Read more information about AWS Proton, which is now generally available.

For more serverless learning resources, visit Serverless Land.

Avoiding recursive invocation with Amazon S3 and AWS Lambda

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/avoiding-recursive-invocation-with-amazon-s3-and-aws-lambda/

Serverless applications are often composed of services sending events to each other. In one common architectural pattern, Amazon S3 sends events to AWS Lambda for processing. This can be used to build serverless microservices that translate documents, import data to Amazon DynamoDB, or process images after uploading.

To avoid recursive invocations between S3 and Lambda, it’s best practice to store the output of a process in a different resource from the source S3 bucket. However, it’s sometimes useful to store processed objects in the same source bucket. In this blog post, I show three different ways you can do this safely and provide other important tips if you use this approach.

The example applications use the AWS Serverless Application Model (AWS SAM), enabling you to deploy the applications more easily to your own AWS account. This walkthrough creates resources covered in the AWS Free Tier but usage beyond the Free Tier allowance may incur cost. To set up the examples, visit the GitHub repo and follow the instructions in the README.md file.

Overview

Infinite loops are not a new challenge for developers. Any programming language that supports looping logic has the capability to generate a program that never exits. However, in serverless applications, services can scale as traffic grows. This makes infinite loops more challenging since they can consume more resources.

In the case of the S3 to Lambda recursive invocation, a Lambda function writes an object to an S3 bucket. The resulting put event in turn invokes the same Lambda function. The invocation causes a second object to be written to the bucket, which invokes the same Lambda function, and so on:

S3 to Lambda recursion

If you trigger a recursive invocation loop accidentally, you can press the “Throttle” button in the Lambda console to scale the function concurrency down to zero and break the recursion cycle.

The most practical way to avoid this possibility is to use two S3 buckets. Writing the output object to a second bucket eliminates the risk of creating additional events from the source bucket. As shown in the first example in the repo, the two-bucket pattern should be the preferred architecture for most S3 object processing workloads:

Two S3 bucket solution

If you need to write the processed object back to the source bucket, here are three alternative architectures to reduce the risk of recursive invocation.

(1) Using a prefix or suffix in the S3 event notification

When configuring event notifications in the S3 bucket, you can additionally filter by object key, using a prefix or suffix. Using a prefix, you can filter for keys beginning with a string, or belonging to a folder, or both. Only those events matching the prefix or suffix trigger an event notification.

For example, a prefix of “my-project/images” filters for keys in the “my-project” folder beginning with the string “images”. Similarly, you can use a suffix to match on keys ending with a string, such as “.jpg” to match JPG images. Prefixes and suffixes do not support wildcards so the strings provided are literal.

The AWS SAM template in this example shows how to define a prefix and suffix in an S3 event notification. Here, S3 invokes the Lambda function if the key begins with ‘original/’ and ends with ‘.txt’:

  S3ProcessorFunction:
    Type: AWS::Serverless::Function 
    Properties:
      CodeUri: src/
      Handler: app.handler
      Runtime: nodejs14.x
      MemorySize: 128
      Policies:
        - S3CrudPolicy:
            BucketName: !Ref SourceBucketName
      Environment:
        Variables:
          DestinationBucketName: !Ref SourceBucketName              
      Events:
        FileUpload:
          Type: S3
          Properties:
            Bucket: !Ref SourceBucket
            Events: s3:ObjectCreated:*
            Filter: 
              S3Key:
                Rules:
                  - Name: prefix
                    Value: 'original/'                     
                  - Name: suffix
                    Value: '.txt'    

You can then write back to the same bucket providing that the output key does not match the prefix or suffix used in the event notification. In the example, the Lambda function writes the same data to the same bucket but the output key does not include the ‘original/’ prefix.
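
As a minimal sketch of this write-back logic (the repo’s function is written in Node.js; this Python version only illustrates the key handling):

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]  # e.g. "original/sample.txt"
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

        # Stripping the "original/" prefix means the new key no longer
        # matches the event notification filter, so no recursion occurs
        output_key = key[len("original/"):]
        s3.put_object(Bucket=bucket, Key=output_key, Body=body)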

To test this example with the AWS CLI, upload a sample text file to the S3 bucket:

aws s3 cp sample.txt s3://myS3bucketname

Shortly after, list the objects in the bucket. There is a second object with the same name but without the folder prefix. The first uploaded object invoked the Lambda function due to the matching prefix. The second PutObject action, without the prefix, did not trigger an event notification or invoke the function.

Using a prefix or suffix

Providing that your application logic can handle different prefixes and suffixes for source and output objects, this provides a way to use the same bucket for processed objects.

(2) Using object metadata to identify the original S3 object

If you need to ensure that the source object and processed object have the same key, configure user-defined metadata to differentiate between the two objects. When you upload S3 objects, you can set custom metadata values in the S3 console, AWS CLI, or AWS SDK.
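
For example, a sketch of setting this metadata with the AWS SDK for Python (the bucket and key names mirror the CLI examples below):

import boto3

s3 = boto3.client("s3")

# Upload sample.txt and mark it as the original object
with open("sample.txt", "rb") as f:
    s3.put_object(
        Bucket="myS3bucketname",
        Key="sample.txt",
        Body=f,
        Metadata={"original": "true"}
    )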

In this design, the Lambda function checks for the presence of the metadata before processing. The Lambda handler in this example shows how to use the AWS SDK’s headObject method in the S3 API:

const AWS = require('aws-sdk')
AWS.config.region = process.env.AWS_REGION 
const s3 = new AWS.S3()

exports.handler = async (event) => {
  await Promise.all(
    event.Records.map(async (record) => {
      try {
        // Decode URL-encoded key
        const Key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "))

        const data = await s3.headObject({
          Bucket: record.s3.bucket.name,
          Key
        }).promise()

        if (data.Metadata.original != 'true') {
          console.log('Exiting - this is not the original object.', data)
          return
        }

        // Do work ...

      } catch (err) {
        console.error(err)
      }
    })
  )
}

To test this example with the AWS CLI, upload a sample text file to the S3 bucket, setting the “original” metadata tag:

aws s3 cp sample.txt s3://myS3bucketname --metadata '{"original":"true"}'

Shortly after, list the objects in the bucket – the original object is overwritten during the Lambda invocation. The second S3 object causes another Lambda invocation but it exits due to the missing metadata.

Uploading objects with metadata

This allows you to use the same bucket and key name for processed objects, but it requires that the application creating the original object can set object metadata. In this approach, the Lambda function is always invoked twice for each uploaded S3 object.

(3) Using an Amazon DynamoDB table to filter duplicate events

If you need the output object to have the same bucket name and key but you cannot set user-defined metadata, use this design:

Using DynamoDB to filter duplicate events

In this example, there are two Lambda functions and a DynamoDB table. The first function writes the key name to the table. A DynamoDB stream triggers the second Lambda function which processes the original object. It writes the object back to the same source bucket. Because the same item is put to the DynamoDB table, this does not trigger a new DynamoDB stream event.

To test this example with the AWS CLI, upload a sample text file to the S3 bucket:

aws s3 cp sample.txt s3://myS3bucketname

Shortly after, list the objects in the bucket. The original object is overwritten during the Lambda invocation. The new S3 object invokes the first Lambda function again but the second function is not triggered. This solution allows you to use the same output key without user-defined metadata. However, it does introduce a DynamoDB table to the architecture.

To automatically manage the table’s content, the example in the repo uses DynamoDB’s Time to Live (TTL) feature. It defines a TimeToLiveSpecification in the AWS::DynamoDB::Table resource:

  ## DynamoDB table
  DDBtable:
    Type: AWS::DynamoDB::Table
    Properties:
      AttributeDefinitions:
      - AttributeName: ID
        AttributeType: S
      KeySchema:
      - AttributeName: ID
        KeyType: HASH
      TimeToLiveSpecification:
        AttributeName: TimeToLive
        Enabled: true        
      BillingMode: PAY_PER_REQUEST 
      StreamSpecification:
        StreamViewType: NEW_IMAGE   

When the first function writes the key name to the DynamoDB table, it also sets a TimeToLive attribute with a value of midnight on the next day:

        // Epoch timestamp set to next midnight
        const TimeToLive = new Date().setHours(24,0,0,0)

        // Create DynamoDB item
        const params = {
          TableName : process.env.DDBtable,
          Item: {
             ID: Key,
             TimeToLive
          }
        }

The DynamoDB service automatically expires items once the TimeToLive value has passed. In this example, if another object with the same key is stored in the S3 bucket before the TTL value, it does not trigger a stream event. This prevents the same object from being processed multiple times.

Comparing the three approaches

Depending upon the needs of your workload, you can choose one of these three approaches for storing processed objects in the same source S3 bucket:

 

                                 1. Prefix/suffix   2. User-defined metadata   3. DynamoDB table
Output uses the same bucket      Y                  Y                          Y
Output uses the same key         N                  Y                          Y
Requires user-defined metadata   N                  Y                          N
Lambda invocations per object    1                  2                          2 for an original object;
                                                                               1 for a processed object

Monitoring applications for recursive invocation

Whenever you have a Lambda function writing objects back to the same S3 bucket that triggered the event, it’s best practice to limit the scaling in the development and testing phases.

Use reserved concurrency to limit a function’s scaling, for example. Setting the function’s reserved concurrency to a lower limit prevents the function from scaling concurrently beyond that limit. It does not prevent the recursion, but limits the resources consumed as a safety mechanism.
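
For example, using the AWS CLI (the function name is hypothetical):

aws lambda put-function-concurrency \
  --function-name my-s3-processor \
  --reserved-concurrent-executions 10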

Additionally, you should monitor the Lambda function to make sure the logic works as expected. To do this, use Amazon CloudWatch monitoring and alarming. By setting an alarm on a function’s concurrency metric, you can receive alerts if the concurrency suddenly spikes and take appropriate action.
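
A hedged example of such an alarm, with names and threshold values that are illustrative only:

aws cloudwatch put-metric-alarm \
  --alarm-name my-s3-processor-concurrency-spike \
  --namespace AWS/Lambda \
  --metric-name ConcurrentExecutions \
  --dimensions Name=FunctionName,Value=my-s3-processor \
  --statistic Maximum \
  --period 60 \
  --evaluation-periods 1 \
  --threshold 10 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:ops-alerts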

Conclusion

The S3-to-Lambda integration is a foundational building block of many serverless applications. It’s best practice to store the output of the Lambda function in a different bucket or AWS resource than the source bucket.

In cases where you need to store the processed object in the same bucket, I show three different designs to help minimize the risk of recursive invocations. You can use event notification prefixes and suffixes or object metadata to ensure the Lambda function is not invoked repeatedly. Alternatively, you can also use DynamoDB in cases where the output object has the same key.

To learn more about best practices when using S3 to Lambda, see the Lambda Operator Guide. For more serverless learning resources, visit Serverless Land.

Using Okta as an identity provider with Amazon MWAA

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/using-okta-as-an-identity-provider-with-amazon-mwaa/

This post is written by Henry Robalino, Solutions Architect.

Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a fully managed service that allows data engineers and data scientists to run data processing workflows in the cloud. Okta is a third-party identity provider (IdP) that customers can use with AWS Single Sign-On (AWS SSO) so that their employees can log in quickly and securely.

This blog post shows how to integrate Okta with AWS SSO to access Amazon MWAA using single sign-on.

Overview

Customers use Amazon MWAA to run workflows at scale in the cloud. They want to use their existing login solutions and the investments the business has made in their current IdP, in this case Okta.

AWS SSO does not yet provide APIs to automate creation and configuration of custom SAML 2.0 applications. As a result, many of the steps in this blog are manual and require using the AWS Management Console.

Prerequisites

Deploying this solution requires:

Creating an Amazon MWAA application in AWS SSO

Create a custom SAML 2.0 application for Amazon MWAA

  1. Sign into the AWS Management Console, using an account with the appropriate permissions to modify AWS SSO.
  2. In the AWS SSO console, navigate to Applications. Select “Add a new application”.
    Add a new application
  3. On the Add New Application page, select “Add a custom SAML 2.0 application”:
    Add a custom SAML 2.0 application
  4. On the Configure Custom SAML 2.0 application:
    1. For Display name, enter AWS_SSO_Amazon_MWAA.
    2. For Description, enter AWS SSO Application for Amazon MWAA.
      Configuration
  5. In the Application metadata section, select the option to manually type in the metadata values.
    Before:
    Application metadata
    After:
    After entries
  6. Enter the Application properties and Application metadata sections:
    • Application start URL: This is the Amazon MWAA WebLogin URL, which you can locate in the Amazon MWAA console.
      • For example: https://123456a0-0101-2020-9e11-1b159eec9000.c2.us-east-1.airflow.amazonaws.com
    • Application ACS URL: This is the Assertion Consumer Service (ACS) URL that AWS SSO provides.
      • For example: https://us-east-1.signin.aws.amazon.com/platform/saml/acs/012345678-0102-0304-0506-EXAMPLE01
    • Application SAML audience: This is the SAML audience that AWS SSO provides.
      • For example: https://us-east-1.signin.aws.amazon.com/platform/saml/d-012345678E
  7. The Application properties and Application metadata now look like this:
    Resulting dialog
  8. Choose Save changes. A custom SAML 2.0 application for Amazon MWAA is created. You are now redirected to the AWS_SSO_Amazon_MWAA application page.
  9. On the Attribute mappings tab, modify the existing Subject attribute to “${user:subject}” with a Format of “unspecified”. Choose Save changes.
    Subject field
  10. On the Assigned users tab, add the previously created Amazon MWAA Okta user. Select Assign users and the user. Choose Save changes.
    Assign users

You have now created a custom application for Amazon MWAA in AWS SSO. You have added a user and configured the attribute mappings.

Configuring an Amazon MWAA Permission Sets in AWS SSO

Assign IAM permissions to the newly created Amazon MWAA application by using a permissions set. A permission set is a collection of administrator-defined policies that AWS SSO uses to determine a user’s effective permissions to access a given AWS account.

  1. Navigate to the AWS SSO console. Select AWS accounts on the left-hand side. Select the Permission sets tab and choose the Create permission set button.
    Create permission set
  2. Select the Create a custom permission set option.
    Create permission set workflow
  3. Provide a name for the Custom Permission Set and an optional description. Choose the Create a custom permissions policy check box.
    Workflow step 2
  4. In the new text field, add the IAM policy below. This set of permissions is associated with the AWS_SSO_Amazon_MWAA application. Make sure to use the correct Amazon Resource Names (ARNs) for your Amazon MWAA environment in the sample text below. Sample IAM policy:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "airflow:GetEnvironment",
                    "airflow:CreateCliToken"
                ],
                "Resource": "arn:aws:airflow:us-east-1:111222333444:environment/MY-MWAA-ENV"
            },
            {
                "Effect": "Allow",
                "Action": "airflow:CreateWebLoginToken",
                "Resource": "arn:aws:airflow:us-east-1:111222333444:role/MY-MWAA-ENV/viewer"
            }
        ]
    }
    

    The policy enables the following permissions:

    • GetEnvironment – retrieves the details of an Amazon MWAA environment
    • CreateCLIToken – creates a CLI token request for an MWAA environment.
    • CreateWebLoginToken – creates an Airflow web UI login token request for the Amazon MWAA environment.

  5. Follow the prompts to fill out tags as necessary. Choose Proceed to AWS accounts.
    Proceed to AWS accounts

You have now finished configuring the Amazon MWAA application inside of AWS SSO.
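
As an aside, the CreateWebLoginToken permission granted above is what enables the web login flow. A minimal sketch of requesting a token programmatically with the AWS SDK for Python (the environment name is hypothetical):

import boto3

mwaa = boto3.client("mwaa")

# Requires the airflow:CreateWebLoginToken permission from the policy above
resp = mwaa.create_web_login_token(Name="MY-MWAA-ENV")
print(resp["WebServerHostname"], resp["WebToken"])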

Testing and validation

To test and validate the configuration:

  1. Navigate to your Okta SSO portal. Sign in with the appropriate account that is assigned to the Amazon MWAA application.
    Single sign-on
  2. To access Amazon MWAA, select the AWS Account application. This opens the AWS Management Console in another window. Once this window opens, close it. As of this writing, Amazon MWAA does not support “Auth Mode: SSO”, hence this workaround.
  3. Next, select the AWS_SSO_Amazon_MWAA application. You are redirected to the Amazon MWAA SSO Page.
  4. Choose the Sign in with AWS Management Console SSO.
    Sign in to Airflow
  5. You are redirected to the Amazon MWAA web server UI.
    Amazon MWAA web server UI

In this page, you can see all the DAGs available to you and view the DAG history. In the top-right corner, you can see that you are logged in using the AWS SSO assumed role.

Conclusion

This blog post shows you how to integrate Amazon MWAA with Okta as your managed AWS SSO implementation. You can use this solution for your own use cases to enable Okta single sign-on with Amazon MWAA.

To stay up to date with AWS Identity launches, see: https://aws.amazon.com/blogs/security/highlights-from-the-latest-aws-identity-launches/.

For more serverless learning resources, visit Serverless Land.

Operating serverless at scale: Implementing governance – Part 1

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/operating-serverless-at-scale-implementing-governance-part-1/

This post is written by Jerome Van Der Linden, Solutions Architect.

With serverless services, infrastructure management tasks like capacity provisioning and patching are handled by AWS, so you can focus on writing code and deliver value to your customers. By reducing operational overhead, developers can iterate faster and release new features more often.

But with increased agility and productivity, you must also keep control. When scaling to thousands of AWS Lambda functions, hundreds of AWS Step Functions workflows, and millions of Amazon EventBridge events sent throughout the company, you must maintain visibility. What is provisioned, what is running, and how does everything work as a whole?

This three-part series covers important topics to help you maintain control over a growing set of serverless resources.

Maintaining visibility on resources and workloads

For governance, the first recommendation is to have clear visibility of your environment:

  • Visibility on resources: APIs, Lambda functions, state machines, event buses, queues, or topics. It is essential to have an up-to-date inventory of all your resources together with metadata such as the application it belongs to, the environment where it is deployed, and the owner. This is needed to track cost, manage compliance and evaluate risks.
  • Visibility on how these resources are linked together. They may be components of the same application. You must track who is calling who (in synchronous calls) or who is consuming what message or event from whom (in asynchronous calls). This dynamic view is as necessary as the inventory. It gives you important insight on the architecture and potential security and compliance issues.

This visibility into your AWS environment is essential to understand what you do with it and be able to operate it. It allows you to understand your workloads and make sure they follow your compliance rules. It allows you to track your usage of AWS and potentially optimize and reduce your costs. This is a best practice for building and growing on AWS.

Tagging your resources

For all resources on AWS, assign tags to your resources. A tag consists of a label (the key) and an optional value. It makes it easier to organize, search for, and filter resources by application, environment, or other criteria. Tags can serve different purposes: automation, access control, cost allocation, risk management. Above all, they provide an additional layer of information you can use to understand why each resource exists.

Assigning tags during provisioning is the preferred method. Using the AWS Serverless Application Model (AWS SAM), you can define tags. For example, for a Lambda function:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: function/
      Handler: app.lambda_handler
      Runtime: python3.8
      Tags:
        mycompany:environment: "dev"
        mycompany:application-id: "ecommerce"
        mycompany:service-id: "products"
        mycompany:business-owner: "[email protected]"

As the number of resources increases in a template, you can use the --tags option of the sam deploy command. This applies a set of tags to all compatible resources declared in the template:

sam deploy \
--tags mycompany:environment=dev \
mycompany:application-id=ecommerce \
mycompany:service-id=products \
mycompany:[email protected]

You can also add these tags in the samconfig.toml file so you don’t need to specify them each time on the command line:

tags = "mycompany:environment=dev mycompany:application-id=ecommerce mycompany:service-id=products mycompany:[email protected]"

Enforcing consistency in tags

To maintain organization, tags must be consistent. For example, do you use “environment”, “Environment” or “env”? And is the value “dev”, “Dev” or “Development”?

  1. Define which tags are necessary and what you need to identify: the owner, the application, the environment, the business line, etc. You can have up to 50 tags per resource but you should restrict yourself to a set of needed tags and iterate. Apply the YAGNI principle to minimize maintenance.
  2. Agree on the syntax. You may use spinal-case (lower case with hyphens to separate words) and a prefix to identify your company. For example, mycompany:application-id. Having a tag dictionary shared with developers and administrators may be useful. See this guide for best practices.
  3. Enforce the “rules” that are established:
    • At the Organization level, use Service Control Policies to block the creation of a resource if not correctly tagged. The following example denies the creation of Lambda functions when the environment tag is absent. You can find more examples here. Before applying, test policies in a sandbox:
      {
         "Sid": "DenyCreateLambdaWithNoEnvironmentTag",
         "Effect": "Deny",
         "Action": "lambda:CreateFunction",
         "Resource": [
            "arn:aws:lambda:*:*:function/*"
         ],
         "Condition": {
            "Null": {
                "aws:RequestTag/environment": "true"
             }
          }
      }
      
    • Use Config Rules in AWS Config to verify that resources have the appropriate tags (see this example for Lambda). You can also perform remediation if necessary.
    • Use Tag Policies to enforce consistency across your accounts and resources. Tag Policies verify the syntax and values set on your resources. They mark as non-compliant all the resources that do not match the policies. For example, the following policy defines the tag “environment“ and its possible values:
      {
          "tags": {
              "mycompany:environment": {
                  "tag_value": {
                      "@@assign": [
                          "dev",
                          "test",
                          "qa",
                          "prod"
                      ]
                  }
              }
          }
      }
      

  4. Monitor the percentage of resources untagged or badly tagged and try to improve. Also iterate on your tag dictionary as your requirements evolve by adding new tags or removing unused ones.
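
As a minimal sketch of automating this check, assuming the tag key from the examples above, the Resource Groups Tagging API can report Lambda functions that are missing a required tag:

import boto3

tagging = boto3.client("resourcegroupstaggingapi")

# Count Lambda functions missing the mycompany:environment tag
paginator = tagging.get_paginator("get_resources")
total = untagged = 0
for page in paginator.paginate(ResourceTypeFilters=["lambda:function"]):
    for resource in page["ResourceTagMappingList"]:
        total += 1
        keys = {tag["Key"] for tag in resource["Tags"]}
        if "mycompany:environment" not in keys:
            untagged += 1

print(f"{untagged}/{total} functions are missing the environment tag")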

Applying tags on serverless resources

For serverless resources, there are additional tags you may add:

  • For an API or a microservice, you may want to know if it is public (B2C), semi-public (B2B) or private (internal). This can help you adjust the RTO and the level of support. You can add a tag “exposition” with a value “public” or “private” to your API Gateway and Lambda functions.
  • For an API or a microservice again, you may want to know the route from where traffic is coming. For a Lambda function, use the tag “route” with a value like “POST /products”, “GET /products/_id_” (“{}” are not valid characters for tags).
  • For Lambda functions, add sources (triggers) and destinations within tags. For example, add the SNS topic name or SQS queue name to a “trigger” or “destination” tag. This helps document the dependencies between resources and have a better view of the architecture.

Adding more tags increases maintenance, and unmaintained tags can be misleading. Add tags if you really need them and if you can automate their maintenance. For example with AWS CloudFormation or AWS SAM, or using scripts or scheduled Lambda functions (using propagate-cfn-tags for example).

Grouping related resources

In addition to tags, use resource groups to better organize resources, by creating groups of related resources. For example, you can consolidate a set of Lambda functions and APIs for a microservice, or more globally a set of components related to the same application.

If you already created tagged resources, you can create a resource group based on these tags, using the following command. This example groups all supported resources that are tagged with the tag “mycompany:service-id” and value “products”:

aws resource-groups create-group \
--name products-service \
--resource-query '{"Type":"TAG_FILTERS_1_0","Query":"{\"ResourceTypeFilters\":[\"AWS::AllSupported\"],\"TagFilters\":[{\"Key\":\"mycompany:service-id\",\"Values\":[\"products\"]}]}"}'

If you use infrastructure as code (AWS CloudFormation or AWS SAM), which is the recommended approach, you can have a resource group created for a complete stack. This is convenient for serverless applications with a reasonable number of related resources:

Resources:
  ResourceGroup:
    Type: AWS::ResourceGroups::Group
    Properties:
      Name: products-service

With a group defined, visualize the list of its resources in the AWS Management Console or using the following CLI command:

aws resource-groups list-group-resources --group products-service

For more details on resource groups, read this blog post.

Getting a dynamic view of your resources

Tags and resource groups give you a static and declarative view of your resources and their relations. To know what is running, and how everything is linked, it is also important to have a dynamic view. It is the best representation of your environment but since it is often documentation-based and rarely automated, it quickly becomes inaccurate and outdated.

Using AWS X-Ray, you can trace requests made from one resource to another, and thus have a complete map of interactions between components of your application. Combine it with Amazon CloudWatch Logs and metrics and you have Amazon CloudWatch ServiceLens.

The primary use case for ServiceLens is to help you debug and troubleshoot performance issues. But you can also use the Service Map to gain visibility into your applications. What are the components that make up your application? What are the transactions and dependencies between those components?

AWS X-Ray ServiceLens

To obtain such a map, you must enable X-Ray tracing in your application for all supported resources. This can be done in the AWS SAM template by enabling “tracing”:

Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: function/
      Handler: app.lambda_handler
      Runtime: python3.8
      Tracing: Active
      
  MyApi:
    Type: AWS::Serverless::Api
    Properties:
      DefinitionBody:
        Fn::Transform:
          Name: "AWS::Include"
          Parameters:
            Location: "resources/openapi.yaml"
      EndpointConfiguration: REGIONAL
      StageName: prod
      TracingEnabled: true
      
  MyStateMachine:
    Type: AWS::Serverless::StateMachine
    Properties:
      DefinitionUri: statemachine/my_state_machine.asl.json
      Role: arn:aws:iam::123456123456:role/service-role/my-sample-role
      Tracing:
        Enabled: True

You may also need to instrument your Lambda function code using the X-Ray SDK, to retrieve and propagate traces when using services like Amazon SNS, Amazon SQS or Amazon EventBridge.

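For Node.js, a minimal sketch looks like the following: wrapping the AWS SDK with the X-Ray SDK makes downstream calls, such as SQS, appear as subsegments in the trace. The queue URL is an assumed environment variable:

const AWSXRay = require('aws-xray-sdk-core')
// Wrap the AWS SDK so that every service call is traced as a subsegment
const AWS = AWSXRay.captureAWS(require('aws-sdk'))
const sqs = new AWS.SQS()

exports.handler = async (event) => {
  // This call now appears in the X-Ray service map and in ServiceLens
  await sqs.sendMessage({
    QueueUrl: process.env.QUEUE_URL,
    MessageBody: JSON.stringify(event)
  }).promise()
}
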
When this is done, open the CloudWatch ServiceLens console to get the dynamic view of all your components. See their respective size (based on the number of requests they handle), their relations, and the services they use. As it is based on the real execution of your application, it is always up to date.

Conclusion

Having visibility on your AWS resources is the key to operating and growing successfully. In this first part of this series on serverless governance, I describe how you can get this visibility by using tags to organize and group your resources, and ease the search and management of related resources. I also describe how AWS X-Ray, combined with CloudWatch ServiceLens, can provide a dynamic view of workloads and help you understand how serverless resources are acting together.

The second part will focus on provisioning and how to standardize deployments to improve consistency and compliance.

Read this whitepaper on tagging best practices to get more details on tags.

For more serverless learning resources, visit Serverless Land.

Simplifying B2B integrations with AWS Step Functions Workflow Studio

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/simplifying-b2b-integrations-with-aws-step-functions-workflow-studio/

This post is written by Patrick Guha, Associate Solutions Architect, WWPS, Hamdi Hmimy, Associate Solutions Architect, WWPS, and Madhu Bussa, Senior Solutions Architect, WWPS.

B2B integration helps organizations with disparate technologies communicate and exchange business-critical information. However, organizations typically have few options in building their B2B pipeline infrastructure. Often, they use an out-of-the-box SaaS integration platform that may not meet all of their technical requirements. Other times, they code an entire pipeline from scratch.

AWS Step Functions offers a solution to this challenge. It combines the code customizability of AWS Lambda with the low-code, visual building experience of Workflow Studio.

This post introduces a customizable serverless architecture for B2B integrations. It explains how the solution works and how it can save you time in building B2B integrations.

Overview

The following diagram illustrates components and interactions of the example application architecture:

Example architecture

This section describes building a configurable B2B integration pipeline with AWS serverless technologies. It walks through the components and discusses the flow of transactions from trading partners to a consolidated database.

Communication

The B2B integration pipeline accepts transactions both in batch (a single file with many transactions) and in real-time (a single transaction) modes. Batch communication is over open standard SFTP protocol, and real-time communication is over the REST/HTTPS protocol.

For batch communication needs, AWS Transfer Family provides managed support for file transfers directly into and out of Amazon Simple Storage Service (S3) or Amazon Elastic File System (EFS). EFS provides a serverless, elastic file system that lets you share file data without provisioning or managing storage.

Amazon EventBridge provides serverless event bus functionality to automate specific actions in the B2B pipeline. In this case, batch transaction uploads from a partner trigger the B2B pipeline. When a file is put in S3 from the Transfer SFTP server, EventBridge captures the event and routes it to a target Lambda function.

As batch transactions are saved via AWS Transfer SFTP to Amazon S3, AWS CloudTrail captures the events. It provides the underlying API requests as files are PUT into S3, which triggers the EventBridge rule created previously.

For real-time communication needs, Amazon API Gateway provides an API management layer, allowing you to manage the lifecycle of APIs. Trading partners can send their transactions to this API over the ubiquitous REST API protocol.

Processing

Amazon Simple Queue Service (SQS) is a fully managed queuing service that allows you to decouple applications. In this solution, SQS manages and stores messages to be processed individually.

Lambda is a fully managed serverless compute service that allows you to create business logic functions as needed. In this example, Lambda functions process the data from the pipeline to clean, format, and upload the transactions from SQS.

Step Functions manages the workflow of a B2B transaction. Step Functions is a low-code visual workflow service used to orchestrate AWS services, automate business processes, and build serverless applications. Workflows manage failures, retries, parallelization, service integrations, and observability so developers can focus on higher-value business logic.

API Gateway is used in the processing pipeline of the solution to enrich the transactions coming through the pipeline.

Amazon DynamoDB serves as the database for the solution. DynamoDB is a key-value and document database that can scale to virtually any number of requests. As the pipeline experiences a wide range of transaction throughputs, DynamoDB is a good solution to store processed transactions as they arrive.

Batch transaction flow

  1. A trading partner logs in to AWS Transfer SFTP and uploads a batch transaction file.
  2. An S3 bucket is populated with the batch transaction file.
  3. A CloudTrail data event captures the batch transaction file being PUT into S3.
  4. An EventBridge rule is triggered from the CloudTrail data event.
  5. Lambda is triggered from the EventBridge rule. It processes each message from the batch transaction file and sends individual messages to SQS (a sketch of this function follows this list).
  6. SQS stores each message from the file as it is passed through from Lambda.
  7. Lambda is triggered from each SQS incoming message, then invokes Step Functions to run through the following steps for each transaction.
  8. Lambda accepts and formats the transaction payload.
  9. API Gateway enriches the transaction.
  10. DynamoDB stores the transaction.

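The following sketch illustrates step 5. It is not the code from the sample repository: the queue URL environment variable and the regex-based splitting are simplifying assumptions (production code should use a real XML parser). The bucket and key follow the shape of CloudTrail data events delivered through EventBridge:

const AWS = require('aws-sdk')
const s3 = new AWS.S3()
const sqs = new AWS.SQS()

exports.handler = async (event) => {
  // CloudTrail data events delivered via EventBridge carry the bucket
  // and key in detail.requestParameters
  const { bucketName, key } = event.detail.requestParameters
  const object = await s3.getObject({ Bucket: bucketName, Key: key }).promise()

  // Naive split for this sketch: matches <Transaction ...>...</Transaction>
  // elements only (the root <Transactions> element has no trailing space)
  const transactions = object.Body.toString().match(/<Transaction [\s\S]*?<\/Transaction>/g) || []

  for (const transaction of transactions) {
    await sqs.sendMessage({
      QueueUrl: process.env.QUEUE_URL, // assumed environment variable
      MessageBody: transaction
    }).promise()
  }
}
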
Single/real-time transaction flow

  1. A trading partner uploads a single transaction via an API Gateway REST API.
  2. API Gateway sends the single transaction to the Lambda SQS writer function via proxy integration.
  3. SQS stores each message from the API POSTs.
  4. Lambda is triggered from each SQS incoming message. It invokes Step Functions to run through the workflow for each transaction.
  5. Lambda accepts and formats the transaction payload.
  6. API Gateway enriches the transaction.
  7. DynamoDB stores the transaction.

Exploring and testing the architecture

To explore and test the architecture, there is an AWS Serverless Application Model (AWS SAM) template. The AWS SAM template creates an AWS CloudFormation stack for you. This can help you save time building your own B2B pipeline, as you can deploy and customize the example application.

To deploy in your own AWS account:

  1. To install AWS SAM, visit the installation page.
  2. To deploy the AWS SAM template, navigate to the directory where the template is located. Run the following bash commands in the terminal:
    git clone https://github.com/aws-samples/simplified-serverless-b2b-application
    cd simplified-serverless-b2b-application
    sam build
    sam deploy --guided --capabilities CAPABILITY_NAMED_IAM

Prerequisites

  1. Create an SSH key pair. To authenticate to the AWS Transfer SFTP server and upload files, you must generate an SSH key pair. Once created, you must copy the contents of the public key to the SshPublicKeyParameter displayed after running the sam deploy command. Follow the instructions to create an SSH key pair for Transfer.
  2. Copy batch and real-time input. The following XML content contains multiple example transactions to be processed in the batch workflow. Create an XML file on your machine with the following content:
    <?xml version="1.0" encoding="UTF-8"?>
    <Transactions>
      <Transaction TransactionID="1">
        <Notes>Transaction made between user 57 and user 732.</Notes>
      </Transaction>
      <Transaction TransactionID="2">
        <Notes>Transaction made between user 9824 and user 2739.</Notes>
      </Transaction>
      <Transaction TransactionID="3">
        <Notes>Transaction made between user 126 and user 543.</Notes>
      </Transaction>
      <Transaction TransactionID="4">
        <Notes>Transaction made between user 5785 and user 839.</Notes>
      </Transaction>
      <Transaction TransactionID="5">
        <Notes>Transaction made between user 83782 and user 547.</Notes>
      </Transaction>
      <Transaction TransactionID="6">
        <Notes>Transaction made between user 64783 and user 1638.</Notes>
      </Transaction>
      <Transaction TransactionID="7">
        <Notes>Transaction made between user 785 and user 7493.</Notes>
      </Transaction>
      <Transaction TransactionID="8">
        <Notes>Transaction made between user 5473 and user 3829.</Notes>
      </Transaction>
      <Transaction TransactionID="9">
        <Notes>Transaction made between user 3474 and user 9372.</Notes>
      </Transaction>
      <Transaction TransactionID="10">
        <Notes>Transaction made between user 1537 and user 9473.</Notes>
      </Transaction>
      <Transaction TransactionID="11">
        <Notes>Transaction made between user 2837 and user 7383.</Notes>
      </Transaction>
    </Transactions>
    

    Similarly, the following content contains a single transaction to be processed in the real-time workflow.

    transactionId=12&transactionMessage= Transaction made between user 687 and user 329.

  3. Download Cyberduck, an SFTP client, to upload the batch transaction file to the B2B pipeline.

Uploading the XML file to Transfer and POST to API Gateway

Use Cyberduck to upload the batch transaction file to the B2B pipeline. Follow the instructions here to upload the preceding transactions XML file. You can find the Transfer server endpoint in both the Transfer console and the Outputs section of the AWS SAM template.

Use the API Gateway console to test the POST method for the single transaction workflow. Navigate to API Gateway in the AWS Management Console and choose the REST API created by the AWS SAM template called SingleTransactionAPI.

In the Resources pane, view the methods. Choose the POST method. Next, choose Test in the Client section on the left.

API Gateway console

Copy the single real-time transaction into the Query Strings text box then choose Test. This sends the single transaction and starts the real-time workflow of the B2B pipeline.

Testing in the console

Viewing Step Functions executions

Navigate to the Step Functions console. Choose the state machine created by the AWS SAM template called StepFunctionsStateMachine.

In the Executions tab, you see a number of successful executions. Each transaction represents a Step Functions state machine execution. This means that every time a transaction is submitted by a trading partner to SQS, it is individually processed as a unique Step Functions state machine execution. This granularity is useful for tracking transactions.

Step Functions console

Viewing Workflow Studio

Next, view the Step Functions state machine definition. On the StepFunctionsStateMachine page, choose Edit. You see a code and visual representation of your workflow.

The code version uses Amazon States Language, allowing you to modify the state machine as needed. Choose the Workflow Studio button to get a visual representation of the services and integrations in the workflow.

ASL in Workflow Studio

Workflow Studio helps you save time while building a B2B pipeline. There are over 40 different actions you can take on various AWS services, plus flow states that can provide additional logic to the workflow.

Step Functions Workflow Studio

One of the largest benefits of Workflow Studio is the time savings possible through built-in integrations to AWS services. This architecture includes two integrations: API Gateway request and DynamoDB PutItem.

Choose the API Gateway request state in the diagram. To make a request to the API Gateway REST API, you update the API Parameters section in Configuration. Previously, you may have used a Lambda function to perform this action, adding extra code to maintain a B2B pipeline.

Updating parameters

The same is true for DynamoDB. Choose the DynamoDB PutItem state to view more. The configuration to put the item is made in the API Parameters section. To connect to any other AWS services via Actions, add AWS Identity and Access Management (IAM) permissions for Step Functions to access them. These examples include the necessary IAM permissions for Step Functions to call both API Gateway and DynamoDB in the AWS SAM template.

PutItem workflow

Cleaning up

To avoid ongoing charges to your AWS account, remove the created resources:

  • Use the CloudFormation console to delete the stack created as part of this demo. Choose the stack, choose Delete, and then choose Delete stack.
  • Navigate to the S3 console, then empty and delete both S3 buckets created by the stack: [stackName]-cloudtrails3bucket-[uniqueId] and [stackName]-sftpservers3bucket-[uniqueId]
  • Navigate to the CloudWatch console and delete the following log groups created from the stack: /aws/lambda/IntakeSingleTransactionLambdaFunction, /aws/lambda/SingleQueueUploadLambda, and /aws/lambda/TriggerStepFunctionsLambdaFunction.

Conclusion

This post shows an architecture to share your business data with your trading partners using API Gateway, AWS Transfer for SFTP, Lambda, and Step Functions. This architecture enables organizations to quickly on-board partners, build event-driven pipelines, and streamline business processes.

To learn more about B2B pipelines on AWS, read: https://aws.amazon.com/blogs/compute/integrating-b2b-using-event-notifications-with-amazon-sns/.

For more serverless learning resources, visit Serverless Land.

ICYMI: Serverless Q3 2021

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/icymi-serverless-q3-2021/

Welcome to the 15th edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all of the most recent product launches, feature enhancements, blog posts, webinars, Twitch live streams, and other interesting things that you might have missed!

Q3 calendar

In case you missed our last ICYMI, check out what happened last quarter here.

AWS Lambda

You can now choose next-generation AWS Graviton2 processors in your Lambda functions. This Arm-based processor architecture can provide up to 19% better performance at 20% lower cost. You can configure functions to use Graviton2 in the AWS Management Console, API, CloudFormation, and CDK. We recommend using the AWS Lambda Power Tuning tool to see how your functions compare and determine the price improvement you may see.

All Lambda runtimes built on Amazon Linux 2 support Graviton2, with the exception of versions approaching end-of-support. The AWS Free Tier for Lambda includes functions powered by both x86 and Arm-based architectures.

Create Lambda function with new arm64 option

You can also use the Python 3.9 runtime to develop Lambda functions. You can choose this runtime version in the AWS Management Console, AWS CLI, or AWS Serverless Application Model (AWS SAM). Version 3.9 includes a range of new features and performance improvements.

Lambda now supports Amazon MQ for RabbitMQ as an event source. This makes it easier to develop serverless applications that are triggered by messages in a RabbitMQ queue. This integration does not require a consumer application to monitor queues for updates. The connectivity with the Amazon MQ message broker is managed by the Lambda service.

Lambda has added support for up to 10 GB of memory and 6 vCPU cores in AWS GovCloud (US) Regions and in the Middle East (Bahrain), Asia Pacific (Osaka), and Asia Pacific (Hong Kong) Regions.

AWS Step Functions

Step Functions now integrates with the AWS SDK, supporting over 200 AWS services and 9,000 API actions. You can call services directly from the Amazon States Language definition in the resource field of the task state. This allows you to work with services like DynamoDB, AWS Glue Jobs, or Amazon Textract directly from a Step Functions state machine. To learn more, see the SDK integration tutorial.

AWS Amplify

The Amplify Admin UI now supports importing existing Amazon Cognito user pools and identity pools. This allows you to configure multi-platform apps to use the same user pools with different client IDs.

Amplify CLI now enables command hooks, allowing you to run custom scripts in the lifecycle of CLI commands. You can create bash scripts that run before, during, or after CLI commands. Amplify CLI has also added support for storing environment variables and secrets used by Lambda functions.

Amplify Geo is in developer preview and helps developers provide location-aware features to their frontend web and mobile applications. This uses the Amazon Location Service to provide map UI components.

Amazon EventBridge

The EventBridge schema registry now supports discovery of cross-account events. When schema discovery is enabled on a bus, it now generates schemas for events originating from another account. This helps organize and find events in multi-account applications.

Amazon DynamoDB

DynamoDB console

The new DynamoDB console experience is now the default for viewing and managing DynamoDB tables. This makes it easier to manage tables from the navigation pane and also provides a new dedicated Items page. There is also contextual guidance and step-by-step assistance to help you perform common tasks more quickly.

API Gateway

API Gateway can now authenticate clients using certificate-based mutual TLS. Previously, this feature only supported AWS Certificate Manager (ACM). Now, customers can use a server certificate issued by a third-party certificate authority or ACM Private CA. Read more about using mutual TLS authentication with API Gateway.

The Serverless Developer Advocacy team built the Amazon API Gateway CORS Configurator to help you configure cross-origin resource sharing (CORS) for REST and HTTP APIs. Fill in the information specific to your API, and the AWS SAM configuration is generated for you.

Serverless blog posts

July

August

September

Tech Talks & Events

We hold AWS Online Tech Talks covering serverless topics throughout the year. These are listed in the Serverless section of the AWS Online Tech Talks page. We also regularly deliver talks at conferences and events around the world, speak on podcasts, and record videos that you can use to learn in bite-sized chunks.

Here are some from Q3:

Videos

Serverless Land

Serverless Office Hours – Tues 10 AM PT

Weekly live virtual office hours. In each session, we talk about a specific topic or technology related to serverless, then open it up to help you with your real serverless challenges and issues. Ask us anything you want about serverless technologies and applications.

July

August

September

DynamoDB Office Hours

Are you an Amazon DynamoDB customer with a technical question you need answered? If so, join us for weekly Office Hours on the AWS Twitch channel led by Rick Houlihan, AWS principal technologist and Amazon DynamoDB expert. See upcoming and previous shows.

Still looking for more?

The Serverless landing page has more information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials.

You can also follow the Serverless Developer Advocacy team on Twitter to see the latest news, follow conversations, and interact with the team.

Building an API poller with AWS Step Functions and AWS Lambda

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/building-an-api-poller-with-aws-step-functions-and-aws-lambda/

This post is written by Siarhei Kazhura, Solutions Architect.

Many customers have to integrate with external APIs. One of the most common use cases is data synchronization between a customer and their trusted partner.

There are multiple ways of doing this. For example, the customer can provide a webhook that the partner can call to notify the customer of any data changes. Often the customer has to poll the partner API to stay up to date with the changes. Even when using a webhook, a complete synchronization on a schedule is usually still necessary.

Furthermore, the partner API may not allow loading all the data at once. Often, a pagination solution allows loading only a portion of the data via one API call. That requires the customer to build an API poller that can iterate through all the data pages to fully synchronize.

This post demonstrates a sample API poller architecture, using AWS Step Functions for orchestration, AWS Lambda for business logic processing, along with Amazon API Gateway, Amazon DynamoDB, Amazon SQS, Amazon EventBridge, Amazon Simple Storage Service (Amazon S3), and the AWS Serverless Application Model (AWS SAM).

Overall architecture

Reference architecture

The application consists of the following resources defined in the AWS SAM template:

  • PollerHttpAPI: The front door of the application represented via an API Gateway HTTP API.
  • PollerTasksEventBus: An EventBridge event bus that is directly integrated with API Gateway. That means that an API call results in an event being created in the event bus. EventBridge allows you to route the event to the destination you want. It also allows you to archive and replay events as needed, adding resiliency to the architecture. Each event has a unique id that this solution uses for tracing purposes.
  • PollerWorkflow: The Step Functions workflow.
  • ExternalHttpApi: The API Gateway HTTP API that is used to simulate an external API.
  • PayloadGenerator: A Lambda function that is generating a sample payload for the application.
  • RawPayloadBucket: An Amazon S3 bucket that stores the payload received from the external API. Step Functions supports payloads of up to 256 KB, so larger API payloads are stored in an S3 bucket instead.
  • PollerTasksTable: A DynamoDB table that tracks each poller’s progress. The table has a TimeToLive (TTL) attribute enabled. This automatically discards tasks that exceed the TTL value (see the sketch after this list).
  • ProcessPayloadQueue: An Amazon SQS queue that decouples the payload fetching mechanism from the payload processing mechanism.
  • ProcessPayloadDLQ: An Amazon SQS dead-letter queue that collects the messages that cannot be processed.
  • ProcessPayload: A Lambda function that processes the payload. The function reports the progress of each poller task, marking it as complete when a given payload is processed successfully.

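As a small sketch of the TTL mechanism mentioned above, a poller task item can be written with an epoch-seconds expiry attribute that DynamoDB uses to discard the item automatically. The table, key, and attribute names here are assumptions, not the sample application’s code:

const AWS = require('aws-sdk')
const dynamodb = new AWS.DynamoDB.DocumentClient()

const createTask = async (jobId, taskId) => {
  await dynamodb.put({
    TableName: process.env.POLLER_TASKS_TABLE, // assumed table name
    Item: {
      jobId,
      taskId,
      taskStatus: 'Started',
      // DynamoDB deletes the item automatically once this timestamp passes
      expiresAt: Math.floor(Date.now() / 1000) + 24 * 60 * 60
    }
  }).promise()
}
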
Data flow

Data flow

When the API poller runs:

  1. After a POST call is made to PollerHttpAPI /jobs endpoint, an event containing the API payload is put on the PollerTasksEventBus.
  2. The event triggers the PollerWorkflow execution. The event payload (including the event unique id) is passed to the PollerWorkflow.
  3. The PollerWorkflow starts by running the PreparePollerJob function. The function retrieves required metadata from the ExternalHttpAPI, for example, the total number of records to be loaded and the maximum number of records that can be retrieved via a single API call. The function creates the poller tasks that are required to fetch the data. The task calculation is based on the metadata received.
  4. The PayloadGenerator function generates random ExternalHttpAPI payloads. The PayloadGenerator function also includes code that simulates random errors and delays.
  5. All the tasks are processed in a fan-out fashion using dynamic parallelism. The FetchPayload function retrieves a payload chunk from the ExternalHttpAPI, and the payload is saved to the RawPayloadBucket.
  6. A message, containing a pointer to the payload file stored in the RawPayloadBucket, the id of the task, and other task information is sent to the ProcessPayloadQueue. Each message has jobId and taskId attributes. This helps correlate the message with the poller task.
  7. Anytime a task changes its status (for example, when the payload is saved to the S3 bucket, or when a message has been sent to the SQS queue), the progress is reported to the PollerTasksTable.
  8. The ProcessPayload function long-polls the ProcessPayloadQueue. As messages appear on the queue, they are processed (see the sketch after this list).
  9. The ProcessPayload function removes an object from the RawPayloadBucket. This illustrates one type of processing that you can do with the payload stored in the S3 bucket.
  10. After the payload is removed successfully, the progress is reported to the PollerTasksTable and the corresponding task is marked as complete.
  11. If the ProcessPayload function experiences errors, it tries to process the message several times. If it cannot process the message, the message is pushed to the ProcessPayloadDLQ. This is configured as a dead-letter queue for the ProcessPayloadQueue.

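The following sketch, with assumed names, illustrates steps 8 to 10: the function reads each SQS record, removes the payload object from the bucket, and marks the task as complete in DynamoDB. It is a simplified illustration, not the repository code:

const AWS = require('aws-sdk')
const s3 = new AWS.S3()
const dynamodb = new AWS.DynamoDB.DocumentClient()

exports.handler = async (event) => {
  for (const record of event.Records) {
    // Each message carries a pointer to the payload object plus task identifiers
    const { bucket, key, jobId, taskId } = JSON.parse(record.body)

    // Illustrative processing: remove the payload object from the bucket
    await s3.deleteObject({ Bucket: bucket, Key: key }).promise()

    // Report progress: mark the task as complete
    await dynamodb.update({
      TableName: process.env.POLLER_TASKS_TABLE, // assumed table name
      Key: { jobId, taskId },
      UpdateExpression: 'SET taskStatus = :s',
      ExpressionAttributeValues: { ':s': 'SuccessfullyCompleted' }
    }).promise()
  }
}
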
Step Functions state machine

State machine

The Step Functions state machine orchestrates the following workflow:

  1. Fetch external API metadata and create tasks required to fetch all payload.
  2. For each task, report that the task has entered Started state.
  3. Since the task is in the Started state, the next action is FetchPayload.
  4. Fetch payload from the external API and store it in an S3 bucket.
  5. In case of success, move the task to a PayloadSaved state.
  6. In case of an error, report that the task is in a failed state.
  7. Report that the task has entered PayloadSaved (or failed) state.
  8. In case the task is in the PayloadSaved state, move to the SendToSQS step. If the task is in a failed state, exit.
  9. Send the S3 object pointer and additional task metadata to the SQS queue.
  10. Move the task to an enqueued state.
  11. Report that the task has entered enqueued state.
  12. Since the task is in the enqueued state, we are done.
  13. Combine the results for a single task execution.
  14. Combine the results for all the task executions.

Prerequisites to implement the solution

The following prerequisites are required for this walk-through:

Step-by-step instructions

You can use AWS Cloud9, or your preferred IDE, to deploy the AWS SAM template. Refer to the cleanup section of this post for instructions to delete the resources to stop incurring any further charges.

  1. Clone the repository by running the following command:
    git clone https://github.com/aws-samples/sam-api-poller.git
  2. Change to the sam-api-poller directory, install dependencies and build the application:
    npm install
    sam build -c -p
  3. Package and deploy the application to the AWS Cloud, following the series of prompts. Name the stack sam-api-poller:
    sam deploy --guided --capabilities CAPABILITY_NAMED_IAM
  4. After stack creation, you see ExternalHttpApiUrl, PollerHttpApiUrl, StateMachineName, and RawPayloadBucket in the outputs section.
    CloudFormation outputs
  5. Store API URLs as variables:
    POLLER_HTTP_API_URL=$(aws cloudformation describe-stacks --stack-name sam-api-poller --query "Stacks[0].Outputs[?OutputKey=='PollerHttpApiUrl'].OutputValue" --output text)
    
    EXTERNAL_HTTP_API_URL=$(aws cloudformation describe-stacks --stack-name sam-api-poller --query "Stacks[0].Outputs[?OutputKey=='ExternalHttpApiUrl'].OutputValue" --output text)
  6. Make an API call:
    REQUEST_PAYLOAD=$(printf '{"url":"%s/payload"}' $EXTERNAL_HTTP_API_URL)
    EVENT_ID=$(curl -d $REQUEST_PAYLOAD -H "Content-Type: application/json" -X POST $POLLER_HTTP_API_URL/jobs | jq -r '.Entries[0].EventId')
    
  7. The EventId that is returned by the API is stored in a variable. You can trace all the poller tasks related to this execution via the EventId. Run the following command to track task progress:
    curl -H "Content-Type: application/json" $POLLER_HTTP_API_URL/jobs/$EVENT_ID
  8. Inspect the output. For example:
    {"Started":9,"PayloadSaved":15,"Enqueued":11,"SuccessfullyCompleted":0,"FailedToComplete":0,"Total":35}%
  9. Navigate to the Step Functions console and choose the state machine name that corresponds to the StateMachineName from step 4. Choose an execution and inspect the visual flow.
    Workflow
  10. Inspect each individual step by clicking on it. For example, for the PollerJobComplete step, you see:
    Step output

Cleanup

  1. Make sure that the RawPayloadBucket bucket is empty. If the bucket contains files, follow the emptying a bucket guide.
  2. To delete all the resources permanently and stop incurring costs, navigate to the CloudFormation console. Select the sam-api-poller stack, then choose Delete -> Delete stack.

Cost optimization

For Step Functions, this example uses the Standard Workflow type because it has a visualization tool. If you are planning to reuse the solution, consider switching from Standard to Express Workflows. This may be a better option for the type of workload in this example.

Conclusion

This post shows how to use Step Functions, Lambda, EventBridge, S3, API Gateway HTTP APIs, and SQS to build a serverless API poller. I show how you can deploy a sample solution, process sample payload, and store it to S3.

I also show how to perform clean-up to avoid any additional charges. You can modify this example for your needs and build a custom solution for your use case.

For more serverless learning resources, visit Serverless Land.

Creating a serverless face blurring service for photos in Amazon S3

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/creating-a-serverless-face-blurring-service-for-photos-in-amazon-s3/

Many workloads process photos or imagery from web applications or mobile applications. For privacy reasons, it can be useful to identify and blur faces in these photos. This blog post shows how to build a serverless face blurring service for photos uploaded to an Amazon S3 bucket.

The example application uses the AWS Serverless Application Model (AWS SAM), enabling you to deploy the application more easily in your own AWS account. This walkthrough creates resources covered in the AWS Free Tier but usage beyond the Free Tier allowance may incur cost. To set up the example, visit the GitHub repo and follow the instructions in the README.md file.

Overview

Using a serverless approach, this face blurring microservice runs on demand in response to new photos being uploaded to S3. The solution uses the following architecture:

Reference architecture

  1. When an image is uploaded to the source S3 bucket, S3 sends a notification event to an Amazon SQS queue.
  2. The Lambda service polls the SQS queue and invokes an AWS Lambda function when messages are available.
  3. The Lambda function uses Amazon Rekognition to detect faces in the source image. The service returns the coordinates of faces to the function.
  4. After blurring the faces in the source image, the function stores the resulting image in the output S3 bucket.

Deploying the solution

Before deploying the solution, you need:

To deploy:

  1. From a terminal window, clone the GitHub repo:
    git clone https://github.com/aws-samples/serverless-face-blur-service
  2. Change directory:
    cd ./serverless-face-blur-service
  3. Download and install dependencies:
    sam build
  4. Deploy the application to your AWS account:
    sam deploy --guided
  5. During the guided deployment process, enter unique names for the two S3 buckets. These names must be globally unique.

To test the application, upload a JPG file containing at least one face into the source S3 bucket. After a few seconds, the destination bucket contains the output file with the same name. The output file shows the detected faces blurred:

Blurred faces output

How the face blurring Lambda function works

The Lambda function receives messages from the SQS queue when available. These messages contain metadata about the JPG object uploaded to S3:

{
    "Records": [
        {
            "messageId": "e9a12dd2-1234-1234-1234-123456789012",
            "receiptHandle": "AQEBnjT2rUH+kmEXAMPLE",
            "body": "{\"Records\":[{\"eventVersion\":\"2.1\",\"eventSource\":\"aws:s3\",\"awsRegion\":\"us-east-1\",\"eventTime\":\"2021-06-21T19:48:14.418Z\",\"eventName\":\"ObjectCreated:Put\",\"userIdentity\":{\"principalId\":\"AWS:AROA3DTKMEXAMPLE:username\"},\"requestParameters\":{\"sourceIPAddress\":\"73.123.123.123\"},\"responseElements\":{\"x-amz-request-id\":\"AZ39QWJFVEQJW9RBEXAMPLE\",\"x-amz-id-2\":\"MLpNwwQGQtrNai/EXAMPLE\"},\"s3\":{\"s3SchemaVersion\":\"1.0\",\"configurationId\":\"5f37ac0f-1234-1234-82f12343-cbc8faf7a996\",\"bucket\":{\"name\":\"s3-face-blur-source\",\"ownerIdentity\":{\"principalId\":\"EXAMPLE\"},\"arn\":\"arn:aws:s3:::s3-face-blur-source\"},\"object\":{\"key\":\"face.jpg\",\"size\":3541,\"eTag\":\"EXAMPLE\",\"sequencer\":\"123456789\"}}}]}",
            "attributes": {
                "ApproximateReceiveCount": "6",
                "SentTimestamp": "1624304902103",
                "SenderId": "AIDAJHIPREXAMPLE",
                "ApproximateFirstReceiveTimestamp": "1624304902103"
            },
            "messageAttributes": {},
            "md5OfBody": "12345",
            "eventSource": "aws:sqs",
            "eventSourceARN": "arn:aws:sqs:us-east-1:123456789012:s3-lambda-face-blur-S3EventQueue-ABCDEFG01234",
            "awsRegion": "us-east-1"
        }
    ]
}

The body attribute contains a serialized JSON object with an array of records, containing the S3 bucket name and object keys. The Lambda handler in app.js uses the JSON.parse method to create a JSON object from the string:

  const s3Event = JSON.parse(event.Records[0].body)

The handler extracts the bucket and key information. Since the S3 key attribute is URL encoded, it must be decoded before further processing:

const Bucket = s3Event.Records[0].s3.bucket.name
const Key = decodeURIComponent(s3Event.Records[0].s3.object.key.replace(/\+/g, " "))

There are three steps in processing each image: detecting faces in the source image, blurring faces, then storing the output in the destination bucket.

Detecting faces in the source image

The detectFaces.js file contains the detectFaces function. This accepts the bucket name and key as parameters, then uses the AWS SDK for JavaScript to call the Amazon Rekognition service:

const AWS = require('aws-sdk')
AWS.config.region = process.env.AWS_REGION 
const rekognition = new AWS.Rekognition()

const detectFaces = async (Bucket, Name) => {

  const params = {
    Image: {
      S3Object: {
       Bucket,
       Name
      }
     }    
  }

  console.log('detectFaces: ', params)

  try {
    const result = await rekognition.detectFaces(params).promise()
    return result.FaceDetails
  } catch (err) {
    console.error('detectFaces error: ', err)
    return []
  }  
}

The detectFaces method of the Amazon Rekognition API accepts a parameter object defining a reference to the source S3 bucket and key. The service returns a data object with an array called FaceDetails:

{
    "BoundingBox": {
        "Width": 0.20408163964748383,
        "Height": 0.4340078830718994,
        "Left": 0.727995753288269,
        "Top": 0.3109045922756195
    },
    "Landmarks": [
        {
            "Type": "eyeLeft",
            "X": 0.784351646900177,
            "Y": 0.46120116114616394
        },
        {
            "Type": "eyeRight",
            "X": 0.8680923581123352,
            "Y": 0.5227685570716858
        },
        {
            "Type": "mouthLeft",
            "X": 0.7576283812522888,
            "Y": 0.617080807685852
        },
        {
            "Type": "mouthRight",
            "X": 0.8273565769195557,
            "Y": 0.6681531071662903
        },
        {
            "Type": "nose",
            "X": 0.8087539672851562,
            "Y": 0.5677543878555298
        }
    ],
    "Pose": {
        "Roll": 23.821317672729492,
        "Yaw": 1.4818285703659058,
        "Pitch": 2.749311685562134
    },
    "Quality": {
        "Brightness": 83.74250793457031,
        "Sharpness": 89.85481262207031
    },
    "Confidence": 99.9793472290039
}

The Confidence score is the percentage confidence that the image contains a face. This example uses the BoundingBox coordinates to find the location of the face in the image. The response also includes positional data for facial features like the mouth, nose, and eyes.

Blurring faces in the source image

In the blurFaces.js file, the blurFaces function uses the open source GraphicsMagick library to process the source image. The function takes the bucket and key as parameters with the metadata returned by the Amazon Rekognition service:

const AWS = require('aws-sdk')
AWS.config.region = process.env.AWS_REGION 
const s3 = new AWS.S3()
const gm = require('gm').subClass({imageMagick: process.env.localTest})

const blurFaces = async (Bucket, Key, faceDetails) => {

  const object = await s3.getObject({ Bucket, Key }).promise()
  let img = gm(object.Body)

  return new Promise ((resolve, reject) => {
    img.size(function(err, dimensions) {
        if (err) reject(err)
        console.log('Image size', dimensions)

        faceDetails.map((faceDetail) => {
            const box = faceDetail.BoundingBox
            const width  = box.Width * dimensions.width
            const height = box.Height * dimensions.height
            const left = box.Left * dimensions.width
            const top = box.Top * dimensions.height

            img.region(width, height, left, top).blur(0, 70)
        })

        img.toBuffer((err, buffer) => resolve(buffer))
    })
  })
}

The function loads the source object from S3 using the getObject method of the S3 API. In the response, the Body attribute contains a buffer with the image data – this is used to instantiate a ‘gm’ object for processing.

Amazon Rekognition’s bounding box coordinates are percentage-based relative to the size of the image. This code converts these percentages to X- and Y-based coordinates and uses the region method to identify a portion of the image. The blur method uses a Gaussian operator based on the inputs provided. Once the transformation is complete, the function returns a buffer with the new image.

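As a worked example (the image size is an assumption), applying the sample BoundingBox above to a 1000 x 800 pixel image gives the region passed to GraphicsMagick:

// BoundingBox values rounded from the sample Rekognition response above
const box = { Width: 0.2041, Height: 0.4340, Left: 0.7280, Top: 0.3109 }
const dimensions = { width: 1000, height: 800 } // assumed image size

const region = {
  width: box.Width * dimensions.width,    // ~204 px wide
  height: box.Height * dimensions.height, // ~347 px tall
  left: box.Left * dimensions.width,      // ~728 px from the left edge
  top: box.Top * dimensions.height        // ~249 px from the top edge
}
// img.region(region.width, region.height, region.left, region.top).blur(0, 70)
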
Using GraphicsMagick with Lambda functions

The GraphicsMagick package contains operating system-specific binaries. Depending on the operating system of your development machine, you may install binaries locally that are not compatible with Lambda. The Lambda service uses Amazon Linux 2 (AL2).

To simplify local testing and deployment, the sample application uses Lambda layers to package this library. This open source Lambda layer repo shows how to build, deploy, and test GraphicsMagick as a Lambda layer. It also publishes public layers to help you use the library in your Lambda functions.

When testing this function locally with the test.js script, the GM npm package uses the binaries on the local development machine. When the function is deployed to the Lambda service, the package uses the Lambda layer with the AL2-compatible binaries.

Limiting throughput with Amazon Rekognition

Both S3 and Lambda are highly scalable services and in this example can handle thousands of image uploads a second. In this configuration, S3 sends Event Notifications to an SQS queue each time an object is uploaded. The Lambda function processes events from this queue.

When using downstream services in Lambda functions, it’s important to note the quotas and throughputs in place for those services. This can help avoid throttling errors or overwhelming non-serverless services that may not be able to handle the same level of traffic.

The Amazon Rekognition service sets default transaction per second (TPS) rates for AWS accounts. For the DetectFaces API, the default is between 5-50 TPS depending upon the AWS Region. If you need a higher throughput, you can request an increase in the Service Quotas console.

In the AWS SAM template of the example application, the definition of the Lambda function uses two attributes to control the throughput. The ReservedConcurrentExecutions attribute is set to 1, which prevents the Lambda service from scaling beyond one instance of the function. The BatchSize in event source mapping is also set to 1, so each invocation contains only a single S3 event from the SQS queue:

  BlurFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: src/
      Handler: app.handler
      Runtime: nodejs14.x
      Timeout: 10
      MemorySize: 2048
      ReservedConcurrentExecutions: 1
      Policies:
        - S3ReadPolicy:
            BucketName: !Ref SourceBucketName
        - S3CrudPolicy:
            BucketName: !Ref DestinationBucketName
        - RekognitionDetectOnlyPolicy: {}
      Environment:
        Variables:
          DestinationBucketName: !Ref DestinationBucketName
      Events:
        MySQSEvent:
          Type: SQS
          Properties:
            Queue: !GetAtt S3EventQueue.Arn
            BatchSize: 1

The combination of these two values means that this function processes images one at a time, regardless of how many images are uploaded to S3. By increasing these values, you can change the scaling behavior and the number of messages processed per invocation. This allows you to control the rate at which messages are sent to Amazon Rekognition for processing.

Conclusion

A serverless face blurring service can provide a simpler way to process photos in workloads with large amounts of traffic. This post introduces an example application that blurs faces when images are saved in an S3 bucket. The S3 PutObject event invokes a Lambda function that uses Amazon Rekognition to detect faces and GraphicsMagick to process the images.

This blog post shows how to deploy the example application and walks through the functions that process the images. It explains how to use GraphicsMagick and how to control throughput in the SQS event source mapping.

For more serverless learning resources, visit Serverless Land.

Managing federated schema with AWS Lambda and Amazon S3

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/managing-federated-schema-with-aws-lambda-and-amazon-s3/

This post is written by Krzysztof Lis, Senior Software Development Engineer, IMDb.

GraphQL schema management is one of the biggest challenges in the federated setup. IMDb has 19 subgraphs (graphlets) – each of them owns and publishes a part of the schema as a part of an independent CI/CD pipeline.

To manage federated schema effectively, IMDb introduced a component called Schema Manager. This is responsible for fetching the latest schema changes and validating them before publishing the composed schema to the Gateway.

Part 1 presents the migration from a monolithic REST API to a federated GraphQL (GQL) endpoint running on AWS Lambda. This post focuses on schema management in federated GQL systems. It shows the challenges that the teams faced when designing this component and how we addressed them. It also shares best practices and processes for schema management, based on our experience.

Comparing monolithic and federated GQL schema

In the standard, monolithic implementation of GQL, there is a single file used to manage the whole schema. This makes it easier to ensure that there are no conflicts between the new changes and the earlier schema. Everything can be validated at the build time and there is no risk that external changes break the endpoint during runtime.

This is not true for the federated GQL endpoint. The gateway fetches service definitions from the graphlets on runtime and composes the overall schema. If any of the graphlets introduces a breaking change, the gateway fails to compose the schema and won’t be able to serve the requests.

The more graphlets we federate to, the higher the risk of introducing a breaking change. In enterprise scale systems, you need a component that protects the production environment from potential downtime. It must notify graphlet owners that they are about to introduce a breaking change, preferably during development before releasing the change.

Federated schema challenges

There are other aspects of handling federated schema to consider. If you use AWS Lambda, the default schema composition increases the gateway startup time, which impacts the endpoint’s performance. If any of the declared graphlets are unavailable at the time of schema composition, there may be gateway downtime or at least an incomplete overall schema. If schemas are pre-validated and stored in a highly available store such as Amazon S3, you mitigate both of these issues.

Another challenge is schema consistency. Ideally, you want to propagate the changes to the gateway in a timely manner after a schema change is published. You also need to consider handling field deprecation and field transfer across graphlets (change of ownership). To catch potential errors early, the system should support dry-run-like functionality that will allow developers to validate changes against the current schema during the development stage.

The Schema Manager

Schema Manager

To mitigate these challenges, the Gateway/Platform team introduced the Schema Manager component to the workload. Whenever there’s a deployment in any of the graphlet pipelines, the schema validation process is triggered.

Schema Manager fetches the most recent sub-schemas from all the graphlets and attempts to compose an overall schema. If there are no errors and conflicts, a change is approved and can be safely promoted to production.

In the case of a validation failure, the breaking change is blocked in the graphlet deployment pipeline and the owning team must resolve the issue before they can proceed with the change. Deployments of graphlet code changes also depend on this approval step, so there is no risk that schema and backend logic can get out of sync, when the approval step blocks the schema change.

Integration with the Gateway

To handle versioning of the composed schema, a manifest file stores the locations of the latest approved set of graphlet schemas. The manifest is a JSON file mapping the name of the graphlet to the S3 key of the schema file, in addition to the endpoint of the graphlet service.

The file name of each graphlet schema is a hash of the schema. The Schema Manager pulls the current manifest and uses the hash of the validated schema to determine if it has changed:

{
   "graphlets": {
     "graphletOne": {
        "schemaPath": "graphletOne/1a3121746e75aafb3ca9cccb94f23d89",
        "endpoint": "arn:aws:lambda:us-east-1:123456789:function:GraphletOne"
     },
     "graphletTwo": { 
        "schemaPath": "graphletTwo/213362c3684c89160a9b2f40cd8f191a",
        "endpoint": "arn:aws:lambda:us-east-1:123456789:function:GraphletTwo"
     },
     ...
  }
}

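A sketch of that change check might look like the following. The bucket name, manifest key, and hash algorithm are assumptions for illustration, not details confirmed by the post:

const crypto = require('crypto')
const AWS = require('aws-sdk')
const s3 = new AWS.S3()

const hasSchemaChanged = async (graphletName, schemaSDL) => {
  // The manifest stores "<graphlet>/<hash>" as the schema path
  const hash = crypto.createHash('md5').update(schemaSDL).digest('hex')

  const manifestObject = await s3.getObject({
    Bucket: process.env.SCHEMA_BUCKET, // assumed bucket name
    Key: 'manifest.json'               // assumed manifest key
  }).promise()
  const manifest = JSON.parse(manifestObject.Body.toString())

  const current = manifest.graphlets[graphletName]
  return !current || current.schemaPath !== `${graphletName}/${hash}`
}
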
Based on these details, the Gateway fetches the graphlet schemas from S3 as part of service startup and stores them in the in-memory cache. It later polls for the updates every 5 minutes.

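A simplified sketch of that startup load and polling loop could look like this; the names and the cache shape are assumptions, not the IMDb implementation:

const AWS = require('aws-sdk')
const s3 = new AWS.S3()
const schemaCache = new Map()

const refreshSchemas = async () => {
  const manifestObject = await s3.getObject({
    Bucket: process.env.SCHEMA_BUCKET, // assumed bucket name
    Key: 'manifest.json'               // assumed manifest key
  }).promise()
  const manifest = JSON.parse(manifestObject.Body.toString())

  for (const [name, { schemaPath }] of Object.entries(manifest.graphlets)) {
    const cached = schemaCache.get(name)
    if (!cached || cached.path !== schemaPath) {
      // Only fetch schemas whose hash-based path has changed
      const schema = await s3.getObject({
        Bucket: process.env.SCHEMA_BUCKET,
        Key: schemaPath
      }).promise()
      schemaCache.set(name, { path: schemaPath, sdl: schema.Body.toString() })
    }
  }
}

// Load once at startup, then poll for updates
refreshSchemas()
setInterval(refreshSchemas, 5 * 60 * 1000)
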
Using S3 as the schema store addresses the latency, availability and validation concerns of fetching schemas directly from the graphlets on runtime.

Eventual schema consistency

Since there are multiple graphlets that can be updated at the same time, there is no guarantee that one schema validation workflow will not overwrite the results of another.

For example:

  1. SchemaUpdater 1 runs for graphlet A.
  2. SchemaUpdater 2 runs for graphlet B.
  3. SchemaUpdater 1 pulls the manifest v1.
  4. SchemaUpdater 2 pulls the manifest v1.
  5. SchemaUpdater 1 uploads manifest v2 with the change to graphlet A.
  6. SchemaUpdater 2 uploads manifest v3, which overwrites the changes in v2 and contains only the changes to graphlet B.

This is not a critical issue because, no matter which version of the manifest wins in this scenario, both manifests represent a valid schema and the gateway does not have any issues. When SchemaUpdater runs for graphlet A again, it sees that the current manifest does not contain the changes uploaded before, so it uploads them again.

To reduce the risk of schema inconsistency, Schema Manager polls for schema changes every 15 minutes and the Gateway polls every 5 minutes.

Local schema development

Schema validation runs automatically for any graphlet change as a part of the deployment pipelines. However, that feedback loop happens too late for an efficient schema development cycle. To reduce friction, the team uses a tool that performs this validation step without publishing any changes. Instead, it outputs the results of the validation to the developer.

Schema validation

The Schema Validator script can be added as a dependency to any of the graphlets. It fetches the graphlet’s schema definition, described in Schema Definition Language (SDL), and passes it as a payload to Schema Manager. Schema Manager performs the full schema validation and returns any validation errors (or success codes) to the user.

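A hypothetical sketch of that flow: read the graphlet’s SDL and invoke the Schema Manager Lambda in a dry-run mode. The function name, payload shape, and response shape are all assumptions for illustration:

const fs = require('fs')
const AWS = require('aws-sdk')
const lambda = new AWS.Lambda()

const validateSchema = async () => {
  const schema = fs.readFileSync('schema.graphql', 'utf8')

  const result = await lambda.invoke({
    FunctionName: 'SchemaManager', // assumed function name
    Payload: JSON.stringify({ graphlet: 'graphletOne', schema, dryRun: true })
  }).promise()

  const { errors } = JSON.parse(result.Payload.toString())
  if (errors && errors.length > 0) {
    console.error('Schema validation failed:', errors)
    process.exit(1)
  }
  console.log('Schema is valid')
}

validateSchema()
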
Best practices for federated schema development

Schema Manager addresses the most critical challenges that come from federated schema development. However, there are other issues when organizing work processes at your organization.

It is crucial for long term maintainability of the federated schema to keep a high-quality bar for the incoming schema changes. Since there are multiple owners of sub-schemas, it’s good to keep a communication channel between the graphlet teams so that they can provide feedback for planned schema changes.

It is also good to extract common parts of the graph to a shared library and generate typings and the resolvers. This lets the graphlet developers benefit from strongly typed code. We use open-source libraries to do this.

Conclusion

Schema management is a non-trivial challenge in federated GQL systems. The highest risk to your system availability comes from a breaking schema change introduced by one of the graphlets: your system cannot serve any requests after that. There is also the problem of the delayed feedback loop for the engineers working on schema changes, and the impact of schema composition at runtime on service latency.

IMDb addresses these issues with a Schema Manager component running on Lambda, using S3 as the schema store. We have put guardrails in our deployment pipelines to ensure that no breaking change is deployed to production. Our graphlet teams are using common schema libraries with automatically generated typings and review the planned schema changes during schema working group meetings to streamline the development process.

These factors enable us to have stable and highly maintainable federated graphs, with automated change management. Next, our solution must provide mechanisms to prevent still-in-use fields from getting deleted and to allow schema changes coordinated between multiple graphlets. There are still plenty of interesting challenges to solve at IMDb.

For more serverless learning resources, visit Serverless Land.

Building federated GraphQL on AWS Lambda

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/building-federated-graphql-on-aws-lambda/

This post is written by Krzysztof Lis, Senior Software Development Engineer, IMDb.

IMDb is the world’s most popular source for movie, TV, and celebrity content. It deals with a complex business domain including movies, shows, celebrities, industry professionals, events, and a distributed ownership model. There are clear boundaries between systems and data owned by various teams.

Historically, IMDb uses a monolithic REST gateway system that serves clients. Over the years, it has become challenging to manage effectively. There are thousands of files, business logic that lacks clear ownership, and unreliable integration tests tied to the data. To fix this, the team used GraphQL (GQL). This is a query language for APIs that lets you request only the data that you need and a runtime for fulfilling those queries with your existing data.

It’s common to implement this protocol by creating a monolithic service that hosts the complete schema and resolves all fields requested by the client. It is good for applications with a relatively small domain and clear, single-threaded ownership. IMDb chose the federated approach, which allows us to federate GQL requests to all existing data teams. This post shows how to build federated GraphQL on AWS Lambda.

Overview

This article covers migration from a monolithic REST API and monolithic frontend to a federated backend system powering a modern frontend. It enumerates challenges in the earlier system and explains why federated GraphQL addresses these problems.

I present the architecture overview and describe the decisions made when designing the new system. I also present our experiences with developing and running high-traffic and high-visibility pages on the new endpoint: improvements in IMDb’s ownership model and development lifecycle, in addition to ease of scaling.

Comparing GraphQL with federated GraphQL

Comparing GraphQL with federated GraphQL

Federated GraphQL allows you to combine GraphQL APIs from multiple microservices into a single API. Clients can make a single request and fetch data from multiple sources, including joining across data sources, without additional support from the source services.

This is an example schema fragment:

type TitleQuote {
  "Quote ID"
  id: ID!
  "Is this quote a spoiler?"
  isSpoiler: Boolean!
  "The lines that make up this quote"
  lines: [TitleQuoteLine!]!
  "Votes from users about this quote..."
  interestScore: InterestScore!
  "The language of this quote"
  language: DisplayableLanguage!
}
"A specific line in the Title Quote. Can be a verbal line with characters speaking or stage directions"
type TitleQuoteLine {
  "The characters who speak this line, e.g.  'Rick'. Not required: a line may be non-verbal"
  characters: [TitleQuoteCharacter!]
  "The body of the quotation line, e.g 'Here's looking at you kid. '.  Not required: you may have stage directions with no dialogue."
  text: String
  "Stage direction, e.g. 'Rick gently places his hand under her chin and raises it so their eyes meet'. Not required."
  stageDirection: String
}

This is an example monolithic query: “Get the 2 top quotes from The A-Team (title identifier: tt0084967)”:

{ 
  title(id:"tt0084967"){ 
    quotes(first:2){ 
      lines { text } 
    } 
  }
}

Here is an example response:

{ 
  "data": { 
    "title": { 
      "quotes": { 
        "lines": [
          { 
            "text": "I love it when a plan comes together!"
          },
          {
            "text": "10 years ago a crack commando unit was sent to prison by a military court for a crime they didn't commit..."
          }
        ]
      } 
    }
  }
}

This is an example federated query: “What is Jackie Chan (id nm0000329) known for? Get the text, rating and image for each title”

{
  name(id: "nm0000329") {
    knownFor(first: 4) {
      title {
        titleText {
          text
        }
        ratingsSummary {
          aggregateRating
        }
        primaryImage {
          url
        }
      }
    }
  }
}

The monolithic example fetches quotes from a single service. In the federated example, knownFor, titleText, ratingsSummary, primaryImage are fetched transparently by the gateway from separate services. IMDb federates the requests across 19 graphlets, which are transparent to the clients that call the gateway.

Architecture overview

Architecture overview

IMDb supports three channels for users: website, iOS, and Android apps. Each of the channels can request data from a single federated GraphQL gateway, which federates the request to multiple graphlets (sub-graphs). Each of the invoked graphlets returns a partial response, which the gateway merges with responses returned by other graphlets. The client receives only the data that they requested, in the shape specified in the query. This can be especially useful when the developers must be conscious of network usage (for example, over mobile networks).

This is the architecture diagram:

Architecture diagram

There are two core components in the architecture: the Gateway and Schema Manager, which run on Lambda. The Gateway is a Node.js-based Lambda function that is built on top of open-source Apollo Gateway code. It is customized with code responsible predominantly for handling authentication, caching, metrics, and logging.
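
As an illustration only, not IMDb’s production code, a minimal Apollo Gateway handler for Lambda can look like the following sketch. It assumes the open-source apollo-server-lambda package and placeholder graphlet URLs; the real gateway wraps this core with authentication, caching, metrics, and logging:

const { ApolloServer } = require('apollo-server-lambda')
const { ApolloGateway } = require('@apollo/gateway')

// Hypothetical graphlet endpoints; replace with real service URLs
const gateway = new ApolloGateway({
  serviceList: [
    { name: 'titles', url: 'https://example.com/titles/graphql' },
    { name: 'names', url: 'https://example.com/names/graphql' }
  ]
})

// Subscriptions are not supported when running behind a gateway
const server = new ApolloServer({ gateway, subscriptions: false })

exports.handler = server.createHandler()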

Other noteworthy components are Upstream Graphlets and an A/B Testing Service that enables A/B tests in the graph. The Gateway is connected to an Application Load Balancer, which is protected by AWS WAF and fronted by Amazon CloudFront as our CDN layer. This uses Amazon ElastiCache with Redis as the engine to cache partial responses coming from graphlets. All logs and metrics generated by the system are collected in Amazon CloudWatch.

Choosing the compute platform

This uses Lambda, since it scales on demand. IMDb uses Lambda’s Provisioned Concurrency to reduce cold start latency. The infrastructure is abstracted away, so there is no need for us to manage capacity or apply patches.

Additionally, Application Load Balancer (ALB) has support for directing HTTP traffic to Lambda. This is an alternative to API Gateway. The workload does not need many of the features that API Gateway provides, since the gateway has a single endpoint, making ALB a better choice. ALB also supports AWS WAF.

Using Lambda, the team designed a way to apply schema updates without fetching the schema on every request. This is addressed by the Schema Manager component, which reduces latency and improves the overall customer experience.
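
The following sketch shows the general idea, with an assumed refresh interval and a hypothetical fetchSchema call standing in for the Schema Manager integration. The composed schema is cached outside the handler, so warm invocations reuse it and only occasionally refresh:

// Illustrative sketch: cache the composed schema in the execution environment
const REFRESH_INTERVAL_MS = 5 * 60 * 1000 // assumed refresh interval

let cachedSchema = null
let lastFetched = 0

async function getSchema (fetchSchema) {
  // fetchSchema is a hypothetical async call to the Schema Manager
  if (!cachedSchema || Date.now() - lastFetched > REFRESH_INTERVAL_MS) {
    cachedSchema = await fetchSchema()
    lastFetched = Date.now()
  }
  return cachedSchema
}

module.exports = { getSchema }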

Integration with legacy data services

The main purpose of the federated GQL migration is to deprecate a monolithic service that aggregates data from multiple backend services before sending it to the clients. Some of the data in the federated graph comes from brand new graphlets that are developed with the new system in mind.

However, much of the data powering the GQL endpoint is sourced from the existing backend services. One benefit of running on Lambda is the flexibility to choose the runtime environment that works best with the underlying data sources and data services.

For the graphlets relying on the legacy services, IMDb uses lightweight Java Lambda functions built with the provided Java client libraries. They connect to legacy backends via AWS PrivateLink, fetch the data, and shape it into the format expected by the GQL request. For the modern graphlets, graphlet teams are encouraged to explore Node.js first due to improved performance and ease of development.

Caching

The gateway supports two caching modes for graphlets: static and dynamic. Static caching allows graphlet owners to specify a default TTL for responses returned by a graphlet. Dynamic caching calculates TTL based on a custom caching extension returned with the partial response. It allows graphlet owners to decide on the optimal TTL for content returned by their graphlet. For example, it can be 24 hours for queries containing only static text.
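
The following sketch shows how the two modes can be combined. The extension field name (cacheTtl) and the configuration shape are hypothetical, not the production format:

const DEFAULT_STATIC_TTL = 300 // seconds; assumed fallback value

// Decide how long to cache a graphlet's partial response
function resolveTtlSeconds (partialResponse, graphletConfig) {
  if (graphletConfig.cachingMode === 'dynamic') {
    // Dynamic mode: the graphlet attaches its desired TTL to the response
    const requested = partialResponse.extensions && partialResponse.extensions.cacheTtl
    if (Number.isInteger(requested) && requested >= 0) {
      return requested
    }
  }
  // Static mode, or no extension returned: use the graphlet's default TTL
  return graphletConfig.defaultTtl || DEFAULT_STATIC_TTL
}

module.exports = { resolveTtlSeconds }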

Permissions

Each of the graphlets runs in a separate AWS account. The graphlet accounts grant the gateway AWS account (as an AWS principal) invoke permissions on the graphlet Lambda function. In the development environment, a cross-account IAM role is used instead, assumed by stacks deployed in engineers’ personal accounts.
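
Using the AWS CDK in a graphlet account, the cross-account grant can be expressed as in the following sketch. The account ID is a placeholder and the function properties are assumptions, not IMDb’s deployment code:

const cdk = require('aws-cdk-lib')
const lambda = require('aws-cdk-lib/aws-lambda')
const iam = require('aws-cdk-lib/aws-iam')

class GraphletStack extends cdk.Stack {
  constructor (scope, id, props) {
    super(scope, id, props)

    const graphletFn = new lambda.Function(this, 'GraphletFunction', {
      runtime: lambda.Runtime.NODEJS_18_X,
      handler: 'index.handler',
      code: lambda.Code.fromAsset('dist')
    })

    // Allow the gateway's AWS account (placeholder ID) to invoke this graphlet
    graphletFn.grantInvoke(new iam.AccountPrincipal('111111111111'))
  }
}

module.exports = { GraphletStack }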

Experience with developing on federated GraphQL

The migration to federated GraphQL delivered on expected results. We moved the business logic closer to the teams that have the right expertise – the graphlet teams. At the same time, a dedicated platform team owns and develops the core technical pieces of the ecosystem. This includes the Gateway and Schema Manager, in addition to the common libraries and CDK constructs that can be reused by the graphlet teams. As a result, there is a clear ownership structure, which is aligned with the way IMDb teams are organized.

In terms of operational excellence of the platform team, this reduced support tickets related to business logic. Previously, these were routed to an appropriate data service team with a delay. Integration tests are now stable and independent from underlying data, which increases our confidence in the Continuous Deployment process. It also eliminates changing data as a potential root cause for failing tests and blocked pipelines.

The graphlet teams now have full ownership of the data available in the graph. They own the partial schema and the resolvers that provide data for that part of the graph. Since they have the most expertise in that area, the potential issues are identified early on. This leads to a better customer experience and overall health of the system.

The web and app developer groups are also affected by the migration. The learning curve was eased by tools like GraphQL Playground and Apollo Client, and the teams closed the learning gap quickly and started delivering new features.

One of the main pages at IMDb.com is the Title Page (for example, Shutter Island). This was successfully migrated to use the new GQL endpoint. This proves that the new, serverless federated system can serve production traffic at over 10,000 TPS.

Conclusion

A single, highly discoverable, and well-documented backend endpoint enabled our clients to experiment with the data available in the graph. We were able to clean up the backend API layer, introduce clear ownership boundaries, and give our clients powerful tools to speed up their development cycle.

The infrastructure uses Lambda to remove the burden of managing, patching, and scaling our EC2 fleets. The team dedicates this time to building features and to the operational excellence of our systems.

Part two will cover how IMDb manages the federated schema and the guardrails used to ensure high availability of the federated GraphQL endpoint.

For more serverless learning resources, visit Serverless Land.

Building a serverless GIF generator with AWS Lambda: Part 2

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/building-a-serverless-gif-generator-with-aws-lambda-part-2/

In part 1 of this blog post, I explain how a GIF generation service can support a front-end application for video streaming. I compare the performance of a server-based and serverless approach and show how parallelization can significantly improve processing time. I introduce an example application and I walk through the solution architecture.

In this post, I explain the scaling behavior of the example application and consider alternative approaches. I also look at how to manage memory, temporary space, and files in this type of workload. Finally, I discuss the cost of this approach and how to determine if a workload can use parallelization.

To set up the example, visit the GitHub repo and follow the instructions in the README.md file. The example application uses the AWS Serverless Application Model (AWS SAM), enabling you to deploy the application more easily in your own AWS account. This walkthrough creates some resources covered in the AWS Free Tier but others incur cost.

Scaling up the AWS Lambda workers with Amazon EventBridge

There are two AWS Lambda functions in the example application. The first detects the length of the source video and then generates batches of events containing start and end times. These events are put onto the Amazon EventBridge default event bus.
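
A sketch of the batching logic might look like the following. This is illustrative only; the function name and helper shape are assumptions, and the actual implementation is in the GitHub repo:

const AWS = require('aws-sdk')
const eventBridge = new AWS.EventBridge()

// Emit one event per 30-second slice of the source video
async function publishSlices (key, lengthInSeconds) {
  const entries = []
  for (let start = 0; start < lengthInSeconds; start += 30) {
    entries.push({
      Source: 'custom.gifGenerator',
      DetailType: 'newVideoCreated',
      Detail: JSON.stringify({
        key,
        start,
        end: Math.min(start + 29, Math.floor(lengthInSeconds)),
        length: lengthInSeconds,
        tsCreated: Date.now()
      })
    })
  }
  // PutEvents accepts a maximum of 10 entries per call, so send in chunks
  for (let i = 0; i < entries.length; i += 10) {
    await eventBridge.putEvents({ Entries: entries.slice(i, i + 10) }).promise()
  }
}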

An EventBridge rule matches the events and invokes the second Lambda function. This second function receives the events, which have the following structure:

{
    "version": "0",
    "id": "06a1596a-1234-1234-1234-abc1234567",
    "detail-type": "newVideoCreated",
    "source": "custom.gifGenerator",
    "account": "123456789012",
    "time": "2021-0-17T11:36:38Z",
    "region": "us-east-1",
    "resources": [],
    "detail": {
        "key": "long.mp4",
        "start": 2250,
        "end": 2279,
        "length": 3294.024,
        "tsCreated": 1623929798333
    }
}

The detail attribute contains the unique start and end time for the slice of work. Each Lambda invocation receives a different start and end time and works on a 30-second snippet of the whole video. The function then uses FFMPEG to download the original video from the source Amazon S3 bucket and perform the processing for its allocated time slice.

The EventBridge rule matches events and invokes the target Lambda function asynchronously. The Lambda service scales up the number of execution environments in response to the number of events:

Solution architecture

The first function produces batches of events almost simultaneously, but the worker function takes several seconds to process a single request. If there is no existing environment available to handle the request, the Lambda service scales up to process the work. As a result, you often see a high level of concurrency when running this application, which is how parallelization is achieved:

CloudWatch metrics

Lambda continues to scale up until it reaches the initial burst concurrency quota in the current AWS Region. This quota is between 500 and 3000 execution environments, depending on the Region. After the initial burst, concurrency scales by an additional 500 instances per minute.

If the number of events is higher, Lambda responds to EventBridge with a throttling error. The EventBridge service retries the events with exponential backoff for 24 hours. Once Lambda is scaled sufficiently or existing execution environments become available, the events are then processed.

This means that under exceptional levels of heavy load, this retry pattern adds latency to the overall GIF generation task. To manage this, you can use Provisioned Concurrency to ensure that more execution environments are available during periods of very high load.

Alternative ways to scale the Lambda workers

The asynchronous invocation mode for Lambda allows you to scale up worker Lambda functions quickly. This is the mode used by EventBridge when Lambda functions are defined as targets in rules. The other benefit of using EventBridge to decouple the two functions in this example is extensibility. Currently, the events have only a single consumer. However, you can add new capabilities to this application by building new event consumers, without changing the producer logic. Note that using EventBridge in this architecture costs $1 per million events put onto the bus (this cost varies by Region). Delivery to targets in EventBridge is free.

This design could similarly use Amazon SNS, which also invokes consuming Lambda functions asynchronously. This costs $0.50 per million messages and delivery to Lambda functions is free (this cost varies by Region). Depending on whether you need other EventBridge capabilities, SNS may be a better choice for decoupling the two Lambda functions.

Alternatively, the first Lambda function could invoke the second function by using the invoke method of the Lambda API. By using the AWS SDK for JavaScript, one Lambda function can invoke another directly from the handler code. When the InvocationType is set to ‘Event’, this invocation occurs asynchronously. That means that the calling function does not wait for the target function to finish before continuing.

This direct integration between two Lambda services is the lowest latency alternative. However, this limits the extensibility of the solution in the future without modifying code.
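
A minimal sketch of this direct invocation follows. The function name environment variable and payload are placeholders:

const AWS = require('aws-sdk')
const lambda = new AWS.Lambda()

// 'Event' queues the invocation and returns without waiting for the result
async function invokeWorkerAsync (slice) {
  await lambda.invoke({
    FunctionName: process.env.WORKER_FUNCTION_NAME,
    InvocationType: 'Event',
    Payload: JSON.stringify(slice)
  }).promise()
}

// Example: hand one 30-second slice to the worker
// invokeWorkerAsync({ key: 'long.mp4', start: 0, end: 29 })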

Managing memory, temp space, and files

You can configure the memory for a Lambda function up to 10,240 MB. However, the temporary storage available in /tmp is always 512 MB, regardless of memory. Increasing the memory allocation proportionally increases the amount of virtual CPU and network bandwidth available to the function. To learn more about how this works in detail, watch Optimizing Lambda performance for your serverless applications.

The original video files used in this workload may be several gigabytes in size. Since these may be larger than the /tmp space available, the code is designed to keep the movie file in memory. As a result, this solution works for any length of movie that can fit into the 10 GB memory limit.

The FFMPEG application expects to work with local file systems and is not designed to work with object stores like Amazon S3. It can also read video files from HTTP endpoints, so the example application loads the S3 object over HTTPS instead of downloading the file and using the /tmp space. To achieve this, the code uses the getSignedUrl method of the S3 class in the SDK:

// Configure S3
const AWS = require('aws-sdk')
AWS.config.update({ region: process.env.AWS_REGION })
const s3 = new AWS.S3({ apiVersion: '2006-03-01' })

// Get signed URL for source object
const params = {
    Bucket: record.s3.bucket.name,
    Key: record.s3.object.key,
    Expires: 300
}
const url = s3.getSignedUrl('getObject', params)

The resulting URL contains credentials to download the S3 object over HTTPS. The Expires attribute in the parameters determines how long the credentials are valid. The Lambda function calling this method must have appropriate IAM permissions for the target S3 bucket.

The GIF generation Lambda function stores the output GIF and JPG files in the /tmp storage space. Since the execution environment can be reused by subsequent invocations, it’s important to delete these temporary files before each invocation ends. This prevents the function from using all of the available /tmp space. This is handled by the tmpCleanup function:

const fs = require('fs').promises
const path = require('path')
const directory = '/tmp/'

// Deletes all files in a directory before the invocation ends
const tmpCleanup = async () => {
    console.log('Starting tmpCleanup')
    const files = await fs.readdir(directory)

    console.log('Deleting: ', files)
    // Wait for every deletion to finish so /tmp is empty for the next invocation
    await Promise.all(
        files.map((file) => fs.unlink(path.join(directory, file)))
    )
}

When the GenerateFrames parameter is set to true in the AWS SAM template, the worker function generates one frame per second of video. For longer videos, this results in a significant number of files. Since one of the dimensions of S3 pricing is the number of PUTs, this function increases the cost of the workload when using S3.

For applications that handle large numbers of small files, it can be more cost effective to use Amazon EFS and mount the file system to the Lambda function. EFS charges are based on data storage and throughput, rather than the number of files. To learn more about using EFS with Lambda, read this Compute Blog post.

Calculating the cost of the worker Lambda function

While parallelizing Lambda functions significantly reduces the overall processing time in this case, it’s also important to calculate the cost. To process the 3-hour video example in part 1, the function uses 345 invocations with 4096 MB of memory. Each invocation has an average duration of 4,311 ms.

Using the AWS Pricing Calculator, and ignoring the AWS Free Tier allowance, the cost to process this video is approximately $0.10.
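
As a rough check of that figure (assuming the standard x86 Lambda price of $0.0000166667 per GB-second at the time of writing): 345 invocations × 4.311 s ≈ 1,487 s of compute. At 4,096 MB, that is 4 × 1,487 ≈ 5,949 GB-s, or about $0.099, plus a negligible per-request charge – approximately $0.10 in total.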

AWS Pricing Calculator configuration

There are additional charges for other services used in the example application, such as EventBridge and S3. However, in terms of compute cost, this may compare favorably with server-based alternatives that you may have to scale manually depending on traffic. The exact cost depends upon your implementation and latency needs.

Deciding if a workload can be parallelized

The GIF generation workload is a good candidate for parallelization. This is because each 30-second block of work is independent and there is no strict ordering requirement. The end result is not impacted by the order that the GIFs are generated in. Each GIF also takes several seconds to generate, which is why the time saving comparison with the sequential, server-based approach is so significant.

Not all workloads can be parallelized and in many cases the work duration may be much shorter. This workload interacts with S3, which can scale to any level of read or write traffic created by the worker functions. You may use other downstream services that cannot scale this way, which may limit the amount of parallel processing you can use.

To learn more about designing and operating Lambda-based applications, read the Lambda Operator Guide.

Conclusion

Part 2 of this blog post expands on some of the advanced topics around scaling Lambda in parallelized workloads. It explains how the asynchronous invocation mode of Lambda scales and different ways to scale the worker Lambda function.

I cover how the example application manages memory, files, and temporary storage space. I also explain how to calculate the compute cost of using this approach, and considering if you can use parallelization in a workload.

For more serverless learning resources, visit Serverless Land.

Building a serverless distributed application using a saga orchestration pattern

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/building-a-serverless-distributed-application-using-a-saga-orchestration-pattern/

This post is written by Anitha Deenadayalan, Developer Specialist SA, DevAx (Developer Acceleration).

This post shows how to use the saga design pattern to preserve data integrity in distributed transactions across multiple services. In a distributed transaction, multiple services can be called before a transaction is completed. When the services store data in different data stores, it can be challenging to maintain data consistency across these data stores.

To maintain consistency in a transaction, relational databases provide two-phase commit (2PC). This consists of a prepare phase and a commit phase. In the prepare phase, the coordinating process requests the transaction’s participating processes (participants) to promise to commit or rollback the transaction. In the commit phase, the coordinating process requests the participants to commit the transaction. If the participants cannot agree to commit in the prepare phase, then the transaction is rolled back.

In distributed systems architected with microservices, two-phase commit is not an option as the transaction is distributed across various databases. In this case, one solution is to use the saga pattern.

A saga consists of a sequence of local transactions. Each local transaction in the saga updates the database and triggers the next local transaction. If a transaction fails, the saga runs compensating transactions to revert the database changes made by the preceding transactions.

There are two types of implementations for the saga pattern: choreography and orchestration.

Saga choreography

The saga choreography pattern depends on the events emitted by the microservices. The saga participants (microservices) subscribe to the events and act based on the event triggers. For example, the order service in the following diagram emits an OrderPlaced event. The inventory service subscribes to that event and updates the inventory when the OrderPlaced event is emitted. Similarly, the other participant services act based on the context of each emitted event.

Solution architecture

Saga orchestration

The saga orchestration pattern has a central coordinator called the orchestrator. The saga orchestrator manages and coordinates the entire transaction lifecycle. It is aware of the series of steps to be performed to complete the transaction. To run a step, it sends a message to the participant microservice to perform the operation. The participant microservice completes the operation and sends a message back to the orchestrator. Based on the received message, the orchestrator decides which microservice to run next in the transaction:

Saga orchestrator flow

You can use AWS Step Functions to implement the saga orchestration when the transaction is distributed across multiple databases.

Overview

This example uses a Step Functions workflow to implement the saga orchestration pattern, using the following architecture:

API Gateway to Lambda to Step Functions

When a customer calls the API, API Gateway invokes a Lambda function that performs any required pre-processing. The function then starts the Step Functions workflow to begin processing the distributed transaction.

The Step Functions workflow calls the individual services for order placement, inventory update, and payment processing to complete the transaction. It sends an event notification for further processing. The Step Functions workflow acts as the orchestrator to coordinate the transactions. If there is any error in the workflow, the orchestrator runs the compensatory transactions to ensure that the data integrity is maintained across various services.

When pre-processing is not required, you can also trigger the Step Functions workflow directly from API Gateway without the Lambda function.

The Step Functions workflow

The following diagram shows the steps that are run inside the Step Functions workflow. The green boxes show the steps that are run successfully. The order is placed, inventory is updated, and payment is processed before a Success state is returned to the caller.

The orange boxes indicate the compensatory transactions that run when any of the steps in the workflow fails. If the workflow fails at the Update inventory step, the orchestrator calls the Revert inventory and Remove order steps before returning a Fail state to the caller. These compensatory transactions ensure that data integrity is maintained: the inventory reverts to its original level and the order is removed.

Step Functions workflow

The preceding workflow is an example of a distributed transaction: the transaction data is stored across different databases and each service writes to its own database.

Prerequisites

For this walkthrough, you need:

  – An AWS account and the AWS CLI configured with your credentials
  – The .NET Core 3.1 SDK and the Amazon.Lambda.Tools CLI extension (used by the dotnet lambda package command)
  – Node.js and npm, to install the AWS CDK
  – Git, to clone the repository
  – Postman, to send test requests

Setting up the environment

For this walkthrough, use the AWS CDK code in the GitHub Repository to create the AWS resources. These include IAM roles, REST API using API Gateway, DynamoDB tables, the Step Functions workflow and Lambda functions.

  1. You need an AWS access key ID and secret access key for configuring the AWS Command Line Interface (AWS CLI). To learn more about configuring the AWS CLI, follow these instructions.
  2. Clone the repo:
    git clone https://github.com/aws-samples/saga-orchestration-netcore-blog
  3. After cloning, this is the directory structure:
    Directory structure
  4. The Lambda functions in the saga-orchestration directory must be packaged and copied to the cdk-saga-orchestration\lambdas directory before deployment. Run these commands to process the PlaceOrderLambda function:
    cd PlaceOrderLambda/src/PlaceOrderLambda 
    dotnet lambda package
    cp bin/Release/netcoreapp3.1/PlaceOrderLambda.zip ../../../../cdk-saga-orchestration/lambdas
    
  5. Repeat the same commands for all the Lambda functions in the saga-orchestration directory.
  6. Build the CDK code before deploying to the console:
    cd cdk-saga-orchestration/src/CdkSagaOrchestration
    dotnet build
    
  7. Install the aws-cdk package:
    npm install -g aws-cdk 
  8. The cdk synth command causes the resources defined in the application to be translated into an AWS CloudFormation template. The cdk deploy command deploys the stacks into your AWS account. Run:
    cd cdk-saga-orchestration
    cdk synth 
    cdk deploy
    
  9. CDK deploys the environment to AWS. You can monitor the progress using the CloudFormation console. The stack name is CdkSagaOrchestrationStack:
    CloudFormation console

The Step Functions configuration

The CDK creates the Step Functions workflow, DistributedTransactionOrchestrator. The following snippet defines the workflow with AWS CDK for .NET:

var stepDefinition = placeOrderTask
    .Next(new Choice(this, "Is order placed")
        .When(Condition.StringEquals("$.Status", "ORDER_PLACED"), updateInventoryTask
            .Next(new Choice(this, "Is inventory updated")
                .When(Condition.StringEquals("$.Status", "INVENTORY_UPDATED"),
                    makePaymentTask.Next(new Choice(this, "Is payment success")
                        .When(Condition.StringEquals("$.Status", "PAYMENT_COMPLETED"), successState)
                        .When(Condition.StringEquals("$.Status", "ERROR"), revertPaymentTask)))
                .When(Condition.StringEquals("$.Status", "ERROR"), waitState)))
        .When(Condition.StringEquals("$.Status", "ERROR"), failState));

Step Functions workflow

Compare the Amazon States Language definition of the state machine with the CDK definition above. Also observe the inputs and outputs for each step and how the conditions are configured. The steps with type Task call a Lambda function for processing. The steps with type Choice are decision-making steps that direct the workflow.

Setting up the DynamoDB table

The Orders and Inventory DynamoDB tables are created using AWS CDK. The following snippet creates a DynamoDB table with AWS CDK for .NET:

var inventoryTable = new Table(this, "Inventory", new TableProps
{
    TableName = "Inventory",
    PartitionKey = new Attribute
    {
        Name = "ItemId",
        Type = AttributeType.STRING
    },
    RemovalPolicy = RemovalPolicy.DESTROY
});

To add test data to the Inventory table:
  1. Open the DynamoDB console and select the Inventory table.
  2. Choose Create Item.
  3. Select Text, paste the following contents, then choose Save.
    {
      "ItemId": "ITEM001",
      "ItemName": "Soap",
      "ItemsInStock": 1000,
      "ItemStatus": ""
    }
    

    Create Item dialog

  4. Create two more items in the Inventory table:
    {
      "ItemId": "ITEM002",
      "ItemName": "Shampoo",
      "ItemsInStock": 500,
      "ItemStatus": ""
    }
    
    {
      "ItemId": "ITEM003",
      "ItemName": "Toothpaste",
      "ItemsInStock": 2000,
      "ItemStatus": ""
    }
    

The Lambda functions UpdateInventoryLambda and RevertInventoryLambda decrement and increment the ItemsInStock attribute value, respectively. The Lambda functions PlaceOrderLambda and UpdateOrderLambda insert and delete items in the Orders table. These are invoked by the saga orchestration workflow.

Triggering the saga orchestration workflow

The API Gateway endpoint, SagaOrchestratorAPI, is created using AWS CDK. To invoke the endpoint:

  1. From the API Gateway service page, select the SagaOrchestratorAPI:
    List of APIs
  2. Select Stages in the left menu panel:
    Stages menu
  3. Select the prod stage and copy the Invoke URL:
    Invoke URL
  4. From Postman, open a new tab. Select POST in the dropdown and enter the copied URL in the textbox. Move to the Headers tab and add a new header with the key ‘Content-Type’ and the value ‘application/json’:
    Postman configuration
  5. In the Body tab, enter the following input and choose Send.
    {
      "ItemId": "ITEM001",
      "CustomerId": "ABC/002",
      "MessageId": "",
      "FailAtStage": "None"
    }
    
  6. You see the output:
    Output
  7. Open the Step Functions console and view the execution. The graph inspector shows that the execution has completed successfully.
    Successful workflow execution
  8. Check the items in the Orders and Inventory DynamoDB tables. An item in the Orders table indicates that an order is placed, and the ItemsInStock value in the Inventory table has been deducted.
    Changes in DynamoDB tables
  9. To simulate the failure workflow in the saga orchestrator, send the following JSON as body in the Postman call. The FailAtStage parameter injects the failure in the workflow. Select Send in Postman after updating the Body:
    {
      "ItemId": "ITEM002",
      "CustomerId": "DEF/002",
      "MessageId": "",
      "FailAtStage": "UpdateInventory"
    }
    
  10. Open the Step Functions console to see the execution.
  11. While the function waits in the wait state, look at the items in the DynamoDB tables. A new item is added to the Orders table and the stock for Shampoo is deducted in the Inventory table.
    Changes in DynamoDB table
  12. Once the wait completes, the compensatory transaction steps are run:
    Workflow execution result
  13. In the graph inspector, select the Update Inventory step. On the right pane, select the Step output tab. The status is ERROR, which changes the control flow to run the compensatory transactions.
    Step output
  14. Look at the items in the DynamoDB table again. The data is now back to a consistent state, as the compensatory transactions have run to preserve data integrity:
    DynamoDB table items

The Step Functions workflow implements the saga orchestration pattern. It performs the coordination across distributed services and runs the transactions. It also performs compensatory transactions to preserve the data integrity.

Cleaning up

To avoid incurring additional charges, clean up all the resources that have been created. Run the following command from a terminal window. This deletes all the resources that were created as part of this example.

cdk destroy

Conclusion

This post showed how to implement the saga orchestration pattern using API Gateway, Step Functions, Lambda, DynamoDB, and .NET Core 3.1. This can help maintain data integrity in distributed transactions across multiple services. Step Functions makes it easier to implement the orchestration in the saga pattern.

To learn more about developing microservices on AWS, refer to the whitepaper on microservices. To learn more about the features, refer to the AWS CDK Features page.

Increasing performance of Java AWS Lambda functions using tiered compilation

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/increasing-performance-of-java-aws-lambda-functions-using-tiered-compilation/

This post is written by Mark Sailes, Specialist Solutions Architect, Serverless and Richard Davison, Senior Partner Solutions Architect.

The Operating Lambda: Performance optimization blog series covers important topics for developers, architects, and system administrators who are managing applications using AWS Lambda functions. This post explains how you can reduce the initialization time to start a new execution environment when using the Java-managed runtimes.

Lambda lifecycle

Many Lambda workloads are designed to deliver fast responses to synchronous or asynchronous workloads in a fraction of a second. Examples of these could be public APIs to deliver dynamic content to websites or a near-real time data pipeline doing small batch processing.

As usage of these systems increases, Lambda creates new execution environments. When a new environment is created and used for the first time, additional work is done to make it ready to process an event. This creates two different performance profiles: one with and one without the additional work.

To improve the response time, you can minimize the effect of this additional work. One way to reduce the time taken to create a new managed Java execution environment is to tune the JVM, which can be optimized specifically for workloads that do not have long execution durations.

One example of this is configuring a feature of the JVM called tiered compilation. From version 8 of the Java Development Kit (JDK), the two just-in-time compilers C1 and C2 have been used in combination. C1 is designed for use on the client side and to enable short feedback loops for developers. C2 is designed for use on the server side and to achieve higher performance after profiling.

Tiering is used to determine which compiler to use for each method to achieve better performance. The tiers are represented as five levels:

Tiering levels
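
For reference (this summary follows the standard HotSpot JVM documentation, not anything specific to Lambda), the five tiers are:

  – Level 0: interpreted code
  – Level 1: code compiled by C1 with full optimization and no profiling
  – Level 2: code compiled by C1 with invocation and back-edge counters
  – Level 3: code compiled by C1 with full profiling
  – Level 4: code compiled by C2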

Profiling has an overhead, and performance improvements are only achieved after a method has been invoked a number of times, the default being 10,000. Lambda customers wanting to achieve faster startup times can use level 1 with little risk of reducing warm start performance. The article “Startup, containers & Tiered Compilation” explains tiered compilation further.

For customers who are doing highly repetitive processing, this configuration might not be suitable. Applications that repeat the same code paths many times want the JVM to profile and optimize these paths. Concrete examples would be using Lambda to run Monte Carlo simulations or hash calculations. You can run the same simulations thousands of times, and JVM profiling can reduce the total execution time significantly.

Performance improvements

The example project is a Java 11-based application used to analyze the impact of this change. The application is triggered by Amazon API Gateway and then puts an item into Amazon DynamoDB. To compare the performance difference caused by this change, there is one Lambda function with the additional changes and one without. There are no other differences in the code.

Download the code for this example project from the GitHub repo: https://github.com/aws-samples/aws-lambda-java-tiered-compilation-example.

To install prerequisite software:

  1. Install the AWS CDK.
  2. Install Apache Maven, or use your preferred IDE.
  3. Build and package the Java application in the software folder:
    cd software/ExampleFunction/
    mvn package
  4. Zip the execution wrapper script:
    cd ../OptimizationLayer/
    ./build-layer.sh
    cd ../../
  5. Synthesize CDK. This previews changes to your AWS account before it makes them:
    cd infrastructure
    cdk synth
  6. Deploy the Lambda functions:
    cdk deploy --outputs-file outputs.json

The API Gateway endpoint URL is displayed in the output and saved in a file named outputs.json. The contents are similar to:

InfrastructureStack.apiendpoint = https://{YOUR_UNIQUE_ID_HERE}.execute-api.eu-west-1.amazonaws.com

Using Artillery to load test the changes

First, install prerequisites:

  1. Install jq and Artillery Core.
  2. Run the following two scripts from the /infrastructure directory:
    artillery run -t $(cat outputs.json | jq -r '.InfrastructureStack.apiendpoint') -v '{ "url": "/without" }' loadtest.yml
    
    artillery run -t $(cat outputs.json | jq -r '.InfrastructureStack.apiendpoint') -v '{ "url": "/with" }' loadtest.yml

Check the results using Amazon CloudWatch Insights

  1. Navigate to Amazon CloudWatch.
  2. Select Logs then Logs Insights.
  3. Select the following two log groups from the drop-down list:
    /aws/lambda/example-with-layer
    /aws/lambda/example-without-layer
  4. Copy the following query and choose Run query:
        filter @type = "REPORT"
        | parse @log /\d+:\/aws\/lambda\/example-(?<function>\w+)-\w+/
        | stats
        count(*) as invocations,
        pct(@duration, 0) as p0,
        pct(@duration, 25) as p25,
        pct(@duration, 50) as p50,
        pct(@duration, 75) as p75,
        pct(@duration, 90) as p90,
        pct(@duration, 95) as p95,
        pct(@duration, 99) as p99,
        pct(@duration, 100) as p100
        group by function, ispresent(@initDuration) as coldstart
        | sort by function, coldstart
    

    Query window

You see results similar to:

Query results

Here is a simplified table of the results:

Settings                                  Type         # of invocations   p90 (ms)   p95 (ms)   p99 (ms)
Default settings                          Cold start   754                5,212      5,338      5,517
Default settings                          Warm start   35,247             58         93         255
Tiered compilation stopping at level 1    Cold start   383                2,071      2,086      2,221
Tiered compilation stopping at level 1    Warm start   35,618             23         32         86

The results are from testing 120 concurrent requests over 5 minutes using an open-source software project called Artillery. You can find instructions on how to run these tests in the GitHub repo. The results show that for this application, cold starts for 90% of invocations improve by 3141 ms (60%). These numbers are specific for this application and your application may behave differently.

Using wrapper scripts for Lambda functions

Wrapper scripts are a feature available in Java 8 and Java 11 on Amazon Linux 2 managed runtimes. They are not available for the Java 8 on Amazon Linux 1 managed runtime.

To apply this optimization flag to Java Lambda functions, create a wrapper script and add it to a Lambda layer zip file. This script alters the JVM flags that Java is started with inside the execution environment.

#!/bin/sh
# The runtime passes its full command line; drop the path to the java binary
shift
# Stop tiered compilation at level 1 to reduce JVM startup time
export _JAVA_OPTIONS="-XX:+TieredCompilation -XX:TieredStopAtLevel=1"
# Start Java with the remaining runtime arguments
java "$@"

Read the documentation to learn how to create and share a Lambda layer.

Console walkthrough

This change can be configured using AWS Serverless Application Model (AWS SAM), the AWS Command Line Interface (AWS CLI), AWS CloudFormation, or from within the AWS Management Console.

Using the AWS Management Console:

  1. Navigate to the AWS Lambda console.
  2. Select Functions and choose the Lambda function to add the layer to.
    Lambda functions
  3. The Code tab is selected by default. Scroll down to the Layers panel.
  4. Select Add a layer.
    Add a layer
  5. Select Custom layers and choose your layer.
    Add layer
  6. Select the Version. Choose Add.
  7. From the menu, select the Configuration tab and Environment variables. Choose Edit.
    Configuration tab
  8. Choose Add environment variable. Add the following:
    – Key: AWS_LAMBDA_EXEC_WRAPPER
    – Value: /opt/java-exec-wrapper
    Edit environment variables
  9. Choose Save.

You can verify that the changes are applied by invoking the function and viewing the log events. The log line Picked up _JAVA_OPTIONS: -XX:+TieredCompilation -XX:TieredStopAtLevel=1 is added.

Log events

Conclusion

Tiered compilation stopping at level 1 reduces the time the JVM spends optimizing and profiling your code. This can help reduce startup times for Java applications that require fast responses, where the workload doesn’t meet the requirements to benefit from profiling.

You can make further reductions in startup time using GraalVM. Read more about GraalVM and the Quarkus framework in the architecture blog. View the code example at https://github.com/aws-samples/aws-lambda-java-tiered-compilation-example to see how you can apply this to your Lambda functions.

For more serverless learning resources, visit Serverless Land.