Tag Archives: contributed

Translating content dynamically by using Amazon S3 Object Lambda

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/translating-content-dynamically-by-using-amazon-s3-object-lambda/

This post is written by Sandeep Mohanty, Senior Solutions Architect.

The recent launch of Amazon S3 Object Lambda creates many possibilities to transform data in S3 buckets dynamically. S3 Object Lambda can be used with other AWS serverless services to transform content stored in S3 in many creative ways. One example is using S3 Object Lambda with Amazon Translate to translate and serve content from S3 buckets on demand.

Amazon Translate is a serverless machine translation service that delivers fast and customizable language translation. With Amazon Translate, you can localize content such as websites and applications to serve a diverse set of users.

Using S3 Object Lambda with Amazon Translate, you do not need to translate content in advance for every combination of source and target language. Instead, you can transform content in near-real time using a data-driven model. This can serve multiple language-specific applications simultaneously.

S3 Object Lambda enables you to process and transform data using Lambda functions as objects are being retrieved from S3 by a client application. S3 GET object requests invoke the Lambda function and you can customize it to transform the content to meet specific requirements.

For example, if you run a website or mobile application with global visitors, you must provide translations in multiple languages. Artifacts such as forms, disclaimers, or product descriptions can be translated to serve a diverse global audience using this approach.

Solution architecture

This is the high-level architecture diagram for the example application that translates dynamic content on demand:

Solution architecture

In this example, you create an S3 Object Lambda that intercepts S3 GET requests for an object. It then translates the file to a target language, passed as an argument appended to the S3 object key. At a high level, the steps can be summarized as follows:

  1. Create a Lambda function to translate data from a source language to a target language using Amazon Translate.
  2. Create an S3 Object Lambda Access Point from the S3 console.
  3. Select the Lambda function created in step 1.
  4. Provide a supporting S3 Access Point to give S3 Object Lambda access to the original object.
  5. Retrieve a file from S3 by invoking the S3 GetObject API, and pass the Object Lambda Access Point ARN as the bucket name instead of the actual S3 bucket name.

Creating the Lambda function

In the first step, you create the Lambda function, DynamicFileTranslation. This Lambda function is invoked by an S3 GET Object API call and it translates the requested object. The target language is passed as an argument appended to the S3 object key, corresponding to the object being retrieved.

For example, the object key passed in the S3 GetObject API call is customized to look something like “ContactUs/contact-us.txt#fr”. The characters after the pound sign represent the code for the target language. In this case, ‘fr’ is French. The full list of supported languages and language codes is available in the Amazon Translate documentation.

This Lambda function dynamically translates the content of an object in S3 to a target language:

import boto3
from urllib.parse import urlparse, unquote
from pathlib import Path

def lambda_handler(event, context):
    print(event)

    # Extract the outputRoute and outputToken from the object context
    object_context = event["getObjectContext"]
    request_route = object_context["outputRoute"]
    request_token = object_context["outputToken"]

    # Extract the user requested URL and the supporting access point ARN
    user_request_url = event["userRequest"]["url"]
    supporting_access_point_arn = event["configuration"]["supportingAccessPointArn"]

    print("USER REQUEST URL: ", user_request_url)

    # The S3 object key is after the host name in the user request URL.
    # The user request URL looks something like this:
    # https://<User Request Host>/ContactUs/contact-us.txt#fr
    # The target language code in the S3 GET request is after the "#".
    user_request_url = unquote(user_request_url)
    result = user_request_url.split("#")
    user_request_url = result[0]
    targetLang = result[1]

    # Extract the S3 object key from the user requested URL
    s3Key = str(Path(urlparse(user_request_url).path).relative_to('/'))

    # Get the original object from S3.
    # To get the original object, use the supporting_access_point_arn as the bucket.
    s3 = boto3.resource('s3')
    s3Obj = s3.Object(supporting_access_point_arn, s3Key).get()
    srcText = s3Obj['Body'].read()
    srcText = srcText.decode('utf-8')

    # Translate the original text
    translateClient = boto3.client('translate')
    response = translateClient.translate_text(
        Text=srcText,
        SourceLanguageCode='en',
        TargetLanguageCode=targetLang)

    # Write the translated object back to S3 Object Lambda
    s3client = boto3.client('s3')
    s3client.write_get_object_response(
        Body=response['TranslatedText'],
        RequestRoute=request_route,
        RequestToken=request_token)

    return { 'statusCode': 200 }

The code in the Lambda function:

  • Extracts the outputRoute and outputToken from the object context. This defines where the WriteGetObjectResponse request is delivered.
  • Extracts the user-requested url from the event object.
  • Parses the S3 object key and the target language that is appended to the S3 object key.
  • Calls S3 GetObject to fetch the raw text of the source object.
  • Invokes Amazon Translate with the raw text extracted.
  • Returns the translated output to S3 Object Lambda using the WriteGetObjectResponse API, which delivers it to the caller.
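The key-parsing step in the list above is the part most likely to fail on unexpected input, so it can help to isolate it as a pure function. The following is a minimal sketch, not the code from the original post: the function name parse_translation_request is hypothetical, and it assumes the "#" usually arrives percent-encoded as %23 in the request URL path.

```python
from urllib.parse import unquote, urlparse


def parse_translation_request(user_request_url):
    """Split a user request URL into (object_key, target_language_code)."""
    parsed = urlparse(user_request_url)
    # Decode only the path, so a query string (for example on a presigned
    # URL) never leaks into the language code.
    path = unquote(parsed.path)
    # A literal, unencoded "#" ends up in the URL fragment instead.
    if parsed.fragment:
        path = f"{path}#{unquote(parsed.fragment)}"
    key, separator, target_lang = path.lstrip("/").partition("#")
    if not separator or not target_lang:
        raise ValueError("object key must end with '#<language code>'")
    return key, target_lang


if __name__ == "__main__":
    print(parse_translation_request(
        "https://example.invalid/ContactUs/contact-us.txt%23fr"))
```

Raising an explicit ValueError when the language code is missing makes the failure easier to diagnose in the function logs than an IndexError from indexing the split result.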

Configuring the Lambda IAM role

The Lambda function needs permissions to call back to the S3 Object Lambda access point with the WriteGetObjectResponse. It also needs permissions to call S3 GetObject and Amazon Translate. Add the following permissions to the Lambda execution role:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "S3ObjectLambdaAccess",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3-object-lambda:WriteGetObjectResponse"
            ],
            "Resource": [
                "<arn of your S3 access point/*>",
                "<arn of your Object Lambda accesspoint>"
            ]
        },
        {
            "Sid": "AmazonTranslateAccess",
            "Effect": "Allow",
            "Action": "translate:TranslateText",
            "Resource": "*"
        }
    ]
}
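If you generate this policy from a script rather than pasting it by hand, the same document can be built as a Python dictionary. This is only a sketch: build_execution_policy is a hypothetical helper, and both arguments are the resource strings you would otherwise type into the placeholders above.

```python
import json


def build_execution_policy(access_point_objects_arn, object_lambda_access_point_arn):
    """Return the execution role policy shown above as a dictionary."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "S3ObjectLambdaAccess",
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3-object-lambda:WriteGetObjectResponse",
                ],
                "Resource": [
                    access_point_objects_arn,
                    object_lambda_access_point_arn,
                ],
            },
            {
                "Sid": "AmazonTranslateAccess",
                "Effect": "Allow",
                "Action": "translate:TranslateText",
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder values; substitute the ARNs from your own account.
    policy = build_execution_policy(
        "<arn of your S3 access point/*>",
        "<arn of your Object Lambda accesspoint>",
    )
    print(json.dumps(policy, indent=4))
```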

Deploying the Lambda using an AWS SAM template

Alternatively, deploy the Lambda function with the IAM role by using an AWS SAM template. The code for the Lambda function and the AWS SAM template is available for download from GitHub.

Creating the S3 Access Point

  1. Navigate to the S3 console and create a bucket with a unique name.
  2. In the S3 console, select “Access Points” and choose “Create access point”. Enter a name for the access point.
  3. For Bucket name, enter the S3 bucket name you entered in step 1.

This access point is the supporting access point for the Object Lambda Access Point you create in the next step. Keep all other settings on this page as default.

After creating the S3 access point, create the S3 Object Lambda Access Point using the supporting Access Point. The Lambda function you created earlier uses the supporting Access Point to download the original untransformed objects from S3.

Create Object Lambda Access Point

In the S3 console, go to the Object Lambda Access Point configuration and create an Object Lambda Access Point. Enter a name.

Create Object Lambda Access Point

For the Lambda function configurations, associate this with the Lambda function created earlier. Select the latest version of the Lambda function and keep all other settings as default.

Select Lambda function

To understand how to use the other settings of the S3 Object Lambda configuration, refer to the product documentation.
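The console steps above can also be scripted. The following sketch uses the boto3 S3 Control operations create_access_point and create_access_point_for_object_lambda. The helper names, the "-olap" name suffix, and the choice to pass the client in as a parameter are illustrative assumptions, not part of the original walkthrough.

```python
def build_object_lambda_configuration(supporting_access_point_arn, function_arn):
    """Build the Configuration argument for an Object Lambda Access Point."""
    return {
        "SupportingAccessPoint": supporting_access_point_arn,
        "TransformationConfigurations": [
            {
                # Only GetObject requests are routed through the function.
                "Actions": ["GetObject"],
                "ContentTransformation": {
                    "AwsLambda": {"FunctionArn": function_arn}
                },
            }
        ],
    }


def create_access_points(s3control, account_id, bucket, name, function_arn):
    """Create the supporting access point, then the Object Lambda Access Point.

    s3control is expected to behave like boto3.client("s3control").
    Returns the ARN of the new Object Lambda Access Point.
    """
    supporting = s3control.create_access_point(
        AccountId=account_id, Name=name, Bucket=bucket
    )
    response = s3control.create_access_point_for_object_lambda(
        AccountId=account_id,
        Name=f"{name}-olap",  # hypothetical naming convention
        Configuration=build_object_lambda_configuration(
            supporting["AccessPointArn"], function_arn
        ),
    )
    return response["ObjectLambdaAccessPointArn"]
```

To run it against a real account, pass boto3.client("s3control") as the first argument. Taking the client as a parameter keeps the helper easy to exercise with a stub before it touches any AWS resources.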

Testing dynamic content translation

In this section, you create a Python script that invokes the S3 GetObject API twice: first against the S3 bucket, and then against the Object Lambda Access Point. You can then compare the output to see how content is transformed using Object Lambda:

  1. Upload a text file to the S3 bucket using the Object Lambda Access Point you configured. For example, upload a sample “Contact Us” file in English to S3.
  2. To use the Object Lambda Access Point, locate its ARN from the Properties tab of the Object Lambda Access Point.
  3. Create a local file called s3ol_client.py that contains the following Python script:
    import sys

    import boto3

    def main(argv):
        # Expect exactly one argument: the target language code
        if len(argv) != 1:
            print("Usage: s3ol_client.py <Target Language Code>")
            return 1

        targetLang = argv[0]
        print("TargetLang = ", targetLang)

        s3 = boto3.client('s3')
        s3Bucket = "my-s3ol-bucket"
        s3Key = "ContactUs/contact-us.txt"

        # Call get_object using the S3 bucket name
        response = s3.get_object(Bucket=s3Bucket, Key=s3Key)
        print("Original Content......\n")
        print(response['Body'].read().decode('utf-8'))

        print("\n")

        # Call get_object using the S3 Object Lambda access point ARN
        s3Bucket = "arn:aws:s3-object-lambda:us-west-2:123456789012:accesspoint/my-s3ol-access-point"
        s3Key = "ContactUs/contact-us.txt#" + targetLang
        response = s3.get_object(Bucket=s3Bucket, Key=s3Key)
        print("Transformed Content......\n")
        print(response['Body'].read().decode('utf-8'))

        return 0

    #********** Program Entry Point ***********
    if __name__ == '__main__':
        sys.exit(main(sys.argv[1:]))
    
  4. Run the client program from the command line, passing the target language code as an argument. The full list of supported languages and codes is available in the Amazon Translate documentation.
     python s3ol_client.py "fr"

The output looks like this:

Example output

The first output is the original content that was retrieved when calling GetObject with the S3 bucket name. The second output is the transformed content when calling GetObject against the Object Lambda access point. The content is transformed by the Lambda function as it is being retrieved, and translated to French in near-real time.

Conclusion

This blog post shows how you can use S3 Object Lambda with Amazon Translate to simplify dynamic content translation by using a data-driven approach. With user-provided data as arguments, you can dynamically transform content in S3 and generate a new object.

It is not necessary to create a copy of this new object in S3 before returning it to the client. We also saw that it is not necessary for the object with the same name to exist in the S3 bucket when using S3 Object Lambda. This pattern can be used to address several real world use cases that can benefit from the ability to transform and generate S3 objects on the fly.

For more serverless learning resources, visit Serverless Land.

Implementing a LIFO task queue using AWS Lambda and Amazon DynamoDB

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/implementing-a-lifo-task-queue-using-aws-lambda-and-amazon-dynamodb/

This post was written by Diggory Briercliffe, Senior IoT Architect.

When implementing a task queue, you can use Amazon SQS standard or FIFO (First-In-First-Out) queue types. Both queue types give priority to tasks created earlier over tasks that are created later. However, there are use cases where you need a LIFO (Last-In-First-Out) queue.

This post shows how to implement a serverless LIFO task queue. This uses AWS Lambda, Amazon DynamoDB, AWS Serverless Application Model (AWS SAM), and other AWS Serverless technologies.

The LIFO task queue gives priority to newer queue tasks over earlier tasks. Under heavy load, earlier tasks are deprioritized and eventually removed. This is useful when your workload must communicate with a system that is throughput-constrained and newer tasks should have priority.

To help understand the approach, consider the following use case. As part of optimizing the responsiveness of a mobile application, an IoT application validates device IP addresses after connecting to AWS IoT Core. Users open the application soon after the device connects so the most recent connection events should take priority for the validation work.

If the validation work is not done at connection time, it can be done later. A legacy system validates the IP addresses, but its throughput capacity cannot match the peak connection rate of the IoT devices. A LIFO queue can manage this load, by prioritizing validation of newer connection events. It can buffer or load shed earlier connection event validation.

For a more detailed discussion around insurmountable queue backlogs and queuing theory, read “Avoiding insurmountable queue backlogs” in the Amazon Builders’ Library.

Example application

An example application implementing the LIFO queue approach is available at https://github.com/aws-samples/serverless-lifo-queue-demonstration.

The application uses AWS SAM and the Lambda functions are written in Node.js. The AWS SAM template describes AWS resources required by the application. These include a DynamoDB table, Lambda functions, and Amazon SNS topics.

The README file contains instructions on deploying and testing the application, with detailed information on how it works.

Overview

The example application has the following queue characteristics:

  1. Newer queue tasks are prioritized over earlier tasks.
  2. Queue tasks are buffered if they cannot be processed.
  3. Queue tasks are eventually deleted if they are never processed, such as when the queue is under insurmountable load.
  4. Correct queue task state transition is maintained (such as PENDING to TAKEN, but not PENDING to SUCCESS).

A DynamoDB table stores queue task items. It uses the following DynamoDB features:

  • A global secondary index (GSI) sorts queue task items by a created timestamp, in reverse chronological (LIFO) order.
  • Update expressions and condition expressions provide atomic and exclusive queue task item updates. This prevents duplicate processing of queue tasks and ensures that the queue task state transitions are valid.
  • Time to live (TTL) deletes queue task items once they expire. Under insurmountable load, this ensures that tasks are deleted if they are never processed from the queue. It also deletes queue task items once they have been processed.
  • DynamoDB Streams invoke a Lambda function when new queue task items are inserted into the table and must be processed.

The application consists of the following resources defined in the AWS SAM template:

  • QueueTable: A DynamoDB table containing queue task items, which is configured for DynamoDB Streams to invoke a TriggerFunction.
  • TriggerFunction: A Lambda function, which governs triggering of queue task processing. Source code: app/trigger.js
  • ProcessTasksFunction: A Lambda function, which processes queue tasks and ensures consistent queue task state flow. Source code: app/process_tasks.js
  • CreateTasksFunction: A Lambda function, which inserts queue task items into the QueueTable. Source code: app/create_tasks.js
  • TriggerTopic: An SNS topic which TriggerFunction subscribes to.
  • ProcessTasksTopic: An SNS topic which ProcessTasksFunction subscribes to.

The following diagram illustrates how those resources interact to implement the LIFO queue.

LIFO Architecture diagram


  1. CreateTasksFunction inserts queue task items into QueueTable with PENDING state.
  2. A DynamoDB stream invokes TriggerFunction for all queue task item activity in QueueTable.
  3. TriggerFunction publishes a notification on ProcessTasksTopic if queue tasks should be processed.
  4. ProcessTasksFunction subscribes to ProcessTasksTopic.
  5. ProcessTasksFunction queries for PENDING queue task items in QueueTable for up to 1 minute, or until no PENDING queue task items remain.
  6. ProcessTasksFunction processes each PENDING queue task by calling the throughput constrained legacy system.
  7. ProcessTasksFunction updates each queue task item during processing to reflect state (first to TAKEN, and then to SUCCESS, FAILURE, or PENDING).
  8. ProcessTasksFunction publishes an SNS notification on TriggerTopic if PENDING tasks remain in the queue.
  9. TriggerFunction subscribes to TriggerTopic.

Application activity continues while DynamoDB Streams receives QueueTable events (2) or TriggerTopic receives notifications (9).

LIFO queue DynamoDB table

A DynamoDB table stores the LIFO queue task items. The AWS SAM template defines this resource (named QueueTable):

  • Each item in the table represents a queue task. It has the item attributes taskId (hash key), taskStatus, taskCreated, and taskUpdated.
  • The table has a single global secondary index (GSI) with taskStatus as the hash key and taskCreated as the range key. This GSI is fundamental to LIFO queue characteristics. It allows you to query for PENDING queue tasks, in reverse chronological order, so that the newest tasks can be processed first.
  • The DynamoDB TTL attribute causes earlier queue tasks to expire and be deleted. This prevents the queue from growing indefinitely if there is insurmountable load.
  • DynamoDB Streams invokes the TriggerFunction Lambda function for all changes in QueueTable.
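For readers who want to see the table shape outside the AWS SAM template, the bullets above correspond roughly to the following DynamoDB CreateTable parameters. This is a sketch under several assumptions that are not taken from the example application: on-demand billing, taskCreated stored as a number, the NEW_AND_OLD_IMAGES stream view type, and a TTL attribute named expiresAt. The index name matches the one queried later in this post.

```python
def build_queue_table_definition(table_name="QueueTable"):
    """Return CreateTable parameters for a LIFO queue table."""
    return {
        "TableName": table_name,
        "BillingMode": "PAY_PER_REQUEST",  # assumption
        "AttributeDefinitions": [
            {"AttributeName": "taskId", "AttributeType": "S"},
            {"AttributeName": "taskStatus", "AttributeType": "S"},
            {"AttributeName": "taskCreated", "AttributeType": "N"},  # assumption
        ],
        "KeySchema": [{"AttributeName": "taskId", "KeyType": "HASH"}],
        "GlobalSecondaryIndexes": [
            {
                # Status as hash key and creation time as range key lets a
                # query return PENDING tasks ordered by creation time.
                "IndexName": "task-status-created-index",
                "KeySchema": [
                    {"AttributeName": "taskStatus", "KeyType": "HASH"},
                    {"AttributeName": "taskCreated", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        "StreamSpecification": {
            "StreamEnabled": True,
            "StreamViewType": "NEW_AND_OLD_IMAGES",  # assumption
        },
    }


def build_ttl_specification(attribute_name="expiresAt"):
    """Return the TimeToLiveSpecification; the attribute name is hypothetical."""
    return {"Enabled": True, "AttributeName": attribute_name}
```

With a boto3 DynamoDB client, these would be passed as create_table(**build_queue_table_definition()) followed by update_time_to_live(TableName="QueueTable", TimeToLiveSpecification=build_ttl_specification()), because TTL is enabled in a separate call from table creation.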

Triggering queue task processing

The application continuously processes all PENDING queue tasks until there are none remaining. With no PENDING queue tasks, the application is idle.

As the application is serverless, task processing is triggered by events. If a single Lambda function cannot process the volume of PENDING tasks, the application notifies itself so that processing can continue in another invocation. This is a tail call, which is an SNS notification sent by ProcessTasksFunction to TriggerTopic.

The Lambda functions that collaborate on managing the LIFO queue are:

  • TriggerFunction is a proxy to ProcessTasksFunction and decides if task processing should be triggered. This function is invoked by DynamoDB Streams events on item changes in QueueTable or by a tail call SNS notification received from TriggerTopic.
  • ProcessTasksFunction performs the processing of queue tasks and implements the LIFO queue behavior. An SNS notification published on ProcessTasksTopic invokes this function.

Processing queue task items

The ProcessTasksFunction function processes queue tasks:

  1. The function is invoked by an SNS notification on ProcessTasksTopic.
  2. While the function runs, it polls QueueTable for PENDING queue tasks.
  3. The function processes each queue task and then updates the item.
  4. The function stops polling after 1 minute or if there are no PENDING queue tasks remaining.
  5. If there are more PENDING tasks in the queue, the function triggers another task. It sends a tail call SNS notification to TriggerTopic.

This uses DynamoDB expressions to ensure that tasks are not processed more than once during periods of concurrent function invocations. To prevent higher concurrency, the reserved concurrent executions attribute is set to 1.
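The example application is written in Node.js, but the control flow of the steps above is easy to express in a few lines of Python. The sketch below is not the application's code: the function and parameter names are hypothetical, and the three callables stand in for the DynamoDB query, the call to the legacy system, and the SNS publish to TriggerTopic.

```python
import time


def process_pending_tasks(fetch_pending, process_task, send_tail_call,
                          budget_seconds=60, clock=time.monotonic):
    """Process PENDING tasks until the queue drains or the time budget ends.

    fetch_pending() returns the newest PENDING tasks first (may be empty).
    process_task(task) handles one task and updates its state.
    send_tail_call() asks for another invocation to continue the work.
    Returns the number of tasks processed in this invocation.
    """
    deadline = clock() + budget_seconds
    processed = 0
    while clock() < deadline:
        batch = fetch_pending()
        if not batch:
            return processed  # queue drained, so no tail call is needed
        for task in batch:
            process_task(task)
            processed += 1
    # Time budget used up: hand over to a fresh invocation if work remains.
    if fetch_pending():
        send_tail_call()
    return processed
```

Injecting the clock makes the one-minute budget testable without waiting. The budget is only checked between batches, so a single slow batch can overrun it slightly.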

Before processing a queue task, the taskStatus item attribute is transitioned from PENDING to TAKEN. Following queue task processing, the taskStatus item attribute is transitioned from TAKEN to SUCCESS or FAILURE.

If a queue task cannot be processed (for example, an external system has reached capacity), the item taskStatus attribute is set to PENDING again. Any aging PENDING queue tasks that cannot be processed are buffered. They are eventually deleted once they expire, due to the TTL configuration.
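The state flow described in the last two paragraphs can be written down as a small transition table. This is an illustrative Python sketch rather than code from the example application. It treats SUCCESS and FAILURE as terminal states, which is an assumption based on the description above.

```python
# Allowed taskStatus transitions, keyed by the current state.
ALLOWED_TRANSITIONS = {
    "PENDING": {"TAKEN"},
    "TAKEN": {"SUCCESS", "FAILURE", "PENDING"},
    "SUCCESS": set(),  # assumed terminal
    "FAILURE": set(),  # assumed terminal
}


def can_transition(current_status, new_status):
    """Return True if a task may move from current_status to new_status."""
    return new_status in ALLOWED_TRANSITIONS.get(current_status, set())


if __name__ == "__main__":
    print(can_transition("PENDING", "TAKEN"))    # True
    print(can_transition("PENDING", "SUCCESS"))  # False
```

In the application itself, this rule is enforced by DynamoDB condition expressions rather than by in-process checks, so it also holds across concurrent invocations.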

Querying for queue task items

To get the most recently created PENDING queue tasks, query the task-status-created-index GSI. The following shows the DynamoDB query action request parameters for the task-status-created-index. By using a Limit of 10 and setting ScanIndexForward to false, it retrieves the 10 most recently created queue task items:

{
  "TableName": "QueueTable",
  "IndexName": "task-status-created-index",
  "ExpressionAttributeValues": {
    ":taskStatus": {
      "S": "PENDING"
    }
  },
  "KeyConditionExpression": "taskStatus = :taskStatus",
  "Limit": 10,
  "ScanIndexForward": false
}
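To make the effect of these parameters concrete, the following Python sketch reproduces the same selection in memory: filter on taskStatus, sort by taskCreated in descending order, and keep the first ten. It only models the query's ordering semantics for plain dictionaries and does not call DynamoDB. The function name newest_pending is hypothetical.

```python
def newest_pending(items, limit=10):
    """Return up to `limit` PENDING items, newest taskCreated first."""
    pending = [item for item in items if item["taskStatus"] == "PENDING"]
    # Equivalent to ScanIndexForward=false on the taskCreated range key.
    pending.sort(key=lambda item: item["taskCreated"], reverse=True)
    return pending[:limit]


if __name__ == "__main__":
    tasks = [
        {"taskId": "a", "taskStatus": "PENDING", "taskCreated": 100},
        {"taskId": "b", "taskStatus": "SUCCESS", "taskCreated": 200},
        {"taskId": "c", "taskStatus": "PENDING", "taskCreated": 300},
    ]
    print([task["taskId"] for task in newest_pending(tasks)])
```

Against the real table, the JSON parameters above would be passed to the DynamoDB Query action, for example with a boto3 client as query(**params).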

Updating queue task items

The following code shows request parameters for the DynamoDB UpdateItem action. This sets the taskStatus attribute of a queue task item (to TAKEN from PENDING). The update expression and condition expression ensure that the taskStatus is set (to TAKEN) only if the current value is as expected (from PENDING). It also ensures that the update is atomic. This prevents more-than-once processing of a queue task.

{
  "TableName": "QueueTable",
  "Key": {
    "taskId": {
      "S": "task-123"
    }
  },
  "UpdateExpression": "set taskStatus = :toTaskStatus, taskUpdated = :taskUpdated",
  "ConditionExpression": "taskStatus = :fromTaskStatus",
  "ExpressionAttributeValues": {
    ":fromTaskStatus": {
      "S": "PENDING"
    },
    ":toTaskStatus": {
      "S": "TAKEN"
    },
    ":taskUpdated": {
      "N": "1623241938151"
    }
  }
}
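When two invocations try to claim the same task, only one conditional update succeeds and the other is rejected with a ConditionalCheckFailedException. The following Python sketch shows one way to wrap that behavior. The helper names are hypothetical, and the client is assumed to behave like a boto3 DynamoDB client, which exposes the exception class as client.exceptions.ConditionalCheckFailedException.

```python
def build_transition_request(task_id, from_status, to_status, now_ms,
                             table_name="QueueTable"):
    """Build UpdateItem parameters for a guarded taskStatus change."""
    return {
        "TableName": table_name,
        "Key": {"taskId": {"S": task_id}},
        "UpdateExpression": "set taskStatus = :toTaskStatus, taskUpdated = :taskUpdated",
        # The update only applies if the item is still in the expected state.
        "ConditionExpression": "taskStatus = :fromTaskStatus",
        "ExpressionAttributeValues": {
            ":fromTaskStatus": {"S": from_status},
            ":toTaskStatus": {"S": to_status},
            ":taskUpdated": {"N": str(now_ms)},
        },
    }


def try_transition(dynamodb, task_id, from_status, to_status, now_ms):
    """Attempt the transition; return False if another worker got there first."""
    request = build_transition_request(task_id, from_status, to_status, now_ms)
    try:
        dynamodb.update_item(**request)
    except dynamodb.exceptions.ConditionalCheckFailedException:
        return False
    return True
```

A worker would call try_transition(client, "task-123", "PENDING", "TAKEN", now_ms) and skip the task when the result is False, since that means the task was already taken or has changed state.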

Conclusion

This post describes how to implement a LIFO queue with AWS Serverless technologies, using an example application to demonstrate the approach. Newer tasks in the queue are prioritized over earlier tasks. Tasks that cannot be processed are buffered and eventually load shed. This helps for use cases with heavy load and where newer queue tasks must take priority.

For more serverless learning resources, visit Serverless Land.

Using GitHub Actions to deploy serverless applications

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/using-github-actions-to-deploy-serverless-applications/

This post is written by Gopi Krishnamurthy, Senior Solutions Architect.

Continuous integration and continuous deployment (CI/CD) is one of the major DevOps components. This allows you to build, test, and deploy your applications rapidly and reliably, while improving quality and reducing time to market.

GitHub is an AWS Partner Network (APN) member with the AWS DevOps Competency. GitHub Actions is a GitHub feature that allows you to automate tasks within your software development lifecycle. You can use GitHub Actions to run a CI/CD pipeline to build, test, and deploy software directly from GitHub.

The AWS Serverless Application Model (AWS SAM) is an open-source framework for building serverless applications. It provides shorthand syntax to express functions, APIs, databases, and event source mappings. With a few lines per resource, you can define the application you want and model it using YAML.

During deployment, AWS SAM transforms and expands the AWS SAM syntax into AWS CloudFormation syntax, enabling you to build serverless applications faster. The AWS SAM CLI allows you to build, test, and debug applications locally, defined by AWS SAM templates. You can also use the AWS SAM CLI to deploy your applications to AWS. For AWS SAM example code, see the serverless patterns collection.

In this post, you learn how to create a sample serverless application using AWS SAM. You then use GitHub Actions to build and deploy the application in your AWS account.

New GitHub action setup-sam

A GitHub Actions runner is the application that runs a job from a GitHub Actions workflow. You can use a GitHub hosted runner, which is a virtual machine hosted by GitHub with the runner application installed. You can also host your own runners to customize the environment used to run jobs in your GitHub Actions workflows.

AWS has released a GitHub action called setup-sam to install AWS SAM, which is pre-installed on GitHub hosted runners. You can use this action to install a specific version of AWS SAM, or the latest version.

This demo uses AWS SAM to create a small serverless application using one of the built-in templates. When the code is pushed to GitHub, a GitHub Actions workflow triggers a GitHub CI/CD pipeline. This builds and deploys your code directly from GitHub to your AWS account.

Prerequisites

  1. A GitHub account: This post assumes you have the required permissions to configure GitHub repositories, create workflows, and configure GitHub secrets.
  2. Create a new GitHub repository and clone it to your local environment. For this example, create a repository called github-actions-with-aws-sam.
  3. An AWS account with permissions to create the necessary resources.
  4. Install AWS Command Line Interface (CLI) and AWS SAM CLI locally. This is separate from using the AWS SAM CLI in a GitHub Actions runner. If you use AWS Cloud9 as your integrated development environment (IDE), AWS CLI and AWS SAM are pre-installed.
  5. Create an Amazon S3 bucket in your AWS account to store the build package for deployment.
  6. An AWS user with access keys, which the GitHub Actions runner uses to deploy the application. The user also requires write access to the S3 bucket.

Creating the AWS SAM application

You can create a serverless application by defining all required resources in an AWS SAM template. AWS SAM provides a number of quick-start templates to create an application.

  1. Open a terminal, navigate to the parent of the cloned repository directory, and enter the following:
     sam init -r python3.8 -n github-actions-with-aws-sam --app-template "hello-world"
  2. When asked to select package type (zip or image), select zip.

This creates an AWS SAM application in the root of the repository named github-actions-with-aws-sam, using the default configuration. This consists of a single AWS Lambda Python 3.8 function invoked by an Amazon API Gateway endpoint.

To see additional runtimes supported by AWS SAM and options for sam init, enter sam init -h.

Local testing

AWS SAM allows you to test your applications locally. AWS SAM provides a default event in events/event.json that includes a message body of {\"message\": \"hello world\"}.

    1. Invoke the HelloWorldFunction Lambda function locally, passing the default event:
       sam local invoke HelloWorldFunction -e events/event.json
       The function response is:
       {"message": "hello world"}

    2. Test the API Gateway functionality in front of the Lambda function by first starting the API locally:
       sam local start-api
       AWS SAM launches a Docker container with a mock API Gateway endpoint listening on localhost:3000.

    3. Use curl to call the hello API:
       curl http://127.0.0.1:3000/hello
       The API response should be:
       {"message": "hello world"}
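Besides invoking the function through the AWS SAM CLI, the handler can be checked with a plain unit test, which is also the kind of step that could run in a CI/CD pipeline. The sketch below is an assumption about what the generated handler roughly looks like, not a copy of the template. In a real project you would import lambda_handler from the generated module instead of redefining it.

```python
import json


def lambda_handler(event, context):
    """Stand-in for the generated hello world handler (assumed shape)."""
    return {
        "statusCode": 200,
        "body": json.dumps({"message": "hello world"}),
    }


def test_lambda_handler():
    # A minimal API Gateway style event; only the body is populated here.
    event = {"body": json.dumps({"message": "hello world"})}
    response = lambda_handler(event, None)
    assert response["statusCode"] == 200
    assert json.loads(response["body"]) == {"message": "hello world"}


if __name__ == "__main__":
    test_lambda_handler()
    print("handler test passed")
```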

    Creating the sam-pipeline.yml file

    GitHub CI/CD pipelines are configured using a YAML file. This file configures what specific action triggers a workflow, such as push on main, and what workflow steps are required.

    In the root of the repository containing the files generated by sam init, create the directory: .github/workflows.

    1. Create a new file called sam-pipeline.yml under the .github/workflows directory.

       sam-pipeline.yml file

    2. Edit the sam-pipeline.yml file and add the following:

       on:
         push:
           branches:
             - main
       jobs:
         build-deploy:
           runs-on: ubuntu-latest
           steps:
             - uses: actions/checkout@v2
             - uses: actions/setup-python@v2
             - uses: aws-actions/setup-sam@v1
             - uses: aws-actions/configure-aws-credentials@v1
               with:
                 aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
                 aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
                 aws-region: ##region##
             # sam build
             - run: sam build --use-container

             # Run Unit tests- Specify unit tests here

             # sam deploy
             - run: sam deploy --no-confirm-changeset --no-fail-on-empty-changeset --stack-name sam-hello-world --s3-bucket ##s3-bucket## --capabilities CAPABILITY_IAM --region ##region##

    3. Replace ##s3-bucket## with the name of the S3 bucket previously created to store the deployment package.
    4. Replace both ##region## with your AWS Region.

    The configuration triggers the GitHub Actions CI/CD pipeline when code is pushed to the main branch. You can amend this if you are using another branch. For a full list of supported events, refer to the GitHub documentation.

    You can further customize the sam build --use-container command if necessary. By default, the Docker image used to create the build artifact is pulled from Amazon ECR Public. The default Python 3.8 image in this example is based on the language specified during sam init. To pull a different container image, use the --build-image option as specified in the documentation.

    The AWS CLI and AWS SAM CLI are installed in the runner using the GitHub action setup-sam. To install a specific version, use the version parameter.

    uses: aws-actions/setup-sam@v1
    with:
      version: 1.23.0

    As part of the CI/CD process, we recommend you scan your code for quality and vulnerabilities in bundled libraries. You can find these security offerings from our AWS Lambda Technology Partners.

    Configuring AWS credentials in GitHub

    The GitHub Actions CI/CD pipeline requires AWS credentials to access your AWS account. The credentials must include AWS Identity and Access Management (IAM) policies that provide access to Lambda, API Gateway, AWS CloudFormation, S3, and IAM resources.

    These credentials are stored as GitHub secrets within your GitHub repository, under Settings > Secrets. For more information, see “GitHub Actions secrets”.

    In your GitHub repository, create two secrets named AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY and enter the key values. We recommend following IAM best practices for the AWS credentials used in GitHub Actions workflows, including:

    • Do not store credentials in your repository code. Use GitHub Actions secrets to store credentials and redact credentials from GitHub Actions workflow logs.
    • Create an individual IAM user with an access key for use in GitHub Actions workflows, preferably one per repository. Do not use the AWS account root user access key.
    • Grant least privilege to the credentials used in GitHub Actions workflows. Grant only the permissions required to perform the actions in your GitHub Actions workflows.
    • Rotate the credentials used in GitHub Actions workflows regularly.
    • Monitor the activity of the credentials used in GitHub Actions workflows.

    Deploying your application

    Add all the files to your local git repository, commit the changes, and push to GitHub.

    git add .
    git commit -am "Add AWS SAM files"
    git push

    Once the files are pushed to GitHub on the main branch, this automatically triggers the GitHub Actions CI/CD pipeline as configured in the sam-pipeline.yml file.

    The GitHub actions runner performs the pipeline steps specified in the file. It checks out the code from your repo, sets up Python, and configures the AWS credentials based on the GitHub secrets. The runner uses the GitHub action setup-sam to install AWS SAM CLI.

    The pipeline triggers the sam build process to build the application artifacts, using the default container image for Python 3.8.

    sam deploy runs to configure the resources in your AWS account using the securely stored credentials.

    To view the application deployment progress, select Actions in the repository menu. Select the workflow run and select the job name build-deploy.

    GitHub Actions progress

    If the build fails, you can view the error message. Common errors are:

    • Incompatible software versions such as the Python runtime being different from the Python version on the build machine. Resolve this by installing the proper software versions.
    • Credentials could not be loaded. Verify that AWS credentials are stored in GitHub secrets.
    • Insufficient permissions. Ensure that your AWS credentials have the permissions needed to deploy the resources in the AWS SAM template, in addition to access to the S3 deployment bucket.

    Testing the application

    1. Within the workflow run, expand the Run sam deploy section.
    2. Navigate to the AWS SAM Outputs section. The HelloWorldAPI value shows the API Gateway endpoint URL deployed in your AWS account.
    AWS SAM outputs

    3. Use curl to test the API:
curl https://<api-id>.execute-api.us-east-1.amazonaws.com/Prod/hello/

The API response should be:
{"message": "hello world"}
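The same check can be scripted. The following is a minimal sketch, assuming the response body is the JSON shown above. The function name isHelloWorldResponse is hypothetical and not part of the sample application.

```javascript
// Hypothetical helper: checks that a response body matches the expected payload.
function isHelloWorldResponse(body) {
  try {
    const parsed = JSON.parse(body);
    return parsed.message === "hello world";
  } catch (err) {
    // A body that is not valid JSON (or is JSON null) cannot be the expected response.
    return false;
  }
}

console.log(isHelloWorldResponse('{"message": "hello world"}'));
```

In a real check, the body would come from an HTTP call to the endpoint listed in the AWS SAM outputs.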

Cleanup

To remove the application resources, navigate to the CloudFormation console and delete the stack. Alternatively, you can use an AWS CLI command to remove the stack:

aws cloudformation delete-stack --stack-name sam-hello-world

Empty and delete the S3 deployment bucket.

Conclusion

GitHub Actions is a GitHub feature that allows you to run a CI/CD pipeline to build, test, and deploy software directly from GitHub. AWS SAM is an open-source framework for building serverless applications.

In this post, you use GitHub Actions CI/CD pipeline functionality and AWS SAM to create, build, test, and deploy a serverless application. You use sam init to create a serverless application and test the functionality locally. You create a sam-pipeline.yml file to define the pipeline steps for GitHub Actions.

The GitHub action setup-sam installs AWS SAM on the GitHub-hosted runner. The GitHub Actions workflow uses sam build to create the application artifacts and sam deploy to deploy them to your AWS account.

For more serverless learning resources, visit https://serverlessland.com.

Announcing migration of the Java 8 runtime in AWS Lambda to Amazon Corretto

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/announcing-migration-of-the-java-8-runtime-in-aws-lambda-to-amazon-corretto/

This post is written by Jonathan Tuliani, Principal Product Manager, AWS Lambda.

What is happening?

Beginning July 19, 2021, the Java 8 managed runtime in AWS Lambda will migrate from the current Open Java Development Kit (OpenJDK) implementation to the latest Amazon Corretto implementation.

To reflect this change, the AWS Management Console will change how Java 8 runtimes are displayed. The display name for runtimes using the ‘java8’ identifier will change from ‘Java 8’ to ‘Java 8 on Amazon Linux 1’. The display name for runtimes using the ‘java8.al2’ identifier will change from ‘Java 8 (Corretto)’ to ‘Java 8 on Amazon Linux 2’. The ‘java8’ and ‘java8.al2’ identifiers themselves, as used by tools such as the AWS CLI, CloudFormation, and AWS SAM, will not change.

Why are you making this change?

This change enables customers to benefit from the latest innovations and extended support of the Amazon Corretto JDK distribution. Amazon Corretto is a no-cost, multiplatform, production-ready distribution of the OpenJDK. Corretto is certified as compatible with the Java SE standard and used internally at Amazon for many production services.

Amazon is committed to Corretto, and provides regular updates that include security fixes and performance enhancements. With this change, these benefits are available to all Lambda customers. For more information on improvements provided by Amazon Corretto 8, see Amazon Corretto 8 change logs.

How does this affect existing Java 8 functions?

Amazon Corretto 8 is designed as a drop-in replacement for OpenJDK 8. Most functions benefit seamlessly from the enhancements in this update without any action from you.

In rare cases, switching to Amazon Corretto 8 introduces compatibility issues. See below for known issues and guidance on how to verify compatibility in advance of this change.

When will this happen?

This migration to Amazon Corretto takes place in several stages:

  • June 15, 2021: Availability of Lambda layers for testing the compatibility of functions with the Amazon Corretto runtime. Start of AWS Management Console changes to java8 and java8.al2 display names.
  • July 19, 2021: Any new functions using the java8 runtime will use Amazon Corretto. If you update an existing function, it will transition to Amazon Corretto automatically. The public.ecr.aws/lambda/java:8 container base image is updated to use Amazon Corretto.
  • August 16, 2021: For functions that have not been updated since June 28, AWS will begin an automatic transition to the new Corretto runtime.
  • September 10, 2021: Migration completed.

These changes are only applied to functions not using the arn:aws:lambda:::awslayer:Java8Corretto or arn:aws:lambda:::awslayer:Java8OpenJDK layers described below.

Which of my Lambda functions are affected?

Lambda supports two versions of the Java 8 managed runtime: the java8 runtime, which runs on Amazon Linux 1, and the java8.al2 runtime, which runs on Amazon Linux 2. This change only affects functions using the java8 runtime. Functions using the java8.al2 runtime already use the Amazon Corretto implementation of Java 8 and are not affected.

The following command shows how to use the AWS CLI to list all functions in a specific Region using the java8 runtime. To find all such functions in your account, repeat this command for each Region:

aws lambda list-functions --function-version ALL --region us-east-1 --output text --query "Functions[?Runtime=='java8'].FunctionArn"
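The --query option filters the response on the client side. To illustrate what the expression selects, here is a minimal JavaScript sketch that applies the same filter to a list of function configurations. The sample data and the function name arnsForRuntime are hypothetical; only the Runtime and FunctionArn field names come from the command above.

```javascript
// Hypothetical helper mirroring the query
// Functions[?Runtime=='java8'].FunctionArn
function arnsForRuntime(functions, runtime) {
  return functions
    .filter((fn) => fn.Runtime === runtime)
    .map((fn) => fn.FunctionArn);
}

// Made-up sample data for illustration only.
const sample = [
  { FunctionArn: "arn:aws:lambda:us-east-1:111122223333:function:a", Runtime: "java8" },
  { FunctionArn: "arn:aws:lambda:us-east-1:111122223333:function:b", Runtime: "java8.al2" },
];

console.log(arnsForRuntime(sample, "java8"));
```

Note that the comparison is an exact match, so java8.al2 functions are excluded, which is the intended behavior here.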

What do I need to do?

If you are using the java8 runtime, your functions will be updated automatically. For production workloads, we recommend that you test functions in advance for compatibility with Amazon Corretto 8.

For Lambda functions using container images, the existing public.ecr.aws/lambda/java:8 container base image will be updated to use the Amazon Corretto Java implementation. You must manually update your functions to use the updated container base image.

How can I test for compatibility with Amazon Corretto 8?

If you are using the java8 managed runtime, you can test functions with the new version of the runtime by adding the layer reference arn:aws:lambda:::awslayer:Java8Corretto to the function configuration. This layer instructs the Lambda service to use the Amazon Corretto implementation of Java 8. It does not contain any data or code.

If you are using container images, update the JVM in your image to Amazon Corretto for testing. Here is an example Dockerfile:

FROM public.ecr.aws/lambda/java:8

# Update the JVM to the latest Corretto version
## Import the Corretto public key
RUN rpm --import https://yum.corretto.aws/corretto.key

## Add the Corretto yum repository to the system list
RUN curl -L -o /etc/yum.repos.d/corretto.repo https://yum.corretto.aws/corretto.repo

## Install the latest version of Corretto 8
RUN yum install -y java-1.8.0-amazon-corretto-devel

# Copy function code and runtime dependencies from Gradle layout
COPY build/classes/java/main ${LAMBDA_TASK_ROOT}
COPY build/dependency/* ${LAMBDA_TASK_ROOT}/lib/

# Set the CMD to your handler
CMD [ "com.example.LambdaHandler::handleRequest" ]

Can I continue to use the OpenJDK version of Java 8?

You can continue to use the OpenJDK version of Java 8 by adding the layer reference arn:aws:lambda:::awslayer:Java8OpenJDK to the function configuration. This layer tells the Lambda service to use the OpenJDK implementation of Java 8. It does not contain any data or code.

This option gives you more time to address any code incompatibilities with Amazon Corretto 8. We do not recommend using this option to stay on Lambda’s OpenJDK Java implementation in the long term. Following this migration, it will no longer receive bug fixes and security updates. After addressing any compatibility issues, remove this layer reference so that the function uses the Lambda-managed Amazon Corretto implementation of Java 8.

What are the known differences between OpenJDK 8 and Amazon Corretto 8 in Lambda?

Amazon Corretto caches TCP sessions for longer than OpenJDK 8. Functions that create new connections (for example, new AWS SDK clients) on each invoke without closing them may experience an increase in memory usage. In the worst case, this could cause the function to consume all the available memory, which results in an invoke error and a subsequent cold start.

We recommend that you do not create AWS SDK clients in your function handler on every function invocation. Instead, create SDK clients outside the function handler as static objects that can be used by multiple invocations. For more information, see static initialization in the Lambda Operator Guide.

If you must use a new client on every invocation, make sure it is shut down at the end of every invocation. This avoids TCP session caches using unnecessary resources.
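The reuse pattern has the same shape in any Lambda runtime. The sketch below uses JavaScript rather than Java purely for illustration, and createClient is a hypothetical stand-in for a real SDK client constructor. It shows a client created once outside the handler and shared by later invocations in the same execution environment.

```javascript
// Hypothetical stand-in for an SDK client constructor; counts how often it runs.
let clientsCreated = 0;
function createClient() {
  clientsCreated += 1;
  return { send: async (input) => ({ ok: true, input }) };
}

// Created once per execution environment, outside the handler.
const sharedClient = createClient();

// The handler reuses the shared client on every invocation.
async function handler(event) {
  return sharedClient.send(event);
}

// Three invocations share the one client created above.
Promise.all([handler(1), handler(2), handler(3)]).then(() => {
  console.log(clientsCreated);
});
```

In Java, the equivalent is a static field initialized outside the handler method.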

What if I need additional help?

Contact AWS Support, the AWS Lambda discussion forums, or your AWS account team if you have any questions or concerns.

For more serverless learning resources, visit Serverless Land.

Extending SaaS products with serverless functions

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/extending-saas-products-with-serverless-functions/

This post was written by Santiago Cardenas, Sr. Partner SA, and Nir Mashkowski, Principal Product Manager.

Increasingly, customers turn to software as a service (SaaS) solutions for the potential of lowering the total cost of ownership (TCO). This enables customers to focus their teams on business priorities instead of managing and maintaining software and infrastructure. Startups are building SaaS products for a wide variety of common application types to take advantage of these market needs.

As SaaS accelerates adoption, enterprise customers expect the same capabilities that are available with traditional, on-premises software. They want the ability to customize system behavior and use rich integrations that can help build solutions rapidly.

For customization and extensibility, many independent software vendors (ISVs) are building application programming interfaces (APIs) and integration hooks. To extend these capabilities, many SaaS builders expose a common set of APIs:

  • Event APIs emit events when SaaS entities change. Synchronous event APIs block the SaaS action until the API completes a request. Asynchronous event APIs are non-blocking and use mechanisms like pub/sub and webhooks to inform the caller of updates. Event APIs are used for many purposes, such as enriching incoming data or triggering workflows.
  • CRUD APIs allow developers to interact with entities within the SaaS product. They can be used by mobile or web clients to add, update, and remove records, for example.
  • Schema APIs allow developers to create data entities in the SaaS product, such as tables, key-value stores, or document repositories.
  • User experience (UX) components. Many SaaS products include an SDK that helps provide a consistent look-and-feel and built-in support for common functions, such as authentication. Components are sometimes delivered as code libraries or as an online API that renders the UI.
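
To make the difference between the two event API styles concrete, here is a minimal sketch. All names are hypothetical and it does not represent any particular SaaS product's API. A synchronous hook can enrich or block the action before it completes, while asynchronous subscribers are only notified afterward.

```javascript
// Hypothetical SaaS entity store with both kinds of event hooks.
class RecordStore {
  constructor() {
    this.records = [];
    this.syncHooks = [];   // run before the save completes and may reject it
    this.subscribers = []; // notified after the save; cannot block it
  }

  save(record) {
    // Synchronous event API: every hook must finish before the action does.
    let current = record;
    for (const hook of this.syncHooks) {
      current = hook(current); // a hook may enrich the record or throw to block it
    }
    this.records.push(current);

    // Asynchronous event API: queue notifications and return immediately.
    for (const notify of this.subscribers) {
      setImmediate(() => notify(current));
    }
    return current;
  }
}

const store = new RecordStore();
store.syncHooks.push((r) => ({ ...r, enriched: true }));
store.subscribers.push((r) => console.log("saved", r.id));
console.log(store.save({ id: 1 }));
```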

Business systems expose different subsets of the APIs based on the application domain. Extensibility models are built on top of those APIs and can take various forms. ISVs use these APIs to build features such as “no code” workflow engines, UX, and report generators. In those cases, the SaaS product runs a domain-specific language (DSL) where it controls compute, storage, and memory consumption.

Figure 1: Example of various APIs providing extensibility within a SaaS app

This level of customization is acceptable for many business users. However, more sophisticated customization requires the ability to write custom code. When coding is needed, some business systems choose to provide sandboxing for the user code within the service. Others choose to ask developers to host the extensibility model themselves.

The growth of vendor-hosted SaaS extensions

First-generation SaaS products essentially “lift and shift” on-premises enterprise software, where each customer has a copy of the entire stack. This single tenant model offers simplicity, a smaller blast radius, and faster time to market.

Newer, born-in-the-cloud SaaS products implement a multi-tenant approach, where all resources are shared across customers. This model may be easier to maintain but can present challenges for handling security, isolation, and resource allocation.

Multi-tenancy challenges are harder when customers can run custom code inside the SaaS infrastructure. To solve this, SaaS builders may start with a customer hosted approach, where customers implement their own extensions by consuming SaaS APIs. This means customers must learn and install an SDK, deploy, and maintain an app in their cloud. This often results in higher cost and slower time to market.

To simplify this model, SaaS builders are finding ways to allow developers to write code directly within the SaaS product. The event driven, pay-per-execution, and polyglot nature of serverless functions provides new capabilities for implementing SaaS extensibility. This model is called vendor hosted SaaS extensions.

SaaS builders are using AWS Lambda for serverless functions to provide flexible compute options to their customers. The goal is to abstract away and simplify the consumption model. AWS provides SaaS builders with features and controls to customize the execution environments as part of their own SaaS product. This allows SaaS owners more flexibility when deciding on isolation models, usability, and cost considerations.

Isolating tenant requests

Isolation of customer requests is important both at the product level and at the tenant level. Product-level isolation focuses on controlling and enforcing the access to data between tenants. It ensures that one tenant is separated from another tenant’s functions. Tenant-level isolation focuses on resources allocated to serve requests. These may include identity, network and internet access, file system access, and memory/CPU allocation.

Figure 2: Example of hierarchical levels of abstraction

Usability

SaaS product owners can allow customers to use familiar programming languages within the serverless functions. This allows customers to grow with the service and potentially host and scale independently, using their own infrastructure.

Usability considers the domain and industry of the product. For example, if the SaaS product enables data processing, it may enable invocation of serverless functions during these workflows. Additionally, these functions may provide the customer the context of the user, application, tenant, and the domain. A streamlined, opinionated deployment workflow that abstracts away initial configuration can also aid customer adoption.

Managing costs

Cost is an important factor in driving adoption. It’s an important differentiator to pay only for the resources used, while being able to scale in response to events. This can help reduce costs that are passed on to SaaS customers.

Examples of SaaS product extensibility

Multiple AWS Partners are extending their SaaS product using Lambda for on-demand scalable compute. This enables them to focus on enriching the customer experience that is associated with their business domain. Examples include:

  • Segment Functions, which seamlessly integrates as a source or destination. The service uses code snippets to allow customers to enrich data, enforce consistency, and connect to APIs and services that power their workflows.
  • Freshworks’ Neo platform provides extensibility using the concept of apps. These are powered by Lambda functions hosting the core business logic and backends. Apps are triggered by unplanned and scheduled Freshworks events (customer support tickets, IT service cases, contacts, and deal updates), in addition to app-specific and external events.
  • Netlify Functions enables customers to supercharge frontend code with functions in their development workflow. These can power automated triggers, connect to third-party APIs, or provide user authentication.

All of these SaaS partners abstract away the deployment, versioning, and configuration of custom code using Lambda.

Conclusion

As customers increasingly use SaaS solutions in their businesses, they want the same customization and extensibility available in on-premises solutions. SaaS partners have developed APIs and integration hooks to help address this need. For more sophisticated customization, products enable custom code to run within their SaaS workflows.

This presents SaaS partners with isolation, usability, and cost challenges and many of them are now using serverless functions to address these challenges. Lambda provides a pay-per-value compute service that scales automatically to meet customer demand. Segment Functions, Freshworks, and Netlify Functions have all used Lambda to provide extensibility to their customers.

Lambda continues to develop features and functionality to power the extensibility of SaaS products. We look forward to seeing the new ways you use Lambda to extend your SaaS product for your customers. Share your Lambda extensibility story with us at [email protected].

For more serverless learning resources, visit Serverless Land.

Building server-side rendering for React in AWS Lambda

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/building-server-side-rendering-for-react-in-aws-lambda/

This post is courtesy of Roman Boiko, Solutions Architect.

React is a popular front-end framework used to create single-page applications (SPAs). It is rendered and run on the client side in the browser. However, for SEO or performance reasons, you may need to render parts of a React application on the server. This is where server-side rendering (SSR) is useful.

This post introduces the concepts and demonstrates rendering a React application with AWS Lambda. To deploy this solution and to provision the AWS resources, I use the AWS Cloud Development Kit (CDK). This is an open-source framework, which helps you reduce the amount of code required to automate deployment.

Overview

This solution uses Amazon S3, Amazon CloudFront, Amazon API Gateway, AWS Lambda, and Lambda@Edge. It creates a fully serverless SSR implementation, which automatically scales according to the workload. This solution addresses three scenarios.

1. A static React app hosted in an S3 bucket with a CloudFront distribution in front of the website. The backend is running behind API Gateway, implemented as a Lambda function. Here, the application is fully downloaded to the client and rendered in a web browser. It sends requests to the backend.

SSR app 1

2. The React app is rendered with a Lambda function. The CloudFront distribution is configured to forward requests from the /ssr path to the API Gateway endpoint. This calls the Lambda function where the rendering is happening. While rendering the requested page, the Lambda function calls the backend API to fetch the data. It returns a static HTML page with all the data. This page may be cached in CloudFront to optimize subsequent requests.

SSR app 2

3. The React app is rendered with a Lambda@Edge function. This scenario is similar but rendering happens at edge locations. The requests to /edgessr are handled by the Lambda@Edge function. This sends requests to the backend and returns a static HTML page.

SSR app 3

Walkthrough

The example application shows how the preceding scenarios are implemented with the AWS CDK. This solution requires:

This solution deploys a Lambda@Edge function so it must be provisioned in the US East (N. Virginia) Region.

To get started, download and configure the sample:

  1. From a terminal, clone the GitHub repository:
    git clone https://github.com/aws-samples/react-ssr-lambda
  2. Provide a unique name for the S3 bucket, which is created by the stack and used for React application hosting. Change the placeholder <your bucket name> to your bucket name. To install the solution, run:
    cd react-ssr-lambda
    cd ./cdk
    npm install
    npm run build
    cdk bootstrap
    cdk deploy SSRApiStack --outputs-file ../simple-ssr/src/config.json
    
    cd ../simple-ssr
    npm install
    npm run build-all
    cd ../cdk
    cdk deploy SSRAppStack --parameters mySiteBucketName=<your bucket name>
  3. Note the following values from the output:
    • SSRAppStack.CFURL – the URL of the CloudFront distribution. Its root path returns the React application stored in S3.
    • SSRAppStack.LambdaSSRURL – the URL of the CloudFront /ssr distribution, which returns a page rendered by the Lambda function.
    • SSRAppStack.LambdaEdgeSSRURL – the URL of the CloudFront /edgessr distribution, which returns a page rendered by the Lambda@Edge function.
    Stack outputs
  4. In a browser, open each of the URLs from step 3. You see the same page with a different footer, indicating how it is rendered.
    Comparing the served pages

Understanding the React app

The application is created by the create-react-app utility. You can run and test this application locally by navigating to the simple-ssr directory and running the npm start command.

This small application consists of two components that render the list of products received from the backend. The App.js file sends the request, parses the result, and passes it to the component.

import React, { useEffect, useState } from "react";
import ProductList from "./components/ProductList";
import config from "./config.json";
import axios from "axios";

const App = ({ isSSR, ssrData }) => {
  const [err, setErr] = useState(false);
  const [result, setResult] = useState({ loading: true, products: null });
  useEffect(() => {
    const getData = async () => {
      const url = config.SSRApiStack.apiurl;
      try {
        let result = await axios.get(url);
        setResult({ loading: false, products: result.data });
      } catch (error) {
        setErr(error);
      }
    };
    getData();
  }, []);
  if (err) {
    return <div>Error {err.message}</div>;
  } else {
    return (
      <div>
        <ProductList result={result} />
      </div>
    );
  }
};

export default App;

Adding server-side rendering

To support SSR, I change the preceding application and move the rendering into several Lambda functions. Because this changes the way data is retrieved from the backend, I remove the data-fetching code from App.js. Instead, the data is retrieved in the Lambda function and injected into the application during the rendering process.

The new file SSRApp.js reflects these changes:

import React, { useState } from "react";
import ProductList from "./components/ProductList";

const SSRApp = ({ data }) => {
  const [result, setResult] = useState({ loading: false, products: data });
  return (
    <div>
      <ProductList result={result} />
    </div>
  );
};

export default SSRApp;

Next, I implement SSR logic in the Lambda function. For simplicity, I use React’s built-in renderToString method, which returns an HTML string. You can find the corresponding code in simple-ssr/src/server/index.js. The handler function fetches data from the backend, renders the React components, and injects them into the HTML template. It returns the response to API Gateway, which responds to the client.

const handler = async function (event) {
  try {
    const url = config.SSRApiStack.apiurl;
    const result = await axios.get(url);
    const app = ReactDOMServer.renderToString(<SSRApp data={result.data} />);
    const html = indexFile.replace(
      '<div id="root"></div>',
      `<div id="root">${app}</div>`
    );
    return {
      statusCode: 200,
      headers: { "Content-Type": "text/html" },
      body: html,
    };
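  } catch (error) {
    console.log(`Error ${error.message}`);
    // Return a proxy-style response so API Gateway can relay the failure.
    return {
      statusCode: 500,
      headers: { "Content-Type": "text/plain" },
      body: `Error ${error.message}`,
    };
  }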
};
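
One detail of the injection step is worth noting. When String.prototype.replace receives a string as its second argument, sequences such as $$ and $& in that string are treated as replacement patterns. If the rendered markup could contain such sequences (for example, in prices), passing a replacer function avoids the substitution. A minimal sketch, using the hypothetical helper name injectMarkup:

```javascript
// Hypothetical helper: inserts rendered markup into the root element of a template.
function injectMarkup(template, markup) {
  // A replacer function's return value is used verbatim, so "$$" or "$&"
  // inside the markup is not interpreted as a replacement pattern.
  return template.replace(
    '<div id="root"></div>',
    () => `<div id="root">${markup}</div>`
  );
}

const template = '<html><body><div id="root"></div></body></html>';
console.log(injectMarkup(template, "<p>Total: $$5</p>"));
```

If the template has no matching root element, the template is returned unchanged, so a caller may want to check for that case.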

For rendering the same code on Lambda@Edge, I change the code to work with CloudFront events and also modify the response format. This function searches for a specific path (/edgessr). All other logic stays the same. You can view the full code at simple-ssr/src/edge/index.js:

const handler = async function (event) {
  try {
    const request = event.Records[0].cf.request;
    if (request.uri === "/edgessr") {
      const url = config.SSRApiStack.apiurl;
      const result = await axios.get(url);
      const app = ReactDOMServer.renderToString(<SSRApp data={result.data} />);
      const html = indexFile.replace(
        '<div id="root"></div>',
        `<div id="root">${app}</div>`
      );
      return {
        status: "200",
        statusDescription: "OK",
        headers: {
          "cache-control": [
            {
              key: "Cache-Control",
              value: "max-age=100",
            },
          ],
          "content-type": [
            {
              key: "Content-Type",
              value: "text/html",
            },
          ],
        },
        body: html,
      };
    } else {
      return request;
    }
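  } catch (error) {
    console.log(`Error ${error.message}`);
    // Return a CloudFront-format response; a bare string is not a valid result.
    return {
      status: "500",
      statusDescription: "Internal Server Error",
      body: `Error ${error.message}`,
    };
  }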
};
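
The routing and response shape can be exercised locally without deploying. The sketch below isolates those two pieces. Here renderPage is a hypothetical stand-in for the axios call and renderToString, and the fake event contains only the fields this code reads.

```javascript
// Hypothetical stand-in for fetching data and rendering the React app.
async function renderPage() {
  return "<html><body>rendered</body></html>";
}

// Builds a response in the CloudFront header format used above.
function htmlResponse(body, maxAgeSeconds) {
  return {
    status: "200",
    statusDescription: "OK",
    headers: {
      "cache-control": [{ key: "Cache-Control", value: `max-age=${maxAgeSeconds}` }],
      "content-type": [{ key: "Content-Type", value: "text/html" }],
    },
    body,
  };
}

async function handler(event) {
  const request = event.Records[0].cf.request;
  if (request.uri !== "/edgessr") {
    return request; // pass through to the origin unchanged
  }
  return htmlResponse(await renderPage(), 100);
}

// Minimal fake events for a local check.
const fakeEvent = (uri) => ({ Records: [{ cf: { request: { uri } } }] });
handler(fakeEvent("/edgessr")).then((res) => console.log(res.status));
handler(fakeEvent("/index.html")).then((res) => console.log(res.uri));
```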

The create-react-app utility configures tools such as Babel and webpack for the client-side React application. However, it is not designed to work with SSR. To make the functions work as expected, I transpile these into CommonJS format in addition to transpiling React JSX files. The standard tool for this task is Babel. To add it to this project, I create the configuration file .babelrc.json with instructions to transpile the functions into Node.js v12 format:

{
  "presets": [
    [
      "@babel/preset-env",
      {
        "targets": {
          "node": 12
        }
      }
    ],
    "@babel/preset-react"
  ]
}

I also include all the dependencies. I use the popular frontend tool webpack, which also works with Lambda functions. It adds only the necessary dependencies and minimizes the package size. For this purpose, I create configurations for both functions. You can find these in the webpack.edge.js and webpack.server.js files:

const path = require("path");

module.exports = {
  entry: "./src/edge/index.js",

  target: "node",

  externals: [],

  output: {
    path: path.resolve("edge-build"),
    filename: "index.js",
    library: "index",
    libraryTarget: "umd",
  },

  module: {
    rules: [
      {
        test: /\.js$/,
        use: "babel-loader",
      },
      {
        test: /\.css$/,
        use: "css-loader",
      },
    ],
  },
};
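
The server configuration is not shown in this post. A plausible sketch follows, assuming it mirrors the edge configuration and differs only in the entry point and output directory. Both values are inferred from paths mentioned in this post (src/server/index.js and server-build), so check the repository for the actual file.

```javascript
const path = require("path");

// Assumed contents of webpack.server.js; the real file may differ.
const serverConfig = {
  entry: "./src/server/index.js",
  target: "node",
  externals: [],
  output: {
    path: path.resolve("server-build"),
    filename: "index.js",
    library: "index",
    libraryTarget: "umd",
  },
  module: {
    rules: [
      { test: /\.js$/, use: "babel-loader" },
      { test: /\.css$/, use: "css-loader" },
    ],
  },
};

module.exports = serverConfig;
```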

The result of running webpack is one file for each build. I use these files to deploy the Lambda and Lambda@Edge functions. To automate the build process, I add several scripts to package.json.

"build-server": "webpack --config webpack.server.js --mode=development",
"build-edge": "webpack --config webpack.edge.js --mode=development",
"build-all": "npm-run-all --parallel build build-server build-edge"

Launch the build by running npm run build-all.

Deploying the application

After the application successfully builds, I deploy to the AWS Cloud. I use AWS CDK for an infrastructure as code approach. The code is located in cdk/lib/ssr-stack.ts.

First, I create an S3 bucket for storing the static content and I pass the name of the bucket as a parameter. To ensure only CloudFront can access my S3 bucket, I use an origin access identity:

const mySiteBucketName = new CfnParameter(this, "mySiteBucketName", {
      type: "String",
      description: "The name of S3 bucket to upload react application"
    });

const mySiteBucket = new s3.Bucket(this, "ssr-site", {
      bucketName: mySiteBucketName.valueAsString,
      websiteIndexDocument: "index.html",
      websiteErrorDocument: "error.html",
      publicReadAccess: false,
      //only for demo not to use in production
      removalPolicy: cdk.RemovalPolicy.DESTROY
    });

new s3deploy.BucketDeployment(this, "Client-side React app", {
      sources: [s3deploy.Source.asset("../simple-ssr/build/")],
      destinationBucket: mySiteBucket
    });

const originAccessIdentity = new cloudfront.OriginAccessIdentity(
      this,
      "ssr-oia"
    );
    mySiteBucket.grantRead(originAccessIdentity);

I deploy the Lambda function from the build directory and configure an integration with API Gateway. I also note the API Gateway domain name for later use in the CloudFront distribution.

const ssrFunction = new lambda.Function(this, "ssrHandler", {
      runtime: lambda.Runtime.NODEJS_12_X,
      code: lambda.Code.fromAsset("../simple-ssr/server-build"),
      memorySize: 128,
      timeout: Duration.seconds(5),
      handler: "index.handler"
    });

const ssrApi = new apigw.LambdaRestApi(this, "ssrEndpoint", {
      handler: ssrFunction
    });

const apiDomainName = `${ssrApi.restApiId}.execute-api.${this.region}.amazonaws.com`;

I configure the Lambda@Edge function. It’s important to create a function version explicitly to use with CloudFront:

const ssrEdgeFunction = new lambda.Function(this, "ssrEdgeHandler", {
      runtime: lambda.Runtime.NODEJS_12_X,
      code: lambda.Code.fromAsset("../simple-ssr/edge-build"),
      memorySize: 128,
      timeout: Duration.seconds(5),
      handler: "index.handler"
    });

const ssrEdgeFunctionVersion = new lambda.Version(
      this,
      "ssrEdgeHandlerVersion",
      { lambda: ssrEdgeFunction }
    );

Finally, I configure the CloudFront distribution to communicate with all the origins:

const distribution = new cloudfront.CloudFrontWebDistribution(
      this,
      "ssr-cdn",
      {
        originConfigs: [
          {
            s3OriginSource: {
              s3BucketSource: mySiteBucket,
              originAccessIdentity: originAccessIdentity
            },
            behaviors: [
              {
                isDefaultBehavior: true,
                lambdaFunctionAssociations: [
                  {
                    eventType: cloudfront.LambdaEdgeEventType.ORIGIN_REQUEST,
                    lambdaFunction: ssrEdgeFunctionVersion
                  }
                ]
              }
            ]
          },
          {
            customOriginSource: {
              domainName: apiDomainName,
              originPath: "/prod",
              originProtocolPolicy: cloudfront.OriginProtocolPolicy.HTTPS_ONLY
            },
            behaviors: [
              {
                pathPattern: "/ssr"
              }
            ]
          }
        ]
      }
    );
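
To summarize how this distribution routes requests, here is a simplified sketch. It is an illustration only: the function name and origin labels are hypothetical, and real CloudFront path patterns support wildcards that this exact-match version ignores.

```javascript
// Simplified model of the two cache behaviors configured above.
const behaviors = [
  { pathPattern: "/ssr", origin: "api-gateway" },
];
const defaultBehavior = { origin: "s3-bucket", edgeFunction: "origin-request" };

// Returns the behavior that would handle a request path.
function selectBehavior(path) {
  const match = behaviors.find((b) => b.pathPattern === path);
  return match || defaultBehavior;
}

console.log(selectBehavior("/ssr").origin);
console.log(selectBehavior("/edgessr").origin);
console.log(selectBehavior("/index.html").origin);
```

Requests to /edgessr fall through to the default behavior, where the origin-request function intercepts them before they reach S3.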

The template is now ready for deployment. This approach allows you to use this code in your Continuous Integration and Continuous Delivery/Deployment (CI/CD) pipelines to automate deployments of your SSR applications. Also, you can create a CDK construct to reuse this code in different applications.

Cleaning up

To delete all the resources used in this solution, run:

cd react-ssr-lambda/cdk
cdk destroy SSRApiStack
cdk destroy SSRAppStack

Conclusion

This post demonstrates two ways you can implement and deploy a solution for server-side rendering in React applications, by using Lambda or Lambda@Edge.

It also shows how to use open-source tools and AWS CDK to automate the building and deployment of such applications.

For more serverless learning resources, visit Serverless Land.

Building a Jenkins Pipeline with AWS SAM

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/building-a-jenkins-pipeline-with-aws-sam/

This post is courtesy of Tarun Kumar Mall, SDE at AWS.

This post shows how to set up a multi-stage pipeline on a Jenkins host for a serverless application, using the AWS Serverless Application Model (AWS SAM).

Overview

This tutorial uses the Jenkins Pipeline plugin. A commit to the main branch of the repository starts the pipeline, which builds and deploys the application using the AWS SAM CLI. This tutorial deploys a small serverless API application called HelloWorldApi.

The pipeline consists of stages to build and deploy the application. Jenkins first ensures that the build environment is set up and installs any necessary tools. Next, Jenkins prepares the build artifacts. It promotes the artifacts to the next stage, where they are deployed to a beta environment using the AWS SAM CLI. Integration tests are run after deployment. If the tests pass, the application is deployed to the production environment.
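
The gating between stages described above can be modelled in a few lines. This is a minimal sketch and not how Jenkins is implemented: the stage labels and the runPipeline helper are hypothetical, and each stage is reduced to a function that reports success or failure.

```typescript
// A stage is a labelled step that reports success (true) or failure (false).
type PipelineStage = { label: string; run: () => boolean };

// Runs the stages in order and stops at the first failure, so later
// stages (such as the production deployment) never run after a failure.
function runPipeline(stages: PipelineStage[]): string[] {
  const completed: string[] = [];
  for (const stage of stages) {
    if (!stage.run()) {
      break; // a failed stage blocks promotion to the next one
    }
    completed.push(stage.label);
  }
  return completed;
}

// Hypothetical run in which the beta integration tests fail.
const completedStages = runPipeline([
  { label: "install", run: () => true },
  { label: "build", run: () => true },
  { label: "beta-deploy", run: () => true },
  { label: "beta-integ-test", run: () => false },
  { label: "prod-deploy", run: () => true },
]);
// completedStages is ["install", "build", "beta-deploy"]
console.log(completedStages);
```

Because the failing integration test stops the loop, the production deployment is never reached, which mirrors the promotion rule in the pipeline.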

CICD workflow diagram


The following prerequisites are required:

Setting up the backend application and development stack

Using AWS CloudFormation to define the infrastructure, you can create multiple environments or stacks from the same infrastructure definition. A “dev stack” is a copy of production infrastructure deployed to a developer account for testing purposes.

As serverless services use a pay-for-value model, it can be cost effective to use a high-fidelity copy of your production stack. Dev stacks are created by each developer as needed and deleted without having any negative impact on production.

For complex applications, it may not be feasible for every developer to have their own stack. However, for this tutorial, setting up the dev stack first for testing is recommended. Setting up a dev stack takes you through a manual process of how a stack is created. Later, this process is used to automate the setup using Jenkins.

To create a dev stack:

  1. Clone the backend application repository https://github.com/aws-samples/aws-sam-jenkins-pipeline-tutorial:
    git clone https://github.com/aws-samples/aws-sam-jenkins-pipeline-tutorial.git
  2. Build the application and run the guided deploy command:
    cd aws-sam-jenkins-pipeline-tutorial
    sam build
    sam deploy --guided

    AWS SAM guided deploy output


This sets up a development stack and saves the settings in the samconfig.toml file, under a configuration environment that is specific to each user. It also triggers a deployment.

  1. After deployment, make a small code change. For example, in the file hello-world/app.js change the message Hello world to Hello world from user <your name>.
  2. Deploy the updated code:
    sam build
    sam deploy --config-env <your_username>

With this command, each developer can create their own configuration environment. They can use this for deploying to their development stack and testing changes before pushing changes to the repository.

Once deployment finishes, the API endpoint is displayed in the console output. You can use this endpoint to make GET requests and test the API manually.

Deployment output


To update and run the integration test:

  1. Open the hello-world/tests/integ/test-integ-api.js file.
  2. Update the assert statement in line 32 to include <your name> from the previous step:
    it("verifies if response contains my username", async () => {
      assert.include(apiResponse.data.message, "<your name>");
    });
  3. Open package.json and add the integ-test line to the scripts section:
    {
      ...
      "scripts": {
        "test": "mocha tests/unit/",
        "integ-test": "mocha tests/integ/"
      }
      ...
    }
  4. From the terminal, run the following commands:
    cd hello-world
    npm install
    AWS_REGION=us-west-2 STACK_NAME=sam-app-user1-dev-stack npm run integ-test
    If you are using Microsoft Windows, instead run:
    cd hello-world
    npm install
    set AWS_REGION=us-west-2
    set STACK_NAME=sam-app-user1-dev-stack
    npm run integ-test

    Test results
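
The test run takes the Region and stack name from environment variables, which suggests that the test looks up the API endpoint from the stack outputs. The following is a minimal sketch of that lookup step only, under that assumption. The findOutput helper, the HelloWorldApi output key, and the sample values are hypothetical; OutputKey and OutputValue are the field names that CloudFormation uses for stack outputs.

```typescript
// Shape of one CloudFormation stack output (key/value pair).
type StackOutput = { OutputKey: string; OutputValue: string };

// Returns the value of the named output, or throws if it is missing.
function findOutput(outputs: StackOutput[], key: string): string {
  for (const output of outputs) {
    if (output.OutputKey === key) {
      return output.OutputValue;
    }
  }
  throw new Error(`Stack output ${key} not found`);
}

// Hypothetical outputs, standing in for a describe-stacks response.
const sampleOutputs: StackOutput[] = [
  {
    OutputKey: "HelloWorldFunction",
    OutputValue: "arn:aws:lambda:us-west-2:123456789012:function:hello",
  },
  {
    OutputKey: "HelloWorldApi",
    OutputValue: "https://example.execute-api.us-west-2.amazonaws.com/Prod/hello/",
  },
];

const apiEndpoint = findOutput(sampleOutputs, "HelloWorldApi");
console.log(apiEndpoint);
```

Throwing on a missing key makes a misnamed stack or output fail the test run early instead of producing a request to an undefined URL.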


You have deployed a fully configured development stack with working integration tests. To push the code to GitHub:

  1. Create a new repository in GitHub.
    1. From the GitHub account homepage, choose New.
    2. Enter a repository name and choose Create Repository.
    3. Copy the repository URL.
  2. From the root directory of the AWS SAM project, run:
    git init
    git add .
    git commit -m "first commit"
    git branch -M main
    git remote add origin <your-repository-url>
    git push -u origin main

Creating an IAM user for Jenkins

To create an IAM user for the Jenkins deployment:

  1. Sign in to the AWS Management Console and navigate to IAM.
  2. Select Users from the side navigation and choose Add user.
  3. Enter the User name as sam-jenkins-demo-credentials and grant Programmatic access to this user.
  4. On the next page, select Attach existing policies directly and choose Create Policy.
  5. Select the JSON tab and enter the following policy. Replace <YOUR_ACCOUNT_ID> with your AWS account ID:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "CloudFormationTemplate",
                "Effect": "Allow",
                "Action": [
                    "cloudformation:CreateChangeSet"
                ],
                "Resource": [
                    "arn:aws:cloudformation:*:aws:transform/Serverless-2016-10-31"
                ]
            },
            {
                "Sid": "CloudFormationStack",
                "Effect": "Allow",
                "Action": [
                    "cloudformation:CreateChangeSet",
                    "cloudformation:DeleteStack",
                    "cloudformation:DescribeChangeSet",
                    "cloudformation:DescribeStackEvents",
                    "cloudformation:DescribeStacks",
                    "cloudformation:ExecuteChangeSet",
                    "cloudformation:GetTemplateSummary"
                ],
                "Resource": [
                    "arn:aws:cloudformation:*:<YOUR_ACCOUNT_ID>:stack/*"
                ]
            },
            {
                "Sid": "S3",
                "Effect": "Allow",
                "Action": [
                    "s3:CreateBucket",
                    "s3:GetObject",
                    "s3:PutObject"
                ],
                "Resource": [
                    "arn:aws:s3:::*/*"
                ]
            },
            {
                "Sid": "Lambda",
                "Effect": "Allow",
                "Action": [
                    "lambda:AddPermission",
                    "lambda:CreateFunction",
                    "lambda:DeleteFunction",
                    "lambda:GetFunction",
                    "lambda:GetFunctionConfiguration",
                    "lambda:ListTags",
                    "lambda:RemovePermission",
                    "lambda:TagResource",
                    "lambda:UntagResource",
                    "lambda:UpdateFunctionCode",
                    "lambda:UpdateFunctionConfiguration"
                ],
                "Resource": [
                    "arn:aws:lambda:*:<YOUR_ACCOUNT_ID>:function:*"
                ]
            },
            {
                "Sid": "IAM",
                "Effect": "Allow",
                "Action": [
                    "iam:AttachRolePolicy",
                    "iam:CreateRole",
                    "iam:DeleteRole",
                    "iam:DetachRolePolicy",
                    "iam:GetRole",
                    "iam:PassRole",
                    "iam:TagRole"
                ],
                "Resource": [
                    "arn:aws:iam::<YOUR_ACCOUNT_ID>:role/*"
                ]
            },
            {
                "Sid": "APIGateway",
                "Effect": "Allow",
                "Action": [
                    "apigateway:DELETE",
                    "apigateway:GET",
                    "apigateway:PATCH",
                    "apigateway:POST",
                    "apigateway:PUT"
                ],
                "Resource": [
                    "arn:aws:apigateway:*::*"
                ]
            }
        ]
    }
  6. Choose Review Policy and add a policy name on the next page.
  7. Choose Create Policy.
  8. Return to the previous tab to continue creating the IAM user. Choose Refresh and search for the policy name you created. Select the policy.
  9. Choose Next Tags and then Review.
  10. Choose Create user and save the Access key ID and Secret access key.

Configuring Jenkins

To configure AWS credentials in Jenkins:

  1. On the Jenkins dashboard, go to Manage Jenkins > Manage Plugins and open the Available tab. Search for the Pipeline: AWS Steps plugin and choose Install without restart.
  2. Navigate to Manage Jenkins > Manage Credentials > Jenkins (global) > Global Credentials > Add Credentials.
  3. Select Kind as AWS credentials and use the ID sam-jenkins-demo-credentials.
  4. Enter the access key ID and secret access key and choose OK.

    Jenkins credential configuration


  5. Create Amazon S3 buckets for each Region in the pipeline. S3 bucket names must be unique within a partition:
    aws s3 mb s3://sam-jenkins-demo-us-west-2-<your_name> --region us-west-2
    aws s3 mb s3://sam-jenkins-demo-us-east-1-<your_name> --region us-east-1
  6. Create a file named Jenkinsfile at the root of the project and add:
    pipeline {
      agent any
     
      stages {
        stage('Install sam-cli') {
          steps {
            sh 'python3 -m venv venv && venv/bin/pip install aws-sam-cli'
            stash includes: '**/venv/**/*', name: 'venv'
          }
        }
        stage('Build') {
          steps {
            unstash 'venv'
            sh 'venv/bin/sam build'
            stash includes: '**/.aws-sam/**/*', name: 'aws-sam'
          }
        }
        stage('beta') {
          environment {
            STACK_NAME = 'sam-app-beta-stage'
            S3_BUCKET = 'sam-jenkins-demo-us-west-2-user1'
          }
          steps {
            withAWS(credentials: 'sam-jenkins-demo-credentials', region: 'us-west-2') {
              unstash 'venv'
              unstash 'aws-sam'
              sh 'venv/bin/sam deploy --stack-name $STACK_NAME -t template.yaml --s3-bucket $S3_BUCKET --capabilities CAPABILITY_IAM'
              dir ('hello-world') {
                sh 'npm ci'
                sh 'npm run integ-test'
              }
            }
          }
        }
        stage('prod') {
          environment {
            STACK_NAME = 'sam-app-prod-stage'
            S3_BUCKET = 'sam-jenkins-demo-us-east-1-user1'
          }
          steps {
            withAWS(credentials: 'sam-jenkins-demo-credentials', region: 'us-east-1') {
              unstash 'venv'
              unstash 'aws-sam'
              sh 'venv/bin/sam deploy --stack-name $STACK_NAME -t template.yaml --s3-bucket $S3_BUCKET --capabilities CAPABILITY_IAM'
            }
          }
        }
      }
    }
  7. Commit and push the code to the GitHub repository by running the following commands:
    git add Jenkinsfile
    git commit -m "Adding Jenkins pipeline config."
    git push -u origin main

Next, create a Jenkins Pipeline project:

  1. From the Jenkins dashboard, choose New Item, select Pipeline, and enter the project name sam-jenkins-demo-pipeline.

    Jenkins Pipeline creation wizard


  2. Under Build Triggers, select Poll SCM and enter * * * * *. This polls the repository for changes every minute.

    Jenkins build triggers configuration


  3. Under the Pipeline section, select Definition as Pipeline script from SCM.
    • Select Git under SCM and enter the repository URL.
    • Set Branches to build to */main.
    • Set the Script Path to Jenkinsfile.

      Jenkins pipeline configuration


  4. Save the project.

After the build finishes, you see the pipeline:

Jenkins pipeline stages


Review the logs for the beta stage to check that the integration test completed successfully.

Jenkins stage logs


Conclusion

This tutorial uses a Jenkins Pipeline to add an automated CI/CD pipeline to an AWS SAM-generated example application. Jenkins automatically builds, tests, and deploys the changes after each commit to the repository.

Using Jenkins, developers can gain the benefits of continuous integration and continuous deployment of serverless applications to the AWS Cloud with minimal configuration.

For more information, see the Jenkins Pipeline and AWS Serverless Application Model documentation.

We want to hear your feedback! Is this approach useful for your organization? Do you want to see another implementation? Contact us on Twitter @edjgeek or via comments!

Introducing message archiving and analytics for Amazon SNS

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/introducing-message-archiving-and-analytics-for-amazon-sns/

This blog post is courtesy of Sebastian Caceres (AWS Consultant, DevOps), Otavio Ferreira (Sr. Manager, Amazon SNS), Prachi Sharma and Mary Gao (Software Engineers, Amazon SNS).

Today, we are announcing the release of a message delivery protocol for Amazon SNS based on Amazon Kinesis Data Firehose. This is a new way to integrate SNS with storage and analytics services, without writing custom code.

SNS provides topics for push-based, many-to-many pub/sub messaging to help you decouple distributed systems, microservices, and event-driven serverless applications. As applications grow, so does the need to archive messages to meet compliance goals. These archives can also provide important operational and business insights.

Previously, custom code was required to create data pipelines, using general-purpose SNS subscription endpoints, such as Amazon SQS queues or AWS Lambda functions. You had to manage data transformation, data buffering, data compression, and the upload to data stores.

Overview

With the new native integration between SNS and Kinesis Data Firehose, you can send messages to storage and analytics services, using a purpose-built SNS subscription type.

Once you configure a subscription, messages published to the SNS topic are sent to the subscribed Kinesis Data Firehose delivery stream. The messages are then delivered to the destination endpoint configured in the delivery stream, which can be an Amazon S3 bucket, an Amazon Redshift table, or an Amazon Elasticsearch Service index.

You can also use a third-party service provider as the destination of a delivery stream, including Datadog, New Relic, MongoDB, and Splunk. No custom code is required to bridge the services. For more information, see Fanout to Kinesis Data Firehose streams, in the SNS Developer Guide.

Amazon SNS subscriber types with Amazon Kinesis Data Firehose.

The new Kinesis Data Firehose subscription type and its destinations are part of the application-to-application (A2A) messaging offering of SNS. The addition of this subscription type expands the SNS A2A offering to include the following use cases:

  • Run analytics on SNS messages, using Amazon Kinesis Data Analytics, Amazon Elasticsearch Service, or Amazon Redshift as a delivery stream destination. You can use this option to gain insights and detect anomalies in workloads.
  • Index and search SNS messages, using Amazon Elasticsearch Service as a delivery stream destination. From there, you can create dashboards using Kibana, a data visualization and exploration tool.
  • Store SNS messages for backup and auditing purposes, using S3 as a destination of choice. You can then use Amazon Athena to query the S3 bucket for analytics purposes.
  • Apply transformation to SNS messages. For example, you may obfuscate personally identifiable information (PII) or protected health information (PHI) using a Lambda function invoked by the delivery stream.
  • Feed SNS messages into cloud-based application monitoring and observability tools, using Datadog, New Relic, or Splunk as a destination. You can choose this option to enrich DevOps or marketing workflows.

As with all supported message delivery protocols, you can filter, monitor, and encrypt messages.

To simplify architecture and further avoid custom code, you can use an SNS subscription filter policy. This enables you to route only the relevant subset of SNS messages to the Kinesis Data Firehose delivery stream. For more information, see SNS message filtering.
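
The matching idea behind a filter policy can be sketched as follows. This is a simplified model that assumes exact string matching only; SNS filter policies support more operators than this. The matchesPolicy helper, the eventType attribute, and its values are hypothetical.

```typescript
// Simplified filter policy: attribute name -> list of accepted string values.
type FilterPolicy = { [attribute: string]: string[] };
// Simplified message attributes: attribute name -> string value.
type MessageAttributes = { [attribute: string]: string };

// A message matches when every attribute named in the policy is present
// on the message and its value is one of the accepted values.
function matchesPolicy(policy: FilterPolicy, attributes: MessageAttributes): boolean {
  for (const attribute of Object.keys(policy)) {
    const accepted = policy[attribute] || [];
    const value = attributes[attribute];
    if (value === undefined || accepted.indexOf(value) === -1) {
      return false;
    }
  }
  return true;
}

// Hypothetical policy: archive only sales and refunds.
const archivePolicy: FilterPolicy = { eventType: ["ticket_sold", "ticket_refunded"] };

const soldMatches = matchesPolicy(archivePolicy, { eventType: "ticket_sold" });
const heldMatches = matchesPolicy(archivePolicy, { eventType: "seat_held" });
const untaggedMatches = matchesPolicy(archivePolicy, {});
console.log(soldMatches, heldMatches, untaggedMatches); // true false false
```

In this model, only messages that match the policy would be routed to the delivery stream. Other subscriptions on the same topic are unaffected because each subscription carries its own policy.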

To monitor the throughput, you can check the NumberOfMessagesPublished and the NumberOfNotificationsDelivered metrics for SNS, and the IncomingBytes, IncomingRecords, DeliveryToS3.Records and DeliveryToS3.Success metrics for Kinesis Data Firehose. For additional information, see Monitoring SNS topics using CloudWatch and Monitoring Kinesis Data Firehose using CloudWatch.

For security purposes, you can choose to have data encrypted at rest, using server-side encryption (SSE), in addition to encrypted in transit using HTTPS. For more information, see SNS SSE, Kinesis Data Firehose SSE, and S3 SSE.

Applying SNS message archiving and analytics in a use case

For example, consider an airline ticketing platform that operates in a regulated environment. The compliance framework requires that the company archives all ticket sales for at least 5 years.

Example architecture of a flight ticket selling platform.

The platform is based on an event-driven serverless architecture. It has a ticket seller Lambda function that publishes an event to an SNS topic for every ticket sold. The SNS topic fans out the event to subscribed systems that are interested in processing this type of event. In the preceding diagram, two systems are interested: one focused on payment processing, and another on fraud control. Each subscribed system consists of an SQS queue and an event-processing Lambda function.

To meet the compliance goal on data retention, the airline company subscribes a Kinesis Data Firehose delivery stream to their existing SNS topic. They use an S3 bucket as the stream destination. After this, all events published to the SNS topic are archived in the S3 bucket.

The company can then use Athena to query the S3 bucket with standard SQL to run analytics and gain insights on ticket sales. For example, they can query for the most popular flight destinations or the most frequent flyers.

Subscribing a Kinesis Data Firehose stream to an SNS topic

You can set up a Kinesis Data Firehose subscription to an SNS topic using the AWS Management Console, the AWS CLI, or the AWS SDKs. You can also use AWS CloudFormation to automate the provisioning of these resources.

We use CloudFormation for this example. The provided CloudFormation template creates the following resources:

  • An SNS topic
  • An S3 bucket
  • A Kinesis Data Firehose delivery stream
  • A Kinesis Data Firehose subscription in SNS
  • Two SQS subscriptions in SNS
  • Two IAM roles with access to deliver messages:
    • From SNS to Kinesis Data Firehose
    • From Kinesis Data Firehose to S3

To provision the infrastructure, use the following template:

---
AWSTemplateFormatVersion: '2010-09-09'
Description: Template for creating an SNS archiving use case
Resources:
  ticketUploadStream:
    DependsOn:
    - ticketUploadStreamRolePolicy
    Type: AWS::KinesisFirehose::DeliveryStream
    Properties:
      S3DestinationConfiguration:
        BucketARN: !Sub 'arn:${AWS::Partition}:s3:::${ticketArchiveBucket}'
        BufferingHints:
          IntervalInSeconds: 60
          SizeInMBs: 1
        CompressionFormat: UNCOMPRESSED
        RoleARN: !GetAtt ticketUploadStreamRole.Arn
  ticketArchiveBucket:
    Type: AWS::S3::Bucket
  ticketTopic:
    Type: AWS::SNS::Topic
  ticketPaymentQueue:
    Type: AWS::SQS::Queue
  ticketFraudQueue:
    Type: AWS::SQS::Queue
  ticketQueuePolicy:
    Type: AWS::SQS::QueuePolicy
    Properties:
      PolicyDocument:
        Statement:
          Effect: Allow
          Principal:
            Service: sns.amazonaws.com
          Action:
            - sqs:SendMessage
          Resource: '*'
          Condition:
            ArnEquals:
              aws:SourceArn: !Ref ticketTopic
      Queues:
        - !Ref ticketPaymentQueue
        - !Ref ticketFraudQueue
  ticketUploadStreamSubscription:
    Type: AWS::SNS::Subscription
    Properties:
      TopicArn: !Ref ticketTopic
      Endpoint: !GetAtt ticketUploadStream.Arn
      Protocol: firehose
      SubscriptionRoleArn: !GetAtt ticketUploadStreamSubscriptionRole.Arn
  ticketPaymentQueueSubscription:
    Type: AWS::SNS::Subscription
    Properties:
      TopicArn: !Ref ticketTopic
      Endpoint: !GetAtt ticketPaymentQueue.Arn
      Protocol: sqs
  ticketFraudQueueSubscription:
    Type: AWS::SNS::Subscription
    Properties:
      TopicArn: !Ref ticketTopic
      Endpoint: !GetAtt ticketFraudQueue.Arn
      Protocol: sqs
  ticketUploadStreamRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Sid: ''
          Effect: Allow
          Principal:
            Service: firehose.amazonaws.com
          Action: sts:AssumeRole
  ticketUploadStreamRolePolicy:
    Type: AWS::IAM::Policy
    Properties:
      PolicyName: FirehoseticketUploadStreamRolePolicy
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Effect: Allow
          Action:
          - s3:AbortMultipartUpload
          - s3:GetBucketLocation
          - s3:GetObject
          - s3:ListBucket
          - s3:ListBucketMultipartUploads
          - s3:PutObject
          Resource:
          - !Sub 'arn:aws:s3:::${ticketArchiveBucket}'
          - !Sub 'arn:aws:s3:::${ticketArchiveBucket}/*'
      Roles:
      - !Ref ticketUploadStreamRole
  ticketUploadStreamSubscriptionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Effect: Allow
          Principal:
            Service:
            - sns.amazonaws.com
          Action:
          - sts:AssumeRole
      Policies:
      - PolicyName: SNSKinesisFirehoseAccessPolicy
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
          - Action:
            - firehose:DescribeDeliveryStream
            - firehose:ListDeliveryStreams
            - firehose:ListTagsForDeliveryStream
            - firehose:PutRecord
            - firehose:PutRecordBatch
            Effect: Allow
            Resource:
            - !GetAtt ticketUploadStream.Arn

To test, publish a message to the SNS topic. After the delivery stream buffer interval of 60 seconds, the message appears in the destination S3 bucket. For information on message formats, see Amazon SNS message formats in Amazon Kinesis Data Firehose destinations.
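
The delay comes from the BufferingHints in the template: the delivery stream flushes to S3 when either the size threshold or the time threshold is reached, whichever happens first. A minimal sketch of that rule follows. The shouldFlush helper is hypothetical and is only a model of the behavior, not Kinesis Data Firehose code.

```typescript
// Mirrors the two BufferingHints values used in the template.
type BufferingHints = { intervalInSeconds: number; sizeInMBs: number };

// Returns true when either threshold has been reached.
function shouldFlush(
  bufferedBytes: number,
  secondsSinceLastFlush: number,
  hints: BufferingHints
): boolean {
  const sizeLimitBytes = hints.sizeInMBs * 1024 * 1024;
  return (
    bufferedBytes >= sizeLimitBytes ||
    secondsSinceLastFlush >= hints.intervalInSeconds
  );
}

// Values from the template: IntervalInSeconds 60, SizeInMBs 1.
const templateHints: BufferingHints = { intervalInSeconds: 60, sizeInMBs: 1 };

const smallAndEarly = shouldFlush(2048, 30, templateHints);       // false
const smallButDue = shouldFlush(2048, 60, templateHints);         // true
const fullBuffer = shouldFlush(1024 * 1024, 5, templateHints);    // true
console.log(smallAndEarly, smallButDue, fullBuffer);
```

A single small test message stays well under the size limit, so it is the 60-second interval that triggers delivery.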

Cleaning up

After testing, avoid incurring usage charges by deleting the resources you created during the walkthrough. If you used the CloudFormation template, delete all the objects from the S3 bucket before deleting the stack.

Conclusion

In this post, we show how SNS delivery to Kinesis Data Firehose enables you to integrate SNS with storage and analytics services. The example shows how to create an SNS subscription to use a Kinesis Data Firehose delivery stream to store SNS messages in an S3 bucket.

You can adapt this configuration for your needs for storage, encryption, data transformation, and data pipeline architecture. For more information, see Fanout to Kinesis Data Firehose streams in the SNS Developer Guide.

For details on pricing, see SNS pricing and Kinesis Data Firehose pricing. For more serverless learning resources, visit Serverless Land.

Discovering sensitive data in AWS CodeCommit with AWS Lambda

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/discovering-sensitive-data-in-aws-codecommit-with-aws-lambda-2/

This post is courtesy of Markus Ziller, Solutions Architect.

Today, git is a de facto standard for version control in modern software engineering. The workflows enabled by git’s branching capabilities are a major reason for this. However, with git’s distributed nature, it can be difficult to reliably remove changes that have been committed from all copies of the repository. This is problematic when secrets such as API keys have been accidentally committed into version control. The longer it takes to identify and remove secrets from git, the more likely that the secret has been checked out by another user.

This post shows a solution that automatically identifies credentials pushed to AWS CodeCommit in near-real-time. I also show three remediation measures that you can use to reduce the impact of secrets pushed into CodeCommit:

  • Notify users about the leaked credentials.
  • Lock the repository for non-admins.
  • Hard reset the CodeCommit repository to a healthy state.

I use the AWS Cloud Development Kit (CDK). This is an open source software development framework to model and provision cloud application resources. Using the CDK can reduce the complexity and amount of code needed to automate the deployment of resources.

Overview of solution

The services in this solution are AWS Lambda, AWS CodeCommit, Amazon EventBridge, and Amazon SNS. These services are part of the AWS serverless platform. They help reduce undifferentiated work around managing servers, infrastructure, and the parts of the application that add less value to your customers. With serverless, the solution scales automatically, has built-in high availability, and you only pay for the resources you use.

Solution architecture

This diagram outlines the workflow implemented in this blog:

  1. After a developer pushes changes to CodeCommit, it emits an event to an event bus.
  2. A rule defined on the event bus routes this event to a Lambda function.
  3. The Lambda function uses the AWS SDK for JavaScript to get the changes introduced by commits pushed to the repository.
  4. It analyzes the changes for secrets. If secrets are found, it publishes another event to the event bus.
  5. Rules associated with this event type then trigger invocations of three Lambda functions A, B, and C with information about the problematic changes.
  6. Each of the Lambda functions runs a remediation measure:
    • Function A sends out a notification to an SNS topic that informs users about the situation (A1).
    • Function B locks the repository by setting a tag with the AWS SDK (B2). It sends out a notification about this action (B1).
    • Function C runs git commands that remove the problematic commit from the CodeCommit repository (C2). It also sends out a notification (C1).

Walkthrough

The following walkthrough explains the required components, their interactions and how the provisioning can be automated via CDK.

For this walkthrough, you need:

Check out and deploy the sample stack:

  1. After completing the prerequisites, clone the associated GitHub repository by running the following command in a local directory:
    git clone git@github.com:aws-samples/discover-sensitive-data-in-aws-codecommit-with-aws-lambda.git
  2. Open the repository in a local editor and review the contents of cdk/lib/resources.ts, src/handlers/commits.ts, and src/handlers/remediations.ts.
  3. Follow the instructions in the README.md to deploy the stack.

The CDK will deploy resources for the following services in your account.

Using CodeCommit to manage your git repositories

The CDK creates a new empty repository called TestRepository and adds a tag RepoState with an initial value of ok. You later use this tag in the LockRepo remediation strategy to restrict access.

It also creates two IAM groups with one user in each. Members of the CodeCommitSuperUsers group are always able to access the repository, while members of the CodeCommitUsers group can only access the repository when the value of the tag RepoState is not locked.

I also import the CodeCommitSystemUser into the CDK. Since the user requires git credentials in a downloaded CSV file, it cannot be created by the CDK. Instead it must be created as described in the README file.

The following CDK code sets up all the described resources:

const TAG_NAME = "RepoState";

const superUsers = new Group(this, "CodeCommitSuperUsers", { groupName: "CodeCommitSuperUsers" });
superUsers.addUser(new User(this, "CodeCommitSuperUserA", {
    password: new Secret(this, "CodeCommitSuperUserPassword").secretValue,
    userName: "CodeCommitSuperUserA"
}));

const users = new Group(this, "CodeCommitUsers", { groupName: "CodeCommitUsers" });
users.addUser(new User(this, "User", {
    password: new Secret(this, "CodeCommitUserPassword").secretValue,
    userName: "CodeCommitUserA"
}));

const systemUser = User.fromUserName(this, "CodeCommitSystemUser", props.codeCommitSystemUserName);

const repo = new Repository(this, "Repository", {
    repositoryName: "TestRepository",
    description: "The repository to test this project out",
});
Tags.of(repo).add(TAG_NAME, "ok");

users.addToPolicy(new PolicyStatement({
    effect: Effect.ALLOW,
    actions: ["*"],
    resources: [repo.repositoryArn],
    conditions: {
        StringNotEquals: {
            [`aws:ResourceTag/${TAG_NAME}`]: "locked"
        }
    }
}));

superUsers.addToPolicy(new PolicyStatement({
    effect: Effect.ALLOW,
    actions: ["*"],
    resources: [repo.repositoryArn]
}));
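
The combined effect of the two policy statements can be modelled as a small decision function. This is a simplified sketch that covers only the two groups and the RepoState tag from this example. It is not a general IAM policy evaluator, and the canAccessRepository helper is hypothetical.

```typescript
// The two IAM groups created by the stack.
type RepoGroup = "CodeCommitSuperUsers" | "CodeCommitUsers";

// Models the two Allow statements: super users are always allowed,
// regular users only while the RepoState tag is not "locked".
function canAccessRepository(group: RepoGroup, repoStateTag: string): boolean {
  if (group === "CodeCommitSuperUsers") {
    return true; // statement without a condition
  }
  return repoStateTag !== "locked"; // StringNotEquals condition on the tag
}

console.log(canAccessRepository("CodeCommitUsers", "ok"));          // true
console.log(canAccessRepository("CodeCommitUsers", "locked"));      // false
console.log(canAccessRepository("CodeCommitSuperUsers", "locked")); // true
```

Changing a single tag value is therefore enough to lock out regular users without editing any policy.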

Using EventBridge to pass events between components

I use EventBridge, a serverless event bus, to connect the Lambda functions together. Many AWS services like CodeCommit are natively integrated into EventBridge and publish events about changes in their environment.

repo.onCommit is a higher-level CDK construct. It creates the required resources to invoke a Lambda function for every commit to a given repository. The created events rule looks like this:

EventBridge rule definition

Note that this event rule only matches commit events in TestRepository. To send commits of all repositories in that account to the inspecting Lambda function, remove the resources filter in the event pattern.

CodeCommit Repository State Change is a default event that is published by CodeCommit if changes are made to a repository. In addition, I define CodeCommit Security Event, a custom event, which Lambda publishes to the same event bus if secrets are discovered in the inspected code.

The sample below shows how you can set up Lambda functions as targets for both types of events.

const DETAIL_TYPE = "CodeCommit Security Event";
const eventBus = new EventBus(this, "CodeCommitEventBus", {
    eventBusName: "CodeCommitSecurityEvents"
});

repo.onCommit("AnyCommitEvent", {
    ruleName: "CallLambdaOnAnyCodeCommitEvent",
    target: new targets.LambdaFunction(commitInspectLambda)
});


new Rule(this, "CodeCommitSecurityEvent", {
    eventBus,
    enabled: true,
    ruleName: "CodeCommitSecurityEventRule",
    eventPattern: {
        detailType: [DETAIL_TYPE]
    },
    targets: [
        new targets.LambdaFunction(lockRepositoryLambda),
        new targets.LambdaFunction(raiseAlertLambda),
        new targets.LambdaFunction(forcefulRevertLambda)
    ]
});
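
For illustration, the custom event that the inspecting function publishes could be assembled as shown below. This is a sketch under assumptions, not the code from the repository. EventBusName, Source, DetailType, and Detail are the entry fields of the EventBridge PutEvents API. The Source value, the commitId field, and the buildSecurityEvent helper are hypothetical.

```typescript
// One entry for an EventBridge PutEvents request.
type SecurityEventEntry = {
  EventBusName: string;
  Source: string;
  DetailType: string;
  Detail: string; // JSON-encoded payload
};

// Builds the entry. DetailType must equal the value in the rule's
// event pattern, otherwise the remediation functions are not invoked.
function buildSecurityEvent(repositoryArn: string, commitId: string): SecurityEventEntry {
  return {
    EventBusName: "CodeCommitSecurityEvents",
    Source: "custom.codecommit.inspector", // hypothetical source name
    DetailType: "CodeCommit Security Event",
    Detail: JSON.stringify({ repositoryArn, commitId }),
  };
}

// Hypothetical repository ARN and commit ID.
const securityEvent = buildSecurityEvent(
  "arn:aws:codecommit:us-west-2:123456789012:TestRepository",
  "0123abcd"
);
console.log(securityEvent.DetailType); // CodeCommit Security Event
```

The remediation functions can then read the repository ARN from the event detail.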

Using Lambda functions to run remediation measures

AWS Lambda functions allow you to run code in response to events. The example defines four Lambda functions.

The commitInspectLambda function compares each commit with its predecessor to determine whether the commit introduces secrets. With the CDK, you can create a Lambda function with:

const myLambdaInCDK = new Function(this, "UniqueIdentifierRequiredByCDK", {
    runtime: Runtime.NODEJS_12_X,
    handler: "<handlerfile>.<function name>",
    code: Code.fromAsset(path.join(__dirname, "..", "..", "src", "handlers")),
    // See git repository for complete code
});

The code for this Lambda function uses the AWS SDK for JavaScript to fetch the details of the commit, the differences introduced, and the new content.

The code checks each modified file line by line with a regular expression that matches typical secret formats. In src/handlers/regex.json, I provide a few regular expressions that match common secrets. You can extend this with your own patterns.
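The line-by-line check described above can be sketched as follows. This is a minimal illustration, not the code from the repository: the pattern names, the two patterns, and the findSecrets helper are assumptions made for this example. The real patterns are defined in src/handlers/regex.json.

```typescript
// Hypothetical pattern set for illustration; the project keeps its own
// patterns in src/handlers/regex.json.
const secretPatterns: Record<string, RegExp> = {
    // AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    awsAccessKeyId: /AKIA[0-9A-Z]{16}/,
    // Generic "secret=<32 or more alphanumerics>" assignment (illustrative only).
    genericSecret: /secret\s*=\s*[A-Za-z0-9]{32,}/i,
};

interface Finding {
    lineNumber: number;  // 1-based line number within the file
    patternName: string; // key of the pattern that matched
}

// Checks every line of a file against every pattern and collects the matches.
function findSecrets(
    content: string,
    patterns: Record<string, RegExp> = secretPatterns
): Finding[] {
    const findings: Finding[] = [];
    content.split(/\r?\n/).forEach((line, index) => {
        Object.keys(patterns).forEach((patternName) => {
            // The patterns carry no "g" flag, so test() keeps no state between calls.
            if (patterns[patternName].test(line)) {
                findings.push({ lineNumber: index + 1, patternName });
            }
        });
    });
    return findings;
}
```

An empty result means the file passes the check. Any finding would lead to a CodeCommit Security Event being published.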

If a secret is discovered, a CodeCommit Security Event is published to the event bus. EventBridge then invokes all Lambda functions that are registered as targets for this event. This demo triggers three remediation measures.
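As a rough sketch, the custom event could be assembled as shown below before it is handed to the EventBridge PutEvents API. The bus name and detail type come from the CDK sample earlier in this post. The Source value, the fields of the detail payload (apart from repositoryArn, which lockRepositoryLambda reads), and the buildSecurityEventEntry helper are assumptions for illustration.

```typescript
const EVENT_BUS_NAME = "CodeCommitSecurityEvents";
const DETAIL_TYPE = "CodeCommit Security Event";
// Hypothetical source name; custom events must not use the reserved "aws." prefix.
const SOURCE = "custom.codecommit.security";

// Hypothetical payload shape for this example.
interface SecurityEventDetail {
    repositoryName: string;
    repositoryArn: string;
    commitId: string;
    findings: string[];
}

// Shape of a single entry in a PutEvents request.
interface PutEventsEntry {
    EventBusName: string;
    Source: string;
    DetailType: string;
    Detail: string; // PutEvents expects the detail as a JSON string
}

function buildSecurityEventEntry(detail: SecurityEventDetail): PutEventsEntry {
    return {
        EventBusName: EVENT_BUS_NAME,
        Source: SOURCE,
        DetailType: DETAIL_TYPE,
        Detail: JSON.stringify(detail),
    };
}
```

The resulting entry would be passed in the Entries array of a putEvents call. Because the rule matches on detailType only, every entry with this detail type reaches all three remediation functions.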

The raiseAlertLambda function uses the AWS SDK for JavaScript to send out a notification to all subscribers (that is, CodeCommit administrators) on an SNS topic. It takes no further action.

SNS.publish({
    TopicArn: <TOPIC_ARN>,
    Subject: `[ACTION REQUIRED] Secrets discovered in <repo>`,
    Message: `<Your message>`
})

Notification about secrets discovered in a commit in TestRepository

The lockRepositoryLambda function uses the AWS SDK for JavaScript to change the RepoState tag from ok to locked. This restricts access to members of the CodeCommitSuperUsers IAM group.

CodeCommit.tagResource({
    resourceArn: event.detail.repositoryArn,
    tags: {
        RepoState: "locked"
    }
})

In addition, the Lambda function uses SNS to send out a notification.

The forcefulRevertLambda function runs the following git commands:

git clone <repository>
git checkout <branch>
git reset --hard <previousCommitId>
git push origin <branch> --force

These commands reset the repository to the last accepted commit, by forcefully removing the respective commit from the git history of your CodeCommit repo. I advise you to handle this with care and only activate it on a real project if you fully understand the consequences of rewriting git history.
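One way to prepare these commands inside a Node.js function is to build them as argument arrays, which avoids interpolating repository or branch names into a shell string. The sketch below is an assumption about how this could be done, not the code from the repository. The buildRevertCommands helper and the workDir parameter are hypothetical.

```typescript
// Builds the argument lists for the four git invocations. Each list is meant
// to be passed to the git executable, for example with
// child_process.execFileSync("git", args) when git is available.
function buildRevertCommands(
    repositoryUrl: string,
    branch: string,
    previousCommitId: string,
    workDir: string // hypothetical local clone directory, such as a path under /tmp
): string[][] {
    return [
        ["clone", repositoryUrl, workDir],
        // "-C <dir>" makes git run as if it had been started in the clone directory.
        ["-C", workDir, "checkout", branch],
        ["-C", workDir, "reset", "--hard", previousCommitId],
        // Rewrites the remote branch history; use with care.
        ["-C", workDir, "push", "origin", branch, "--force"],
    ];
}
```

Keeping command construction separate from execution also makes it possible to log or test the commands before any history is rewritten.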

The Node.js v12 runtime for Lambda does not have git installed by default. You can add it by using the git-lambda2 Lambda layer. This allows you to run git commands from within the Lambda function.

Logs for the remediation measure Hard Reset

Finally, this Lambda function also sends out a notification. The complete code is available in the GitHub repo.

Using SNS to notify users

To notify users about secrets discovered and actions taken, you create an SNS topic and subscribe to it via email.

const topic = new Topic(this, "CodeCommitSecurityEventNotification", {
    displayName: "CodeCommitSecurityEventNotification",
});

topic.addSubscription(new subs.EmailSubscription(/* your email address */));

Testing the solution

You can test the deployed solution by running these two sets of commands. First, add a file with no credentials:

echo "Clean file - no credentials here" > clean_file.txt
git add clean_file.txt
git commit clean_file.txt -m "Adds clean_file.txt"
git push

Then add a file containing credentials:

SECRET_LIKE_STRING=$(cat /dev/urandom | env LC_CTYPE=C tr -dc 'a-zA-Z0-9' | fold -w 32 | head -n 1)
echo "secret=$SECRET_LIKE_STRING" > problematic_file.txt
git add problematic_file.txt
git commit problematic_file.txt -m "Adds secret-like string to problematic_file.txt"
git push

The first set of commands creates, commits, and pushes a harmless file, clean_file.txt, which passes the checks of commitInspectLambda. The second set creates, commits, and pushes problematic_file.txt, which matches the regular expressions and triggers the remediation measures.

Shortly afterward, you receive email notifications about the actions taken by the Lambda functions.

Cleaning up

To avoid incurring charges, delete the resources by running cdk destroy and confirming the deletion.

Conclusion

This post demonstrates how you can implement a solution to discover secrets in commits to AWS CodeCommit repositories. It also shows different strategies for remediating such findings.

The CDK code to set up all components is minimal and can be extended for remediation measures. The template is portable between Regions and uses serverless technologies to minimize cost and complexity.

For more serverless learning resources, visit Serverless Land.