Tag Archives: AWS Lambda

Updated timeframe for the upcoming AWS Lambda and AWS Lambda@Edge execution environment update

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/updated-timeframe-for-the-upcoming-aws-lambda-and-aws-lambdaedge-execution-environment-update/

On May 14th we announced an upcoming update to the AWS Lambda and AWS Lambda@Edge execution environments. In that announcement we shared that we are updating the execution environment to a more recent version of Amazon Linux. This newer execution environment brings updates that offer improvements in capabilities, performance, security, and updated packages that your application code might interface with. The previous post explained approaches to proactively testing against the new update, and methods to update your code to be compatible in the rare case you were impacted.

So far, many customers who have tested against the new execution environment by configuring the Opt-in mechanism have reported that their functions are not impacted. Those who were impacted have been able to follow the guidance on rebuilding any dependencies against the new execution environment and retesting their functions with success.

We also received feedback that customers wanted a longer timeframe for validation as well as more control over it, so based on this feedback we’ve decided to modify the timeframe in two ways.

The first phase, Begin Testing, will be extended by three weeks: instead of ending on May 21, it now ends on June 10. This gives you more time to test your functions with the Opt-in layer before any further changes to the platform take effect.

We are also splitting the second phase, originally called Update/Create, into two independent periods of time. The first, now referred to as the New Function Create phase, will be two weeks long; during this time, all newly created functions will use the new execution environment unless a delayed-update layer is configured. The second new phase, Existing Function Update, will be three weeks long; during this time, both newly created functions and existing functions that you update will use the new execution environment unless a delayed-update layer is configured.

The end result is that you now have 5 more weeks in total to test and potentially update your functions for this change before the General Update begins. As a reminder, starting at that time, all functions without a delayed-update layer configured will begin migrating to the new execution environment.

New update timeline

The following is the new timeline for the update, which is now broken up over five phases:

May 14, 2019 – Begin Testing: You can begin testing your functions for the new execution environment locally with AWS SAM CLI or using an Amazon EC2 instance running on Amazon Linux 2018.03. You can also proactively enable the new environment in AWS Lambda using the opt-in mechanism described in the original announcement post.
June 11, 2019 – New Function Create: All newly created functions will run on the new execution environment, unless they have a delayed-update layer configured.
June 25, 2019 – Existing Function Update: All newly created functions, as well as existing functions that you update, will run on the new execution environment, unless they have a delayed-update layer configured.
July 16, 2019 – General Update: Existing functions begin using the new execution environment on invoke, unless they have a delayed-update layer configured.
July 23, 2019 – Delayed Update End: All functions with a delayed-update layer configured start being migrated automatically.
July 29, 2019 – Migration End: All functions have been migrated over to the new execution environment.

Note that we have updated the original announcement post with this new timeline as well.

FAQ

We also wanted to take the chance to provide additional information on follow up questions customers have had about the update.

Q. How does this relate to the recent Node.js v10 runtime launch?
A. The Node.js v10 launch is unrelated and is not impacted by this change. The Node.js v10 runtime is based on Amazon Linux 2 as its execution environment. Please see the AWS Lambda Runtimes section in the documentation for more information.

Q. Does this update change the execution environment for other runtimes to run on Amazon Linux 2?
A. No, this update brings the execution environment to the latest Amazon Linux 1 distribution release. In the future, new runtimes will launch on Amazon Linux 2, but all existing runtimes will continue to run on Amazon Linux 1.

Q. Was this update related to the recent Intel Quarterly Security Release (QSR) 2019.1?
A. No, this update to the execution environment for Lambda and Lambda@Edge is unrelated to the Intel QSR. There is no action for Lambda or Lambda@Edge customers to take in relation to the QSR.

Next Steps

Your feedback greatly matters to us and we will continue to listen and learn from you. Please continue to contact us through AWS Support, the AWS forums, or AWS account teams.

ICYMI: Serverless Q1 2019

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/icymi-serverless-q1-2019/

Welcome to the fifth edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all of the most recent product launches, feature enhancements, blog posts, webinars, Twitch live streams, and other interesting things that you might have missed!

If you didn’t see them, check our previous posts for what happened in 2018:

So, what might you have missed this past quarter? Here’s the recap.

Amazon API Gateway

Amazon API Gateway improved the experience for publishing APIs on the API Gateway Developer Portal. In addition, we added features like a search capability, feedback mechanism, and SDK-generation capabilities.

Last year, API Gateway announced support for WebSockets. As of early February 2019, it is now possible to build WebSocket-enabled APIs via AWS CloudFormation and AWS Serverless Application Model (AWS SAM). The following diagram shows an example application.

API Gateway is also now supported in AWS Config. This feature enhancement allows API administrators to track changes to their API configuration automatically. With the power of AWS Config, you can automate alerts—and even remediation—with triggered Lambda functions.

In early January, API Gateway also announced a service level agreement (SLA) of 99.95% availability.

AWS Step Functions


AWS Step Functions added the ability to tag Step Function resources and provide access control with tag-based permissions. With this feature, developers can use tags to define access via AWS Identity and Access Management (IAM) policies.

In addition to tag-based permissions, Step Functions was one of 10 additional services to have support from the Resource Group Tagging API, which allows a single central point of administration for tags on resources.

In early February, Step Functions released the ability to develop and test applications locally using a local Docker container. This new feature helps you innovate faster by letting you iterate locally.

In late January, Step Functions joined the family of services offering SLAs with an SLA of 99.9% availability. They also increased their service footprint to include the AWS China (Ningxia) and AWS China (Beijing) Regions.

AWS SAM Command Line Interface

AWS SAM Command Line Interface (AWS SAM CLI) released the AWS Toolkit for Visual Studio Code and the AWS Toolkit for IntelliJ. These toolkits are open source plugins that make it easier to develop applications on AWS. The toolkits provide an integrated experience for developing serverless applications in Node.js (Visual Studio Code) as well as Java and Python (IntelliJ), with more languages and features to come.

The toolkits help you get started fast with built-in project templates that leverage AWS SAM to define and configure resources. They also include an integrated experience for step-through debugging of serverless applications and make it easy to deploy your applications from the integrated development environment (IDE).

AWS Serverless Application Repository

AWS Serverless Application Repository applications can now be published to the application repository using AWS CodePipeline. This allows you to update applications in the AWS Serverless Application Repository with a continuous integration and continuous delivery (CICD) process. The CICD process is powered by a pre-built application that publishes other applications to the AWS Serverless Application Repository.

AWS Event Fork Pipelines


AWS Event Fork Pipelines is now available in AWS Serverless Application Repository. AWS Event Fork Pipelines is a suite of nested open-source applications based on AWS SAM. You can deploy Event Fork Pipelines directly from AWS Serverless Application Repository into your AWS account. These applications help you build event-driven serverless applications by providing pipelines for common event-handling requirements.

AWS Cloud9


AWS Cloud9 announced that, in addition to Amazon Linux, you can now select Ubuntu as the operating system for your AWS Cloud9 environment. Before this announcement, you would have to stand up an Ubuntu server and connect AWS Cloud9 to the instance by using SSH. With native support for Ubuntu, you can take advantage of AWS Cloud9 features, such as instance lifecycle management for cost efficiency and preconfigured tooling environments.

AWS Cloud9 also added support for AWS CloudTrail, which allows you to monitor and react to changes made to your AWS Cloud9 environment.

Amazon Kinesis Data Analytics

Amazon Kinesis Data Analytics now supports CloudTrail logging. CloudTrail captures changes made to Kinesis Data Analytics and delivers the logs to an Amazon S3 bucket. This makes it easy for administrators to understand changes made to the application and who made them.

Amazon DynamoDB

Amazon DynamoDB removed the associated costs of DynamoDB Streams used in replicating data globally. Because global tables use streams to replicate data between Regions, this translates to cost savings for global tables. However, DynamoDB streaming costs remain the same for your applications reading from a replica table’s stream.

DynamoDB added the ability to switch encryption keys used to encrypt data. DynamoDB, by default, encrypts all data at rest. You can use the default encryption, the AWS-owned customer master key (CMK), or the AWS managed CMK to encrypt data. It is now possible to change between the AWS-owned CMK and the AWS managed CMK without having to modify code or applications.

Amazon DynamoDB Local, a local installable version of DynamoDB, has added support for transactional APIs, on-demand capacity, and as many as 20 global secondary indexes per table.

AWS Amplify


AWS Amplify added support for OAuth 2.0 Authorization Code Grant flows in the native (iOS and Android) and React Native libraries. Previously, you would have to use third-party libraries and handwritten logic to achieve these use cases.

Additionally, Amplify also launched the ability to perform instant cache invalidation and delta deployments on every code commit. To achieve this, Amplify creates unique references to all the build artifacts on each deploy. Amplify has also added the ability to detect and upload only modified artifacts at the time of release to help reduce deployment time.

Amplify also added features for multiple environments, custom resolvers, larger data models, and IAM roles, including multi-factor authentication (MFA).

AWS AppSync

AWS AppSync increased its availability footprint to the EU (London) Region.

Amazon Cognito

Amazon Cognito increased its service footprint to include the Canada (Central) Region. It also published an SLA of 99.9% availability.

Amazon Aurora

Amazon Aurora Serverless increases performance visibility by publishing logs to Amazon CloudWatch.

AWS CodePipeline


AWS CodePipeline announced support for deploying static files to Amazon S3. While this may not usually fall under the serverless blogs and announcements, if you’re a developer who builds single-page applications or hosts static websites, this makes your life easier. Your static site can now be part of your CICD process without custom coding.

Serverless Posts

January:

February:

March:

Tech talks

We hold several AWS Online Tech Talks covering serverless topics throughout the year, so look out for them in the Serverless section of the AWS Online Tech Talks page. Here are the three tech talks that we delivered in Q1:

Whitepapers

Security Overview of AWS Lambda: This whitepaper presents a deep dive into the Lambda service through a security lens. It provides a well-rounded picture of the service, which can be useful for new adopters, as well as deepening understanding of Lambda for current users. Read the full whitepaper.

Twitch

AWS Launchpad Santa Clara

There is always something going on at our Twitch channel! Be sure to follow us so you don’t miss anything! For information about upcoming broadcasts and recent livestreams, keep an eye on AWS on Twitch for more serverless videos and on the Join us on Twitch AWS page.

In other news


Twitch Series: Building Happy Little APIs

In April, we started a 13-week deep dive into building APIs on AWS as part of our Twitch Build On series. The Building Happy Little APIs series covers the common and not-so-common use cases for APIs on AWS and the features available to customers as they look to build secure, scalable, efficient, and flexible APIs.

Twitch series: Build on Serverless: Season 2


Join Heitor Lessa across 14 weeks, nearly every Wednesday from April 24 – August 7 at 8AM PST/11AM EST/3PM UTC. Heitor is live-building a full-stack, serverless airline-booking application using a bunch of services: Lambda, Amplify, API Gateway, Amazon Cognito, AWS SAM, CloudWatch, AWS AppSync, and others. See the episode guide and sign up for stream reminders.

2019 AWS Summits


The schedule is in full swing for the 2019 AWS Global Summits held in major cities around the world. These free events bring the cloud computing community together to connect, collaborate, and learn about AWS. They attract technologists from all industries and skill levels who want to discover how AWS can help them innovate quickly and deliver flexible, reliable solutions at scale. Get notified when to register and learn more at the AWS Global Summit Program website.

Still looking for more?

The Serverless landing page has lots of information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials. Check it out!

Upcoming updates to the AWS Lambda and AWS Lambda@Edge execution environment

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/upcoming-updates-to-the-aws-lambda-execution-environment/

AWS Lambda was first announced at AWS re:Invent 2014. Amazon CTO Werner Vogels highlighted that you need to run no servers, no instances, nothing; you just write your code. In 2016, we announced the launch of Lambda@Edge, which lets you run Lambda functions to customize content that CloudFront delivers, executing the functions in AWS locations closer to the viewer.

At AWS, we talk often about “shared responsibility” models. Essentially, those are the places where there is a handoff between what we as a technology provider offer you and what you as the customer are responsible for. In the case of Lambda and Lambda@Edge, one of the key things that we manage is the “execution environment.” The execution environment is what your code runs inside of. It is composed of the underlying operating system, system packages, the runtime for your language (if a managed one), and common capabilities like environment variables. From the customer standpoint, your primary responsibility is for your application code and configuration.

In this post, we outline an upcoming change to the execution environment for Lambda and Lambda@Edge functions for all runtimes with the exception of Node.js v10. As with any update, some functionality could be affected. We encourage you to read through this post to understand the changes and any actions that you might need to take.

Update overview

AWS Lambda and AWS Lambda@Edge run on top of the Amazon Linux operating system distribution and maintain updates to both the core OS and managed language runtimes. We are updating the Lambda execution environment AMI to version 2018.03 of Amazon Linux. This newer AMI brings updates that offer improvements in capabilities, performance, security, and updated packages that your application code might interface with.

This does not apply to the recently announced Node.js v10 runtime, which today runs on Amazon Linux 2.

The majority of functions will benefit seamlessly from the enhancements in this update without any action from you. However, in rare cases, package updates may introduce compatibility issues. Potentially impacted functions are those that contain libraries or application code compiled against specific underlying OS packages or other system libraries. If you are primarily using an AWS SDK or the AWS X-Ray SDK with no other dependencies, you will see no impact.

You have the following options in terms of next steps:

  • Take no action before the automatic update of the execution environment starting May 21 for all newly created/updated functions and June 11 for all existing functions.
  • Proactively test your functions against the new environment starting today.
  • Configure your functions to delay the execution environment update until June 18 to allow for a longer testing window.

In addition to the overall timeline for this change, this post also provides instructions on the following:

  • How to test your functions for this new execution environment locally and on Lambda/Lambda@Edge.
  • How to proactively update your functions.
  • How to extend the testing window by one week.

Update timeline

The following is the timeline for the update, which is broken up over four phases over the next several weeks:

May 14, 2019—Begin Testing: You can begin testing your functions for the new execution environment locally with AWS SAM CLI or using an Amazon EC2 instance running on Amazon Linux 2018.03. You can also proactively enable the new environment in AWS Lambda using the opt-in mechanism described later in this post.
May 21, 2019—Update/Create: All new function creates or function updates result in your functions running on the new execution environment.
June 11, 2019—General Update: Existing functions begin using the new execution environment on invoke unless they have a delayed-update layer configured.
June 18, 2019—Delayed Update End: All functions with a delayed-update layer configured start being migrated automatically.
June 24, 2019—Migration End: All functions have been migrated over to the new execution environment.

Recommended Approach


You only have to act if your application uses dependencies that are compiled to work on the previous execution environment. Otherwise, you can continue to deploy new and updated Lambda functions without needing to perform any other testing steps. For those who aren’t sure if their functions use such dependencies, we encourage you to do a new deployment of your functions and to test their functionality.

There are two options for when you can start testing your functions on the new execution environment:

  • You can begin testing today using the opt-in mechanism described later.
  • Starting May 21, a new deploy or update of your functions uses the new execution environment.

If you confirm that your functions would be affected by the new execution environment, you can begin re-compiling or building your dependencies using the new reference AMI for the execution environment today and then repeat the testing. The final step is to redeploy your applications any time after May 21 to use the new execution environment.

Building your dependencies and application for the new execution environment

Because we are basing the environment on an existing Amazon Linux AMI, you can start by building and testing your code against that AMI on EC2. With an EC2 instance running this AMI, you can compile and build your packages using your normal processes. For the list of AMI IDs in all public Regions, check the release notes. To start an EC2 instance running this AMI, follow the steps in the Launching an Instance Using the Launch Instance Wizard topic in the Amazon EC2 User Guide.
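
For illustration, here is a minimal sketch of how you might rebuild a native dependency on an EC2 instance launched from the reference AMI, assuming a Python 2.7 function and a hypothetical package name (mylib) standing in for your own dependency; adjust the package names and paths to your runtime and project.

# Run on an EC2 instance launched from the Amazon Linux 2018.03 reference AMI.
# "mylib" and the paths below are placeholders for your own dependency and project layout.
sudo yum -y groupinstall "Development Tools"            # compilers for native extensions
sudo yum -y install python27-devel python27-pip zip     # match the Python 2.7 Lambda runtime

mkdir -p ~/build/package
pip install mylib --target ~/build/package              # rebuild the dependency against this AMI

cp -r ~/my-function/* ~/build/package/                  # add your handler code
cd ~/build/package && zip -r ../function.zip .          # package for deployment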

Opt-in/Delayed-Update with Lambda layers

Some of you may want to begin testing as soon as you’ve read this announcement. Others know that they need to postpone until later in the timeline.

To give you some control over testing, we’re releasing two special Lambda layers. Lambda layers can be used to provide shared resources, code, or data between Lambda functions and can simplify the deployment and update process. These layers don’t actually contain any data or code. Instead, they act as a special flag to Lambda to run your function executions either specifically on the new or old execution environment.

The Opt-In layer allows you to start testing today. You can use the Delayed-Update layer when you know that you must make updates to your function or its configuration after May 21, but aren’t ready to deploy to the new execution environment. The Delayed-Update layer extends the initial period available to you to deploy your functions by one week, through June 17, without changing the execution environment.

Neither layer brings any performance or runtime changes beyond this. After June 24, the layers will have no functionality. In a future deployment, you should remove them from any function configurations.

The ARNs for the two scenarios:

  • To OPT-IN to the update to the new execution environment, add the following layer:

arn:aws:lambda:::awslayer:AmazonLinux1803

  • To DELAY THE UPDATE to the new execution environment until June 18, add the following layer:

arn:aws:lambda:::awslayer:AmazonLinux1703

The action for adding a layer to your existing functions requires an update to the Lambda function’s configuration. You can do this with the AWS CLI, AWS CloudFormation or AWS SAM, popular third-party frameworks, the AWS Management Console, or an AWS SDK.
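
For example, with the AWS CLI, adding the Opt-In layer could look like the following sketch; the function name is a placeholder, and because the --layers option overwrites the function’s existing layer list, include any layers that the function already uses.

# Opt a function in to the new execution environment (function name is a placeholder).
# Note: --layers replaces the function's current layer list, so list existing layers too.
aws lambda update-function-configuration \
    --function-name my-function \
    --layers arn:aws:lambda:::awslayer:AmazonLinux1803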

Validating your functions

There are several ways for you to test your function code and assure that it will work after the execution environment has been updated.

Local testing

We’re providing an update to the AWS SAM CLI to enable you to test your functions locally against this new execution environment. The AWS SAM CLI uses a Docker image that mirrors the live Lambda environment locally wherever you do development. To test against this new update, make sure that you have the most recent version of the AWS SAM CLI (0.16.0 or later). You also should have an AWS SAM template configured for your function.

  1. Install or update the AWS SAM CLI:
    $ pip install --upgrade aws-sam-cli

    -Or-

    $ pip install aws-sam-cli
  2. Confirm that you have a valid AWS SAM template:
    $ sam validate -t <template file name>

    If you don’t have a valid AWS SAM template, you can begin with a basic template to test your functions. The following example represents the basic needs for running your function against a variety of event payloads. The Runtime value must be one listed in the AWS Lambda Runtimes topic.

    AWSTemplateFormatVersion: 2010-09-09
    Transform: 'AWS::Serverless-2016-10-31'
    
    Resources:
      myFunction:
        Type: 'AWS::Serverless::Function'
        Properties:
          CodeUri: ./ 
          Handler: YOUR_HANDLER
          Runtime: YOUR_RUNTIME
  3. With a valid template, you can begin testing your function with mock event payloads. To generate a mock event payload, you can use the AWS SAM CLI local generate-event command. Here is an example of that command being run to generate an Amazon S3 notification type of event:
    sam local generate-event s3 put --bucket munns-test --key somephoto.jpeg
    {
      "Records": [
        {
          "eventVersion": "2.0", 
          "eventTime": "1970-01-01T00:00:00.000Z", 
          "requestParameters": {
            "sourceIPAddress": "127.0.0.1"
          }, 
          "s3": {
            "configurationId": "testConfigRule", 
            "object": {
              "eTag": "0123456789abcdef0123456789abcdef", 
              "sequencer": "0A1B2C3D4E5F678901", 
              "key": "somephoto.jpeg", 
              "size": 1024
            }, 
            "bucket": {
              "arn": "arn:aws:s3:::munns-test", 
              "name": "munns-test", 
              "ownerIdentity": {
                "principalId": "EXAMPLE"
              }
            }, 
            "s3SchemaVersion": "1.0"
          }, 
          "responseElements": {
            "x-amz-id-2": "EXAMPLE123/5678abcdefghijklambdaisawesome/mnopqrstuvwxyzABCDEFGH", 
            "x-amz-request-id": "EXAMPLE123456789"
          }, 
          "awsRegion": "us-east-1", 
          "eventName": "ObjectCreated:Put", 
          "userIdentity": {
            "principalId": "EXAMPLE"
          }, 
          "eventSource": "aws:s3"
        }
      ]
    }

    You can then use the AWS SAM CLI local invoke command and pipe in the output from the previous command. Or, you can save the output from the previous command to a file and then pass in a reference to the file’s name and location with the -e flag. Here is an example of the pipe event method:

    sam local generate-event s3 put --bucket munns-test --key somephoto.jpeg | sam local invoke myFunction
    2019-02-19 18:45:53 Reading invoke payload from stdin (you can also pass it from file with --event)
    2019-02-19 18:45:53 Found credentials in shared credentials file: ~/.aws/credentials
    2019-02-19 18:45:53 Invoking index.handler (python2.7)
    
    Fetching lambci/lambda:python2.7 Docker container image......
    2019-02-19 18:45:53 Mounting /home/ec2-user/environment/forblog as /var/task:ro inside runtime container
    START RequestId: 7c14eea1-96e9-4b7d-ab54-ed1f50bd1a34 Version: $LATEST
    {"Records": [{"eventVersion": "2.0", "eventTime": "1970-01-01T00:00:00.000Z", "requestParameters": {"sourceIPAddress": "127.0.0.1"}, "s3": {"configurationId": "testConfigRule", "object": {"eTag": "0123456789abcdef0123456789abcdef", "key": "somephoto.jpeg", "sequencer": "0A1B2C3D4E5F678901", "size": 1024}, "bucket": {"ownerIdentity": {"principalId": "EXAMPLE"}, "name": "munns-test", "arn": "arn:aws:s3:::munns-test"}, "s3SchemaVersion": "1.0"}, "responseElements": {"x-amz-id-2": "EXAMPLE123/5678abcdefghijklambdaisawesome/mnopqrstuvwxyzABCDEFGH", "x-amz-request-id": "EXAMPLE123456789"}, "awsRegion": "us-east-1", "eventName": "ObjectCreated:Put", "userIdentity": {"principalId": "EXAMPLE"}, "eventSource": "aws:s3"}]}
    END RequestId: 7c14eea1-96e9-4b7d-ab54-ed1f50bd1a34
    REPORT RequestId: 7c14eea1-96e9-4b7d-ab54-ed1f50bd1a34 Duration: 1 ms Billed Duration: 100 ms Memory Size: 128 MB Max Memory Used: 14 MB
    
    "Success! Parsed Events"

    You can see the full output of your function in the logs that follow the invoke command. In this example, the Python function prints out the event payload and then exits.

With the AWS SAM CLI, you can pass in valid test payloads that interface with data in other AWS services. You can also have your Lambda function talk to other AWS resources that exist in your account, for example Amazon DynamoDB tables, Amazon S3 buckets, and so on. You could also test an API interface using the local start-api command, provided that you have configured your AWS SAM template with events of the API type. Follow the full instructions for setting up and configuring the AWS SAM CLI in Installing the AWS SAM CLI. Find the full syntax guide for AWS SAM templates in the AWS Serverless Application Model documentation.
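
As a quick illustration of the local API option mentioned above, and assuming your AWS SAM template defines an Api event source on a hypothetical /hello path, a local test could look like this sketch:

# Start a local API Gateway endpoint backed by your function.
sam local start-api

# In another terminal, exercise the endpoint (path is a placeholder from the assumed template).
curl http://127.0.0.1:3000/hello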

Testing in the Lambda console

After you have deployed your functions, either after the start of the Update/Create phase or with the Opt-In layer added, test your functions in the Lambda console.

  1. In the Lambda console, select the function to test.
  2. Select a test event and choose Test.
  3. If no test event exists, choose Configure test events.
    1. Choose Event template and select the relevant invocation service from which to test.
    2. Name the test event.
    3. Modify the event payload for your specific function.
    4. Choose Create and then return to step 2.

The results from the test are displayed.

Conclusion

With Lambda and Lambda@Edge, AWS has allowed developers to focus on just application code without the need to think about the work involved in managing the actual servers that run the code. We believe that the mechanisms provided and processes described in this post allow you to easily test and update your functions for this new execution environment.

Some of you may have questions about this process, and we are ready to help you. Please contact us through AWS Support, the AWS forums, or AWS account teams.

Optimize Amazon EMR costs with idle checks and automatic resource termination using advanced Amazon CloudWatch metrics and AWS Lambda

Post Syndicated from Praveen Krishnamoorthy Ravikumar original https://aws.amazon.com/blogs/big-data/optimize-amazon-emr-costs-with-idle-checks-and-automatic-resource-termination-using-advanced-amazon-cloudwatch-metrics-and-aws-lambda/

Many customers use Amazon EMR to run big data workloads, such as Apache Spark and Apache Hive queries, in their development environment. Data analysts and data scientists frequently use these types of clusters, known as analytics EMR clusters. Users often forget to terminate the clusters after their work is done. This leads to clusters running idle and, in turn, adds unnecessary costs.

To avoid this overhead, you must track the idleness of the EMR cluster and terminate it if it has been idle for an extended period. Amazon EMR provides the native IsIdle Amazon CloudWatch metric, which determines the idleness of the cluster by checking whether there’s a YARN job running. However, you should consider additional metrics, such as SSH users connected or Presto jobs running, to determine whether the cluster is idle. Also, when you execute any Spark jobs in Apache Zeppelin, the IsIdle metric remains active (1) for long hours, even after the job has finished executing. In such cases, the IsIdle metric is not ideal for deciding the inactivity of a cluster.

In this blog post, we propose a solution to cut down this overhead cost. We implemented a bash script to be installed in the master node of the EMR cluster, and the script is scheduled to run every 5 minutes. The script monitors the clusters and sends a CUSTOM metric EMR-INUSE (0=inactive; 1=active) to CloudWatch every 5 minutes. If CloudWatch receives 0 (inactive) for some predefined set of data points, it triggers an alarm, which in turn executes an AWS Lambda function that terminates the cluster.

Prerequisites

You must have the following before you can create and deploy this framework:

Note: This solution is designed as an additional feature. It can be applied to any existing EMR clusters by executing the scheduler script (explained later in the post) as an EMR step. If you want to implement this solution as a mandatory feature for your future clusters, you can include the EMR step as part of your cluster deployment. You can also apply this solution to EMR clusters that are spun up through AWS CloudFormation, the AWS CLI, and even the AWS Management Console.

Key components

The following are the key components of the solution.

Analytics EMR cluster

Amazon EMR provides a managed Apache Hadoop framework that lets you easily process large amounts of data across dynamically scalable Amazon EC2 instances. Data scientists use analytics EMR clusters for data analysis, machine learning using notebook applications (such as Apache Zeppelin or JupyterHub), and running big data workloads based on Apache Spark, Presto, etc.

Scheduler script

The schedule_script.sh is the shell script to be executed as an Amazon EMR step. When executed, it copies the monitoring script from the Amazon S3 artifacts folder and schedules the monitoring script to run every 5 minutes. The S3 location of the monitoring script should be passed as an argument.

Monitoring script

The pushShutDownMetrin.sh script is a monitoring script that is implemented using shell commands. It should be installed in the master node of the EMR cluster as an Amazon EMR step. The script is scheduled to run every 5 minutes and sends the cluster activity status to CloudWatch.  

JupyterHub API token script

The jupyterhub_addAdminToken.sh script is a shell script to be executed as an Amazon EMR step if JupyterHub is enabled on the cluster. In our design, the monitoring script uses REST APIs provided by JupyterHub to check whether the application is in use.

To send the request to JupyterHub, you must pass an API token along with the request. By default, the application does not generate API tokens. This script generates the API token and assigns it to the admin user, which is then picked up by the jupyterhub module in the monitoring script to track the activity of the application.

Custom CloudWatch metric

All Amazon EMR clusters send data for several metrics to CloudWatch. Metrics are updated every 5 minutes, automatically collected, and pushed to CloudWatch. For this use case, we created the Amazon EMR metric EMR-INUSE. This metric represents the active status of the cluster based on the module checks implemented in the monitoring script. The metric is set to 0 when the cluster is inactive and 1 when active.
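
As an illustration of the kind of call the monitoring script makes, the following sketch publishes the custom metric with the AWS CLI from the master node. The namespace matches the one used by the CloudWatch alarm later in this post; the JobFlowId dimension and the use of jq are assumptions, so align them with what the monitoring script actually publishes.

# Sketch: publish the EMR-INUSE metric from the master node (dimension name is an assumption).
# EMR exposes the cluster ID in this file; jq is assumed to be available on the master node.
CLUSTER_ID=$(jq -r .jobFlowId /mnt/var/lib/info/job-flow.json)

aws cloudwatch put-metric-data \
    --namespace "EMRShutdown/Cluster-Metric" \
    --metric-name "EMR-INUSE" \
    --dimensions JobFlowId=$CLUSTER_ID \
    --value 1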

Amazon CloudWatch

CloudWatch is a monitoring service that you can use to set high-resolution alarms to take automated actions. In this case, CloudWatch triggers an alarm if it receives 0 continuously for the configured number of hours.

AWS Lambda

Lambda is a serverless technology that lets you run code without provisioning or managing servers. With Lambda, you can run code for virtually any type of application or backend service—all with zero administration. You can set up your code to automatically trigger from other AWS services. In this case, the triggered CloudWatch alarm mentioned earlier signals Lambda to terminate the cluster.

Architectural diagram

The following diagram illustrates the sequence of events when the solution is enabled, showing what happens to the EMR cluster that is spun up via AWS CloudFormation.

 

The diagram shows the following steps:

  1. The AWS CloudFormation stack is launched to spin up an EMR cluster.
  2. The Amazon EMR step is executed (installs the pushShutDownMetric.sh and then schedules it as a cron job to run every 5 minutes).
  3. If the EMR cluster is active (executing jobs), the master node sets the EMR-INUSE metric to 1 and sends it to CloudWatch.
  4. If the EMR cluster is inactive, the master node sets the EMR-INUSE metric to 0 and sends it to CloudWatch.
  5. On receiving 0 for a predefined number of data points, CloudWatch triggers a CloudWatch alarm.
  6. The CloudWatch alarm sends notification to AWS Lambda to terminate the cluster.
  7. AWS Lambda executes the Lambda function.
  8. The Lambda function then deletes all the stack resources associated with the cluster.
  9. Finally, the EMR cluster is terminated, and the Stack ID is removed from AWS CloudFormation.

Modules in the monitoring script

Following are the different activity checks that are implemented in the monitoring script (pushShutDownMetric.sh). The script is designed in a modular fashion so that you can easily include new modules without modifying the core functionality.

ActiveSSHCheck

The ActiveSSHCheck module checks whether there are any active SSH connections to the master node. If there is an active SSH connection, and it’s idle for less than 10 minutes, the function sets the EMR-INUSE metric to 1 and pushes it to CloudWatch.

YARNCheck

Apache Hadoop YARN is the resource manager of the EMR Hadoop ecosystem. All the Spark submits and Hive queries reach YARN initially. It then schedules and processes these jobs. The YARNCheck module checks whether there are any running jobs in YARN or jobs completed within the last 5 minutes. If it finds any, the function sets the EMR-INUSE metric to 1 and pushes it to CloudWatch. The checks are performed by calling REST APIs exposed by YARN.

The API to fetch the running jobs is http://localhost:8088/ws/v1/cluster/apps?state=RUNNING.

The API to fetch the completed jobs is

http://localhost:8088/ws/v1/cluster/apps?state=FINISHED.
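
A minimal sketch of the kind of check this module performs is shown below; it assumes the default ResourceManager port and uses a simple grep on the JSON response rather than the script’s actual parsing logic.

# Sketch: treat the cluster as active if YARN reports any RUNNING applications.
RUNNING=$(curl -s "http://localhost:8088/ws/v1/cluster/apps?state=RUNNING")

if echo "$RUNNING" | grep -q '"state":"RUNNING"'; then
    echo "YARN has running applications; EMR-INUSE should be set to 1"
else
    echo "No running YARN applications found"
fi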

PRESTOCheck

Presto is an open-source distributed query engine for running interactive analytic queries. It is included in EMR release version 5.0.0 and later. The PRESTOCheck module checks whether there are any running Presto queries or any queries completed within the last 5 minutes. If there are, the function sets the EMR-INUSE metric to 1 and pushes it to CloudWatch. These checks are performed by calling REST APIs exposed by the Presto server.

The API to fetch the Presto jobs is http://localhost:8889/v1/query.

ZeppelinCheck

Amazon EMR users use Apache Zeppelin as a notebook for interactive data exploration. The ZeppelinCheck module checks whether there are any jobs running or if any have been completed within the last 5 minutes. If so, the function sets the EMR-INUSE metric to 1 and pushes it to CloudWatch. These checks are performed by calling the REST APIs exposed by Zeppelin.

The API to fetch the list of notebook IDs is http://localhost:8890/api/notebook.

The API to fetch the status of each cell inside each notebook ID is http://localhost:8890/api/notebook/job/$notebookID.

JupyterHubCheck

Jupyter Notebook is an open-source web application that you can use to create and share documents that contain live code, equations, visualizations, and narrative text. JupyterHub allows you to host multiple instances of a single-user Jupyter notebook server. The JupyterHubCheck module checks whether any Jupyter notebook is currently in use.

The function uses REST APIs exposed by JupyterHub to fetch the list of Jupyter notebook users and gathers the data about individual notebook servers. From the response, it extracts the last activity time of the servers and checks whether any server was used in the last 5 minutes. If so, the function sets the EMR-INUSE metric to 1 and pushes it to CloudWatch. The jupyterhub_addAdminToken.sh script needs to be executed as an EMR step before enabling the scheduler script.

The API to fetch the list of notebook users is https://localhost:9443/hub/api/users -H "Authorization: token $admin_token".

The API to fetch individual server information is https://localhost:9443/hub/api/users/$user -H "Authorization: token $admin_token".

If none of these checks detects activity, the cluster is considered to be inactive, and the monitoring script sets the EMR-INUSE metric to 0 and pushes it to CloudWatch.

Note:

The scheduler script schedules the monitoring script (pushShutDownMetric.sh) to run every 5 minutes. Internal cron jobs that run for only a few minutes are not considered when calibrating the EMR-INUSE metric.
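
For reference, the cron entry that the scheduler script effectively installs has the following shape; the installed path and log location are placeholders for wherever schedule_script.sh copies the monitoring script on the master node.

# Sketch of the crontab entry: run the monitoring script every 5 minutes (paths are placeholders).
*/5 * * * * /home/hadoop/pushShutDownMetrin.sh >> /home/hadoop/pushShutDownMetrin.log 2>&1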

Deploying each component

Follow the steps in this section to deploy each component of the proposed design.

Step 1. Create the Lambda function and SNS subscription

The Lambda function and the SNS subscription are the core components of the design. You must set up these components initially, and they are common for every cluster. The following are the AWS resources to be created for these components:

  • Execution role for the Lambda function
  • Terminate Idle EMR Lambda function
  • SNS topic and Lambda subscription

For one-step deployment, use this AWS CloudFormation template to launch and configure the resources in a single go.

The following parameters are available in the template.

  • s3Bucket (default: emr-shutdown-blogartifacts) – The name of the S3 bucket that contains the Lambda file.
  • s3Key (default: EMRTerminate.zip) – The Amazon S3 key of the Lambda file.

For manual deployment, follow these steps on the AWS Management Console.

Execution role for the Lambda function

  1. Open the AWS Identity and Access Management (IAM) console and choose Policies, Create policy.
  2. Choose the JSON tab, paste the following policy text, and then choose Review policy.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "s3:ListAllMyBuckets",
                "s3:HeadBucket",
                "s3:ListObjects",
                "s3:GetObject",
                "cloudformation:ListStacks",
                "cloudformation:DeleteStack",
                "cloudformation:DescribeStacks",
                "cloudformation:ListStackResources",
                "elasticmapreduce:TerminateJobFlows"
            ],
            "Resource": "*",
            "Effect": "Allow",
            "Sid": "GenericAccess"
        },
        {
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*",
            "Effect": "Allow",
            "Sid": "LogAccess"
        }
    ]
}
  3. For Name, enter TerminateEMRPolicy and choose Create policy.
  4. Choose Roles, Create role.
  5. Under Choose the service that will use this role, choose Lambda, and then choose Next: Permissions.
  6. For Attach permissions policies, choose the arrow next to Filter policies and choose Customer managed in the drop-down list.
  7. Attach the TerminateEMRPolicy policy that you just created, and choose Review.
  8. For Role name, enter TerminateEMRLambdaRole and then choose Create role.

Terminate idle EMR – Lambda function

I created a deployment package to use with this function.

  1. Open the Lambda console and choose Create function.
  2. Choose Author from scratch, and provide the details as shown in the following screenshot:
  • Name: lambdaTerminateEMR
  • Runtime: Python 2.7
  • Role: Choose an existing role
  • Existing role: TerminateEMRLambdaRole

  3. Choose Create function.
  4. In the Function code section, for Code entry type, choose Upload a file from Amazon S3, and for Runtime, choose Python 2.7.

The Lambda function S3 link URL is

s3://emr-shutdown-blogartifacts/EMRTerminate.zip.

Link to the function: https://s3.amazonaws.com/emr-shutdown-blogartifacts/EMRTerminate.zip

This Lambda function is triggered by a CloudWatch alarm. It parses the input event, retrieves the JobFlowId, and deletes the AWS CloudFormation stack of the corresponding JobFlowId.

SNS topic and Lambda subscription

For setting the CloudWatch alarm in the further stages, you must create an Amazon SNS topic that notifies the preceding Lambda function to execute. Follow these steps to create an SNS topic and configure the Lambda endpoint.

  1. Navigate to the Amazon SNS console, and choose Create topic.
  2. Enter the Topic name and Display name, and choose Create topic.

  3. The topic is created and displayed in the Topics list.
  4. Select the topic and choose Actions, Subscribe to topic.
  5. In the Create subscription dialog box, choose AWS Lambda as the protocol, choose lambdaTerminateEMR as the endpoint, and choose Create subscription.

Step 2. Execute the JupyterHub API token script as an EMR step

This step is required only when JupyterHub is enabled in the cluster.

Navigate to the EMR cluster to be monitored, and execute the JupyterHub API token script as an EMR step.

Command: s3://emr-shutdown-blogartifacts/jupyterhub_addAdminToken.sh

This script generates an API token and assigns it to the admin user. It is then picked up by the jupyterhub module in the monitoring script to track the activity of the application.

Step 3. Execute the scheduler script as an EMR step

Navigate to the EMR cluster to be monitored and execute the scheduler script as an EMR step.

Note:

Ensure that termination protection is disabled in the cluster. The termination protection flag causes the Lambda function to fail.

Command: s3://emr-shutdown-blogartifacts/schedule_script.sh

Parameter: s3://emr-shutdown-blogartifacts/pushShutDownMetrin.sh

The step function copies the pushShutDownMetric.sh script to the master node and schedules it to run every 5 minutes.

The schedule_script.sh is at https://s3.amazonaws.com/emr-shutdown-blogartifacts/schedule_script.sh.

The pushShutDownMetrin.sh is at https://s3.amazonaws.com/emr-shutdown-blogartifacts/pushShutDownMetrin.sh.
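
If you prefer the AWS CLI to the console for adding this step, a sketch like the following could work; the cluster ID is a placeholder, and script-runner.jar is the standard EMR helper for running shell scripts as steps (use the copy in your cluster's Region).

# Sketch: add the scheduler script as a step on a running cluster (cluster ID is a placeholder).
aws emr add-steps \
    --cluster-id j-XXXXXXXXXXXXX \
    --steps "Type=CUSTOM_JAR,Name=ScheduleIdleCheck,ActionOnFailure=CONTINUE,\
Jar=s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar,\
Args=[s3://emr-shutdown-blogartifacts/schedule_script.sh,s3://emr-shutdown-blogartifacts/pushShutDownMetrin.sh]"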

Step 4. Create a CloudWatch alarm

For single-step deployment, use this AWS CloudFormation template to launch and configure the resources in a single go.

The following parameters are available in the template.

  • AlarmName (default: TerminateIDLE-EMRAlarm) – The name for the alarm.
  • EMRJobFlowID (requires input) – The JobFlowId of the cluster.
  • EvaluationPeriod (requires input) – The idle timeout value, expressed in data points (1 data point equals 5 minutes). For example, to terminate the cluster if it is idle for 20 minutes, the input should be 4.
  • SNSSubscribeTopic (requires input) – The Amazon Resource Name (ARN) of the SNS topic to be triggered on the alarm.

 

The AWS CloudFormation CLI command is as follows:

aws cloudformation create-stack --stack-name EMRAlarmStack \
    --template-url https://s3.amazonaws.com/emr-shutdown-blogartifacts/Cloudformation/alarm.json \
    --parameters ParameterKey=AlarmName,ParameterValue=TerminateIDLE-EMRAlarm \
                 ParameterKey=EMRJobFlowID,ParameterValue=<Input> \
                 ParameterKey=EvaluationPeriod,ParameterValue=4 \
                 ParameterKey=SNSSubscribeTopic,ParameterValue=<Input>

For manual deployment, follow these steps to create the alarm.

  1. Open the Amazon CloudWatch console and choose Alarms.
  2. Choose Create Alarm.
  3. On the Select Metric page, under Custom Metrics, choose EMRShutdown/Cluster-Metric.

  4. Choose the isEMRUsed metric of the EMR JobFlowId, and then choose Next.

  5. Define the alarm as required. In this case, the alarm is set to send notification to the SNS topic shutDownEMRTest when CloudWatch receives the IsEMRUsed metric as 0 for every data point in the last 2 hours.

  6. Choose Create Alarm.
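
As an alternative to the console steps above, the same alarm can be sketched with the AWS CLI. The namespace comes from the metric described earlier in this post; the metric and dimension names, the cluster ID, and the SNS topic ARN are assumptions or placeholders, so match them to what the monitoring script actually publishes and to your own resources.

# Sketch: alarm when the cluster reports inactivity for 2 hours (24 x 5-minute data points).
aws cloudwatch put-metric-alarm \
    --alarm-name TerminateIDLE-EMRAlarm \
    --namespace "EMRShutdown/Cluster-Metric" \
    --metric-name "EMR-INUSE" \
    --dimensions Name=JobFlowId,Value=j-XXXXXXXXXXXXX \
    --statistic Maximum \
    --period 300 \
    --evaluation-periods 24 \
    --threshold 0 \
    --comparison-operator LessThanOrEqualToThreshold \
    --alarm-actions arn:aws:sns:us-east-1:123456789012:shutDownEMRTest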

Summary

In this post, we focused on building a framework to cut down the additional cost that you might incur due to the idle running of an EMR cluster. The modules implemented in the shell script track the execution status of Spark jobs and Hive/Presto queries using lightweight REST API calls, which makes this approach an efficient solution.

If you have questions or suggestions, please comment below.

 


About the Author

Praveen Krishnamoorthy Ravikumar is an associate big data consultant with Amazon Web Services.

Build and automate a serverless data lake using an AWS Glue trigger for the Data Catalog and ETL jobs

Post Syndicated from Saurabh Shrivastava original https://aws.amazon.com/blogs/big-data/build-and-automate-a-serverless-data-lake-using-an-aws-glue-trigger-for-the-data-catalog-and-etl-jobs/

Today, data is flowing from everywhere, whether it is unstructured data from sources like IoT sensors, application logs, and clickstreams, or structured data from transaction applications, relational databases, and spreadsheets. Data has become a crucial part of every business. This has resulted in a need to maintain a single source of truth and automate the entire pipeline—from data ingestion to transformation and analytics—to extract value from the data quickly.

There is a growing concern over the complexity of data analysis as the data volume, velocity, and variety increases. The concern stems from the number and complexity of steps it takes to get data to a state that is usable by business users. Often data engineering teams spend most of their time on building and optimizing extract, transform, and load (ETL) pipelines. Automating the entire process can reduce the time to value and cost of operations. In this post, we describe how to create a fully automated data cataloging and ETL pipeline to transform your data.

Architecture

In this post, you learn how to build and automate the following architecture.

You build your serverless data lake with Amazon Simple Storage Service (Amazon S3) as the primary data store. Given the scalability and high availability of Amazon S3, it is best suited as the single source of truth for your data.

You can use various techniques to ingest and store data in Amazon S3. For example, you can use Amazon Kinesis Data Firehose to ingest streaming data. You can use AWS Database Migration Service (AWS DMS) to ingest relational data from existing databases. And you can use AWS DataSync to ingest files from an on-premises Network File System (NFS).

Ingested data lands in an Amazon S3 bucket that we refer to as the raw zone. To make that data available, you have to catalog its schema in the AWS Glue Data Catalog. You can do this using an AWS Lambda function invoked by an Amazon S3 trigger to start an AWS Glue crawler that catalogs the data. When the crawler is finished creating the table definition, you invoke a second Lambda function using an Amazon CloudWatch Events rule. This step starts an AWS Glue ETL job to process and output the data into another Amazon S3 bucket that we refer to as the processed zone.

The AWS Glue ETL job converts the data to Apache Parquet format and stores it in the processed S3 bucket. You can modify the ETL job to achieve other objectives, like more granular partitioning, compression, or enriching of the data. Monitoring and notification is an integral part of the automation process. So as soon as the ETL job finishes, another CloudWatch rule sends you an email notification using an Amazon Simple Notification Service (Amazon SNS) topic. This notification indicates that your data was successfully processed.

In summary, this pipeline classifies and transforms your data, sending you an email notification upon completion.
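
The two Lambda functions in this flow effectively call the AWS Glue APIs to start the crawler and the ETL job. If you want to exercise those stages by hand while testing, the equivalent CLI calls look roughly like the following sketch; the crawler and job names are placeholders for the ones you choose later in this post.

# Sketch: manually kick off the pipeline stages while testing (names are placeholders).
aws glue start-crawler --name my-datalake-crawler

# After the crawler finishes and the table exists, run the ETL job.
aws glue start-job-run --job-name my-datalake-etl-job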

Deploy the automated data pipeline using AWS CloudFormation

First, you use AWS CloudFormation templates to create all of the necessary resources. This removes opportunities for manual error, increases efficiency, and ensures consistent configurations over time.

Launch the AWS CloudFormation template with the following Launch stack button.

Be sure to choose the US East (N. Virginia) Region (us-east-1). Then enter the appropriate stack name, email address, and AWS Glue crawler name to create the Data Catalog. Add the AWS Glue database name to save the metadata tables. Acknowledge the IAM resource creation as shown in the following screenshot, and choose Create.

Note: It is important to enter your valid email address so that you get a notification when the ETL job is finished.

This AWS CloudFormation template creates the following resources in your AWS account:

  • Two Amazon S3 buckets to store both the raw data and processed Parquet data.
  • Two AWS Lambda functions: one to create the AWS Glue Data Catalog and another function to publish topics to Amazon SNS.
  • An Amazon Simple Queue Service (Amazon SQS) queue for maintaining the retry logic.
  • An Amazon SNS topic to inform you that your data has been successfully processed.
  • Two CloudWatch Events rules: one rule on the AWS Glue crawler and another on the AWS Glue ETL job.
  • AWS Identity and Access Management (IAM) roles for accessing AWS Glue, Amazon SNS, Amazon SQS, and Amazon S3.

When the AWS CloudFormation stack is ready, check your email and confirm the SNS subscription. Choose the Resources tab and find the details.

Follow these steps to verify your email subscription so that you receive an email alert as soon as your ETL job finishes.

  1. On the Amazon SNS console, in the navigation pane, choose Topics. An SNS topic named SNSProcessedEvent appears in the display.

  2. Choose the ARN. The topic details page appears, listing the email subscription as Pending confirmation. Be sure to confirm the subscription for your email address as provided in the Endpoint column.

If you don’t see an email address, or the link is showing as not valid in the email, choose the corresponding subscription endpoint. Then choose Request confirmation to confirm your subscription. Be sure to check your email junk folder for the request confirmation link.

Configure an Amazon S3 bucket event trigger

In this section, you configure a trigger on a raw S3 bucket. So when new data lands in the bucket, you trigger GlueTriggerLambda, which was created in the AWS CloudFormation deployment.

To configure notifications:

  1. Open the Amazon S3 console.
  2. Choose the source bucket. In this case, the bucket name contains raws3bucket, for example, <stackname>-raws3bucket-1k331rduk5aph.
  3. Go to the Properties tab, and under Advanced settings, choose Events.

  4. Choose Add notification and configure a notification with the following settings:
  • Name – Enter a name of your choice. In this example, it is crawlerlambdaTrigger.
  • Events – Select the All object create events check box to create the AWS Glue Data Catalog when you upload the file.
  • Send to – Choose Lambda function.
  • Lambda – Choose the Lambda function that was created in the deployment section. Your Lambda function should contain the string GlueTriggerLambda.

See the following screenshot for all the settings. When you’re finished, choose Save.

For more details on configuring events, see How Do I Enable and Configure Event Notifications for an S3 Bucket? in the Amazon S3 Console User Guide.
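
If you prefer to script this instead of using the console, the equivalent s3api call is sketched below; the bucket name and function ARN are placeholders. Note that this call replaces the bucket's entire notification configuration and, unlike the console, does not add the permission that allows Amazon S3 to invoke the Lambda function, so you would need to grant that separately.

# Sketch: configure the S3 event notification from the CLI (bucket and ARN are placeholders).
aws s3api put-bucket-notification-configuration \
    --bucket <stackname>-raws3bucket-xxxxxxxxxxxx \
    --notification-configuration '{
      "LambdaFunctionConfigurations": [{
        "Id": "crawlerlambdaTrigger",
        "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:GlueTriggerLambda",
        "Events": ["s3:ObjectCreated:*"]
      }]
    }'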

Download the dataset

For this post, you use a publicly available New York green taxi dataset in CSV format. You upload monthly data to your raw zone and perform automated data cataloging using an AWS Glue crawler. After cataloging, an automated AWS Glue ETL job triggers to transform the monthly green taxi data to Parquet format and store it in the processed zone.

You can download the raw dataset from the NYC Taxi & Limousine Commission trip record data site. Download the monthly green taxi dataset and upload only one month of data. For example, first upload only the green taxi January 2018 data to the raw S3 bucket.

Automate the Data Catalog with an AWS Glue crawler

One of the important aspects of a modern data lake is to catalog the available data so that it’s easily discoverable. To run ETL jobs or ad hoc queries against your data lake, you must first determine the schema of the data along with other metadata information like location, format, and size. An AWS Glue crawler makes this process easy.

After you upload the data into the raw zone, the Amazon S3 trigger that you created earlier in the post invokes the GlueTriggerLambda function. This function creates an AWS Glue Data Catalog that stores metadata information inferred from the data that was crawled.

Open the AWS Glue console. You should see the database, table, and crawler that were created using the AWS CloudFormation template. Your AWS Glue crawler should appear as follows.

Browse to the table using the left navigation, and you will see the table in the database that you created earlier.

Choose the table name, and further explore the metadata discovered by the crawler, as shown following.

You can also view the columns, data types, and other details. In the following screenshot, the AWS Glue crawler has created a schema from the files available in Amazon S3 by determining the column names and their respective data types. You can use this schema to create an external table.

Author ETL jobs with AWS Glue

AWS Glue provides a managed Apache Spark environment to run your ETL job without maintaining any infrastructure, with a pay-as-you-go model.

Open the AWS Glue console and choose Jobs under the ETL section to start authoring an AWS Glue ETL job. Give the job a name of your choice, and note the name because you’ll need it later. Choose the already created IAM role with the name containing <stackname>-GlueLabRole, as shown following. Keep the other default options.

AWS Glue generates the required Python or Scala code, which you can customize as per your data transformation needs. In the Advanced properties section, choose Enable in the Job bookmark list to avoid reprocessing old data.

On the next page, choose your raw Amazon S3 bucket as the data source, and choose Next. On the Data target page, choose the processed Amazon S3 bucket as the data target path, and choose Parquet as the Format.

On the next page, you can make schema changes as required, such as changing column names, dropping ones that you’re less interested in, or even changing data types. AWS Glue generates the ETL code accordingly.

Lastly, review your job parameters, and choose Save Job and Edit Script, as shown following.

On the next page, you can modify the script further as per your data transformation requirements. For this post, you can leave the script as is. In the next section, you automate the execution of this ETL job.

Automate ETL job execution

As the frequency of data ingestion increases, you will want to automate the ETL job to transform the data. Automating this process helps reduce operational overhead and frees your data engineering team to focus on more critical tasks.

AWS Glue is optimized for processing data in batches. You can configure it to process data in batches on a set time interval. How often you run a job is determined by how recent the end user expects the data to be and the cost of processing. For information about the different methods, see Triggering Jobs in AWS Glue in the AWS Glue Developer Guide.
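
For example, if you decide to run the job on a time interval instead of (or in addition to) the event-driven flow used in this post, a scheduled AWS Glue trigger can start it. The following boto3 sketch assumes a placeholder job name and a nightly 2 AM UTC schedule.

import boto3

glue = boto3.client('glue')

# Create a scheduled trigger that starts the ETL job every night at 2 AM UTC.
glue.create_trigger(
    Name='nightly-taxi-etl',                     # placeholder trigger name
    Type='SCHEDULED',
    Schedule='cron(0 2 * * ? *)',
    Actions=[{'JobName': 'your-etl-job-name'}],  # placeholder job name
    StartOnCreation=True
)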

First, you need to make one-time changes and configure your ETL job name in the Lambda function and the CloudWatch Events rule. On the console, open the ETLJobLambda Lambda function, which was created using the AWS CloudFormation stack.

Choose the Lambda function link that appears, and explore the code. Change the JobName value to the ETL job name that you created in the previous step, and then choose Save.

As shown in the following screenshot, a CloudWatch Events rule named CrawlerEventRule is associated with an AWS Lambda function. When the CloudWatch Events rule receives a success status from the crawler, it triggers the ETLJobLambda Lambda function.
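
Conceptually, ETLJobLambda starts the AWS Glue job when the rule fires. The following is a minimal sketch of such a handler (not the exact deployed code), with your-etl-job-name as a placeholder for the job name you configured.

import boto3

glue = boto3.client('glue')

def lambda_handler(event, context):
    # Invoked by CrawlerEventRule after the crawler succeeds.
    run = glue.start_job_run(JobName='your-etl-job-name')  # placeholder job name
    return {'JobRunId': run['JobRunId']}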

Now you are all set to trigger your AWS Glue ETL job as soon as you upload a file in the raw S3 bucket. Before testing your data pipeline, set up the monitoring and alerts.

Monitoring and notification with Amazon CloudWatch Events

Suppose that you want to receive a notification over email when your AWS Glue ETL job is completed. To achieve that, the CloudWatch Events rule OpsEventRule was deployed from the AWS CloudFormation template in the data pipeline deployment section. This CloudWatch Events rule monitors the status of the AWS Glue ETL job and sends an email notification using an SNS topic upon successful completion of the job.

As the following image shows, you configure your AWS Glue job name in the Event pattern section in CloudWatch. The event triggers an SNS topic configured as a target when the AWS Glue job state changes to SUCCEEDED. This SNS topic sends an email notification to the email address that you provided in the deployment section.
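
If you ever need to recreate a rule like OpsEventRule outside of CloudFormation, the event pattern it matches looks roughly like the following boto3 sketch. The rule name, job name, and SNS topic ARN are placeholders.

import json
import boto3

events = boto3.client('events')

# Match successful runs of the ETL job (placeholder names throughout).
pattern = {
    "source": ["aws.glue"],
    "detail-type": ["Glue Job State Change"],
    "detail": {"jobName": ["your-etl-job-name"], "state": ["SUCCEEDED"]}
}

events.put_rule(Name='OpsEventRule', EventPattern=json.dumps(pattern))
events.put_targets(
    Rule='OpsEventRule',
    Targets=[{'Id': 'EmailTopic', 'Arn': 'arn:aws:sns:us-east-1:123456789012:etl-notifications'}]
)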

Let’s make one-time configuration changes in the CloudWatch Events rule OpsEventRule to capture the status of the AWS Glue ETL job.

  1. Open the CloudWatch console.
  2. In the navigation pane, under Events, choose Rules. Choose the rule name that contains OpsEventRule, as shown following.

  3. In the upper-right corner, choose Actions, Edit.

  4. Replace Your-ETL-jobName with the ETL job name that you created in the previous step.

  5. Scroll down and choose Configure details. Then choose Update rule.

Now that you have set up an entire data pipeline in an automated way with the appropriate notifications and alerts, it’s time to test your pipeline. If you upload new monthly data to the raw Amazon S3 bucket (for example, upload the NY green taxi February 2018 CSV), it triggers the GlueTriggerLambda AWS Lambda function. You can navigate to the AWS Glue console, where you can see that the AWS Glue crawler is running.

Upon completion of the crawler, the CloudWatch Events rule CrawlerEventRule triggers your ETLJobLambda Lambda function. You can now see that the AWS Glue ETL job is running.

When the ETL job is successful, the CloudWatch Events rule OpsEventRule sends an email notification to you using an Amazon SNS topic, as shown following, completing the automation cycle.

Be sure to check your processed Amazon S3 bucket, where you will find transformed data processed by your automated ETL pipeline. Now that the processed data is ready in Amazon S3, you need to run the AWS Glue crawler on this Amazon S3 location. The crawler creates a metadata table with the relevant schema in the AWS Glue Data Catalog.

After the Data Catalog table is created, you can execute standard SQL queries using Amazon Athena and visualize the data using Amazon QuickSight. To learn more, see the blog post Harmonize, Query, and Visualize Data from Various Providers using AWS Glue, Amazon Athena, and Amazon QuickSight.

Conclusion

Having an automated serverless data lake architecture lessens the burden of managing data from its source to destination—including discovery, audit, monitoring, and data quality. With an automated data pipeline across organizations, you can identify relevant datasets and extract value much faster than before. The advantage of reducing the time to analysis is that businesses can analyze the data as it becomes available in real time. From the BI tools, queries return results much faster for a single dataset than for multiple databases.

Business analysts can now get their job done faster, and data engineering teams can free themselves from repetitive tasks. You can extend this architecture further by loading your data into a data warehouse like Amazon Redshift or making it available for machine learning via Amazon SageMaker.

Additional resources

See the following resources for more information:

 


About the Author

Saurabh Shrivastava is a partner solutions architect and big data specialist working with global systems integrators. He works with AWS partners and customers to provide them with architectural guidance for building scalable architecture in hybrid and AWS environments. He enjoys spending time with his family outdoors and traveling to new destinations to discover new cultures.

 

 

 

Luis Lopez Soria is a partner solutions architect and serverless specialist working with global systems integrators. He works with AWS partners and customers to help them with adoption of the cloud operating model at a large scale. He enjoys doing sports in addition to traveling around the world exploring new foods and cultures.

 

 

 

Chirag Oswal is a partner solutions architect and AR/VR specialist working with global systems integrators. He works with AWS partners and customers to help them with adoption of the cloud operating model at a large scale. He enjoys video games and travel.

From Poll to Push: Transform APIs using Amazon API Gateway REST APIs and WebSockets

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/from-poll-to-push-transform-apis-using-amazon-api-gateway-rest-apis-and-websockets/

This post is courtesy of Adam Westrich – AWS Principal Solutions Architect and Ronan Prenty – Cloud Support Engineer

Want to deploy a web application and give a large number of users controlled access to data analytics? Or maybe you have a retail site that is fulfilling purchase orders, or an app that enables users to connect data from potentially long-running third-party data sources. Similar use cases exist across every imaginable industry and entail sending long-running requests to perform a subsequent action. For example, a healthcare company might build a custom web portal for physicians to connect and retrieve key metrics from patient visits.

This post is aimed at optimizing the delivery of information without needing to poll an endpoint. First, we outline the current challenges with the consumer polling pattern and alternative approaches to solve these information delivery challenges. We then show you how to build and deploy a solution in your own AWS environment.

Here is a glimpse of the sample app that you can deploy in your own AWS environment:

What’s the problem with polling?

Many customers need to implement the delivery of long-running activities (such as a query to a data warehouse or data lake, or retail order fulfillment). They may have developed a polling solution that looks similar to the following:

  1. POST sends a request.
  2. GET returns an empty response.
  3. Another… GET returns an empty response.
  4. Yet another… GET returns an empty response.
  5. Finally, GET returns the data for which you were looking.

The challenges of traditional polling methods

  • Unnecessary chattiness and cost due to polling for result sets—Every time your frontend polls an API, it’s adding costs by leveraging infrastructure to compute the result. Empty polls are just wasteful!
  • Hampered mobile battery life—Excessive polling is one of the top contributors to apps draining battery life. Make sure that your app isn’t on your users’ Top App Battery Usage list-of-shame, which could result in deletion.
  • Delayed data arrival due to polling schedule—Some approaches to polling include an incremental backoff to limit the number of empty polls. This sometimes results in a delay between data readiness and data arrival.

But what about long polling?

  • User request deadlocks can hamper application performance—Long synchronous user responses can lead to unexpected user wait times or UI deadlocks, which can affect mobile devices especially.
  • Memory leaks and consumption could bring your app down—Keeping long-running task queries open may overburden your backend and create failure scenarios, which may bring down your app.
  • HTTP default timeouts across browsers may result in inconsistent client experience—These timeouts vary across browsers, and can lead to an inconsistent experience across your end users. Depending on the size and complexity of the requests, processing can last longer than many of these timeouts and take minutes to return results.

Instead, create an event-driven architecture and move your APIs from poll to push.

Asynchronous push model

To create an optimal UX, frontend developers often strive to build progressive and reactive user experiences. Users can interact with frontend objects (for example, push buttons) with little lag when sending requests and receiving data. But frontend developers also want users to receive timely data, without sacrificing additional user actions or performing unnecessary processing.

The birth of microservices and the cloud over the past several years has enabled frontend developers and backend service designers to think about these problems in an asynchronous manner. This enables the virtually unlimited resources of the cloud to choreograph data processing. It also enables clients to benefit from progressive and reactive user experiences.

This is a fresh alternative to the synchronous design pattern, which often relies on client consumers to act as the conductor for all user requests. The following diagram compares the flow of communication between the patterns.
Comparison of communication patterns

Asynchronous orchestration can be easily choreographed using the workflow definition, monitoring, and tracking console with AWS Step Functions state machines. Break up services into functions with AWS Lambda and track executions within a state diagram, like the following:

With this approach, consumers send a request, which initiates downstream distributed processing. The subsequent steps are invoked according to the state machine workflow and each execution can be monitored.
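
For example, the Lambda function that receives the request can hand the work to Step Functions and return immediately. A minimal boto3 sketch, with a placeholder state machine ARN:

import json
import boto3

sfn = boto3.client('stepfunctions')

def lambda_handler(event, context):
    # Start the workflow asynchronously and return right away; the state
    # machine ARN is a placeholder for the one in your account.
    execution = sfn.start_execution(
        stateMachineArn='arn:aws:states:us-east-1:123456789012:stateMachine:PollToPush',
        input=json.dumps(event)
    )
    return {'executionArn': execution['executionArn']}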

Okay, but how does the consumer get the result of what they’re looking for?

Multiple approaches to this problem

There are different approaches for consumers to retrieve their resulting data. In order to create the optimal solution, there are several questions that a service owner may need to ask.

Is there known trust with the client (such as another microservice)?

If the answer is Yes, one approach is to use Amazon SNS. That way, consumers can subscribe to topics and have data delivered using email, SMS, or even HTTP/HTTPS as an event subscriber (that is, webhooks).

With webhooks, the consumer creates an endpoint where the service provider can issue a callback using a small amount of consumer-side resources. The consumer is waiting for an incoming request to facilitate downstream processing.

  1. Send a request.
  2. Subscribe the consumer to the topic.
  3. Open the endpoint.
  4. SNS sends the POST request to the endpoint.

If the trust answer is No, then clients may not be able to open a lightweight HTTP webhook after sending the initial request. In that case, consider an alternative framework.

Are you restricted to only using a REST protocol?

Some applications can only make HTTP requests, whether because of the age of the technology or browser on the end-user side, or because of security requirements that may block other protocols.

In these scenarios, customers may use a GET method as a best practice, but we still advise avoiding polling. Some design questions to ask are: Does data readiness happen at a predefined time or duration interval? Or can the user experience tolerate time between data readiness and data arrival?

If the answer to both of these is Yes, then consider trying to send GET calls to your RESTful API one time. For example, if a job averages 10 minutes, make your GET call at 10 minutes after the submission. Sounds simple, right? Much simpler than polling.
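
As a rough illustration, the client-side logic collapses to a single deferred request. The endpoint and timing below are made up for the example:

import time
import requests  # any HTTP client library works here

JOB_AVERAGE_SECONDS = 600  # assumption: the job averages about 10 minutes

def fetch_result(job_id):
    # Wait for the expected duration once, then issue a single GET
    # instead of polling the endpoint repeatedly.
    time.sleep(JOB_AVERAGE_SECONDS)
    response = requests.get(f'https://api.example.com/jobs/{job_id}/result')  # placeholder URL
    response.raise_for_status()
    return response.json()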

Should I use GraphQL, a WebSocket API, or another framework?

Each framework has tradeoffs.

If you want a more flexible query schema, you may gravitate to GraphQL, which follows a “data-driven UI” approach. If data drives the UI, then GraphQL may be the best solution for your use case.

AWS AppSync is a serverless GraphQL engine that supports the heavy lifting of these constructs. It offers functionality such as AWS service integration, offline data synchronization, and conflict resolution. With GraphQL, there’s a construct called Subscriptions, which lets clients receive event-based updates when data changes.

Amazon API Gateway makes it easy for developers to deploy secure APIs at scale. With the recent introduction of API Gateway WebSocket APIs, web, mobile clients, and backend services can communicate over an established WebSocket connection. This also allows clients to be more reactive to data updates and only do work one time after an update has been received over the WebSocket connection.

The typical frontend design approach is to create a UI component that is updated when the results of the given procedure are complete. This is preferable to a complete webpage refresh and improves the customer’s user experience.

Because many companies have elected to use the REST framework for creating API-driven, tightly bound service contracts, a RESTful interface can be used to validate and receive the request, and to provide a status endpoint. It also offers flexibility in delivering the result to a variety of clients, alongside the WebSocket API.

Poll-to-push solution with API Gateway

Imagine a scenario where you want to be updated as soon as the data is created. Instead of the traditional polling methods described earlier, use an API Gateway WebSocket API. That pushes new data to the client as it’s created, so that it can be rendered on the client UI.

Alternatively, a WebSocket server can be deployed on Amazon EC2. With this approach, your server is always running to accept and maintain new connections. In addition, you manage the scaling of the instance manually at times of high demand.

By using an API Gateway WebSocket API in front of Lambda, you don’t need a machine that is always on, eating away at your project budget. API Gateway handles connections and invokes Lambda whenever there’s a new event. Scaling is handled on the service side. To update our connected clients from the backend, we can use the API Gateway callback URL. The AWS SDKs make communication from the backend easy. As an example, see the Boto3 sample of post_to_connection:

import boto3 
#Use a layer or deployment package to 
#include the latest boto3 version.
...
apiManagement = boto3.client('apigatewaymanagementapi', region_name={{api_region}},
                      endpoint_url={{api_url}})
...
response = apiManagement.post_to_connection(Data={{message}},ConnectionId={{connectionId}})
...

Solution example

To create this more optimized solution, create a simple web application that enables a user to make a request against a large dataset. It returns the results in a flat file over WebSocket. In this case, we’re querying, via Amazon Athena, a data lake on S3 populated with AWS Twitter sentiment (Twitter: @awscloud or #awsreinvent). However, this solution could apply to any data store, data mart, or data warehouse environment, or the long-running return of data for a response.

For the frontend architecture, create this web application using a JavaScript framework (such as JQuery). Use API Gateway to accept a REST API for the data and then open a WebSocket connection on the client to facilitate the return of results:
Poll to push solution architecture

  1. The client sends a REST request to API Gateway. This invokes a Lambda function that starts the Step Functions state machine execution. The function also returns a task token for the open connection activity worker. The function then returns the execution ARN and task token to the client.
  2. Using the data returned by the REST request, the client connects to the WebSocket API and sends the task token over the WebSocket connection (see the client-side sketch after this list).
  3. The WebSocket notifies the Step Functions state machine that the client is connected. The Lambda function completes the OpenConnection task through validating the task token and sending a success message.
  4. After RunAthenaQuery and OpenConnection are successful, the state machine updates the connected client over the WebSocket API that their long-running job is complete. It uses the REST API call post_to_connection.
  5. The client receives the update over their WebSocket connection using the IssueCallback Lambda function, with the callback URL from the API Gateway WebSocket API.
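
A client-side sketch of steps 1 through 3 using the Python websockets library is shown below. The WebSocket URL, route action, and message shape are assumptions for illustration; the deployed sample app implements this in browser JavaScript.

import asyncio
import json
import websockets  # third-party client library

async def listen_for_result(ws_url, task_token):
    # Connect with the URL and task token returned by the REST request,
    # then wait for the backend to push the result over the connection.
    async with websockets.connect(ws_url) as ws:
        await ws.send(json.dumps({'action': 'openconnection', 'taskToken': task_token}))  # assumed route/shape
        result = await ws.recv()  # pushed by the IssueCallback function
        print('Received:', result)

# asyncio.run(listen_for_result('wss://<api-id>.execute-api.us-east-1.amazonaws.com/Demo', '<task token>'))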

In this example application, the data response is an S3 presigned URL composed of results from Athena. You can then configure your frontend to use that S3 link to download to the client.
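
Generating such a link from the backend is a one-liner with boto3; the bucket and key below are placeholders for the Athena results object.

import boto3

s3 = boto3.client('s3')

# Placeholder bucket/key for the Athena query results object.
presigned_url = s3.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'athena-results-bucket', 'Key': 'results/query-id.csv'},
    ExpiresIn=300  # link expires after five minutes
)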

Why not just open the WebSocket API for the request?

While this approach can work, we advise against it for this use case. Break the required interfaces for this use case into three processes:

  1. Submit a request.
  2. Get the status of the request (to detect failure).
  3. Deliver the results.

For the submit request interface, RESTful APIs have strong controls to ensure that user requests for data are validated and given guardrails for the intended query purpose. This helps prevent rogue requests and unforeseen impacts to the analytics environment, especially when exposed to a large number of users.

With this example solution, you’re restricting the data requests to specific countries in the frontend JavaScript. Using the RESTful POST method for API requests enables you to validate data as query string parameters, such as the following:

https://<apidomain>.amazonaws.com/Demo/CreateStateMachineAndToken?Country=France

API Gateway models and mapping templates can also be used to validate or transform request payloads at the API layer before they hit the backend. Use models to ensure that the structure or contents of the client payload are the same as you expect. Use mapping templates to transform the payload sent by clients to another format, to be processed by the backend.
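
As a rough sketch of what such a model could look like, the following boto3 snippet registers a JSON schema that restricts a request body to an allowed set of countries. The REST API ID, model name, and allowed values are all placeholders.

import json
import boto3

apigw = boto3.client('apigateway')

# Hypothetical schema restricting the request body to a short list of countries.
schema = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "title": "CountryRequest",
    "type": "object",
    "properties": {"Country": {"type": "string", "enum": ["France", "Germany", "Spain"]}},
    "required": ["Country"]
}

apigw.create_model(
    restApiId='abc123',  # placeholder REST API ID
    name='CountryRequest',
    contentType='application/json',
    schema=json.dumps(schema)
)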

This REST validation framework can also be used to detect header information on WebSocket browser compatibility. While using WebSocket has many advantages, not all browsers support it, especially older browsers. Therefore, a REST API request layer can pass this browser metadata and determine whether a WebSocket API can be opened.

Because a REST interface is already created for submitting the request, you can easily add another GET method if the client must query the status of the Step Functions state machine. That might be the case if a health check in the request is taking longer than expected. You can also add another GET method as an alternative access method for REST-only compatible clients.

If low-latency request and retrieval are an important characteristic of your API and there aren’t any browser-compatibility risks, use a WebSocket API with JSON model selection expressions to protect your backend with a schema.

In the spirit of picking the best tool for the job, use a REST API for the request layer and a WebSocket API to listen for the result.

This solution, although secure, is an example of how a non-polling solution can work on AWS. At scale, it may require refactoring due to cross-talk at high concurrency, which may result in client resubmissions.

To discover, deploy, and extend this solution into your own AWS environment, follow the PollToPush instructions in the AWS Serverless Application Repository.

Conclusion

When application consumers poll for long-running tasks, it can be a wasteful, detrimental, and costly use of resources. This post outlined multiple ways to refactor the polling method. Use API Gateway to host a RESTful interface, Step Functions to orchestrate your workflow, Lambda to perform backend processing, and an API Gateway WebSocket API to push results to your clients.

This Is My Architecture: Mobile Cryptocurrency Mining

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/this-is-my-architecture-mobile-cryptocurrency-mining/

In North America, approximately 95% of adults over the age of 25 have a bank account. In the developing world, that number is only about 52%. Cryptocurrencies can provide a platform for millions of unbanked people in the world to achieve financial freedom on a more level financial playing field.

Electroneum, a cryptocurrency company located in England, built its cryptocurrency mobile back end on AWS and is using the power of blockchain to unlock the global digital economy for millions of people in the developing world.

Electroneum’s cryptocurrency mobile app allows Electroneum customers in developing countries to transfer ETN, the company’s cryptocurrency, and pay for goods using their smartphones. Listen in to the discussion between AWS Solutions Architect Toby Knight and Electroneum CTO Barry Last as they explain how the company built its solution. Electroneum’s app is a web application that uses a feedback loop between its web servers and AWS WAF (a web application firewall) to automatically block malicious actors. The system then uses Athena, with a gamified approach, to provide an additional layer of blocking to prevent DDoS attacks. Finally, Electroneum built a serverless, instant payments system using AWS API Gateway, AWS Lambda, and Amazon DynamoDB to help its customers avoid the usual delays in confirming cryptocurrency transactions.

 

Amazon Cognito for Alexa Skills User Management

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/amazon-cognito-for-alexa-skills-user-management/

This post is courtesy of Tom Moore, Solutions Architect – AWS

If your Alexa skill is a general information skill, such as a random facts skill or a news feed, you can provide information to any user who has an Alexa enabled device with your skill turned on. However, sometimes you need to know who the user is before you can provide information to them. You can fulfill this user management scenario with Amazon Cognito user pools.

This blog post will show you how to set up an Amazon Cognito user pool and how to use it to perform authentication for both your Alexa skill and a webpage.

Getting started

In order to complete the steps in this blog post you will need the following:

  • An AWS account
  • An Amazon developer account
  • A basic understanding of Amazon Alexa skill development

This example will use a sample Alexa skill deployed from one of the available skill templates. To fully develop your own Alexa skill, you will need a professional code editor or IDE, as well as knowledge of Alexa skill development. It is beyond the scope of this blog post to cover these details.

Before you begin, consider the set of services that you will use and their availability. To implement this solution, you will use Amazon Cognito for user accounts and AWS Lambda for the Alexa function.

Today, AWS Lambda supports calls from Alexa in the following regions:

  • Asia Pacific (Tokyo)
  • EU (Ireland)
  • US East (N. Virginia)
  • US West (Oregon)

These four regions also support Amazon Cognito. While it is possible to use Amazon Cognito in a different region than your Lambda function, I recommend choosing one of the four listed regions to deploy your entire solution for simplicity.

Setting up Amazon Cognito

To set up Amazon Cognito, you’ll need to create a user pool, create an Alexa client, and set up your authentication UI.

Create your Amazon Cognito user pool

  1. Sign in to the Amazon Cognito console. You might be prompted for your AWS credentials.
  2. From the console navigation bar, choose one of the four regions listed above. For the purposes of this blog, I’ll use US East (N. Virginia).
  3. Choose Manage User Pools.
  4. Choose Create a user pool, and provide a name for your user pool. Remember that user pools may be used across multiple applications and platforms including web, mobile, and Alexa. The pool name does not have to be globally unique, but it should be unique in your account so you can easily find the pool when needed. I have named my user pool “Alexa Demo.”
  5. After you name your pool, choose Step through settings. You can accept the defaults for the remaining steps to set up your user pool, with the following exceptions:
    • Choose email address or phone number as the sign-in method, and then choose Allow both email addresses and phone numbers.
    • Enable Multi-Factor Authentication (MFA). You can use Amazon Cognito to enforce Multi-Factor Authentication for your users. Amazon Cognito also allows you to validate email and phone numbers when the user is created. The verification process for phone numbers requires that Amazon Cognito is able to access the Amazon Simple Notification Service (SNS) service in order to dispatch the SMS message for phone number verification. This access is granted through the use of an AWS Identity and Access Management (IAM) service role. The Amazon Cognito Setup process can automatically create this role for you.
  6. To set up Multi-Factor Authentication:
    • Under Do you want to enable Multi-Factor Authentication (MFA), choose Optional.
    • Choose SMS text message as a second authentication factor, and then choose the options you want to be verified.
    • Choose Create Role, and then choose Next Step.
    • For more information, see Adding Multi-Factor Authentication (MFA) to a User Pool. Because the verification process sends SMS messages, some costs will be incurred on your account. If you have not already done so, you will need to request a spending increase on your account to accommodate those charges. To learn more about costs for SMS messages, see SMS Text Messages MFA.
  7. Review the selections that you have made. If you are happy with the settings that you have selected, choose Create Pool.

Create the Alexa client

By completing the steps above, you will have created an Amazon Cognito user pool. The next task in setting up account linking is to create the Alexa client definition inside the Amazon Cognito user pool.

  1. From the Amazon Cognito console, choose Manage User Pools. Select the user pool you just created.
  2. From the General settings menu, choose App Clients to set up applications that will connect to your Amazon Cognito user pool.
  3. Choose Add an App Client, and provide the App client name. In this example, I have chosen “Alexa.” Leave the rest of the options set to default and choose Create App Client to generate the client record for Alexa to use. This process creates an app client ID and a secret.
    To learn more, see Configuring a User Pool App Client.

Set up your Authentication UI

Amazon Cognito can set up and manage the Authentication UI for your application so that you don’t have to host your own sign-in and sign-up UI for your Alexa application.

  1. From the App integration menu, choose Domain name.
  2. For this example, I will use an Amazon Cognito domain. Provide a subdomain name and choose Check Availability. If the option is available, choose Save Changes.

Setting up the Alexa skill

Now you can create the Alexa skill and link it back to the Amazon Cognito user pool that you created.

For step-by-step instructions for creating a new Alexa skill, see Create a New Skill in the Alexa documentation. Follow those instructions, with the following specific selections:

Under Choose a model to add to your skill, keep the default option of Custom.


Under Choose a method to host your skill’s back end resources, keep the default selection of Self Hosted.
Self Hosted

For a custom skill, you can choose a predefined skill template for the back end code for your skill. For this example, I’ll use a Fact Skill template as a starting point. The skill template prepopulates the Lambda function that your Alexa skill uses.

Fact Skill
After you create your sample skill, you’ll need to complete a few basic operations:

  • Set the invocation name of the skill
  • Prepare a Lambda function to handle the skill invocation
  • Connect the Alexa skill to your Lambda function
  • Test your skill

A full description of these steps is beyond the scope of this blog post. To learn more, see Manage Skills in the Developer Console. Once you have completed these steps, return to this post to continue linking your skill with Amazon Cognito.

Linking Alexa with Amazon Cognito

To link your Alexa skill with Amazon Cognito user pools, you’ll need to update both the Amazon Cognito and Alexa interfaces with data from the other service. I recommend that you have both interfaces open in different tabs of your web browser to make it easy to move back and forth between the two services.

  1. In Amazon Cognito, open the app pool that you created. Under General Settings, choose App Clients. Next, choose Show Details in the section for the Alexa Client that you set up earlier. Make a note of the App client ID and the App client secret. These will be needed to configure Alexa skills app linking.
  2. Switch over to your Alexa developer account and open the skill that you are linking to Amazon Cognito. Choose Account Linking.
  3. Select the option to allow users to link accounts. Leave the default option for an Auth Code Grant selected.
    The Authorization URI will be made up of the following template:

    https://{Sub-Domain}.auth.{Region}.amazoncognito.com/oauth2/authorize?response_type=code&redirect_uri=https://pitangui.amazon.com/api/skill/link/{Vendor ID}

  4. Replace the {Sub-Domain} with the sub domain that you selected when you set up your Amazon Cognito user pool. In my example, it was “mooretom-alexademo”.
  5. Replace {Vendor ID} with your specific vendor ID for your Alexa development account. The easiest way to find this is to scroll down to the bottom of the account linking page. Your Vendor ID will be the final piece of information in the Redirect URIs.
  6. Replace {Region} with the name of the region you are deploying your resources into. In my example, it was us-east-1.
  7. The Access Token URI will be made up of the following template:
    https://{Sub-Domain}.auth.{region}.amazoncognito.com/oauth2/token

  8. Enter the app client ID and the app client secret that you noted above, or return to the Amazon Cognito tab to copy and paste them.
  9. Choose Save at the top of the page. Make a note of the redirect URLs at the bottom of the page, as these will be required to finish the Amazon Cognito configuration in the next step.
  10. Switch back to your Amazon Cognito user pool. Under App Integration, choose App Client Settings. You will see the integration settings for the Alexa client in the details panel on the right.
  11. Under Enabled Identity Providers, choose Cognito User Pool.
  12. Under Callback URL(s) enter in the three callback URLs from your Alexa skill page. For example, here are all three URLs separated by commas:
    https://alexa.amazon.co.jp/api/skill/link/{Vendor ID},
    https://layla.amazon.com/api/skill/link/{Vendor ID},
    https://pitangui.amazon.com/api/skill/link/{Vendor ID}

    The Sign Out URL will follow this template:

    https://{Sub-Domain}.auth.{Region}.amazoncognito.com/logout?response_type=code

  13. Under Allowed OAuth Flows, select Authorization code grant.
  14. Under Allowed OAuth Scopes, select phone, email, and openid.
  15. Choose Save Changes.

Testing your Alexa skill

After you have linked Alexa with Amazon Cognito, return to the Alexa developer console and build your model. Then log into the Alexa application on your mobile phone and enable the skill. When the skill is enabled, you will be able to configure access and create a new user with phone number authentication included automatically.

After going through the account creation steps, you can return to your Amazon Cognito user pool and see the new user you created.

New Customer

Conclusion

By completing the steps in this post, you have leveraged Amazon Cognito as a source of authentication for your Amazon Alexa skill. Amazon Cognito provides user authentication as well as sign-in and sign-up functionality without requiring you to write any code. You can now use the Amazon Cognito user ID to personalize the user experience for your Alexa skill. You can also use Amazon Cognito to authenticate your users to a companion application or website.

Outbound Voice Calling with Amazon Pinpoint

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/outbound-voice-calling-with-amazon-pinpoint/

This post is courtesy of Tom Moore, Solutions Architect – AWS

With the recent extension of Amazon Pinpoint to allow an outgoing voice channel, customers can now build applications that include voice messaging to their users. Potential use cases include two-factor authentication via voice for your website and automated reminders of upcoming appointments. This blog post guides you through the process of setting up this functionality.

The Amazon Pinpoint voice channel allows for outbound calls only. If your use case requires additional capabilities such as an interactive voice response (IVR) system, you need to use Amazon Connect instead for your messaging.

Prerequisites

As part of this configuration, you set a default AWS Region. You should set the default Region to the Region where Amazon Pinpoint is available. Valid Regions are currently US East (N. Virginia), US West (Oregon), EU (Ireland), and EU (Frankfurt). If you have already installed and configured the AWS CLI tools and your default Region doesn’t support Amazon Pinpoint, do one of the following:

  • Run the aws configure command and change the default Region
  • Specify the --region switch on any commands that you issue

The Region that you select for the AWS CLI must be the same region you select in the AWS Management Console. To change the Region on the console, choose the down arrow next to the displayed Region (N. Virginia in the following image) and select the new Region.

Region Selecter

Services

This blog post touches on the following AWS services:

Because the code for this blog post is in NodeJS, basic familiarity with JavaScript is helpful for understanding the code and making changes to it.

Pricing

This blog post uses two features that aren’t covered under the AWS Free Tier: Amazon Pinpoint long codes (virtual phone numbers) for messaging and Amazon Pinpoint voice messaging. For pricing information for these features, see Amazon Pinpoint long code pricing and Amazon Pinpoint voice message pricing.

For example, suppose that you set up the Amazon Pinpoint application in a US Region with a single phone number and make 10 minutes of outbound calls to US phone numbers. You incur the following charges.

Item          Quantity  Unit cost  Total
Long codes    1         $1.00      $1.00
Call charges  10        $0.013     $0.13
Total                              $1.13

Creating an Amazon S3 bucket

To deploy your AWS SAM application, you need an Amazon S3 bucket to store the deployment files. When you create a bucket in your account, note the bucket name for later use, where YOUR_BUCKET appears in our code. This bucket is used for temporary storage of your AWS SAM deployments. It shouldn’t be publicly accessible.

On the Amazon S3 console, choose Create Bucket.

Create Bucket

Enter a name for the bucket. The name must conform to the Amazon S3 bucket naming requirements. Choose the Region where you will be deploying your Lambda function and using Amazon Pinpoint. Keep the rest of the defaults and choose Create.

Create Bucket Options

If you prefer, you can use the following command with the AWS CLI to create the S3 bucket in your account.

aws s3 mb s3://{Bucket Name}

Setting up Amazon Pinpoint

The first step in enabling outbound calling is to set up Amazon Pinpoint.

On the AWS Management Console, under Customer Engagement, choose Amazon Pinpoint. Enter a project name and choose Create a project.

Amazon Pinpoint

If you have already created Amazon Pinpoint projects in this Region, you get a project-list page instead of a getting-started page, as shown in the following image. On this page, choose Create a project and enter a project name.

Create a Project

Now you can select the project features that you want to enable. On the Configure features page, for SMS and voice, choose Configure.

Configure features

On the Set up SMS page, expand the Advanced configurations section and choose Request long codes.

Set up SMS

On the Long code specifications page, select the country that you want to request the long code (10-digit phone number) for. Keep the rest of the defaults and choose Request long codes.

Long Code Specifications

You’re assigned a phone number and returned to the Amazon Pinpoint configuration page. The phone number assigned to your application appears under Number settings, as shown in the following image. You can send voice messages only from a long code that your account owns.

SMS and Voice

This completes the Amazon Pinpoint setup.

Creating the application

AWS SAM provides a more streamlined process for creating serverless applications. The AWS SAM CLI also provides a convenient mechanism for packaging and deploying your serverless applications. For the code in this blog post, see Amazon Pinpoint Call Generator on GitHub. You can also deploy this application through the AWS Serverless Application Repository. For more information, see Amazon Pinpoint Call Generator.

Once you have a copy of the code, you need to make a few changes using your favorite text editor or IDE.

Modifying the template file

The template file, template.yaml, defines your AWS SAM application. Specifically, the template defines two resources: an IAM role for your serverless function and the serverless function itself.

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Serverless application to trigger outbound calls from Pinpoint.
    
Globals:
    Function:
        Timeout: 30

Resources:
  CallGeneratorFunctionIamRole: 
    Type: AWS::IAM::Role
    Properties: 
      RoleName: PinpointCallGenerator-Role
      AssumeRolePolicyDocument: 
        Version: '2012-10-17'
        Statement: 
        - Effect: Allow
          Principal: 
            Service: lambda.amazonaws.com
          Action: 
          - sts:AssumeRole
      Path: '/'
      Policies: 
      - PolicyName: logs
        PolicyDocument: 
          Statement: 
          - Effect: Allow
            Action: 
            - logs:CreateLogGroup
            - logs:CreateLogStream
            - logs:PutLogEvents
            Resource: arn:aws:logs:*:*:*
      - PolicyName: Pinpoint
        PolicyDocument: 
          Statement: 
          - Effect: Allow
            Action: 
            - sms-voice:*
            Resource: '*'

  CallGeneratorFunction:
    Type: AWS::Serverless::Function 
    Properties:
      CodeUri: src/
      Handler: app.lambda_handler
      Runtime: nodejs8.10
      FunctionName: PinpointCallGenerator
      Role: !GetAtt CallGeneratorFunctionIamRole.Arn
      Environment: 
        Variables:
          LongCode: '[YOUR_LONG_CODE_HERE]'
          Language: 'en-US' #Update this for different language
          Voice: 'Joanna'   #Update this for different voices
            
Outputs:
    CallGeneratorLambdaFunction:
      Description: "Lambda function to trigger calls"
      Value: !GetAtt CallGeneratorFunction.Arn

    CallGeneratorFunctionIamRole:
      Description: "IAM Role created for this function"
      Value: !GetAtt CallGeneratorFunctionIamRole.Arn

The CallGeneratorFunctionIamRole IAM role allows the Lambda function to create CloudWatch Logs entries for monitoring the execution of your Lambda function and to call the Amazon Pinpoint voice service.

The Environment section of the CallGeneratorFunction definition sets the environment parameters that are provided to your Lambda function. By using environment variables, you can easily change the configuration for how your application makes calls without having to update your code.

Update the LongCode parameter to the number that you reserved through Amazon Pinpoint. In Amazon Pinpoint, the number appears as +1 123-456-7890, but in the template, you can’t use spaces or punctuation in the number: +11234567890.

Optionally, you can update the Language and Voice parameters to reflect different cultures. For valid options for these parameters, see Voices in Amazon Polly.

Understanding the source file

The main source file is app.js. It contains the NodeJS code for the application.

The exports line defines a standard Lambda handler that is called from the Lambda runtime. The triggerCall function handles the call to Amazon Pinpoint asynchronously.

const AWS = require('aws-sdk');
var pinpointsmsvoice = new AWS.PinpointSMSVoice({apiVersion: '2018-09-05'});

function triggerCall (eventData) {
    return new Promise (resolve => {
        var parms = {
            Content: {
                SSMLMessage: {
                    LanguageCode : process.env.Language,
                    Text : eventData.Message,
                    VoiceId: process.env.Voice
                }
            },
            OriginationPhoneNumber: process.env.LongCode,
            DestinationPhoneNumber: eventData.PhoneNumber
        };

        console.log ("Call Parameters: ", JSON.stringify(parms));
        pinpointsmsvoice.sendVoiceMessage (parms, function (err, data) {
            if (err) {
                console.log ("Error : "+ err.message);
                resolve(eventData.PhoneNumber + " " + err.message);
            }
            else {
                console.log (data);
                resolve(eventData.PhoneNumber + " OK");
            }
        });
    });
}

exports.lambda_handler = async (event, context, callback) => {
    console.log ("In Function - lambda_handler")
    try {
        var result = await triggerCall (event);
    }
    catch (err) {
        console.log(err);
        callback(err, null);
    }
};

The parms structure defines the standard payload that is passed to Amazon Pinpoint to trigger a voice phone call. In this case, the parameters are all extracted from either the message payload or the environment variables defined in our AWS SAM template. We’re expecting the message to be passed in as a Speech Synthesis Markup Language (SSML) payload.

var parms = {
    Content: {
       SSMLMessage: {
            LanguageCode : process.env.Language,
            Text : eventData.Message,
            VoiceId: process.env.Voice
        }
    },
    OriginationPhoneNumber: process.env.LongCode,
    DestinationPhoneNumber: eventData.PhoneNumber
};

The following code sends the parameters off to Amazon Pinpoint to trigger the voice call and then resolves the asynchronous call.

pinpointsmsvoice.sendVoiceMessage (parms, function (err, data) {
    if (err) {
        console.log ("Error : "+ err.message);
        resolve(eventData.PhoneNumber + " " + err.message);
    }
    else {
        console.log (data);
        resolve(eventData.PhoneNumber + " OK");
    }
});

Packaging and deploying the application

Deploying an AWS SAM application requires the following commands.

sam validate

This command verifies that your template is valid, free from errors.

sam package --template-file template.yaml --output-template-file packaged.yaml --s3-bucket [YOUR_BUCKET]

This command packages up your resources into a zip file and uploads the resulting files to your S3 bucket in preparation for deployment. The command also creates the packaged.yaml template file, which contains the details necessary to deploy your application via AWS CloudFormation.

sam deploy --template-file packaged.yaml --stack-name pinpoint-call-generator --capabilities CAPABILITY_NAMED_IAM

This command deploys your packaged files using AWS CloudFormation.

After all commands have completed, your function is ready to test.

Testing the application

After you have deployed your application, you can test it on the Lambda console. Sign in to the AWS Management Console and then choose or search for Lambda.

On the Lambda console, choose the function’s name to open it.

Choose Your Lambda Function

On the function’s page, choose Test.

Choose Test

When you first choose Test, an editor opens. Here you can configure the payload that Lambda passes your function as part of the test call.

Configure Test Event

Replace the default text with the following.

{
    "Message" : "<speak>This is a text from <emphasis>Pinpoint</emphasis> using SSML. <break time='1s' /> I repeat. This is a text from <emphasis>Pinpoint</emphasis> using SSML.</speak>",
    "PhoneNumber" : "+11234567890"
}

The Message portion of the payload is defined in SSML. For more information about SSML, see Speech Synthesis Markup Language (SSML) Reference.

Update the PhoneNumber value with the phone number that you want to call and enter a name for your test payload. To save the configured payload to use in your tests, choose Save.

After the configuration panel closes, choose Test. Amazon Pinpoint calls your phone number and reads the message out.
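
If you prefer to trigger the function from a script rather than the console, a short boto3 (Python) sketch works as well. The phone number is a placeholder, and the function name comes from the SAM template above.

import json
import boto3

lambda_client = boto3.client('lambda')

payload = {
    "Message": "<speak>This is a test call from <emphasis>Pinpoint</emphasis>.</speak>",
    "PhoneNumber": "+11234567890"  # placeholder: the number you want to call
}

# 'PinpointCallGenerator' is the FunctionName set in the SAM template.
response = lambda_client.invoke(
    FunctionName='PinpointCallGenerator',
    Payload=json.dumps(payload).encode('utf-8')
)
print(response['StatusCode'])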

Conclusion

This blog post walked you through the basics of setting up outbound calling using Amazon Pinpoint. You can now trigger the Lambda function with any of the standard Lambda event triggers or with the AWS SDK in mobile or web applications. For example, you could provide a one-time password to users, trigger reminders for appointments, or notify someone when a file arrives in an S3 bucket.

The provided function code is intended to respond to single message-triggering events. These include application logic, files arriving in S3, or scheduled reminders. You need to make additional changes to support bulk event sources such as Amazon SQS or streaming sources such as Amazon DynamoDB streams and Amazon Kinesis. For more information about Lambda event sources, see Supported Event Sources.

If your use case requires additional resiliency, you might want to use Amazon SNS or Amazon SQS to deliver messages to Lambda. If your customers are from an international audience, you might consider passing the language and the voice through the event and updating the code to retrieve those values.

Using AWS Lambda and Amazon SNS to Get File Change Notifications from AWS CodeCommit

Post Syndicated from Eason Cao original https://aws.amazon.com/blogs/devops/using-aws-lambda-and-amazon-sns-to-get-file-change-notifications-from-aws-codecommit/

Notifications are an important part of DevOps workflows. Although you can set them up from any stage in the CI or CD pipelines, in this blog post, I will show you how to integrate AWS Lambda and Amazon SNS to extend AWS CodeCommit. Specifically, the solution described in this post makes it possible for you to receive detailed notifications from Amazon SNS about file changes and commit messages when an update is pushed to AWS CodeCommit.

Amazon SNS is a flexible, fully managed notifications service. It coordinates the delivery of messages to receivers. With Amazon SNS, you can fan out messages to a large number of subscribers, including distributed systems and services, and mobile devices. It is easy to set up, operate, and reliably send notifications to all your endpoints – at any scale.

AWS Lambda is our popular serverless service that lets you run code without provisioning or managing servers. In the example used in this post, I use a Lambda function to publish a topic through Amazon SNS to get an update notification.

Amazon CloudWatch is a monitoring and management service. It can collect operational data of AWS resources in the form of events. You can set up simple rules in Amazon CloudWatch to detect changes to your AWS resources. After CloudWatch captures the update event from your AWS resources, it can trigger specific targets to perform other actions (for example, to invoke a Lambda function).

To help you quickly deploy the solution, I have created an AWS CloudFormation template. AWS CloudFormation is a management tool that provides a common language to describe and provision all of the infrastructure resources in AWS.

 

Overview

The following diagram shows how to use AWS services to receive the CodeCommit file change event and details.

AWS CodeCommit supports several useful CloudWatch events, which can notify you of changes to AWS resources. By setting up simple rules, you can detect branch or repository changes. In this example, I create a CloudWatch event rule for an AWS CodeCommit repository so that any designated event invokes a Lambda function. When a change is made to the CodeCommit repository, CloudWatch detects the event and invokes the customized Lambda function.

When this Lambda function is triggered, the following steps are executed:

  1. Use the GetCommit operation in the CodeCommit API to get the latest commit. I want to compare the parent commit IDs with the last commit.
  2. For each commit, use the GetDifferences operation to get a list of each file that was added, modified, or deleted.
  3. Group the modification information from the comparison result and publish the message template to an SNS topic defined in the Lambda environment variable.
  4. Allow reviewers to subscribe to the SNS topic. Any update message from CodeCommit is published to subscribers.

I’ve used Python and Boto 3 to implement this function. The full source code has been published on GitHub. You can find the example in aws-codecommit-file-change-publisher repository.
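
The published function is more complete, but the core of those three steps boils down to something like the following simplified sketch; the repository name and topic ARN are supplied by the CloudFormation stack.

import boto3

codecommit = boto3.client('codecommit')
sns = boto3.client('sns')

def summarize_commit(repository_name, commit_id, topic_arn):
    # Step 1: look up the commit and its parents.
    commit = codecommit.get_commit(repositoryName=repository_name, commitId=commit_id)['commit']
    lines = ['Commit ID: ' + commit_id, 'message: ' + commit['message'].strip()]

    # Step 2: diff against each parent to list added, modified, or deleted files.
    for parent_id in commit.get('parents', []):
        diffs = codecommit.get_differences(
            repositoryName=repository_name,
            beforeCommitSpecifier=parent_id,
            afterCommitSpecifier=commit_id
        )['differences']
        for diff in diffs:
            blob = diff.get('afterBlob') or diff.get('beforeBlob')
            lines.append('File: {} {}'.format(blob['path'], diff['changeType']))

    # Step 3: publish the summary to the SNS topic for subscribers.
    sns.publish(TopicArn=topic_arn, Subject='CodeCommit update', Message='\n'.join(lines))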

 

Getting started

There is an AWS CloudFormation template, codecommit-sns-publisher.yml, in the source code. This template uses the AWS Serverless Application Model to define required components of the CodeCommit notification serverless application in simple and clean syntax.

The template is translated to an AWS CloudFormation stack and deploys an SNS topic, CloudWatch event rule, and Lambda function. The Lambda function code already demonstrates a simple notification use case. You can use the sample code to define your own logic and extend the function by using other APIs provided in the AWS SDK for Python (Boto3).

Prerequisites

Before you deploy this example, you must use the AWS CloudFormation template to create a CodeCommit repository. In this example, I have created an empty repository, sample-repo, in the Ohio (us-east-2) Region to demonstrate a scenario in which your repository has a file change or other update on a CodeCommit branch. If you already have a CodeCommit repository, follow these steps to deploy the template and Lambda function.

To deploy the AWS CloudFormation template and Lambda function

1. Download the source code from the aws-codecommit-file-change-publisher repository.

2. Sign in to the AWS Management Console and choose the AWS Region where your CodeCommit repository is located. Create an S3 bucket and then upload the AWS Lambda deployment package, codecommit-sns-publisher.zip, to it. For information, see How Do I Create an S3 Bucket? in the Amazon S3 Console User Guide.

3. Upload the Lambda deployment package to the S3 bucket.

In this example, I created an S3 bucket named codecommit-sns-publisher in the Ohio (us-east-2) Region and uploaded the deployment package from the Amazon S3 console.

4. In the AWS Management Console, choose CloudFormation. You can also open the AWS CloudFormation console directly at https://console.aws.amazon.com/cloudformation.

5. Choose Create Stack.

6. On the Select Template page, choose Upload a template to Amazon S3, and then choose the codecommit-sns-publisher.yml template.

7. Specify the following parameters:

  • Stack Name: codecommit-sns-publisher (You can use your own stack name, if you prefer.)
  • CodeS3BucketLocation: codecommit-sns-publisher (This is the S3 bucket name where you put the sample code.)
  • CodeS3KeyLocation: codecommit-sns-publisher.zip (This is the key name of the sample code S3 object. The object should be a zip file.)
  • CodeCommitRepo: sample-repo (The name of your CodeCommit repository.)
  • MainBranchName: master (Specify the branch name you would like to use as a trigger for publishing an SNS topic.)
  • NotificationEmailAddress: [email protected] (This is the email address you would like to use to subscribe to the SNS topic. The CloudFormation template creates an SNS topic to publish notifications to subscribers.)

8. Choose Next.

9. On the Review page, under Capabilities, choose the following options:

  • I acknowledge that AWS CloudFormation might create IAM resources.
  • I acknowledge that AWS CloudFormation might create IAM resources with custom names.

10. Under Transforms, choose Create Change Set. AWS CloudFormation starts to perform the template transformation and then creates a change set.

11. After the transformation, choose Execute to create the AWS CloudFormation stack.

After the stack has been created, you should receive an SNS subscription confirmation in your email account:

After you subscribe to the SNS topic, you can go to the AWS CloudFormation console and check the created AWS resources. If you would like to monitor the Lambda function, choose Resource to open the SNSPublisherFunction Lambda function.

Now, you can try to push a commit to the remote AWS CodeCommit repository.

1. Clone the CodeCommit repository to your local computer. For information, see Connect to an AWS CodeCommit Repository in the AWS CodeCommit User Guide. The following example shows how to clone a repository named sample-repo in the US East (Ohio) Region:

git clone ssh://git-codecommit.us-east-2.amazonaws.com/v1/repos/sample-repo

2. Enter the folder and create a plain text file:

cd sample-repo/
echo 'This is a sample file' > newfile

3. Add and commit this file change:

git add newfile
git commit -m 'Create initial file'

Look for this output:

[master (root-commit) 810d192] Create initial file
1 file changed, 1 insertion(+)
create mode 100644 newfile

4. Push the commit to the remote CodeCommit repository:

git push -u origin master:master

Look for this output:

Counting objects: 100% (3/3), done.
Writing objects: 100% (3/3), 235 bytes | 235.00 KiB/s, done.
…
* [new branch]      master -> master
Branch 'master' set up to track remote branch 'master' from 'origin'.

After the local commit has been pushed to the remote CodeCommit repository, the CloudWatch event detects this update. You should see the following notification message in your email account:

Commit ID: <Commit ID>
author: [YourName] ([email protected]) - <Timestamp> +0000
message: Create initial file

File: newfile Addition - Blob ID: <Blob ID>

Summary

In this blog post, I showed you how to use an AWS CloudFormation template to quickly build a sample solution that can help your operations team or development team track updates to a CodeCommit repository.

The example CloudFormation template and Lambda function can be found in the aws-codecommit-file-change-publisher GitHub repository. Using the sample code, you can customize the email content with HTML or add other information to your email message.

If you have questions or other feedback about this example, please open an issue or submit a pull request.

Deploying a personalized API Gateway serverless developer portal

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/deploying-a-personalized-api-gateway-serverless-developer-portal/

This post is courtesy of Drew Dresser, Application Architect – AWS Professional Services

Amazon API Gateway is a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale. Customers of these APIs often want a website to learn and discover APIs that are available to them. These customers might include front-end developers, third-party customers, or internal system engineers. To produce such a website, we have created the API Gateway serverless developer portal.

The API Gateway serverless developer portal (developer portal or portal, for short) is an application that you use to make your API Gateway APIs available to your customers by enabling self-service discovery of those APIs. Your customers can use the developer portal to browse API documentation, register for, and immediately receive their own API key that they can use to build applications, test published APIs, and monitor their own API usage.

Over the past few months, the team has been hard at work contributing to the open source project, available on GitHub. The developer portal was relaunched on October 29, 2018, and the team will continue to push features and take customer feedback from the open source community. Today, we’re happy to highlight some key functionality of the new portal.

Benefits for API publishers

API publishers use the developer portal to expose the APIs that they manage. As an API publisher, you need to set up, maintain, and enable the developer portal. The new portal has the following benefits:

Benefits for API consumers

API consumers use the developer portal as traditional application users. An API consumer needs to understand the APIs being published. API consumers might be front-end developers, distributed system engineers, or third-party customers. The new developer portal comes with the following benefits for API consumers:

  • Explore – API consumers can quickly page through lists of APIs. When they find one they’re interested in, they can immediately see documentation on that API.
  • Learn – API consumers might need to drill down deeper into an API to learn its details. They want to learn how to form requests and what they can expect as a response.
  • Test – Through the developer portal, API consumers can get an API key and invoke the APIs directly. This enables developers to develop faster and with more confidence.

Architecture

The developer portal is a completely serverless application. It leverages Amazon API Gateway, Amazon Cognito User Pools, AWS Lambda, Amazon DynamoDB, and Amazon S3. Serverless architectures enable you to build and run applications without needing to provision, scale, and manage any servers. The developer portal is broken down into multiple microservices, each with a distinct responsibility, as shown in the following image.

Identity management for the developer portal is performed by Amazon Cognito and a Lambda function in the Login & Registration microservice. An Amazon Cognito User Pool is configured out of the box to enable users to register and login. Additionally, you can deploy the developer portal to use a UI hosted by Amazon Cognito, which you can customize to match your style and branding.

Requests are routed to static content served from Amazon S3 and built using React. The React app communicates with the Lambda backend through API Gateway. The Lambda function is built using the aws-serverless-express library and contains the business logic behind the APIs. The business logic of the web application queries and adds data to the API Key Creation and Catalog Update microservices.

To maintain the API catalog, the Catalog Update microservice uses an S3 bucket and a Lambda function. When an API’s Swagger file is added or removed from the bucket, the Lambda function triggers and maintains the API catalog by updating the catalog.json file in the root of the S3 bucket.
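
To make the catalog maintenance concrete, here is a minimal Python sketch of what such a Lambda function could look like. This is not the project’s actual code: the bucket name is assumed to arrive through an environment variable, and the sketch assumes the exported API definitions are JSON.

import json
import os

import boto3

s3 = boto3.client("s3")
BUCKET = os.environ["ARTIFACTS_BUCKET"]  # assumed environment variable

def handler(event, context):
    # Rebuild catalog.json from the API definition files under the catalog/ prefix.
    catalog = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix="catalog/"):
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
            catalog.append(json.loads(body))  # assumes JSON exports
    s3.put_object(
        Bucket=BUCKET,
        Key="catalog.json",
        Body=json.dumps(catalog).encode("utf-8"),
        ContentType="application/json",
    )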

To manage the mapping between API keys and customers, the application uses the API Key Creation microservice. The service updates API Gateway with API key creations or deletions and then stores the results in a DynamoDB table that maps customers to API keys.
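
As a rough sketch of that flow (not the portal’s actual implementation, which is written in Node.js), the boto3 calls involved look something like the following. The table name and usage plan ID are placeholders.

import boto3

apigw = boto3.client("apigateway")
table = boto3.resource("dynamodb").Table("CustomersToApiKeys")  # hypothetical table name

def create_key_for_customer(customer_id: str) -> str:
    # Create an API key, attach it to the shared usage plan, and record the mapping.
    key = apigw.create_api_key(name=customer_id, enabled=True)
    apigw.create_usage_plan_key(
        usagePlanId="abc123",  # hypothetical usage plan ID
        keyId=key["id"],
        keyType="API_KEY",
    )
    table.put_item(Item={"CustomerId": customer_id, "ApiKeyId": key["id"]})
    return key["value"]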

Deploying the developer portal

You can deploy the developer portal using AWS SAM, the AWS SAM CLI, or the AWS Serverless Application Repository. To deploy with AWS SAM, you can simply clone the repository and then deploy the application using two commands from your CLI. For detailed instructions for getting started with the portal, see Use the Serverless Developer Portal to Catalog Your API Gateway APIs in the Amazon API Gateway Developer Guide.

Alternatively, you can deploy using the AWS Serverless Application Repository as follows:

  1. Navigate to the api-gateway-dev-portal application and choose Deploy in the top right.
  2. On the Review page, for ArtifactsS3BucketName and DevPortalSiteS3BucketName, enter globally unique names. Both buckets are created for you.
  3. To deploy the application with these settings, choose Deploy.
  4. After the stack is complete, get the developer portal URL by choosing View CloudFormation Stack. Under Outputs, choose the URL in Value.

The URL opens in your browser.

You now have your own serverless developer portal application that is deployed and ready to use.

Publishing a new API

With the developer portal application deployed, you can publish your own API to the portal.

To get started:

  1. Create the PetStore API, which is available as a sample API in Amazon API Gateway. The API must be created and deployed and include a stage.
  2. Create a Usage Plan, which is required so that API consumers can create API keys in the developer portal. The API key is used to test actual API calls.
  3. On the API Gateway console, navigate to the Stages section of your API.
  4. Choose Export.
  5. For Export as Swagger + API Gateway Extensions, choose JSON. Save the file with the following format: apiId_stageName.json.
  6. Upload the file to the S3 bucket dedicated for artifacts in the catalog path. In this post, the bucket is named apigw-dev-portal-artifacts. To perform the upload, run the following command.
    aws s3 cp apiId_stageName.json s3://yourBucketName/catalog/apiId_stageName.json

Uploading the file to the artifacts bucket with a catalog/ key prefix automatically makes it appear in the developer portal.

This might be familiar. It’s your PetStore API documentation displayed in the OpenAPI format.

With an API deployed, you’re ready to customize the portal’s look and feel.

Customizing the developer portal

Adding a customer’s own look and feel to the developer portal is easy, and it creates a great user experience. You can customize the domain name, text, logo, and styling. For a more thorough walkthrough of customizable components, see Customization in the GitHub project.

Let’s walk through a few customizations to make your developer portal more familiar to your API consumers.

Customizing the logo and images

To customize logos, images, or content, you need to modify the contents of the your-prefix-portal-static-assets S3 bucket. You can edit files using the CLI or the AWS Management Console.

Start customizing the portal by using the console to upload a new logo in the navigation bar.

  1. Upload the new logo to your bucket with a key named custom-content/nav-logo.png.
    aws s3 cp {myLogo}.png s3://yourPrefix-portal-static-assets/custom-content/nav-logo.png
  2. Modify object permissions so that the file is readable by everyone because it’s a publicly available image. The new navigation bar looks something like this:

Another neat customization that you can make is to a particular API and stage image. Maybe you want your PetStore API to have a dog picture to represent the friendliness of the API. To add an image:

  1. Use the command line to copy the image directly to the S3 bucket location.
    aws s3 cp my-image.png s3://yourPrefix-portal-static-assets/custom-content/api-logos/apiId-stageName.png
  2. Modify object permissions so that the file is readable by everyone.

Customizing the text

Next, make sure that the text of the developer portal welcomes your pet-friendly customer base. The YAML files in the static assets bucket under /custom-content/content-fragments/ determine the portal’s text content.

To edit the text:

  1. On the AWS Management Console, navigate to the website content S3 bucket and then navigate to /custom-content/content-fragments/.
  2. Home.md is the content displayed on the home page, APIs.md controls the tab text on the navigation bar, and GettingStarted.md contains the content of the Getting Started tab. All three files are written in markdown. Download one of them to your local machine so that you can edit the contents. The following image shows Home.md edited to contain custom text:
  3. After editing and saving the file, upload it back to S3, which results in a customized home page. The following image reflects the configuration changes in Home.md from the previous step:

Customizing the domain name

Finally, many customers want to give the portal a domain name that they own and control.

To customize the domain name:

  1. Use AWS Certificate Manager to request and verify a managed certificate for your custom domain name. For more information, see Request a Public Certificate in the AWS Certificate Manager User Guide.
  2. Copy the Amazon Resource Name (ARN) so that you can pass it to the developer portal deployment process. That process now includes the certificate ARN and a property named UseRoute53Nameservers. If the property is set to true, the template creates a hosted zone and record set in Amazon Route 53 for you. If the property is set to false, the template expects you to use your own name server hosting.
  3. If you deployed using the AWS Serverless Application Repository, navigate to the Application page and deploy the application along with the certificate ARN.

After the developer portal is deployed and your CNAME record has been added, the website is accessible from the custom domain name as well as the new Amazon CloudFront URL.

Customizing the logo, text content, and domain name are great tools to make the developer portal feel like an internally developed application. In this walkthrough, you completely changed the portal’s appearance to enable developers and API consumers to discover and browse APIs.

Conclusion

The developer portal is available to use right away. Support and feature enhancements are tracked in the public GitHub repository. You can contribute to the project by following the Code of Conduct and Contributing guides. The project is open-sourced under the Amazon Open Source Code of Conduct. We plan to continue to add functionality and listen to customer feedback. We can’t wait to see what customers build with API Gateway and the API Gateway serverless developer portal.

Working with AWS Lambda and Lambda Layers in AWS SAM

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/working-with-aws-lambda-and-lambda-layers-in-aws-sam/

The introduction of serverless technology has enabled developers to shed the burden of managing infrastructure and concentrate on their application code. AWS Lambda has taken on that management by providing isolated, event-driven compute environments for the execution of application code. To use a Lambda function, a developer just needs to package their code and any dependencies into a zip file and upload that file to AWS. However, as serverless applications get larger and more functions are required for those applications, there is a need for the ability to share code across multiple functions within the application.

To meet this need, AWS released Lambda layers, providing a mechanism to externally package dependencies that can be shared across multiple Lambda functions. Lambda layers reduce lines of code and the size of application artifacts, and simplify dependency management. Along with the release of Lambda layers, AWS also released support for layers in the AWS Serverless Application Model (SAM) and the AWS SAM command line interface (CLI). SAM is a template specification that enables developers to define a serverless application in clean and simple syntax. The SAM CLI is a command line tool that operates on SAM templates and application code. SAM can now define Lambda layers with the AWS::Serverless::LayerVersion type. The SAM CLI can build and test your layers locally as well as package, deploy, and publish your layers for public consumption.

How layers work

To understand how SAM CLI supports layers, you need to understand how layers work on AWS. When a Lambda function configured with a Lambda layer is executed, AWS downloads any specified layers and extracts them to the /opt directory on the function execution environment. Each runtime then looks for a language-specific folder under the /opt directory.
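
For illustration only (this is not code from the walkthrough below), a minimal Python handler can show what the configured layers extracted under /opt:

import os

def handler(event, context):
    # List everything the configured layers placed under /opt.
    listing = []
    for root, _dirs, files in os.walk("/opt"):
        for name in files:
            listing.append(os.path.join(root, name))
    # A Python dependency layer typically appears under /opt/python;
    # a Node.js dependency layer appears under /opt/nodejs/node_modules.
    print(listing)
    return {"statusCode": 200, "body": "\n".join(listing)}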

Lambda layers can come from multiple sources. You can create and upload your own layers for sharing, you can implement an AWS managed layer such as SciPy, or you can grab a third-party layer from an APN Partner or another trusted developer. The following image shows how layers work with multiple sources.

AWS Lambda Layers diagram

How layers work in the AWS SAM CLI

To support Lambda layers, SAM CLI replicates the AWS layer process locally by downloading all associated layers and caching them on your development machine. This happens the first time you run sam local invoke or the first time you execute your Lambda functions using sam local start-lambda or sam local start-api.

Two specific flags in SAM CLI are helpful when you’re working with Lambda layers locally. To specify where the layer cache should be located, pass the --layer-cache-basedir flag, followed by your desired cache directory. To force SAM CLI to rebuild the layer cache, pass the --force-image-build flag.

Time for some code

Now you’re going to create a simple application that does some temperature conversions using a simple library named temp-units-conv. After the app is running, you move the dependencies to Lambda layers using SAM. Finally, you add a layer managed by an AWS Partner Network Partner, Epsagon, to enhance the monitoring of the Lambda function.

Creating a serverless application

To create a serverless application, use the SAM CLI. If you don’t have SAM CLI installed, see Installing the AWS SAM CLI in the AWS Serverless Application Model Developer Guide.

  1. To initialize a new application, run the following command.
    $ sam init -r nodejs8.10

    This creates a simple node application under the directory sam-app that looks like this.

    $ tree sam-app
    sam-app
    ├── README.md
    ├── hello-world
    │   ├── app.js
    │   ├── package.json
    │   └── tests
    └── template.yaml

    The template.yaml file is a SAM template describing the infrastructure for the application, and the app.js file contains the application code.

  2. To install the dependencies for the application, run the following command from within the sam-app/hello-world directory.
    $ npm install temp-units-conv
  3. The application is going to perform temperature scale conversions for Celsius, Fahrenheit, and Kelvin using the following code. In a code editor, open the file sam-app/hello-world/app.js and replace its contents with the following.
    const tuc = require('temp-units-conv');
    let response;
    
    const scales = {
        c: "celsius",
        f: "fahrenheit",
        k: "kelvin"
    }
    
    exports.lambdaHandler = async (event) => {
        let conversion = event.pathParameters.conversion
        let originalValue = event.pathParameters.value
        let answer = tuc[conversion](originalValue)
        try {
            response = {
                'statusCode': 200,
                'body': JSON.stringify({
                    source: scales[conversion[0]],
                    target: scales[conversion[2]],
                    original: originalValue,
                    answer: answer
                })
            }
        } catch (err) {
            console.log(err);
            return err;
        }
    
        return response
    };
  4. Update the SAM template. Open the sam-app/template.yaml file. Replace the contents with the following. This is a YAML file, so spacing and indentation is important.
    AWSTemplateFormatVersion: '2010-09-09'
    Transform: AWS::Serverless-2016-10-31
    Description: sam app
    Globals:
        Function:
            Timeout: 3
            Runtime: nodejs8.10
    
    Resources:
        TempConversionFunction:
            Type: AWS::Serverless::Function 
            Properties:
                CodeUri: hello-world/
                Handler: app.lambdaHandler
                Events:
                    HelloWorld:
                        Type: Api
                        Properties:
                            Path: /{conversion}/{value}
                            Method: get

    This change dropped out some comments and output parameters and updated the function resource to TempConversionFunction. The primary change is the Path: /{conversion}/{value} line. This enables you to use path mapping for our conversion type and value.

  5. Okay, now you have a simple app that does temperature conversions. Time to spin it up and make sure that it works. In the sam-app directory, run the following command.
    $ sam local start-api

  6. Using curl or your browser, navigate to the address output by the previous command with a conversion and value attached. For reference, c = Celsius, f = Fahrenheit, and k = Kelvin. Use the pattern c2f/ followed by the temperature that you want to convert.
    $ curl http://127.0.0.1:3000/f2c/45
    {"source":"fahrenheit","target":"celsius","original":"45","answer":7.222222222222222}

Deploying the application

Now that you have a working application that you have tested locally, deploy it to AWS. This enables the application to run on Lambda, providing a public endpoint that you can share with others.

  1. Create a resource bucket. The resource bucket gives you a place to upload the application so that AWS CloudFormation can access it when you run the deploy process. Run the following command to create a resource bucket.
    $ aws s3api create-bucket --bucket <your unique bucket name>
  2. Use SAM to package the application. From the sam-app directory, run the following command.
    $ sam package --template-file template.yaml --s3-bucket <your bucket> --output-template-file out.yaml
  3. Now you can use SAM to deploy the application. Run the following command from the sam-app folder.
    $ sam deploy --template-file ./out.yaml --stack-name <your stack name> --capabilities CAPABILITY_IAM

Sign in to the AWS Management Console. Navigate to the Lambda console to find your function.

Lambda Console

Now that the application is deployed, you can access it via the API endpoint. In the Lambda console, click on the API Gateway option and scroll down. You will find a link to your API Gateway endpoint.

API Gateway Endpoint

Using that value, you can test the live application. Your endpoint will be different from the one in the following image.

Live Demo

Let’s take a moment to talk through the structure of our new application. Because you installed temp-units-conv, there is a dependency folder named sam-app/hello-world/node_modules that you need to include when you upload the application.

$ tree sam-app
sam-app
├── README.md
├── hello-world
│   ├── app.js
│   ├── node_modules
│   ├── package-lock.json
│   ├── package.json
│   └── tests
└── template.yaml

As a Node.js user, you could use something like webpack to minimize your uploads. However, this requires a processing step to pack your code, and it still forces you to upload unchanging, static code on every update. To simplify this, create a layer to separate the dependencies from the application code.

Creating a layer

To create and manage the dependency layer, you need to update your directory structure a bit.

$ tree sam-app
sam-app
├── README.md
├── dependencies
│   └── nodejs
│       └── package.json
├── hello-world
│   ├── app.js
│   └── tests
│       └── unit
└── template.yaml

In the root, create a new directory named dependencies. Under that directory, create a second directory named nodejs. This is the structure required for layers to be injected into a Lambda function. Next, move the package.json file from the hello-world directory to the dependencies/nodejs directory. Finally, clean up the hello-world directory by deleting the node_modules folder and the package-lock.json file.

Before you gather your dependencies, edit the sam-app/dependencies/nodejs/package.json file. Replace the entire contents with the following.

{
  "dependencies": {
    "temp-units-conv": "^1.0.2"
  }
}

Now that you have the package file cleaned up, install the required packages into the dependencies directory. From the sam-app/dependencies/nodejs directory, run the following command.

$ npm install

You now have a node_modules directory under the nodejs directory. With this structure in place, you have everything you need to create your first layer using SAM.

The next step is to update the AWS SAM template. Replace the contents of your sam-app/template.yaml file with the following.

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: sam app
Globals:
    Function:
        Timeout: 3
        Runtime: nodejs8.10

Resources:
    TempConversionFunction:
        Type: AWS::Serverless::Function 
        Properties:
            CodeUri: hello-world/
            Handler: app.lambdaHandler
            Layers:
              - !Ref TempConversionDepLayer
            Events:
                HelloWorld:
                    Type: Api
                    Properties:
                        Path: /{conversion}/{value}
                        Method: get

    TempConversionDepLayer:
        Type: AWS::Serverless::LayerVersion
        Properties:
            LayerName: sam-app-dependencies
            Description: Dependencies for sam app [temp-units-conv]
            ContentUri: dependencies/
            CompatibleRuntimes:
              - nodejs6.10
              - nodejs8.10
            LicenseInfo: 'MIT'
            RetentionPolicy: Retain

There are two changes to the template. The first is a new resource named TempConversionDepLayer, which defines the new layer and points to the dependencies folder as the code source for the layer. The second is the addition of the Layers parameter in the TempConversionFunction resource. The single layer entry references the layer that is being created in the template file.

With that final change, you have separated the application code from the dependencies. Try out the application and see if it still works. From the sam-app directory, run the following command.

$ sam local start-api

If all went well, you can open your browser back up and try another conversion.

One other thing to note here is that you didn’t have to change your application code at all. As far as the application is concerned, nothing has changed. However, under the hood, SAM CLI is doing a bit of magic. SAM CLI is creating an image of the layer and caching it locally. It then makes that layer available in the /opt directory on the container being used to execute the Lambda function locally.

Using layers from APN Partners and third parties

So far, you have used a layer of your own creation. Now you’re going to branch out and add a layer managed by an APN Partner, Epsagon, which provides a tool to help you monitor and troubleshoot your serverless applications. If you want to try this demo, you can sign up for a free trial on their website. After you create an account, you need to get the Epsagon token from the Settings page of your dashboard.

Epsagon Settings

  1. Add Epsagon layer reference. Edit the sam-app/template.yaml file. Update the Layers section of the TempConversionFunction resource to the following.
    Layers:
      - !Ref TempConversionDepLayer
      - arn:aws:lambda:us-east-1:066549572091:layer:epsagon-node-layer:1
    

    Note: This demo uses us-east-1 for the AWS Region. If you plan to deploy your Lambda function to a different Region, update the Epsagon LayerVersion Amazon Resource Name (ARN) accordingly. For more information, see the Epsagon blog post on layers.

  2. To use the Epsagon library in our code, you need to add or modify nine lines of code. You reference and initialize the library, wrap the handler with the Epsagon library, and modify the output. Open the sam-app/hello-world/app.js file and replace the entire contents with the following. The changed lines are the epsagon require, the init call, the lambdaWrapper around the handler, and the callback at the end. Be sure to update 1122334455 with your token from Epsagon.
    const tuc = require('temp-units-conv');
    const epsagon = require('epsagon');
    epsagon.init({
        token: '1122334455',
        appName: 'layer-demo-app',
        metadataOnly: false, // Optional, send more trace data
    });
    
    let response;
    
    const scales = {
        c: "celsius",
        f: "fahrenheit",
        k: "kelvin"
    }
    
    exports.lambdaHandler = epsagon.lambdaWrapper((event, context, callback) => {
        let conversion = event.pathParameters.conversion
        let originalValue = event.pathParameters.value
        let answer = tuc[conversion](originalValue)
        try {
            response = {
                'statusCode': 200,
                'body': JSON.stringify({
                    source: scales[conversion[0]],
                    target: scales[conversion[2]],
                    original: originalValue,
                    answer: answer
                })
            }
        } catch (err) {
            console.log(err);
            return err;
        }
    
        callback(null, response)
    });

Test the change to make sure that everything still works. From the sam-app directory, run the following command.

$ sam local start-api

Use curl to test your code.

$ curl http://127.0.0.1:3000/k2c/100

Your answer should be the following.

{"source":"kelvin","target":"celsius","original":"100","answer":-173.14999999999998}

Your Epsagon dashboard should display traces from your Lambda function, as shown in the following image.

Epsagon Dashboard

Deploying the application with layers

Now that you have a functioning application that uses Lambda layers, you can package and deploy it.

  1. To package the application, run the following command.
    $ sam package --template-file template.yaml --s3-bucket <your bucket> --output-template-file out.yaml
  2. To deploy the application, run the following command.
    $ sam deploy --template-file ./out.yaml --stack-name <your stack name> --capabilities CAPABILITY_IAM

The Lambda console for your function has updated, as shown in the following image.

Lambda Console

Also, notice that the dependency code is no longer part of the function’s deployment package shown in the Lambda console.

Lambda Console Code

There you have it! You just deployed your Lambda function and your dependencies layer for that function. It’s important to note that you did not publish the Epsagon layer. You just told AWS to grab their layer and extract it into your function’s execution environment. The following image shows the flow of this process.

Epsagon Layer

Options for managing layers

You have several options for managing your layers through AWS SAM.

First, following the pattern you just walked through releases a new version of your layer each time you deploy your application. If you remember, one of the advantages of using layers is not having to upload the dependencies each time. One option to avoid this is to keep your dependencies in a separate template and deploy them only when the dependencies have changed.

Second, this pattern always uses the latest build of dependencies. If for some reason you want to get a specific version of your dependencies, you can. After you deploy the application for the first time, you can run the following command.

$ aws lambda list-layer-versions --layer-name sam-app-dependencies

You should see a response like the following.

{
    "LayerVersions": [
        {
            "LayerVersionArn": "arn:aws:lambda:us-east-1:5555555:layer:sam-app-dependencies:1",
            "Version": 1,
            "Description": "Dependencies for sam app",
            "CreatedDate": "2019-01-08T18:04:51.833+0000",
            "CompatibleRuntimes": [
                "nodejs6.10",
                "nodejs8.10"
            ],
            "LicenseInfo": "MIT"
        }
    ]
}

The critical information here is the LayerVersionArn. Returning to the sam-app/template.yaml file, you can change the Layers section of the TempConversionFunction resource to use this version.

Layers:
  - arn:aws:lambda:us-east-1:5555555:layer:sam-app-dependencies:1
  - arn:aws:lambda:us-east-1:066549572091:layer:epsagon-node-layer:1

Conclusion

This blog post demonstrates how AWS SAM can manage Lambda layers via AWS SAM templates. It also demonstrates how the AWS SAM CLI creates a local development environment that provides layer support without any changes in the application.

Developers are often taught to think of code in an object-oriented manner and to code in a DRY way (don’t repeat yourself). For developers of serverless applications, these practices remain true. As serverless applications grow in size and require more Lambda functions, using Lambda layers provides an efficient mechanism to reuse code and libraries. Layers also reduce the size of your upload packages, making iterations faster.

We’re excited to see what you do with layers and to hear how AWS SAM is helping you. As always, we welcome your feedback.

Now go code!

Handling AWS Chargebacks for Enterprise Customers

Post Syndicated from Varad Ram original https://aws.amazon.com/blogs/architecture/handling-aws-chargebacks-for-enterprise-customers/

As AWS product portfolios and feature sets grow, as an enterprise customer, you are likely to migrate your existing workloads and innovate your new products on AWS. To help you keep your cloud charges simple, you can use consolidated billing. This can, however, create complexity for your internal chargebacks, especially if some of your resources and services are not tagged correctly. To help your individual teams and business units normalize and reduce their costs as your AWS implementation grows, you can implement chargebacks transparently and automate billing.

This blog post includes a walkthrough of an end-to-end mechanism that you can use to automate your consolidated billing charges for either your existing AWS accounts, or for newly created accounts.

Walkthrough

Prerequisites for implementation:

  • One account that is the payer account, which consolidates billing and links all other accounts (including admin accounts)
  • An understanding of billing, Detailed Billing Report (DBR), Cost and Usage Report (CUR), and blended and unblended costs
  • Activate propagation of necessary cost allocation tags to consolidated billing
  • Access to reservations across the linked accounts
  • Read permission on the source bucket and write permission to the transformed bucket
  • An automated method (such as database access or an API) to verify the cost centers tagged to AWS resources
  • Permissions to get access to the services described in this solution on the account targeted for this automation

Before you begin, it is important to understand the blended costs and unblended costs in consolidated billing. Blended costs are calculated based on the blended rate (the average rates for the reserved and on-demand instances that are used by your member accounts) for each service your accounts used, multiplied by the account usage of those services. Unblended costs are the charges for those services broken out for each linked account.
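
To make the difference concrete, here is a small worked example in Python with purely hypothetical numbers.

# Hypothetical example: two linked accounts using the same instance type.
# Account A used 100 hours of reserved capacity at $0.06/hour;
# account B used 50 hours on demand at $0.10/hour.
reserved_hours, reserved_rate = 100, 0.06
on_demand_hours, on_demand_rate = 50, 0.10

total_cost = reserved_hours * reserved_rate + on_demand_hours * on_demand_rate  # 11.00
total_hours = reserved_hours + on_demand_hours                                  # 150
blended_rate = total_cost / total_hours                                          # ~0.0733/hour

# Blended cost charges each account at the average rate ...
blended_cost_account_b = on_demand_hours * blended_rate        # ~3.67
# ... while unblended cost is the account's actual charge.
unblended_cost_account_b = on_demand_hours * on_demand_rate    # 5.00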

Based on your organization’s strategy for savings (centralized or not), you could consider either the blended or unblended costs. The consolidated billing files that include the information for the chargeback are the Detailed Billing Report (DBR) and Cost and Usage Report (CUR). Both of these reports provide both the blended and unblended rates as separate columns.

To help you create and maintain your AWS accounts, you can use AWS Account Vending Machine (AVM). You can launch AVM from either the AWS Landing Zone or with a custom solution. AVM keeps all your account information in a DynamoDB table (such as the account number, root mail ID, default cost center, name of the owner, etc.) and maintains reservation-related data (such as invoice ID, instance type, region, amount, cost center, etc.) in another table. To enable your account administrator to add invoice details for all your reservations, you can use a web page hosted on AWS Lambda, Amazon Simple Storage Service (Amazon S3), or a web server.

To begin the billing transformation, add a trigger on the S3 bucket that contains the raw AWS billing files so that PutObject events push messages into Amazon Simple Queue Service (SQS). A billing transformation program (written in Python, Node.js, Java, .NET, and so on, using an AWS SDK) then consumes those messages; it can run on an Amazon Elastic Compute Cloud (Amazon EC2) instance, in containers, or on Lambda (if the bill can be processed within 15 minutes, given Lambda's file size restrictions).

The billing transformation program must do the following (a simplified Python sketch follows this list):

  • Cache the Account details and reservation DynamoDB tables
  • Verify if there are any messages in SQS
  • Ignore if the file is not a DBR or CUR file (process either of them, not both)
  • Download the file, unzip, and read row-by-row; for a DBR file, consider only the “LineItem” RecordType
  • Add two new columns: Bill_CostCenter and Bill_Notes
    • If there is a valid value in the CostCenter tag (verified with internal automation processes), add the same value to the Bill_CostCenter column and any notes to the Bill_Notes column
    • If the CostCenter is invalid, get the default Cost Center from the cached account details and add the information to the Bill_CostCenter and Bill_Notes columns
    • If the row is a reservation invoice, the cost center information comes from the reservation table and is added to the correct column
  • Aggregate the blended or unblended cost of each row by cost center, caching the running totals
  • Write each of these processed line items into a new file
  • Handle exceptions by the normal organization practices (for example, email the owner of the cost center or the finance team)
  • Push the new file into the transformed Amazon S3 bucket
  • Write the consolidated lines into a separate file and upload it to the transformed Amazon S3 bucket
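
The following is a simplified Python sketch of such a transformation step for a single gzipped DBR file delivered through an S3 event message. The bucket name, the cost-center validation, and the account-default lookup are placeholders for your own implementation.

import csv
import gzip
import io
import json

import boto3

s3 = boto3.client("s3")
TRANSFORMED_BUCKET = "transformed-billing-bucket"  # placeholder name

def is_valid_cost_center(code):
    # Placeholder for your own check (database lookup or internal API call).
    return bool(code)

def resolve_cost_center(row, account_defaults):
    # Return (cost_center, note); fall back to the account default when the tag is invalid.
    tagged = row.get("user:CostCenter", "")
    if is_valid_cost_center(tagged):
        return tagged, ""
    default = account_defaults.get(row.get("LinkedAccountId", ""), "UNALLOCATED")
    return default, "CostCenter tag missing or invalid; used account default"

def process_billing_file(sqs_message, account_defaults):
    # The SQS message body carries the S3 PutObject event for a gzipped DBR file.
    record = json.loads(sqs_message["Body"])["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]
    raw = gzip.decompress(s3.get_object(Bucket=bucket, Key=key)["Body"].read()).decode("utf-8")

    reader = csv.DictReader(io.StringIO(raw))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(reader.fieldnames) + ["Bill_CostCenter", "Bill_Notes"])
    writer.writeheader()
    for row in reader:
        if row.get("RecordType") != "LineItem":  # DBR files: keep only line items
            continue
        row["Bill_CostCenter"], row["Bill_Notes"] = resolve_cost_center(row, account_defaults)
        writer.writerow(row)

    s3.put_object(Bucket=TRANSFORMED_BUCKET, Key=key, Body=out.getvalue().encode("utf-8"))
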
Figure 1 – Architecture of processing a billing chargeback

Figure 2 – Validating the Cost Center process

After you have the consolidated billing file aggregated by cost center, you can easily see and handle your internal chargebacks. To further simplify your chargeback model, you can get help from AWS Technical Account Managers and Billing Concierge, if your organization would like AWS to provide custom invoices from the consolidated billing file.

Because the cost centers in your organization can expire over time, it’s important to validate them frequently with automation, such as a Lambda function.

Improvements

If your organization has a more complex chargeback structure, you can extend the logic described above to support deeper and broader chargeback codes, or implement a hierarchical chargeback structure.

You can also extend the transformation logic to support several chargeback codes (such as comma-separated values or additional tags) if you have multiple teams or projects that want to share a resource.

Summary

As enterprise organizations grow and consume more cloud services, the cost optimization process grows and evolves with them. Sophisticated chargeback models enable the teams and business units in the organization to be accountable and to take the steps necessary to normalize their usage and costs of AWS services.

About the Author

Varad Ram likes to help customers adopt cloud technologies, and he is particularly interested in artificial intelligence. He believes deep learning will power future technology growth. In his spare time, his daughter and toddler son keep him busy biking and hiking.

Our data lake story: How Woot.com built a serverless data lake on AWS

Post Syndicated from Karthik Kumar Odapally original https://aws.amazon.com/blogs/big-data/our-data-lake-story-how-woot-com-built-a-serverless-data-lake-on-aws/

In this post, we talk about designing a cloud-native data warehouse as a replacement for our legacy data warehouse built on a relational database.

At the beginning of the design process, the simplest solution appeared to be a straightforward lift-and-shift migration from one relational database to another. However, we decided to step back and focus first on what we really needed out of a data warehouse. We started looking at how we could decouple our legacy Oracle database into smaller microservices, using the right tool for the right job. Our process wasn’t just about using the AWS tools. More, it was about having a mind shift to use cloud-native technologies to get us to our final state.

This migration required developing new extract, transform, load (ETL) pipelines to get new data flowing in while also migrating existing data. Because of this migration, we were able to deprecate multiple servers and move to a fully serverless data warehouse orchestrated by AWS Glue.

In this blog post, we are going to show you:

  • Why we chose a serverless data lake for our data warehouse.
  • An architectural diagram of Woot’s systems.
  • An overview of the migration project.
  • Our migration results.

Architectural and design concerns

Here are some of the design points that we considered:

  • Customer experience. We always start with what our customer needs, and then work backwards from there. Our data warehouse is used across the business by people with varying levels of technical expertise. We focused on the ability for different types of users to gain insights into their operations and to provide better feedback mechanisms to improve the overall customer experience.
  • Minimal infrastructure maintenance. The “Woot data warehouse team” is really just one person—Chaya! Because of this, it’s important for us to focus on AWS services that enable us to use cloud-native technologies. These remove the undifferentiated heavy lifting of managing infrastructure as demand changes and technologies evolve.
  • Responsiveness to data source changes. Our data warehouse gets data from a range of internal services. In our existing data warehouse, any updates to those services required manual updates to ETL jobs and tables. The response times for these data sources are critical to our key stakeholders. This requires us to take a data-driven approach to selecting a high-performance architecture.
  • Separation from production systems. Access to our production systems is tightly coupled. To allow multiple users, we needed to decouple it from our production systems and minimize the complexities of navigating resources in multiple VPCs.

Based on these requirements, we decided to change the data warehouse both operationally and architecturally. From an operational standpoint, we designed a new shared responsibility model for data ingestion. Architecturally, we chose a serverless model over a traditional relational database. These two decisions ended up driving every design and implementation decision that we made in our migration.

As we moved to a shared responsibility model, several important points came up. First, our new way of data ingestion was a major cultural shift for Woot’s technical organization. In the past, data ingestion had been exclusively the responsibility of the data warehouse team and required customized pipelines to pull data from services. We decided to shift to “push, not pull”: Services should send data to the data warehouse.

This is where shared responsibility came in. For the first time, our development teams had ownership over their services’ data in the data warehouse. However, we didn’t want our developers to have to become mini data engineers. Instead, we had to give them an easy way to push data that fit with the existing skill set of a developer. The data also needed to be accessible by the range of technologies used by our website.

These considerations led us to select the following AWS services for our serverless data warehouse:

The following diagram shows at a high level how we use these services.

Tradeoffs

These components together met all of our requirements and enabled our shared responsibility model. However, we made few tradeoffs compared to a lift-and-shift migration to another relational database:

  • The biggest tradeoff was upfront effort vs. ongoing maintenance. We effectively had to start from scratch with all of our data pipelines and introduce a new technology into all of our website services, which required a concerted effort across multiple teams. Minimal ongoing maintenance was a core requirement. We were willing to make this tradeoff to take advantage of the managed infrastructure of the serverless components that we use.
  • Another tradeoff was balancing usability for nontechnical users vs. taking advantage of big data technologies. Making customer experience a core requirement helped us navigate the decision-making when considering these tradeoffs. Ultimately, only switching to another relational database would mean that our customers would have the same experience, not a better one.

Building data pipelines with Kinesis Data Firehose and Lambda

Because our site already runs on AWS, using an AWS SDK to send data to Kinesis Data Firehose was an easy sell to developers. Things like the following were considerations:

  • Direct PUT ingestion for Kinesis Data Firehose is natural for developers to implement, works in all languages used across our services, and delivers data to Amazon S3.
  • Using S3 for data storage means that we automatically get high availability, scalability, and durability. And because S3 is a global resource, it enables us to manage the data warehouse in a separate AWS account and avoid the complexity of navigating multiple VPCs.

We also consume data stored in Amazon DynamoDB tables. Kinesis Data Firehose again provided the core of the solution, this time combined with DynamoDB Streams and Lambda. For each DynamoDB table, we enabled DynamoDB Streams and then used the stream to trigger a Lambda function.

The Lambda function cleans the DynamoDB stream output and writes the cleaned JSON to Kinesis Data Firehose using boto3. After doing this, it converges with the other process and outputs the data to S3. For more information, see How to Stream Data from Amazon DynamoDB to Amazon Aurora using AWS Lambda and Amazon Kinesis Firehose on the AWS Database Blog.
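
A minimal sketch of such a function might look like the following; the delivery stream name is an assumption, and the real function may do more cleanup than simply flattening the DynamoDB attribute types.

import json

import boto3
from boto3.dynamodb.types import TypeDeserializer

firehose = boto3.client("firehose")
deserializer = TypeDeserializer()
STREAM_NAME = "dw-delivery-stream"  # hypothetical delivery stream name

def handler(event, context):
    # Flatten DynamoDB stream records into plain JSON and push them to Kinesis Data Firehose.
    records = []
    for record in event["Records"]:
        image = record["dynamodb"].get("NewImage")
        if not image:
            continue  # skip REMOVE events that carry no new image
        clean = {k: deserializer.deserialize(v) for k, v in image.items()}
        records.append({"Data": (json.dumps(clean, default=str) + "\n").encode("utf-8")})
    if records:
        # Streams batches are small enough here; larger batches would need chunking to 500 records.
        firehose.put_record_batch(DeliveryStreamName=STREAM_NAME, Records=records)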

Lambda gave us more fine-grained control and enabled us to move files between accounts:

  • We enabled S3 event notifications on the S3 bucket and created an Amazon SNS topic to receive notifications whenever Kinesis Data Firehose put an object in the bucket.
  • The SNS topic triggered a Lambda function, which took the Kinesis output and moved it to the data warehouse account in our chosen partition structure.

S3 event notifications can trigger Lambda functions, but we chose SNS as an intermediary because the S3 bucket and Lambda function were in separate accounts.
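
A sketch of the mover function could look like this. The destination bucket and partition layout are assumptions, and the key parsing assumes the default Firehose YYYY/MM/DD/HH prefix with no custom prefix in front of it.

import json
import urllib.parse

import boto3

s3 = boto3.client("s3")
DESTINATION_BUCKET = "data-warehouse-raw"  # hypothetical bucket in the warehouse account

def handler(event, context):
    # Copy Kinesis Data Firehose output into the warehouse account's partition structure.
    for sns_record in event["Records"]:
        s3_event = json.loads(sns_record["Sns"]["Message"])
        for record in s3_event.get("Records", []):
            src_bucket = record["s3"]["bucket"]["name"]
            src_key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
            # Default Firehose keys look like YYYY/MM/DD/HH/filename; reshape into Hive-style partitions.
            year, month, day, _hour, filename = src_key.split("/", 4)
            dest_key = f"events/year={year}/month={month}/day={day}/{filename}"
            s3.copy_object(
                Bucket=DESTINATION_BUCKET,
                Key=dest_key,
                CopySource={"Bucket": src_bucket, "Key": src_key},
                ACL="bucket-owner-full-control",
            )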

Migrating existing data with AWS DMS and AWS Glue

We needed to migrate data from our existing RDS database to S3, which we accomplished with AWS DMS. DMS natively supports S3 as a target, as described in the DMS documentation.

Setting this up was relatively straightforward. We exported data directly from our production VPC to the separate data warehouse account by tweaking the connection attributes in DMS. The string that we used was this:

"cannedAclForObjects=BUCKET_OWNER_FULL_CONTROL;compressionType=GZIP;addColumnName=true;"

This code gives ownership to the bucket owner (the destination data warehouse account), compresses the files to save on storage costs, and includes all column names. After the data was in S3, we used an AWS Glue crawler to infer the schemas of all exported tables and then compared against the source data.

With AWS Glue, some of the challenges we overcame were these:

  • Unstructured text data, such as forum and blog posts. DMS exports these to CSV. This approach conflicted with the commas present in the text data. We opted to use AWS Glue to export data from RDS to S3 in Parquet format, which is unaffected by commas because it encodes columns directly.
  • Cross-account exports. We resolved this by including the code

glueContext._jsc.hadoopConfiguration().set("fs.s3.canned.acl", "BucketOwnerFullControl")

at the top of each AWS Glue job to grant bucket owner access to all S3 files produced by AWS Glue.

Overall, AWS DMS was quicker to set up and great for exporting large amounts of data with rule-based transformations. AWS Glue required more upfront effort to set up jobs, but provided better results for cases where we needed more control over the output.

If you’re looking to convert existing raw data (CSV or JSON) into Parquet, you can set up an AWS Glue job to do that. The process is described in the AWS Big Data Blog post Build a data lake foundation with AWS Glue and Amazon S3.
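
As a rough sketch (the database, table, and output path below are placeholders; the linked post describes the full setup), an AWS Glue job that rewrites a cataloged JSON table as Parquet can be as small as this:

import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw JSON through the Data Catalog table that a crawler created (names are placeholders).
raw = glue_context.create_dynamic_frame.from_catalog(
    database="raw_data", table_name="events_json"
)

# Write the same data back out as Parquet for faster, cheaper Athena queries.
glue_context.write_dynamic_frame.from_options(
    frame=raw,
    connection_type="s3",
    connection_options={"path": "s3://your-curated-bucket/events_parquet/"},
    format="parquet",
)

job.commit()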

Bringing it all together with AWS Glue, Amazon Athena, and Amazon QuickSight

After data landed in S3, it was time for the real fun to start: actually working with the data! Can you tell I’m a data engineer? For me, a big part of the fun was exploring AWS Glue:

  • AWS Glue handles our ETL job scheduling.
  • AWS Glue crawlers manage the metadata in the AWS Glue Data Catalog.

Crawlers are the “secret sauce” that enables us to be responsive to schema changes. Throughout the pipeline, we chose to make each step as schema-agnostic as possible, which allows any schema changes to flow through until they reach AWS Glue.

However, raw data is not ideal for most of our business users, because it often has duplicates or incorrect data types. Most importantly, the data out of Firehose is in JSON format, but we quickly observed significant query performance gains from using Parquet format. Here, we used one of the performance tips in the Big Data Blog post Top 10 performance tuning tips for Amazon Athena.

With our shared responsibility model, the data warehouse and BI teams are responsible for the final processing of data into curated datasets ready for reporting. Using Lambda and AWS Glue enables these teams to work in Python and SQL (the core languages for Amazon data engineering and BI roles). It also enables them to deploy code with minimal infrastructure setup or maintenance.

Our ETL process is as follows:

  • Scheduled triggers.
  • Series of conditional triggers that control the flow of subsequent jobs that depend on previous jobs.
  • A similar pattern across many jobs of reading in the raw data, deduplicating the data, and then writing to Parquet. We centralized this logic by creating a Python library of functions and uploading it to S3. We then included that library in the AWS Glue job as an additional Python library. For more information on how to do this, see Using Python Libraries with AWS Glue in the AWS Glue documentation. (A sketch of such a helper library follows this list.)
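
The following is a simplified sketch of what such shared helpers might contain; the real library is specific to Woot’s datasets, so the function and column names here are illustrative only.

from pyspark.sql import DataFrame
from pyspark.sql import functions as F
from pyspark.sql.window import Window

def deduplicate(df: DataFrame, key_columns, order_column) -> DataFrame:
    # Keep the most recent row per key, using order_column (for example, an updated_at timestamp).
    window = Window.partitionBy(*key_columns).orderBy(F.col(order_column).desc())
    return (
        df.withColumn("_row_number", F.row_number().over(window))
        .filter(F.col("_row_number") == 1)
        .drop("_row_number")
    )

def write_parquet(df: DataFrame, path: str) -> None:
    # Write a curated dataset to S3 in Parquet format.
    df.write.mode("overwrite").parquet(path)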

We also migrated complex jobs used to create reporting tables with business metrics:

  • The AWS Glue use of PySpark simplified the migration of these queries, because you can embed SparkSQL queries directly in the job.
  • Converting to SparkSQL took some trial and error, but ultimately required less work than translating SQL queries into Spark methods. However, for people on our BI team who had previously worked with Pandas or Spark, working with Spark dataframes was a natural transition. As someone who used SQL for several years before learning Python, I appreciate that PySpark lets me quickly switch back and forth between SQL and an object-oriented framework.

Another hidden benefit of using AWS Glue jobs is that the AWS Glue version of Python (like Lambda) already has boto3 installed. Thus, ETL jobs can directly use AWS API operations without additional configuration.

For example, some of our longer-running jobs created read inconsistency if a user happened to query that table while AWS Glue was writing data to S3. We modified the AWS Glue jobs to write to a temporary directory with Spark and then used boto3 to move the files into place. Doing this reduced read inconsistency by up to 90 percent. It was great to have this functionality readily available, which may not have been the case if we managed our own Spark cluster.
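
A simplified sketch of that move step, with hypothetical bucket and prefix names, might look like this:

import boto3

s3 = boto3.client("s3")
BUCKET = "your-curated-bucket"     # hypothetical
TMP_PREFIX = "tmp/orders/"         # where the Glue job writes first
FINAL_PREFIX = "curated/orders/"   # what Athena queries

def promote_temp_output():
    # Move finished Parquet files into place so readers never see a half-written table.
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=TMP_PREFIX):
        for obj in page.get("Contents", []):
            final_key = FINAL_PREFIX + obj["Key"][len(TMP_PREFIX):]
            s3.copy_object(
                Bucket=BUCKET,
                Key=final_key,
                CopySource={"Bucket": BUCKET, "Key": obj["Key"]},
            )
            s3.delete_object(Bucket=BUCKET, Key=obj["Key"])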

Comparing previous state and current state

After we had all the datasets in place, it was time for our customers to come on board and start querying. This is where we really leveled up the customer experience.

Previously, users had to download a SQL client, request a user name and password, set it up, and learn SQL to get data out. Now, users just sign in to the AWS Management Console through automatically provisioned IAM roles and run queries in their browser with Athena. Or if they want to skip SQL altogether, they can use our Amazon QuickSight account with accounts managed through our pre-existing Active Directory server.

Integration with Active Directory was a big win for us. We wanted to enable users to get up and running without having to wait for an account to be created or managing separate credentials. We already use Active Directory across the company for access to multiple resources. Upgrading to Amazon QuickSight Enterprise Edition enabled us to manage access with our existing AD groups and credentials.

Migration results

Our legacy data warehouse was developed over the course of five years. We recreated it as a serverless data lake using AWS Glue in about three months.

In the end, it took more upfront effort than simply migrating to another relational database. We also dealt with more uncertainty because we used many products that were relatively new to us (especially AWS Glue).

However, in the months since the migration was completed, we’ve gotten great feedback from data warehouse users about the new tools. Our users have been amazed by these things:

  • How fast Athena is.
  • How intuitive and beautiful Amazon QuickSight is. They love that no setup is required—it’s easy enough that even our CEO has started using it!
  • That Athena plus the AWS Glue Data Catalog have given us the performance gains of a true big data platform, but for end users it retains the look and feel of a relational database.

Summary

From an operational perspective, the investment has already started to pay off. Literally: Our operating costs have fallen by almost 90 percent.

Personally, I was thrilled that recently I was able to take a three-week vacation and didn’t get paged once, thanks to the serverless infrastructure. And for our BI engineers in addition to myself, the S3-centric architecture is enabling us to experiment with new technologies by integrating seamlessly with other services, such as Amazon EMR, Amazon SageMaker, Amazon Redshift Spectrum, and Lambda. It’s been exciting to see how these services have grown in the time since we’ve adopted them (for example, the recent AWS Glue launch of Amazon CloudWatch metrics and Athena’s launch of views).

We are thrilled that we’ve invested in technologies that continue to grow as we do. We are incredibly proud of our team for accomplishing this ambitious migration. We hope our experience can inspire other engineers to dive in to building a data lake of their own.

For additional information, see these similar AWS Big Data blog posts:


About the authors

Chaya Carey is a data engineer at Woot.com. At Woot, she’s responsible for managing the data warehouse and other scalable data solutions. Outside of work, she’s passionate about Seattle’s bar and restaurant scene, books, and video games.

Karthik Odapally is a senior solutions architect at AWS. His passion is to build cost-effective and highly scalable solutions on the cloud. In his spare time, he bakes cookies and cupcakes for family and friends here in the PNW. He loves vintage racing cars.

ICYMI: Serverless Q4 2018

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/icymi-serverless-q4-2018/

This post is courtesy of Eric Johnson, Senior Developer Advocate – AWS Serverless

Welcome to the fourth edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all of the most recent product launches, feature enhancements, blog posts, webinars, Twitch live streams, and other interesting things that you might have missed!

This edition of ICYMI includes all announcements from AWS re:Invent 2018!

If you didn’t see them, check our Q1 ICYMI, Q2 ICYMI, and Q3 ICYMI posts for what happened then.

So, what might you have missed this past quarter? Here’s the recap.

New features

AWS Lambda introduced the Lambda runtime API and Lambda layers, which enable developers to bring their own runtime and share common code across Lambda functions. With the release of the runtime API, we can now support runtimes from AWS partners such as the PHP runtime from Stackery and the Erlang and Elixir runtimes from Alert Logic. Using layers, partners such as Datadog and Twistlock have also simplified the process of using their Lambda code libraries.

To meet the demand of larger Lambda payloads, Lambda doubled the payload size of asynchronous calls to 256 KB.

In early October, Lambda also lengthened the runtime limit by enabling Lambda functions that can run up to 15 minutes.

Lambda also rolled out native support for Ruby 2.5 and Python 3.7.

You can now process Amazon Kinesis streams up to 68% faster with AWS Lambda support for Kinesis Data Streams enhanced fan-out and HTTP/2 for faster streaming.

Lambda also released a new Application view in the console. It’s a high-level view of all of the resources in your application. It also gives you a quick view of deployment status with the ability to view service metrics and custom dashboards.

Application Load Balancers added support for targeting Lambda functions. ALBs can provide a simple HTTP/S front end to Lambda functions. ALB features such as host- and path-based routing are supported to allow flexibility in triggering Lambda functions.

Amazon API Gateway added support for AWS WAF. You can use AWS WAF for your Amazon API Gateway APIs to protect from attacks such as SQL injection and cross-site scripting (XSS).

API Gateway has also improved parameter support by adding support for multi-value parameters. You can now pass multiple values for the same key in the header and query string when calling the API. Returning multiple headers with the same name in the API response is also supported. For example, you can send multiple Set-Cookie headers.
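
For example, a Lambda proxy integration can return repeated headers through multiValueHeaders and read repeated query-string keys through multiValueQueryStringParameters. This is only a minimal sketch; the cookie values shown are made up.

import json

def handler(event, context):
    # multiValueQueryStringParameters carries repeated query-string keys, e.g. ?tag=a&tag=b
    tags = (event.get("multiValueQueryStringParameters") or {}).get("tag", [])
    return {
        "statusCode": 200,
        # Two Set-Cookie headers in a single response.
        "multiValueHeaders": {
            "Set-Cookie": ["theme=dark; Path=/", "session=abc123; HttpOnly"],
        },
        "body": json.dumps({"tags": tags}),
    }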

In October, API Gateway relaunched the Serverless Developer Portal. It provides a catalog of published APIs and associated documentation that enable self-service discovery and onboarding. You can customize it for branding through either custom domain names or logo/styling updates. In November, we made it easier to launch the developer portal from the Serverless Application Repository.

In the continuous effort to decrease customer costs, API Gateway introduced tiered pricing. The tiered pricing model lowers the cost of API Gateway at scale, with an API Requests price as low as $1.51 per million requests at the highest tier.

Last but definitely not least, API Gateway released support for WebSocket APIs in mid-December as a final holiday gift. With this new feature, developers can build bidirectional communication applications without having to provision and manage any servers. This has been a long-awaited and highly anticipated announcement for the serverless community.

AWS Step Functions added eight new service integrations. With this release, the steps of your workflow can exist on Amazon ECS, AWS Fargate, Amazon DynamoDB, Amazon SNS, Amazon SQS, AWS Batch, AWS Glue, and Amazon SageMaker. This is in addition to the services that Step Functions already supports: AWS Lambda and Amazon EC2.

Step Functions expanded by announcing availability in the EU (Paris) and South America (São Paulo) Regions.

The AWS Serverless Application Repository increased its functionality by supporting more resources in the repository. The Serverless Application Repository now supports Application Auto Scaling, Amazon Athena, AWS AppSync, AWS Certificate Manager, Amazon CloudFront, AWS CodeBuild, AWS CodePipeline, AWS Glue, AWS Identity and Access Management, Amazon SNS, Amazon SQS, AWS Systems Manager, and AWS Step Functions.

The Serverless Application Repository also released support for nested applications. Nested applications enable you to build highly sophisticated serverless architectures by reusing services that are independently authored and maintained but easily composed using AWS SAM and the Serverless Application Repository.

AWS SAM made authorization simpler by introducing SAM support for authorizers. Enabling authorization for your APIs is as simple as defining an Amazon Cognito user pool or an API Gateway Lambda authorizer as a property of your API in your SAM template.

AWS SAM CLI introduced two new commands. First, you can now build locally with the sam build command. This functionality allows you to compile deployment artifacts for Lambda functions written in Python. Second, the sam publish command allows you to publish your SAM application to the Serverless Application Repository.

Our SAM tooling team also released the AWS Toolkit for PyCharm, which provides an integrated experience for developing serverless applications in Python.

The AWS Toolkits for Visual Studio Code (Developer Preview) and IntelliJ (Developer Preview) are still in active development and will include similar features when they become generally available.

AWS SAM and the AWS SAM CLI implemented support for Lambda layers. Using a SAM template, you can manage your layers, and using the AWS SAM CLI, you can develop and debug Lambda functions that are dependent on layers.

Amazon DynamoDB added support for transactions, allowing developers to enforce all-or-nothing operations. In addition to transaction support, Amazon DynamoDB Accelerator also added support for DynamoDB transactions.
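As a quick illustration, a transactional write with boto3 might look like the following sketch; the Orders and Inventory tables and their attributes are hypothetical:

import boto3

dynamodb = boto3.client("dynamodb")

# Either both operations succeed or neither is applied.
dynamodb.transact_write_items(
    TransactItems=[
        {
            "Put": {
                "TableName": "Orders",
                "Item": {"OrderId": {"S": "1234"}, "Status": {"S": "PLACED"}},
            }
        },
        {
            "Update": {
                "TableName": "Inventory",
                "Key": {"Sku": {"S": "widget-1"}},
                "UpdateExpression": "SET Stock = Stock - :one",
                "ConditionExpression": "Stock >= :one",
                "ExpressionAttributeValues": {":one": {"N": "1"}},
            }
        },
    ]
)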

Amazon DynamoDB also announced Amazon DynamoDB on-demand, a flexible new billing option for DynamoDB capable of serving thousands of requests per second without capacity planning. DynamoDB on-demand offers simple pay-per-request pricing for read and write requests so that you only pay for what you use, making it easy to balance costs and performance.
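Switching to on-demand is a billing-mode setting rather than a code change. A minimal sketch of creating a table in on-demand mode with boto3; the table and key names are placeholders:

import boto3

dynamodb = boto3.client("dynamodb")

# BillingMode PAY_PER_REQUEST means no ProvisionedThroughput is required.
dynamodb.create_table(
    TableName="Events",
    AttributeDefinitions=[{"AttributeName": "EventId", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "EventId", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST",
)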

AWS Amplify released the Amplify Console, which is a continuous deployment and hosting service for modern web applications with serverless backends. Modern web applications include single-page app frameworks such as React, Angular, and Vue and static-site generators such as Jekyll, Hugo, and Gatsby.

Amazon SQS announced support for Amazon VPC Endpoints using PrivateLink.
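This keeps traffic between your VPC and SQS on the AWS network. A minimal sketch of creating the interface endpoint with boto3 follows; the VPC, subnet, and security group IDs, and the Region in the service name, are placeholders:

import boto3

ec2 = boto3.client("ec2")

# Create an interface endpoint for SQS so queue traffic uses PrivateLink.
ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.sqs",
    SubnetIds=["subnet-0123456789abcdef0"],
    SecurityGroupIds=["sg-0123456789abcdef0"],
    PrivateDnsEnabled=True,
)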

Serverless blogs

October

November

December

Tech talks

We hold several Serverless tech talks throughout the year, so look out for them in the Serverless section of the AWS Online Tech Talks page. Here are the three tech talks that we delivered in Q4:

Twitch

We’ve been so busy livestreaming on Twitch that you’re most certainly missing out if you aren’t following along!

For information about upcoming broadcasts and recent livestreams, keep an eye on AWS on Twitch for more Serverless videos and on the Join us on Twitch AWS page.

New Home for SAM Docs

This quarter, we moved all SAM docs to https://docs.aws.amazon.com/serverless-application-model. Everything you need to know about SAM is there. If you don’t find what you’re looking for, let us know!

In other news

The schedule is out for 2019 AWS Global Summits in cities around the world. AWS Global Summits are free events that bring the cloud computing community together to connect, collaborate, and learn about AWS. Summits are held in major cities around the world. They attract technologists from all industries and skill levels who want to discover how AWS can help them innovate quickly and deliver flexible, reliable solutions at scale. Get notified when to register and learn more at the AWS Global Summits website.

Still looking for more?

The Serverless landing page has lots of information. The resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials. Check it out!

Announcing nested applications for AWS SAM and the AWS Serverless Application Repository

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/announcing-nested-applications-for-aws-sam-and-the-aws-serverless-application-repository/

Serverless application architectures enable you to break large projects into smaller, more manageable services that are highly reusable and independently scalable, secured, and evolved over time. As serverless architectures grow, we have seen common patterns that get reimplemented across companies, teams, and projects, hurting development velocity and leading to wasted effort. We have made it easier to develop new serverless architectures that meet organizational best practices by supporting nested applications in AWS SAM and the AWS Serverless Application Repository.

How it works

Nested applications build on a concept in AWS CloudFormation called nested stacks. With nested applications, serverless applications are deployed as stacks, or collections of resources, that contain one or more other serverless application stacks. You can reference resources created in these nested templates from the parent stack or from other nested stacks to manage these collections of resources more easily.

This enables you to build sophisticated serverless architectures by reusing services that are authored and maintained independently but easily composed via AWS SAM and the AWS Serverless Application Repository. These applications can be either publicly or privately available in the AWS Serverless Application Repository. It’s just as easy to create your own serverless applications that you then consume again later in other nested applications. You can access the application code, configure parameters exposed via the nested application’s template, and later manage its configuration completely.

Building a nested application

Suppose that I want to build an API powered by a serverless application architecture made up of AWS Lambda and Amazon API Gateway. I can use AWS SAM to create Lambda functions, configure API Gateway, and deploy and manage them both. To start building, I can use the sam init command.

$ sam init -r python2.7
[+] Initializing project structure...
[SUCCESS] - Read sam-app/README.md for further instructions on how to proceed
[*] Project initialization is now complete

The sam-app directory has everything that I need to start building a serverless application.

$ tree sam-app/
sam-app/
├── hello_world
│   ├── app.py
│   ├── app.pyc
│   ├── __init__.py
│   ├── __init__.pyc
│   └── requirements.txt
├── README.md
├── template.yaml
└── tests
    └── unit
        ├── __init__.py
        ├── __init__.pyc
        ├── test_handler.py
        └── test_handler.pyc

3 directories, 11 files

The README in the sam-app directory points me to using the new sam build command to install any requirements of my application.

$ cd sam-app/
$ sam build
2018-11-21 20:41:23 Building resource 'HelloWorldFunction'
2018-11-21 20:41:23 Running PythonPipBuilder:ResolveDependencies
2018-11-21 20:41:24 Running PythonPipBuilder:CopySource

Build Succeeded

Built Artifacts  : .aws-sam/build
Built Template   : .aws-sam/build/template.yaml
...

At this point, I have a fully functioning serverless application based on Lambda that I can test and debug using the AWS SAM CLI locally. For example, I can invoke my Lambda function directly, as in the following code.

$ cd .aws-sam/build/
$ sam local invoke --no-event
2018-11-21 20:43:52 Invoking app.lambda_handler (python2.7)

....trimmed output....

{"body": "{\"message\": \"hello world\", \"location\": \"34.239.158.3\"}", "statusCode": 200}

Or I can use the actual API Gateway interface for it with the sam local start-api command. I can also now use the sam package and sam deploy commands to get this application running on Lambda and begin letting my customers consume it. What I really want to do, though, is expand on this API by adding an authorization mechanism to add some security. API Gateway supports several methods for doing this, but in this case, I want to leverage a basic form of HTTP Basic Auth. Although not the best way to secure an API, this example highlights the power of nested applications.

To start, I search for an existing serverless application that meets my needs. I can access the AWS Serverless Application Repository either directly or via the AWS Lambda console. Then I search for “http basic auth,” as shown in the following image.

There is already an application for HTTP Basic Auth that someone else made.

I can review the AWS SAM template, license, and permissions created in the AWS Serverless Application Repository. Reading through the linked GitHub repository and README, I find that this application meets my needs, and I can review its code. The application enables me to store a user’s username and password in Amazon DynamoDB and use them to provide authorization to an API.

I could deploy this application directly from the console and then plug the launched Lambda function into my API Gateway configuration manually. This would work, but it doesn’t give me a way to relate the two applications more directly as the single application that, to me, they are. If I remove the authorizer application, my API will break. If I decide to shut down my API, I might forget about the authorizer. Logically thinking about them as a single application is difficult. This is where nested applications come in.

The launch of nested applications comes with a new AWS SAM resource, AWS::Serverless::Application, which you can use to refer to serverless applications that you want to nest inside another application. The specification for this resource is straightforward and at a minimum resembles the following template.

application-alias-name:
  Type: AWS::Serverless::Application
  Properties:
    Location:
    Parameters:

For the full resource specification, see AWS::Serverless::Application on the GitHub website.

Currently, the location can be one of two places. It can be in the AWS Serverless Application Repository.

Location:
  ApplicationId: arn:aws:serverlessrepo:region:account-id:applications/application-name
  SemanticVersion: 1.0.0

Or it can be in Amazon S3.

Location: https://s3.region.amazonaws.com/bucket-name/sam-template-object

The sam-template-object is the name of a packaged AWS SAM template.

The Parameters property lists the parameters that the nested application requires at launch, as defined by its template. You can discover them via the template that the nested application references or in the console under Application Settings. For this Basic HTTP Auth application, there are no parameters, so there is nothing for me to do here.

We have included a feature to help you to figure out what you need in your template. First, navigate to the Review, configure and deploy page by choosing Deploy from the Application details page or when you choose an application via the Lambda console’s view into the app repository. Then choose Copy as SAM Resource.

This button copies to your clipboard exactly what you need to start nesting this application. For this application, choosing Copy as SAM Resource resulted in the following template.

lambdaauthorizerbasicauth:
  Type: AWS::Serverless::Application
  Properties:
    Location:
      ApplicationId: arn:aws:serverlessrepo:us-east-1:560348900601:applications/lambda-authorizer-basic-auth
      SemanticVersion: 0.2.0

Note: The name of this resource defaults to lambdaauthorizerbasicauth, but best practice is to give it a name that is more descriptive for your use case.

If I stopped here and launched the application, this second application’s stack would launch as a nested application off my application. Before I do that, I want to use the function that this application creates as the authorizer for the API Gateway endpoint created in my existing AWS SAM template. To reference this Lambda function, I can check the template for this application to see what (if any) outputs are created.

I can get the Amazon Resource Name (ARN) from this function by referencing the output named LambdaAuthorizerBasicAuthFunction directly as an attribute of the nested application.

!GetAtt lambdaauthorizerbasicauth.Outputs.LambdaAuthorizerBasicAuthFunction

To configure the authorizer for my existing function, I create an AWS::Serverless::Api resource and then configure the authorizer with the FunctionArn attribute set to this value. The following template shows the entire syntax.

MyApi:
  Type: AWS::Serverless::Api
  Properties:
    StageName: Prod      
    Auth:
      DefaultAuthorizer: MyLambdaRequestAuthorizer
      Authorizers:
        MyLambdaRequestAuthorizer:
          FunctionPayloadType: REQUEST
          FunctionArn: !GetAtt lambdaauthorizerbasicauth.Outputs.LambdaAuthorizerBasicAuthFunction
          Identity:
            Headers:
              - Authorization

This creates a new API Gateway stage, configures the authorizer to point to my nested stack’s Lambda function, and specifies what information is required from clients of the API. In my actual function definition, I refer to the new API stage definition and the authorizer. The last three lines of the following template are new.

Events:
  HelloWorld:
    Type: Api
    Properties:
      Method: get
      Path: /hello
      RestApiId: !Ref MyApi
      Auth:
        Authorizers: MyLambdaRequestAuthorizer

Finally, because of the code created by the sam init command that I ran earlier, I need to update the HelloWorldApi output to use my new API resource.

Outputs:
  HelloWorldApi:
    Description: API Gateway endpoint URL for Prod stage for Hello World function
    Value:
      Fn::Sub: https://${MyApi}.execute-api.${AWS::Region}.amazonaws.com/Prod/hello/

At this point, my combined template file is 70 lines of YAML that define the following:

  • My Lambda function
  • The API Gateway endpoint configuration that I’ll use to interface with my business logic
  • The nested application that represents my authorizer
  • Other bits such as the outputs, globals, and parameters of my template

I can use the sam package and sam deploy commands to launch this whole stack of resources.

sam package --template-file template.yaml --output-template-file packaged-template.yaml --s3-bucket my-bucket

Successfully packaged artifacts and wrote output template to file packaged-template.yaml.
Execute the following command to deploy the packaged template
aws cloudformation deploy --template-file packaged-template.yaml --stack-name <YOUR STACK NAME>

To deploy applications that have nested applications that come from the app repository, I use a new capability in AWS CloudFormation called auto-expand. I pass it in by adding CAPABILITY_AUTO_EXPAND to the --capabilities flag of the deploy command.

sam deploy --template-file packaged-template.yaml --stack-name SimpleAuthExample --capabilities CAPABILITY_IAM CAPABILITY_AUTO_EXPAND

Waiting for changeset to be created..
Waiting for stack create/update to complete
Successfully created/updated stack - SimpleAuthExample

With the successful creation of my application stack, I can find what I created. The stack can be in one of two places: the AWS CloudFormation console or the Lambda console’s Application view. Because I launched this application via AWS SAM, I can use the Application view to manage serverless applications. That page displays two application stacks named SimpleAuthExample.

The second application stack has the appended name of the nested application that I launched. Because it was also a serverless application launched with AWS SAM, the Application view enables me to manage them independently if I want. Selecting the SimpleAuthExample application shows me its details page, where I can get more information, including the various launched resources of this application.

There are four top-level resources or resource groupings:

  • My Lambda function, which expands to show the related permissions and role for the function
  • The nested application stack for my authorizer
  • The API Gateway resource and related deployment and stage
  • The Lambda permission that AWS SAM created to allow API Gateway to use my nested application’s Lambda function as an authorizer

Selecting the logical ID of my nested application opens the AWS CloudFormation console, where I note its relation to the root stack.

Testing the authorizer

With my application and authorizer set up, I want to confirm its functionality. The Application view for the authorizer application shows the DynamoDB table that stores my user credentials.

Selecting the logical ID of the table opens the DynamoDB console. In the Item view, I can create a user and its password.

Back in the Application view for my main application, expanding the API Gateway resource displays the physical ID link to the API Gateway stage for this application. It represents the API endpoint that I would point my clients at.

I can copy that destination URL to my clipboard, and with a tool such as curl, I can test my application.

curl -u foo:bar https://dw0wr24jwg.execute-api.us-east-1.amazonaws.com/Prod/hello
{"message": "hello world", "location": "34.239.119.48"}

To confirm that my authorizer is working, I test with bad credentials.

curl -u bar:foo https://dw0wr24jwg.execute-api.us-east-1.amazonaws.com/Prod/hello
{"Message":"User is not authorized to access this resource with an explicit deny"}

Everything works!

To delete this entire setup, I delete the main stack of this application, which deletes all of its resources and the nested application as well.

Conclusion

With the growth of serverless applications, we’re finding that developers can be reusing and sharing common patterns, both publicly and privately. This led to the release of the AWS Serverless Application Repository in early 2018. Now nested applications bring the additional benefit of simplifying the consumption of these reusable application components alongside your own applications. This blog post demonstrates how to find and discover applications in the AWS Serverless Application Repository and nest them in your own applications. We have shown how you can modify an AWS SAM template to refer to these other components, launch your new nested applications, and get their improved management and organizational capabilities.

We’re excited to see what this enables you to do, and we welcome any feedback here, in the AWS forums, and on Twitter.

Happy coding!

Stream Amazon CloudWatch Logs to a Centralized Account for Audit and Analysis

Post Syndicated from David Bailey original https://aws.amazon.com/blogs/architecture/stream-amazon-cloudwatch-logs-to-a-centralized-account-for-audit-and-analysis/

A key component of enterprise multi-account environments is logging. Centralized logging provides a single point of access to all salient logs generated across accounts and regions, and is critical for auditing, security and compliance. While some customers use the built-in ability to push Amazon CloudWatch Logs directly into Amazon Elasticsearch Service for analysis, others would prefer to move all logs into a centralized Amazon Simple Storage Service (Amazon S3) bucket location for access by several custom and third-party tools. In this blog post, I will show you how to forward existing and any new CloudWatch Logs log groups created in the future to a cross-account centralized logging Amazon S3 bucket.

The streaming architecture I use in the destination logging account is a streamlined version of the architecture and AWS CloudFormation templates from the Central logging in Multi-Account Environments blog post by Mahmoud Matouk. This blog post assumes some knowledge of CloudFormation, Python3 and the boto3 AWS SDK. You will need to have or configure an AWS working account and logging account, an IAM access and secret key for those accounts, and a working environment containing Python and the boto3 SDK. (For assistance, see the Getting Started Resource Center and Start Building with SDKs and Tools.) All CloudFormation templates and Python code used in this article can be found in this GitHub Repository.

Setting Up the Solution

You need to create or use an existing S3 bucket for storing CloudFormation templates and Python code for an AWS Lambda function. This S3 bucket is referred to throughout the blog post as the <S3 infrastructure-bucket>. Ensure that the bucket does not block new bucket policies or cross-account access by checking the bucket’s Permissions tab and the Public access settings button.

You also need a bucket policy that allows each account that streams logs to access the bucket when we create the AWS Lambda function below. To do so, update your bucket policy to include the account number of each new account you create, and substitute your <S3 infrastructure-bucket> ARN (shown at the top of the Bucket policy editor page) into this template:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                  "03XXXXXXXX85",
                  "29XXXXXXXX02",
                  "13XXXXXXXX96",
                  "37XXXXXXXX30",
                  "86XXXXXXXX95"
                ]
            },
            "Action": [
                "s3:Get*",
                "s3:List*"
            ],
            "Resource": [
                "arn:aws:s3:::<S3 infrastructure-bucket>",
                "arn:aws:s3:::<S3 infrastructure-bucket>/*"
            ]
        }
    ]
}

Clone a local copy of the CloudFormation templates and Python code from the GitHub repository. Compress CentralLogging.py and lambda.py into a .zip file for the Lambda function we create below and name it AddSubscriptionFilter.zip. Load these local files into the <S3 infrastructure-bucket>. I recommend using folders called /python for the .py files, /lambdas for the AddSubscriptionFilter.zip file, and /cfn for the CloudFormation templates.

Multi-Account Configuration and the Central Logging Account

One form of multi-account configuration is the Landing Zone offering, which provides a core logging account for storing all logs for auditing. I use this account configuration as an example in this blog post. Initially, the Landing Zone setup creates several stack sets and resources, including roles, security groups, alarms, lambda functions, a cloud trail stream and an S3 bucket.

If you are not using a Landing Zone, create an appropriately named S3 bucket in the account you have chosen as a logging account. This S3 bucket will be referred to later as the <LoggingS3Bucket>. To mimic what the Landing Zone calls its logging bucket, you can use the format aws-landing-zone-logs-<Account Number><Region>, or simply pick an appropriate name for the centralized logging location. In a production environment, remember that it is critical to lock down the access to logging resources and the permissions allowed within the account to prevent deletion or tampering with the logs.

Figure 1 – Initial Landing Zone logging account resources

The S3 bucket – aws-landing-zone-logs-<Account Number><Region> is the most important resource created by the stack-sets for logging purposes. It contains all of the logs streamed to it from all of the accounts. Initially, the Landing Zone only sends the AWS CloudTrail and AWS Config logs to this S3 bucket.

In order to send all of the other CloudWatch Logs that are necessary for auditing, we need to add a destination and streaming mechanism to the logging account.

Logging Account Infrastructure

The additional infrastructure required in the central logging account provides a destination for the log group subscription filters and a stream for log events that are sent from all accounts and appropriate regions to load them into the <LoggingS3Bucket> repository. The selection of these particular AWS resources is important, because Kinesis Data Streams is the only resource currently supported as a destination for cross-account CloudWatch Logs subscription filters.

The centralLogging.yml CloudFormation template automates the creation of the entire required infrastructure in the core logging account. Make sure to run it in each of the regions in which you need to centralize logs. The log group subscription filter and destination regions must match in order to successfully stream the logs.

Installation Instructions:

  1. Modify the centralLogging.yml template to add your account numbers for all of the accounts you want to stream logs from into the DestinationPolicy where you see the <AccountNumberHere> placeholders. Remove any unused placeholders.
  2. In the same DestinationPolicy, modify the final arn statement, replacing <region> with the region it will be run in (e.g., us-east-1), and the <logging account number> with the account number of the logging account where this template is to be run.
  3. Log in to the core logging account and access the AWS management console using administrator credentials.
  4. Navigate to CloudFormation and click the Create Stack button.
  5. Select Specify an Amazon S3 template URL and enter the Link for the centralLogging.yml template found in the <S3 infrastructure-bucket>.
  6. Enter a stack name, such as CentralizedLogging, and the one parameter called LoggingS3Bucket. Enter the ARN of the logging bucket: arn:aws:s3:::<LoggingS3Bucket>. This can be obtained by opening the S3 console, clicking on the bucket icon next to this bucket, and then clicking the Copy Bucket ARN button.
  7. Skip the next page, acknowledge the creation of IAM resources, and Create the stack.
  8. When the stack completes, select the stack name to go to stack details and open the Outputs. Copy the value of the DestinationArnExport, which will be needed as a parameter for the script in the next section.

Upon successful creation of this CloudFormation stack, the following new resources will be created:

  • Amazon CloudWatch Logs Destination
  • Amazon Kinesis Stream
  • Amazon Kinesis Firehose Stream
  • Two AWS Identity and Access Management (IAM) Roles

Figure 2 – New infrastructure required in the centralized logging account

Because the Landing Zone is a multi-account offering, the Log Destination is required to be the destination for all subscription filters. The key feature of the destination is its DestinationPolicy. Whenever a new account is added to the environment, its account number needs to be added to this DestinationPolicy in order for logs to be sent to it from the new account. Add the new account number in the centralLogging.yml CloudFormation template, and run an update in CloudFormation to complete the addition. A sample Destination Policy looks like this:

{
  "Version" : "2012-10-17",
  "Statement" : [
    {
      "Effect" : "Allow",
      "Principal" : {
        "AWS" : [
          "03XXXXXXXX85",
          "29XXXXXXXX02",
          "13XXXXXXXX96",
          "37XXXXXXXX30",
          "86XXXXXXXX95"
        ]
      },
      "Action" : "logs:PutSubscriptionFilter",
      "Resource" : "arn:aws:logs:<Region>:<LoggingAccountNumber>:destination:CentralLogDestination"
    }
  ]
}

The Kinesis Stream gets records from the Log Destination and holds them for 48 hours. Kinesis Streams scale by adding shards. The CloudFormation template starts the stream with two shards. You need to monitor this as instances and applications are deployed into the accounts, however, because all CloudWatch log objects will flow through this stream, and it will need to be scaled up at some point. To scale, change the number of shards (ShardCount) in the Kinesis Stream resource (KinesisLoggingStream) to the required number. See the Amazon Kinesis Data Streams FAQ documentation to confirm the capacity and throughput of each shard.
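If you prefer to reshard outside of a CloudFormation update, the Kinesis API can also change the shard count directly. A minimal boto3 sketch, assuming a stream named Kinesis-Logging-Stream (a placeholder):

import boto3

kinesis = boto3.client("kinesis")

# Double the stream from two shards to four; UNIFORM_SCALING keeps an even key distribution.
kinesis.update_shard_count(
    StreamName="Kinesis-Logging-Stream",
    TargetShardCount=4,
    ScalingType="UNIFORM_SCALING",
)

If you do this, keep the template's ShardCount in sync afterward so a later stack update does not scale the stream back down.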

Kinesis Firehose provides a simple and efficient mechanism to retrieve the records from the Kinesis Stream and load them into the <LoggingS3Bucket> repository. It uses the CloudFormation template parameter to know where to load the logs. All of the CloudWatch logs loaded by Firehose will be under the prefix /CentralizedAccountsLog. The buffering hints for Firehose suggest that the logs be loaded every 5 minutes or 50 MB. Leave the CompressionFormat UNCOMPRESSED, since the logs are already compressed.

There are two AWS Identity and Access Management (IAM) roles created for this infrastructure. The first, CWLtoKinesisRole, is used by the destination to allow CloudWatch Logs from all regions to use the destination to put the log object records into the Kinesis Stream, as well as to pass the role. The second, FirehoseDeliveryRole, allows Firehose to get the log object records from the Kinesis Stream and then load them into the S3 logging bucket.

Once you have successfully created this infrastructure, the next step is to add the subscription filters to existing log groups.

Adding Subscription Filters to Existing Log Groups

The next step in the process is to add subscription filters for the Log Destination in the core logging account to all existing log groups. Several log groups are created by the Landing Zone, or you may have created them by using various AWS services or by logging application events. For every new AWS account, you will need to run the init_account_central_logging.py Python script to add the subscription filters to all the existing log groups.

The init_account_central_logging.py script takes one parameter, which is the Log Destination ARN. Use the Destination ARN you copied from the stack details output in the previous section as the parameter to the script.

The init_account_central_logging.py script first adds this Destination ARN to the AWS Systems Manager Parameter Store so that the core logic that creates the subscription filter can use it. The script then gets a list of all existing log groups, iterates over them, deletes any existing subscription filters (because there can only be one subscription filter per log group and attempting to create another would cause an error), and then adds the new subscription filter to the centralized logging account to the Log Destination.
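The actual script is in the GitHub repository; the following is a minimal sketch of that core loop, assuming the destination ARN has already been read from the command line and written to Parameter Store:

import boto3

logs = boto3.client("logs")
FILTER_NAME = "Logs (CentralLogDestination)"

def add_subscription_filters(destination_arn):
    # Walk every existing log group in the account and region.
    paginator = logs.get_paginator("describe_log_groups")
    for page in paginator.paginate():
        for group in page["logGroups"]:
            name = group["logGroupName"]
            # Only one subscription filter is allowed per log group, so remove any existing one.
            existing = logs.describe_subscription_filters(logGroupName=name)
            for f in existing["subscriptionFilters"]:
                logs.delete_subscription_filter(logGroupName=name, filterName=f["filterName"])
            # Subscribe the log group to the central Log Destination; an empty
            # filter pattern forwards every log event.
            logs.put_subscription_filter(
                logGroupName=name,
                filterName=FILTER_NAME,
                filterPattern="",
                destinationArn=destination_arn,
            )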

Figure 3 – Run script to add subscription filters to existing log groups

Installation Instructions:

  1. Make sure that Python and boto3 are installed and accessible in the client computer – consider loading into a virtual environment to keep dependencies separate.
  2. Set the AWS_PROFILE environment variable to the appropriate AWS account profile.
  3. Log in to the proper account, and obtain administrator or other credentials with appropriate permissions, and add the account access key and secret key to the AWS credentials file.
  4. Set the region and output in the AWS config file.
  5. Download and place two python files into a working directory: init_account_central_logging.py and CentralLogging.py.
  6. Run the script using the command python3 ./init_account_central_logging.py -d <LogDestinationArn>.

Use the AWS Management Console to validate the results. Navigate to CloudWatch Logs and view all of the log groups. Each one should now have a subscription filter named “Logs (CentralLogDestination).”

Automatically Adding Subscription Filters to New Log Groups

The final step to set up the centralized log streaming capability is to run a CloudFormation script to create resources that automatically add subscription filters to new log groups. New log groups are created in accounts by resources (e.g., Lambda functions) and by applications. A subscription filter must be added to every new log group in order to deliver its log events to the logging account.

The AddSubscriptionFilter.yml CloudFormation template contains resources to automatically add subscription filters.

First, it creates a role that allows it to access the lambda code that is stored in a centralized location – the <S3 infrastructure-bucket>. (Remember that its S3 bucket policy must contain this account number in order to access the lambda code.)

Second, the template creates the AddSubscriptionLambda, which reuses the core logic shared by the script in the last section. It retrieves the proper destination from the Parameter Store, deletes any existing subscription filter from the log group, and adds the new subscription filter to the newly created log group. This lambda function is triggered by a CloudWatch event rule.

Third, the CloudFormation template creates a Lambda Permission, which allows the event trigger to invoke this particular Lambda function.

Finally, the CloudFormation template creates an Amazon CloudWatch Events Rule that acts as a trigger for the lambda. This rule looks for an event coming from CloudTrail that signals the creation of a new log group. For each create log group event found, it invokes the AddSubscriptionLambda.
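Putting those pieces together, the handler's core logic looks roughly like the following sketch; the exact code lives in lambda.py and CentralLogging.py in the repository, and the LogDestination parameter name matches the one set by the earlier script:

import boto3

logs = boto3.client("logs")
ssm = boto3.client("ssm")

def lambda_handler(event, context):
    # The CloudTrail CreateLogGroup event carries the new log group's name.
    log_group = event["detail"]["requestParameters"]["logGroupName"]

    # The Log Destination ARN was stored in Parameter Store as "LogDestination".
    destination_arn = ssm.get_parameter(Name="LogDestination")["Parameter"]["Value"]

    # Replace any existing filter, then subscribe the new group to the central destination.
    existing = logs.describe_subscription_filters(logGroupName=log_group)
    for f in existing["subscriptionFilters"]:
        logs.delete_subscription_filter(logGroupName=log_group, filterName=f["filterName"])
    logs.put_subscription_filter(
        logGroupName=log_group,
        filterName="Logs (CentralLogDestination)",
        filterPattern="",
        destinationArn=destination_arn,
    )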

Figure 4 – Infrastructure to automatically add a subscription filter to a new log group and the log flow to the centralized account

Installation Instructions:

(Important note: This functionality requires that the LogDestination parameter be properly set to the LogDestinationArn in the Parameter Store before the Lambda will run successfully. The script in the previous step sets this parameter, or it can be done manually. Make certain that the destination specified is in this same region.)

  1. Ensure that the <S3 infrastructure-bucket> has the AddSubscriptionFilter.zip file containing the Python code files lambda.py and CentralLogging.py.
  2. Log in to the appropriate account, and access using administrator credentials. Make sure that the region is set properly.
  3. Navigate to CloudFormation and click the Create Stack button.
  4. Select Specify an Amazon S3 template URL and enter the Link for the AddSubscriptionFilter.yml template found in the <S3 infrastructure-bucket>.
  5. Enter a stack name, such as AddSubscription.
  6. Enter the two parameters: the <S3 infrastructure-bucket> name (not ARN) and the folder and file name (e.g., lambdas/AddSubscriptionFilter.zip).
  7. Skip the next page, acknowledge the creation of IAM resources, and Create the stack.

In order to test that the automated addition of subscription filters is working properly, use the AWS Management Console to navigate to CloudWatch Logs and click the Actions button. Select Create New Log Group and enter a random log group name, such as “testLogGroup.” When first created, the log group will not have a subscription filter. After a few minutes, refresh the display and you should see the new subscription filter on the log group. At this point, you can delete the test log group.

New Account Setup

As a reminder, when you add new accounts that you want to have stream log events to the central logging account, you will need to configure the new accounts in two places in order for this functionality to work properly.

First, add the account number to the LoggingDestination property DestinationPolicy in the centralLogging.yml template. Then, update the CloudFormation stack.

Second, modify the bucket policy for the <S3 infrastructure-bucket>. Select the Permissions tab, then the Bucket Policy button. Add the new account to allow cross-account access to the lambda code by adding the line “arn:aws:iam::<new account number>:root” to the Principal.AWS list.

Conclusion

Centralized logging is a key component in enterprise multi-account architectures. In this blog post, I have built on the central logging in multi-account environments streaming architecture to automatically subscribe all CloudWatch Logs log groups to send all log events to an S3 bucket in a designated logging account. The solution uses a script to add subscription filters to existing log groups, and a lambda function to automatically place a subscription filter on all new log groups created within the account. This can be used to forward application logs, security logs, VPC flow logs, or any other important logs that are required for audit, security, or compliance purposes.

About the author

David Bailey is a Cloud Infrastructure Architect with AWS Professional Services specializing in serverless application architecture, IoT, and artificial intelligence. He has spent decades architecting and developing complex custom software applications, as well as teaching internationally on object-oriented design, expert systems, and neural networks.

Introducing the C++ Lambda Runtime

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/introducing-the-c-lambda-runtime/

This post is courtesy of Marco Magdy, AWS Software Development Engineer – AWS SDKs and Tools

Today, AWS Lambda announced the availability of the Runtime API. The Runtime API allows you to write your Lambda functions in any language, provided that you bundle it with your application artifact or as a Lambda layer that your application uses.

As an example of using this API, and based on customer demand, AWS is releasing a reference implementation of a C++ runtime for Lambda. This C++ runtime brings the simplicity and expressiveness of interpreted languages while maintaining C++’s performance and low memory footprint. These are benefits that align well with the event-driven, function-based development model of Lambda applications.

Hello World

Start by writing a Hello World Lambda function in C++ using this runtime.

Prerequisites

You need a Linux-based environment (I recommend Amazon Linux), with the following packages installed:

  • A C++11 compiler, either GCC 5.x or later or Clang 3.3 or later. On Amazon Linux, run the following commands:
    $ yum install gcc64-c++ libcurl-devel
    $ export CC=gcc64
    $ export CXX=g++64
  • CMake v.3.5 or later. On Amazon Linux, run the following command:
    $ yum install cmake3
  • Git

Download and compile the runtime

The first step is to download & compile the runtime:

$ cd ~ 
$ git clone https://github.com/awslabs/aws-lambda-cpp.git
$ cd aws-lambda-cpp
$ mkdir build
$ cd build
$ cmake3 .. -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=OFF \
   -DCMAKE_INSTALL_PREFIX=~/out
$ make && make install

This builds and installs the runtime as a static library under the directory ~/out.

Create your C++ function

The next step is to build the Lambda C++ function.

  1. Create a new directory for this project:
    $ mkdir hello-cpp-world
    $ cd hello-cpp-world
  2. In that directory, create a file named main.cpp with the following content:
    // main.cpp
    #include <aws/lambda-runtime/runtime.h>
    
    using namespace aws::lambda_runtime;
    
    invocation_response my_handler(invocation_request const& request)
    {
       return invocation_response::success("Hello, World!", "application/json");
    }
    
    int main()
    {
       run_handler(my_handler);
       return 0;
    }
  3. Create a file named CMakeLists.txt in the same directory, with the following content:
    cmake_minimum_required(VERSION 3.5)
    set(CMAKE_CXX_STANDARD 11)
    project(hello LANGUAGES CXX)
    
    find_package(aws-lambda-runtime REQUIRED)
    add_executable(${PROJECT_NAME} "main.cpp")
    target_link_libraries(${PROJECT_NAME} PUBLIC AWS::aws-lambda-runtime)
    aws_lambda_package_target(${PROJECT_NAME})
  4. To build this executable, create a build directory and run CMake from there:
    $ mkdir build
    $ cd build
    $ cmake3 .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH=~/out
    $ make

    This compiles and links the executable in release mode.

  5. To package this executable along with all its dependencies, run the following command:
    $ make aws-lambda-package-hello

    This creates a zip file in the same directory named after your project, in this case hello.zip.

Create the Lambda function

Using the AWS CLI, you create the Lambda function. First, create a role for the Lambda function to execute under.

  1. Create the following JSON file for the trust policy and name it trust-policy.json.
    {
     "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Service": ["lambda.amazonaws.com"]
          },
          "Action": "sts:AssumeRole"
        }
      ]
    }
  2. Using the AWS CLI, run the following command:
    $ aws iam create-role \
    --role-name lambda-cpp-demo \
    --assume-role-policy-document file://trust-policy.json

    This should output JSON that contains the newly created IAM role information. Make sure to note down the “Arn” value from that JSON. You need it later. The Arn looks like the following:

    "Arn": "arn:aws:iam::<account_id>:role/lambda-cpp-demo"

  3. Create the Lambda function:
    $ aws lambda create-function \
    --function-name hello-world \
    --role <specify the role arn from the previous step> \
    --runtime provided \
    --timeout 15 \
    --memory-size 128 \
    --handler hello \
    --zip-file fileb://hello.zip
  4. Invoke the function using the AWS CLI:
    $ aws lambda invoke --function-name hello-world --payload '{ }' output.txt

    You should see the following output:

    {
      "StatusCode": 200
    }

    A file named output.txt containing the words “Hello, World!” should be in the current directory.

Beyond Hello

OK, well that was exciting, but how about doing something slightly more interesting?

The following example shows you how to download a file from Amazon S3 and do some basic processing of its contents. To interact with AWS, you need the AWS SDK for C++.

Prerequisites

If you don’t have them already, install the following libraries:

  • zlib-devel
  • openssl-devel
  1. Build the AWS SDK for C++:
    $ cd ~
    $ git clone https://github.com/aws/aws-sdk-cpp.git
    $ cd aws-sdk-cpp
    $ mkdir build
    $ cd build
    $ cmake3 .. -DBUILD_ONLY=s3 \
     -DBUILD_SHARED_LIBS=OFF \
     -DENABLE_UNITY_BUILD=ON \
     -DCMAKE_BUILD_TYPE=Release \
     -DCMAKE_INSTALL_PREFIX=~/out
    
    $ make && make install

    This builds the S3 SDK as a static library and installs it in ~/out.

  2. Create a directory for the new application’s logic:
    $ cd ~
    $ mkdir cpp-encoder-example
    $ cd cpp-encoder-example
  3. Now, create the following main.cpp:
    // main.cpp
    #include <aws/core/Aws.h>
    #include <aws/core/utils/logging/LogLevel.h>
    #include <aws/core/utils/logging/ConsoleLogSystem.h>
    #include <aws/core/utils/logging/LogMacros.h>
    #include <aws/core/utils/json/JsonSerializer.h>
    #include <aws/core/utils/HashingUtils.h>
    #include <aws/core/platform/Environment.h>
    #include <aws/core/client/ClientConfiguration.h>
    #include <aws/core/auth/AWSCredentialsProvider.h>
    #include <aws/s3/S3Client.h>
    #include <aws/s3/model/GetObjectRequest.h>
    #include <aws/lambda-runtime/runtime.h>
    #include <iostream>
    #include <memory>
    
    using namespace aws::lambda_runtime;
    
    std::string download_and_encode_file(
        Aws::S3::S3Client const& client,
        Aws::String const& bucket,
        Aws::String const& key,
        Aws::String& encoded_output);
    
    std::string encode(Aws::IOStream& stream, Aws::String& output);
    char const TAG[] = "LAMBDA_ALLOC";
    
    static invocation_response my_handler(invocation_request const& req, Aws::S3::S3Client const& client)
    {
        using namespace Aws::Utils::Json;
        JsonValue json(req.payload);
        if (!json.WasParseSuccessful()) {
            return invocation_response::failure("Failed to parse input JSON", "InvalidJSON");
        }
    
        auto v = json.View();
    
        if (!v.ValueExists("s3bucket") || !v.ValueExists("s3key") || !v.GetObject("s3bucket").IsString() ||
            !v.GetObject("s3key").IsString()) {
            return invocation_response::failure("Missing input value s3bucket or s3key", "InvalidJSON");
        }
    
        auto bucket = v.GetString("s3bucket");
        auto key = v.GetString("s3key");
    
        AWS_LOGSTREAM_INFO(TAG, "Attempting to download file from s3://" << bucket << "/" << key);
    
        Aws::String base64_encoded_file;
        auto err = download_and_encode_file(client, bucket, key, base64_encoded_file);
        if (!err.empty()) {
            return invocation_response::failure(err, "DownloadFailure");
        }
    
        return invocation_response::success(base64_encoded_file, "application/base64");
    }
    
    std::function<std::shared_ptr<Aws::Utils::Logging::LogSystemInterface>()> GetConsoleLoggerFactory()
    {
        return [] {
            return Aws::MakeShared<Aws::Utils::Logging::ConsoleLogSystem>(
                "console_logger", Aws::Utils::Logging::LogLevel::Trace);
        };
    }
    
    int main()
    {
        using namespace Aws;
        SDKOptions options;
        options.loggingOptions.logLevel = Aws::Utils::Logging::LogLevel::Trace;
        options.loggingOptions.logger_create_fn = GetConsoleLoggerFactory();
        InitAPI(options);
        {
            Client::ClientConfiguration config;
            config.region = Aws::Environment::GetEnv("AWS_REGION");
            config.caFile = "/etc/pki/tls/certs/ca-bundle.crt";
    
            auto credentialsProvider = Aws::MakeShared<Aws::Auth::EnvironmentAWSCredentialsProvider>(TAG);
            S3::S3Client client(credentialsProvider, config);
            auto handler_fn = [&client](aws::lambda_runtime::invocation_request const& req) {
                return my_handler(req, client);
            };
            run_handler(handler_fn);
        }
        ShutdownAPI(options);
        return 0;
    }
    
    std::string encode(Aws::IOStream& stream, Aws::String& output)
    {
        Aws::Vector<unsigned char> bits;
        bits.reserve(stream.tellp());
        stream.seekg(0, stream.beg);
    
        char streamBuffer[1024 * 4];
        while (stream.good()) {
            stream.read(streamBuffer, sizeof(streamBuffer));
            auto bytesRead = stream.gcount();
    
            if (bytesRead > 0) {
                bits.insert(bits.end(), (unsigned char*)streamBuffer, (unsigned char*)streamBuffer + bytesRead);
            }
        }
        Aws::Utils::ByteBuffer bb(bits.data(), bits.size());
        output = Aws::Utils::HashingUtils::Base64Encode(bb);
        return {};
    }
    
    std::string download_and_encode_file(
        Aws::S3::S3Client const& client,
        Aws::String const& bucket,
        Aws::String const& key,
        Aws::String& encoded_output)
    {
        using namespace Aws;
    
        S3::Model::GetObjectRequest request;
        request.WithBucket(bucket).WithKey(key);
    
        auto outcome = client.GetObject(request);
        if (outcome.IsSuccess()) {
            AWS_LOGSTREAM_INFO(TAG, "Download completed!");
            auto& s = outcome.GetResult().GetBody();
            return encode(s, encoded_output);
        }
        else {
            AWS_LOGSTREAM_ERROR(TAG, "Failed with error: " << outcome.GetError());
            return outcome.GetError().GetMessage();
        }
    }

    This Lambda function expects an input payload to contain an S3 bucket and S3 key. It then downloads that resource from S3, encodes it as base64, and sends it back as the response of the Lambda function. This can be useful to display an image in a webpage, for example.

  4. Next, create the following CMakeLists.txt file in the same directory.
    cmake_minimum_required(VERSION 3.5)
    set(CMAKE_CXX_STANDARD 11)
    project(encoder LANGUAGES CXX)
    
    find_package(aws-lambda-runtime REQUIRED)
    find_package(AWSSDK COMPONENTS s3)
    
    add_executable(${PROJECT_NAME} "main.cpp")
    target_link_libraries(${PROJECT_NAME} PUBLIC
                          AWS::aws-lambda-runtime
                           ${AWSSDK_LINK_LIBRARIES})
    
    aws_lambda_package_target(${PROJECT_NAME})
  5. Follow the same build steps as before:
    $ mkdir build
    $ cd build
    $ cmake3 .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH=~/out
    $ make
    $ make aws-lambda-package-encoder

    Notice how the target name for packaging has changed to aws-lambda-package-encoder. The CMake function aws_lambda_package_target() always creates a target based on its input name.

    You should now have a file named “encoder.zip” in your build directory.

  6. Before you create the Lambda function, modify the IAM role that you created earlier to allow it to access S3.
    $ aws iam attach-role-policy \
    --role-name lambda-cpp-demo \
    --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
  7. Using the AWS CLI, create the new Lambda function:
    $ aws lambda create-function \
    --function-name encode-file \
    --role <specify the same role arn used in the prior Lambda> \
    --runtime provided \
    --timeout 15 \
    --memory-size 128 \
    --handler encoder \
    --zip-file fileb://encoder.zip
  8. Using the AWS CLI, run the function. Make sure to use an S3 bucket in the same Region as the Lambda function:
    $ aws lambda invoke --function-name encode-file --payload '{"s3bucket": "your_bucket_name", "s3key":"your_file_key" }' base64_image.txt

    You can use an online base64 image decoder and paste the contents of the output file to verify that everything is working. In a real-world scenario, you would inject the output of this Lambda function in an HTML img tag, for example.

Conclusion

With the new Lambda Runtime API, a new door of possibilities is open. This C++ runtime enables you to do more with Lambda than you ever could have before.

More in-depth details, along with examples, can be found on the GitHub repository. With it, you can start writing Lambda functions with C++ today. AWS will continue evolving the contents of this repository with additional enhancements and samples. I’m so excited to see what you build using this runtime. I appreciate feedback sent via issues in GitHub.

Happy hacking!

New for AWS Lambda – Use Any Programming Language and Share Common Components

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-for-aws-lambda-use-any-programming-language-and-share-common-components/

I remember the excitement when AWS Lambda was announced in 2014. Four years on, customers are using Lambda functions for many different use cases. For example, iRobot uses AWS Lambda to provide compute services for their Roomba robotic vacuum cleaners, Fannie Mae to run Monte Carlo simulations for millions of mortgages, and Bustle to serve billions of requests for their digital content.

Today, we are introducing two new features that are going to make serverless development even easier:

  • Lambda Layers, a way to centrally manage code and data that is shared across multiple functions.
  • Lambda Runtime API, a simple interface to use any programming language, or a specific language version, for developing your functions.

These two features can be used together: runtimes can be shared as layers so that developers can pick them up and use their favorite programming language when authoring Lambda functions.

Let’s see how they work more in detail.

Lambda Layers

When building serverless applications, it is quite common to have code that is shared across Lambda functions. It can be your custom code, that is used by more than one function, or a standard library, that you add to simplify the implementation of your business logic.

Previously, you would have to package and deploy this shared code together with all the functions using it. Now, you can put common components in a ZIP file and upload it as a Lambda Layer. Your function code doesn’t need to be changed and can reference the libraries in the layer as it would normally do.
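For example, publishing a layer and attaching it to a function takes just a couple of API calls. Here is a minimal boto3 sketch, assuming a common-libs.zip archive and a function named my-function already exist:

import boto3

lambda_client = boto3.client("lambda")

# Publish the shared code as a new layer version.
with open("common-libs.zip", "rb") as f:
    layer = lambda_client.publish_layer_version(
        LayerName="common-libs",
        Content={"ZipFile": f.read()},
        CompatibleRuntimes=["python3.7"],
    )

# Reference the layer version from an existing function.
lambda_client.update_function_configuration(
    FunctionName="my-function",
    Layers=[layer["LayerVersionArn"]],
)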

Layers can be versioned to manage updates, and each version is immutable. When a version is deleted or permissions to use it are revoked, functions that used it previously continue to work, but you won’t be able to create new ones.

In the configuration of a function, you can reference up to five layers, one of which can optionally be a runtime. When the function is invoked, layers are installed in /opt in the order you provided. Order is important because layers are all extracted under the same path, so each layer can potentially overwrite the previous one. This approach can be used to customize the environment. For example, the first layer can be a runtime and the second layer adds specific versions of the libraries you need.

The overall, uncompressed size of function and layers is subject to the usual unzipped deployment package size limit.

Layers can be used within an AWS account, shared between accounts, or shared publicly with the broad developer community.

There are many advantages when using layers. For example, you can use Lambda Layers to:

  • Enforce separation of concerns, between dependencies and your custom business logic.
  • Make your function code smaller and more focused on what you want to build.
  • Speed up deployments, because less code must be packaged and uploaded, and dependencies can be reused.

Based on our customer feedback, and to provide an example of how to use Lambda Layers, we are publishing a public layer which includes NumPy and SciPy, two popular scientific libraries for Python. This prebuilt and optimized layer can help you start very quickly with data processing and machine learning applications.

In addition to that, you can find layers for application monitoring, security, and management from partners such as Datadog, Epsagon, IOpipe, NodeSource, Thundra, Protego, PureSec, Twistlock, Serverless, and Stackery.

Using Lambda Layers

In the Lambda console I can now manage my own layers:

I don’t want to create a new layer right now; instead, I want to use an existing one in a function. I create a new Python function and, in the function configuration, I can see that there are no referenced layers. I choose to add a layer:

From the list of layers compatible with the runtime of my function, I select the one with NumPy and SciPy, using the latest available version:

After I add the layer, I click Save to update the function configuration. If you’re using more than one layer, you can adjust the order in which they are merged with the function code here.

To use the layer in my function, I just have to import the features I need from NumPy and SciPy:

import numpy as np
from scipy.spatial import ConvexHull

def lambda_handler(event, context):

    print("\nUsing NumPy\n")

    print("random matrix_a =")
    matrix_a = np.random.randint(10, size=(4, 4))
    print(matrix_a)

    print("random matrix_b =")
    matrix_b = np.random.randint(10, size=(4, 4))
    print(matrix_b)

    print("matrix_a * matrix_b = ")
    print(matrix_a.dot(matrix_b))

    print("\nUsing SciPy\n")

    num_points = 10
    print(num_points, "random points:")
    points = np.random.rand(num_points, 2)
    for i, point in enumerate(points):
        print(i, '->', point)

    hull = ConvexHull(points)
    print("The smallest convex set containing all",
        num_points, "points has", len(hull.simplices),
        "sides,\nconnecting points:")
    for simplex in hull.simplices:
        print(simplex[0], '<->', simplex[1])

I run the function, and looking at the logs, I can see some interesting results.

First, I am using NumPy to perform matrix multiplication (matrices and vectors are often used to represent the inputs, outputs, and weights of neural networks):

random matrix_a =
[[8 4 3 8]
[1 7 3 0]
[2 5 9 3]
[6 6 8 9]]
random matrix_b =
[[2 4 7 7]
[7 0 0 6]
[5 0 1 0]
[4 9 8 6]]
matrix_a * matrix_b = 
[[ 91 104 123 128]
[ 66 4 10 49]
[ 96 35 47 62]
[130 105 122 132]]

Then, I use SciPy advanced spatial algorithms to compute something quite hard to build by myself: finding the smallest “convex set” containing a list of points on a plane. For example, this can be used in a Lambda function receiving events from multiple geographic locations (corresponding to buildings, customer locations, or devices) to visually “group” similar events together in an efficient way:

10 random points:
0 -> [0.07854072 0.91912467]
1 -> [0.11845307 0.20851106]
2 -> [0.3774705 0.62954561]
3 -> [0.09845837 0.74598477]
4 -> [0.32892855 0.4151341 ]
5 -> [0.00170082 0.44584693]
6 -> [0.34196204 0.3541194 ]
7 -> [0.84802508 0.98776034]
8 -> [0.7234202 0.81249389]
9 -> [0.52648981 0.8835746 ]
The smallest convex set containing all 10 points has 6 sides,
connecting points:
1 <-> 5
0 <-> 5
0 <-> 7
6 <-> 1
8 <-> 7
8 <-> 6

When I was building this example, there was no need to install or package dependencies. I could quickly iterate on the code of the function. Deployments were very fast because I didn’t have to include large libraries or modules.

To visualize the output of SciPy, it was easy for me to create an additional layer to import matplotlib, a plotting library. Adding a few lines of code at the end of the previous function, I can now upload to Amazon Simple Storage Service (S3) an image that shows how the “convex set” is wrapping all the points:

    # This runs at the end of lambda_handler; at module level the function also needs:
    #   import io, boto3
    #   import matplotlib
    #   matplotlib.use('Agg')  # non-interactive backend, since Lambda has no display
    #   import matplotlib.pyplot as plt
    # S3_BUCKET_NAME and S3_KEY are placeholders for your bucket name and object key.

    # Plot the points and the sides of the convex hull.
    plt.plot(points[:,0], points[:,1], 'o')
    for simplex in hull.simplices:
        plt.plot(points[simplex, 0], points[simplex, 1], 'k-')

    # Render the figure to an in-memory PNG.
    img_data = io.BytesIO()
    plt.savefig(img_data, format='png')
    img_data.seek(0)

    # Upload the image to Amazon S3.
    s3 = boto3.resource('s3')
    bucket = s3.Bucket(S3_BUCKET_NAME)
    bucket.put_object(Body=img_data, ContentType='image/png', Key=S3_KEY)

    plt.close()

Lambda Runtime API

You can now select a custom runtime when creating or updating a function:

With this selection, the function must include (in its code or in a layer) an executable file called bootstrap, responsible for the communication between your code (which can be written in any programming language) and the Lambda environment.

The runtime bootstrap uses a simple HTTP-based interface to get the event payload for a new invocation and to return the response from the function. Information about the interface endpoint and the function handler is shared through environment variables.

For the execution of your code, you can use anything that can run in the Lambda execution environment. For example, you can bring an interpreter for the programming language of your choice.
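
To make the protocol more concrete, here is a minimal sketch of a bootstrap written in Ruby (any language that can make HTTP calls works the same way). It only illustrates the request/response loop; a real bootstrap would also load the handler named by the _HANDLER environment variable and report failures to the Runtime API error endpoints.

#!/usr/bin/env ruby
# bootstrap (sketch): poll the Runtime API for events and post results back.
require 'net/http'
require 'json'

runtime_api = ENV['AWS_LAMBDA_RUNTIME_API']          # host:port of the Runtime API
base_url = "http://#{runtime_api}/2018-06-01/runtime"

loop do
  # Long-poll for the next invocation; the request id arrives in a response header.
  next_response = Net::HTTP.get_response(URI("#{base_url}/invocation/next"))
  request_id = next_response['Lambda-Runtime-Aws-Request-Id']
  event = JSON.parse(next_response.body)

  # "Run" the function code; this placeholder simply echoes the event back.
  result = { echo: event }

  # Return the result for this invocation to the Runtime API.
  Net::HTTP.post(URI("#{base_url}/invocation/#{request_id}/response"),
                 JSON.generate(result),
                 'Content-Type' => 'application/json')
end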

You only need to know how the Runtime API works if you want to manage or publish your own runtimes. As a developer, you can quickly use runtimes that are shared with you as layers.

We are making these open source runtimes available today:

  • C++
  • Rust

We are also working with our partners to provide more open source runtimes:

  • Erlang (Alert Logic)
  • Elixir (Alert Logic)
  • Cobol (Blu Age)
  • N|Solid (NodeSource)
  • PHP (Stackery)

The Runtime API is the future of how we’ll support new languages in Lambda. For example, this is how we built support for the Ruby language.

Available Now

You can use runtimes and layers in all regions where Lambda is available, via the console or the AWS Command Line Interface (CLI). You can also use the AWS Serverless Application Model (SAM) and the SAM CLI to test, deploy and manage serverless applications using these new features.

There is no additional cost for using runtimes and layers. The storage used by your layers counts toward the per-region AWS Lambda function storage limit.

To learn more about using the Runtime API and Lambda Layers, don’t miss our webinar on December 11, hosted by Principal Developer Advocate Chris Munns.

I am so excited by these new features. Please let me know what you are going to build next!

Announcing Ruby Support for AWS Lambda

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/announcing-ruby-support-for-aws-lambda/

This post is courtesy of Xiang Shen, Senior AWS Solutions Architect, and Alex Wood, Software Development Engineer, AWS SDKs and Tools.

Ruby remains a popular programming language for AWS customers. In the summer of 2011, AWS introduced the initial release of AWS SDK for Ruby, which has helped Ruby developers to better integrate and use AWS resources. The SDK is now in its third major version and it continues to improve and deliver AWS API updates.

Today, AWS is excited to announce Ruby as a supported language for AWS Lambda.

Now it’s possible to write Lambda functions as idiomatic Ruby code and run them on AWS. The AWS SDK for Ruby is included in the Lambda execution environment by default, which makes it easy to interact with AWS resources directly from your functions (a short sketch follows the list below). In this post, we walk you through how it works, using these examples:

  • Creating a Hello World example
  • Including dependencies
  • Migrating a Sinatra application
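
As a quick illustration of the bundled SDK before diving into those examples, this sketch lists the S3 buckets in the account from a handler; it assumes the preinstalled SDK provides the S3 client and that the function’s execution role allows s3:ListAllMyBuckets:

require 'aws-sdk-s3' # provided by the Ruby runtime, no packaging needed

def lambda_handler(event:, context:)
  # Use the bundled AWS SDK for Ruby to list the S3 buckets in the account.
  s3 = Aws::S3::Client.new
  { buckets: s3.list_buckets.buckets.map(&:name) }
end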

Creating a Hello World example

If you are new to Lambda, it’s simple to create a function using the console.

  1. Open the Lambda console.
  2. Choose Create function.
  3. Select Author from scratch.
  4. Name your function something like hello_ruby.
  5. For Runtime, choose Ruby 2.5.
  6. For Role, choose Create a new role from one or more templates.
  7. Name your role something like hello_ruby_role.
  8. Choose Create function.

Your function is created and you are directed to your function’s console page.

You can modify all aspects of your function, such as editing the function’s code, assigning one or more triggering services, or configuring additional services that your function can interact with. From the Monitoring tab, you can view various metrics about your function’s usage as well as a link to CloudWatch Logs.

As you can see in the code editor, the Ruby code for this Hello World example is basic. It has a single handler function named lambda_handler and returns an HTTP status code of 200 and the text “Hello from Lambda!” in a JSON structure. You can learn more about the programming model for Lambda functions.
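
The generated code in the editor looks similar to the following:

require 'json'

def lambda_handler(event:, context:)
  # Return an HTTP-style response with a JSON-encoded greeting.
  { statusCode: 200, body: JSON.generate('Hello from Lambda!') }
end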

Next, test this Lambda function and confirm that it is working.

  1. On your function console page, choose Test.
  2. Name the test HelloRubyTest and clear out the data in the brackets. This function takes no input.
  3. Choose Save.
  4. Choose Test.

You should now see the results of a successful invocation of your Ruby Lambda function.

Including dependencies

When developing Lambda functions with Ruby, you probably need to include other dependencies in your code. To achieve this, use the bundle tool to download the needed RubyGems into a local directory and create a deployable application package. All dependencies need to be included either in this package or in a Lambda layer.

Do this with a Lambda function that uses the aws-record gem to save data to an Amazon DynamoDB table.

  1. Create a directory for your new Ruby application in your development environment:
    mkdir hello_ruby
    cd hello_ruby
  2. Inside of this directory, create a file Gemfile and add aws-record to it:
    source 'https://rubygems.org'
    gem 'aws-record', '~> 2'
  3. Create a hello_ruby_record.rb file with the following code. In the code, put_item is the handler method, which expects an event object with a body attribute. After it’s invoked, it saves the value of the body attribute along with a UUID to the table.
    # hello_ruby_record.rb
    require 'aws-record'
    require 'securerandom' # explicit require for SecureRandom.uuid
    
    class DemoTable
      include Aws::Record
      set_table_name ENV['DDB_TABLE']
      string_attr :id, hash_key: true
      string_attr :body
    end
    
    def put_item(event:, context:)
      body = event["body"]
      item = DemoTable.new(id: SecureRandom.uuid, body: body)
      item.save! # raise an exception if save fails
      item.to_h
    end 
  4. Next, bring in the dependencies for this application. Bundler is a tool used to manage RubyGems. From your application directory, run the following two commands. They create the Gemfile.lock file and download the gems to the local directory instead of to the local system’s Ruby directory. This way, they ensure that all your dependencies are included in the function deployment package.
    bundle install
    bundle install --deployment
  5. AWS SAM is a templating tool that you can use to create and manage serverless applications. With it, you can define the structure of your Lambda application, define security policies and invocation sources, and manage or create almost any AWS resource. Use it now to help define the function and its policy, create your DynamoDB table and then deploy the application.
    Create a new file in your hello_ruby directory named template.yaml with the following contents:

    AWSTemplateFormatVersion: '2010-09-09'
    Transform: AWS::Serverless-2016-10-31
    Description: 'sample ruby application'
    
    Resources:
      HelloRubyRecordFunction:
        Type: AWS::Serverless::Function
        Properties:
          Handler: hello_ruby_record.put_item
          Runtime: ruby2.5
          Policies:
          - DynamoDBCrudPolicy:
              TableName: !Ref RubyExampleDDBTable 
          Environment:
            Variables:
              DDB_TABLE: !Ref RubyExampleDDBTable
    
      RubyExampleDDBTable:
        Type: AWS::Serverless::SimpleTable
        Properties:
          PrimaryKey:
            Name: id
            Type: String
    
    Outputs:
      HelloRubyRecordFunction:
        Description: Hello Ruby Record Lambda Function ARN
        Value:
          Fn::GetAtt:
          - HelloRubyRecordFunction
          - Arn

    In this template file, you define Serverless::Function and Serverless::SimpleTable as resources, which correspond to a Lambda function and DynamoDB table.

    The line Policies in the function and the following line DynamoDBCrudPolicy refer to an AWS SAM policy template, which greatly simplifies granting permissions to Lambda functions. The DynamoDBCrudPolicy allows you to create, read, update, and delete DynamoDB resources and items in tables.

    In this example, you limit permissions by specifying TableName and passing a reference to Serverless::SimpleTable that the template creates. Next in importance is the Environment section of this template, where you create a variable named DDB_TABLE and also pass it a reference to Serverless::SimpleTable. Lastly, the Outputs section of the template allows you to easily find the function that was created.

    The directory structure now should look like the following:

    $ tree -L 2 -a
    .
    ├── .bundle
    │   └── config
    ├── Gemfile
    ├── Gemfile.lock
    ├── hello_ruby_record.rb
    ├── template.yaml
    └── vendor
        └── bundle
  6. Now use the template file to package and deploy your application. An AWS SAM template can be deployed using the AWS CloudFormation console, the AWS CLI, or the AWS SAM CLI. The AWS SAM CLI is a tool that simplifies serverless development across the lifecycle of your application, from the initial creation of a serverless project, through local testing and debugging, to deployment to AWS. Follow the steps for your platform to get the AWS SAM CLI installed.
  7. Create an Amazon S3 bucket to store your application code. Run the following AWS CLI command to create an S3 bucket with a custom name:
    aws s3 mb s3://<bucketname>
  8. Use the AWS SAM CLI to package your application:
    sam package --template-file template.yaml \
    --output-template-file packaged-template.yaml \
    --s3-bucket <bucketname>

    This creates a new template file named packaged-template.yaml.

  9. Use the AWS SAM CLI to deploy your application. Use any stack-name value:
    sam deploy --template-file packaged-template.yaml \
    --stack-name helloRubyRecord \
    --capabilities CAPABILITY_IAM
    
    Waiting for changeset to be created...
    Waiting for stack create/update to complete
    Successfully created/updated stack - helloRubyRecord

    This can take a few moments to create all of the resources from the template. After you see the output “Successfully created/updated stack,” it has completed.

  10. In the Lambda Serverless Applications console, you should see your application listed:

  11. To see the application dashboard, choose the name of your application stack. Because this application was deployed with either AWS SAM or AWS CloudFormation, the dashboard allows you to manage the resources as a single group with a number of features. You can view the stack resources, its template, recent deployments, and metrics, including any custom dashboards you might make.

Now test the Lambda function and confirm that it is working.

  1. Choose Overview. Under Resources, select the Lambda function created in this application:

  2. In the Lambda function console, configure a test as you did earlier. Use the following JSON:
    {"body": "hello lambda"}
  3. Execute the test and you should see a success message:

  4. In the Lambda Serverless Applications console for this stack, select the DynamoDB table created:

  5. Choose Items.

The id and body should match the output from the Lambda function test run.

You just created a Ruby-based Lambda application using AWS SAM!

Migrating a Sinatra application

Sinatra is a popular open source framework for Ruby that launched over a decade ago. It allows you to quickly create powerful web applications with minimal effort. Until today, you would still have needed servers to run those applications. Now, you can just deploy a Sinatra app to Lambda and move to a serverless world!

Thanks to Rack, a Ruby web server interface, you only need to create a simple Lambda function to bridge the gap between the HTTP requests and the serverless Sinatra application; you don’t need to make any changes to the other Sinatra files. Paired with Amazon API Gateway and DynamoDB, your Sinatra application runs completely serverless!
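
To give a sense of what that bridge does, the following is a simplified sketch of such a handler (the lambda.rb in the sample repository is more complete). It assumes Rack 2.x, where Rack::Builder.parse_file returns the app together with its options:

require 'json'
require 'rack'
require 'stringio'

# Load the Sinatra application defined by app/config.ru.
$app ||= Rack::Builder.parse_file('app/config.ru').first

def handler(event:, context:)
  # Translate the API Gateway proxy event into a minimal Rack environment.
  env = {
    'REQUEST_METHOD'  => event['httpMethod'],
    'PATH_INFO'       => event['path'] || '/',
    'QUERY_STRING'    => Rack::Utils.build_query(event['queryStringParameters'] || {}),
    'SERVER_NAME'     => 'localhost',
    'SERVER_PORT'     => '443',
    'rack.version'    => Rack::VERSION,
    'rack.url_scheme' => 'https',
    'rack.input'      => StringIO.new(event['body'] || ''),
    'rack.errors'     => $stderr
  }

  # Call the Rack app and collect its body into a single string for API Gateway.
  status, headers, body = $app.call(env)
  response_body = ''
  body.each { |part| response_body << part }
  body.close if body.respond_to?(:close)

  { 'statusCode' => status, 'headers' => headers, 'body' => response_body }
end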

For this post, take an existing Sinatra application and make it function in Lambda.

  1. Clone the serverless-sinatra-sample GitHub repository into your local environment.
    Under the app directory, find the Sinatra application files. In server.rb, the routes return either JSON or HTML generated from the ERB templates in the views directory.

    ├── app
    │   ├── config.ru
    │   ├── server.rb
    │   └── views
    │       ├── feedback.erb
    │       ├── index.erb
    │       └── layout.erb

    In the root of the directory, you also find the template.yaml and lambda.rb files. The template.yaml includes four resources:

    • Serverless::Function
    • Serverless::API
    • Serverless::SimpleTable
    • Lambda::Permission

    In the lambda.rb file, you find the main handler for this function, which calls Rack to interface with the Sinatra application.

  2. This application has several dependencies, so use the bundle tool to install them:
    bundle install
    bundle install --deployment
  3. Package this Lambda function and the related application components using the AWS SAM CLI:
    sam package --template-file template.yaml \
    --output-template-file packaged-template.yaml \
    --s3-bucket <bucketname>
  4. Next, deploy the application:
    sam deploy --template-file packaged-template.yaml \
    --stack-name LambdaSinatra \
    --capabilities CAPABILITY_IAM                                                                                                        
    
    Waiting for changeset to be created..
    Waiting for stack create/update to complete
    Successfully created/updated stack - LambdaSinatra
  5. In the Lambda Serverless Applications console, select your application:

  6. Choose Overview. Under Resources, find the ApiGateway RestApi entry and select the Logical ID. Below it is SinatraAPI:

  7. In the API Gateway console, in the left navigation pane, choose Dashboard. Copy the URL from Invoke this API and paste it in another browser tab:

  8. Append one of the routes defined in the Sinatra application’s server.rb to the invoke URL (the result looks something like https://<api-id>.execute-api.<region>.amazonaws.com/<stage>/hello-world).

For example, this is the hello-world GET route:

And this is the /feedback route:

Congratulations, you’ve just successfully deployed a Sinatra-based Ruby application inside of a Lambda function!

Conclusion

As you’ve seen in this post, getting started with Ruby on Lambda is made easy via either the AWS Management Console or the AWS SAM CLI.

You might even be able to easily port existing applications to Lambda without needing to change your code base. The new support for Ruby allows you to benefit from the greatly reduced operational overhead, scalability, availability, and pay-per-use pricing of Lambda.

If you are excited about this feature as well, there is even more information on writing Lambda functions in Ruby in the AWS Lambda Developer Guide.

Happy coding!