Tag Archives: AWS Lambda

Intuit: Serving Millions of Global Customers with Amazon Connect

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/intuit-serving-millions-of-global-customers-with-amazon-connect/

Recently, Bill Schuller, Intuit Contact Center Domain Architect, met with AWS's Simon Elisha to discuss how Intuit manages its customer contact centers with Amazon Connect.

As a 35-year-old company with an international customer base, Intuit is widely known as the maker of QuickBooks and TurboTax, among other software products. Its 50 million customers can access its global contact centers not just for password resets and feature explanations, but for detailed tax interpretation and advice. As you can imagine, this presents a challenge of scale.

Using Amazon Connect, a self-service, cloud-based contact center service, Intuit has been able to provide a seamless call-in experience to customers around the globe. When a customer calls in to Amazon Connect, Intuit can do a “data dip” through AWS Lambda out to the company’s CRM system (in this case, Salesforce) to pull up more information about the customer. At this point, Intuit can leverage other services like Amazon Lex for natural language feedback and then route the customer to the right person who can help. When the call is over, instead of having that important recording locked up in a proprietary system, the audio is moved into an S3 bucket, where Intuit can do some post-call processing. The recording can also be sent to third parties for analysis, or Intuit can use Amazon Transcribe or Amazon Comprehend to get a transcription or sentiment analysis to understand more about what happened during that particular call.

Watch the video below to understand why Intuit decided on this set of AWS services (hint: it has to do with the ability to experiment quickly and at scale, without the cost overhead).

Check out more videos in the This Is My Architecture series.

About the author

Annik Stahl is a Senior Program Manager at AWS, specializing in blog and magazine content as well as customer ratings and satisfaction. Having been the face of Microsoft Office for 10 years as the Crabby Office Lady columnist, she loves getting to know her customers and wants to hear from you.

Best Practices for Developing on AWS Lambda

Post Syndicated from George Mao original https://aws.amazon.com/blogs/architecture/best-practices-for-developing-on-aws-lambda/

In our previous post we discussed the various ways you can invoke AWS Lambda functions. In this post, we’ll provide some tips and best practices you can use when building your AWS Lambda functions.

One of the benefits of using Lambda is that you don’t have to worry about server and infrastructure management. AWS handles the heavy lifting needed to execute your Lambda functions. You can take full advantage of this architecture with the tips below.

Tip #1: When to VPC-Enable a Lambda Function

Lambda functions always operate from an AWS-owned VPC. By default, your function has full ability to make network requests to any public internet address, including any of the public AWS APIs. For example, your function can call the Amazon DynamoDB API to PutItem or Query for records. You should only enable your functions for VPC access when you need to interact with a private resource located in a private subnet. An RDS instance is a good example.
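
For illustration, here is a minimal sketch of a function writing to DynamoDB over its public endpoint with the AWS SDK for Java 2.x; no VPC configuration is needed for this call, and the Orders table and attribute names are hypothetical.

import java.util.Map;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.PutItemRequest;

public class OrderWriter {
    // The client resolves the public DynamoDB endpoint; no VPC configuration is required
    private static final DynamoDbClient ddb = DynamoDbClient.create();

    public static void saveOrder(String orderId, String status) {
        PutItemRequest request = PutItemRequest.builder()
                .tableName("Orders") // hypothetical table name
                .item(Map.of(
                        "orderId", AttributeValue.builder().s(orderId).build(),
                        "status", AttributeValue.builder().s(status).build()))
                .build();
        ddb.putItem(request);
    }
}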

RDS instance: When to VPC enable a Lambda function

Once your function is VPC-enabled, all network traffic from your function is subject to the routing rules of your VPC/Subnet. If your function needs to interact with a public resource, you will need a route through a NAT gateway in a public subnet.

Tip #2: Deploy Common Code to a Lambda Layer (e.g., the AWS SDK)

If you intend to reuse code in more than one function, consider creating a Layer and deploying the code there. A great candidate is a logging package that your team standardizes on. Another great example is the AWS SDK. AWS includes the AWS SDK for Node.js and Python functions (and updates the SDK periodically). However, you should bundle your own SDK and pin your functions to a version of the SDK you have tested.

Tip #3: Watch Your Package Size and Dependencies

Lambda functions require you to package all needed dependencies (or attach a Layer) — the bigger your deployment package, the slower your function will cold-start. Remove all unnecessary items, such as documentation and unused libraries. If you are using Java functions with the AWS SDK, only bundle the module(s) that you actually need to use — not the entire SDK.

Good:

<dependency>
    <groupId>software.amazon.awssdk</groupId>
    <artifactId>dynamodb</artifactId>
    <version>2.6.0</version>
</dependency>

Bad:

<!-- https://mvnrepository.com/artifact/software.amazon.awssdk/aws-sdk-java -->
<dependency>
    <groupId>software.amazon.awssdk</groupId>
    <artifactId>aws-sdk-java</artifactId>
    <version>2.6.0</version>
</dependency>

Tip #4: Monitor Your Concurrency (and Set Alarms)

Our first post in this series talked about how concurrency can affect your downstream systems. Because Lambda functions can scale extremely quickly, you should have controls in place to notify you when you have a spike in concurrency. A good idea is to deploy a CloudWatch Alarm that notifies your team when function metrics such as ConcurrentExecutions or Invocations exceed your threshold. You should also create an AWS Budget so you can monitor costs on a daily basis. Here is a great example of how to set up automated cost controls.
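
As a sketch of what such an alarm could look like with the AWS SDK for Java 2.x, the snippet below creates an alarm on ConcurrentExecutions for a single function; the function name, threshold, and SNS topic ARN are placeholder values.

import software.amazon.awssdk.services.cloudwatch.CloudWatchClient;
import software.amazon.awssdk.services.cloudwatch.model.ComparisonOperator;
import software.amazon.awssdk.services.cloudwatch.model.Dimension;
import software.amazon.awssdk.services.cloudwatch.model.PutMetricAlarmRequest;
import software.amazon.awssdk.services.cloudwatch.model.Statistic;

public class ConcurrencyAlarm {
    public static void main(String[] args) {
        CloudWatchClient cloudWatch = CloudWatchClient.create();
        cloudWatch.putMetricAlarm(PutMetricAlarmRequest.builder()
                .alarmName("MyLambdaFunction-concurrency-spike")   // placeholder alarm name
                .namespace("AWS/Lambda")
                .metricName("ConcurrentExecutions")
                .dimensions(Dimension.builder()
                        .name("FunctionName").value("MyLambdaFunction").build())
                .statistic(Statistic.MAXIMUM)
                .period(60)                                        // evaluate one-minute windows
                .evaluationPeriods(1)
                .threshold(500.0)                                  // pick a threshold that fits your workload
                .comparisonOperator(ComparisonOperator.GREATER_THAN_THRESHOLD)
                .alarmActions("arn:aws:sns:us-east-1:123456789012:ops-alerts") // placeholder SNS topic
                .build());
    }
}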

Tip #5: Over-Provision Memory (in some use cases) but Not Function Timeout

Lambda allocates compute power in proportion to the memory you allocate to your function. This means you can over-provision memory to run your functions faster and potentially reduce your costs. Benchmark your use case to determine where the break-even point is between running faster with more memory and running slower with less memory.

However, we recommend that you do not over-provision your function timeout settings. Always understand your code performance and set a function timeout accordingly. Over-provisioning function timeouts often results in Lambda functions running longer than expected, and in unexpected costs.
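
For example, after benchmarking you might settle on a memory size and a timeout that reflects your function's measured duration. A minimal sketch with the AWS SDK for Java 2.x follows; the function name and values are illustrative only.

import software.amazon.awssdk.services.lambda.LambdaClient;
import software.amazon.awssdk.services.lambda.model.UpdateFunctionConfigurationRequest;

public class TuneFunction {
    public static void main(String[] args) {
        LambdaClient lambda = LambdaClient.create();
        lambda.updateFunctionConfiguration(UpdateFunctionConfigurationRequest.builder()
                .functionName("MyLambdaFunction") // placeholder function name
                .memorySize(1024)                 // more memory also means more CPU
                .timeout(30)                      // seconds; keep close to the observed worst-case duration
                .build());
    }
}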

About the Author

George Mao is a Specialist Solutions Architect at Amazon Web Services, focused on the Serverless platform. George is responsible for helping customers design and operate Serverless applications using services like Lambda, API Gateway, Cognito, and DynamoDB. He is a regular speaker at AWS Summits, re:Invent, and various tech events. George is a software engineer and enjoys contributing to open source projects, delivering technical presentations at technology events, and working with customers to design their applications in the Cloud. George holds a Bachelor of Computer Science and Masters of IT from Virginia Tech.

Understanding the Different Ways to Invoke Lambda Functions

Post Syndicated from George Mao original https://aws.amazon.com/blogs/architecture/understanding-the-different-ways-to-invoke-lambda-functions/

In our first post, we talked about general design patterns to enable massive scale with serverless applications. In this post, we’ll review the different ways you can invoke Lambda functions and what you should be aware of with each invocation model.

Synchronous Invokes

Synchronous invocations are the most straightforward way to invoke your Lambda functions. In this model, your functions execute immediately when you perform the Lambda Invoke API call. This can be accomplished through a variety of options, including using the CLI or any of the supported SDKs.

Here is an example of a synchronous invoke using the CLI:

aws lambda invoke --function-name MyLambdaFunction --invocation-type RequestResponse --payload "[JSON string here]"

The --invocation-type flag specifies a value of “RequestResponse”. This instructs AWS to execute your Lambda function and wait for the function to complete. When you perform a synchronous invoke, you are responsible for checking the response, determining whether there was an error, and deciding whether to retry the invoke.
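
The same synchronous invoke can be performed from code. Below is a hedged sketch using the AWS SDK for Java 2.x; the function name and payload are placeholders, and the caller inspects the response for a function error before deciding whether to retry.

import software.amazon.awssdk.core.SdkBytes;
import software.amazon.awssdk.services.lambda.LambdaClient;
import software.amazon.awssdk.services.lambda.model.InvocationType;
import software.amazon.awssdk.services.lambda.model.InvokeRequest;
import software.amazon.awssdk.services.lambda.model.InvokeResponse;

public class SyncInvoker {
    public static void main(String[] args) {
        LambdaClient lambda = LambdaClient.create();
        InvokeResponse response = lambda.invoke(InvokeRequest.builder()
                .functionName("MyLambdaFunction")                // placeholder function name
                .invocationType(InvocationType.REQUEST_RESPONSE) // wait for the function to complete
                .payload(SdkBytes.fromUtf8String("{\"key\":\"value\"}"))
                .build());

        // With synchronous invokes, error handling and retries are the caller's responsibility
        if (response.functionError() != null) {
            System.err.println("Function error: " + response.payload().asUtf8String());
        } else {
            System.out.println("Result: " + response.payload().asUtf8String());
        }
    }
}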

Many AWS services can emit events that trigger Lambda functions. Here is a list of services that invoke Lambda functions synchronously:

Asynchronous Invokes

Here is an example of an asynchronous invoke using the CLI:

aws lambda invoke --function-name MyLambdaFunction --invocation-type Event --payload "[JSON string here]"

Notice that the --invocation-type flag specifies “Event.” If your function returns an error, AWS automatically retries the invoke twice, for a total of three invocations.

Here is a list of services that invoke Lambda functions asynchronously:

Asynchronous invokes place your invoke request in the Lambda service queue, and the requests are processed as they arrive. You can use AWS X-Ray to review how long your request spent in the service queue by checking the “dwell time” segment.

Poll-based Invokes

This invocation model is designed to allow you to integrate with AWS stream- and queue-based services with no code or server management. Lambda will poll the following services on your behalf, retrieve records, and invoke your functions. The following are supported services:

AWS manages the poller on your behalf and performs synchronous invokes of your function with this type of integration. The retry behavior for this model is based on data expiration in the data source. For example, Kinesis Data Streams stores records for 24 hours by default (up to 168 hours). The specific details of each integration are linked above.
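
You enable this model by creating an event source mapping, which points Lambda's managed poller at your stream or queue. A minimal sketch with the AWS SDK for Java 2.x follows; the stream ARN, function name, and batch size are placeholder values.

import software.amazon.awssdk.services.lambda.LambdaClient;
import software.amazon.awssdk.services.lambda.model.CreateEventSourceMappingRequest;
import software.amazon.awssdk.services.lambda.model.EventSourcePosition;

public class AttachStreamToFunction {
    public static void main(String[] args) {
        LambdaClient lambda = LambdaClient.create();
        lambda.createEventSourceMapping(CreateEventSourceMappingRequest.builder()
                .functionName("MyLambdaFunction") // placeholder function name
                .eventSourceArn("arn:aws:kinesis:us-east-1:123456789012:stream/my-stream") // placeholder ARN
                .startingPosition(EventSourcePosition.LATEST)
                .batchSize(100)                   // records delivered per invocation
                .build());
    }
}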

Conclusion

In our next post, we’ll provide some tips and best practices for developing Lambda functions. Happy coding!

 

About the Author

George Mao is a Specialist Solutions Architect at Amazon Web Services, focused on the Serverless platform. George is responsible for helping customers design and operate Serverless applications using services like Lambda, API Gateway, Cognito, and DynamoDB. He is a regular speaker at AWS Summits, re:Invent, and various tech events. George is a software engineer and enjoys contributing to open source projects, delivering technical presentations at technology events, and working with customers to design their applications in the Cloud. George holds a Bachelor of Computer Science and Masters of IT from Virginia Tech.

Increasing real-time stream processing performance with Amazon Kinesis Data Streams enhanced fan-out and AWS Lambda

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/increasing-real-time-stream-processing-performance-with-amazon-kinesis-data-streams-enhanced-fan-out-and-aws-lambda/

Live business data and real-time analytics are critical to informed decision-making and customer service. For example, streaming services like Netflix process billions of traffic flows each day to help you binge-watch your favorite shows. And consumer audio specialists like Sonos monitor a billion events per week to improve listener experiences. These data-savvy businesses collect and analyze massive amounts of real-time data every day.

Kinesis Data Streams overview

To help ingest real-time data or streaming data at large scales, AWS customers turn to Amazon Kinesis Data Streams. Kinesis Data Streams can continuously capture gigabytes of data per second from hundreds of thousands of sources. The data collected is available in milliseconds, enabling real-time analytics.

To provide this massively scalable throughput, Kinesis Data Streams relies on shards, which are units of throughput and represent a unit of parallelism. One shard provides an ingest throughput of 1 MB/second or 1,000 records/second, and an outbound throughput of 2 MB/second. As you ingest more data, Kinesis Data Streams can add more shards. Customers often run thousands of shards in a single stream.

Enhanced fan-out

One of the main advantages of stream processing is that you can attach multiple unique applications, each consuming data from the same Kinesis data stream. For example, one application can aggregate the records in the data stream, batch them, and write the batch to S3 for long-term retention. Another application can enrich the records and write them into an Amazon DynamoDB table. At the same time, a third application can filter the stream and write a subset of the data into a different Kinesis data stream.

Before the adoption of enhanced fan-out technology, users consumed data from a Kinesis data stream with multiple AWS Lambda functions sharing the same 2 MB/second outbound throughput. Due to shared bandwidth constraints, no more than two or three functions could efficiently connect to the data stream at a time, as shown in the following diagram.

Default Method

To achieve greater outbound throughput across multiple applications, you could spread data ingestion across multiple data streams. So, a developer seeking to achieve 10 GB/second of outbound throughput to support five separate applications might resort to math like the following table:

Stream | Shards | Input | Output
1 | 1 | 1,000 records/second or 1 MB/second | 2 MB/second
2 | 2,500 each | 5,000,000 records/second or 5,000 MB/second | 10,000 MB/second or 10 GB/second

Due to the practical limitation of two to three applications per stream, you must have at least two streams to support five individual applications. You could attach three applications to the first stream and two applications to the second. However, diverting data into two separate streams adds complexity.

In August of 2018, Kinesis Data Streams announced a solution: support for enhanced fan-out and HTTP/2 for faster streaming. The enhanced fan-out method is an option that you can use for consuming Kinesis data streams at a higher capacity. The enhanced capacity enables you to achieve higher outbound throughput without having to provision more streams or shards in the same stream.

When using the enhanced fan-out option, you first create a Kinesis data stream consumer. A consumer is an isolated connection to the stream that provides a 2 MB/second outbound throughput. A Kinesis data stream can support up to five consumers, providing a combined outbound throughput capacity of 10 MB/second per shard. As the stream scales dynamically by adding shards, the throughput available through the consumers scales with it.
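
A sketch of registering such a consumer with the AWS SDK for Java 2.x is shown below; the stream ARN and consumer name are placeholders. The returned consumer ARN identifies the isolated 2 MB/second connection that a consuming application uses to read from the stream.

import software.amazon.awssdk.services.kinesis.KinesisClient;
import software.amazon.awssdk.services.kinesis.model.RegisterStreamConsumerRequest;
import software.amazon.awssdk.services.kinesis.model.RegisterStreamConsumerResponse;

public class RegisterFanOutConsumer {
    public static void main(String[] args) {
        KinesisClient kinesis = KinesisClient.create();
        RegisterStreamConsumerResponse response = kinesis.registerStreamConsumer(
                RegisterStreamConsumerRequest.builder()
                        .streamARN("arn:aws:kinesis:us-east-1:123456789012:stream/my-stream") // placeholder ARN
                        .consumerName("my-fan-out-consumer")                                  // placeholder name
                        .build());
        // The consumer ARN identifies this isolated 2 MB/second connection to the stream
        System.out.println(response.consumer().consumerARN());
    }
}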

Consider again the requirement of 10 GB/second of output capacity, but rerun your math using enhanced fan-out.

Stream | Shards | Input | Consumers | Output
1 | 1 | 1,000 records/second or 1 MB/second | 5 | 10 MB/second
1 | 1,000 | 1,000,000 records/second or 1,000 MB/second | 5 | 10,000 MB/second or 10 GB/second

Enhanced fan-out with Lambda functions

Just before re:Invent 2018, AWS Lambda announced support for enhanced fan-out and HTTP/2. Lambda functions can now be triggered using the enhanced fan-out pattern to reduce latency. These improvements increase the amount of real-time data that can be processed in serverless applications, as seen in the following diagram.

Enhanced Fan-Out Method

In addition to using the enhanced fan-out option, you can still attach Lambda functions to the stream using the GetRecords API, as before. You can attach up to five consumers with Lambda functions, each with a dedicated 2 MB/second outbound throughput capacity, plus another two or three Lambda functions sharing the single standard 2 MB/second outbound throughput capacity. Thus, enhanced fan-out enables you to support up to eight Lambda functions simultaneously.

HTTP/2

The streaming technology in HTTP/2 increases the output ability of Kinesis data streams. In addition, it allows data delivery from producers to consumers in 70 milliseconds or better (a 65% improvement) in typical scenarios. These new features enable you to build faster, more reactive, highly parallel, and latency-sensitive applications on top of Kinesis Data Streams.

Comparing methods

To demonstrate the advantage of Kinesis Data Streams enhanced fan-out, I built an application with a single shard stream. It has three Lambda functions connected using the standard method and three Lambda functions connected using enhanced fan-out for consumers. I created roughly 76 KB of dummy data and inserted it into the stream at 1,000 records per second. After four seconds, I stopped the process, leaving a total of 4,000 records to be processed.

As seen in the following diagram, each of the enhanced fan-out functions processed the 4,000 records in under 2 seconds, averaging 1,852 ms each. Interestingly, the standard method got a jump start in the first function, processing 4,000 records in 1,732 ms. However, because of the shared resources, the other two functions took longer to process the data, at just over 2.5 seconds.

Comparison of Methods

By Kinesis Data Streams standards, 4,000 records is a small dataset. But when processing millions of records in real time, the latency difference between the standard and enhanced fan-out methods becomes much more significant.

Cost

When using Kinesis Data Streams, a company incurs an hourly cost of $0.015 per shard and a PUT fee of $0.014 per one million payload units. You can purchase enhanced fan-out for a consumer-shard hour fee of $0.015, plus a data retrieval fee of $0.013 per GB. These fees are for the us-east-1 Region only. To see a full list of prices, see Kinesis Data Streams pricing.

Show me the code

To demonstrate the use of Kinesis Data Streams enhanced fan-out with Lambda functions, I built a simple application. It ingests simulated IoT sensor data and stores it in an Amazon DynamoDB table as well as in an Amazon S3 bucket for later use. I could have conceivably done this in a single Lambda function. However, to keep things simple, I broke it into two separate functions.

Deploying the application

I built the Kinesis-Enhanced-Fan-Out-to-DDB-S3 application and made it available through the AWS Serverless Application Repository.

Deploy the application in your AWS account. The application is only available in the us-east-1 Region.

On the deployment status page, you can monitor the resources being deployed, including policies and capabilities.

Deployment Status

After all the resources deploy, you should see a green banner.

Exploring the application

Take a moment to examine the list of deployed resources. The two Lambda functions, DDBFunction and S3Function, receive data and write to DynamoDB and S3, respectively. Additionally, two roles have been created to allow the functions access to their respective targets.

There are also two consumers, DDBConsumer and S3Consumer, each providing isolated output at 2 MB/second throughput. Each consumer is connected to the KinesisStream stream and triggers the Lambda functions when data arrives.

Also, there is a DynamoDB table called DBRecords and an S3 bucket called S3Records.

Finally, there is a stream consumption app, as shown in the following diagram.

Application Example

Testing the application

Now that you have your application installed, test it by putting data into the Kinesis data stream.

There are several ways to do this. You can build your producer using the Kinesis Producer Library (KPL), or you could create an app that uses the AWS SDK to input data. However, there is an easier way that suits your purposes for this post: the Amazon Kinesis Data Generator. The easiest way to use this tool is to use the hosted generator and follow the setup instructions.

After you have the generator configured, you should have a custom URL to generate data for your Kinesis data stream. In your configuration steps, you created a username and password. Log in to the generator using those credentials.

When you are logged in, you can generate data for your stream test.

  1. For Region, choose us-east-1.
  2. For Stream/delivery stream, select your stream. It should start with serverlessrepo.
  3. For Records per second, keep the default value of 100.
  4. On the Template 1 tab, name the template Sensor1.
  5. Use the following template:
    {
        "sensorId": {{random.number(50)}},
        "currentTemperature": {{random.number(
            {
                "min":10,
                "max":150
            }
        )}},
        "status": "{{random.arrayElement(
            ["OK","FAIL","WARN"]
        )}}"
    }
  6. Choose Send Data.
  7. After several seconds, choose Stop Sending Data.

At this point, if all went according to plan, you should see data in both your DynamoDB table and S3 bucket. Use the following steps to verify that your enhanced fan-out process worked.

  1. On the Lambda console, choose Applications.
  2. Select the application that starts with serverlessrepo-.
  3. Choose Resources, DDBFunction. This opens the DynamoDB console.
  4. Choose Items.

The following screenshot shows the first 100 items that your database absorbed from the DDBFunction attached to KinesisStream through DDBConsumer.

DynamoDB Records

Next, check your S3 bucket:

  1. On the Lambda console, choose Applications.
  2. Select the application that starts with serverlessrepo-.
  3. Choose Resources, S3Records. This opens the S3 console.

As in DynamoDB, you should now see the fake IoT sensor data stored in your S3 bucket for later use.

S3 Records

Now that the demonstration is working, I want to point out the benefits of what you have just done. By using the enhanced fan-out method, you have increased your performance in the following ways.

  1. HTTP/2 has decreased the time from data producers to consumers to 70 ms or less, a 65% improvement.
  2. At the consumer level, each consumer has an isolated 2 MB/second outbound throughput speed. Because you are using two consumers, it works out to 2x the performance.

Conclusion

Using Lambda functions in concert with Kinesis Data Streams to collect and analyze massive amounts of data isn’t a new idea. However, the introduction of enhanced fan-out technology and HTTP/2 enables you to use more functions at the same time without losing throughput capacity.

If you only connect one or two Lambda functions to a data stream, then enhanced fan-out might not be a great fit. However, if you attach more than three Lambda functions to a stream for real-time manipulation and data routing, it makes sense to evaluate this option.

I hope this helps. Happy coding!

How to Design Your Serverless Apps for Massive Scale

Post Syndicated from George Mao original https://aws.amazon.com/blogs/architecture/how-to-design-your-serverless-apps-for-massive-scale/

Serverless is one of the hottest design patterns in the cloud today, allowing you to focus on building and innovating, rather than worrying about the heavy lifting of server and OS operations. In this series of posts, we’ll discuss topics that you should consider when designing your serverless architectures. First, we’ll look at architectural patterns designed to achieve massive scale with serverless.

Scaling Considerations

In general, developers in a “serverful” world need to worry about how many total requests can be served throughout the day, week, or month, and how quickly their system can scale. As you move into the serverless world, the most important question becomes: “What is the concurrency that your system is designed to handle?”

The AWS Serverless platform allows you to scale very quickly in response to demand. Below is an example of a serverless design that is fully synchronous throughout the application. During periods of extremely high demand, Amazon API Gateway and AWS Lambda will scale in response to your incoming load. This design places extremely high load on your backend relational database because Lambda can easily scale from thousands to tens of thousands of concurrent requests. In most cases, your relational databases are not designed to accept the same number of concurrent connections.

Serverless at scale-1

This design risks bottlenecks at your relational database and may cause service outages. This design also risks data loss due to throttling or database connection exhaustion.

Cloud Native Design

Instead, you should consider decoupling your architecture and moving to an asynchronous model. In this architecture, you use an intermediary service to buffer incoming requests, such as Amazon Kinesis or Amazon Simple Queue Service (SQS). You can configure Kinesis or SQS as out-of-the-box event sources for Lambda. In the design below, AWS automatically polls your Kinesis stream or SQS resource for new records and delivers them to your Lambda functions. You can control the batch size per delivery and further place throttles on a per-function basis.

Serverless at scale - 2

This design allows you to accept an extremely high volume of requests, store the requests in a durable datastore, and process them at the speed your system can handle.
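
To illustrate the consuming side of this pattern, here is a minimal sketch of a Java Lambda handler for SQS batches using the aws-lambda-java-events library; the downstream write is left as a placeholder.

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.SQSEvent;

public class QueueWorker implements RequestHandler<SQSEvent, Void> {
    @Override
    public Void handleRequest(SQSEvent event, Context context) {
        // Lambda's managed poller delivers messages in batches sized by your event source mapping
        for (SQSEvent.SQSMessage message : event.getRecords()) {
            context.getLogger().log("Processing message " + message.getMessageId());
            // ... write to your downstream datastore at a rate it can sustain (placeholder) ...
        }
        return null; // returning normally tells Lambda the batch succeeded
    }
}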

Conclusion

Serverless computing allows you to scale much more quickly than with server-based applications, but that means application architects should always consider the effects of scaling on downstream services. Always keep cost, speed, and reliability in mind when you’re building your serverless applications.

Our next post in this series will discuss the different ways to invoke your Lambda functions and how to design your applications appropriately.

About the Author

George Mao is a Specialist Solutions Architect at Amazon Web Services, focused on the Serverless platform. George is responsible for helping customers design and operate Serverless applications using services like Lambda, API Gateway, Cognito, and DynamoDB. He is a regular speaker at AWS Summits, re:Invent, and various tech events. George is a software engineer and enjoys contributing to open source projects, delivering technical presentations at technology events, and working with customers to design their applications in the Cloud. George holds a Bachelor of Computer Science and Masters of IT from Virginia Tech.

Optimizing downstream data processing with Amazon Kinesis Data Firehose and Amazon EMR running Apache Spark

Post Syndicated from Srikanth Kodali original https://aws.amazon.com/blogs/big-data/optimizing-downstream-data-processing-with-amazon-kinesis-data-firehose-and-amazon-emr-running-apache-spark/

For most organizations, working with ever-increasing volumes of data and incorporating new data sources can be a challenge.  Often, AWS customers have messages coming from various connected devices and sensors that must be efficiently ingested and processed before further analysis.  Amazon S3 is a natural landing spot for data of all types.  However, the way data is stored in Amazon S3 can make a significant difference in the efficiency and cost of downstream data processing.  Specifically, Apache Spark can be over-burdened with file operations if it is processing a large number of small files versus fewer larger files.  Each of these files has its own overhead of a few milliseconds for opening, reading metadata information, and closing. This overhead of file operations on these large numbers of files results in slow processing. This blog post shows how to use Amazon Kinesis Data Firehose to merge many small messages into larger messages for delivery to Amazon S3.  This results in faster processing with Amazon EMR running Spark.

Like Amazon Kinesis Data Streams, Kinesis Data Firehose accepts a maximum incoming message size of 1 MB.  If a single message is greater than 1 MB, it can be compressed before placing it on the stream.  However, at large volumes, a message or file size of 1 MB or less is usually too small.  Although there is no right answer for file size, 1 MB for many datasets would just yield too many files and file operations.

This post also shows how to use Apache Spark to read compressed files in Amazon S3 that do not have a proper file name extension, and to store the results back in Amazon S3 in parquet format.

Solution overview

The steps we follow in this blog post are:

  1. Create a virtual private cloud (VPC) and an Amazon S3 bucket.
  2. Provision a Kinesis data stream, and an AWS Lambda function to process the messages from the Kinesis data stream.
  3. Provision Kinesis Data Firehose to deliver messages to Amazon S3 sent from the Lambda function in step 2. This step also provisions an Amazon EMR cluster to process the data in Amazon S3.
  4. Generate test data with custom code running on an Amazon EC2 instance.
  5. Run a sample Spark program from the Amazon EMR cluster’s master instance to read the files from Amazon S3, convert them into parquet format and write back to an Amazon S3 destination.

The following diagram explains how the services work together:

The AWS Lambda function in the diagram reads the messages, appends additional data to them, and compresses them with gzip before sending them to Amazon Kinesis Data Firehose. The reason for this is that most customers need some enrichment of the data before it arrives in Amazon S3.

Amazon Kinesis Data Firehose can buffer incoming messages into larger records before delivering them to your Amazon S3 bucket. It does so according to two conditions, buffer size (up to 128 MB) and buffer interval (up to 900 seconds). Record delivery is triggered once either of these conditions has been satisfied.

An Apache Spark job reads the messages from Amazon S3, and stores them in parquet format. With parquet, data is stored in a columnar format that provides more efficient scanning and enables ad hoc querying or further processing by services like Amazon Athena.

Considerations

The maximum size of a record sent to Kinesis Data Firehose is 1,000 KB. If your message size is greater than this value, compressing the message before it is sent to Kinesis Data Firehose is the best approach. Kinesis Data Firehose also offers compression of messages after they are written to the delivery stream. Unfortunately, this does not overcome the message size limitation, because the compression happens after the message is written. When Kinesis Data Firehose delivers a previously compressed message to Amazon S3, it is written as an object without a file extension. For example, if a message is compressed with gzip before it is written to Kinesis Data Firehose, it is delivered to Amazon S3 without the .gz extension. This is problematic if you are using Apache Spark for downstream processing because a “.gz” extension is required.

We will see how to overcome this issue by reading the files using the Amazon S3 API operations later in this blog.

Prerequisites and assumptions

To follow the steps outlined in this blog post, you need the following:

  • An AWS account that provides access to AWS services.
  • An AWS Identity and Access Management (IAM) user with an access key and secret access key to configure the AWS CLI.
  • The templates and code are intended to work in the US East (N. Virginia) Region only.

Additionally, be aware of the following:

  • We configure all services in the same VPC to simplify networking considerations.
  • Important: The AWS CloudFormation templates and the sample code that we provide use hardcoded user names and passwords and open security groups. These are for testing purposes only. They aren’t intended for production use without any modifications.

Implementing the solution

You can use this downloadable template for single-click deployment. This template is launched in the US East (N. Virginia) Region by default. Do not change to a different Region. The template is designed to work only in the US East (N. Virginia) Region. To launch directly through the console, choose the Launch Stack button.

This template takes the following parameters. Some of the parameters have default values, and you can’t edit these. These predefined names are hardcoded in the code. For some of the parameters, you must provide the values. The following table provides additional details.

For this parameter | Provide this
StackName | Provide the stack name.
ClientIP | The IP address range of the client that is allowed to connect to the cluster using SSH.
FirehoseDeliveryStreamName | The name of the Amazon Kinesis Data Firehose delivery stream. The default value is set to “AWSBlogs-LambdaToFireHose”.
InstanceType | The EC2 instance type.
KeyName | The name of an existing EC2 key pair to enable SSH access.
KinesisStreamName | The name of the Amazon Kinesis stream. The default value is set to “AWS-Blog-BaseKinesisStream”.
Region | The AWS Region. By default it is us-east-1, US East (N. Virginia). Do not change this, because the scripts are developed to work in this Region only.
EMRClusterName | A name for the EMR cluster.
S3BucketName | The name of the bucket that is created in your account. Provide a unique name for this bucket. This bucket is used for storing the messages and the output from the Spark code.

After you specify the template details, choose Next. On the options page, choose Next again. On the Review page, select the check boxes for I acknowledge that AWS CloudFormation might create IAM resources with custom names and I acknowledge that AWS CloudFormation might require the following capability: CAPABILITY_AUTO_EXPAND. Then choose Create.

If you use this one-step solution, you can skip to Step 7: Generate test dataset and load into Kinesis Data Streams.

To create each component individually, use the following steps.

1. Use the AWS CloudFormation template to configure Amazon VPC and create an Amazon S3 bucket

In this step, we set up a VPC, public subnet, internet gateway, route table, and a security group. The security group has two inbound access rules. The first inbound rule allows access to TCP port 22 (SSH) from the provided client IP CIDR range, and the second inbound rule allows access to any TCP port from any host within the same security group. We use this VPC and subnet for all other services that are created in the next steps. In addition to these resources, we also create a standard Amazon S3 bucket with the provided bucket name to store the incoming data and processed data. You can use this downloadable AWS CloudFormation template to set up the previous components. To launch directly through the console, choose Launch Stack.

This template takes the following parameters. The following table provides additional details.

For this parameter | Do this
StackName | Provide the stack name.
S3BucketName | Provide a unique Amazon S3 bucket name. This bucket is created in your account.
ClientIp | Provide a CIDR IP address range that is added to the inbound rule of the security group. You can get your current IP address from the “checkip.amazon.com” web URL.

After you specify the template details, choose Next. On the Review page, choose Create.

When the stack launch is complete, it should return outputs similar to the following.

Key | Value
StackName | Name
VPCID | Vpc-xxxxxxx
SubnetID | subnet-xxxxxxxx
SecurityGroup | sg-xxxxxxxxxx
S3BucketDomain | <S3_BUCKET_NAME>.s3.amazonaws.com
S3BucketARN | arn:aws:s3:::<S3_BUCKET_NAME>

Make a note of the output, because you use this information in the next step. You can view the stack outputs on the AWS Management Console or by using the following AWS CLI command:

$ aws cloudformation describe-stacks --stack-name <stack_name> --region us-east-1 --query 'Stacks[0].Outputs'

2.  Use the AWS CloudFormation template to create necessary IAM Roles

In this step, we set up two AWS IAM roles. One of the IAM roles will be used by an AWS Lambda function to allow access to Amazon S3 service, Amazon Kinesis Data Firehose, Amazon CloudWatch Logs, and Amazon EC2 instances.  The second IAM role is used by the Amazon Kinesis Data Firehose service to access Amazon S3 service. You can use this downloadable CloudFormation template to set up the previous components. To launch directly through the console, choose Launch Stack.

This template takes the following parameters. The following table provides additional details.

For this parameter | Do this
StackName | Provide the stack name.

After you specify the template details, choose Next. On the options page, choose Next again. On the Review page, select the check box for I acknowledge that AWS CloudFormation might create IAM resources with custom names. Choose Create.

When the stack launch is complete, it should return outputs similar to the following.

Key | Value
LambdaRoleArn | arn:aws:iam::<ACCOUNT_NUMBER>:role/small-files-lamdarole
FirehoseRoleArn | arn:aws:iam::<ACCOUNT_NUMBER>:role/small-files-firehoserole

Make a note of the output, because you use this information in the next step. You can view the stack outputs on the AWS Management Console or by using the following AWS CLI command:

$ aws cloudformation describe-stacks --stack-name <stack_name> --region us-east-1 --query 'Stacks[0].Outputs'

3. Use an AWS CloudFormation template to configure the Amazon Kinesis Data Firehose data stream

In this step, we set up Amazon Kinesis Data Firehose with Amazon S3 as the destination for the incoming messages. We select the Uncompressed option for the compression format, and buffering options of 128 MB in size and an interval of 300 seconds. You can use this downloadable AWS CloudFormation template to set up the previous components. To launch directly through the console, choose Launch Stack.

This template takes the following parameters. The following table provides additional details.

For this parameter | Do this
StackName | Provide the stack name.
FirehoseDeliveryStreamName | Provide the name of the Amazon Kinesis Data Firehose delivery stream. The default value is set to “AWSBlogs-LambdaToFirehose”.
Role | Provide the Kinesis Data Firehose IAM role ARN that was created as part of step 2.
S3BucketARN | Select the S3BucketARN. You can get this from the step 1 AWS CloudFormation output.

After you specify the template details, choose Next. On the options page, choose Next again. On the Review page, choose Create.

4. Use an AWS CloudFormation template to create a Kinesis data stream and a Lambda function

In this step, we set up a Kinesis data stream and an AWS Lambda function. The AWS Lambda function processes the incoming messages in the Kinesis data stream. An event source mapping is also created as part of this template. This adds a trigger to the AWS Lambda function for the Kinesis data stream source. For more information about creating event source mappings, see Creating an Event Source Mapping. This Kinesis data stream is created with 10 shards, and the Lambda function is created with a Java 8 runtime. We allocate a memory size of 1920 MB and a timeout of 300 seconds. You can use this downloadable AWS CloudFormation template to set up the previous components. To launch directly through the console, choose Launch Stack.

This template takes the following parameters. The following table provides details.

For this parameter | Do this
StackName | Provide the stack name.
KinesisStreamName | Provide the name of the Amazon Kinesis stream. The default value is set to “AWS-Blog-BaseKinesisStream”.
Role | Provide the IAM role created for the Lambda function as part of the second AWS CloudFormation template. Get the value from the output of the second AWS CloudFormation template.
S3Bucket | Provide the existing Amazon S3 bucket name that was created using the first AWS CloudFormation template. Do not use the domain name. Provide the bucket name only.
Region | Select the AWS Region. By default it is us-east-1, US East (N. Virginia).

After you specify the template details, choose Next. On the options page, choose Next again. On the Review page, choose Create.

5. Use an AWS CloudFormation template to configure the Amazon EMR cluster

In this step, we set up an Amazon EMR 5.16.0 cluster with the Spark, Ganglia, and Hive applications. We create this cluster with one master node and two core nodes, using the r4.xlarge instance type. The template uses the AWS Glue Data Catalog as the Amazon EMR Hive metastore. This Amazon EMR cluster is used to process the messages in the Amazon S3 bucket that are created by the Amazon Kinesis Data Firehose delivery stream. You can use this downloadable AWS CloudFormation template to set up the previous components. To launch directly through the console, choose Launch Stack.

This template takes the following parameters. The following table provides additional details.

For this parameter | Do this
EMRClusterName | Provide a name for the EMR cluster.
ClusterSecurityGroup | Select the security group ID that was created as part of the first AWS CloudFormation template.
ClusterSubnetID | Select the subnet ID that was created as part of the first AWS CloudFormation template.
AllowedCIDR | Provide the IP address range of the client that is allowed to connect to the cluster.
KeyName | Provide the name of an existing EC2 key pair to access the Amazon EMR cluster.

After you specify the template details, choose Next. On the options page, choose Next again. On the Review page, choose Create.

When the stack launch is complete, it should return outputs similar to the following.

Key | Value
EMRClusterMaster | ssh [email protected] -i <KEY_PAIR_NAME>.pem

Make a note of the output, because you use this information in the next step. You can view the stack outputs on the AWS Management Console or by using the following AWS CLI command:

$ aws cloudformation describe-stacks --stack-name <stack_name> --region us-east-1 --query 'Stacks[0].Outputs'

6. Use an AWS CloudFormation template to create an Amazon EC2 Instance to generate test data

In this step, we set up an Amazon EC2 instance and install OpenJDK 1.8. The AWS CloudFormation script that creates this EC2 instance runs two additional steps. First, it downloads and installs OpenJDK 1.8. Second, it downloads a Java program jar file onto the EC2 instance’s ec2-user home directory. We use this Java program to generate test data messages with an approximate size of ~900 KB. We then send them to the Kinesis data stream that was created as part of the previous steps. The Java jar file name is “sample-kinesis-producer-1.0-SNAPSHOT-jar-with-dependencies.jar”.

You can use this downloadable AWS CloudFormation template to set up the previous components. To launch directly through the console, choose Launch Stack.

This template takes the following parameters. The following table provides additional details.

For this parameter | Do this
EC2SecurityGroup | Select the security group ID that was created from the first AWS CloudFormation template.
EC2Subnet | Select the subnet that was created from the first AWS CloudFormation template.
InstanceType | Select the provided instance type. By default, it selects the r4.4xlarge instance.
KeyName | The name of an existing EC2 key pair to enable SSH access to the EC2 instance.

After you specify the template details, choose Next. On the options page, choose Next again. On the Review page, select the I acknowledge that AWS CloudFormation might create IAM resources with custom names check box, and then choose Create.

When the stack launch is complete, it should return outputs similar to the following.

Key | Value
EC2Instance | ssh [email protected]<Public-IP> -i <KEY_PAIR_NAME>.pem

Make a note of the output, because you use this information in the next step. You can view the stack outputs on the AWS Management Console or by using the following AWS CLI command:

$ aws cloudformation describe-stacks --stack-name <stack_name> --region us-east-1 --query 'Stacks[0].Outputs'

7. Generate the test dataset and load into Kinesis Data Streams

After all of the previous AWS CloudFormation stacks are created successfully, log in to the EC2 instance that was created as part of step 6. Use the “ssh” command as shown in the CloudFormation stack template output. This template copies the “sample-kinesis-producer-1.0-SNAPSHOT-jar-with-dependencies.jar” file, which we use to generate the test data and send it to Amazon Kinesis Data Streams. You can find the code corresponding to this sample Kinesis producer in this Git repository.

Make sure that your EC2 instance’s security group allows SSH port 22 (inbound) from your IP address. If not, update your security group inbound access.

Run the following commands to generate some test data.

$ cd;

$ ls -ltra sample-kinesis-producer-1.0-SNAPSHOT-jar-with-dependencies.jar
-rwxr-xr-x 1 ec2-user ec2-user 27536802 Oct 29 21:19 sample-kinesis-producer-1.0-SNAPSHOT-jar-with-dependencies.jar

$ java -Xms1024m -Xmx25600m -XX:+UseG1GC -cp sample-kinesis-producer-1.0-SNAPSHOT-jar-with-dependencies.jar com.optimize.downstream.entry.Main 10000

 

This Java program uses the PutRecords API method, which allows many records to be sent in a single HTTP request. For more information, see this AWS blog post. After you run the Java program, you see output like the following, showing that messages are being sent to the Kinesis data stream.

“Starting producer and consumer.....
Inserting a message into blocking queue before sending to Kinesis Firehose and Message number is : 0
Producer Thread # 9 is going to sleep mode for 500 ms.
Inserting a message into blocking queue before sending to Kinesis Firehose and Message number is : 1
Inserting a message into blocking queue before sending to Kinesis Firehose and Message number is : 2
Inserting a message into blocking queue before sending to Kinesis Firehose and Message number is : 3
::
::
Record sent to Kinesis Stream. Record size is ::: 5042850 KB
Sending a record to Kinesis Stream with 5 messages grouped together.
Record sent to Kinesis Stream. Record size is ::: 5042726 KB
Sending a record to Kinesis Stream with 5 messages grouped together.
Record sent to Kinesis Stream. Record size is ::: 5042729 KB”

When running the sample Kinesis producer jar, notice that the number of messages is 10,000. This program generates the test data messages and is not a replacement for your load testing tool. This is created to demonstrate the use case presented in this post.

After all of the messages are generated and sent to Amazon Kinesis Data Streams, the program exits gracefully.

The sample JSON input message format is shown as follows:

   "processedDate":"2018/10/30 19:05:19",
   "currentDate":"2018/10/30 19:05:07",
   "hashDeviceId":"0c2745e4-c2d6-4d43-8339-9c2401e80e92",
   "deviceId":"94581b5f-a117-484a-8e3c-4fcc2dbd53b7",
   "accelerometerSensorList":[  
      {  
         "accelerometer_Y":8,
         "gravitySensor_X":5,
         "accelerometer_X":9,
         "gravitySensor_Z":4,
         "accelerometer_Z":1,
         "gravitySensor_Y":5,
         "linearAccelerationSensor_Z":3,
         "linearAccelerationSensor_Y":9,
         "linearAccelerationSensor_X":9
      },
      {  
         "accelerometer_Y":1,
         "gravitySensor_X":3,
         "accelerometer_X":5,
         "gravitySensor_Z":5,
         "accelerometer_Z":7,
         "gravitySensor_Y":9,
         "linearAccelerationSensor_Z":6,
         "linearAccelerationSensor_Y":5,
         "linearAccelerationSensor_X":3
      },
 {
   …
 },
 {
   …
 },
 :
 :
   ],
   "tempSensorList":[  
      {  
         "kelvin":585.4928040286752,
         "celsius":43.329574923775425,
         "fahrenheit":50.13864584530086
      },
      {  
         "kelvin":349.95625855125814,
         "celsius":95.68423052685313,
         "fahrenheit":7.854854574219985
      },
 {
   …
 },
 {
   …
 },
 :
 :
 
   ],
   "illuminancesSensorList":[  
      {  
         "illuminance":44.65135784368194
      },
      {  
         "illuminance":98.15404017082403
      },
 {
   …
 },
 {
   …
 },
 :
 :
   ],
   "gpsSensorList":[  
      {  
         "altitude":4.38273213294682,
         "heading":7.416314616289915,
         "latitude":5.759723677991661,
         "longitude":1.4732885894731842
      },
      {  
         "altitude":9.816473807569487,
         "heading":5.118919157684835,
         "latitude":3.581361614110458,
         "longitude":1.3699272610616127
      },
 {
   …
 },
 {
   …
 },
 :
 :
   ]
}

 

Log in to the Kinesis Data Streams console, then choose the Kinesis data stream that was created as part of step 4. Choose the Monitor tab to see the graphs. Run the data generation utility for at least 15 minutes to generate enough data.

8. Processing Kinesis Data Streams messages using AWS Lambda

As part of the previously described setup, we also use an AWS Lambda function (name: LambdaForProcessingKinesisRecords) to process the messages from the Kinesis data stream. This Lambda function reads each message’s content and appends “additional data.” This demonstrates that the incoming message from the Kinesis data stream is read and appended with some additional information to make the message size more than 1 MB. Several customers have a use case like this to enrich the incoming messages by adding additional information. After the AWS Lambda function appends additional data to incoming messages, it sends them to Amazon Kinesis Data Firehose. Because Kinesis Data Firehose accepts only messages that are less than 1 MB, we must compress the messages before sending them to it. In the Lambda function, we compress the message using gzip compression before sending it to Kinesis Data Firehose. In addition to compressing each message, we also append a newline character (“\n”) to each message after compressing it, to separate the messages.

We set the buffer size to 128 MB and the buffer interval to 900 seconds when creating the Kinesis Data Firehose delivery stream. This helps merge the incoming compressed messages into larger messages that are then delivered to the provided Amazon S3 bucket.

The AWS Lambda function appends the following content to the original message in Kinesis Data Streams after reading it.

"testAdditonalDataList": [
  {
    "dimesnion_X": 9,
    "dimesnion_Y": 2,
    "dimesnion_Z": 2
  },
  {
    "dimesnion_X": 3,
    "dimesnion_Y": 10,
    "dimesnion_Z": 5
  }
  {
    …
  },
  {
    …
  },
  :
  :
]

If we do not compress the message before sending it to Kinesis Data Firehose, it throws an error message in Amazon CloudWatch Logs.

Here is the code snippet where we are compressing the message in the AWS Lambda function. The complete code can be found in this Git repository.

private String sendToFireHose(String mergedJsonString)
{
    PutRecordResult res = null;
    try {
        //To Firehose -
        System.out.println("MESSAGE SIZE BEFORE COMPRESSION IS : " + mergedJsonString.toString().getBytes(charset).length);
        System.out.println("MESSAGE SIZE AFTER GZIP COMPRESSION IS : " + compressMessage(mergedJsonString.toString().getBytes(charset)).length);
        PutRecordRequest req = new PutRecordRequest()
                .withDeliveryStreamName(firehoseStreamName);

        // Without compression - Send to Firehose
        //Record record = new Record().withData(ByteBuffer.wrap((mergedJsonString.toString() + "\r\n").getBytes()));

        // With compression - send to Firehose
        Record record = new Record().withData(ByteBuffer.wrap(compressMessage((mergedJsonString.toString() + "\r\n").getBytes())));
        req.setRecord(record);
        res = kinesisFirehoseClient.putRecord(req);
    }
    catch (IOException ie) {
        ie.printStackTrace();
    }
    return res.getRecordId();
}
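
The compressMessage helper referenced above lives in the same repository; a minimal gzip implementation along these lines would look like the following sketch (assuming java.io.ByteArrayOutputStream, java.io.IOException, and java.util.zip.GZIPOutputStream are imported).

private byte[] compressMessage(byte[] input) throws IOException
{
    // Gzip the payload so the record stays under the Kinesis Data Firehose 1 MB limit
    ByteArrayOutputStream byteStream = new ByteArrayOutputStream();
    try (GZIPOutputStream gzipStream = new GZIPOutputStream(byteStream)) {
        gzipStream.write(input);
    }
    return byteStream.toByteArray();
}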

You can check the provided bucket to see if the messages are flowing into the bucket. The Amazon S3 bucket should show something similar to the following example:

You see that the files generated by Kinesis Data Firehose do not have any extension. By default, Kinesis Data Firehose does not add an extension to the files that it generates in the Amazon S3 bucket unless you select a compression option. In our use case, because the size of the uncompressed input message is greater than 1 MB, we compress it before sending it to Kinesis Data Firehose. Because the message is already compressed, we do not select any compression option in Kinesis Data Firehose, as doing so would double-compress the message and the downstream Spark application could not process it.

9. Reading and converting the data into parquet format using Apache Spark program with Amazon EMR

As we noted from the previous screenshot, Kinesis Data Firehose by default does not generate file extensions for the files that are written into the Amazon S3 bucket. This creates a problem when reading the files using Apache Spark. By default, if a file is compressed, Apache Spark checks for a valid file name extension. In this case, for gzip compression, it looks for <filename>.gz to successfully read it.

To overcome this issue, we can use Amazon S3 API operations, particularly the AmazonS3Client class, to list all the Amazon S3 keys and use Spark’s parallelize method to read the contents of the files. After reading the file content, we can uncompress it using the GZipInputStream class. You can find the code snippet below. The complete code can be found in the Git repository.

val allLinesRDD = spark.sparkContext.parallelize(s3ObjectKeys).flatMap
{ key => Source.fromInputStream
  (
   new GZipInputStream(s3Client.getObject(bucketName, key).getObjectContent:   InputStream)
  ).getLines 
}

var finalDF = spark.read.json(allLinesRDD).toDF()

After the Amazon EMR cluster creation completes successfully, log in to the Amazon EMR master node using the following command. You can get the “ssh” login command from the outputs of the AWS CloudFormation stack created in step 5 (the EMRClusterMaster key).

  • ssh [email protected] -i <KEYPAIR_NAME>.pem
  • Make sure that port 22 is open so that you can connect to the Amazon EMR master node.

Run the Spark program using the following Spark submit command.

spark-submit --class com.optimize.downstream.process.ProcessFilesFromS3AndConvertToParquet --master yarn --deploy-mode client s3://aws-bigdata-blog/artifacts/aws-blog-optimize-downstream-data-processing/appjars/spark-process-1.0-SNAPSHOT-jar-with-dependencies.jar <S3_BUCKET_NAME> fromfirehose/<YEAR>/ output-emr-parquet/

Change the S3_BUCKET_NAME and YEAR values in the previous Spark command.

Argument # | Property | Value
1 | --class | com.optimize.downstream.process.ProcessFilesFromS3AndConvertToParquet
2 | --master | yarn
3 | --deploy-mode | client
4 | (application jar) | s3://aws-bigdata-blog/artifacts/aws-blog-avoid-small-files/appjars/spark-process-1.0-SNAPSHOT-jar-with-dependencies.jar
5 | S3_BUCKET_NAME | The Amazon S3 bucket name that was created as part of the AWS CloudFormation template. The source files are created in this bucket.
6 | <INPUT S3 LOCATION> | “fromfirehose/<YYYY>/”. The input files are created in this Amazon S3 key location under the bucket that was created. “YYYY” represents the current year. For example, “fromfirehose/2018/”.
7 | <OUTPUT S3 LOCATION> | Provide an output directory name that will be created under the above provided Amazon S3 bucket. For example, “output-emr-parquet/”.

 

When the program finishes running, you can check the Amazon S3 output location to see the files that are written in parquet format.

Cleaning up after the migration

After completing and testing this solution, clean up the resources by stopping your tasks and deleting the AWS CloudFormation stacks. Stack deletion fails if you have any files in the created Amazon S3 bucket, so make sure that you empty the Amazon S3 bucket that was created before you delete the AWS CloudFormation stacks.

Conclusion

In this post, we described how to avoid creating many small files in Amazon S3 by sending the incoming messages through Amazon Kinesis Data Firehose. We also went through the process of reading and storing the data in parquet format using Apache Spark on an Amazon EMR cluster.

 


About the Author

Srikanth Kodali is a Sr. IoT Data Analytics Architect at Amazon Web Services. He works with AWS customers to provide guidance and technical assistance on building IoT data and analytics solutions, helping them improve the value of their solutions when using AWS.

 

 

Updates to Serverless Architectural Patterns and Best Practices

Post Syndicated from Drew Dennis original https://aws.amazon.com/blogs/architecture/updates-to-serverless-architectural-patterns-and-best-practices/

As we sail past the halfway point between re:Invent 2018 and re:Invent 2019, I’d like to revisit some of the recent serverless announcements we’ve made. These are all complementary to the patterns discussed in the re:Invent architecture track’s Serverless Architectural Patterns and Best Practices session.

AWS Event Fork Pipelines

AWS Event Fork Pipelines was announced in March 2019. Many customers use asynchronous event-driven processing in their serverless applications to decouple application components and address high concurrency needs. And in doing so, they often find themselves needing to back up, search, analyze, or replay these asynchronous events. That is exactly what AWS Event Fork Pipelines aims to achieve. You can plug them into a new or existing SNS topic used by your application and immediately address retention and compliance needs, gain new business insights, or even improve your application’s disaster recovery abilities.

AWS Event Fork Pipelines is a suite of three applications. The first application addresses event storage and backup needs by writing all events to an S3 bucket where they can be queried with services like Amazon Athena. The second is a search and analytics pipeline that delivers events to a new or existing Amazon ES domain, enabling search and analysis of your events. Finally, the third application is an event replay pipeline that can be used to reprocess messages should a downstream failure occur in your application. AWS Event Fork Pipelines is delivered as AWS Serverless Application Model (SAM) templates and is available in the AWS Serverless Application Repository (SAR). Check out our example e-commerce application on GitHub.

Amazon API Gateway Serverless Developer Portal

If you publish APIs for developers allowing them to build new applications and capabilities with your data, you understand the need for a developer portal. Also, in March 2019, we announced some significant upgrades to the API Gateway Serverless Developer Portal. The portal’s front end is written in React and is designed to be fully customizable.

The API Gateway Serverless Developer Portal is also available in GitHub and the AWS SAR. As you can see from the architecture diagram below, it is integrated with Amazon Cognito User Pools to allow developers to sign up, receive an API key, and register for one or more of your APIs. You can now also enable administrative scenarios from your developer portal by logging in as a user belonging to the portal’s Admin group, which is created when the portal is initially deployed to your account. For example, you can control which APIs appear in a customer’s developer portal, enable SDK downloads, solicit developer feedback, and even publish updates for APIs that have been recently revised.

AWS Lambda with Amazon Application Load Balancer (ALB)

Serverless microservices have been built by our customers for quite a while, with AWS Lambda and Amazon API Gateway. At re:Invent 2018, during Dr. Werner Vogels’ keynote, a new approach to serverless microservices was announced: Lambda functions as ALB targets.

ALB’s support for Lambda targets gives customers the ability to deploy serverless code behind an ALB, alongside servers, containers, and IP addresses. With this feature, ALB path and host-based routing can be used to direct incoming requests to Lambda functions. Also, ALB can now provide an entry point for legacy applications to take on new serverless functionality, and enable migration scenarios from monolithic legacy server or container-based applications.

Use cases for Lambda targets for ALB include adding new functionality to an existing application that already sits behind an ALB. This could be request monitoring by sending HTTP headers to Elasticsearch clusters or implementing controls that manage cookies. Check out our demo of this new feature. For additional details, take a look at the feature’s documentation.
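
To illustrate the request/response shape involved, here is a minimal sketch of a Python handler for a Lambda function registered as an ALB target. It simply reads a request header and returns a JSON body; the handler name and header choice are illustrative only.

import json

def lambda_handler(event, context):
    # ALB passes the HTTP request details (path, headers, query string) in the event.
    user_agent = event.get('headers', {}).get('user-agent', 'unknown')

    # Responses returned to ALB must include statusCode, statusDescription,
    # headers, body, and isBase64Encoded.
    return {
        'statusCode': 200,
        'statusDescription': '200 OK',
        'isBase64Encoded': False,
        'headers': {'Content-Type': 'application/json'},
        'body': json.dumps({'message': 'Hello from Lambda behind an ALB', 'userAgent': user_agent})
    }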

Security Overview of AWS Lambda Whitepaper

Finally, I’d be remiss if I didn’t point out the great work many of my colleagues have done in releasing the Security Overview of AWS Lambda Whitepaper. It is a succinct and enlightening read for anyone wishing to better understand the Lambda runtime environment, function isolation, or data paths taken for payloads sent to the Lambda service during synchronous and asynchronous invocations. It also has some great insight into compliance, auditing, monitoring, and configuration management of your Lambda functions. It’s a must-read for anyone wishing to better understand the overall security of AWS serverless applications.

I look forward to seeing everyone at re:Invent 2019 for more exciting serverless announcements!

About the author

Drew DennisDrew Dennis is a Global Solutions Architect with AWS based in Dallas, TX. He enjoys all things Serverless and has delivered the Architecture Track’s Serverless Patterns and Best Practices session at re:Invent the past three years. Today, he helps automotive companies with autonomous driving research on AWS, connected car use cases, and electrification.

How to securely provide database credentials to Lambda functions by using AWS Secrets Manager

Post Syndicated from Ramesh Adabala original https://aws.amazon.com/blogs/security/how-to-securely-provide-database-credentials-to-lambda-functions-by-using-aws-secrets-manager/

As a solutions architect at AWS, I often assist customers in architecting and deploying business applications using APIs and microservices that rely on serverless services such as AWS Lambda and database services such as Amazon Relational Database Service (Amazon RDS). Customers can take advantage of these fully managed AWS services to unburden their teams from infrastructure operations and other undifferentiated heavy lifting, such as patching, software maintenance, and capacity planning.

In this blog post, I’ll show you how to use AWS Secrets Manager to secure your database credentials and send them to Lambda functions that will use them to connect and query the backend database service Amazon RDS—without hardcoding the secrets in code or passing them through environment variables. This approach will help you secure last-mile secrets and protect your backend databases. Long-lived credentials need to be managed and rotated regularly to keep access to critical systems secure, so it’s a security best practice to reset your passwords periodically. Manually changing the passwords would be cumbersome, but AWS Secrets Manager helps by managing and rotating the RDS database passwords.

Solution overview

This is sample code: you’ll use an AWS CloudFormation template to deploy the following components to test the API endpoint from your browser:

  • An RDS MySQL database instance on a db.t2.micro instance
  • Two Lambda functions with necessary IAM roles and IAM policies, including access to AWS Secrets Manager:
    • LambdaRDSCFNInit: This Lambda function will execute immediately after the CloudFormation stack creation. It will create an “Employees” table in the database, where it will insert three sample records.
    • LambdaRDSTest: This function will query the Employees table and return the record count in an HTML string format
  • RESTful API with “GET” method on AWS API Gateway

Here’s the high level setup of the AWS services that will be created from the CloudFormation stack deployment:
 

Figure 1: Solution architecture

  1. Clients call the RESTful API hosted on AWS API Gateway
  2. The API Gateway executes the Lambda function
  3. The Lambda function retrieves the database secrets using the Secrets Manager API
  4. The Lambda function connects to the RDS database using database secrets from Secrets Manager and returns the query results

You can access the source code for the sample used in this post here: https://github.com/awslabs/automating-governance-sample/tree/master/AWS-SecretsManager-Lambda-RDS-blog.

Deploying the sample solution

Set up the sample deployment by selecting the Launch Stack button below. If you haven’t logged into your AWS account, follow the prompts to log in.

By default, the stack will be deployed in the us-east-1 region. If you want to deploy this stack in any other region, download the code from the above GitHub link, place the Lambda code zip file in a region-specific S3 bucket and make the necessary changes in the CloudFormation template to point to the right S3 bucket. (Please refer to the AWS CloudFormation User Guide for additional details on how to create stacks using the AWS CloudFormation console.)
 
Select this image to open a link that starts building the CloudFormation stack

Next, follow these steps to execute the stack:

  1. Leave the default location for the template and select Next.
     
    Figure 2: Keep the default location for the template

  2. On the Specify Details page, you’ll see the parameters pre-populated. These parameters include the name of the database and the database user name. Select Next on this screen.
     
    Figure 3: Parameters on the “Specify Details” page

  3. On the Options screen, select the Next button.
  4. On the Review screen, select both check boxes, then select the Create Change Set button:
     
    Figure 4: Select the check boxes and “Create Change Set”

  5. After the change set creation is completed, choose the Execute button to launch the stack.
  6. Stack creation will take between 10 and 15 minutes. After the stack is created successfully, select the Outputs tab of the stack, then select the link.
     
    Figure 5: Select the link on the “Outputs” tab

    This action will trigger the code in the Lambda function, which will query the “Employees” table in the MySQL database and return the results count back to the API. You’ll see the following screen as output from the RESTful API endpoint:
     

    Figure 6: Output from the RESTful API endpoint

At this point, you’ve successfully deployed and tested the API endpoint with a backend Lambda function and RDS resources. The Lambda function is able to successfully query the MySQL RDS database and is able to return the results through the API endpoint.

What’s happening in the background?

The CloudFormation stack deployed a MySQL RDS database with a randomly generated password using a secret resource. Now that the secret resource with the randomly generated password has been created, the CloudFormation stack uses a dynamic reference to resolve the value of the password from Secrets Manager in order to create the RDS instance resource. Dynamic references provide a compact, powerful way for you to specify external values that are stored and managed in other AWS services, such as Secrets Manager. The dynamic reference guarantees that CloudFormation will not log or persist the resolved value, keeping the database password safe. The CloudFormation template also creates a Lambda function to automatically rotate the password for the MySQL RDS database every 30 days. Native credential rotation can improve security posture, as it eliminates the need to manually handle database passwords through the lifecycle process.

Below is the CloudFormation code that covers these details:


#This is a Secret resource with a randomly generated password in its SecretString JSON.
MyRDSInstanceRotationSecret:
  Type: AWS::SecretsManager::Secret
  Properties:
    Description: 'This is my rds instance secret'
    GenerateSecretString:
      SecretStringTemplate: !Sub '{"username": "${RDSUserName}"}'
      GenerateStringKey: 'password'
      PasswordLength: 16
      ExcludeCharacters: '"@/\'
    Tags:
      -
        Key: AppName
        Value: MyApp

#This is an RDS instance resource. Its master username and password use dynamic references to resolve values from
#Secrets Manager. The dynamic reference guarantees that CloudFormation will not log or persist the resolved value.
#We use a Ref to the Secret resource's logical ID in order to construct the dynamic reference, since the Secret name is
#generated by CloudFormation.
MyDBInstance2:
  Type: AWS::RDS::DBInstance
  Properties:
    AllocatedStorage: 20
    DBInstanceClass: db.t2.micro
    DBName: !Ref RDSDBName
    Engine: mysql
    MasterUsername: !Ref RDSUserName
    MasterUserPassword: !Join ['', ['{{resolve:secretsmanager:', !Ref MyRDSInstanceRotationSecret, ':SecretString:password}}' ]]
    MultiAZ: False
    PubliclyAccessible: False
    StorageType: gp2
    DBSubnetGroupName: !Ref myDBSubnetGroup
    VPCSecurityGroups:
      - !Ref RDSSecurityGroup
    BackupRetentionPeriod: 0
    DBInstanceIdentifier: 'rotation-instance'

#This is a SecretTargetAttachment resource which updates the referenced Secret resource with properties about
#the referenced RDS instance.
SecretRDSInstanceAttachment:
  Type: AWS::SecretsManager::SecretTargetAttachment
  Properties:
    SecretId: !Ref MyRDSInstanceRotationSecret
    TargetId: !Ref MyDBInstance2
    TargetType: AWS::RDS::DBInstance

#This is a RotationSchedule resource. It configures rotation of the password for the referenced secret using a rotation Lambda function.
#The first rotation happens at resource creation time, with subsequent rotations scheduled according to the rotation rules.
#We explicitly depend on the SecretTargetAttachment resource being created to ensure that the secret contains all the
#information necessary for rotation to succeed.
MySecretRotationSchedule:
  Type: AWS::SecretsManager::RotationSchedule
  DependsOn: SecretRDSInstanceAttachment
  Properties:
    SecretId: !Ref MyRDSInstanceRotationSecret
    RotationLambdaARN: !GetAtt MyRotationLambda.Arn
    RotationRules:
      AutomaticallyAfterDays: 30

#This is a Lambda Function resource that we use to rotate the secret.
#For details about rotation Lambda functions, see https://docs.aws.amazon.com/secretsmanager/latest/userguide/rotating-secrets.html
#The example below assumes that the Lambda code has been uploaded to an S3 bucket and that it rotates a MySQL database password.
MyRotationLambda:
  Type: AWS::Serverless::Function
  Properties:
    Runtime: python2.7
    Role: !GetAtt MyLambdaExecutionRole.Arn
    Handler: mysql_secret_rotation.lambda_handler
    Description: 'This is a lambda to rotate MySql user passwd'
    FunctionName: 'cfn-rotation-lambda'
    CodeUri: 's3://devsecopsblog/code.zip'
    Environment:
      Variables:
        SECRETS_MANAGER_ENDPOINT: !Sub 'https://secretsmanager.${AWS::Region}.amazonaws.com'

Verifying the solution

To be certain that everything is set up properly, you can look at the Lambda code that’s querying the database table by following the steps below:

  1. Go to the AWS Lambda service page
  2. From the list of Lambda functions, click on the function with the name scm2-LambdaRDSTest-…
  3. You can see the environment variables at the bottom of the Lambda Configuration details screen. Notice that there should be no database password supplied as part of these environment variables:
     
    Figure 7: Environment variables

    
        import sys
        import pymysql
        import boto3
        import botocore
        import json
        import random
        import time
        import os
        import base64
        from botocore.exceptions import ClientError
        
        # rds settings
        rds_host = os.environ['RDS_HOST']
        name = os.environ['RDS_USERNAME']
        db_name = os.environ['RDS_DB_NAME']
        helperFunctionARN = os.environ['HELPER_FUNCTION_ARN']
        
        secret_name = os.environ['SECRET_NAME']
        my_session = boto3.session.Session()
        region_name = my_session.region_name
        conn = None
        
        # Get the service resource.
        lambdaClient = boto3.client('lambda')
        
        
        def invokeConnCountManager(incrementCounter):
            # return True
            response = lambdaClient.invoke(
                FunctionName=helperFunctionARN,
                InvocationType='RequestResponse',
                Payload='{"incrementCounter":' + str.lower(str(incrementCounter)) + ',"RDBMSName": "Prod_MySQL"}'
            )
            retVal = response['Payload']
            retVal1 = retVal.read()
            return retVal1
        
        
        def openConnection():
            print("In Open connection")
            global conn
            password = "None"
            # Create a Secrets Manager client
            session = boto3.session.Session()
            client = session.client(
                service_name='secretsmanager',
                region_name=region_name
            )
            
            # In this sample we only handle the specific exceptions for the 'GetSecretValue' API.
            # See https://docs.aws.amazon.com/secretsmanager/latest/apireference/API_GetSecretValue.html
            # We rethrow the exception by default.
            
            try:
                get_secret_value_response = client.get_secret_value(
                    SecretId=secret_name
                )
                print(get_secret_value_response)
            except ClientError as e:
                print(e)
                if e.response['Error']['Code'] == 'DecryptionFailureException':
                    # Secrets Manager can't decrypt the protected secret text using the provided KMS key.
                    # Deal with the exception here, and/or rethrow at your discretion.
                    raise e
                elif e.response['Error']['Code'] == 'InternalServiceErrorException':
                    # An error occurred on the server side.
                    # Deal with the exception here, and/or rethrow at your discretion.
                    raise e
                elif e.response['Error']['Code'] == 'InvalidParameterException':
                    # You provided an invalid value for a parameter.
                    # Deal with the exception here, and/or rethrow at your discretion.
                    raise e
                elif e.response['Error']['Code'] == 'InvalidRequestException':
                    # You provided a parameter value that is not valid for the current state of the resource.
                    # Deal with the exception here, and/or rethrow at your discretion.
                    raise e
                elif e.response['Error']['Code'] == 'ResourceNotFoundException':
                    # We can't find the resource that you asked for.
                    # Deal with the exception here, and/or rethrow at your discretion.
                    raise e
            else:
                # Decrypts secret using the associated KMS CMK.
                # Depending on whether the secret is a string or binary, one of these fields will be populated.
                if 'SecretString' in get_secret_value_response:
                    secret = get_secret_value_response['SecretString']
                    j = json.loads(secret)
                    password = j['password']
                else:
                    # The secret is stored as binary; decode it and read the password from the JSON payload.
                    decoded_binary_secret = base64.b64decode(get_secret_value_response['SecretBinary'])
                    password = json.loads(decoded_binary_secret)['password']
            
            try:
                if(conn is None):
                    conn = pymysql.connect(
                        rds_host, user=name, passwd=password, db=db_name, connect_timeout=5)
                elif (not conn.open):
                    # print(conn.open)
                    conn = pymysql.connect(
                        rds_host, user=name, passwd=password, db=db_name, connect_timeout=5)
        
            except Exception as e:
                print (e)
                print("ERROR: Unexpected error: Could not connect to MySql instance.")
                raise e
        
        
        def lambda_handler(event, context):
            if invokeConnCountManager(True) == "false":
                print ("Not enough Connections available.")
                return False
        
            item_count = 0
            try:
                openConnection()
                # Introducing artificial random delay to mimic actual DB query time. Remove this code for actual use.
                time.sleep(random.randint(1, 3))
                with conn.cursor() as cur:
                    cur.execute("select * from Employees")
                    for row in cur:
                        item_count += 1
                        print(row)
                        # print(row)
            except Exception as e:
                # Error while opening connection or processing
                print(e)
            finally:
                print("Closing Connection")
                if(conn is not None and conn.open):
                    conn.close()
                invokeConnCountManager(False)
        
            content =  "Selected %d items from RDS MySQL table" % (item_count)
            response = {
                "statusCode": 200,
                "body": content,
                "headers": {
                    'Content-Type': 'text/html',
                }
            }
            return response        
        

In the AWS Secrets Manager console, you can also look at the new secret that was created from CloudFormation execution by following the below steps:

  1. Go to the AWS Secrets Manager service page with appropriate IAM permissions
  2. From the list of secrets, click on the latest secret with the name MyRDSInstanceRotationSecret-…
  3. You will see the secret details and rotation information on the screen, as shown in the following screenshot:
     
    Figure 8: Secret details and rotation information

Conclusion

In this post, I showed you how to manage database secrets using AWS Secrets Manager and how to leverage Secrets Manager’s API to retrieve the secrets into a Lambda execution environment to improve database security and protect sensitive data. Secrets Manager helps you protect access to your applications, services, and IT resources without the upfront investment and ongoing maintenance costs of operating your own secrets management infrastructure. To get started, visit the Secrets Manager console. To learn more, visit Secrets Manager documentation.

If you have feedback about this post, add it to the Comments section below. If you have questions about implementing the example used in this post, open a thread on the Secrets Manager Forum.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Ramesh Adabala

Ramesh is a Solution Architect on the Southeast Enterprise Solution Architecture team at AWS.

Getting started with serverless

Post Syndicated from Rachel Richardson original https://aws.amazon.com/blogs/compute/getting-started-with-serverless/

This post is contributed by Maureen Lonergan, Director, AWS Training and Certification

We consistently hear from customers that they’re interested in building serverless applications to take advantage of the increased agility and decreased total cost of ownership (TCO) that serverless delivers. But we also know that serverless may be intimidating for those who are more accustomed to using instances or containers for compute.

Since we launched AWS Lambda in 2014, our serverless portfolio has expanded beyond event-driven computing. We now have serverless databases, integration, and orchestration tools. This enables you to build end-to-end serverless applications—but it also means that you must learn how to build using a new serverless operational model.

For this reason, AWS Training and Certification is pleased to offer a new course through Coursera entitled AWS Fundamentals: Building Serverless Applications.

This scenario-based course, developed by the experts at AWS, will:

  • Introduce the AWS serverless framework and architecture in the context of a real business problem.
  • Provide the foundational knowledge to become more proficient in choosing and creating serverless solutions using AWS.
  • Provide demonstrations of the AWS services needed for deploying serverless solutions.
  • Help you develop skills in building and deploying serverless solutions using real-world examples of a serverless website and chatbot.

The syllabus allocates more than nine hours of video content and reading material over four weekly lessons. Each lesson has an estimated 2–3 hours per week of study time (though you can set your own pace and deadlines), with suggested exercises in the AWS Management Console. There is an end-of-course assessment that covers all the learning objectives and content.

The course is on-demand and 100% digital; you can even audit it for free. A completion certificate and access to the graded assessments are available for $49.

What can you expect?

In this course you will learn to use the AWS Serverless portfolio to create a chatbot that answers the question, “Can I let my cat outside?” You will build an application using every one of the concepts and services discussed in the class, including:

At the end of the class, you can audibly interact with the application to ask that essential question, “Can my cat go out in Denver?” (See the conversation in the following screenshot.)

Serverless Coursera training app

Across the four weeks of the course, you learn:

  1. What serverless computing is and how to create a chatbot with Amazon Lex using an S3 bucket to host a web application.
  2. How to build a highly scalable API with API Gateway and use Amazon CloudFront as a content delivery network (CDN) for your site and API.
  3. How to use Lambda to build serverless functions that write data to DynamoDB.
  4. How to apply lessons from the previous weeks to extend and add functionality to the chatbot.

Serverless Coursera training

AWS Fundamentals: Building Serverless Applications is now available. This course complements other standalone digital courses by AWS Training and Certification. They include the highly recommended Introduction to Serverless Development, as well as the following:

Updated timeframe for the upcoming AWS Lambda and AWS Lambda@Edge execution environment update

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/updated-timeframe-for-the-upcoming-aws-lambda-and-aws-lambdaedge-execution-environment-update/

On May 14th we announced an upcoming update to the AWS Lambda and AWS Lambda@Edge execution environments. In that announcement we shared that we are updating the execution environment to a more recent version of Amazon Linux. This newer execution environment brings updates that offer improvements in capabilities, performance, security, and updated packages that your application code might interface with. The previous post explained approaches to proactively testing against the new update, and methods to update your code to be compatible in the rare case you were impacted.

So far, we’ve heard from many customers that their functions have not been impacted when using the new execution environment via the Opt-in mechanism. Those that have been impacted have been able to follow the guidance on rebuilding any dependencies against the new execution environment and retest their functions successfully.

We also received feedback that customers wanted to see a longer time frame for validation as well as have more control over it, and so based on this feedback we’ve decided to modify the timeframe in two ways.

The first phase, Begin Testing, will be extended by three weeks: instead of ending May 21, it will now end June 10. This will give you more time to test your functions with the Opt-in layer before any further changes to the platform kick in.

We are also splitting the second phase, originally called Update/Create, into two independent periods of time. The first, now referred to as the New Function Create phase, will be two weeks long; during this time, all newly created functions will use the new execution environment unless a delayed-update layer is configured. The second new phase, Existing Function Update, will be three weeks long; during this time, both newly created functions and existing functions that you update will use the new execution environment unless a delayed-update layer is configured.

The end result is that you now have 5 more weeks in total to test and potentially update your functions for this change before the General Update begins. As a reminder, starting at that time, all functions without a delayed-update layer configured will begin migrating to the new execution environment.

New update timeline

The following is the new timeline for the update, which is now broken up over five phases:

May 14, 2019 – Begin Testing: You can begin testing your functions for the new execution environment locally with AWS SAM CLI or using an Amazon EC2 instance running on Amazon Linux 2018.03. You can also proactively enable the new environment in AWS Lambda using the opt-in mechanism described in the original announcement post.
June 11, 2019 – New Function Create: All newly created functions will result in those functions running on the new execution environment, unless they have a delayed-update layer configured.
June 25, 2019 – Existing Function Update: All newly created functions or existing functions that you update will result in those functions running on the new execution environment, unless they have a delayed-update layer configured.
July 16, 2019 – General Update: Existing functions begin using the new execution environment on invoke, unless they have a delayed-update layer configured.
July 23, 2019 – Delayed Update End: All functions with a delayed-update layer configured start being migrated automatically.
July 29, 2019 – Migration End: All functions have been migrated over to the new execution environment.

Note, that we have updated the original announcement post with this new timeline as well.

FAQ

We also wanted to take the chance to provide additional information on follow up questions customers have had about the update.

Q. How does this relate to the recent Node.js v10 runtime launch?
A. The Node.js v10 launch is unrelated and is not impacted by this change. The Node.js v10 runtime is based on Amazon Linux 2 as its execution environment. Please see the AWS Lambda Runtimes section in the documentation for more information.

Q. Does this update change the execution environment for other runtimes to run on Amazon Linux 2?
A. No, this update brings the execution environment to the latest Amazon Linux 1 distribution release. In the future, new runtimes will launch on Amazon Linux 2, but all previous existing runtimes will continue to run on Amazon Linux 1.

Q. Was this update related to the recent Intel Quarterly Security Release (QSR) 2019.1?
A. No, this motion to begin updating the execution environment for Lambda and Lambda@Edge is unrelated to the Intel QSR. There is no action for Lambda or Lambda@Edge customers to take in relation to the QSR.

Next Steps

Your feedback greatly matters to us and we will continue to listen and learn from you. Please continue to contact us through AWS Support, the AWS forums, or AWS account teams.

ICYMI: Serverless Q1 2019

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/icymi-serverless-q1-2019/

Welcome to the fifth edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all of the most recent product launches, feature enhancements, blog posts, webinars, Twitch live streams, and other interesting things that you might have missed!

If you didn’t see them, check our previous posts for what happened in 2018:

So, what might you have missed this past quarter? Here’s the recap.

Amazon API Gateway

Amazon API Gateway improved the experience for publishing APIs on the API Gateway Developer Portal. In addition, we added features such as a search capability, a feedback mechanism, and SDK generation.

Last year, API Gateway announced support for WebSockets. As of early February 2019, it is now possible to build WebSocket-enabled APIs via AWS CloudFormation and AWS Serverless Application Model (AWS SAM). The following diagram shows an example application.

API Gateway is also now supported in AWS Config. This feature enhancement allows API administrators to track changes to their API configuration automatically. With the power of AWS Config, you can automate alerts—and even remediation—with triggered Lambda functions.

In early January, API Gateway also announced a service level agreement (SLA) of 99.95% availability.

AWS Step Functions

Step Functions Local

AWS Step Functions added the ability to tag Step Function resources and provide access control with tag-based permissions. With this feature, developers can use tags to define access via AWS Identity and Access Management (IAM) policies.

In addition to tag-based permissions, Step Functions was one of 10 additional services to have support from the Resource Group Tagging API, which allows a single central point of administration for tags on resources.

In early February, Step Functions released the ability to develop and test applications locally using a local Docker container. This new feature allows you to innovate faster by iterating faster locally.

In late January, Step Functions joined the family of services offering SLAs with an SLA of 99.9% availability. They also increased their service footprint to include the AWS China (Ningxia) and AWS China (Beijing) Regions.

AWS SAM Command Line Interface

AWS SAM Command Line Interface (AWS SAM CLI) released the AWS Toolkit for Visual Studio Code and the AWS Toolkit for IntelliJ. These toolkits are open source plugins that make it easier to develop applications on AWS. The toolkits provide an integrated experience for developing serverless applications in Node.js (Visual Studio Code) as well as Java and Python (IntelliJ), with more languages and features to come.

The toolkits help you get started fast with built-in project templates that leverage AWS SAM to define and configure resources. They also include an integrated experience for step-through debugging of serverless applications and make it easy to deploy your applications from the integrated development environment (IDE).

AWS Serverless Application Repository

AWS Serverless Application Repository applications can now be published to the application repository using AWS CodePipeline. This allows you to update applications in the AWS Serverless Application Repository with a continuous integration and continuous delivery (CICD) process. The CICD process is powered by a pre-built application that publishes other applications to the AWS Serverless Application Repository.

AWS Event Fork Pipelines

Event Fork Pipelines

AWS Event Fork Pipelines is now available in AWS Serverless Application Repository. AWS Event Fork Pipelines is a suite of nested open-source applications based on AWS SAM. You can deploy Event Fork Pipelines directly from AWS Serverless Application Repository into your AWS account. These applications help you build event-driven serverless applications by providing pipelines for common event-handling requirements.

AWS Cloud9

Cloud9

AWS Cloud9 announced that, in addition to Amazon Linux, you can now select Ubuntu as the operating system for your AWS Cloud9 environment. Before this announcement, you would have to stand up an Ubuntu server and connect AWS Cloud9 to the instance by using SSH. With native support for Ubuntu, you can take advantage of AWS Cloud9 features, such as instance lifecycle management for cost efficiency and preconfigured tooling environments.

AWS Cloud9 also added support for AWS CloudTrail, which allows you to monitor and react to changes made to your AWS Cloud9 environment.

Amazon Kinesis Data Analytics

Amazon Kinesis Data Analytics now supports CloudTrail logging. CloudTrail captures changes made to Kinesis Data Analytics and delivers the logs to an Amazon S3 bucket. This makes it easy for administrators to understand changes made to the application and who made them.

Amazon DynamoDB

Amazon DynamoDB removed the associated costs of DynamoDB Streams used in replicating data globally. Because global tables use streams to replicate data between Regions, this translates to cost savings for global tables. However, DynamoDB streaming costs remain the same for your applications reading from a replica table’s stream.

DynamoDB added the ability to switch encryption keys used to encrypt data. DynamoDB, by default, encrypts all data at rest. You can use the default encryption, the AWS-owned customer master key (CMK), or the AWS managed CMK to encrypt data. It is now possible to change between the AWS-owned CMK and the AWS managed CMK without having to modify code or applications.

Amazon DynamoDB Local, a local installable version of DynamoDB, has added support for transactional APIs, on-demand capacity, and as many as 20 global secondary indexes per table.

AWS Amplify

Amplify Deploy

AWS Amplify added support for OAuth 2.0 Authorization Code Grant flows in the native (iOS and Android) and React Native libraries. Previously, you would have to use third-party libraries and handwritten logic to achieve these use cases.

Additionally, Amplify also launched the ability to perform instant cache invalidation and delta deployments on every code commit. To achieve this, Amplify creates unique references to all the build artifacts on each deploy. Amplify has also added the ability to detect and upload only modified artifacts at the time of release to help reduce deployment time.

Amplify also added features for multiple environments, custom resolvers, larger data models, and IAM roles, including multi-factor authentication (MFA).

AWS AppSync

AWS AppSync increased its availability footprint to the EU (London) Region.

Amazon Cognito

Amazon Cognito increased its service footprint to include the Canada (central) Region. It also published an SLA of 99.9% availability.

Amazon Aurora

Amazon Aurora Serverless increases performance visibility by publishing logs to Amazon CloudWatch.

AWS CodePipeline

CodePipeline

AWS CodePipeline announced support for deploying static files to Amazon S3. While this may not usually fall under the serverless blogs and announcements, if you’re a developer who builds single-page applications or hosts static websites, this makes your life easier. Your static site can now be part of your CICD process without custom coding.

Serverless Posts

January:

February:

March:

Tech talks

We hold several AWS Online Tech Talks covering serverless topics throughout the year, so look out for them in the Serverless section of the AWS Online Tech Talks page. Here are the three tech talks that we delivered in Q1:

Whitepapers

Security Overview of AWS Lambda: This whitepaper presents a deep dive into the Lambda service through a security lens. It provides a well-rounded picture of the service, which can be useful for new adopters, as well as deepening understanding of Lambda for current users. Read the full whitepaper.

Twitch

AWS Launchpad Santa Clara

There is always something going on at our Twitch channel! Be sure to follow us so you don’t miss anything! For information about upcoming broadcasts and recent livestreams, keep an eye on AWS on Twitch for more Serverless videos and on the Join us on Twitch AWS page.

In other news

Building Happy Little APIs

Twitch Series: Building Happy Little APIs

In April, we started a 13-week deep dive into building APIs on AWS as part of our Twitch Build On series. The Building Happy Little APIs series covers the common and not-so-common use cases for APIs on AWS and the features available to customers as they look to build secure, scalable, efficient, and flexible APIs.

Twitch series: Build on Serverless: Season 2

Build On Serverless

Join Heitor Lessa across 14 weeks, nearly every Wednesday from April 24 – August 7 at 8AM PST/11AM EST/3PM UTC. Heitor is live-building a full-stack, serverless airline-booking application using a bunch of services: Lambda, Amplify, API Gateway, Amazon Cognito, AWS SAM, CloudWatch, AWS AppSync, and others. See the episode guide and sign up for stream reminders.

2019 AWS Summits

AWS Summit

The schedule is in full swing for the 2019 AWS Global Summits held in major cities around the world. These free events bring the cloud computing community together to connect, collaborate, and learn about AWS. They attract technologists from all industries and skill levels who want to discover how AWS can help them innovate quickly and deliver flexible, reliable solutions at scale. Get notified when to register and learn more at the AWS Global Summit Program website.

Still looking for more?

The Serverless landing page has lots of information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials. Check it out!

Upcoming updates to the AWS Lambda and AWS Lambda@Edge execution environment

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/upcoming-updates-to-the-aws-lambda-execution-environment/

AWS Lambda was first announced at AWS re:Invent 2014. Amazon CTO Werner Vogels highlighted that with Lambda you run no servers, no instances, nothing; you just write your code. In 2016, we announced the launch of Lambda@Edge, which lets you run Lambda functions to customize content that CloudFront delivers, executing the functions in AWS locations closer to the viewer.

At AWS, we talk often about “shared responsibility” models. Essentially, those are the places where there is a handoff between what we as a technology provider offer you and what you as the customer are responsible for. In the case of Lambda and Lambda@Edge, one of the key things that we manage is the “execution environment.” The execution environment is what your code runs inside of. It is composed of the underlying operating system, system packages, the runtime for your language (if a managed one), and common capabilities like environment variables. From the customer standpoint, your primary responsibility is for your application code and configuration.

In this post, we outline an upcoming change to the execution environment for Lambda and Lambda@Edge functions for all runtimes with the exception of Node.js v10. As with any update, some functionality could be affected. We encourage you to read through this post to understand the changes and any actions that you might need to take.

Update overview

AWS Lambda and AWS Lambda@Edge run on top of the Amazon Linux operating system distribution and maintain updates to both the core OS and managed language runtimes. We are updating the Lambda execution environment AMI to version 2018.03 of Amazon Linux. This newer AMI brings updates that offer improvements in capabilities, performance, security, and updated packages that your application code might interface with.

This does not apply to the recently announced Node.js v10 runtime which today runs on Amazon Linux 2.

The majority of functions will benefit seamlessly from the enhancements in this update without any action from you. However, in rare cases, package updates may introduce compatibility issues. Potentially impacted functions are those that contain libraries or application code compiled against specific underlying OS packages or other system libraries. If you are primarily using an AWS SDK or the AWS X-Ray SDK with no other dependencies, you will see no impact.

You have the following options in terms of next steps:

  • Take no action before the automatic update of the execution environment starting May 21 for all newly created/updated functions and June 11 for all existing functions.
  • Proactively test your functions against the new environment starting today.
  • Configure your functions to delay the execution environment update until June 18 to allow for a longer testing window.

In addition to the overall timeline for this change, this post also provides instructions on the following:

  • How to test your functions for this new execution environment locally and on Lambda/Lambda@Edge.
  • How to proactively update your functions.
  • How to extend the testing window by one week.

Update timeline

The following is the timeline for the update, which is broken up over four phases over the next several weeks:

May 14, 2019—Begin Testing: You can begin testing your functions for the new execution environment locally with AWS SAM CLI or using an Amazon EC2 instance running on Amazon Linux 2018.03. You can also proactively enable the new environment in AWS Lambda using the opt-in mechanism described later in this post.
May 21, 2019—Update/Create: All new function creates or function updates result in your functions running on the new execution environment.
June 11, 2019—General Update: Existing functions begin using the new execution environment on invoke unless they have a delayed-update layer configured.
June 18, 2019—Delayed Update End: All functions with a delayed-update layer configured start being migrated automatically.
June 24, 2019—Migration End: All functions have been migrated over to the new execution environment.

Recommended Approach

decision tree

You only have to act if your application uses dependencies that are compiled to work on the previous execution environment. Otherwise, you can continue to deploy new and updated Lambda functions without needing to perform any other testing steps. For those who aren’t sure if their functions use such dependencies, we encourage you to do a new deployment of your functions and to test their functionality.

There are two options for when you can start testing your functions on the new execution environment:

  • You can begin testing today using the opt-in mechanism described later.
  • Starting May 21, a new deploy or update of your functions uses the new execution environment.

If you confirm that your functions would be affected by the new execution environment, you can begin re-compiling or building your dependencies using the new reference AMI for the execution environment today and then repeat the testing. The final step is to redeploy your applications any time after May 21 to use the new execution environment.

Building your dependencies and application for the new execution environment

Because we are basing the environment off of an existing Amazon Linux AMI, you can start with building and testing your code against that AMI on EC2. With an updated EC2 instance running this AMI, you can compile and build your packages using your normal processes. For the list of AMI IDs in all public Regions, check the release notes. To start an EC2 instance running this AMI, follow the steps in the Launching an Instance Using the Launch Instance Wizard topic in the Amazon EC2 User Guide.

Opt-in/Delayed-Update with Lambda layers

Some of you may want to begin testing as soon as you’ve read this announcement. Others know that they should postpone until later in the timeline.

To give you some control over testing, we’re releasing two special Lambda layers. Lambda layers can be used to provide shared resources, code, or data between Lambda functions and can simplify the deployment and update process. These layers don’t actually contain any data or code. Instead, they act as a special flag to Lambda to run your function executions either specifically on the new or old execution environment.

The Opt-In layer allows you to start testing today. You can use the Delayed-Update layer when you know that you must make updates to your function or its configuration after May 21 but aren’t ready to deploy to the new execution environment. The Delayed-Update layer extends the initial period available to you to deploy your functions by one week, until the end of June 17, without changing the execution environment.

Neither layer brings any performance or runtime changes beyond this. After June 24, the layers will have no functionality. In a future deployment, you should remove them from any function configurations.

The ARNs for the two scenarios:

  • To OPT-IN to the update to the new execution environment, add the following layer:

arn:aws:lambda:::awslayer:AmazonLinux1803

  • To DELAY THE UPDATE to the new execution environment until June 18, add the following layer:

arn:aws:lambda:::awslayer:AmazonLinux1703

The action for adding a layer to your existing functions requires an update to the Lambda function’s configuration. You can do this with the AWS CLI, AWS CloudFormation or AWS SAM, popular third-party frameworks, the AWS Management Console, or an AWS SDK.
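
For example, with an AWS SDK, a minimal boto3 sketch for opting a single function in might look like the following. The function name is a placeholder, and note that the Layers parameter replaces the function's existing layer list.

import boto3

lambda_client = boto3.client('lambda')

# 'my-function' is a placeholder; use your function's name or ARN.
# Layers replaces the full list of layers on the function, so include any
# layers the function already uses in addition to the opt-in layer.
lambda_client.update_function_configuration(
    FunctionName='my-function',
    Layers=['arn:aws:lambda:::awslayer:AmazonLinux1803']
)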

Validating your functions

There are several ways for you to test your function code and assure that it will work after the execution environment has been updated.

Local testing

We’re providing an update to the AWS SAM CLI to enable you to test your functions locally against this new execution environment. The AWS SAM CLI uses a Docker image that mirrors the live Lambda environment locally wherever you do development. To test against this new update, make sure that you have the most recent update to AWS SAM CLI version 0.16.0. You also should have an AWS SAM template configured for your function.

  1. Install or update the AWS SAM CLI:
    $ pip install --upgrade aws-sam-cli

    -Or-

    $ pip install aws-sam-cli
  2. Confirm that you have a valid AWS SAM template:
    $ sam validate -t <template file name>

    If you don’t have a valid AWS SAM template, you can begin with a basic template to test your functions. The following example represents the basic needs for running your function. The Runtime value must be listed in the AWS Lambda Runtimes topic.

    AWSTemplateFormatVersion: 2010-09-09
    Transform: 'AWS::Serverless-2016-10-31'
    
    Resources:
      myFunction:
        Type: 'AWS::Serverless::Function'
        Properties:
          CodeUri: ./ 
          Handler: YOUR_HANDLER
          Runtime: YOUR_RUNTIME
  3. With a valid template, you can begin testing your function with mock event payloads. To generate a mock event payload, you can use the AWS SAM CLI local generate-event command. Here is an example of that command being run to generate an Amazon S3 notification type of event:
    sam local generate-event s3 put --bucket munns-test --key somephoto.jpeg
    {
      "Records": [
        {
          "eventVersion": "2.0", 
          "eventTime": "1970-01-01T00:00:00.000Z", 
          "requestParameters": {
            "sourceIPAddress": "127.0.0.1"
          }, 
          "s3": {
            "configurationId": "testConfigRule", 
            "object": {
              "eTag": "0123456789abcdef0123456789abcdef", 
              "sequencer": "0A1B2C3D4E5F678901", 
              "key": "somephoto.jpeg", 
              "size": 1024
            }, 
            "bucket": {
              "arn": "arn:aws:s3:::munns-test", 
              "name": "munns-test", 
              "ownerIdentity": {
                "principalId": "EXAMPLE"
              }
            }, 
            "s3SchemaVersion": "1.0"
          }, 
          "responseElements": {
            "x-amz-id-2": "EXAMPLE123/5678abcdefghijklambdaisawesome/mnopqrstuvwxyzABCDEFGH", 
            "x-amz-request-id": "EXAMPLE123456789"
          }, 
          "awsRegion": "us-east-1", 
          "eventName": "ObjectCreated:Put", 
          "userIdentity": {
            "principalId": "EXAMPLE"
          }, 
          "eventSource": "aws:s3"
        }
      ]
    }

    You can then use the AWS SAM CLI local invoke command and pipe in the output from the previous command. Or, you can save the output from the previous command to a file and then pass in a reference to the file’s name and location with the -e flag. Here is an example of the pipe event method:

    sam local generate-event s3 put --bucket munns-test --key somephoto.jpeg | sam local invoke myFunction
    2019-02-19 18:45:53 Reading invoke payload from stdin (you can also pass it from file with --event)
    2019-02-19 18:45:53 Found credentials in shared credentials file: ~/.aws/credentials
    2019-02-19 18:45:53 Invoking index.handler (python2.7)
    
    Fetching lambci/lambda:python2.7 Docker container image......
    2019-02-19 18:45:53 Mounting /home/ec2-user/environment/forblog as /var/task:ro inside runtime container
    START RequestId: 7c14eea1-96e9-4b7d-ab54-ed1f50bd1a34 Version: $LATEST
    {"Records": [{"eventVersion": "2.0", "eventTime": "1970-01-01T00:00:00.000Z", "requestParameters": {"sourceIPAddress": "127.0.0.1"}, "s3": {"configurationId": "testConfigRule", "object": {"eTag": "0123456789abcdef0123456789abcdef", "key": "somephoto.jpeg", "sequencer": "0A1B2C3D4E5F678901", "size": 1024}, "bucket": {"ownerIdentity": {"principalId": "EXAMPLE"}, "name": "munns-test", "arn": "arn:aws:s3:::munns-test"}, "s3SchemaVersion": "1.0"}, "responseElements": {"x-amz-id-2": "EXAMPLE123/5678abcdefghijklambdaisawesome/mnopqrstuvwxyzABCDEFGH", "x-amz-request-id": "EXAMPLE123456789"}, "awsRegion": "us-east-1", "eventName": "ObjectCreated:Put", "userIdentity": {"principalId": "EXAMPLE"}, "eventSource": "aws:s3"}]}
    END RequestId: 7c14eea1-96e9-4b7d-ab54-ed1f50bd1a34
    REPORT RequestId: 7c14eea1-96e9-4b7d-ab54-ed1f50bd1a34 Duration: 1 ms Billed Duration: 100 ms Memory Size: 128 MB Max Memory Used: 14 MB
    
    "Success! Parsed Events"

    You can see the full output of your function in the logs that follow the invoke command. In this example, the Python function prints out the event payload and then exits.

With the AWS SAM CLI, you can pass in valid test payloads that interface with data in other AWS services. You can also have your Lambda function talk to other AWS resources that exist in your account, for example Amazon DynamoDB tables, Amazon S3 buckets, and so on. You could also test an API interface using the local start-api command, provided that you have configured your AWS SAM template with events of the API type. Follow the full instructions for setting up and configuring the AWS SAM CLI in Installing the AWS SAM CLI. Find the full syntax guide for AWS SAM templates in the AWS Serverless Application Model documentation.

Testing in the Lambda console

After you have deployed your functions after the start of the Update/Create phase or with the Opt-In layer added, test your functions in the Lambda console.

  1. In the Lambda console, select the function to test.
  2. Select a test event and choose Test.
  3. If no test event exists, choose Configure test events.
    1. Choose Event template and select the relevant invocation service from which to test.
    2. Name the test event.
    3. Modify the event payload for your specific function.
    4. Choose Create and then return to step 2.

The results from the test are displayed.

Conclusion

With Lambda and Lambda@Edge, AWS has allowed developers to focus on just application code, without the need to think about the work involved in managing the actual servers that run the code. We believe that the mechanisms provided and processes described in this post allow you to easily test and update your functions for this new execution environment.

Some of you may have questions about this process, and we are ready to help you. Please contact us through AWS Support, the AWS forums, or AWS account teams.

Optimize Amazon EMR costs with idle checks and automatic resource termination using advanced Amazon CloudWatch metrics and AWS Lambda

Post Syndicated from Praveen Krishnamoorthy Ravikumar original https://aws.amazon.com/blogs/big-data/optimize-amazon-emr-costs-with-idle-checks-and-automatic-resource-termination-using-advanced-amazon-cloudwatch-metrics-and-aws-lambda/

Many customers use Amazon EMR to run big data workloads, such as Apache Spark and Apache Hive queries, in their development environment. Data analysts and data scientists frequently use these types of clusters, known as analytics EMR clusters. Users often forget to terminate the clusters after their work is done. This leads to clusters running idle and, in turn, adds unnecessary costs.

To avoid this overhead, you must track the idleness of the EMR cluster and terminate it if it has been running idle for a long time. Amazon EMR provides the native IsIdle Amazon CloudWatch metric, which determines the idleness of the cluster by checking whether there’s a YARN job running. However, you should consider additional metrics, such as the number of connected SSH users or running Presto jobs, to determine whether the cluster is idle. Also, when you execute any Spark jobs in Apache Zeppelin, the IsIdle metric remains active (1) for long hours, even after the job finishes executing. In such cases, the IsIdle metric is not ideal for deciding whether a cluster is inactive.

In this blog post, we propose a solution to cut down this overhead cost. We implemented a bash script to be installed in the master node of the EMR cluster, and the script is scheduled to run every 5 minutes. The script monitors the clusters and sends a CUSTOM metric EMR-INUSE (0=inactive; 1=active) to CloudWatch every 5 minutes. If CloudWatch receives 0 (inactive) for some predefined set of data points, it triggers an alarm, which in turn executes an AWS Lambda function that terminates the cluster.

Prerequisites

You must have the following before you can create and deploy this framework:

Note: This solution is designed as an additional feature. It can be applied to any existing EMR clusters by executing the scheduler script (explained later in the post) as an EMR step. If you want to implement this solution as a mandatory feature for your future clusters, you can include the EMR step as part of your cluster deployment. You can also apply this solution to EMR clusters that are spun up through AWS CloudFormation, the AWS CLI, and even the AWS Management Console.

Key components

The following are the key components of the solution.

Analytics EMR cluster

Amazon EMR provides a managed Apache Hadoop framework that lets you easily process large amounts of data across dynamically scalable Amazon EC2 instances. Data scientists use analytics EMR clusters for data analysis, machine learning using notebook applications (such as Apache Zeppelin or JupyterHub), and running big data workloads based on Apache Spark, Presto, etc.

Scheduler script

The schedule_script.sh is the shell script to be executed as an Amazon EMR step. When executed, it copies the monitoring script from the Amazon S3 artifacts folder and schedules the monitoring script to run every 5 minutes. The S3 location of the monitoring script should be passed as an argument.

Monitoring script

The pushShutDownMetrin.sh script is a monitoring script that is implemented using shell commands. It should be installed in the master node of the EMR cluster as an Amazon EMR step. The script is scheduled to run every 5 minutes and sends the cluster activity status to CloudWatch.  

JupyterHub API token script

The jupyterhub_addAdminToken.sh script is a shell script to be executed as an Amazon EMR step if JupyterHub is enabled on the cluster. In our design, the monitoring script uses REST APIs provided by JupyterHub to check whether the application is in use.

To send the request to JupyterHub, you must pass an API token along with the request. By default, the application does not generate API tokens. This script generates the API token and assigns it to the admin user, which is then picked up by the jupyterhub module in the monitoring script to track the activity of the application.

Custom CloudWatch metric

All Amazon EMR clusters send data for several metrics to CloudWatch. Metrics are updated every 5 minutes, automatically collected, and pushed to CloudWatch. For this use case, we created the Amazon EMR metric EMR-INUSE. This metric represents the active status of the cluster based on the module checks implemented in the monitoring script. The metric is set to 0 when the cluster is inactive and 1 when active.
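
The actual monitoring script publishes this metric with shell commands, but as an illustration, here is a minimal boto3 sketch of the equivalent call. The namespace matches the one shown later in this post (EMRShutdown/Cluster-Metric); the JobFlowId dimension name is an assumption for illustration.

import boto3

cloudwatch = boto3.client('cloudwatch')

def publish_cluster_activity(job_flow_id, in_use):
    # Push 1 (active) or 0 (inactive) for the EMR-INUSE custom metric.
    cloudwatch.put_metric_data(
        Namespace='EMRShutdown/Cluster-Metric',
        MetricData=[{
            'MetricName': 'EMR-INUSE',
            'Dimensions': [{'Name': 'JobFlowId', 'Value': job_flow_id}],
            'Value': 1 if in_use else 0,
        }],
    )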

Amazon CloudWatch

CloudWatch is a monitoring service that you can use to set high-resolution alarms to take automated actions. In this case, CloudWatch triggers an alarm if it receives 0 continuously for the configured number of hours.

AWS Lambda

Lambda is a serverless technology that lets you run code without provisioning or managing servers. With Lambda, you can run code for virtually any type of application or backend service—all with zero administration. You can set up your code to automatically trigger from other AWS services. In this case, the triggered CloudWatch alarm mentioned earlier signals Lambda to terminate the cluster.

Architectural diagram

The following diagram illustrates the sequence of events when the solution is enabled, showing what happens to the EMR cluster that is spun up via AWS CloudFormation.

 

The diagram shows the following steps:

  1. The AWS CloudFormation stack is launched to spin up an EMR cluster.
  2. The Amazon EMR step is executed (installs the pushShutDownMetric.sh and then schedules it as a cron job to run every 5 minutes).
  3. If the EMR cluster is active (executing jobs), the master node sets the EMR-INUSE metric to 1 and sends it to CloudWatch.
  4. If the EMR cluster is inactive, the master node sets the EMR-INUSE metric to 0 and sends it to CloudWatch.
  5. On receiving 0 for a predefined number of data points, CloudWatch triggers a CloudWatch alarm.
  6. The CloudWatch alarm sends notification to AWS Lambda to terminate the cluster.
  7. AWS Lambda executes the Lambda function.
  8. The Lambda function then deletes all the stack resources associated with the cluster.
  9. Finally, the EMR cluster is terminated, and the Stack ID is removed from AWS CloudFormation.

Modules in the monitoring script

Following are the different activity checks that are implemented in the monitoring script (pushShutDownMetric.sh). The script is designed in a modular fashion so that you can easily include new modules without modifying the core functionality.

ActiveSSHCheck

The ActiveSSHCheck module checks whether there are any active SSH connections to the master node. If there is an active SSH connection, and it’s idle for less than 10 minutes, the function sets the EMR-INUSE metric to 1 and pushes it to CloudWatch.

YARNCheck

Apache Hadoop YARN is the resource manager of the EMR Hadoop ecosystem. All Spark submits and Hive queries reach YARN first; it then schedules and processes these jobs. The YARNCheck module checks whether there are any jobs running in YARN, or any jobs completed within the last 5 minutes. If it finds any, the function sets the EMR-INUSE metric to 1 and pushes it to CloudWatch. The checks are performed by calling REST APIs exposed by YARN.

The API to fetch the running jobs is http://localhost:8088/ws/v1/cluster/apps?state=RUNNING.

The API to fetch the completed jobs is

http://localhost:8088/ws/v1/cluster/apps?state=FINISHED.
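
As a rough illustration of the YARNCheck logic, the following Python sketch calls these two endpoints and reports activity if any application is running or finished within the last 5 minutes. The actual module is implemented with shell commands; the response parsing below assumes the standard YARN ResourceManager API response shape.

import json
import time
import urllib.request

def yarn_is_active():
    # Any RUNNING application means the cluster is in use.
    running = json.load(urllib.request.urlopen(
        'http://localhost:8088/ws/v1/cluster/apps?state=RUNNING'))
    if (running.get('apps') or {}).get('app'):
        return True
    # Otherwise, look for applications that finished in the last 5 minutes.
    finished = json.load(urllib.request.urlopen(
        'http://localhost:8088/ws/v1/cluster/apps?state=FINISHED'))
    apps = (finished.get('apps') or {}).get('app') or []
    now_ms = int(time.time() * 1000)
    return any(now_ms - app.get('finishedTime', 0) < 5 * 60 * 1000 for app in apps)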

PRESTOCheck

Presto is an open-source distributed query engine for running interactive analytic queries. It is included in EMR release version 5.0.0 and later. The PRESTOCheck module checks whether there are any running Presto queries, or any queries completed within the last 5 minutes. If it finds any, the function sets the EMR-INUSE metric to 1 and pushes it to CloudWatch. These checks are performed by calling REST APIs exposed by the Presto server.

The API to fetch the Presto jobs is http://localhost:8889/v1/query.

ZeppelinCheck

Amazon EMR users use Apache Zeppelin as a notebook for interactive data exploration. The ZeppelinCheck module checks whether there are any jobs running or if any have been completed within the last 5 minutes. If so, the function sets the EMR-INUSE metric to 1 and pushes it to CloudWatch. These checks are performed by calling the REST APIs exposed by Zeppelin.

The API to fetch the list of notebook IDs is http://localhost:8890/api/notebook.

The API to fetch the status of each cell inside each notebook ID is http://localhost:8890/api/notebook/job/$notebookID.

JupyterHubCheck

Jupyter Notebook is an open-source web application that you can use to create and share documents that contain live code, equations, visualizations, and narrative text. JupyterHub allows you to host multiple instances of a single-user Jupyter notebook server. The JupyterHubCheck module checks whether any Jupyter notebook is currently in use.

The function uses REST APIs exposed by JupyterHub to fetch the list of Jupyter notebook users and gathers the data about individual notebook servers. From the response, it extracts the last activity time of the servers and checks whether any server was used in the last 5 minutes. If so, the function sets the EMR-INUSE metric to 1 and pushes it to CloudWatch. The jupyterhub_addAdminToken.sh script needs to be executed as an EMR step before enabling the scheduler script.

The API to fetch the list of notebook users is https://localhost:9443/hub/api/users -H "Authorization: token $admin_token".

The API to fetch individual server information is https://localhost:9443/hub/api/users/$user -H "Authorization: token $admin_token".

If none of these checks detects activity, the cluster is considered inactive, and the monitoring script sets the EMR-INUSE metric to 0 and pushes it to CloudWatch.

Note:

The scheduler script schedules the monitoring script (pushShutDownMetric.sh) to run every 5 minutes. Internal cron jobs that run for only a few minutes are not considered when calibrating the EMR-INUSE metric.

Deploying each component

Follow the steps in this section to deploy each component of the proposed design.

Step 1. Create the Lambda function and SNS subscription

The Lambda function and the SNS subscription are the core components of the design. You must set up these components initially, and they are common for every cluster. The following are the AWS resources to be created for these components:

  • Execution role for the Lambda function
  • Terminate Idle EMR Lambda function
  • SNS topic and Lambda subscription

For one-step deployment, use this AWS CloudFormation template to launch and configure the resources in a single go.

The following parameters are available in the template.

Parameter | Default | Description
s3Bucket | emr-shutdown-blogartifacts | The name of the S3 bucket that contains the Lambda file
s3Key | EMRTerminate.zip | The Amazon S3 key of the Lambda file

For manual deployment, follow these steps on the AWS Management Console.

Execution role for the Lambda function

  1. Open the AWS Identity and Access Management (IAM) console and choose Policies, Create policy.
  2. Choose the JSON tab, paste the following policy text, and then choose Review policy.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "s3:ListAllMyBuckets",
                "s3:HeadBucket",
                "s3:ListObjects",
                "s3:GetObject",
                "cloudformation:ListStacks",
                "cloudformation:DeleteStack",
                "cloudformation:DescribeStacks",
                "cloudformation:ListStackResources",
                "elasticmapreduce:TerminateJobFlows"
            ],
            "Resource": "*",
            "Effect": "Allow",
            "Sid": "GenericAccess"
        },
        {
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*",
            "Effect": "Allow",
            "Sid": "LogAccess"
        }
    ]
}
  3. For Name, enter TerminateEMRPolicy and choose Create policy.
  4. Choose Roles, Create role.
  5. Under Choose the service that will use this role, choose Lambda, and then choose Next: Permissions.
  6. For Attach permissions policies, choose the arrow next to Filter policies and choose Customer managed in the drop-down list.
  7. Attach the TerminateEMRPolicy policy that you just created, and choose Review.
  8. For Role name, enter TerminateEMRLambdaRole and then choose Create role.

Terminate idle EMR – Lambda function

I created a deployment package to use with this function.

  1. Open the Lambda console and choose Create function.
  2. Choose Author from scratch, and provide the details as shown in the following screenshot:
  • Name: lambdaTerminateEMR
  • Runtime: Python 2.7
  • Role: Choose an existing role
  • Existing role: TerminateEMRLambdaRole

  3. Choose Create function.
  4. In the Function code section, for Code entry type, choose Upload a file from Amazon S3, and for Runtime, choose Python 2.7.

The Lambda function S3 link URL is

s3://emr-shutdown-blogartifacts/EMRTerminate.zip.

Link to the function: https://s3.amazonaws.com/emr-shutdown-blogartifacts/EMRTerminate.zip

This Lambda function is triggered by the CloudWatch alarm through the SNS topic. It parses the input event, retrieves the JobFlowId, and deletes the AWS CloudFormation stack associated with that JobFlowId.
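
The deployment package linked above contains the full implementation; the following is only a simplified sketch of that flow. The exact shape of the incoming SNS message and the use of the cluster ID as a stack physical resource ID are assumptions for illustration.

import json
import boto3

cloudformation = boto3.client('cloudformation')

def lambda_handler(event, context):
    # The CloudWatch alarm notification arrives wrapped in an SNS record.
    alarm = json.loads(event['Records'][0]['Sns']['Message'])
    job_flow_id = alarm['Trigger']['Dimensions'][0]['value']

    # Find the CloudFormation stack whose resources include this cluster ID,
    # then delete the stack, which also terminates the EMR cluster.
    for stack in cloudformation.describe_stacks()['Stacks']:
        resources = cloudformation.list_stack_resources(
            StackName=stack['StackName'])['StackResourceSummaries']
        if any(r.get('PhysicalResourceId') == job_flow_id for r in resources):
            cloudformation.delete_stack(StackName=stack['StackName'])
            return {'deletedStack': stack['StackName'], 'jobFlowId': job_flow_id}
    return {'deletedStack': None, 'jobFlowId': job_flow_id}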

SNS topic and Lambda subscription

For setting the CloudWatch alarm in the further stages, you must create an Amazon SNS topic that notifies the preceding Lambda function to execute. Follow these steps to create an SNS topic and configure the Lambda endpoint.

  1. Navigate to the Amazon SNS console, and choose Create topic.
  2. Enter the Topic name and Display name, and choose Create topic.

  3. The topic is created and displayed in the Topics list.
  4. Select the topic and choose Actions, Subscribe to topic.

  5. In the Create subscription dialog box, choose AWS Lambda as the protocol, choose lambdaTerminateEMR as the endpoint, and choose Create subscription.

Step 2. Execute the JupyterHub API token script as an EMR step

This step is required only when JupyterHub is enabled in the cluster.

Navigate to the EMR cluster to be monitored, and execute the scheduler script as an EMR step.

Command: s3://emr-shutdown-blogartifacts/jupyterhub_addAdminToken.sh

This script generates an API token and assigns it to the admin user. It is then picked up by the jupyterhub module in the monitoring script to track the activity of the application.

Step 3. Execute the scheduler script as an EMR step

Navigate to the EMR cluster to be monitored and execute the scheduler script as an EMR step.

Note:

Ensure that termination protection is disabled in the cluster. The termination protection flag causes the Lambda function to fail.

Command: s3://emr-shutdown-blogartifacts/schedule_script.sh

Parameter: s3://emr-shutdown-blogartifacts/pushShutDownMetrin.sh

This step copies the pushShutDownMetrin.sh monitoring script to the master node and schedules it to run every 5 minutes.

The schedule_script.sh is at https://s3.amazonaws.com/emr-shutdown-blogartifacts/schedule_script.sh.

The pushShutDownMetrin.sh is at https://s3.amazonaws.com/emr-shutdown-blogartifacts/pushShutDownMetrin.sh.

Step 4. Create a CloudWatch alarm

For single-step deployment, use this AWS CloudFormation template to launch and configure the resources in a single go.

The following parameters are available in the template.

Parameter | Default | Description
AlarmName | TerminateIDLE-EMRAlarm | The name for the alarm.
EMRJobFlowID | Requires input | The JobFlowId of the cluster.
EvaluationPeriod | Requires input | The idle timeout value, expressed in data points (1 data point equals 5 minutes). For example, to terminate the cluster if it is idle for 20 minutes, the input should be 4.
SNSSubscribeTopic | Requires input | The Amazon Resource Name (ARN) of the SNS topic to be triggered on the alarm.

 

The AWS CloudFormation CLI command is as follows:

aws cloudformation create-stack --stack-name EMRAlarmStack \
      --template-url https://s3.amazonaws.com/emr-shutdown-blogartifacts/Cloudformation/alarm.json \
      --parameters ParameterKey=AlarmName,ParameterValue=TerminateIDLE-EMRAlarm \
                   ParameterKey=EMRJobFlowID,ParameterValue=<Input> \
                   ParameterKey=EvaluationPeriod,ParameterValue=4 \
                   ParameterKey=SNSSubscribeTopic,ParameterValue=<Input>

For manual deployment, follow these steps to create the alarm.

  1. Open the Amazon CloudWatch console and choose Alarms.
  2. Choose Create Alarm.
  3. On the Select Metric page, under Custom Metrics, choose EMRShutdown/Cluster-Metric.

  4. Choose the isEMRUsed metric of the EMR JobFlowId, and then choose Next.

  5. Define the alarm as required. In this case, the alarm is set to send a notification to the SNS topic shutDownEMRTest when CloudWatch receives the IsEMRUsed metric as 0 for every data point in the last 2 hours.

  6. Choose Create Alarm.
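
If you prefer to script the alarm instead of using the console or the template, a boto3 sketch along these lines creates an equivalent alarm. The cluster ID, topic ARN, dimension name, and the EMR-INUSE metric name (as used earlier in this post) are placeholders and assumptions for illustration.

import boto3

cloudwatch = boto3.client('cloudwatch')

# 24 evaluation periods of 5 minutes each = alarm after 2 hours of inactivity.
cloudwatch.put_metric_alarm(
    AlarmName='TerminateIDLE-EMRAlarm',
    Namespace='EMRShutdown/Cluster-Metric',
    MetricName='EMR-INUSE',
    Dimensions=[{'Name': 'JobFlowId', 'Value': 'j-XXXXXXXXXXXXX'}],
    Statistic='Maximum',
    Period=300,
    EvaluationPeriods=24,
    Threshold=0,
    ComparisonOperator='LessThanOrEqualToThreshold',
    AlarmActions=['arn:aws:sns:us-east-1:111122223333:shutDownEMRTest'],
)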

Summary

In this post, we focused on building a framework to cut down the additional cost that you might incur from EMR clusters running idle. The modules implemented in the shell script track the execution status of Spark jobs and Hive/Presto queries using lightweight REST API calls, which makes this approach an efficient solution.

If you have questions or suggestions, please comment below.

 


About the Author

Praveen Krishnamoorthy Ravikumar is an associate big data consultant with Amazon Web Services.

Build and automate a serverless data lake using an AWS Glue trigger for the Data Catalog and ETL jobs

Post Syndicated from Saurabh Shrivastava original https://aws.amazon.com/blogs/big-data/build-and-automate-a-serverless-data-lake-using-an-aws-glue-trigger-for-the-data-catalog-and-etl-jobs/

Today, data is flowing from everywhere, whether it is unstructured data from sources like IoT sensors, application logs, and clickstreams, or structured data from transaction applications, relational databases, and spreadsheets. Data has become a crucial part of every business. This has resulted in a need to maintain a single source of truth and automate the entire pipeline—from data ingestion to transformation and analytics—in order to extract value from the data quickly.

There is a growing concern over the complexity of data analysis as the data volume, velocity, and variety increases. The concern stems from the number and complexity of steps it takes to get data to a state that is usable by business users. Often data engineering teams spend most of their time on building and optimizing extract, transform, and load (ETL) pipelines. Automating the entire process can reduce the time to value and cost of operations. In this post, we describe how to create a fully automated data cataloging and ETL pipeline to transform your data.

Architecture

In this post, you learn how to build and automate the following architecture.

You build your serverless data lake with Amazon Simple Storage Service (Amazon S3) as the primary data store. Given the scalability and high availability of Amazon S3, it is best suited as the single source of truth for your data.

You can use various techniques to ingest and store data in Amazon S3. For example, you can use Amazon Kinesis Data Firehose to ingest streaming data. You can use AWS Database Migration Service (AWS DMS) to ingest relational data from existing databases. And you can use AWS DataSync to ingest files from an on-premises Network File System (NFS).

Ingested data lands in an Amazon S3 bucket that we refer to as the raw zone. To make that data available, you have to catalog its schema in the AWS Glue Data Catalog. You can do this using an AWS Lambda function invoked by an Amazon S3 trigger to start an AWS Glue crawler that catalogs the data. When the crawler is finished creating the table definition, you invoke a second Lambda function using an Amazon CloudWatch Events rule. This step starts an AWS Glue ETL job to process and output the data into another Amazon S3 bucket that we refer to as the processed zone.
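
The CloudFormation template in the next section deploys this first function for you. As a rough sketch of what such a function does, it simply starts the crawler when the S3 event arrives; the environment variable name and the error handling below are assumptions for illustration.

import os
import boto3

glue = boto3.client('glue')

def lambda_handler(event, context):
    # Invoked by the S3 event when a new object lands in the raw zone.
    crawler_name = os.environ['CRAWLER_NAME']
    try:
        glue.start_crawler(Name=crawler_name)
    except glue.exceptions.CrawlerRunningException:
        # A previous upload already started the crawler; nothing more to do.
        pass
    return {'crawler': crawler_name}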

The AWS Glue ETL job converts the data to Apache Parquet format and stores it in the processed S3 bucket. You can modify the ETL job to achieve other objectives, like more granular partitioning, compression, or enriching of the data. Monitoring and notification is an integral part of the automation process. So as soon as the ETL job finishes, another CloudWatch rule sends you an email notification using an Amazon Simple Notification Service (Amazon SNS) topic. This notification indicates that your data was successfully processed.

In summary, this pipeline classifies and transforms your data, sending you an email notification upon completion.

Deploy the automated data pipeline using AWS CloudFormation

First, you use AWS CloudFormation templates to create all of the necessary resources. This removes opportunities for manual error, increases efficiency, and ensures consistent configurations over time.

Launch the AWS CloudFormation template with the following Launch stack button.

Be sure to choose the US East (N. Virginia) Region (us-east-1). Then enter the appropriate stack name, email address, and AWS Glue crawler name to create the Data Catalog. Add the AWS Glue database name to save the metadata tables. Acknowledge the IAM resource creation as shown in the following screenshot, and choose Create.

Note: It is important to enter your valid email address so that you get a notification when the ETL job is finished.

This AWS CloudFormation template creates the following resources in your AWS account:

  • Two Amazon S3 buckets to store both the raw data and processed Parquet data.
  • Two AWS Lambda functions: one to create the AWS Glue Data Catalog and another to publish to Amazon SNS topics.
  • An Amazon Simple Queue Service (Amazon SQS) queue for maintaining the retry logic.
  • An Amazon SNS topic to inform you that your data has been successfully processed.
  • Two CloudWatch Events rules: one rule on the AWS Glue crawler and another on the AWS Glue ETL job.
  • AWS Identity and Access Management (IAM) roles for accessing AWS Glue, Amazon SNS, Amazon SQS, and Amazon S3.

When the AWS CloudFormation stack is ready, check your email and confirm the SNS subscription. Choose the Resources tab and find the details.

Follow these steps to verify your email subscription so that you receive an email alert as soon as your ETL job finishes.

  1. On the Amazon SNS console, in the navigation pane, choose Topics. An SNS topic named SNSProcessedEvent appears in the display.

  2. Choose the topic ARN. The topic details page appears, listing the email subscription as Pending confirmation. Be sure to confirm the subscription for your email address as provided in the Endpoint column.

If you don’t see an email address, or the link is showing as not valid in the email, choose the corresponding subscription endpoint. Then choose Request confirmation to confirm your subscription. Be sure to check your email junk folder for the request confirmation link.

Configure an Amazon S3 bucket event trigger

In this section, you configure a trigger on a raw S3 bucket. So when new data lands in the bucket, you trigger GlueTriggerLambda, which was created in the AWS CloudFormation deployment.

To configure notifications:

  1. Open the Amazon S3 console.
  2. Choose the source bucket. In this case, the bucket name contains raws3bucket, for example, <stackname>-raws3bucket-1k331rduk5aph.
  3. Go to the Properties tab, and under Advanced settings, choose Events.

  4. Choose Add notification and configure a notification with the following settings:
  • Name – Enter a name of your choice. In this example, it is crawlerlambdaTrigger.
  • Events – Select the All object create events check box to create the AWS Glue Data Catalog when you upload the file.
  • Send to – Choose Lambda function.
  • Lambda – Choose the Lambda function that was created in the deployment section. Your Lambda function should contain the string GlueTriggerLambda.

See the following screenshot for all the settings. When you’re finished, choose Save.

For more details on configuring events, see How Do I Enable and Configure Event Notifications for an S3 Bucket? in the Amazon S3 Console User Guide.

Download the dataset

For this post, you use a publicly available New York green taxi dataset in CSV format. You upload monthly data to your raw zone and perform automated data cataloging using an AWS Glue crawler. After cataloging, an automated AWS Glue ETL job is triggered to transform the monthly green taxi data to Parquet format and store it in the processed zone.

You can download the raw dataset from the NYC Taxi & Limousine Commission trip record data site. Download the monthly green taxi dataset and upload only one month of data. For example, first upload only the green taxi January 2018 data to the raw S3 bucket.

Automate the Data Catalog with an AWS Glue crawler

One of the important aspects of a modern data lake is to catalog the available data so that it’s easily discoverable. To run ETL jobs or ad hoc queries against your data lake, you must first determine the schema of the data along with other metadata information like location, format, and size. An AWS Glue crawler makes this process easy.

After you upload the data into the raw zone, the Amazon S3 trigger that you created earlier in the post invokes the GlueTriggerLambda function. This function starts the AWS Glue crawler, which populates the AWS Glue Data Catalog with metadata inferred from the crawled data.

Open the AWS Glue console. You should see the database, table, and crawler that were created using the AWS CloudFormation template. Your AWS Glue crawler should appear as follows.

Browse to the table using the left navigation, and you will see the table in the database that you created earlier.

Choose the table name, and further explore the metadata discovered by the crawler, as shown following.

You can also view the columns, data types, and other details. In the following screenshot, the AWS Glue crawler has created the schema from the files available in Amazon S3 by determining each column name and its data type. You can use this schema to create an external table.

Author ETL jobs with AWS Glue

AWS Glue provides a managed Apache Spark environment to run your ETL jobs, with a pay-as-you-go model and no infrastructure to maintain.

Open the AWS Glue console and choose Jobs under the ETL section to start authoring an AWS Glue ETL job. Give the job a name of your choice, and note the name because you'll need it later. Choose the already created IAM role with the name containing <stackname>-GlueLabRole, as shown following. Keep the other default options.

AWS Glue generates the required Python or Scala code, which you can customize as per your data transformation needs. In the Advanced properties section, choose Enable in the Job bookmark list to avoid reprocessing old data.

On the next page, choose your raw Amazon S3 bucket as the data source, and choose Next. On the Data target page, choose the processed Amazon S3 bucket as the data target path, and choose Parquet as the Format.

On the next page, you can make schema changes as required, such as changing column names, dropping ones that you’re less interested in, or even changing data types. AWS Glue generates the ETL code accordingly.

Lastly, review your job parameters, and choose Save Job and Edit Script, as shown following.

On the next page, you can modify the script further as per your data transformation requirements. For this post, you can leave the script as is. In the next section, you automate the execution of this ETL job.

Automate ETL job execution

As the frequency of data ingestion increases, you will want to automate the ETL job to transform the data. Automating this process helps reduce operational overhead and frees your data engineering team to focus on more critical tasks.

AWS Glue is optimized for processing data in batches. You can configure it to process data in batches on a set time interval. How often you run a job is determined by how recent the end user expects the data to be and the cost of processing. For information about the different methods, see Triggering Jobs in AWS Glue in the AWS Glue Developer Guide.

First, you need to make one-time changes and configure your ETL job name in the Lambda function and the CloudWatch Events rule. On the console, open the ETLJobLambda Lambda function, which was created using the AWS CloudFormation stack.

Choose the Lambda function link that appears, and explore the code. Change the JobName value to the ETL job name that you created in the previous step, and then choose Save.
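
The deployed ETLJobLambda function already contains this logic; the following is only a simplified sketch of its core, a single start_job_run call. The environment-variable lookup is an assumption for illustration, since the actual function hard-codes the JobName value you just edited.

import os
import boto3

glue = boto3.client('glue')

def lambda_handler(event, context):
    # Invoked by CrawlerEventRule when the crawler succeeds; start the ETL job
    # that converts the newly cataloged raw data to Parquet.
    job_name = os.environ.get('JOB_NAME', 'Your-ETL-jobName')
    response = glue.start_job_run(JobName=job_name)
    return {'JobRunId': response['JobRunId']}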

As shown in the following screenshot, you will see a CloudWatch Events rule named CrawlerEventRule that is associated with an AWS Lambda function. When the CloudWatch Events rule receives a success status, it triggers the ETLJobLambda Lambda function.

Now you are all set to trigger your AWS Glue ETL job as soon as you upload a file in the raw S3 bucket. Before testing your data pipeline, set up the monitoring and alerts.

Monitoring and notification with Amazon CloudWatch Events

Suppose that you want to receive a notification over email when your AWS Glue ETL job is completed. To achieve that, the CloudWatch Events rule OpsEventRule was deployed from the AWS CloudFormation template in the data pipeline deployment section. This CloudWatch Events rule monitors the status of the AWS Glue ETL job and sends an email notification using an SNS topic upon successful completion of the job.

As the following image shows, you configure your AWS Glue job name in the Event pattern section in CloudWatch. The event triggers an SNS topic configured as a target when the AWS Glue job state changes to SUCCEEDED. This SNS topic then sends an email notification to the address that you provided in the deployment section.

Let’s make one-time configuration changes in the CloudWatch Events rule OpsEventRule to capture the status of the AWS Glue ETL job.

  1. Open the CloudWatch console.
  2. In the navigation pane, under Events, choose Rules. Choose the rule name that contains OpsEventRule, as shown following.

  3. In the upper-right corner, choose Actions, Edit.

  4. Replace Your-ETL-jobName with the ETL job name that you created in the previous step.

  5. Scroll down and choose Configure details. Then choose Update rule.

Now that you have set up an entire data pipeline in an automated way with the appropriate notifications and alerts, it’s time to test your pipeline. If you upload new monthly data to the raw Amazon S3 bucket (for example, upload the NY green taxi February 2018 CSV), it triggers the GlueTriggerLambda AWS Lambda function. You can navigate to the AWS Glue console, where you can see that the AWS Glue crawler is running.

Upon completion of the crawler, the CloudWatch Events rule CrawlerEventRule triggers your ETLJobLambda Lambda function. You can now see that the AWS Glue ETL job is running.

When the ETL job is successful, the CloudWatch Events rule OpsEventRule sends an email notification to you using an Amazon SNS topic, as shown following, completing the automation cycle.

Be sure to check your processed Amazon S3 bucket, where you will find transformed data processed by your automated ETL pipeline. Now that the processed data is ready in Amazon S3, you need to run the AWS Glue crawler on this Amazon S3 location. The crawler creates a metadata table with the relevant schema in the AWS Glue Data Catalog.

After the Data Catalog table is created, you can execute standard SQL queries using Amazon Athena and visualize the data using Amazon QuickSight. To learn more, see the blog post Harmonize, Query, and Visualize Data from Various Providers using AWS Glue, Amazon Athena, and Amazon QuickSight.

Conclusion

Having an automated serverless data lake architecture lessens the burden of managing data from its source to destination—including discovery, audit, monitoring, and data quality. With an automated data pipeline across organizations, you can identify relevant datasets and extract value much faster than before. The advantage of reducing the time to analysis is that businesses can analyze the data as it becomes available in real time. From the BI tools, queries return results much faster for a single dataset than for multiple databases.

Business analysts can now get their job done faster, and data engineering teams can free themselves from repetitive tasks. You can extend it further by loading your data into a data warehouse like Amazon Redshift or making it available for machine learning via Amazon SageMaker.

Additional resources

See the following resources for more information:

 


About the Author

Saurabh Shrivastava is a partner solutions architect and big data specialist working with global systems integrators. He works with AWS partners and customers to provide them with architectural guidance for building scalable architecture in hybrid and AWS environments. He enjoys spending time with his family outdoors and traveling to new destinations to discover new cultures.

Luis Lopez Soria is a partner solutions architect and serverless specialist working with global systems integrators. He works with AWS partners and customers to help them with adoption of the cloud operating model at a large scale. He enjoys doing sports in addition to traveling around the world exploring new foods and cultures.

Chirag Oswal is a partner solutions architect and AR/VR specialist working with global systems integrators. He works with AWS partners and customers to help them with adoption of the cloud operating model at a large scale. He enjoys video games and travel.

From Poll to Push: Transform APIs using Amazon API Gateway REST APIs and WebSockets

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/from-poll-to-push-transform-apis-using-amazon-api-gateway-rest-apis-and-websockets/

This post is courtesy of Adam Westrich – AWS Principal Solutions Architect and Ronan Prenty – Cloud Support Engineer

Want to deploy a web application and give a large number of users controlled access to data analytics? Or maybe you have a retail site that is fulfilling purchase orders, or an app that enables users to connect data from potentially long-running third-party data sources. Similar use cases exist across every imaginable industry and entail sending long-running requests to perform a subsequent action. For example, a healthcare company might build a custom web portal for physicians to connect and retrieve key metrics from patient visits.

This post is aimed at optimizing the delivery of information without needing to poll an endpoint. First, we outline the current challenges with the consumer polling pattern and alternative approaches to solve these information delivery challenges. We then show you how to build and deploy a solution in your own AWS environment.

Here is a glimpse of the sample app that you can deploy in your own AWS environment:

What’s the problem with polling?

Many customers need to implement the delivery of long-running activities (such as a query to a data warehouse or data lake, or retail order fulfillment). They may have developed a polling solution that looks similar to the following:

  1. POST sends a request.
  2. GET returns an empty response.
  3. Another… GET returns an empty response.
  4. Yet another… GET returns an empty response.
  5. Finally, GET returns the data for which you were looking.

The challenges of traditional polling methods

  • Unnecessary chattiness and cost due to polling for result sets—Every time your frontend polls an API, it’s adding costs by leveraging infrastructure to compute the result. Empty polls are just wasteful!
  • Hampered mobile battery life—One of the top contributors of apps that eat away at battery life is excessive polling. Make sure that your app isn’t on your users’ Top App Battery Usage list-of-shame that could result in deletion.
  • Delayed data arrival due to polling schedule—Some approaches to polling include an incremental backoff to limit the number of empty polls. This sometimes results in a delay between data readiness and data arrival.

But what about long polling?

  • User request deadlocks can hamper application performance—Long synchronous user responses can lead to unexpected user wait times or UI deadlocks, which can affect mobile devices especially.
  • Memory leaks and consumption could bring your app down—Keeping long-running task queries open may overburden your backend and create failure scenarios, which may bring down your app.
  • HTTP default timeouts across browsers may result in inconsistent client experience—These timeouts vary across browsers, and can lead to an inconsistent experience across your end users. Depending on the size and complexity of the requests, processing can last longer than many of these timeouts and take minutes to return results.

Instead, create an event-driven architecture and move your APIs from poll to push.

Asynchronous push model

To create optimal UX experiences, frontend developers often strive to create progressive and reactive user experiences. Users can interact with frontend objects (for example, push buttons) with little lag when sending requests and receiving data. But frontend developers also want users to receive timely data, without sacrificing additional user actions or performing unnecessary processing.

The birth of microservices and the cloud over the past several years has enabled frontend developers and backend service designers to think about these problems in an asynchronous manner. This enables the virtually unlimited resources of the cloud to choreograph data processing. It also enables clients to benefit from progressive and reactive user experiences.

This is a fresh alternative to the synchronous design pattern, which often relies on client consumers to act as the conductor for all user requests. The following diagram compares the flow of communication between the two patterns.

Asynchronous orchestration can be easily choreographed using the workflow definition, monitoring, and tracking console with AWS Step Functions state machines. Break up services into functions with AWS Lambda and track executions within a state diagram, like the following:

With this approach, consumers send a request, which initiates downstream distributed processing. The subsequent steps are invoked according to the state machine workflow and each execution can be monitored.

Okay, but how does the consumer get the result of what they’re looking for?

Multiple approaches to this problem

There are different approaches for consumers to retrieve their resulting data. In order to create the optimal solution, there are several questions that a service owner may need to ask.

Is there known trust with the client (such as another microservice)?

If the answer is Yes, one approach is to use Amazon SNS. That way, consumers can subscribe to topics and have data delivered using email, SMS, or even HTTP/HTTPS as an event subscriber (that is, webhooks).

With webhooks, the consumer creates an endpoint where the service provider can issue a callback using a small amount of consumer-side resources. The consumer is waiting for an incoming request to facilitate downstream processing.

  1. Send a request.
  2. Subscribe the consumer to the topic.
  3. Open the endpoint.
  4. SNS sends the POST request to the endpoint.

If the trust answer is No, then clients may not be able to open a lightweight HTTP webhook after sending the initial request. In that case, consider an alternative framework.

Are you restricted to only using a REST protocol?

Some applications can only run HTTP requests. This can be due to the age of the technology or the browsers used by end users, or to security requirements that block other protocols.

In these scenarios, customers may use a GET method as a best practice, but we still advise avoiding polling. In these scenarios, some design questions may be: Does data readiness happen at a predefined time or duration interval? Or can the user experience tolerate time between data readiness and data arrival?

If the answer to both of these is Yes, then consider trying to send GET calls to your RESTful API one time. For example, if a job averages 10 minutes, make your GET call at 10 minutes after the submission. Sounds simple, right? Much simpler than polling.

Should I use GraphQL, a WebSocket API, or another framework?

Each framework has tradeoffs.

If you want a more flexible query schema, you may gravitate to GraphQL, which follows a “data-driven UI” approach. If data drives the UI, then GraphQL may be the best solution for your use case.

AWS AppSync is a serverless GraphQL engine that supports the heavy lifting of these constructs. It offers functionality such as AWS service integration, offline data synchronization, and conflict resolution. With GraphQL, there's a construct called Subscriptions, where clients can subscribe to event-based notifications when data changes.

Amazon API Gateway makes it easy for developers to deploy secure APIs at scale. With the recent introduction of API Gateway WebSocket APIs, web, mobile clients, and backend services can communicate over an established WebSocket connection. This also allows clients to be more reactive to data updates and only do work one time after an update has been received over the WebSocket connection.

The typical frontend design approach is to create a UI component that is updated when the results of the given procedure are complete. This avoids a full webpage refresh and improves the customer's user experience.

Because many companies have elected to use the REST framework for creating API-driven, tightly bound service contracts, a RESTful interface can be used to receive and validate the request. It can also provide a status endpoint. In addition, it offers flexibility in delivering the result to a variety of clients, alongside the WebSocket API.

Poll-to-push solution with API Gateway

Imagine a scenario where you want to be updated as soon as the data is created. Instead of the traditional polling methods described earlier, use an API Gateway WebSocket API. That pushes new data to the client as it’s created, so that it can be rendered on the client UI.

Alternatively, a WebSocket server can be deployed on Amazon EC2. With this approach, your server is always running to accept and maintain new connections. In addition, you manage the scaling of the instance manually at times of high demand.

By using an API Gateway WebSocket API in front of Lambda, you don’t need a machine to stay always on, eating away your project budget. API Gateway handles connections and invokes Lambda whenever there’s a new event. Scaling is handled on the service side. To update our connected clients from the backend, we can use the API Gateway callback URL. The AWS SDKs make communication from the backend easy. As an example, see the Boto3 sample of post_to_connection:

import boto3

# Use a Lambda layer or deployment package to include a recent boto3 version
# that supports the apigatewaymanagementapi client.
...
# {{api_region}} and {{api_url}} are placeholders; the endpoint URL is the
# WebSocket callback URL: https://{api-id}.execute-api.{region}.amazonaws.com/{stage}
apiManagement = boto3.client('apigatewaymanagementapi',
                             region_name={{api_region}},
                             endpoint_url={{api_url}})
...
# Push {{message}} to the client identified by {{connectionId}}.
response = apiManagement.post_to_connection(Data={{message}},
                                            ConnectionId={{connectionId}})
...

Solution example

To create this more optimized solution, create a simple web application that enables a user to make a request against a large dataset. It returns the results in a flat file over WebSocket. In this case, we’re querying, via Amazon Athena, a data lake on S3 populated with AWS Twitter sentiment (Twitter: @awscloud or #awsreinvent). However, this solution could apply to any data store, data mart, or data warehouse environment, or the long-running return of data for a response.

For the frontend architecture, create this web application using a JavaScript framework (such as jQuery). Use API Gateway to accept a REST API for the data and then open a WebSocket connection on the client to facilitate the return of results:

  1. The client sends a REST request to API Gateway. This invokes a Lambda function that starts the Step Functions state machine execution. The function also returns a task token for the open connection activity worker. The function then returns the execution ARN and task token to the client.
  2. Using the data returned by the REST request, the client connects to the WebSocket API and sends the task token to the WebSocket connection.
  3. The WebSocket notifies the Step Functions state machine that the client is connected. The Lambda function completes the OpenConnection task by validating the task token and sending a success message (see the sketch following this list).
  4. After RunAthenaQuery and OpenConnection are successful, the state machine updates the connected client over the WebSocket API that their long-running job is complete. It uses the REST API call post_to_connection.
  5. The client receives the update over their WebSocket connection using the IssueCallback Lambda function, with the callback URL from the API Gateway WebSocket API.
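
To make step 3 of this flow concrete, here is a minimal sketch of an OpenConnection-style Lambda handler behind the WebSocket API. The event shape (a JSON body carrying the task token, plus the connection ID in the request context) is an assumption for illustration; the application in the AWS Serverless Application Repository contains the full implementation.

import json
import boto3

stepfunctions = boto3.client('stepfunctions')

def lambda_handler(event, context):
    # The client sends its Step Functions task token over the new WebSocket
    # connection; completing the task tells the state machine that a live
    # connection now exists for pushing results.
    body = json.loads(event['body'])
    connection_id = event['requestContext']['connectionId']
    stepfunctions.send_task_success(
        taskToken=body['taskToken'],
        output=json.dumps({'connectionId': connection_id}),
    )
    return {'statusCode': 200}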

In this example application, the data response is an S3 presigned URL composed of results from Athena. You can then configure your frontend to use that S3 link to download to the client.
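
Generating that link is a single SDK call. The following sketch assumes the backend already knows the bucket and key of the Athena query result object; both names are placeholders.

import boto3

s3 = boto3.client('s3')

def presign_result(bucket, key, expires_in=900):
    # Return a time-limited download URL for the Athena query result object.
    return s3.generate_presigned_url(
        'get_object',
        Params={'Bucket': bucket, 'Key': key},
        ExpiresIn=expires_in,
    )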

Why not just open the WebSocket API for the request?

While this approach can work, we advise against it for this use case. Break the required interfaces for this use case into three processes:

  1. Submit a request.
  2. Get the status of the request (to detect failure).
  3. Deliver the results.

For the submit request interface, RESTful APIs have strong controls to ensure that user requests for data are validated and given guardrails for the intended query purpose. This helps prevent rogue requests and unforeseen impacts to the analytics environment, especially when exposed to a large number of users.

With this example solution, you’re restricting the data requests to specific countries in the frontend JavaScript. Using the RESTful POST method for API requests enables you to validate data as query string parameters, such as the following:

https://<apidomain>.amazonaws.com/Demo/CreateStateMachineAndToken?Country=France

API Gateway models and mapping templates can also be used to validate or transform request payloads at the API layer before they hit the backend. Use models to ensure that the structure or contents of the client payload are the same as you expect. Use mapping templates to transform the payload sent by clients to another format, to be processed by the backend.

This REST validation framework can also be used to detect header information about WebSocket browser compatibility. While WebSocket has many advantages, not all browsers support it, especially older browsers. Therefore, a REST API request layer can pass this browser metadata and determine whether a WebSocket API can be opened.

Because a REST interface is already created for submitting the request, you can easily add another GET method if the client must query the status of the Step Functions state machine. That might be the case if a health check in the request is taking longer than expected. You can also add another GET method as an alternative access method for REST-only compatible clients.

If low-latency request and retrieval are an important characteristic of your API and there aren’t any browser-compatibility risks, use a WebSocket API with JSON model selection expressions to protect your backend with a schema.

In the spirit of picking the best tool for the job, use a REST API for the request layer and a WebSocket API to listen for the result.

This solution, although secure, is an example for how a non-polling solution can work on AWS. At scale, it may require refactoring due to the cross-talk at high concurrency that may result in client resubmissions.

To discover, deploy, and extend this solution into your own AWS environment, follow the PollToPush instructions in the AWS Serverless Application Repository.

Conclusion

When application consumers poll for long-running tasks, it can be a wasteful, detrimental, and costly use of resources. This post outlined multiple ways to refactor the polling method. Use API Gateway to host a RESTful interface, Step Functions to orchestrate your workflow, Lambda to perform backend processing, and an API Gateway WebSocket API to push results to your clients.

This Is My Architecture: Mobile Cryptocurrency Mining

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/this-is-my-architecture-mobile-cryptocurrency-mining/

In North America, approximately 95% of adults over the age of 25 have a bank account. In the developing world, that number is only about 52%. Cryptocurrencies can provide a platform for millions of unbanked people in the world to achieve financial freedom on a more level financial playing field.

Electroneum, a cryptocurrency company located in England, built its cryptocurrency mobile back end on AWS and is using the power of blockchain to unlock the global digital economy for millions of people in the developing world.

Electroneum's cryptocurrency mobile app allows Electroneum customers in developing countries to transfer ETN (the Electroneum cryptocurrency) and pay for goods using their smartphones. Listen in to the discussion between AWS Solutions Architect Toby Knight and Electroneum CTO Barry Last as they explain how the company built its solution. Electroneum's app is a web application that uses a feedback loop between its web servers and AWS WAF (a web application firewall) to automatically block malicious actors. The system then uses Athena, with a gamified approach, to provide an additional layer of blocking to prevent DDoS attacks. Finally, Electroneum built a serverless, instant payments system using Amazon API Gateway, AWS Lambda, and Amazon DynamoDB to help its customers avoid the usual delays in confirming cryptocurrency transactions.

 

Amazon Cognito for Alexa Skills User Management

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/amazon-cognito-for-alexa-skills-user-management/

This post is courtesy of Tom Moore, Solutions Architect – AWS

If your Alexa skill is a general information skill, such as a random facts skill or a news feed, you can provide information to any user who has an Alexa enabled device with your skill turned on. However, sometimes you need to know who the user is before you can provide information to them. You can fulfill this user management scenario with Amazon Cognito user pools.

This blog post will show you how to set up an Amazon Cognito user pool and how to use it to perform authentication for both your Alexa skill and a webpage.

Getting started

In order to complete the steps in this blog post you will need the following:

  • An AWS account
  • An Amazon developer account
  • A basic understanding of Amazon Alexa skill development

This example will use a sample Alexa skill deployed from one of the available skill templates. To fully develop your own Alexa skill, you will need a professional code editor or IDE, as well as knowledge of Alexa skill development. It is beyond the scope of this blog post to cover these details.

Before you begin, consider the set of services that you will use and their availability. To implement this solution, you will use Amazon Cognito for user accounts and AWS Lambda for the Alexa function.

Today, AWS Lambda supports calls from Alexa in the following regions:

  • Asia Pacific (Tokyo)
  • EU (Ireland)
  • US East (N. Virginia)
  • US West (Oregon)

These four regions also support Amazon Cognito. While it is possible to use Amazon Cognito in a different region than your Lambda function, I recommend choosing one of the four listed regions to deploy your entire solution for simplicity.

Setting up Amazon Cognito

To set up Amazon Cognito, you’ll need to create a user pool, create an Alexa client, and set up your authentication UI.

Create your Amazon Cognito user pool

  1. Sign in to the Amazon Cognito console. You might be prompted for your AWS credentials.
  2. From the console navigation bar, choose one of the four regions listed above. For the purposes of this blog, I’ll use US East (N. Virginia).
  3. Choose Manage User Pools.
  4. Choose Create a user pool, and provide a name for your user pool. Remember that user pools may be used across multiple applications and platforms including web, mobile, and Alexa. The pool name does not have to be globally unique, but it should be unique in your account so you can easily find the pool when needed. I have named my user pool "Alexa Demo."
  5. After you name your pool, choose Step through settings. You can accept the defaults for the remaining steps to set up your user pool, with the following exceptions:
    • Choose email address or phone number as the sign-in method, and then choose Allow both email addresses and phone numbers.
    • Enable Multi-Factor Authentication (MFA). You can use Amazon Cognito to enforce Multi-Factor Authentication for your users. Amazon Cognito also allows you to validate email addresses and phone numbers when the user is created. The verification process for phone numbers requires that Amazon Cognito is able to access Amazon Simple Notification Service (Amazon SNS) in order to dispatch the SMS message for phone number verification. This access is granted through the use of an AWS Identity and Access Management (IAM) service role. The Amazon Cognito setup process can automatically create this role for you.
  6. To set up Multi-Factor Authentication:
    • Under Do you want to enable Multi-Factor Authentication (MFA), choose Optional.
    • Choose SMS text message as a second authentication factor, and then choose the options you want to be verified.
    • Choose Create Role, and then choose Next Step.
    • For more information, see Adding Multi-Factor Authentication (MFA) to a User Pool. Because the verification process sends SMS messages, some costs will be incurred on your account. If you have not already done so, you will need to request a spending increase on your account to accommodate those charges. To learn more about costs for SMS messages, see SMS Text Messages MFA.
  7. Review the selections that you have made. If you are happy with the settings that you have selected, choose Create Pool.

Create the Alexa client

By completing the steps above, you will have created an Amazon Cognito user pool. The next task in setting up account linking is to create the Alexa client definition inside the Amazon Cognito user pool.

  1. From the Amazon Cognito console, choose Manage User Pools. Select the user pool you just created.
  2. From the General settings menu, choose App Clients to set up applications that will connect to your Amazon Cognito user pool.
  3. Choose Add an App Client, and provide the App client name. In this example, I have chosen "Alexa." Leave the rest of the options set to default and choose Create App Client to generate the client record for Alexa to use. This process creates an app client ID and a secret.
    To learn more, see Configuring a User Pool App Client.

Set up your Authentication UI

Amazon Cognito can set up and manage the Authentication UI for your application so that you don’t have to host your own sign-in and sign-up UI for your Alexa application.

  1. From the App integration menu, choose Domain name.
  2. For this example, I will use an Amazon Cognito domain. Provide a subdomain name and choose Check Availability. If the option is available, choose Save Changes.

Setting up the Alexa skill

Now you can create the Alexa skill and link it back to the Amazon Cognito user pool that you created.

For step-by-step instructions for creating a new Alexa skill, see Create a New Skill in the Alexa documentation. Follow those instructions, with the following specific selections:

Under Choose a model to add to your skill, keep the default option of Custom.


Under Choose a method to host your skill’s back end resources, keep the default selection of Self Hosted.

For a custom skill, you can choose a predefined skill template for the back end code for your skill. For this example, I’ll use a Fact Skill template as a starting point. The skill template prepopulates the Lambda function that your Alexa skill uses.

After you create your sample skill, you’ll need to complete a few basic operations:

  • Set the invocation name of the skill
  • Prepare a Lambda function to handle the skill invocation
  • Connect the Alexa skill to your Lambda function
  • Test your skill

A full description of these steps is beyond the scope of this blog post. To learn more, see Manage Skills in the Developer Console. Once you have completed these steps, return to this post to continue linking your skill with Amazon Cognito.

Linking Alexa with Amazon Cognito

To link your Alexa skill with Amazon Cognito user pools, you’ll need to update both the Amazon Cognito and Alexa interfaces with data from the other service. I recommend that you have both interfaces open in different tabs of your web browser to make it easy to move back and forth between the two services.

  1. In Amazon Cognito, open the user pool that you created. Under General Settings, choose App Clients. Next, choose Show Details in the section for the Alexa client that you set up earlier. Make a note of the App client ID and the App client secret. These will be needed to configure Alexa account linking.
  2. Switch over to your Alexa developer account and open the skill that you are linking to Amazon Cognito. Choose Account Linking.
  3. Select the option to allow users to link accounts. Leave the default option for an Auth Code Grant selected. The Authorization URI will be made up of the following template:

    https://{Sub-Domain}.auth.{Region}.amazoncognito.com/oauth2/authorize?response_type=code&redirect_uri=https://pitangui.amazon.com/api/skill/link/{Vendor ID}

  4. Replace the {Sub-Domain} with the subdomain that you selected when you set up your Amazon Cognito user pool. In my example, it was "mooretom-alexademo".
  5. Replace {Vendor ID} with your specific vendor ID for your Alexa development account. The easiest way to find this is to scroll down to the bottom of the account linking page. Your Vendor ID is the final piece of information in the redirect URLs.
    Redirect URLs
  6. Replace {Region} with the name of the Region you are deploying your resources into. In my example, it was us-east-1.
  7. The Access Token URI will be made up of the following template:
    https://{Sub-Domain}.auth.{region}.amazoncognito.com/oauth2/token

  8. Enter the app client ID and the app client secret that you noted above, or return to the Amazon Cognito tab to copy and paste them.
    Grant Auth Code
  9. Choose Save at the top of the page. Make a note of the redirect URLs at the bottom of the page, as these will be required to finish the Amazon Cognito configuration in the next step.
  10. Switch back to your Amazon Cognito user pool. Under App Integration, choose App Client Settings. You will see the integration settings for the Alexa client in the details panel on the right.
  11. Under Enabled Identity Providers, choose Cognito User Pool.
  12. Under Callback URL(s), enter the three callback URLs from your Alexa skill page. For example, here are all three URLs separated by commas:
    https://alexa.amazon.co.jp/api/skill/link/{Vendor ID},
    https://layla.amazon.com/api/skill/link/{Vendor ID},
    https://pitangui.amazon.com/api/skill/link/{Vendor ID}

    The Sign Out URL will follow this template:

    https://{SubDomain}.auth.us-east-1.amazoncognito.com/logout?response_type=code

  13. Under Allowed OAuth Flows, select Authorization code grant.
  14. Under Allowed OAuth Scopes, select phone, email, and openid.
    Enable Identity Providers
  15. Choose Save Changes.

Testing your Alexa skill

After you have linked Alexa with Amazon Cognito, return to the Alexa developer console and build your model. Then log into the Alexa application on your mobile phone and enable the skill. When the skill is enabled, you will be able to configure access and create a new user with phone number authentication included automatically.

After going through the account creation steps, you can return to your Amazon Cognito user pool and see the new user you created.

New Customer

Conclusion

By completing the steps in this post, you have leveraged Amazon Cognito as a source of authentication for your Amazon Alexa skill. Amazon Cognito provides user authentication as well as sign-in and sign-up functionality without requiring you to write any code. You can now use the Amazon Cognito user ID to personalize the user experience for your Alexa skill. You can also use Amazon Cognito to authenticate your users to a companion application or website.
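For example, a minimal Python sketch of retrieving the linked user's profile could call the user pool's userInfo endpoint with the token Alexa passes to your skill. This is an illustration, not code from the skill template: it assumes the access token issued during account linking is delivered at context.System.user.accessToken in the Alexa request envelope, and it reuses the "mooretom-alexademo" domain and us-east-1 Region configured earlier.

import json
import urllib.request

# Use the Amazon Cognito domain that you configured for the hosted UI.
USER_INFO_ENDPOINT = 'https://mooretom-alexademo.auth.us-east-1.amazoncognito.com/oauth2/userInfo'

def get_linked_user(event):
    # The token is only present after the customer completes account linking.
    access_token = (event.get('context', {})
                         .get('System', {})
                         .get('user', {})
                         .get('accessToken'))
    if not access_token:
        return None

    # The userInfo endpoint returns the claims allowed by the phone, email,
    # and openid scopes that were selected during app client setup.
    request = urllib.request.Request(
        USER_INFO_ENDPOINT,
        headers={'Authorization': 'Bearer {}'.format(access_token)})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())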

Outbound Voice Calling with Amazon Pinpoint

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/outbound-voice-calling-with-amazon-pinpoint/

This post is courtesy of Tom Moore, Solutions Architect – AWS

With the recent extension of Amazon Pinpoint to allow an outgoing voice channel, customers can now build applications that include voice messaging to their users. Potential use cases include two-factor authentication via voice for your website and automated reminders of upcoming appointments. This blog post guides you through the process of setting up this functionality.

The Amazon Pinpoint voice channel allows for outbound calls only. If your use case requires additional capabilities such as an interactive voice response (IVR) system, you need to use Amazon Connect instead for your messaging.

Prerequisites

As part of this configuration, you set a default AWS Region. You should set the default Region to the Region where Amazon Pinpoint is available. Valid Regions are currently US East (N. Virginia), US West (Oregon), EU (Ireland), and EU (Frankfurt). If you have already installed and configured the AWS CLI tools and your default Region doesn’t support Amazon Pinpoint, do one of the following:

  • Run the aws configure command and change the default Region
  • Specify the --region switch on any commands that you issue

The Region that you select for the AWS CLI must be the same region you select in the AWS Management Console. To change the Region on the console, choose the down arrow next to the displayed Region (N. Virginia in the following image) and select the new Region.

Region Selector

Services

This blog post touches on the following AWS services:

  • Amazon Pinpoint
  • AWS Lambda
  • AWS Serverless Application Model (AWS SAM) and AWS CloudFormation
  • Amazon S3
Because the code for this blog post is in NodeJS, basic familiarity with JavaScript is helpful for understanding the code and making changes to it.

Pricing

This blog post uses two features that aren’t covered under the AWS Free Tier: Amazon Pinpoint long codes (virtual phone numbers) for messaging and Amazon Pinpoint voice messaging. For pricing information for these features, see Amazon Pinpoint long code pricing and Amazon Pinpoint voice message pricing.

For example, suppose that you set up the Amazon Pinpoint application in a US Region with a single phone number and make 10 minutes of outbound calls to US phone numbers. You incur the following charges.

Item            Quantity      Unit Cost    Total
Long codes      1             $1.00        $1.00
Call charges    10 minutes    $0.013       $0.13
Total                                      $1.13

Creating an Amazon S3 bucket

To deploy your AWS SAM application, you need an Amazon S3 bucket to store the deployment files. Create a bucket in your account and note the bucket name; it replaces YOUR_BUCKET in the commands later in this post. This bucket is used for temporary storage of your AWS SAM deployments and should not be publicly accessible.

On the Amazon S3 console, choose Create Bucket.

Create Bucket

Enter a name for the bucket. The name must conform to the Amazon S3 bucket naming requirements. Choose the Region where you will be deploying your Lambda function and using Amazon Pinpoint. Keep the rest of the defaults and choose Create.

Create Bucket Options

If you prefer, you can use the following command with the AWS CLI to create the S3 bucket in your account.

aws s3 mb s3://{Bucket Name}

Setting up Amazon Pinpoint

The first step in enabling outbound calling is to set up Amazon Pinpoint.

On the AWS Management Console, under Customer Engagement, choose Amazon Pinpoint. Enter a project name and choose Create a project.

Amazon Pinpoint

If you have already created Amazon Pinpoint projects in this Region, you get a project-list page instead of a getting-started page, as shown in the following image. On this page, choose Create a project and enter a project name.

Create a Project

Now you can select the project features that you want to enable. On the Configure features page, for SMS and voice, choose Configure.

Configure features

On the Set up SMS page, expand the Advanced configurations section and choose Request long codes.

Set up SMS

On the Long code specifications page, select the country that you want to request the long code (10-digit phone number) for. Keep the rest of the defaults and choose Request long codes.

Long Code Specifications

You’re assigned a phone number and returned to the Amazon Pinpoint configuration page. The phone number assigned to your application appears under Number settings, as shown in the following image. You can send voice messages only from a long code that your account owns.

SMS and Voice

This completes the Amazon Pinpoint setup.

Creating the application

AWS SAM provides a more streamlined process for creating serverless applications. The AWS SAM CLI also provides a convenient mechanism for packaging and deploying your serverless applications. For the code in this blog post, see Amazon Pinpoint Call Generator on GitHub. You can also deploy this application through the AWS Serverless Application Repository. For more information, see Amazon Pinpoint Call Generator.

Once you have a copy of the code, you need to make a few changes using your favorite text editor or IDE.

Modifying the template file

The template file, template.yaml, defines your AWS SAM application. Specifically, the template defines two resources: an IAM role for your serverless function and the serverless function itself.

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Serverless application to trigger outbound calls from Pinpoint.
    
Globals:
    Function:
        Timeout: 30

Resources:
  CallGeneratorFunctionIamRole: 
    Type: AWS::IAM::Role
    Properties: 
      RoleName: PinpointCallGenerator-Role
      AssumeRolePolicyDocument: 
        Version: '2012-10-17'
        Statement: 
        - Effect: Allow
          Principal: 
            Service: lambda.amazonaws.com
          Action: 
          - sts:AssumeRole
      Path: '/'
      Policies: 
      - PolicyName: logs
        PolicyDocument: 
          Statement: 
          - Effect: Allow
            Action: 
            - logs:CreateLogGroup
            - logs:CreateLogStream
            - logs:PutLogEvents
            Resource: arn:aws:logs:*:*:*
      - PolicyName: Pinpoint
        PolicyDocument: 
          Statement: 
          - Effect: Allow
            Action: 
            - sms-voice:*
            Resource: '*'

  CallGeneratorFunction:
    Type: AWS::Serverless::Function 
    Properties:
      CodeUri: src/
      Handler: app.lambda_handler
      Runtime: nodejs8.10
      FunctionName: PinpointCallGenerator
      Role: !GetAtt CallGeneratorFunctionIamRole.Arn
      Environment: 
        Variables:
          LongCode: '[YOUR_LONG_CODE_HERE]'
          Language: 'en-US' #Update this for different language
          Voice: 'Joanna'   #Update this for different voices
            
Outputs:
    CallGeneratorLambdaFunction:
      Description: "Lambda function to trigger calls"
      Value: !GetAtt CallGeneratorFunction.Arn

    CallGeneratorFunctionIamRole:
      Description: "IAM Role created for this function"
      Value: !GetAtt CallGeneratorFunctionIamRole.Arn

The CallGeneratorFunctionIamRole IAM role allows the Lambda function to create CloudWatch Logs entries for monitoring the execution of your Lambda function and to call the Amazon Pinpoint voice service.

The Environment section of the CallGeneratorFunction definition sets the environment parameters that are provided to your Lambda function. By using environment variables, you can easily change the configuration for how your application makes calls without having to update your code.

Update the LongCode parameter to the number that you reserved through Amazon Pinpoint. In Amazon Pinpoint, the number appears as +1 123-456-7890, but in the template, you can’t use spaces or punctuation in the number: +11234567890.

Optionally, you can update the Language and Voice parameters to reflect different cultures. For valid options for these parameters, see Voices in Amazon Polly.

Understanding the source file

The main source file is app.js. It contains the NodeJS code for the application.

The exports line defines a standard Lambda handler that is called from the Lambda runtime. The triggerCall function handles the call to Amazon Pinpoint asynchronously.

const AWS = require('aws-sdk');
var pinpointsmsvoice = new AWS.PinpointSMSVoice({apiVersion: '2018-09-05'});

function triggerCall (eventData) {
    return new Promise (resolve => {
        var parms = {
            Content: {
                SSMLMessage: {
                    LanguageCode : process.env.Language,
                    Text : eventData.Message,
                    VoiceId: process.env.Voice
                }
            },
            OriginationPhoneNumber: process.env.LongCode,
            DestinationPhoneNumber: eventData.PhoneNumber
        };

        console.log ("Call Parameters: ", JSON.stringify(parms));
        pinpointsmsvoice.sendVoiceMessage (parms, function (err, data) {
            if (err) {
                console.log ("Error : "+ err.message);
                resolve(eventData.PhoneNumber + " " + err.message);
            }
            else {
                console.log (data);
                resolve(eventData.PhoneNumber + " OK");
            }
        });
    });
}

exports.lambda_handler = async (event, context, callback) => {
    console.log ("In Function - lambda_handler")
    try {
        var result = await triggerCall (event);
    }
    catch (err) {
        console.log(err);
        callback(err, null);
    }
};

The parms structure defines the standard payload that is passed to Amazon Pinpoint to trigger a voice phone call. In this case, the parameters are all extracted from either the message payload or the environment variables defined in our AWS SAM template. We’re expecting the message to be passed in as a Speech Synthesis Markup Language (SSML) payload.

var parms = {
    Content: {
       SSMLMessage: {
            LanguageCode : process.env.Language,
            Text : eventData.Message,
            VoiceId: process.env.Voice
        }
    },
    OriginationPhoneNumber: process.env.LongCode,
    DestinationPhoneNumber: eventData.PhoneNumber
};

The following code sends the parameters off to Amazon Pinpoint to trigger the voice call and then resolves the asynchronous call.

pinpointsmsvoice.sendVoiceMessage (parms, function (err, data) {
    if (err) {
        console.log ("Error : "+ err.message);
        resolve(eventData.PhoneNumber + " " + err.message);
    }
    else {
        console.log (data);
        resolve(eventData.PhoneNumber + " OK");
    }
});

Packaging and deploying the application

Deploying an AWS SAM application requires the following commands.

sam validate

This command verifies that your template is valid and free from errors.

sam package --template-file template.yaml --output-template-file packaged.yaml --s3-bucket [YOUR_BUCKET]

This command packages up your resources into a zip file and uploads the resulting files to your S3 bucket in preparation for deployment. The command also creates the packaged.yaml template file, which contains the details necessary to deploy your application via AWS CloudFormation.

sam deploy --template-file packaged.yaml --stack-name pinpoint-call-generator --capabilities CAPABILITY_NAMED_IAM

This command deploys your packaged files using AWS CloudFormation.

After all commands have completed, your function is ready to test.

Testing the application

After you have deployed your application, you can test it on the Lambda console. Sign in to the AWS Management Console and then choose or search for Lambda.

On the Lambda console, choose the function’s name to open it.

Choose Your Lambda Function

On the function’s page, choose Test.

Choose Test

When you first choose Test, an editor opens. Here you can configure the payload that Lambda passes your function as part of the test call.

Configure Test Event

Replace the default text with the following.

{
    "Message" : "<speak>This is a text from <emphasis>Pinpoint</emphasis> using SSML. <break time='1s' /> I repeat. This is a text from <emphasis>Pinpoint</emphasis> using SSML.</speak>",
    "PhoneNumber" : "+11234567890"
}

The Message portion of the payload is defined in SSML. For more information about SSML, see Speech Synthesis Markup Language (SSML) Reference.

Update the PhoneNumber value with the phone number that you want to call and enter a name for your test payload. To save the configured payload to use in your tests, choose Save.

After the configuration panel closes, choose Test. Amazon Pinpoint calls your phone number and reads the message out.

Conclusion

This blog post walked you through the basics of setting up outbound calling using Amazon Pinpoint. You can now trigger the Lambda function with any of the standard Lambda event triggers or with the AWS SDK in mobile or web applications. For example, you could provide a one-time password to users, trigger reminders for appointments, or notify someone when a file arrives in an S3 bucket.

The provided function code is intended to respond to single message-triggering events. These include application logic, files arriving in S3, or scheduled reminders. You need to make additional changes to support bulk event sources such as Amazon SQS or streaming sources such as Amazon DynamoDB streams and Amazon Kinesis. For more information about Lambda event sources, see Supported Event Sources.

If your use case requires additional resiliency, you might want to use Amazon SNS or Amazon SQS to deliver messages to Lambda. If your customers are from an international audience, you might consider passing the language and the voice through the event and updating the code to retrieve those values.
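For instance, a companion application could trigger a call through the AWS SDK. The following sketch (Python with Boto3) invokes the function asynchronously; the function name comes from the AWS SAM template, and the message and phone number are placeholders.

import json
import boto3

lambda_client = boto3.client('lambda')

# Hypothetical payload: Message must be valid SSML, and PhoneNumber a number you are allowed to call.
payload = {
    "Message": "<speak>This is a reminder of your appointment tomorrow at 10 AM.</speak>",
    "PhoneNumber": "+11234567890"
}

# 'Event' invokes the function asynchronously; use 'RequestResponse' to wait for the result.
lambda_client.invoke(
    FunctionName='PinpointCallGenerator',
    InvocationType='Event',
    Payload=json.dumps(payload)
)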

Using AWS Lambda and Amazon SNS to Get File Change Notifications from AWS CodeCommit

Post Syndicated from Eason Cao original https://aws.amazon.com/blogs/devops/using-aws-lambda-and-amazon-sns-to-get-file-change-notifications-from-aws-codecommit/

Notifications are an important part of DevOps workflows. Although you can set them up from any stage in the CI or CD pipelines, in this blog post, I will show you how to integrate AWS Lambda and Amazon SNS to extend AWS CodeCommit. Specifically, the solution described in this post makes it possible for you to receive detailed notifications from Amazon SNS about file changes and commit messages when an update is pushed to AWS CodeCommit.

Amazon SNS is a flexible, fully managed notifications service. It coordinates the delivery of messages to receivers. With Amazon SNS, you can fan out messages to a large number of subscribers, including distributed systems and services, and mobile devices. It is easy to set up, operate, and reliably send notifications to all your endpoints – at any scale.

AWS Lambda is our popular serverless service that lets you run code without provisioning or managing servers. In the example used in this post, I use a Lambda function to publish a topic through Amazon SNS to get an update notification.

Amazon CloudWatch is a monitoring and management service. It can collect operational data of AWS resources in the form of events. You can set up simple rules in Amazon CloudWatch to detect changes to your AWS resources. After CloudWatch captures the update event from your AWS resources, it can trigger specific targets to perform other actions (for example, to invoke a Lambda function).

To help you quickly deploy the solution, I have created an AWS CloudFormation template. AWS CloudFormation is a management tool that provides a common language to describe and provision all of the infrastructure resources in AWS.

 

Overview

The following diagram shows how to use AWS services to receive the CodeCommit file change event and details.

AWS CodeCommit supports several useful CloudWatch events, which can notify you of changes to AWS resources. By setting up simple rules, you can detect branch or repository changes. In this example, I create a CloudWatch event rule for an AWS CodeCommit repository so that any designated event invokes a Lambda function. When a change is made to the CodeCommit repository, CloudWatch detects the event and invokes the customized Lambda function.
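To illustrate what the template sets up, a comparable rule could be created with Boto3 roughly as follows. The rule name, account ID, and ARNs are placeholders, and the Lambda function also needs a resource-based policy that allows events.amazonaws.com to invoke it.

import json
import boto3

events = boto3.client('events')

# Match CodeCommit repository state changes (pushes, branch updates) for one repository.
events.put_rule(
    Name='codecommit-sns-publisher-rule',
    EventPattern=json.dumps({
        'source': ['aws.codecommit'],
        'detail-type': ['CodeCommit Repository State Change'],
        'resources': ['arn:aws:codecommit:us-east-2:111122223333:sample-repo']
    })
)

# Send matching events to the Lambda function that publishes the SNS notification.
events.put_targets(
    Rule='codecommit-sns-publisher-rule',
    Targets=[{'Id': 'SNSPublisherFunction',
              'Arn': 'arn:aws:lambda:us-east-2:111122223333:function:SNSPublisherFunction'}]
)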

When this Lambda function is triggered, the following steps are executed:

  1. Use the GetCommit operation in the CodeCommit API to get the latest commit. I want to compare the parent commit IDs with the last commit.
  2. For each commit, use the GetDifferences operation to get a list of each file that was added, modified, or deleted.
  3. Group the modification information from the comparison result and publish the message template to an SNS topic defined in the Lambda environment variable.
  4. Allow reviewers to subscribe to the SNS topic. Any update message from CodeCommit is published to subscribers.

I’ve used Python and Boto 3 to implement this function. The full source code has been published on GitHub. You can find the example in the aws-codecommit-file-change-publisher repository.
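As a rough illustration of the flow described above (not the published source), the core of such a function could look like the following sketch. It assumes the CloudWatch event detail carries repositoryName and commitId, and that the SNS topic ARN is passed in through an environment variable.

import os
import boto3

codecommit = boto3.client('codecommit')
sns = boto3.client('sns')

def lambda_handler(event, context):
    detail = event['detail']
    repository = detail['repositoryName']
    commit_id = detail['commitId']

    # 1. Get the latest commit and its parent commit IDs.
    commit = codecommit.get_commit(repositoryName=repository, commitId=commit_id)['commit']
    lines = ['Commit ID: {}'.format(commit_id),
             'message: {}'.format(commit['message'].strip())]

    # 2. For each parent, list the files that were added, modified, or deleted.
    for parent_id in commit['parents'] or [None]:
        kwargs = {'repositoryName': repository, 'afterCommitSpecifier': commit_id}
        if parent_id:
            kwargs['beforeCommitSpecifier'] = parent_id
        for diff in codecommit.get_differences(**kwargs)['differences']:
            blob = diff.get('afterBlob') or diff.get('beforeBlob')
            lines.append('File: {} {}'.format(blob['path'], diff['changeType']))

    # 3. Publish the grouped result to the SNS topic defined in the environment.
    sns.publish(TopicArn=os.environ['SNS_TOPIC_ARN'],
                Subject='CodeCommit update: {}'.format(repository),
                Message='\n'.join(lines))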

 

Getting started

There is an AWS CloudFormation template, codecommit-sns-publisher.yml, in the source code. This template uses the AWS Serverless Application Model to define required components of the CodeCommit notification serverless application in simple and clean syntax.

The template is translated to an AWS CloudFormation stack and deploys an SNS topic, CloudWatch event rule, and Lambda function. The Lambda function code already demonstrates a simple notification use case. You can use the sample code to define your own logic and extend the function by using other APIs provided in the AWS SDK for Python (Boto3).

Prerequisites

Before you deploy this example, you must have a CodeCommit repository for the AWS CloudFormation template to reference. In this example, I have created an empty repository, sample-repo, in the Ohio (us-east-2) Region to demonstrate a scenario in which your repository has a file change or other update on a CodeCommit branch. If you already have a CodeCommit repository, follow these steps to deploy the template and Lambda function.

To deploy the AWS CloudFormation template and Lambda function

1. Download the source code from the aws-codecommit-file-change-publisher repository.

2. Sign in to the AWS Management Console and choose the AWS Region where your CodeCommit repository is located. Create an S3 bucket. For information, see How Do I Create an S3 Bucket? in the Amazon S3 Console User Guide.

3. Upload the AWS Lambda deployment package, codecommit-sns-publisher.zip, to the S3 bucket.

In this example, I created an S3 bucket named codecommit-sns-publisher in the Ohio (us-east-2) Region and uploaded the deployment package from the Amazon S3 console.

4. In the AWS Management Console, choose CloudFormation. You can also open the AWS CloudFormation console directly at https://console.aws.amazon.com/cloudformation.

5. Choose Create Stack.

6. On the Select Template page, choose Upload a template to Amazon S3, and then choose the codecommit-sns-publisher.yml template.

7. Specify the following parameters:

  • Stack Name: codecommit-sns-publisher (You can use your own stack name, if you prefer.)
  • CodeS3BucketLocation: codecommit-sns-publisher (This is the S3 bucket name where you put the sample code.)
  • CodeS3KeyLocation: codecommit-sns-publisher.zip (This is the key name of the sample code S3 object. The object should be a zip file.)
  • CodeCommitRepo: sample-repo (The name of your CodeCommit repository.)
  • MainBranchName: master (Specify the branch name you would like to use as a trigger for publishing an SNS topic.)
  • NotificationEmailAddress: [email protected] (This is the email address you would like to use to subscribe to the SNS topic. The CloudFormation template creates an SNS topic to publish notifications to subscribers.)

8. Choose Next.

9. On the Review page, under Capabilities, choose the following options:

  • I acknowledge that AWS CloudFormation might create IAM resources.
  • I acknowledge that AWS CloudFormation might create IAM resources with custom names.

10. Under Transforms, choose Create Change Set. AWS CloudFormation starts to perform the template transformation and then creates a change set.

11. After the transformation, choose Execute to create the AWS CloudFormation stack.

After the stack has been created, you should receive an SNS subscription confirmation in your email account:

After you subscribe to the SNS topic, you can go to the AWS CloudFormation console and check the created AWS resources. If you would like to monitor the Lambda function, choose SNSPublisherFunction on the Resources tab to open it.

Now, you can try to push a commit to the remote AWS CodeCommit repository.

1. Clone the CodeCommit repository to your local computer. For information, see Connect to an AWS CodeCommit Repository in the AWS CodeCommit User Guide. The following example shows how to clone a repository named sample-repo in the US East (Ohio) Region:

git clone ssh://git-codecommit.us-east-2.amazonaws.com/v1/repos/sample-repo

2. Enter the folder and create a plain text file:

cd sample-repo/
echo 'This is a sample file' > newfile

3. Add and commit this file change:

git add newfile
git commit -m 'Create initial file'

Look for this output:

[master (root-commit) 810d192] Create initial file
1 file changed, 1 insertion(+)
create mode 100644 newfile

4. Push the commit to the remote CodeCommit repository:

git push -u origin master:master

Look for this output:

Counting objects: 100% (3/3), done.
Writing objects: 100% (3/3), 235 bytes | 235.00 KiB/s, done.
…
* [new branch]      master -> master
Branch 'master' set up to track remote branch 'master' from 'origin'.

After the local commit has been pushed to the remote CodeCommit repository, the CloudWatch event detects this update. You should see the following notification message in your email account:

Commit ID: <Commit ID>
author: [YourName] ([email protected]) - <Timestamp> +0000
message: Create initial file

File: newfile Addition - Blob ID: <Blob ID>

Summary

In this blog post, I showed you how to use an AWS CloudFormation template to quickly build a sample solution that can help your operations team or development team track updates to a CodeCommit repository.

The example CloudFormation template and Lambda function can be found in the aws-codecommit-file-change-publisher GitHub repository. Using the sample code, you can customize the email content with HTML or add other information to your email message.

If you have questions or other feedback about this example, please open an issue or submit a pull request.

Deploying a personalized API Gateway serverless developer portal

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/deploying-a-personalized-api-gateway-serverless-developer-portal/

This post is courtesy of Drew Dresser, Application Architect – AWS Professional Services

Amazon API Gateway is a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale. Customers of these APIs often want a website to learn and discover APIs that are available to them. These customers might include front-end developers, third-party customers, or internal system engineers. To produce such a website, we have created the API Gateway serverless developer portal.

The API Gateway serverless developer portal (developer portal or portal, for short) is an application that you use to make your API Gateway APIs available to your customers by enabling self-service discovery of those APIs. Your customers can use the developer portal to browse API documentation, register for, and immediately receive their own API key that they can use to build applications, test published APIs, and monitor their own API usage.

Over the past few months, the team has been hard at work contributing to the open source project, available on GitHub. The developer portal was relaunched on October 29, 2018, and the team will continue to push features and take customer feedback from the open source community. Today, we’re happy to highlight some key functionality of the new portal.

Benefits for API publishers

API publishers use the developer portal to expose the APIs that they manage. As an API publisher, you need to set up, maintain, and enable the developer portal. The new portal has the following benefits:

Benefits for API consumers

API consumers use the developer portal as traditional application users. An API consumer needs to understand the APIs being published. API consumers might be front-end developers, distributed system engineers, or third-party customers. The new developer portal comes with the following benefits for API consumers:

  • Explore – API consumers can quickly page through lists of APIs. When they find one they’re interested in, they can immediately see documentation on that API.
  • Learn – API consumers might need to drill down deeper into an API to learn its details. They want to learn how to form requests and what they can expect as a response.
  • Test – Through the developer portal, API consumers can get an API key and invoke the APIs directly. This enables developers to develop faster and with more confidence.

Architecture

The developer portal is a completely serverless application. It leverages Amazon API Gateway, Amazon Cognito User Pools, AWS Lambda, Amazon DynamoDB, and Amazon S3. Serverless architectures enable you to build and run applications without needing to provision, scale, and manage any servers. The developer portal is broken down into multiple microservices, each with a distinct responsibility, as shown in the following image.

Identity management for the developer portal is performed by Amazon Cognito and a Lambda function in the Login & Registration microservice. An Amazon Cognito User Pool is configured out of the box to enable users to register and login. Additionally, you can deploy the developer portal to use a UI hosted by Amazon Cognito, which you can customize to match your style and branding.

Requests are routed to static content served from Amazon S3 and built using React. The React app communicates with the Lambda backend via API Gateway. The Lambda function is built using the aws-serverless-express library and contains the business logic behind the APIs. The business logic of the web application queries and adds data to the API Key Creation and Catalog Update microservices.

To maintain the API catalog, the Catalog Update microservice uses an S3 bucket and a Lambda function. When an API’s Swagger file is added to or removed from the bucket, the Lambda function is triggered and maintains the API catalog by updating the catalog.json file in the root of the S3 bucket.
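A simplified sketch of that pattern (in Python with Boto3, not the portal's actual code) could look like this, assuming the Swagger files are stored under a catalog/ prefix:

import json
import boto3

s3 = boto3.client('s3')

def rebuild_catalog(bucket):
    # List every Swagger/OpenAPI file under the catalog/ prefix ...
    catalog = []
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix='catalog/'):
        for obj in page.get('Contents', []):
            if not obj['Key'].endswith('.json'):
                continue
            body = s3.get_object(Bucket=bucket, Key=obj['Key'])['Body'].read()
            catalog.append(json.loads(body))

    # ... and write the combined catalog.json back to the root of the bucket.
    s3.put_object(Bucket=bucket, Key='catalog.json',
                  Body=json.dumps(catalog).encode('utf-8'),
                  ContentType='application/json')

def lambda_handler(event, context):
    # Triggered by S3 object-created and object-removed notifications on the artifacts bucket.
    bucket = event['Records'][0]['s3']['bucket']['name']
    rebuild_catalog(bucket)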

To manage the mapping between API keys and customers, the application uses the API Key Creation microservice. The service updates API Gateway with API key creations or deletions and then stores the results in a DynamoDB table that maps customers to API keys.
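The general shape of that flow, again as a simplified illustration with a hypothetical table name rather than the portal's implementation, could be:

import boto3

apigateway = boto3.client('apigateway')
table = boto3.resource('dynamodb').Table('CustomersToApiKeys')  # hypothetical table name

def create_key_for_customer(customer_id, usage_plan_id):
    # Create the API key in API Gateway and attach it to the usage plan ...
    key = apigateway.create_api_key(name=customer_id, enabled=True)
    apigateway.create_usage_plan_key(usagePlanId=usage_plan_id,
                                     keyId=key['id'],
                                     keyType='API_KEY')

    # ... then record the customer-to-key mapping for later lookups and deletions.
    table.put_item(Item={'CustomerId': customer_id,
                         'ApiKeyId': key['id'],
                         'ApiKeyValue': key['value']})
    return key['value']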

Deploying the developer portal

You can deploy the developer portal using AWS SAM, the AWS SAM CLI, or the AWS Serverless Application Repository. To deploy with AWS SAM, you can simply clone the repository and then deploy the application using two commands from your CLI. For detailed instructions for getting started with the portal, see Use the Serverless Developer Portal to Catalog Your API Gateway APIs in the Amazon API Gateway Developer Guide.

Alternatively, you can deploy using the AWS Serverless Application Repository as follows:

  1. Navigate to the api-gateway-dev-portal application and choose Deploy in the top right.
  2. On the Review page, for ArtifactsS3BucketName and DevPortalSiteS3BucketName, enter globally unique names. Both buckets are created for you.
  3. To deploy the application with these settings, choose Deploy.
  4. After the stack is complete, get the developer portal URL by choosing View CloudFormation Stack. Under Outputs, choose the URL in Value.

The URL opens in your browser.

You now have your own serverless developer portal application that is deployed and ready to use.

Publishing a new API

With the developer portal application deployed, you can publish your own API to the portal.

To get started:

  1. Create the PetStore API, which is available as a sample API in Amazon API Gateway. The API must be created and deployed and include a stage.
  2. Create a Usage Plan, which is required so that API consumers can create API keys in the developer portal. The API key is used to test actual API calls.
  3. On the API Gateway console, navigate to the Stages section of your API.
  4. Choose Export.
  5. For Export as Swagger + API Gateway Extensions, choose JSON. Save the file with the following format: apiId_stageName.json.
  6. Upload the file to the S3 bucket dedicated for artifacts in the catalog path. In this post, the bucket is named apigw-dev-portal-artifacts. To perform the upload, run the following command.
    aws s3 cp apiId_stageName.json s3://yourBucketName/catalog/apiId_stageName.json

Uploading the file to the artifacts bucket with a catalog/ key prefix automatically makes it appear in the developer portal.

This might be familiar. It’s your PetStore API documentation displayed in the OpenAPI format.

With an API deployed, you’re ready to customize the portal’s look and feel.

Customizing the developer portal

Adding a customer’s own look and feel to the developer portal is easy, and it creates a great user experience. You can customize the domain name, text, logo, and styling. For a more thorough walkthrough of customizable components, see Customization in the GitHub project.

Let’s walk through a few customizations to make your developer portal more familiar to your API consumers.

Customizing the logo and images

To customize logos, images, or content, you need to modify the contents of the your-prefix-portal-static-assets S3 bucket. You can edit files using the CLI or the AWS Management Console.

Start customizing the portal by using the console to upload a new logo in the navigation bar.

  1. Upload the new logo to your bucket with a key named custom-content/nav-logo.png.
    aws s3 cp {myLogo}.png s3://yourPrefix-portal-static-assets/custom-content/nav-logo.png
  2. Modify object permissions so that the file is readable by everyone because it’s a publicly available image. The new navigation bar looks something like this:

Another neat customization that you can make is adding an image for a particular API and stage. Maybe you want your PetStore API to have a dog picture to represent the friendliness of the API. To add an image:

  1. Use the command line to copy the image directly to the S3 bucket location.
    aws s3 cp my-image.png s3://yourPrefix-portal-static-assets/custom-content/api-logos/apiId-stageName.png
  2. Modify object permissions so that the file is readable by everyone.

Customizing the text

Next, make sure that the text of the developer portal welcomes your pet-friendly customer base. The files in the static assets bucket under /custom-content/content-fragments/ determine the portal’s text content.

To edit the text:

  1. On the AWS Management Console, navigate to the website content S3 bucket and then navigate to /custom-content/content-fragments/.
  2. Home.md is the content displayed on the home page, APIs.md controls the tab text on the navigation bar, and GettingStarted.md contains the content of the Getting Started tab. All three files are written in markdown. Download one of them to your local machine so that you can edit the contents. The following image shows Home.md edited to contain custom text:
  3. After editing and saving the file, upload it back to S3, which results in a customized home page. The following image reflects the configuration changes in Home.md from the previous step:

Customizing the domain name

Finally, many customers want to give the portal a domain name that they own and control.

To customize the domain name:

  1. Use AWS Certificate Manager to request and verify a managed certificate for your custom domain name. For more information, see Request a Public Certificate in the AWS Certificate Manager User Guide.
  2. Copy the Amazon Resource Name (ARN) so that you can pass it to the developer portal deployment process. That process now includes the certificate ARN and a property named UseRoute53Nameservers. If the property is set to true, the template creates a hosted zone and record set in Amazon Route 53 for you. If the property is set to false, the template expects you to use your own name server hosting.
  3. If you deployed using the AWS Serverless Application Repository, navigate to the Application page and deploy the application along with the certificate ARN.

After the developer portal is deployed and your CNAME record has been added, the website is accessible from the custom domain name as well as the new Amazon CloudFront URL.

Customizing the logo, text content, and domain name are great tools to make the developer portal feel like an internally developed application. In this walkthrough, you completely changed the portal’s appearance to enable developers and API consumers to discover and browse APIs.

Conclusion

The developer portal is available to use right away. Support and feature enhancements are tracked in the public GitHub repository. You can contribute to the project by following the Code of Conduct and Contributing guides. The project is open-sourced under the Amazon Open Source Code of Conduct. We plan to continue to add functionality and listen to customer feedback. We can’t wait to see what customers build with API Gateway and the API Gateway serverless developer portal.