Tag Archives: AWS X-Ray

Managing AWS Lambda Function Concurrency

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/managing-aws-lambda-function-concurrency/

One of the key benefits of serverless applications is the ease in which they can scale to meet traffic demands or requests, with little to no need for capacity planning. In AWS Lambda, which is the core of the serverless platform at AWS, the unit of scale is a concurrent execution. This refers to the number of executions of your function code that are happening at any given time.

Thinking about concurrent executions as a unit of scale is a fairly unique concept. In this post, I dive deeper into this and talk about how you can make use of per function concurrency limits in Lambda.

Understanding concurrency in Lambda

Instead of diving right into the guts of how Lambda works, here’s an appetizing analogy: a magical pizza.
Yes, a magical pizza!

This magical pizza has some unique properties:

  • It has a fixed maximum number of slices, such as 8.
  • Slices automatically re-appear after they are consumed.
  • When you take a slice from the pizza, it does not re-appear until it has been completely consumed.
  • One person can take multiple slices at a time.
  • You can easily ask to have the number of slices increased, but they remain fixed at any point in time otherwise.

Now that the magical pizza’s properties are defined, here’s a hypothetical situation of some friends sharing this pizza.

Shawn, Kate, Daniela, Chuck, Ian and Avleen get together every Friday to share a pizza and catch up on their week. As there is just six of them, they can easily all enjoy a slice of pizza at a time. As they finish each slice, it re-appears in the pizza pan and they can take another slice again. Given the magical properties of their pizza, they can continue to eat all they want, but with two very important constraints:

  • If any of them take too many slices at once, the others may not get as much as they want.
  • If they take too many slices, they might also eat too much and get sick.

One particular week, some of the friends are hungrier than the rest, taking two slices at a time instead of just one. If more than two of them try to take two pieces at a time, this can cause contention for pizza slices. Some of them would wait hungry for the slices to re-appear. They could ask for a pizza with more slices, but then run the same risk again later if more hungry friends join than planned for.

What can they do?

If the friends agreed to accept a limit for the maximum number of slices they each eat concurrently, both of these issues are avoided. Some could have a maximum of 2 of the 8 slices, or other concurrency limits that were more or less. Just so long as they kept it at or under eight total slices to be eaten at one time. This would keep any from going hungry or eating too much. The six friends can happily enjoy their magical pizza without worry!

Concurrency in Lambda

Concurrency in Lambda actually works similarly to the magical pizza model. Each AWS Account has an overall AccountLimit value that is fixed at any point in time, but can be easily increased as needed, just like the count of slices in the pizza. As of May 2017, the default limit is 1000 “slices” of concurrency per AWS Region.

Also like the magical pizza, each concurrency “slice” can only be consumed individually one at a time. After consumption, it becomes available to be consumed again. Services invoking Lambda functions can consume multiple slices of concurrency at the same time, just like the group of friends can take multiple slices of the pizza.

Let’s take our example of the six friends and bring it back to AWS services that commonly invoke Lambda:

  • Amazon S3
  • Amazon Kinesis
  • Amazon DynamoDB
  • Amazon Cognito

In a single account with the default concurrency limit of 1000 concurrent executions, any of these four services could invoke enough functions to consume the entire limit or some part of it. Just like with the pizza example, there is the possibility for two issues to pop up:

  • One or more of these services could invoke enough functions to consume a majority of the available concurrency capacity. This could cause others to be starved for it, causing failed invocations.
  • A service could consume too much concurrent capacity and cause a downstream service or database to be overwhelmed, which could cause failed executions.

For Lambda functions that are launched in a VPC, you have the potential to consume the available IP addresses in a subnet or the maximum number of elastic network interfaces to which your account has access. For more information, see Configuring a Lambda Function to Access Resources in an Amazon VPC. For information about elastic network interface limits, see Network Interfaces section in the Amazon VPC Limits topic.

One way to solve both of these problems is applying a concurrency limit to the Lambda functions in an account.

Configuring per function concurrency limits

You can now set a concurrency limit on individual Lambda functions in an account. The concurrency limit that you set reserves a portion of your account level concurrency for a given function. All of your functions’ concurrent executions count against this account-level limit by default.

If you set a concurrency limit for a specific function, then that function’s concurrency limit allocation is deducted from the shared pool and assigned to that specific function. AWS also reserves 100 units of concurrency for all functions that don’t have a specified concurrency limit set. This helps to make sure that future functions have capacity to be consumed.

Going back to the example of the consuming services, you could set throttles for the functions as follows:

Amazon S3 function = 350
Amazon Kinesis function = 200
Amazon DynamoDB function = 200
Amazon Cognito function = 150
Total = 900

With the 100 reserved for all non-concurrency reserved functions, this totals the account limit of 1000.

Here’s how this works. To start, create a basic Lambda function that is invoked via Amazon API Gateway. This Lambda function returns a single “Hello World” statement with an added sleep time between 2 and 5 seconds. The sleep time simulates an API providing some sort of capability that can take a varied amount of time. The goal here is to show how an API that is underloaded can reach its concurrency limit, and what happens when it does.
To create the example function

  1. Open the Lambda console.
  2. Choose Create Function.
  3. For Author from scratch, enter the following values:
    1. For Name, enter a value (such as concurrencyBlog01).
    2. For Runtime, choose Python 3.6.
    3. For Role, choose Create new role from template and enter a name aligned with this function, such as concurrencyBlogRole.
  4. Choose Create function.
  5. The function is created with some basic example code. Replace that code with the following:

import time
from random import randint
seconds = randint(2, 5)

def lambda_handler(event, context):
time.sleep(seconds)
return {"statusCode": 200,
"body": ("Hello world, slept " + str(seconds) + " seconds"),
"headers":
{
"Access-Control-Allow-Headers": "Content-Type,X-Amz-Date,Authorization,X-Api-Key,X-Amz-Security-Token",
"Access-Control-Allow-Methods": "GET,OPTIONS",
}}

  1. Under Basic settings, set Timeout to 10 seconds. While this function should only ever take up to 5-6 seconds (with the 5-second max sleep), this gives you a little bit of room if it takes longer.

  1. Choose Save at the top right.

At this point, your function is configured for this example. Test it and confirm this in the console:

  1. Choose Test.
  2. Enter a name (it doesn’t matter for this example).
  3. Choose Create.
  4. In the console, choose Test again.
  5. You should see output similar to the following:

Now configure API Gateway so that you have an HTTPS endpoint to test against.

  1. In the Lambda console, choose Configuration.
  2. Under Triggers, choose API Gateway.
  3. Open the API Gateway icon now shown as attached to your Lambda function:

  1. Under Configure triggers, leave the default values for API Name and Deployment stage. For Security, choose Open.
  2. Choose Add, Save.

API Gateway is now configured to invoke Lambda at the Invoke URL shown under its configuration. You can take this URL and test it in any browser or command line, using tools such as “curl”:


$ curl https://ofixul557l.execute-api.us-east-1.amazonaws.com/prod/concurrencyBlog01
Hello world, slept 2 seconds

Throwing load at the function

Now start throwing some load against your API Gateway + Lambda function combo. Right now, your function is only limited by the total amount of concurrency available in an account. For this example account, you might have 850 unreserved concurrency out of a full account limit of 1000 due to having configured a few concurrency limits already (also the 100 concurrency saved for all functions without configured limits). You can find all of this information on the main Dashboard page of the Lambda console:

For generating load in this example, use an open source tool called “hey” (https://github.com/rakyll/hey), which works similarly to ApacheBench (ab). You test from an Amazon EC2 instance running the default Amazon Linux AMI from the EC2 console. For more help with configuring an EC2 instance, follow the steps in the Launch Instance Wizard.

After the EC2 instance is running, SSH into the host and run the following:


sudo yum install go
go get -u github.com/rakyll/hey

“hey” is easy to use. For these tests, specify a total number of tests (5,000) and a concurrency of 50 against the API Gateway URL as follows(replace the URL here with your own):


$ ./go/bin/hey -n 5000 -c 50 https://ofixul557l.execute-api.us-east-1.amazonaws.com/prod/concurrencyBlog01

The output from “hey” tells you interesting bits of information:


$ ./go/bin/hey -n 5000 -c 50 https://ofixul557l.execute-api.us-east-1.amazonaws.com/prod/concurrencyBlog01

Summary:
Total: 381.9978 secs
Slowest: 9.4765 secs
Fastest: 0.0438 secs
Average: 3.2153 secs
Requests/sec: 13.0891
Total data: 140024 bytes
Size/request: 28 bytes

Response time histogram:
0.044 [1] |
0.987 [2] |
1.930 [0] |
2.874 [1803] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
3.817 [1518] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
4.760 [719] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
5.703 [917] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
6.647 [13] |
7.590 [14] |
8.533 [9] |
9.477 [4] |

Latency distribution:
10% in 2.0224 secs
25% in 2.0267 secs
50% in 3.0251 secs
75% in 4.0269 secs
90% in 5.0279 secs
95% in 5.0414 secs
99% in 5.1871 secs

Details (average, fastest, slowest):
DNS+dialup: 0.0003 secs, 0.0000 secs, 0.0332 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0046 secs
req write: 0.0000 secs, 0.0000 secs, 0.0005 secs
resp wait: 3.2149 secs, 0.0438 secs, 9.4472 secs
resp read: 0.0000 secs, 0.0000 secs, 0.0004 secs

Status code distribution:
[200] 4997 responses
[502] 3 responses

You can see a helpful histogram and latency distribution. Remember that this Lambda function has a random sleep period in it and so isn’t entirely representational of a real-life workload. Those three 502s warrant digging deeper, but could be due to Lambda cold-start timing and the “second” variable being the maximum of 5, causing the Lambda functions to time out. AWS X-Ray and the Amazon CloudWatch logs generated by both API Gateway and Lambda could help you troubleshoot this.

Configuring a concurrency reservation

Now that you’ve established that you can generate this load against the function, I show you how to limit it and protect a backend resource from being overloaded by all of these requests.

  1. In the console, choose Configure.
  2. Under Concurrency, for Reserve concurrency, enter 25.

  1. Click on Save in the top right corner.

You could also set this with the AWS CLI using the Lambda put-function-concurrency command or see your current concurrency configuration via Lambda get-function. Here’s an example command:


$ aws lambda get-function --function-name concurrencyBlog01 --output json --query Concurrency
{
"ReservedConcurrentExecutions": 25
}

Either way, you’ve set the Concurrency Reservation to 25 for this function. This acts as both a limit and a reservation in terms of making sure that you can execute 25 concurrent functions at all times. Going above this results in the throttling of the Lambda function. Depending on the invoking service, throttling can result in a number of different outcomes, as shown in the documentation on Throttling Behavior. This change has also reduced your unreserved account concurrency for other functions by 25.

Rerun the same load generation as before and see what happens. Previously, you tested at 50 concurrency, which worked just fine. By limiting the Lambda functions to 25 concurrency, you should see rate limiting kick in. Run the same test again:


$ ./go/bin/hey -n 5000 -c 50 https://ofixul557l.execute-api.us-east-1.amazonaws.com/prod/concurrencyBlog01

While this test runs, refresh the Monitoring tab on your function detail page. You see the following warning message:

This is great! It means that your throttle is working as configured and you are now protecting your downstream resources from too much load from your Lambda function.

Here is the output from a new “hey” command:


$ ./go/bin/hey -n 5000 -c 50 https://ofixul557l.execute-api.us-east-1.amazonaws.com/prod/concurrencyBlog01
Summary:
Total: 379.9922 secs
Slowest: 7.1486 secs
Fastest: 0.0102 secs
Average: 1.1897 secs
Requests/sec: 13.1582
Total data: 164608 bytes
Size/request: 32 bytes

Response time histogram:
0.010 [1] |
0.724 [3075] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1.438 [0] |
2.152 [811] |∎∎∎∎∎∎∎∎∎∎∎
2.866 [11] |
3.579 [566] |∎∎∎∎∎∎∎
4.293 [214] |∎∎∎
5.007 [1] |
5.721 [315] |∎∎∎∎
6.435 [4] |
7.149 [2] |

Latency distribution:
10% in 0.0130 secs
25% in 0.0147 secs
50% in 0.0205 secs
75% in 2.0344 secs
90% in 4.0229 secs
95% in 5.0248 secs
99% in 5.0629 secs

Details (average, fastest, slowest):
DNS+dialup: 0.0004 secs, 0.0000 secs, 0.0537 secs
DNS-lookup: 0.0002 secs, 0.0000 secs, 0.0184 secs
req write: 0.0000 secs, 0.0000 secs, 0.0016 secs
resp wait: 1.1892 secs, 0.0101 secs, 7.1038 secs
resp read: 0.0000 secs, 0.0000 secs, 0.0005 secs

Status code distribution:
[502] 3076 responses
[200] 1924 responses

This looks fairly different from the last load test run. A large percentage of these requests failed fast due to the concurrency throttle failing them (those with the 0.724 seconds line). The timing shown here in the histogram represents the entire time it took to get a response between the EC2 instance and API Gateway calling Lambda and being rejected. It’s also important to note that this example was configured with an edge-optimized endpoint in API Gateway. You see under Status code distribution that 3076 of the 5000 requests failed with a 502, showing that the backend service from API Gateway and Lambda failed the request.

Other uses

Managing function concurrency can be useful in a few other ways beyond just limiting the impact on downstream services and providing a reservation of concurrency capacity. Here are two other uses:

  • Emergency kill switch
  • Cost controls

Emergency kill switch

On occasion, due to issues with applications I’ve managed in the past, I’ve had a need to disable a certain function or capability of an application. By setting the concurrency reservation and limit of a Lambda function to zero, you can do just that.

With the reservation set to zero every invocation of a Lambda function results in being throttled. You could then work on the related parts of the infrastructure or application that aren’t working, and then reconfigure the concurrency limit to allow invocations again.

Cost controls

While I mentioned how you might want to use concurrency limits to control the downstream impact to services or databases that your Lambda function might call, another resource that you might be cautious about is money. Setting the concurrency throttle is another way to help control costs during development and testing of your application.

You might want to prevent against a function performing a recursive action too quickly or a development workload generating too high of a concurrency. You might also want to protect development resources connected to this function from generating too much cost, such as APIs that your Lambda function calls.

Conclusion

Concurrent executions as a unit of scale are a fairly unique characteristic about Lambda functions. Placing limits on how many concurrency “slices” that your function can consume can prevent a single function from consuming all of the available concurrency in an account. Limits can also prevent a function from overwhelming a backend resource that isn’t as scalable.

Unlike monolithic applications or even microservices where there are mixed capabilities in a single service, Lambda functions encourage a sort of “nano-service” of small business logic directly related to the integration model connected to the function. I hope you’ve enjoyed this post and configure your concurrency limits today!

Latency Distribution Graph in AWS X-Ray

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/latency-distribution-graph-in-aws-x-ray/

We’re continuing to iterate on the AWS X-Ray service based on customer feedback and today we’re excited to release a set of tools to help you quickly dive deep on latencies in your applications. Visual Node and Edge latency distribution graphs are shown in a handy new “Service Details” side bar in your X-Ray Service Map.

The X-Ray service graph gives you a visual representation of services and their interactions over a period of time that you select. The nodes represent services and the edges between the nodes represent calls between the services. The nodes and edges each have a set of statistics associated with them. While the visualizations provided in the service map are useful for estimating the average latency in an application they don’t help you to dive deep on specific issues. Most of the time issues occur at statistical outliers. To alleviate this X-Ray computes histograms like the one above help you solve those 99th percentile bugs.

To see a Response Distribution for a Node just click on it in the service graph. You can also click on the edges between the nodes to see the Response Distribution from the viewpoint of the calling service.

The team had a few interesting problems to solve while building out this feature and I wanted to share a bit of that with you now! Given the large number of traces an app can produce it’s not a great idea (for your browser) to plot every single trace client side. Instead most plotting libraries, when dealing with many points, use approximations and bucketing to get a network and performance friendly histogram. If you’ve used monitoring software in the past you’ve probably seen as you zoom in on the data you get higher fidelity. The interesting thing about the latencies coming in from X-Ray is that they vary by several orders of magnitude.

If the latencies were distributed between strictly 0s and 1s you could easily just create 10 buckets of 100 milliseconds. If your apps are anything like mine there’s a lot of interesting stuff happening in the outliers, so it’s beneficial to have more fidelity at 1% and 99% than it is at 50%. The problem with fixed bucket sizes is that they’re not necessarily giving you an accurate summary of data. So X-Ray, for now, uses dynamic bucket sizing based on the t-digests algorithm by Ted Dunning and Otmar Ertl. One of the distinct advantages of this algorithm over other approximation algorithms is its accuracy and precision at extremes (where most errors typically are).

An additional advantage of X-Ray over other monitoring software is the ability to measure two perspectives of latency simultaneously. Developers almost always have some view into the server side latency from their application logs but with X-Ray you can examine latency from the view of each of the clients, services, and microservices that you’re interacting with. You can even dive deeper by adding additional restrictions and queries on your selection. You can identify the specific users and clients that are having issues at that 99th percentile.

This info has already been available in API calls to GetServiceGraph as ResponseTimeHistogram but now we’re exposing it in the console as well to make it easier for customers to consume. For more information check out the documentation here.

Randall

AWS Lambda Support for AWS X-Ray

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/aws-lambda-support-for-aws-x-ray/

Today we’re announcing general availability of AWS Lambda support for AWS X-Ray. As you may already know from Jeff’s GA POST, X-Ray is an AWS service for analyzing the execution and performance behavior of distributed applications. Traditional debugging methods don’t work so well for microservice based applications, in which there are multiple, independent components running on different services. X-Ray allows you to rapidly diagnose errors, slowdowns, and timeouts by breaking down the latency in your applications. I’ll demonstrate how you can use X-Ray in your own applications in just a moment by walking us through building and analyzing a simple Lambda based application.

If you just want to get started right away you can easily turn on X-Ray for your existing Lambda functions by navigating to your function’s configuration page and enabling tracing:

Or in the AWS Command Line Interface (CLI) by updating the functions’s tracing-config (Be sure to pass in a --function-name as well):

$ aws lambda update-function-configuration --tracing-config '{"Mode": "Active"}'

When tracing mode is active Lambda will attempt to trace your function (unless explicitly told not to trace by an upstream service). Otherwise, your function will only be traced if it is explicitly told to do so by an upstream service. Once tracing is enabled, you’ll start generating traces and you’ll get a visual representation of the resources in your application and the connections (edges) between them. One thing to note is that the X-Ray daemon does consume some of your Lambda function’s resources. If you’re getting close to your memory limit Lambda will try to kill the X-Ray daemon to avoid throwing an out-of-memory error.

Let’s test this new integration out by building a quick application that uses a few different services.


As twenty-something with a smartphone I have a lot of pictures selfies (10000+!) and I thought it would be great to analyze all of them. We’ll write a simple Lambda function with the Java 8 runtime that responds to new images uploaded into an Amazon Simple Storage Service (S3) bucket. We’ll use Amazon Rekognition on the photos and store the detected labels in Amazon DynamoDB.

service map

First, let’s define a few quick X-Ray vocabulary words: subsegments, segments, and traces. Got that? X-Ray is easy to understand if you remember that subsegments and segments make up traces which X-Ray processes to generate service graphs. Service graphs make a nice visual representation we can see above (with different colors indicating various request responses). The compute resources that run your applications send data about the work they’re doing in the form of segments. You can add additional annotations about that data and more granular timing of your code by creating subsgements. The path of a request through your application is tracked with a trace. A trace collects all the segments generated by a single request. That means you can easily trace Lambda events coming in from S3 all the way to DynamoDB and understand where errors and latencies are cropping up.

So, we’ll create an S3 bucket called selfies-bucket, a DynamoDB table called selfies-table, and a Lambda function. We’ll add a trigger to our Lambda function for the S3 bucket on ObjectCreated:All events. Our Lambda function code will be super simple and you can look at it in it’s entirety here. With no code changes we can enable X-Ray in our Java function by including the aws-xray-sdk and aws-xray-sdk-recorder-aws-sdk-instrumentor packages in our JAR.

Let’s trigger some photo uploads and get a look at the traces in X-Ray.

We’ve got some data! We can click on one of these individual traces for a lot of detailed information on our invocation.

In the first AWS::Lambda segmet we see the dwell time of the function, how long it spent waiting to execute, followed by the number of execution attempts.

In the second AWS::Lambda::Function segment there are a few possible subsegments:

  • The inititlization subsegment includes all of the time spent before your function handler starts executing
  • The outbound service calls
  • Any of your custom subsegments (these are really easy to add)

Hmm, it seems like there’s a bit of an issue on the DynamoDB side. We can even dive deeper and get the full exception stacktrace by clicking on the error icon. You can see we’ve been throttled by DynamoDB because we’re out of write capacity units. Luckily we can add more with just a few clicks or a quick API call. As we do that we’ll see more and more green on our service map!

The X-Ray SDKs make it super easy to emit data to X-Ray, but you don’t have to use them to talk to the X-Ray daemon. For Python, you can check out this library from rackspace called fleece. The X-Ray service is full of interesting stuff and the best place to learn more is by hopping over to the documentation. I’ve been using it for my @awscloudninja bot and it’s working great! Just keep in mind that this isn’t an official library and isn’t supported by AWS.

Personally, I’m really excited to use X-Ray in all of my upcoming projects because it really will save me some time and effort debugging and operating. I look forward to seeing what our customers can build with it as well. If you come up with any cool tricks or hacks please let me know!

– Randall

AWS San Francisco Summit – Summary of Launches and Announcements

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-san-francisco-summit-summary-of-launches-and-announcements/

Many of my colleagues are in San Francisco for today’s AWS Summit. Here’s a summary of what we announced from the main stage and in the breakout sessions:

New Services

Newly Available

New Features

Jeff;

 

AWS X-Ray Update – General Availability, Including Lambda Integration

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-x-ray-update-general-availability-including-lambda-integration/

I first told you about AWS X-Ray at AWS re:Invent in my post, AWS X-Ray – See Inside Your Distributed Application. X-Ray allows you to trace requests made to your application as execution traverses Amazon EC2 instances, Amazon ECS containers, microservices, AWS database services, and AWS messaging services. It is designed for development and production use, and can handle simple three-tier applications as well as applications composed of thousands of microservices. As I showed you last year, X-Ray helps you to perform end-to-end tracing of requests, record a representative sample of the traces, see a map of the services and the trace data, and to analyze performance issues and errors. This helps you understand how your application and its underlying services are performing so you can identify and address the root cause of issues.

You can take a look at the full X-Ray walk-through in my earlier post to learn more.

We launched X-Ray in preview form at re:Invent and invited interested developers and architects to start using it. Today we are making the service generally available, with support in the US East (Northern Virginia), US West (Northern California), US East (Ohio), US West (Oregon), EU (Ireland), EU (Frankfurt), South America (São Paulo), Asia Pacific (Tokyo), Asia Pacific (Seoul), Asia Pacific (Sydney), Asia Pacific (Sydney), and Asia Pacific (Mumbai) Regions.

New Lambda Integration (Preview)
During the preview period we fine-tuned the service and added AWS Lambda integration, which we are launching today in preview form. Now, Lambda developers can use X-Ray to gain visibility into their function executions and performance. Previously, Lambda customers who wanted to understand their application’s latency breakdown, diagnose slowdowns, or troubleshoot timeouts had to rely on custom logging and analysis.

In order to make use of this new integration, you simply ensure that the functions of interest have execution roles that gives the functions permission to write to X-Ray, and then enable tracing on a function-by-function basis (when you create new functions using the console, the proper permissions are assigned automatically). Then you use the X-Ray service map to see how your requests flow through your Lambda functions, EC2 instances, ECS containers, and so forth. You can identify the services and resources of interest, zoom in, examine detailed timing information, and then remedy the issue.

Each call to a Lambda function generates two or more nodes in the X-Ray map:

Lambda Service – This node represents the time spent within Lambda itself.

User Function – This node represents the execution time of the Lambda function.

Downstream Service Calls – These nodes represent any calls that the Lambda function makes to other services.

To learn more, read Using X-Ray with Lambda.

Now Available
We will begin to charge for the usage of X-Ray on May 1, 2017.

Pricing is based on the number of traces that you record, and the number that you analyze (each trace represent a request made to your application). You can record 100,000 traces and retrieve or scan 1,000,000 traces every month at no charge. Beyond that, you pay $5 for every million traces that you record and $0.50 for every million traces that you retrieve for analysis, with more info available on the AWS X-Ray Pricing page. You can visit the AWS Billing Console to see how many traces you have recorded or accessed (data collection began on March 1, 2017).

Check out AWS X-Ray and the new Lambda integration today and let me know what you think!

Jeff;

 

AWS Lambda – A Look Back at 2016

Post Syndicated from Tara Walker original https://aws.amazon.com/blogs/aws/aws-lambda-a-look-back-at-2016/

2016 was an exciting year for AWS Lambda, Amazon API Gateway and serverless compute technology, to say the least. But just in case you have been hiding away and haven’t heard of serverless computing with AWS Lambda and Amazon API Gateway, let me introduce these great services to you.  AWS Lambda lets you run code without provisioning or managing servers, making it a serverless compute service that is event-driven and allows developers to bring their functions to the cloud easily for virtually any type of application or backend.  Amazon API Gateway helps you quickly build highly scalable, secure, and robust APIs at scale and provides the ability to maintain and monitor created APIs.

With the momentum of serverless in 2016, of course, the year had to end with a bang as the AWS team launched some powerful service features at re:Invent to make it even easier to build serverless solutions.  These features include:

Since Jeff has already introduced most of the aforementioned new service features for building distributed applications and microservices like Step Functions, let’s walk-through the last four new features not yet discussed using a common serverless use case example: Real-time Stream Processing.  In our walk-through of the stream processing use case, we will implement a Dead Letter Queue for notifications of errors that may come from the Lambda function processing a stream of data, we will take an existing Lambda function written in Node.js to process the stream and rewrite it using the C# language.  We then will build an example of the monetization of a Lambda backed API using API Gateway’s integration with AWS Marketplace.  This will be exciting, so let’s get started.

During the AWS Developer Days in San Francisco and Austin, I presented an example of leveraging AWS Lambda for real-time stream processing by building a demo showcasing a streaming solution with Twitter Streaming APIs. I will build upon this example to demonstrate the power of Dead Letter Queues (DLQ), C# Support, API Gateway Monetization features, and the open source template for API Gateway Developer Portal.  In the demo, a console or web application streams tweets gathered from the Twitter Streaming API that has the keywords ‘awscloud’ and/or ‘serverless’.  Those tweets are sent real-time to Amazon Kinesis Streams where Lambda detects the new records and processes the stream batch by writing the tweets to the NoSQL database, Amazon DynamoDB.

Now that we understand the real-time streaming process demo’s workflow, let’s take a deeper look at the Lambda function that processes the batch records from Kinesis.  First, you will notice below that the Lambda function, DevDayStreamProcessor, has an event source or trigger that is a Kinesis stream named DevDay2016Stream with a Batch size of 100.  Our Lambda function will poll the stream periodically for new records and automatically read and process batches of records, in this case, the tweets detected on the stream.

Now we will examine our Lambda function code which is written in Node.js 4.3. The section of the Lambda function shown below loops through the batch of tweet records from our Kinesis stream, parses each record, and writes desired tweet information into an array of JSON data. The array of the JSON tweet items is passed to the function, ddbItemsWrite which is outside of our Lambda handler.

'use strict';
console.log('Loading function');

var timestamp;
var twitterID;
var tweetData;
var ddbParams;
var itemNum = 0;
var dataItemsBatch = [];
var dbBatch = [];
var AWS = require('aws-sdk');
var ddbTable = 'TwitterStream';
var dynamoDBClient = new AWS.DynamoDB.DocumentClient();

exports.handler = (event, context, callback) => {
    var counter = 0; 
    
    event.Records.forEach((record) => {
        // Kinesis data is base64 encoded so decode here
        console.log("Base 64 record: " + JSON.stringify(record, null, 2));
        const payload = new Buffer(record.kinesis.data, 'base64').toString('ascii');
        console.log('Decoded payload:', payload);
        
        var data = payload.replace(/[\u0000-\u0019]+/g," "); 
        try
        {  tweetData = JSON.parse(data);   }
        catch(err)
        {  callback(err, err.stack);   }
        
        timestamp = "" + new Date().getTime();
        twitterID = tweetData.id.toString();
        itemNum = itemNum+1;
               
         var ddbItem = {
                PutRequest: { 
                    Item: { 
                        TwitterID: twitterID,
                        TwitterUser: tweetData.username.toString(),
                        TwitterUserPic: tweetData.pic,
                        TwitterTime: new Date(tweetData.time.replace(/( \+)/, ' UTC$1')).toLocaleString(), 
                        Tweet: tweetData.text,
                        TweetTopic: tweetData.topic,
                        Tags: (tweetData.hashtags) ? tweetData.hashtags : " ",
                        Location: (tweetData.loc) ? tweetData.loc : " ",
                        Country: (tweetData.country) ? tweetData.country : " ",
                        TimeStamp: timestamp,
                        RecordNum: itemNum
                    }
                }
            };
            
            dataItemsBatch.push(ddbItem);
            counter++;
});
    
    var twitterItems = {}; 
    twitterItems[ddbTable] = dataItemsBatch; 
    ddbItemsWrite(twitterItems, 0, context, callback); 

};

The ddbItemsWrite function shown below will take the array of JSON tweet records processed from the Kinesis stream, and write the records multiple items at a time to our DynamoDB table using batch operations. This function leverages the DynamoDB best practice of retrying unprocessed items by implementing an exponential backoff algorithm to prevent write request failures due to throttling on the individual tables.

 function ddbItemsWrite(items, retries, ddbContext, ddbCallback) 
    { 
        dynamoDBClient.batchWrite({ RequestItems: items }, function(err, data) 
            { 
                if (err) 
                { 
                    console.log('DDB call failed: ' + err, err.stack); 
                    ddbCallback(err, err.stack); 
                } 
                else 
                { 
                    if(Object.keys(data.UnprocessedItems).length) 
                    { 
                        console.log('Unprocessed items remain, retrying.'); 
                        var delay = Math.min(Math.pow(2, retries) * 100, ddbContext.getRemainingTimeInMillis() - 200); 
                        setTimeout(function() {ddbItemsWrite(data.UnprocessedItems, retries + 1, ddbContext, ddbCallback)}, delay); 
                    } 
                    else 
                    { 
                         ddbCallback(null, "Success");
                         console.log("Completed Successfully");
                    } 
                } 
            } 
        );
    }

Currently, this Lambda function works as expected and will successfully process tweets captured in Kinesis from the Twitter Streaming API, however, this function has a flaw that will cause an error to occur when processing batch write requests to our DynamoDB table.  In the Lambda function, the current code does not take into account that the DynamoDB batchWrite function should be comprised of no more than 25 write (put) requests per single call to this function up to 16 MB of data. Therefore, without changing the code appropriately to have the ddbItemsWrite function to handle batches of 25 or have the handler function put items in the array in groups of 25 requests before sending to the ddbItemsWrite function; there will be a validation exception thrown when the batch of tweets items sent is greater than 25.  This is a great example of a bug that is not easily detected in small-scale testing scenarios yet will cause failures under production load.

 

Dead Letter Queues

Now that we are aware of an event that will cause the ddbItemsWrite Lambda function to throw an exception and/or an event that will fail while processing records, we have a first-rate scenario for leveraging Dead Letter Queues (DLQ).

Since AWS Lambda DLQ functionality is only available for asynchronous event sources like Amazon S3, Amazon SNS, AWS IoT or direct asynchronous invocations, and not for streaming event sources such as Amazon Kinesis or Amazon DynamoDB streams; our first step is to break this Lambda function into two functions.  The first Lambda function will handle the processing of the Kinesis stream, and the second Lambda function will take the data processed by the first function and write the tweet information to DynamoDB.  We will then setup our DLQ on the second Lambda function for the error that will occur on writing the batch of tweets to DynamoDB as noted above.

We have two options when setting up a target for our DLQ; Amazon SNS topic or an Amazon SQS queue.  In this walk-through, we will opt for using an Amazon SQS queue.  Therefore, my first step in using DLQ is to create a SQS Standard queue.  A Standard queue type is a queue which has high transactions throughput, a message will be delivered at least once, but another copy of the message may also be delivered, and it is possible that messages might be delivered in an order different from which they were sent.  You can learn more about creating SQS queues and queue type in the Amazon SQS documentation.

Once my queue, StreamDemoDLQ, is created, I will grab the ARN from the Details tab of this selected queue. If I am not using the console to designate the DLQ resource for this function, I will need the ARN for the queue for my Lambda function to identify this SQS queue as the DLQ target for error and event failure notifications. Additionally, I will use the ARN to add permissions to my Lambda execution role policy in order to access this SQS queue.

I will now return to my Lambda function and select the Configuration tab and expand the Advanced settings section. I will select SQS in the DLQ Resource field and select my StreamDemoDLQ queue in the SQS Queue field dropdown.

Remember, the execution role for the Lambda function must explicitly provide sqs:SendMessage access permissions to in order to successfully send messages to your SQS DLQ.  Therefore, I ensured that my Lambda role, lambda_kinesis_role, has the following IAM policy for SQS permissions.

 

We have now successfully configured a Dead Letter Queue for our Lambda function using Amazon SQS. To learn more about Dead Letter Queues in Lambda, read the Troubleshooting and Monitoring section of the AWS Lambda Developer Guide and check out the AWS Compute Blog post on Dead Letter Queues.

C# Support

As I mentioned earlier, another very exciting feature added to Lambda during AWS re:Invent was the support for the C# language via the open source .NET Core 1.0 platform.  Since the Lambda console does not offer editing for compiled languages yet, in order to author a C# Lambda function you can use tooling in Visual Studio with the AWS Toolkit, Yeoman, and/or the .NET CLI.  To deploy Lambda functions written in C#, you can use the Lambda plugin in the AWS ToolKit for Visual Studio or create a deployment package with the .NET Core command line.

A C# Lambda function handler should be defined as an instance or static method in a class. There are two handler function parameters; the first is the input type which is the event data and second is the Lambda context object of type ILambdaContext. The event data input object types for AWS Services include the following:

  • Amazon.Lambda.APIGatewayEvents
  • Amazon.Lambda.CognitoEvents
  • Amazon.Lambda.ConfigEvents
  • Amazon.Lambda.DynamoDBEvents
  • Amazon.Lambda.KinesisEvents
  • Amazon.Lambda.S3Events
  • Amazon.Lambda.SNSEvents

Now that we have discussed more detail around C# Support in Lambda, let’s rewrite our DevDayStreamProcessor lambda function with the C# language. For this example, I will use Visual Studio IDE to write the Lambda function, and additionally take advantage of the AWS Lambda Visual Studio plugin to deploy the function. Remember in order to use the AWS Toolkit for Visual Studio with Lambda, you will need to have Visual Studio 2015 Update 3 version and NET Core tools. You can read more about installing Visual Studio 2015 Update 3 and .NET Core here.

To create the C# function using Visual Studio, I start a New Project, select AWS Lambda Project (.NET Core) and name it ServerlessStreamProcessor.

What’s really cool about taking advantage of the AWS Toolkit for Visual Studio to author this function, is that inside of Visual Studio I can use Lambda blueprints to get started in a similar way that I would in using the Lambda console.  Therefore in order to replicate the DevDayStreamProcessor in C#, I will select the Simple Kinesis Function blueprint.

It should be noted that when writing Lambda functions in C#, there is no need to mark the class declaration nor the target handler function as a Lambda function. Additionally, when writing CloudWatch logs you can use the standard C# Console class WriteLine function or use the ILambdaContext LogLine function found as a part of the ILambdaContext interface. With the template for accessing the Kinesis stream in place, I finish writing the C# Lambda function, ServerlessStreamProcessor, utilizing the same variable names as in the Node.js code in DevDayStreamProcessor. Please note the C# Lambda handler function below.

using System.Collections.Generic;
using Amazon.Lambda.Core;
using Amazon.Lambda.KinesisEvents;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.DataModel;
using Newtonsoft.Json.Linq;

// Assembly attribute to enable the Lambda function's JSON input to be converted into a .NET class.
[assembly: LambdaSerializerAttribute(typeof(Amazon.Lambda.Serialization.Json.JsonSerializer))]

namespace ServerlessStreamProcessor
{
    public class LambdaTwitterStream
    {
        string twitterID, timeStamp;
        int itemNum = 0;
        
        private static AmazonDynamoDBClient dynamoDBClient = new AmazonDynamoDBClient();
        List<TwitterItem> dataItemsBatch = new List<TwitterItem>();
        
        public void FunctionHandler(KinesisEvent kinesisEvent, ILambdaContext context)
        {
            DynamoDBContext dbContext = new DynamoDBContext(dynamoDBClient);
            context.Logger.LogLine($"Beginning to process {kinesisEvent.Records.Count} records...");
            
            foreach (var record in kinesisEvent.Records)
            {
                context.Logger.LogLine($"Event ID: {record.EventId}");
                context.Logger.LogLine($"Event Name: {record.EventName}");

                // Kinesis data is base64 encoded so decode here
                string tweetData = GetRecordContents(record.Kinesis);
                context.Logger.LogLine($"Decoded Payload: {tweetData}");
                tweetData = @"" + tweetData;
                JObject twitterObj = JObject.Parse(tweetData);
                
                twitterID = twitterObj["id"].ToString();
                timeStamp = DateTime.Now.Millisecond.ToString();
                itemNum++;
                context.Logger.LogLine(timeStamp);
                context.Logger.LogLine($"Twitter ID is: {twitterID}");
                context.Logger.LogLine(itemNum.ToString());

                TwitterItem ddbItem = new TwitterItem()
                { 
                    TwitterID = twitterID,
                    TwitterUser = twitterObj["username"].ToString(),
                    TwitterUserPic = twitterObj["pic"].ToString(),
                    TwitterTime = DateTime.Parse(twitterObj["time"].ToString()).ToUniversalTime().ToString(),
                    Tweet = twitterObj["text"].ToString(),
                    TweetTopic = twitterObj["topic"].ToString(),
                    Tags = twitterObj["hashtags"] != null ? twitterObj["hashtags"].ToString() : String.Empty,
                    Location = twitterObj["loc"] != null ? twitterObj["loc"].ToString() : String.Empty,
                    Country = twitterObj["country"] != null ? twitterObj["country"].ToString() : String.Empty,
                    TimeStamp =  timeStamp,
                    RecordNum = itemNum
                };
                
                dataItemsBatch.Add(ddbItem);
            }

            context.Logger.LogLine(JObject.FromObject(dataItemsBatch).ToString());
            ddbItemsWrite(dataItemsBatch, 0, dbContext, context);
            context.Logger.LogLine("Success - Completed Successfully");
            context.Logger.LogLine("Stream processing complete.");
        }

There are only a few differences that should be noted between our Kinesis stream processor written in C# and our original Node.js code.  Since the input parameter type supported by default in C# Lambda functions is the System.IO.Stream type, the Kinesis base64 string is decoded by using a StreamReader with ASCII encoding in a blueprint provided function, GetRecordContents.

 

private string GetRecordContents(KinesisEvent.Record streamRecord)
{
    using (var reader = new StreamReader(streamRecord.Data, Encoding.ASCII))
    {
        return reader.ReadToEnd();
    }
}

The other thing to note is that in order to write the tweet data to the DynamoDB Table, I added the AWS .NET SDK NuGet package for DynamoDB; AWSSDK.DynamoDBv2 to the Lambda function project via the NuGet package manager within Visual Studio.  I also created a .NET data object, TwitterItem, to map to the data being stored in the DynamoDB table. Using the AWS .NET SDK higher level programming interface, object persistence model for DynamoDB, I created a collection of TwitterItem objects to be written via the BatchWrite object class in our ddbItemsWrite C# function.

private async void ddbItemsWrite(List<TwitterItem> items, int retries, DynamoDBContext ddbContext, ILambdaContext context)
{
BatchWrite<TwitterItem> twitterStreamBatchWrite = ddbContext.CreateBatchWrite<TwitterItem>();
        
        try
        {
            twitterStreamBatchWrite.AddPutItems(items);   
            await twitterStreamBatchWrite.ExecuteAsync();
        }
        catch (Exception ex)
        {
            context.Logger.LogLine($"DDB call failed: {ex.Source} ");
            context.Logger.LogLine($"Exception: {ex.Message}");
            context.Logger.LogLine($"Exception Stacktrace: {ex.StackTrace}");
        }      
}

Another benefit of using AWS Toolkit for Visual Studio to author my C# Lambda function is that I can deploy my Lambda function directly to AWS with a single click.  Selecting my project name in the Solution Explorer and performing a right-click, I get a menu option, Publish to AWS Lambda, which brings up a menu for information to include about my Lambda function for deployment to AWS.

It is important to note that the handler function signature follows the nomenclature of Assembly :: Namespace :: ClassName :: Method, therefore, the signature of our C# Lambda function shown here is: ServerlessStreamProcessor :: ServerlessStreamProcessor.LambdaTwitterStream :: FunctionHandler.  We provide this information to the Upload to AWS Lambda dialog box and select Next to assign a role for the function.

Upon completion, you can test in the Lambda console or in Visual Studio with AWS toolkit provided plugin (shown below) using the sample data of the triggering event source for an iterative approach to developing the Lambda function.

You can learn more about authoring AWS Lambda functions using the C# Language in the AWS Lambda developer guide or by reading the post announcing C# Support on the Compute Blog.

API Gateway Monetization and Developer Portal

If you have been following the microservices momentum, you may be aware of an architectural pattern that calls for using smart endpoints and/or using an API gateway via REST APIs to manage access and exposure of individual services that make up a microservices solution.  Amazon API Gateway enables creation and management of RESTful APIs to expose AWS Lambda functions, external HTTP endpoints, as well as, other AWS services.  In addition, Amazon API Gateway allows clients and external developers to have access to a deployed APIs by via HTTP protocol or a platform/language targeted SDK.

With the introduction of SaaS Subscriptions on AWS Marketplace and the API Gateway integration with the AWS Marketplace, you can now monetize your APIs by allowing customers to directly consume the APIs you create with API Gateway in the AWS Marketplace.  AWS customers can subscribe and be billed for the APIs published on the marketplace with their existing AWS account.  With the integration of API Gateway with the AWS Marketplace, the process to get started is easy on the AWS Marketplace.

To get started, you must ensure that you have enabled the Usage Plan feature in Amazon API Gateway.

Once enabled the next step is to create a Usage Plan, enable throttling (if desired) with targeted rate and burst request thresholds, and finally enable quotas (if you choose) by providing targeted request quota per a set timeframe.

Next, we would choose our APIs and related stage(s) that we wish to be associated with the usage plan. Please note that this is an optional step as you can opt not associate a specific API with your usage plan.

All that is left to do is add or create an API key for the usage plan.  Again, it should be noted that this is also an optional step in creating your usage plan.

Now that we have our usage plan, StreamingPlan, we are ready for the next step in preparation for selling our API on the marketplace. You have the option to create multiple usage plans with varying APIs and limits, and sell these plans as differentiated API products on AWS Marketplace.

In order to enable customers to buy our new API product, however, the AWS Marketplace requires that each API product has an external developer portal to handle subscription requests, provide API information details and ability for the management of usage.

This customer need for an external developer portal for the marketplace birthed the new open source API Gateway developer portal serverless web application implementation.  The goal of the API Gateway developer portal project was to allow customers to follow a few easy steps to create a serverless web application that lists a catalog of your APIs built with API Gateway while allowing for developer signups.

The API Gateway developer portal was built upon AWS Serverless Express; an open source library published by AWS which aids you in utilizing AWS Lambda and Amazon API Gateway in building web applications/services with the Node.js Express framework.  Additionally, the API Gateway developer portal application uses an AWS SAM (Serverless Application Model) template to deploy its serverless resources.  AWS SAM is a simplified CloudFormation template and specification that allows easier management and deployment of serverless applications on AWS.

To build your developer portal using the API Gateway portal, you would start by cloning the aws-api-gateway-developer-portal project from GitHub.

Assuming you have the latest version of the AWS CLI and Node.js installed, you would setup the developer portal by running “npm run setup” on the command line for Mac and Linux OS users. For Windows users, you would run “npm run win-setup” on the command line setup the developer portal.

The result is a functional sample developer portal website running on S3 that you can customize in order to create your own developer portal for your APIs.

The frontend of the sample developer portal website is built with the React JavaScript library, and the backend is an AWS Lambda function running using the aws-serverless-express library. Additionally, a Lambda function with a SNS event source was created as a listener for notification when customers subscribe or unsubscribe to your API via the AWS Marketplace console.  You can learn more about the steps to build, customize, and deploy your API Gateway developer portal web application with this reference project by visiting the AWS Compute blog post which discusses the architecture and implementation in more detail.

 

The next key step in monetizing our API is establishing an account on the AWS Marketplace.  If an account is not already established, registering is simply verifying that you meet the requirement prerequisites provided in the AWS Marketplace Seller Guide and completing a seller registration form on the AWS Marketplace Management Portal.  You can see a snapshot of the start of the seller registration form below.

To list the API, you would fill a product load form describing the API, establish the pricing for the API, and provide t\he IDs of AWS Accounts that will test the API subscription process.  Completing this form would also require you to submit the URL for your API developer portal.

When your seller registration is complete, you will be supplied an AWS Marketplace product code.  You will need to associate your marketplace product code with your API usage plan.  In order to complete this step, you would simply log into the API Gateway console and go to your API usage plan. Go to the Marketplace tab and enter your product code. This tells API Gateway to send measurement data to AWS Marketplace when your API is used.

With your Amazon API Gateway managed API packaged into a usage plan, the accompanying API developer portal created, seller account registration completed, and product code associated with API usage plan; we are now ready to monetize our API on the AWS Marketplace.

Learn more about monetizing your APIs created with API Gateway by checking out the related blog post and reviewing the API Gateway developer guide documentation.

Summary

As you can see, the AWS teams were busy in 2016 working to make the customer experience easier for creating and deploying serverless architectures, as well as, providing mechanisms for customers to generate and monetize their API Gateway managed APIs.

Visit the product documentation for AWS Lambda and Amazon API Gateway to learn more about these services and all the newly released features.

Tara

AWS Webinars – January 2017 (Bonus: December Recap)

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-webinars-january-2017-bonus-december-recap/

Have you had time to digest all of the announcements that we made at AWS re:Invent? Are you ready to debug with AWS X-Ray, analyze with Amazon QuickSight, or build conversational interfaces using Amazon Lex? Do you want to learn more about AWS Lambda, set up CI/CD with AWS CodeBuild, or use Polly to give your applications a voice?

January Webinars
In our continued quest to provide you with training and education resources, I am pleased to share the webinars that we have set up for January. These are free, but they do fill up and you should definitely register ahead of time. All times are PT and each webinar runs for one hour:

January 16:

January 17:

January 18::

January 19:

January 20

December Webinar Recap
The December webinar series is already complete; here’s a quick recap with links to the recordings:

December 12:

December 13:

December 14:

December 15:

Jeff;

PS – If you want to get a jump start on your 2017 learning objectives, the re:Invent 2016 Presentations and re:Invent 2016 Videos are just a click or two away.

Amazon EC2 Container Service at AWS re:Invent 2016 – Wrap-up

Post Syndicated from Chris Barclay original https://aws.amazon.com/blogs/compute/amazon-ec2-container-service-at-aws-reinvent-2016-wrap-up/

We wanted to summarize a few of the highlights from this year’s AWS re:Invent.

Announcements

On Thursday December 1, Werner Vogels announced two new features for Amazon ECS.

Blox is a new open source project that enables users to build custom schedulers and other tooling on top of Amazon ECS. Our goal with Blox is to provide tools that simplify the creation of custom schedulers, dashboards and other extensions, so that customers can meet the needs of their specific use cases. Werner also announced that new task placement strategies are coming later this year. Watch the keynote or see the AWS Compute blog for more details on these announcements.

Werner also announced three other services that can be used with Amazon ECS. EC2 Systems Manager parameter store provides a centralized, encrypted store for sensitive information​ that can be used to configure microservices; see the docs for more info. CodeBuild is a fully managed build service that compiles source code, runs tests, and produces software packages and Docker images that are ready to deploy; see the docs for more info. AWS X-Ray helps developers analyze and debug production, distributed applications, such as those built using a microservices architecture; see the docs for more info on how to use X-Ray with ECS.

Sessions

There were multiple sessions that included deep information about Amazon ECS:

CON301 – Operations Management with Amazon ECS [video]
CON302 – Development Workflow with Docker and Amazon ECS [video]
CON303 – Introduction to Container Management on AWS [video]
CON307 – Advanced Task Scheduling with Amazon ECS and Blox [video]
CON308 – Service Integration Delivery and Automation Using Amazon ECS [video]
CON309 – Running Microservices on Amazon ECS [video]
CON310 – Running Batch Jobs on Amazon ECS [video]
CON311 – Operations Automation and Infrastructure Management with Amazon ECS [video]
CON312 – Deploying Scalable SAP Hybris Clusters using Docker [video]
CON313 – Netflix: Container Scheduling, Execution, and Integration with AWS [video]
CON316 – State of the Union: Containers [video]
CON401 – Amazon ECR Deep Dive on Image Optimization [video]
CON402 – Securing Container-Based Applications [video]
DEV313 – Infrastructure Continuous Deployment Using AWS CloudFormation [video]
GAM401 – Riot Games: Standardizing Application Deployments Using Amazon ECS and Terraform [video]
NET203 – From EC2 to ECS: How Capital One uses Application Load Balancer Features to Serve Traffic at Scale [video]

We enjoyed meeting everyone at re:Invent and appreciate all the feedback you had about Amazon ECS, and look forward to hearing about how you use the new features we announced.

— The Amazon ECS Team

AWS X-Ray – See Inside of Your Distributed Application

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-x-ray-see-inside-of-your-distributed-application/

From what I can tell, Presidential Medal of Freedom recipient Grace Hopper was the first person to apply the term debugging to the process of identifying and removing errors from programs.

While I have never had to extract an actual bug from a computer, I did spend plenty of time debugging assembly language programs early in my career. Back then, debugging consisted of single-stepping through code, examining the contents of each processor register before and after each step in order to verify that your mental model was in accord with what was actually happening. It was fairly tedious, but it left little room for bugs to hide and rewarded you with an in-depth understanding of how your code worked. Later, single-stepping gave way to debug output (hello, stderr) and from there to log files and log analysis tools.

Over the last decade or two, as complex distributed systems have emerged, debugging has changed and has taken on a new meaning. With unit tests ensuring that individual functions and modules behave as expected, the challenge turns to looking at patterns of behavior at scale. The combination of cloud computing, microservices, and asynchronous, notification-based architectures has brought forth systems that have hundreds or thousands of moving parts. The challenge of identifying and addressing performance issues in these complex systems has only grown, as has the difficulty of aggregating individual, service-level observations into meaningful top-level results. There has been no easy way for developers to “follow-the-thread” as execution traverses EC2 instances, ECS containers, microservices, AWS database and messaging services.

Let’s fix this!

Introducing AWS X-Ray
Today I would like to tell you about AWS X-Ray.  We have made it possible for you to trace requests from beginning to end across all of the touch-points that I just mentioned. It addresses the problems that come about when you want to understand and improve distributed systems at scale, and gives you the information and the insights that you need to have in order to do this.

X-Ray captures trace data from code running on EC2 instances (including ECS containers), AWS Elastic Beanstalk, Amazon API Gateway, and more. It implements follow-the-thread tracing by adding an HTTP header (including a unique ID) to requests that do not already have one, and passing the header along to additional tiers of request handlers. The data collected at each point is called a segment, and is stored as a chunk of JSON data. A segment represents a unit of work, and includes request and response timing, along with optional sub-segments that represent smaller work units (down to lines of code, if you supply the proper instrumentation). A statistically meaningful sample of the segments are routed to X-Ray (a daemon process handles this on EC2 instances and inside of containers) where it is assembled into traces (groups of segments that share a common ID). The traces are segments are further processed to create service graphs that visually depict the relationship of services to each other.

I spent a few minutes walking through the X-Ray console in order to see how all of this fits together. Along the way I made use of some sample apps that the console offered up for launch on my behalf:

Each sample app is launched by a AWS CloudFormation template. The apps make use of the newest AWS AWS SDKs; these SDKs are X-Ray aware and participate in the process of collecting and storing X-Ray segments. The Java, Node.js, and .NET SDKs now include support for X-Ray; we’ll be updating the others as soon as possible.

When you are ready to instrument and run your own applications, the X-Ray Console will show you what you have to do:

I launched a pair of apps, ran them for a bit, and then hopped over to the X-Ray Console to see what was happening.  The Service Map gives me a top-level view:

I can use the date/time range selector to indicate the time frame of interest:

I can click on any node in the graph in order to take a look at the traces behind it:

At the top of the page I can see that the signup operation, while infrequent (4.12% of the traces) has higher latency than the other two operations.First I sort by URL to group the signup operations together, and then I look at the segments that contribute to the particular trace. Here’s one that includes calls to DynamoDB and SNS:

This shows me that, when invoked from the entry point of interest, the call to DynamoDB is taking a long time. Calls to DynamoDB run in single-digit milliseconds so I should take a closer look. I focus on the Meta column, click on the document icon, and then examine the Resources tab to see what’s going on:

Looks like the client SDK is doing some retries, mostly likely because the table should be provisioned for additional read or write throughput.

The X-Ray UI is built around the concept of filter expressions. There are dedicated UI elements for a few key features, but the rest (as befits a developer-oriented tool) is powered by free-form filters that you simply enter in the text box at the top of the page. Here are a few very simple examples:

  • responsetime > 5 – Response time more than 5 seconds.
  • duration >= 5 AND duration <= 8 – Duration between 5 and 8 seconds.
  • service("dynamodb") – Requests that include a call to DynamoDB.

You can also filter by dates, trace IDs, HTTP methods & status codes, URLs, user agents, client IP addresses, and much more.

Everything that I have shown you (and a whole lot more) is also accessible from the X-Ray APIs and the AWS Command Line Interface (CLI). This should open the door to all sorts of high-level tools, visualizations, and partner opportunities. Leave me a comment and let me know what you build!

Available Now
AWS X-Ray is available in preview form now in all 12 public AWS Regions and you can start using it today!

Jeff;