Tag Archives: DynamoDB

Announcing the Winners of the AWS Chatbot Challenge – Conversational, Intelligent Chatbots using Amazon Lex and AWS Lambda

Post Syndicated from Tara Walker original https://aws.amazon.com/blogs/aws/announcing-the-winners-of-the-aws-chatbot-challenge-conversational-intelligent-chatbots-using-amazon-lex-and-aws-lambda/

A couple of months ago on the blog, I announced the AWS Chatbot Challenge in conjunction with Slack. The AWS Chatbot Challenge was an opportunity to build a unique chatbot that helped to solve a problem or that would add value for its prospective users. The mission was to build a conversational, natural language chatbot using Amazon Lex and leverage Lex’s integration with AWS Lambda to execute logic or data processing on the backend.

I know that you all have been anxiously waiting to hear announcements of who were the winners of the AWS Chatbot Challenge as much as I was. Well wait no longer, the winners of the AWS Chatbot Challenge have been decided.

May I have the Envelope Please? (The Trumpets sound)

The winners of the AWS Chatbot Challenge are:

  • First Place: BuildFax Counts by Joe Emison
  • Second Place: Hubsy by Andrew Riess, Andrew Puch, and John Wetzel
  • Third Place: PFMBot by Benny Leong and his team from MoneyLion.
  • Large Organization Winner: ADP Payroll Innovation Bot by Eric Liu, Jiaxing Yan, and Fan Yang


Diving into the Winning Chatbot Projects

Let’s take a walkthrough of the details for each of the winning projects to get a view of what made these chatbots distinctive, as well as, learn more about the technologies used to implement the chatbot solution.


BuildFax Counts by Joe Emison

The BuildFax Counts bot was created as a real solution for the BuildFax company to decrease the amount the time that sales and marketing teams can get answers on permits or properties with permits meet certain criteria.

BuildFax, a company co-founded by bot developer Joe Emison, has the only national database of building permits, which updates data from approximately half of the United States on a monthly basis. In order to accommodate the many requests that come in from the sales and marketing team regarding permit information, BuildFax has a technical sales support team that fulfills these requests sent to a ticketing system by manually writing SQL queries that run across the shards of the BuildFax databases. Since there are a large number of requests received by the internal sales support team and due to the manual nature of setting up the queries, it may take several days for getting the sales and marketing teams to receive an answer.

The BuildFax Counts chatbot solves this problem by taking the permit inquiry that would normally be sent into a ticket from the sales and marketing team, as input from Slack to the chatbot. Once the inquiry is submitted into Slack, a query executes and the inquiry results are returned immediately.

Joe built this solution by first creating a nightly export of the data in their BuildFax MySQL RDS database to CSV files that are stored in Amazon S3. From the exported CSV files, an Amazon Athena table was created in order to run quick and efficient queries on the data. He then used Amazon Lex to create a bot to handle the common questions and criteria that may be asked by the sales and marketing teams when seeking data from the BuildFax database by modeling the language used from the BuildFax ticketing system. He added several different sample utterances and slot types; both custom and Lex provided, in order to correctly parse every question and criteria combination that could be received from an inquiry.  Using Lambda, Joe created a Javascript Lambda function that receives information from the Lex intent and used it to build a SQL statement that runs against the aforementioned Athena database using the AWS SDK for JavaScript in Node.js library to return inquiry count result and SQL statement used.

The BuildFax Counts bot is used today for the BuildFax sales and marketing team to get back data on inquiries immediately that previously took up to a week to receive results.

Not only is BuildFax Counts bot our 1st place winner and wonderful solution, but its creator, Joe Emison, is a great guy.  Joe has opted to donate his prize; the $5,000 cash, the $2,500 in AWS Credits, and one re:Invent ticket to the Black Girls Code organization. I must say, you rock Joe for helping these kids get access and exposure to technology.


Hubsy by Andrew Riess, Andrew Puch, and John Wetzel

Hubsy bot was created to redefine and personalize the way users traditionally manage their HubSpot account. HubSpot is a SaaS system providing marketing, sales, and CRM software. Hubsy allows users of HubSpot to create engagements and log engagements with customers, provide sales teams with deals status, and retrieves client contact information quickly. Hubsy uses Amazon Lex’s conversational interface to execute commands from the HubSpot API so that users can gain insights, store and retrieve data, and manage tasks directly from Facebook, Slack, or Alexa.

In order to implement the Hubsy chatbot, Andrew and the team members used AWS Lambda to create a Lambda function with Node.js to parse the users request and call the HubSpot API, which will fulfill the initial request or return back to the user asking for more information. Terraform was used to automatically setup and update Lambda, CloudWatch logs, as well as, IAM profiles. Amazon Lex was used to build the conversational piece of the bot, which creates the utterances that a person on a sales team would likely say when seeking information from HubSpot. To integrate with Alexa, the Amazon Alexa skill builder was used to create an Alexa skill which was tested on an Echo Dot. Cloudwatch Logs are used to log the Lambda function information to CloudWatch in order to debug different parts of the Lex intents. In order to validate the code before the Terraform deployment, ESLint was additionally used to ensure the code was linted and proper development standards were followed.


PFMBot by Benny Leong and his team from MoneyLion

PFMBot, Personal Finance Management Bot,  is a bot to be used with the MoneyLion finance group which offers customers online financial products; loans, credit monitoring, and free credit score service to improve the financial health of their customers. Once a user signs up an account on the MoneyLion app or website, the user has the option to link their bank accounts with the MoneyLion APIs. Once the bank account is linked to the APIs, the user will be able to login to their MoneyLion account and start having a conversation with the PFMBot based on their bank account information.

The PFMBot UI has a web interface built with using Javascript integration. The chatbot was created using Amazon Lex to build utterances based on the possible inquiries about the user’s MoneyLion bank account. PFMBot uses the Lex built-in AMAZON slots and parsed and converted the values from the built-in slots to pass to AWS Lambda. The AWS Lambda functions interacting with Amazon Lex are Java-based Lambda functions which call the MoneyLion Java-based internal APIs running on Spring Boot. These APIs obtain account data and related bank account information from the MoneyLion MySQL Database.


ADP Payroll Innovation Bot by Eric Liu, Jiaxing Yan, and Fan Yang

ADP PI (Payroll Innovation) bot is designed to help employees of ADP customers easily review their own payroll details and compare different payroll data by just asking the bot for results. The ADP PI Bot additionally offers issue reporting functionality for employees to report payroll issues and aids HR managers in quickly receiving and organizing any reported payroll issues.

The ADP Payroll Innovation bot is an ecosystem for the ADP payroll consisting of two chatbots, which includes ADP PI Bot for external clients (employees and HR managers), and ADP PI DevOps Bot for internal ADP DevOps team.

The architecture for the ADP PI DevOps bot is different architecture from the ADP PI bot shown above as it is deployed internally to ADP. The ADP PI DevOps bot allows input from both Slack and Alexa. When input comes into Slack, Slack sends the request to Lex for it to process the utterance. Lex then calls the Lambda backend, which obtains ADP data sitting in the ADP VPC running within an Amazon VPC. When input comes in from Alexa, a Lambda function is called that also obtains data from the ADP VPC running on AWS.

The architecture for the ADP PI bot consists of users entering in requests and/or entering issues via Slack. When requests/issues are entered via Slack, the Slack APIs communicate via Amazon API Gateway to AWS Lambda. The Lambda function either writes data into one of the Amazon DynamoDB databases for recording issues and/or sending issues or it sends the request to Lex. When sending issues, DynamoDB integrates with Trello to keep HR Managers abreast of the escalated issues. Once the request data is sent from Lambda to Lex, Lex processes the utterance and calls another Lambda function that integrates with the ADP API and it calls ADP data from within the ADP VPC, which runs on Amazon Virtual Private Cloud (VPC).

Python and Node.js were the chosen languages for the development of the bots.

The ADP PI bot ecosystem has the following functional groupings:

Employee Functionality

  • Summarize Payrolls
  • Compare Payrolls
  • Escalate Issues
  • Evolve PI Bot

HR Manager Functionality

  • Bot Management
  • Audit and Feedback

DevOps Functionality

  • Reduce call volume in service centers (ADP PI Bot).
  • Track issues and generate reports (ADP PI Bot).
  • Monitor jobs for various environment (ADP PI DevOps Bot)
  • View job dashboards (ADP PI DevOps Bot)
  • Query job details (ADP PI DevOps Bot)



Let’s all wish all the winners of the AWS Chatbot Challenge hearty congratulations on their excellent projects.

You can review more details on the winning projects, as well as, all of the submissions to the AWS Chatbot Challenge at: https://awschatbot2017.devpost.com/submissions. If you are curious on the details of Chatbot challenge contest including resources, rules, prizes, and judges, you can review the original challenge website here:  https://awschatbot2017.devpost.com/.

Hopefully, you are just as inspired as I am to build your own chatbot using Lex and Lambda. For more information, take a look at the Amazon Lex developer guide or the AWS AI blog on Building Better Bots Using Amazon Lex (Part 1)

Chat with you soon!


New – VPC Endpoints for DynamoDB

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/new-vpc-endpoints-for-dynamodb/

Starting today Amazon Virtual Private Cloud (VPC) Endpoints for Amazon DynamoDB are available in all public AWS regions. You can provision an endpoint right away using the AWS Management Console or the AWS Command Line Interface (CLI). There are no additional costs for a VPC Endpoint for DynamoDB.

Many AWS customers run their applications within a Amazon Virtual Private Cloud (VPC) for security or isolation reasons. Previously, if you wanted your EC2 instances in your VPC to be able to access DynamoDB, you had two options. You could use an Internet Gateway (with a NAT Gateway or assigning your instances public IPs) or you could route all of your traffic to your local infrastructure via VPN or AWS Direct Connect and then back to DynamoDB. Both of these solutions had security and throughput implications and it could be difficult to configure NACLs or security groups to restrict access to just DynamoDB. Here is a picture of the old infrastructure.

Creating an Endpoint

Let’s create a VPC Endpoint for DynamoDB. We can make sure our region supports the endpoint with the DescribeVpcEndpointServices API call.

aws ec2 describe-vpc-endpoint-services --region us-east-1
    "ServiceNames": [

Great, so I know my region supports these endpoints and I know what my regional endpoint is. I can grab one of my VPCs and provision an endpoint with a quick call to the CLI or through the console. Let me show you how to use the console.

First I’ll navigate to the VPC console and select “Endpoints” in the sidebar. From there I’ll click “Create Endpoint” which brings me to this handy console.

You’ll notice the AWS Identity and Access Management (IAM) policy section for the endpoint. This supports all of the fine grained access control that DynamoDB supports in regular IAM policies and you can restrict access based on IAM policy conditions.

For now I’ll give full access to my instances within this VPC and click “Next Step”.

This brings me to a list of route tables in my VPC and asks me which of these route tables I want to assign my endpoint to. I’ll select one of them and click “Create Endpoint”.

Keep in mind the note of warning in the console: if you have source restrictions to DynamoDB based on public IP addresses the source IP of your instances accessing DynamoDB will now be their private IP addresses.

After adding the VPC Endpoint for DynamoDB to our VPC our infrastructure looks like this.

That’s it folks! It’s that easy. It’s provided at no cost. Go ahead and start using it today. If you need more details you can read the docs here.

New – AWS SAM Local (Beta) – Build and Test Serverless Applications Locally

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/new-aws-sam-local-beta-build-and-test-serverless-applications-locally/

Today we’re releasing a beta of a new tool, SAM Local, that makes it easy to build and test your serverless applications locally. In this post we’ll use SAM local to build, debug, and deploy a quick application that allows us to vote on tabs or spaces by curling an endpoint. AWS introduced Serverless Application Model (SAM) last year to make it easier for developers to deploy serverless applications. If you’re not already familiar with SAM my colleague Orr wrote a great post on how to use SAM that you can read in about 5 minutes. At it’s core, SAM is a powerful open source specification built on AWS CloudFormation that makes it easy to keep your serverless infrastructure as code – and they have the cutest mascot.

SAM Local takes all the good parts of SAM and brings them to your local machine.

There are a couple of ways to install SAM Local but the easiest is through NPM. A quick npm install -g aws-sam-local should get us going but if you want the latest version you can always install straight from the source: go get github.com/awslabs/aws-sam-local (this will create a binary named aws-sam-local, not sam).

I like to vote on things so let’s write a quick SAM application to vote on Spaces versus Tabs. We’ll use a very simple, but powerful, architecture of API Gateway fronting a Lambda function and we’ll store our results in DynamoDB. In the end a user should be able to curl our API curl https://SOMEURL/ -d '{"vote": "spaces"}' and get back the number of votes.

Let’s start by writing a simple SAM template.yaml:

AWSTemplateFormatVersion : '2010-09-09'
Transform: AWS::Serverless-2016-10-31
    Type: "AWS::Serverless::SimpleTable"
    Type: "AWS::Serverless::Function"
      Runtime: python3.6
      Handler: lambda_function.lambda_handler
      Policies: AmazonDynamoDBFullAccess
          TABLE_NAME: !Ref VotesTable
          Type: Api
            Path: /
            Method: post

So we create a [dynamo_i] table that we expose to our Lambda function through an environment variable called TABLE_NAME.

To test that this template is valid I’ll go ahead and call sam validate to make sure I haven’t fat-fingered anything. It returns Valid! so let’s go ahead and get to work on our Lambda function.

import os
import os
import json
import boto3
votes_table = boto3.resource('dynamodb').Table(os.getenv('TABLE_NAME'))

def lambda_handler(event, context):
    if event['httpMethod'] == 'GET':
        resp = votes_table.scan()
        return {'body': json.dumps({item['id']: int(item['votes']) for item in resp['Items']})}
    elif event['httpMethod'] == 'POST':
            body = json.loads(event['body'])
            return {'statusCode': 400, 'body': 'malformed json input'}
        if 'vote' not in body:
            return {'statusCode': 400, 'body': 'missing vote in request body'}
        if body['vote'] not in ['spaces', 'tabs']:
            return {'statusCode': 400, 'body': 'vote value must be "spaces" or "tabs"'}

        resp = votes_table.update_item(
            Key={'id': body['vote']},
            UpdateExpression='ADD votes :incr',
            ExpressionAttributeValues={':incr': 1},
        return {'body': "{} now has {} votes".format(body['vote'], resp['Attributes']['votes'])}

So let’s test this locally. I’ll need to create a real DynamoDB database to talk to and I’ll need to provide the name of that database through the enviornment variable TABLE_NAME. I could do that with an env.json file or I can just pass it on the command line. First, I can call:
$ echo '{"httpMethod": "POST", "body": "{\"vote\": \"spaces\"}"}' |\
TABLE_NAME="vote-spaces-tabs" sam local invoke "VoteSpacesTabs"

to test the Lambda – it returns the number of votes for spaces so theoritically everything is working. Typing all of that out is a pain so I could generate a sample event with sam local generate-event api and pass that in to the local invocation. Far easier than all of that is just running our API locally. Let’s do that: sam local start-api. Now I can curl my local endpoints to test everything out.
I’ll run the command: $ curl -d '{"vote": "tabs"}' and it returns: “tabs now has 12 votes”. Now, of course I did not write this function perfectly on my first try. I edited and saved several times. One of the benefits of hot-reloading is that as I change the function I don’t have to do any additional work to test the new function. This makes iterative development vastly easier.

Let’s say we don’t want to deal with accessing a real DynamoDB database over the network though. What are our options? Well we can download DynamoDB Local and launch it with java -Djava.library.path=./DynamoDBLocal_lib -jar DynamoDBLocal.jar -sharedDb. Then we can have our Lambda function use the AWS_SAM_LOCAL environment variable to make some decisions about how to behave. Let’s modify our function a bit:

import os
import json
import boto3
if os.getenv("AWS_SAM_LOCAL"):
    votes_table = boto3.resource(
    votes_table = boto3.resource('dynamodb').Table(os.getenv('TABLE_NAME'))

Now we’re using a local endpoint to connect to our local database which makes working without wifi a little easier.

SAM local even supports interactive debugging! In Java and Node.js I can just pass the -d flag and a port to immediately enable the debugger. For Python I could use a library like import epdb; epdb.serve() and connect that way. Then we can call sam local invoke -d 8080 "VoteSpacesTabs" and our function will pause execution waiting for you to step through with the debugger.

Alright, I think we’ve got everything working so let’s deploy this!

First I’ll call the sam package command which is just an alias for aws cloudformation package and then I’ll use the result of that command to sam deploy.

$ sam package --template-file template.yaml --s3-bucket MYAWESOMEBUCKET --output-template-file package.yaml
Uploading to 144e47a4a08f8338faae894afe7563c3  90570 / 90570.0  (100.00%)
Successfully packaged artifacts and wrote output template to file package.yaml.
Execute the following command to deploy the packaged template
aws cloudformation deploy --template-file package.yaml --stack-name 
$ sam deploy --template-file package.yaml --stack-name VoteForSpaces --capabilities CAPABILITY_IAM
Waiting for changeset to be created..
Waiting for stack create/update to complete
Successfully created/updated stack - VoteForSpaces

Which brings us to our API:

I’m going to hop over into the production stage and add some rate limiting in case you guys start voting a lot – but otherwise we’ve taken our local work and deployed it to the cloud without much effort at all. I always enjoy it when things work on the first deploy!

You can vote now and watch the results live! http://spaces-or-tabs.s3-website-us-east-1.amazonaws.com/

We hope that SAM Local makes it easier for you to test, debug, and deploy your serverless apps. We have a CONTRIBUTING.md guide and we welcome pull requests. Please tweet at us to let us know what cool things you build. You can see our What’s New post here and the documentation is live here.


New – Amazon Connect and Amazon Lex Integration

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/new-amazon-connect-and-amazon-lex-integration/

I’m really excited to share some recent enhancements to two of my favorite services: Amazon Connect and Amazon Lex. Amazon Connect is a self-service, cloud-based contact center service that makes it easy for any business to deliver better customer service at lower cost. Amazon Lex is a service for building conversational interfaces using voice and text. By integrating these two services you can take advantage of Lex‘s automatic speech recognition (ASR) and natural language processing/understading (NLU) capabilities to create great self-service experiences for your customers. To enable this integration the Lex team added support for 8kHz speech input – more on that later. Why should you care about this? Well, if the a bot can solve the majority of your customer’s requests your customers spend less time waiting on hold and more time using your products.

If you need some more background on Amazon Connect or Lex I strongly recommend Jeff’s previous posts[1][2] on these services – especially if you like LEGOs.

Let’s dive in and learn to use this new integration. We’ll take an application that we built on our Twitch channel and modify it for this blog. At the application’s core a user calls an Amazon Connect number which connects them to an Lex bot which invokes an AWS Lambda function based on an intent from Lex. So what does our little application do?

I want to finally settle the question of what the best code editor is: I like vim, it’s a spectacular editor that does one job exceptionally well – editing code (it’s the best). My colleague Jeff likes emacs, a great operating system editor… if you were born with extra joints in your fingers. My colleague Tara loves Visual Studio and sublime. Rather than fighting over what the best editor is I thought we might let you, dear reader, vote. Don’t worry you can even vote for butterflies.

Interested in voting? Call +1 614-569-4019 and tell us which editor you’re voting for! We don’t store your number or record your voice so feel free to vote more than once for vim. Want to see the votes live? http://best-editor-ever.s3-website-us-east-1.amazonaws.com/.

Now, how do we build this little contraption? We’ll cover each component but since we’ve talked about Lex and Lambda before we’ll focus mostly on the Amazon Connect component. I’m going to assume you already have a connect instance running.

Amazon Lex

Let’s start with the Lex side of things. We’ll create a bot named VoteEditor with two intents: VoteEditor with a single slot called editor and ConnectToAgent with no slots. We’ll populate our editor slot full of different code editor names (maybe we’ll leave out emacs).

AWS Lambda

Our Lambda function will also be fairly simple. First we’ll create a Amazon DynamoDB table to store our votes. Then we’ll make a helper method to respond to Lex (build_response) – it will just wrap our message in a Lex friendly response format. Now we just have to figure out our flow logic.

def lambda_handler(event, context):
    if 'ConnectToAgent' == event['currentIntent']['name']:
        return build_response("Ok, connecting you to an agent.")
    elif 'VoteEditor' == event['currentIntent']['name']:
        editor = event['currentIntent']['slots']['editor']
        resp = ddb.update_item(
            Key={"name": editor.lower()},
            UpdateExpression="SET votes = :incr + if_not_exists(votes, :default)",
            ExpressionAttributeValues={":incr": 1, ":default": 0},
        msg = "Awesome, now {} has {} votes!".format(
        return build_response(msg)

Let’s make sure we understand the code. So, if we got a vote for an editor and it doesn’t exist yet then we add that editor with 1 vote. Otherwise we increase the number of votes on that editor by 1. If we get a request for an agent, we terminate the flow with a nice message. Easy. Now we just tell our Lex bot to use our Lambda function to fulfill our intents. We can test that everything is working over text in the Lex console before moving on.

Amazon Connect

Before we can use our Lex bot in a Contact Flow we have to make sure our Amazon Connect instance has access to it. We can do this by hopping over to the Amazon Connect service console, selecting our instance, and navigating to “Contact Flows”. There should be a section called Lex where you can add your bots!

Now that our Amazon Connect instance can invoke our Lex bot we can create a new Contact Flow that contains our Lex bot. We add the bot to our flow through the “Get customer input” widget from the “Interact” category.

Once we’re on the widget we have a “DTMF” tab for taking input from number keys on a phone or the “Amazon Lex” tab for taking voiceinput and passing it to the Lex service. We’ll use the Lex tab and put in some configuration.

Lots of options, but in short we add the bot we want to use (including the version of the bot), the intents we want to use from our bot, and a short prompt to introduce the bot (and mayb prompt the customer for input).

Our final contact flow looks like this:

A real world example might allow a customer to perform many transactions through a Lex bot. Then on an error or ConnectToAgent intent put the customer into a queue where they could talk to a real person. It could collect and store information about users and populate a rich interface for an agent to use so they could jump right into the conversation with all the context they need.

I want to especially highlight the advantage of 8kHz audio support in Lex. Lex originally only supported speech input that was sampled at a higher rate than the 8 kHz input from the phone. Modern digital communication appliations typically use audio signals sampled at a minimum of 16 kHz. This higher fidelity recroding makes it easier differentiate between sounds like “ess” (/s/) and “eff” (/f/) – or so the audio experts tell me. Phones, however, use a much lower quality recording. Humans, and their ears, are pretty good at using surrounding words to figure out what a voice is saying from a lower quality recording (just check the NASA apollo recordings for proof of this). Most digital phone systems are setup to use 8 kHz sampling by default – it’s a nice tradeoff in bandwidth and fidelity. That’s why your voice sometimes sounds different on the phone. On top of this fundmental sampling rate issue you also have to deal with the fact that a lot of phone call data is already lossy (can you hear me now?). There are thousands of different devices from hundreds of different manufacturers, and tons of different software implentations. So… how do you solve this recognition issue?

The Lex team decided that the best way to address this was to expand the set of models they were using for speech recognition to include an 8kHz model. Support for an 8 kHz telephony audio sampling rate provides increased speech recognition accuracy and fidelity for your contact center interactions. This was a great effort by the team that enables a lot of customers to do more with Amazon Connect.

One final note is that Amazon Connect uses the exact same PostContent endpoint that you can use as an external developer so you don’t have to be a Amazon Connect user to take advantage of this 8kHz feature in Lex.

I hope you guys enjoyed this post and as always the real details are in the docs and API Reference.


Newly Updated: Example AWS IAM Policies for You to Use and Customize

Post Syndicated from Deren Smith original https://aws.amazon.com/blogs/security/newly-updated-example-policies-for-you-to-use-and-customize/

To help you grant access to specific resources and conditions, the Example Policies page in the AWS Identity and Access Management (IAM) documentation now includes more than thirty policies for you to use or customize to meet your permissions requirements. The AWS Support team developed these policies from their experiences working with AWS customers over the years. The example policies cover common permissions use cases you might encounter across services such as Amazon DynamoDB, Amazon EC2, AWS Elastic Beanstalk, Amazon RDS, Amazon S3, and IAM.

In this blog post, I introduce the updated Example Policies page and explain how to use and customize these policies for your needs.

The new Example Policies page

The Example Policies page in the IAM User Guide now provides an overview of the example policies and includes a link to view each policy on a separate page. Note that each of these policies has been reviewed and approved by AWS Support. If you would like to submit a policy that you have found to be particularly useful, post it on the IAM forum.

To give you an idea of the policies we have included on this page, the following are a few of the EC2 policies on the page:

To see the full list of available policies, see the Example Polices page.

In the following section, I demonstrate how to use a policy from the Example Policies page and customize it for your needs.

How to customize an example policy for your needs

Suppose you want to allow an IAM user, Bob, to start and stop EC2 instances with a specific resource tag. After looking through the Example Policies page, you see the policy, Allows Starting or Stopping EC2 Instances a User Has Tagged, Programmatically and in the Console.

To apply this policy to your specific use case:

  1. Navigate to the Policies section of the IAM console.
  2. Choose Create policy.
    Screenshot of choosing "Create policy"
  3. Choose the Select button next to Create Your Own Policy. You will see an empty policy document with boxes for Policy Name, Description, and Policy Document, as shown in the following screenshot.
  4. Type a name for the policy, copy the policy from the Example Policies page, and paste the policy in the Policy Document box. In this example, I use “start-stop-instances-for-owner-tag” as the policy name and “Allows users to start or stop instances if the instance tag Owner has the value of their user name” as the description.
  5. Update the placeholder text in the policy (see the full policy that follows this step). For example, replace <REGION> with a region from AWS Regions and Endpoints and <ACCOUNTNUMBER> with your 12-digit account number. The IAM policy variable, ${aws:username}, is a dynamic property in the policy that automatically applies to the user to which it is attached. For example, when the policy is attached to Bob, the policy replaces ${aws:username} with Bob. If you do not want to use the key value pair of Owner and ${aws:username}, you can edit the policy to include your desired key value pair. For example, if you want to use the key value pair, CostCenter:1234, you can modify “ec2:ResourceTag/Owner”: “${aws:username}” to “ec2:ResourceTag/CostCenter”: “1234”.
        "Version": "2012-10-17",
        "Statement": [
          "Effect": "Allow",
          "Action": [
                 "Resource": "arn:aws:ec2:<REGION>:<ACCOUNTNUMBER>:instance/*",
                 "Condition": {
              "StringEquals": {
                  "ec2:ResourceTag/Owner": "${aws:username}"
                 "Effect": "Allow",
                 "Action": "ec2:DescribeInstances",
                 "Resource": "*"

  6. After you have edited the policy, choose Create policy.

You have created a policy that allows an IAM user to stop and start EC2 instances in your account, as long as these instances have the correct resource tag and the policy is attached to your IAM users. You also can attach this policy to an IAM group and apply the policy to users by adding them to that group.


We updated the Example Policies page in the IAM User Guide so that you have a central location where you can find examples of the most commonly requested and used IAM policies. In addition to these example policies, we recommend that you review the list of AWS managed policies, including the AWS managed policies for job functions. You can choose these predefined policies from the IAM console and associate them with your IAM users, groups, and roles.

We will add more IAM policies to the Example Policies page over time. If you have a useful policy you would like to share with others, post it on the IAM forum. If you have comments about this post, submit them in the “Comments” section below.

– Deren

New – Target Tracking Policies for EC2 Auto Scaling

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-target-tracking-policies-for-ec2-auto-scaling/

I recently told you about DynamoDB Auto Scaling and showed you how it uses multiple CloudWatch Alarms to automate capacity management for DynamoDB tables. Behind the scenes, this feature makes use of a more general Application Auto Scaling model that we plan to put to use across several different AWS services over time.

The new Auto Scaling model includes an important new feature that we call target tracking. When you create an Auto Scaling policy that makes use of target tracking, you choose a target value for a particular CloudWatch metric. Auto Scaling then turns the appropriate knob (so to speak) to drive the metric toward the target, while also adjusting the relevant CloudWatch Alarms. Specifying your desired target, in whatever metrics-driven units make sense for your application, is generally easier and more direct than setting up ranges and thresholds manually using the original step scaling policy type. However, you can use target tracking in conjunction with step scaling in order to implement an advanced scaling strategy. For example, you could use target tracking for scale-out operations and step scaling for scale-in.

Now for EC2
Today we are adding target tracking support to EC2 Auto Scaling. You can now create scaling policies that are driven by application load balancer request counts, CPU load, network traffic, or a custom metric (the Request Count per Target metric is new, and is also part of today’s launch):

These metrics share an important property: adding additional EC2 instances will (with no changes in overall load) drive the metric down, and vice versa.

To create an Auto Scaling Group that makes use of target tracking, you simply enter a name for the policy, choose a metric, and set the desired target value:

You have the option to disable the scale-in side of the policy. If you do this, you can scale-in manually or use a separate policy.

You can create target tracking policies using the AWS Management Console, AWS Command Line Interface (CLI), or the AWS SDKs.

Here are a couple of things to keep in mind as you look forward to using target tracking:

  • You can track more than one target in a single Auto Scaling Group as long as each one references a distinct metric. Scaling will always choose the policy that drives the highest capacity.
  • Scaling will not take place if the metric has insufficient data.
  • Auto Scaling compensates for rapid, transient fluctuations in the metrics, and strives to minimize corresponding fluctuations in capacity.
  • You can set up target tracking for a custom metric through the Auto Scaling API or the AWS Command Line Interface (CLI).
  • In most cases you should elect to scale on metrics that are published with 1-minute frequency (also known as detailed monitoring). Using 5-minute metrics as the basis for scaling will result in a slower response time.

Now Available
This new feature is available now and you can start using it today at no extra charge. To learn more, read about Target Tracking Scaling in the Auto Scaling User Guide.


Take the Journey: Build Your First Serverless Web Application

Post Syndicated from Tara Walker original https://aws.amazon.com/blogs/aws/build-your-first-serverless-application/

I realized at a young age that I really liked writing those special statements that would control the computer and make it work in the manner in which I desired. This technique of controlling the computer and building things on the machine, I learned from my teachers was called writing code, and it fascinated me. Even now, what seems like centuries later, I still get the thrill of writing code, building cool solutions, and tackling all the associated challenges of this craft. It is no wonder then, that I am a huge fan of serverless computing and serverless architectures.

Serverless Computing allows me to do what I enjoy, which is write code, without having to provision and/or configure servers. Using the AWS Serverless Platform means that all the heavy lifting of server management is handled by AWS, allowing you to focus on building your application.

If you enjoy coding like I do and have yet to dive into building serverless applications, boy do I have some sensational news for you. You can build your own serverless web application with our new Serverless Web Application Guide, which provides step-by-step instructions for you to create and deploy your serverless web application on AWS.


The Serverless Web Application Guide is a hands-on tutorial that will assist you in building a fully scalable, serverless web application using the following AWS Services:

  • AWS Lambda: a managed service for serverless compute that allows you to run code without provisioning or managing servers
  • Amazon S3: a managed service that provides simple, durable, scalable object storage
  • Amazon Cognito: a managed service that allows you to add user sign-up, and data synchronization to your application
  • Amazon API Gateway: a managed service which you can create, publish, and maintain secure APIs
  • Amazon DynamoDB: a fast and flexible NoSQL managed cloud database with support for various document and key-value storage models

The application you will build is a simple web application designed for a fictional transportation service. The application will enable users to register and login into the website to request rides from a very unique transportation fleet. You will accomplish this by using the aforementioned AWS services with the serverless application architecture shown in the diagram below.

The guide breaks up the each step to build your serverless web application into five separate modules.


  1. Static Web Hosting: Amazon S3 hosts static web resources including HTML, CSS, JavaScript, and image files that are loaded in the user’s browser.
  2. User Management: Amazon Cognito provides user management and authentication functions to secure the backend API.
  3. Serverless Backend: Amazon DynamoDB provides a persistence layer where data can be stored by the API’s Lambda function.
  4. RESTful APIs: JavaScript executed in the browser sends and receives data from a public backend API built using AWS Lambda and API Gateway.
  5. Resource Cleanup: All the resources created throughout the tutorial will be terminated.

To be successful in building the application, you must remember to complete each module in sequential order, as the modules are dependent on resources created in the previous one. Some of the guide’s modules provide CloudFormation templates to aid you in generating the necessary resources to build the application if you do not wish to create them manually.



Now that you know all about this fantastic new guide for building a serverless web application, you are ready to journey into the world of AWS serverless computing and have some fun writing the code to build the application. The guide is great for beginners and yet still has cool features that even seasoned serverless computing developers will enjoy building. And to top it off, you don’t have to worry about the cost. Each service used is eligible for the AWS Free Tier and is only estimated to cost less than $0.25 if you are outside of Free Tier usage limits.

Take the plunge today and dive into building serverless applications on the AWS serverless platform with this new and exciting Serverless Web Application Guide.



Analysis of Top-N DynamoDB Objects using Amazon Athena and Amazon QuickSight

Post Syndicated from Rendy Oka original https://aws.amazon.com/blogs/big-data/analysis-of-top-n-dynamodb-objects-using-amazon-athena-and-amazon-quicksight/

If you run an operation that continuously generates a large amount of data, you may want to know what kind of data is being inserted by your application. The ability to analyze data intake quickly can be very valuable for business units, such as operations and marketing. For many operations, it’s important to see what is driving the business at any particular moment. For retail companies, for example, understanding which products are currently popular can aid in planning for future growth. Similarly, for PR companies, understanding the impact of an advertising campaign can help them market their products more effectively.

This post covers an architecture that helps you analyze your streaming data. You’ll build a solution using Amazon DynamoDB Streams, AWS Lambda, Amazon Kinesis Firehose, and Amazon Athena to analyze data intake at a frequency that you choose. And because this is a serverless architecture, you can use all of the services here without the need to provision or manage servers.

The data source

You’ll collect a random sampling of tweets via Twitter’s API and store a variety of attributes in your DynamoDB table, such as: Twitter handle, tweet ID, hashtags, location, and Time-To-Live (TTL) value.

In DynamoDB, the primary key is used as an input to an internal hash function. The output from this function determines the partition in which the data will be stored. When using a combination of primary key and sort key as a DynamoDB schema, you need to make sure that no single partition key contains many more objects than the other partition keys because this can cause partition level throttling. For the demonstration in this blog, the Twitter handle will be the primary key and the tweet ID will be the sort key. This allows you to group and sort tweets from each user.

To help you get started, I have written a script that pulls a live Twitter stream that you can use to generate your data. All you need to do is provide your own Twitter Apps credentials, and it should generate the data immediately. Alternatively, I have also provided a script that you can use to generate random Tweets with little effort.

You can find both scripts in the Github repository:


There are some modules that you may need to install to run these scripts. You can find them in Python’s module repository:

To get your own Twitter credentials, go to https://www.twitter.com/ and sign up for a free account, if you don’t already have one. After your account is set up, go to https://apps.twitter.com/. On the main landing page, choose the Create New App button. After the application is created, go to Keys and Access Tokens to get your credentials to use the Twitter API. You’ll need to generate Customer Tokens/Secret and Access Token/Secret. All four keys will be used to authenticate your request.

Architecture overview

Before we begin, let’s take a look at the overall flow of information will look like, from data ingestion into DynamoDB to visualization of results in Amazon QuickSight.

As illustrated in the architecture diagram above, any changes made to the items in DynamoDB will be captured and processed using DynamoDB Streams. Next, a Lambda function will be invoked by a trigger that is configured to respond to events in DynamoDB Streams. The Lambda function processes the data prior to pushing to Amazon Kinesis Firehose, which will output to Amazon S3. Finally, you use Amazon Athena to analyze the streaming data landing in Amazon S3. The result can be explored and visualized in Amazon QuickSight for your company’s business analytics.

You’ll need to implement your custom Lambda function to help transform the raw <key, value> data stored in DynamoDB to a JSON format for Athena to digest, but I can help you with a sample code that you are free to modify.


In the following sections, I’ll walk through how you can set up the architecture discussed earlier.

Create your DynamoDB table

First, let’s create a DynamoDB table and enable DynamoDB Streams. This will enable data to be copied out of this table. From the console, use the user_id as the partition key and tweet_id as the sort key:

After the table is ready, you can enable DynamoDB Streams. This process operates asynchronously, so there is no performance impact on the table when you enable this feature. The easiest way to manage DynamoDB Streams is also through the DynamoDB console.

In the Overview tab of your newly created table, click Manage Stream. In the window, choose the information that will be written to the stream whenever data in the table is added or modified. In this example, you can choose either New image or New and old images.

For more details on this process, check out our documentation:


Configure Kinesis Firehose

Before creating the Lambda function, you need to configure Kinesis Firehose delivery stream so that it’s ready to accept data from Lambda. Open the Firehose console and choose Create Firehose Delivery Stream. From here, choose S3 as the destination and use the following to information to configure the resource. Note the Delivery stream name because you will use it in the next step.

For more details on this process, check out our documentation:


Create your Lambda function

Now that Kinesis Firehose is ready to accept data, you can create your Lambda function.

From the AWS Lambda console, choose the Create a Lambda function button and use the Blank Function. Enter a name and description, and choose Python 2.7 as the Runtime. Note your Lambda function name because you’ll need it in the next step.

In the Lambda function code field, you can paste the script that I have written for this purpose. All this function needs is the name of your Firehose stream name set as an environment variable.

import boto3
import json
import os

# Initiate Firehose client
firehose_client = boto3.client('firehose')

def lambda_handler(event, context):
    records = []
    batch   = []
    try :
        for record in event['Records']:
            tweet = {}
            t_stats = '{ "table_name":"%s", "user_id":"%s", "tweet_id":"%s", "approx_post_time":"%d" }\n' \
                      % ( record['eventSourceARN'].split('/')[1], \
                          record['dynamodb']['Keys']['user_id']['S'], \
                          record['dynamodb']['Keys']['tweet_id']['N'], \
                          int(record['dynamodb']['ApproximateCreationDateTime']) )
            tweet["Data"] = t_stats
        res = firehose_client.put_record_batch(
            DeliveryStreamName = os.environ['firehose_stream_name'],
            Records = batch[0]
        return 'Successfully processed {} records.'.format(len(event['Records']))
    except Exception :

The handler should be set to lambda_function.lambda_handler and you can use the existing lambda_dynamodb_streams role that’s been created by default.

Enable DynamoDB trigger and start collecting data

Everything is ready to go. Open your table using the DynamoDB console and go to the Triggers tab. Select the Create trigger drop down list and choose Existing Lambda function. In the pop-up window, select the function that you just created, and choose the Create button.

At this point, you can start collecting data with the Python script that I’ve provided. The first one will create a script that will pull public Twitter data and the other will generate fake tweets using Lorem Ipsum text.

Configure Amazon Athena to read the data

Next, you will configure Amazon Athena so that it can read the data Kinesis Firehose outputs to Amazon S3 and allow you to analyze the data as needed. You can connect to Athena directly from the Athena console, and you can establish a connection using JDBC or the Athena API. In this example, I’m going to demonstrate what this looks like on the Athena console.

First, create a new database and a new table. You can do this by running the following two queries. The first query creates a new database:


And the second query creates a new table:

CREATE EXTERNAL TABLE IF NOT EXISTS ddbtablestats.twitterfeed (
    `table_name` string,
    `user_id` string,
    `tweet_id` bigint,
    `approx_post_time` timestamp 
    year string,
    month string,
    day string,
    hour string 
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ('serialization.format' = '1')
LOCATION 's3://myBucket/dynamodb/streams/transactions/'

Note that this table is created using partitions. Partitioning separates your data into logical parts based on certain criteria, such as date, location, language, etc. This allows Athena to selectively pull your data without needing to process the entire data set. This effectively minimizes the query execution time, and it also allows you to have greater control over the data that you want to query.

After the query has completed, you should be able to see the table in the left side pane of the Athena dashboard.

After the database and table have been created, execute the ALTER TABLE query to populate the partitions in your table. Replace the date with the current date when the script was executed.

ALTER TABLE ddbtablestats.TwitterFeed ADD IF NOT EXISTS
PARTITION (year='2017',month='05',day='17',hour='01') location 's3://myBucket/dynamodb/streams/transactions/2017/05/17/01/'

Using the Athena console, you’ll need to manually populate each partition for each additional partition that you’d like to analyze, however you can programmatically automate this process by using the JDBC driver or any AWS SDK of your choice.

For more information on partitioning in Athena, check out our documentation:


Querying the data in Amazon Athena

This is it! Let’s run this query to see the top 10 most active Twitter users in the last 24 hours. You can do this from the Athena console:

SELECT user_id, COUNT(DISTINCT tweet_id) tweets FROM ddbTableStats.TwitterFeed
WHERE year='2017' AND month='05' AND day='17'
GROUP BY user_id

The result should look similar to the following:

Linking Athena to Amazon QuickSight

Finally, to make this data available to a larger audience, let’s visualize this data in Amazon QuickSight. Amazon QuickSight provides native connectivity to AWS data sources such as Amazon Redshift, Amazon RDS, and Amazon Athena. Amazon QuickSight can also connect to on-premises databases, Excel, or CSV files, and it can connect to cloud data sources such as Salesforce.com. For this solution, we will connect Amazon QuickSight to the Athena table we just created.

Amazon QuickSight has a free tier that provides 1 user and 1GB of SPICE (Superfast Parallel In-memory Calculated Engine) capacity free. So you can sign up and use QuickSight free of charge.

When you are signing up for Amazon QuickSight, ensure that you grant permissions for QuickSight to connect to Athena and the S3 bucket where the data is stored.

After you’ve signed up, navigate to the new analysis button, and choose new data set, and then select the Athena data source option. Create a new name for your data source and proceed to the next prompt. At this point, you should see the Athena table you created earlier.

Choose the option to import the data to SPICE for a quicker analysis. SPICE is an in-memory optimized calculation engine that is designed for quick data visualization through parallel processing. SPICE also enables you to refresh your data sets at a regular interval or on-demand as you want.

In the dialog box, confirm this data set creation, and you’ll arrive on the landing page where you can start building your graph. The X-axis will represent the user_id and the Value will be used to represent the SUM total of the tweets from each user.

The Amazon QuickSight report looks like this:

Through this visualization, I can easily see that there are 3 users that tweeted over 20 times that day and that the majority of the users have fewer than 10 tweets that day. I can also set up a scheduled refresh of my SPICE dataset so that I have a dashboard that is regularly updated with the latest data.

Closing thoughts

Here are the benefits that you can gain from using this architecture:

  1. You can optimize the design of your DynamoDB schema that follows AWS best practice recommendations.
  1. You can run analysis and data intelligence in order to understand the current customer demands for your business.
  1. You can store incremental backup for future auditing.

The flexibility of our AWS services invites you to create and design the ideal workflow for your production at any scale, and, as always, if you ever need some guidance, don’t hesitate to reach out to us.I  hope this has been helpful to you! Please leave any questions and comments below.


Additional Reading

Learn how to analyze VPC Flow Logs with Amazon Kinesis Firehose, Amazon Athena, and Amazon QuickSight.

About the Author

Rendy Oka is a Big Data Support Engineer for Amazon Web Services. He provides consultations and architectural designs and partners with the TAMs, Solution Architects, and AWS product teams to help develop solutions for our customers. He is also a team lead for the big data support team in Seattle. Rendy has traveled to dozens of countries around the world and takes every opportunity to experience the local culture wherever he goes





AWS Hot Startups – June 2017

Post Syndicated from Tina Barr original https://aws.amazon.com/blogs/aws/aws-hot-startups-june-2017/

Thanks for stopping by for another round of AWS Hot Startups! This month we are featuring:

  • CloudRanger – helping companies understand the cloud with visual representation.
  • quintly – providing social media analytics for brands on a single dashboard.
  • Tango Card – reinventing rewards programs for businesses and their customers worldwide.

Don’t forget to check out May’s Hot Startups in case you missed them.

CloudRanger (Letterkenny, Ireland)   

The idea for CloudRanger started where most great ideas do – at a bar in Las Vegas. During a late-night conversation with his friends at re:Invent 2014, Dave Gildea (Founder and CEO) used cocktail napkins and drink coasters to visually illustrate servers and backups, and the light on his phone to represent scheduling. By the end of the night, the idea for automated visual server management was born. With CloudRanger, companies can easily create backup and retention policies, visual scheduling, and simple restoration of snapshots and AMIs. The team behind CloudRanger believes that when servers and cloud resources are represented visually, they are easier to manage and understand. Users are able to see their servers, which turns them into a tangible and important piece of business inventory.

CloudRanger is an excellent platform for MSPs who manage many different AWS accounts, and need a quick method to display many servers and audit certain attributes. The company’s goal is to give anyone the ability to create backup policies in multiple regions, apply them using a tag-based methodology, and manage backups. Servers can be scheduled from one simple dashboard, and restoration is easy and step-by-step. With CloudRanger’s visual representation of resources, customers are encouraged to fully understand their backup policies, schedules, and servers.

As an AWS Partner, CloudRanger has built a globally redundant system after going all-in with AWS. They are using over 25 AWS services for everything including enterprise-level security, automation and 24/7 runtimes, and an emphasis on Machine Learning for efficiency in the sales process. CloudRanger continues to rely more on AWS as new services and features are released, and are replacing current services with AWS CodePipeline and AWS CodeBuild. CloudRanger was also named Startup Company of the Year at a recent Irish tech event!

To learn more about CloudRanger, visit their website.

quintly (Cologne, Germany)

In 2010, brothers Alexander Peiniger and Frederik Peiniger started a journey to help companies track their social media profiles and improve their strategies against competitors. The startup began under the name “Social.Media.Tracking” and then “AllFacebook Stats” before officially becoming quintly in 2013. With quintly, brands and agencies can analyze, benchmark, and optimize their social media activities on a global scale. The innovative dashboarding system gives clients an overview across all social media profiles on the most important networks (Facebook, Twitter, YouTube, Google+, LinkedIn, Instagram, etc.) and then derives an optimal social media strategy from those profiles. Today, quintly has users in over 180 countries and paying clients in over 65 countries including major agency networks and Fortune 500 companies.

Getting an overview of a brand’s social media activities can be time-consuming, and turning insights into actions is a challenge that not all brands master. Quintly offers a variety of features designed to help clients improve their social media reach. With their web-based SaaS product, brands and agencies can compare their social media performance against competitors and their best practices. Not only can clients learn from their own historic performance, but they can leverage data from any other brand around the world.

Since the company’s founding, quintly built and operates its SaaS offering on top of AWS services, leveraging Amazon EC2, Amazon ECS, Elastic Load Balancing, and Amazon Route53 to host their Docker-based environment. Large amounts of data are stored in Amazon DynamoDB and Amazon RDS, and they use Amazon CloudWatch to monitor and seamlessly scale to the current needs. In addition, quintly is using Amazon Machine Learning to add additional attributes to the data and to drive better decisions for their clients. With the help of AWS, quintly has been able to focus on their core business while having a scalable and well-performing solution to solve their technical needs.

For more on quintly, check out their Social Media Analytics blog.

Tango Card (Seattle, Washington)

Based in the heart of West Seattle, Tango Card is revolutionizing rewards programs for companies around the world. Too often customers redeem points in a loyalty or rebate program only to wait weeks for their prize to arrive. Companies generously give their employees appreciation gifts, but the gifts can be generic and impersonal. With Tango Card, companies can choose from a variety of rewards that fit the needs of their specific program, event, or business incentive. The extensive Rewards Catalog includes options for e-gift cards that are sure to excite any recipient. There are plenty of options for everyone from traditional e-gift cards to nonprofit donations to cash equivalent rewards.

Tango Card uses a combination of desired rewards, modern technology, and expert service to change the rewards and incentive experience. The Reward Delivery Platform offers solutions including Blast Rewards, Reward Link, and Rewards as a Service API (RaaS). Blast Rewards enables companies to purchase and send e-gift cards in bulk in just one business day. Reward Link lets recipients choose from an assortment of e-gift cards, prepaid cards, digital checks, and donations and is delivered instantly. Finally, Rewards as a Service is a robust digital gift card API that is built to support apps and platforms. With RaaS, Tango Card can send out e-gift cards on company-branded email templates or deliver them directly within a user interface.

The entire Tango Card Reward Delivery Platform leverages many AWS services. They use Amazon EC2 Container Service (ECS) for rapid deployment of containerized micro services, and Amazon Relational Database Service (RDS) for low overhead managed databases. Tango Card is also leveraging Amazon Virtual Private Cloud (VPC), AWS Key Management Service (KMS), and AWS Identity and Access Management (IMS).

To learn more about Tango Card, check out their blog!

I would also like to thank Alexander Moss-Bolanos for helping with the Hot Startups posts this year.

Thanks for reading and we’ll see you next month!

-Tina Barr

DynamoDB Accelerator (DAX) Now Generally Available

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/dynamodb-accelerator-dax-now-generally-available/

Earlier this year I told you about Amazon DynamoDB Accelerator (DAX), a fully-managed caching service that sits in front of (logically speaking) your Amazon DynamoDB tables. DAX returns cached responses in microseconds, making it a great fit for eventually-consistent read-intensive workloads. DAX supports the DynamoDB API, and is seamless and easy to use. As a managed service, you simply create your DAX cluster and use it as the target for your existing reads and writes. You don’t have to worry about patching, cluster maintenance, replication, or fault management.

Now Generally Available
Today I am pleased to announce that DAX is now generally available. We have expanded DAX into additional AWS Regions and used the preview time to fine-tune performance and availability:

Now in Five Regions – DAX is now available in the US East (Northern Virginia), EU (Ireland), US West (Oregon), Asia Pacific (Tokyo), and US West (Northern California) Regions.

In Production – Our preview customers are reporting that they are using DAX in production, that they loved how easy it was to add DAX to their application, and have told us that their apps are now running 10x faster.

Getting Started with DAX
As I outlined in my earlier post, it is easy to use DAX to accelerate your existing DynamoDB applications. You simply create a DAX cluster in the desired region, update your application to reference the DAX SDK for Java (the calls are the same; this is a drop-in replacement), and configure the SDK to use the endpoint to your cluster. As a read-through/write-through cache, DAX seamlessly handles all of the DynamoDB read/write APIs.

We are working on SDK support for other languages, and I will share additional information as it becomes available.

DAX Pricing
You pay for each node in the cluster (see the DynamoDB Pricing page for more information) on a per-hour basis, with prices starting at $0.269 per hour in the US East (Northern Virginia) and US West (Oregon) regions. With DAX, each of the nodes in your cluster serves as a read target and as a failover target for high availability. The DAX SDK is cluster aware and will issue round-robin requests to all nodes in the cluster so that you get to make full use of the cluster’s cache resources.

Because DAX can easily handle sudden spikes in read traffic, you may be able to reduce the amount of provisioned throughput for your tables, resulting in an overall cost savings while still returning results in microseconds.



Building Loosely Coupled, Scalable, C# Applications with Amazon SQS and Amazon SNS

Post Syndicated from Tara Van Unen original https://aws.amazon.com/blogs/compute/building-loosely-coupled-scalable-c-applications-with-amazon-sqs-and-amazon-sns/

Stephen Liedig, Solutions Architect


One of the many challenges professional software architects and developers face is how to make cloud-native applications scalable, fault-tolerant, and highly available.

Fundamental to your project success is understanding the importance of making systems highly cohesive and loosely coupled. That means considering the multi-dimensional facets of system coupling to support the distributed nature of the applications that you are building for the cloud.

By that, I mean addressing not only the application-level coupling (managing incoming and outgoing dependencies), but also considering the impacts of of platform, spatial, and temporal coupling of your systems. Platform coupling relates to the interoperability, or lack thereof, of heterogeneous systems components. Spatial coupling deals with managing components at a network topology level or protocol level. Temporal, or runtime coupling, refers to the ability of a component within your system to do any kind of meaningful work while it is performing a synchronous, blocking operation.

The AWS messaging services, Amazon SQS and Amazon SNS, help you deal with these forms of coupling by providing mechanisms for:

  • Reliable, durable, and fault-tolerant delivery of messages between application components
  • Logical decomposition of systems and increased autonomy of components
  • Creating unidirectional, non-blocking operations, temporarily decoupling system components at runtime
  • Decreasing the dependencies that components have on each other through standard communication and network channels

Following on the recent topic, Building Scalable Applications and Microservices: Adding Messaging to Your Toolbox, in this post, I look at some of the ways you can introduce SQS and SNS into your architectures to decouple your components, and show how you can implement them using C#.


To illustrate some of these concepts, consider a web application that processes customer orders. As good architects and developers, you have followed best practices and made your application scalable and highly available. Your solution included implementing load balancing, dynamic scaling across multiple Availability Zones, and persisting orders in a Multi-AZ Amazon RDS database instance, as in the following diagram.

In this example, the application is responsible for handling and persisting the order data, as well as dealing with increases in traffic for popular items.

One potential point of vulnerability in the order processing workflow is in saving the order in the database. The business expects that every order has been persisted into the database. However, any potential deadlock, race condition, or network issue could cause the persistence of the order to fail. Then, the order is lost with no recourse to restore the order.

With good logging capability, you may be able to identify when an error occurred and which customer’s order failed. This wouldn’t allow you to “restore” the transaction, and by that stage, your customer is no longer your customer.

As illustrated in the following diagram, introducing an SQS queue helps improve your ordering application. Using the queue isolates the processing logic into its own component and runs it in a separate process from the web application. This, in turn, allows the system to be more resilient to spikes in traffic, while allowing work to be performed only as fast as necessary in order to manage costs.

In addition, you now have a mechanism for persisting orders as messages (with the queue acting as a temporary database), and have moved the scope of your transaction with your database further down the stack. In the event of an application exception or transaction failure, this ensures that the order processing can be retired or redirected to the Amazon SQS Dead Letter Queue (DLQ), for re-processing at a later stage. (See the recent post, Using Amazon SQS Dead-Letter Queues to Control Message Failure, for more information on dead-letter queues.)

Scaling the order processing nodes

This change allows you now to scale the web application frontend independently from the processing nodes. The frontend application can continue to scale based on metrics such as CPU usage, or the number of requests hitting the load balancer. Processing nodes can scale based on the number of orders in the queue. Here is an example of scale-in and scale-out alarms that you would associate with the scaling policy.

Scale-out Alarm

aws cloudwatch put-metric-alarm --alarm-name AddCapacityToCustomerOrderQueue --metric-name ApproximateNumberOfMessagesVisible --namespace "AWS/SQS" 
--statistic Average --period 300 --threshold 3 --comparison-operator GreaterThanOrEqualToThreshold --dimensions Name=QueueName,Value=customer-orders
--evaluation-periods 2 --alarm-actions <arn of the scale-out autoscaling policy>

Scale-in Alarm

aws cloudwatch put-metric-alarm --alarm-name RemoveCapacityFromCustomerOrderQueue --metric-name ApproximateNumberOfMessagesVisible --namespace "AWS/SQS" 
 --statistic Average --period 300 --threshold 1 --comparison-operator LessThanOrEqualToThreshold --dimensions Name=QueueName,Value=customer-orders
 --evaluation-periods 2 --alarm-actions <arn of the scale-in autoscaling policy>

In the above example, use the ApproximateNumberOfMessagesVisible metric to discover the queue length and drive the scaling policy of the Auto Scaling group. Another useful metric is ApproximateAgeOfOldestMessage, when applications have time-sensitive messages and developers need to ensure that messages are processed within a specific time period.

Scaling the order processing implementation

On top of scaling at an infrastructure level using Auto Scaling, make sure to take advantage of the processing power of your Amazon EC2 instances by using as many of the available threads as possible. There are several ways to implement this. In this post, we build a Windows service that uses the BackgroundWorker class to process the messages from the queue.

Here’s a closer look at the implementation. In the first section of the consuming application, use a loop to continually poll the queue for new messages, and construct a ReceiveMessageRequest variable.

public static void PollQueue()
    while (_running)
        Task<ReceiveMessageResponse> receiveMessageResponse;

        // Pull messages off the queue
        using (var sqs = new AmazonSQSClient())
            const int maxMessages = 10;  // 1-10

            //Receiving a message
            var receiveMessageRequest = new ReceiveMessageRequest
                // Get URL from Configuration
                QueueUrl = _queueUrl, 
                // The maximum number of messages to return. 
                // Fewer messages might be returned. 
                MaxNumberOfMessages = maxMessages, 
                // A list of attributes that need to be returned with message.
                AttributeNames = new List<string> { "All" },
                // Enable long polling. 
                // Time to wait for message to arrive on queue.
                WaitTimeSeconds = 5 

            receiveMessageResponse = sqs.ReceiveMessageAsync(receiveMessageRequest);

The WaitTimeSeconds property of the ReceiveMessageRequest specifies the duration (in seconds) that the call waits for a message to arrive in the queue before returning a response to the calling application. There are a few benefits to using long polling:

  • It reduces the number of empty responses by allowing SQS to wait until a message is available in the queue before sending a response.
  • It eliminates false empty responses by querying all (rather than a limited number) of the servers.
  • It returns messages as soon any message becomes available.

For more information, see Amazon SQS Long Polling.

After you have returned messages from the queue, you can start to process them by looping through each message in the response and invoking a new BackgroundWorker thread.

// Process messages
if (receiveMessageResponse.Result.Messages != null)
    foreach (var message in receiveMessageResponse.Result.Messages)
        Console.WriteLine("Received SQS message, starting worker thread");

        // Create background worker to process message
        BackgroundWorker worker = new BackgroundWorker();
        worker.DoWork += (obj, e) => ProcessMessage(message);
    Console.WriteLine("No messages on queue");

The event handler, ProcessMessage, is where you implement business logic for processing orders. It is important to have a good understanding of how long a typical transaction takes so you can set a message VisibilityTimeout that is long enough to complete your operation. If order processing takes longer than the specified timeout period, the message becomes visible on the queue. Other nodes may pick it and process the same order twice, leading to unintended consequences.

Handling Duplicate Messages

In order to manage duplicate messages, seek to make your processing application idempotent. In mathematics, idempotent describes a function that produces the same result if it is applied to itself:

f(x) = f(f(x))

No matter how many times you process the same message, the end result is the same (definition from Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions, Hohpe and Wolf, 2004).

There are several strategies you could apply to achieve this:

  • Create messages that have inherent idempotent characteristics. That is, they are non-transactional in nature and are unique at a specified point in time. Rather than saying “place new order for Customer A,” which adds a duplicate order to the customer, use “place order <orderid> on <timestamp> for Customer A,” which creates a single order no matter how often it is persisted.
  • Deliver your messages via an Amazon SQS FIFO queue, which provides the benefits of message sequencing, but also mechanisms for content-based deduplication. You can deduplicate using the MessageDeduplicationId property on the SendMessage request or by enabling content-based deduplication on the queue, which generates a hash for MessageDeduplicationId, based on the content of the message, not the attributes.
var sendMessageRequest = new SendMessageRequest
    QueueUrl = _queueUrl,
    MessageBody = JsonConvert.SerializeObject(order),
    MessageGroupId = Guid.NewGuid().ToString("N"),
    MessageDeduplicationId = Guid.NewGuid().ToString("N")
  • If using SQS FIFO queues is not an option, keep a message log of all messages attributes processed for a specified period of time, as an alternative to message deduplication on the receiving end. Verifying the existence of the message in the log before processing the message adds additional computational overhead to your processing. This can be minimized through low latency persistence solutions such as Amazon DynamoDB. Bear in mind that this solution is dependent on the successful, distributed transaction of the message and the message log.

Handling exceptions

Because of the distributed nature of SQS queues, it does not automatically delete the message. Therefore, you must explicitly delete the message from the queue after processing it, using the message ReceiptHandle property (see the following code example).

However, if at any stage you have an exception, avoid handling it as you normally would. The intention is to make sure that the message ends back on the queue, so that you can gracefully deal with intermittent failures. Instead, log the exception to capture diagnostic information, and swallow it.

By not explicitly deleting the message from the queue, you can take advantage of the VisibilityTimeout behavior described earlier. Gracefully handle the message processing failure and make the unprocessed message available to other nodes to process.

In the event that subsequent retries fail, SQS automatically moves the message to the configured DLQ after the configured number of receives has been reached. You can further investigate why the order process failed. Most importantly, the order has not been lost, and your customer is still your customer.

private static void ProcessMessage(Message message)
    using (var sqs = new AmazonSQSClient())
            Console.WriteLine("Processing message id: {0}", message.MessageId);

            // Implement messaging processing here
            // Ensure no downstream resource contention (parallel processing)
            // <your order processing logic in here…>
            Console.WriteLine("{0} Thread {1}: {2}", DateTime.Now.ToString("s"), Thread.CurrentThread.ManagedThreadId, message.MessageId);
            // Delete the message off the queue. 
            // Receipt handle is the identifier you must provide 
            // when deleting the message.
            var deleteRequest = new DeleteMessageRequest(_queueName, message.ReceiptHandle);
            Console.WriteLine("Processed message id: {0}", message.MessageId);

        catch (Exception ex)
            // Do nothing.
            // Swallow exception, message will return to the queue when 
            // visibility timeout has been exceeded.
            Console.WriteLine("Could not process message due to error. Exception: {0}", ex.Message);

Using SQS to adapt to changing business requirements

One of the benefits of introducing a message queue is that you can accommodate new business requirements without dramatically affecting your application.

If, for example, the business decided that all orders placed over $5000 are to be handled as a priority, you could introduce a new “priority order” queue. The way the orders are processed does not change. The only significant change to the processing application is to ensure that messages from the “priority order” queue are processed before the “standard order” queue.

The following diagram shows how this logic could be isolated in an “order dispatcher,” whose only purpose is to route order messages to the appropriate queue based on whether the order exceeds $5000. Nothing on the web application or the processing nodes changes other than the target queue to which the order is sent. The rates at which orders are processed can be achieved by modifying the poll rates and scalability settings that I have already discussed.

Extending the design pattern with Amazon SNS

Amazon SNS supports reliable publish-subscribe (pub-sub) scenarios and push notifications to known endpoints across a wide variety of protocols. It eliminates the need to periodically check or poll for new information and updates. SNS supports:

  • Reliable storage of messages for immediate or delayed processing
  • Publish / subscribe – direct, broadcast, targeted “push” messaging
  • Multiple subscriber protocols
  • Amazon SQS, HTTP, HTTPS, email, SMS, mobile push, AWS Lambda

With these capabilities, you can provide parallel asynchronous processing of orders in the system and extend it to support any number of different business use cases without affecting the production environment. This is commonly referred to as a “fanout” scenario.

Rather than your web application pushing orders to a queue for processing, send a notification via SNS. The SNS messages are sent to a topic and then replicated and pushed to multiple SQS queues and Lambda functions for processing.

As the diagram above shows, you have the development team consuming “live” data as they work on the next version of the processing application, or potentially using the messages to troubleshoot issues in production.

Marketing is consuming all order information, via a Lambda function that has subscribed to the SNS topic, inserting the records into an Amazon Redshift warehouse for analysis.

All of this, of course, is happening without affecting your order processing application.


While I haven’t dived deep into the specifics of each service, I have discussed how these services can be applied at an architectural level to build loosely coupled systems that facilitate multiple business use cases. I’ve also shown you how to use infrastructure and application-level scaling techniques, so you can get the most out of your EC2 instances.

One of the many benefits of using these managed services is how quickly and easily you can implement powerful messaging capabilities in your systems, and lower the capital and operational costs of managing your own messaging middleware.

Using Amazon SQS and Amazon SNS together can provide you with a powerful mechanism for decoupling application components. This should be part of design considerations as you architect for the cloud.

For more information, see the Amazon SQS Developer Guide and Amazon SNS Developer Guide. You’ll find tutorials on all the concepts covered in this post, and more. To can get started using the AWS console or SDK of your choice visit:

Happy messaging!

New – Auto Scaling for Amazon DynamoDB

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-auto-scaling-for-amazon-dynamodb/

Amazon DynamoDB has more than one hundred thousand customers, spanning a wide range of industries and use cases. These customers depend on DynamoDB’s consistent performance at any scale and presence in 16 geographic regions around the world. A recent trend we’ve been observing is customers using DynamoDB to power their serverless applications. This is a good match: with DynamoDB, you don’t have to think about things like provisioning servers, performing OS and database software patching, or configuring replication across availability zones to ensure high availability – you can simply create tables and start adding data, and let DynamoDB handle the rest.

DynamoDB provides a provisioned capacity model that lets you set the amount of read and write capacity required by your applications. While this frees you from thinking about servers and enables you to change provisioning for your table with a simple API call or button click in the AWS Management Console, customers have asked us how we can make managing capacity for DynamoDB even easier.

Today we are introducing Auto Scaling for DynamoDB to help automate capacity management for your tables and global secondary indexes. You simply specify the desired target utilization and provide upper and lower bounds for read and write capacity. DynamoDB will then monitor throughput consumption using Amazon CloudWatch alarms and then will adjust provisioned capacity up or down as needed. Auto Scaling will be on by default for all new tables and indexes, and you can also configure it for existing ones.

Even if you’re not around, DynamoDB Auto Scaling will be monitoring your tables and indexes to automatically adjust throughput in response to changes in application traffic. This can make it easier to administer your DynamoDB data, help you maximize availability for your applications, and help you reduce your DynamoDB costs.

Let’s see how it works…

Using Auto Scaling
The DynamoDB Console now proposes a comfortable set of default parameters when you create a new table. You can accept them as-is or you can uncheck Use default settings and enter your own parameters:

Here’s how you enter your own parameters:

Target utilization is expressed in terms of the ratio of consumed capacity to provisioned capacity. The parameters above would allow for sufficient headroom to allow consumed capacity to double due to a burst in read or write requests (read Capacity Unit Calculations to learn more about the relationship between DynamoDB read and write operations and provisioned capacity). Changes in provisioned capacity take place in the background.

Auto Scaling in Action
In order to see this important new feature in action, I followed the directions in the Getting Started Guide. I launched a fresh EC2 instance, installed (sudo pip install boto3) and configured (aws configure) the AWS SDK for Python. Then I used the code in the Python and DynamoDB section to create and populate a table with some data, and manually configured the table for 5 units each of read and write capacity.

I took a quick break in order to have clean, straight lines for the CloudWatch metrics so that I could show the effect of Auto Scaling. Here’s what the metrics look like before I started to apply a load:

I modified the code in Step 3 to continually issue queries for random years in the range of 1920 to 2007, ran a single copy of the code, and checked the read metrics a minute or two later:

The consumed capacity is higher than the provisioned capacity, resulting in a large number of throttled reads. Time for Auto Scaling!

I returned to the console and clicked on the Capacity tab for my table. Then I clicked on Read capacity, accepted the default values, and clicked on Save:

DynamoDB created a new IAM role (DynamoDBAutoscaleRole) and a pair of CloudWatch alarms to manage the Auto Scaling of read capacity:

DynamoDB Auto Scaling will manage the thresholds for the alarms, moving them up and down as part of the scaling process. The first alarm was triggered and the table state changed to Updating while additional read capacity was provisioned:

The change was visible in the read metrics within minutes:

I started a couple of additional copies of my modified query script and watched as additional capacity was provisioned, as indicated by the red line:

I killed all of the scripts and turned my attention to other things while waiting for the scale-down alarm to trigger. Here’s what I saw when I came back:

The next morning I checked my Scaling activities and saw that the alarm had triggered several more times overnight:

This was also visible in the metrics:

Until now, you would prepare for this situation by setting your read capacity well about your expected usage, and pay for the excess capacity (the space between the blue line and the red line). Or, you might set it too low, forget to monitor it, and run out of capacity when traffic picked up. With Auto Scaling you can get the best of both worlds: an automatic response when an increase in demand suggests that more capacity is needed, and another automated response when the capacity is no longer needed.

Things to Know
DynamoDB Auto Scaling is designed to accommodate request rates that vary in a somewhat predictable, generally periodic fashion. If you need to accommodate unpredictable bursts of read activity, you should use Auto Scaling in combination with DAX (read Amazon DynamoDB Accelerator (DAX) – In-Memory Caching for Read-Intensive Workloads to learn more). Also, the AWS SDKs will detect throttled read and write requests and retry them after a suitable delay.

I mentioned the DynamoDBAutoscaleRole earlier. This role provides Auto Scaling with the privileges that it needs to have in order for it to be able to scale your tables and indexes up and down. To learn more about this role and the permissions that it uses, read Grant User Permissions for DynamoDB Auto Scaling.

Auto Scaling has complete CLI and API support, including the ability to enable and disable the Auto Scaling policies. If you have some predictable, time-bound spikes in traffic, you can programmatically disable an Auto Scaling policy, provision higher throughput for a set period of time, and then enable Auto Scaling again later.

As noted on the Limits in DynamoDB page, you can increase provisioned capacity as often as you would like and as high as you need (subject to per-account limits that we can increase on request). You can decrease capacity up to nine times per day for each table or global secondary index.

You pay for the capacity that you provision, at the regular DynamoDB prices. You can also purchase DynamoDB Reserved Capacity to further savings.

Available Now
This feature is available now in all regions and you can start using it today!


Secure API Access with Amazon Cognito Federated Identities, Amazon Cognito User Pools, and Amazon API Gateway

Post Syndicated from Ed Lima original https://aws.amazon.com/blogs/compute/secure-api-access-with-amazon-cognito-federated-identities-amazon-cognito-user-pools-and-amazon-api-gateway/

Ed Lima, Solutions Architect


Our identities are what define us as human beings. Philosophical discussions aside, it also applies to our day-to-day lives. For instance, I need my work badge to get access to my office building or my passport to travel overseas. My identity in this case is attached to my work badge or passport. As part of the system that checks my access, these documents or objects help define whether I have access to get into the office building or travel internationally.

This exact same concept can also be applied to cloud applications and APIs. To provide secure access to your application users, you define who can access the application resources and what kind of access can be granted. Access is based on identity controls that can confirm authentication (AuthN) and authorization (AuthZ), which are different concepts. According to Wikipedia:


The process of authorization is distinct from that of authentication. Whereas authentication is the process of verifying that “you are who you say you are,” authorization is the process of verifying that “you are permitted to do what you are trying to do.” This does not mean authorization presupposes authentication; an anonymous agent could be authorized to a limited action set.

Amazon Cognito allows building, securing, and scaling a solution to handle user management and authentication, and to sync across platforms and devices. In this post, I discuss the different ways that you can use Amazon Cognito to authenticate API calls to Amazon API Gateway and secure access to your own API resources.


Amazon Cognito Concepts


It’s important to understand that Amazon Cognito provides three different services:

Today, I discuss the use of the first two. One service doesn’t need the other to work; however, they can be configured to work together.

Amazon Cognito Federated Identities

To use Amazon Cognito Federated Identities in your application, create an identity pool. An identity pool is a store of user data specific to your account. It can be configured to require an identity provider (IdP) for user authentication, after you enter details such as app IDs or keys related to that specific provider.

After the user is validated, the provider sends an identity token to Amazon Cognito Federated Identities. In turn, Amazon Cognito Federated Identities contacts the AWS Security Token Service (AWS STS) to retrieve temporary AWS credentials based on a configured, authenticated IAM role linked to the identity pool. The role has appropriate IAM policies attached to it and uses these policies to provide access to other AWS services.

Amazon Cognito Federated Identities currently supports the IdPs listed in the following graphic.


Continue reading Secure API Access with Amazon Cognito Federated Identities, Amazon Cognito User Pools, and Amazon API Gateway

AWS Greengrass – Run AWS Lambda Functions on Connected Devices

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-greengrass-run-aws-lambda-functions-on-connected-devices/

I first told you about AWS Greengrass in the post that I published during re:Invent (AWS Greengrass – Ubiquitous Real-World Computing). We launched a limited preview of Greengrass at that time and invited you to sign up if you were interested.

As I noted at the time, many AWS customers want to collect and process data out in the field, where connectivity is often slow and sometimes either intermittent or unreliable. Greengrass allows them to extend the AWS programming model to small, simple, field-based devices. It builds on AWS IoT and AWS Lambda, and supports access to the ever-increasing variety of services that are available in the AWS Cloud.

Greengrass gives you access to compute, messaging, data caching, and syncing services that run in the field, and that do not depend on constant, high-bandwidth connectivity to an AWS Region. You can write Lambda functions in Python 2.7 and deploy them to your Greengrass devices from the cloud while using device shadows to maintain state. Your devices and peripherals can talk to each other using local messaging that does not pass through the cloud.

Now Generally Available
Today we are making Greengrass generally available in the US East (Northern Virginia) and US West (Oregon) Regions. During the preview, AWS customers were able to get hands-on experience with Greengrass and to start building applications and businesses around it. I’ll share a few of these early successes later in this post.

The Greengrass Core code runs on each device. It allows you to deploy and run Lambda applications on the device, supports local MQTT messaging across a secure network, and also ensures that conversations between devices and the cloud are made across secure connections. The Greengrass Core also supports secure, over-the-air software updates, including Lambda functions. It includes a message broker, a Lambda runtime, a Thing Shadows implementation, and a deployment agent. Greengrass Core and (optionally) other devices make up a Greengrass Group. The group includes configuration data, the list of devices and the identity of the Greengrass Core, a list of Lambda functions, and a set of subscriptions that define where the messages should go. All of this information is copied to the Greengrass core devices during the deployment process.

Your Lambda functions can use APIs in three distinct SDKs:

AWS SDK for Python – This SDK allows your code to interact with Amazon Simple Storage Service (S3), Amazon DynamoDB, Amazon Simple Queue Service (SQS), and other AWS services.

AWS IoT Device SDK – This SDK (available for Node.js, Python, Java, and C++) helps you to connect your hardware devices to AWS IoT. The C++ SDK has a few extra features including access to the Greengrass Discovery Service and support for root CA downloads.

AWS Greengrass Core SDK – This SDK provides APIs that allow local invocation of other Lambda functions, publish messages, and work with thing shadows.

You can run the Greengrass Core on x86 and ARM devices that have version 4.4.11 (or newer) of the Linux kernel, with the OverlayFS and user namespace features enabled. While most deployments of Greengrass will be targeted at specialized, industrial-grade hardware, you can also run the Greengrass Core on a Raspberry Pi or an EC2 instance for development and test purposes.

For this post, I used a Raspberry Pi attached to a BrickPi, connected to my home network via WiFi:

The Raspberry Pi, the BrickPi, the case, and all of the other parts are available in the BrickPi 3 Starter Kit. You will need some Linux command-line expertise and a decent amount of manual dexterity to put all of this together, but if I did it then you surely can.

Greengrass in Action
I can access Greengrass from the Console, API, or CLI. I’ll use the Console. The intro page of the Greengrass Console lets me define groups, add Greengrass Cores, and add devices to my groups:

I click on Get Started and then on Use easy creation:

Then I name my group:

And name my first Greengrass Core:

I’m ready to go, so I click on Create Group and Core:

This runs for a few seconds and then offers up my security resources (two keys and a certificate) for downloading, along with the Greengrass Core:

I download the security resources and put them in a safe place, and select and download the desired version of the Greengrass Core software (ARMv7l for my Raspberry Pi), and click on Finish.

Now I power up my Pi, and copy the security resources and the software to it (I put them in an S3 bucket and pulled them down with wget). Here’s my shell history at that point:

Following the directions in the user guide, I create a new user and group, run the rpi-update script, and install several packages including sqlite3 and openssl. After a couple of reboots, I am ready to proceed!

Next, still following the directions, I untar the Greengrass Core software and move the security resources to their final destination (/greengrass/configuration/certs), giving them generic names along the way. Here’s what the directory looks like:

The next step is to associate the core with an AWS IoT thing. I return to the Console, click through the group and the Greengrass Core, and find the Thing ARN:

I insert the names of the certificates and the Thing ARN into the config.json file, and also fill in the missing sections of the iotHost and ggHost:

I start the Greengrass demon (this was my second attempt; I had a typo in one of my path names the first time around):

After all of this pleasant time at the command line (taking me back to my Unix v7 and BSD 4.2 days), it is time to go visual once again! I visit my AWS IoT dashboard and see that my Greengrass Core is making connections to IoT:

I go to the Lambda Console and create a Lambda function using the Python 2.7 runtime (the IAM role does not matter here):

I publish the function in the usual way and, hop over to the Greengrass Console, click on my group, and choose to add a Lambda function:

Then I choose the version to deploy:

I also configure the function to be long-lived instead of on-demand:

My code will publish messages to AWS IoT, so I create a subscription by specifying the source and destination:

I set up a topic filter (hello/world) on the subscription as well:

I confirm my settings and save my subscription and I am just about ready to deploy my code. I revisit my group, click on Deployments, and choose Deploy from the Actions menu:

I choose Automatic detection to move forward:

Since this is my first deployment, I need to create a service-level role that gives Greengrass permission to access other AWS services. I simply click on Grant permission:

I can see the status of each deployment:

The code is now running on my Pi! It publishes messages to topic hello/world; I can see them by going to the IoT Console, clicking on Test, and subscribing to the topic:

And here are the messages:

With all of the setup work taken care of, I can do iterative development by uploading, publishing, and deploying new versions of my code. I plan to use the BrickPi to control some LEGO Technic motors and to publish data collected from some sensors. Stay tuned for that post!

Greengrass Pricing
You can run the Greengrass Core on three devices free for one year as part of the AWS Free Tier. At the next level (3 to 10,000 devices) two options are available:

  • Pay as You Go – $0.16 per month per device.
  • Annual Commitment – $1.49 per year per device, a 17.5% savings.

If you want to run the Greengrass Core on more than 10,000 devices or make a longer commitment, please get in touch with us; details on all pricing models are on the Greengrass Pricing page.


Event: AWS Serverless Roadshow – Hands-on Workshops

Post Syndicated from Tara Walker original https://aws.amazon.com/blogs/aws/event-aws-serverless-roadshow-hands-on-workshops/

Surely, some of you have contemplated how you would survive the possible Zombie apocalypse or how you would build your exciting new startup to disrupt the transportation industry when Unicorn haven is uncovered. Well, there is no need to worry; I know just the thing to get you prepared to handle both of those scenarios: the AWS Serverless Computing Workshop Roadshow.

With the roadshow’s serverless workshops, you can get hands-on experience building serverless applications and microservices so you can rebuild what remains of our great civilization after a widespread viral infection causes human corpses to reanimate around the world in the AWS Zombie Microservices Workshop. In addition, you can give your startup a jump on the competition with the Wild Rydes workshop in order to revolutionize the transportation industry; just in time for a pilot’s crash landing leading the way to the discovery of abundant Unicorn pastures found on the outskirts of the female Amazonian warrior inhabited island of Themyscira also known as Paradise Island.

These free, guided hands-on workshops will introduce the basics of building serverless applications and microservices for common and uncommon scenarios using services like AWS Lambda, Amazon API Gateway, Amazon DynamoDB, Amazon S3, Amazon Kinesis, AWS Step Functions, and more. Let me share some advice before you decide to tackle Zombies and mount Unicorns – don’t forget to bring your laptop to the workshop and make sure you have an AWS account established and available for use for the event.

Check out the schedule below and get prepared today by registering for an upcoming workshop in a city near you. Remember these are workshops are completely free, so participation is on a first come, first served basis. So register and get there early, we need Zombie hunters and Unicorn riders across the globe.  Learn more about AWS Serverless Computing Workshops here and register for your city using links below.

Event Location Date
Wild Rydes New York Thursday, June 8
Wild Rydes Austin Thursday, June 22
Wild Rydes Santa Monica Thursday, July 20
Zombie Apocalypse Chicago Thursday, July 20
Wild Rydes Atlanta Tuesday, September 12
Zombie Apocalypse Dallas Tuesday, September 19


I look forward to fighting zombies and riding unicorns with you all.


Building High-Throughput Genomics Batch Workflows on AWS: Workflow Layer (Part 4 of 4)

Post Syndicated from Andy Katz original https://aws.amazon.com/blogs/compute/building-high-throughput-genomics-batch-workflows-on-aws-workflow-layer-part-4-of-4/

Aaron Friedman is a Healthcare and Life Sciences Partner Solutions Architect at AWS

Angel Pizarro is a Scientific Computing Technical Business Development Manager at AWS

This post is the fourth in a series on how to build a genomics workflow on AWS. In Part 1, we introduced a general architecture, shown below, and highlighted the three common layers in a batch workflow:

  • Job
  • Batch
  • Workflow

In Part 2, you built a Docker container for each job that needed to run as part of your workflow, and stored them in Amazon ECR.

In Part 3, you tackled the batch layer and built a scalable, elastic, and easily maintainable batch engine using AWS Batch. This solution took care of dynamically scaling your compute resources in response to the number of runnable jobs in your job queue length as well as managed job placement.

In part 4, you build out the workflow layer of your solution using AWS Step Functions and AWS Lambda. You then run an end-to-end genomic analysis―specifically known as exome secondary analysis―for many times at a cost of less than $1 per exome.

Step Functions makes it easy to coordinate the components of your applications using visual workflows. Building applications from individual components that each perform a single function lets you scale and change your workflow quickly. You can use the graphical console to arrange and visualize the components of your application as a series of steps, which simplify building and running multi-step applications. You can change and add steps without writing code, so you can easily evolve your application and innovate faster.

An added benefit of using Step Functions to define your workflows is that the state machines you create are immutable. While you can delete a state machine, you cannot alter it after it is created. For regulated workloads where auditing is important, you can be assured that state machines you used in production cannot be altered.

In this blog post, you will create a Lambda state machine to orchestrate your batch workflow. For more information on how to create a basic state machine, please see this Step Functions tutorial.

All code related to this blog series can be found in the associated GitHub repository here.

Build a state machine building block

To skip the following steps, we have provided an AWS CloudFormation template that can deploy your Step Functions state machine. You can use this in combination with the setup you did in part 3 to quickly set up the environment in which to run your analysis.

The state machine is composed of smaller state machines that submit a job to AWS Batch, and then poll and check its execution.

The steps in this building block state machine are as follows:

  1. A job is submitted.
    Each analytical module/job has its own Lambda function for submission and calls the batchSubmitJob Lambda function that you built in the previous blog post. You will build these specialized Lambda functions in the following section.
  2. The state machine queries the AWS Batch API for the job status.
    This is also a Lambda function.
  3. The job status is checked to see if the job has completed.
    If the job status equals SUCCESS, proceed to log the final job status. If the job status equals FAILED, end the execution of the state machine. In all other cases, wait 30 seconds and go back to Step 2.

Here is the JSON representing this state machine.

  "Comment": "A simple example that submits a Job to AWS Batch",
  "StartAt": "SubmitJob",
  "States": {
    "SubmitJob": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:<account-id>::function:batchSubmitJob",
      "Next": "GetJobStatus"
    "GetJobStatus": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:<account-id>:function:batchGetJobStatus",
      "Next": "CheckJobStatus",
      "InputPath": "$",
      "ResultPath": "$.status"
    "CheckJobStatus": {
      "Type": "Choice",
      "Choices": [
          "Variable": "$.status",
          "StringEquals": "FAILED",
          "End": true
          "Variable": "$.status",
          "StringEquals": "SUCCEEDED",
          "Next": "GetFinalJobStatus"
      "Default": "Wait30Seconds"
    "Wait30Seconds": {
      "Type": "Wait",
      "Seconds": 30,
      "Next": "GetJobStatus"
    "GetFinalJobStatus": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:<account-id>:function:batchGetJobStatus",
      "End": true

Building the Lambda functions for the state machine

You need two basic Lambda functions for this state machine. The first one submits a job to AWS Batch and the second checks the status of the AWS Batch job that was submitted.

In AWS Step Functions, you specify an input as JSON that is read into your state machine. Each state receives the aggregate of the steps immediately preceding it, and you can specify which components a state passes on to its children. Because you are using Lambda functions to execute tasks, one of the easiest routes to take is to modify the input JSON, represented as a Python dictionary, within the Lambda function and return the entire dictionary back for the next state to consume.

Building the batchSubmitIsaacJob Lambda function

For Step 1 above, you need a Lambda function for each of the steps in your analysis workflow. As you created a generic Lambda function in the previous post to submit a batch job (batchSubmitJob), you can use that function as the basis for the specialized functions you’ll include in this state machine. Here is such a Lambda function for the Isaac aligner.

from __future__ import print_function

import boto3
import json
import traceback

lambda_client = boto3.client('lambda')

def lambda_handler(event, context):
        # Generate output put
        bam_s3_path = '/'.join([event['resultsS3Path'], event['sampleId'], 'bam/'])

        depends_on = event['dependsOn'] if 'dependsOn' in event else []

        # Generate run command
        command = [
            '--bam_s3_folder_path', bam_s3_path,
            '--fastq1_s3_path', event['fastq1S3Path'],
            '--fastq2_s3_path', event['fastq2S3Path'],
            '--reference_s3_path', event['isaac']['referenceS3Path'],
            '--working_dir', event['workingDir']

        if 'cmdArgs' in event['isaac']:
            command.extend(['--cmd_args', event['isaac']['cmdArgs']])
        if 'memory' in event['isaac']:
            command.extend(['--memory', event['isaac']['memory']])

        # Submit Payload
        response = lambda_client.invoke(
                    'command': command,
                jobName='-'.join(['isaac', event['sampleId']]),

        response_payload = response['Payload'].read()

        # Update event
        event['bamS3Path'] = bam_s3_path
        event['jobId'] = json.loads(response_payload)['jobId']
        return event
    except Exception as e:
        raise e

In the Lambda console, create a Python 2.7 Lambda function named batchSubmitIsaacJob and paste in the above code. Use the LambdaBatchExecutionRole that you created in the previous post. For more information, see Step 2.1: Create a Hello World Lambda Function.

This Lambda function reads in the inputs passed to the state machine it is part of, formats the data for the batchSubmitJob Lambda function, invokes that Lambda function, and then modifies the event dictionary to pass onto the subsequent states. You can repeat these for each of the other tools, which can be found in the tools//lambda/lambda_function.py script in the GitHub repo.

Building the batchGetJobStatus Lambda function

For Step 2 above, the process queries the AWS Batch DescribeJobs API action with jobId to identify the state that the job is in. You can put this into a Lambda function to integrate it with Step Functions.

In the Lambda console, create a new Python 2.7 function with the LambdaBatchExecutionRole IAM role. Name your function batchGetJobStatus and paste in the following code. This is similar to the batch-get-job-python27 Lambda blueprint.

from __future__ import print_function

import boto3
import json

print('Loading function')

batch_client = boto3.client('batch')

def lambda_handler(event, context):
    # Log the received event
    print("Received event: " + json.dumps(event, indent=2))
    # Get jobId from the event
    job_id = event['jobId']

        response = batch_client.describe_jobs(
        job_status = response['jobs'][0]['status']
        return job_status
    except Exception as e:
        message = 'Error getting Batch Job status'
        raise Exception(message)

Structuring state machine input

You have structured the state machine input so that general file references are included at the top-level of the JSON object, and any job-specific items are contained within a nested JSON object. At a high level, this is what the input structure looks like:

        "general_field_1": "value1",
        "general_field_2": "value2",
        "general_field_3": "value3",
        "job1": {},
        "job2": {},
        "job3": {}

Building the full state machine

By chaining these state machine components together, you can quickly build flexible workflows that can process genomes in multiple ways. The development of the larger state machine that defines the entire workflow uses four of the above building blocks. You use the Lambda functions that you built in the previous section. Rename each building block submission to match the tool name.

We have provided a CloudFormation template to deploy your state machine and the associated IAM roles. In the CloudFormation console, select Create Stack, choose your template (deploy_state_machine.yaml), and enter in the ARNs for the Lambda functions you created.

Continue through the rest of the steps and deploy your stack. Be sure to check the box next to "I acknowledge that AWS CloudFormation might create IAM resources."

Once the CloudFormation stack is finished deploying, you should see the following image of your state machine.

In short, you first submit a job for Isaac, which is the aligner you are using for the analysis. Next, you use parallel state to split your output from "GetFinalIsaacJobStatus" and send it to both your variant calling step, Strelka, and your QC step, Samtools Stats. These then are run in parallel and you annotate the results from your Strelka step with snpEff.

Putting it all together

Now that you have built all of the components for a genomics secondary analysis workflow, test the entire process.

We have provided sequences from an Illumina sequencer that cover a region of the genome known as the exome. Most of the positions in the genome that we have currently associated with disease or human traits reside in this region, which is 1–2% of the entire genome. The workflow that you have built works for both analyzing an exome, as well as an entire genome.

Additionally, we have provided prebuilt reference genomes for Isaac, located at:


If you are interested, we have provided a script that sets up all of that data. To execute that script, run the following command on a large EC2 instance:

make reference REGISTRY=<your-ecr-registry>

Indexing and preparing this reference takes many hours on a large-memory EC2 instance. Be careful about the costs involved and note that the data is available through the prebuilt reference genomes.

Starting the execution

In a previous section, you established a provenance for the JSON that is fed into your state machine. For ease, we have auto-populated the input JSON for you to the state machine. You can also find this in the GitHub repo under workflow/test.input.json:

  "fastq1S3Path": "s3://aws-batch-genomics-resources/fastq/SRR1919605_1.fastq.gz",
  "fastq2S3Path": "s3://aws-batch-genomics-resources/fastq/SRR1919605_2.fastq.gz",
  "referenceS3Path": "s3://aws-batch-genomics-resources/reference/hg38.fa",
  "resultsS3Path": "s3://<bucket>/genomic-workflow/results",
  "sampleId": "NA12878_states_1",
  "workingDir": "/scratch",
  "isaac": {
    "jobDefinition": "isaac-myenv:1",
    "jobQueue": "arn:aws:batch:us-east-1:<account-id>:job-queue/highPriority-myenv",
    "referenceS3Path": "s3://aws-batch-genomics-resources/reference/isaac/"
  "samtoolsStats": {
    "jobDefinition": "samtools_stats-myenv:1",
    "jobQueue": "arn:aws:batch:us-east-1:<account-id>:job-queue/lowPriority-myenv"
  "strelka": {
    "jobDefinition": "strelka-myenv:1",
    "jobQueue": "arn:aws:batch:us-east-1:<account-id>:job-queue/highPriority-myenv",
    "cmdArgs": " --exome "
  "snpEff": {
    "jobDefinition": "snpeff-myenv:1",
    "jobQueue": "arn:aws:batch:us-east-1:<account-id>:job-queue/lowPriority-myenv",
    "cmdArgs": " -t hg38 "

You are now at the stage to run your full genomic analysis. Copy the above to a new text file, change paths and ARNs to the ones that you created previously, and save your JSON input as input.states.json.

In the CLI, execute the following command. You need the ARN of the state machine that you created in the previous post:

aws stepfunctions start-execution --state-machine-arn <your-state-machine-arn> --input file://input.states.json

Your analysis has now started. By using Spot Instances with AWS Batch, you can quickly scale out your workflows while concurrently optimizing for cost. While this is not guaranteed, most executions of the workflows presented here should cost under $1 for a full analysis.

Monitoring the execution

The output from the above CLI command gives you the ARN that describes the specific execution. Copy that and navigate to the Step Functions console. Select the state machine that you created previously and paste the ARN into the search bar.

The screen shows information about your specific execution. On the left, you see where your execution currently is in the workflow.

In the following screenshot, you can see that your workflow has successfully completed the alignment job and moved onto the subsequent steps, which are variant calling and generating quality information about your sample.

You can also navigate to the AWS Batch console and see that progress of all of your jobs reflected there as well.

Finally, after your workflow has completed successfully, check out the S3 path to which you wrote all of your files. If you run a ls –recursive command on the S3 results path, specified in the input to your state machine execution, you should see something similar to the following:

2017-05-02 13:46:32 6475144340 genomic-workflow/results/NA12878_run1/bam/sorted.bam
2017-05-02 13:46:34    7552576 genomic-workflow/results/NA12878_run1/bam/sorted.bam.bai
2017-05-02 13:46:32         45 genomic-workflow/results/NA12878_run1/bam/sorted.bam.md5
2017-05-02 13:53:20      68769 genomic-workflow/results/NA12878_run1/stats/bam_stats.dat
2017-05-02 14:05:12        100 genomic-workflow/results/NA12878_run1/vcf/stats/runStats.tsv
2017-05-02 14:05:12        359 genomic-workflow/results/NA12878_run1/vcf/stats/runStats.xml
2017-05-02 14:05:12  507577928 genomic-workflow/results/NA12878_run1/vcf/variants/genome.S1.vcf.gz
2017-05-02 14:05:12     723144 genomic-workflow/results/NA12878_run1/vcf/variants/genome.S1.vcf.gz.tbi
2017-05-02 14:05:12  507577928 genomic-workflow/results/NA12878_run1/vcf/variants/genome.vcf.gz
2017-05-02 14:05:12     723144 genomic-workflow/results/NA12878_run1/vcf/variants/genome.vcf.gz.tbi
2017-05-02 14:05:12   30783484 genomic-workflow/results/NA12878_run1/vcf/variants/variants.vcf.gz
2017-05-02 14:05:12    1566596 genomic-workflow/results/NA12878_run1/vcf/variants/variants.vcf.gz.tbi

Modifications to the workflow

You have now built and run your genomics workflow. While diving deep into modifications to this architecture are beyond the scope of these posts, we wanted to leave you with several suggestions of how you might modify this workflow to satisfy additional business requirements.

  • Job tracking with Amazon DynamoDB
    In many cases, such as if you are offering Genomics-as-a-Service, you might want to track the state of your jobs with DynamoDB to get fine-grained records of how your jobs are running. This way, you can easily identify the cost of individual jobs and workflows that you run.
  • Resuming from failure
    Both AWS Batch and Step Functions natively support job retries and can cover many of the standard cases where a job might be interrupted. There may be cases, however, where your workflow might fail in a way that is unpredictable. In this case, you can use custom error handling with AWS Step Functions to build out a workflow that is even more resilient. Also, you can build in fail states into your state machine to fail at any point, such as if a batch job fails after a certain number of retries.
  • Invoking Step Functions from Amazon API Gateway
    You can use API Gateway to build an API that acts as a "front door" to Step Functions. You can create a POST method that contains the input JSON to feed into the state machine you built. For more information, see the Implementing Serverless Manual Approval Steps in AWS Step Functions and Amazon API Gateway blog post.


While the approach we have demonstrated in this series has been focused on genomics, it is important to note that this can be generalized to nearly any high-throughput batch workload. We hope that you have found the information useful and that it can serve as a jump-start to building your own batch workloads on AWS with native AWS services.

For more information about how AWS can enable your genomics workloads, be sure to check out the AWS Genomics page.

Other posts in this four-part series:

Please leave any questions and comments below.

New Features for IAM Policy Summaries – An Easier Way to Detect Potential Typos in Your IAM Policies

Post Syndicated from Joy Chatterjee original https://aws.amazon.com/blogs/security/new-features-for-iam-policy-summaries-an-easier-way-to-detect-potential-typos-in-your-iam-policies/

Last month, we introduced policy summaries to make it easier for you to understand the permissions in your AWS Identity and Access Management (IAM) policies. On Thursday, May 25, I announced three new features that have been added to policy summaries and reviewed resource summaries. Yesterday, I reviewed the benefits of being able to view services and actions that are implicitly denied by a policy.

Today, I demonstrate how policy summaries make it easier for you to detect potential typos in your policies by showing you unrecognized services and actions. In this post, I show how this new feature can help you detect and fix potential typos in your policies.

Unrecognized services and actions

You can now use policy summaries to see unrecognized services and actions. One key benefit of this feature is that it helps you find possible typos in a policy. Let’s say your developer, Bob, creates a policy granting full List and Read permissions to some Amazon S3 buckets and full access to Amazon DynamoDB. Unfortunately, when testing the policy, Bob sees “Access denied” messages when he tries to use those services. To troubleshoot, Bob returns to the IAM console to review the policy summary. Bob sees that he inadvertently misspelled “DynamoDB” as “DynamoBD” (reversing the position of the last two letters) in the policy and notices that he does not have all of the list permissions for S3.

Screenshot showing the "dynamobd" typo

When Bob chooses S3, he sees that ListBuckets is an unrecognized action, as shown in the following screenshot.

Screenshot showing that ListBuckets is not recognized by IAM

Bob chooses Show remaining 26 and realizes the correct action is s3:ListBucket and not s3:ListBuckets. He also confirms this by looking at the list of actions for S3.

Screenshot showing the true action name

Bob fixes the mistakes by choosing the Edit policy button, making the necessary updates, and saving the changes. He returns to the policy summary and sees that the policy no longer has unrecognized services and actions.


If you have a service or action that appears in the Unrecognized services or Unrecognized actions section of the policy summary, it may be because the service is in preview mode. If you think a service or action should be recognized, please submit feedback by choosing the Feedback link located in the bottom left corner of the IAM console.


Policy summaries make it easier to troubleshoot possible errors in policies. The newest updates I have explored this week on the AWS Security Blog make it easy to understand the resources defined in a policy, show the services and actions that are implicitly denied by a policy, and help you troubleshoot possible typos in a policy. To see policy summaries in your AWS account, sign in to the IAM console and navigate to any policy on the Policies page of the IAM console or the Permissions tab on a user’s page.

If you have comments about this post, submit them in the “Comments” section below. If you have questions about or suggestions for this solution, start a new thread on the IAM forum.

– Joy