
SAML for Your Serverless JavaScript Application: Part II

Post Syndicated from Bryan Liston original https://aws.amazon.com/blogs/compute/saml-for-your-serverless-javascript-application-part-ii/

Contributors: Richard Threlkeld, Gene Ting, Stefano Buliani

The full code for both scenarios, including SAM templates, can be found in the samljs-serverless-sample GitHub repository. We highly recommend that you use the SAM templates in the GitHub repository to create the resources; optionally, you can create them manually.


This is the second part of a two-part series on using SAML providers in your application and receiving short-term credentials to access AWS services. These credentials can be limited with IAM roles so the users of the applications can perform actions like fetching data from databases or uploading files based on their level of authorization. For example, you may want to build a JavaScript application that allows a user to authenticate against Active Directory Federation Services (ADFS). The user can be granted scoped AWS credentials to invoke an API to display information in the application or write to an Amazon DynamoDB table.

Part I of this series walked through a client-side flow of retrieving SAML claims and passing them to Amazon Cognito to retrieve credentials. This blog post will take you through a more advanced scenario where logic can be moved to the backend for a more comprehensive and flexible solution.

Prerequisites

As in Part I of this series, you need ADFS running in your environment. The following configurations are used for reference:

  1. ADFS federated with the AWS console. For a walkthrough with an AWS CloudFormation template, see Enabling Federation to AWS Using Windows Active Directory, ADFS, and SAML 2.0.
  2. Verify that you can authenticate with user example\bob for both the ADFS-Dev and ADFS-Production groups via the sign-in page.
  3. Create an Amazon Cognito identity pool.

Scenario Overview

The scenario in the last blog post may be sufficient for many organizations but, due to size restrictions, some browsers may drop part or all of a query string when sending a large number of claims in the SAMLResponse. Additionally, for auditing and logging reasons, you may wish to relay SAML assertions via POST only and perform parsing in the backend before sending credentials to the client. This scenario allows you to perform custom business logic and validation, as well as put tracking controls in place.

In this post, we want to show you how these requirements can be achieved in a Serverless application. We also show how different challenges (like XML parsing and JWT exchange) can be addressed in a Serverless application design. Feel free to mix and match, or swap pieces around to suit your needs.

This scenario uses the following services and features:

  • Cognito for unique ID generation and default role mapping
  • S3 for static website hosting
  • API Gateway for receiving the SAMLResponse POST from ADFS
  • Lambda for processing the SAML assertion using a native XML parser
  • DynamoDB conditional writes for session tracking exceptions
  • STS for credentials via Lambda
  • KMS for signing JWT tokens
  • API Gateway custom authorizers for controlling per-session access to credentials, using JWT tokens that were signed with KMS keys
  • JavaScript-generated SDK from API Gateway using a service proxy to DynamoDB
  • RelayState in the SAMLRequest to ADFS to transmit the CognitoID and a short code from the client to your AWS backend

At a high level, this solution is similar to that of Scenario 1; however, most of the work is done in the infrastructure rather than on the client.

  • ADFS still uses a POST binding to redirect the SAMLResponse to API Gateway; however, the Lambda function does not immediately redirect.
  • The Lambda function decodes and uses an XML parser to read the properties of the SAML assertion.
  • If the user’s assertion shows that they belong to a certain group matching a specified string (“Prod” in the sample), then you assign a role that they can assume (“ADFS-Production”).
  • Lambda then gets the credentials on behalf of the user and stores them in DynamoDB as well as logging an audit record in a separate table.
  • Lambda then returns a short-lived, signed JSON Web Token (JWT) to the JavaScript application.
  • The application uses the JWT to get their stored credentials from DynamoDB through an API Gateway custom authorizer.

The architecture you build in this tutorial is outlined in the following diagram.

lambdasamltwo_1.png

First, a user visits your static website hosted on S3. They generate an ephemeral random code that is transmitted during redirection to ADFS, where they are prompted for their Active Directory credentials.

Upon successful authentication, the ADFS server redirects the SAMLResponse assertion, along with the code (as the RelayState) via POST to API Gateway.

The Lambda function parses the SAMLResponse. If the user is part of the appropriate Active Directory group (AWS-Production in this tutorial), it retrieves credentials from STS on behalf of the user.

The credentials are stored in a DynamoDB table called SAMLSessions, along with the short code. The user login is stored in a tracking table called SAMLUsers.

The Lambda function generates a JWT token, with a 30-second expiration time signed with KMS, then redirects the client back to the static website along with this token.

The client then makes a call to an API Gateway resource acting as a DynamoDB service proxy that retrieves the credentials via a DeleteItem call. To make this call, the client passes the JWT in the authorization header.

A custom authorizer runs to validate the token using the KMS key again as well as the original random code.

Now that the client has credentials, it can use these to access AWS resources.

Tutorial: Backend processing and audit tracking

Before you walk through this tutorial, you will need the source code from the samljs-serverless-sample GitHub repository. You should use the SAM template provided in order to streamline the process, but we'll also outline how you would manually create the resources. There is a README in the repository with instructions for using the SAM template. Either way, you will still perform the manual steps of KMS key configuration, ADFS enablement of RelayState, and Amazon Cognito identity pool creation. The template automates the creation of the S3 website, Lambda functions, API Gateway resources, and DynamoDB tables.

We walk through the details of all the steps and configuration below for illustrative purposes in this tutorial, calling out the sections that can be omitted if you used the SAM template.

KMS key configuration

To sign JWT tokens, you need an encrypted plaintext key, to be stored in KMS. You will need to complete this step even if you use the SAM template.

  1. In the IAM console, choose Encryption Keys, Create Key.
  2. For Alias, type sessionMaster.
  3. For Advanced Options, choose KMS, Next Step.
  4. For Key Administrative Permissions, select your administrative role or user account.
  5. For Key Usage Permissions, you can leave this blank as the IAM Role (next section) will have individual key actions configured. This allows you to perform administrative actions on the set of keys while the Lambda functions have rights to just create data keys for encryption/decryption and use them to sign JWTs.
  6. Take note of the Key ID, which is needed for the Lambda functions.

IAM role configuration

You will need an IAM role for executing your Lambda functions. If you are using the SAM template, this step can be skipped: the sample code in the GitHub repository under Scenario2 creates separate roles for each function, with limited permissions on individual resources. We recommend separate roles scoped to individual resources for production deployments. Your Lambda functions need the following permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1432927122000",
            "Effect": "Allow",
            "Action": [
                "dynamodb:PutItem",
                "dynamodb:GetItem",
                "dynamodb:DeleteItem",
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents",
                "kms:GenerateDataKey*",
                "kms:Decrypt"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}

Lambda function configuration

If you are not using the SAM template, create the following three Lambda functions from the GitHub repository in /Scenario2/lambda using the following names and environment variables. The Lambda functions are written in Node.js.

  • GenerateKey_awslabs_samldemo
  • ProcessSAML_awslabs_samldemo
  • SAMLCustomAuth_awslabs_samldemo

These functions need to be built, packaged, and uploaded to Lambda. For two of the functions, this can be done from your workstation (the sample commands for each function assume OS X or Linux). The third needs to be built on an Amazon EC2 instance running the current Lambda AMI.

GenerateKey_awslabs_samldemo

This function is only used one time to create keys in KMS for signing JWT tokens. The function calls GenerateDataKey and stores the encrypted CipherText blob as Base64 in DynamoDB. This is used by the other two functions for getting the PlainTextKey for signing with a Decrypt operation.
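For illustration, here is a minimal sketch of what this one-time function can look like; the item attribute names and the encryption-context shape are assumptions rather than the sample's exact code.

var AWS = require('aws-sdk');
var kms = new AWS.KMS();
var ddb = new AWS.DynamoDB();

exports.handler = function(event, context) {
    // Ask KMS for a data key under the sessionMaster CMK; only the encrypted blob is persisted
    kms.generateDataKey({
        KeyId: process.env.KMS_KEY_ID,
        KeySpec: 'AES_256',
        EncryptionContext: { context: process.env.ENC_CONTEXT }   // must match the later Decrypt calls
    }, function(err, data) {
        if (err) return context.fail(err);
        ddb.putItem({
            TableName: process.env.SESSION_DDB_TABLE,
            Item: {
                identityhash: { S: process.env.RAND_HASH },
                ciphertext: { S: data.CiphertextBlob.toString('base64') }   // never store data.Plaintext
            }
        }, function(err2) {
            if (err2) return context.fail(err2);
            context.succeed('Encrypted signing key stored');
        });
    });
};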

This function only requires a single file. It has the following environment variables:

  • KMS_KEY_ID: Unique identifier from KMS for your sessionMaster Key
  • SESSION_DDB_TABLE: SAMLSessions
  • ENC_CONTEXT: ADFS (or something unique to your organization)
  • RAND_HASH: us-east-1:XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX

Navigate into /Scenario2/lambda/GenerateKey and run the following commands:

zip -r generateKey.zip .

aws lambda create-function --function-name GenerateKey_awslabs_samldemo --runtime nodejs4.3 --role LAMBDA_ROLE_ARN --handler index.handler --timeout 10 --memory-size 512 --zip-file fileb://generateKey.zip --environment Variables={SESSION_DDB_TABLE=SAMLSessions,ENC_CONTEXT=ADFS,RAND_HASH=us-east-1:XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX,KMS_KEY_ID=<KMS key ID>}

SAMLCustomAuth_awslabs_samldemo

This is an API Gateway custom authorizer that is invoked after the client has been redirected back to the website as part of the login workflow, when the client calls a GET against the service proxy to DynamoDB to retrieve its credentials. The function uses the KMS key to validate the signature of the JWT created in the ProcessSAML_awslabs_samldemo function and also validates the random code that was generated at the beginning of the login workflow.
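As a rough sketch, the authorizer flow could look like the following; the table attribute name, claim names, and policy shape are assumptions, not the sample's exact implementation.

var AWS = require('aws-sdk');
var nJwt = require('njwt');
var kms = new AWS.KMS();
var ddb = new AWS.DynamoDB();

exports.handler = function(event, context) {
    // 1. Fetch the encrypted signing key and decrypt it with KMS
    ddb.getItem({
        TableName: process.env.SESSION_DDB_TABLE,
        Key: { identityhash: { S: process.env.ID_HASH } }
    }, function(err, data) {
        if (err) return context.fail('Unauthorized');
        kms.decrypt({
            CiphertextBlob: new Buffer(data.Item.ciphertext.S, 'base64'),
            EncryptionContext: { context: process.env.ENC_CONTEXT }
        }, function(err2, key) {
            if (err2) return context.fail('Unauthorized');
            // 2. Verify the JWT passed in the Authorization header
            nJwt.verify(event.authorizationToken, key.Plaintext, function(err3, verified) {
                if (err3) return context.fail('Unauthorized');
                // (the sample also compares the original random code claim at this point)
                // 3. Allow this caller to invoke the GET service proxy method
                context.succeed({
                    principalId: verified.body.sub,
                    policyDocument: {
                        Version: '2012-10-17',
                        Statement: [{
                            Effect: 'Allow',
                            Action: 'execute-api:Invoke',
                            Resource: event.methodArn
                        }]
                    }
                });
            });
        });
    });
};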

You must install the dependencies before zipping this function up. It has the following environment variables:

  • SESSION_DDB_TABLE: SAMLSessions
  • ENC_CONTEXT: ADFS (or whatever was used in GenerateKey_awslabs_samldemo)
  • ID_HASH: us-east-1:XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX

Navigate into /Scenario2/lambda/CustomAuth and run:

npm install

zip -r custom_auth.zip .

aws lambda create-function --function-name SAMLCustomAuth_awslabs_samldemo --runtime nodejs4.3 --role LAMBDA_ROLE_ARN --handler CustomAuth.handler --timeout 10 --memory-size 512 --zip-file fileb://custom_auth.zip --environment Variables={SESSION_DDB_TABLE=SAMLSessions,ENC_CONTEXT=ADFS,ID_HASH=us-east-1:XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}

ProcessSAML_awslabs_samldemo

This function is called when ADFS sends the SAMLResponse to API Gateway. The function parses the SAML assertion to select a role (based on a simple string search) and extract user information. It then uses this data to get short-term credentials from STS via AssumeRoleWithSAML and stores this information in a SAMLSessions table and tracks the user login via a SAMLUsers table. Both of these are DynamoDB tables but you could also store the user information in another AWS database type, as this is for auditing purposes. Finally, this function creates a JWT (signed with the KMS key) which is only valid for 30 seconds and is returned to the client as part of a 302 redirect from API Gateway.
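Condensed to its essentials, the flow can be sketched as follows; the XPath expression, role-claim handling, item attribute names, and redirect query string are illustrative assumptions rather than a copy of ProcessSAML.js.

var AWS = require('aws-sdk');
var libxmljs = require('libxmljs');
var nJwt = require('njwt');
var sts = new AWS.STS();
var kms = new AWS.KMS();
var ddb = new AWS.DynamoDB();

exports.handler = function(event, context) {
    // The SAMLResponse arrives Base64-encoded (it may also need URL-decoding depending on the binding)
    var assertion = new Buffer(event.SAMLResponse, 'base64').toString('utf8');
    var doc = libxmljs.parseXml(assertion);
    var ns = { saml: 'urn:oasis:names:tc:SAML:2.0:assertion' };

    // Each Role attribute value holds a role ARN and a provider ARN; pick the pair containing "Prod"
    var selectedRole = null;
    doc.find('//saml:AttributeValue', ns).forEach(function(node) {
        if (!selectedRole && node.text().indexOf('Prod') > -1) {
            var parts = node.text().split(',');
            selectedRole = parts[0].indexOf(':role/') > -1 ? parts[0] : parts[1];
        }
    });
    if (!selectedRole) return context.fail('No matching role in assertion');

    sts.assumeRoleWithSAML({
        PrincipalArn: process.env.PRINCIPAL_ARN,
        RoleArn: selectedRole,
        SAMLAssertion: event.SAMLResponse
    }, function(err, data) {
        if (err) return context.fail(err);
        // Look up the encrypted signing key, decrypt it, and issue a 30-second JWT
        ddb.getItem({
            TableName: process.env.SESSION_DDB_TABLE,
            Key: { identityhash: { S: process.env.ID_HASH } }
        }, function(err2, item) {
            if (err2) return context.fail(err2);
            kms.decrypt({
                CiphertextBlob: new Buffer(item.Item.ciphertext.S, 'base64'),
                EncryptionContext: { context: process.env.ENC_CONTEXT }
            }, function(err3, key) {
                if (err3) return context.fail(err3);
                // ...store data.Credentials in SAMLSessions and log the login in SAMLUsers here...
                var jwt = nJwt.create({ sub: event.RelayState }, key.Plaintext);
                jwt.setExpiration(new Date().getTime() + 30 * 1000);
                // API Gateway maps "location" into the Location header of the 302 response
                context.succeed({ location: process.env.REDIRECT_URL + '?token=' + jwt.compact() });
            });
        });
    });
};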

This function needs to be built on an EC2 server running Amazon Linux. It leverages two main external libraries:

  • nJwt: Used for secure JWT creation for individual client sessions to get access to their records
  • libxmljs: Used for XML XPath queries of the decoded SAMLResponse from AD FS

libxmljs uses native build tools, so you should build it on an EC2 instance running the same AMI as Lambda with Node.js v4.3.2; otherwise, you might see errors. For more information about the current Lambda AMI, see Lambda Execution Environment and Available Libraries.

After you have the correct AMI launched in EC2 and have SSH open to that host, install Node.js. Ensure that the Node.js version on EC2 is 4.3.2, to match Lambda. If your version is off, you can roll back with NVM.

After you have set up Node.js, run the following command:

yum install -y make gcc*

Now, create a /saml folder on your EC2 server and copy up ProcessSAML.js and package.json from /Scenario2/lambda/ProcessSAML to the EC2 server. Here is a sample SCP command:

cd ProcessSAML/

ls

package.json    ProcessSAML.js

scp -i ~/path/yourpemfile.pem ./* ec2-user@<EC2 public DNS or IP>:/home/ec2-user/saml/

Then you can SSH to your server, cd into the /saml directory, and run:

npm install

A successful build should look similar to the following:

lambdasamltwo_2.png

Finally, zip up the package and create the function using the following AWS CLI command and these environment variables. Configure the CLI with your credentials as needed.

  • SESSION_DDB_TABLE: SAMLSessions
  • ENC_CONTEXT: ADFS (or whatever was used in GenerateKey_awslabs_samldemo)
  • PRINCIPAL_ARN: Full ARN of the AD FS IdP created in the IAM console
  • USER_DDB_TABLE: SAMLUsers
  • REDIRECT_URL: Endpoint URL of your static S3 website (or CloudFront distribution domain name if you did that optional step)
  • ID_HASH: us-east-1:XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX

zip -r saml.zip .

aws lambda create-function --function-name ProcessSAML_awslabs_samldemo --runtime nodejs4.3 --role LAMBDA_ROLE_ARN --handler ProcessSAML.handler --timeout 10 --memory-size 512 --zip-file fileb://saml.zip --environment Variables={USER_DDB_TABLE=SAMLUsers,SESSION_DDB_TABLE=SAMLSessions,REDIRECT_URL=<your S3 bucket and test page path>,ID_HASH=us-east-1:XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX,ENC_CONTEXT=ADFS,PRINCIPAL_ARN=<your ADFS IdP ARN>}

If you built the first two functions on your workstation and created the ProcessSAML_awslabs_samldemo function separately in the Lambda console before building on EC2, you can update the code after building with the following command:

aws lambda update-function-code --function-name ProcessSAML_awslabs_samldemo --zip-file fileb://saml.zip

Role trust policy configuration

This scenario uses STS directly to assume a role. You will need to complete this step even if you use the SAM template. Modify the trust policy, as you did before when Amazon Cognito was assuming the role. In the GitHub repository sample code, ProcessSAML.js is preconfigured to filter and select a role with “Prod” in the name via the selectedRole variable.

This is an example of business logic you can alter in your organization later, such as a callout to an external mapping database for other rules matching. In this tutorial, it corresponds to the ADFS-Production role that was created.

  1. In the IAM console, choose Roles and open the ADFS-Production Role.
  2. Edit the Trust Permissions field and replace the content with the following:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Federated": [
              "arn:aws:iam::ACCOUNTNUMBER:saml-provider/ADFS"
            ]
          },
          "Action": "sts:AssumeRoleWithSAML"
        }
      ]
    }

If you end up using another role (or add more complex filtering/selection logic), ensure that those roles have similar trust policy configurations. Also note that the sample policy above purposely uses an array for the federated provider matching the IdP ARN that you added. If your environment has multiple SAML providers, you could list them here and modify the code in ProcessSAML.js to process requests from different IdPs and grant or revoke credentials accordingly.
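For example, a hypothetical lookup keyed on the assertion's Issuer element might look like the following; the issuer URLs and provider ARNs are placeholders, not values from the sample.

// Hypothetical only: choose the IdP ARN based on the assertion's Issuer element
var principalArnByIssuer = {
    'http://adfs.example.com/adfs/services/trust': 'arn:aws:iam::ACCOUNTNUMBER:saml-provider/ADFS',
    'https://idp.example.edu/idp/shibboleth': 'arn:aws:iam::ACCOUNTNUMBER:saml-provider/Shibboleth'
};

function principalArnForAssertion(doc) {
    // doc is the libxmljs document already parsed from the decoded SAMLResponse
    var ns = { saml: 'urn:oasis:names:tc:SAML:2.0:assertion' };
    var issuer = doc.get('//saml:Issuer', ns);
    return issuer ? principalArnByIssuer[issuer.text()] : null;
}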

DynamoDB table creation

If you are not using the SAM template, create two DynamoDB tables:

  • SAMLSessions: Temporarily stores credentials from STS. Credentials are removed by an API Gateway Service Proxy to the DynamoDB DeleteItem call that simultaneously returns the credentials to the client.
  • SAMLUsers: This table is for tracking user information and the last time they authenticated in the system via ADFS.

The following AWS CLI commands create the tables (indexed only with a primary hash key, called identityhash and CognitoID respectively):

aws dynamodb create-table \
    --table-name SAMLSessions \
    --attribute-definitions \
        AttributeName=identityhash,AttributeType=S \
    --key-schema AttributeName=identityhash,KeyType=HASH \
    --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5
aws dynamodb create-table \
    --table-name SAMLUsers \
    --attribute-definitions \
        AttributeName=CognitoID,AttributeType=S \
    --key-schema AttributeName=CognitoID,KeyType=HASH \
    --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5

After the tables are created, you should be able to run the GenerateKey_awslabs_samldemo Lambda function and see a CipherText key stored in SAMLSessions. This is only for convenience of this post, to demonstrate that you should persist CipherText keys in a data store and never persist plaintext keys that have been decrypted. You should also never log plaintext keys in your code.

API Gateway configuration

If you are not using the SAM template, you will need to create API Gateway resources. If you have created resources for Scenario 1 in Part I, then the naming of these resources may be similar. If that is the case, then simply create an API with a different name (SAMLAuth2 or similar) and follow these steps accordingly.

  1. In the API Gateway console for your API, choose Authorizers, Custom Authorizer.
  2. Select your region and enter SAMLCustomAuth_awslabs_samldemo for the Lambda function. Choose a friendly name like JWTParser and ensure that Identity token source is method.request.header.Authorization. This tells the custom authorizer to look for the JWT in the Authorization header of the HTTP request, which is specified in the JavaScript code on your S3 webpage. Save the changes.

    lambdasamltwo_3.png

Now it’s time to wire up the Lambda functions to API Gateway.

  1. In the API Gateway console, choose Resources, select your API, and then create a Child Resource called SAML. This includes a POST and a GET method. The POST method uses the ProcessSAML_awslabs_samldemo Lambda function and a 302 redirect, while the GET method uses the JWTParser custom authorizer with a service proxy to DynamoDB to retrieve credentials upon successful authorization.
    lambdasamltwo_4.png

  2. Create a POST method. For Integration Type, choose Lambda and add the ProcessSAML_awslabs_samldemo Lambda function. For Method Request, add headers called RelayState and SAMLResponse.

    lambdasamltwo_5.png

  3. Delete the Method Response code for 200 and add a 302. Create a response header called Location. In the Response Models section, for Content-Type, choose application/json and for Models, choose Empty.

    lambdasamltwo_6.png

  4. Delete the Integration Response section for 200 and add one for 302 that has a Method response status of 302. Edit the response header for Location to add a Mapping value of integration.response.body.location.

    lambdasamltwo_7.png

  5. Finally, in order for Lambda to capture the SAMLResponse and RelayState values, choose Integration Request.

  6. In the Body Mapping Template section, for Content-Type, enter application/x-www-form-urlencoded and add the following template:

    {
    "SAMLResponse" :"$input.params('SAMLResponse')",
    "RelayState" :"$input.params('RelayState')",
    "formparams" : $input.json('$')
    }

  7. Create a GET method with an Integration Type of Service Proxy. Select the region and DynamoDB as the AWS Service. Use POST for the HTTP method and DeleteItem for the Action. This is important as you leverage a DynamoDB feature to return the current records when you perform deletion. This simultaneously allows credentials in this system to not be stored long term and also allows clients to retrieve them. For Execution role, use the Lambda role from earlier or a new role that only has IAM scoped permissions for DeleteItem on the SAMLSessions table.

    lambdasamltwo_8.png

  8. Save this and open Method Request.

  9. For Authorization, select your custom authorizer JWTParser. Add in a header called COGNITO_ID and save the changes.

    lambdasamltwo_9.png

  10. In the Integration Request, add a header named Content-Type with a value for Mapped of 'application/x-amzn-json-1.0' (you need the single quotes surrounding the entry).

  11. Next, in the Body Mapping Template section, for Content-Type, enter application/json and add the following template:

    {
        "TableName": "SAMLSessions",
        "Key": {
            "identityhash": {
                "S": "$input.params('COGNITO_ID')"
            }
        },
        "ReturnValues": "ALL_OLD"
    }

Inspect this closely for a moment. When your client passes the JWT in an Authorization header to this GET method, the JWTParser custom authorizer grants or denies execution of a DeleteItem call on the SAMLSessions table.

If access is granted, there must be an item in the table to delete, referenced by its primary key. The client JavaScript (seen in a moment) passes its CognitoID through as a header called COGNITO_ID that is mapped above. DeleteItem executes to remove the credentials that were placed there via a call to STS by the ProcessSAML_awslabs_samldemo Lambda function. Because the action above specifies ALL_OLD under the ReturnValues mapping, DynamoDB returns these credentials at the same time.

lambdasamltwo_10.png
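To make this concrete from the browser's perspective, here is a rough sketch of the retrieval call, assuming the page has loaded aws-sdk.min.js and the generated apigClient.js. The factory and method names follow API Gateway's JavaScript SDK conventions; where the CognitoID and token come from, and the attribute names on the returned item, are assumptions rather than the sample's exact index.html code.

var cognitoId = localStorage.getItem('cognitoId');        // assumed to be saved before redirecting to ADFS
var token = window.location.search.split('token=')[1];    // JWT appended by ProcessSAML_awslabs_samldemo

var apigClient = apigClientFactory.newClient();           // no SigV4 signing; the JWT authorizes this call

apigClient.samlGet({ COGNITO_ID: cognitoId }, {}, { headers: { Authorization: token } })
    .then(function(result) {
        // DeleteItem with ReturnValues=ALL_OLD hands back the stored STS credentials as it removes them
        var attrs = result.data.Attributes;
        AWS.config.credentials = new AWS.Credentials(attrs.accessKey.S, attrs.secretKey.S, attrs.sessionToken.S);
    });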

  1. Save the changes and open your /saml resource root.
  2. Choose Actions, Enable CORS.
  3. In the Access-Control-Allow-Headers section, add COGNITO_ID into the end (inside the quotes and separated from other headers by a comma), then choose Enable CORS and replace existing CORS headers.
  4. When completed, choose Actions, Deploy API. Use the Prod stage or another stage.
  5. In the Stage Editor, choose SDK Generation. For Platform, choose JavaScript and then choose Generate SDK. Save the folder someplace close. Take note of the Invoke URL value at the top, as you need this for ADFS configuration later.

Website configuration

If you are not using the SAM template, create an S3 bucket and configure it as a static website in the same way that you did for Part I.

If you are using the SAM template, this will automatically be created for you; however, the steps below still need to be completed:

In the source code repository, edit /Scenario2/website/configs.js.

  1. Ensure that the identityPool value matches your Amazon Cognito Pool ID and the region is correct.
  2. Leave adfsUrl the same if you’re testing on your lab server; otherwise, update with the AD FS DNS entries as appropriate.
  3. Update the relayingPartyId value as well if you used something different from the prerequisite blog post.

Next, download the minified version of the AWS SDK for JavaScript in the Browser (aws-sdk.min.js) and place it along with the other files in /Scenario2/website into the S3 bucket.

Copy the files from the API Gateway generated SDK in the last section to this bucket so that apigClient.js and the lib folder are in the root directory. The imports for these scripts (which do things like sign API requests and configure headers for the JWT in the Authorization header) are already included in the index.html file. Consult the latest API Gateway documentation if the SDK generation process changes in the future.

ADFS configuration

Now that the AWS setup is complete, modify your ADFS setup to capture RelayState information about the client and to send the POST response to API Gateway for processing. You will need to complete this step even if you use the SAM template.

If you’re using Windows Server 2008 with ADFS 2.0, ensure that Update Rollup 2 is installed before enabling RelayState. Please see official Microsoft documentation for specific download information.

  1. After Update Rollup 2 is installed, modify %systemroot%\inetpub\adfs\ls\web.config. If you’re on a newer version of Windows Server running AD FS 3.0, modify %systemroot%\ADFS\Microsoft.IdentityServer.Servicehost.exe.config.
  2. Find the section in the XML marked <Microsoft.identityServer.web> and add an entry for <useRelayStateForIdpInitiatedSignOn enabled="true" />. If you have the proper ADFS rollup or version installed, this should allow the RelayState parameter to be accepted by the service provider.
  3. In the ADFS console, open the Relying Party Trust for Amazon Web Services and choose Endpoints.
  4. For Binding, choose POST and for Invoke URL, enter the URL to your API Gateway from the stage that you noted earlier.

At this point, you are ready to test out your webpage. Navigate to the S3 static website Endpoint URL and it should redirect you to the ADFS login screen. If the user login has been recent enough to have a valid SAML cookie, then you should see the login pass-through; otherwise, a login prompt appears. After the authentication has taken place, you should quickly end up back at your original webpage. Using the browser debugging tools, you see “Successful DDB call” followed by the results of a call to STS that were stored in DynamoDB.

lambdasamltwo_11.png

As in Scenario 1, the sample code under /scenario2/website/index.html has a button that allows you to “ping” an endpoint to test if the federated credentials are working. If you have used the SAM template, this should already be working and you can test it out (it will fail at first; keep reading to find out how to set the IAM permissions). If not, go to API Gateway and create a new resource called /users at the same level as /saml in your API, with a GET method.

lambdasamltwo_12.png

For Integration type, choose Mock.

lambdasamltwo_13.png

In the Method Request, for Authorization, choose AWS_IAM. In the Integration Response, in the Body Mapping Template section, for Content-Type, choose application/json and add the following JSON:

{
    "status": "Success",
    "agent": "${context.identity.userAgent}"
}

lambdasamltwo_14.png

Before using this new Mock API as a test, configure CORS and re-generate the JavaScript SDK so that the browser knows about the new methods.

  1. On the /saml resource root, choose Actions, Enable CORS.
  2. In the Access-Control-Allow-Headers section, add COGNITO_ID to the end, then choose Enable CORS and replace existing CORS headers.
  3. Choose Actions, Deploy API. Use the stage that you configured earlier.
  4. In the Stage Editor, choose SDK Generation and select JavaScript as your platform. Choose Generate SDK.
  5. Upload the new apigClient.js and lib directory to the S3 bucket of your static website.

One last thing must be completed before testing, even if you use the SAM template, so that the credentials can invoke this mock endpoint with AWS_IAM authorization: the ADFS-Production role needs execute-api:Invoke permissions for this API Gateway resource.

  1. In the IAM console, choose Roles, and open the ADFS-Production Role.

  2. For testing, you can attach the AmazonAPIGatewayInvokeFullAccess policy; however, for production, you should scope this down to the resource as documented in Control Access to API Gateway with IAM Permissions.

  3. After you have attached a policy with invocation rights and authenticated with AD FS to finish the redirect process, choose PING.

If everything has been set up successfully, you should see an alert with information about the user agent.

Final Thoughts

We hope these scenarios and sample code help you to not only begin to build comprehensive enterprise applications on AWS but also to enhance your understanding of different AuthN and AuthZ mechanisms. Consider some ways that you might be able to evolve this solution to meet the needs of your own customers and innovate in this space. For example:

  • Completing the CloudFront configuration and leveraging SSL termination for site identification. See if this can be incorporated into the Lambda processing pipeline.
  • Attaching a scope-down IAM policy if the business rules are matched. For example, the default role could be more permissive for a group, but if the user is a contractor (username with -C appended) they get extra restrictions applied when assumeRoleWithSaml is called in the ProcessSAML_awslabs_samldemo Lambda function (see the sketch after this list).
  • Changing the time duration before credentials expire on a per-role basis. Perhaps if the SAMLResponse parsing determines the user is an Administrator, they get a longer duration.
  • Passing through additional user claims in SAMLResponse for further logical decisions or auditing by adding more claim rules in the ADFS console. This could also be a mechanism to synchronize some Active Directory schema attributes with AWS services.
  • Granting different sets of credentials if a user has accounts with multiple SAML providers. While this tutorial was made with ADFS, you could also leverage it with other solutions such as Shibboleth and modify the ProcessSAML_awslabs_samldemo Lambda function to be aware of the different IdP ARN values. Perhaps your solution grants different IAM roles for the same user depending on if they initiated a login from Shibboleth rather than ADFS?
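As a starting point for the first two ideas, a hypothetical helper inside ProcessSAML_awslabs_samldemo could pass a scope-down policy and a per-role session duration to AssumeRoleWithSAML. The claim checks and the inline policy content below are illustrative assumptions, not part of the sample code.

var AWS = require('aws-sdk');
var sts = new AWS.STS();

// Hypothetical helper: an extra inline policy and shorter sessions for contractors,
// longer sessions for administrators. Claim names and the policy are illustrative.
function assumeScopedRole(roleArn, principalArn, samlResponseBase64, claims, callback) {
    var params = {
        RoleArn: roleArn,
        PrincipalArn: principalArn,
        SAMLAssertion: samlResponseBase64,
        DurationSeconds: claims.isAdministrator ? 3600 : 900
    };
    if (claims.isContractor) {
        // Effective permissions become the intersection of the role policy and this policy
        params.Policy = JSON.stringify({
            Version: '2012-10-17',
            Statement: [{ Effect: 'Allow', Action: ['dynamodb:GetItem'], Resource: ['*'] }]
        });
    }
    sts.assumeRoleWithSAML(params, callback);
}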

The Lambda functions can be altered to take advantage of these options, which you can read more about here. For more information about ADFS claim rule language manipulation, see The Role of the Claim Rule Language on Microsoft TechNet.

We would love to hear feedback from our customers on these designs and see different secure application designs that you’re implementing on the AWS platform.

Real-time Clickstream Anomaly Detection with Amazon Kinesis Analytics

Post Syndicated from Chris Marshall original https://blogs.aws.amazon.com/bigdata/post/Tx1XNQPQ2ARGT81/Real-time-Clickstream-Anomaly-Detection-with-Amazon-Kinesis-Analytics

Chris Marshall is a Solutions Architect for Amazon Web Services

Analyzing web log traffic to gain insights that drive business decisions has historically been performed using batch processing.  While effective, this approach results in delayed responses to emerging trends and user activities.  There are solutions to deal with processing data in real time using streaming and micro-batching technologies, but they can be complex to set up and maintain.  Amazon Kinesis Analytics is a managed service that makes it very easy to identify and respond to changes in behavior in real-time.   

One use case where it’s valuable to have immediate insights is analyzing clickstream data.   In the world of digital advertising, an impression is when an ad is displayed in a browser and a clickthrough represents a user clicking on that ad.  A clickthrough rate (CTR) is one way to monitor the ad’s effectiveness.  CTR is calculated in the form of: CTR = Clicks / Impressions * 100.  Digital marketers are interested in monitoring CTR to know when particular ads perform better than normal, giving them a chance to optimize placements within the ad campaign.  They may also be interested in anomalous low-end CTR that could be a result of a bad image or bidding model.

In this post, I show an analytics pipeline which detects anomalies in real time for a web traffic stream, using the RANDOM_CUT_FOREST function available in Amazon Kinesis Analytics. 

RANDOM_CUT_FOREST 

Amazon Kinesis Analytics includes a powerful set of analytics functions to analyze streams of data.  One such function is RANDOM_CUT_FOREST.  This function detects anomalies by scoring data flowing through a dynamic data stream. This novel approach identifies a normal pattern in streaming data, then compares new data points in reference to it. For more information, see Robust Random Cut Forest Based Anomaly Detection On Streams.

Analytics pipeline components

To demonstrate how the RANDOM_CUT_FOREST function can be used to detect anomalies in real-time click through rates, I will walk you through how to build an analytics pipeline and generate web traffic using a simple Python script.  When your injected anomaly is detected, you get an email or SMS message to alert you to the event. 

This post first walks through the components of the analytics pipeline, then discusses how to build and test it in your own AWS account.  The accompanying AWS CloudFormation script builds the Amazon API Gateway API, Amazon Kinesis streams, AWS Lambda function, Amazon SNS components, and IAM roles required for this walkthrough. Then you manually create the Amazon Kinesis Analytics application that uses the RANDOM_CUT_FOREST function in SQL. 

Web Client

Because Python is a popular scripting language available on many clients, we’ll use it to generate web impressions and click data by making HTTP GET requests with the requests library.  This script mimics beacon traffic generated by web browsers.  The GET request contains a query string value of click or impression to indicate the type of beacon being captured.  The script has a while loop that iterates 2500 times.  For each pass through the loop, an Impression beacon is sent.  Then, the random function is used to potentially send an additional click beacon.  For most of the passes through the loop, the click request is generated at the rate of roughly 10% of the time.  However, for some web requests, the CTR jumps to a rate of 50% representing an anomaly.  These are artificially high CTR rates used for the purpose of illustration for this post, but the functionality would be the same using small fractional values

import requests
import random
import sys
import argparse

def getClicked(rate):
	if random.random() <= rate:
		return True
	else:
		return False

def httpGetImpression():
	url = args.target + '?browseraction=Impression' 
	r = requests.get(url)

def httpGetClick():
	url = args.target + '?browseraction=Click' 
	r = requests.get(url)
	sys.stdout.write('+')

parser = argparse.ArgumentParser()
parser.add_argument("target",help=" the http(s) location to send the GET request")
args = parser.parse_args()
i = 0
while (i < 2500):
	httpGetImpression()
	if(i<1950 or i>=2000):
		clicked = getClicked(.1)
		sys.stdout.write('_')
	else:
		clicked = getClicked(.5)
		sys.stdout.write('-')
	if(clicked):
		httpGetClick()
	i = i + 1
	sys.stdout.flush()

Web “front door”

Client web requests are handled by Amazon API Gateway, a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale. It handles the web request traffic to get data into your analytics pipeline by acting as a service proxy to an Amazon Kinesis stream.  API Gateway takes click and impression web traffic, converts the HTTP header and query string data into a JSON message, and places it into your stream.  The CloudFormation script creates an API endpoint with a body mapping template similar to the following:

#set($inputRoot = $input.path('$'))
{
  "Data": "$util.base64Encode("{ ""browseraction"" : ""$input.params('browseraction')"", ""site"" : ""$input.params('Host')"" }")",
  "PartitionKey" : "$input.params('Host')",
  "StreamName" : "CSEKinesisBeaconInputStream"
}

(API Gateway service proxy body mapping template)

This tells API Gateway to take the host header and query string inputs and convert them into a payload to be put into a stream using a service proxy for Amazon Kinesis Streams.
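For comparison, the PutRecord call that the service proxy performs on your behalf would look roughly like this if made directly from Node.js with the AWS SDK; the region value here is an assumption.

var AWS = require('aws-sdk');
var kinesis = new AWS.Kinesis({ region: 'us-east-1' });   // assumed region

// Equivalent of the mapping template above: one JSON beacon per record, partitioned by site
function putBeacon(browseraction, site, callback) {
    kinesis.putRecord({
        StreamName: 'CSEKinesisBeaconInputStream',
        PartitionKey: site,
        Data: JSON.stringify({ browseraction: browseraction, site: site })
    }, callback);
}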

Input stream

Amazon Kinesis Streams can capture terabytes of data per hour from thousands of sources.  In this case, you use it to deliver your JSON messages to Amazon Kinesis Analytics. 

Analytics application

Amazon Kinesis Analytics processes the incoming messages and allows you to perform analytics on the dynamic stream of data.  You use SQL to calculate the CTR and pass that into your RANDOM_CUT_FOREST function to calculate an anomaly score. 

First, calculate the counts of impressions and clicks using SQL with a tumbling window in Amazon Kinesis Analytics.  Analytics uses streams and pumps in SQL to process data.  See our earlier post to get an overview of streaming data with Kinesis Analytics. Inside the Analytics SQL, a stream is analogous to a table and a pump is the flow of data into those tables.  Here’s the description of streams for the impression and click data.

CREATE OR REPLACE STREAM "CLICKSTREAM" ( 
   "CLICKCOUNT" DOUBLE
);


CREATE OR REPLACE STREAM "IMPRESSIONSTREAM" ( 
   "IMPRESSIONCOUNT" DOUBLE
);

Later, you calculate the clicks divided by impressions; you are using a DOUBLE data type to contain the count values.  If you left them as integers, the division below would yield only a 0 or 1.  Next, you define the pumps that populate the stream using a tumbling window, a window of time during which all records received are considered as part of the statement. 

CREATE OR REPLACE PUMP "CLICKPUMP" STARTED AS 
INSERT INTO "CLICKSTREAM" ("CLICKCOUNT") 
SELECT STREAM COUNT(*) 
FROM "SOURCE_SQL_STREAM_001"
WHERE "browseraction" = 'Click'
GROUP BY FLOOR(
  ("SOURCE_SQL_STREAM_001".ROWTIME - TIMESTAMP '1970-01-01 00:00:00')
    SECOND / 10 TO SECOND
);
CREATE OR REPLACE PUMP "IMPRESSIONPUMP" STARTED AS 
INSERT INTO "IMPRESSIONSTREAM" ("IMPRESSIONCOUNT") 
SELECT STREAM COUNT(*) 
FROM "SOURCE_SQL_STREAM_001"
WHERE "browseraction" = 'Impression'
GROUP BY FLOOR(
  ("SOURCE_SQL_STREAM_001".ROWTIME - TIMESTAMP '1970-01-01 00:00:00')
    SECOND / 10 TO SECOND
);

The GROUP BY statement uses FLOOR and ROWTIME to create a tumbling window of 10 seconds.  This tumbling window will output one record every ten seconds assuming we are receiving data from the corresponding pump.  In this case, you output the total number of records that were clicks or impressions. 

Next, use the output of these pumps to calculate the CTR:

CREATE OR REPLACE STREAM "CTRSTREAM" (
  "CTR" DOUBLE
);

CREATE OR REPLACE PUMP "CTRPUMP" STARTED AS 
INSERT INTO "CTRSTREAM" ("CTR")
SELECT STREAM "CLICKCOUNT" / "IMPRESSIONCOUNT" * 100 as "CTR"
FROM "IMPRESSIONSTREAM",
  "CLICKSTREAM"
WHERE "IMPRESSIONSTREAM".ROWTIME = "CLICKSTREAM".ROWTIME;

Finally, these CTR values are used in RANDOM_CUT_FOREST to detect anomalous values.  The DESTINATION_SQL_STREAM is the output stream of data from the Amazon Kinesis Analytics application. 

CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    "CTRPERCENT" DOUBLE,
    "ANOMALY_SCORE" DOUBLE
);

CREATE OR REPLACE PUMP "OUTPUT_PUMP" STARTED AS 
INSERT INTO "DESTINATION_SQL_STREAM" 
SELECT STREAM * FROM
TABLE (RANDOM_CUT_FOREST( 
             CURSOR(SELECT STREAM "CTR" FROM "CTRSTREAM"), --inputStream
             100, --numberOfTrees (default)
             12, --subSampleSize 
             100000, --timeDecay (default)
             1) --shingleSize (default)
)
WHERE ANOMALY_SCORE > 2; 

The RANDOM_CUT_FOREST function greatly simplifies the programming required for anomaly detection.  However, understanding your data domain is paramount when performing data analytics.  The RANDOM_CUT_FOREST function is a tool for data scientists, not a replacement for them.  Knowing whether your data is logarithmic, circadian rhythmic, linear, etc. will provide the insights necessary to select the right parameters for RANDOM_CUT_FOREST.  For more information about parameters, see the RANDOM_CUT_FOREST Function

Fortunately, the default values work in a wide variety of cases. In this case, use the default values for all but the subSampleSize parameter.  Typically, you would use a larger sample size to increase the pool of random samples used to calculate the anomaly score; for this post, use 12 samples so as to start evaluating the anomaly scores sooner.  

Your SQL query outputs one record every ten seconds from the tumbling window so you’ll have enough evaluation values after two minutes to start calculating the anomaly score.  You are also using a cutoff value where records are only output to “DESTINATION_SQL_STREAM” if the anomaly score from the function is greater than 2 using the WHERE clause. To help visualize the cutoff point, here are the data points from a few runs through the pipeline using the sample Python script:

As you can see, the majority of the CTR data points are grouped around a 10% rate.  As the values move away from the 10% rate, their anomaly scores go up because they are more infrequent.  In the graph, an anomaly score above the cutoff is shaded orange. A cutoff value of 2 works well for this data set.  

Output stream

The Amazon Kinesis Analytics application outputs the values from “DESTINATION_SQL_STREAM” to an Amazon Kinesis stream when the record has an anomaly score greater than 2.

Anomaly execution function

The output stream is an input event for an AWS Lambda function with a batch size of 1, so the function gets called one time for each message put in the output stream.  This function represents a place where you could react to anomalous events.  In a real world scenario, you may want to instead update a real-time bidding system to react to current events.  In this example, you send a message to SNS to see a tangible response to the changing CTR.  The CloudFormation template creates a Lambda function similar to the following:

var AWS = require('aws-sdk');
var sns = new AWS.SNS( { region: "<region>" });
exports.handler = function(event, context) {
	//our batch size is 1 record so loop expected to process only once
	event.Records.forEach(function(record) {
		// Kinesis data is base64 encoded so decode here
		var payload = new Buffer(record.kinesis.data, 'base64').toString('ascii');
		var rec = payload.split(',');
		var ctr = rec[0];
		var anomaly_score = rec[1];
		var detail = 'Anomaly detected with a click through rate of ' + ctr + '% and an anomaly score of ' + anomaly_score;
		var subject = 'Anomaly Detected';
		var params = {
			Message: detail,
			MessageStructure: 'String',
			Subject: subject,
			TopicArn: 'arn:aws:sns:us-east-1::ClickStreamEvent'
		};
		sns.publish(params, function(err, data) {
			if (err) context.fail(err.stack);
			else     context.succeed('Published Notification');
		});
	});
}; 

Notification system

Amazon SNS is a fully managed, highly scalable messaging platform.  For this walkthrough, you send messages to SNS with subscribers that send email and text messages to a mobile phone.

Analytics pipeline walkthrough

Sometimes it’s best to build out a solution so you can see all the parts working and get a good sense of how it works.   Here are the steps to build out the entire pipeline as described above in your own account and perform real-time clickstream analysis yourself.  Following this walkthrough creates resources in your account that you can use to test the example pipeline.

Prerequisites

To complete the walkthrough, you’ll need an AWS account and a place to execute Python scripts.

Configure the CloudFormation template

Download the CloudFormation template named CSE-cfn.json from the AWS Big Data blog GitHub repository. In the CloudFormation console, set your region to N. Virginia, Oregon, or Ireland and then browse to that file.

When your script executes and an anomaly is detected, SNS sends a notification.  To see these in an email or text message, enter your email address and mobile phone number in the Parameters section and choose Next

This CloudFormation script creates policies and roles necessary to process the incoming messages.  Acknowledge this by selecting the check box; otherwise, CloudFormation returns an error about the required IAM capability (CAPABILITY_IAM).

If you entered a valid email or SMS number, you will receive a validation message when the SNS subscriptions are created by CloudFormation.  If you don’t wish to receive email or texts from your pipeline, you don’t need to complete the verification process.  

Run the Python script

Copy ClickImpressionGenerator.py to a local folder.

This script uses the requests library to make HTTP requests.  If you don't have that Python package globally installed, you can install it locally by running the following command in the same folder as the Python script:

pip install requests -t .

When the CloudFormation stack is complete, choose Outputs, ExecutionCommand, and run the command from the folder with the Python script. 

This will start generating web traffic and send it to the API Gateway in the front of your analytics pipeline.  It is important to have data flowing into your Amazon Kinesis Analytics application for the next section so that a schema can be generated based on the incoming data.  If the script completes before you finish with the next section, simply restart the script with the same command.

Create the Analytics pipeline application

Open the Amazon Kinesis console and choose Go to Analytics.

Choose Create new application to create the Kinesis Analytics application, enter an application name and description, and then choose Save and continue

Choose Connect to a source to see a list of streams.  Select the stream that starts with the name of your CloudFormation stack and contains CSEKinesisBeaconInputStream in the form of: <stack name>-CSEKinesisBeaconInputStream-<random string>.  This is the stream that your click and impression requests will be sent to from API Gateway. 

Scroll down and choose Choose an IAM role.  Select the role with the name in the form of <stack name>-CSEKinesisAnalyticsRole-<random string>.  If your ClickImpressionGenerator.py script is still running and generating data, you should see a stream sample.  Scroll to the bottom and choose Save and continue

Next, add the SQL for your real-time analytics.  Choose the Go to SQL editor, choose Yes, start application, and paste the SQL contents from CSE-SQL.sql into the editor. Choose Save and run SQL.  After a minute or so, you should see data showing up every ten seconds in the CLICKSTREAM.  Choose Exit when you are done editing. 

Choose Connect to a destination and select the <stack name>-CSEKinesisBeaconOutputStream-<random string> from the list of streams.  Change the output format to CSV. 

Choose Choose an IAM role and select <stack name>-CSEKinesisAnalyticsRole-<random string> again.  Choose Save and continue.  Now your analytics pipeline is complete.  When your script gets to the injected anomaly section, you should get a text message or email to notify you of the anomaly.  To clean up the resources created here, delete the stack in CloudFormation, stop the Kinesis Analytics application then delete it. 

Summary

A pipeline like this can be used for many use cases where anomaly detection is valuable. What solutions have you enabled with this architecture? If you have questions or suggestions, please comment below.

—————————–

Related

Process Amazon Kinesis Aggregated Data with AWS Lambda

Real-time Clickstream Anomaly Detection with Amazon Kinesis Analytics

Post Syndicated from Chris Marshall original https://aws.amazon.com/blogs/big-data/real-time-clickstream-anomaly-detection-with-amazon-kinesis-analytics/

Chris Marshall is a Solutions Architect for Amazon Web Services

Analyzing web log traffic to gain insights that drive business decisions has historically been performed using batch processing.  While effective, this approach results in delayed responses to emerging trends and user activities.  There are solutions to deal with processing data in real time using streaming and micro-batching technologies, but they can be complex to set up and maintain.  Amazon Kinesis Analytics is a managed service that makes it very easy to identify and respond to changes in behavior in real-time.

One use case where it’s valuable to have immediate insights is analyzing clickstream data.   In the world of digital advertising, an impression is when an ad is displayed in a browser and a clickthrough represents a user clicking on that ad.  A clickthrough rate (CTR) is one way to monitor the ad’s effectiveness.  CTR is calculated in the form of: CTR = Clicks / Impressions * 100.  Digital marketers are interested in monitoring CTR to know when particular ads perform better than normal, giving them a chance to optimize placements within the ad campaign.  They may also be interested in anomalous low-end CTR that could be a result of a bad image or bidding model.

In this post, I show an analytics pipeline which detects anomalies in real time for a web traffic stream, using the RANDOM_CUT_FOREST function available in Amazon Kinesis Analytics.

RANDOM_CUT_FOREST 

Amazon Kinesis Analytics includes a powerful set of analytics functions to analyze streams of data.  One such function is RANDOM_CUT_FOREST.  This function detects anomalies by scoring data flowing through a dynamic data stream. This novel approach identifies a normal pattern in streaming data, then compares new data points in reference to it. For more information, see Robust Random Cut Forest Based Anomaly Detection On Streams.

Analytics pipeline components

To demonstrate how the RANDOM_CUT_FOREST function can be used to detect anomalies in real-time click through rates, I will walk you through how to build an analytics pipeline and generate web traffic using a simple Python script.  When your injected anomaly is detected, you get an email or SMS message to alert you to the event.

This post first walks through the components of the analytics pipeline, then discusses how to build and test it in your own AWS account.  The accompanying AWS CloudFormation script builds the Amazon API Gateway API, Amazon Kinesis streams, AWS Lambda function, Amazon SNS components, and IAM roles required for this walkthrough. Then you manually create the Amazon Kinesis Analytics application that uses the RANDOM_CUT_FOREST function in SQL.

Web Client

Because Python is a popular scripting language available on many clients, we’ll use it to generate web impressions and click data by making HTTP GET requests with the requests library.  This script mimics beacon traffic generated by web browsers.  The GET request contains a query string value of click or impression to indicate the type of beacon being captured.  The script has a while loop that iterates 2500 times.  For each pass through the loop, an Impression beacon is sent.  Then, the random function is used to potentially send an additional click beacon.  For most of the passes through the loop, the click request is generated at the rate of roughly 10% of the time.  However, for some web requests, the CTR jumps to a rate of 50% representing an anomaly.  These are artificially high CTR rates used for the purpose of illustration for this post, but the functionality would be the same using small fractional values

import requests
import random
import sys
import argparse

def getClicked(rate):
	if random.random() <= rate:
		return True
	else:
		return False

def httpGetImpression():
	url = args.target + '?browseraction=Impression' 
	r = requests.get(url)

def httpGetClick():
	url = args.target + '?browseraction=Click' 
	r = requests.get(url)
	sys.stdout.write('+')

parser = argparse.ArgumentParser()
parser.add_argument("target",help=" the http(s) location to send the GET request")
args = parser.parse_args()
i = 0
while (i < 2500):
	httpGetImpression()
	if(i<1950 or i>=2000):
		clicked = getClicked(.1)
		sys.stdout.write('_')
	else:
		clicked = getClicked(.5)
		sys.stdout.write('-')
	if(clicked):
		httpGetClick()
	i = i + 1
	sys.stdout.flush()

Web “front door”

Client web requests are handled by Amazon API Gateway, a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale. This handles the web request traffic to get data into your analytics pipeline by being a service proxy to an Amazon Kinesis stream.  API Gateway takes click and impression web traffic and converts the HTTP header and query string data into a JSON message and place it into your stream.  The CloudFormation script creates an API endpoint with a body mapping template similar to the following:

#set($inputRoot = $input.path('$'))
{
  "Data": "$util.base64Encode("{ ""browseraction"" : ""$input.params('browseraction')"", ""site"" : ""$input.params('Host')"" }")",
  "PartitionKey" : "$input.params('Host')",
  "StreamName" : "CSEKinesisBeaconInputStream"
}

(API Gateway service proxy body mapping template)

This tells API Gateway to take the host header and query string inputs and convert them into a payload to be put into a stream using a service proxy for Amazon Kinesis Streams.

Input stream

Amazon Kinesis Streams can capture terabytes of data per hour from thousands of sources.  In this case, you use it to deliver your JSON messages to Amazon Kinesis Analytics.

Analytics application

Amazon Kinesis Analytics processes the incoming messages and allow you to perform analytics on the dynamic stream of data.  You use SQL to calculate the CTR and pass that into your RANDOM_CUT_FOREST function to calculate an anomaly score.

First, calculate the counts of impressions and clicks using SQL with a tumbling window in Amazon Kinesis Analytics.  Analytics uses streams and pumps in SQL to process data.  See our earlier post to get an overview of streaming data with Kinesis Analytics. Inside the Analytics SQL, a stream is analogous to a table and a pump is the flow of data into those tables.  Here’s the description of streams for the impression and click data.

CREATE OR REPLACE STREAM "CLICKSTREAM" ( 
   "CLICKCOUNT" DOUBLE
);


CREATE OR REPLACE STREAM "IMPRESSIONSTREAM" ( 
   "IMPRESSIONCOUNT" DOUBLE
);

Later, you calculate the clicks divided by impressions; you are using a DOUBLE data type to contain the count values.  If you left them as integers, the division below would yield only a 0 or 1.  Next, you define the pumps that populate the stream using a tumbling window, a window of time during which all records received are considered as part of the statement.

CREATE OR REPLACE PUMP "CLICKPUMP" STARTED AS 
INSERT INTO "CLICKSTREAM" ("CLICKCOUNT") 
SELECT STREAM COUNT(*) 
FROM "SOURCE_SQL_STREAM_001"
WHERE "browseraction" = 'Click'
GROUP BY FLOOR(
  ("SOURCE_SQL_STREAM_001".ROWTIME - TIMESTAMP '1970-01-01 00:00:00')
    SECOND / 10 TO SECOND
);
CREATE OR REPLACE PUMP "IMPRESSIONPUMP" STARTED AS 
INSERT INTO "IMPRESSIONSTREAM" ("IMPRESSIONCOUNT") 
SELECT STREAM COUNT(*) 
FROM "SOURCE_SQL_STREAM_001"
WHERE "browseraction" = 'Impression'
GROUP BY FLOOR(
  ("SOURCE_SQL_STREAM_001".ROWTIME - TIMESTAMP '1970-01-01 00:00:00')
    SECOND / 10 TO SECOND
);

The GROUP BY statement uses FLOOR and ROWTIME to create a tumbling window of 10 seconds.  This tumbling window will output one record every ten seconds assuming we are receiving data from the corresponding pump.  In this case, you output the total number of records that were clicks or impressions.

Next, use the output of these pumps to calculate the CTR:

CREATE OR REPLACE STREAM "CTRSTREAM" (
  "CTR" DOUBLE
);

CREATE OR REPLACE PUMP "CTRPUMP" STARTED AS 
INSERT INTO "CTRSTREAM" ("CTR")
SELECT STREAM "CLICKCOUNT" / "IMPRESSIONCOUNT" * 100 as "CTR"
FROM "IMPRESSIONSTREAM",
  "CLICKSTREAM"
WHERE "IMPRESSIONSTREAM".ROWTIME = "CLICKSTREAM".ROWTIME;

Finally, these CTR values are used in RANDOM_CUT_FOREST to detect anomalous values.  The DESTINATION_SQL_STREAM is the output stream of data from the Amazon Kinesis Analytics application.

CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    "CTRPERCENT" DOUBLE,
    "ANOMALY_SCORE" DOUBLE
);

CREATE OR REPLACE PUMP "OUTPUT_PUMP" STARTED AS 
INSERT INTO "DESTINATION_SQL_STREAM" 
SELECT STREAM * FROM
TABLE (RANDOM_CUT_FOREST( 
             CURSOR(SELECT STREAM "CTR" FROM "CTRSTREAM"), --inputStream
             100, --numberOfTrees (default)
             12, --subSampleSize 
             100000, --timeDecay (default)
             1) --shingleSize (default)
)
WHERE ANOMALY_SCORE > 2; 

The RANDOM_CUT_FOREST function greatly simplifies the programming required for anomaly detection.  However, understanding your data domain is paramount when performing data analytics.  The RANDOM_CUT_FOREST function is a tool for data scientists, not a replacement for them.  Knowing whether your data is logarithmic, circadian rhythmic, linear, etc. will provide the insights necessary to select the right parameters for RANDOM_CUT_FOREST.  For more information about parameters, see the RANDOM_CUT_FOREST Function.

Fortunately, the default values work in a wide variety of cases. In this case, use the default values for all but the subSampleSize parameter.  Typically, you would use a larger sample size to increase the pool of random samples used to calculate the anomaly score; for this post, use 12 samples so as to start evaluating the anomaly scores sooner.

Your SQL query outputs one record every ten seconds from the tumbling window, so after two minutes you have enough values to start calculating the anomaly score. The WHERE clause applies a cutoff so that records are only output to “DESTINATION_SQL_STREAM” when the anomaly score from the function is greater than 2. To help visualize the cutoff point, here are the data points from a few runs through the pipeline using the sample Python script:

As you can see, the majority of the CTR data points are grouped around a 10% rate.  As the values move away from the 10% rate, their anomaly scores go up because they are more infrequent.  In the graph, an anomaly score above the cutoff is shaded orange. A cutoff value of 2 works well for this data set.

Output stream

The Amazon Kinesis Analytics application outputs the values from “DESTINATION_SQL_STREAM” to an Amazon Kinesis stream when the record has an anomaly score greater than 2.

Anomaly execution function

The output stream is an input event for an AWS Lambda function with a batch size of 1, so the function gets called one time for each message put in the output stream.  This function represents a place where you could react to anomalous events.  In a real world scenario, you may want to instead update a real-time bidding system to react to current events.  In this example, you send a message to SNS to see a tangible response to the changing CTR.  The CloudFormation template creates a Lambda function similar to the following:

var AWS = require('aws-sdk');
var sns = new AWS.SNS({ region: "<region>" });

exports.handler = function(event, context) {
    // Our batch size is 1 record, so this loop is expected to process only once
    event.Records.forEach(function(record) {
        // Kinesis data is base64 encoded, so decode here
        var payload = new Buffer(record.kinesis.data, 'base64').toString('ascii');
        var rec = payload.split(',');
        var ctr = rec[0];
        var anomaly_score = rec[1];
        var detail = 'Anomaly detected with a click through rate of ' + ctr + '% and an anomaly score of ' + anomaly_score;
        var subject = 'Anomaly Detected';
        var params = {
            Message: detail,
            MessageStructure: 'String',
            Subject: subject,
            TopicArn: 'arn:aws:sns:us-east-1::ClickStreamEvent'
        };
        sns.publish(params, function(err, data) {
            if (err) context.fail(err.stack);
            else     context.succeed('Published Notification');
        });
    });
};

Notification system

Amazon SNS is a fully managed, highly scalable messaging platform. For this walkthrough, you publish messages to an SNS topic whose subscribers deliver email and text messages to a mobile phone.
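
The CloudFormation template in the walkthrough below creates these subscriptions for you. If you wanted to wire them up yourself, a minimal sketch with the AWS SDK for JavaScript might look like the following; the topic ARN, email address, and phone number are placeholders:

var AWS = require('aws-sdk');
var sns = new AWS.SNS({ region: 'us-east-1' });

// Placeholder topic ARN and endpoints; substitute your own values
var topicArn = 'arn:aws:sns:us-east-1:123456789012:ClickStreamEvent';

sns.subscribe({ TopicArn: topicArn, Protocol: 'email', Endpoint: 'you@example.com' }, function(err, data) {
  if (err) console.error(err);
  else console.log('Email subscription pending confirmation');
});

sns.subscribe({ TopicArn: topicArn, Protocol: 'sms', Endpoint: '+15555550100' }, function(err, data) {
  if (err) console.error(err);
  else console.log('SMS subscription created');
});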

Analytics pipeline walkthrough

Sometimes it’s best to build out a solution so you can see all the parts working and get a good sense of how it works.   Here are the steps to build out the entire pipeline as described above in your own account and perform real-time clickstream analysis yourself.  Following this walkthrough creates resources in your account that you can use to test the example pipeline.

Prerequisites

To complete the walkthrough, you’ll need an AWS account and a place to execute Python scripts.

Configure the CloudFormation template

Download the CloudFormation template named CSE-cfn.json from the AWS Big Data blog GitHub repository. In the CloudFormation console, set your region to N. Virginia, Oregon, or Ireland and then browse to that file.

When your script executes and an anomaly is detected, SNS sends a notification.  To see these in an email or text message, enter your email address and mobile phone number in the Parameters section and choose Next.

This CloudFormation template creates the policies and roles necessary to process the incoming messages. Acknowledge this by selecting the check box; otherwise, stack creation fails with a CAPABILITY_IAM error.

If you entered a valid email or SMS number, you will receive a validation message when the SNS subscriptions are created by CloudFormation.  If you don’t wish to receive email or texts from your pipeline, you don’t need to complete the verification process.
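
If you prefer launching the stack from code instead of the console, a minimal sketch with the AWS SDK for JavaScript could look like this; the template location and parameter keys are assumptions, so check the template's Parameters section for the real names:

var AWS = require('aws-sdk');
var cloudformation = new AWS.CloudFormation({ region: 'us-east-1' });

cloudformation.createStack({
  StackName: 'clickstream-anomaly-pipeline',
  // Hypothetical S3 location; point this at wherever you stored CSE-cfn.json
  TemplateURL: 'https://s3.amazonaws.com/your-bucket/CSE-cfn.json',
  // Parameter keys are assumptions; match them to the template
  Parameters: [
    { ParameterKey: 'EmailAddress', ParameterValue: 'you@example.com' },
    { ParameterKey: 'SMSNumber', ParameterValue: '+15555550100' }
  ],
  // Required because the template creates IAM roles and policies
  Capabilities: ['CAPABILITY_IAM']
}, function(err, data) {
  if (err) console.error(err);
  else console.log('Stack creation started: ' + data.StackId);
});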

Run the Python script

Copy ClickImpressionGenerator.py to a local folder.

This script uses the requests library to make HTTP requests. If you don't have that Python package installed globally, you can install it locally by running the following command in the same folder as the Python script:

pip install requests -t .

When the CloudFormation stack is complete, choose Outputs, ExecutionCommand, and run the command from the folder with the Python script.

This starts generating web traffic and sends it to the API Gateway endpoint at the front of your analytics pipeline. It is important to have data flowing into your Amazon Kinesis Analytics application for the next section so that a schema can be generated based on the incoming data. If the script completes before you finish the next section, simply restart it with the same command.
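
If you would rather send a quick test event from Node.js, the following sketch shows the general idea. The endpoint path and payload shape are assumptions based on the Analytics SQL above (a browseraction field with Click or Impression values); ClickImpressionGenerator.py remains the authoritative source for the exact fields the pipeline expects.

var https = require('https');

// Hypothetical invoke URL pieces from the ExecutionCommand output; substitute your own
var hostname = 'abc123xyz0.execute-api.us-east-1.amazonaws.com';
var path = '/prod/beacon';

function sendEvent(action) {
  // Assumed payload shape; the Analytics SQL filters on a browseraction column
  var body = JSON.stringify({ browseraction: action });
  var req = https.request({
    hostname: hostname,
    path: path,
    method: 'POST',
    headers: { 'Content-Type': 'application/json', 'Content-Length': Buffer.byteLength(body) }
  }, function(res) {
    console.log(action + ' sent, status ' + res.statusCode);
  });
  req.on('error', console.error);
  req.write(body);
  req.end();
}

// Roughly a 10% click-through rate: ten impressions for every click
for (var i = 0; i < 10; i++) sendEvent('Impression');
sendEvent('Click');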

Create the Analytics pipeline application

Open the Amazon Kinesis console and choose Go to Analytics.

Choose Create new application to create the Kinesis Analytics application, enter an application name and description, and then choose Save and continue.

Choose Connect to a source to see a list of streams. Select the stream that starts with the name of your CloudFormation stack and contains CSEKinesisBeaconInputStream, in the form <stack name>-CSEKinesisBeaconInputStream-<random string>. This is the stream that your click and impression requests are sent to from API Gateway.

Scroll down and choose Choose an IAM role.  Select the role with the name in the form of <stack name>-CSEKinesisAnalyticsRole-<random string>.  If your ClickImpressionGenerator.py script is still running and generating data, you should see a stream sample.  Scroll to the bottom and choose Save and continue.

Next, add the SQL for your real-time analytics. Choose Go to SQL editor, choose Yes, start application, and paste the SQL contents from CSE-SQL.sql into the editor. Choose Save and run SQL. After a minute or so, you should see data showing up every ten seconds in the CLICKSTREAM. Choose Exit when you are done editing.

Choose Connect to a destination and select the <stack name>-CSEKinesisBeaconOutputStream-<random string> from the list of streams.  Change the output format to CSV.

Choose Choose an IAM role and select <stack name>-CSEKinesisAnalyticsRole-<random string> again. Choose Save and continue. Now your analytics pipeline is complete. When your script gets to the injected anomaly section, you should get a text message or email notifying you of the anomaly. To clean up the resources created here, delete the stack in CloudFormation, then stop and delete the Kinesis Analytics application.

Summary

A pipeline like this can be used for many use cases where anomaly detection is valuable. What solutions have you enabled with this architecture? If you have questions or suggestions, please comment below.

 



Amazon API Gateway mapping improvements

Post Syndicated from Stefano Buliani original https://aws.amazon.com/blogs/compute/amazon-api-gateway-mapping-improvements/

Yesterday we announced the new Swagger import API. You may have also noticed a new first-time user experience in the API Gateway console that automatically creates a sample Pet Store API and guides you through API Gateway features. That is not all we’ve been doing:

Over the past few weeks, we’ve made mapping requests and responses easier. This post takes you through the new features we introduced and gives practical examples of how to use them.

Multiple 2xx responses

We heard from many of you that you want to return more than one 2xx response code from your API. You can now configure Amazon API Gateway to return multiple 2xx response codes, each with its own header and body mapping templates. For example, when creating resources, you can return 201 for “created” and 202 for “accepted”.
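
You could also script this outside the console. As a sketch, the following adds both response codes to an existing method with the API Gateway SDK for JavaScript; the REST API ID, resource ID, and HTTP method are placeholders:

var AWS = require('aws-sdk');
var apigateway = new AWS.APIGateway({ region: 'us-east-1' });

// Placeholder identifiers; use the IDs of your own API, resource, and method
var restApiId = 'abc123xyz0';
var resourceId = 'res456';
var httpMethod = 'POST';

['201', '202'].forEach(function(statusCode) {
  apigateway.putMethodResponse({
    restApiId: restApiId,
    resourceId: resourceId,
    httpMethod: httpMethod,
    statusCode: statusCode,
    // Each status code gets its own models (and, separately, its own mapping templates)
    responseModels: { 'application/json': 'Empty' }
  }, function(err) {
    if (err) console.error(err);
    else console.log('Added method response ' + statusCode);
  });
});

Each status code also needs a matching integration response (putIntegrationResponse with a selection pattern) so that your backend’s output is routed to the right code.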

Context variables in parameter mapping

We have added the ability to reference context variables from the parameter mapping fields. For example, you can include the identity principal or the stage name from the context variable in a header to your HTTP backend. To send the principalId returned by a custom authorizer in an X-User-ID header to your HTTP backend, use this mapping expression:

context.authorizer.principalId

For more information, see the context variable in the Mapping Template Reference page of the documentation.
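
As a rough sketch, the same mapping can be applied when you configure an HTTP integration through the SDK; every identifier and the backend URI below are placeholders:

var AWS = require('aws-sdk');
var apigateway = new AWS.APIGateway({ region: 'us-east-1' });

apigateway.putIntegration({
  restApiId: 'abc123xyz0',                    // placeholder API ID
  resourceId: 'res456',                       // placeholder resource ID
  httpMethod: 'GET',
  type: 'HTTP',
  integrationHttpMethod: 'GET',
  uri: 'https://backend.example.com/users',   // hypothetical HTTP backend
  requestParameters: {
    // Send the custom authorizer's principalId to the backend as a header
    'integration.request.header.X-User-ID': 'context.authorizer.principalId'
  }
}, function(err) {
  if (err) console.error(err);
  else console.log('Integration configured');
});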

Access to raw request body

Mapping templates in API Gateway help you transform incoming requests and outgoing responses from your API’s back end. The $input variable in mapping templates enables you to read values from a JSON body and its properties. You can now also return the raw payload, whether it’s JSON, XML, or a string using the $input.body property.

For example, if you have configured your API to receive raw data and pass it to Amazon Kinesis using an AWS service proxy integration, you can use the body property to read the incoming body and the $util variable to encode it for an Amazon Kinesis stream.

{
  "Data" : "$util.base64Encode($input.body)",
  "PartitionKey" : "key",
  "StreamName" : "Stream"
}

JSON parse function

We have also added a parseJson() method to the $util object in mapping templates. The parseJson() method parses stringified JSON input into its object representation. You can manipulate this object representation in the mapping templates. For example, if you need to return an error from AWS Lambda, you can now return it like this:

exports.handler = function(event, context) {
    var myErrorObj = {
        errorType : "InternalFailure",
        errorCode : 9130,
        detailedMessage : "This is my error message",
        stackTrace : ["foo1", "foo2", "foo3"],
        data : {
            numbers : [1, 2, 3]
        }
    }
    
    context.fail(JSON.stringify(myErrorObj));
};

Then, you can use the parseJson() method in the mapping template to extract values from the error and return a meaningful message from your API, like this:

#set ($errorMessageObj = $util.parseJson($input.path('$.errorMessage')))
#set ($bodyObj = $util.parseJson($input.body))

{
"type" : "$errorMessageObj.errorType",
"code" : $errorMessageObj.errorCode,
"message" : "$errorMessageObj.detailedMessage",
"someData" : "$errorMessageObj.data.numbers[2]"
}

This will produce a response that looks like this:

{
"type" : "InternalFailure",
"code" : 9130,
"message" : "This is my error message",
"someData" : "3"
}

Conclusion

We continuously release new features and improvements to Amazon API Gateway. Your feedback is extremely important and guides our priorities. Keep sending us feedback on the API Gateway forum and on social media.

Using Amazon API Gateway as a proxy for DynamoDB

Post Syndicated from Stefano Buliani original https://aws.amazon.com/blogs/compute/using-amazon-api-gateway-as-a-proxy-for-dynamodb/

Andrew Baird, AWS Solutions Architect
Amazon API Gateway has a feature that enables customers to create their own API definitions directly in front of an AWS service API. This tutorial will walk you through an example of doing so with Amazon DynamoDB.
Why use API Gateway as a proxy for AWS APIs?
Many AWS services provide APIs that applications depend on directly for their functionality. Examples include:

Amazon DynamoDB – An API-accessible NoSQL database.
Amazon Kinesis – Real-time ingestion of streaming data via API.
Amazon CloudWatch – API-driven metrics collection and retrieval.

If AWS already exposes internet-accessible APIs, why would you want to use API Gateway as a proxy for them? Why not allow applications to just directly depend on the AWS service API itself?
Here are a few great reasons to do so:

You might want to enable your application to integrate with very specific functionality that an AWS service provides, without the need to manage access keys and secret keys that AWS APIs require.
There may be application-specific restrictions you’d like to place on the API calls being made to AWS services that you would not be able to enforce if clients integrated with the AWS APIs directly.
You may get additional value out of using a different HTTP method from the method that is used by the AWS service. For example, creating a GET request as a proxy in front of an AWS API that requires an HTTP POST so that the response will be cached.
You can accomplish the above things without having to introduce a server-side application component that you need to manage or that could introduce increased latency. Even a lightweight Lambda function that calls a single AWS service API is code that you do not need to create or maintain if you use API Gateway directly as an AWS service proxy.

Here, we will walk you through a hypothetical scenario that shows how to create an Amazon API Gateway AWS service proxy in front of Amazon DynamoDB.
The Scenario
You would like the ability to add a public Comments section to each page of your website. To achieve this, you’ll need to accept and store comments and you will need to retrieve all of the comments posted for a particular page so that the UI can display them.
We will show you how to implement this functionality by creating a single table in DynamoDB, and creating the two necessary APIs using the AWS service proxy feature of Amazon API Gateway.
Defining the APIs
The first step is to map out the APIs that you want to create. For both APIs, we’ve linked to the DynamoDB API documentation. Take note of how the API you define below differs in request/response details from the native DynamoDB APIs.
Post Comments
First, you need an API that accepts user comments and stores them in the DynamoDB table. Here’s the API definition you’ll use to implement this functionality:
Resource: /comments
HTTP Method: POST
HTTP Request Body:
{
"pageId": "example-page-id",
"userName": "ExampleUserName",
"message": "This is an example comment to be added."
}
After you create it, this API becomes a proxy in front of the DynamoDB API PutItem.
Get Comments
Second, you need an API to retrieve all of the comments for a particular page. Use the following API definition:
Resource: /comments/{pageId}
HTTP Method: GET
The curly braces around {pageId} in the URI path definition indicate that pageId will be treated as a path variable within the URI.
This API will be a proxy in front of the DynamoDB Query API. Here, you will notice the benefit: your API uses the GET method, while the DynamoDB Query API requires an HTTP POST and does not include any cache headers in the response.
Creating the DynamoDB Table
First, navigate to the DynamoDB console and choose Create Table. Next, name the table Comments, with commentId as the primary key. Leave the rest of the default settings for this example, and choose Create.

After this table is populated with comments, you will want to retrieve them based on the page that they’ve been posted to. To do this, create a secondary index on an attribute called pageId. This secondary index enables you to query the table later for all comments posted to a particular page. When viewing your table, choose the Indexes tab and choose Create index.

When querying this table, you only want to retrieve the pieces of information that matter to the client: in this case, the pageId, the userName, and the message itself. Any other data you decide to store with each comment does not need to be retrieved from the table for the publicly accessible API. Type the following information into the form to capture this and choose Create index:
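If you prefer to script the table and index creation rather than use the console forms, a minimal sketch with the AWS SDK for JavaScript might look like this (the throughput values are arbitrary for the example; the index name matches the pageId-index referenced later in the Query mapping template):
var AWS = require('aws-sdk');
var dynamodb = new AWS.DynamoDB({ region: 'us-east-1' });

dynamodb.createTable({
  TableName: 'Comments',
  AttributeDefinitions: [
    { AttributeName: 'commentId', AttributeType: 'S' },
    { AttributeName: 'pageId', AttributeType: 'S' }
  ],
  KeySchema: [
    { AttributeName: 'commentId', KeyType: 'HASH' }
  ],
  ProvisionedThroughput: { ReadCapacityUnits: 5, WriteCapacityUnits: 5 },
  GlobalSecondaryIndexes: [{
    IndexName: 'pageId-index',
    KeySchema: [{ AttributeName: 'pageId', KeyType: 'HASH' }],
    // Project only the extra attributes the public API needs to return
    Projection: { ProjectionType: 'INCLUDE', NonKeyAttributes: ['userName', 'message'] },
    ProvisionedThroughput: { ReadCapacityUnits: 5, WriteCapacityUnits: 5 }
  }]
}, function(err, data) {
  if (err) console.error(err);
  else console.log('Table status: ' + data.TableDescription.TableStatus);
});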

Creating the APIs
Now, using the AWS service proxy feature of Amazon API Gateway, we’ll demonstrate how to create each of the APIs you defined. Navigate to the API Gateway service console, and choose Create API. In API name, type CommentsApi and type a short description. Finally, choose Create API.

Now you’re ready to create the specific resources and methods for the new API.
Creating the Post Comments API
In the editor screen, choose Create Resource. To match the description of the Post Comments API above, provide the appropriate details and create the first API resource:

Now, with the resource created, set up what happens when the resource is called with the HTTP POST method. Choose Create Method and select POST from the drop-down list. Choose the check mark to save.
To map this API to the DynamoDB API needed, next to Integration type, choose Show Advanced and choose AWS Service Proxy.
Here, you’re presented with options that define which specific AWS service API will be executed when this API is called, and in which region. Fill out the information as shown, matching the DynamoDB table you created a moment ago. Before you proceed, create an AWS Identity and Access Management (IAM) role that has permission to call the DynamoDB API PutItem for the Comments table; this role must have a service trust relationship to API Gateway. For more information on IAM policies and roles, see the Overview of IAM Policies topic.
After inputting all of the information as shown, choose Save.
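The IAM role described in the previous step can also be created from code. Here is a minimal sketch; the role name, policy name, and the account ID in the table ARN are placeholders:
var AWS = require('aws-sdk');
var iam = new AWS.IAM();

// Trust policy that lets API Gateway assume the role
var trustPolicy = JSON.stringify({
  Version: '2012-10-17',
  Statement: [{
    Effect: 'Allow',
    Principal: { Service: 'apigateway.amazonaws.com' },
    Action: 'sts:AssumeRole'
  }]
});

// Allow PutItem on the Comments table only (placeholder account ID)
var permissionPolicy = JSON.stringify({
  Version: '2012-10-17',
  Statement: [{
    Effect: 'Allow',
    Action: 'dynamodb:PutItem',
    Resource: 'arn:aws:dynamodb:us-east-1:123456789012:table/Comments'
  }]
});

iam.createRole({ RoleName: 'CommentsApiRole', AssumeRolePolicyDocument: trustPolicy }, function(err) {
  if (err) return console.error(err);
  iam.putRolePolicy({
    RoleName: 'CommentsApiRole',
    PolicyName: 'CommentsTableAccess',
    PolicyDocument: permissionPolicy
  }, function(err) {
    if (err) console.error(err);
    else console.log('Role ready for the API Gateway service proxy');
  });
});
The Get Comments API later in this walkthrough reuses the same role, so in practice you would also grant dynamodb:Query on the Comments table and its pageId-index.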

If you were to deploy this API right now, you would have a working service proxy API that only wraps the DynamoDB PutItem API. But, for the Post Comments API, you’d like the client to be able to use a more contextual JSON object structure. Also, you’d like to be sure that the DynamoDB API PutItem is called precisely the way you expect it to be called. This eliminates client-driven error responses and removes the possibility that the new API could be used to call another DynamoDB API or table that you do not intend to allow.
You accomplish this by creating a mapping template. This enables you to define the request structure that your API clients will use, and then transform those requests into the structure that the DynamoDB API PutItem requires.
From the Method Execution screen, choose Integration Request:

In the Integration Request screen expand the Mapping Templates section and choose Add mapping template. Under Content-Type, type application/json and then choose the check mark:

Next, choose the pencil icon next to Input passthrough and choose Mapping template from the dropdown. Now, you’ll be presented with a text box where you create the mapping template. For more information on creating mapping templates, see API Gateway Mapping Template Reference.
The mapping template will be as follows. We’ll walk through what’s important about it next:
{
    "TableName": "Comments",
    "Item": {
        "commentId": {
            "S": "$context.requestId"
        },
        "pageId": {
            "S": "$input.path('$.pageId')"
        },
        "userName": {
            "S": "$input.path('$.userName')"
        },
        "message": {
            "S": "$input.path('$.message')"
        }
    }
}
This mapping template creates the JSON structure required by the DynamoDB PutItem API. The entire mapping template is static. The three input variables are referenced from the request JSON using the $input variable and each comment is stamped with a unique identifier. This unique identifier is the commentId and is extracted directly from the API request’s $context variable. This $context variable is set by the API Gateway service itself. To review other parameters that are available to a mapping template, see API Gateway Mapping Template Reference. You may decide that including information like sourceIp or other headers could be valuable to you.
With this mapping template, no matter how your API is called, the only variance from the DynamoDB PutItem API call will be the values of pageId, userName, and message. Clients of your API will not be able to dictate which DynamoDB table is being targeted (because “Comments” is statically listed), and they will not have any control over the object structure that is specified for each item (each input variable is explicitly declared a string to the PutItem API).
Back in the Method Execution pane, choose TEST.
Create an example Request Body that matches the API definition documented above and then choose Test. For example, your request body could be:
{
    "pageId": "breaking-news-story-01-18-2016",
    "userName": "Just Saying Thank You",
    "message": "I really enjoyed this story!!"
}
Navigate to the DynamoDB console and view the Comments table to show that the request really was successfully processed:

Great! Try including a few more sample items in the table to further test the Get Comments API.
If you deployed this API, you would be all set with a public API that has the ability to post public comments and store them in DynamoDB. For some use cases you may only want to collect data through a single API like this: for example, when collecting customer and visitor feedback, or for a public voting or polling system. But for this use case, we’ll demonstrate how to create another API to retrieve records from a DynamoDB table as well. Many of the details are similar to the process above.
Creating the Get Comments API
Return to the Resources view, choose the /comments resource you created earlier and choose Create Resource, like before.
This time, include a request path parameter to represent the pageId of the comments being retrieved. Input the following information and then choose Create Resource:

In Resources, choose your new /{pageId} resource and choose Create Method. The Get Comments API will be retrieving data from our DynamoDB table, so choose GET for the HTTP method.
In the method configuration screen choose Show advanced and then select AWS Service Proxy. Fill out the form to match the following. Make sure to use the appropriate AWS Region and IAM execution role; these should match what you previously created. Finally, choose Save.

Modify the Integration Request and create a new mapping template. This will transform the simple pageId path parameter on the GET request to the needed DynamoDB Query API, which requires an HTTP POST. Here is the mapping template:
{
    "TableName": "Comments",
    "IndexName": "pageId-index",
    "KeyConditionExpression": "pageId = :v1",
    "ExpressionAttributeValues": {
        ":v1": {
            "S": "$input.params('pageId')"
        }
    }
}
Now test your mapping template. Navigate to the Method Execution pane and choose the Test icon on the left. Provide one of the pageId values that you’ve inserted into your Comments table and choose Test.

You should see a response like the following; it is directly passing through the raw DynamoDB response:

Now you’re close! All you need to do before you deploy your API is map the raw DynamoDB response to a JSON object structure similar to the one you defined for the Post Comments API.
This will work very similarly to the mapping template changes you already made. But you’ll configure this change on the Integration Response page of the console by editing the default mapping response’s mapping template.
Navigate to Integration Response and expand the 200 response code by choosing the arrow on the left. In the 200 response, expand the Mapping Templates section. In Content-Type choose application/json then choose the pencil icon next to Output Passthrough.

Now, create a mapping template that extracts the relevant pieces of the DynamoDB response and places them into a response structure that matches our use case:
#set($inputRoot = $input.path('$'))
{
    "comments": [
        #foreach($elem in $inputRoot.Items) {
            "commentId": "$elem.commentId.S",
            "userName": "$elem.userName.S",
            "message": "$elem.message.S"
        }#if($foreach.hasNext),#end
        #end
    ]
}
Now choose the check mark to save the mapping template, and choose Save to save this default integration response. Return to the Method Execution page and test your API again. You should now see a formatted response.
Now you have two working APIs that are ready to deploy! See our documentation to learn about how to deploy API stages.
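After deployment, clients call the two methods like any other HTTP API. Here is a minimal Node.js sketch against a hypothetical invoke URL, reusing the sample request body from above:
var https = require('https');

// Hypothetical invoke URL; replace with your API's stage URL after deployment
var host = 'abc123xyz0.execute-api.us-east-1.amazonaws.com';
var stage = '/prod';

// Post a new comment
var comment = JSON.stringify({
  pageId: 'breaking-news-story-01-18-2016',
  userName: 'Just Saying Thank You',
  message: 'I really enjoyed this story!!'
});

var post = https.request({
  hostname: host,
  path: stage + '/comments',
  method: 'POST',
  headers: { 'Content-Type': 'application/json', 'Content-Length': Buffer.byteLength(comment) }
}, function(res) {
  console.log('POST /comments returned ' + res.statusCode);

  // Then fetch every comment for that page
  https.get({ hostname: host, path: stage + '/comments/breaking-news-story-01-18-2016' }, function(getRes) {
    var body = '';
    getRes.on('data', function(chunk) { body += chunk; });
    getRes.on('end', function() { console.log(body); });
  });
});
post.write(comment);
post.end();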
But, before you deploy your API, here are some additional things to consider:

Authentication: you may want to require that users authenticate before they can leave comments. Amazon API Gateway can enforce IAM authentication for the APIs you create. To learn more, see Amazon API Gateway Access Permissions.
DynamoDB capacity: you may want to provision an appropriate amount of capacity to your Comments table so that your costs and performance reflect your needs.
Commenting features: Depending on how robust you’d like commenting to be on your site, you might like to introduce changes to the APIs described here. Examples are attributes that track replies or timestamp attributes.

Conclusion
Now you’ve got a fully functioning public API to post and retrieve public comments for your website. This API communicates directly with the Amazon DynamoDB API without you having to manage a single application component yourself!
