Tag Archives: aws solutions

Protecting your API using Amazon API Gateway and AWS WAF — Part I

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/protecting-your-api-using-amazon-api-gateway-and-aws-waf-part-i/

This post courtesy of Thiago Morais, AWS Solutions Architect

When you build web applications or expose any data externally, you probably look for a platform where you can build highly scalable, secure, and robust REST APIs. As APIs are publicly exposed, there are a number of best practices for providing a secure mechanism to consumers using your API.

Amazon API Gateway handles all the tasks involved in accepting and processing up to hundreds of thousands of concurrent API calls, including traffic management, authorization and access control, monitoring, and API version management.

In this post, I show you how to take advantage of the regional API endpoint feature in API Gateway, so that you can create your own Amazon CloudFront distribution and secure your API using AWS WAF.

AWS WAF is a web application firewall that helps protect your web applications from common web exploits that could affect application availability, compromise security, or consume excessive resources.

As you make your APIs publicly available, you are exposed to attackers trying to exploit your services in several ways. The AWS security team published a whitepaper solution using AWS WAF, How to Mitigate OWASP’s Top 10 Web Application Vulnerabilities.

Regional API endpoints

Edge-optimized APIs are endpoints that are accessed through a CloudFront distribution created and managed by API Gateway. Before the launch of regional API endpoints, this was the default option when creating APIs using API Gateway. It primarily helped to reduce latency for API consumers that were located in different geographical locations than your API.

When API requests predominantly originate from an Amazon EC2 instance or other services within the same AWS Region as the API is deployed, a regional API endpoint typically lowers the latency of connections. It is recommended for such scenarios.

For better control around caching strategies, customers can use their own CloudFront distribution for regional APIs. They also have the ability to use AWS WAF protection, as I describe in this post.

Edge-optimized API endpoint

The following diagram is an illustrated example of the edge-optimized API endpoint where your API clients access your API through a CloudFront distribution created and managed by API Gateway.

Regional API endpoint

For the regional API endpoint, your customers access your API from the same Region in which your REST API is deployed. This helps you to reduce request latency and particularly allows you to add your own content delivery network, as needed.

Walkthrough

In this section, you implement the following steps:

  • Create a regional API using the PetStore sample API.
  • Create a CloudFront distribution for the API.
  • Test the CloudFront distribution.
  • Set up AWS WAF and create a web ACL.
  • Attach the web ACL to the CloudFront distribution.
  • Test AWS WAF protection.

Create the regional API

For this walkthrough, use an existing PetStore API. All new APIs launch by default as the regional endpoint type. To change the endpoint type for your existing API, choose the cog icon on the top right corner:

After you have created the PetStore API on your account, deploy a stage called “prod” for the PetStore API.

On the API Gateway console, select the PetStore API and choose Actions, Deploy API.

For Stage name, type prod and add a stage description.

Choose Deploy and the new API stage is created.

Use the following AWS CLI command to update your API from edge-optimized to regional:

aws apigateway update-rest-api \
--rest-api-id {rest-api-id} \
--patch-operations op=replace,path=/endpointConfiguration/types/EDGE,value=REGIONAL

A successful response looks like the following:

{
    "description": "Your first API with Amazon API Gateway. This is a sample API that integrates via HTTP with your demo Pet Store endpoints", 
    "createdDate": 1511525626, 
    "endpointConfiguration": {
        "types": [
            "REGIONAL"
        ]
    }, 
    "id": "{api-id}", 
    "name": "PetStore"
}

After you change your API endpoint to regional, you can now assign your own CloudFront distribution to this API.

Create a CloudFront distribution

To make things easier, I have provided an AWS CloudFormation template to deploy a CloudFront distribution pointing to the API that you just created. Click the button to deploy the template in the us-east-1 Region.

For Stack name, enter RegionalAPI. For APIGWEndpoint, enter your API FQDN in the following format:

{api-id}.execute-api.us-east-1.amazonaws.com

After you fill out the parameters, choose Next to continue the stack deployment. It takes a couple of minutes to finish the deployment. After it finishes, the Output tab lists the following items:

  • A CloudFront domain URL
  • An S3 bucket for CloudFront access logs
Output from CloudFormation

Output from CloudFormation

Test the CloudFront distribution

To see if the CloudFront distribution was configured correctly, use a web browser and enter the URL from your distribution, with the following parameters:

https://{your-distribution-url}.cloudfront.net/{api-stage}/pets

You should get the following output:

[
  {
    "id": 1,
    "type": "dog",
    "price": 249.99
  },
  {
    "id": 2,
    "type": "cat",
    "price": 124.99
  },
  {
    "id": 3,
    "type": "fish",
    "price": 0.99
  }
]

Set up AWS WAF and create a web ACL

With the new CloudFront distribution in place, you can now start setting up AWS WAF to protect your API.

For this demo, you deploy the AWS WAF Security Automations solution, which provides fine-grained control over the requests attempting to access your API.

For more information about deployment, see Automated Deployment. If you prefer, you can launch the solution directly into your account using the following button.

For CloudFront Access Log Bucket Name, add the name of the bucket created during the deployment of the CloudFormation stack for your CloudFront distribution.

The solution allows you to adjust thresholds and also choose which automations to enable to protect your API. After you finish configuring these settings, choose Next.

To start the deployment process in your account, follow the creation wizard and choose Create. It takes a few minutes do finish the deployment. You can follow the creation process through the CloudFormation console.

After the deployment finishes, you can see the new web ACL deployed on the AWS WAF console, AWSWAFSecurityAutomations.

Attach the AWS WAF web ACL to the CloudFront distribution

With the solution deployed, you can now attach the AWS WAF web ACL to the CloudFront distribution that you created earlier.

To assign the newly created AWS WAF web ACL, go back to your CloudFront distribution. After you open your distribution for editing, choose General, Edit.

Select the new AWS WAF web ACL that you created earlier, AWSWAFSecurityAutomations.

Save the changes to your CloudFront distribution and wait for the deployment to finish.

Test AWS WAF protection

To validate the AWS WAF Web ACL setup, use Artillery to load test your API and see AWS WAF in action.

To install Artillery on your machine, run the following command:

$ npm install -g artillery

After the installation completes, you can check if Artillery installed successfully by running the following command:

$ artillery -V
$ 1.6.0-12

As the time of publication, Artillery is on version 1.6.0-12.

One of the WAF web ACL rules that you have set up is a rate-based rule. By default, it is set up to block any requesters that exceed 2000 requests under 5 minutes. Try this out.

First, use cURL to query your distribution and see the API output:

$ curl -s https://{distribution-name}.cloudfront.net/prod/pets
[
  {
    "id": 1,
    "type": "dog",
    "price": 249.99
  },
  {
    "id": 2,
    "type": "cat",
    "price": 124.99
  },
  {
    "id": 3,
    "type": "fish",
    "price": 0.99
  }
]

Based on the test above, the result looks good. But what if you max out the 2000 requests in under 5 minutes?

Run the following Artillery command:

artillery quick -n 2000 --count 10  https://{distribution-name}.cloudfront.net/prod/pets

What you are doing is firing 2000 requests to your API from 10 concurrent users. For brevity, I am not posting the Artillery output here.

After Artillery finishes its execution, try to run the cURL request again and see what happens:

 

$ curl -s https://{distribution-name}.cloudfront.net/prod/pets

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<HTML><HEAD><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<TITLE>ERROR: The request could not be satisfied</TITLE>
</HEAD><BODY>
<H1>ERROR</H1>
<H2>The request could not be satisfied.</H2>
<HR noshade size="1px">
Request blocked.
<BR clear="all">
<HR noshade size="1px">
<PRE>
Generated by cloudfront (CloudFront)
Request ID: [removed]
</PRE>
<ADDRESS>
</ADDRESS>
</BODY></HTML>

As you can see from the output above, the request was blocked by AWS WAF. Your IP address is removed from the blocked list after it falls below the request limit rate.

Conclusion

In this first part, you saw how to use the new API Gateway regional API endpoint together with Amazon CloudFront and AWS WAF to secure your API from a series of attacks.

In the second part, I will demonstrate some other techniques to protect your API using API keys and Amazon CloudFront custom headers.

Creating a 1.3 Million vCPU Grid on AWS using EC2 Spot Instances and TIBCO GridServer

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/creating-a-1-3-million-vcpu-grid-on-aws-using-ec2-spot-instances-and-tibco-gridserver/

Many of my colleagues are fortunate to be able to spend a good part of their day sitting down with and listening to our customers, doing their best to understand ways that we can better meet their business and technology needs. This information is treated with extreme care and is used to drive the roadmap for new services and new features.

AWS customers in the financial services industry (often abbreviated as FSI) are looking ahead to the Fundamental Review of Trading Book (FRTB) regulations that will come in to effect between 2019 and 2021. Among other things, these regulations mandate a new approach to the “value at risk” calculations that each financial institution must perform in the four hour time window after trading ends in New York and begins in Tokyo. Today, our customers report this mission-critical calculation consumes on the order of 200,000 vCPUs, growing to between 400K and 800K vCPUs in order to meet the FRTB regulations. While there’s still some debate about the magnitude and frequency with which they’ll need to run this expanded calculation, the overall direction is clear.

Building a Big Grid
In order to make sure that we are ready to help our FSI customers meet these new regulations, we worked with TIBCO to set up and run a proof of concept grid in the AWS Cloud. The periodic nature of the calculation, along with the amount of processing power and storage needed to run it to completion within four hours, make it a great fit for an environment where a vast amount of cost-effective compute power is available on an on-demand basis.

Our customers are already using the TIBCO GridServer on-premises and want to use it in the cloud. This product is designed to run grids at enterprise scale. It runs apps in a virtualized fashion, and accepts requests for resources, dynamically provisioning them on an as-needed basis. The cloud version supports Amazon Linux as well as the PostgreSQL-compatible edition of Amazon Aurora.

Working together with TIBCO, we set out to create a grid that was substantially larger than the current high-end prediction of 800K vCPUs, adding a 50% safety factor and then rounding up to reach 1.3 million vCPUs (5x the size of the largest on-premises grid). With that target in mind, the account limits were raised as follows:

  • Spot Instance Limit – 120,000
  • EBS Volume Limit – 120,000
  • EBS Capacity Limit – 2 PB

If you plan to create a grid of this size, you should also bring your friendly local AWS Solutions Architect into the loop as early as possible. They will review your plans, provide you with architecture guidance, and help you to schedule your run.

Running the Grid
We hit the Go button and launched the grid, watching as it bid for and obtained Spot Instances, each of which booted, initialized, and joined the grid within two minutes. The test workload used the Strata open source analytics & market risk library from OpenGamma and was set up with their assistance.

The grid grew to 61,299 Spot Instances (1.3 million vCPUs drawn from 34 instance types spanning 3 generations of EC2 hardware) as planned, with just 1,937 instances reclaimed and automatically replaced during the run, and cost $30,000 per hour to run, at an average hourly cost of $0.078 per vCPU. If the same instances had been used in On-Demand form, the hourly cost to run the grid would have been approximately $93,000.

Despite the scale of the grid, prices for the EC2 instances did not move during the bidding process. This is due to the overall size of the AWS Cloud and the smooth price change model that we launched late last year.

To give you a sense of the compute power, we computed that this grid would have taken the #1 position on the TOP 500 supercomputer list in November 2007 by a considerable margin, and the #2 position in June 2008. Today, it would occupy position #360 on the list.

I hope that you enjoyed this AWS success story, and that it gives you an idea of the scale that you can achieve in the cloud!

Jeff;

Using AWS Lambda and Amazon Comprehend for sentiment analysis

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/using-aws-lambda-and-amazon-comprehend-for-sentiment-analysis/

This post courtesy of Giedrius Praspaliauskas, AWS Solutions Architect

Even with best IVR systems, customers get frustrated. What if you knew that 10 callers in your Amazon Connect contact flow were likely to say “Agent!” in frustration in the next 30 seconds? Would you like to get to them before that happens? What if your bot was smart enough to admit, “I’m sorry this isn’t helping. Let me find someone for you.”?

In this post, I show you how to use AWS Lambda and Amazon Comprehend for sentiment analysis to make your Amazon Lex bots in Amazon Connect more sympathetic.

Setting up a Lambda function for sentiment analysis

There are multiple natural language and text processing frameworks or services available to use with Lambda, including but not limited to Amazon Comprehend, TextBlob, Pattern, and NLTK. Pick one based on the nature of your system:  the type of interaction, languages supported, and so on. For this post, I picked Amazon Comprehend, which uses natural language processing (NLP) to extract insights and relationships in text.

The walkthrough in this post is just an example. In a full-scale implementation, you would likely implement a more nuanced approach. For example, you could keep the overall sentiment score through the conversation and act only when it reaches a certain threshold. It is worth noting that this Lambda function is not called for missed utterances, so there may be a gap between what is being analyzed and what was actually said.

The Lambda function is straightforward. It analyses the input transcript field of the Amazon Lex event. Based on the overall sentiment value, it generates a response message with next step instructions. When the sentiment is neutral, positive, or mixed, the response leaves it to Amazon Lex to decide what the next steps should be. It adds to the response overall sentiment value as an additional session attribute, along with slots’ values received as an input.

When the overall sentiment is negative, the function returns the dialog action, pointing to an escalation intent (specified in the environment variable ESCALATION_INTENT_NAME) or returns the fulfillment closure action with a failure state when the intent is not specified. In addition to actions or intents, the function returns a message, or prompt, to be provided to the customer before taking the next step. Based on the returned action, Amazon Connect can select the appropriate next step in a contact flow.

For this walkthrough, you create a Lambda function using the AWS Management Console:

  1. Open the Lambda console.
  2. Choose Create Function.
  3. Choose Author from scratch (no blueprint).
  4. For Runtime, choose Python 3.6.
  5. For Role, choose Create a custom role. The custom execution role allows the function to detect sentiments, create a log group, stream log events, and store the log events.
  6. Enter the following values:
    • For Role Description, enter Lambda execution role permissions.
    • For IAM Role, choose Create an IAM role.
    • For Role Name, enter LexSentimentAnalysisLambdaRole.
    • For Policy, use the following policy:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        },
        {
            "Action": [
                "comprehend:DetectDominantLanguage",
                "comprehend:DetectSentiment"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}
    1. Choose Create function.
    2. Copy/paste the following code to the editor window
import os, boto3

ESCALATION_INTENT_MESSAGE="Seems that you are having troubles with our service. Would you like to be transferred to the associate?"
FULFILMENT_CLOSURE_MESSAGE="Seems that you are having troubles with our service. Let me transfer you to the associate."

escalation_intent_name = os.getenv('ESACALATION_INTENT_NAME', None)

client = boto3.client('comprehend')

def lambda_handler(event, context):
    sentiment=client.detect_sentiment(Text=event['inputTranscript'],LanguageCode='en')['Sentiment']
    if sentiment=='NEGATIVE':
        if escalation_intent_name:
            result = {
                "sessionAttributes": {
                    "sentiment": sentiment
                    },
                    "dialogAction": {
                        "type": "ConfirmIntent", 
                        "message": {
                            "contentType": "PlainText", 
                            "content": ESCALATION_INTENT_MESSAGE
                        }, 
                    "intentName": escalation_intent_name
                    }
            }
        else:
            result = {
                "sessionAttributes": {
                    "sentiment": sentiment
                },
                "dialogAction": {
                    "type": "Close",
                    "fulfillmentState": "Failed",
                    "message": {
                            "contentType": "PlainText",
                            "content": FULFILMENT_CLOSURE_MESSAGE
                    }
                }
            }

    else:
        result ={
            "sessionAttributes": {
                "sentiment": sentiment
            },
            "dialogAction": {
                "type": "Delegate",
                "slots" : event["currentIntent"]["slots"]
            }
        }
    return result
  1. Below the code editor specify the environment variable ESCALATION_INTENT_NAME with a value of Escalate.

  1. Click on Save in the top right of the console.

Now you can test your function.

  1. Click Test at the top of the console.
  2. Configure a new test event using the following test event JSON:
{
  "messageVersion": "1.0",
  "invocationSource": "DialogCodeHook",
  "userId": "1234567890",
  "sessionAttributes": {},
  "bot": {
    "name": "BookSomething",
    "alias": "None",
    "version": "$LATEST"
  },
  "outputDialogMode": "Text",
  "currentIntent": {
    "name": "BookSomething",
    "slots": {
      "slot1": "None",
      "slot2": "None"
    },
    "confirmationStatus": "None"
  },
  "inputTranscript": "I want something"
}
  1. Click Create
  2. Click Test on the console

This message should return a response from Lambda with a sentiment session attribute of NEUTRAL.

However, if you change the input to “This is garbage!”, Lambda changes the dialog action to the escalation intent specified in the environment variable ESCALATION_INTENT_NAME.

Setting up Amazon Lex

Now that you have your Lambda function running, it is time to create the Amazon Lex bot. Use the BookTrip sample bot and call it BookSomething. The IAM role is automatically created on your behalf. Indicate that this bot is not subject to the COPPA, and choose Create. A few minutes later, the bot is ready.

Make the following changes to the default configuration of the bot:

  1. Add an intent with no associated slots. Name it Escalate.
  2. Specify the Lambda function for initialization and validation in the existing two intents (“BookCar” and “BookHotel”), at the same time giving Amazon Lex permission to invoke it.
  3. Leave the other configuration settings as they are and save the intents.

You are ready to build and publish this bot. Set a new alias, BookSomethingWithSentimentAnalysis. When the build finishes, test it.

As you see, sentiment analysis works!

Setting up Amazon Connect

Next, provision an Amazon Connect instance.

After the instance is created, you need to integrate the Amazon Lex bot created in the previous step. For more information, see the Amazon Lex section in the Configuring Your Amazon Connect Instance topic.  You may also want to look at the excellent post by Randall Hunt, New – Amazon Connect and Amazon Lex Integration.

Create a new contact flow, “Sentiment analysis walkthrough”:

  1. Log in into the Amazon Connect instance.
  2. Choose Create contact flow, Create transfer to agent flow.
  3. Add a Get customer input block, open the icon in the top left corner, and specify your Amazon Lex bot and its intents.
  4. Select the Text to speech audio prompt type and enter text for Amazon Connect to play at the beginning of the dialog.
  5. Choose Amazon Lex, enter your Amazon Lex bot name and the alias.
  6. Specify the intents to be used as dialog branches that a customer can choose: BookHotel, BookTrip, or Escalate.
  7. Add two Play prompt blocks and connect them to the customer input block.
    • If booking hotel or car intent is returned from the bot flow, play the corresponding prompt (“OK, will book it for you”) and initiate booking (in this walkthrough, just hang up after the prompt).
    • However, if escalation intent is returned (caused by the sentiment analysis results in the bot), play the prompt (“OK, transferring to an agent”) and initiate the transfer.
  8. Save and publish the contact flow.

As a result, you have a contact flow with a single customer input step and a text-to-speech prompt that uses the Amazon Lex bot. You expect one of the three intents returned:

Edit the phone number to associate the contact flow that you just created. It is now ready for testing. Call the phone number and check how your contact flow works.

Cleanup

Don’t forget to delete all the resources created during this walkthrough to avoid incurring any more costs:

  • Amazon Connect instance
  • Amazon Lex bot
  • Lambda function
  • IAM role LexSentimentAnalysisLambdaRole

Summary

In this walkthrough, you implemented sentiment analysis with a Lambda function. The function can be integrated into Amazon Lex and, as a result, into Amazon Connect. This approach gives you the flexibility to analyze user input and then act. You may find the following potential use cases of this approach to be of interest:

  • Extend the Lambda function to identify “hot” topics in the user input even if the sentiment is not negative and take action proactively. For example, switch to an escalation intent if a user mentioned “where is my order,” which may signal potential frustration.
  • Use Amazon Connect Streams to provide agent sentiment analysis results along with call transfer. Enable service tailored towards particular customer needs and sentiments.
  • Route calls to agents based on both skill set and sentiment.
  • Prioritize calls based on sentiment using multiple Amazon Connect queues instead of transferring directly to an agent.
  • Monitor quality and flag for review contact flows that result in high overall negative sentiment.
  • Implement sentiment and AI/ML based call analysis, such as a real-time recommendation engine. For more details, see Machine Learning on AWS.

If you have questions or suggestions, please comment below.

Node.js 8.10 runtime now available in AWS Lambda

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/node-js-8-10-runtime-now-available-in-aws-lambda/

This post courtesy of Ed Lima, AWS Solutions Architect

We are excited to announce that you can now develop your AWS Lambda functions using the Node.js 8.10 runtime, which is the current Long Term Support (LTS) version of Node.js. Start using this new version today by specifying a runtime parameter value of nodejs8.10 when creating or updating functions.

Supporting async/await

The Lambda programming model for Node.js 8.10 now supports defining a function handler using the async/await pattern.

Asynchronous or non-blocking calls are an inherent and important part of applications, as user and human interfaces are asynchronous by nature. If you decide to have a coffee with a friend, you usually order the coffee then start or continue a conversation with your friend while the coffee is getting ready. You don’t wait for the coffee to be ready before you start talking. These activities are asynchronous, because you can start one and then move to the next without waiting for completion. Otherwise, you’d delay (or block) the start of the next activity.

Asynchronous calls used to be handled in Node.js using callbacks. That presented problems when they were nested within other callbacks in multiple levels, making the code difficult to maintain and understand.

Promises were implemented to try to solve issues caused by “callback hell.” They allow asynchronous operations to call their own methods and handle what happens when a call is successful or when it fails. As your requirements become more complicated, even promises become harder to work with and may still end up complicating your code.

Async/await is the new way of handling asynchronous operations in Node.js, and makes for simpler, easier, and cleaner code for non-blocking calls. It still uses promises but a callback is returned directly from the asynchronous function, just as if it were a synchronous blocking function.

Take for instance the following Lambda function to get the current account settings, using the Node.js 6.10 runtime:

let AWS = require('aws-sdk');
let lambda = new AWS.Lambda();

exports.handler = (event, context, callback) => {
    let getAccountSettingsPromise = lambda.getAccountSettings().promise();
    getAccountSettingsPromise.then(
        (data) => {
            callback(null, data);
        },
        (err) => {
            console.log(err);
            callback(err);
        }
    );
};

With the new Node.js 8.10 runtime, there are new handler types that can be declared with the “async” keyword or can return a promise directly.

This is how the same function looks like using async/await with Node.js 8.10:

let AWS = require('aws-sdk');
let lambda = new AWS.Lambda();

exports.handler = async (event) => {
    return await lambda.getAccountSettings().promise() ;
};

Alternatively, you could have the handler return a promise directly:

let AWS = require('aws-sdk');
let lambda = new AWS.Lambda();

exports.handler = (event) => {
    return new Promise((resolve, reject) => {
        lambda.getAccountSettings(event)
        .then((data) => {
            resolve data;
        })
        .catch(reject);
     });
};

The new handler types are alternatives to the callback pattern, which is still fully supported.

All three functions return the same results. However, in the new runtime with async/await, all callbacks in the code are gone, which makes it easier to read. This is especially true for those less familiar with promises.

{
    "AccountLimit":{
        "TotalCodeSize":80530636800,
        "CodeSizeUnzipped":262144000,
        "CodeSizeZipped":52428800, 
        "ConcurrentExecutions":1000,
        "UnreservedConcurrentExecutions":1000
    },
    "AccountUsage":{
        "TotalCodeSize":52234461,
        "FunctionCount":53
    }
}

Another great advantage of async/await is better error handling. You can use a try/catch block inside the scope of an async function. Even though the function awaits an asynchronous operation, any errors end up in the catch block.

You can improve your previous Node.js 8.10 function with this trusted try/catch error handling pattern:

let AWS = require('aws-sdk');
let lambda = new AWS.Lambda();
let data;

exports.handler = async (event) => {
    try {
        data = await lambda.getAccountSettings().promise();
    }
    catch (err) {
        console.log(err);
        return err;
    }
    return data;
};

While you now have a similar number of lines in both runtimes, the code is cleaner and more readable with async/await. It makes the asynchronous calls look more synchronous. However, it is important to notice that the code is still executed the same way as if it were using a callback or promise-based API.

Backward compatibility

You may port your existing Node.js 4.3 and 6.10 functions over to Node.js 8.10 by updating the runtime. Node.js 8.10 does include numerous breaking changes from previous Node versions.

Make sure to review the API changes between Node.js 4.3, 6.10, and Node.js 8.10 to see if there are other changes that might affect your code. We recommend testing that your Lambda function passes internal validation for its behavior when upgrading to the new runtime version.

You can use Lambda versions/aliases to safely test that your function runs as expected on Node 8.10, before routing production traffic to it.

New node features

You can now get better performance when compared to the previous LTS version 6.x (up to 20%). The new V8 6.0 engine comes with Turbofan and the Ignition pipeline, which leads to lower memory consumption and faster startup time across Node.js applications.

HTTP/2, which is subject to future changes, allows developers to use the new protocol to speed application development and undo many of HTTP/1.1 workarounds to make applications faster, simpler, and more powerful.

For more information, see the AWS Lambda Developer Guide.

Hope you enjoy and… go build with Node.js 8.10!

Running ActiveMQ in a Hybrid Cloud Environment with Amazon MQ

Post Syndicated from Tara Van Unen original https://aws.amazon.com/blogs/compute/running-activemq-in-a-hybrid-cloud-environment-with-amazon-mq/

This post courtesy of Greg Share, AWS Solutions Architect

Many organizations, particularly enterprises, rely on message brokers to connect and coordinate different systems. Message brokers enable distributed applications to communicate with one another, serving as the technological backbone for their IT environment, and ultimately their business services. Applications depend on messaging to work.

In many cases, those organizations have started to build new or “lift and shift” applications to AWS. In some cases, there are applications, such as mainframe systems, too costly to migrate. In these scenarios, those on-premises applications still need to interact with cloud-based components.

Amazon MQ is a managed message broker service for ActiveMQ that enables organizations to send messages between applications in the cloud and on-premises to enable hybrid environments and application modernization. For example, you can invoke AWS Lambda from queues and topics managed by Amazon MQ brokers to integrate legacy systems with serverless architectures. ActiveMQ is an open-source message broker written in Java that is packaged with clients in multiple languages, Java Message Server (JMS) client being one example.

This post shows you can use Amazon MQ to integrate on-premises and cloud environments using the network of brokers feature of ActiveMQ. It provides configuration parameters for a one-way duplex connection for the flow of messages from an on-premises ActiveMQ message broker to Amazon MQ.

ActiveMQ and the network of brokers

First, look at queues within ActiveMQ and then at the network of brokers as a mechanism to distribute messages.

The network of brokers behaves differently from models such as physical networks. The key consideration is that the production (sending) of a message is disconnected from the consumption of that message. Think of the delivery of a parcel: The parcel is sent by the supplier (producer) to the end customer (consumer). The path it took to get there is of little concern to the customer, as long as it receives the package.

The same logic can be applied to the network of brokers. Here’s how you build the flow from a simple message to a queue and build toward a network of brokers. Before you look at setting up a hybrid connection, I discuss how a broker processes messages in a simple scenario.

When a message is sent from a producer to a queue on a broker, the following steps occur:

  1. A message is sent to a queue from the producer.
  2. The broker persists this in its store or journal.
  3. At this point, an acknowledgement (ACK) is sent to the producer from the broker.

When a consumer looks to consume the message from that same queue, the following steps occur:

  1. The message listener (consumer) calls the broker, which creates a subscription to the queue.
  2. Messages are fetched from the message store and sent to the consumer.
  3. The consumer acknowledges that the message has been received before processing it.
  4. Upon receiving the ACK, the broker sets the message as having been consumed. By default, this deletes it from the queue.
    • You can set the consumer to ACK after processing by setting up transaction management or handle it manually using Session.CLIENT_ACKNOWLEDGE.

Static propagation

I now introduce the concept of static propagation with the network of brokers as the mechanism for message transfer from on-premises brokers to Amazon MQ.  Static propagation refers to message propagation that occurs in the absence of subscription information. In this case, the objective is to transfer messages arriving at your selected on-premises broker to the Amazon MQ broker for consumption within the cloud environment.

After you configure static propagation with a network of brokers, the following occurs:

  1. The on-premises broker receives a message from a producer for a specific queue.
  2. The on-premises broker sends (statically propagates) the message to the Amazon MQ broker.
  3. The Amazon MQ broker sends an acknowledgement to the on-premises broker, which marks the message as having been consumed.
  4. Amazon MQ holds the message in its queue ready for consumption.
  5. A consumer connects to Amazon MQ broker, subscribes to the queue in which the message resides, and receives the message.
  6. Amazon MQ broker marks the message as having been consumed.

Getting started

The first step is creating an Amazon MQ broker.

  1. Sign in to the Amazon MQ console and launch a new Amazon MQ broker.
  2. Name your broker and choose Next step.
  3. For Broker instance type, choose your instance size:
    mq.t2.micro
    mq.m4.large
  4. For Deployment mode, enter one of the following:
    Single-instance broker for development and test implementations (recommended)
    Active/standby broker for high availability in production environments
  5. Scroll down and enter your user name and password.
  6. Expand Advanced Settings.
  7. For VPC, Subnet, and Security Group, pick the values for the resources in which your broker will reside.
  8. For Public Accessibility, choose Yes, as connectivity is internet-based. Another option would be to use private connectivity between your on-premises network and the VPC, an example being an AWS Direct Connect or VPN connection. In that case, you could set Public Accessibility to No.
  9. For Maintenance, leave the default value, No preference.
  10. Choose Create Broker. Wait several minutes for the broker to be created.

After creation is complete, you see your broker listed.

For connectivity to work, you must configure the security group where Amazon MQ resides. For this post, I focus on the OpenWire protocol.

For Openwire connectivity, allow port 61617 access for Amazon MQ from your on-premises ActiveMQ broker source IP address. For alternate protocols, see the Amazon MQ broker configuration information for the ports required:

OpenWire – ssl://xxxxxxx.xxx.com:61617
AMQP – amqp+ssl:// xxxxxxx.xxx.com:5671
STOMP – stomp+ssl:// xxxxxxx.xxx.com:61614
MQTT – mqtt+ssl:// xxxxxxx.xxx.com:8883
WSS – wss:// xxxxxxx.xxx.com:61619

Configuring the network of brokers

Configuring the network of brokers with static propagation occurs on the on-premises broker by applying changes to the following file:
<activemq install directory>/conf activemq.xml

Network connector

This is the first configuration item required to enable a network of brokers. It is only required on the on-premises broker, which initiates and creates the connection with Amazon MQ. This connection, after it’s established, enables the flow of messages in either direction between the on-premises broker and Amazon MQ. The focus of this post is the uni-directional flow of messages from the on-premises broker to Amazon MQ.

The default activemq.xml file does not include the network connector configuration. Add this with the networkConnector element. In this scenario, edit the on-premises broker activemq.xml file to include the following information between <systemUsage> and <transportConnectors>:

<networkConnectors>
             <networkConnector 
                name="Q:source broker name->target broker name"
                duplex="false" 
                uri="static:(ssl:// aws mq endpoint:61617)" 
                userName="username"
                password="password" 
                networkTTL="2" 
                dynamicOnly="false">
                <staticallyIncludedDestinations>
                    <queue physicalName="queuename"/>
                </staticallyIncludedDestinations> 
                <excludedDestinations>
                      <queue physicalName=">" />
                </excludedDestinations>
             </networkConnector> 
     <networkConnectors>

The highlighted components are the most important elements when configuring your on-premises broker.

  • name – Name of the network bridge. In this case, it specifies two things:
    • That this connection relates to an ActiveMQ queue (Q) as opposed to a topic (T), for reference purposes.
    • The source broker and target broker.
  • duplex –Setting this to false ensures that messages traverse uni-directionally from the on-premises broker to Amazon MQ.
  • uri –Specifies the remote endpoint to which to connect for message transfer. In this case, it is an Openwire endpoint on your Amazon MQ broker. This information could be obtained from the Amazon MQ console or via the API.
  • username and password – The same username and password configured when creating the Amazon MQ broker, and used to access the Amazon MQ ActiveMQ console.
  • networkTTL – Number of brokers in the network through which messages and subscriptions can pass. Leave this setting at the current value, if it is already included in your broker connection.
  • staticallyIncludedDestinations > queue physicalName – The destination ActiveMQ queue for which messages are destined. This is the queue that is propagated from the on-premises broker to the Amazon MQ broker for message consumption.

After the network connector is configured, you must restart the ActiveMQ service on the on-premises broker for the changes to be applied.

Verify the configuration

There are a number of places within the ActiveMQ console of your on-premises and Amazon MQ brokers to browse to verify that the configuration is correct and the connection has been established.

On-premises broker

Launch the ActiveMQ console of your on-premises broker and navigate to Network. You should see an active network bridge similar to the following:

This identifies that the connection between your on-premises broker and your Amazon MQ broker is up and running.

Now navigate to Connections and scroll to the bottom of the page. Under the Network Connectors subsection, you should see a connector labeled with the name: value that you provided within the ActiveMQ.xml configuration file. You should see an entry similar to:

Amazon MQ broker

Launch the ActiveMQ console of your Amazon MQ broker and navigate to Connections. Scroll to the Connections openwire subsection and you should see a connection specified that references the name: value that you provided within the ActiveMQ.xml configuration file. You should see an entry similar to:

If you configured the uri: for AMQP, STOMP, MQTT, or WSS as opposed to Openwire, you would see this connection under the corresponding section of the Connections page.

Testing your message flow

The setup described outlines a way for messages produced on premises to be propagated to the cloud for consumption in the cloud. This section provides steps on verifying the message flow.

Verify that the queue has been created

After you specify this queue name as staticallyIncludedDestinations > queue physicalName: and your ActiveMQ service starts, you see the following on your on-premises ActiveMQ console Queues page.

As you can see, no messages have been sent but you have one consumer listed. If you then choose Active Consumers under the Views column, you see Active Consumers for TestingQ.

This is telling you that your Amazon MQ broker is a consumer of your on-premises broker for the testing queue.

Produce and send a message to the on-premises broker

Now, produce a message on an on-premises producer and send it to your on-premises broker to a queue named TestingQ. If you navigate back to the queues page of your on-premises ActiveMQ console, you see that the messages enqueued and messages dequeued column count for your TestingQ queue have changed:

What this means is that the message originating from the on-premises producer has traversed the on-premises broker and propagated immediately to the Amazon MQ broker. At this point, the message is no longer available for consumption from the on-premises broker.

If you access the ActiveMQ console of your Amazon MQ broker and navigate to the Queues page, you see the following for the TestingQ queue:

This means that the message originally sent to your on-premises broker has traversed the network of brokers unidirectional network bridge, and is ready to be consumed from your Amazon MQ broker. The indicator is the Number of Pending Messages column.

Consume the message from an Amazon MQ broker

Connect to the Amazon MQ TestingQ queue from a consumer within the AWS Cloud environment for message consumption. Log on to the ActiveMQ console of your Amazon MQ broker and navigate to the Queue page:

As you can see, the Number of Pending Messages column figure has changed to 0 as that message has been consumed.

This diagram outlines the message lifecycle from the on-premises producer to the on-premises broker, traversing the hybrid connection between the on-premises broker and Amazon MQ, and finally consumption within the AWS Cloud.

Conclusion

This post focused on an ActiveMQ-specific scenario for transferring messages within an ActiveMQ queue from an on-premises broker to Amazon MQ.

For other on-premises brokers, such as IBM MQ, another approach would be to run ActiveMQ on-premises broker and use JMS bridging to IBM MQ, while using the approach in this post to forward to Amazon MQ. Yet another approach would be to use Apache Camel for more sophisticated routing.

I hope that you have found this example of hybrid messaging between an on-premises environment in the AWS Cloud to be useful. Many customers are already using on-premises ActiveMQ brokers, and this is a great use case to enable hybrid cloud scenarios.

To learn more, see the Amazon MQ website and Developer Guide. You can try Amazon MQ for free with the AWS Free Tier, which includes up to 750 hours of a single-instance mq.t2.micro broker and up to 1 GB of storage per month for one year.

 

Invoking AWS Lambda from Amazon MQ

Post Syndicated from Tara Van Unen original https://aws.amazon.com/blogs/compute/invoking-aws-lambda-from-amazon-mq/

Contributed by Josh Kahn, AWS Solutions Architect

Message brokers can be used to solve a number of needs in enterprise architectures, including managing workload queues and broadcasting messages to a number of subscribers. Amazon MQ is a managed message broker service for Apache ActiveMQ that makes it easy to set up and operate message brokers in the cloud.

In this post, I discuss one approach to invoking AWS Lambda from queues and topics managed by Amazon MQ brokers. This and other similar patterns can be useful in integrating legacy systems with serverless architectures. You could also integrate systems already migrated to the cloud that use common APIs such as JMS.

For example, imagine that you work for a company that produces training videos and which recently migrated its video management system to AWS. The on-premises system used to publish a message to an ActiveMQ broker when a video was ready for processing by an on-premises transcoder. However, on AWS, your company uses Amazon Elastic Transcoder. Instead of modifying the management system, Lambda polls the broker for new messages and starts a new Elastic Transcoder job. This approach avoids changes to the existing application while refactoring the workload to leverage cloud-native components.

This solution uses Amazon CloudWatch Events to trigger a Lambda function that polls the Amazon MQ broker for messages. Instead of starting an Elastic Transcoder job, the sample writes the received message to an Amazon DynamoDB table with a time stamp indicating the time received.

Getting started

To start, navigate to the Amazon MQ console. Next, launch a new Amazon MQ instance, selecting Single-instance Broker and supplying a broker name, user name, and password. Be sure to document the user name and password for later.

For the purposes of this sample, choose the default options in the Advanced settings section. Your new broker is deployed to the default VPC in the selected AWS Region with the default security group. For this post, you update the security group to allow access for your sample Lambda function. In a production scenario, I recommend deploying both the Lambda function and your Amazon MQ broker in your own VPC.

After several minutes, your instance changes status from “Creation Pending” to “Available.” You can then visit the Details page of your broker to retrieve connection information, including a link to the ActiveMQ web console where you can monitor the status of your broker, publish test messages, and so on. In this example, use the Stomp protocol to connect to your broker. Be sure to capture the broker host name, for example:

<BROKER_ID>.mq.us-east-1.amazonaws.com

You should also modify the Security Group for the broker by clicking on its Security Group ID. Click the Edit button and then click Add Rule to allow inbound traffic on port 8162 for your IP address.

Deploying and scheduling the Lambda function

To simplify the deployment of this example, I’ve provided an AWS Serverless Application Model (SAM) template that deploys the sample function and DynamoDB table, and schedules the function to be invoked every five minutes. Detailed instructions can be found with sample code on GitHub in the amazonmq-invoke-aws-lambda repository, with sample code. I discuss a few key aspects in this post.

First, SAM makes it easy to deploy and schedule invocation of our function:

SubscriberFunction:
	Type: AWS::Serverless::Function
	Properties:
		CodeUri: subscriber/
		Handler: index.handler
		Runtime: nodejs6.10
		Role: !GetAtt SubscriberFunctionRole.Arn
		Timeout: 15
		Environment:
			Variables:
				HOST: !Ref AmazonMQHost
				LOGIN: !Ref AmazonMQLogin
				PASSWORD: !Ref AmazonMQPassword
				QUEUE_NAME: !Ref AmazonMQQueueName
				WORKER_FUNCTIOn: !Ref WorkerFunction
		Events:
			Timer:
				Type: Schedule
				Properties:
					Schedule: rate(5 minutes)

WorkerFunction:
Type: AWS::Serverless::Function
	Properties:
		CodeUri: worker/
		Handler: index.handler
		Runtime: nodejs6.10
Role: !GetAtt WorkerFunctionRole.Arn
		Environment:
			Variables:
				TABLE_NAME: !Ref MessagesTable

In the code, you include the URI, user name, and password for your newly created Amazon MQ broker. These allow the function to poll the broker for new messages on the sample queue.

The sample Lambda function is written in Node.js, but clients exist for a number of programming languages.

stomp.connect(options, (error, client) => {
	if (error) { /* do something */ }

	let headers = {
		destination: ‘/queue/SAMPLE_QUEUE’,
		ack: ‘auto’
	}

	client.subscribe(headers, (error, message) => {
		if (error) { /* do something */ }

		message.readString(‘utf-8’, (error, body) => {
			if (error) { /* do something */ }

			let params = {
				FunctionName: MyWorkerFunction,
				Payload: JSON.stringify({
					message: body,
					timestamp: Date.now()
				})
			}

			let lambda = new AWS.Lambda()
			lambda.invoke(params, (error, data) => {
				if (error) { /* do something */ }
			})
		}
})
})

Sending a sample message

For the purpose of this example, use the Amazon MQ console to send a test message. Navigate to the details page for your broker.

About midway down the page, choose ActiveMQ Web Console. Next, choose Manage ActiveMQ Broker to launch the admin console. When you are prompted for a user name and password, use the credentials created earlier.

At the top of the page, choose Send. From here, you can send a sample message from the broker to subscribers. For this example, this is how you generate traffic to test the end-to-end system. Be sure to set the Destination value to “SAMPLE_QUEUE.” The message body can contain any text. Choose Send.

You now have a Lambda function polling for messages on the broker. To verify that your function is working, you can confirm in the DynamoDB console that the message was successfully received and processed by the sample Lambda function.

First, choose Tables on the left and select the table name “amazonmq-messages” in the middle section. With the table detail in view, choose Items. If the function was successful, you’ll find a new entry similar to the following:

If there is no message in DynamoDB, check again in a few minutes or review the CloudWatch Logs group for Lambda functions that contain debug messages.

Alternative approaches

Beyond the approach described here, you may consider other approaches as well. For example, you could use an intermediary system such as Apache Flume to pass messages from the broker to Lambda or deploy Apache Camel to trigger Lambda via a POST to API Gateway. There are trade-offs to each of these approaches. My goal in using CloudWatch Events was to introduce an easily repeatable pattern familiar to many Lambda developers.

Summary

I hope that you have found this example of how to integrate AWS Lambda with Amazon MQ useful. If you have expertise or legacy systems that leverage APIs such as JMS, you may find this useful as you incorporate serverless concepts in your enterprise architectures.

To learn more, see the Amazon MQ website and Developer Guide. You can try Amazon MQ for free with the AWS Free Tier, which includes up to 750 hours of a single-instance mq.t2.micro broker and up to 1 GB of storage per month for one year.

Implementing Dynamic ETL Pipelines Using AWS Step Functions

Post Syndicated from Tara Van Unen original https://aws.amazon.com/blogs/compute/implementing-dynamic-etl-pipelines-using-aws-step-functions/

This post contributed by:
Wangechi Dole, AWS Solutions Architect
Milan Krasnansky, ING, Digital Solutions Developer, SGK
Rian Mookencherry, Director – Product Innovation, SGK

Data processing and transformation is a common use case you see in our customer case studies and success stories. Often, customers deal with complex data from a variety of sources that needs to be transformed and customized through a series of steps to make it useful to different systems and stakeholders. This can be difficult due to the ever-increasing volume, velocity, and variety of data. Today, data management challenges cannot be solved with traditional databases.

Workflow automation helps you build solutions that are repeatable, scalable, and reliable. You can use AWS Step Functions for this. A great example is how SGK used Step Functions to automate the ETL processes for their client. With Step Functions, SGK has been able to automate changes within the data management system, substantially reducing the time required for data processing.

In this post, SGK shares the details of how they used Step Functions to build a robust data processing system based on highly configurable business transformation rules for ETL processes.

SGK: Building dynamic ETL pipelines

SGK is a subsidiary of Matthews International Corporation, a diversified organization focusing on brand solutions and industrial technologies. SGK’s Global Content Creation Studio network creates compelling content and solutions that connect brands and products to consumers through multiple assets including photography, video, and copywriting.

We were recently contracted to build a sophisticated and scalable data management system for one of our clients. We chose to build the solution on AWS to leverage advanced, managed services that help to improve the speed and agility of development.

The data management system served two main functions:

  1. Ingesting a large amount of complex data to facilitate both reporting and product funding decisions for the client’s global marketing and supply chain organizations.
  2. Processing the data through normalization and applying complex algorithms and data transformations. The system goal was to provide information in the relevant context—such as strategic marketing, supply chain, product planning, etc. —to the end consumer through automated data feeds or updates to existing ETL systems.

We were faced with several challenges:

  • Output data that needed to be refreshed at least twice a day to provide fresh datasets to both local and global markets. That constant data refresh posed several challenges, especially around data management and replication across multiple databases.
  • The complexity of reporting business rules that needed to be updated on a constant basis.
  • Data that could not be processed as contiguous blocks of typical time-series data. The measurement of the data was done across seasons (that is, combination of dates), which often resulted with up to three overlapping seasons at any given time.
  • Input data that came from 10+ different data sources. Each data source ranged from 1–20K rows with as many as 85 columns per input source.

These challenges meant that our small Dev team heavily invested time in frequent configuration changes to the system and data integrity verification to make sure that everything was operating properly. Maintaining this system proved to be a daunting task and that’s when we turned to Step Functions—along with other AWS services—to automate our ETL processes.

Solution overview

Our solution included the following AWS services:

  • AWS Step Functions: Before Step Functions was available, we were using multiple Lambda functions for this use case and running into memory limit issues. With Step Functions, we can execute steps in parallel simultaneously, in a cost-efficient manner, without running into memory limitations.
  • AWS Lambda: The Step Functions state machine uses Lambda functions to implement the Task states. Our Lambda functions are implemented in Java 8.
  • Amazon DynamoDB provides us with an easy and flexible way to manage business rules. We specify our rules as Keys. These are key-value pairs stored in a DynamoDB table.
  • Amazon RDS: Our ETL pipelines consume source data from our RDS MySQL database.
  • Amazon Redshift: We use Amazon Redshift for reporting purposes because it integrates with our BI tools. Currently we are using Tableau for reporting which integrates well with Amazon Redshift.
  • Amazon S3: We store our raw input files and intermediate results in S3 buckets.
  • Amazon CloudWatch Events: Our users expect results at a specific time. We use CloudWatch Events to trigger Step Functions on an automated schedule.

Solution architecture

This solution uses a declarative approach to defining business transformation rules that are applied by the underlying Step Functions state machine as data moves from RDS to Amazon Redshift. An S3 bucket is used to store intermediate results. A CloudWatch Event rule triggers the Step Functions state machine on a schedule. The following diagram illustrates our architecture:

Here are more details for the above diagram:

  1. A rule in CloudWatch Events triggers the state machine execution on an automated schedule.
  2. The state machine invokes the first Lambda function.
  3. The Lambda function deletes all existing records in Amazon Redshift. Depending on the dataset, the Lambda function can create a new table in Amazon Redshift to hold the data.
  4. The same Lambda function then retrieves Keys from a DynamoDB table. Keys represent specific marketing campaigns or seasons and map to specific records in RDS.
  5. The state machine executes the second Lambda function using the Keys from DynamoDB.
  6. The second Lambda function retrieves the referenced dataset from RDS. The records retrieved represent the entire dataset needed for a specific marketing campaign.
  7. The second Lambda function executes in parallel for each Key retrieved from DynamoDB and stores the output in CSV format temporarily in S3.
  8. Finally, the Lambda function uploads the data into Amazon Redshift.

To understand the above data processing workflow, take a closer look at the Step Functions state machine for this example.

We walk you through the state machine in more detail in the following sections.

Walkthrough

To get started, you need to:

  • Create a schedule in CloudWatch Events
  • Specify conditions for RDS data extracts
  • Create Amazon Redshift input files
  • Load data into Amazon Redshift

Step 1: Create a schedule in CloudWatch Events
Create rules in CloudWatch Events to trigger the Step Functions state machine on an automated schedule. The following is an example cron expression to automate your schedule:

In this example, the cron expression invokes the Step Functions state machine at 3:00am and 2:00pm (UTC) every day.

Step 2: Specify conditions for RDS data extracts
We use DynamoDB to store Keys that determine which rows of data to extract from our RDS MySQL database. An example Key is MCS2017, which stands for, Marketing Campaign Spring 2017. Each campaign has a specific start and end date and the corresponding dataset is stored in RDS MySQL. A record in RDS contains about 600 columns, and each Key can represent up to 20K records.

A given day can have multiple campaigns with different start and end dates running simultaneously. In the following example DynamoDB item, three campaigns are specified for the given date.

The state machine example shown above uses Keys 31, 32, and 33 in the first ChoiceState and Keys 21 and 22 in the second ChoiceState. These keys represent marketing campaigns for a given day. For example, on Monday, there are only two campaigns requested. The ChoiceState with Keys 21 and 22 is executed. If three campaigns are requested on Tuesday, for example, then ChoiceState with Keys 31, 32, and 33 is executed. MCS2017 can be represented by Key 21 and Key 33 on Monday and Tuesday, respectively. This approach gives us the flexibility to add or remove campaigns dynamically.

Step 3: Create Amazon Redshift input files
When the state machine begins execution, the first Lambda function is invoked as the resource for FirstState, represented in the Step Functions state machine as follows:

"Comment": ” AWS Amazon States Language.", 
  "StartAt": "FirstState",
 
"States": { 
  "FirstState": {
   
"Type": "Task",
   
"Resource": "arn:aws:lambda:xx-xxxx-x:XXXXXXXXXXXX:function:Start",
    "Next": "ChoiceState" 
  } 

As described in the solution architecture, the purpose of this Lambda function is to delete existing data in Amazon Redshift and retrieve keys from DynamoDB. In our use case, we found that deleting existing records was more efficient and less time-consuming than finding the delta and updating existing records. On average, an Amazon Redshift table can contain about 36 million cells, which translates to roughly 65K records. The following is the code snippet for the first Lambda function in Java 8:

public class LambdaFunctionHandler implements RequestHandler<Map<String,Object>,Map<String,String>> {
    Map<String,String> keys= new HashMap<>();
    public Map<String, String> handleRequest(Map<String, Object> input, Context context){
       Properties config = getConfig(); 
       // 1. Cleaning Redshift Database
       new RedshiftDataService(config).cleaningTable(); 
       // 2. Reading data from Dynamodb
       List<String> keyList = new DynamoDBDataService(config).getCurrentKeys();
       for(int i = 0; i < keyList.size(); i++) {
           keys.put(”key" + (i+1), keyList.get(i)); 
       }
       keys.put(”key" + T,String.valueOf(keyList.size()));
       // 3. Returning the key values and the key count from the “for” loop
       return (keys);
}

The following JSON represents ChoiceState.

"ChoiceState": {
   "Type" : "Choice",
   "Choices": [ 
   {

      "Variable": "$.keyT",
     "StringEquals": "3",
     "Next": "CurrentThreeKeys" 
   }, 
   {

     "Variable": "$.keyT",
    "StringEquals": "2",
    "Next": "CurrentTwooKeys" 
   } 
 ], 
 "Default": "DefaultState"
}

The variable $.keyT represents the number of keys retrieved from DynamoDB. This variable determines which of the parallel branches should be executed. At the time of publication, Step Functions does not support dynamic parallel state. Therefore, choices under ChoiceState are manually created and assigned hardcoded StringEquals values. These values represent the number of parallel executions for the second Lambda function.

For example, if $.keyT equals 3, the second Lambda function is executed three times in parallel with keys, $key1, $key2 and $key3 retrieved from DynamoDB. Similarly, if $.keyT equals two, the second Lambda function is executed twice in parallel.  The following JSON represents this parallel execution:

"CurrentThreeKeys": { 
  "Type": "Parallel",
  "Next": "NextState",
  "Branches": [ 
  {

     "StartAt": “key31",
    "States": { 
       “key31": {

          "Type": "Task",
        "InputPath": "$.key1",
        "Resource": "arn:aws:lambda:xx-xxxx-x:XXXXXXXXXXXX:function:Execution",
        "End": true 
       } 
    } 
  }, 
  {

     "StartAt": “key32",
    "States": { 
     “key32": {

        "Type": "Task",
       "InputPath": "$.key2",
         "Resource": "arn:aws:lambda:xx-xxxx-x:XXXXXXXXXXXX:function:Execution",
       "End": true 
      } 
     } 
   }, 
   {

      "StartAt": “key33",
       "States": { 
          “key33": {

                "Type": "Task",
             "InputPath": "$.key3",
             "Resource": "arn:aws:lambda:xx-xxxx-x:XXXXXXXXXXXX:function:Execution",
           "End": true 
       } 
     } 
    } 
  ] 
} 

Step 4: Load data into Amazon Redshift
The second Lambda function in the state machine extracts records from RDS associated with keys retrieved for DynamoDB. It processes the data then loads into an Amazon Redshift table. The following is code snippet for the second Lambda function in Java 8.

public class LambdaFunctionHandler implements RequestHandler<String, String> {
 public static String key = null;

public String handleRequest(String input, Context context) { 
   key=input; 
   //1. Getting basic configurations for the next classes + s3 client Properties
   config = getConfig();

   AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient(); 
   // 2. Export query results from RDS into S3 bucket 
   new RdsDataService(config).exportDataToS3(s3,key); 
   // 3. Import query results from S3 bucket into Redshift 
    new RedshiftDataService(config).importDataFromS3(s3,key); 
   System.out.println(input); 
   return "SUCCESS"; 
 } 
}

After the data is loaded into Amazon Redshift, end users can visualize it using their preferred business intelligence tools.

Lessons learned

  • At the time of publication, the 1.5–GB memory hard limit for Lambda functions was inadequate for processing our complex workload. Step Functions gave us the flexibility to chunk our large datasets and process them in parallel, saving on costs and time.
  • In our previous implementation, we assigned each key a dedicated Lambda function along with CloudWatch rules for schedule automation. This approach proved to be inefficient and quickly became an operational burden. Previously, we processed each key sequentially, with each key adding about five minutes to the overall processing time. For example, processing three keys meant that the total processing time was three times longer. With Step Functions, the entire state machine executes in about five minutes.
  • Using DynamoDB with Step Functions gave us the flexibility to manage keys efficiently. In our previous implementations, keys were hardcoded in Lambda functions, which became difficult to manage due to frequent updates. DynamoDB is a great way to store dynamic data that changes frequently, and it works perfectly with our serverless architectures.

Conclusion

With Step Functions, we were able to fully automate the frequent configuration updates to our dataset resulting in significant cost savings, reduced risk to data errors due to system downtime, and more time for us to focus on new product development rather than support related issues. We hope that you have found the information useful and that it can serve as a jump-start to building your own ETL processes on AWS with managed AWS services.

For more information about how Step Functions makes it easy to coordinate the components of distributed applications and microservices in any workflow, see the use case examples and then build your first state machine in under five minutes in the Step Functions console.

If you have questions or suggestions, please comment below.

On Architecture and the State of the Art

Post Syndicated from Philip Fitzsimons original https://aws.amazon.com/blogs/architecture/on-architecture-and-the-state-of-the-art/

On the AWS Solutions Architecture team we know we’re following in the footsteps of other technical experts who pulled together the best practices of their eras. Around 22 BC the Roman Architect Vitruvius Pollio wrote On architecture (published as The Ten Books on Architecture), which became a seminal work on architectural theory. Vitruvius captured the best practices of his contemporaries and those who went before him (especially the Greek architects).

Closer to our time, in 1910, another technical expert, Henry Harrison Suplee, wrote Gas Turbine: progress in the design and construction of turbines operated by gases of combustion, from which we believe the phrase “state of the art” originates:

It has therefore been thought desirable to gather under one cover the most important papers which have appeared upon the subject of the gas turbine in England, France, Germany, and Switzerland, together with some account of the work in America, and to add to this such information upon actual experimental machines as can be secured.

In the present state of the art this is all that can be done, but it is believed that this will aid materially in the conduct of subsequent work, and place in the hands of the gas-power engineer a collection of material not generally accessible or available in convenient form.  

Source

Both authors wrote books that captured the current knowledge on design principles and best practices (in architecture and engineering) to improve awareness and adoption. Like these authors, we at AWS believe that capturing and sharing best practices leads to better outcomes. This follows a pattern we established internally, in our Principal Engineering community. In 2012 we started an initiative called “Well-Architected” to help share the best practices for architecting in the cloud with our customers.

Every year AWS Solution Architects dedicate hundreds of thousands of hours to helping customers build architectures that are cloud native. Through customer feedback, and real world experience we see what strategies, patterns, and approaches work for you.

“After our well-architected review and subsequent migration to the cloud, we saw the tremendous cost-savings potential of Amazon Web Services. By using the industry-standard service, we can invest the majority of our time and energy into enhancing our solutions. Thanks to (consulting partner) 1Strategy’s deep, technical AWS expertise and flexibility during our migration, we were able to leverage the strengths of AWS quickly.”

Paul Cooley, Chief Technology Officer for Imprev

This year we have again refreshed the AWS Well-Architected Framework, with a particular focus on Operational Excellence. Last year we announced the addition of Operational Excellence as a new pillar to AWS Well-Architected Framework. Having carried out thousands of reviews since then, we reexamined the pillar and are pleased to announce some significant changes. First, the pillar dives more deeply into people and process because this is an area where we see the most opportunities for teams to improve. Second, we’ve pivoted heavily to focusing on whether your team and your workload are ready for runtime operations. Key to this is ensuring that in the early phases of design that you think about how your architecture will be operated. Reflecting on this we realized that Operational Excellence should be the first pillar to support the “Architect for run-time operations” approach.

We’ve also added detail on how Amazon approaches technology architecture, covering topics such as our Principal Engineering Community and two-way doors and mechanisms. We refreshed the other pillars to reflect the evolution of AWS, and the best practices we are seeing in the field. We have also added detail on the review process, in the surprisingly named “The Review Process” section.

As part of refreshing the pillars are have also released a new Operational Excellence Pillar whitepaper, and have updated the whitepapers for all of the other pillars of the Framework. For example we have significantly updated the Reliability Pillar whitepaper to provide guidance on application design for high availability. New sections cover techniques for high availability including and beyond infrastructure implications, and considerations across the application lifecycle. This updated whitepaper also provides examples that show how to achieve availability goals in single and multi-region architectures.

You can find free training and all of the ”state of the art” whitepapers on the AWS Well-Architected homepage:

Philip Fitzsimons, Leader, AWS Well-Architected Team