Tag Archives: lambda

AWS Glue Now Supports Scala Scripts

Post Syndicated from Mehul Shah original https://aws.amazon.com/blogs/big-data/aws-glue-now-supports-scala-scripts/

We are excited to announce AWS Glue support for running ETL (extract, transform, and load) scripts in Scala. Scala lovers can rejoice because they now have one more powerful tool in their arsenal. Scala is the native language for Apache Spark, the underlying engine that AWS Glue offers for performing data transformations.

Beyond its elegant language features, writing Scala scripts for AWS Glue has two main advantages over writing scripts in Python. First, Scala is faster for custom transformations that do a lot of heavy lifting because there is no need to shovel data between Python and Apache Spark’s Scala runtime (that is, the Java virtual machine, or JVM). You can build your own transformations or invoke functions in third-party libraries. Second, it’s simpler to call functions in external Java class libraries from Scala because Scala is designed to be Java-compatible. It compiles to the same bytecode, and its data structures don’t need to be converted.

To illustrate these benefits, we walk through an example that analyzes a recent sample of the GitHub public timeline available from the GitHub archive. This site is an archive of public requests to the GitHub service, recording more than 35 event types ranging from commits and forks to issues and comments.

This post shows how to build an example Scala script that identifies highly negative issues in the timeline. It pulls out issue events in the timeline sample, analyzes their titles using the sentiment prediction functions from the Stanford CoreNLP libraries, and surfaces the most negative issues.

Getting started

Before we start writing scripts, we use AWS Glue crawlers to get a sense of the data—its structure and characteristics. We also set up a development endpoint and attach an Apache Zeppelin notebook, so we can interactively explore the data and author the script.

Crawl the data

The dataset used in this example was downloaded from the GitHub archive website into our sample dataset bucket in Amazon S3, and copied to the following locations:

s3://aws-glue-datasets-<region>/examples/scala-blog/githubarchive/data/

Choose the best folder by replacing <region> with the region that you’re working in, for example, us-east-1. Crawl this folder, and put the results into a database named githubarchive in the AWS Glue Data Catalog, as described in the AWS Glue Developer Guide. This folder contains 12 hours of the timeline from January 22, 2017, and is organized hierarchically (that is, partitioned) by year, month, and day.

When finished, use the AWS Glue console to navigate to the table named data in the githubarchive database. Notice that this data has eight top-level columns, which are common to each event type, and three partition columns that correspond to year, month, and day.

Choose the payload column, and you will notice that it has a complex schema—one that reflects the union of the payloads of event types that appear in the crawled data. Also note that the schema that crawlers generate is a subset of the true schema because they sample only a subset of the data.

Set up the library, development endpoint, and notebook

Next, you need to download and set up the libraries that estimate the sentiment in a snippet of text. The Stanford CoreNLP libraries contain a number of human language processing tools, including sentiment prediction.

Download the Stanford CoreNLP libraries. Unzip the .zip file, and you’ll see a directory full of jar files. For this example, the following jars are required:

  • stanford-corenlp-3.8.0.jar
  • stanford-corenlp-3.8.0-models.jar
  • ejml-0.23.jar

Upload these files to an Amazon S3 path that is accessible to AWS Glue so that it can load these libraries when needed. For this example, they are in s3://glue-sample-other/corenlp/.

Development endpoints are static Spark-based environments that can serve as the backend for data exploration. You can attach notebooks to these endpoints to interactively send commands and explore and analyze your data. These endpoints have the same configuration as that of AWS Glue’s job execution system. So, commands and scripts that work there also work the same when registered and run as jobs in AWS Glue.

To set up an endpoint and a Zeppelin notebook to work with that endpoint, follow the instructions in the AWS Glue Developer Guide. When you are creating an endpoint, be sure to specify the locations of the previously mentioned jars in the Dependent jars path as a comma-separated list. Otherwise, the libraries will not be loaded.

After you set up the notebook server, go to the Zeppelin notebook by choosing Dev Endpoints in the left navigation pane on the AWS Glue console. Choose the endpoint that you created. Next, choose the Notebook Server URL, which takes you to the Zeppelin server. Log in using the notebook user name and password that you specified when creating the notebook. Finally, create a new note to try out this example.

Each notebook is a collection of paragraphs, and each paragraph contains a sequence of commands and the output for that command. Moreover, each notebook includes a number of interpreters. If you set up the Zeppelin server using the console, the (Python-based) pyspark and (Scala-based) spark interpreters are already connected to your new development endpoint, with pyspark as the default. Therefore, throughout this example, you need to prepend %spark at the top of your paragraphs. In this example, we omit these for brevity.

Working with the data

In this section, we use AWS Glue extensions to Spark to work with the dataset. We look at the actual schema of the data and filter out the interesting event types for our analysis.

Start with some boilerplate code to import libraries that you need:

%spark

import com.amazonaws.services.glue.DynamicRecord
import com.amazonaws.services.glue.GlueContext
import com.amazonaws.services.glue.util.GlueArgParser
import com.amazonaws.services.glue.util.Job
import com.amazonaws.services.glue.util.JsonOptions
import com.amazonaws.services.glue.types._
import org.apache.spark.SparkContext

Then, create the Spark and AWS Glue contexts needed for working with the data:

@transient val spark: SparkContext = SparkContext.getOrCreate()
val glueContext: GlueContext = new GlueContext(spark)

You need the transient decorator on the SparkContext when working in Zeppelin; otherwise, you will run into a serialization error when executing commands.

Dynamic frames

This section shows how to create a dynamic frame that contains the GitHub records in the table that you crawled earlier. A dynamic frame is the basic data structure in AWS Glue scripts. It is like an Apache Spark data frame, except that it is designed and optimized for data cleaning and transformation workloads. A dynamic frame is well-suited for representing semi-structured datasets like the GitHub timeline.

A dynamic frame is a collection of dynamic records. In Spark lingo, it is an RDD (resilient distributed dataset) of DynamicRecords. A dynamic record is a self-describing record. Each record encodes its columns and types, so every record can have a schema that is unique from all others in the dynamic frame. This is convenient and often more efficient for datasets like the GitHub timeline, where payloads can vary drastically from one event type to another.

The following creates a dynamic frame, github_events, from your table:

val github_events = glueContext
                    .getCatalogSource(database = "githubarchive", tableName = "data")
                    .getDynamicFrame()

The getCatalogSource() method returns a DataSource, which represents a particular table in the Data Catalog. The getDynamicFrame() method returns a dynamic frame from the source.

Recall that the crawler created a schema from only a sample of the data. You can scan the entire dataset, count the rows, and print the complete schema as follows:

github_events.count
github_events.printSchema()

The result looks like the following:

The data has 414,826 records. As before, notice that there are eight top-level columns, and three partition columns. If you scroll down, you’ll also notice that the payload is the most complex column.

Run functions and filter records

This section describes how you can create your own functions and invoke them seamlessly to filter records. Unlike filtering with Python lambdas, Scala scripts do not need to convert records from one language representation to another, thereby reducing overhead and running much faster.

Let’s create a function that picks only the IssuesEvents from the GitHub timeline. These events are generated whenever someone posts an issue for a particular repository. Each GitHub event record has a field, “type”, that indicates the kind of event it is. The issueFilter() function returns true for records that are IssuesEvents.

def issueFilter(rec: DynamicRecord): Boolean = { 
    rec.getField("type").exists(_ == "IssuesEvent") 
}

Note that the getField() method returns an Option[Any] type, so you first need to check that it exists before checking the type.

You pass this function to the filter transformation, which applies the function on each record and returns a dynamic frame of those records that pass.

val issue_events =  github_events.filter(issueFilter)

Now, let’s look at the size and schema of issue_events.

issue_events.count
issue_events.printSchema()

It’s much smaller (14,063 records), and the payload schema is less complex, reflecting only the schema for issues. Keep a few essential columns for your analysis, and drop the rest using the ApplyMapping() transform:

val issue_titles = issue_events.applyMapping(Seq(("id", "string", "id", "string"),
                                                 ("actor.login", "string", "actor", "string"), 
                                                 ("repo.name", "string", "repo", "string"),
                                                 ("payload.action", "string", "action", "string"),
                                                 ("payload.issue.title", "string", "title", "string")))
issue_titles.show()

The ApplyMapping() transform is quite handy for renaming columns, casting types, and restructuring records. The preceding code snippet tells the transform to select the fields (or columns) that are enumerated in the left half of the tuples and map them to the fields and types in the right half.

Estimating sentiment using Stanford CoreNLP

To focus on the most pressing issues, you might want to isolate the records with the most negative sentiments. The Stanford CoreNLP libraries are Java-based and offer sentiment-prediction functions. Accessing these functions through Python is possible, but quite cumbersome. It requires creating Python surrogate classes and objects for those found on the Java side. Instead, with Scala support, you can use those classes and objects directly and invoke their methods. Let’s see how.

First, import the libraries needed for the analysis:

import java.util.Properties
import edu.stanford.nlp.ling.CoreAnnotations
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations
import edu.stanford.nlp.pipeline.{Annotation, StanfordCoreNLP}
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations
import scala.collection.convert.wrapAll._

The Stanford CoreNLP libraries have a main driver that orchestrates all of their analysis. The driver setup is heavyweight, setting up threads and data structures that are shared across analyses. Apache Spark runs on a cluster with a main driver process and a collection of backend executor processes that do most of the heavy sifting of the data.

The Stanford CoreNLP shared objects are not serializable, so they cannot be distributed easily across a cluster. Instead, you need to initialize them once for every backend executor process that might need them. Here is how to accomplish that:

val props = new Properties()
props.setProperty("annotators", "tokenize, ssplit, parse, sentiment")
props.setProperty("parse.maxlen", "70")

object myNLP {
    lazy val coreNLP = new StanfordCoreNLP(props)
}

The properties tell the libraries which annotators to execute and how many words to process. The preceding code creates an object, myNLP, with a field coreNLP that is lazily evaluated. This field is initialized only when it is needed, and only once. So, when the backend executors start processing the records, each executor initializes the driver for the Stanford CoreNLP libraries only one time.

Next is a function that estimates the sentiment of a text string. It first calls Stanford CoreNLP to annotate the text. Then, it pulls out the sentences and takes the average sentiment across all the sentences. The sentiment is a double, from 0.0 as the most negative to 4.0 as the most positive.

def estimatedSentiment(text: String): Double = {
    if ((text == null) || (!text.nonEmpty)) { return Double.NaN }
    val annotations = myNLP.coreNLP.process(text)
    val sentences = annotations.get(classOf[CoreAnnotations.SentencesAnnotation])
    sentences.foldLeft(0.0)( (csum, x) => { 
        csum + RNNCoreAnnotations.getPredictedClass(x.get(classOf[SentimentCoreAnnotations.SentimentAnnotatedTree])) 
    }) / sentences.length
}

Now, let’s estimate the sentiment of the issue titles and add that computed field as part of the records. You can accomplish this with the map() method on dynamic frames:

val issue_sentiments = issue_titles.map((rec: DynamicRecord) => { 
    val mbody = rec.getField("title")
    mbody match {
        case Some(mval: String) => { 
            rec.addField("sentiment", ScalarNode(estimatedSentiment(mval)))
            rec }
        case _ => rec
    }
})

The map() method applies the user-provided function on every record. The function takes a DynamicRecord as an argument and returns a DynamicRecord. The code above computes the sentiment, adds it in a top-level field, sentiment, to the record, and returns the record.

Count the records with sentiment and show the schema. This takes a few minutes because Spark must initialize the library and run the sentiment analysis, which can be involved.

issue_sentiments.count
issue_sentiments.printSchema()

Notice that all records were processed (14,063), and the sentiment value was added to the schema.

Finally, let’s pick out the titles that have the lowest sentiment (less than 1.5). Count them and print out a sample to see what some of the titles look like.

val pressing_issues = issue_sentiments.filter(_.getField("sentiment").exists(_.asInstanceOf[Double] < 1.5))
pressing_issues.count
pressing_issues.show(10)

Next, write them all to a file so that you can handle them later. (You’ll need to replace the output path with your own.)

glueContext.getSinkWithFormat(connectionType = "s3", 
                              options = JsonOptions("""{"path": "s3://<bucket>/out/path/"}"""), 
                              format = "json")
            .writeDynamicFrame(pressing_issues)

Take a look in the output path, and you can see the output files.

Putting it all together

Now, let’s create a job from the preceding interactive session. The following script combines all the commands from earlier. It processes the GitHub archive files and writes out the highly negative issues:

import com.amazonaws.services.glue.DynamicRecord
import com.amazonaws.services.glue.GlueContext
import com.amazonaws.services.glue.util.GlueArgParser
import com.amazonaws.services.glue.util.Job
import com.amazonaws.services.glue.util.JsonOptions
import com.amazonaws.services.glue.types._
import org.apache.spark.SparkContext
import java.util.Properties
import edu.stanford.nlp.ling.CoreAnnotations
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations
import edu.stanford.nlp.pipeline.{Annotation, StanfordCoreNLP}
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations
import scala.collection.convert.wrapAll._

object GlueApp {

    object myNLP {
        val props = new Properties()
        props.setProperty("annotators", "tokenize, ssplit, parse, sentiment")
        props.setProperty("parse.maxlen", "70")

        lazy val coreNLP = new StanfordCoreNLP(props)
    }

    def estimatedSentiment(text: String): Double = {
        if ((text == null) || (!text.nonEmpty)) { return Double.NaN }
        val annotations = myNLP.coreNLP.process(text)
        val sentences = annotations.get(classOf[CoreAnnotations.SentencesAnnotation])
        sentences.foldLeft(0.0)( (csum, x) => { 
            csum + RNNCoreAnnotations.getPredictedClass(x.get(classOf[SentimentCoreAnnotations.SentimentAnnotatedTree])) 
        }) / sentences.length
    }

    def main(sysArgs: Array[String]) {
        val spark: SparkContext = SparkContext.getOrCreate()
        val glueContext: GlueContext = new GlueContext(spark)

        val dbname = "githubarchive"
        val tblname = "data"
        val outpath = "s3://<bucket>/out/path/"

        val github_events = glueContext
                            .getCatalogSource(database = dbname, tableName = tblname)
                            .getDynamicFrame()

        val issue_events =  github_events.filter((rec: DynamicRecord) => {
            rec.getField("type").exists(_ == "IssuesEvent")
        })

        val issue_titles = issue_events.applyMapping(Seq(("id", "string", "id", "string"),
                                                         ("actor.login", "string", "actor", "string"), 
                                                         ("repo.name", "string", "repo", "string"),
                                                         ("payload.action", "string", "action", "string"),
                                                         ("payload.issue.title", "string", "title", "string")))

        val issue_sentiments = issue_titles.map((rec: DynamicRecord) => { 
            val mbody = rec.getField("title")
            mbody match {
                case Some(mval: String) => { 
                    rec.addField("sentiment", ScalarNode(estimatedSentiment(mval)))
                    rec }
                case _ => rec
            }
        })

        val pressing_issues = issue_sentiments.filter(_.getField("sentiment").exists(_.asInstanceOf[Double] < 1.5))

        glueContext.getSinkWithFormat(connectionType = "s3", 
                              options = JsonOptions(s"""{"path": "$outpath"}"""), 
                              format = "json")
                    .writeDynamicFrame(pressing_issues)
    }
}

Notice that the script is enclosed in a top-level object called GlueApp, which serves as the script’s entry point for the job. (You’ll need to replace the output path with your own.) Upload the script to an Amazon S3 location so that AWS Glue can load it when needed.

To create the job, open the AWS Glue console. Choose Jobs in the left navigation pane, and then choose Add job. Create a name for the job, and specify a role with permissions to access the data. Choose An existing script that you provide, and choose Scala as the language.

For the Scala class name, type GlueApp to indicate the script’s entry point. Specify the Amazon S3 location of the script.

Choose Script libraries and job parameters. In the Dependent jars path field, enter the Amazon S3 locations of the Stanford CoreNLP libraries from earlier as a comma-separated list (without spaces). Then choose Next.

No connections are needed for this job, so choose Next again. Review the job properties, and choose Finish. Finally, choose Run job to execute the job.

You can simply edit the script’s input table and output path to run this job on whatever GitHub timeline datasets that you might have.

Conclusion

In this post, we showed how to write AWS Glue ETL scripts in Scala via notebooks and how to run them as jobs. Scala has the advantage that it is the native language for the Spark runtime. With Scala, it is easier to call Scala or Java functions and third-party libraries for analyses. Moreover, data processing is faster in Scala because there’s no need to convert records from one language runtime to another.

You can find more example of Scala scripts in our GitHub examples repository: https://github.com/awslabs/aws-glue-samples. We encourage you to experiment with Scala scripts and let us know about any interesting ETL flows that you want to share.

Happy Glue-ing!

 


Additional Reading

If you found this post useful, be sure to check out Simplify Querying Nested JSON with the AWS Glue Relationalize Transform and Genomic Analysis with Hail on Amazon EMR and Amazon Athena.

 


About the Authors

Mehul Shah is a senior software manager for AWS Glue. His passion is leveraging the cloud to build smarter, more efficient, and easier to use data systems. He has three girls, and, therefore, he has no spare time.

 

 

 

Ben Sowell is a software development engineer at AWS Glue.

 

 

 

 
Vinay Vivili is a software development engineer for AWS Glue.

 

 

 

Continuous Deployment to Kubernetes using AWS CodePipeline, AWS CodeCommit, AWS CodeBuild, Amazon ECR and AWS Lambda

Post Syndicated from Chris Barclay original https://aws.amazon.com/blogs/devops/continuous-deployment-to-kubernetes-using-aws-codepipeline-aws-codecommit-aws-codebuild-amazon-ecr-and-aws-lambda/

Thank you to my colleague Omar Lari for this blog on how to create a continuous deployment pipeline for Kubernetes!


You can use Kubernetes and AWS together to create a fully managed, continuous deployment pipeline for container based applications. This approach takes advantage of Kubernetes’ open-source system to manage your containerized applications, and the AWS developer tools to manage your source code, builds, and pipelines.

This post describes how to create a continuous deployment architecture for containerized applications. It uses AWS CodeCommit, AWS CodePipeline, AWS CodeBuild, and AWS Lambda to deploy containerized applications into a Kubernetes cluster. In this environment, developers can remain focused on developing code without worrying about how it will be deployed, and development managers can be satisfied that the latest changes are always deployed.

What is Continuous Deployment?

There are many articles, posts and even conferences dedicated to the practice of continuous deployment. For the purposes of this post, I will summarize continuous delivery into the following points:

  • Code is more frequently released into production environments
  • More frequent releases allow for smaller, incremental changes reducing risk and enabling simplified roll backs if needed
  • Deployment is automated and requires minimal user intervention

For a more information, see “Practicing Continuous Integration and Continuous Delivery on AWS”.

How can you use continuous deployment with AWS and Kubernetes?

You can leverage AWS services that support continuous deployment to automatically take your code from a source code repository to production in a Kubernetes cluster with minimal user intervention. To do this, you can create a pipeline that will build and deploy committed code changes as long as they meet the requirements of each stage of the pipeline.

To create the pipeline, you will use the following services:

  • AWS CodePipeline. AWS CodePipeline is a continuous delivery service that models, visualizes, and automates the steps required to release software. You define stages in a pipeline to retrieve code from a source code repository, build that source code into a releasable artifact, test the artifact, and deploy it to production. Only code that successfully passes through all these stages will be deployed. In addition, you can optionally add other requirements to your pipeline, such as manual approvals, to help ensure that only approved changes are deployed to production.
  • AWS CodeCommit. AWS CodeCommit is a secure, scalable, and managed source control service that hosts private Git repositories. You can privately store and manage assets such as your source code in the cloud and configure your pipeline to automatically retrieve and process changes committed to your repository.
  • AWS CodeBuild. AWS CodeBuild is a fully managed build service that compiles source code, runs tests, and produces artifacts that are ready to deploy. You can use AWS CodeBuild to both build your artifacts, and to test those artifacts before they are deployed.
  • AWS Lambda. AWS Lambda is a compute service that lets you run code without provisioning or managing servers. You can invoke a Lambda function in your pipeline to prepare the built and tested artifact for deployment by Kubernetes to the Kubernetes cluster.
  • Kubernetes. Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications. It provides a platform for running, deploying, and managing containers at scale.

An Example of Continuous Deployment to Kubernetes:

The following example illustrates leveraging AWS developer tools to continuously deploy to a Kubernetes cluster:

  1. Developers commit code to an AWS CodeCommit repository and create pull requests to review proposed changes to the production code. When the pull request is merged into the master branch in the AWS CodeCommit repository, AWS CodePipeline automatically detects the changes to the branch and starts processing the code changes through the pipeline.
  2. AWS CodeBuild packages the code changes as well as any dependencies and builds a Docker image. Optionally, another pipeline stage tests the code and the package, also using AWS CodeBuild.
  3. The Docker image is pushed to Amazon ECR after a successful build and/or test stage.
  4. AWS CodePipeline invokes an AWS Lambda function that includes the Kubernetes Python client as part of the function’s resources. The Lambda function performs a string replacement on the tag used for the Docker image in the Kubernetes deployment file to match the Docker image tag applied in the build, one that matches the image in Amazon ECR.
  5. After the deployment manifest update is completed, AWS Lambda invokes the Kubernetes API to update the image in the Kubernetes application deployment.
  6. Kubernetes performs a rolling update of the pods in the application deployment to match the docker image specified in Amazon ECR.
    The pipeline is now live and responds to changes to the master branch of the CodeCommit repository. This pipeline is also fully extensible, you can add steps for performing testing or adding a step to deploy into a staging environment before the code ships into the production cluster.

An example pipeline in AWS CodePipeline that supports this architecture can be seen below:

Conclusion

We are excited to see how you leverage this pipeline to help ease your developer experience as you develop applications in Kubernetes.

You’ll find an AWS CloudFormation template with everything necessary to spin up your own continuous deployment pipeline at the CodeSuite – Continuous Deployment Reference Architecture for Kubernetes repo on GitHub. The repository details exactly how the pipeline is provisioned and how you can use it to deploy your own applications. If you have any questions, feedback, or suggestions, please let us know!

Combine Transactional and Analytical Data Using Amazon Aurora and Amazon Redshift

Post Syndicated from Re Alvarez-Parmar original https://aws.amazon.com/blogs/big-data/combine-transactional-and-analytical-data-using-amazon-aurora-and-amazon-redshift/

A few months ago, we published a blog post about capturing data changes in an Amazon Aurora database and sending it to Amazon Athena and Amazon QuickSight for fast analysis and visualization. In this post, I want to demonstrate how easy it can be to take the data in Aurora and combine it with data in Amazon Redshift using Amazon Redshift Spectrum.

With Amazon Redshift, you can build petabyte-scale data warehouses that unify data from a variety of internal and external sources. Because Amazon Redshift is optimized for complex queries (often involving multiple joins) across large tables, it can handle large volumes of retail, inventory, and financial data without breaking a sweat.

In this post, we describe how to combine data in Aurora in Amazon Redshift. Here’s an overview of the solution:

  • Use AWS Lambda functions with Amazon Aurora to capture data changes in a table.
  • Save data in an Amazon S3
  • Query data using Amazon Redshift Spectrum.

We use the following services:

Serverless architecture for capturing and analyzing Aurora data changes

Consider a scenario in which an e-commerce web application uses Amazon Aurora for a transactional database layer. The company has a sales table that captures every single sale, along with a few corresponding data items. This information is stored as immutable data in a table. Business users want to monitor the sales data and then analyze and visualize it.

In this example, you take the changes in data in an Aurora database table and save it in Amazon S3. After the data is captured in Amazon S3, you combine it with data in your existing Amazon Redshift cluster for analysis.

By the end of this post, you will understand how to capture data events in an Aurora table and push them out to other AWS services using AWS Lambda.

The following diagram shows the flow of data as it occurs in this tutorial:

The starting point in this architecture is a database insert operation in Amazon Aurora. When the insert statement is executed, a custom trigger calls a Lambda function and forwards the inserted data. Lambda writes the data that it received from Amazon Aurora to a Kinesis data delivery stream. Kinesis Data Firehose writes the data to an Amazon S3 bucket. Once the data is in an Amazon S3 bucket, it is queried in place using Amazon Redshift Spectrum.

Creating an Aurora database

First, create a database by following these steps in the Amazon RDS console:

  1. Sign in to the AWS Management Console, and open the Amazon RDS console.
  2. Choose Launch a DB instance, and choose Next.
  3. For Engine, choose Amazon Aurora.
  4. Choose a DB instance class. This example uses a small, since this is not a production database.
  5. In Multi-AZ deployment, choose No.
  6. Configure DB instance identifier, Master username, and Master password.
  7. Launch the DB instance.

After you create the database, use MySQL Workbench to connect to the database using the CNAME from the console. For information about connecting to an Aurora database, see Connecting to an Amazon Aurora DB Cluster.

The following screenshot shows the MySQL Workbench configuration:

Next, create a table in the database by running the following SQL statement:

Create Table
CREATE TABLE Sales (
InvoiceID int NOT NULL AUTO_INCREMENT,
ItemID int NOT NULL,
Category varchar(255),
Price double(10,2), 
Quantity int not NULL,
OrderDate timestamp,
DestinationState varchar(2),
ShippingType varchar(255),
Referral varchar(255),
PRIMARY KEY (InvoiceID)
)

You can now populate the table with some sample data. To generate sample data in your table, copy and run the following script. Ensure that the highlighted (bold) variables are replaced with appropriate values.

#!/usr/bin/python
import MySQLdb
import random
import datetime

db = MySQLdb.connect(host="AURORA_CNAME",
                     user="DBUSER",
                     passwd="DBPASSWORD",
                     db="DB")

states = ("AL","AK","AZ","AR","CA","CO","CT","DE","FL","GA","HI","ID","IL","IN",
"IA","KS","KY","LA","ME","MD","MA","MI","MN","MS","MO","MT","NE","NV","NH","NJ",
"NM","NY","NC","ND","OH","OK","OR","PA","RI","SC","SD","TN","TX","UT","VT","VA",
"WA","WV","WI","WY")

shipping_types = ("Free", "3-Day", "2-Day")

product_categories = ("Garden", "Kitchen", "Office", "Household")
referrals = ("Other", "Friend/Colleague", "Repeat Customer", "Online Ad")

for i in range(0,10):
    item_id = random.randint(1,100)
    state = states[random.randint(0,len(states)-1)]
    shipping_type = shipping_types[random.randint(0,len(shipping_types)-1)]
    product_category = product_categories[random.randint(0,len(product_categories)-1)]
    quantity = random.randint(1,4)
    referral = referrals[random.randint(0,len(referrals)-1)]
    price = random.randint(1,100)
    order_date = datetime.date(2016,random.randint(1,12),random.randint(1,30)).isoformat()

    data_order = (item_id, product_category, price, quantity, order_date, state,
    shipping_type, referral)

    add_order = ("INSERT INTO Sales "
                   "(ItemID, Category, Price, Quantity, OrderDate, DestinationState, \
                   ShippingType, Referral) "
                   "VALUES (%s, %s, %s, %s, %s, %s, %s, %s)")

    cursor = db.cursor()
    cursor.execute(add_order, data_order)

    db.commit()

cursor.close()
db.close() 

The following screenshot shows how the table appears with the sample data:

Sending data from Amazon Aurora to Amazon S3

There are two methods available to send data from Amazon Aurora to Amazon S3:

  • Using a Lambda function
  • Using SELECT INTO OUTFILE S3

To demonstrate the ease of setting up integration between multiple AWS services, we use a Lambda function to send data to Amazon S3 using Amazon Kinesis Data Firehose.

Alternatively, you can use a SELECT INTO OUTFILE S3 statement to query data from an Amazon Aurora DB cluster and save it directly in text files that are stored in an Amazon S3 bucket. However, with this method, there is a delay between the time that the database transaction occurs and the time that the data is exported to Amazon S3 because the default file size threshold is 6 GB.

Creating a Kinesis data delivery stream

The next step is to create a Kinesis data delivery stream, since it’s a dependency of the Lambda function.

To create a delivery stream:

  1. Open the Kinesis Data Firehose console
  2. Choose Create delivery stream.
  3. For Delivery stream name, type AuroraChangesToS3.
  4. For Source, choose Direct PUT.
  5. For Record transformation, choose Disabled.
  6. For Destination, choose Amazon S3.
  7. In the S3 bucket drop-down list, choose an existing bucket, or create a new one.
  8. Enter a prefix if needed, and choose Next.
  9. For Data compression, choose GZIP.
  10. In IAM role, choose either an existing role that has access to write to Amazon S3, or choose to generate one automatically. Choose Next.
  11. Review all the details on the screen, and choose Create delivery stream when you’re finished.

 

Creating a Lambda function

Now you can create a Lambda function that is called every time there is a change that needs to be tracked in the database table. This Lambda function passes the data to the Kinesis data delivery stream that you created earlier.

To create the Lambda function:

  1. Open the AWS Lambda console.
  2. Ensure that you are in the AWS Region where your Amazon Aurora database is located.
  3. If you have no Lambda functions yet, choose Get started now. Otherwise, choose Create function.
  4. Choose Author from scratch.
  5. Give your function a name and select Python 3.6 for Runtime
  6. Choose and existing or create a new Role, the role would need to have access to call firehose:PutRecord
  7. Choose Next on the trigger selection screen.
  8. Paste the following code in the code window. Change the stream_name variable to the Kinesis data delivery stream that you created in the previous step.
  9. Choose File -> Save in the code editor and then choose Save.
import boto3
import json

firehose = boto3.client('firehose')
stream_name = ‘AuroraChangesToS3’


def Kinesis_publish_message(event, context):
    
    firehose_data = (("%s,%s,%s,%s,%s,%s,%s,%s\n") %(event['ItemID'], 
    event['Category'], event['Price'], event['Quantity'],
    event['OrderDate'], event['DestinationState'], event['ShippingType'], 
    event['Referral']))
    
    firehose_data = {'Data': str(firehose_data)}
    print(firehose_data)
    
    firehose.put_record(DeliveryStreamName=stream_name,
    Record=firehose_data)

Note the Amazon Resource Name (ARN) of this Lambda function.

Giving Aurora permissions to invoke a Lambda function

To give Amazon Aurora permissions to invoke a Lambda function, you must attach an IAM role with appropriate permissions to the cluster. For more information, see Invoking a Lambda Function from an Amazon Aurora DB Cluster.

Once you are finished, the Amazon Aurora database has access to invoke a Lambda function.

Creating a stored procedure and a trigger in Amazon Aurora

Now, go back to MySQL Workbench, and run the following command to create a new stored procedure. When this stored procedure is called, it invokes the Lambda function you created. Change the ARN in the following code to your Lambda function’s ARN.

DROP PROCEDURE IF EXISTS CDC_TO_FIREHOSE;
DELIMITER ;;
CREATE PROCEDURE CDC_TO_FIREHOSE (IN ItemID VARCHAR(255), 
									IN Category varchar(255), 
									IN Price double(10,2),
                                    IN Quantity int(11),
                                    IN OrderDate timestamp,
                                    IN DestinationState varchar(2),
                                    IN ShippingType varchar(255),
                                    IN Referral  varchar(255)) LANGUAGE SQL 
BEGIN
  CALL mysql.lambda_async('arn:aws:lambda:us-east-1:XXXXXXXXXXXXX:function:CDCFromAuroraToKinesis', 
     CONCAT('{ "ItemID" : "', ItemID, 
            '", "Category" : "', Category,
            '", "Price" : "', Price,
            '", "Quantity" : "', Quantity, 
            '", "OrderDate" : "', OrderDate, 
            '", "DestinationState" : "', DestinationState, 
            '", "ShippingType" : "', ShippingType, 
            '", "Referral" : "', Referral, '"}')
     );
END
;;
DELIMITER ;

Create a trigger TR_Sales_CDC on the Sales table. When a new record is inserted, this trigger calls the CDC_TO_FIREHOSE stored procedure.

DROP TRIGGER IF EXISTS TR_Sales_CDC;
 
DELIMITER ;;
CREATE TRIGGER TR_Sales_CDC
  AFTER INSERT ON Sales
  FOR EACH ROW
BEGIN
  SELECT  NEW.ItemID , NEW.Category, New.Price, New.Quantity, New.OrderDate
  , New.DestinationState, New.ShippingType, New.Referral
  INTO @ItemID , @Category, @Price, @Quantity, @OrderDate
  , @DestinationState, @ShippingType, @Referral;
  CALL  CDC_TO_FIREHOSE(@ItemID , @Category, @Price, @Quantity, @OrderDate
  , @DestinationState, @ShippingType, @Referral);
END
;;
DELIMITER ;

If a new row is inserted in the Sales table, the Lambda function that is mentioned in the stored procedure is invoked.

Verify that data is being sent from the Lambda function to Kinesis Data Firehose to Amazon S3 successfully. You might have to insert a few records, depending on the size of your data, before new records appear in Amazon S3. This is due to Kinesis Data Firehose buffering. To learn more about Kinesis Data Firehose buffering, see the “Amazon S3” section in Amazon Kinesis Data Firehose Data Delivery.

Every time a new record is inserted in the sales table, a stored procedure is called, and it updates data in Amazon S3.

Querying data in Amazon Redshift

In this section, you use the data you produced from Amazon Aurora and consume it as-is in Amazon Redshift. In order to allow you to process your data as-is, where it is, while taking advantage of the power and flexibility of Amazon Redshift, you use Amazon Redshift Spectrum. You can use Redshift Spectrum to run complex queries on data stored in Amazon S3, with no need for loading or other data prep.

Just create a data source and issue your queries to your Amazon Redshift cluster as usual. Behind the scenes, Redshift Spectrum scales to thousands of instances on a per-query basis, ensuring that you get fast, consistent performance even as your dataset grows to beyond an exabyte! Being able to query data that is stored in Amazon S3 means that you can scale your compute and your storage independently. You have the full power of the Amazon Redshift query model and all the reporting and business intelligence tools at your disposal. Your queries can reference any combination of data stored in Amazon Redshift tables and in Amazon S3.

Redshift Spectrum supports open, common data types, including CSV/TSV, Apache Parquet, SequenceFile, and RCFile. Files can be compressed using gzip or Snappy, with other data types and compression methods in the works.

First, create an Amazon Redshift cluster. Follow the steps in Launch a Sample Amazon Redshift Cluster.

Next, create an IAM role that has access to Amazon S3 and Athena. By default, Amazon Redshift Spectrum uses the Amazon Athena data catalog. Your cluster needs authorization to access your external data catalog in AWS Glue or Athena and your data files in Amazon S3.

In the demo setup, I attached AmazonS3FullAccess and AmazonAthenaFullAccess. In a production environment, the IAM roles should follow the standard security of granting least privilege. For more information, see IAM Policies for Amazon Redshift Spectrum.

Attach the newly created role to the Amazon Redshift cluster. For more information, see Associate the IAM Role with Your Cluster.

Next, connect to the Amazon Redshift cluster, and create an external schema and database:

create external schema if not exists spectrum_schema
from data catalog 
database 'spectrum_db' 
region 'us-east-1'
IAM_ROLE 'arn:aws:iam::XXXXXXXXXXXX:role/RedshiftSpectrumRole'
create external database if not exists;

Don’t forget to replace the IAM role in the statement.

Then create an external table within the database:

 CREATE EXTERNAL TABLE IF NOT EXISTS spectrum_schema.ecommerce_sales(
  ItemID int,
  Category varchar,
  Price DOUBLE PRECISION,
  Quantity int,
  OrderDate TIMESTAMP,
  DestinationState varchar,
  ShippingType varchar,
  Referral varchar)
ROW FORMAT DELIMITED
      FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
LOCATION 's3://{BUCKET_NAME}/CDC/'

Query the table, and it should contain data. This is a fact table.

select top 10 * from spectrum_schema.ecommerce_sales

 

Next, create a dimension table. For this example, we create a date/time dimension table. Create the table:

CREATE TABLE date_dimension (
  d_datekey           integer       not null sortkey,
  d_dayofmonth        integer       not null,
  d_monthnum          integer       not null,
  d_dayofweek                varchar(10)   not null,
  d_prettydate        date       not null,
  d_quarter           integer       not null,
  d_half              integer       not null,
  d_year              integer       not null,
  d_season            varchar(10)   not null,
  d_fiscalyear        integer       not null)
diststyle all;

Populate the table with data:

copy date_dimension from 's3://reparmar-lab/2016dates' 
iam_role 'arn:aws:iam::XXXXXXXXXXXX:role/redshiftspectrum'
DELIMITER ','
dateformat 'auto';

The date dimension table should look like the following:

Querying data in local and external tables using Amazon Redshift

Now that you have the fact and dimension table populated with data, you can combine the two and run analysis. For example, if you want to query the total sales amount by weekday, you can run the following:

select sum(quantity*price) as total_sales, date_dimension.d_season
from spectrum_schema.ecommerce_sales 
join date_dimension on spectrum_schema.ecommerce_sales.orderdate = date_dimension.d_prettydate 
group by date_dimension.d_season

You get the following results:

Similarly, you can replace d_season with d_dayofweek to get sales figures by weekday:

With Amazon Redshift Spectrum, you pay only for the queries you run against the data that you actually scan. We encourage you to use file partitioning, columnar data formats, and data compression to significantly minimize the amount of data scanned in Amazon S3. This is important for data warehousing because it dramatically improves query performance and reduces cost.

Partitioning your data in Amazon S3 by date, time, or any other custom keys enables Amazon Redshift Spectrum to dynamically prune nonrelevant partitions to minimize the amount of data processed. If you store data in a columnar format, such as Parquet, Amazon Redshift Spectrum scans only the columns needed by your query, rather than processing entire rows. Similarly, if you compress your data using one of the supported compression algorithms in Amazon Redshift Spectrum, less data is scanned.

Analyzing and visualizing Amazon Redshift data in Amazon QuickSight

Modify the Amazon Redshift security group to allow an Amazon QuickSight connection. For more information, see Authorizing Connections from Amazon QuickSight to Amazon Redshift Clusters.

After modifying the Amazon Redshift security group, go to Amazon QuickSight. Create a new analysis, and choose Amazon Redshift as the data source.

Enter the database connection details, validate the connection, and create the data source.

Choose the schema to be analyzed. In this case, choose spectrum_schema, and then choose the ecommerce_sales table.

Next, we add a custom field for Total Sales = Price*Quantity. In the drop-down list for the ecommerce_sales table, choose Edit analysis data sets.

On the next screen, choose Edit.

In the data prep screen, choose New Field. Add a new calculated field Total Sales $, which is the product of the Price*Quantity fields. Then choose Create. Save and visualize it.

Next, to visualize total sales figures by month, create a graph with Total Sales on the x-axis and Order Data formatted as month on the y-axis.

After you’ve finished, you can use Amazon QuickSight to add different columns from your Amazon Redshift tables and perform different types of visualizations. You can build operational dashboards that continuously monitor your transactional and analytical data. You can publish these dashboards and share them with others.

Final notes

Amazon QuickSight can also read data in Amazon S3 directly. However, with the method demonstrated in this post, you have the option to manipulate, filter, and combine data from multiple sources or Amazon Redshift tables before visualizing it in Amazon QuickSight.

In this example, we dealt with data being inserted, but triggers can be activated in response to an INSERT, UPDATE, or DELETE trigger.

Keep the following in mind:

  • Be careful when invoking a Lambda function from triggers on tables that experience high write traffic. This would result in a large number of calls to your Lambda function. Although calls to the lambda_async procedure are asynchronous, triggers are synchronous.
  • A statement that results in a large number of trigger activations does not wait for the call to the AWS Lambda function to complete. But it does wait for the triggers to complete before returning control to the client.
  • Similarly, you must account for Amazon Kinesis Data Firehose limits. By default, Kinesis Data Firehose is limited to a maximum of 5,000 records/second. For more information, see Monitoring Amazon Kinesis Data Firehose.

In certain cases, it may be optimal to use AWS Database Migration Service (AWS DMS) to capture data changes in Aurora and use Amazon S3 as a target. For example, AWS DMS might be a good option if you don’t need to transform data from Amazon Aurora. The method used in this post gives you the flexibility to transform data from Aurora using Lambda before sending it to Amazon S3. Additionally, the architecture has the benefits of being serverless, whereas AWS DMS requires an Amazon EC2 instance for replication.

For design considerations while using Redshift Spectrum, see Using Amazon Redshift Spectrum to Query External Data.

If you have questions or suggestions, please comment below.


Additional Reading

If you found this post useful, be sure to check out Capturing Data Changes in Amazon Aurora Using AWS Lambda and 10 Best Practices for Amazon Redshift Spectrum


About the Authors

Re Alvarez-Parmar is a solutions architect for Amazon Web Services. He helps enterprises achieve success through technical guidance and thought leadership. In his spare time, he enjoys spending time with his two kids and exploring outdoors.

 

 

 

Now Available: New Digital Training to Help You Learn About AWS Big Data Services

Post Syndicated from Sara Snedeker original https://aws.amazon.com/blogs/big-data/now-available-new-digital-training-to-help-you-learn-about-aws-big-data-services/

AWS Training and Certification recently released free digital training courses that will make it easier for you to build your cloud skills and learn about using AWS Big Data services. This training includes courses like Introduction to Amazon EMR and Introduction to Amazon Athena.

You can get free and unlimited access to more than 100 new digital training courses built by AWS experts at aws.training. It’s easy to access training related to big data. Just choose the Analytics category on our Find Training page to browse through the list of courses. You can also use the keyword filter to search for training for specific AWS offerings.

Recommended training

Just getting started, or looking to learn about a new service? Check out the following digital training courses:

Introduction to Amazon EMR (15 minutes)
Covers the available tools that can be used with Amazon EMR and the process of creating a cluster. It includes a demonstration of how to create an EMR cluster.

Introduction to Amazon Athena (10 minutes)
Introduces the Amazon Athena service along with an overview of its operating environment. It covers the basic steps in implementing Athena and provides a brief demonstration.

Introduction to Amazon QuickSight (10 minutes)
Discusses the benefits of using Amazon QuickSight and how the service works. It also includes a demonstration so that you can see Amazon QuickSight in action.

Introduction to Amazon Redshift (10 minutes)
Walks you through Amazon Redshift and its core features and capabilities. It also includes a quick overview of relevant use cases and a short demonstration.

Introduction to AWS Lambda (10 minutes)
Discusses the rationale for using AWS Lambda, how the service works, and how you can get started using it.

Introduction to Amazon Kinesis Analytics (10 minutes)
Discusses how Amazon Kinesis Analytics collects, processes, and analyzes streaming data in real time. It discusses how to use and monitor the service and explores some use cases.

Introduction to Amazon Kinesis Streams (15 minutes)
Covers how Amazon Kinesis Streams is used to collect, process, and analyze real-time streaming data to create valuable insights.

Introduction to AWS IoT (10 minutes)
Describes how the AWS Internet of Things (IoT) communication architecture works, and the components that make up AWS IoT. It discusses how AWS IoT works with other AWS services and reviews a case study.

Introduction to AWS Data Pipeline (10 minutes)
Covers components like tasks, task runner, and pipeline. It also discusses what a pipeline definition is, and reviews the AWS services that are compatible with AWS Data Pipeline.

Go deeper with classroom training

Want to learn more? Enroll in classroom training to learn best practices, get live feedback, and hear answers to your questions from an instructor.

Big Data on AWS (3 days)
Introduces you to cloud-based big data solutions such as Amazon EMR, Amazon Redshift, Amazon Kinesis, and the rest of the AWS big data platform.

Data Warehousing on AWS (3 days)
Introduces you to concepts, strategies, and best practices for designing a cloud-based data warehousing solution, and demonstrates how to collect, store, and prepare data for the data warehouse.

Building a Serverless Data Lake (1 day)
Teaches you how to design, build, and operate a serverless data lake solution with AWS services. Includes topics such as ingesting data from any data source at large scale, storing the data securely and durably, using the right tool to process large volumes of data, and understanding the options available for analyzing the data in near-real time.

More training coming in 2018

We’re always evaluating and expanding our training portfolio, so stay tuned for more training options in the new year. You can always visit us at aws.training to explore our latest offerings.

Instrumenting Web Apps Using AWS X-Ray

Post Syndicated from Bharath Kumar original https://aws.amazon.com/blogs/devops/instrumenting-web-apps-using-aws-x-ray/

This post was written by James Bowman, Software Development Engineer, AWS X-Ray

AWS X-Ray helps developers analyze and debug distributed applications and underlying services in production. You can identify and analyze root-causes of performance issues and errors, understand customer impact, and extract statistical aggregations (such as histograms) for optimization.

In this blog post, I will provide a step-by-step walkthrough for enabling X-Ray tracing in the Go programming language. You can use these steps to add X-Ray tracing to any distributed application.

Revel: A web framework for the Go language

This section will assist you with designing a guestbook application. Skip to “Instrumenting with AWS X-Ray” section below if you already have a Go language application.

Revel is a web framework for the Go language. It facilitates the rapid development of web applications by providing a predefined framework for controllers, views, routes, filters, and more.

To get started with Revel, run revel new github.com/jamesdbowman/guestbook. A project base is then copied to $GOPATH/src/github.com/jamesdbowman/guestbook.

$ tree -L 2
.
├── README.md
├── app
│ ├── controllers
│ ├── init.go
│ ├── routes
│ ├── tmp
│ └── views
├── conf
│ ├── app.conf
│ └── routes
├── messages
│ └── sample.en
├── public
│ ├── css
│ ├── fonts
│ ├── img
│ └── js
└── tests
└── apptest.go

Writing a guestbook application

A basic guestbook application can consist of just two routes: one to sign the guestbook and another to list all entries.
Let’s set up these routes by adding a Book controller, which can be routed to by modifying ./conf/routes.

./app/controllers/book.go:
package controllers

import (
    "math/rand"
    "time"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/endpoints"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/dynamodb"
    "github.com/aws/aws-sdk-go/service/dynamodb/dynamodbattribute"
    "github.com/revel/revel"
)

const TABLE_NAME = "guestbook"
const SUCCESS = "Success.\n"
const DAY = 86400

var letters = []rune("ABCDEFGHIJKLMNOPQRSTUVWXYZ")

func init() {
    rand.Seed(time.Now().UnixNano())
}

// randString returns a random string of len n, used for DynamoDB Hash key.
func randString(n int) string {
    b := make([]rune, n)
    for i := range b {
        b[i] = letters[rand.Intn(len(letters))]
    }
    return string(b)
}

// Book controls interactions with the guestbook.
type Book struct {
    *revel.Controller
    ddbClient *dynamodb.DynamoDB
}

// Signature represents a user's signature.
type Signature struct {
    Message string
    Epoch   int64
    ID      string
}

// ddb returns the controller's DynamoDB client, instatiating a new client if necessary.
func (c Book) ddb() *dynamodb.DynamoDB {
    if c.ddbClient == nil {
        sess := session.Must(session.NewSession(&aws.Config{
            Region: aws.String(endpoints.UsWest2RegionID),
        }))
        c.ddbClient = dynamodb.New(sess)
    }
    return c.ddbClient
}

// Sign allows users to sign the book.
// The message is to be passed as application/json typed content, listed under the "message" top level key.
func (c Book) Sign() revel.Result {
    var s Signature

    err := c.Params.BindJSON(&s)
    if err != nil {
        return c.RenderError(err)
    }
    now := time.Now()
    s.Epoch = now.Unix()
    s.ID = randString(20)

    item, err := dynamodbattribute.MarshalMap(s)
    if err != nil {
        return c.RenderError(err)
    }

    putItemInput := &dynamodb.PutItemInput{
        TableName: aws.String(TABLE_NAME),
        Item:      item,
    }
    _, err = c.ddb().PutItem(putItemInput)
    if err != nil {
        return c.RenderError(err)
    }

    return c.RenderText(SUCCESS)
}

// List allows users to list all signatures in the book.
func (c Book) List() revel.Result {
    scanInput := &dynamodb.ScanInput{
        TableName: aws.String(TABLE_NAME),
        Limit:     aws.Int64(100),
    }
    res, err := c.ddb().Scan(scanInput)
    if err != nil {
        return c.RenderError(err)
    }

    messages := make([]string, 0)
    for _, v := range res.Items {
        messages = append(messages, *(v["Message"].S))
    }
    return c.RenderJSON(messages)
}

./conf/routes:
POST /sign Book.Sign
GET /list Book.List

Creating the resources and testing

For the purposes of this blog post, the application will be run and tested locally. We will store and retrieve messages from an Amazon DynamoDB table. Use the following AWS CLI command to create the guestbook table:

aws dynamodb create-table --region us-west-2 --table-name "guestbook" --attribute-definitions AttributeName=ID,AttributeType=S AttributeName=Epoch,AttributeType=N --key-schema AttributeName=ID,KeyType=HASH AttributeName=Epoch,KeyType=RANGE --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5

Now, let’s test our sign and list routes. If everything is working correctly, the following result appears:

$ curl -d '{"message":"Hello from cURL!"}' -H "Content-Type: application/json" http://localhost:9000/book/sign
Success.
$ curl http://localhost:9000/book/list
[
  "Hello from cURL!"
]%

Integrating with AWS X-Ray

Download and run the AWS X-Ray daemon

The AWS SDKs emit trace segments over UDP on port 2000. (This port can be configured.) In order for the trace segments to make it to the X-Ray service, the daemon must listen on this port and batch the segments in calls to the PutTraceSegments API.
For information about downloading and running the X-Ray daemon, see the AWS X-Ray Developer Guide.

Installing the AWS X-Ray SDK for Go

To download the SDK from GitHub, run go get -u github.com/aws/aws-xray-sdk-go/... The SDK will appear in the $GOPATH.

Enabling the incoming request filter

The first step to instrumenting an application with AWS X-Ray is to enable the generation of trace segments on incoming requests. The SDK conveniently provides an implementation of http.Handler which does exactly that. To ensure incoming web requests travel through this handler, we can modify app/init.go, adding a custom function to be run on application start.

import (
    "github.com/aws/aws-xray-sdk-go/xray"
    "github.com/revel/revel"
)

...

func init() {
  ...
    revel.OnAppStart(installXRayHandler)
}

func installXRayHandler() {
    revel.Server.Handler = xray.Handler(xray.NewFixedSegmentNamer("GuestbookApp"), revel.Server.Handler)
}

The application will now emit a segment for each incoming web request. The service graph appears:

You can customize the name of the segment to make it more descriptive by providing an alternate implementation of SegmentNamer to xray.Handler. For example, you can use xray.NewDynamicSegmentNamer(fallback, pattern) in place of the fixed namer. This namer will use the host name from the incoming web request (if it matches pattern) as the segment name. This is often useful when you are trying to separate different instances of the same application.

In addition, HTTP-centric information such as method and URL is collected in the segment’s http subsection:

"http": {
    "request": {
        "url": "/book/list",
        "method": "GET",
        "user_agent": "curl/7.54.0",
        "client_ip": "::1"
    },
    "response": {
        "status": 200
    }
},

Instrumenting outbound calls

To provide detailed performance metrics for distributed applications, the AWS X-Ray SDK needs to measure the time it takes to make outbound requests. Trace context is passed to downstream services using the X-Amzn-Trace-Id header. To draw a detailed and accurate representation of a distributed application, outbound call instrumentation is required.

AWS SDK calls

The AWS X-Ray SDK for Go provides a one-line AWS client wrapper that enables the collection of detailed per-call metrics for any AWS client. We can modify the DynamoDB client instantiation to include this line:

// ddb returns the controller's DynamoDB client, instatiating a new client if necessary.
func (c Book) ddb() *dynamodb.DynamoDB {
    if c.ddbClient == nil {
        sess := session.Must(session.NewSession(&aws.Config{
            Region: aws.String(endpoints.UsWest2RegionID),
        }))
        c.ddbClient = dynamodb.New(sess)
        xray.AWS(c.ddbClient.Client) // add subsegment-generating X-Ray handlers to this client
    }
    return c.ddbClient
}

We also need to ensure that the segment generated by our xray.Handler is passed to these AWS calls so that the X-Ray SDK knows to which segment these generated subsegments belong. In Go, the context.Context object is passed throughout the call path to achieve this goal. (In most other languages, some variant of ThreadLocal is used.) AWS clients provide a *WithContext method variant for each AWS operation, which we need to switch to:

_, err = c.ddb().PutItemWithContext(c.Request.Context(), putItemInput)
    res, err := c.ddb().ScanWithContext(c.Request.Context(), scanInput)

We now see much more detail in the Timeline view of the trace for the sign and list operations:

We can use this detail to help diagnose throttling on our DynamoDB table. In the following screenshot, the purple in the DynamoDB service graph node indicates that our table is underprovisioned. The red in the GuestbookApp node indicates that the application is throwing faults due to this throttling.

HTTP calls

Although the guestbook application does not make any non-AWS outbound HTTP calls in its current state, there is a similar one-liner to wrap HTTP clients that make outbound requests. xray.Client(c *http.Client) wraps an existing http.Client (or nil if you want to use a default HTTP client). For example:

resp, err := ctxhttp.Get(ctx, xray.Client(nil), "https://aws.amazon.com/")

Instrumenting local operations

X-Ray can also assist in measuring the performance of local compute operations. To see this in action, let’s create a custom subsegment inside the randString method:


// randString returns a random string of len n, used for DynamoDB Hash key.
func randString(ctx context.Context, n int) string {
    xray.Capture(ctx, "randString", func(innerCtx context.Context) {
        b := make([]rune, n)
        for i := range b {
            b[i] = letters[rand.Intn(len(letters))]
        }
        s := string(b)
    })
    return s
}

// we'll also need to change the callsite

s.ID = randString(c.Request.Context(), 20)

Summary

By now, you are an expert on how to instrument X-Ray for your Go applications. Instrumenting X-Ray with your applications is an easy way to analyze and debug performance issues and understand customer impact. Please feel free to give any feedback or comments below.

For more information about advanced configuration of the AWS X-Ray SDK for Go, see the AWS X-Ray SDK for Go in the AWS X-Ray Developer Guide and the aws/aws-xray-sdk-go GitHub repository.

For more information about some of the advanced X-Ray features such as histograms, annotations, and filter expressions, see the Analyzing Performance for Amazon Rekognition Apps Written on AWS Lambda Using AWS X-Ray blog post.

AWS Architecture Monthly for Kindle

Post Syndicated from Jamey Tisdale original https://aws.amazon.com/blogs/architecture/aws-architecture-monthly-for-kindle/

We recently launched AWS Architecture Monthly, a new subscription service on Kindle that will push a selection of the best content around cloud architecture from AWS, with a few pointers to other content you might also enjoy.

From building a simple website to crafting an AI-based chat bot, the choices of technologies and the best practices in how to apply them are constantly evolving. Our goal is to supply you each month with a broad selection of the best new tech content from AWS — from deep-dive tutorials to industry-trend articles.

With your free subscription, you can look forward to fresh content delivered directly to your Kindledevice or Kindle app including:
– Technical whitepapers
– Reference architectures
– New solutions and implementation guides
– Training and certification opportunities
– Industry trends

The January issue is now live. This month includes:
– AWS Architecture Blog: Glenn Gore’s Take on re:Invent 2017 (Chief Architect for AWS)
– AWS Reference Architectures: Java Microservices Deployed on EC2 Container Service; Node.js Microservices Deployed on EC2 Container Service
– AWS Training & Certification: AWS Certified Solutions Architect – Associate
– Sample Code: aws-serverless-express
– Technical Whitepaper: Serverless Architectures with AWS Lambda – Overview and Best Practices

At this time, Architecture Monthly annual subscriptions are only available in the France (new), US, UK, and Germany. As more countries become available, we’ll update you here on the blog. For Amazon.com countries not listed above, we are offering single-issue downloads — also accessible from our landing page. The content is the same as in the subscription but requires individual-issue downloads.

FAQ
I have to submit my credit card information for a free subscription?
While you do have to submit your card information at this time (as you would for a free book in the Kindle store), it won’t be charged. This will remain a free, annual subscription and includes all 10 issues for the year.

Why isn’t the subscription available everywhere?
As new countries get added to Kindle Newsstand, we’ll ensure we add them for Architecture Monthly. This month we added France but anticipate it will take some time for the new service to move into additional markets.

What countries are included in the Amazon.com list where the issues can be downloaded?
Andorra, Australia, Austria, Belgium, Brazil, Canada, Gibraltar, Guernsey, India, Ireland, Isle of Man, Japan, Jersey, Liechtenstein, Luxembourg, Mexico, Monaco, Netherlands, New Zealand, San Marino, Spain, Switzerland, Vatican City

Serverless @ re:Invent 2017

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/serverless-reinvent-2017/

At re:Invent 2014, we announced AWS Lambda, what is now the center of the serverless platform at AWS, and helped ignite the trend of companies building serverless applications.

This year, at re:Invent 2017, the topic of serverless was everywhere. We were incredibly excited to see the energy from everyone attending 7 workshops, 15 chalk talks, 20 skills sessions and 27 breakout sessions. Many of these sessions were repeated due to high demand, so we are happy to summarize and provide links to the recordings and slides of these sessions.

Over the course of the week leading up to and then the week of re:Invent, we also had over 15 new features and capabilities across a number of serverless services, including AWS Lambda, Amazon API Gateway, AWS [email protected], AWS SAM, and the newly announced AWS Serverless Application Repository!

AWS Lambda

Amazon API Gateway

  • Amazon API Gateway Supports Endpoint Integrations with Private VPCs – You can now provide access to HTTP(S) resources within your VPC without exposing them directly to the public internet. This includes resources available over a VPN or Direct Connect connection!
  • Amazon API Gateway Supports Canary Release Deployments – You can now use canary release deployments to gradually roll out new APIs. This helps you more safely roll out API changes and limit the blast radius of new deployments.
  • Amazon API Gateway Supports Access Logging – The access logging feature lets you generate access logs in different formats such as CLF (Common Log Format), JSON, XML, and CSV. The access logs can be fed into your existing analytics or log processing tools so you can perform more in-depth analysis or take action in response to the log data.
  • Amazon API Gateway Customize Integration Timeouts – You can now set a custom timeout for your API calls as low as 50ms and as high as 29 seconds (the default is 30 seconds).
  • Amazon API Gateway Supports Generating SDK in Ruby – This is in addition to support for SDKs in Java, JavaScript, Android and iOS (Swift and Objective-C). The SDKs that Amazon API Gateway generates save you development time and come with a number of prebuilt capabilities, such as working with API keys, exponential back, and exception handling.

AWS Serverless Application Repository

Serverless Application Repository is a new service (currently in preview) that aids in the publication, discovery, and deployment of serverless applications. With it you’ll be able to find shared serverless applications that you can launch in your account, while also sharing ones that you’ve created for others to do the same.

AWS [email protected]

[email protected] now supports content-based dynamic origin selection, network calls from viewer events, and advanced response generation. This combination of capabilities greatly increases the use cases for [email protected], such as allowing you to send requests to different origins based on request information, showing selective content based on authentication, and dynamically watermarking images for each viewer.

AWS SAM

Twitch Launchpad live announcements

Other service announcements

Here are some of the other highlights that you might have missed. We think these could help you make great applications:

AWS re:Invent 2017 sessions

Coming up with the right mix of talks for an event like this can be quite a challenge. The Product, Marketing, and Developer Advocacy teams for Serverless at AWS spent weeks reading through dozens of talk ideas to boil it down to the final list.

From feedback at other AWS events and webinars, we knew that customers were looking for talks that focused on concrete examples of solving problems with serverless, how to perform common tasks such as deployment, CI/CD, monitoring, and troubleshooting, and to see customer and partner examples solving real world problems. To that extent we tried to settle on a good mix based on attendee experience and provide a track full of rich content.

Below are the recordings and slides of breakout sessions from re:Invent 2017. We’ve organized them for those getting started, those who are already beginning to build serverless applications, and the experts out there already running them at scale. Some of the videos and slides haven’t been posted yet, and so we will update this list as they become available.

Find the entire Serverless Track playlist on YouTube.

Talks for people new to Serverless

Advanced topics

Expert mode

Talks for specific use cases

Talks from AWS customers & partners

Looking to get hands-on with Serverless?

At re:Invent, we delivered instructor-led skills sessions to help attendees new to serverless applications get started quickly. The content from these sessions is already online and you can do the hands-on labs yourself!
Build a Serverless web application

Still looking for more?

We also recently completely overhauled the main Serverless landing page for AWS. This includes a new Resources page containing case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials. Check it out!

Using Amazon CloudWatch and Amazon SNS to Notify when AWS X-Ray Detects Elevated Levels of Latency, Errors, and Faults in Your Application

Post Syndicated from Bharath Kumar original https://aws.amazon.com/blogs/devops/using-amazon-cloudwatch-and-amazon-sns-to-notify-when-aws-x-ray-detects-elevated-levels-of-latency-errors-and-faults-in-your-application/

AWS X-Ray helps developers analyze and debug production applications built using microservices or serverless architectures and quantify customer impact. With X-Ray, you can understand how your application and its underlying services are performing and identify and troubleshoot the root cause of performance issues and errors. You can use these insights to identify issues and opportunities for optimization.

In this blog post, I will show you how you can use Amazon CloudWatch and Amazon SNS to get notified when X-Ray detects high latency, errors, and faults in your application. Specifically, I will show you how to use this sample app to get notified through an email or SMS message when your end users observe high latencies or server-side errors when they use your application. You can customize the alarms and events by updating the sample app code.

Sample App Overview

The sample app uses the X-Ray GetServiceGraph API to get the following information:

  • Aggregated response time.
  • Requests that failed with 4xx status code (errors).
  • 429 status code (throttle).
  • 5xx status code (faults).
Sample app architecture

Overview of sample app architecture

Getting started

The sample app uses AWS CloudFormation to deploy the required resources.
To install the sample app:

  1. Run git clone to get the sample app.
  2. Update the JSON file in the Setup folder with threshold limits and notification details.
  3. Run the install.py script to install the sample app.

For more information about the installation steps, see the readme file on GitHub.

You can update the app configuration to include your phone number or email to get notified when your application in X-Ray breaches the latency, error, and fault limits you set in the configuration. If you prefer to not provide your phone number and email, then you can use the CloudWatch alarm deployed by the sample app to monitor your application in X-Ray.

The sample app deploys resources with the sample app namespace you provided during setup. This enables you to have multiple sample apps in the same region.

CloudWatch rules

The sample app uses two CloudWatch rules:

  1. SCHEDULEDLAMBDAFOR-sample_app_name to trigger at regular intervals the AWS Lambda function that queries the GetServiceGraph API.
  2. XRAYALERTSFOR-sample_app_name to look for published CloudWatch events that match the pattern defined in this rule.
CloudWatch Rules for sample app

CloudWatch rules created for the sample app

CloudWatch alarms

If you did not provide your phone number or email in the JSON file, the sample app uses a CloudWatch alarm named XRayCloudWatchAlarm-sample_app_name in combination with the CloudWatch event that you can use for monitoring.

CloudWatch Alarm for sample app

CloudWatch alarm created for the sample app

Amazon SNS messages

The sample app creates two SNS topics:

  • sample_app_name-cloudwatcheventsnstopic to send out an SMS message when the CloudWatch event matches a pattern published from the Lambda function.
  • sample_app_name-cloudwatchalarmsnstopic to send out an email message when the CloudWatch alarm goes into an ALARM state.
Amazon SNS for sample app

Amazon SNS created for the sample app

Getting notifications

The CloudWatch event looks for the following matching pattern:

{
  "detail-type": [
    "XCW Notification for Alerts"
  ],
  "source": [
    "<sample_app_name>-xcw.alerts"
  ]
}

The event then invokes an SNS topic that sends out an SMS message.

SMS in sample app

SMS that is sent when CloudWatch Event invokes Amazon SNS topic

The CloudWatch alarm looks for the TriggeredRules metric that is published whenever the CloudWatch event matches the event pattern. It goes into the ALARM state whenever TriggeredRules > 0 for the specified evaluation period and invokes an SNS topic that sends an email message.

Email sent in sample app

Email that is sent when CloudWatch Alarm goes to ALARM state

Stopping notifications

If you provided your phone number or email address, but would like to stop getting notified, change the SUBSCRIBE_TO_EMAIL_SMS environment variable in the Lambda function to No. Then, go to the Amazon SNS console and delete the subscriptions. You can still monitor your application for elevated levels of latency, errors, and faults by using the CloudWatch console.

Lambda environment variable in sample app

Change environment variable in Lambda

 

Delete subscription in SNS for sample app

Delete subscriptions to stop getting notified

Uninstalling the sample app

To uninstall the sample app, run the uninstall.py script in the Setup folder.

Extending the sample app

The sample app notifes you when when X-Ray detects high latency, errors, and faults in your application. You can extend it to provide more value for your use cases (for example, to perform an action on a resource when the state of a CloudWatch alarm changes).

To summarize, after this set up you will be able to get notified through Amazon SNS when X-Ray detects high latency, errors and faults in your application.

I hope you found this information about setting up alarms and alerts for your application in AWS X-Ray helpful. Feel free to leave questions or other feedback in the comments. Feel free to learn more about AWS X-Ray, Amazon SNS and Amazon CloudWatch

About the Author

Bharath Kumar is a Sr.Product Manager with AWS X-Ray. He has developed and launched mobile games, web applications on microservices and serverless architecture.

AWS Updated Its ISO Certifications and Now Has 67 Services Under ISO Compliance

Post Syndicated from Chad Woolf original https://aws.amazon.com/blogs/security/aws-updated-its-iso-certifications-and-now-has-67-services-under-iso-compliance/

ISO logo

AWS has updated its certifications against ISO 9001, ISO 27001, ISO 27017, and ISO 27018 standards, bringing the total to 67 services now under ISO compliance. We added the following 29 services this cycle:

Amazon Aurora Amazon S3 Transfer Acceleration AWS [email protected]
Amazon Cloud Directory Amazon SageMaker AWS Managed Services
Amazon CloudWatch Logs Amazon Simple Notification Service AWS OpsWorks Stacks
Amazon Cognito Auto Scaling AWS Shield
Amazon Connect AWS Batch AWS Snowball Edge
Amazon Elastic Container Registry AWS CodeBuild AWS Snowmobile
Amazon Inspector AWS CodeCommit AWS Step Functions
Amazon Kinesis Data Streams AWS CodeDeploy AWS Systems Manager (formerly Amazon EC2 Systems Manager)
Amazon Macie AWS CodePipeline AWS X-Ray
Amazon QuickSight AWS IoT Core

For the complete list of services under ISO compliance, see AWS Services in Scope by Compliance Program.

AWS maintains certifications through extensive audits of its controls to ensure that information security risks that affect the confidentiality, integrity, and availability of company and customer information are appropriately managed.

You can download copies of the AWS ISO certificates that contain AWS’s in-scope services and Regions, and use these certificates to jump-start your own certification efforts:

AWS does not increase service costs in any AWS Region as a result of updating its certifications.

To learn more about compliance in the AWS Cloud, see AWS Cloud Compliance.

– Chad

Amazon Linux 2 – Modern, Stable, and Enterprise-Friendly

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/amazon-linux-2-modern-stable-and-enterprise-friendly/

I’m getting ready to wrap up my work for the year, cleaning up my inbox and catching up on a few recent AWS launches that happened at and shortly after AWS re:Invent.

Last week we launched Amazon Linux 2. This is modern version of Linux, designed to meet the security, stability, and productivity needs of enterprise environments while giving you timely access to new tools and features. It also includes all of the things that made the Amazon Linux AMI popular, including AWS integration, cloud-init, a secure default configuration, regular security updates, and AWS Support. From that base, we have added many new features including:

Long-Term Support – You can use Amazon Linux 2 in situations where you want to stick with a single major version of Linux for an extended period of time, perhaps to avoid re-qualifying your applications too frequently. This build (2017.12) is a candidate for LTS status; the final determination will be made based on feedback in the Amazon Linux Discussion Forum. Long-term support for the Amazon Linux 2 LTS build will include security updates, bug fixes, user-space Application Binary Interface (ABI), and user-space Application Programming Interface (API) compatibility for 5 years.

Extras Library – You can now get fast access to fresh, new functionality while keeping your base OS image stable and lightweight. The Amazon Linux Extras Library eliminates the age-old tradeoff between OS stability and access to fresh software. It contains open source databases, languages, and more, each packaged together with any needed dependencies.

Tuned Kernel – You have access to the latest 4.9 LTS kernel, with support for the latest EC2 features and tuned to run efficiently in AWS and other virtualized environments.

SystemdAmazon Linux 2 includes the systemd init system, designed to provide better boot performance and increased control over individual services and groups of interdependent services. For example, you can indicate that Service B must be started only after Service A is fully started, or that Service C should start on a change in network connection status.

Wide AvailabiltyAmazon Linux 2 is available in all AWS Regions in AMI and Docker image form. Virtual machine images for Hyper-V, KVM, VirtualBox, and VMware are also available. You can build and test your applications on your laptop or in your own data center and then deploy them to AWS.

Launching an Instance
You can launch an instance in all of the usual ways – AWS Management Console, AWS Command Line Interface (CLI), AWS Tools for Windows PowerShell, RunInstances, and via a AWS CloudFormation template. I’ll use the Console:

I’m interested in the Extras Library; here’s how I see which topics (lists of packages) are available:

As you can see, the library includes languages, editors, and web tools that receive frequent updates. Each topic contains all of dependencies that are needed to install the package on Amazon Linux 2. For example, the Rust topic includes the cmake build system for Rust, cargo for Rust package maintenance, and the LLVM-based compiler toolchain for Rust.

Here’s how I install a topic (Emacs 25.3):

SNS Updates
Many AWS customers use the Amazon Linux AMIs as a starting point for their own AMIs. If you do this and would like to kick off your build process whenever a new AMI is released, you can subscribe to an SNS topic:

You can be notified by email, invoke a AWS Lambda function, and so forth.

Available Now
Amazon Linux 2 is available now and you can start using it in the cloud and on-premises today! To learn more, read the Amazon Linux 2 LTS Candidate (2017.12) Release Notes.

Jeff;

 

Now Open AWS EU (Paris) Region

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/now-open-aws-eu-paris-region/

Today we are launching our 18th AWS Region, our fourth in Europe. Located in the Paris area, AWS customers can use this Region to better serve customers in and around France.

The Details
The new EU (Paris) Region provides a broad suite of AWS services including Amazon API Gateway, Amazon Aurora, Amazon CloudFront, Amazon CloudWatch, CloudWatch Events, Amazon CloudWatch Logs, Amazon DynamoDB, Amazon Elastic Compute Cloud (EC2), EC2 Container Registry, Amazon ECS, Amazon Elastic Block Store (EBS), Amazon EMR, Amazon ElastiCache, Amazon Elasticsearch Service, Amazon Glacier, Amazon Kinesis Streams, Polly, Amazon Redshift, Amazon Relational Database Service (RDS), Amazon Route 53, Amazon Simple Notification Service (SNS), Amazon Simple Queue Service (SQS), Amazon Simple Storage Service (S3), Amazon Simple Workflow Service (SWF), Amazon Virtual Private Cloud, Auto Scaling, AWS Certificate Manager (ACM), AWS CloudFormation, AWS CloudTrail, AWS CodeDeploy, AWS Config, AWS Database Migration Service, AWS Direct Connect, AWS Elastic Beanstalk, AWS Identity and Access Management (IAM), AWS Key Management Service (KMS), AWS Lambda, AWS Marketplace, AWS OpsWorks Stacks, AWS Personal Health Dashboard, AWS Server Migration Service, AWS Service Catalog, AWS Shield Standard, AWS Snowball, AWS Snowball Edge, AWS Snowmobile, AWS Storage Gateway, AWS Support (including AWS Trusted Advisor), Elastic Load Balancing, and VM Import.

The Paris Region supports all sizes of C5, M5, R4, T2, D2, I3, and X1 instances.

There are also four edge locations for Amazon Route 53 and Amazon CloudFront: three in Paris and one in Marseille, all with AWS WAF and AWS Shield. Check out the AWS Global Infrastructure page to learn more about current and future AWS Regions.

The Paris Region will benefit from three AWS Direct Connect locations. Telehouse Voltaire is available today. AWS Direct Connect will also become available at Equinix Paris in early 2018, followed by Interxion Paris.

All AWS infrastructure regions around the world are designed, built, and regularly audited to meet the most rigorous compliance standards and to provide high levels of security for all AWS customers. These include ISO 27001, ISO 27017, ISO 27018, SOC 1 (Formerly SAS 70), SOC 2 and SOC 3 Security & Availability, PCI DSS Level 1, and many more. This means customers benefit from all the best practices of AWS policies, architecture, and operational processes built to satisfy the needs of even the most security sensitive customers.

AWS is certified under the EU-US Privacy Shield, and the AWS Data Processing Addendum (DPA) is GDPR-ready and available now to all AWS customers to help them prepare for May 25, 2018 when the GDPR becomes enforceable. The current AWS DPA, as well as the AWS GDPR DPA, allows customers to transfer personal data to countries outside the European Economic Area (EEA) in compliance with European Union (EU) data protection laws. AWS also adheres to the Cloud Infrastructure Service Providers in Europe (CISPE) Code of Conduct. The CISPE Code of Conduct helps customers ensure that AWS is using appropriate data protection standards to protect their data, consistent with the GDPR. In addition, AWS offers a wide range of services and features to help customers meet the requirements of the GDPR, including services for access controls, monitoring, logging, and encryption.

From Our Customers
Many AWS customers are preparing to use this new Region. Here’s a small sample:

Societe Generale, one of the largest banks in France and the world, has accelerated their digital transformation while working with AWS. They developed SG Research, an application that makes reports from Societe Generale’s analysts available to corporate customers in order to improve the decision-making process for investments. The new AWS Region will reduce latency between applications running in the cloud and in their French data centers.

SNCF is the national railway company of France. Their mobile app, powered by AWS, delivers real-time traffic information to 14 million riders. Extreme weather, traffic events, holidays, and engineering works can cause usage to peak at hundreds of thousands of users per second. They are planning to use machine learning and big data to add predictive features to the app.

Radio France, the French public radio broadcaster, offers seven national networks, and uses AWS to accelerate its innovation and stay competitive.

Les Restos du Coeur, a French charity that provides assistance to the needy, delivering food packages and participating in their social and economic integration back into French society. Les Restos du Coeur is using AWS for its CRM system to track the assistance given to each of their beneficiaries and the impact this is having on their lives.

AlloResto by JustEat (a leader in the French FoodTech industry), is using AWS to to scale during traffic peaks and to accelerate their innovation process.

AWS Consulting and Technology Partners
We are already working with a wide variety of consulting, technology, managed service, and Direct Connect partners in France. Here’s a partial list:

AWS Premier Consulting PartnersAccenture, Capgemini, Claranet, CloudReach, DXC, and Edifixio.

AWS Consulting PartnersABC Systemes, Atos International SAS, CoreExpert, Cycloid, Devoteam, LINKBYNET, Oxalide, Ozones, Scaleo Information Systems, and Sopra Steria.

AWS Technology PartnersAxway, Commerce Guys, MicroStrategy, Sage, Software AG, Splunk, Tibco, and Zerolight.

AWS in France
We have been investing in Europe, with a focus on France, for the last 11 years. We have also been developing documentation and training programs to help our customers to improve their skills and to accelerate their journey to the AWS Cloud.

As part of our commitment to AWS customers in France, we plan to train more than 25,000 people in the coming years, helping them develop highly sought after cloud skills. They will have access to AWS training resources in France via AWS Academy, AWSome days, AWS Educate, and webinars, all delivered in French by AWS Technical Trainers and AWS Certified Trainers.

Use it Today
The EU (Paris) Region is open for business now and you can start using it today!

Jeff;

 

Power data ingestion into Splunk using Amazon Kinesis Data Firehose

Post Syndicated from Tarik Makota original https://aws.amazon.com/blogs/big-data/power-data-ingestion-into-splunk-using-amazon-kinesis-data-firehose/

In late September, during the annual Splunk .conf, Splunk and Amazon Web Services (AWS) jointly announced that Amazon Kinesis Data Firehose now supports Splunk Enterprise and Splunk Cloud as a delivery destination. This native integration between Splunk Enterprise, Splunk Cloud, and Amazon Kinesis Data Firehose is designed to make AWS data ingestion setup seamless, while offering a secure and fault-tolerant delivery mechanism. We want to enable customers to monitor and analyze machine data from any source and use it to deliver operational intelligence and optimize IT, security, and business performance.

With Kinesis Data Firehose, customers can use a fully managed, reliable, and scalable data streaming solution to Splunk. In this post, we tell you a bit more about the Kinesis Data Firehose and Splunk integration. We also show you how to ingest large amounts of data into Splunk using Kinesis Data Firehose.

Push vs. Pull data ingestion

Presently, customers use a combination of two ingestion patterns, primarily based on data source and volume, in addition to existing company infrastructure and expertise:

  1. Pull-based approach: Using dedicated pollers running the popular Splunk Add-on for AWS to pull data from various AWS services such as Amazon CloudWatch or Amazon S3.
  2. Push-based approach: Streaming data directly from AWS to Splunk HTTP Event Collector (HEC) by using AWS Lambda. Examples of applicable data sources include CloudWatch Logs and Amazon Kinesis Data Streams.

The pull-based approach offers data delivery guarantees such as retries and checkpointing out of the box. However, it requires more ops to manage and orchestrate the dedicated pollers, which are commonly running on Amazon EC2 instances. With this setup, you pay for the infrastructure even when it’s idle.

On the other hand, the push-based approach offers a low-latency scalable data pipeline made up of serverless resources like AWS Lambda sending directly to Splunk indexers (by using Splunk HEC). This approach translates into lower operational complexity and cost. However, if you need guaranteed data delivery then you have to design your solution to handle issues such as a Splunk connection failure or Lambda execution failure. To do so, you might use, for example, AWS Lambda Dead Letter Queues.

How about getting the best of both worlds?

Let’s go over the new integration’s end-to-end solution and examine how Kinesis Data Firehose and Splunk together expand the push-based approach into a native AWS solution for applicable data sources.

By using a managed service like Kinesis Data Firehose for data ingestion into Splunk, we provide out-of-the-box reliability and scalability. One of the pain points of the old approach was the overhead of managing the data collection nodes (Splunk heavy forwarders). With the new Kinesis Data Firehose to Splunk integration, there are no forwarders to manage or set up. Data producers (1) are configured through the AWS Management Console to drop data into Kinesis Data Firehose.

You can also create your own data producers. For example, you can drop data into a Firehose delivery stream by using Amazon Kinesis Agent, or by using the Firehose API (PutRecord(), PutRecordBatch()), or by writing to a Kinesis Data Stream configured to be the data source of a Firehose delivery stream. For more details, refer to Sending Data to an Amazon Kinesis Data Firehose Delivery Stream.

You might need to transform the data before it goes into Splunk for analysis. For example, you might want to enrich it or filter or anonymize sensitive data. You can do so using AWS Lambda. In this scenario, Kinesis Data Firehose buffers data from the incoming source data, sends it to the specified Lambda function (2), and then rebuffers the transformed data to the Splunk Cluster. Kinesis Data Firehose provides the Lambda blueprints that you can use to create a Lambda function for data transformation.

Systems fail all the time. Let’s see how this integration handles outside failures to guarantee data durability. In cases when Kinesis Data Firehose can’t deliver data to the Splunk Cluster, data is automatically backed up to an S3 bucket. You can configure this feature while creating the Firehose delivery stream (3). You can choose to back up all data or only the data that’s failed during delivery to Splunk.

In addition to using S3 for data backup, this Firehose integration with Splunk supports Splunk Indexer Acknowledgments to guarantee event delivery. This feature is configured on Splunk’s HTTP Event Collector (HEC) (4). It ensures that HEC returns an acknowledgment to Kinesis Data Firehose only after data has been indexed and is available in the Splunk cluster (5).

Now let’s look at a hands-on exercise that shows how to forward VPC flow logs to Splunk.

How-to guide

To process VPC flow logs, we implement the following architecture.

Amazon Virtual Private Cloud (Amazon VPC) delivers flow log files into an Amazon CloudWatch Logs group. Using a CloudWatch Logs subscription filter, we set up real-time delivery of CloudWatch Logs to an Kinesis Data Firehose stream.

Data coming from CloudWatch Logs is compressed with gzip compression. To work with this compression, we need to configure a Lambda-based data transformation in Kinesis Data Firehose to decompress the data and deposit it back into the stream. Firehose then delivers the raw logs to the Splunk Http Event Collector (HEC).

If delivery to the Splunk HEC fails, Firehose deposits the logs into an Amazon S3 bucket. You can then ingest the events from S3 using an alternate mechanism such as a Lambda function.

When data reaches Splunk (Enterprise or Cloud), Splunk parsing configurations (packaged in the Splunk Add-on for Kinesis Data Firehose) extract and parse all fields. They make data ready for querying and visualization using Splunk Enterprise and Splunk Cloud.

Walkthrough

Install the Splunk Add-on for Amazon Kinesis Data Firehose

The Splunk Add-on for Amazon Kinesis Data Firehose enables Splunk (be it Splunk Enterprise, Splunk App for AWS, or Splunk Enterprise Security) to use data ingested from Amazon Kinesis Data Firehose. Install the Add-on on all the indexers with an HTTP Event Collector (HEC). The Add-on is available for download from Splunkbase.

HTTP Event Collector (HEC)

Before you can use Kinesis Data Firehose to deliver data to Splunk, set up the Splunk HEC to receive the data. From Splunk web, go to the Setting menu, choose Data Inputs, and choose HTTP Event Collector. Choose Global Settings, ensure All tokens is enabled, and then choose Save. Then choose New Token to create a new HEC endpoint and token. When you create a new token, make sure that Enable indexer acknowledgment is checked.

When prompted to select a source type, select aws:cloudwatch:vpcflow.

Create an S3 backsplash bucket

To provide for situations in which Kinesis Data Firehose can’t deliver data to the Splunk Cluster, we use an S3 bucket to back up the data. You can configure this feature to back up all data or only the data that’s failed during delivery to Splunk.

Note: Bucket names are unique. Thus, you can’t use tmak-backsplash-bucket.

aws s3 create-bucket --bucket tmak-backsplash-bucket --create-bucket-configuration LocationConstraint=ap-northeast-1

Create an IAM role for the Lambda transform function

Firehose triggers an AWS Lambda function that transforms the data in the delivery stream. Let’s first create a role for the Lambda function called LambdaBasicRole.

Note: You can also set this role up when creating your Lambda function.

$ aws iam create-role --role-name LambdaBasicRole --assume-role-policy-document file://TrustPolicyForLambda.json

Here is TrustPolicyForLambda.json.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

 

After the role is created, attach the managed Lambda basic execution policy to it.

$ aws iam attach-role-policy 
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole 
  --role-name LambdaBasicRole

 

Create a Firehose Stream

On the AWS console, open the Amazon Kinesis service, go to the Firehose console, and choose Create Delivery Stream.

In the next section, you can specify whether you want to use an inline Lambda function for transformation. Because incoming CloudWatch Logs are gzip compressed, choose Enabled for Record transformation, and then choose Create new.

From the list of the available blueprint functions, choose Kinesis Data Firehose CloudWatch Logs Processor. This function unzips data and place it back into the Firehose stream in compliance with the record transformation output model.

Enter a name for the Lambda function, choose Choose an existing role, and then choose the role you created earlier. Then choose Create Function.

Go back to the Firehose Stream wizard, choose the Lambda function you just created, and then choose Next.

Select Splunk as the destination, and enter your Splunk Http Event Collector information.

Note: Amazon Kinesis Data Firehose requires the Splunk HTTP Event Collector (HEC) endpoint to be terminated with a valid CA-signed certificate matching the DNS hostname used to connect to your HEC endpoint. You receive delivery errors if you are using a self-signed certificate.

In this example, we only back up logs that fail during delivery.

To monitor your Firehose delivery stream, enable error logging. Doing this means that you can monitor record delivery errors.

Create an IAM role for the Firehose stream by choosing Create new, or Choose. Doing this brings you to a new screen. Choose Create a new IAM role, give the role a name, and then choose Allow.

If you look at the policy document, you can see that the role gives Kinesis Data Firehose permission to publish error logs to CloudWatch, execute your Lambda function, and put records into your S3 backup bucket.

You now get a chance to review and adjust the Firehose stream settings. When you are satisfied, choose Create Stream. You get a confirmation once the stream is created and active.

Create a VPC Flow Log

To send events from Amazon VPC, you need to set up a VPC flow log. If you already have a VPC flow log you want to use, you can skip to the “Publish CloudWatch to Kinesis Data Firehose” section.

On the AWS console, open the Amazon VPC service. Then choose VPC, Your VPC, and choose the VPC you want to send flow logs from. Choose Flow Logs, and then choose Create Flow Log. If you don’t have an IAM role that allows your VPC to publish logs to CloudWatch, choose Set Up Permissions and Create new role. Use the defaults when presented with the screen to create the new IAM role.

Once active, your VPC flow log should look like the following.

Publish CloudWatch to Kinesis Data Firehose

When you generate traffic to or from your VPC, the log group is created in Amazon CloudWatch. The new log group has no subscription filter, so set up a subscription filter. Setting this up establishes a real-time data feed from the log group to your Firehose delivery stream.

At present, you have to use the AWS Command Line Interface (AWS CLI) to create a CloudWatch Logs subscription to a Kinesis Data Firehose stream. However, you can use the AWS console to create subscriptions to Lambda and Amazon Elasticsearch Service.

To allow CloudWatch to publish to your Firehose stream, you need to give it permissions.

$ aws iam create-role --role-name CWLtoKinesisFirehoseRole --assume-role-policy-document file://TrustPolicyForCWLToFireHose.json


Here is the content for TrustPolicyForCWLToFireHose.json.

{
  "Statement": {
    "Effect": "Allow",
    "Principal": { "Service": "logs.us-east-1.amazonaws.com" },
    "Action": "sts:AssumeRole"
  }
}

 

Attach the policy to the newly created role.

$ aws iam put-role-policy 
    --role-name CWLtoKinesisFirehoseRole 
    --policy-name Permissions-Policy-For-CWL 
    --policy-document file://PermissionPolicyForCWLToFireHose.json

Here is the content for PermissionPolicyForCWLToFireHose.json.

{
    "Statement":[
      {
        "Effect":"Allow",
        "Action":["firehose:*"],
        "Resource":["arn:aws:firehose:us-east-1:YOUR-AWS-ACCT-NUM:deliverystream/ FirehoseSplunkDeliveryStream"]
      },
      {
        "Effect":"Allow",
        "Action":["iam:PassRole"],
        "Resource":["arn:aws:iam::YOUR-AWS-ACCT-NUM:role/CWLtoKinesisFirehoseRole"]
      }
    ]
}

Finally, create a subscription filter.

$ aws logs put-subscription-filter 
   --log-group-name " /vpc/flowlog/FirehoseSplunkDemo" 
   --filter-name "Destination" 
   --filter-pattern "" 
   --destination-arn "arn:aws:firehose:us-east-1:YOUR-AWS-ACCT-NUM:deliverystream/FirehoseSplunkDeliveryStream" 
   --role-arn "arn:aws:iam::YOUR-AWS-ACCT-NUM:role/CWLtoKinesisFirehoseRole"

When you run the AWS CLI command preceding, you don’t get any acknowledgment. To validate that your CloudWatch Log Group is subscribed to your Firehose stream, check the CloudWatch console.

As soon as the subscription filter is created, the real-time log data from the log group goes into your Firehose delivery stream. Your stream then delivers it to your Splunk Enterprise or Splunk Cloud environment for querying and visualization. The screenshot following is from Splunk Enterprise.

In addition, you can monitor and view metrics associated with your delivery stream using the AWS console.

Conclusion

Although our walkthrough uses VPC Flow Logs, the pattern can be used in many other scenarios. These include ingesting data from AWS IoT, other CloudWatch logs and events, Kinesis Streams or other data sources using the Kinesis Agent or Kinesis Producer Library. We also used Lambda blueprint Kinesis Data Firehose CloudWatch Logs Processor to transform streaming records from Kinesis Data Firehose. However, you might need to use a different Lambda blueprint or disable record transformation entirely depending on your use case. For an additional use case using Kinesis Data Firehose, check out This is My Architecture Video, which discusses how to securely centralize cross-account data analytics using Kinesis and Splunk.

 


Additional Reading

If you found this post useful, be sure to check out Integrating Splunk with Amazon Kinesis Streams and Using Amazon EMR and Hunk for Rapid Response Log Analysis and Review.


About the Authors

Tarik Makota is a solutions architect with the Amazon Web Services Partner Network. He provides technical guidance, design advice and thought leadership to AWS’ most strategic software partners. His career includes work in an extremely broad software development and architecture roles across ERP, financial printing, benefit delivery and administration and financial services. He holds an M.S. in Software Development and Management from Rochester Institute of Technology.

 

 

 

Roy Arsan is a solutions architect in the Splunk Partner Integrations team. He has a background in product development, cloud architecture, and building consumer and enterprise cloud applications. More recently, he has architected Splunk solutions on major cloud providers, including an AWS Quick Start for Splunk that enables AWS users to easily deploy distributed Splunk Enterprise straight from their AWS console. He’s also the co-author of the AWS Lambda blueprints for Splunk. He holds an M.S. in Computer Science Engineering from the University of Michigan.

 

 

 

How to Enhance the Security of Sensitive Customer Data by Using Amazon CloudFront Field-Level Encryption

Post Syndicated from Alex Tomic original https://aws.amazon.com/blogs/security/how-to-enhance-the-security-of-sensitive-customer-data-by-using-amazon-cloudfront-field-level-encryption/

Amazon CloudFront is a web service that speeds up distribution of your static and dynamic web content to end users through a worldwide network of edge locations. CloudFront provides a number of benefits and capabilities that can help you secure your applications and content while meeting compliance requirements. For example, you can configure CloudFront to help enforce secure, end-to-end connections using HTTPS SSL/TLS encryption. You also can take advantage of CloudFront integration with AWS Shield for DDoS protection and with AWS WAF (a web application firewall) for protection against application-layer attacks, such as SQL injection and cross-site scripting.

Now, CloudFront field-level encryption helps secure sensitive data such as a customer phone numbers by adding another security layer to CloudFront HTTPS. Using this functionality, you can help ensure that sensitive information in a POST request is encrypted at CloudFront edge locations. This information remains encrypted as it flows to and beyond your origin servers that terminate HTTPS connections with CloudFront and throughout the application environment. In this blog post, we demonstrate how you can enhance the security of sensitive data by using CloudFront field-level encryption.

Note: This post assumes that you understand concepts and services such as content delivery networks, HTTP forms, public-key cryptography, CloudFrontAWS Lambda, and the AWS CLI. If necessary, you should familiarize yourself with these concepts and review the solution overview in the next section before proceeding with the deployment of this post’s solution.

How field-level encryption works

Many web applications collect and store data from users as those users interact with the applications. For example, a travel-booking website may ask for your passport number and less sensitive data such as your food preferences. This data is transmitted to web servers and also might travel among a number of services to perform tasks. However, this also means that your sensitive information may need to be accessed by only a small subset of these services (most other services do not need to access your data).

User data is often stored in a database for retrieval at a later time. One approach to protecting stored sensitive data is to configure and code each service to protect that sensitive data. For example, you can develop safeguards in logging functionality to ensure sensitive data is masked or removed. However, this can add complexity to your code base and limit performance.

Field-level encryption addresses this problem by ensuring sensitive data is encrypted at CloudFront edge locations. Sensitive data fields in HTTPS form POSTs are automatically encrypted with a user-provided public RSA key. After the data is encrypted, other systems in your architecture see only ciphertext. If this ciphertext unintentionally becomes externally available, the data is cryptographically protected and only designated systems with access to the private RSA key can decrypt the sensitive data.

It is critical to secure private RSA key material to prevent unauthorized access to the protected data. Management of cryptographic key material is a larger topic that is out of scope for this blog post, but should be carefully considered when implementing encryption in your applications. For example, in this blog post we store private key material as a secure string in the Amazon EC2 Systems Manager Parameter Store. The Parameter Store provides a centralized location for managing your configuration data such as plaintext data (such as database strings) or secrets (such as passwords) that are encrypted using AWS Key Management Service (AWS KMS). You may have an existing key management system in place that you can use, or you can use AWS CloudHSM. CloudHSM is a cloud-based hardware security module (HSM) that enables you to easily generate and use your own encryption keys in the AWS Cloud.

To illustrate field-level encryption, let’s look at a simple form submission where Name and Phone values are sent to a web server using an HTTP POST. A typical form POST would contain data such as the following.

POST / HTTP/1.1
Host: example.com
Content-Type: application/x-www-form-urlencoded
Content-Length:60

Name=Jane+Doe&Phone=404-555-0150

Instead of taking this typical approach, field-level encryption converts this data similar to the following.

POST / HTTP/1.1
Host: example.com
Content-Type: application/x-www-form-urlencoded
Content-Length: 1713

Name=Jane+Doe&Phone=AYABeHxZ0ZqWyysqxrB5pEBSYw4AAA...

To further demonstrate field-level encryption in action, this blog post includes a sample serverless application that you can deploy by using a CloudFormation template, which creates an application environment using CloudFront, Amazon API Gateway, and Lambda. The sample application is only intended to demonstrate field-level encryption functionality and is not intended for production use. The following diagram depicts the architecture and data flow of this sample application.

Sample application architecture and data flow

Diagram of the solution's architecture and data flow

Here is how the sample solution works:

  1. An application user submits an HTML form page with sensitive data, generating an HTTPS POST to CloudFront.
  2. Field-level encryption intercepts the form POST and encrypts sensitive data with the public RSA key and replaces fields in the form post with encrypted ciphertext. The form POST ciphertext is then sent to origin servers.
  3. The serverless application accepts the form post data containing ciphertext where sensitive data would normally be. If a malicious user were able to compromise your application and gain access to your data, such as the contents of a form, that user would see encrypted data.
  4. Lambda stores data in a DynamoDB table, leaving sensitive data to remain safely encrypted at rest.
  5. An administrator uses the AWS Management Console and a Lambda function to view the sensitive data.
  6. During the session, the administrator retrieves ciphertext from the DynamoDB table.
  7. The administrator decrypts sensitive data by using private key material stored in the EC2 Systems Manager Parameter Store.
  8. Decrypted sensitive data is transmitted over SSL/TLS via the AWS Management Console to the administrator for review.

Deployment walkthrough

The high-level steps to deploy this solution are as follows:

  1. Stage the required artifacts
    When deployment packages are used with Lambda, the zipped artifacts have to be placed in an S3 bucket in the target AWS Region for deployment. This step is not required if you are deploying in the US East (N. Virginia) Region because the package has already been staged there.
  2. Generate an RSA key pair
    Create a public/private key pair that will be used to perform the encrypt/decrypt functionality.
  3. Upload the public key to CloudFront and associate it with the field-level encryption configuration
    After you create the key pair, the public key is uploaded to CloudFront so that it can be used by field-level encryption.
  4. Launch the CloudFormation stack
    Deploy the sample application for demonstrating field-level encryption by using AWS CloudFormation.
  5. Add the field-level encryption configuration to the CloudFront distribution
    After you have provisioned the application, this step associates the field-level encryption configuration with the CloudFront distribution.
  6. Store the RSA private key in the Parameter Store
    Store the private key in the Parameter Store as a SecureString data type, which uses AWS KMS to encrypt the parameter value.

Deploy the solution

1. Stage the required artifacts

(If you are deploying in the US East [N. Virginia] Region, skip to Step 2, “Generate an RSA key pair.”)

Stage the Lambda function deployment package in an Amazon S3 bucket located in the AWS Region you are using for this solution. To do this, download the zipped deployment package and upload it to your in-region bucket. For additional information about uploading objects to S3, see Uploading Object into Amazon S3.

2. Generate an RSA key pair

In this section, you will generate an RSA key pair by using OpenSSL:

  1. Confirm access to OpenSSL.
    $ openssl version

    You should see version information similar to the following.

    OpenSSL <version> <date>

  1. Create a private key using the following command.
    $ openssl genrsa -out private_key.pem 2048

    The command results should look similar to the following.

    Generating RSA private key, 2048 bit long modulus
    ................................................................................+++
    ..........................+++
    e is 65537 (0x10001)
  1. Extract the public key from the private key by running the following command.
    $ openssl rsa -pubout -in private_key.pem -out public_key.pem

    You should see output similar to the following.

    writing RSA key
  1. Restrict access to the private key.$ chmod 600 private_key.pem Note: You will use the public and private key material in Steps 3 and 6 to configure the sample application.

3. Upload the public key to CloudFront and associate it with the field-level encryption configuration

Now that you have created the RSA key pair, you will use the AWS Management Console to upload the public key to CloudFront for use by field-level encryption. Complete the following steps to upload and configure the public key.

Note: Do not include spaces or special characters when providing the configuration values in this section.

  1. From the AWS Management Console, choose Services > CloudFront.
  2. In the navigation pane, choose Public Key and choose Add Public Key.
    Screenshot of adding a public key

Complete the Add Public Key configuration boxes:

  • Key Name: Type a name such as DemoPublicKey.
  • Encoded Key: Paste the contents of the public_key.pem file you created in Step 2c. Copy and paste the encoded key value for your public key, including the -----BEGIN PUBLIC KEY----- and -----END PUBLIC KEY----- lines.
  • Comment: Optionally add a comment.
  1. Choose Create.
  2. After adding at least one public key to CloudFront, the next step is to create a profile to tell CloudFront which fields of input you want to be encrypted. While still on the CloudFront console, choose Field-level encryption in the navigation pane.
  3. Under Profiles, choose Create profile.
    Screenshot of creating a profile

Complete the Create profile configuration boxes:

  • Name: Type a name such as FLEDemo.
  • Comment: Optionally add a comment.
  • Public key: Select the public key you configured in Step 4.b.
  • Provider name: Type a provider name such as FLEDemo.
    This information will be used when the form data is encrypted, and must be provided to applications that need to decrypt the data, along with the appropriate private key.
  • Pattern to match: Type phone. This configures field-level encryption to match based on the phone.
  1. Choose Save profile.
  2. Configurations include options for whether to block or forward a query to your origin in scenarios where CloudFront can’t encrypt the data. Under Encryption Configurations, choose Create configuration.
    Screenshot of creating a configuration

Complete the Create configuration boxes:

  • Comment: Optionally add a comment.
  • Content type: Enter application/x-www-form-urlencoded. This is a common media type for encoding form data.
  • Default profile ID: Select the profile you added in Step 3e.
  1. Choose Save configuration

4. Launch the CloudFormation stack

Launch the sample application by using a CloudFormation template that automates the provisioning process.

Input parameter Input parameter description
ProviderID Enter the Provider name you assigned in Step 3e. The ProviderID is used in field-level encryption configuration in CloudFront (letters and numbers only, no special characters)
PublicKeyName Enter the Key Name you assigned in Step 3b. This name is assigned to the public key in field-level encryption configuration in CloudFront (letters and numbers only, no special characters).
PrivateKeySSMPath Leave as the default: /cloudfront/field-encryption-sample/private-key
ArtifactsBucket The S3 bucket with artifact files (staged zip file with app code). Leave as default if deploying in us-east-1.
ArtifactsPrefix The path in the S3 bucket containing artifact files. Leave as default if deploying in us-east-1.

To finish creating the CloudFormation stack:

  1. Choose Next on the Select Template page, enter the input parameters and choose Next.
    Note: The Artifacts configuration needs to be updated only if you are deploying outside of us-east-1 (US East [N. Virginia]). See Step 1 for artifact staging instructions.
  2. On the Options page, accept the defaults and choose Next.
  3. On the Review page, confirm the details, choose the I acknowledge that AWS CloudFormation might create IAM resources check box, and then choose Create. (The stack will be created in approximately 15 minutes.)

5. Add the field-level encryption configuration to the CloudFront distribution

While still on the CloudFront console, choose Distributions in the navigation pane, and then:

    1. In the Outputs section of the FLE-Sample-App stack, look for CloudFrontDistribution and click the URL to open the CloudFront console.
    2. Choose Behaviors, choose the Default (*) behavior, and then choose Edit.
    3. For Field-level Encryption Config, choose the configuration you created in Step 3g.
      Screenshot of editing the default cache behavior
    4. Choose Yes, Edit.
    5. While still in the CloudFront distribution configuration, choose the General Choose Edit, scroll down to Distribution State, and change it to Enabled.
    6. Choose Yes, Edit.

6. Store the RSA private key in the Parameter Store

In this step, you store the private key in the EC2 Systems Manager Parameter Store as a SecureString data type, which uses AWS KMS to encrypt the parameter value. For more information about AWS KMS, see the AWS Key Management Service Developer Guide. You will need a working installation of the AWS CLI to complete this step.

  1. Store the private key in the Parameter Store with the AWS CLI by running the following command. You will find the <KMSKeyID> in the KMSKeyID in the CloudFormation stack Outputs. Substitute it for the placeholder in the following command.
    $ aws ssm put-parameter --type "SecureString" --name /cloudfront/field-encryption-sample/private-key --value file://private_key.pem --key-id "<KMSKeyID>"
    
    ------------------
    |  PutParameter  |
    +----------+-----+
    |  Version |  1  |
    +----------+-----+

  1. Verify the parameter. Your private key material should be accessible through the ssm get-parameter in the following command in the Value The key material has been truncated in the following output.
    $ aws ssm get-parameter --name /cloudfront/field-encryption-sample/private-key --with-decryption
    
    -----…
    
    ||  Value  |  -----BEGIN RSA PRIVATE KEY-----
    MIIEowIBAAKCAQEAwGRBGuhacmw+C73kM6Z…….

    Notice we use the —with decryption argument in this command. This returns the private key as cleartext.

    This completes the sample application deployment. Next, we show you how to see field-level encryption in action.

  1. Delete the private key from local storage. On Linux for example, using the shred command, securely delete the private key material from your workstation as shown below. You may also wish to store the private key material within an AWS CloudHSM or other protected location suitable for your security requirements. For production implementations, you also should implement key rotation policies.
    $ shred -zvu -n  100 private*.pem
    
    shred: private_encrypted_key.pem: pass 1/101 (random)...
    shred: private_encrypted_key.pem: pass 2/101 (dddddd)...
    shred: private_encrypted_key.pem: pass 3/101 (555555)...
    ….

Test the sample application

Use the following steps to test the sample application with field-level encryption:

  1. Open sample application in your web browser by clicking the ApplicationURL link in the CloudFormation stack Outputs. (for example, https:d199xe5izz82ea.cloudfront.net/prod/). Note that it may take several minutes for the CloudFront distribution to reach the Deployed Status from the previous step, during which time you may not be able to access the sample application.
  2. Fill out and submit the HTML form on the page:
    1. Complete the three form fields: Full Name, Email Address, and Phone Number.
    2. Choose Submit.
      Screenshot of completing the sample application form
      Notice that the application response includes the form values. The phone number returns the following ciphertext encryption using your public key. This ciphertext has been stored in DynamoDB.
      Screenshot of the phone number as ciphertext
  3. Execute the Lambda decryption function to download ciphertext from DynamoDB and decrypt the phone number using the private key:
    1. In the CloudFormation stack Outputs, locate DecryptFunction and click the URL to open the Lambda console.
    2. Configure a test event using the “Hello World” template.
    3. Choose the Test button.
  4. View the encrypted and decrypted phone number data.
    Screenshot of the encrypted and decrypted phone number data

Summary

In this blog post, we showed you how to use CloudFront field-level encryption to encrypt sensitive data at edge locations and help prevent access from unauthorized systems. The source code for this solution is available on GitHub. For additional information about field-level encryption, see the documentation.

If you have comments about this post, submit them in the “Comments” section below. If you have questions about or issues implementing this solution, please start a new thread on the CloudFront forum.

– Alex and Cameron

How to Manage Amazon GuardDuty Security Findings Across Multiple Accounts

Post Syndicated from Tom Stickle original https://aws.amazon.com/blogs/security/how-to-manage-amazon-guardduty-security-findings-across-multiple-accounts/

Introduced at AWS re:Invent 2017, Amazon GuardDuty is a managed threat detection service that continuously monitors for malicious or unauthorized behavior to help you protect your AWS accounts and workloads. In an AWS Blog post, Jeff Barr shows you how to enable GuardDuty to monitor your AWS resources continuously. That blog post shows how to get started with a single GuardDuty account and provides an overview of the features of the service. Your security team, though, will probably want to use GuardDuty to monitor a group of AWS accounts continuously.

In this post, I demonstrate how to use GuardDuty to monitor a group of AWS accounts and have their findings routed to another AWS account—the master account—that is owned by a security team. The method I demonstrate in this post is especially useful if your security team is responsible for monitoring a group of AWS accounts over which it does not have direct access—known as member accounts. In this solution, I simplify the work needed to enable GuardDuty in member accounts and configure findings by simplifying the process, which I do by enabling GuardDuty in the master account and inviting member accounts.

Enable GuardDuty in a master account and invite member accounts

To get started, you must enable GuardDuty in the master account, which will receive GuardDuty findings. The master account should be managed by your security team, and it will display the findings from all member accounts. The master account can be reverted later by removing any member accounts you add to it. Adding member accounts is a two-way handshake mechanism to ensure that administrators from both the master and member accounts formally agree to establish the relationship.

To enable GuardDuty in the master account and add member accounts:

  1. Navigate to the GuardDuty console.
  2. In the navigation pane, choose Accounts.
    Screenshot of the Accounts choice in the navigation pane
  1. To designate this account as the GuardDuty master account, start adding member accounts:
    • You can add individual accounts by choosing Add Account, or you can add a list of accounts by choosing Upload List (.csv).
  1. Now, add the account ID and email address of the member account, and choose Add. (If you are uploading a list of accounts, choose Browse, choose the .csv file with the member accounts [one email address and account ID per line], and choose Add accounts.)
    Screenshot of adding an account

For security reasons, AWS checks to make sure each account ID is valid and that you’ve entered each member account’s email address that was used to create the account. If a member account’s account ID and email address do not match, GuardDuty does not send an invitation.
Screenshot showing the Status of Invite

  1. After you add all the member accounts you want to add, you will see them listed in the Member accounts table with a Status of Invite. You don’t have to individually invite each account—you can choose a group of accounts and when you choose to invite one account in the group, all accounts are invited.
  2. When you choose Invite for each member account:
    1. AWS checks to make sure the account ID is valid and the email address provided is the email address of the member account.
    2. AWS sends an email to the member account email address with a link to the GuardDuty console, where the member account owner can accept the invitation. You can add a customized message from your security team. Account owners who receive the invitation must sign in to their AWS account to accept the invitation. The service also sends an invitation through the AWS Personal Health Dashboard in case the member email address is not monitored. This invitation appears in the member account under the AWS Personal Health Dashboard alert bell on the AWS Management Console.
    3. A pending-invitation indicator is shown on the GuardDuty console of the member account, as shown in the following screenshot.
      Screenshot showing the pending-invitation indicator

When the invitation is sent by email, it is sent to the account owner of the GuardDuty member account.
Screenshot of the invitation sent by email

The account owner can click the link in the email invitation or the AWS Personal Health Dashboard message, or the account owner can sign in to their account and navigate to the GuardDuty console. In all cases, the member account displays the pending invitation in the member account’s GuardDuty console with instructions for accepting the invitation. The GuardDuty console walks the account owner through accepting the invitation, including enabling GuardDuty if it is not already enabled.

If you prefer to work in the AWS CLI, you can enable GuardDuty and accept the invitation. To do this, call CreateDetector to enable GuardDuty, and then call AcceptInvitation, which serves the same purpose as accepting the invitation in the GuardDuty console.

  1. After the member account owner accepts the invitation, the Status in the master account is changed to Monitored. The status helps you track the status of each AWS account that you invite.
    Screenshot showing the Status change to Monitored

You have enabled GuardDuty on the member account, and all findings will be forwarded to the master account. You can now monitor the findings about GuardDuty member accounts from the GuardDuty console in the master account.

The member account owner can see GuardDuty findings by default and can control all aspects of the experience in the member account with AWS Identity and Access Management (IAM) permissions. Users with the appropriate permissions can end the multi-account relationship at any time by toggling the Accept button on the Accounts page. Note that ending the relationship changes the Status of the account to Resigned and also triggers a security finding on the side of the master account so that the security team knows the member account is no longer linked to the master account.

Working with GuardDuty findings

Most security teams have ticketing systems, chat operations, security information event management (SIEM) systems, or other security automation systems to which they would like to push GuardDuty findings. For this purpose, GuardDuty sends all findings as JSON-based messages through Amazon CloudWatch Events, a scalable service to which you can subscribe and to which AWS services can stream system events. To access these events, navigate to the CloudWatch Events console and create a rule that subscribes to the GuardDuty-related findings. You then can assign a target such as Amazon Kinesis Data Firehose that can place the findings in a number of services such as Amazon S3. The following screenshot is of the CloudWatch Events console, where I have a rule that pulls all events from GuardDuty and pushes them to a preconfigured AWS Lambda function.

Screenshot of a CloudWatch Events rule

The following example is a subset of GuardDuty findings that includes relevant context and information about the nature of a threat that was detected. In this example, the instanceId, i-00bb62b69b7004a4c, is performing Secure Shell (SSH) brute-force attacks against IP address 172.16.0.28. From a Lambda function, you can access any of the following fields such as the title of the finding and its description, and send those directly to your ticketing system.

Example GuardDuty findings

You can use other AWS services to build custom analytics and visualizations of your security findings. For example, you can connect Kinesis Data Firehose to CloudWatch Events and write events to an S3 bucket in a standard format, which can be encrypted with AWS Key Management Service and then compressed. You also can use Amazon QuickSight to build ad hoc dashboards by using AWS Glue and Amazon Athena. Similarly, you can place the data from Kinesis Data Firehose in Amazon Elasticsearch Service, with which you can use tools such as Kibana to build your own visualizations and dashboards.

Like most other AWS services, GuardDuty is a regional service. This means that when you enable GuardDuty in an AWS Region, all findings are generated and delivered in that region. If you are regulated by a compliance regime, this is often an important requirement to ensure that security findings remain in a specific jurisdiction. Because customers have let us know they would prefer to be able to enable GuardDuty globally and have all findings aggregated in one place, we intend to give the choice of regional or global isolation as we evolve this new service.

Summary

In this blog post, I have demonstrated how to use GuardDuty to monitor a group of GuardDuty member accounts and aggregate security findings in a central master GuardDuty account. You can use this solution whether or not you have direct control over the member accounts.

If you have comments about this blog post, submit them in the “Comments” section below. If you have questions about using GuardDuty, start a thread in the GuardDuty forum or contact AWS Support.

-Tom

Managing AWS Lambda Function Concurrency

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/managing-aws-lambda-function-concurrency/

One of the key benefits of serverless applications is the ease in which they can scale to meet traffic demands or requests, with little to no need for capacity planning. In AWS Lambda, which is the core of the serverless platform at AWS, the unit of scale is a concurrent execution. This refers to the number of executions of your function code that are happening at any given time.

Thinking about concurrent executions as a unit of scale is a fairly unique concept. In this post, I dive deeper into this and talk about how you can make use of per function concurrency limits in Lambda.

Understanding concurrency in Lambda

Instead of diving right into the guts of how Lambda works, here’s an appetizing analogy: a magical pizza.
Yes, a magical pizza!

This magical pizza has some unique properties:

  • It has a fixed maximum number of slices, such as 8.
  • Slices automatically re-appear after they are consumed.
  • When you take a slice from the pizza, it does not re-appear until it has been completely consumed.
  • One person can take multiple slices at a time.
  • You can easily ask to have the number of slices increased, but they remain fixed at any point in time otherwise.

Now that the magical pizza’s properties are defined, here’s a hypothetical situation of some friends sharing this pizza.

Shawn, Kate, Daniela, Chuck, Ian and Avleen get together every Friday to share a pizza and catch up on their week. As there is just six of them, they can easily all enjoy a slice of pizza at a time. As they finish each slice, it re-appears in the pizza pan and they can take another slice again. Given the magical properties of their pizza, they can continue to eat all they want, but with two very important constraints:

  • If any of them take too many slices at once, the others may not get as much as they want.
  • If they take too many slices, they might also eat too much and get sick.

One particular week, some of the friends are hungrier than the rest, taking two slices at a time instead of just one. If more than two of them try to take two pieces at a time, this can cause contention for pizza slices. Some of them would wait hungry for the slices to re-appear. They could ask for a pizza with more slices, but then run the same risk again later if more hungry friends join than planned for.

What can they do?

If the friends agreed to accept a limit for the maximum number of slices they each eat concurrently, both of these issues are avoided. Some could have a maximum of 2 of the 8 slices, or other concurrency limits that were more or less. Just so long as they kept it at or under eight total slices to be eaten at one time. This would keep any from going hungry or eating too much. The six friends can happily enjoy their magical pizza without worry!

Concurrency in Lambda

Concurrency in Lambda actually works similarly to the magical pizza model. Each AWS Account has an overall AccountLimit value that is fixed at any point in time, but can be easily increased as needed, just like the count of slices in the pizza. As of May 2017, the default limit is 1000 “slices” of concurrency per AWS Region.

Also like the magical pizza, each concurrency “slice” can only be consumed individually one at a time. After consumption, it becomes available to be consumed again. Services invoking Lambda functions can consume multiple slices of concurrency at the same time, just like the group of friends can take multiple slices of the pizza.

Let’s take our example of the six friends and bring it back to AWS services that commonly invoke Lambda:

  • Amazon S3
  • Amazon Kinesis
  • Amazon DynamoDB
  • Amazon Cognito

In a single account with the default concurrency limit of 1000 concurrent executions, any of these four services could invoke enough functions to consume the entire limit or some part of it. Just like with the pizza example, there is the possibility for two issues to pop up:

  • One or more of these services could invoke enough functions to consume a majority of the available concurrency capacity. This could cause others to be starved for it, causing failed invocations.
  • A service could consume too much concurrent capacity and cause a downstream service or database to be overwhelmed, which could cause failed executions.

For Lambda functions that are launched in a VPC, you have the potential to consume the available IP addresses in a subnet or the maximum number of elastic network interfaces to which your account has access. For more information, see Configuring a Lambda Function to Access Resources in an Amazon VPC. For information about elastic network interface limits, see Network Interfaces section in the Amazon VPC Limits topic.

One way to solve both of these problems is applying a concurrency limit to the Lambda functions in an account.

Configuring per function concurrency limits

You can now set a concurrency limit on individual Lambda functions in an account. The concurrency limit that you set reserves a portion of your account level concurrency for a given function. All of your functions’ concurrent executions count against this account-level limit by default.

If you set a concurrency limit for a specific function, then that function’s concurrency limit allocation is deducted from the shared pool and assigned to that specific function. AWS also reserves 100 units of concurrency for all functions that don’t have a specified concurrency limit set. This helps to make sure that future functions have capacity to be consumed.

Going back to the example of the consuming services, you could set throttles for the functions as follows:

Amazon S3 function = 350
Amazon Kinesis function = 200
Amazon DynamoDB function = 200
Amazon Cognito function = 150
Total = 900

With the 100 reserved for all non-concurrency reserved functions, this totals the account limit of 1000.

Here’s how this works. To start, create a basic Lambda function that is invoked via Amazon API Gateway. This Lambda function returns a single “Hello World” statement with an added sleep time between 2 and 5 seconds. The sleep time simulates an API providing some sort of capability that can take a varied amount of time. The goal here is to show how an API that is underloaded can reach its concurrency limit, and what happens when it does.
To create the example function

  1. Open the Lambda console.
  2. Choose Create Function.
  3. For Author from scratch, enter the following values:
    1. For Name, enter a value (such as concurrencyBlog01).
    2. For Runtime, choose Python 3.6.
    3. For Role, choose Create new role from template and enter a name aligned with this function, such as concurrencyBlogRole.
  4. Choose Create function.
  5. The function is created with some basic example code. Replace that code with the following:

import time
from random import randint
seconds = randint(2, 5)

def lambda_handler(event, context):
time.sleep(seconds)
return {"statusCode": 200,
"body": ("Hello world, slept " + str(seconds) + " seconds"),
"headers":
{
"Access-Control-Allow-Headers": "Content-Type,X-Amz-Date,Authorization,X-Api-Key,X-Amz-Security-Token",
"Access-Control-Allow-Methods": "GET,OPTIONS",
}}

  1. Under Basic settings, set Timeout to 10 seconds. While this function should only ever take up to 5-6 seconds (with the 5-second max sleep), this gives you a little bit of room if it takes longer.

  1. Choose Save at the top right.

At this point, your function is configured for this example. Test it and confirm this in the console:

  1. Choose Test.
  2. Enter a name (it doesn’t matter for this example).
  3. Choose Create.
  4. In the console, choose Test again.
  5. You should see output similar to the following:

Now configure API Gateway so that you have an HTTPS endpoint to test against.

  1. In the Lambda console, choose Configuration.
  2. Under Triggers, choose API Gateway.
  3. Open the API Gateway icon now shown as attached to your Lambda function:

  1. Under Configure triggers, leave the default values for API Name and Deployment stage. For Security, choose Open.
  2. Choose Add, Save.

API Gateway is now configured to invoke Lambda at the Invoke URL shown under its configuration. You can take this URL and test it in any browser or command line, using tools such as “curl”:


$ curl https://ofixul557l.execute-api.us-east-1.amazonaws.com/prod/concurrencyBlog01
Hello world, slept 2 seconds

Throwing load at the function

Now start throwing some load against your API Gateway + Lambda function combo. Right now, your function is only limited by the total amount of concurrency available in an account. For this example account, you might have 850 unreserved concurrency out of a full account limit of 1000 due to having configured a few concurrency limits already (also the 100 concurrency saved for all functions without configured limits). You can find all of this information on the main Dashboard page of the Lambda console:

For generating load in this example, use an open source tool called “hey” (https://github.com/rakyll/hey), which works similarly to ApacheBench (ab). You test from an Amazon EC2 instance running the default Amazon Linux AMI from the EC2 console. For more help with configuring an EC2 instance, follow the steps in the Launch Instance Wizard.

After the EC2 instance is running, SSH into the host and run the following:


sudo yum install go
go get -u github.com/rakyll/hey

“hey” is easy to use. For these tests, specify a total number of tests (5,000) and a concurrency of 50 against the API Gateway URL as follows(replace the URL here with your own):


$ ./go/bin/hey -n 5000 -c 50 https://ofixul557l.execute-api.us-east-1.amazonaws.com/prod/concurrencyBlog01

The output from “hey” tells you interesting bits of information:


$ ./go/bin/hey -n 5000 -c 50 https://ofixul557l.execute-api.us-east-1.amazonaws.com/prod/concurrencyBlog01

Summary:
Total: 381.9978 secs
Slowest: 9.4765 secs
Fastest: 0.0438 secs
Average: 3.2153 secs
Requests/sec: 13.0891
Total data: 140024 bytes
Size/request: 28 bytes

Response time histogram:
0.044 [1] |
0.987 [2] |
1.930 [0] |
2.874 [1803] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
3.817 [1518] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
4.760 [719] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
5.703 [917] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
6.647 [13] |
7.590 [14] |
8.533 [9] |
9.477 [4] |

Latency distribution:
10% in 2.0224 secs
25% in 2.0267 secs
50% in 3.0251 secs
75% in 4.0269 secs
90% in 5.0279 secs
95% in 5.0414 secs
99% in 5.1871 secs

Details (average, fastest, slowest):
DNS+dialup: 0.0003 secs, 0.0000 secs, 0.0332 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0046 secs
req write: 0.0000 secs, 0.0000 secs, 0.0005 secs
resp wait: 3.2149 secs, 0.0438 secs, 9.4472 secs
resp read: 0.0000 secs, 0.0000 secs, 0.0004 secs

Status code distribution:
[200] 4997 responses
[502] 3 responses

You can see a helpful histogram and latency distribution. Remember that this Lambda function has a random sleep period in it and so isn’t entirely representational of a real-life workload. Those three 502s warrant digging deeper, but could be due to Lambda cold-start timing and the “second” variable being the maximum of 5, causing the Lambda functions to time out. AWS X-Ray and the Amazon CloudWatch logs generated by both API Gateway and Lambda could help you troubleshoot this.

Configuring a concurrency reservation

Now that you’ve established that you can generate this load against the function, I show you how to limit it and protect a backend resource from being overloaded by all of these requests.

  1. In the console, choose Configure.
  2. Under Concurrency, for Reserve concurrency, enter 25.

  1. Click on Save in the top right corner.

You could also set this with the AWS CLI using the Lambda put-function-concurrency command or see your current concurrency configuration via Lambda get-function. Here’s an example command:


$ aws lambda get-function --function-name concurrencyBlog01 --output json --query Concurrency
{
"ReservedConcurrentExecutions": 25
}

Either way, you’ve set the Concurrency Reservation to 25 for this function. This acts as both a limit and a reservation in terms of making sure that you can execute 25 concurrent functions at all times. Going above this results in the throttling of the Lambda function. Depending on the invoking service, throttling can result in a number of different outcomes, as shown in the documentation on Throttling Behavior. This change has also reduced your unreserved account concurrency for other functions by 25.

Rerun the same load generation as before and see what happens. Previously, you tested at 50 concurrency, which worked just fine. By limiting the Lambda functions to 25 concurrency, you should see rate limiting kick in. Run the same test again:


$ ./go/bin/hey -n 5000 -c 50 https://ofixul557l.execute-api.us-east-1.amazonaws.com/prod/concurrencyBlog01

While this test runs, refresh the Monitoring tab on your function detail page. You see the following warning message:

This is great! It means that your throttle is working as configured and you are now protecting your downstream resources from too much load from your Lambda function.

Here is the output from a new “hey” command:


$ ./go/bin/hey -n 5000 -c 50 https://ofixul557l.execute-api.us-east-1.amazonaws.com/prod/concurrencyBlog01
Summary:
Total: 379.9922 secs
Slowest: 7.1486 secs
Fastest: 0.0102 secs
Average: 1.1897 secs
Requests/sec: 13.1582
Total data: 164608 bytes
Size/request: 32 bytes

Response time histogram:
0.010 [1] |
0.724 [3075] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1.438 [0] |
2.152 [811] |∎∎∎∎∎∎∎∎∎∎∎
2.866 [11] |
3.579 [566] |∎∎∎∎∎∎∎
4.293 [214] |∎∎∎
5.007 [1] |
5.721 [315] |∎∎∎∎
6.435 [4] |
7.149 [2] |

Latency distribution:
10% in 0.0130 secs
25% in 0.0147 secs
50% in 0.0205 secs
75% in 2.0344 secs
90% in 4.0229 secs
95% in 5.0248 secs
99% in 5.0629 secs

Details (average, fastest, slowest):
DNS+dialup: 0.0004 secs, 0.0000 secs, 0.0537 secs
DNS-lookup: 0.0002 secs, 0.0000 secs, 0.0184 secs
req write: 0.0000 secs, 0.0000 secs, 0.0016 secs
resp wait: 1.1892 secs, 0.0101 secs, 7.1038 secs
resp read: 0.0000 secs, 0.0000 secs, 0.0005 secs

Status code distribution:
[502] 3076 responses
[200] 1924 responses

This looks fairly different from the last load test run. A large percentage of these requests failed fast due to the concurrency throttle failing them (those with the 0.724 seconds line). The timing shown here in the histogram represents the entire time it took to get a response between the EC2 instance and API Gateway calling Lambda and being rejected. It’s also important to note that this example was configured with an edge-optimized endpoint in API Gateway. You see under Status code distribution that 3076 of the 5000 requests failed with a 502, showing that the backend service from API Gateway and Lambda failed the request.

Other uses

Managing function concurrency can be useful in a few other ways beyond just limiting the impact on downstream services and providing a reservation of concurrency capacity. Here are two other uses:

  • Emergency kill switch
  • Cost controls

Emergency kill switch

On occasion, due to issues with applications I’ve managed in the past, I’ve had a need to disable a certain function or capability of an application. By setting the concurrency reservation and limit of a Lambda function to zero, you can do just that.

With the reservation set to zero every invocation of a Lambda function results in being throttled. You could then work on the related parts of the infrastructure or application that aren’t working, and then reconfigure the concurrency limit to allow invocations again.

Cost controls

While I mentioned how you might want to use concurrency limits to control the downstream impact to services or databases that your Lambda function might call, another resource that you might be cautious about is money. Setting the concurrency throttle is another way to help control costs during development and testing of your application.

You might want to prevent against a function performing a recursive action too quickly or a development workload generating too high of a concurrency. You might also want to protect development resources connected to this function from generating too much cost, such as APIs that your Lambda function calls.

Conclusion

Concurrent executions as a unit of scale are a fairly unique characteristic about Lambda functions. Placing limits on how many concurrency “slices” that your function can consume can prevent a single function from consuming all of the available concurrency in an account. Limits can also prevent a function from overwhelming a backend resource that isn’t as scalable.

Unlike monolithic applications or even microservices where there are mixed capabilities in a single service, Lambda functions encourage a sort of “nano-service” of small business logic directly related to the integration model connected to the function. I hope you’ve enjoyed this post and configure your concurrency limits today!