Tag Archives: AWS Lambda

Building and Maintaining an Amazon S3 Metadata Index without Servers

Post Syndicated from Mike Deck original https://blogs.aws.amazon.com/bigdata/post/Tx2YRX3Y16CVQFZ/Building-and-Maintaining-an-Amazon-S3-Metadata-Index-without-Servers

Mike Deck is a Solutions Architect with AWS

Amazon S3 is a simple key-based object store whose scalability and low cost make it ideal for storing large datasets. Its design enables S3 to provide excellent performance for storing and retrieving objects based on a known key. Finding objects based on other attributes, however, requires doing a linear search using the LIST operation. Because each listing can return at most 1,000 keys, finding a particular object may require many requests. Because of these additional requests, implementing attribute-based queries against S3 alone can be challenging.

A common solution is to build an external index that maps queryable attributes to the S3 object key. This index can leverage data repositories that are built for fast lookups but might not be great at storing large data blobs. These types of indexes provide an entry point to your data that can be used by a variety of systems. For instance, the AWS Lambda search function described in "Building Scalable and Responsive Big Data Interfaces with AWS Lambda" could leverage an index instead of listing keys directly, to dramatically reduce the search space and improve performance.

In this post, I walk through an approach for building such an index using Amazon DynamoDB and AWS Lambda. With these technologies, you can create a high performance, low-cost index that scales and remains highly available without the need to maintain traditional servers.

Example use case

For the purposes of illustration, this post focuses on a common use case in which S3 is used as the primary data store for a fleet of data ingestion servers. For this example, assume you have a large number of Amazon EC2 instances that receive data sent by customers via a public API. These servers batch the data in one-minute increments and add an object per customer to S3 with the raw data items received in that minute. Because of the distributed nature of the instances, there’s no way to know which servers might store data for a given customer at any minute.

Assume that the servers upload objects with the following key structure:

[4-digit hash]/[server id]/[year]-[month]-[day]-[hour]-[minute]/[customer id]-[epoch timestamp].data

Example: a5b2/i-31cc02/2015-07-05-00-25/87423-1436055953839.data

This key structure allows sustained high request rates to S3, but it makes it difficult to find all keys for a given customer or server using S3 LIST operations. For instance, to list all the data objects a given customer uploaded within the last 24 hours, you would have to iterate over every key in the bucket and inspect the customer ID of each one.

In addition to the information encoded in the key, each object has a user-defined metadata field that specifies whether a transaction record is present in the data. A very small percentage of these objects contain transaction records. However, these records are particularly important for certain analyses.
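As a concrete illustration, an ingestion server might attach that metadata at upload time roughly as follows. This is a minimal Node.js sketch; the bucket name and the has-transaction metadata key are assumptions for illustration, not names defined by this post.

var aws = require('aws-sdk');
var s3 = new aws.S3();

// Hypothetical upload from an ingestion server; the 'has-transaction'
// metadata key and the bucket name are assumptions for illustration.
s3.putObject({
    Bucket: 'ingest-data-bucket',
    Key: 'a5b2/i-31cc02/2015-07-05-00-25/87423-1436055953839.data',
    Body: 'raw data items received during this minute',
    Metadata: {
        'has-transaction': 'true'   // set only when a transaction record is present
    }
}, function(err) {
    if (err) console.error('Upload failed: ' + err);
});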

For data collected in this manner there are a number of analyses you could run. This post focuses on building a metadata index to facilitate four specific reports and queries:

Find all objects for a given customer collected during a time range.

Calculate the total storage used for a given customer.

List all objects for a given customer that contain a transaction record.

Find all objects uploaded by a given server during a time range.

Architecture

In addition to fulfilling the functional requirements outlined above, the system has the following primary architectural goals:

Zero administration cost – This system should not require the creation or administration of any servers.

Scalable and elastic – The index should accommodate a growing number of entries seamlessly and scale up and down to handle changing rates of insertions and queries.

Automatic – Adding objects to the index should not require any additional operations beyond adding the object to S3.

DynamoDB is a NoSQL data store that can be used for storing the index itself, and AWS Lambda is a compute service that can run code to add index entries. Both of these services are fully managed, providing scalable and highly available components without the need to administer servers directly.

To update the index automatically when new objects are created, the AWS Lambda function that creates the index entries can be configured to execute in response to S3 object creation events.

The process is illustrated below.

(Diagram: an S3 object-creation event invokes the Lambda function, which writes an index entry to the DynamoDB table.)

Note: The example code in this post only handles object creation, but the same approach can also be used to remove entries from the index when objects are deleted from the bucket.

DynamoDB table design

The heart of the S3 object index is a DynamoDB table with one item per object, which associates various attributes with the object’s S3 key. Each item contains the S3 key, the size of the object, and any additional attributes to use for lookups.

Because DynamoDB tables are schema-less, the only things you need to define explicitly are the primary key and any additional indexes to support your queries. When selecting a primary key and indexes, you need to consider how the table will be queried. The following sections look at each of the four queries from the example and show an index optimized for each one.

In this example, you define all of your indexes up front. In a more iterative development context, you could define only the primary key to begin with and then add secondary indexes as your requirements demand.

1. Find all objects for a given customer collected during a time range

For this query type, use a hash and range primary key. By making the customer ID the hash key, you can find all the objects for a given customer, and if the range key is the timestamp, you can narrow the results to a specific time range. You can't use these two pieces of information alone, however, because there's no guarantee that two different servers won't upload an object for the same customer at the same time, which would violate the uniqueness requirement of the primary key. To guarantee uniqueness while still allowing queries on a time range, append the server ID to the timestamp for the range key. The resulting key layout is shown below.
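As a sketch, using the example object key from earlier, an index item could look like the following. The attribute names CustomerId, TsServerId, Key, and Size are illustrative assumptions rather than the names used in the accompanying sample code.

// A sketch of one index item for the example object key above.
// Attribute names are illustrative assumptions.
var exampleItem = {
    CustomerId: '87423',                    // hash key: customer ID
    TsServerId: '1436055953839-i-31cc02',   // range key: epoch timestamp + server ID
    Key: 'a5b2/i-31cc02/2015-07-05-00-25/87423-1436055953839.data',
    Size: 524288                            // object size in bytes
};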

2. Calculate the total storage used for a given customer

Because your primary key always allows you to retrieve all of the attributes for each item, you’ll also be able to use this index to track the storage consumed for each customer by retrieving all of the records for a given customer ID and summing the size attribute.
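Here is a minimal sketch of that calculation with the Node.js SDK's DocumentClient, using the assumed table and attribute names from the item sketch above and ignoring pagination for brevity.

var aws = require('aws-sdk');
var docClient = new aws.DynamoDB.DocumentClient();

// Sum the Size attribute across all index items for one customer.
// Table and attribute names are assumptions; paging through
// LastEvaluatedKey is omitted for brevity.
docClient.query({
    TableName: 'ingest-data-bucket-index',
    KeyConditionExpression: 'CustomerId = :cid',
    ExpressionAttributeValues: { ':cid': '87423' }
}, function(err, data) {
    if (err) return console.error(err);
    var totalBytes = data.Items.reduce(function(sum, item) {
        return sum + item.Size;
    }, 0);
    console.log('Total storage for customer 87423: ' + totalBytes + ' bytes');
});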

3. List all objects for a given customer that contain a transaction record

Because most of the objects won’t contain a transaction, you can use a sparse secondary index to enable fast lookups of a customer’s objects that do contain transactions. To create a sparse index, you need a “HasTransaction” attribute that is present only when a transaction exists in the object. When no transaction is present, omit the attribute entirely.

For this index, use the same customer ID hash key and set the range key to the “HasTransaction” attribute. Because this index’s hash key is the same as the primary key you can define the index to be a local secondary index.
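A query against that sparse index might look like the following sketch. The index name and attribute names are assumptions; because only items carrying the HasTransaction attribute appear in the index, the result set contains only objects with transaction records.

var aws = require('aws-sdk');
var docClient = new aws.DynamoDB.DocumentClient();

// Query the sparse local secondary index; names are assumptions.
docClient.query({
    TableName: 'ingest-data-bucket-index',
    IndexName: 'HasTransaction-index',
    KeyConditionExpression: 'CustomerId = :cid',
    ExpressionAttributeValues: { ':cid': '87423' }
}, function(err, data) {
    if (err) return console.error(err);
    data.Items.forEach(function(item) {
        console.log(item.Key);   // S3 keys of objects that contain a transaction record
    });
});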

4. Find all objects uploaded by a given server during a time range

This query will require a global secondary index since the lookup will use a different hash key than the primary key.  Use the server ID as the hash key and reuse the concatenated timestamp and server ID attribute for the range key. Because global secondary indexes do not have the same uniqueness constraint as primary keys, you don’t need to worry about including the customer ID in this index.
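A sketch of that query follows, assuming each item also stores the server ID in its own ServerId attribute (needed to use it as the GSI hash key) and using illustrative index and attribute names.

var aws = require('aws-sdk');
var docClient = new aws.DynamoDB.DocumentClient();

// Query the global secondary index keyed on ServerId, narrowing by the
// timestamp-prefixed range key. Index and attribute names are assumptions.
docClient.query({
    TableName: 'ingest-data-bucket-index',
    IndexName: 'ServerId-TsServerId-index',
    KeyConditionExpression: 'ServerId = :sid AND TsServerId BETWEEN :from AND :to',
    ExpressionAttributeValues: {
        ':sid': 'i-31cc02',
        ':from': '1436000000000',   // start of the time range (epoch ms)
        ':to': '1436099999999'      // end of the time range (epoch ms)
    }
}, function(err, data) {
    if (err) return console.error(err);
    console.log('Objects uploaded by i-31cc02 in range: ' + data.Count);
});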

Lambda function overview

Now that you have your DynamoDB table defined, you can build the Lambda function that handles the object creation events fired by S3. The event handler needs to complete the following tasks for each object added to your bucket:

Extract the key and object size from the event data.

Request the user-defined metadata fields for the object from S3.

Determine the name of the index DynamoDB table.

Put an item into the table.

Steps 1, 2, and 4 are very straightforward, and are shown in the example code that accompanies this post.

Determining the name of the DynamoDB table to use can be done in several ways. For simplicity, the code example uses a naming convention in which an “-index” suffix is appended to the bucket name; this way, the same Lambda function can be reused on multiple buckets. Alternatives would be to hard-code the index table name in the function, or to use the event notification configuration ID to encode the table name in the S3 event itself.
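Putting the pieces together, a minimal sketch of the indexing function might look like the following. It uses the same assumed attribute names, metadata key, and “-index” naming convention as the earlier sketches; the actual sample code in the repository may differ in its details.

var aws = require('aws-sdk');
var s3 = new aws.S3();
var docClient = new aws.DynamoDB.DocumentClient();

exports.handler = function(event, context) {
    var record = event.Records[0];
    var bucket = record.s3.bucket.name;
    var key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));
    var size = record.s3.object.size;

    // Key layout: [hash]/[server id]/[date]/[customer id]-[timestamp].data
    var parts = key.split('/');
    var serverId = parts[1];
    var fileName = parts[3].replace(/\.data$/, '');
    var customerId = fileName.split('-')[0];
    var timestamp = fileName.split('-')[1];

    // Fetch the user-defined metadata to see whether a transaction record is present.
    s3.headObject({ Bucket: bucket, Key: key }, function(err, data) {
        if (err) return context.done('Error', 'headObject failed: ' + err);

        var item = {
            CustomerId: customerId,
            TsServerId: timestamp + '-' + serverId,
            ServerId: serverId,
            Key: key,
            Size: size
        };
        // Sparse attribute: set only when a transaction is present.
        if (data.Metadata['has-transaction'] === 'true') {
            item.HasTransaction = 'true';
        }

        docClient.put({
            TableName: bucket + '-index',   // the "-index" naming convention described above
            Item: item
        }, function(err) {
            if (err) return context.done('Error', 'putItem failed: ' + err);
            context.done();
        });
    });
};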

Practical considerations

The approach described in this post is an effective way to build and maintain an index for S3 buckets across a variety of usage patterns, but there are some issues you should consider before using this architecture in production.

Error Handling

While all of the services used for this index are designed to be highly available, there’s always the potential that the indexing function could encounter an error. We can write our function defensively and handle many scenarios gracefully, but we also need a mechanism for dealing with unrecoverable failures.

Fortunately, Lambda functions create and write to Amazon CloudWatch log streams by default. Each invocation of the function is logged. If it does not complete successfully, there is a record of what caused the failure. You can also create a CloudWatch alarm that notifies a human whenever there is an error that our automated process couldn’t deal with, so that the problem can be investigated and remedied.

Object creation rate

When configuring your index, consider the rate at which objects will be created in S3 so you can set the provisioned throughput for the DynamoDB table and the concurrency limits for the Lambda function appropriately. This style of index generally requires DynamoDB write capacity equal to the peak object creation rate; for example, a fleet that creates 200 objects per second at peak needs roughly 200 write capacity units, because each index item is well under 1 KB. For more information about provisioning throughput, see the Use Burst Capacity Sparingly section in the Guidelines for Working with Tables topic.

You should also test your Lambda function under various loads to determine its concurrency requirements. After you’ve determined the maximum request rate and concurrent invocations needed to support your usage patterns, you can request an appropriate increase to the default limits if necessary.

Performance Tuning

Depending on your AWS Lambda function’s complexity, you may need to adjust the available resources (memory, CPU, and network). You can adjust the memory allocated to your function at any time and AWS Lambda assigns proportional CPU and network resources based on that value.

Sample code and query examples

The AWS Big Data Blog’s GitHub repository contains sample code and instructions for deploying this system.

I’ve also created a video that demonstrates deploying the sample code.

Conclusion

By leveraging S3’s integration with other fully managed AWS services, you can build extremely useful extensions with minimal development and ongoing administrative cost. Because Lambda and DynamoDB provide highly flexible platforms for executing arbitrary code and storing schema-less data, respectively, you can use the approach described in this post to build sophisticated solutions without the operational burden of provisioning and maintaining traditional servers.

Automatically Deploy from Amazon S3 using AWS CodeDeploy

Post Syndicated from Surya Bala original http://blogs.aws.amazon.com/application-management/post/Tx3TPMTH0EVGA64/Automatically-Deploy-from-Amazon-S3-using-AWS-CodeDeploy

AWS CodeDeploy automates software deployments to any instance, including Amazon EC2 instances and instances running on-premises. It helps you avoid downtime during deployments and provides centralized control over your applications, instances, deployments, and deployment configurations. You can learn more about CodeDeploy here. This post explains how to automatically start a CodeDeploy deployment when you upload your application to Amazon S3. We will use AWS Lambda to notify CodeDeploy as soon as a new application revision is uploaded.

AWS Lambda is a compute service that runs your code in response to events. An event is generated by supported AWS services and Lambda executes your code as soon as these events fire. You can learn more about Lambda here.

Prerequisites

Lambda Execution Role

We need to set up an execution role that allows Lambda to run our function, which creates a deployment. To create the execution role, copy the following policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:*"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::BUCKET_NAME/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": "codedeploy:GetDeploymentConfig",
      "Resource": "arn:aws:codedeploy:us-east-1:123ACCOUNTID:deploymentconfig:*"
    },
    {
      "Effect": "Allow",
      "Action": "codedeploy:RegisterApplicationRevision",
      "Resource": "arn:aws:codedeploy:us-east-1:123ACCOUNTID:application:*"
    },
    {
      "Effect": "Allow",
      "Action": "codedeploy:GetApplicationRevision",
      "Resource": "arn:aws:codedeploy:us-east-1:123ACCOUNTID:application:*"
    },
    {
      "Effect": "Allow",
      "Action": "codedeploy:CreateDeployment",
      "Resource": "arn:aws:codedeploy:us-east-1:123ACCOUNTID:deploymentgroup:*"
    }
  ]
}

Make sure you replace BUCKET_NAME with the S3 bucket to which you’ll upload your CodeDeploy application revision. You will also need to replace “us-east-1” if you are using a different region and replace “123ACCOUNTID” with your AWS account ID that is found on your Account Settings page.

Go to IAM Policies page.

Click on Create Policy.

Select Create Your Own Policy.

Set the Policy Name to CodeDeployDeploymentPolicy and paste the above CodeDeploy deployment policy in the Policy Document section.

Go to IAM Roles page.

Click on Create New Role.

Input the name LambdaExecutionRole and click Next Step.

Under Select Role Type, choose AWS Service Roles, select AWS Lambda, and click Next Step.

In the Attach Policy page select CodeDeployDeploymentPolicy and click Next Step.

Click on Create Role to create the role.

Once you’ve created the Lambda execution role, the Trust Relationships policy will look like the following:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
This role allows the function to write to Amazon CloudWatch Logs, perform GetObject operations on the specified S3 bucket, and perform deployments using CodeDeploy. The CloudWatch Logs permission is optional; it lets us log exceptions if something goes wrong, which is handy during debugging. The trust policy grants Lambda permission to perform the allowed actions on the user’s behalf.

Lambda Function

The Lambda function gets a notification from Amazon S3. The event contains the source bucket and key. The function tries to detect the object’s type by looking at the extension; by default, it assumes tar. It then calls headObject on the S3 bucket and key to read the object’s user-defined metadata, where it expects two parameters: application-name and deploymentgroup-name.

These values specify the CodeDeploy application and deployment group to which you want to auto-deploy this S3 object. We will talk about how these values are passed to the Lambda function shortly.

var aws = require('aws-sdk');
var s3 = new aws.S3({apiVersion: '2006-03-01'});
var codedeploy = new aws.CodeDeploy();

exports.handler = function(event, context) {
    var artifact_type;
    var bucket;
    var key;

    /* runtime functions */
    function getS3ObjectAndCreateDeployment() {
        // Head the S3 object to fetch the application-name and deploymentgroup-name metadata.
        s3.headObject({
            Bucket: bucket,
            Key: key
        }, function(err, data) {
            if (err) {
                context.done('Error', 'Error getting s3 object: ' + err);
            } else {
                console.log('Creating deployment');
                createDeployment(data);
            }
        });
    }

    function createDeployment(data) {
        if (!data.Metadata['application-name'] || !data.Metadata['deploymentgroup-name']) {
            console.error('application-name and deploymentgroup-name object metadata must be set.');
            context.done();
            return;
        }
        var params = {
            applicationName: data.Metadata['application-name'],
            deploymentGroupName: data.Metadata['deploymentgroup-name'],
            description: 'Lambda invoked codedeploy deployment',
            ignoreApplicationStopFailures: false,
            revision: {
                revisionType: 'S3',
                s3Location: {
                    bucket: bucket,
                    bundleType: artifact_type,
                    key: key
                }
            }
        };
        codedeploy.createDeployment(params,
            function(err, data) {
                if (err) {
                    context.done('Error', 'Error creating deployment: ' + err);
                } else {
                    console.log(data); // successful response
                    console.log('Finished executing lambda function');
                    context.done();
                }
            });
    }

    console.log('Received event:');
    console.log(JSON.stringify(event, null, ' '));

    // Get the bucket and key from the S3 event notification.
    bucket = event.Records[0].s3.bucket.name;
    key = event.Records[0].s3.object.key;

    // Infer the bundle type from the file extension; default to tar.
    var tokens = key.split('.');
    artifact_type = tokens[tokens.length - 1];
    if (artifact_type == 'gz') {
        artifact_type = 'tgz';
    } else if (['zip', 'tar', 'tgz'].indexOf(artifact_type) < 0) {
        artifact_type = 'tar';
    }

    getS3ObjectAndCreateDeployment();
};

Registering the Lambda Function

Registering a Lambda function is simple. Just follow these steps:

Go to the AWS Lambda console.

Click Create A Lambda Function.

Provide a Name and Description for the function.

Paste the above code snippet in the Lambda Function Code section, replacing the default contents.

Leave the handler name as ‘handler’.

For Role name, select the Lambda execution role that you created.

You can increase the Memory and Timeout under Advanced Settings if you are going to upload a large application revision (larger than 5 MB, for example). Otherwise, use the default values.

Click Create Lambda Function.

Select the Lambda function, click Actions and select Add event source.

Set the Event source type to S3.

Choose your S3 bucket.

Set the Event type to Object Created.

Click Submit.

Upload to Amazon S3

Once all the above steps are done, you should be able to upload new revisions of your application to the configured S3 bucket. Make sure you specify the following custom metadata while uploading.

application-name

deploymentgroup-name

This way, Lambda knows which application and deployment group to create the deployment for. If you are uploading via the Amazon S3 console, do the following:

Go to the S3 console and click the bucket to which you are going to upload your application revision.

Click Upload.

Add your CodeDeploy bundle by clicking Add Files.

Click Set Details >.

Click Set Permissions >.

Click Set Metadata >.

Click Add more metadata.

After adding the required metadata, click Start Upload.

Note: Custom object metadata should be prefixed with x-amz-meta-. For example, x-amz-meta-application-name or x-amz-meta-deploymentgroup-name. Amazon S3 uses this prefix to distinguish the user metadata from other headers.

If you forget to specify this object metadata during the S3 upload, the Lambda function logs an error to Amazon CloudWatch Logs.
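If you upload revisions with the AWS SDK rather than the console, you can attach the same metadata directly in the call. Here is a minimal Node.js sketch with placeholder bucket, key, application, and deployment group names:

var aws = require('aws-sdk');
var fs = require('fs');
var s3 = new aws.S3();

// Upload a revision bundle with the metadata the Lambda function expects.
// Bucket, key, application, and deployment group names are placeholders.
s3.putObject({
    Bucket: 'my-codedeploy-bucket',
    Key: 'myapp-revision.zip',
    Body: fs.readFileSync('myapp-revision.zip'),
    Metadata: {
        'application-name': 'MyApplication',
        'deploymentgroup-name': 'MyDeploymentGroup'
    }
}, function(err) {
    if (err) console.error('Upload failed: ' + err);
    else console.log('Revision uploaded; the Lambda function will start the deployment.');
});

Note that the SDK adds the x-amz-meta- prefix for you, so you pass the bare application-name and deploymentgroup-name keys in the Metadata map.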

Other Integrations

This Lambda function is just a simple example that showcases how to link AWS CodeDeploy with other AWS services. You can create similar functions to perform other CodeDeploy actions in response to other events. We would love to hear about your ideas or questions in the comments here or over in our forum.
