Tag Archives: versioning

Building End-to-End Continuous Delivery and Deployment Pipelines in AWS and TeamCity

Post Syndicated from Balaji Iyer original https://aws.amazon.com/blogs/devops/building-end-to-end-continuous-delivery-and-deployment-pipelines-in-aws-and-teamcity/

By Balaji Iyer, Janisha Anand, and Frank Li

Organizations that transform their applications to cloud-optimized architectures need a seamless, end-to-end continuous delivery and deployment workflow: from source code, to build, to deployment, to software delivery.

Continuous delivery is a DevOps software development practice where code changes are automatically built, tested, and prepared for a release to production. The practice expands on continuous integration by deploying all code changes to a testing environment and/or a production environment after the build stage. When continuous delivery is implemented properly, developers will always have a deployment-ready build artifact that has undergone a standardized test process.

Continuous deployment is the process of deploying application revisions to a production environment automatically, without explicit approval from a developer. This process makes the entire software release process automated. Features are released as soon as they are ready, providing maximum value to customers.

These two techniques enable development teams to deploy software rapidly, repeatedly, and reliably.

In this post, we will build an end-to-end continuous deployment and delivery pipeline using AWS CodePipeline (a fully managed continuous delivery service), AWS CodeDeploy (an automated application deployment service), and TeamCity’s AWS CodePipeline plugin. We will use AWS CloudFormation to set up and configure the end-to-end infrastructure and application stacks. The pipeline pulls source code from an Amazon S3 bucket, an AWS CodeCommit repository, or a GitHub repository. The source code will then be built and tested using TeamCity’s continuous integration server. Then AWS CodeDeploy will deploy the compiled and tested code to Amazon EC2 instances.

Prerequisites

You’ll need an AWS account, an Amazon EC2 key pair, and administrator-level permissions for AWS Identity and Access Management (IAM), AWS CloudFormation, AWS CodeDeploy, AWS CodePipeline, Amazon EC2, and Amazon S3.
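
If you don’t already have a key pair in the region you plan to use, you can create one with the AWS CLI before you start. The key name below is only an example; keep the resulting .pem file safe, because you will select this key pair in the CloudFormation parameters later.

$ aws ec2 create-key-pair --key-name teamcity-demo --query 'KeyMaterial' --output text > teamcity-demo.pem
$ chmod 400 teamcity-demo.pem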

Overview

Here are the steps:

  1. Continuous integration server setup using TeamCity.
  2. Continuous deployment using AWS CodeDeploy.
  3. Building a delivery pipeline using AWS CodePipeline.

In less than an hour, you’ll have an end-to-end, fully automated continuous integration, continuous deployment, and delivery pipeline for your application. Let’s get started!

1. Continuous integration server setup using TeamCity

Choose Launch Stack to launch an AWS CloudFormation stack that sets up a TeamCity server. If you’re not already signed in to the AWS Management Console, you will be prompted to enter your AWS credentials. This stack provides an automated way to set up a TeamCity server based on the instructions here. You can download the template used for this setup from here.

The CloudFormation template does the following:

  1. Installs and configures the TeamCity server and its dependencies in Linux.
  2. Installs the AWS CodePipeline plugin for TeamCity.
  3. Installs a sample application with build configurations.
  4. Installs PHP meta-runners required to build the sample application.
  5. Redirects TeamCity port 8111 to 80.
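
For reference, a port redirect like the one in step 5 is often implemented on Linux with an iptables NAT rule similar to the following. This is only an illustration of the idea; the template may use a different mechanism:

$ sudo iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-port 8111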

Choose the AWS region where the TeamCity server will be hosted. For this demo, choose US East (N. Virginia).

Select a region

On the Select Template page, choose Next.

On the Specify Details page, do the following:

  1. In Stack name, enter a name for the stack. The name must be unique in the region in which you are creating the stack.
  2. In InstanceType, choose the instance type that best fits your requirements. The default value is t2.medium.

Note: The default instance type exceeds what’s included in the AWS Free Tier. If you use t2.medium, there will be charges to your account. The cost will depend on how long you keep the CloudFormation stack and its resources.

  3. In KeyName, choose the name of your Amazon EC2 key pair.
  4. In SSHLocation, enter the IP address range that can be used to connect through SSH to the EC2 instance. SSH and HTTP access is limited to this IP address range.

Note: You can use checkip.amazonaws.com or whatsmyip.org to find your IP address. Remember to add /32 to a single IP address or, if you are representing a larger IP address space, use the correct CIDR block notation.
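
For example, you can capture your current public IP address and turn it into a /32 CIDR block like this (the address shown is from the documentation range and is only illustrative):

$ curl -s https://checkip.amazonaws.com
203.0.113.25

In that case, you would enter 203.0.113.25/32 for SSHLocation.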

Specify Details

Choose Next.

Although it’s optional, on the Options page, type TeamCityServer for the instance name. This is the name used in the CloudFormation template for the stack. It’s a best practice to name your instance, because it makes it easier to identify or modify resources later on.

Choose Next.

On the Review page, choose Create. It will take several minutes for AWS CloudFormation to create the resources for you.

Review

When the stack has been created, you will see a CREATE_COMPLETE message on the Overview tab in the Status column.

Events

You have now successfully created a TeamCity server. To access the server, on the EC2 Instance page, choose Public IP for the TeamCityServer instance.

Public DNS

On the TeamCity First Start page, choose Proceed.

TeamCity First Start

Although an internal database based on the HSQLDB database engine can be used for evaluation, TeamCity strongly recommends that you use an external database as a back-end TeamCity database in a production environment. An external database provides better performance and reliability. For more information, see the TeamCity documentation.

On the Database connection setup page, choose Proceed.

Database connection setup

The TeamCity server will start, which can take several minutes.

TeamCity is starting

Review and accept the TeamCity License Agreement, and then choose Continue.

Next, create an Administrator account. Type a user name and password, and then choose Create Account.

You can navigate to the demo project from Projects in the top-left corner.

Projects

Note: You can create a project from a repository URL (the option used in this demo), or you can connect to your managed Git repositories, such as GitHub or BitBucket. The demo app used in this example can be found here.

We have already created a sample project configuration. Under Build, choose Edit Settings, and then review the settings.

Demo App

Choose Build Step: PHP – PHPUnit.

Build Step

The fields on the Build Step page are already configured.

Build Step

Choose Run to start the build.

Run Test

To review the tests that are run as part of the build, choose Tests.

Build

Build

You can view any build errors by choosing Build log from the same drop-down list.

Now that we have a successful build, we will use AWS CodeDeploy to set up a continuous deployment pipeline.

2. Continuous deployment using AWS CodeDeploy

Choose Launch Stack to launch an AWS CloudFormation stack that will use AWS CodeDeploy to set up a sample deployment pipeline. If you’re not already signed in to the AWS Management Console, you will be prompted to enter your AWS credentials.

You can download the master template used for this setup from here. The template nests two CloudFormation templates to execute all dependent stacks cohesively.

  1. Template 1 creates a fleet of up to three EC2 instances (with a base operating system of Windows or Linux), associates an instance profile, and installs the AWS CodeDeploy agent. The CloudFormation template can be downloaded from here.
  2. Template 2 creates an AWS CodeDeploy deployment group and then installs a sample application. The CloudFormation template can be downloaded from here.

Choose the same AWS region you used when you created the TeamCity server (US East (N. Virginia)).

Note: The templates contain Amazon Machine Image (AMI) mappings for us-east-1, us-west-2, eu-west-1, and ap-southeast-2 only.
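
If you want to adapt the templates for a region that isn’t listed, one approach is to look up a current Amazon Linux AMI in that region and add it to the template’s mappings. The command below is a sketch of that lookup; the filter pattern assumes you want a recent Amazon Linux HVM image:

$ aws ec2 describe-images --region eu-central-1 --owners amazon \
    --filters "Name=name,Values=amzn-ami-hvm-*-x86_64-gp2" \
    --query 'sort_by(Images, &CreationDate)[-1].ImageId' --output text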

On the Select Template page, choose Next.


On the Specify Details page, in Stack name, type a name for the stack. In the Parameters section, do the following:

  1. In AppName, you can use the default, or you can type a name of your choice. The name must be between 2 and 15 characters long. It can contain lowercase alphanumeric characters, hyphens (-), and periods (.), but the name must start with an alphanumeric character.
  2. In DeploymentGroupName, you can use the default, or you can type a name of your choice. The name must be between 2 and 25 characters long. It can contain lowercase alphanumeric characters, hyphens (-), and periods (.), but the name must start with an alphanumeric character.


  3. In InstanceType, choose the instance type that best fits the requirements of your application.
  4. In InstanceCount, type the number of EC2 instances (up to three) that will be part of the deployment group.
  5. For Operating System, choose Linux or Windows.
  6. Leave TagKey and TagValue at their defaults. AWS CodeDeploy will use this tag key and value to locate the instances during deployments. For information about Amazon EC2 instance tags, see Working with Tags Using the Console.
  7. In S3Bucket and S3Key, type the bucket name and S3 key where the application is located. The default points to a sample application that will be deployed to instances in the deployment group. Based on what you selected in the OperatingSystem field, use the following values.
    Linux:
    S3Bucket: aws-codedeploy
    S3Key: samples/latest/SampleApp_Linux.zip
    Windows:
    S3Bucket: aws-codedeploy
    S3Key: samples/latest/SampleApp_Windows.zip
  8. In KeyName, choose the name of your Amazon EC2 key pair.
  9. In SSHLocation, enter the IP address range that can be used to connect through SSH/RDP to the EC2 instance.


Note: You can use checkip.amazonaws.com or whatsmyip.org to find your IP address. Remember to add /32 to a single IP address or, if you are representing a larger IP address space, use the correct CIDR block notation.

Follow the prompts, and then choose Next.

On the Review page, select the I acknowledge that this template might cause AWS CloudFormation to create IAM resources check box. Review the other settings, and then choose Create.


It will take several minutes for CloudFormation to create all of the resources on your behalf. The nested stacks will be launched sequentially. You can view progress messages on the Events tab in the AWS CloudFormation console.


You can see the newly created application and deployment groups in the AWS CodeDeploy console.


To verify that your application was deployed successfully, navigate to the DNS address of one of the instances.


Now that we have successfully created a deployment pipeline, let’s integrate it with AWS CodePipeline.

3. Building a delivery pipeline using AWS CodePipeline

We will now create a delivery pipeline in AWS CodePipeline with the TeamCity AWS CodePipeline plugin.

  1. Build a new pipeline with Source and Deploy stages using AWS CodePipeline.
  2. Create a custom action for the TeamCity Build stage.
  3. Create an AWS CodePipeline action trigger in TeamCity.
  4. Create a Build stage in the delivery pipeline for TeamCity.
  5. Publish the build artifact for deployment.

Step 1: Build a new pipeline with Source and Deploy stages using AWS CodePipeline.

In this step, we will create an Amazon S3 bucket to use as the artifact store for this pipeline.

  1. Install and configure the AWS CLI.
  2. Create an Amazon S3 bucket that will host the build artifact. Replace account-number with your AWS account number in the following steps.
    $ aws s3 mb s3://demo-app-build-account-number
  3. Enable bucket versioning:
    $ aws s3api put-bucket-versioning --bucket demo-app-build-account-number --versioning-configuration Status=Enabled
  4. Download the sample build artifact and upload it to the Amazon S3 bucket created in step 2.
  • OSX/Linux:
    $ wget -qO- https://s3.amazonaws.com/teamcity-demo-app/Sample_Linux_App.zip | aws s3 cp - s3://demo-app-build-account-number
  • Windows:
    $ wget -q https://s3.amazonaws.com/teamcity-demo-app/Sample_Windows_App.zip
    $ aws s3 cp ./Sample_Windows_App.zip s3://demo-app-build-account-number

Note: You can use AWS CloudFormation to perform these steps in an automated way. When you choose Launch Stack, this template will be used. Use the following commands to extract the Amazon S3 bucket name, enable versioning on the bucket, and copy over the sample artifact.

$ export BUCKET_NAME="$(aws cloudformation describe-stacks --stack-name S3BucketStack --output text --query 'Stacks[0].Outputs[?OutputKey==`S3BucketName`].OutputValue')"
$ aws s3api put-bucket-versioning --bucket $BUCKET_NAME --versioning-configuration Status=Enabled && wget https://s3.amazonaws.com/teamcity-demo-app/Sample_Linux_App.zip && aws s3 cp ./Sample_Linux_App.zip s3://$BUCKET_NAME

You can create a pipeline by using a CloudFormation stack or the AWS CodePipeline console.

Option 1: Use AWS CloudFormation to create a pipeline

We’re going to create a two-stage pipeline that uses a versioned Amazon S3 bucket and AWS CodeDeploy to release a sample application. (You can use an AWS CodeCommit repository or a GitHub repository as the source provider instead of Amazon S3.)

Choose Launch Stack to launch an AWS CloudFormation stack that sets up a new delivery pipeline using the application and deployment group created in an earlier step. If you’re not already signed in to the AWS Management Console, you will be prompted to enter your AWS credentials.

Choose the US East (N. Virginia) region, and then choose Next.

Leave the default options, and then choose Next.


On the Options page, choose Next.


Select the I acknowledge that AWS CloudFormation might create IAM resources check box, and then choose Create. This will create the delivery pipeline in AWS CodePipeline.

Option 2: Use the AWS CodePipeline console to create a pipeline

On the Create pipeline page, in Pipeline name, type a name for your pipeline, and then choose Next step.

Depending on where your source code is located, you can choose Amazon S3, AWS CodeCommit, or GitHub as your Source provider. The pipeline will be triggered automatically upon every check-in to your GitHub or AWS CodeCommit repository or when an artifact is published into the S3 bucket. In this example, we will be accessing the product binaries from an Amazon S3 bucket.

For Amazon S3 location, enter s3://demo-app-build-account-number/Sample_Linux_App.zip (or Sample_Windows_App.zip if you uploaded the Windows sample), and then choose Next step.

Note: AWS CodePipeline requires a versioned S3 bucket for source artifacts. Enable versioning for the S3 bucket where the source artifacts will be located.
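
You can confirm that versioning is enabled with a quick check (using the bucket created earlier; adjust the name to match yours):

$ aws s3api get-bucket-versioning --bucket demo-app-build-account-number
{
    "Status": "Enabled"
}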

On the Build page, choose No Build. We will update the build provider information later on.

For Deployment provider, choose CodeDeploy. For Application name and Deployment group, choose the application and deployment group we created in the deployment pipeline step, and then choose Next step.

An IAM role will provide the permissions required for AWS CodePipeline to perform the build actions and service calls.  If you already have a role you want to use with the pipeline, choose it on the AWS Service Role page. Otherwise, type a name for your role, and then choose Create role.  Review the predefined permissions, and then choose Allow. Then choose Next step.

 

For information about AWS CodePipeline access permissions, see the AWS CodePipeline Access Permissions Reference.


Review your pipeline, and then choose Create pipeline.


This will trigger AWS CodePipeline to execute the Source and Beta stages. The source artifact will be deployed to the AWS CodeDeploy deployment groups.


Now you can access the same DNS address of the AWS CodeDeploy instance to see the updated deployment. You will see the background color has changed to green and the page text has been updated.


We have now successfully created a delivery pipeline with two stages and integrated the deployment with AWS CodeDeploy. Now let’s integrate the Build stage with TeamCity.

Step 2: Create a custom action for the TeamCity Build stage

AWS CodePipeline includes a number of actions that help you configure build, test, and deployment resources for your automated release process. TeamCity is not included in the default actions, so we will create a custom action and then include it in our delivery pipeline.

TeamCity’s custom action type (Build/Test categories) can be integrated with AWS CodePipeline. It’s similar to Jenkins and Solano CI custom actions. TeamCity’s CodePipeline plugin will also create a job worker that will poll AWS CodePipeline for job requests for this custom action, execute the job, and return the status result to AWS CodePipeline.
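
Under the hood, a custom-action job worker uses the AWS CodePipeline job APIs. The TeamCity plugin handles this for you, but the following commands sketch the polling step so you can see what the worker is doing. The action type ID matches the custom action created in the next step; the job ID and nonce are placeholders returned by the first call:

$ aws codepipeline poll-for-jobs --action-type-id category=Build,owner=Custom,provider=TeamCity,version=1 --max-batch-size 1
$ aws codepipeline acknowledge-job --job-id <job-id> --nonce <nonce>

When the build finishes, the worker reports the result back with put-job-success-result or put-job-failure-result.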

The TeamCity AWS CodePipeline plugin is already installed on the TeamCity server we set up earlier. To learn more about installing TeamCity plugins, see the TeamCity documentation. We will now create a custom action to integrate TeamCity with AWS CodePipeline using a custom-action JSON file.

Download this file locally: https://github.com/JetBrains/teamcity-aws-codepipeline-plugin/blob/master/custom-action.json

Open a terminal session (Linux, OS X, Unix) or command prompt (Windows) on a computer where you have installed the AWS CLI. For information about setting up the AWS CLI, see here.

Use the AWS CLI to run the aws codepipeline create-custom-action-type command, specifying the JSON file you just downloaded (saved locally as teamcity-custom-action.json in this example).

For example, to create a build custom action:

$ aws codepipeline create-custom-action-type --cli-input-json file://teamcity-custom-action.json

This should result in an output similar to this:

{
    "actionType": {
        "inputArtifactDetails": {
            "maximumCount": 5,
            "minimumCount": 0
        },
        "actionConfigurationProperties": [
            {
                "description": "The expected URL format is http[s]://host[:port]",
                "required": true,
                "secret": false,
                "key": true,
                "queryable": false,
                "name": "TeamCityServerURL"
            },
            {
                "description": "Corresponding TeamCity build configuration external ID",
                "required": true,
                "secret": false,
                "key": true,
                "queryable": false,
                "name": "BuildConfigurationID"
            },
            {
                "description": "Must be unique, match the corresponding field in the TeamCity build trigger settings, satisfy regular expression pattern: [a-zA-Z0-9_-]+] and have length <= 20",
                "required": true,
                "secret": false,
                "key": true,
                "queryable": true,
                "name": "ActionID"
            }
        ],
        "outputArtifactDetails": {
            "maximumCount": 5,
            "minimumCount": 0
        },
        "id": {
            "category": "Build",
            "owner": "Custom",
            "version": "1",
            "provider": "TeamCity"
        },
        "settings": {
            "entityUrlTemplate": "{Config:TeamCityServerURL}/viewType.html?buildTypeId={Config:BuildConfigurationID}",
            "executionUrlTemplate": "{Config:TeamCityServerURL}/viewLog.html?buildId={ExternalExecutionId}&tab=buildResultsDiv"
        }
    }
}
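
You can verify that the custom action type was registered by listing the custom actions in your account:

$ aws codepipeline list-action-types --action-owner-type Custom --query 'actionTypes[?id.provider==`TeamCity`].id'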

Before you add the custom action to your delivery pipeline, make the following changes to the TeamCity build server. You can access the server by opening the Public IP of the TeamCityServer instance from the EC2 Instance page.


In TeamCity, choose Projects. Under Build Configuration Settings, choose Version Control Settings. You need to remove the version control trigger here so that the TeamCity build server will be triggered during the Source stage in AWS CodePipeline. Choose Detach.


Step 3: Create a new AWS CodePipeline action trigger in TeamCity

Now add a new AWS CodePipeline trigger in your build configuration. Choose Triggers, and then choose Add new trigger.


From the drop-down menu, choose AWS CodePipeline Action.


 

In the trigger settings, choose the AWS region in which you created your delivery pipeline. Enter your access key credentials, and for Action ID, type a unique name. You will need this ID when you add a TeamCity Build stage to the pipeline.


Step 4: Create a new Build stage in the delivery pipeline for TeamCity

Add a stage to the pipeline and name it Build.


From the drop-down menu, choose Build. In Action name, type a name for the action. In Build provider, choose TeamCity, and then choose Add action.



For TeamCity Action Configuration, use the following:

TeamCityServerURL:  http://<Public DNS address of the TeamCity build server>[:port]


BuildConfigurationID: In your TeamCity project, choose Build. You’ll find this ID (AwsDemoPhpSimpleApp_Build) under Build Configuration Settings.


ActionID: In your TeamCity project, choose Build. You’ll find this ID under Build Configuration Settings. Choose Triggers, and then choose AWS CodePipeline Action.


Next, choose input and output artifacts for the Build stage, and then choose Add action.


We will now publish a new artifact to the Amazon S3 artifact bucket we created earlier, so we can see the deployment of a new app and its progress through the delivery pipeline. The demo app used in this artifact can be found here for Linux or here for Windows.

Download the sample build artifact and upload it to the Amazon S3 bucket created in step 2.

OSX/Linux:

$ wget -qO- https://s3.amazonaws.com/teamcity-demo-app/PhpArtifact.zip | aws s3 cp - s3://demo-app-build-account-number

Windows:

$ wget -q https://s3.amazonaws.com/teamcity-demo-app/WindowsArtifact.zip
$ aws s3 cp ./WindowsArtifact.zip s3://demo-app-build-account-number

From the AWS CodePipeline dashboard, under delivery-pipeline, choose Edit.

Edit the Source stage by choosing the edit icon on the right.

Amazon S3 location:

Linux: s3://demo-app-build-account-number/PhpArtifact.zip

Windows: s3://demo-app-build-account-number/WindowsArtifact.zip

Under Output artifacts, make sure MyApp is displayed for Output artifact #1. This will be the input artifact for the Build stage.

The output artifact of the Build stage should be the input artifact of the Beta deployment stage (in this case, MyAppBuild).

Choose Update, and then choose Save pipeline changes. On the next page, choose Save and continue.

Step 5: Publish the build artifact for deployment

Step (a)

In TeamCity, on the Build Steps page, for Runner type, choose Command Line, and then add the following custom script to copy the source artifact to the TeamCity build checkout directory.

Note: This step is required only if your AWS CodePipeline source provider is either AWS CodeCommit or Amazon S3. If your source provider is GitHub, this step is redundant, because the artifact is copied over automatically by the TeamCity AWS CodePipeline plugin.

In Step name, enter a name for the Command Line runner to easily distinguish the context of the step.

Syntax:

$ cp -R %codepipeline.artifact.input.folder%/<CodePipeline-Name>/<build-input-artifact-name>/* %teamcity.build.checkoutDir%
$ unzip *.zip -d %teamcity.build.checkoutDir%
$ rm -rf %teamcity.build.checkoutDir%/*.zip

For Custom script, use the following commands:

cp -R %codepipeline.artifact.input.folder%/delivery-pipeline/MyApp/* %teamcity.build.checkoutDir%
unzip *.zip -d %teamcity.build.checkoutDir%
rm -rf %teamcity.build.checkoutDir%/*.zip


Step (b):

For Runner type, choose Command Line, and then add the following custom script to copy the build artifact to the output folder.

For Step name, enter a name for the Command Line runner.

Syntax:

$ mkdir -p %codepipeline.artifact.output.folder%/<CodePipeline-Name>/<build-output-artifact-name>/
$ cp -R %codepipeline.artifact.input.folder%/<CodePipeline-Name>/<build-input-artifact-name>/* %codepipeline.artifact.output.folder%/<CodePipeline-Name>/<build-output-artifact-name>/

For Custom script, use the following commands:

$ mkdir -p %codepipeline.artifact.output.folder%/delivery-pipeline/MyAppBuild/
$ cp -R %codepipeline.artifact.input.folder%/delivery-pipeline/MyApp/* %codepipeline.artifact.output.folder%/delivery-pipeline/MyAppBuild/

In Build Steps, choose Reorder build steps to ensure that the step that copies the source artifact is executed before the PHP – PHPUnit step.

Drag and drop Copy Source Artifact To Build Checkout Directory to make it the first build step, and then choose Apply.

Navigate to the AWS CodePipeline console. Choose the delivery pipeline, and then choose Release change. When prompted, choose Release.



The most recent change will run through the pipeline again. It might take a few moments before the status of the run is displayed in the pipeline view.

Here is what you’d see after AWS CodePipeline runs through all of the stages in the pipeline:

Let’s access one of the instances to see the new application deployment on the EC2 Instance page.

If your base operating system is Windows, accessing the public DNS address of one of the AWS CodeDeploy instances will result in the following page.

Windows: http://public-dns/

If your base operating system is Linux, when we access the public DNS address of one of the AWS CodeDeploy instances, we will see the following test page, which is the sample application.

Linux: http://public-dns/www/index.php

Congratulations! You’ve created an end-to-end deployment and delivery pipeline, from source code, to build, to deployment, in a fully automated way.

Summary

In this post, you learned how to build an end-to-end delivery and deployment pipeline on AWS. Specifically, you learned how to build an end-to-end, fully automated, continuous integration, continuous deployment, and delivery pipeline for your application, at scale, using AWS deployment and management services. You also learned how AWS CodePipeline can be easily extended through the use of custom triggers to integrate other services like TeamCity.

If you have questions or suggestions, please leave a comment below.

Going Serverless: Migrating an Express Application to Amazon API Gateway and AWS Lambda

Post Syndicated from Bryan Liston original https://aws.amazon.com/blogs/compute/going-serverless-migrating-an-express-application-to-amazon-api-gateway-and-aws-lambda/

Brett Andrews
Software Development Engineer

Amazon API Gateway recently released three new features that simplify the process of forwarding HTTP requests to your integration endpoint: greedy path variables, the ANY method, and proxy integration types. With this new functionality, it becomes incredibly easy to run HTTP applications in a serverless environment by leveraging the aws-serverless-express library.

In this post, I go through the process of porting an "existing" Node.js Express application onto API Gateway and AWS Lambda, and discuss some of the advantages, disadvantages, and current limitations. While I use Express in this post, the steps are similar for other Node.js frameworks, such as Koa, Hapi, vanilla, etc.

Modifying an existing Express application

Express is commonly used for both web applications and REST APIs. While the primary API Gateway function is to deliver APIs, it can certainly be used for delivering web apps/sites (HTML) as well. To cover both use cases, the Express app below exposes a web app on the root / resource, and a REST API on the /pets resource.

The goal of this walkthrough is for it to be complex enough to cover many of the limitations of this approach today (as comments in the code below), yet simple enough to follow along. To this end, you implement just the entry point of the Express application (commonly named app.js) and assume standard implementations of views and controllers (which are more insulated and thus less affected). You also use MongoDB, due to it being a popular choice in the Node.js community as well as providing a time-out edge case. For a greater AWS serverless experience, consider adopting Amazon DynamoDB.

//app.js
'use strict'
const path = require('path')
const express = require('express')
const bodyParser = require('body-parser')
const cors = require('cors')
const mongoose = require('mongoose')
// const session = require('express-session')
// const compress = require('compression')
// const sass = require('node-sass-middleware')

// Lambda does not allow you to configure environment variables, but dotenv is an
// excellent and simple solution, with the added benefit of allowing you to easily
// manage different environment variables per stage, developer, environment, etc.
require('dotenv').config()

const app = express()
const homeController = require('./controllers/home')
const petsController = require('./controllers/pets')

// MongoDB has a default timeout of 30s, which is the same timeout as API Gateway.
// Because API Gateway was initiated first, it also times out first. Reduce the
// timeout and kill the process so that the next request attempts to connect.
mongoose.connect(process.env.MONGODB_URI, { server: { socketOptions: { connectTimeoutMS: 10000 } } })
mongoose.connection.on('error', () => {
   console.error('Error connecting to MongoDB.')
   process.exit(1)
})

app.set('views', path.join(__dirname, 'views'))
app.set('view engine', 'pug')
app.use(cors())
app.use(bodyParser.json())
app.use(bodyParser.urlencoded({ extended: true }))

/*
* GZIP support is currently not available to API Gateway.
app.use(compress())

* node-sass is a native binary/library (aka Addon in Node.js) and thus must be
* compiled in the same environment (operating system) in which it will be run.
* If you absolutely need to use a native library, you can set up an Amazon EC2 instance
* running Amazon Linux for packaging your Lambda function.
* In the case of SASS, I recommend to build your CSS locally instead and
* deploy all static assets to Amazon S3 for improved performance.
const publicPath = path.join(__dirname, 'public')
app.use(sass({ src: publicPath, dest: publicPath, sourceMap: true}))
app.use(express.static(publicPath, { maxAge: 31557600000 }))

* Storing local state is unreliable due to automatic scaling. Consider going stateless (using REST),
* or use an external state store (for MongoDB, you can use the connect-mongo package)
app.use(session({ secret: process.env.SESSION_SECRET }))
*/

app.get('/', homeController.index)
app.get('/pets', petsController.listPets)
app.post('/pets', petsController.createPet)
app.get('/pets/:petId', petsController.getPet)
app.put('/pets/:petId', petsController.updatePet)
app.delete('/pets/:petId', petsController.deletePet)

/*
* aws-serverless-express communicates over a Unix domain socket. While it's not required
* to remove this line, I recommend doing so as it just sits idle.
app.listen(3000)
*/

// Export your Express configuration so that it can be consumed by the Lambda handler
module.exports = app

Assuming that you had the relevant code implemented in your views and controllers directories and a MongoDB server available, you could uncomment the listen line, run node app.js and have an Express application running at http://localhost:3000. The following "changes" made above were specific to API Gateway and Lambda:

  • Used dotenv to set environment variables.
  • Reduced the timeout for connecting to MongoDB so that API Gateway does not time out first.
  • Removed the compression middleware as API Gateway does not (currently) support GZIP.
  • Removed node-sass-middleware (I opted for serving static assets through S3, but if there is a particular native library your application absolutely needs, you can build/package your Lambda function on an EC2 instance).
  • Served static assets through S3/CloudFront. Not only is S3 a better option for static assets for performance reasons, API Gateway does not currently support binary data (e.g., images).
  • Removed session state for scalability (alternatively, you could have stored session state in MongoDB using connect-mongo).
  • Removed app.listen() as HTTP requests are not being sent over ports (not strictly required).
  • Exported the Express configuration so that you can consume it in your Lambda handler (more on this soon).

Going serverless with aws-serverless-express

In order for users to be able to hit the app (or for developers to consume the API), you first need to get it online. Because this app is going to be immensely popular, you obviously need to consider scalability, resiliency, and many other factors. Previously, you could provision some servers, launch them in multiple Availability Zones, configure Auto Scaling policies, ensure that the servers were healthy (and replace them if they weren’t), keep up-to-date with the latest security updates, and so on…. As a developer, you just care that your users are able to use the product; everything else is a distraction.

Enter serverless. By leveraging AWS services such as API Gateway and Lambda, you have zero servers to manage, automatic scaling out-of-the-box, and true pay-as-you-go: the promise of the cloud, finally delivered.

The example included in the aws-serverless-express library provides a good starting point for deploying and managing your serverless resources.

  1. Clone the library into a local directory git clone https://github.com/awslabs/aws-serverless-express.git
  2. From within the example directory, run npm run config <accountId> <bucketName> [region] (this modifies some of the files with your own settings).
  3. Edit the package-function command in package.json by removing "index.html" and adding "views" and "controllers" (the additional directories required for running your app).
  4. Copy the following files in the example directory into your existing project’s directory:

    • simple-proxy-api.yaml – A Swagger file that describes your API.
    • cloudformation.json – A CloudFormation template for creating the Lambda function and API.
    • package.json – You may already have a version of this file, in which case just copy the scripts and config sections. This includes some helpful npm scripts for managing your AWS resources, and testing and updating your Lambda function.
    • api-gateway-event.json – Used for testing your Lambda function locally.
    • lambda.js – The Lambda function, a thin wrapper around your Express application.

    Take a quick look at lambda.js so that you understand exactly what’s going on there. The aws-serverless-express library transforms the request from the client (via API Gateway) into a standard Node.js HTTP request object; sends this request to a special listener (a Unix domain socket); and then transforms it back for the response to API Gateway. It also starts your Express server listening on the Unix domain socket on the initial invocation of the Lambda function. Here it is in its entirety:

// lambda.js
'use strict'
const awsServerlessExpress = require('aws-serverless-express')
const app = require('./app')
const server = awsServerlessExpress.createServer(app)

exports.handler = (event, context) => awsServerlessExpress.proxy(server, event, context)

TIP: Everything outside of the handler function is executed only one time per container: that is, the first time your app receives a request (or the first request after several minutes of inactivity), and when it scales up additional containers.

Deploying

Now that you have more of an understanding of how API Gateway and Lambda communicate with your Express server, it’s time to release your app to the world.

From the project’s directory, run:

npm run setup

This command creates the Amazon S3 bucket specified earlier (if it does not yet exist); zips the necessary files and directories for your Lambda function and uploads it to S3; uploads simple-proxy-api.yaml to S3; creates the CloudFormation stack; and finally opens your browser to the AWS CloudFormation console where you can monitor the creation of your resources. To clean up the AWS resources created by this command, simply run npm run delete-stack. Additionally, if you specified a new S3 bucket, run npm run delete-bucket.
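
If you prefer to see roughly what the script automates, the equivalent AWS CLI calls look something like the following. The bucket, file, and stack names here are placeholders, and the real npm script also takes care of packaging and any template parameters for you:

$ aws s3 cp lambda-function.zip s3://your-artifact-bucket/lambda-function.zip
$ aws s3 cp simple-proxy-api.yaml s3://your-artifact-bucket/simple-proxy-api.yaml
$ aws cloudformation create-stack --stack-name AwsServerlessExpressStack --template-body file://cloudformation.json --capabilities CAPABILITY_IAM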

After the status changes to CREATE_COMPLETE (usually after a couple of minutes), you see three links in the Outputs section: one to the API Gateway console, another to the Lambda console, and most importantly one for your web app/REST API. Clicking the link to your API displays the web app; appending /pets in the browser address bar displays your list of pets. Your Express application is available online with automatic scalability and pay-per-request without having to manage a single server!

Additional features

Now that you have your REST API available to your users, take a quick look at some of the additional features made available by API Gateway and Lambda:

  • Usage plans for monetizing the API
  • Caching to improve performance
  • Authorizers for authentication and authorization microservices that determine access to your Express application
  • Stages and versioning and aliases when you need additional stages or environments (dev, beta, prod, etc.)
  • SDK generation to provide SDKs to consumers of your API (available in JavaScript, iOS, Android Java, and Android Swift)
  • API monitoring for logs and insights into usage

After running your Express application in a serverless environment for a while and learning more about the best practices, you may start to want more: more performance, more control, more microservices!

So how do you take your existing serverless Express application (a single Lambda function) and refactor it into microservices? You strangle it. Take a route, move the logic to a new Lambda function, and add a new resource or method to your API Gateway API. However, you’ll find that the tools provided to you by the aws-serverless-express example just don’t cut it for managing these additional functions. For that, you should check out Claudia; it even has support for aws-serverless-express.

Conclusion

To sum it all up, you took an existing application of moderate complexity, applied some minimal changes, and deployed it in just a couple of commands. You now have no servers to manage, automatic scaling out-of-the-box, true pay-as-you-go, loads of features provided by API Gateway, and as a bonus, a great path forward for refactoring into microservices.

If that’s not enough, or this server-side stuff doesn’t interest you and the front end is where you live, consider using aws-serverless-express for server-side rendering of your single page application.

If you have questions or suggestions, please comment below.

Deploy an App to an AWS OpsWorks Layer Using AWS CodePipeline

Post Syndicated from Daniel Huesch original http://blogs.aws.amazon.com/application-management/post/Tx2WKWC9RIY0RD8/Deploy-an-App-to-an-AWS-OpsWorks-Layer-Using-AWS-CodePipeline


AWS CodePipeline lets you create continuous delivery pipelines that automatically track code changes from sources such as AWS CodeCommit, Amazon S3, or GitHub. Now, you can use AWS CodePipeline as a code change-management solution for apps, Chef cookbooks, and recipes that you want to deploy with AWS OpsWorks.

This blog post demonstrates how you can create an automated pipeline for a simple Node.js app by using AWS CodePipeline and AWS OpsWorks. After you configure your pipeline, every time you update your Node.js app, AWS CodePipeline passes the updated version to AWS OpsWorks. AWS OpsWorks then deploys the updated app to your fleet of instances, leaving you to focus on improving your application. AWS makes sure that the latest version of your app is deployed.

Step 1: Upload app code to an Amazon S3 bucket

The Amazon S3 bucket must be in the same region in which you later create your pipeline in AWS CodePipeline. For now, AWS CodePipeline supports the AWS OpsWorks provider in the us-east-1 region only; all resources in this blog post should be created in the US East (N. Virginia) region. The bucket must also be versioned, because AWS CodePipeline requires a versioned source. For more information, see Using Versioning.
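
If you prefer the CLI, creating a suitable bucket and turning on versioning looks like this (the bucket name is just an example and must be globally unique):

$ aws s3 mb s3://my-opsworks-codepipeline-demo --region us-east-1
$ aws s3api put-bucket-versioning --bucket my-opsworks-codepipeline-demo --versioning-configuration Status=Enabled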

Upload your app to an Amazon S3 bucket

  1. Download a ZIP file of the AWS OpsWorks sample Node.js app, and save it to a convenient location on your local computer: https://s3.amazonaws.com/opsworks-codepipeline-demo/opsworks-nodejs-demo-app.zip.
  2. Open the Amazon S3 console at https://console.aws.amazon.com/s3/. Choose Create Bucket. Be sure to enable versioning.
  3. Choose the bucket that you created and upload the ZIP file that you saved in step 1.


     

  4. In the Properties pane for the uploaded ZIP file, make a note of the S3 link to the file. You will need the bucket name and the ZIP file name portion of this link to create your pipeline.

Step 2: Create an AWS OpsWorks to Amazon EC2 service role

1.     Go to the Identity and Access Management (IAM) service console, and choose Roles.
2.     Choose Create Role, and name it aws-opsworks-ec2-role-with-s3.
3.     In the AWS Service Roles section, choose Amazon EC2, and then choose the policy called AmazonS3ReadOnlyAccess.
4.     The new role should appear in the Roles dashboard.

Step 3: Create an AWS OpsWorks Chef 12 Linux stack

To use AWS OpsWorks as a provider for a pipeline, you must first have an AWS OpsWorks stack, a layer, and at least one instance in the layer. As a reminder, the Amazon S3 bucket to which you uploaded your app must be in the same region in which you later create your AWS OpsWorks stack and pipeline, US East (N. Virginia).

1.     In the OpsWorks console, choose Add Stack, and then choose a Chef 12 stack.
2.     Set the stack’s name to CodePipeline Demo and make sure the Default operating system is set to Linux.
3.     Enable Use custom Chef cookbooks.
4.     For Repository type, choose HTTP Archive, and then use the following cookbook repository on S3: https://s3.amazonaws.com/opsworks-codepipeline-demo/opsworks-nodejs-demo-cookbook.zip. This repository contains a set of Chef cookbooks that include Chef recipes you’ll use to install the Node.js package and its dependencies on your instance. You will use these Chef recipes to deploy the Node.js app that you prepared in step 1.1.
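
If you’re curious about what the cookbook archive contains before you wire it up, you can download and list it locally; this is optional and purely for inspection:

$ wget https://s3.amazonaws.com/opsworks-codepipeline-demo/opsworks-nodejs-demo-cookbook.zip
$ unzip -l opsworks-nodejs-demo-cookbook.zip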

 Step 4: Create and configure an AWS OpsWorks layer

Now that you’ve created an AWS OpsWorks stack called CodePipeline Demo, you can create an OpsWorks layer.

1.     Choose Layers, and then choose Add Layer in the AWS OpsWorks stack view.
2.     Name the layer Node.js App Server. For Short Name, type app1, and then choose Add Layer.
3.     After you create the layer, open the layer’s Recipes tab. In the Deploy lifecycle event, type nodejs_demo. Later, you will link this to a Chef recipe that is part of the Chef cookbook you referenced when you created the stack in step 3.4. This Chef recipe runs every time a new version of your application is deployed.

4.     Now, open the Security tab, choose Edit, and choose AWS-OpsWorks-WebApp from the Security groups drop-down list. You will also need to set the EC2 Instance Profile to use the service role you created in step 2.2 (aws-opsworks-ec2-role-with-s3).

Step 5: Add your App to AWS OpsWorks

Now that your layer is configured, add the Node.js demo app to your AWS OpsWorks stack. When you create the pipeline, you’ll be required to reference this demo Node.js app.

  1. Have the Amazon S3 bucket link from step 1.4 ready. You will need the link to the bucket in which you stored your test app.
  2. In AWS OpsWorks, open the stack you created (CodePipeline Demo), and in the navigation pane, choose Apps.
  3. Choose Add App.
  4. Provide a name for your demo app (for example, Node.js Demo App), and set the Repository type to an S3 Archive. Paste your S3 bucket link (s3://bucket-name/file name) from step 1.4.
  5. Now that your app appears in the list on the Apps page, add an instance to your OpsWorks layer.

 Step 6: Add an instance to your AWS OpsWorks layer

Before you create a pipeline in AWS CodePipeline, set up at least one instance within the layer you defined in step 4.

  1. Open the stack that you created (CodePipeline Demo), and in the navigation pane, choose Instances.
  2. Choose +Instance, and accept the default settings, including the hostname, size, and subnet. Choose Add Instance.

  3. By default, the instance is in a stopped state. Choose start to start the instance.

Step 7: Create a pipeline in AWS CodePipeline

Now that you have a stack and an app configured in AWS OpsWorks, create a pipeline with AWS OpsWorks as the provider to deploy your app to your specified layer. If you update your app or your Chef deployment recipes, the pipeline runs again automatically, triggering the deployment recipe to run and deploy your updated app.

This procedure creates a simple pipeline that includes only one Source and one Deploy stage. However, you can create more complex pipelines that use AWS OpsWorks as a provider.

To create a pipeline

  1. Open the AWS CodePipeline console in the U.S. East (N. Virginia) region.
  2. Choose Create pipeline.
  3. On the Getting started with AWS CodePipeline page, type MyOpsWorksPipeline, or a pipeline name of your choice, and then choose Next step.
  4. On the Source Location page, choose Amazon S3 from the Source provider drop-down list.
  5. In the Amazon S3 details area, type the Amazon S3 bucket path to your application, in the format s3://bucket-name/file name. Refer to the link you noted in step 1.4. Choose Next step.
  6. On the Build page, choose No Build from the drop-down list, and then choose Next step.
  7. On the Deploy page, choose AWS OpsWorks as the deployment provider.


     

  8. Specify the names of the stack, layer, and app that you created earlier, then choose Next step.
  9. On the AWS Service Role page, choose Create Role. On the IAM console page that opens, you will see the role that will be created for you (AWS-CodePipeline-Service). From the Policy Name drop-down list, choose Create new policy. Be sure the policy document has the following content, and then choose Allow.
    For more information about the service role and its policy statement, see Attach or Edit a Policy for an IAM Service Role.


     

  10. On the Review your pipeline page, confirm the choices shown on the page, and then choose Create pipeline.

The pipeline should now start deploying your app to your OpsWorks layer on its own.  Wait for deployment to finish; you’ll know it’s finished when Succeeded is displayed in both the Source and Deploy stages.

Step 8: Verifying the app deployment

To verify that AWS CodePipeline deployed the Node.js app to your layer, access the instance you created in step 6. You should be able to see and use the Node.js web app.

  1. On the AWS OpsWorks dashboard, choose the stack and the layer to which you just deployed your app.
  2. In the navigation pane, choose Instances, and then choose the public IP address of your instance to view the web app. The running app will be displayed in a new browser tab.


     

  3. To test the app, on the app’s web page, in the Leave a comment text box, type a comment, and then choose Send. The app adds your comment to the web page. You can add more comments to the page, if you like.

Wrap up

You now have a working and fully automated pipeline. As soon as you make changes to your application’s code and update the S3 bucket with the new version of your app, AWS CodePipeline automatically collects the artifact and uses AWS OpsWorks to deploy it to your instance, by running the OpsWorks deployment Chef recipe that you defined on your layer. The deployment recipe starts all of the operations on your instance that are required to support a new version of your artifact.
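
For example, pushing a new version of the app and then checking the pipeline from the CLI might look like the following; the bucket name is a placeholder, and the pipeline name matches the one suggested earlier:

$ aws s3 cp opsworks-nodejs-demo-app.zip s3://your-bucket-name/opsworks-nodejs-demo-app.zip
$ aws codepipeline get-pipeline-state --name MyOpsWorksPipeline --query 'stageStates[].actionStates[].{action:actionName,status:latestExecution.status}'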

To learn more about Chef cookbooks and recipes: https://docs.chef.io/cookbooks.html

To learn more about the AWS OpsWorks and AWS CodePipeline integration: https://docs.aws.amazon.com/opsworks/latest/userguide/other-services-cp.html

AWS SDK for C++ – Now Ready for Production Use

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-sdk-for-c-now-ready-for-production-use/

After almost a year of developer feedback and contributions, version 1.0 of the AWS SDK for C++ is now available and recommended for production use. The SDK follows semantic versioning, so starting at version 1.0, you can depend on any of the C++ SDKs at version 1.x, and upgrades will not break your build.

Based on the feedback that we received for the developer preview of the SDK, we have made several important changes and improvements:

  • Semantic Versioning – The SDK now follows semantic versioning. Starting with version 1.0, you can be confident that upgrades within the 1.x series will not break your build.
  • Transfer Manager – The original TransferClient has evolved into the new and improved TransferManager interface.
  • Build Process – The CMake build chain has been improved in order to make it easier to override platform defaults.
  • Simplified Configuration – It is now easier to set SDK-wide configuration options at runtime.
  • Encryption – The SDK now includes symmetric cryptography support on all supported platforms.
  • NuGet – The SDK is now available via NuGet (read AWS SDK for C++ Now Available via NuGet to learn more).
  • Fixes – The 1.0 codebase includes numerous bug fixes and build improvements.

In addition, we have more high-level APIs that we will be releasing soon to make C++ development on AWS even easier and more secure.

Here’s a code sample using the new and improved TransferManager API:

#include <aws/core/Aws.h>
#include <aws/s3/S3Client.h>
#include <aws/transfer/TransferManager.h>

static const char* ALLOC_TAG = "main";

int main()
{
    Aws::SDKOptions options;
    Aws::InitAPI(options);

    auto s3Client = Aws::MakeShared<Aws::S3::S3Client>(ALLOC_TAG);
    Aws::Transfer::TransferManagerConfiguration transferConfig;
    transferConfig.s3Client = s3Client;

    transferConfig.transferStatusUpdatedCallback =
       [](const TransferManager*, const TransferHandle& handle)
       { std::cout << "Transfer Status = " << static_cast<int>(handle.GetStatus()) << "\n"; };

    transferConfig.uploadProgressCallback =
        [](const TransferManager*, const TransferHandle& handle)
        { std::cout << "Upload Progress: " << handle.GetBytesTransferred() << " of " << handle.GetBytesTotalSize() << " bytes\n";};

    transferConfig.downloadProgressCallback =
        [](const TransferManager*, const TransferHandle& handle)
        { std::cout << "Download Progress: " << handle.GetBytesTransferred() << " of " << handle.GetBytesTotalSize() << " bytes\n"; };
    
    Aws::Transfer::TransferManager transferManager(transferConfig);
    auto transferHandle = transferManager.UploadFile("/user/aws/giantFile", "aws_cpp_ga", "giantFile", 
                                                     "text/plain", Aws::Map<Aws::String, Aws::String>());
    transferHandle.WaitUntilFinished();
     
    Aws::ShutdownAPI(options);
    return 0;
}

Visit the AWS SDK for C++ home page and read the AWS Developer Blog (C++) to learn more.

Keep the Feedback Coming
Now that the AWS SDK for C++ is production-ready, we’d like to know what you think, how you are using it, and how we can make it even better. Please feel free to file issues or to submit pull requests as you find opportunities for improvement.


Jeff;

 

Weekly roundup: slow but steady

Post Syndicated from Eevee original https://eev.ee/dev/2016/08/14/weekly-roundup-slow-but-steady/

August is loosely about video games, but really it’s about three big things in particular.

  • book: Lots of progress! I’m definitely getting a feel for the writing style I want to use, I’ve wrangled Sphinx into a basically usable state, I’ve written a lot of tentative preface stuff and much of the intro part of the chapter, and I’ve written a ton of scratchy prose (like notes, but full sentences that just need some editing and cleanup later). Also worked around some frequent crashes with, ah, a thing I’m writing about.

  • veekun: I did a serious cleanup of the gen 1 extraction code; added some zany heuristics for detecting data that should work even in fan hacks (if there even are any for gen 1); and hacked multi-language extraction into working well enough for starters.

    Finally, and I do mean finally, I built some groundwork for versioning support in the Python API. This has been hanging over my head for probably a year and was one of the biggest reasons I kept putting off working on this whole concept. I just didn’t quite know how I wanted to do it, and I was terrified of doing it “wrong”. At long last, yesterday I pushed through, and now I can see the light at the end of the tunnel.

    I also committed what I had so far, which is a complete mess but also a working mess, and that makes me feel better about the state of things. You can have a look if you want.

  • runed awakening: I didn’t get any tangible work done, but after some months of agonizing, I finally figured out how to make the ending sensible. Mostly. Like 80%. I’m much closer than I used to be. Once I nail down a couple minor details, I should be able to go actually build it.


  • blog: I finally fixed veekun’s front page — the entire contents of blog posts will no longer appear there. (The actual problem was that Pelican, the blog generator I use, puts the entirety of blog posts in Atom’s <summary> field and wow is that wrong. I’ve submitted a PR and patched my local install.)

    I wrote about half a post on testing, which I’d really like to finish today.

  • zdoom: My Lua branch can now list out an actor’s entire inventory — the first capability that’s actually impossible using the existing modding tools. (You can check how many of a particular item an actor is carrying, but there’s no way to iterate over a list of strings from a mod.)

  • doom: Almost finished my anachrony demo map, but stopped because I wasn’t sure how to show off the last couple things. Fixed a couple items that had apparently been broken the entire time, whoops.

  • slade: I added the most half-assed stub of a list of all the things in the current map and how many there are on each difficulty. I vaguely intend to make a whole map info panel, and I still need to finish 3D floors; I just haven’t felt too inclined to pour much time into SLADE lately. Both C++ and GUI apps are a bit of a slog to work with.

  • art: I scribbled Latias with a backpack and some other things.

    I did two daily Pokémon, which is, at least, better than one. I think they’re getting better, but I also think I’m just trying to draw more than I know how to do in an hour.

    I hit a skill wall this week, where my own expectations greatly outpaced my ability. It happens every so often and it’s always maddening. I spent a lot of time sketching and looking up refs (for once) and eventually managed to pierce through it — somehow I came out with a markedly improved understanding of general anatomy, hands, color, perspective, and lighting? I don’t know how this works. The best thing I drew is not something I’ll link here, but you can enjoy this which is pretty good too. Oh, I guess I did a semi-public art stream for the first time this week, too.

    Now my arm is tired and the callus where I grip the pen too hard is a bit sore.

  • irl: Oh boy I got my oil changed? Also I closed a whole bunch of tabs and went through some old email again, in a vain attempt to make my thinkin’ space a bit less chaotic.

Wow! A lot of things again. That’s awesome. I really don’t know where I even found the time to do quite so much drawing, but I’m not complaining.

I’m a little panicked, since we’re halfway through the month and I don’t think any of the things I’m working on are half done yet. I did try to give myself a lot of wiggle room in the October scheduling, and it’s still early, so we’ll see how it goes. I can’t remember the last time I was quite this productive for this long continuously, so I’m happy to call this a success so far either way.

How to Automatically Tag Amazon EC2 Resources in Response to API Events

Post Syndicated from Alessandro Martini original https://blogs.aws.amazon.com/security/post/Tx150Z810KS4ZEC/How-to-Automatically-Tag-Amazon-EC2-Resources-in-Response-to-API-Events

Access to manage Amazon EC2 instances can be controlled using tags. You can do this by writing an Identity and Access Management (IAM) policy that grants users permissions to manage EC2 instances that have a specific tag. However, if you also give users permissions to create or delete tags, users can manipulate the values of the tags to gain access and manage additional instances.

In this blog post, I will explore a method to automatically tag an EC2 instance and its associated resources without granting ec2:createTags permission to users. I will use a combination of an Amazon CloudWatch Events rule and AWS Lambda to tag newly created instances. With this solution, your users do not need to have permissions to create tags because the Lambda function will have the permissions to tag the instances. The solution can be automatically deployed in the region of your choice with AWS CloudFormation. I explain the provided solution and the CloudFormation template in the following sections.

Solution overview

This solution has two parts. I first explore the AWS Identity and Access Management (IAM) policy associated with my IAM user, Bob. Then, I explore how you can tag EC2 resources automatically in response to specific API events in my AWS account.

The IAM configuration

My IAM user, Bob, belongs to an IAM group with a customer managed policy called [Your CloudFormation StackName]TagBasedEC2RestrictionsPolicy[Unique ID], as shown in the following screenshot. For convenience, I will refer to this policy as TagBasedEC2RestrictionsPolicy in the remainder of this post. (Throughout this post, you should replace placeholder values with your own AWS information.)

The content of the customer managed policy, TagBasedEC2RestrictionsPolicy, follows.

{
    "Version" : "2012-10-17",
    "Statement" : [
        {
            "Sid" : "LaunchEC2Instances",
            "Effect" : "Allow",
            "Action" : [
                "ec2:Describe*",
                "ec2:RunInstances"
            ],
            "Resource" : [
                "*"
            ]
        },
        {
            "Sid" : "AllowActionsIfYouAreTheOwner",
            "Effect" : "Allow",
            "Action" : [
                "ec2:StopInstances",
                "ec2:StartInstances",
                "ec2:RebootInstances",
                "ec2:TerminateInstances"
            ],
            "Condition" : {
                "StringEquals" : {
                    "ec2:ResourceTag/PrincipalId" : "${aws:userid}"
                }
            },
            "Resource" : [
                "*"
            ]
        }
    ]
}
 
This policy explicitly allows all EC2 describe actions and ec2:RunInstances (in the LaunchEC2Instances statement). The core of the policy is the AllowActionsIfYouAreTheOwner statement. This statement applies a condition to the EC2 actions we want to limit, allowing each action only if a tag named PrincipalId matches the caller’s current user ID. I am using the conditional variable "${aws:userid}" because it is always defined for any type of authenticated user. By contrast, the AWS variable aws:username is only present for IAM users, not for federated users.

An IAM user cannot see the unique identifier, UserId, in the IAM console, but you can retrieve it with the AWS CLI by using the following command.

aws iam get-user --user-name Bob

The following output comes from that command.

{
    "User": {
        "UserName": "Bob",
        "PasswordLastUsed": "2016-04-29T20:52:37Z",
        "CreateDate": "2016-04-26T19:26:43Z",
        "UserId": "AIDAJ7EQQEKFAVPO3NTQG",
        "Path": "/",
        "Arn": "arn:aws:iam::111122223333:user/Bob"
    }
}

In other cases, such as when assuming an IAM role to access an AWS account, the UserId is a combination of the assumed IAM role ID and the role session name that you specified at the time of the AssumeRole API call.

role-id:role-session-name

For a full list of values that you can substitute for policy variables, see Request Information That You Can Use for Policy Variables.
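If you'd rather look these identifiers up programmatically than with the CLI, a short boto3 sketch along the following lines returns the same values (it assumes your default credentials resolve to the identity you care about):

import boto3

# IAM GetUser returns the unique UserId for a named IAM user.
iam = boto3.client('iam')
print(iam.get_user(UserName='Bob')['User']['UserId'])

# STS GetCallerIdentity returns the aws:userid value for whoever is calling;
# for an assumed role it looks like "role-id:role-session-name".
sts = boto3.client('sts')
print(sts.get_caller_identity()['UserId'])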

Because you cannot possibly memorize all of these IDs, the automation defined later in this post not only tags resources with the UserId, but also retrieves the actual userName (or RoleSessionName) and uses it to add a more human-readable Owner tag, which makes it easier to filter and report on resources.

Tag automation

The IAM user has EC2 rights to launch an EC2 instance. Regardless of how the user creates the EC2 instance (with the AWS Management Console or AWS CLI), he performs a RunInstances API call (#1 in the following diagram). CloudWatch Events records this activity (#2).

A CloudWatch Events rule targets a Lambda function called AutoTag and invokes it with the event details (#3). The event details contain information about the user who performed the action (this information is retrieved automatically from AWS CloudTrail, which must be enabled for CloudWatch Events to work).

The Lambda function AutoTag scans the event details, and extracts all the possible resource IDs as well as the user’s identity (#4). The function applies two tags to the created resources (#5):

  • Owner, with the current userName.
  • PrincipalId, with the current user’s aws:userid value.

When Amazon Elastic Block Store (EBS) volumes, EBS snapshots, and Amazon Machine Images (AMIs) are created individually, the corresponding API calls invoke the same Lambda function. This way, you can similarly allow or deny actions based on tags for those resources, or identify which resources a user created.

CloudFormation automation

This CloudFormation template creates a Lambda function and a CloudWatch Events rule that triggers the function in the region you choose. The Lambda permissions to describe and tag EC2 resources come from an IAM role that the template creates along with the function. The template also creates an IAM group into which you can place your users to enforce the behavior described in this blog post, as well as a customer managed policy that you can easily apply to other IAM entities, such as IAM roles or existing IAM groups.

Note: Currently, CloudWatch Events is available in six regions, and Lambda is available in five regions. Keep in mind that you can only use this post’s solution in regions where both CloudWatch Events and Lambda are available. As these services grow, you will be able to launch the same template in other regions as well.

The template first defines a Lambda function with an alias (PROD) and two versions (Version $LATEST and Version 1), as shown in the following diagram. This is a best practice for continuous deployment: the definition of an alias decouples the trigger from the actual code version you want to execute, while versioning allows you to work on multiple versions of the same code and test changes while a stable version of your code is still available for current executions. For more information about aliases and versions, see AWS Lambda Function Versioning and Aliases.
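The CloudFormation template sets this up for you, but for illustration, reproducing the version-and-alias arrangement with boto3 could look roughly like the following sketch (the function name matches the AutoTag function described below):

import boto3

lambda_client = boto3.client('lambda')

# Publish the current $LATEST code as an immutable, numbered version.
version = lambda_client.publish_version(FunctionName='AutoTag')['Version']

# Point a PROD alias at that version. Because the CloudWatch Events rule
# targets the alias, new versions can be tested without touching the trigger.
lambda_client.create_alias(
    FunctionName='AutoTag',
    Name='PROD',
    FunctionVersion=version,
)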

The Lambda function

You will find the full Lambda function code in the CloudFormation template. I will explain key parts of the code in this section.

The code starts with the definition of an array called ids. This array will contain the identifiers of all EC2 resources found in a given event: instance, EBS volume, EBS snapshot, ENI, and AMI IDs.

Array initialization

ids = []

The Lambda function extracts the region, the event name (that is, the name of the invoked API, such as RunInstances and CreateVolume), and the user’s principalID and userName. Depending on the user type, the code extracts the userName from the event detail, or it defines it as the second part of the principalID.

Information extraction       

        region = event['region']
        detail = event['detail']
        eventname = detail['eventName']
        arn = event['detail']['userIdentity']['arn']
        principal = event['detail']['userIdentity']['principalId']
        userType = event['detail']['userIdentity']['type']

        if userType == 'IAMUser':
            user = detail['userIdentity']['userName']

        else:
            user = principal.split(':')[1]

Then, the code initializes the boto3 EC2 resource object.

EC2 resource initialization

        ec2 = boto3.resource('ec2')

After a few input validations, the code looks for the resource IDs in the event detail. For each API call, the resource IDs can be found in different parts of the response, so the code contains a sequence of if/else statements to map these differences.

In the case of an EC2 instance, the function describes the identified instance to find its attached EBS volumes and ENIs, and adds their IDs to the ids array.

Resource IDs extraction for each API call

        if eventname == 'CreateVolume':
            ids.append(detail['responseElements']['volumeId'])


        elif eventname == 'RunInstances':
            items = detail['responseElements']['instancesSet']['items']
            for item in items:
                ids.append(item['instanceId'])

            base = ec2.instances.filter(InstanceIds=ids)

            #loop through the instances
            for instance in base:
                for vol in instance.volumes.all():
                    ids.append(vol.id)
                for eni in instance.network_interfaces:
                    ids.append(eni.id)

        elif eventname == 'CreateImage':
            ids.append(detail['responseElements']['imageId'])

        elif eventname == 'CreateSnapshot':
            ids.append(detail['responseElements']['snapshotId'])

        else:
            logger.warning('Not supported action')

Now that you have all the identifiers, you can use the EC2 client to tag them all with the following tags:
  • Owner, with the value of the variable user.
  • PrincipalId, with the value of the variable principal.

Resource tagging

        if ids:
            ec2.create_tags(Resources=ids, Tags=[{'Key': 'Owner', 'Value': user}, {'Key': 'PrincipalId', 'Value': principal}])

The CloudWatch Events rule

In order to trigger the Lambda function, you create a CloudWatch Events rule. You define a rule to match events and route them to one or more target Lambda functions. In our case, the events we want to match are the following AWS EC2 API calls:

  • RunInstances
  • CreateVolume
  • CreateSnapshot
  • CreateImage

The target of this rule is our Lambda function. We specifically point to the PROD alias of the function to decouple the trigger from a specific version of the Lambda function code.
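The template defines the rule for you, but creating an equivalent rule and target with boto3 would look roughly like this sketch (the rule name, account ID, and region are placeholders):

import json
import boto3

events = boto3.client('events')

# Match the four EC2 API calls recorded by CloudTrail.
events.put_rule(
    Name='AutoTagRule',
    EventPattern=json.dumps({
        'source': ['aws.ec2'],
        'detail-type': ['AWS API Call via CloudTrail'],
        'detail': {
            'eventSource': ['ec2.amazonaws.com'],
            'eventName': ['RunInstances', 'CreateVolume',
                          'CreateSnapshot', 'CreateImage'],
        },
    }),
    State='ENABLED',
)

# Route matching events to the PROD alias of the AutoTag function.
# (The function also needs a resource-based permission allowing
# events.amazonaws.com to invoke it, which the template grants.)
events.put_targets(
    Rule='AutoTagRule',
    Targets=[{
        'Id': 'AutoTagProd',
        'Arn': 'arn:aws:lambda:us-east-1:111122223333:function:AutoTag:PROD',
    }],
)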

The following screenshot shows what the CloudFormation template creates.

IAM group

The CloudFormation template creates an IAM group called [Your CloudFormation StackName]-RestrictedIAMGroup-[Unique ID]. The customer managed policy TagBasedEC2RestrictionsPolicy is associated with the group.

The solution in action

First, deploy the CloudFormation template in the region of your choosing. Specify the following Amazon S3 template URL: https://s3.amazonaws.com/awsiammedia/public/sample/autotagec2resources/AutoTag.template, as shown in the following screenshot.

The template creates the stack only if you confirm you have enabled CloudTrail, as shown in the following screenshot.

The deployment should be completed within 5 minutes. The following screenshot shows the status of the stack as CREATE_COMPLETE.

Now that you have deployed the required automation infrastructure, you can assign IAM users to the created IAM group (ManageEC2InstancesGroup in the following screenshot).

Similarly, if you are using federation and your users access your AWS account through an IAM role, you can attach the customer managed policy, TagBasedEC2RestrictionsPolicy, to the IAM role itself.

In my case, I have added my IAM user, Bob, to the IAM group created by CloudFormation. Note that you have to add the IAM users to this group manually.

When Bob signs in to my AWS account, he can create EC2 instances with no restrictions. However, if he tries to stop an EC2 instance that he did not create, he will get the following error message.

In the meantime, the instance Bob created has been automatically tagged with his user name and user ID.

Bob can now terminate the instance. The following screenshot shows the instance in the process of shutting down.

What’s next?

Now that you know you can tag resources with a Lambda function in response to events, you can apply the same logic to other resources, such as Amazon Relational Database Service (RDS) databases or S3 buckets. With resource groups, each user can focus on just the resources he created, and the IAM policy provided in this post ensures that no Stop/Start/Reboot/Terminate action is possible on someone else’s instance.
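As a rough sketch of how the same event-driven pattern could extend to S3, for example, a hypothetical handler for a CloudWatch Events rule matching CreateBucket might look like the following (the handler name and field access are assumptions modeled on the EC2 function above):

import boto3

def tag_new_bucket(event, context):
    # Hypothetical handler for a rule matching the CreateBucket API call.
    detail = event['detail']
    principal = detail['userIdentity']['principalId']
    user = detail['userIdentity'].get('userName', principal.split(':')[-1])
    bucket = detail['requestParameters']['bucketName']

    # Note: put_bucket_tagging replaces the bucket's existing tag set.
    boto3.client('s3').put_bucket_tagging(
        Bucket=bucket,
        Tagging={'TagSet': [
            {'Key': 'Owner', 'Value': user},
            {'Key': 'PrincipalId', 'Value': principal},
        ]},
    )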

Additionally, tags are useful in custom billing reports to project costs and determine how much money each individual owner is spending. You can activate the Owner tag on the Cost Allocation Tags page of the Billing console to include it in your detailed billing reports. For more information, see Applying Tags.

If you have any comments, submit them in the “Comments” section below. If you have questions about the solution in this blog post, please start a new thread on the EC2 forum.

– Alessandro

Surviving the Zombie Apocalypse with Serverless Microservices

Post Syndicated from Aaron Kao original https://aws.amazon.com/blogs/compute/surviving-the-zombie-apocalypse-with-serverless-microservices/

Run Apps without the Bite!

by: Kyle Somers – Associate Solutions Architect

Let’s face it, managing servers is a pain! Capacity management and scaling is even worse. Now imagine dedicating your time to SysOps during a zombie apocalypse — barricading the door from flesh eaters with one arm while patching an OS with the other.

This sounds like something straight out of a nightmare. Lucky for you, this doesn’t have to be the case. Over at AWS, we’re making it easier than ever to build and power apps at scale with powerful managed services, so you can focus on your core business – like surviving – while we handle the infrastructure management that helps you do so.

Join the AWS Lambda Signal Corps!

At AWS re:Invent in 2015, we piloted a workshop where participants worked in groups to build a serverless chat application for zombie apocalypse survivors, using Amazon S3, Amazon DynamoDB, Amazon API Gateway, and AWS Lambda. Participants learned about microservices design patterns and best practices. They then extended the functionality of the serverless chat application with various add-on functionalities – such as mobile SMS integration, and zombie motion detection – using additional services like Amazon SNS and Amazon Elasticsearch Service.

Given the widespread interest in serverless architectures and AWS Lambda among our customers, we’ve recognized the excitement around this subject. Therefore, we are happy to announce that we’ll be taking this event on the road in the U.S. and abroad to recruit new developers for the AWS Lambda Signal Corps!

 

Help us save humanity! Learn More and Register Here!

 

Washington, DC | March 10 – Mission Accomplished!

San Francisco, CA @ AWS Loft | March 24 – Mission Accomplished!

New York City, NY @ AWS Loft | April 13 – Mission Accomplished!

London, England @ AWS Loft | April 25

Austin, TX | April 26

Atlanta, GA | May 4

Santa Monica, CA | June 7

Berlin, Germany | July 19

San Francisco, CA @ AWS Loft | August 16

New York City, NY @ AWS Loft | August 18

 

If you’re unable to join us at one of these workshops, that’s OK! In this post, I’ll show you how our survivor chat application incorporates some important microservices design patterns and how you can power your apps in the same way using a serverless architecture.


 

What Are Serverless Architectures?

At AWS, we know that infrastructure management can be challenging. We also understand that customers prefer to focus on delivering value to their business and customers. There’s a lot of undifferentiated heavy lifting involved in building and running applications, such as installing software, managing servers, coordinating patch schedules, and scaling to meet demand. Serverless architectures allow you to build and run applications and services without having to manage infrastructure. Your application still runs on servers, but all the server management is done for you by AWS. Serverless architectures can make it easier to build, manage, and scale applications in the cloud by eliminating much of the heavy lifting involved with server management.

Key Benefits of Serverless Architectures

  • No Servers to Manage: There are no servers for you to provision and manage. All the server management is done for you by AWS.
  • Increased Productivity: You can now fully focus your attention on building new features and apps because you are freed from the complexities of server management, allowing you to iterate faster and reduce your development time.
  • Continuous Scaling: Your applications and services automatically scale up and down based on the size of the workload.

What Should I Expect to Learn at a Zombie Microservices Workshop?

The workshop content we developed is designed to demonstrate best practices for serverless architectures using AWS. In this post we’ll discuss the following topics:

  • Which services are useful when designing a serverless application on AWS (see below!)
  • Design considerations for messaging, data transformation, and business or app-tier logic when building serverless microservices.
  • Best practices demonstrated in the design of our zombie survivor chat application.
  • Next steps for you to get started building your own serverless microservices!

Several AWS services were used to design our zombie survivor chat application. Each of these services is managed and highly scalable. Let’s take a quick look at which ones we incorporated in the architecture:

  • AWS Lambda allows you to run your code without provisioning or managing servers. Just upload your code (currently Node.js, Python, or Java) and Lambda takes care of everything required to run and scale your code with high availability. You can set up your code to automatically trigger from other AWS services or call it directly from any web or mobile app. Lambda is used to power many use cases, such as application back ends, scheduled administrative tasks, and even big data workloads via integration with other AWS services such as Amazon S3, DynamoDB, Redshift, and Kinesis.
  • Amazon Simple Storage Service (Amazon S3) is our object storage service, which provides developers and IT teams with secure, durable, and scalable storage in the cloud. S3 is used to support a wide variety of use cases and is easy to use with a simple interface for storing and retrieving any amount of data. In the case of our survivor chat application, it can even be used to host static websites with CORS and DNS support.
  • Amazon API Gateway makes it easy to build RESTful APIs for your applications. API Gateway is scalable and simple to set up, allowing you to build integrations with back-end applications, including code running on AWS Lambda, while the service handles the scaling of your API requests.
  • Amazon DynamoDB is a fast and flexible NoSQL database service for all applications that need consistent, single-digit millisecond latency at any scale. It is a fully managed cloud database and supports both document and key-value store models. Its flexible data model and reliable performance make it a great fit for mobile, web, gaming, ad tech, IoT, and many other applications.

Overview of the Zombie Survivor Chat App

The survivor chat application represents a completely serverless architecture that delivers a baseline chat application (written using AngularJS) to workshop participants upon which additional functionality can be added. In order to deliver this baseline chat application, an AWS CloudFormation template is provided to participants, which spins up the environment in their account. The following diagram represents a high level architecture of the components that are launched automatically:

High-Level Architecture of Survivor Serverless Chat App

  • Amazon S3 bucket is created to store the static web app contents of the chat application.
  • AWS Lambda functions are created to serve as the back-end business logic tier for processing reads/writes of chat messages.
  • API endpoints are created using API Gateway and mapped to Lambda functions. The API Gateway POST method points to a WriteMessages Lambda function. The GET method points to a GetMessages Lambda function.
  • A DynamoDB messages table is provisioned to act as our data store for the messages from the chat application.

Serverless Survivor Chat App Hosted on Amazon S3

With the CloudFormation stack launched and the components built out, the end result is a fully functioning chat app hosted in S3, using API Gateway and Lambda to process requests, and DynamoDB as the persistence for our chat messages.
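The actual function code ships with the workshop’s CloudFormation template, but to make the request path concrete, here is a minimal sketch of what a GetMessages-style handler could look like in Python (the Messages table name comes from the architecture above; the scan limit and attribute layout are assumptions):

import boto3

table = boto3.resource('dynamodb').Table('Messages')

def handler(event, context):
    # Return recent chat messages. A real implementation would query by
    # channel and timestamp instead of scanning, and would paginate.
    items = table.scan(Limit=50).get('Items', [])
    # Assumes message attributes are stored as strings so the result
    # serializes cleanly to JSON.
    return {'messages': items}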

With this baseline app, participants join in teams to build out additional functionality, including the following:

  • Integration of SMS/MMS via Twilio. Send messages to chat from SMS.
  • Motion sensor detection of nearby zombies with Amazon SNS and Intel® Edison and Grove IoT Starter Kit. AWS provides a shared motion sensor for the workshop, and you consume its messages from SNS.
  • Help-me panic button with IoT.
  • Integration with Slack for messaging from another platform.
  • Typing indicator to see which survivors are typing.
  • Serverless analytics of chat messages using Amazon Elasticsearch Service (Amazon ES).
  • Any other functionality participants can think of!

As a part of the workshop, AWS provides guidance for most of these tasks. With these add-ons completed, the architecture of the chat system begins to look quite a bit more sophisticated, as shown below:

Architecture of Survivor Chat with Additional Add-on Functionality

Architectural Tenets of the Serverless Survivor Chat

For the most part, the design patterns you’d see in a traditional server-yes environment also apply in a serverless environment. No surprises there. With that said, it never hurts to revisit best practices while learning new ones. So let’s review some key patterns we incorporated in our serverless application.

Decoupling Is Paramount

In the survivor chat application, Lambda functions serve as our tier for business logic. Since users interact with Lambda at the function level, it serves you well to split your logic into separate functions as much as possible, so you can scale the logic tier independently of the sources and destinations it serves.

As you’ll see in the architecture diagram in the above section, the application has separate Lambda functions for the chat service, the search service, the indicator service, etc. Decoupling is also incorporated through the use of API Gateway, which exposes our back-end logic via a unified RESTful interface. This model allows us to design our back-end logic with potentially different programming languages, systems, or communications channels, while keeping the requesting endpoints unaware of the implementation. Use this pattern and you won’t cry for help when you need to scale, update, add, or remove pieces of your environment.

Separate Your Data Stores

Treat each data store as an isolated application component of the service it supports. One common pitfall when following microservices architectures is to forget about the data layer. By keeping the data stores specific to the service they support, you can better manage the resources needed at the data layer specifically for that service. This is the true value in microservices.

In the survivor chat application, this practice is illustrated with the Activity and Messages DynamoDB tables. The activity indicator service has its own data store (the Activity table), while the chat service has its own (the Messages table). These tables can scale independently along with their respective services. This scenario is also a good example of how to handle state: the implementation of the typing indicator add-on uses DynamoDB via the Activity table to track which users are typing. Remember, many of the benefits of microservices are lost if the components all remain glued together at the data layer, creating a messy common denominator for scaling.
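As a rough illustration of that independence, each table can be provisioned with capacity sized for its own service; the key schema and throughput numbers below are assumptions for the sketch, not the workshop’s actual settings:

import boto3

dynamodb = boto3.client('dynamodb')

# The Activity table is sized for the typing-indicator workload alone,
# independently of whatever capacity the Messages table needs.
dynamodb.create_table(
    TableName='Activity',
    AttributeDefinitions=[{'AttributeName': 'username', 'AttributeType': 'S'}],
    KeySchema=[{'AttributeName': 'username', 'KeyType': 'HASH'}],
    ProvisionedThroughput={'ReadCapacityUnits': 5, 'WriteCapacityUnits': 5},
)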

Leverage Data Transformations up the Stack

When designing a service, data transformation and compatibility are big components. How will you handle inputs from many different clients, users, and systems for your service? Will you run different flavors of your environment to correspond with different incoming request standards? Absolutely not!

With API Gateway, data transformation becomes significantly easier through built-in models and mapping templates. With these features you can build data transformation and mapping logic into the API layer for requests and responses. This results in less work for you since API Gateway is a managed service. In the case of our survivor chat app, AWS Lambda and our survivor chat app require JSON while Twilio likes XML for the SMS integration. This type of transformation can be offloaded to API Gateway, leaving you with a cleaner business tier and one less thing to design around!

Use API Gateway as your interface and Lambda as your common backend implementation. API Gateway uses Apache Velocity Template Language (VTL) and JSONPath for transformation logic. Of course, there is a trade-off to be considered, as a lot of transformation logic could be handled in your business-logic tier (Lambda). But, why manage that yourself in application code when you can transparently handle it in a fully managed service through API Gateway? Here are a few things to keep in mind when handling transformations using API Gateway and Lambda:

  • Transform first; then call your common back-end logic.
  • Use API Gateway VTL transformations first when possible.
  • Use Lambda to preprocess data in ways that VTL can’t.

Using API Gateway VTL for Input/Output Data Transformations

 

Security Through Service Isolation and Least Privilege

As a general recommendation when designing your services, always utilize least privilege and isolate components of your application to provide control over access. In the survivor chat application, a permissions-based model is used via AWS Identity and Access Management (IAM). IAM is integrated in every service on the AWS platform and provides the capability for services and applications to assume roles with strict permission sets to perform their least-privileged access needs. Along with access controls, you should implement audit and access logging to provide the best visibility into your microservices. This is made easy with Amazon CloudWatch Logs and AWS CloudTrail. CloudTrail enables audit capability of API calls made on the platform while CloudWatch Logs enables you to ship custom log data to AWS. Although our implementation of Amazon Elasticsearch in the survivor chat is used for analyzing chat messages, you can easily ship your log data to it and perform analytics on your application. You can incorporate security best practices in the following ways with the survivor chat application:

  • Each Lambda function should have an IAM role to access only the resources it needs. For example, the GetMessages function can read from the Messages table while the WriteMessages function can write to it. But they cannot access the Activities table that is used to track who is typing for the indicator service.
  • Each API Gateway endpoint must have IAM permissions to execute the Lambda function(s) it is tied to. This model ensures that Lambda is only executed by the principal that is allowed to execute it, in this case the API Gateway method that triggers the back-end function.
  • DynamoDB requires read/write permissions via IAM, which limits anonymous database activity.
  • Use AWS CloudTrail to audit API activity on the platform and among the various services. This provides traceability, especially to see who is invoking your Lambda functions.
  • Design Lambda functions to publish meaningful outputs, as these are logged to CloudWatch Logs on your behalf.

FYI, in our application, we allow anonymous access to the chat API Gateway endpoints. We want to encourage all survivors to plug into the service without prior registration and start communicating. We’ve assumed zombies aren’t intelligent enough to hack into our communication channels. Until the apocalypse, though, stay true to API keys and authorization with signatures, which API Gateway supports!

Don’t Abandon Dev/Test

When developing with microservices, you can still leverage separate development and test environments as a part of the deployment lifecycle. AWS provides several features to help you continue building apps along the same trajectory as before, including these:

  • Lambda function versioning and aliases: Use these features to version your functions based on the stages of deployment such as development, testing, staging, pre-production, etc. Or perhaps make changes to an existing Lambda function in production without downtime.
  • Lambda service blueprints: Lambda comes with dozens of blueprints to get you started with prewritten code that you can use as a skeleton, or a fully functioning solution, to complete your serverless back end. These include blueprints with hooks into Slack, S3, DynamoDB, and more.
  • API Gateway deployment stages: Similar to Lambda versioning, this feature lets you configure separate API stages, along with unique stage variables and deployment versions within each stage. This allows you to test your API with the same or different back ends while it progresses through changes that you make at the API layer.
  • Mock Integrations with API Gateway: Configure dummy responses that developers can use to test their code while the true implementation of your API is being developed. Mock integrations make it faster to iterate through the API portion of a development lifecycle by streamlining pieces that used to be very sequential/waterfall.

Using Mock Integrations with API Gateway

Stay Tuned for Updates!

Now that you’ve got the necessary best practices to design your microservices, do you have what it takes to fight against the zombie horde? The serverless options we explored are ready for you to get started with, and the survivors are counting on you!

Be sure to keep an eye on the AWS GitHub repo. Although I didn’t cover each component of the survivor chat app in this post, we’ll be deploying this workshop and code soon for you to launch on your own! Keep an eye out for Zombie Workshops coming to your city, or nominate your city for a workshop here.

For more information on how you can get started with serverless architectures on AWS, refer to the following resources:

Whitepaper – AWS Serverless Multi-Tier Architectures

Reference Architectures and Sample Code

*Special thanks to my colleagues Ben Snively, Curtis Bray, Dean Bryen, Warren Santner, and Aaron Kao at AWS. They were instrumental to our team developing the content referenced in this post.

Amazon EMR Update – Apache HBase 1.2 Is Now Available

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/amazon-emr-update-apache-hbase-1-2-is-now-available/

Apache HBase is a distributed, scalable big data store designed to support tables with billions of rows and millions of columns. HBase runs on top of Hadoop and HDFS and can also be queried using MapReduce, Hive, and Pig jobs.

AWS customers use HBase for their ad tech, web analytics, and financial services workloads. They appreciate its scalability and the ease with which it handles time-series data.

HBase 1.2 on Amazon EMR
Today we are making version 1.2 of HBase available for use with Amazon EMR.  Here are some of the most important and powerful features and benefits that you get when you run HBase:

Strongly Consistent Reads and Writes – When a writer returns, all of the readers will see the same value.

Scalability – Individual HBase tables can consist of billions of rows and millions of columns. HBase stores data in a sparse form in order to conserve space. You can use column families and column prefixes to organize your schemas and to indicate to HBase that the members of a family have a similar access pattern. You can also use timestamps and versioning to retain old versions of cells.

Backup to S3 – You can use the HBase Export Snapshot tool to back up your tables to Amazon S3. The backup operation is actually a MapReduce job and uses parallel processing to adeptly handle large tables.

Graphs And Timeseries – You can use HBase as the foundation for a more specialized data store. For example, you can use Titan for graph databases and OpenTSDB for time series.

Coprocessors – You can write custom business logic (similar to a trigger or a stored procedure) that runs within HBase and participates in query and update processing (read The How To of HBase Coprocessors to learn more).

You also get easy provisioning and scaling, access to a pre-configured installation of HDFS, and automatic node replacement for increased durability.

Getting Started with HBase
HBase 1.2 is available as part of Amazon EMR release 4.6. You can, as usual, launch it from the Amazon EMR Console, the Amazon EMR CLI, or through the Amazon EMR API. Here’s the command that I used:

$ aws --region us-east-1 emr create-cluster \
  --name "MyCluster" --release-label "emr-4.6.0" \
  --instance-type m3.xlarge --instance-count 3 --use-default-roles \
  --ec2-attributes KeyName=keys-jbarr-us-east \
  --applications Name=Hadoop Name=Hue Name=HBase Name=Hive

This command assumes that the EMR_DefaultRole and EMR_EC2_DefaultRole IAM roles already exist. They are created automatically when you launch an EMR cluster from the Console (read about Create and Use Roles for Amazon EMR and Create and Use Roles with the AWS CLI to learn more).

I found the master node’s DNS name on the Cluster Details page and SSH’ed in as user hadoop. Then I ran a couple of HBase shell commands:

Following the directions in our new HBase Tutorial, I created a table called customer, restored a multi-million record snapshot from S3 into the table, and ran some simple queries:

Available Now
You can start using HBase 1.2 on Amazon EMR today. To learn more, read the Amazon EMR Documentation.


Jeff;

Simple workflow for building web service APIs

Post Syndicated from yahoo original https://yahooeng.tumblr.com/post/142418165386

Norbert Potocki, Software Engineer @ Yahoo Inc.

APIs are at the core of server-client communications, and well-defined API contracts are essential to the overall experience of client developer communities. At Yahoo, we have explored the best methods to develop APIs – both external (like apiary, apigee, API Gateway) and internal. In our examination, our main focus was to devise a methodology that provides a simple way to build new server endpoints, while guaranteeing a stable, streamlined integration for client developers. The workflow itself can be used with one of many domain-specific languages (DSL) for API modeling (e.g. Swagger, RAML, ardielle). The main driver for this project was a need to build a new generation of Flickr APIs. Flickr has had a long tradition of exposing rich capabilities via our API and innovating; one of Flickr’s contributions to this domain was inventing an early version of the OAuth protocol. In this post, we will share a simple workflow that demonstrates the new approach to building APIs. For the purpose of this article we will focus on Swagger, although the workflow can easily be adapted to use another DSL.

Our goals for developing the workflow:

  • Standardize parts of the API but allow for them to be easily replaced or extended
  • Maximize automation opportunities
    • Auto-generation of documentation
    • SDKs
    • API validation tests
  • Drive efficiency with code re-usability
  • Focus on developer productivity
    • Well-documented, easy to use development tools
    • Easy to follow workflow
    • Room for customization
  • Encourage collaboration with the open source community
    • Use open standards
    • Use a DSL for interface modeling

What we tried before and why it didn’t work

Let’s take a look at two popular approaches to building APIs.

The first is an implementation-centric approach. Backend developers implement the service, thus defining the API in the code. If you’re lucky, there will be some code-level documentation – like javadoc – attached. Other teams (e.g., frontend, mobile, 3rd party engineers) have to read the javadoc and/or the code to understand the API contract nuances. They need to understand the programming language used, and the implementation has to be publicly available. Versioning may be tricky, and the situation can get worse when multiple development teams work on different services that represent a single umbrella project. There’s a chance that their APIs will fall out of sync, be it by implementing rate-limiting headers or by using different versioning models.

The other approach is to use an in-house developed DSL and tools. However, there is already a slew of mature open source DSLs and tools available, and opting for one of those may be more efficient. Swagger is a perfect example: many engineers know it, and you can share your early API design with the open source community and get feedback. Also, there’s no extra learning curve involved, so the chances that somebody will contribute are higher.

The New Workflow

API components

Let’s start by discussing the API elements we work with and what roles they play in the workflow:


Figure 1 – API components
  • API specification: the centerpiece of each service, including an API contract described in a well-defined and structured, human-readable format.
  • Documentation: developer-friendly API reference, a user’s guide and examples of API usage. Partially generated from the API specification and partially hand-crafted.
  • SDKs (Software Development Kits): a set of programming language-specific libraries that simplify interactions with APIs. They typically abstract out lower layers (e.g. HTTP) and expose API input and output as concepts specific to the language (e.g. objects in Java). Partially generated from the API specification.
  • Implementation: the business logic that directs how APIs provide functionality. Validated against the API specification via API tests.
  • API tests: tests validating the implementation against the specification.

API specification – separating contract from implementation

An API specification is one of the most important parts of a web service. You may want to consider keeping the contract definition separate from the implementation because it will allow you to:

  • Deliver the interface to customers faster (and get feedback sooner)
  • Keep your implementation private while still clearly communicating to customers what the service contract is
  • Describe APIs in a programming language-agnostic way to allow the use of different technologies for implementation
  • Work on SDKs, API documentation and clients in parallel to implementation

There are a few popular domain-specific languages that can be used for describing the contract of your service. At Flickr, we use Swagger, and keep all Swagger files in a GitHub repository called “api-spec”. It contains multiple yaml files that describe different parts of the API – both reusable elements and service-specific endpoint and resource definitions.


Figure 2 – Repository holding API specification

To give you a taste of Swagger, here’s how a combined yaml file could look:


Figure 3 – sample Swagger YAML file

One nice aspect of Swagger is the Swagger editor. It’s a browser-based IDE that shows you a live preview of your API documentation (generated from the spec) and also provides a console for querying a mock backend implementation. Here’s what it looks like:


Figure 4 – API editor

Once the changes are approved and merged to master, a number of CD (continuous delivery) pipelines kick in: one per SDK that we host, and another pipeline for generating documentation. There is also an option of triggering a CD pipeline that generates stubs for the backend implementation, but that decision is left up to the service owner.

The power of documentation

Documentation is the most important yet most unappreciated part of software engineering. At Yahoo, we devote lots of attention to documenting APIs, and in this workflow we keep the documentation in a separate GitHub repository. GitHub offers a great feature called GitHub Pages that allows us to host documentation on their servers and avoid building a custom CD pipeline for documentation. It also gives you the ability to edit files directly in the browser. GitHub Pages are powered by Jekyll, which serves HTML pages directly from the repository. You use Markdown files to provide content, select a web template to use, and push it to the “gh-pages” branch:


Figure 5 – documentation repository

The repo contains both a hand-crafted user’s guide and an automatically generated API reference. The API reference is generated from the API specification and put in a directory called “api-reference”.


Figure 6 – API reference directory

The process of generating the API reference is executed by a simplistic CD pipeline. Every time you merge changes to the master branch of the API specification repository, it will assemble the yaml files into a single Swagger json file and submit it as a pull-request towards the documentation repository. Here’s the simple node.js script that does the transformation:


Figure 7 – merging multiple Swagger files
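As a rough sketch of the same idea (not Flickr’s actual script, which only appears in the figure above), merging a set of Swagger YAML fragments into one JSON document could be done along these lines in Python, assuming PyYAML is installed and the fragments share top-level keys such as paths and definitions:

import glob
import json

import yaml  # PyYAML

merged = {}
for path in sorted(glob.glob('api-spec/*.yaml')):
    with open(path) as f:
        fragment = yaml.safe_load(f) or {}
    for key, value in fragment.items():
        # Merge dict-valued sections (paths, definitions, ...) key by key;
        # scalar sections such as swagger/info take the last value seen.
        if isinstance(value, dict):
            merged.setdefault(key, {}).update(value)
        else:
            merged[key] = value

with open('swagger.json', 'w') as f:
    json.dump(merged, f, indent=2)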

And a snippet from CD pipeline steps that creates the pull-request:


Figure 8 – generate documentation pull-request

The “api-reference” directory also contains the Swagger UI code, which is responsible for rendering the Swagger json file in the browser. It also provides a console that allows you to send requests against a test backend instance, and comes in very handy when a customer wants to quickly explore our APIs. Here’s how the final result looks:


Figure 9 – Swagger UI

Why having SDKs is important

Calling an API is fun. Dealing with failures, re-tries and HTTP connection issues – not so much. That’s where services which have a dedicated SDK really shine. An SDK can either be a thin wrapper around an HTTP client that deals with marshalling of requests and responses, or a fat client that has extra business logic in it. Since this extra business logic is handcrafted most of the time, we will exclude it from the discussion and focus on a thin client instead.

Thin API clients can usually be auto-generated from API specifications. We have a CD pipeline (similar to the documentation CD pipeline) that is responsible for this process. Each SDK is kept in a separate GitHub repository. For each API specification change, all SDKs are regenerated and pushed (as pull-requests) to the appropriate repositories. Take a look at the swagger-codegen project to learn more about SDK generation.

It’s worth mentioning that the thin layer could also be generated at runtime, based on the Swagger json file itself.

API implementation

The major question that comes up when implementing an API is: should we automatically generate the stub code? From our experience, it may be worth it, but most often it’s not. API stub scaffolding saves you some initial work when you add a new API. However, different service owners prefer to structure their code in different ways (packages, class names, how code is divided between REST controllers, etc.), and thus it’s expensive to develop a one-size-fits-all generator.

The last topic we want to cover is validating the implementation against the API specification. Validation happens via tests (written in Cucumber) that are executed with every change to the implementation. We validate the API response schema, different failure scenarios (for valid HTTP status code usage), returned headers, the rate-limiting mechanism, pagination, and others. To maximize code reuse and simplify test code, we use one of the thin SDKs for API calls within the tests.


Figure 10 – Implementation validation

Summary

In this article, we provided a simple, yet comprehensive, overview of the approach to building APIs that we use at Flickr, and examined its key features, including clear separation of the different system components (specification, implementation, documentation, SDKs, tests), developer-friendliness, and automation options. We also presented the workflow that binds all the components together in an easy-to-use, streamlined way.

S3 Lifecycle Management Update – Support for Multipart Uploads and Delete Markers

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/s3-lifecycle-management-update-support-for-multipart-uploads-and-delete-markers/

It is still a bit of a shock to me to realize that Amazon S3 is now ten years old! The intervening decade has simply flown by.
For several years, you have been able to use S3’s Lifecycle Management feature to control the storage class and the lifetime of your objects. As you may know, you can set up rules on a per-bucket or per-prefix basis. Each rule specifies an action to be taken when objects reach a certain age.
Today we are adding two rules that will give you additional control over two special types of objects: incomplete multipart uploads and expired object delete markers. Before we go any further, I should define these objects!
Incomplete Multipart Uploads – S3’s multipart upload feature accelerates the uploading of large objects by allowing you to split them up into logical parts that can be uploaded in parallel.  If you initiate a multipart upload but never finish it, the in-progress upload occupies some storage space and will incur storage charges. However, these uploads are not visible when you list the contents of a bucket and (until today’s release) had to be explicitly removed.
Expired Object Delete Markers – S3’s versioning feature allows you to preserve, retrieve, and restore every version of every object stored in a versioned bucket. When you delete a versioned object, a delete marker is created. If all previous versions of the object subsequently expire, an expired object delete marker is left. These markers do not incur storage charges. However, removing unneeded delete markers can improve the performance of S3’s LIST operation.
New Rules You can now exercise additional control over these objects using some new lifecycle rules, lowering your costs and improving performance in the process. As usual, you can set these up using the AWS Management Console, the S3 APIs, the AWS Command Line Interface (CLI), or the AWS Tools for Windows PowerShell.
Here’s how you set up a rule for incomplete multipart uploads using the Console. Start by opening the console and navigating to the desired bucket (mine is called jbarr):

Then click on Properties, open up the Lifecycle section, and click on Add rule:

Decide on the target (the whole bucket or the prefixed subset of your choice) and then click on Configure Rule:

Then enable the new rule and select the desired expiration period:

As a best practice, we recommend that you enable this setting even if you are not sure that you are actually making use of multipart uploads. Some applications will default to the use of multipart uploads when uploading files above a particular, application-dependent, size.
Here’s how you set up a rule to remove delete markers for expired objects that have no previous versions:
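If you’d rather script both of these rules than click through the console, a boto3 sketch along the following lines sets them up (the bucket name and day count are placeholders, and note that this call replaces any lifecycle configuration already on the bucket):

import boto3

s3 = boto3.client('s3')

s3.put_bucket_lifecycle_configuration(
    Bucket='jbarr',
    LifecycleConfiguration={'Rules': [
        {
            # Abort multipart uploads that are still incomplete after 7 days.
            'ID': 'abort-incomplete-multipart-uploads',
            'Filter': {'Prefix': ''},
            'Status': 'Enabled',
            'AbortIncompleteMultipartUpload': {'DaysAfterInitiation': 7},
        },
        {
            # Remove delete markers whose object versions have all expired.
            'ID': 'remove-expired-object-delete-markers',
            'Filter': {'Prefix': ''},
            'Status': 'Enabled',
            'Expiration': {'ExpiredObjectDeleteMarker': True},
        },
    ]},
)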

S3 Best Practices While you are here, here are some best practices that you should consider using for your own S3-based applications:
Versioning – You can enable Versioning for your S3 buckets in order to be able to recover from accidental overwrites and deletes. With versioning turned on, you can preserve, retrieve, and restore earlier versions of your data.
Replication – Take advantage of S3’s Cross-Region Replication in order to meet your organization’s compliance policies by creating a replica of your data in a second AWS Region.
Performance – If you anticipate a consistently high number of PUT, LIST, DELETE, or GET requests against your buckets, you can optimize your application’s performance by implementing the tips outlined in the performance section of the Amazon S3 documentation.
Cost Management – You can reduce your costs by setting up S3 lifecycle policies that will transition your data to other S3 storage tiers or expire data that is no longer needed. —
Jeff;
 


Using API Gateway mapping templates to handle changes in your back-end APIs

Post Syndicated from Stefano Buliani original https://aws.amazon.com/blogs/compute/using-api-gateway-mapping-templates-to-handle-changes-in-your-back-end-apis/

Maitreya Ranganath Maitreya Ranganath, AWS Solutions Architect
Changes to APIs are always risky, especially if changes are made in ways that are not backward compatible. In this blog post, we show you how to use Amazon API Gateway mapping templates to isolate your API consumers from API changes. This enables your API consumers to migrate to new API versions on their own schedule.
For an example scenario, we start with a very simple Store Front API with one resource for orders and one GET method. For this example, the API target is implemented in AWS Lambda to keep things simple – but you can of course imagine the back end being your own endpoint.
The structure of the API V1 is:
Method: GET
Path: /orders
Query Parameters:
  start = timestamp
  end = timestamp

Response:
[
  {
    "orderId" : string,
    "orderTs" : string,
    "orderAmount" : number
  }
]
The initial version (V1) of the API was implemented when there were few orders per day. The API was not paginated; if the number of orders that match the query is larger than 5, an error is returned. The API consumer must then submit a request with a smaller time range.
The API V1 is exposed through API Gateway and you have several consumers of this API in Production.
After you upgrade the back end, the API developers make a change to support pagination. This makes the API more scalable and allows the API consumers to handle large lists of orders by paging through them with a token. This is a good design change but it breaks backward compatibility. It introduces a challenge because you have a large base of API consumers using V1 and their code can’t handle the changed nesting structure of this response.
The structure of API V2 is:
Method: GET
Path: /orders
Query Parameters:
  start = timestamp
  end = timestamp
  token = string (optional)

Response:
{
  "nextToken" : string,
  "orders" : [
    {
      "orderId" : string,
      "orderTs" : string,
      "orderAmount" : number
    }
  ]
}
Using mapping templates, you can isolate your API consumers from this change: your existing V1 API consumers will not be impacted when you publish V2 of the API in parallel. You want to let your consumers migrate to V2 on their own schedule.
We’ll show you how to do that in this blog post. Let’s get started.
Deploying V1 of the API
To deploy V1 of the API, create a simple Lambda function and expose that through API Gateway:

  1. Sign in to the AWS Lambda console.
  2. Choose Create a Lambda function.
  3. In Step 1: Select blueprint, choose Skip; you’ll enter the details for the Lambda function manually.
  4. In Step 2: Configure function, use the following values:
     • In Name, type getOrders.
     • In Description, type Returns orders for a time-range.
     • In Runtime, choose Node.js.
     • For Code entry type, choose Edit code inline. Copy and paste the code snippet below into the code input box.

var MILISECONDS_DAY = 3600 * 1000 * 24;

exports.handler = function(event, context) {
    console.log('start =', event.start);
    console.log('end =', event.end);

    // Parse the start and end timestamps passed in by the mapping template.
    var start = Date.parse(decodeURIComponent(event.start));
    var end = Date.parse(decodeURIComponent(event.end));

    if (isNaN(start)) {
        context.fail("Invalid parameter 'start'");
    }
    if (isNaN(end)) {
        context.fail("Invalid parameter 'end'");
    }

    var duration = end - start;

    // V1 behavior: reject queries spanning more than 5 days instead of paginating.
    if (duration > 5 * MILISECONDS_DAY) {
        context.fail("Too many results, try your request with a shorter duration");
    }

    var orderList = [];
    var count = 0;

    // Generate one synthetic order per day in the requested range.
    for (var d = start; d < end; d += MILISECONDS_DAY) {
        var order = {
            "orderId" : "order-" + count,
            "orderTs" : (new Date(d).toISOString()),
            "orderAmount" : Math.round(Math.random() * 100.0)
        };
        count += 1;
        orderList.push(order);
    }

    console.log('Generated', count, 'orders');
    context.succeed(orderList);
};

In Handler, leave the default value of index.handler.
In Role, choose Basic execution role or choose an existing role if you’ve created one for Lambda before.
In Advanced settings, leave the default values and choose Next.


Finally, review the settings in the next page and choose Create function.
Your Lambda function is now created. You can test it by sending a test event. Enter the following for your test event:
{
"start": "2015-10-01T00:00:00Z",
"end": "2015-10-04T00:00:00Z"
}
Check the execution result and log output to see the results of your test.

Next, choose the API endpoints tab and then choose Add API endpoint. In Add API endpoint, use the following values:

In API endpoint type, choose API Gateway
In API name, type StoreFront
In Resource name, type /orders
In Method, choose GET
In Deployment stage, use the default value of prod
In Security, choose Open to allow the API to be publicly accessed
Choose Submit to create the API


The API is created and the API endpoint URL is displayed for the Lambda function.
Next, switch to the API Gateway console and verify that the new API appears on the list of APIs. Choose StoreFront to view its details.
To view the method execution details, in the Resources pane, choose GET. Choose Integration Request to edit the method properties.

On the Integration Request details page, expand the Mapping Templates section and choose Add mapping template. In Content-Type, type application/json and choose the check mark to accept.

Choose the edit icon to the right of Input passthrough. From the drop down, choose Mapping template and copy and paste the mapping template text below into the Template input box. Choose the check mark to create the template.
{
#set($queryMap = $input.params().querystring)

#foreach( $key in $queryMap.keySet())
"$key" : "$queryMap.get($key)"
#if($foreach.hasNext),#end
#end
}
This step is needed because the Lambda function requires its input as a JSON document. The mapping template takes the query string parameters from the GET request and creates a JSON input document. Mapping templates use Apache Velocity, expose a number of utility functions, and give you access to all of the incoming request’s data and context parameters. You can learn more from the mapping template reference page.
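For example, a GET request with the query string ?start=2015-10-01T00:00:00Z&end=2015-10-04T00:00:00Z would be turned into roughly the following event document for the Lambda function (the values may arrive URL-encoded, which is why the function calls decodeURIComponent):
{
"start" : "2015-10-01T00:00:00Z",
"end" : "2015-10-04T00:00:00Z"
}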
Back to the GET method configuration page, in the left pane, choose the GET method and then open the Method Request settings. Expand the URL Query String Parameters section and choose Add query string. In Name, type start and choose the check mark to accept. Repeat the process to create a second parameter named end.
From the GET method configuration page, in the top left, choose Test to test your API. Type the following values for the query string parameters and then choose Test:

In start, type 2015-10-01T00:00:00Z
In end, type 2015-10-04T00:00:00Z

Verify that the response status is 200 and the response body contains a JSON response with 3 orders.
Now that your test is successful, you can deploy your changes to the production stage. In the Resources pane, choose Deploy API. In Deployment stage, choose prod. In Deployment description, type a description of the deployment, and then choose Deploy.
The prod Stage Editor page appears, displaying the Invoke URL. In the CloudWatch Settings section, choose Enable CloudWatch Logs so you can see logs and metrics from this stage. Keep in mind that CloudWatch logs are charged to your account separately from API Gateway.
You have now deployed an API that is backed by V1 of the Lambda function.
Testing V1 of the API
Now you’ll test V1 of the API with curl and confirm its behavior. First, copy the Invoke URL and add the query parameters ?start=2015-10-01T00:00:00Z&end=2015-10-04T00:00:00Z and make a GET invocation using curl.
$ curl -s "https://your-invoke-url-and-path/orders?start=2015-10-01T00:00:00Z&end=2015-10-04T00:00:00Z"

[
{
"orderId": "order-0",
"orderTs": "2015-10-01T00:00:00.000Z",
"orderAmount": 82
},
{
"orderId": "order-1",
"orderTs": "2015-10-02T00:00:00.000Z",
"orderAmount": 3
},
{
"orderId": "order-2",
"orderTs": "2015-10-03T00:00:00.000Z",
"orderAmount": 75
}
]
This should output a JSON response with 3 orders. Next, check what happens if you use a longer time-range by changing the end timestamp to 2015-10-15T00:00:00Z:
$ curl -s "https://your-invoke-url-and-path/orders?start=2015-10-01T00:00:00Z&end=2015-10-15T00:00:00Z"

{
"errorMessage": "Too many results, try your request with a shorter duration"
}
You see that the API returns an error indicating the time range is too long. This is correct V1 API behavior, so you are all set.
Updating the Lambda Function to V2
Next, you will update the Lambda function code to V2. This simulates the scenario of the back end of your API changing in a manner that is not backward compatible.
Switch to the Lambda console and choose the getOrders function. In the code input box, copy and paste the code snippet below. Be sure to replace all of the existing V1 code with V2 code.
var MILISECONDS_DAY = 3600 * 1000 * 24;

exports.handler = function(event, context) {
    console.log('start =', event.start);
    console.log('end =', event.end);

    var start = Date.parse(decodeURIComponent(event.start));
    var end = Date.parse(decodeURIComponent(event.end));

    // The optional pagination token is a base64-encoded timestamp
    var token = NaN;
    if (event.token) {
        var s = new Buffer(event.token, 'base64').toString();
        token = Date.parse(s);
    }

    if (isNaN(start)) {
        context.fail("Invalid parameter 'start'");
        return;
    }
    if (isNaN(end)) {
        context.fail("Invalid parameter 'end'");
        return;
    }
    // A valid token overrides the start of the requested range
    if (!isNaN(token)) {
        start = token;
    }

    var duration = end - start;

    if (duration <= 0) {
        context.fail("Invalid parameters 'end' must be greater than 'start'");
        return;
    }

    var orderList = [];
    var count = 0;

    console.log('start=', start, ' end=', end);

    // Return at most 5 orders per page
    for (var d = start; d < end && count < 5; d += MILISECONDS_DAY) {
        var order = {
            "orderId" : "order-" + count,
            "orderTs" : (new Date(d).toISOString()),
            "orderAmount" : Math.round(Math.random() * 100.0)
        };
        count += 1;
        orderList.push(order);
    }

    // If there are more days left in the range, return a token pointing at the next page
    var nextToken = null;
    if (d < end) {
        nextToken = new Buffer(new Date(d).toISOString()).toString('base64');
    }

    console.log('Generated', count, 'orders');

    var result = {
        orders : orderList
    };

    if (nextToken) {
        result.nextToken = nextToken;
    }
    context.succeed(result);
};
Choose Save to save V2 of the code. Then choose Test. Note that the output structure is different in V2 and there is a second level of nesting in the JSON document. This represents the updated V2 output structure that is different from V1.
Next, repeat the curl tests from the previous section. First, do a request for a short time duration. Notice that the response structure is nested differently from V1 and this is a problem for our API consumers that expect V1 responses.
$ curl -s "https://your-invoke-url-and-path/orders?start=2015-10-01T00:00:00Z&end=2015-10-04T00:00:00Z"

{
"orders": [
{
"orderId": "order-0",
"orderTs": "2015-10-01T00:00:00.000Z",
"orderAmount": 8
},
{
"orderId": "order-1",
"orderTs": "2015-10-02T00:00:00.000Z",
"orderAmount": 92
},
{
"orderId": "order-2",
"orderTs": "2015-10-03T00:00:00.000Z",
"orderAmount": 84
}
]
}
Now, repeat the request for a longer time range and you’ll see that instead of an error message, you now get the first page of information with 5 orders and a nextToken that will let you request the next page. This is the paginated behavior of V2 of the API.
$ curl -s "https://your-invoke-url-and-path/orders?start=2015-10-01T00:00:00Z&end=2015-10-15T00:00:00Z"

{
"orders": [
{
"orderId": "order-0",
"orderTs": "2015-10-01T00:00:00.000Z",
"orderAmount": 62
},
{
"orderId": "order-1",
"orderTs": "2015-10-02T00:00:00.000Z",
"orderAmount": 59
},
{
"orderId": "order-2",
"orderTs": "2015-10-03T00:00:00.000Z",
"orderAmount": 21
},
{
"orderId": "order-3",
"orderTs": "2015-10-04T00:00:00.000Z",
"orderAmount": 95
},
{
"orderId": "order-4",
"orderTs": "2015-10-05T00:00:00.000Z",
"orderAmount": 84
}
],
"nextToken": "MjAxNS0xMC0wNlQwMDowMDowMC4wMDBa"
}
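A V2-aware consumer pages through the full result set by passing the returned nextToken back as the token query parameter until no token is returned. The shell sketch below illustrates that loop; it assumes the jq JSON processor is installed, uses a placeholder invoke URL, and also assumes that token has been declared as a query string parameter on the V2 method request so the mapping template forwards it to the Lambda function.
URL="https://your-invoke-url-and-path/orders"
QS="start=2015-10-01T00:00:00Z&end=2015-10-15T00:00:00Z"
TOKEN=""
while true; do
    if [ -n "$TOKEN" ]; then
        RESPONSE=$(curl -s "$URL?$QS&token=$TOKEN")
    else
        RESPONSE=$(curl -s "$URL?$QS")
    fi
    echo "$RESPONSE" | jq -r '.orders[].orderId'          # process this page of orders
    TOKEN=$(echo "$RESPONSE" | jq -r '.nextToken // empty')
    [ -z "$TOKEN" ] && break                               # no nextToken means this was the last page
done
A production consumer should also URL-encode the token, because base64 strings can contain characters such as + and / that are not safe in a query string.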
It is clear from these tests that V2 will break the current V1 consumer’s code. Next, we show how to isolate your V1 consumers from this change using API Gateway mapping templates.
Cloning the API
Because you want both V1 and V2 of the API to be available simultaneously to your API consumers, you first clone the API to create a V2 API. You then modify the V1 API to make it behave as your V1 consumers expect.
Go back to the API Gateway console, and choose Create API. Configure the new API with the following values:

In API name, type StoreFrontV2
In Clone from API, choose StoreFront
In Description, type a description
Choose Create API to clone the StoreFront API as StoreFrontV2

Open the StoreFrontV2 API and choose the GET method of the /orders resource. Next, choose Integration Request. Choose the edit icon next to the getOrders Lambda function name.
Keep the name as getOrders and choose the check mark to accept. In the pop up, choose OK to allow the StoreFrontV2 to invoke the Lambda function.
Once you have granted API Gateway permissions to access your Lambda function, choose Deploy API. In Deployment stage, choose New stage. In Stage name, type prod, and then choose Deploy. Now you have a new StoreFrontV2 API that invokes the same Lambda function. Confirm that the API has V2 behavior by testing it with curl. Use the Invoke URL for the StoreFrontV2 API instead of the previously used Invoke URL.
Update the V1 of the API
Now you will use mapping templates to update the original StoreFront API to preserve V1 behavior. This enables existing consumers to continue to consume the API without having to make any changes to their code.
Navigate to the API Gateway console, choose the StoreFront API and open the GET method of the /orders resource. On the Method Execution details page, choose Integration Response.
Expand the default response mapping (HTTP status 200), and expand the Mapping Templates section. Choose Add Mapping Template.
In Content-type, type application/json and choose the check mark to accept. Choose the edit icon next to Output passthrough to edit the mapping templates. Select Mapping template from the drop down and copy and paste the mapping template below into the Template input box.
#set($nextToken = $input.path('$.nextToken'))

#if($nextToken && $nextToken.length() != 0)
{
"errorMessage" : "Too many results, try your request with a shorter duration"
}
#else
$input.json('$.orders[*]')
#end
Choose the check mark to accept and save. The mapping template transforms the V2 output from the Lambda function into the original V1 response. The mapping template also generates an error if the V2 response indicates that there are more results than can fit in one page. This emulates V1 behavior.
Finally, choose Save on the response mapping page. Deploy your StoreFront API and choose prod as the stage to deploy your changes.
Verify V1 behavior
Now that you have updated the original API to emulate V1 behavior, you can verify that using curl again. You will essentially repeat the tests from the earlier section. First, confirm that you have the Invoke URL for the original StoreFront API. You can always find the Invoke URL by looking at the stage details for the API.
Try a test with a short time range.
$ curl -s "https://your-invoke-url-and-path/orders?start=2015-10-01T00:00:00Z&end=2015-10-04T00:00:00Z"

[
{
"orderId": "order-0",
"orderTs": "2015-10-01T00:00:00.000Z",
"orderAmount": 50
},
{
"orderId": "order-1",
"orderTs": "2015-10-02T00:00:00.000Z",
"orderAmount": 16
},
{
"orderId": "order-2",
"orderTs": "2015-10-03T00:00:00.000Z",
"orderAmount": 14
}
]
Try a test with a longer time range and note that the V1 behavior of returning an error is recovered.
$ curl -s "https://your-invoke-url-and-path/orders?start=2015-10-01T00:00:00Z&end=2015-10-15T00:00:00Z"

{
"errorMessage": "Too many results, try your request with a shorter duration"
}
Congratulations, you have successfully used Amazon API Gateway mapping templates to expose both V1 and V2 versions of your API, allowing your API consumers to migrate to V2 on their own schedule.
Be sure to delete the two APIs and the AWS Lambda function that you created for this walkthrough to avoid being charged for their use.
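If you prefer to clean up from the AWS CLI, a rough sketch looks like the following; the API IDs are placeholders that you can look up with aws apigateway get-rest-apis.
aws apigateway delete-rest-api --rest-api-id <storefront-api-id>
aws apigateway delete-rest-api --rest-api-id <storefrontv2-api-id>
aws lambda delete-function --function-name getOrders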

Agile Analytics with Amazon Redshift

Post Syndicated from Nick Corbett original https://blogs.aws.amazon.com/bigdata/post/Tx3HHDIXCZFDGTN/Agile-Analytics-with-Amazon-Redshift

Nick Corbett is a Big Data Consultant for AWS Professional Services

What makes outstanding business intelligence (BI)? It needs to be accurate and up-to-date, but this alone won’t differentiate a solution. Perhaps a better measure is to consider the reaction you get when your latest report or metric is released to the business. Good BI excites:  it prompts new ways of thinking and new ideas that, more often than not, require change to support. Businesses are constantly looking to evolve, to use their BI to gain insight and competitive advantage. Truly outstanding BI is agile enough to keep pace with this demand.

In this post, I show you how your Amazon Redshift data warehouse can be agile. To do this, you need to adopt a continuous delivery (CD) approach that draws on many of the tools and techniques already successfully used by software engineers. CD focuses on automating the release process, allowing quick and frequent deployments to production whilst ensuring quality is maintained. In return, you will enjoy many benefits; for example, you can identify issues quickly, and those that you do find are likely to be less complex.  By shortening your release cycle, you can respond to requests from your business more quickly.

A simple CD process, or pipeline, is shown below.  This uses a combination of AWS fully-managed services and the extensible, open-source, continuous integration server Jenkins. You can follow the instructions at the AWS Big Data Blog repository on GitHub to build your own sample environment and learn how to configure these components. The repository includes a CloudFormation script to set up and configure a Jenkins server in a virtual private cloud (VPC) and instructions for setting up AWS CodeCommit and AWS CodePipeline.  Note that starting this environment incurs a charge in your account, although all the AWS services used are eligible for the AWS Free Tier.

Build a Deployment Package

One of the key assumptions of CD is that no one directly interacts with the production system; all deployments and updates are fully automated. The logical conclusion of this assumption is that everything required to build, maintain, update and test your data warehouse should be scripted and under source control. The sample environment uses AWS CodeCommit as a source control repository. AWS CodeCommit is a fully-managed source control service that makes it easy for companies to host secure and highly scalable private Git repositories.

The first stage of your CD pipeline is to build a deployment package. This is a snapshot of a branch of your source code repository and is immutable. After it’s built, the same package is used for all your testing and finally deployed to your production system.

Each time you build a new package you should aim to answer one question: can I deploy this to my production system? You can answer ‘yes’ when you are satisfied that the new features in the deployment work as expected and that they don’t break anything that’s already part of your production system. The faster you can answer this question, the better. Testing each package costs you, both in infrastructure costs and people resources if manual testing is needed. The more testing you do on a package, the more expensive it becomes. You should look to build a process that fails fast by performing simple, automated tests first and only spend time manually testing builds that you know are of good quality.

The environment uses a Jenkins job to build the deployment package. Jenkins is configured to scan the repository in AWS CodeCommit periodically and trigger a job when changes are detected. The sample pipeline scans the master branch, but you can configure Jenkins to scan any branch, perhaps to test new features as they are being developed. The sample job simply zips all the files in the repository before uploading to an Amazon S3 bucket that has versioning enabled. With a fail-fast methodology, you could use this opportunity to verify that all files in the package conform to your coding standards; for example, do they have the correct header and have naming standards been followed? At this point, you don’t have an Amazon Redshift database to test things in, but you can do quick and easy tests on the files, such as using grep commands to check the contents.
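As a rough illustration, a fail-fast check of this kind can be a few lines of shell in the Jenkins job; the header text used here is an example standard, not part of the sample project.
#!/bin/bash
# Fail the build if any SQL file in the package is missing the expected header comment
for f in $(find . -name '*.sql'); do
    grep -q -- '-- Project: StoreDW' "$f" || { echo "Missing header: $f"; exit 1; }
done
exit 0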

Test the deployment package

The remaining stages in the pipeline are managed by AWS CodePipeline, a continuous delivery service for fast and reliable updates. AWS CodePipeline builds, tests, and deploys your code every time there is a change, based on the release process models that you define. This enables you to deliver features and updates rapidly and reliably.

The CD pipeline shown above uses AWS CodePipeline to run tests in two environments before deploying the package to the production system. The tests have been split into a set of functional tests, run against a small set of data, and non-functional tests run against a production-sized Amazon Redshift cluster. You may choose to organize your CD pipeline differently, but your process will probably involve running one or more test stages followed by a deployment to production. Each stage of the pipeline is implemented by a Jenkins job that carries out four main tasks.

1. Build a test environment

To run tests, you need an Amazon Redshift database. The sample job uses the AWS CloudFormation Plugin to create a VPC with a public facing, single-node Amazon Redshift database. The configuration of the plugin is shown below:

As you can see from the screenshot, the job is configured to build the CloudFormation template found in ./cloudformation/redshiftvpc.json within your deployment package. The definition of your test environment is held in the same source control as the other project assets.

The sample job creates an empty Amazon Redshift data warehouse and creates tables using SQL scripts. However, you can use the SnapshotIdentifier property of the Amazon Redshift object in the CloudFormation template to create a data warehouse from a point-in-time backup of a cluster. You may have a pre-prepared snapshot that has the same structure as your production system but contains test data. Alternatively, you might first use the AWS Command Line Interface (CLI) to call describe-cluster-snapshots and find the ID of the latest production snapshot. Creating a test environment that mirrors production is a good choice for non-functional testing, especially performance tests.
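For example, the following CLI call returns the identifier of the most recent snapshot of the production cluster (the cluster name is a placeholder):
aws redshift describe-cluster-snapshots \
    --cluster-identifier my-prod-cluster \
    --query 'sort_by(Snapshots, &SnapshotCreateTime)[-1].SnapshotIdentifier' \
    --output text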

Any outputs that are returned by your CloudFormation stack are automatically added as variables in your build environment and can be used in subsequent steps.  The variables are named <stack name>_<output name>.  In the sample job, you can see that the endpoint of the new Amazon Redshift database is used by subsequent steps.

2. Apply the update in the deployment package

The sample job takes the latest folder in the ./updates folder of the deployment package and runs all the .sql files against the data warehouse.  This assumes that you’ve asked your developers to provide scripts for all the changes they want to make to the system. If any of the .sql files fails to run, then the overall Jenkins job fails. If this occurs, you should ask your development team to fix the issue and restart the pipeline with a new deployment package.
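A minimal sketch of that update step, assuming psql is installed on the Jenkins agent and the connection details (REDSHIFT_ENDPOINT, DB_USER, DB_NAME, and PGPASSWORD) are placeholders supplied by the job, might look like this:
#!/bin/bash
# Apply every script in the most recent ./updates folder, stopping at the first failure
LATEST=$(ls -d ./updates/*/ | sort | tail -n 1)
for f in "$LATEST"*.sql; do
    psql -h "$REDSHIFT_ENDPOINT" -p 5439 -U "$DB_USER" -d "$DB_NAME" \
         -v ON_ERROR_STOP=1 -f "$f" || exit 1    # a failed script fails the Jenkins job
done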

3. Run the tests

You now have a populated Amazon Redshift data warehouse that you’ve successfully updated with the deployment package. You should run a series of tests to ensure that the new features work as expected and that the deployment hasn’t broken any existing functionality.

For example, imagine that your data warehouse has a customer table with a name column and you’ve asked your developers to refactor this to first_name and last_name. You would expect your developers to provide an ALTER TABLE  statement to change the definition of your customer table and an UPDATE statement to populate the new columns from the existing name column.

After the update scripts have run, the most basic check would be to confirm that the customer table contains first_name and last_name columns. However, you should also test the script that populates these new columns. What happens if the name column contains three words, such as John Robert Smith? What if it contains initials or suffixes, such as JJ Smith Esq? What if it is null, or contains a single very long name that might be truncated due to its length? You should also look at how your customer names are used. Perhaps you have a view used by a report that shows your best customers – does the name of each customer still appear correctly now that it is a concatenation of two fields?

Over time, you can build a library of tests that are run whenever you have a new deployment package. This ensures that you not only test new features but don’t regress in any other areas.

You may also choose to run non-functional tests. For example, is the query plan for the view that drives your best customer report still the same? Does the query still execute in the same amount of time? Is access to use the view restricted to the correct groups?

The sample job contains the code to run a set of SQL statements and check the result against what is expected. A simplified version of that script is shown below (the connection details are placeholders that the Jenkins job would supply):

#!/bin/bash
# Run every test query in ./tests and compare its output with the matching .result file
for sql_file in ./tests/*.sql; do
    result=$(psql -h "$REDSHIFT_ENDPOINT" -p 5439 -U "$DB_USER" -d "$DB_NAME" -t -A -f "$sql_file")
    expected=$(cat "${sql_file}.result")
    # Compare after stripping all whitespace and ignoring case
    if [ "$(echo "$result" | tr -d '[:space:]' | tr '[:upper:]' '[:lower:]')" != \
         "$(echo "$expected" | tr -d '[:space:]' | tr '[:upper:]' '[:lower:]')" ]; then
        echo "Test failed: $sql_file"
        exit 1
    fi
done
exit 0

The test loops over all .sql files in the ./tests folder and executes each one. It then compares the result of the query with the contents of the .result file with the same name. For example, if message_test1.sql is executed, the value returned is compared with the contents of message_test1.sql.result. This comparison is made after removing all whitespace. If the result is not what was expected, the Jenkins job ends in failure. You can adapt this bash script to include performance statistics by monitoring the execution duration.
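For the customer name refactoring described earlier, a hypothetical test pair might be a customer_name_split_test.sql file containing a query that should return 0 when every row has been populated correctly:
SELECT count(*) FROM customer WHERE first_name IS NULL OR first_name = '';
and a customer_name_split_test.sql.result file containing the single expected value:
0
The file names, table, and column checks here are illustrative; your own tests will depend on the change being deployed.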

At the end of your testing, you can decide whether the deployment meets your quality bar. It may be that you still need to do manual testing before promoting to production, but this should only be performed after all automated tests have been passed. You may want to avoid involving your test team and the extra expense unless there’s a high chance that you will take this deployment forward to production.

4. Delete your test environment

After you have finished your testing, you can delete your test environment.  This can be done by your Jenkins server, meaning that the entire process of creating an environment, running the tests, and deleting the environment can be automated. You can save money with a transient environment that’s created only when you need to run tests and then immediately deleted. If you do need to perform manual testing, you can configure your Jenkins job to not delete the cluster.
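Deleting the environment from the Jenkins job can be as simple as removing the CloudFormation stack that created it; the stack name below is a placeholder.
aws cloudformation delete-stack --stack-name redshift-test-env
aws cloudformation wait stack-delete-complete --stack-name redshift-test-env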

Deploy to production

You are now ready to deploy the changes to your production environment. Because the same set of update scripts in the same deployment package have already been successfully run and tested in pre-production, you can be confident that this change will be easy to roll out. In AWS CodePipeline, the transition between the PreProduction and Production stages has been disabled:

When you re-enable a disabled transition, the latest revision runs through the remaining stages of the pipeline. At the end of the automated process, the decision to release to production is manual: it can be initiated by your release manager using either the AWS Management Console or the API.
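If the release manager prefers the API, re-enabling the inbound transition to the Production stage is a single CLI call; the pipeline and stage names are placeholders.
aws codepipeline enable-stage-transition \
    --pipeline-name redshift-cd-pipeline \
    --stage-name Production \
    --transition-type Inbound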

Post-deployment tasks

After you’ve deployed to production, you may want to have a final stage in your pipeline that sets up for the next round of testing. If you are using an environment containing test data for your functional testing, then a new snapshot is required for your next release. You could also implement some of the Top 10 Performance Tuning Tasks to make sure that your Amazon Redshift data warehouse is in good health following the latest update.
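As a sketch, taking a fresh snapshot of the functional test cluster for the next release cycle is one CLI call; the cluster and snapshot names are placeholders.
aws redshift create-cluster-snapshot \
    --cluster-identifier functional-test-cluster \
    --snapshot-identifier functional-test-data-$(date +%Y%m%d)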

Summary

In this post, I have shown you how to build a CD pipeline for your Amazon Redshift data warehouse. The pipeline presented here is fairly simple but hopefully it’s easy to see how the concept can be expanded with more stages or more sophisticated testing.

However, remember that a simple pipeline is better than no pipeline at all. You should take a holistic view:  your ‘system’ is the Amazon Redshift data warehouse and the CD pipeline needed to maintain it. Start simple, iterate with each development sprint, and build complexity as you go. Work towards an agile data warehouse that can keep pace with your business, leading change rather than reacting to it. 

If you have any questions or suggestions, please leave a comment below.

————————–

Related:

Top 10 Performance Tuning Techniques for Amazon Redshift

 

 


How to Version D-Bus Interfaces Properly and Why Using / as Service Entry Point Sucks

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/projects/versioning-dbus.html

So you are designing a D-Bus interface and want to make it future-proof. Of
course, you thought about versioning your stuff. But you wonder how to do that
best. Here are a few things I learned about versioning D-Bus APIs which might
be of general interest:

Version your interfaces! This one is pretty obvious. No explanation
needed. Simply include the interface version in the interface name as suffix.
i.e. the initial release should use org.foobar.AwesomeStuff1, and if
you do changes you should introduce org.foobar.AwesomeStuff2, and so
on, possibly dropping the old interface.

When should you bump the interface version? Generally, I’d recommend only
bumping when doing incompatible changes, such as function call signature
changes. This of course requires clients to handle the
org.freedesktop.DBus.Error.UnknownMethod error properly for each function
you add to an existing interface. That said, in a few cases it might make sense
to bump the interface version even without breaking compatibility of the calls.
(e.g. in case you add something to an interface that is not directly visible in
the introspection data)

Version your services! This one is almost as obvious. When you
completely rework your D-Bus API introducing a new service name might be a
good idea. Best way to do this is by simply bumping the service name. Hence,
call your service org.foobar.AwesomeService1 right from the beginning
and then bump the version if you reinvent the wheel. And don’t forget that you
can acquire more than one well-known service name on the bus, so even if you
rework everything you can keep compatibility. (Example: BlueZ 3 to BlueZ 4 switch)

Version your ‘entry point’ object paths! This one is far from
obvious. The reasons why object paths should be versioned are purely technical,
not philosophical: for signals sent from a service, D-Bus replaces the
originating service name with the sender’s unique name (e.g. :1.42), even if you
fill in a well-known name (e.g. org.foobar.AwesomeService1). Now,
let’s say your application registers two well-known service names, say
two versions of the same service, versioned as described above. And you have
two objects — one on each of the two service names — that implement a generic
interface and share the same object path: the client then has no way
to figure out which service name a signal sent from that object path
belongs to. That’s why you should make sure to use versioned, and hence
different, paths for the two objects: start with
/org/foobar/AwesomeStuff1 and then bump to
/org/foobar/AwesomeStuff2, and so on. (Also see David’s comments about this.)
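
A self-contained dbus-python sketch of that idea (hypothetical Changed signal, names as above): each generation gets its own object under a versioned entry-point path, so the path of a signal alone tells clients which generation sent it, even though D-Bus reports only the unique name as the sender.

    import dbus
    import dbus.service
    from dbus.mainloop.glib import DBusGMainLoop
    from gi.repository import GLib

    DBusGMainLoop(set_as_default=True)
    bus = dbus.SessionBus()
    name_v1 = dbus.service.BusName("org.foobar.AwesomeService1", bus=bus)
    name_v2 = dbus.service.BusName("org.foobar.AwesomeService2", bus=bus)

    class AwesomeStuff1(dbus.service.Object):
        @dbus.service.signal(dbus_interface="org.foobar.AwesomeStuff1", signature="s")
        def Changed(self, what):
            pass  # the decorator emits the signal when this method is called

    class AwesomeStuff2(dbus.service.Object):
        @dbus.service.signal(dbus_interface="org.foobar.AwesomeStuff2", signature="s")
        def Changed(self, what):
            pass

    # One object per generation, each under its own versioned entry-point path.
    AwesomeStuff1(name_v1, "/org/foobar/AwesomeStuff1")
    AwesomeStuff2(name_v2, "/org/foobar/AwesomeStuff2")
    GLib.MainLoop().run()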

When should you bump the object path version? Probably only when you bump the service name it belongs to. What matters is versioning the ‘entry point’ object path; objects below it don’t need explicit versioning.

In summary: For good D-Bus API design you should version all three: D-Bus interfaces, service names and ‘entry point’ object paths.

And don’t forget: nobody gets API design right the first time. So even if
you think your D-Bus API is perfect: version things right from the beginning
because later on it might turn out you were not quite as bright as you thought
you were.

A corollary of the reasoning behind versioning object paths described above is that using / as the entry point object path for your service is a bad idea: it makes it very hard to implement more than one service, or more than one version of a service, on a single D-Bus connection. Again: don’t use / as the entry point object path. Use something like /org/foobar/AwesomeStuff!
