Tag Archives: AWS CloudFormation

New – Import Existing Resources into a CloudFormation Stack

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-import-existing-resources-into-a-cloudformation-stack/

With AWS CloudFormation, you can model your entire infrastructure with text files. In this way, you can treat your infrastructure as code and apply software development best practices, such as putting it under version control, or reviewing architectural changes with your team before deployment.

Sometimes AWS resources initially created using the console or the AWS Command Line Interface (CLI) need to be managed using CloudFormation. For example, you (or a different team) may create an IAM role, a Virtual Private Cloud, or an RDS database in the early stages of a migration, and then you have to spend time to include them in the same stack as the final application. In such cases, you often end up recreating the resources from scratch using CloudFormation, and then migrating configuration and data from the original resource.

To make these steps easier for our customers, you can now import existing resources into a CloudFormation stack!

It was already possible to remove resources from a stack without deleting them by setting the DeletionPolicy to Retain. This, together with the new import operation, enables a new range of possibilities. For example, you are now able to:

  • Create a new stack importing existing resources.
  • Import existing resources in an already created stack.
  • Migrate resources across stacks.
  • Remediate a detected drift.
  • Refactor nested stacks by deleting child stacks from one parent and then importing them into another parent stack.

To import existing resources into a CloudFormation stack, you need to provide:

  • A template that describes the entire stack, including both the resources to import and (for existing stacks) the resources that are already part of the stack.
  • A DeletionPolicy attribute in the template for each resource to import. This makes it possible to revert the operation easily and in a completely safe manner.
  • A unique identifier for each target resource, for example the name of the Amazon DynamoDB table or of the Amazon Simple Storage Service (S3) bucket you want to import.

During the resource import operation, CloudFormation checks that:

  • The imported resources do not already belong to another stack in the same region (be careful with global resources such as IAM roles).
  • The target resources exist and you have sufficient permissions to perform the operation.
  • The properties and configuration values are valid against the resource type schema, which defines its required and acceptable properties and supported values.

The resource import operation does not check that the template configuration and the actual configuration are the same. Since the import operation supports the same resource types as drift detection, I recommend running drift detection after importing resources in a stack.

Importing Existing Resources into a New Stack
In my AWS account, I have an S3 bucket and a DynamoDB table, both with some data inside, and I’d like to manage them using CloudFormation. In the CloudFormation console, I have two new options:

  • I can create a new stack importing existing resources.

  • I can import resources into an existing stack.

In this case, I want to start from scratch, so I create a new stack. The next step is to provide a template with the resources to import.

I upload the following template with two resources to import: a DynamoDB table and an S3 bucket.

AWSTemplateFormatVersion: "2010-09-09"
Description: Import test
Resources:

  ImportedTable:
    Type: AWS::DynamoDB::Table
    DeletionPolicy: Retain
    Properties: 
      BillingMode: PAY_PER_REQUEST
      AttributeDefinitions: 
        - AttributeName: id
          AttributeType: S
      KeySchema: 
        - AttributeName: id
          KeyType: HASH

  ImportedBucket:
    Type: AWS::S3::Bucket
    DeletionPolicy: Retain

In this template, I am setting DeletionPolicy to Retain for both resources. In this way, if I remove them from the stack, they will not be deleted. This is a good option for resources that contain data you don’t want to delete by mistake, or that you may want to move to a different stack in the future. It is mandatory for imported resources to have a deletion policy set, so you can safely and easily revert the operation and be protected from mistakenly deleting resources that were imported by someone else.

I now have to provide an identifier to map the logical IDs in the template with the existing resources. In this case, I use the DynamoDB table name and the S3 bucket name. For other resource types, there may be multiple ways to identify them and you can select which property to use in the drop-down menus.

In the final recap, I review changes before applying them. Here I check that I’m targeting the right resources to import with the right identifiers. This is actually a CloudFormation Change Set that will be executed when I import the resources.
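If you use the AWS CLI instead of the console, the same flow is expressed as a change set of type IMPORT. The following is a sketch that reuses the stack and logical resource names from this walkthrough; the DynamoDB table name is a placeholder you would replace with your own:

$ aws cloudformation create-change-set \
    --stack-name imported-stack \
    --change-set-name import-existing-resources \
    --change-set-type IMPORT \
    --template-body file://template.yaml \
    --resources-to-import '[
      {"ResourceType": "AWS::DynamoDB::Table", "LogicalResourceId": "ImportedTable", "ResourceIdentifier": {"TableName": "my-existing-table"}},
      {"ResourceType": "AWS::S3::Bucket", "LogicalResourceId": "ImportedBucket", "ResourceIdentifier": {"BucketName": "danilop-toimport"}}
    ]'

# After reviewing the change set, execute it to perform the import
$ aws cloudformation execute-change-set \
    --stack-name imported-stack \
    --change-set-name import-existing-resources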

When importing resources into an existing stack, no changes are allowed to the existing resources of the stack. The import operation will only allow the Change Set action of Import. Changes to parameters are allowed as long as they don’t cause changes to resolved values of properties in existing resources. You can change the template for existing resources to replace hard coded values with a Ref to a resource being imported. For example, you may have a stack with an EC2 instance using an existing IAM role that was created using the console. You can now import the IAM role into the stack and replace in the template the hard coded value used by the EC2 instance with a Ref to the role.

Moving on, each resource has its corresponding import events in the CloudFormation console.

When the import is complete, in the Resources tab, I see that the S3 bucket and the DynamoDB table are now part of the stack.

To be sure the imported resources are in sync with the stack template, I use drift detection.
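From the AWS CLI, a quick way to do that (a sketch, using the stack name from this walkthrough) is:

$ aws cloudformation detect-stack-drift --stack-name imported-stack
# The command returns a StackDriftDetectionId; use it to check the detection status
$ aws cloudformation describe-stack-drift-detection-status --stack-drift-detection-id <detection-id>
# List the drift status of each resource in the stack
$ aws cloudformation describe-stack-resource-drifts --stack-name imported-stack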

All stack-level tags, including automatically created tags, are propagated to resources that CloudFormation supports. For example, I can use the AWS CLI to get the tag set associated with the S3 bucket I just imported into my stack. Those tags give me the CloudFormation stack name and ID, and the logical ID of the resource in the stack template:

$ aws s3api get-bucket-tagging --bucket danilop-toimport

{
  "TagSet": [
    {
      "Key": "aws:cloudformation:stack-name",
      "Value": "imported-stack"
    },
    {
      "Key": "aws:cloudformation:stack-id",
      "Value": "arn:aws:cloudformation:eu-west-1:123412341234:stack/imported-stack/..."
    },
    {
      "Key": "aws:cloudformation:logical-id",
      "Value": "ImportedBucket"
    }
  ]
}

Available Now
You can use the new CloudFormation import operation via the console, AWS Command Line Interface (CLI), or AWS SDKs, in the following regions: US East (Ohio), US East (N. Virginia), US West (N. California), US West (Oregon), Canada (Central), Asia Pacific (Mumbai), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), EU (Frankfurt), EU (Ireland), EU (London), EU (Paris), and South America (São Paulo).

It is now simpler to manage your infrastructure as code. You can learn more about bringing existing resources into CloudFormation management in the documentation.

Danilo

ICYMI: Serverless Q3 2019

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/icymi-serverless-q3-2019/

This post is courtesy of Julian Wood, Senior Developer Advocate – AWS Serverless

Welcome to the seventh edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all of the most recent product launches, feature enhancements, blog posts, webinars, Twitch live streams, and other interesting things that you might have missed!

In case you missed our last ICYMI, check out what happened last quarter here.


Launches/New products

Amazon EventBridge was technically launched in this quarter, although we were so excited to let you know that we squeezed it into the Q2 2019 update. If you missed it, EventBridge is the serverless event bus that connects application data from your own apps, SaaS, and AWS services. This allows you to create powerful event-driven serverless applications using a variety of event sources.

The AWS Bahrain Region has opened; the official name is Middle East (Bahrain) and the API name is me-south-1. The AWS Cloud now spans 22 geographic Regions with 69 Availability Zones around the world.

AWS Lambda

In September we announced dramatic improvements in cold starts for Lambda functions inside a VPC. With this announcement, you see faster function startup performance and more efficient usage of elastic network interfaces, drastically reducing VPC cold starts.


These improvements are rolling out to all existing and new VPC functions at no additional cost. Rollout is ongoing; you can track the status from the announcement post.

AWS Lambda now supports custom batch window for Kinesis and DynamoDB Event sources, which helps fine-tune Lambda invocation for cost optimization.
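As a sketch of what this looks like with the AWS CLI, the batching window is a property of the event source mapping; the UUID and values below are placeholders:

aws lambda update-event-source-mapping \
    --uuid <event-source-mapping-uuid> \
    --batch-size 100 \
    --maximum-batching-window-in-seconds 30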

You can now deploy Amazon Machine Images (AMIs) and Lambda functions together from the AWS Marketplace using AWS CloudFormation with just a few clicks.

AWS IoT Events actions now support AWS Lambda as a target. Previously you could only define actions to publish messages to SNS and MQTT. Now you can define actions to invoke AWS Lambda functions and even more targets, such as Amazon Simple Queue Service and Amazon Kinesis Data Firehose, and republish messages to IoT Events.

The AWS Lambda Console now shows recent invocations using CloudWatch Logs Insights. From the monitoring tab in the console, you can view duration, billing, and memory statistics for the 10 most recent invocations.

AWS Step Functions


AWS Step Functions has now been extended to support probably its most requested feature, Dynamic Parallelism, which allows steps within a workflow to be executed in parallel, with a new Map state type.

One way to use the new Map state is for fan-out or scatter-gather messaging patterns in your workflows (see the sketch after this list):

  • Fan-out is applied when delivering a message to multiple destinations, and can be useful in workflows such as order processing or batch data processing. For example, you can retrieve arrays of messages from Amazon SQS and Map sends each message to a separate AWS Lambda function.
  • Scatter-gather broadcasts a single message to multiple destinations (scatter), and then aggregates the responses back for the next steps (gather). This is useful in file processing and test automation. For example, you can transcode ten 500-MB media files in parallel, and then join to create a 5-GB file.
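The following is a minimal Amazon States Language sketch of a Map state for the fan-out case; the Lambda function ARN and the ItemsPath are illustrative placeholders, not taken from an actual workflow:

{
  "StartAt": "ProcessAllMessages",
  "States": {
    "ProcessAllMessages": {
      "Type": "Map",
      "ItemsPath": "$.messages",
      "MaxConcurrency": 10,
      "Iterator": {
        "StartAt": "ProcessOneMessage",
        "States": {
          "ProcessOneMessage": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:ProcessMessage",
            "End": true
          }
        }
      },
      "End": true
    }
  }
}

Each element of the array at $.messages is processed by its own iteration of the inner workflow, up to the MaxConcurrency limit.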

Another important update is AWS Step Functions adds support for nested workflows, which allows you to orchestrate more complex processes by composing modular, reusable workflows.

AWS Amplify

A new Predictions category has been added to the Amplify Framework to quickly add machine learning capabilities to your web and mobile apps.


With a few lines of code, you can add and configure AI/ML services in your app to:

  • Identify text, entities, and labels in images using Amazon Rekognition, or identify text in scanned documents to get the contents of fields in forms and information stored in tables using Amazon Textract.
  • Convert text into a different language using Amazon Translate, text to speech using Amazon Polly, and speech to text using Amazon Transcribe.
  • Interpret text to find the dominant language, the entities, the key phrases, the sentiment, or the syntax of unstructured text using Amazon Comprehend.

AWS Amplify CLI (part of the open source Amplify Framework) has added local mocking and testing. This allows you to mock some of the most common cloud services and test your application 100% locally.

For this first release, the Amplify CLI can mock several categories locally. To start the mock environment, run:

amplify mock
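You can also mock a single category at a time; for example (these subcommands are the commonly documented ones and may vary by CLI version):

amplify mock api
amplify mock storage
amplify mock function <function-name>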

AWS CloudFormation

The CloudFormation team has released the much-anticipated CloudFormation Coverage Roadmap.

Styled after the popular AWS Containers Roadmap, the CloudFormation Coverage Roadmap provides transparency about our priorities, and the opportunity to provide your input.

The roadmap contains four columns:

  • Shipped – Available for use in production in all public AWS Regions.
  • Coming Soon – Generally a few months out.
  • We’re working on it – Work in progress, but further out.
  • Researching – We’re thinking about the right way to implement the coverage.


Amazon DynamoDB

NoSQL Workbench for Amazon DynamoDB has been released in preview. This is a free, client-side application available for Windows and macOS. It helps you more easily design and visualize your data model, run queries on your data, and generate the code for your application.

Amazon Aurora

Amazon Aurora Serverless is a dynamically scaling version of Amazon Aurora. It automatically starts up, shuts down, and scales up or down, based on your application workload.

Aurora Serverless has had a MySQL-compatible edition for a while. Now we’re excited to bring more serverless joy to databases with the PostgreSQL-compatible version, which is now generally available.
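As a rough sketch, creating an Aurora Serverless PostgreSQL-compatible cluster from the AWS CLI looks like the following; the identifiers, credentials, and scaling values are placeholders:

aws rds create-db-cluster \
    --db-cluster-identifier my-serverless-postgres \
    --engine aurora-postgresql \
    --engine-mode serverless \
    --master-username masteruser \
    --master-user-password '<choose-a-strong-password>' \
    --scaling-configuration MinCapacity=2,MaxCapacity=16,AutoPause=true,SecondsUntilAutoPause=300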

We also have a useful post on Reducing Aurora PostgreSQL storage I/O costs.

AWS Serverless Application Repository

The AWS Serverless Application Repository has had some useful SAR apps added by Serverless Developer Advocate James Beswick.

  • S3 Auto Translator, which automatically translates uploaded objects into other languages specified by the user, using Amazon Translate.
  • Serverless S3 Uploader, which allows you to upload JPG files to Amazon S3 buckets from your web applications using presigned URLs.

Serverless posts

July

August

September

Tech talks

We hold several AWS Online Tech Talks covering serverless topics throughout the year. These are listed in the Serverless section of the AWS Online Tech Talks page.

Here are the ones from Q3:

Twitch

July

August

September

There are also a number of other helpful video series covering Serverless available on the AWS Twitch Channel.

AWS re:Invent


December 2 – 6 in Las Vegas, Nevada is peak AWS learning time with AWS re:Invent 2019. Join tens of thousands of AWS customers to learn, share ideas, and see exciting keynote announcements.

Be sure to take a look at the growing catalog of serverless sessions this year. Make sure to book time for Builders Sessions, Chalk Talks, and Workshops, as these sessions fill up quickly. The schedule is updated regularly, so if your session is currently fully booked, a repeat may be scheduled.

Register for AWS re:Invent now!

What did we do at AWS re:Invent 2018? Check out our recap here: AWS re:Invent 2018 Recap at the San Francisco Loft.

Our friends at IOPipe have written 5 tips for avoiding serverless FOMO at this year’s re:Invent.

AWS Serverless Heroes

We are excited to welcome some new AWS Serverless Heroes to help grow the serverless community. We look forward to some amazing content to help you with your serverless journey.

Still looking for more?

The Serverless landing page has much more information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials.

 

Setting up a CI/CD pipeline by integrating Jenkins with AWS CodeBuild and AWS CodeDeploy

Post Syndicated from Noha Ghazal original https://aws.amazon.com/blogs/devops/setting-up-a-ci-cd-pipeline-by-integrating-jenkins-with-aws-codebuild-and-aws-codedeploy/

In this post, I explain how to use the Jenkins open-source automation server to deploy AWS CodeBuild artifacts with AWS CodeDeploy, creating a functioning CI/CD pipeline. When properly implemented, the CI/CD pipeline is triggered by code changes pushed to your GitHub repo, automatically fed into CodeBuild, then the output is deployed on CodeDeploy.

Solution overview

The functioning pipeline creates a fully managed build service that compiles your source code. It then produces code artifacts that can be used by CodeDeploy to deploy to your production environment automatically.

The deployment workflow starts by placing the application code on the GitHub repository. To automate this scenario, I added source code management to the Jenkins project under the Source Code section. I chose the GitHub option, which by design clones a copy from the GitHub repo content in the Jenkins local workspace directory.

In the second step of my automation procedure, I enabled a trigger for the Jenkins server using the “Poll SCM” option. This option makes Jenkins check the configured repository for any new commits/code changes with a specified frequency. In this testing scenario, I configured the trigger to run every two minutes. The automated Jenkins deployment process works as follows:

  1. Jenkins checks for any new changes on GitHub every two minutes.
  2. Change determination:
    1. If Jenkins finds no changes, Jenkins exits the procedure.
    2. If it does find changes, Jenkins clones all the files from the GitHub repository to the Jenkins server workspace directory.
  3. The AWS CodeBuild plugin zips the files and sends them to a predefined Amazon S3 bucket location, then initiates the CodeBuild project, which obtains the code from the S3 bucket. The project then creates the output artifact zip file and stores that file again on the S3 bucket.
  4. The File Operation plugin deletes all the files cloned from GitHub. This keeps the Jenkins workspace directory clean.
  5. The HTTP Request plugin downloads the CodeBuild output artifacts from the S3 bucket.
    I edited the S3 bucket policy to allow access from the Jenkins server IP address (replace x.x.x.x/x in the example with that address). See the following example policy:

    {
      "Version": "2012-10-17",
      "Id": "S3PolicyId1",
      "Statement": [
        {
          "Sid": "IPAllow",
          "Effect": "Allow",
          "Principal": "*",
          "Action": "s3:*",
          "Resource": "arn:aws:s3:::examplebucket/*",
          "Condition": {
             "IpAddress": {"aws:SourceIp": "x.x.x.x/x"},  <--- IP of the Jenkins server
          } 
        } 
      ]
    }
    
    

    This policy enables the HTTP request plugin to access the S3 bucket. This plugin doesn’t use the IAM instance profile or the AWS access keys (access key ID and secret access key).

  6. The output artifact is a compressed ZIP file. The CodeDeploy plugin requires unzipped files as input, because it zips them itself and sends them to the S3 bucket for the CodeDeploy deployment. For that, I used the File Operation plugin to perform the following:
    1. Unzip the CodeBuild zipped artifact output in the Jenkins root workspace directory. At this point, the workspace directory should include the original zip file downloaded from the S3 bucket from Step 5 and the files extracted from this archive.
    2. Delete the original .zip file, and leave only the source bundle contents for the deployment.
  7. The CodeDeploy plugin selects and zips all workspace directory files. This plugin uses the CodeDeploy application name, deployment group name, and deployment configurations that you configured to initiate a new CodeDeploy deployment. The CodeDeploy plugin then uploads the newly zipped file to the S3 bucket location provided, which CodeDeploy uses as the source for its new deployment operation.

Walkthrough

In this post, I walk you through the following steps:

  • Creating resources to build the infrastructure, including the Jenkins server, CodeBuild project, and CodeDeploy application.
  • Accessing and unlocking the Jenkins server.
  • Creating a project and configuring the CodeDeploy Jenkins plugin.
  • Testing the whole CI/CD pipeline.

Create the resources

In this section, I show you how to launch an AWS CloudFormation template that creates the following resources:

  • Amazon S3 bucket—Stores the GitHub repository files and the CodeBuild artifact application file that CodeDeploy uses.
  • IAM S3 bucket policy—Allows the Jenkins server access to the S3 bucket.
  • JenkinsRole—An IAM role and instance profile for the Amazon EC2 instance for use as a Jenkins server. This role allows Jenkins on the EC2 instance to access the S3 bucket to write files and access to create CodeDeploy deployments.
  • CodeDeploy application and CodeDeploy deployment group.
  • CodeDeploy service role—An IAM role to enable CodeDeploy to read the tags applied to the instances or the EC2 Auto Scaling group names associated with the instances.
  • CodeDeployRole—An IAM role and instance profile for the EC2 instances of CodeDeploy. This role has permissions to write files to the S3 bucket created by this template and to create deployments in CodeDeploy.
  • CodeBuildRole—An IAM role to be used by CodeBuild to access the S3 bucket and create the build projects.
  • Jenkins server—An EC2 instance running Jenkins.
  • CodeBuild project—This is configured with the S3 bucket and S3 artifact.
  • Auto Scaling group—Contains EC2 instances running Apache and the CodeDeploy agent fronted by an Elastic Load Balancer.
  • Auto Scaling launch configurations—For use by the Auto Scaling group.
  • Security groups—For the Jenkins server, the load balancer, and the CodeDeploy EC2 instances.

 

  1. To create the CloudFormation stack (for example, in the AWS Frankfurt Region), choose the link below.
  2. Choose Next and provide the following values on the Specify Details page:
    • For Stack name, name your stack as you prefer.
    • For CodedeployInstanceType, keep the default of t2.medium.
      To check the supported instance types by AWS Region, see Supported Regions.
    • For InstanceCount, keep the default of 3, to launch three EC2 instances for CodeDeploy.
    • For JenkinsInstanceType, keep the default of t2.medium.
    • For KeyName, choose an existing EC2 key pair in your AWS account. Use this to connect by using SSH to the Jenkins server and the CodeDeploy EC2 instances. Make sure that you have access to the private key of this key pair.
    • For PublicSubnet1, choose a public subnet from which the load balancer, Jenkins server, and CodeDeploy web servers launch.
    • For PublicSubnet2, choose a public subnet from which the load balancers and CodeDeploy web servers launch.
    • For VpcId, choose the VPC for the public subnets you used in PublicSubnet1 and PublicSubnet2.
    • For YourIPRange, enter the CIDR block of the network from which you connect to the Jenkins server using HTTP and SSH. If your local machine has a static public IP address, go to https://www.whatismyip.com/ to find your IP address, and then enter your IP address followed by /32. If you don’t have a static IP address (or aren’t sure if you have one), enter 0.0.0.0/0. Then, any address can reach your Jenkins server.
  3. Choose Next.
  4. On the Review page, select the I acknowledge that this template might cause AWS CloudFormation to create IAM resources check box.
  5. Choose Create and wait for the CloudFormation stack status to change to CREATE_COMPLETE. This takes approximately 6–10 minutes.
  6. Check the resulting values on the Outputs tab. You need them later.
  7. Browse to the ELBDNSName value from the Outputs tab, verifying that you can see the Sample page. You should see a congratulatory message.
  8. Your Jenkins server should be ready to deploy.

Access and unlock your Jenkins server

In this section, I discuss how to access, unlock, and customize your Jenkins server.

  1. Copy the JenkinsServerDNSName value from the Outputs tab of the CloudFormation stack, and paste it into your browser.
  2. To unlock the Jenkins server, SSH to the server using the IP address and key pair, following the instructions from Unlocking Jenkins.
  3. Use the root user to cat the log file (/var/log/jenkins/jenkins.log) and copy the automatically generated alphanumeric password (between the two sets of asterisks). Then, use the password to unlock your Jenkins server.
  4. On the Customize Jenkins page, choose Install suggested plugins.

  5. Wait until Jenkins installs all the suggested plugins. When the process completes, you should see the check marks alongside all of the installed plugins.
  6. On the Create First Admin User page, enter a user name, password, full name, and email address of the Jenkins user.
  7. Choose Save and continue, Save and finish, and Start using Jenkins.
    After you install all the needed Jenkins plugins along with their required dependencies, the Jenkins server restarts. This step should take about two minutes. After Jenkins restarts, refresh the page. Your Jenkins server should be ready to use.

Create a project and configure the CodeDeploy Jenkins plugin

Now, to create the project in Jenkins, you need to configure the required Jenkins plugins.

  1. Sign in to Jenkins with the user name and password that you created earlier, then choose Manage Jenkins, Manage Plugins.
  2. From the Available tab, search for and select the following plugins, then choose Install without restart:
    AWS CodeDeploy
    AWS CodeBuild
    Http Request
    File Operations
  3. Select Restart Jenkins when installation is complete and no jobs are running.
    Jenkins takes a couple of minutes to download the plugins along with their dependencies, and then restarts.
  4. Log in, then choose New Item, Freestyle project.
  5. Enter a name for the project (for example, CodeDeployApp), and choose OK.
  6. On the project configuration page, under Source Code Management, choose Git. For Repository URL, enter the URL of your GitHub repository.
  7. For Build Triggers, select the Poll SCM check box. In the Schedule, for testing enter H/2 * * * *. This entry tells Jenkins to poll GitHub every two minutes for updates.
  8. Under Build Environment, select the Delete workspace before build starts check box. Each Jenkins project has a dedicated workspace directory. This option allows you to wipe out your workspace directory with each new Jenkins build, to keep it clean.
  9. Under Build Actions, add a build step and choose AWS CodeBuild. Under AWS Configurations, choose Manually specify access and secret keys and provide the keys.
  10. From the CloudFormation stack Outputs tab, copy the AWS CodeBuild project name (myProjectName) and paste it in the Project Name field. Also, set the Region that you are using and choose Use Jenkins source.
    It is a best practice to store AWS credentials for CodeBuild in the native Jenkins credential store. For more information, see the Jenkins AWS CodeBuild Plugin wiki.
  11. To make sure that all files cloned from the GitHub repository are deleted, choose Add build step and select the File Operation plugin, then choose Add and select File Delete. Under the File Delete operation, in Include File Pattern, type an asterisk (*).
  12. Under Build, configure the following:
    1. Choose Add a Build step.
    2. Choose HTTP Request.
    3. Copy the S3 bucket name from the CloudFormation stack Outputs tab and paste it after (http://s3-eu-central-1.amazonaws.com/) along with the name of the zip file codebuild-artifact.zip as the value for HTTP Plugin URL.
      Example: (http://s3-eu-central-1.amazonaws.com/mybucketname/codebuild-artifact.zip)
    4. For Ignore SSL errors?, choose Yes.
  13. Under HTTP Request, choose Advanced and leave the default values for Authorization, Headers, and Body. Under Response, for Output response to file, enter the codebuild-artifact.zip file name.
  14. Add the two build steps for the File Operations plugin, in the following order:
    1. Unzip action: This build step unzips the codebuild-artifact.zip file and places the contents in the root workspace directory.
    2. File Delete action: This build step deletes the codebuild-artifact.zip file, leaving only the source bundle contents for deployment.
  15. Under Post-build Actions, choose Add post-build action and select the Deploy an application to AWS CodeDeploy check box.
  16. Enter the following values from the Outputs tab of your CloudFormation stack and leave the other settings at their default (blank):
    • For AWS CodeDeploy Application Name, enter the value of CodeDeployApplicationName.
    • For AWS CodeDeploy Deployment Group, enter the value of CodeDeployDeploymentGroup.
    • For AWS CodeDeploy Deployment Config, enter CodeDeployDefault.OneAtATime.
    • For AWS Region, choose the Region where you created the CodeDeploy environment.
    • For S3 Bucket, enter the value of S3BucketName.
      The CodeDeploy plugin uses the Include Files option to filter the files based on specific file names existing in your current Jenkins deployment workspace directory. The plugin zips specified files into one file. It then sends them to the location specified in the S3 Bucket parameter for CodeDeploy to download and use in the new deployment.
      In the optional Include Files field, I used ** so that all files in the workspace directory get zipped.
  17. Choose Deploy Revision. This option registers the newly created revision to your CodeDeploy application and gets it ready for deployment.
  18. Select the Wait for deployment to finish? check box. This option allows you to view the CodeDeploy deployments logs and events on your Jenkins server console output.
    Now that you have created a project, you are ready to test deployment.

Testing the whole CI/CD pipeline

To test the whole solution, put an application on your GitHub repository. You can download the sample from here.

The application tree contains the application source files, including text and binary files, executables, and packages.

In this example, the application files are the templates directory, test_app.py file, and web.py file.

The appspec.yml file is the main application specification file telling CodeDeploy how to deploy your application. CodeDeploy uses the AppSpec file to manage each deployment as a series of lifecycle event “hooks”, as defined in the file. For information about how to create a well-formed AppSpec file, see AWS CodeDeploy AppSpec File Reference.
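For context, a minimal AppSpec sketch for an EC2/on-premises deployment such as this one could look like the following; the destination path and hook script names are illustrative and not necessarily those of the sample application:

version: 0.0
os: linux
files:
  - source: /
    destination: /var/www/html
hooks:
  BeforeInstall:
    - location: scripts/install_dependencies.sh
      timeout: 300
      runas: root
  ApplicationStart:
    - location: scripts/start_application.sh
      timeout: 300
      runas: root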

The buildspec.yml file is a collection of build commands and related settings, in YAML format, that CodeBuild uses to run a build. You can include a build spec as part of the source code, or you can define a build spec when you create a build project. For more information, see How AWS CodeBuild Works.
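A minimal buildspec sketch might look like this; the build commands are placeholders and not necessarily the ones used by the sample application:

version: 0.2
phases:
  build:
    commands:
      - echo "Build started"
      - python test_app.py
artifacts:
  files:
    - '**/*'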

The scripts folder contains the scripts that you would like to run during the CodeDeploy LifecycleHooks execution with respect to your application requirements. For more information, see Plan a Revision for AWS CodeDeploy.

To test this solution, perform the following steps:

  1. Unzip the application files and push them to your GitHub repository by running the following git commands from the path where you placed your sample application:
    $ git add -A
    
    $ git commit -m 'Your first application'
    
    $ git push
  2. On the Jenkins server dashboard, wait two minutes for the project trigger that you configured earlier to fire. When it does, you should see a new build taking place.
  3. In the Jenkins server Console Output page, check the build events and review the steps performed by each Jenkins plugin. You can also review the CodeDeploy deployment in detail.

On completion, Jenkins should report that you have successfully deployed a web application. You can also use your ELBDNSName value to confirm that the deployed application is running successfully.

Conclusion

In this post, I outlined how you can use a Jenkins open-source automation server to deploy CodeBuild artifacts with CodeDeploy. I showed you how to construct a functioning CI/CD pipeline with these tools. I walked you through how to build the deployment infrastructure and automatically deploy application version changes from GitHub to your production environment.

Hopefully, you have found this post informative and the proposed solution useful. As always, AWS welcomes feedback and comments.

About the Author


Noha Ghazal is a Cloud Support Engineer at Amazon Web Services. She is a subject matter expert for AWS CodeDeploy. In her role, she enjoys supporting customers with their CodeDeploy and other DevOps configurations. Outside of work she enjoys drawing portraits, fishing, and playing video games.

 

 

Deploying GitOps with Weave Flux and Amazon EKS

Post Syndicated from Ignacio Riesgo original https://aws.amazon.com/blogs/compute/deploying-gitops-with-weave-flux-and-amazon-eks/

This post is contributed by Jon Jozwiak | Senior Solutions Architect, AWS

 

You have countless options for deploying resources into an Amazon EKS cluster. GitOps—a term coined by Weaveworks—provides some substantial advantages over the alternatives. With only Git as the single, central source for controlling deployment into your cluster, GitOps provides easy version control on a platform your team already knows. Getting started with GitOps is straightforward: create a pull request, merge, and the configuration deploys to the EKS cluster.

Weave Flux makes running GitOps in your EKS cluster fast and easy, as it monitors your configuration in Git and image repositories and automates deployments. Weave Flux follows a pull model, automatically triggering deployments based on changes. This provides better security than most continuous deployment tools, which need permissions to access your cluster. This approach also provides Git with version control over your configuration and enables rollback.

This post walks through implementing Weave Flux and deploying resources to EKS using Git. To simplify the image build pipeline, I use AWS Service Catalog to provide a standardized pipeline. AWS Service Catalog lets you centrally define a portfolio of approved products that AWS users can provision. An AWS CloudFormation template defines each product, which can be version-controlled.

After you deploy the sample resources, I quickly demonstrate the GitOps approach where a new image results in the configuration automatically deploying to EKS. This new image may be a commit of Kubernetes manifests or a commit of Helm release definitions.

The following diagram shows the workflow.

Prerequisites

In GitOps, you manage Docker image builds separately from deployment configuration. For image builds, this example uses AWS CodePipeline and AWS CodeBuild, which provide a managed workflow from GitHub source through to an image landing in Amazon Elastic Container Registry (ECR).

This post assumes that you already have an EKS cluster deployed, including kubectl access. It also assumes that you have a GitHub account.

GitHub setup

First, create a GitHub repository to store the Kubernetes manifests (configuration files) to apply to the cluster.

In GitHub, create a GitHub repository. This repository holds Kubernetes manifests for your deployments. Name the repository k8s-config to align with this post. Leave it as a public repository, check the box for Initialize this repository with a README, and choose Create Repo.

On the GitHub repository page, choose Clone or Download and save the SSH string:

git@github.com:youruser/k8s-config.git

Next, create a GitHub token that allows creating and deleting repositories so AWS Service Catalog can deploy and remove pipelines.

  1. In your GitHub profile, access your token settings.
  2. Choose Generate New Token.
  3. Name your new token CodePipeline Service Catalog, and select the following options:
  • repo scopes (repo:status, repo_deployment, public_repo, and repo:invite)
  • read:org
  • write:public_key and read:public_key
  • write:repo_hook and read:repo_hook
  • read:user and user:email
  • delete_repo

4. Choose Generate Token.

5. Copy and save your access token for future access.

 

Deploy Helm

Helm is a package manager for Kubernetes that allows you to define a chart. Charts are collections of related resources that let you create, version, share, and publish applications. By deploying Helm into your cluster, you make it much easier to deploy Weave Flux and other systems. If you’ve deployed Helm already, skip this section.

First, install the Helm client with the following command:

curl -LO https://git.io/get_helm.sh

chmod 700 get_helm.sh

./get_helm.sh

 

On macOS, you could alternatively enter the following command:

brew install kubernetes-helm

 

Next, set up a service account with cluster role for Tiller, Helm’s server-side component. This allows Tiller to manage resources in your cluster.

kubectl -n kube-system create sa tiller

kubectl create clusterrolebinding tiller-cluster-rule \
  --clusterrole=cluster-admin \
  --serviceaccount=kube-system:tiller

 

Finally, initialize Helm and verify your version. Tiller takes a few seconds to start.

helm init --service-account tiller --history-max 200

helm version

 

Deploy Weave Flux

With Helm installed, proceed with the Weave Flux installation. Begin by installing the Flux Custom Resource Definition.

kubectl apply -f https://raw.githubusercontent.com/fluxcd/flux/helm-0.10.1/deploy-helm/flux-helm-release-crd.yaml

Now add the Weave Flux Helm repository and proceed with the install. Make sure that you update the git.url to match the GitHub repository that you created earlier.

helm repo add fluxcd https://charts.fluxcd.io

helm upgrade -i flux --set helmOperator.create=true --set helmOperator.createCRD=false --set git.url=git@github.com:YOURUSER/k8s-config --namespace flux fluxcd/flux

 

You can use the following code to verify that you successfully deployed Flux. You should see three pods running:

kubectl get pods -n flux

NAME                                 READY     STATUS    RESTARTS   AGE

flux-5bd7fb6bb6-4sc78                1/1       Running   0          52s

flux-helm-operator-df5746688-84kw8   1/1       Running   0          52s

flux-memcached-6f8c446979-f45wj      1/1       Running   0          52s

 

Flux requires a deploy key to work with the GitHub repository. In this post, Flux generates the SSH key pair itself, but you can also specify a different key pair when deploying. To access the key, download fluxctl, a command line utility that interacts with the Flux API. The following steps work for Linux. For other OS platforms, see Installing fluxctl.

sudo wget -O /usr/local/bin/fluxctl https://github.com/fluxcd/flux/releases/download/1.14.1/fluxctl_linux_amd64

sudo chmod 755 /usr/local/bin/fluxctl

 

Validate that fluxctl installed successfully, then retrieve the public key pair using the following command. Specify the namespace where you deployed Flux.

fluxctl version

fluxctl --k8s-fwd-ns=flux identity

 

Copy the key and add that as a deploy key in your GitHub repository.

  1. In your GitHub repository, choose Settings, Deploy Keys.
  2. Choose Add deploy key and name the key Flux Deploy Key.
  3. Paste the key from fluxctl identity.
  4. Choose Allow Write Access, Add Key.

Now use AWS Service Catalog to set up your image build pipeline.

 

Set up AWS Service Catalog

To allow end users to consume product portfolios, you must associate a portfolio with an IAM principal (or principals): a user, group, or role. For this example, associate your current identity. After you master these basics, there are additional resources to teach you how to set up a multi-region, multi-account catalog.

To retrieve your current identity, use the AWS CLI to get your ARN:

aws sts get-caller-identity

Deploy the product portfolio that contains an image build pipeline service by doing the following:

  1. In the AWS CloudFormation console, launch the CloudFormation stack with the following link:

 

 

2. Choose Next.

3. On the Specify Details page, enter your ARN from get-caller-identity. Also enter an environment tag, which AWS applies to all resources from this portfolio.

4. Choose Next.

5. On the Options page, choose Next.

6. On the Review page, select the check box displayed next to I acknowledge that AWS CloudFormation might create IAM resources.

7. Choose Create. CloudFormation takes a few minutes to create your resources.

 

Deploy the image pipeline

The image pipeline provisions a GitHub repository, Amazon ECR repository, and AWS CodeBuild project. It also uses AWS CodePipeline to build a Docker image.

  1. In the AWS Management Console, go to the AWS Service Catalog products list and choose Pipeline for Docker Images.
  2. Choose Launch Product.
  3. For Name, enter ExamplePipeline, and choose Next.
  4. On the Parameters page, fill in a project name, description, and unique S3 bucket name. The specifics don’t matter, but make a note of the name and S3 bucket for later use.
  5. Fill in your GitHub User and GitHub Token values from earlier. Leave the rest of the fields as the default values.
  6. To clean up your GitHub repository on stack delete, change Delete Repository to true.
  7. Choose Next.
  8. On the TagOptions screen, choose Next.
  9. Choose Next on the Notifications page.
  10. On the Review page, choose Launch.

The launch process takes 1–2 minutes. You can verify that you now have a repository matching your project name (eks-example) in GitHub. You can also look at the pipeline created in the AWS CodePipeline console.

 

Deploying with GitOps

You can now provision workloads into the EKS cluster. With a GitOps approach, you only commit code and Kubernetes resource definitions to GitHub. AWS CodePipeline handles the image builds, and Weave Flux applies the desired state to Kubernetes.

First, create a simple Hello World application in your example pipeline. Clone the GitHub repository that you created in the previous step and substitute your GitHub user below.

git clone git@github.com:youruser/eks-example.git

cd eks-example

Create a base README file, a source directory, and download a simple NGINX configuration (hello.conf), home page (index.html), and Dockerfile.

echo "# eks-example" > README.md

mkdir src

wget -O src/hello.conf https://blog-gitops-eks.s3.amazonaws.com/hello.conf

wget -O src/index.html https://blog-gitops-eks.s3.amazonaws.com/index.html

wget https://blog-gitops-eks.s3.amazonaws.com/Dockerfile

 

Now that you have a simple Hello World app with Dockerfile, commit the changes to kick off the pipeline.

git add .

git commit -am "Initial commit"

[master (root-commit) d69a6ba] Initial commit

4 files changed, 34 insertions(+)

create mode 100644 Dockerfile

create mode 100644 README.md

create mode 100644 src/hello.conf

create mode 100644 src/index.html

git push

 

Watch in the AWS CodePipeline console to see the image build in process. This may take a minute to start. When it’s done, look in the ECR console to see the first version of the container image.

To deploy this image and the Hello World application, commit Kubernetes manifests for Flux. Create a namespace, deployment, and service in the Kubernetes Git repository (k8s-config) you created. Make sure that you aren’t in your eks-example repository directory.

cd ..

git clone git@github.com:youruser/k8s-config.git

cd k8s-config

mkdir charts namespaces releases workloads

 

The preceding directory structure helps organize the repository but isn’t necessary. Flux can descend into subdirectories and look for YAML files to apply.

Create a namespace Kubernetes manifest.

cat << EOF > namespaces/eks-example.yaml
apiVersion: v1
kind: Namespace
metadata:
  labels:
    name: eks-example
  name: eks-example
EOF

Now create a deployment manifest. Make sure that you update this image to point to your repository and image tag. For example, <Account ID>.dkr.ecr.us-east-1.amazonaws.com/eks-example:d69a6bac.

cat << EOF > workloads/eks-example-dep.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: eks-example
  namespace: eks-example
  labels:
    app: eks-example
  annotations:
    # Container Image Automated Updates
    flux.weave.works/automated: "true"
    # do not apply this manifest on the cluster
    #flux.weave.works/ignore: "true"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: eks-example
  template:
    metadata:
      labels:
        app: eks-example
    spec:
      containers:
      - name: eks-example
        image: <Your Account>.dkr.ecr.us-east-1.amazonaws.com/eks-example:d69a6bac
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80
          name: http
          protocol: TCP
        livenessProbe:
          httpGet:
            path: /
            port: http
        readinessProbe:
          httpGet:
            path: /
            port: http
EOF

 

Finally, create a service manifest to create a load balancer.

cat << EOF > workloads/eks-example-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: eks-example
  namespace: eks-example
  labels:
    app: eks-example
spec:
  type: LoadBalancer
  ports:
    - port: 80
      targetPort: http
      protocol: TCP
      name: http
  selector:
    app: eks-example
EOF

 

In the preceding code, there are two Kubernetes annotations for Flux. The first, flux.weave.works/automated, tells Flux whether the container image should be automatically updated. This example sets the value to true, enabling updates to your deployment as new images arrive in the registry. This example comments out the second annotation, flux.weave.works/ignore. However, you can use it to tell Flux to ignore the deployment temporarily.

Commit the changes, and in a few minutes, it automatically deploys.

git add .
git commit -am "eks-example deployment"
[master 954908c] eks-example deployment
 3 files changed, 64 insertions(+)
 create mode 100644 namespaces/eks-example.yaml
 create mode 100644 workloads/eks-example-dep.yaml
 create mode 100644 workloads/eks-example-svc.yaml

 

Make sure that you push your changes.

git push

Now check the logs of your Flux pod:

kubectl get pods -n flux

Update the name below to reflect the name of the pod in your deployment. This sample polls for changes every five minutes. When it triggers, you should see kubectl apply log messages to create the namespace, service, and deployment.

kubectl logs flux-5bd7fb6bb6-4sc78 -n flux

Find the load balancer input for your service with the following:

kubectl describe service eks-example -n eks-example

Now when you connect to the load balancer address in a browser, you can see the Hello World app.

Change the eks-example source code in a small way (such as changing index.html to say Hello World Deployment 2), then commit and push to Git.

After a few minutes, refresh your browser to see the deployed change. You can watch the changes in AWS CodePipeline, in ECR, and through Flux logs. Weave Flux automatically updated your deployment manifests in the k8s-config repository to deploy the new image as it detected it. To back out that change, use a git revert or git reset command.
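For example, a revert could look like this; the commit hash is a placeholder:

git log --oneline -n 5      # find the commit that introduced the change
git revert <commit-hash>    # create a new commit that undoes it
git push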

Finally, you can use the same approach to deploy Helm charts. You can host these charts within the configuration Git repository (k8s-config in this example), or on an external chart repository. In the following example, you use an external chart repository.

In your k8s-config directory, get the latest changes from your repository and then create a Helm release from an external chart.

cd k8s-config

git pull

 

First, create the namespace manifest.

cat << EOF > namespaces/nginx.yaml
apiVersion: v1
kind: Namespace
metadata:
  labels:
    name: nginx
  name: nginx
EOF

 

Then create the Helm release manifest. This is a custom resource definition provided by Weave Flux.

cat << EOF > releases/nginx.yaml
apiVersion: flux.weave.works/v1beta1
kind: HelmRelease
metadata:
  name: mywebserver
  namespace: nginx
  annotations:
    flux.weave.works/automated: "true"
    flux.weave.works/tag.nginx: semver:~1.16
    flux.weave.works/locked: 'true'
    flux.weave.works/locked_msg: '"Halt updates for now"'
    flux.weave.works/locked_user: User Name <user@example.com>
spec:
  releaseName: mywebserver
  chart:
    repository: https://charts.bitnami.com/bitnami/
    name: nginx
    version: 3.3.2
  values:
    usePassword: true
    image:
      registry: docker.io
      repository: bitnami/nginx
      tag: 1.16.0-debian-9-r46
    service:
      type: LoadBalancer
      port: 80
      nodePorts:
        http: ""
      externalTrafficPolicy: Cluster
    ingress:
      enabled: false
    livenessProbe:
      httpGet:
        path: /
        port: http
      initialDelaySeconds: 30
      timeoutSeconds: 5
      failureThreshold: 6
    readinessProbe:
      httpGet:
        path: /
        port: http
      initialDelaySeconds: 5
      timeoutSeconds: 3
      periodSeconds: 5
    metrics:
      enabled: false
EOF

git add . 
git commit -am "Adding NGINX Helm release"
git push

 

There are a few new annotations for Flux above. The flux.weave.works/locked annotation tells Flux to lock the deployment. This is useful if you find a known bad image and must roll back to a previous version. In addition, the flux.weave.works/tag.nginx annotation filters image tags by semantic versioning.

Wait up to five minutes for Flux to pull the configuration and verify this deployment as you did in the previous example:

kubectl get pods -n flux

kubectl logs flux-5bd7fb6bb6-4sc78 -n flux

 

kubectl get all -n nginx

 

If this doesn’t deploy, ensure Helm initialized as described earlier in this post.

kubectl get pods -n kube-system | grep tiller

kubectl get pods -n flux

kubectl logs flux-helm-operator-df5746688-84kw8 -n flux

 

Clean up

Log in as an administrator and follow these steps to clean up your sample deployment.

  1. Delete all images from the Amazon ECR repository.

2. In AWS Service Catalog provisioned products, select the three dots to the left of your ExamplePipeline service and choose Terminate provisioned product. Wait until it completes termination (1–2 minutes).

3. Delete your Amazon S3 artifact bucket.

4. Delete Weave Flux:

helm delete flux --purge

kubectl delete ns flux

kubectl delete crd helmreleases.flux.weave.works

5. Delete the load balancer services:

helm delete mywebserver --purge

kubectl delete ns nginx

kubectl delete svc eks-example -n eks-example

kubectl delete deployment eks-example -n eks-example

kubectl delete ns eks-example

6. Clean up your GitHub repositories:

 – Go to your k8s-config repository in GitHub, choose Settings, scroll to the bottom and choose Delete this repository. If you set delete to false in the pipeline service, you also must delete your eks-example repository.

 – Delete the personal access token that you created.

7. If you provisioned an EKS cluster at the beginning of this post, delete it:

eksctl get cluster

eksctl delete cluster <clustername>

8. In the AWS CloudFormation console, select the DevServiceCatalog stack, and choose Actions, Delete Stack.

Conclusion

In this post, I demonstrated how to use a GitOps approach, which allows you to focus on committing code and configuration to Git rather than learning new CI/CD tooling. Git acts as the single source of truth, and Weave Flux pulls changes and ensures that the Kubernetes cluster configuration matches the desired state.

In addition, AWS Service Catalog can be used to create a portfolio of services that enables you to standardize your offerings, such as an image build pipeline based on AWS CodePipeline.

As always, AWS welcomes feedback. Please submit comments or questions below.

Automate Amazon Redshift cluster creation using AWS CloudFormation

Post Syndicated from Sudhir Gupta original https://aws.amazon.com/blogs/big-data/automate-amazon-redshift-cluster-creation-using-aws-cloudformation/

In this post, I explain how to automate the deployment of an Amazon Redshift cluster in an AWS account. AWS best practices for security and high availability drive the cluster’s configuration, and you can create it quickly by using AWS CloudFormation. I walk you through a set of sample CloudFormation templates, which you can customize as per your needs.

Amazon Redshift is a fast, scalable, fully managed, ACID and ANSI SQL-compliant cloud data warehouse service. You can set up and deploy a new data warehouse in minutes, and run queries across petabytes of structured data stored in Amazon Redshift. Amazon Redshift Spectrum extends your data warehousing capability to data lakes built on Amazon S3, allowing you to query exabytes of structured and semi-structured data in its native format, without requiring you to load the data. Amazon Redshift delivers faster performance than other data warehouse databases by using machine learning, massively parallel query execution, and columnar storage on high-performance disk. You can configure Amazon Redshift to scale up and down in minutes, as well as expand compute power automatically to ensure unlimited concurrency.

As you begin your journey with Amazon Redshift and set up AWS resources based on the recommended best practices of AWS Well-Architected Framework, you can use the CloudFormation templates provided here. With the modular approach, you can choose to build AWS infrastructure from scratch, or you can deploy Amazon Redshift into an existing virtual private cloud (VPC).

Benefits of using CloudFormation templates

With an AWS CloudFormation template, you can condense hundreds of manual procedures into a few steps listed in a text file. The declarative code in the file captures the intended state of the resources to create, and you can choose to automate the creation of hundreds of AWS resources. This template becomes the single source of truth for your infrastructure.

A CloudFormation template acts as an accelerator. It helps you automate the deployment of technology and infrastructure in a safe and repeatable manner across multiple Regions and multiple accounts with the least amount of effort and time.
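For illustration only, deploying one of these templates from the AWS CLI could look like the following; the file name, stack name, and parameter are placeholders:

aws cloudformation deploy \
    --template-file vpc.yaml \
    --stack-name redshift-vpc \
    --capabilities CAPABILITY_NAMED_IAM \
    --parameter-overrides EnvironmentName=dev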

Architecture overview

The following architecture diagram and summary describe the solution that this post uses.

Figure 1: Architecture diagram

The sample CloudFormation templates provision the network infrastructure and all the components shown in the architecture diagram.

I broke the CloudFormation templates into the following three stacks:

  1. A CloudFormation template to set up a VPC, subnets, route tables, internet gateway, NAT gateway, Amazon S3 gateway endpoint, and other networking components.
  2. A CloudFormation template to set up an Amazon Linux bastion host in an Auto Scaling group to connect to the Amazon Redshift cluster.
  3. A CloudFormation template to set up an Amazon Redshift cluster, CloudWatch alarms, AWS Glue Data Catalog, and an Amazon Redshift IAM role for Amazon Redshift Spectrum and ETL jobs.

I integrated the stacks using exported output values. Using three different CloudFormation stacks instead of one nested stack gives you additional flexibility. For example, you can choose to deploy the VPC and bastion host CloudFormation stacks one time and the Amazon Redshift cluster CloudFormation stack multiple times in an AWS Region.
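The cross-stack integration relies on CloudFormation exports and Fn::ImportValue. The following is a minimal sketch; the logical names are illustrative, not the exact names used in the templates:

# In the VPC stack template
Outputs:
  VPCID:
    Description: VPC ID
    Value: !Ref VPC
    Export:
      Name: !Sub '${AWS::StackName}-VPCID'

# In the Amazon Redshift stack template, reference the exported value
Resources:
  RedshiftSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Security group for the Amazon Redshift cluster
      VpcId: !ImportValue 'vpc-stack-VPCID'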

Best practices

The architecture built by these CloudFormation templates supports AWS best practices for high availability and security.

The VPC CloudFormation template takes care of the following:

  1. Configures three Availability Zones for high availability and disaster recovery. It geographically distributes the zones within a Region for best insulation and stability in the event of a natural disaster.
  2. Provisions one public subnet and one private subnet for each zone. I recommend using public subnets for external-facing resources and private subnets for internal resources to reduce the risk of data exfiltration.
  3. Creates and associates network ACLs with default rules to the private and public subnets. AWS recommends using network ACLs as firewalls to control inbound and outbound traffic at the subnet level. These network ACLs provide individual controls that you can customize as a second layer of defense.
  4. Creates and associates independent routing tables for each of the private subnets, which you can configure to control the flow of traffic within and outside the VPC. The public subnets share a single routing table because they all use the same internet gateway as the sole route to communicate with the internet.
  5. Creates a NAT gateway in each of the three public subnets for high availability. NAT gateways offer significant advantages over NAT instances in terms of deployment, availability, and maintenance. NAT gateways allow instances in a private subnet to connect to the internet or other AWS services even as they prevent the internet from initiating a connection with those instances.
  6. Creates a VPC endpoint for Amazon S3. Amazon Redshift and other AWS resources—running in a private subnet of a VPC—can connect privately to access S3 buckets. For example, data loading from S3 and unloading data to S3 happens over a private, secure, and reliable connection. (A minimal sketch of this endpoint resource follows this list.)
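
As a rough sketch of the last item, with placeholder logical IDs rather than the exact ones used by the template, an S3 gateway endpoint can be declared like this:

Resources:
  S3GatewayEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Properties:
      VpcEndpointType: Gateway                 # Amazon S3 uses a gateway endpoint
      ServiceName: !Sub 'com.amazonaws.${AWS::Region}.s3'
      VpcId: !Ref VPC                          # hypothetical logical ID of the VPC
      RouteTableIds:                           # attach to the private route tables
        - !Ref PrivateRouteTableA
        - !Ref PrivateRouteTableB
        - !Ref PrivateRouteTableC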

The Amazon Linux bastion host CloudFormation template takes care of the following:

  1. Creates an Auto Scaling group spread across the three public subnets set up by the VPC CloudFormation template. The Auto Scaling group keeps the Amazon Linux bastion host available in one of the three Availability Zones.
  2. Sets up an Elastic IP address and associates it with the Amazon Linux bastion host. An Elastic IP address makes it easier to remember and allow IP addresses from on-premises firewalls. If your system terminates an instance and the Auto Scaling group launches a new instance in its place, the existing Elastic IP address re-associates with the new instance automatically. This lets you use the same trusted Elastic IP address at all times.
  3. Sets up an Amazon EC2 security group and associates it with the Amazon Linux bastion host. This allows you to lock down access to the bastion hosts, only allowing inbound traffic from known CIDR scopes and ports.
  4. Creates an Amazon CloudWatch Logs log group to hold the Amazon Linux bastion host’s shell history logs and sets up a CloudWatch metric to track SSH command counts. This helps with security audits by allowing you to check who accesses the bastion host and when.
  5. Creates a CloudWatch alarm to monitor the CPU on the bastion host and sends an Amazon SNS notification when the alarm is triggered. (A rough sketch of such an alarm follows this list.)
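
The following is a hedged sketch of such an alarm; the logical IDs, threshold, and periods are placeholders rather than the exact values the template uses:

Resources:
  BastionCpuAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: CPU utilization on the bastion host is too high
      Namespace: AWS/EC2
      MetricName: CPUUtilization
      Dimensions:
        - Name: AutoScalingGroupName
          Value: !Ref BastionAutoScalingGroup   # hypothetical logical ID of the Auto Scaling group
      Statistic: Average
      Period: 300
      EvaluationPeriods: 2
      Threshold: 80
      ComparisonOperator: GreaterThanThreshold
      AlarmActions:
        - !Ref BastionAlarmTopic                # hypothetical SNS topic; Ref returns the topic ARN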

The Amazon Redshift cluster template takes care of the following:

  1. Creates an Amazon Redshift cluster subnet group that spans multiple Availability Zones, so that you can create clusters in different zones to minimize the impact of a failure in one zone.
  2. Configures database auditing and stores audit logs in an S3 bucket. It also restricts access to the Amazon Redshift logging service and configures lifecycle rules to archive logs older than 14 days to Amazon S3 Glacier.
  3. Creates an IAM role with a policy to grant the minimum permissions required to use Amazon Redshift Spectrum to access S3, CloudWatch Logs, AWS Glue, and Amazon Athena. It then associates this IAM role with Amazon Redshift.
  4. Creates an EC2 security group and associates it with the Amazon Redshift cluster. This allows you to lock down access to the Amazon Redshift cluster to known CIDR scopes and ports.
  5. Creates an Amazon Redshift cluster parameter group with the following configuration and associates it with the Amazon Redshift cluster. These parameters are only a general guide. Review and customize them to suit your needs.

Parameter: enable_user_activity_logging
Value: TRUE
Description: This enables the user activity log. For more information, see Database Audit Logging.

Parameter: require_ssl
Value: TRUE
Description: This enables SSL connections to the Amazon Redshift cluster.

Parameter: wlm_json_configuration
Value:
[ {
  "query_group" : [ ],
  "query_group_wild_card" : 0,
  "user_group" : [ ],
  "user_group_wild_card" : 0,
  "concurrency_scaling" : "auto",
  "rules" : [ {
    "rule_name" : "DiskSpilling",
    "predicate" : [ {
      "metric_name" : "query_temp_blocks_to_disk",
      "operator" : ">",
      "value" : 100000
    } ],
    "action" : "log",
    "value" : ""
  }, {
    "rule_name" : "RowJoining",
    "predicate" : [ {
      "metric_name" : "join_row_count",
      "operator" : ">",
      "value" : 1000000000
    } ],
    "action" : "log",
    "value" : ""
  } ],
  "auto_wlm" : true
}, {
  "short_query_queue" : true
} ]
Description: This creates a custom workload management queue (WLM) with the following configuration:
  • Auto WLM: Amazon Redshift manages query concurrency and memory allocation automatically, based on the workload.
  • Enable Short Query Acceleration (SQA): Amazon Redshift executes short-running queries in a dedicated space so that SQA queries aren't forced to wait in queues behind longer queries.
  • Enable Concurrency Scaling for the queries routed to this WLM queue.
  • Creates two WLM QMR rules:
    Log queries when the temporary disk space used to write intermediate results exceeds 100 GB.
    Log queries when the number of rows processed in a join step exceeds one billion rows.
  You can also create different rules based on your needs and choose different actions (abort, hop, or log).

Parameter: max_concurrency_scaling_clusters
Value: 1 (or the value that you chose)
Description: Sets the maximum number of concurrency scaling clusters allowed when concurrency scaling is enabled.

Parameter: auto_analyze
Value: TRUE
Description: If true, Amazon Redshift continuously monitors your database and automatically performs analyze operations in the background.

Parameter: statement_timeout
Value: 43200000
Description: Terminates any statement that takes more than the specified number of milliseconds. The statement_timeout value is the maximum amount of time a query can run before Amazon Redshift terminates it.
  6. Configures the Amazon Redshift cluster to listen on a non-default Amazon Redshift port, according to security best practices.
  7. Creates the Amazon Redshift cluster in the private subnets according to AWS security best practices. To access the Amazon Redshift cluster, use the Amazon Linux bastion host that the Linux bastion host CloudFormation template sets up.
  8. Creates a cluster with a minimum of two nodes, unless you set the NumberOfNodes input parameter to 1. AWS recommends using at least two nodes per cluster for production. For more information, see the Availability and Durability section of the Amazon Redshift FAQ.
  9. Enables encryption at rest for the Amazon Redshift cluster by using the Amazon Redshift managed KMS key or a user-specified KMS key. To use a user-specified KMS key that you have not created yet, first create a KMS key. For more information, see Creating KMS Keys.
  10. Configures automated snapshot retention to 35 days for production environments and 8 days for non-production environments. This allows you to recover your production database to any point in time in the last 35 days, or the last 8 days for a non-production database.
  11. Automatically takes a final snapshot of the Amazon Redshift database when you delete the Amazon Redshift cluster by deleting its CloudFormation stack. This prevents data loss from the accidental deletion of your CloudFormation stack. (See the cluster sketch after this list.)
  12. Creates an AWS Glue Data Catalog as a metadata store for your AWS data lake.
  13. Configures CloudWatch alarms for key CloudWatch metrics, like PercentageDiskSpaceUsed and CPUUtilization, for the Amazon Redshift cluster, and sends an SNS notification when one of these conditions triggers the alarm.
  14. Provides the option to restore the Amazon Redshift cluster from a previously taken snapshot.
  15. Attaches common tags to the Amazon Redshift clusters and other resources. AWS recommends assigning tags to your cloud infrastructure resources to manage resource access control, cost tracking, automation, and organization.
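
To illustrate how several of these items come together, the following is a trimmed-down sketch of the cluster resource. The logical IDs, parameter references, and values are placeholders, not the exact ones used by the template:

Resources:
  RedshiftCluster:
    Type: AWS::Redshift::Cluster
    DeletionPolicy: Snapshot                 # take a final snapshot when the stack is deleted
    Properties:
      ClusterType: multi-node
      NumberOfNodes: 2                       # at least two nodes for production
      NodeType: dc2.large
      DBName: rsdev01
      MasterUsername: rsadmin
      MasterUserPassword: !Ref MasterUserPassword     # hypothetical NoEcho parameter
      Port: 8200                             # non-default Amazon Redshift port
      Encrypted: true                        # encryption at rest
      PubliclyAccessible: false
      AutomatedSnapshotRetentionPeriod: 35   # 35 days for production environments
      ClusterSubnetGroupName: !Ref RedshiftSubnetGroup        # private subnets
      VpcSecurityGroupIds:
        - !Ref RedshiftSecurityGroup
      ClusterParameterGroupName: !Ref RedshiftParameterGroup
      IamRoles:
        - !GetAtt RedshiftSpectrumRole.Arn   # role for Amazon Redshift Spectrum and ETL jobs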

Prerequisites

Before setting up the CloudFormation stacks, note the following prerequisites.

  1. You must have an AWS account and an IAM user with sufficient permissions to interact with the AWS Management Console and the services listed in the preceding Architecture overview section. Your IAM permissions must also include access to create IAM roles and policies created by the AWS CloudFormation template.
  2. The VPC CloudFormation stack requires three Availability Zones to set up the public and private subnets. Make sure to select an AWS Region that has at least three Availability Zones.
  3. Create an EC2 key pair in the EC2 console in the AWS Region where you are planning to set up the CloudFormation stacks. Make sure that you save the private key, as this is the only time you can do this. You use this EC2 key pair as an input parameter during setup for the Amazon Linux bastion host CloudFormation stack.

Set up the resources using AWS CloudFormation

I provide these CloudFormation templates as a general guide. Review and customize them to suit your needs. Some of the resources deployed by these stacks incur costs as long as they remain in use.

Set up the VPC, subnets, and other networking components

This CloudFormation template creates a VPC, subnets, route tables, an internet gateway, a NAT gateway, an Amazon S3 gateway endpoint, and other networking components. Follow these steps to create these resources in your AWS account.

  1. Log in to the AWS Management Console.
  2. In the top navigation ribbon, choose the AWS Region in which to create the stack, and choose Next. This CloudFormation stack requires three Availability Zones for setting up the public and private subnets. Select an AWS Region that has at least three Availability Zones.
  3. Choose the following Launch Stack button. This button automatically launches the AWS CloudFormation service in your AWS account with a template. It prompts you to sign in as needed. You can view the CloudFormation template from within the console as required.
  4. The CloudFormation stack requires a few parameters, as shown in the following screenshot.
    • Stack name: Enter a meaningful name for the stack, for example, rsVPC
    • ClassB 2nd Octet: Specify the second octet of the IPv4 CIDR block for the VPC (10.XXX.0.0/16). You can specify any number from 0 through 255; for example, specify 33 to create a VPC with the IPv4 CIDR block 10.33.0.0/16. To learn more about VPC and subnet sizing for IPv4, see VPC and Subnet Sizing for IPv4.

      Figure 2: VPC Stack, in the CloudFormation Console

  5. After entering all the parameter values, choose Next.
  6. On the next screen, enter any required tags, an IAM role, or any advanced options, and then choose Next.
  7. Review the details on the final screen, and choose Create.

Stack creation takes a few minutes. Check the AWS CloudFormation Resources section to see the physical IDs of the various components this stack sets up.

After this, you must set up the Amazon Linux bastion host, which you use to log in to the Amazon Redshift cluster.

Set up the Amazon Linux bastion host

This CloudFormation template creates an Amazon Linux bastion host in an Auto Scaling group. Follow these steps to create the bastion host in the VPC.

  1. In the top navigation ribbon, choose the AWS Region in which to create the stack, and choose Next.
  2. Choose the following Launch Stack button. This button automatically launches the AWS CloudFormation service in your AWS account with a template to launch.
  3. The CloudFormation stack requires a few parameters, as shown in the following screenshots.
    • Stack name: Enter a meaningful name for the stack, for example, rsBastion.
    • Parent VPC Stack: Enter the CloudFormation stack name for the VPC stack that you set up in the previous step. Find this value in the CloudFormation console, for example, rsVPC.
    • Allowed Bastion External Access CIDR: Enter the allowed CIDR block in the x.x.x.x/x format for external SSH access to the bastion host.
    • Key Pair Name: Select the key pair name that you set up in the Prerequisites section.
    • Bastion Instance Type: Select the Amazon EC2 instance type for the bastion instance.
    • LogsRetentionInDays: Specify the number of days to retain CloudWatch log events for the bastion host.
    • SNS Notification Email: Enter the email notification list used to configure an SNS topic for sending CloudWatch alarm notifications.
    • Bastion Tenancy: Select the VPC tenancy in which you launched the bastion host.
    • Enable Banner: Select to display a banner when connecting through SSH to the bastion.
    • Bastion Banner: Use Default or provide an S3 location for the file containing the banner text that the host displays upon login.
    • Enable TCP Forwarding: Select true to enable TCP forwarding (SSH tunneling). This can be useful, but it also presents a security risk, so I recommend that you keep the default (disabled) setting unless required.
    • Enable X11 Forwarding: Select true to enable X11 forwarding (X Windows over SSH). X11 forwarding can be useful, but it is also a security risk, so I recommend that you keep the default (disabled) setting unless required.
    • Custom Bootstrap Script: Optional. Specify a custom bootstrap script S3 location for running during bastion host setup.
    • AMI override: Optional. Specify an AWS Region-specific image for the instance.

      Figure 3: Bastion Stack, in the CloudFormation Console

  4. After entering all the parameter values, choose Next.
  5. On the next screen, enter any required tags, an IAM role, or any advanced options, and then choose Next.
  6. Review the details on the final screen, select I acknowledge that AWS CloudFormation might create IAM resources, and then choose Create.

Stack creation takes a few minutes. Check the AWS CloudFormation Resources section to see the physical IDs of the various components set up by this stack.

You are now ready to set up the Amazon Redshift cluster.

Set up the Amazon Redshift cluster

This CloudFormation template sets up an Amazon Redshift cluster, CloudWatch alarms, an AWS Glue Data Catalog, an Amazon Redshift IAM role, and the required configuration. Follow these steps to create these resources in the VPC:

  1. Choose the AWS Region where you want to create the stack on the top right of the screen, and then choose Next.
  2. Choose the following Launch Stack button. This button automatically launches the AWS CloudFormation service in your AWS account with a template.
  3. The CloudFormation stack requires a few parameters, as shown in the following screenshots:
    • Stack name: Enter a meaningful name for the stack, for example, rsdb
    • Environment: Select the environment stage (Development, Test, Pre-prod, Production) of the Amazon Redshift cluster. If you specify the Production option for this parameter, it sets snapshot retention to 35 days, sets the enable_user_activity_logging parameter to true, and creates CloudWatch alarms for high CPU utilization and high disk-space usage. Setting Development, Test, or Pre-prod for this parameter sets snapshot retention to 8 days, sets the enable_user_activity_logging parameter to false, and creates CloudWatch alarms only for high disk-space usage.
    • Parent VPC stack: Provide the stack name of the parent VPC stack. Find this value in the CloudFormation console.
    • Parent bastion stack (Optional): Provide the stack name of the parent Amazon Linux bastion host stack. Find this value in the CloudFormation console.
    • Node type for Redshift cluster: Enter the type of the node for your Amazon Redshift cluster, for example, dc2.large.
    • Number of nodes in Redshift cluster: Enter the number of compute nodes for the Amazon Redshift cluster, for example, 2.
    • Redshift cluster port: Enter the TCP/IP port for the Amazon Redshift cluster, for example, 8200.
    • Redshift database name:  Enter a database name, for example, rsdev01.
    • Redshift master user name: Enter a database master user name, for example, rsadmin.
    • Redshift master user password: Enter an alphanumeric password for the master user. The password must contain 8–64 printable ASCII characters, excluding / (slash), ' (single quote), " (double quote), \ (backslash), and @. It must contain one uppercase letter, one lowercase letter, and one number. For example, Welcome123.
    • Enable Redshift logging to S3: If you choose true for this parameter, the stack enables database auditing for the newly created S3 bucket.
    • Max. number of concurrent clusters: Enter a number from 1 through 10 for concurrency scaling. To configure more than 10, you must request a limit increase by submitting an Amazon Redshift Limit Increase Form.
    • Encryption at rest: If you choose true for this parameter, the database encrypts your data with the KMS key.
    • KMS key ID: If you leave this empty, then the cluster uses the default Amazon Redshift KMS to encrypt the Amazon Redshift database. If you enter a user-created KMS key, then the cluster uses your user-defined KMS key to encrypt the Amazon Redshift database.
    • Redshift snapshot identifier: Enter a snapshot identifier only if you want to restore from a snapshot. Leave it blank for a new cluster.
    • AWS account ID of the Redshift snapshot: Enter the AWS account number that created the snapshot. Leave it blank if the snapshot comes from the current AWS account or if you don't want to restore from a previously taken snapshot.
    • Redshift maintenance window: Enter a maintenance window for your Amazon Redshift cluster. For more information, see Amazon Redshift maintenance window. For example, sat:05:00-sat:05:30.
    • S3 bucket for Redshift IAM role: Enter the name of an existing S3 bucket. The stack automatically creates an IAM role with GET and LIST access to this bucket and associates it with the Amazon Redshift cluster.
    • AWS Glue Data Catalog database name: Leave this field empty if you don’t want to create an AWS Glue Data Catalog. If you do want an associated AWS Glue Data Catalog database, enter a name for it, for example, dev-catalog-01. For a list of the AWS Regions in which AWS Glue is available, check the regional-product-services map.
    • Email address for SNS notification: Enter the email notification list that you used to configure an SNS topic for sending CloudWatch alarms. SNS sends a subscription confirmation email to the recipient. The recipient must choose the Confirm subscription link in this email to set up notifications.
    • Unique friendly name: This tag designates a unique, friendly name to append as a NAME tag into all AWS resources that this stack manages.
    • Designate business owner’s email: This tag designates the business owner’s email address associated with the given AWS resource. The stack sends outage or maintenance notifications to this address.
    • Functional tier: This tag designates the specific version of the application.
    • Project cost center: This tag designates the cost center associated with the project of the given AWS resource.
    • Confidentiality classifier: This tag designates the confidentiality classification of the data associated with the resource.
    • Compliance classifier: This tag specifies the Compliance level for the AWS resource.


      Figure 4: Amazon Redshift Stack, in the CloudFormation Console

  4. After entering the parameter values, choose Next.
  5. On the next screen, enter any required tags, an IAM role, or any advanced options, and then choose Next.
  6. Review the details on the final screen, select I acknowledge that AWS CloudFormation might create IAM resources, and choose Create.

Stack creation takes a few minutes. Check the AWS CloudFormation Resources section to see the physical IDs of the various components set up by these stacks.

With setup complete, log in to the Amazon Redshift cluster and run some basic commands to test it.

Log in to the Amazon Redshift cluster using the Amazon Linux bastion host

The following instructions assume that you use a Linux computer and use an SSH client to connect to the bastion host. For more information about how to connect using various clients, see Connect to Your Linux Instance.

  1. Move the private key of the EC2 key pair (that you saved in the Prerequisites section) to a location on your SSH Client, where you are connecting to the Amazon Linux bastion host.
  2. Change the permissions on the private key using the following command, so that it's not publicly viewable: chmod 400 <private key file name, e.g., bastion-key.pem>
  3. In the CloudFormation console, select the Amazon Linux bastion host stack. Choose Outputs and make a note of the SSHCommand parameter value, which you use to SSH to the Amazon Linux bastion host.
  4. On the SSH client, change the directory to the location where you saved the EC2 private key, and then copy and paste the SSHCommand value from the previous step.
  5. On the CloudFormation Dashboard, select the Amazon Redshift cluster stack. Choose Outputs and note the PSQLCommandLine parameter value, which you use to log in to the Amazon Redshift database using psql client.
  6. The EC2 Auto Scaling launch configuration already set up PostgreSQL binaries on the Amazon Linux bastion host. Copy and paste the PSQLCommandLine value at the command prompt of the bastion host.
    psql -h ClusterEndpointAddress -p AmazonRedshiftClusterPort -U Username -d DatabaseName
    When prompted, enter the database user password.
  7. Run some basic commands, as shown in the following screenshot:
    select current_database();
    select current_user;

    Figure 5: Successful connection to Amazon Redshift

Next steps

Before you use the Amazon Redshift cluster to set up your application-related database objects, consider creating the following:

  • An application schema
  • A user with full access to create and modify objects in the application schema
  • A user with read/write access to the application schema
  • A user with read-only access to the application schema

Use the master user that you set up with the Amazon Redshift cluster only for administering the Amazon Redshift cluster. To create and modify application-related database objects, use the user with full access to the application schema. Your application should use the read/write user for storing, updating, deleting, and retrieving data. Any reporting or read-only application should use the read-only user. Granting the minimum privileges required to perform operations is a database security best practice.

Review AWS CloudTrail, AWS Config, and Amazon GuardDuty and configure them for your AWS account, according to AWS security best practices. Together, these services help you monitor activity in your AWS account; assess, audit, and evaluate the configurations of your AWS resources; monitor malicious or unauthorized behavior; and detect security threats against your resources.

Delete CloudFormation stacks

Some of the AWS resources deployed by the CloudFormation stacks in this post incur a cost as long as you continue to use them.

You can delete the CloudFormation stack to delete all AWS resources created by the stack. To clean up all your stacks, use the CloudFormation console to remove the three stacks that you created in reverse order.

To delete a stack:

  1. On the Stacks page in the CloudFormation console, select the stack to delete. The stack must be currently running.
  2. In the stack details pane, choose Delete.
  3. Select Delete stack when prompted.

After stack deletion begins, you cannot stop it. The stack proceeds to the DELETE_IN_PROGRESS state. After the stack deletion completes, the stack changes to the DELETE_COMPLETE state. The AWS CloudFormation console does not display stacks in the DELETE_COMPLETE state by default. To display deleted stacks, you must change the stack view filter, as described in Viewing Deleted Stacks on the AWS CloudFormation Console.

If the delete fails, the stack enters the DELETE_FAILED state. For solutions, see Delete Stack Fails.

Summary

In this post, I showed you how to automate creation of an Amazon Redshift cluster and required AWS infrastructure based on AWS security and high availability best practices using AWS CloudFormation. I hope you find the sample CloudFormation templates helpful and encourage you to modify them to support your business needs.

If you have any comments or questions about this post, I encourage you to use the comments section.

 


About the Author

Sudhir Gupta is a senior partner solutions architect at Amazon Web Services. He works with AWS consulting and technology partners to provide guidance and technical assistance on data warehouse and data lake projects, helping them to improve the value of their solutions when using AWS.

AWS CloudFormation Update – Public Coverage Roadmap & CDK Goodies

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-cloudformation-update-public-coverage-roadmap-cdk-goodies/

I launched AWS CloudFormation in early 2011 with a pair of posts: AWS CloudFormation – Create Your AWS Stack From a Recipe and AWS CloudFormation in the AWS Management Console. Since that launch, we have added support for many AWS resource types, launched many new features, and worked behind the scenes to ensure that CloudFormation is efficient, scalable, and highly available.

Public Coverage Roadmap
CloudFormation use is growing even faster than AWS itself, and the team has prioritized scalability over complete resource coverage. While our goal of providing 100% coverage remains, the reality is that it will take us some time to get there. In order to be more transparent about our priorities and to give you an opportunity to manage them, I am pleased to announce the much-anticipated CloudFormation Coverage Roadmap:

Styled after the popular AWS Containers Roadmap, the CloudFormation Coverage Roadmap contains four columns:

  • Shipped – Available for use in production form in all public AWS regions.
  • Coming Soon – Generally a few months out.
  • We're Working On It – Work in progress, but further out.
  • Researching – We're thinking about the right way to implement the coverage.

Please feel free to create your own issues, and to give a thumbs-up to those that you need to have in order to make better use of CloudFormation:

Before I close out, I would like to address one common comment – that AWS is part of a big company, and that we should simply throw more resources at it. While the team is growing, implementing robust, secure coverage is still resource-intensive. Please consider the following quote, courtesy of the must-read Mythical Man-Month:

Good cooking takes time. If you are made to wait, it is to serve you better, and to please you.

Cloud Development Kit Goodies
The Cloud Development Kit (CDK) lets you model and provision your AWS resources using a programming language that you are already familiar with. You use a set of CDK Constructs (VPCs, subnets, and so forth) to define your application, and then use the CDK CLI to synthesize a CloudFormation template, deploy it to AWS, and create a stack.

Here are some resources to help you to get started with the CDK:

Stay Tuned
The CloudFormation Coverage Roadmap is an important waypoint on a journey toward open source that started out with cfn-lint, with some more stops in the works. Stay tuned and I’ll tell you more just as soon as I can!

Jeff;

AWS Cloud Development Kit (CDK) – TypeScript and Python are Now Generally Available

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/aws-cloud-development-kit-cdk-typescript-and-python-are-now-generally-available/

Managing your Infrastructure as Code provides great benefits and is often a stepping stone for a successful application of DevOps practices. In this way, instead of relying on manually performed steps, both administrators and developers can automate provisioning of compute, storage, network, and application services required by their applications using configuration files.

For example, defining your Infrastructure as Code makes it possible to:

  • Keep infrastructure and application code in the same repository
  • Make infrastructure changes repeatable and predictable across different environments, AWS accounts, and AWS regions
  • Replicate production in a staging environment to enable continuous testing
  • Replicate production in a performance test environment that you use just for the time required to run a stress test
  • Release infrastructure changes using the same tools as code changes, so that deployments include infrastructure updates
  • Apply software development best practices to infrastructure management, such as code reviews, or deploying small changes frequently

Configuration files used to manage your infrastructure are traditionally implemented as YAML or JSON text files, but in this way you’re missing most of the advantages of modern programming languages. Specifically with YAML, it can be very difficult to detect a file truncated while transferring to another system, or a missing line when copying and pasting from one template to another.

Wouldn’t it be better if you could use the expressive power of your favorite programming language to define your cloud infrastructure? For this reason, we introduced last year in developer preview the AWS Cloud Development Kit (CDK), an extensible open-source software development framework to model and provision your cloud infrastructure using familiar programming languages.

I am super excited to share that the AWS CDK for TypeScript and Python is generally available today!

With the AWS CDK you can design, compose, and share your own custom components that incorporate your unique requirements. For example, you can create a component setting up your own standard VPC, with its associated routing and security configurations. Or a standard CI/CD pipeline for your microservices using tools like AWS CodeBuild and CodePipeline.

Personally I really like that by using the AWS CDK, you can build your application, including the infrastructure, in your IDE, using the same programming language and with the support of autocompletion and parameter suggestion that modern IDEs have built in, without having to do a mental switch between one tool, or technology, and another. The AWS CDK makes it really fun to quickly code up your AWS infrastructure, configure it, and tie it together with your application code!

How the AWS CDK works
Everything in the AWS CDK is a construct. You can think of constructs as cloud components that can represent architectures of any complexity: a single resource, such as an S3 bucket or an SNS topic, a static website, or even a complex, multi-stack application that spans multiple AWS accounts and regions. To foster reusability, constructs can include other constructs. You compose constructs together into stacks, which you can deploy into an AWS environment, and apps, a collection of one or more stacks.

How to use the AWS CDK
We continuously add new features based on the feedback of our customers. That means that when creating an AWS resource, you often have to specify many options and dependencies. For example, if you create a VPC you have to think about how many Availability Zones (AZs) to use and how to configure subnets to give private and public access to the resources that will be deployed in the VPC.

To make it easy to define the state of AWS resources, the AWS Construct Library exposes the full richness of many AWS services with sensible defaults that you can customize as needed. In the case above, the VPC construct creates by default public and private subnets for all the AZs in the VPC, using 3 AZs if not specified.

For creating and managing CDK apps, you can use the AWS CDK Command Line Interface (CLI), a command-line tool that requires Node.js and can be installed quickly with:

npm install -g aws-cdk

After that, you can use the CDK CLI with different commands:

  • cdk init to initialize in the current directory a new CDK project in one of the supported programming languages
  • cdk synth to print the CloudFormation template for this app
  • cdk deploy to deploy the app in your AWS Account
  • cdk diff to compare what is in the project files with what has been deployed

Just run cdk to see more of the available commands and options.

You can easily include the CDK CLI in your deployment automation workflow, for example using Jenkins or AWS CodeBuild.
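
For example, a minimal AWS CodeBuild buildspec for a TypeScript CDK app might look like the following sketch; the project scripts and the approval flag are assumptions, not part of any specific project:

version: 0.2

phases:
  install:
    runtime-versions:
      nodejs: 10
    commands:
      - npm install -g aws-cdk                  # install the CDK CLI
      - npm ci                                  # install project dependencies
  build:
    commands:
      - npm run build                           # compile the TypeScript app
      - cdk synth                               # synthesize the CloudFormation template
      - cdk deploy --require-approval never     # deploy without interactive approval prompts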

Let’s use the AWS CDK to build two sample projects, using different programming languages.

An example in TypeScript
For the first project I am using TypeScript to define the infrastructure:

cdk init app --language=typescript

Here’s a simplified view of what I want to build, not entering into the details of the public/private subnets in the VPC. There is an online frontend, writing messages in a queue, and an asynchronous backend, consuming messages from the queue:

Inside the stack, the following TypeScript code defines the resources I need, and their relations:

  • First I define the VPC and an Amazon ECS cluster in that VPC. By using the defaults provided by the AWS Construct Library, I don’t need to specify any parameter here.
  • Then I use an ECS pattern that in a few lines of code sets up an Amazon SQS queue and an ECS service running on AWS Fargate to consume the messages in that queue.
  • The ECS pattern library provides higher-level ECS constructs which follow common architectural patterns, such as load balanced services, queue processing, and scheduled tasks.
  • A Lambda function has the name of the queue, created by the ECS pattern, passed as an environment variable and is granted permissions to send messages to the queue.
  • The code of the Lambda function and the Docker image are passed as assets. Assets allow you to bundle files or directories from your project and use them with Lambda or ECS.
  • Finally, an Amazon API Gateway endpoint provides an HTTPS REST interface to the function.
const myVpc = new ec2.Vpc(this, "MyVPC");

const myCluster = new ecs.Cluster(this, "MyCluster", {
  vpc: myVpc
});

const myQueueProcessingService = new ecs_patterns.QueueProcessingFargateService(
  this, "MyQueueProcessingService", {
    cluster: myCluster,
    memoryLimitMiB: 512,
    image: ecs.ContainerImage.fromAsset("my-queue-consumer")
  });

const myFunction = new lambda.Function(
  this, "MyFrontendFunction", {
    runtime: lambda.Runtime.NODEJS_10_X,
    timeout: Duration.seconds(3),
    handler: "index.handler",
    code: lambda.Code.asset("my-front-end"),
    environment: {
      QUEUE_NAME: myQueueProcessingService.sqsQueue.queueName
    }
  });

myQueueProcessingService.sqsQueue.grantSendMessages(myFunction);

const myApi = new apigateway.LambdaRestApi(
  this, "MyFrontendApi", {
    handler: myFunction
  });

I find this code very readable and easier to maintain than the corresponding JSON or YAML. By the way, cdk synth in this case outputs more than 800 lines of plain CloudFormation YAML.

An example in Python
For the second project I am using Python:

cdk init app --language=python

I want to build a Lambda function that is executed every 10 minutes:

When you initialize a CDK project in Python, a virtualenv is set up for you. You can activate the virtualenv and install your project requirements with:

source .env/bin/activate

pip install -r requirements.txt

Note that Python autocompletion may not work with some editors, like Visual Studio Code, if you don’t start the editor from an active virtualenv.

Inside the stack, here’s the Python code defining the Lambda function and the CloudWatch Event rule to invoke the function periodically as target:

myFunction = aws_lambda.Function(
    self, "MyPeriodicFunction",
    code=aws_lambda.Code.asset("src"),
    handler="index.main",
    timeout=core.Duration.seconds(30),
    runtime=aws_lambda.Runtime.PYTHON_3_7,
)

myRule = aws_events.Rule(
    self, "MyRule",
    schedule=aws_events.Schedule.rate(core.Duration.minutes(10)),
)
myRule.add_target(aws_events_targets.LambdaFunction(myFunction))

Again, this is easy to understand even if you don’t know the details of AWS CDK. For example, durations include the time unit and you don’t have to wonder if they are expressed in seconds, milliseconds, or days. The output of cdk synth in this case is more than 90 lines of plain CloudFormation YAML.

Available Now
There is no charge for using the AWS CDK; you pay only for the AWS resources that it deploys.

To quickly get hands-on with the CDK, start with this awesome step-by-step online tutorial!

More examples of CDK projects, using different programming languages, are available in this repository:

https://github.com/aws-samples/aws-cdk-examples

You can find more information on writing your own constructs here.

The AWS CDK is open source and we welcome your contribution to make it an even better tool:

https://github.com/awslabs/aws-cdk

Check out our source code on GitHub, start building your infrastructure today using TypeScript or Python, or try different languages in developer preview, such as C# and Java, and give us your feedback!

Build and automate a serverless data lake using an AWS Glue trigger for the Data Catalog and ETL jobs

Post Syndicated from Saurabh Shrivastava original https://aws.amazon.com/blogs/big-data/build-and-automate-a-serverless-data-lake-using-an-aws-glue-trigger-for-the-data-catalog-and-etl-jobs/

Today, data is flowing from everywhere, whether it is unstructured data from resources like IoT sensors, application logs, and clickstreams, or structured data from transaction applications, relational databases, and spreadsheets. Data has become a crucial part of every business. This has resulted in a need to maintain a single source of truth and automate the entire pipeline—from data ingestion to transformation and analytics—to extract value from the data quickly.

There is a growing concern over the complexity of data analysis as the data volume, velocity, and variety increases. The concern stems from the number and complexity of steps it takes to get data to a state that is usable by business users. Often data engineering teams spend most of their time on building and optimizing extract, transform, and load (ETL) pipelines. Automating the entire process can reduce the time to value and cost of operations. In this post, we describe how to create a fully automated data cataloging and ETL pipeline to transform your data.

Architecture

In this post, you learn how to build and automate the following architecture.

You build your serverless data lake with Amazon Simple Storage Service (Amazon S3) as the primary data store. Given the scalability and high availability of Amazon S3, it is best suited as the single source of truth for your data.

You can use various techniques to ingest and store data in Amazon S3. For example, you can use Amazon Kinesis Data Firehose to ingest streaming data. You can use AWS Database Migration Service (AWS DMS) to ingest relational data from existing databases. And you can use AWS DataSync to ingest files from an on-premises Network File System (NFS).

Ingested data lands in an Amazon S3 bucket that we refer to as the raw zone. To make that data available, you have to catalog its schema in the AWS Glue Data Catalog. You can do this using an AWS Lambda function invoked by an Amazon S3 trigger to start an AWS Glue crawler that catalogs the data. When the crawler is finished creating the table definition, you invoke a second Lambda function using an Amazon CloudWatch Events rule. This step starts an AWS Glue ETL job to process and output the data into another Amazon S3 bucket that we refer to as the processed zone.

The AWS Glue ETL job converts the data to Apache Parquet format and stores it in the processed S3 bucket. You can modify the ETL job to achieve other objectives, like more granular partitioning, compression, or enriching of the data. Monitoring and notification is an integral part of the automation process. So as soon as the ETL job finishes, another CloudWatch rule sends you an email notification using an Amazon Simple Notification Service (Amazon SNS) topic. This notification indicates that your data was successfully processed.

In summary, this pipeline classifies and transforms your data, sending you an email notification upon completion.

Deploy the automated data pipeline using AWS CloudFormation

First, you use AWS CloudFormation templates to create all of the necessary resources. This removes opportunities for manual error, increases efficiency, and ensures consistent configurations over time.

Launch the AWS CloudFormation template with the following Launch stack button.

Be sure to choose the US East (N. Virginia) Region (us-east-1). Then enter the appropriate stack name, email address, and AWS Glue crawler name to create the Data Catalog. Add the AWS Glue database name to save the metadata tables. Acknowledge the IAM resource creation as shown in the following screenshot, and choose Create.

Note: It is important to enter your valid email address so that you get a notification when the ETL job is finished.

This AWS CloudFormation template creates the following resources in your AWS account:

  • Two Amazon S3 buckets to store both the raw data and processed Parquet data.
  • Two AWS Lambda functions: one to create the AWS Glue Data Catalog and another function to publish topics to Amazon SNS.
  • An Amazon Simple Queue Service (Amazon SQS) queue for maintaining the retry logic.
  • An Amazon SNS topic to inform you that your data has been successfully processed.
  • Two CloudWatch Events rules: one rule on the AWS Glue crawler and another on the AWS Glue ETL job.
  • AWS Identity and Access Management (IAM) roles for accessing AWS Glue, Amazon SNS, Amazon SQS, and Amazon S3.

When the AWS CloudFormation stack is ready, check your email and confirm the SNS subscription. Choose the Resources tab and find the details.

Follow these steps to verify your email subscription so that you receive an email alert as soon as your ETL job finishes.

  1. On the Amazon SNS console, in the navigation pane, choose Topics. An SNS topic named SNSProcessedEvent appears in the display.

  2. Choose the ARN. The topic details page appears, listing the email subscription as Pending confirmation. Be sure to confirm the subscription for the email address that you provided in the Endpoint column.

If you don’t see an email address, or the link is showing as not valid in the email, choose the corresponding subscription endpoint. Then choose Request confirmation to confirm your subscription. Be sure to check your email junk folder for the request confirmation link.

Configure an Amazon S3 bucket event trigger

In this section, you configure a trigger on a raw S3 bucket. So when new data lands in the bucket, you trigger GlueTriggerLambda, which was created in the AWS CloudFormation deployment.

To configure notifications:

  1. Open the Amazon S3 console.
  2. Choose the source bucket. In this case, the bucket name contains raws3bucket, for example, <stackname>-raws3bucket-1k331rduk5aph.
  3. Go to the Properties tab, and under Advanced settings, choose Events.

  4. Choose Add notification and configure a notification with the following settings:
  • Name – Enter a name of your choice. In this example, it is crawlerlambdaTrigger.
  • Events – Select the All object create events check box to create the AWS Glue Data Catalog when you upload the file.
  • Send to – Choose Lambda function.
  • Lambda – Choose the Lambda function that was created in the deployment section. Your Lambda function should contain the string GlueTriggerLambda.

See the following screenshot for all the settings. When you’re finished, choose Save.

For more details on configuring events, see How Do I Enable and Configure Event Notifications for an S3 Bucket? in the Amazon S3 Console User Guide.
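
If you manage the bucket in your own template instead, the same trigger can be expressed declaratively. The following is a hedged sketch with hypothetical logical IDs, not the configuration deployed by this post's stack:

Resources:
  RawDataBucket:
    Type: AWS::S3::Bucket
    Properties:
      NotificationConfiguration:
        LambdaConfigurations:
          - Event: s3:ObjectCreated:*                 # all object create events
            Function: !GetAtt GlueTriggerLambda.Arn   # hypothetical logical ID of the function

You also need an AWS::Lambda::Permission resource that allows Amazon S3 to invoke the function.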

Download the dataset

For this post, you use a publicly available New York green taxi dataset in CSV format. You upload monthly data to your raw zone and perform automated data cataloging using an AWS Glue crawler. After cataloging, an automated AWS Glue ETL job triggers to transform the monthly green taxi data to Parquet format and store it in the processed zone.

You can download the raw dataset from the NYC Taxi & Limousine Commission trip record data site. Download the monthly green taxi dataset and upload only one month of data. For example, first upload only the green taxi January 2018 data to the raw S3 bucket.

Automate the Data Catalog with an AWS Glue crawler

One of the important aspects of a modern data lake is to catalog the available data so that it’s easily discoverable. To run ETL jobs or ad hoc queries against your data lake, you must first determine the schema of the data along with other metadata information like location, format, and size. An AWS Glue crawler makes this process easy.

After you upload the data into the raw zone, the Amazon S3 trigger that you created earlier in the post invokes the GlueTriggerLambda function. This function starts the AWS Glue crawler, which populates the AWS Glue Data Catalog with metadata information inferred from the data that was crawled.

Open the AWS Glue console. You should see the database, table, and crawler that were created using the AWS CloudFormation template. Your AWS Glue crawler should appear as follows.

Browse to the table using the left navigation, and you will see the table in the database that you created earlier.

Choose the table name, and further explore the metadata discovered by the crawler, as shown following.

You can also view the columns, data types, and other details. In the following screenshot, the AWS Glue crawler has created a schema from the files available in Amazon S3 by determining the column names and their respective data types. You can use this schema to create an external table.

Author ETL jobs with AWS Glue

AWS Glue provides a managed Apache Spark environment to run your ETL job without maintaining any infrastructure, with a pay-as-you-go model.

Open the AWS Glue console and choose Jobs under the ETL section to start authoring an AWS Glue ETL job. Give the job a name of your choice, and note the name because you'll need it later. Choose the already created IAM role with the name containing <stackname>-GlueLabRole, as shown following. Keep the other default options.

AWS Glue generates the required Python or Scala code, which you can customize as per your data transformation needs. In the Advanced properties section, choose Enable in the Job bookmark list to avoid reprocessing old data.

On the next page, choose your raw Amazon S3 bucket as the data source, and choose Next. On the Data target page, choose the processed Amazon S3 bucket as the data target path, and choose Parquet as the Format.

On the next page, you can make schema changes as required, such as changing column names, dropping ones that you’re less interested in, or even changing data types. AWS Glue generates the ETL code accordingly.

Lastly, review your job parameters, and choose Save Job and Edit Script, as shown following.

On the next page, you can modify the script further as per your data transformation requirements. For this post, you can leave the script as is. In the next section, you automate the execution of this ETL job.

Automate ETL job execution

As the frequency of data ingestion increases, you will want to automate the ETL job to transform the data. Automating this process helps reduce operational overhead and free your data engineering team to focus on more critical tasks.

AWS Glue is optimized for processing data in batches. You can configure it to process data in batches on a set time interval. How often you run a job is determined by how recent the end user expects the data to be and the cost of processing. For information about the different methods, see Triggering Jobs in AWS Glue in the AWS Glue Developer Guide.
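
As an alternative to the event-driven approach used in this post, a time-based schedule can be declared as an AWS Glue trigger. The following is a rough sketch with a placeholder job name and schedule:

Resources:
  NightlyEtlTrigger:
    Type: AWS::Glue::Trigger
    Properties:
      Name: nightly-etl-trigger
      Type: SCHEDULED
      Schedule: cron(0 2 * * ? *)     # run every day at 02:00 UTC
      StartOnCreation: true
      Actions:
        - JobName: my-etl-job         # placeholder; use your AWS Glue ETL job name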

First, you need to make one-time changes and configure your ETL job name in the Lambda function and the CloudWatch Events rule. On the console, open the ETLJobLambda Lambda function, which was created using the AWS CloudFormation stack.

Choose the Lambda function link that appears, and explore the code. Change the JobName value to the ETL job name that you created in the previous step, and then choose Save.

As shown in the following screenshot, you will see an Amazon CloudWatch Events rule CrawlerEventRule that is associated with an AWS Lambda function. When the CloudWatch Events rule receives a success status, it triggers the ETLJobLambda Lambda function.

Now you are all set to trigger your AWS Glue ETL job as soon as you upload a file in the raw S3 bucket. Before testing your data pipeline, set up the monitoring and alerts.

Monitoring and notification with Amazon CloudWatch Events

Suppose that you want to receive a notification over email when your AWS Glue ETL job is completed. To achieve that, the CloudWatch Events rule OpsEventRule was deployed from the AWS CloudFormation template in the data pipeline deployment section. This CloudWatch Events rule monitors the status of the AWS Glue ETL job and sends an email notification using an SNS topic upon successful completion of the job.

As the following image shows, you configure your AWS Glue job name in the Event pattern section in CloudWatch. The event triggers an SNS topic configured as a target when the AWS Glue job state changes to SUCCEEDED. This SNS topic sends an email notification to the email address that you provided in the deployment section.
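
Expressed as a CloudFormation resource, such a rule looks roughly like the following sketch; the job name and logical IDs are placeholders, and the OpsEventRule deployed by the stack may differ in detail:

Resources:
  OpsEventRule:
    Type: AWS::Events::Rule
    Properties:
      Description: Notify by email when the AWS Glue ETL job succeeds
      EventPattern:
        source:
          - aws.glue
        detail-type:
          - Glue Job State Change
        detail:
          jobName:
            - my-etl-job                   # placeholder; your AWS Glue ETL job name
          state:
            - SUCCEEDED
      Targets:
        - Arn: !Ref ProcessedEventTopic    # for SNS topics, Ref returns the topic ARN
          Id: GlueJobSuccessNotification

The topic also needs an SNS topic policy that allows events.amazonaws.com to publish to it.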

Let’s make one-time configuration changes in the CloudWatch Events rule OpsEventRule to capture the status of the AWS Glue ETL job.

  1. Open the CloudWatch console.
  2. In the navigation pane, under Events, choose Rules. Choose the rule name that contains OpsEventRule, as shown following.

  3. In the upper-right corner, choose Actions, Edit.

  4. Replace Your-ETL-jobName with the ETL job name that you created in the previous step.

  5. Scroll down and choose Configure details. Then choose Update rule.

Now that you have set up an entire data pipeline in an automated way with the appropriate notifications and alerts, it’s time to test your pipeline. If you upload new monthly data to the raw Amazon S3 bucket (for example, upload the NY green taxi February 2018 CSV), it triggers the GlueTriggerLambda AWS Lambda function. You can navigate to the AWS Glue console, where you can see that the AWS Glue crawler is running.

Upon completion of the crawler, the CloudWatch Events rule CrawlerEventRule triggers your ETLJobLambda Lambda function. You can notice now that the AWS Glue ETL job is running.

When the ETL job is successful, the CloudWatch Events rule OpsEventRule sends an email notification to you using an Amazon SNS topic, as shown following, hence completing the automation cycle.

Be sure to check your processed Amazon S3 bucket, where you will find transformed data processed by your automated ETL pipeline. Now that the processed data is ready in Amazon S3, you need to run the AWS Glue crawler on this Amazon S3 location. The crawler creates a metadata table with the relevant schema in the AWS Glue Data Catalog.

After the Data Catalog table is created, you can execute standard SQL queries using Amazon Athena and visualize the data using Amazon QuickSight. To learn more, see the blog post Harmonize, Query, and Visualize Data from Various Providers using AWS Glue, Amazon Athena, and Amazon QuickSight.

Conclusion

Having an automated serverless data lake architecture lessens the burden of managing data from its source to destination—including discovery, audit, monitoring, and data quality. With an automated data pipeline across organizations, you can identify relevant datasets and extract value much faster than before. The advantage of reducing the time to analysis is that businesses can analyze the data as it becomes available in real time. From the BI tools, queries return results much faster for a single dataset than for multiple databases.

Business analysts can now get their job done faster, and data engineering teams can free themselves from repetitive tasks. You can extend it further by loading your data into a data warehouse like Amazon Redshift or making it available for machine learning via Amazon SageMaker.

Additional resources

See the following resources for more information:

 


About the Author

Saurabh Shrivastava is a partner solutions architect and big data specialist working with global systems integrators. He works with AWS partners and customers to provide them with architectural guidance for building scalable architecture in hybrid and AWS environments. He enjoys spending time with his family outdoors and traveling to new destinations to discover new cultures.

 

 

 

Luis Lopez Soria is a partner solutions architect and serverless specialist working with global systems integrators. He works with AWS partners and customers to help them with adoption of the cloud operating model at a large scale. He enjoys doing sports in addition to traveling around the world exploring new foods and cultures.

 

 

 

Chirag Oswal is a partner solutions architect and AR/VR specialist working with global systems integrators. He works with AWS partners and customers to help them with adoption of the cloud operating model at a large scale. He enjoys video games and travel.

Boost your infrastructure with the AWS CDK

Post Syndicated from Philipp Garbe original https://aws.amazon.com/blogs/aws/boost-your-infrastructure-with-cdk/

This guest post is by AWS Container Hero Philipp Garbe. Philipp works as Lead Platform Engineer at Scout24 in Germany. He is driven by technologies and tools that allow him to release faster and more often. He expects that every commit automatically goes into production. You can find him on Twitter at @pgarbe.

Infrastructure as code (IaC) has been adopted by many teams in the last few years. It makes provisioning of your infrastructure easy and helps to keep your environments consistent.

But by using declarative templates, you might still miss many practices that you are used to for “normal” code. You’ve probably already felt the pain that each AWS CloudFormation template is just a copy and paste of your last projects or from StackOverflow. But can you trust these snippets? How can you align improvements or even security fixes through your code base? How can you share best practices within your company or the community?

Fortunately for everyone, AWS published the beta for an important addition to AWS CloudFormation: the AWS Cloud Development Kit (AWS CDK).

What’s the big deal about the AWS CDK?

All your best practices about how to write good AWS CloudFormation templates can now easily be shared within your company or the developer community. At the same time, you can also benefit from others doing the same thing.

For example, think about Amazon DynamoDB. Should be easy to set up in AWS CloudFormation, right? Just some lines in your template. But wait. When you’re already in production, you realize that you’ve got to set up automatic scaling, regular backups, and most importantly, alarms for all relevant metrics. This can amount to several hundred lines.

Think ahead: Maybe you’ve got to create another application that also needs a DynamoDB database. Do you copy and paste all that YAML code? What happens later, when you find some bugs in your template? Do you apply the fix to both code bases?

With the AWS CDK, you’re able to write a “construct” for your best practice, production-ready DynamoDB database. Share it as an npm package with your company or anyone!

What is the AWS CDK?

Back up a step and see what the AWS CDK looks like. Compared to the declarative approach with YAML (or JSON), the CDK allows you to declare your infrastructure imperatively. The main language is TypeScript, but several other languages are also supported.

This is what the Hello World example from Hello, AWS CDK! looks like:

import cdk = require('@aws-cdk/cdk');
import s3 = require('@aws-cdk/aws-s3');

class MyStack extends cdk.Stack {
  constructor(parent: cdk.App, id: string, props?: cdk.StackProps) {
    super(parent, id, props);

    new s3.Bucket(this, 'MyFirstBucket', {
      versioned: true
    });
  }
}

class MyApp extends cdk.App {
  constructor(argv: string[]) {
    super(argv);

    new MyStack(this, 'hello-cdk');
  }
}

new MyApp().run();

Apps are the root constructs and can be used directly by the CDK CLI to render and deploy the AWS CloudFormation template.

Apps consist of one or more stacks, which are deployable units that contain information about the Region and account. It's possible to have an app that deploys different stacks to multiple Regions at the same time.

Stacks include constructs that are representations of AWS resources like a DynamoDB table or AWS Lambda function.

A lib is a construct that typically encapsulates further constructs. With that, higher-level constructs can be built and also reused. Because the construct is just TypeScript (or any other supported language), a package can be built and shared by any package manager.

Constructs

As the CDK is all about constructs, it’s important to understand them. It’s a hierarchical structure called a construct tree. You can think of constructs in three levels:

Level 1: AWS CloudFormation resources

This is a one-to-one mapping of existing resources and is automatically generated. It’s the same as the resources that you use currently in YAML. Ideally, you don’t have to deal with these constructs directly.

Level 2: The AWS Construct Library

These constructs are on an AWS service level. They’re opinionated, well-architected, and handwritten by AWS. They come with proper defaults and should make it easy to create AWS resources without worrying too much about the details.

As an example, this is how to create a complete VPC with private and public subnets in all available Availability Zones:

import ec2 = require('@aws-cdk/aws-ec2');

const vpc = new ec2.VpcNetwork(this, 'VPC');

The AWS Construct Library has some nice concepts about least privilege IAM policies, event-driven API actions, security groups, and metrics. For example, IAM policies are automatically created based on your intent. When a Lambda function subscribes to an SNS topic, a policy is created that allows the topic to invoke the function.
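To make that intent-based wiring concrete, here is a minimal sketch using the CDK’s Python bindings. Note that the package layout (aws_cdk, constructs) and class names follow the current CDK API rather than the beta TypeScript packages shown above, and the stack and resource names are only illustrative:

import textwrap

from aws_cdk import Stack, aws_lambda as lambda_, aws_sns as sns, aws_sns_subscriptions as subs
from constructs import Construct

class NotifyStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        fn = lambda_.Function(
            self, "Handler",
            runtime=lambda_.Runtime.PYTHON_3_9,
            handler="index.handler",
            code=lambda_.Code.from_inline(textwrap.dedent("""
                def handler(event, context):
                    print(event)
            """)),
        )
        topic = sns.Topic(self, "PetTopic")

        # Subscribing the function is enough: the CDK also synthesizes the
        # lambda:InvokeFunction permission that allows the topic to call it.
        topic.add_subscription(subs.LambdaSubscription(fn))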

Constructs for AWS services that offer Amazon CloudWatch metrics have functions like metricXxx() that return metric objects, which can easily be used to create alarms.

new Alarm(this, 'Alarm', {
metric: fn.metricErrors(),
threshold: 100,
evaluationPeriods: 2,
});

For more information, see AWS Construct Library.

Level 3: Your awesome stuff

Here’s where it gets interesting. As mentioned earlier, constructs are hierarchical. They can be higher-level abstractions based on other constructs. For example, on this level, you can write your own Amazon ECS cluster construct that contains automatic node draining, automatic scaling, and all the right alarms. Or you can write a construct for all necessary alarms that an Amazon RDS database should monitor. It’s up to you to create and share your constructs.
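As a rough illustration of such a level 3 construct, the following sketch again uses the CDK’s Python bindings with current API names, so treat the exact property names as assumptions. It bundles a DynamoDB table with the kind of alarm discussed earlier; the construct name and defaults are just examples:

from aws_cdk import aws_cloudwatch as cloudwatch, aws_dynamodb as dynamodb
from constructs import Construct

class ProductionTable(Construct):
    """An opinionated DynamoDB table: on-demand billing, backups, and an alarm."""

    def __init__(self, scope: Construct, construct_id: str, *, partition_key: str) -> None:
        super().__init__(scope, construct_id)

        self.table = dynamodb.Table(
            self, "Table",
            partition_key=dynamodb.Attribute(
                name=partition_key,
                type=dynamodb.AttributeType.STRING,
            ),
            billing_mode=dynamodb.BillingMode.PAY_PER_REQUEST,
            point_in_time_recovery=True,
        )

        # The alarm everyone forgets until the first throttling incident.
        cloudwatch.Alarm(
            self, "ThrottleAlarm",
            metric=self.table.metric_throttled_requests(),
            threshold=1,
            evaluation_periods=1,
        )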

Conclusion

It’s good that AWS went public at an early stage. The docs are already good, but not everything is covered yet. Not all AWS services have an AWS Construct Library module defined (level 2); many have only the pure AWS CloudFormation constructs (level 1).

Personally, I think the AWS CDK is a huge step forward, as it allows you to reuse AWS CloudFormation code and share it with others. It makes it easy to apply company standards and lets people work on awesome features and spend less time writing “boring” code.

Using AWS CodePipeline to Perform Multi-Region Deployments

Post Syndicated from Nithin Reddy Cheruku original https://aws.amazon.com/blogs/devops/using-aws-codepipeline-to-perform-multi-region-deployments/

AWS CodePipeline is a fully managed continuous delivery service that helps you automate your release pipelines for fast and reliable application and infrastructure updates. Now that AWS CodePipeline supports cross-region actions, you can deploy your application across multiple regions from a single pipeline. Deploying your application to multiple regions can improve both latency and availability for your application.

Other AWS services

AWS CodeDeploy is a fully managed deployment service that automates software deployments to a variety of compute services such as Amazon EC2, AWS Lambda, and your on-premises servers.

AWS CloudFormation provides a common language for you to describe and provision all the infrastructure resources in your cloud environment.

Amazon S3 has a simple web service interface that you can use to store and retrieve any amount of data, at any time, from anywhere on the web.

Key AWS CodePipeline concepts

Stage: AWS CodePipeline breaks up your release workflow into a series of stages. For example, there might be a build stage, where code is built and tests are run. There are also deployment stages, where code updates are deployed to runtime environments. You can label each stage in the release process for better tracking, control, and reporting (for example “Source,” “Build,” and “Staging”).

Action: Every pipeline stage contains at least one action, which is some kind of task performed on the artifact. Pipeline actions occur in a specified order, in sequence or in parallel, as determined in the configuration of the stage.

For more information, see How AWS CodePipeline Works in the AWS CodePipeline User Guide.

In this blog post, you will learn how to:

  1. Create a continuous delivery pipeline with AWS CodePipeline, provisioned by AWS CloudFormation.
  2. Set up pipeline actions to execute in an AWS Region that is different from the region where the pipeline was created.
  3. Deploy a sample application to multiple regions using an AWS CodeDeploy action in the pipeline.

High-level deployment architecture

The deployment process can be summarized as follows:

  1. The latest application code is uploaded into an Amazon S3 bucket. Any new revision uploaded to the bucket
    triggers a pipeline execution.
  2. For each AWS CodeDeploy action in the pipeline, the application code from the S3 bucket is replicated to the
    artifact store of the region that is configured for that action.
  3. Each AWS CodeDeploy action deploys the latest revision of the application from its artifact store to Amazon
    EC2 instances in the region.

In this blog post, we set our primary region to us-west-2. The secondary regions are us-east-1 and ap-southeast-2.

Note: The resources created by the AWS CloudFormation template might result in charges to your account. The cost depends on how long you keep the AWS CloudFormation stack and its resources running.

We will walk you through the following steps for creating a multi-region deployment pipeline:

  1. Set up resources to which you will deploy your application using AWS CodeDeploy.
  2. Set up artifact stores for AWS CodePipeline in Amazon S3.
  3. Provision AWS CodePipeline with AWS CloudFormation.
  4. View deployments performed by the pipeline in the AWS Management Console.
  5. Validate the deployments.

Getting started

Step 1. As part of this process of setting up resources, you install the AWS CodeDeploy agent on the instances. The AWS CodeDeploy agent is a software package that enables an instance to be used in AWS CodeDeploy deployments. There are two tasks in this step:

  • Create Amazon EC2 instances and install the AWS CodeDeploy agent.
  • Create an application in AWS CodeDeploy.

The AWS CloudFormation template automates both tasks. We will launch the AWS CloudFormation template in each of the three regions (us-west-2, us-east-1, and ap-southeast-2).

Note: Before you begin, you must have an instance key pair to enable SSH access to the Amazon EC2 instance for that region. For more information, see Amazon EC2 Key Pairs.

To create the EC2 instances and an AWS CodeDeploy application, in the AWS CloudFormation console, launch the following AWS CloudFormation templates in each region (us-west-2, us-east-1, and ap-southeast-2). For information about how to launch AWS CloudFormation from the AWS Management Console, see Using the AWS CloudFormation Console.

On the Specify Details page, do the following:

  1. In Stack name, enter a name for the stack (for example, USEast1CodeDeploy).
  2. In ApplicationName, enter a name for the application (for example, CrossRegionActionSupport).
  3. In DeploymentGroupName, enter a name for the deployment group (for example, CrossRegionActionSupportDeploymentGroup).
  4. In EC2KeyPairName, if you already have a key pair to use with Amazon EC2 instances in that region, choose an existing key pair, and then select your key pair. For more information, see Amazon EC2 Key Pairs.
  5. In EC2TagKeyName, enter Name.
  6. In EC2TagValue, enter NVirginiaCrossRegionInstance.
  7. Choose Next.

It could take several minutes for AWS CloudFormation to create the resources on your behalf. You can watch the progress messages on the Events tab in the console. When the stack has been created, you will see a CREATE_COMPLETE message in the Status column on the Overview tab.

You should see new EC2 instances running in each of the three regions (us-west-2, us-east-1, and ap-southeast-2).

Step 2. Set up artifact stores for AWS CodePipeline. AWS CodePipeline uses Amazon S3 buckets as artifact stores. These S3 buckets are regional and versioned. All of the artifacts are copied to the region in which the pipeline action is configured to execute.
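If you prefer to create these buckets programmatically rather than through the template, a minimal boto3 sketch might look like this; the bucket name prefix is a placeholder and must be globally unique:

import boto3

REGIONS = ["us-west-2", "us-east-1", "ap-southeast-2"]
PREFIX = "my-codepipeline-artifacts"  # placeholder prefix

for region in REGIONS:
    s3 = boto3.client("s3", region_name=region)
    bucket = f"{PREFIX}-{region}"
    kwargs = {"Bucket": bucket}
    if region != "us-east-1":
        # us-east-1 rejects an explicit LocationConstraint
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    s3.create_bucket(**kwargs)

    # This walkthrough relies on versioned buckets, so enable versioning.
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )

print("Created artifact store buckets:", [f"{PREFIX}-{r}" for r in REGIONS])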

To create the artifact stores by using the AWS CloudFormation console, launch this AWS CloudFormation template in each region (us-west-2, us-east-1, and ap-southeast-2).

On the Specify Details page, do the following:

  1. In Stack name, enter a name for the stack (for example, artifactstore).
  2. In ArtifactStoreBucketNamePrefix, enter a prefix string of up to 30 characters. Use only lowercase letters, numbers, periods, and hyphens (for example, useast1).
  3. Choose Next.

It might take several minutes for AWS CloudFormation to create the resources on your behalf. You can watch the progress messages on the Events tab in the console. When the stack has been created, you will see a CREATE_COMPLETE message in the Status column on the Overview tab.

Now, copy the Amazon S3 bucket names created in each region. You need the bucket names in later steps.

Note: All Amazon S3 buckets, including the bucket used by the Source action in the pipeline, must be version-enabled to track versions that are being uploaded and processed by AWS CodePipeline.

Step 3. Provision AWS CodePipeline with AWS CloudFormation. We will create a new pipeline in AWS CodePipeline with an Amazon S3 bucket for its Source action and AWS CodeDeploy for its Deploy action. Using AWS CloudFormation, we will provision a new Amazon S3 bucket for the Source action and then provision a new pipeline in AWS CodePipeline.

To provision a new S3 bucket in the AWS CloudFormation console, launch this AWS CloudFormation template in our primary region, us-west-2.

On the Specify Details page, do the following:

  1. In Stack name, enter a name for the stack (for example, code-pipeline-us-west2-source-bucket).
  2. In SourceCodeBucketNamePrefix, enter a prefix string of up to 30 characters. Use only lowercase letters, numbers, periods, and hyphens (for example, uswest2).
  3. Choose Next.

It might take several minutes for AWS CloudFormation to create the resources on your behalf. You can watch the progress messages on the Events tab in the console. When the stack has been created, you will see a CREATE_COMPLETE message in the Status column on the Overview tab.

Download the sample app from s3-app-linux.zip and upload it to the source code bucket.

To provision a new pipeline in AWS CodePipeline

In the AWS CloudFormation console, launch this AWS CloudFormation template in our primary region, us-west-2.

On the Specify Details page, do the following:

  1. In Stack name, enter a name for the stack (for example, CrossRegionCodePipeline).
  2. In ApplicationName, enter a name for the application (for example, CrossRegionActionSupport).
  3. In APSouthEast2ArtifactStoreBucket, enter cross-region-artifact-store-bucket-ap-southeast-2 or enter the name you provided in step 2 for the S3 bucket created in ap-southeast-2.
  4. In DeploymentGroupName, enter a name for the deployment group (for example, CrossRegionActionSupportDeploymentGroup).
  5. In S3SourceBucketName, enter code-pipeline-us-west-2-source-bucket or enter the name you provided in step 3.
  6. In USEast1ArtifactStoreBucket, enter cross-region-artifact-store-bucket-us-east-1 or enter the name you provided in step 2 for the S3 bucket created in us-east-1.
  7. In USWest2ArtifactStoreBucket, enter cross-region-artifact-store-bucket-us-west-2 or enter the name you provided in step 2 for the S3 bucket created in us-west-2.
  8. In S3SourceBucketKey, enter s3-app-linux.zip.
  9. Choose Next.

It might take several minutes for AWS CloudFormation to create the resources on your behalf. You can watch the progress messages on the Events tab in the console. When the stack has been created, you will see a CREATE_COMPLETE message in the Status column on the Overview tab.
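Under the hood, the pipeline that this template provisions has per-region artifact stores and Deploy actions that each declare their own Region. The following boto3 sketch shows the rough shape of that definition; the role ARN, bucket names, and identifiers are placeholders, and the AWS CloudFormation template remains the authoritative source:

import boto3

codepipeline = boto3.client("codepipeline", region_name="us-west-2")

codepipeline.create_pipeline(
    pipeline={
        "name": "CrossRegionPipeline",
        "roleArn": "arn:aws:iam::111111111111:role/CodePipelineServiceRole",  # placeholder
        # One artifact store per Region in which an action executes.
        # The real template also adds ap-southeast-2 following the same pattern.
        "artifactStores": {
            "us-west-2": {"type": "S3", "location": "my-artifacts-us-west-2"},
            "us-east-1": {"type": "S3", "location": "my-artifacts-us-east-1"},
        },
        "stages": [
            {
                "name": "Source",
                "actions": [{
                    "name": "S3Source",
                    "actionTypeId": {"category": "Source", "owner": "AWS",
                                     "provider": "S3", "version": "1"},
                    "configuration": {"S3Bucket": "code-pipeline-us-west-2-source-bucket",
                                      "S3ObjectKey": "s3-app-linux.zip"},
                    "outputArtifacts": [{"name": "App"}],
                }],
            },
            {
                "name": "Deploy",
                "actions": [{
                    "name": "DeployUsEast1",
                    # The region field is what makes this a cross-region action.
                    "region": "us-east-1",
                    "actionTypeId": {"category": "Deploy", "owner": "AWS",
                                     "provider": "CodeDeploy", "version": "1"},
                    "configuration": {"ApplicationName": "CrossRegionActionSupport",
                                      "DeploymentGroupName": "CrossRegionActionSupportDeploymentGroup"},
                    "inputArtifacts": [{"name": "App"}],
                }],
            },
        ],
    }
)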

Step 4. View deployments performed by our pipeline in the AWS Management Console.

  1. In the Amazon S3 console, navigate to the source bucket and copy the version ID (for example, in the following screenshot, kTtNtrHIhMt4.cX6YZHZ5lawDVy3R4Aj).
  2. Go to the AWS CodePipeline console and open the pipeline we just executed. Notice that the version ID is the same across the source S3 bucket, the source action, and all three CodeDeploy actions across three regions (us-west-2, us-east-1, and ap-southeast-2) in the pipeline. 
  3. We can see that the deployment actions ran successfully across all three regions. 

Step 5. To validate the deployments, in a browser, type the public IP address of the Amazon EC2 instances provisioned through AWS CodeDeploy in step 1. You should see a deployment page like the one shown here. 

Conclusion

You have now created a multi-region deployment pipeline in AWS CodePipeline without having to worry about the mechanics of copying code across regions. AWS CodePipeline abstracted the copying of the code in the background using the artifact stores in each region. You can now upload new source code changes to the Amazon S3 source bucket in the primary region and changes will be deployed automatically to other regions in parallel using AWS CodeDeploy actions configured to execute in each region. Cross-region actions are very powerful and are not limited to deploy actions alone. They can also be used with build and test actions.

Wrapping up

After you’ve finished exploring your pipeline and its associated resources, you can do the following:

  • Extend the setup. Add more stages and actions to your pipeline in AWS CodePipeline. For complete AWS CloudFormation sample code, see the GitHub repository.
  • Delete the stack in AWS CloudFormation. This deletes the pipeline, its resources, and the stack itself. This is the option to choose if you no longer want to use the pipeline or any of its resources. Cleaning up resources you’re no longer using is important because you don’t want to continue to be charged.

To delete the CloudFormation stack

  • Delete the Amazon S3 buckets used as the artifact stores in AWS CodePipeline in the source and destination regions. Although the buckets were created as part of the AWS CloudFormation stack, Amazon S3 does not allow AWS CloudFormation to delete buckets that contain objects. To delete the buckets, open the Amazon S3 console, choose the buckets you created in this setup, and then delete them. For more information, see Delete or Empty a Bucket.
  • Follow the steps in the AWS CloudFormation User Guide to delete a stack.

If you have questions about this blog post, start a new thread on the AWS CodePipeline forum or contact AWS Support.

New – CloudFormation Drift Detection

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-cloudformation-drift-detection/

AWS CloudFormation supports you in your efforts to implement Infrastructure as Code (IaC). You can use a template to define the desired AWS resource configuration, and then use it to launch a CloudFormation stack. The stack contains the set of resources defined in the template, configured as specified. When you need to make a change to the configuration, you update the template and use a CloudFormation Change Set to apply the change. Your template completely and precisely specifies your infrastructure and you can rest assured that you can use it to create a fresh set of resources at any time.

That’s the ideal case! In reality, many organizations are still working to fully implement IaC. They are educating their staff and adjusting their processes, both of which take some time. During this transition period, they sometimes end up making direct changes to the AWS resources (and their properties) without updating the template. They might make a quick out-of-band fix to change an EC2 instance type, fix an Auto Scaling parameter, or update an IAM permission. These unmanaged configuration changes become problematic when it comes time to start fresh. The configuration of the running stack has drifted away from the template and is no longer properly described by it. In severe cases, the change can even thwart attempts to update or delete the stack.

New Drift Detection
Today we are announcing a powerful new drift detection feature that was designed to address the situation that I described above. After you create a stack from a template, you can detect drift from the Console, CLI, or from your own code. You can detect drift on an entire stack or on a particular resource, and see the results in just a few minutes. You then have the information necessary to update the template or to bring the resource back into compliance, as appropriate.

When you initiate a check for drift detection, CloudFormation compares the current stack configuration to the one specified in the template that was used to create or update the stack and reports on any differences, providing you with detailed information on each one.
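If you want to drive this from your own code, a small boto3 sketch could look like the following; the stack name is a placeholder:

import time
import boto3

cfn = boto3.client("cloudformation")
STACK_NAME = "my-stack"  # placeholder

# Start drift detection for the whole stack.
detection_id = cfn.detect_stack_drift(StackName=STACK_NAME)["StackDriftDetectionId"]

# Poll until the detection run finishes.
while True:
    status = cfn.describe_stack_drift_detection_status(StackDriftDetectionId=detection_id)
    if status["DetectionStatus"] != "DETECTION_IN_PROGRESS":
        break
    time.sleep(5)

print("Stack drift status:", status.get("StackDriftStatus"))

# List the resources that no longer match the template.
drifts = cfn.describe_stack_resource_drifts(
    StackName=STACK_NAME,
    StackResourceDriftStatusFilters=["MODIFIED", "DELETED"],
)
for drift in drifts["StackResourceDrifts"]:
    print(drift["LogicalResourceId"], drift["StackResourceDriftStatus"])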

We are launching with support for a core set of services, resources, and properties, with plans to add more over time. The initial list of resources spans API Gateway, Auto Scaling, CloudTrail, CloudWatch Events, CloudWatch Logs, DynamoDB, Amazon EC2, Elastic Load Balancing, IAM, AWS IoT, Lambda, Amazon RDS, Route 53, Amazon S3, Amazon SNS, Amazon SQS, and more.

You can perform drift detection on stacks that are in the CREATE_COMPLETE, UPDATE_COMPLETE, UPDATE_ROLLBACK_COMPLETE, and UPDATE_ROLLBACK_FAILED states. Drift detection does not automatically extend to stacks nested within the one you check; you can run these checks on the nested stacks yourself instead.

Drift Detection in Action
I tested this feature on the simple stack that I used when I wrote about Provisioned Throughput for Amazon EFS. I simply select the stack and choose Detect drift from the Action menu:

I confirm my intent and click Yes, detect:

Drift detection starts right away; I can Close the window while it runs:

After it completes I can see that the Drift status of my stack is IN_SYNC:

I can also see the drift status of each checked resource by taking a look at the Resources tab:

Now, I will create a fake change by editing the IAM role, adding a new policy:

I detect drift a second time, and this time I find (no surprise) that my stack has drifted:

I click View details, and I inspect the Resource drift status to learn more:

I can expand the status line for the modified resource to learn more about the drift:

Available Now
This feature is available now and you can start using it today in the US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon), Canada (Central), Asia Pacific (Mumbai), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Europe (Frankfurt), Europe (Ireland), Europe (London), Europe (Paris), and South America (São Paulo) Regions. As I noted above, we are launching with support for a strong, initial set of resources, and plan to add many more in the months to come.

Jeff;

 

How to create and retrieve secrets managed in AWS Secrets Manager using AWS CloudFormation template

Post Syndicated from Apurv Awasthi original https://aws.amazon.com/blogs/security/how-to-create-and-retrieve-secrets-managed-in-aws-secrets-manager-using-aws-cloudformation-template/

AWS Secrets Manager now integrates with AWS CloudFormation so you can create and retrieve secrets securely using CloudFormation. This integration makes it easier to automate provisioning your AWS infrastructure. For example, without any code changes, you can generate unique secrets for your resources with every execution of your CloudFormation template. This also improves the security of your infrastructure by storing secrets securely, encrypting them automatically, and making rotation easier.

Secrets Manager helps you protect the secrets needed to access your applications, services, and IT resources. In this post, I show how you can get the benefits of Secrets Manager for resources provisioned through CloudFormation. First, I describe the new Secrets Manager resource types supported in CloudFormation. Next, I show a sample CloudFormation template that launches a MySQL database on Amazon Relational Database Service (RDS). This template uses the new resource types to create, rotate, and retrieve the credentials (user name and password) of the database superuser required to launch the MySQL database.

Why use Secrets Manager with CloudFormation?

CloudFormation helps you model your AWS resources as templates and execute these templates to provision AWS resources at scale. Some AWS resources require secrets as part of the provisioning process. For example, to provision a MySQL database, you must provide the credentials for the database superuser. You can use Secrets Manager, the AWS dedicated secrets management service, to create and manage such secrets.

Secrets Manager makes it easier to rotate, manage, and retrieve database credentials, API keys, and other secrets throughout their lifecycle. You can now reference Secrets Manager in your CloudFormation templates to create unique secrets with every invocation of your template. By default, Secrets Manager encrypts these secrets with encryption keys that you own and control. Secrets Manager ensures the secret isn’t logged or persisted by CloudFormation by using a dynamic reference to the secret. You can configure Secrets Manager to rotate your secrets automatically without disrupting your applications. Secrets Manager offers built-in integrations for rotating credentials for all Amazon RDS databases and supports extensibility with AWS Lambda so you can meet your custom rotation requirements.
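As a quick illustration of the retrieval side, an application can read a secret created this way with a few lines of boto3. The SecretId below is a placeholder for the name or ARN of the secret in your account:

import json
import boto3

secretsmanager = boto3.client("secretsmanager")

# Placeholder: use the name or ARN of the secret created by your stack.
response = secretsmanager.get_secret_value(SecretId="MyRDSInstanceRotationSecret")

credentials = json.loads(response["SecretString"])
print("Connecting as:", credentials["username"])
# credentials["password"] holds the current, possibly rotated, password.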

New Secrets Manager resource types supported in CloudFormation

  1. AWS::SecretsManager::Secret — Create a secret and store it in Secrets Manager.
  2. AWS::SecretsManager::ResourcePolicy — Create a resource-based policy and attach it to a secret. Resource-based policies enable you to control access to secrets.
  3. AWS::SecretsManager::SecretTargetAttachment — Configure Secrets Manager to rotate the secret automatically.
  4. AWS::SecretsManager::RotationSchedule — Define the Lambda function that will be used to rotate the secret.

How to use Secrets Manager in CloudFormation

Now that you’re familiar with the new Secrets Manager resource types supported in CloudFormation, I’ll show how you can use these in a CloudFormation template. I will use a sample template that creates a MySQL database in Amazon RDS and uses Secrets Manager to create the credentials for the superuser. The template also configures the secret to rotate every 30 days automatically.

  1. Create a stack on the AWS CloudFormation console by copying the following sample template.
    
    ---
    Description: "How to create and retrieve secrets securely using an AWS CloudFormation template"
    Resources:
    
    # Create a secret with the username admin and a randomly generated password in JSON.  
      MyRDSInstanceRotationSecret:
        Type: AWS::SecretsManager::Secret
        Properties:
          Description: 'This is the secret for my RDS instance'
          GenerateSecretString:
            SecretStringTemplate: '{"username": "admin"}'
            GenerateStringKey: 'password'
            PasswordLength: 16
            ExcludeCharacters: '"@/'
    
    
    
    # Create a MySQL database of size t2.micro.
    # The secret (username and password for the superuser) will be dynamically 
    # referenced. This ensures CloudFormation will not log or persist the resolved 
    # value. 
      MyDBInstance:
        Type: AWS::RDS::DBInstance
        Properties:
          AllocatedStorage: 20
          DBInstanceClass: db.t2.micro
          Engine: mysql
          MasterUsername: !Join ['', ['{{resolve:secretsmanager:', !Ref MyRDSInstanceRotationSecret, ':SecretString:username}}' ]]
          MasterUserPassword: !Join ['', ['{{resolve:secretsmanager:', !Ref MyRDSInstanceRotationSecret, ':SecretString:password}}' ]]
          BackupRetentionPeriod: 0
          DBInstanceIdentifier: 'rotation-instance'
    
    
    
    # Update the referenced secret with properties of the RDS database.
    # This is required to enable rotation. To learn more, visit our documentation
    # https://docs.aws.amazon.com/secretsmanager/latest/userguide/rotating-secrets.html
      SecretRDSInstanceAttachment:
        Type: AWS::SecretsManager::SecretTargetAttachment
        Properties:
          SecretId: !Ref MyRDSInstanceRotationSecret
          TargetId: !Ref MyDBInstance
          TargetType: AWS::RDS::DBInstance
    
    
    
    # Schedule rotating the secret every 30 days. 
    # Note, the first rotation is triggered immediately. 
    # This enables you to verify that rotation is configured appropriately.
    # Subsequent rotations are scheduled according to the configured rotation. 
      MySecretRotationSchedule:
        Type: AWS::SecretsManager::RotationSchedule
        DependsOn: SecretRDSInstanceAttachment
        Properties:
          SecretId: !Ref MyRDSInstanceRotationSecret
          RotationLambdaARN: <% replace-with-lambda-arn %>
          RotationRules:
            AutomaticallyAfterDays: 30
     
    

  2. Next, execute the stack.
     
    Figure 1: Execute the stack

  3. After you execute the stack, open the RDS console to verify the database, rotation-instance, has been successfully created.
     
    Figure 2: Verify the database has been created

  4. Open the Secrets Manager console and verify the stack successfully created the secret, MyRDSInstanceRotationSecret.
     
    Figure 3: Verify the stack successfully created the secret

Summary

I showed you how to create and retrieve secrets in CloudFormation. This improves the security of your infrastructure and makes it easier to automate infrastructure provisioning. To get started managing secrets, open the Secrets Manager console. To learn more, read How to Store, Distribute, and Rotate Credentials Securely with Secrets Manager or refer to the Secrets Manager documentation.

If you have comments about this post, submit them in the Comments section below. If you have questions about anything in this post, start a new thread on the Secrets Manager forum or contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Apurv Awasthi

Apurv is the product manager for credentials management services at AWS, including AWS Secrets Manager and IAM Roles. He enjoys the “Day 1” culture at Amazon because it aligns with his experience building startups in the sports and recruiting industries. Outside of work, Apurv enjoys hiking. He holds an MBA from UCLA and an MS in computer science from University of Kentucky.

ICYMI: Serverless Q3 2018

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/icymi-serverless-q3-2018/

Welcome to the third edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all of the most recent product launches, feature enhancements, blog posts, webinars, Twitch live streams, and other interesting things that you might have missed!

If you didn’t see them, catch our Q1 ICYMI and Q2 ICYMI posts for what happened then.

So, what might you have missed this past quarter? Here’s the recap.

AWS Amplify CLI

In August, AWS Amplify launched the AWS Amplify Command Line Interface (CLI) toolchain for developers.

The AWS Amplify CLI enables developers to build, test, and deploy full web and mobile applications based on AWS Amplify directly from their CLI. It has built-in helpers for configuring AWS services such as Amazon Cognito for auth, Amazon S3 and Amazon DynamoDB for storage, and Amazon API Gateway for APIs. With these helpers, developers can configure AWS services to interact with applications built in popular web frameworks such as React.

Get started with the AWS Amplify CLI toolchain.

New features

Rejoice Microsoft application developers: AWS Lambda now supports .NET Core 2.1 and PowerShell Core!

AWS SAM had a few major enhancements to help in both testing and debugging functions. The team launched support to locally emulate an endpoint for Lambda so that you can run automated tests against your functions. This differs from the existing functionality that emulated a proxy similar to API Gateway in front of your function. Combined with the new improved support for ‘sam local generate-event’ to generate over 50 different payloads, you can now test Lambda function code that would be invoked by almost all of the various services that interface with Lambda today. On the operational front, AWS SAM can now fetch, tail, and filter logs generated by your functions running live on AWS. Finally, with integration with Delve, a debugger for the Go programming language, you can more easily debug your applications locally.

If you’re part of an organization that uses AWS Service Catalog, you can now launch applications based on AWS SAM, too.

The AWS Serverless Application Repository launched new search improvements to make it even faster to find serverless applications that you can deploy.

In July, AWS AppSync added HTTP resolvers so that now you can query your REST APIs via GraphQL! API Inception! AWS AppSync also added new built-in scalar types to help with data validation at the GraphQL layer instead of having to do this in code that you write yourself. For building your GraphQL-based applications on AWS AppSync, an enhanced no-code GraphQL API builder enables you to model your data, and the service generates your GraphQL schema, Amazon DynamoDB tables, and resolvers for your backend. The team also published a Quick Start for using Amazon Aurora as a data source via a Lambda function. Finally, the service is now available in the Asia Pacific (Seoul) Region.

Amazon API Gateway announced support for AWS X-Ray!

With X-Ray integrated in API Gateway, you can trace and profile application workflows starting at the API layer and going through the backend. You can control the sample rates at a granular level.

API Gateway also announced improvements to usage plans that allow for method level throttling, request/response parameter and status overrides, and higher limits for the number of APIs per account for regional, private, and edge APIs. Finally, the team added support for the OpenAPI 3.0 API specification, the next generation of OpenAPI 2, formerly known as Swagger.

AWS Step Functions is now available in the Asia Pacific (Mumbai) Region. You can also build workflows visually with Step Functions and trigger them directly with AWS IoT Rules.

AWS Lambda@Edge now makes the HTTP request body for POST and PUT requests available.

AWS CloudFormation announced Macros, a feature that enables customers to extend the functionality of AWS CloudFormation templates by calling out to transformations that Lambda powers. Macros are the same technology that enables SAM to exist.

Serverless posts

July:

August:

September:

Tech Talks

We hold several Serverless tech talks throughout the year, so look out for them in the Serverless section of the AWS Online Tech Talks page. Here are the three tech talks that we delivered in Q3:

Twitch

We’ve been busy streaming deeply technical content to you the past few months! Check out awesome sessions like this one by AWS’s Heitor Lessa and Jason Barto diving deep into Continuous Learning for ML and the entire “Build on Serverless” playlist.

For information about upcoming broadcasts and recent live streams, keep an eye on AWS on Twitch for more Serverless videos and on the Join us on Twitch AWS page.

For AWS Partners

In September, we announced the AWS Serverless Navigate program for AWS APN Partners. Via this program, APN Partners can gain a deeper understanding of the AWS Serverless Platform, including many of the services mentioned in this post. The program’s phases help partners learn best practices such as the Well-Architected Framework, business and technical concepts, and growing their business’s ability to better support AWS customers in their serverless projects.

Check out more at AWS Serverless Navigate.

In other news

AWS re:Invent 2018 is coming in just a few weeks! From November 26–30 in Las Vegas, Nevada, join tens of thousands of AWS customers to learn, share ideas, and see exciting keynote announcements. The agenda for Serverless talks contains over 100 sessions where you can hear about serverless applications and technologies from fellow AWS customers, AWS product teams, solutions architects, evangelists, and more.

Register for AWS re:Invent now!

Want to get a sneak peek into what you can expect at re:Invent this year? Check out the awesome re:Invent Guides put out by AWS Community Heroes. AWS Community Hero Eric Hammond (@esh on Twitter) published one for advanced serverless attendees that you will want to read before the big event.

What did we do at AWS re:Invent 2017? Check out our recap: Serverless @ re:Invent 2017.

Still looking for more?

The Serverless landing page has lots of information. The resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials. Check it out!

Extending AWS CloudFormation with AWS Lambda Powered Macros

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/cloudformation-macros/

Today I’m really excited to show you a powerful new feature of AWS CloudFormation called Macros. CloudFormation Macros allow developers to extend the native syntax of CloudFormation templates by calling out to AWS Lambda powered transformations. This is the same technology that powers the popular Serverless Application Model functionality, but the transforms run in your own accounts, on your own Lambda functions, and they’re completely customizable. CloudFormation, if you’re new to AWS, is an absolutely essential tool for modeling and defining your infrastructure as code (YAML or JSON). It is a core building block for all of AWS and many of our services depend on it.

There are two major steps for using macros. First, we need to define a macro, which of course, we do with a CloudFormation template. Second, to use the created macro in our template we need to add it as a transform for the entire template or call it directly. Throughout this post, I use the term macro and transform somewhat interchangeably. Ready to see how this works?

Creating a CloudFormation Macro

Creating a macro has two components: a definition and an implementation. To create the definition of a macro we create a CloudFormation resource of a type AWS::CloudFormation::Macro, that outlines which Lambda function to use and what the macro should be called.

Type: "AWS::CloudFormation::Macro"
Properties:
  Description: String
  FunctionName: String
  LogGroupName: String
  LogRoleARN: String
  Name: String

The Name of the macro must be unique throughout the Region, and the Lambda function referenced by FunctionName must be in the same Region the macro is being created in. When you execute the macro template, it makes that macro available for other templates to use. The implementation of the macro is fulfilled by a Lambda function. Macros can be in their own templates or grouped with others, but you can’t use a macro in the same template you’re registering it in. The Lambda function receives a JSON payload that looks something like this:

{
    "region": "us-east-1",
    "accountId": "$ACCOUNT_ID",
    "fragment": { ... },
    "transformId": "$TRANSFORM_ID",
    "params": { ... },
    "requestId": "$REQUEST_ID",
    "templateParameterValues": { ... }
}

The fragment portion of the payload contains either the entire template or the relevant fragments of the template – depending on how the transform is invoked from the calling template. The fragment will always be in JSON, even if the template is in YAML.

The Lambda function is expected to return a simple JSON response:

{
    "requestId": "$REQUEST_ID",
    "status": "success",
    "fragment": { ... }
}

The requestId needs to be the same as the one received in the input payload, and if status contains any value other than success (case-insensitive), the changeset will fail to create. The fragment must contain valid CloudFormation JSON for the transformed template. Even if your function performs no action, it still needs to return the fragment so that it is included in the final template.

Using CloudFormation Macros


To use the macro, we simply call out to Fn::Transform with the required parameters. If we want a macro to process the whole template, we can include it in the template’s list of transforms the same way we would with SAM: Transform: [EchoMacro]. When we execute this template, the transforms are collected into a changeset by calling out to each macro’s specified function and returning the final template.

Let’s imagine we have a dummy Lambda function called EchoFunction, it just logs the data passed into it and returns the fragments unchanged. We define the macro as a normal CloudFormation resource, like this:

EchoMacro:
  Type: "AWS::CloudFormation::Macro"
  Properties:
    FunctionName: arn:aws:lambda:us-east-1:1234567:function:EchoFunction
    Name: EchoMacro

The code for the Lambda function could be as simple as this:

def lambda_handler(event, context):
    print(event)
    return {
        "requestId": event['requestId'],
        "status": "success",
        "fragment": event["fragment"]
    }

Then, after deploying this function and executing the macro template, we can invoke the macro in a transform at the top level of any other template like this:

AWSTemplateFormatVersion: 2010-09-09
Transform: [EchoMacro, AWS::Serverless-2016-10-31]
Resources:
  FancyTable:
    Type: AWS::Serverless::SimpleTable

The CloudFormation service creates a changeset for the template by first calling the Echo macro we defined and then the AWS::Serverless transform. It will execute the macros listed in the transform in the order they’re listed.

We could also invoke the macro using the Fn::Transform intrinsic function which allows us to pass in additional parameters. For example:

AWSTemplateFormatVersion: 2010-09-09
Resources:
  MyS3Bucket:
    Type: 'AWS::S3::Bucket'
    Fn::Transform:
      Name: EchoMacro
      Parameters:
        Key: Value

The inline transform will have access to all of its sibling nodes and all of its children nodes. Transforms are processed from deepest to shallowest which means top-level transforms are executed last. Since I know most of you are going to ask: no you cannot include macros within macros – but nice try.

When you go to execute the CloudFormation template it would simply ask you to create a changeset and you could preview the output before deploying.

Example Macros

We’re launching a number of reference macros to help developers get started and I expect many people will publish others. These four are the winners from a little internal hackathon we had prior to releasing this feature:

  • PyPlate: Allows you to inline Python in your templates (Jay McConnel, Partner SA)
  • ShortHand: Defines a short-hand syntax for common CloudFormation resources (Steve Engledow, Solutions Builder)
  • StackMetrics: Adds CloudWatch metrics to stacks (Steve Engledow and Jason Gregson, Global SA)
  • String Functions: Adds common string functions to your templates (Jay McConnel, Partner SA)
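As a further illustration of what a macro handler can do beyond echoing, here is a hypothetical transform that injects a default tag into every resource in the fragment. It follows the request/response contract described above; the tag key and value are just examples, and a production macro would skip resource types that don’t accept Tags:

import copy

DEFAULT_TAG = {"Key": "team", "Value": "platform"}  # example values only

def lambda_handler(event, context):
    """Add a default tag to every resource in the template fragment."""
    fragment = copy.deepcopy(event["fragment"])
    for resource in fragment.get("Resources", {}).values():
        properties = resource.setdefault("Properties", {})
        tags = properties.setdefault("Tags", [])
        # A real macro would filter by resource type before touching Tags.
        if isinstance(tags, list) and DEFAULT_TAG not in tags:
            tags.append(dict(DEFAULT_TAG))
    return {
        "requestId": event["requestId"],
        "status": "success",
        "fragment": fragment,
    }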

Here are a few ideas I thought of that might be fun for someone to implement:

If you end up building something cool I’m more than happy to tweet it out!

Available Now

CloudFormation Macros are available today in all AWS Regions that have AWS Lambda. There is no additional CloudFormation charge for macros, meaning you are billed only the normal AWS Lambda function charges. The documentation has more information that may be helpful.

This is one of my favorite new features for CloudFormation and I’m excited to see some of the amazing things our customers will build with it. The real power here is that you can extend your existing infrastructure as code with code. The possibilities enabled by this new functionality are virtually unlimited.

Randall

Protecting your API using Amazon API Gateway and AWS WAF — Part 2

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/protecting-your-api-using-amazon-api-gateway-and-aws-waf-part-2/

This post courtesy of Heitor Lessa, AWS Specialist Solutions Architect – Serverless

In Part 1 of this blog, we described how to protect your API provided by Amazon API Gateway using AWS WAF. In this blog, we show how to use API keys between an Amazon CloudFront distribution and API Gateway to secure access to your API in API Gateway in addition to your preferred authorization (AuthZ) mechanism already set up in API Gateway. For more information about AuthZ mechanisms in API Gateway, see Secure API Access with Amazon Cognito Federated Identities, Amazon Cognito User Pools, and Amazon API Gateway.

We also extend the AWS CloudFormation stack previously used to automate the creation of the following necessary resources of this solution:

The following are alternative solutions to using an API key, depending on your security requirements:

  • Using a randomly generated HTTP secret header in CloudFront and verifying it with API Gateway request validation
  • Signing incoming requests with Lambda@Edge and verifying them with API Gateway Lambda authorizers

Requirements

To follow along, you need full permissions to create, update, and delete API Gateway, CloudFront, Lambda, and CloudWatch Events through AWS CloudFormation.

Extending the existing AWS CloudFormation stack

First, click here to download the full template. Then follow these steps to update the existing AWS CloudFormation stack:

  1. Go to the AWS Management Console and open the AWS CloudFormation console.
  2. Select the stack that you created in Part 1, right-click it, and select Update Stack.
  3. For option 2, choose Choose file and select the template that you downloaded.
  4. Fill in the required parameters as shown in the following image.

Here’s more information about these parameters:

  • API Gateway to send traffic to – We use the same API Gateway URL as in Part 1 except without the URL scheme (https://): cxm45444t9a.execute-api.us-east-2.amazonaws.com/prod
  • Rotating API Keys – We define Daily and use 2018-04-03 as the timestamp value to append to the API key name

Continue with the AWS CloudFormation console to complete the operation. It might take a couple of minutes to update the stack, as CloudFront takes time to propagate changes across all points of presence.

Enabling API Keys in the example Pet Store API

While the stack completes in the background, let’s enable the use of API Keys in the API that CloudFront will send traffic to.

  1. Go to the AWS Management Console and open the API Gateway console.
  2. Select the API that you created in Part 1 and choose Resources.
  3. Under /pets, choose GET and then choose Method Request.
  4. For API Key Required, choose the dropdown menu and choose true.
  5. To save this change, select the highlighted check mark as shown in the following image.

Next, we need to deploy these changes so that requests sent to /pets fail if an API key isn’t present.

  1. Choose Actions and select Deploy API.
  2. Choose the Deployment stage dropdown menu and select the stage you created in Part 1.
  3. Add a deployment description such as “Requires API Keys under /pets” and choose Deploy.

When the deployment succeeds, you’re redirected to the API Gateway Stage page. There you can use the Invoke URL to test if the following request fails due to not having an API key.

This failure is expected and proves that our deployed changes are working. Next, let’s try to access the same API but this time through our CloudFront distribution.

  1. From the AWS Management Console, open the AWS CloudFormation console.
  2. Select the stack that you created in Part 1 and choose Outputs at the bottom left.
  3. On the CFDistribution line, copy the URL. Before you paste it into a new browser tab or window, append /pets to it.

As opposed to our first attempt without an API key, we receive a JSON response from the PetStore API. This is because CloudFront is injecting an API key before it forwards the request to the PetStore API. The following image demonstrates both of these tests:

  1. Successful request when accessing the API through CloudFront
  2. Unsuccessful request when accessing the API directly through its Invoke URL

This works as a shared secret between CloudFront and API Gateway; it could be any agreed-upon random value that can be rotated, such as an API key. However, it’s important to know that API keys are a feature for tracking or metering API consumers’ usage. They are not a secure authorization mechanism and therefore should be used only in conjunction with an API Gateway authorizer.
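You can reproduce both tests with a short Python script. The invoke URL matches the example used in this walkthrough, and the key value is a placeholder you would copy from the API Gateway console:

import requests

INVOKE_URL = "https://cxm45444t9a.execute-api.us-east-2.amazonaws.com/prod/pets"
API_KEY = "replace-with-your-api-key-value"  # placeholder

# Direct call without a key: API Gateway rejects it (403 Forbidden).
print("without key:", requests.get(INVOKE_URL).status_code)

# Same call with the x-api-key header that CloudFront injects: accepted (200).
print("with key:", requests.get(INVOKE_URL, headers={"x-api-key": API_KEY}).status_code)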

Rotating API keys

API keys are automatically rotated based on the schedule (e.g., daily or monthly) that you chose when updating the AWS CloudFormation stack. This requires no maintenance or intervention on your part. In this section, we explain how this process works under the hood and what you can do if you want to manually trigger an API key rotation.

The AWS CloudFormation template that we downloaded and used to update our stack does the following in addition to Part 1.

Introduce a Timestamp parameter that is appended to the API key name

Parameters:
  Timestamp:
    Type: String
    Description: Fill in this format <Year>-<Month>-<Day>
    Default: 2018-04-02

Create an API Gateway key, API Gateway usage plan, associate the new key with the API gateway given as a parameter, and configure the CloudFront distribution to send a custom header when forwarding traffic to API Gateway

CFDistribution:
  Type: AWS::CloudFront::Distribution
  Properties:
    DistributionConfig:
      Logging:
        IncludeCookies: 'false'
        Bucket: !Sub ${S3BucketAccessLogs}.s3.amazonaws.com
        Prefix: cloudfront-logs
      Enabled: 'true'
      Comment: API Gateway Regional Endpoint Blog post
      Origins:
        -
          Id: APIGWRegional
          DomainName: !Select [0, !Split ['/', !Ref ApiURL]]
          CustomOriginConfig:
            HTTPPort: 443
            OriginProtocolPolicy: https-only
          OriginCustomHeaders:
            - 
              HeaderName: x-api-key
              HeaderValue: !Ref ApiKey
              ...

ApiUsagePlan:
  Type: AWS::ApiGateway::UsagePlan
  Properties:
    Description: CloudFront usage only
    UsagePlanName: CloudFront_only
    ApiStages:
      - 
        ApiId: !Select [0, !Split ['.', !Ref ApiURL]]
        Stage: !Select [1, !Split ['/', !Ref ApiURL]]

ApiKey: 
  Type: "AWS::ApiGateway::ApiKey"
  Properties: 
    Name: !Sub "CloudFront-${Timestamp}"
    Description: !Sub "CloudFormation API Key ${Timestamp}"
    Enabled: true

ApiKeyUsagePlan:
  Type: "AWS::ApiGateway::UsagePlanKey"
  Properties:
    KeyId: !Ref ApiKey
    KeyType: API_KEY
    UsagePlanId: !Ref ApiUsagePlan

As shown in the ApiKey resource, we append the given Timestamp to Name as well as use it in the API Gateway usage plan key resource. This means that whenever the Timestamp parameter changes, AWS CloudFormation triggers a resource replacement and updates every resource that depends on that API key. In this case, that includes the AWS CloudFront configuration and API Gateway usage plan.

But what does the rotation schedule that you chose at the beginning of this blog mean in this example?

Create a scheduled activity to trigger a Lambda function on a given schedule

Parameters:
...
  ApiKeyRotationSchedule: 
    Description: Schedule to rotate API Keys e.g. Daily, Monthly, Bimonthly basis
    Type: String
    Default: Daily
    AllowedValues:
      - Daily
      - Fortnightly
      - Monthly
      - Bimonthly
      - Quarterly
    ConstraintDescription: Must be any of the available options

Mappings: 

  ScheduleMap: 
    CloudwatchEvents: 
      Daily: "rate(1 day)"
      Fortnightly: "rate(14 days)"
      Monthly: "rate(30 days)"
      Bimonthly: "rate(60 days)"
      Quarterly: "rate(90 days)"

Resources:
...
  RotateApiKeysScheduledJob: 
    Type: "AWS::Events::Rule"
    Properties: 
      Description: "ScheduledRule"
      ScheduleExpression: !FindInMap [ScheduleMap, CloudwatchEvents, !Ref ApiKeyRotationSchedule]
      State: "ENABLED"
      Targets: 
        - 
          Arn: !GetAtt RotateApiKeysFunction.Arn
          Id: "RotateApiKeys"

The resource RotateApiKeysScheduledJob shows that the schedule that you selected through a dropdown menu when updating the AWS CloudFormation stack is actually converted to a CloudWatch Events rule. This in turn triggers a Lambda function that is defined in the same template.

RotateApiKeysFunction:
      Type: "AWS::Lambda::Function"
      Properties:
        Handler: "index.lambda_handler"
        Role: !GetAtt RotateApiKeysFunctionRole.Arn
        Runtime: python3.6
        Environment:
          Variables:
            StackName: !Ref "AWS::StackName"
        Code:
          ZipFile: !Sub |
            import datetime
            import os

            import boto3
            from botocore.exceptions import ClientError

            session = boto3.Session()
            cfn = session.client('cloudformation')
            
            timestamp = datetime.date.today()            
            params = {
                'StackName': os.getenv('StackName'),
                'UsePreviousTemplate': True,
                'Capabilities': ["CAPABILITY_IAM"],
                'Parameters': [
                    {
                      'ParameterKey': 'ApiURL',
                      'UsePreviousValue': True
                    },
                    {
                      'ParameterKey': 'ApiKeyRotationSchedule',
                      'UsePreviousValue': True
                    },
                    {
                      'ParameterKey': 'Timestamp',
                      'ParameterValue': str(timestamp)
                    },
                ],                
            }

            def lambda_handler(event, context):
              """ Updates CloudFormation Stack with a new timestamp and returns CloudFormation response"""
              try:
                  response = cfn.update_stack(**params)
              except ClientError as err:
                  if "No updates are to be performed" in err.response['Error']['Message']:
                      return {"message": err.response['Error']['Message']}
                  else:
                      raise Exception("An error happened while updating the stack: {}".format(err))          
  
              return response

All this Lambda function does is trigger an AWS CloudFormation stack update via API (exactly what you did through the console but programmatically) and updates the Timestamp parameter. As a result, it rotates the API key and the CloudFront distribution configuration.

This gives you enough flexibility to change the API key rotation schedule at any time without maintaining or writing any code. You can also manually update the stack and rotate the keys by updating the AWS CloudFormation stack’s Timestamp parameter.
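For a one-off manual rotation from your workstation, the same update that the scheduled function performs can be reduced to a short boto3 call; the stack name is a placeholder:

import datetime
import boto3

cfn = boto3.client("cloudformation")

# Bumping only the Timestamp parameter forces the ApiKey resource to be replaced.
cfn.update_stack(
    StackName="my-waf-api-stack",  # placeholder
    UsePreviousTemplate=True,
    Capabilities=["CAPABILITY_IAM"],
    Parameters=[
        {"ParameterKey": "ApiURL", "UsePreviousValue": True},
        {"ParameterKey": "ApiKeyRotationSchedule", "UsePreviousValue": True},
        {"ParameterKey": "Timestamp", "ParameterValue": str(datetime.date.today())},
    ],
)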

Next Steps

We hope you found the information in this blog helpful. You can use it to understand how to create a mechanism to allow traffic only from CloudFront to API Gateway and avoid bypassing the AWS WAF rules that Part 1 set up.

Keep the following important notes in mind about this solution:

  • It assumes that you already have a strong AuthZ mechanism, managed by API Gateway, to control access to your API.
  • The API Gateway usage plan and other resources created in this solution work only for APIs created in the same account (the ApiUrl parameter).
  • If you already use API keys for tracking API usage, consider using either of the following solutions as a replacement:
    • Use a random HTTP header value in CloudFront origin configuration and use an API Gateway request model validation to verify it instead of API keys alone.
    • Combine Lambda@Edge and an API Gateway custom authorizer to sign and verify incoming requests using a shared secret known only to the two. This is a more advanced technique.

Managing Amazon SNS Subscription Attributes with AWS CloudFormation

Post Syndicated from Rachel Richardson original https://aws.amazon.com/blogs/compute/managing-amazon-sns-subscription-attributes-with-aws-cloudformation/

This post is courtesy of Otavio Ferreira, Manager, Amazon SNS, AWS Messaging.

Amazon SNS is a fully managed pub/sub messaging and event-driven computing service that can decouple distributed systems and microservices. By default, when your publisher system posts a message to an Amazon SNS topic, all systems subscribed to the topic receive a copy of the message. By using Amazon SNS subscription attributes, you can customize this default behavior and make Amazon SNS fit your use cases even more naturally. The available set of Amazon SNS subscription attributes includes FilterPolicy, DeliveryPolicy, and RawMessageDelivery.

You can manually manage your Amazon SNS subscription attributes via the AWS Management Console or programmatically via AWS Development Tools (SDK and AWS CLI). Now you can automate their provisioning via AWS CloudFormation templates as well. AWS CloudFormation lets you use a simple text file to model and provision all the Amazon SNS resources for your messaging use cases, across AWS Regions and accounts, in an automated and secure manner.
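For comparison with the AWS CloudFormation templates that follow, here is how the same attributes can be set programmatically with boto3 when creating a subscription; the topic and queue ARNs are placeholders:

import json
import boto3

sns = boto3.client("sns")

# Placeholder ARNs: replace with your own topic and queue.
response = sns.subscribe(
    TopicArn="arn:aws:sns:us-east-1:000000000000:PetTopic",
    Protocol="sqs",
    Endpoint="arn:aws:sqs:us-east-1:000000000000:PetQueue",
    Attributes={
        "RawMessageDelivery": "true",
        "FilterPolicy": json.dumps({"pet": ["dog", "cat"]}),
    },
    ReturnSubscriptionArn=True,
)
print("Subscription ARN:", response["SubscriptionArn"])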

The following sections describe how you can simultaneously create Amazon SNS subscriptions and set their attributes via AWS CloudFormation templates.

Setting the FilterPolicy attribute

The FilterPolicy attribute is valid in the context of message filtering, regardless of the delivery protocol, and defines which type of message the subscriber expects to receive from the topic. Hence, by applying the FilterPolicy attribute, you can offload the message-filtering logic from subscribers and the message-routing logic from publishers.

To set the FilterPolicy attribute in your AWS CloudFormation template, use the syntax in the following JSON snippet. This snippet creates an Amazon SNS subscription whose endpoint is an AWS Lambda function. Simultaneously, this code also sets a subscription filter policy that matches messages carrying an attribute whose key is “pet” and value is either “dog” or “cat.”

{
   "Resources": {
      "mySubscription": {
         "Type" : "AWS::SNS::Subscription",
         "Properties" : {
            "Protocol": "lambda",
            "Endpoint": "arn:aws:lambda:us-east-1:000000000000:function:SavePet",
            "TopicArn": "arn:aws:sns:us-east-1:000000000000:PetTopic",
            "FilterPolicy": {
               "pet": ["dog", "cat"]
            }
         }
      }
   }
}

Setting the DeliveryPolicy attribute

The DeliveryPolicy attribute is valid in the context of message delivery to HTTP endpoints and defines a delivery-retry policy. By applying the DeliveryPolicy attribute, you can control the maximum number of retries the subscriber expects, the time delay between each retry, and the backoff function. You should fine-tune these values based on the traffic volume your subscribing HTTP server can handle.

To set the DeliveryPolicy attribute in your AWS CloudFormation template, use the syntax in the following JSON snippet. This snippet creates an Amazon SNS subscription whose endpoint is an HTTP address. The code also sets a delivery policy capped at 10 retries for this subscription, with a linear backoff function.

{
   "Resources": {
      "mySubscription": {
         "Type" : "AWS::SNS::Subscription",
         "Properties" : {
            "Protocol": "https",
            "Endpoint": "https://api.myendpoint.ca/pets",
            "TopicArn": "arn:aws:sns:us-east-1:000000000000:PetTopic",
            "DeliveryPolicy": {
               "healthyRetryPolicy": {
                  "numRetries": 10,
                  "minDelayTarget": 10,
                  "maxDelayTarget": 30,
                  "numMinDelayRetries": 3,
                  "numMaxDelayRetries": 7,
                  "numNoDelayRetries": 0,
                  "backoffFunction": "linear"
               }
            }
         }
      }
   }
}

Setting the RawMessageDelivery attribute

The RawMessageDelivery attribute is valid in the context of message delivery to Amazon SQS queues and HTTP endpoints. This Boolean attribute eliminates the need for the subscriber to process the JSON formatting that is created by default to decorate all published messages with Amazon SNS metadata. When you set RawMessageDelivery to true, you get two outcomes. First, your message is delivered as is, with no metadata added. Second, your message attributes propagate from Amazon SNS to Amazon SQS, when the subscribing endpoint is an Amazon SQS queue.

To set the RawMessageDelivery attribute in your AWS CloudFormation template, use the syntax in the following JSON snippet. This snippet creates an Amazon SNS subscription whose endpoint is an Amazon SQS queue (the PetQueue queue shown below is an example in the same account). The code also enables raw message delivery for the subscription, which prevents Amazon SNS metadata from being added to the message payload.

{
   "Resources": {
      "mySubscription": {
         "Type" : "AWS::SNS::Subscription",
         "Properties" : {
            "Protocol": "sqs",
            "Endpoint": "arn:aws:sqs:us-east-1:000000000000:PetQueue",
            "TopicArn": "arn:aws:sns:us-east-1:000000000000:PetTopic",
            "RawMessageDelivery": true
         }
      }
   }
}
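
Note that the subscription alone is not enough for delivery to succeed: the target queue's access policy must also allow Amazon SNS to send messages to it. The following is a minimal sketch of such a policy resource (the queue URL, queue ARN, and topic ARN are placeholders matching the example above):

{
   "Resources": {
      "myQueuePolicy": {
         "Type" : "AWS::SQS::QueuePolicy",
         "Properties" : {
            "Queues": ["https://sqs.us-east-1.amazonaws.com/000000000000/PetQueue"],
            "PolicyDocument": {
               "Version": "2012-10-17",
               "Statement": [{
                  "Effect": "Allow",
                  "Principal": { "Service": "sns.amazonaws.com" },
                  "Action": "sqs:SendMessage",
                  "Resource": "arn:aws:sqs:us-east-1:000000000000:PetQueue",
                  "Condition": {
                     "ArnEquals": { "aws:SourceArn": "arn:aws:sns:us-east-1:000000000000:PetTopic" }
                  }
               }]
            }
         }
      }
   }
}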

Applying subscription attributes in a use case

Here’s how everything comes together. The following example is based on a car dealer company that operates the following distributed systems, hosted on Amazon EC2 instances:

  • Car-Dealer-System – Front-office system that takes orders placed by car buyers
  • ERP-System – Enterprise resource planning, the back-office system that handles finance, accounting, human resources, and related business activities
  • CRM-System – Customer relationship management, the back-office system responsible for storing car buyers’ profile information and running sales workflows
  • SCM-System – Supply chain management, the back-office system that handles inventory tracking and demand forecast and planning

[Architecture diagram: the Car-Dealer-System publishes orders to the Car-Sales SNS topic, which fans out to the ERP-Integration and CRM-Integration SQS queues and to the SCM-System HTTP endpoint.]

Whenever an order is placed in the car dealer system, the event is broadcast to all back-office systems interested in this type of event. As shown in the preceding diagram, the company applied AWS messaging services to decouple their distributed systems, promoting scalability and maintainability in their architecture. The queues and topic used are the following:

  • Car-Sales – Amazon SNS topic that receives messages from the car dealer system. All orders placed by car buyers are published to this topic, then delivered to subscribers (two Amazon SQS queues and one HTTP endpoint).
  • ERP-Integration – Amazon SQS queue that feeds the ERP system with orders published by the car dealer system. The ERP pulls messages from this queue to track revenue and trigger related bookkeeping processes.
  • CRM-Integration – Amazon SQS queue that feeds the CRM system with orders published by the car dealer system. The CRM pulls messages from this queue to track car buyers’ interests and update sales workflows.

The company created the following three Amazon SNS subscriptions:

  • The first subscription refers to the ERP-Integration queue. This subscription has the RawMessageDelivery attribute set to true. Hence, no metadata is added to the message payload, and message attributes are propagated from Amazon SNS to Amazon SQS.
  • The second subscription refers to the CRM-Integration queue. Like the first subscription, this one also has the RawMessageDelivery attribute set to true. Additionally, it has the FilterPolicy attribute set to {“buyer-class”: [“vip”]}. This policy defines that only orders placed by VIP buyers are managed in the CRM system, and orders from other buyers are filtered out.
  • The third subscription points to the HTTP endpoint that serves the SCM-System. Unlike ERP and CRM, the SCM system provides its own HTTP API. Therefore, its HTTP endpoint was subscribed to the topic directly, without a queue in between. This subscription has a DeliveryPolicy that caps the number of retries at 20, with an exponential backoff function.

The company didn’t want to create all these resources manually, though. They wanted to turn this infrastructure into versionable code and to be able to quickly spin up and tear down the environment in an automated manner. Therefore, they created an AWS CloudFormation template to manage these AWS messaging resources: the Amazon SNS topic, Amazon SNS subscriptions, Amazon SNS subscription attributes, and Amazon SQS queues.
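
As an illustration of how these attributes combine in a single resource, the CRM-Integration subscription described above could be modeled along the following lines (the ARNs are placeholders, and the actual template in the repository may differ in detail):

{
   "Resources": {
      "crmSubscription": {
         "Type" : "AWS::SNS::Subscription",
         "Properties" : {
            "Protocol": "sqs",
            "Endpoint": "arn:aws:sqs:us-east-1:000000000000:CRM-Integration",
            "TopicArn": "arn:aws:sns:us-east-1:000000000000:Car-Sales",
            "RawMessageDelivery": true,
            "FilterPolicy": {
               "buyer-class": ["vip"]
            }
         }
      }
   }
}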

Executing the AWS CloudFormation template

Now you’re ready to execute this AWS CloudFormation template yourself. To bootstrap this architecture in your AWS account, follow these steps (a scripted alternative appears after the list):

    1. Download the sample AWS CloudFormation template from the repository.
    2. Go to the AWS CloudFormation console.
    3. Choose Create Stack.
    4. For Select Template, choose to upload a template to Amazon S3, and choose Browse.
    5. Select the template you downloaded and choose Next.
    6. For Specify Details:
      • Enter the following stack name: Car-Dealer-Stack.
      • Enter the HTTP endpoint to be subscribed to your topic. If you don’t have an HTTP endpoint, create a temporary one.
      • Choose Next.
    7. For Options, choose Next.
    8. For Review, choose Create.
    9. Wait until your stack creation process is complete.

Now that all the infrastructure is in place, verify the Amazon SNS subscription attributes set by the AWS CloudFormation template as follows (a boto3-based check appears after the list):

  1. Go to the Amazon SNS console.
  2. Choose Topics and then select the ARN associated with Car-Sales.
  3. Verify the first subscription:
    • Select the subscription related to ERP-Integration (Amazon SQS protocol).
    • Choose Other subscription actions and then choose Edit subscription attributes.
    • Note that raw message delivery is enabled, and choose Cancel to go back.
  4. Verify the second subscription:
    • Select the subscription related to CRM-Integration (Amazon SQS protocol).
    • Choose Other subscription actions and then choose Edit subscription attributes.
    • Note that raw message delivery is enabled and then choose Cancel to go back.
    • Choose Other subscription actions and then choose Edit subscription filter policy.
    • Note that the filter policy is set, and then choose Cancel to go back.
  5. Confirm the third subscription (the HTTP endpoint must confirm the subscription request that Amazon SNS sends to it before messages can be delivered).
  6. Verify the third subscription:
    • Select the subscription related to SCM-System (HTTP protocol).
    • Choose Other subscription actions and then choose Edit subscription delivery policy.
    • Choose Advanced view.
    • Note that an exponential delivery retry policy is set, and then choose Cancel to go back.
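
If you'd rather verify these attributes programmatically, the following is a sketch using boto3 (the topic ARN is a placeholder for the Car-Sales ARN created by your stack):

import boto3

sns = boto3.client('sns')

# Placeholder ARN; use the Car-Sales topic ARN created by your stack.
topic_arn = 'arn:aws:sns:us-east-1:000000000000:Car-Sales'

for sub in sns.list_subscriptions_by_topic(TopicArn=topic_arn)['Subscriptions']:
    # Unconfirmed subscriptions report "PendingConfirmation" instead of an ARN.
    if not sub['SubscriptionArn'].startswith('arn:'):
        print(sub['Protocol'], sub['Endpoint'], '(pending confirmation)')
        continue
    attrs = sns.get_subscription_attributes(
        SubscriptionArn=sub['SubscriptionArn']
    )['Attributes']
    print(sub['Protocol'], sub['Endpoint'])
    for name in ('RawMessageDelivery', 'FilterPolicy', 'DeliveryPolicy'):
        if name in attrs:
            print('  {} = {}'.format(name, attrs[name]))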

Now that you have verified all subscription attributes, you can delete your AWS CloudFormation stack as follows:

  1. Go to the AWS CloudFormation console.
  2. In the list of stacks, select Car-Dealer-Stack.
  3. Choose Actions, choose Delete Stack, and then choose Yes, Delete.
  4. Wait for the stack deletion process to complete.

That’s it! At this point, you have deleted all Amazon SNS and Amazon SQS resources created in this exercise from your AWS account.

Summary

AWS CloudFormation templates enable the simultaneous creation of Amazon SNS subscriptions and their attributes (such as FilterPolicy, DeliveryPolicy, and RawMessageDelivery) in an automated and secure manner. AWS CloudFormation support for Amazon SNS subscription attributes is available now in all AWS Regions.

For information about pricing, see AWS CloudFormation Pricing. For more information about setting up Amazon SNS resources via AWS CloudFormation templates, see the AWS::SNS::Topic and AWS::SNS::Subscription resource documentation in the AWS CloudFormation User Guide.

Amazon SageMaker Updates – Tokyo Region, CloudFormation, Chainer, and GreenGrass ML

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/sagemaker-tokyo-summit-2018/

Today, at the AWS Summit in Tokyo we announced a number of updates and new features for Amazon SageMaker. Starting today, SageMaker is available in Asia Pacific (Tokyo)! SageMaker also now supports CloudFormation. A new machine learning framework, Chainer, is now available in the SageMaker Python SDK, in addition to MXNet and TensorFlow. Finally, support for running Chainer models on several devices was added to AWS Greengrass Machine Learning.
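
As a quick illustration of the new CloudFormation support, a SageMaker notebook instance can now be declared as a stack resource. Here's a minimal sketch (the role ARN is a placeholder for an existing SageMaker execution role in your account):

{
   "Resources": {
      "MyNotebookInstance": {
         "Type" : "AWS::SageMaker::NotebookInstance",
         "Properties" : {
            "InstanceType": "ml.t2.medium",
            "RoleArn": "arn:aws:iam::000000000000:role/SageMakerExecutionRole"
         }
      }
   }
}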

Amazon SageMaker Chainer Estimator


Chainer is a popular, flexible, and intuitive deep learning framework. Chainer networks work on a “Define-by-Run” scheme, where the network topology is defined dynamically via forward computation. This is in contrast to many other frameworks, which work on a “Define-and-Run” scheme where the network topology is defined separately from the data. A lot of developers enjoy the Chainer scheme since it allows them to write their networks with native Python constructs and tools.

Luckily, using Chainer with SageMaker is just as easy as using a TensorFlow or MXNet estimator. In fact, it might even be a bit easier, since it’s likely you can take your existing scripts and use them to train on SageMaker with very few modifications. With TensorFlow or MXNet, users have to implement a train function with a particular signature. With Chainer, your scripts can be a bit more portable, since you can simply read from a few environment variables like SM_MODEL_DIR, SM_NUM_GPUS, and others. We can wrap our existing script in an if __name__ == '__main__': guard and invoke it locally or on SageMaker.


import argparse
import os

if __name__ == '__main__':

    parser = argparse.ArgumentParser()

    # hyperparameters sent by the client are passed as command-line arguments to the script.
    parser.add_argument('--epochs', type=int, default=10)
    parser.add_argument('--batch-size', type=int, default=64)
    parser.add_argument('--learning-rate', type=float, default=0.05)

    # Data, model, and output directories
    parser.add_argument('--output-data-dir', type=str, default=os.environ['SM_OUTPUT_DATA_DIR'])
    parser.add_argument('--model-dir', type=str, default=os.environ['SM_MODEL_DIR'])
    parser.add_argument('--train', type=str, default=os.environ['SM_CHANNEL_TRAIN'])
    parser.add_argument('--test', type=str, default=os.environ['SM_CHANNEL_TEST'])

    args, _ = parser.parse_known_args()

    # ... load from args.train and args.test, train a model, write model to args.model_dir.

Then, we can run that script locally or use the SageMaker Python SDK to launch it on some GPU instances in SageMaker. The hyperparameters are passed to the script as command-line arguments, and the environment variables above are populated automatically. When we call fit, the input channels we pass are exposed through the SM_CHANNEL_* environment variables.


from sagemaker.chainer.estimator import Chainer
# Create my estimator
chainer_estimator = Chainer(
    entry_point='example.py',
    train_instance_count=1,
    train_instance_type='ml.p3.2xlarge',
    hyperparameters={'epochs': 10, 'batch-size': 64}
)
# Train my estimator
chainer_estimator.fit({'train': train_input, 'test': test_input})

# Deploy my estimator to a SageMaker Endpoint and get a Predictor
predictor = chainer_estimator.deploy(
    instance_type="ml.m4.xlarge",
    initial_instance_count=1
)

Now, instead of bringing your own Docker container for training and hosting with Chainer, you can just maintain your script. You can see the full sagemaker-chainer-containers repository on GitHub. One of my favorite features of the new container is built-in ChainerMN for easy multi-node distribution of your Chainer training jobs.
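
For example, scaling the estimator above out to multiple instances is mostly a matter of raising the instance count; the following is a sketch under that assumption (depending on your script, additional ChainerMN-specific options may also apply):

from sagemaker.chainer.estimator import Chainer

# Same entry point as before, but trained across two GPU instances so the
# built-in ChainerMN integration can distribute the job.
distributed_estimator = Chainer(
    entry_point='example.py',
    train_instance_count=2,
    train_instance_type='ml.p3.2xlarge',
    hyperparameters={'epochs': 10, 'batch-size': 64}
)
distributed_estimator.fit({'train': train_input, 'test': test_input})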

There’s a lot more documentation and information available in both the README and the example notebooks.

AWS GreenGrass ML with Chainer

AWS GreenGrass ML now includes a pre-built Chainer package for all devices powered by Intel Atom, NVIDIA Jetson TX2, and Raspberry Pi. So, now GreenGrass ML provides pre-built packages for TensorFlow, Apache MXNet, and Chainer! You can train your models on SageMaker and then easily deploy them to any GreenGrass-enabled device using GreenGrass ML.

JAWS UG

I want to give a quick shout out to all of our wonderful and inspirational friends in the JAWS UG who attended the AWS Summit in Tokyo today. I’ve very much enjoyed seeing your pictures of the summit. Thanks for making Japan an amazing place for AWS developers! I can’t wait to visit again and meet with all of you.

Randall

Amazon Neptune Generally Available

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/amazon-neptune-generally-available/

Amazon Neptune is now Generally Available in US East (N. Virginia), US East (Ohio), US West (Oregon), and EU (Ireland). Amazon Neptune is a fast, reliable, fully-managed graph database service that makes it easy to build and run applications that work with highly connected datasets. At the core of Neptune is a purpose-built, high-performance graph database engine optimized for storing billions of relationships and querying the graph with millisecond latencies. Neptune supports two popular graph models, Property Graph and RDF, through Apache TinkerPop Gremlin and SPARQL, allowing you to easily build queries that efficiently navigate highly connected datasets. Neptune can be used to power everything from recommendation engines and knowledge graphs to drug discovery and network security. Neptune is fully-managed with automatic minor version upgrades, backups, encryption, and fail-over. I wrote about Neptune in detail for AWS re:Invent last year and customers have been using the preview and providing great feedback that the team has used to prepare the service for GA.

Now that Amazon Neptune is generally available, there are a few changes from the preview.

Launching an Amazon Neptune Cluster

Launching a Neptune cluster is as easy as navigating to the AWS Management Console and clicking create cluster. Of course, you can also launch with CloudFormation, the CLI, or the SDKs.
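
For example, a minimal CloudFormation sketch for a cluster with a single instance might look like the following (the identifier and instance class are illustrative; check the Neptune documentation for the instance classes available in your Region):

{
   "Resources": {
      "NeptuneCluster": {
         "Type" : "AWS::Neptune::DBCluster",
         "Properties" : {
            "DBClusterIdentifier": "my-neptune-cluster"
         }
      },
      "NeptuneInstance": {
         "Type" : "AWS::Neptune::DBInstance",
         "Properties" : {
            "DBClusterIdentifier": { "Ref": "NeptuneCluster" },
            "DBInstanceClass": "db.r4.large"
         }
      }
   }
}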

You can monitor your cluster health and the health of individual instances through Amazon CloudWatch and the console.

Additional Resources

We’ve created two repos with some additional tools and examples, listed below. You can expect continuous development on these repos as we add more tools and examples.

  • Amazon Neptune Tools Repo
    This repo has a useful tool for converting GraphML files into Neptune-compatible CSVs for bulk loading from S3.
  • Amazon Neptune Samples Repo
    This repo has a really cool example of building a collaborative filtering recommendation engine for video game preferences.

Purpose Built Databases

There’s an industry trend toward purpose-built databases: developers and businesses want to access their data in the format that makes the most sense for their applications. As cloud resources make transforming large datasets easier with tools like AWS Glue, we have many more options than we used to for accessing our data. With tools like Amazon Redshift, Amazon Athena, Amazon Aurora, Amazon DynamoDB, and more, we get to choose the best database for the job or even enable entirely new use cases. Amazon Neptune is perfect for workloads where the data is highly connected across data-rich edges.

I’m really excited about graph databases and I see a huge number of applications. Looking for ideas of cool things to build? I’d love to build a web crawler in AWS Lambda that uses Neptune as the backing store. You could further enrich it by running Amazon Comprehend or Amazon Rekognition on the text and images found and creating a search engine on top of Neptune.

As always, feel free to reach out in the comments or on Twitter to provide any feedback!

Randall