Tag Archives: CI/CD

Terraform CI/CD and testing on AWS with the new Terraform Test Framework

2024-04-03 Kevon Mayers

Post Syndicated from Kevon Mayers original https://aws.amazon.com/blogs/devops/terraform-ci-cd-and-testing-on-aws-with-the-new-terraform-test-framework/

Graphic created by Kevon Mayers

Introduction

Organizations often use Terraform Modules to orchestrate complex resource provisioning and provide a simple interface for developers to enter the required parameters to deploy the desired infrastructure. Modules enable code reuse and provide a method for organizations to standardize deployment of common workloads such as a three-tier web application, a cloud networking environment, or a data analytics pipeline. When building Terraform modules, it is common for the module author to start with manual testing. Manual testing is performed using commands such as terraform validate for syntax validation, terraform plan to preview the execution plan, and terraform apply followed by manual inspection of resource configuration in the AWS Management Console. Manual testing is prone to human error, not scalable, and can result in unintended issues. Because modules are used by multiple teams in the organization, it is important to ensure that any changes to the modules are extensively tested before the release. In this blog post, we will show you how to validate Terraform modules and how to automate the process using a Continuous Integration/Continuous Deployment (CI/CD) pipeline.

Terraform Test

Terraform test is a new testing framework for module authors to perform unit and integration tests for Terraform modules. Terraform test can create infrastructure as declared in the module, run validation against the infrastructure, and destroy the test resources regardless of whether the test passes or fails. Terraform test will also provide warnings if there are any resources that cannot be destroyed. Terraform test uses the same HashiCorp Configuration Language (HCL) syntax used to write Terraform modules. This reduces the burden for module authors to learn other tools or programming languages. Module authors run the tests using the terraform test command, which is available in Terraform CLI version 1.6 or higher.
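Because the terraform test command requires Terraform CLI 1.6 or higher, a module can declare that requirement so that older CLIs fail early with a clear message. The following is a minimal sketch; the file name versions.tf and the provider entry are assumptions for illustration and are not part of the module shown in this post.

```hcl
# versions.tf (hypothetical file name)
terraform {
  # terraform test is available from Terraform CLI 1.6 onward.
  required_version = ">= 1.6.0"

  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}
```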

Module authors create test files with the extension *.tftest.hcl. These test files are placed in the root of the Terraform module or in a dedicated tests directory. The following elements are typically present in a Terraform tests file:

Provider block: optional, used to override the provider configuration, such as selecting AWS region where the tests run.
Variables block: the input variables passed into the module during the test, used to supply non-default values or to override default values for variables.
Run block: used to run a specific test scenario. There can be multiple run blocks per test file, and Terraform executes them in order. In each run block you specify the Terraform command (plan or apply) and the test assertions. Module authors can specify conditions such as: length(var.items) != 0. A full list of condition expressions can be found in the HashiCorp documentation.

Terraform tests are performed in sequential order and at the end of the Terraform test execution, any failed assertions are displayed.
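To make the three elements concrete, here is a small sketch of a test file that combines a provider block, a variables block, and a run block. The file name, the Region, and the items variable are assumptions for illustration; the module under test would need to declare a matching list variable named items.

```hcl
# tests/example.tftest.hcl (hypothetical file name)

# Optional: override the provider configuration for this test file.
provider "aws" {
  region = "us-east-1" # assumed Region; use the one where your tests should run
}

# Input variables passed to the module for every run block in this file.
variables {
  items = ["a", "b"] # assumes the module declares a list variable named "items"
}

# Run blocks execute in the order in which they appear.
run "items_are_not_empty" {
  command = plan

  assert {
    condition     = length(var.items) != 0
    error_message = "Expected at least one item."
  }
}
```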

Basic test to validate resource creation

Now that we understand the basic anatomy of a Terraform tests file, let’s create basic tests to validate the functionality of the following Terraform configuration. This Terraform configuration will create an AWS CodeCommit repository with prefix name repo-.

# main.tf

variable "repository_name" {
  type = string
}
resource "aws_codecommit_repository" "test" {
  repository_name = format("repo-%s", var.repository_name)
  description     = "Test repository."
}

Now we create a Terraform test file in the tests directory. See the following directory structure as an example:

├── main.tf
└── tests
    └── basic.tftest.hcl

For this first test, we will not perform any assertion except for validating that the Terraform execution plan runs successfully. In the test file, we create a variables block to set the value for the variable repository_name. We also add a run block with command = plan to instruct Terraform test to run terraform plan. The completed test should look like the following:

# basic.tftest.hcl

variables {
  repository_name = "MyRepo"
}

run "test_resource_creation" {
  command = plan
}

Now we will run this test locally. First ensure that you are authenticated into an AWS account, and run the terraform init command in the root directory of the Terraform module. After the provider is initialized, start the test using the terraform test command.

❯ terraform test
tests/basic.tftest.hcl... in progress
run "test_resource_creation"... pass
tests/basic.tftest.hcl... tearing down
tests/basic.tftest.hcl... pass

Our first test is complete. We have validated that the Terraform configuration is valid and that Terraform can produce an execution plan for the resource. Next, let’s learn how to perform inspection of the resource state.

Create resource and validate resource name

Re-using the previous test file, we add an assert block that checks whether the CodeCommit repository name starts with the string repo- and provides an error message if the condition fails. For the assertion, we use the startswith function. See the following example:

# basic.tftest.hcl

variables {
  repository_name = "MyRepo"
}

run "test_resource_creation" {
  command = plan

  assert {
    condition = startswith(aws_codecommit_repository.test.repository_name, "repo-")
    error_message = "CodeCommit repository name ${var.repository_name} did not start with the expected value 'repo-***'."
  }
}

Now, let’s assume that another module author made changes to the module by modifying the prefix from repo- to my-repo-. Here is the modified Terraform module.

# main.tf

variable "repository_name" {
  type = string
}
resource "aws_codecommit_repository" "test" {
  repository_name = format("my-repo-%s", var.repository_name)
  description = "Test repository."
}

We can catch this mistake by running the terraform test command again.

❯ terraform test
tests/basic.tftest.hcl... in progress
run "test_resource_creation"... fail
╷
│ Error: Test assertion failed
│
│ on tests/basic.tftest.hcl line 9, in run "test_resource_creation":
│ 9: condition = startswith(aws_codecommit_repository.test.repository_name, "repo-")
│ ├────────────────
│ │ aws_codecommit_repository.test.repository_name is "my-repo-MyRepo"
│
│ CodeCommit repository name MyRepo did not start with the expected value 'repo-***'.
╵
tests/basic.tftest.hcl... tearing down
tests/basic.tftest.hcl... fail

Failure! 0 passed, 1 failed.

We have successfully created a unit test using assertions that validates the resource name matches the expected value. For more examples of using assertions see the Terraform Tests Docs. Before we proceed to the next section, don’t forget to fix the repository name in the module (revert the name back to repo- instead of my-repo-) and re-run your Terraform test.
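A plan-mode run can only assert on values that are known before apply. To check attributes that AWS computes, a run block can use command = apply, which creates the real repository and destroys it when the test file finishes. The sketch below assumes the module from this section; the file name and the clone_url_http assertion are illustrative additions and are not reflected in the test outputs shown later in this post.

```hcl
# tests/apply.tftest.hcl (hypothetical file name)

variables {
  repository_name = "MyRepo"
}

run "test_clone_url" {
  # apply creates real infrastructure, so this test takes longer than a plan-mode test.
  command = apply

  assert {
    # clone_url_http is only known after the repository exists.
    condition     = startswith(aws_codecommit_repository.test.clone_url_http, "https://")
    error_message = "Expected an HTTPS clone URL for the CodeCommit repository."
  }
}
```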

Testing variable input validation

When developing Terraform modules, it is common to use variable validation as a contract test to validate any dependencies / restrictions. For example, AWS CodeCommit limits the repository name to 100 characters. A module author can use the length function to check the length of the input variable value. We are going to use Terraform test to ensure that the variable validation works effectively. First, we modify the module to use variable validation.

# main.tf

variable "repository_name" {
  type = string
  validation {
    condition = length(var.repository_name) <= 100
    error_message = "The repository name must be less than or equal to 100 characters."
  }
}

resource "aws_codecommit_repository" "test" {
  repository_name = format("repo-%s", var.repository_name)
  description = "Test repository."
}

By default, when variable validation fails during the execution of Terraform test, the Terraform test also fails. To simulate this, create a new test file and insert the repository_name variable with a value longer than 100 characters.

# var_validation.tftest.hcl

variables {
  repository_name = "this_is_a_repository_name_longer_than_100_characters_7rfD86rGwuqhF3TH9d3Y99r7vq6JZBZJkhw5h4eGEawBntZmvy"
}

run "test_invalid_var" {
  command = plan
}

Notice that this new test file also sets the command to plan. Variable validation runs before terraform apply, so we can save time and cost by skipping resource provisioning entirely. If we run this Terraform test, it will fail as expected.

❯ terraform test
tests/basic.tftest.hcl... in progress
run "test_resource_creation"... pass
tests/basic.tftest.hcl... tearing down
tests/basic.tftest.hcl... pass
tests/var_validation.tftest.hcl... in progress
run "test_invalid_var"... fail
╷
│ Error: Invalid value for variable
│
│ on main.tf line 1:
│ 1: variable "repository_name" {
│ ├────────────────
│ │ var.repository_name is "this_is_a_repository_name_longer_than_100_characters_7rfD86rGwuqhF3TH9d3Y99r7vq6JZBZJkhw5h4eGEawBntZmvy"
│
│ The repository name must be less than or equal to 100 characters.
│
│ This was checked by the validation rule at main.tf:3,3-13.
╵
tests/var_validation.tftest.hcl... tearing down
tests/var_validation.tftest.hcl... fail

Failure! 1 passed, 1 failed.

For other module authors who might iterate on the module, we need to ensure that the validation condition is correct and will catch any problems with input values. In other words, we expect the validation condition to fail with the wrong input. This is especially important when we want to incorporate the contract test in a CI/CD pipeline. To prevent our test from failing because of the error we introduced intentionally, we can use the expect_failures attribute. Here is the modified test file:

# var_validation.tftest.hcl

variables {
  repository_name = "this_is_a_repository_name_longer_than_100_characters_7rfD86rGwuqhF3TH9d3Y99r7vq6JZBZJkhw5h4eGEawBntZmvy"
}

run "test_invalid_var" {
  command = plan

  expect_failures = [
    var.repository_name
  ]
}

Now if we run the Terraform test, we will get a successful result.

❯ terraform test
tests/basic.tftest.hcl... in progress
run "test_resource_creation"... pass
tests/basic.tftest.hcl... tearing down
tests/basic.tftest.hcl... pass
tests/var_validation.tftest.hcl... in progress
run "test_invalid_var"... pass
tests/var_validation.tftest.hcl... tearing down
tests/var_validation.tftest.hcl... pass

Success! 2 passed, 0 failed.

As you can see, the expect_failures attribute is used to test negative paths (the inputs that would cause failures when passed into a module). Assertions tend to focus on positive paths (the ideal inputs). For an additional example of a test that validates functionality of a completed module with multiple interconnected resources, see this example in the Terraform CI/CD and Testing on AWS Workshop.
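Positive and negative paths can also sit side by side in one test file by setting variables at the run-block level. The following sketch assumes the module with the validation rule from this section; the file name and run names are hypothetical.

```hcl
# tests/var_paths.tftest.hcl (hypothetical file name)

# Positive path: a valid name is accepted and receives the expected prefix.
run "valid_name_is_accepted" {
  command = plan

  variables {
    repository_name = "MyRepo"
  }

  assert {
    condition     = aws_codecommit_repository.test.repository_name == "repo-MyRepo"
    error_message = "Expected the repository name to be repo-MyRepo."
  }
}

# Negative path: a name longer than 100 characters must trip the validation rule.
run "too_long_name_is_rejected" {
  command = plan

  variables {
    repository_name = "this_is_a_repository_name_longer_than_100_characters_7rfD86rGwuqhF3TH9d3Y99r7vq6JZBZJkhw5h4eGEawBntZmvy"
  }

  expect_failures = [
    var.repository_name
  ]
}
```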

Orchestrating supporting resources

In practice, end-users utilize Terraform modules in conjunction with other supporting resources. For example, a CodeCommit repository is usually encrypted using an AWS Key Management Service (KMS) key. The KMS key is provided by end-users to the module using a variable called kms_key_id. To simulate this test, we need to orchestrate the creation of the KMS key outside of the module. In this section we will learn how to do that. First, update the Terraform module to add the optional variable for the KMS key.

# main.tf

variable "repository_name" {
  type = string
  validation {
    condition = length(var.repository_name) <= 100
    error_message = "The repository name must be less than or equal to 100 characters."
  }
}

variable "kms_key_id" {
  type = string
  default = ""
}

resource "aws_codecommit_repository" "test" {
  repository_name = format("repo-%s", var.repository_name)
  description = "Test repository."
  kms_key_id = var.kms_key_id != "" ? var.kms_key_id : null
}

In a Terraform test, you can instruct the run block to execute another helper module. The helper module is used by the test to create the supporting resources. We will create a sub-directory called setup under the tests directory with a single kms.tf file. We also create a new test file for the KMS scenario. See the updated directory structure:

├── main.tf
└── tests
    ├── setup
    │   └── kms.tf
    ├── basic.tftest.hcl
    ├── var_validation.tftest.hcl
    └── with_kms.tftest.hcl

The kms.tf file is a helper module to create a KMS key and provide its ARN as the output value.

# kms.tf

resource "aws_kms_key" "test" {
  description = "test KMS key for CodeCommit repo"
  deletion_window_in_days = 7
}

output "kms_key_id" {
  value = aws_kms_key.test.arn
}

The new test will use two separate run blocks. The first run block (setup) executes the helper module to generate a KMS key. This is done by assigning the command apply which will run terraform apply to generate the KMS key. The second run block (codecommit_with_kms) will then use the KMS key ARN output of the first run as the input variable passed to the main module.

# with_kms.tftest.hcl

run "setup" {
  command = apply
  module {
    source = "./tests/setup"
  }
}

run "codecommit_with_kms" {
  command = apply

  variables {
    repository_name = "MyRepo"
    kms_key_id = run.setup.kms_key_id
  }

  assert {
    condition = aws_codecommit_repository.test.kms_key_id != null
    error_message = "KMS key ID attribute value is null"
  }
}

Go ahead and run terraform init, followed by terraform test. You should get a successful result like the one below.

❯ terraform test
tests/basic.tftest.hcl... in progress
run "test_resource_creation"... pass
tests/basic.tftest.hcl... tearing down
tests/basic.tftest.hcl... pass
tests/var_validation.tftest.hcl... in progress
run "test_invalid_var"... pass
tests/var_validation.tftest.hcl... tearing down
tests/var_validation.tftest.hcl... pass
tests/with_kms.tftest.hcl... in progress
run "setup"... pass
run "codecommit_with_kms"... pass
tests/with_kms.tftest.hcl... tearing down
tests/with_kms.tftest.hcl... pass

Success! 4 passed, 0 failed.

We have learned how to run Terraform test and develop various test scenarios. In the next section we will see how to incorporate all the tests into a CI/CD pipeline.

Terraform Tests in CI/CD Pipelines

Now that we have seen how Terraform Test works locally, let’s see how the Terraform test can be leveraged to create a Terraform module validation pipeline on AWS. The following AWS services are used:

AWS CodeCommit – a secure, highly scalable, fully managed source control service that hosts private Git repositories.
AWS CodeBuild – a fully managed continuous integration service that compiles source code, runs tests, and produces ready-to-deploy software packages.
AWS CodePipeline – a fully managed continuous delivery service that helps you automate your release pipelines for fast and reliable application and infrastructure updates.
Amazon Simple Storage Service (Amazon S3) – an object storage service offering industry-leading scalability, data availability, security, and performance.

Terraform module validation pipeline

In the above architecture for a Terraform module validation pipeline, the following takes place:

A developer pushes Terraform module configuration files to a git repository (AWS CodeCommit).
AWS CodePipeline begins running the pipeline. The pipeline clones the git repo and stores the artifacts to an Amazon S3 bucket.
An AWS CodeBuild project configures a compute/build environment with Checkov installed from an image fetched from Docker Hub. CodePipeline passes the artifacts (Terraform module) and CodeBuild executes Checkov to run static analysis of the Terraform configuration files.
Another CodeBuild project is configured with Terraform from an image fetched from Docker Hub. CodePipeline passes the artifacts (repo contents) and CodeBuild runs the terraform test command to execute the tests.

CodeBuild uses a buildspec file to declare the build commands and relevant settings. Here is an example of the buildspec files for both CodeBuild Projects:

# Checkov
version: 0.1
phases:
  pre_build:
    commands:
      - echo pre_build starting

  build:
    commands:
      - echo build starting
      - echo starting checkov
      - ls
      - checkov -d .
      - echo saving checkov output
      - checkov -s -d ./ > checkov.result.txt

In the above buildspec, Checkov is run against the root directory of the cloned CodeCommit repository. This directory contains the configuration files for the Terraform module. Checkov also saves the output to a file named checkov.result.txt for further review or handling if needed. If Checkov fails, the pipeline will fail.

# Terraform Test
version: 0.1
phases:
  pre_build:
    commands:
      - terraform init
      - terraform validate

  build:
    commands:
      - terraform test

In the above buildspec, the terraform init and terraform validate commands are used to initialize Terraform, then check if the configuration is valid. Finally, the terraform test command is used to run the configured tests. If any of the Terraform tests fails, the pipeline will fail.

For a full example of the CI/CD pipeline configuration, please refer to the Terraform CI/CD and Testing on AWS workshop. The module validation pipeline mentioned above is meant as a starting point. In a production environment, you might want to customize it further by adding Checkov allow-list rules, linting, checks for Terraform docs, or pre-requisites such as building the code used in AWS Lambda.

Choosing various testing strategies

At this point you may be wondering when you should use Terraform tests or other tools such as Preconditions and Postconditions, Check blocks, or policy as code. The answer depends on your test type and use cases. Terraform test is suitable for unit tests, such as validating that resources are created according to the naming specification. Variable validations and Pre/Post conditions are useful for contract tests of Terraform modules, for example by returning an error when input variable values do not meet the specification. As shown in the previous section, you can also use Terraform test to ensure your contract tests are running properly. Terraform test is also suitable for integration tests where you need to create supporting resources to properly test the module functionality. Lastly, Check blocks are suitable for end-to-end tests where you want to validate the infrastructure state after all resources are generated, for example to test if a website is running after an S3 bucket configured for static web hosting is created.
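As a rough illustration of the other tools mentioned above, the sketch below shows a precondition used as a contract test and a check block used as an end-to-end test. It assumes the CodeCommit module from earlier sections. The website_url variable, the check name, and the use of the hashicorp/http provider are assumptions for illustration and do not come from the original module.

```hcl
terraform {
  required_providers {
    http = {
      source = "hashicorp/http"
    }
  }
}

variable "repository_name" {
  type = string
}

variable "website_url" {
  type        = string
  description = "Hypothetical input: URL of a deployed static website."
}

resource "aws_codecommit_repository" "test" {
  repository_name = format("repo-%s", var.repository_name)
  description     = "Test repository."

  lifecycle {
    # Contract test: validate the full name, including the repo- prefix.
    precondition {
      condition     = length(format("repo-%s", var.repository_name)) <= 100
      error_message = "The full repository name, including the repo- prefix, must be 100 characters or fewer."
    }
  }
}

# End-to-end test: a failed check is reported as a warning and does not block the run.
check "website_is_up" {
  data "http" "site" {
    url = var.website_url
  }

  assert {
    condition     = data.http.site.status_code == 200
    error_message = "${var.website_url} returned HTTP ${data.http.site.status_code}, expected 200."
  }
}
```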

When developing Terraform modules, you can run Terraform test in command = plan mode for unit and contract tests. This allows the unit and contract tests to run faster and more cheaply because no resources are created. You should also consider the time and cost to execute Terraform test for complex or large Terraform configurations, especially if you have multiple test scenarios. Terraform test maintains one or more state files in memory for each test file. Consider how to re-use the module’s state when appropriate. Terraform test also provides test mocking (available in Terraform 1.7 and later), which allows you to test your module without creating the real infrastructure.
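A minimal sketch of test mocking, assuming Terraform CLI 1.7 or later and the CodeCommit module from earlier sections; the file name and run name are hypothetical. With a mocked provider, Terraform generates placeholder values for computed attributes instead of calling AWS, so the assertion here targets a value set by the configuration itself.

```hcl
# tests/mocked.tftest.hcl (hypothetical file name)

# Replace the real AWS provider with a mock so that no real resources are created.
mock_provider "aws" {}

variables {
  repository_name = "MyRepo"
}

run "name_prefix_with_mock" {
  command = apply

  assert {
    condition     = startswith(aws_codecommit_repository.test.repository_name, "repo-")
    error_message = "Repository name did not start with 'repo-'."
  }
}
```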

Conclusion

In this post, you learned how to use Terraform test and develop various test scenarios. You also learned how to incorporate Terraform test in a CI/CD pipeline. Lastly, we also discussed various testing strategies for Terraform configurations and modules. For more information about Terraform test, we recommend the Terraform test documentation and tutorial. To get hands on practice building a Terraform module validation pipeline and Terraform deployment pipeline, check out the Terraform CI/CD and Testing on AWS Workshop.

Your DevOps and Developer Productivity guide to re:Invent 2023

2023-11-21 Anubhav Rao

Post Syndicated from Anubhav Rao original https://aws.amazon.com/blogs/devops/your-devops-and-developer-productivity-guide-to-reinvent-2023/

ICYMI – AWS re:Invent is less than a week away! We can’t wait to join thousands of builders in person and virtually for another exciting event. Still need to save your spot? You can register here.

With so much planned for the DevOps and Developer Productivity (DOP) track at re:Invent, we’re highlighting the most exciting sessions for technology leaders and developers in this post. Sessions span intermediate (200) through expert (400) levels of content in a mix of interactive chalk talks, hands-on workshops, and lecture-style breakout sessions.

You will experience the future of efficient development at the DevOps and Developer Productivity track and get a chance to talk to AWS experts about exciting services, tools, and new AI capabilities that optimize and automate your software development lifecycle. Attendees will leave re:Invent with the latest strategies to accelerate development, use generative AI to improve developer productivity, and focus on high-value work and innovation.

How to reserve a seat in the sessions

Reserved seating is available for registered attendees to secure seats in the sessions of their choice. Reserve a seat by signing in to the attendee portal and navigating to Event, then Sessions.

Do not miss the Innovation Talk led by Vice President of AWS Generative Builders, Adam Seligman. In DOP225-INT Build without limits: The next-generation developer experience at AWS, Adam will provide updates on the latest developer tools and services, including generative AI-powered capabilities, low-code abstractions, cloud development, and operations. He’ll also welcome special guests to lead demos of key developer services and showcase how they integrate to increase productivity and innovation.

DevOps and Developer Productivity breakout sessions

What are breakout sessions?

AWS re:Invent breakout sessions are lecture-style and 60 minutes long. These sessions are delivered by AWS experts and typically reserve 10–15 minutes for Q&A at the end. Breakout sessions are recorded and made available on-demand after the event.

Level 200 — Intermediate

DOP201 | Best practices for Amazon CodeWhisperer
Generative AI can create new content and ideas, including conversations, stories, images, videos, and music. Learning how to interact with generative AI effectively and proficiently is a skill worth developing. Join this session to learn about best practices for engaging with Amazon CodeWhisperer, which uses an underlying foundation model to radically improve developer productivity by generating code suggestions in real time.

DOP202 | Realizing the developer productivity benefits of Amazon CodeWhisperer
Developers spend a significant amount of their time writing undifferentiated code. Amazon CodeWhisperer radically improves productivity by generating code suggestions in real time to alleviate this burden. In this session, learn how CodeWhisperer can “write” much of this undifferentiated code, allowing developers to focus on business logic and accelerate the pace of their innovation.

DOP205 | Accelerate development with Amazon CodeCatalyst
In this session, explore the newest features in Amazon CodeCatalyst. Learn firsthand how these practical additions to CodeCatalyst can simplify application delivery, improve team collaboration, and speed up the software development lifecycle from concept to deployment.

DOP206 | AWS infrastructure as code: A year in review
AWS provides services that help with the creation, deployment, and maintenance of application infrastructure in a programmatic, descriptive, and declarative way. These services help provide rigor, clarity, and reliability to application development. Join this session to learn about the new features and improvements for AWS infrastructure as code with AWS CloudFormation and AWS Cloud Development Kit (AWS CDK) and how they can benefit your team.

DOP207 | Build and run it: Streamline DevOps with machine learning on AWS
While organizations have improved how they deliver and operate software, development teams still run into issues when performing manual code reviews, looking for hard-to-find defects, and uncovering security-related problems. Developers have to keep up with multiple programming languages and frameworks, and their productivity can be impaired when they have to search online for code snippets. Additionally, they require expertise in observability to successfully operate the applications they build. In this session, learn how companies like Fidelity Investments use machine learning–powered tools like Amazon CodeWhisperer and Amazon DevOps Guru to boost application availability and write software faster and more reliably.

DOP208 | Continuous integration and delivery for AWS
AWS provides one place where you can plan work, collaborate on code, build, test, and deploy applications with continuous integration/continuous delivery (CI/CD) tools. In this session, learn about how to create end-to-end CI/CD pipelines using infrastructure as code on AWS.

DOP209 | Governance and security with infrastructure as code
In this session, learn how to use AWS CloudFormation and the AWS CDK to deploy cloud applications in regulated environments while enforcing security controls. Find out how to catch issues early with cdk-nag, validate your pipelines with cfn-guard, and protect your accounts from unintended changes with CloudFormation hooks.

DOP210 | Scale your application development with Amazon CodeCatalyst
Amazon CodeCatalyst brings together everything you need to build, deploy, and collaborate on software into one integrated software development service. In this session, discover the ways that CodeCatalyst helps developers and teams build and ship code faster while spending more time doing the work they love.

DOP211 | Boost developer productivity with Amazon CodeWhisperer
Generative AI is transforming the way that developers work. Writing code is already getting disrupted by tools like Amazon CodeWhisperer, which enhances developer productivity by providing real-time code completions based on natural language prompts. In this session, get insights into how to evaluate and measure productivity with the adoption of generative AI–powered tools. Learn from the AWS Disaster Recovery team who uses CodeWhisperer to solve complex engineering problems by gaining efficiency through longer productivity cycles and increasing velocity to market for ongoing fixes. Hear how integrating tools like CodeWhisperer into your workflows can boost productivity.

DOP212 | New AWS generative AI features and tools for developers
Explore how generative AI coding tools are changing the way developers and companies build software. Generative AI–powered tools are boosting developer and business productivity by automating tasks, improving communication and collaboration, and providing insights that can inform better decision-making. In this session, see the newest AWS tools and features that make it easier for builders to solve problems with minimal technical expertise and that help technical teams boost productivity. Walk through how organizations like FINRA are exploring generative AI and beginning their journey using these tools to accelerate their pace of innovation.

DOP220 | Simplify building applications with AWS SDKs
AWS SDKs play a vital role in using AWS services in your organization’s applications and services. In this session, learn about the current state and the future of AWS SDKs. Explore how they can simplify your developer experience and unlock new capabilities. Discover how SDKs are evolving, providing a consistent experience in multiple languages and empowering you to do more with high-level abstractions to make it easier to build on AWS. Learn how AWS SDKs are built using open source tools like Smithy, and how you can use these tools to build your own SDKs to serve your customers’ needs.

DevOps and Developer Productivity chalk talks

What are chalk talks?

Chalk Talks are highly interactive sessions with a small audience. Experts lead you through problems and solutions on a digital whiteboard as the discussion unfolds. Each begins with a short lecture (10–15 minutes) delivered by an AWS expert, followed by a 45- or 50-minute Q&A session with the audience.

Level 300 — Advanced

DOP306 | Streamline DevSecOps with a complete software development service
Security is not just for application code: the automated software supply chains that build modern software can also be exploited by attackers. In this chalk talk, learn how you can use Amazon CodeCatalyst to incorporate security tests into every aspect of your software development lifecycle while maintaining a great developer experience. Discover how CodeCatalyst’s flexible actions-based CI/CD workflows streamline the process of adapting to security threats.

DOP309-R | AI for DevOps: Modernizing your DevOps operations with AWS
As more organizations move to microservices architectures to scale their businesses, applications increasingly have become distributed, requiring the need for even greater visibility. IT operations professionals and developers need more automated practices to maintain application availability and reduce the time and effort required to detect, debug, and resolve operational issues. In this chalk talk, discover how you can use AWS services, including Amazon CodeWhisperer, Amazon CodeGuru and Amazon DevOps Guru, to start using AI for DevOps solutions to detect, diagnose, and remedy anomalous application behavior.

DOP310-R | Better together: GitHub Actions, Amazon CodeCatalyst, or AWS CodeBuild
Learn how combining GitHub Actions with Amazon CodeCatalyst or AWS CodeBuild can maximize development efficiency. In this chalk talk, learn about the tradeoffs of using GitHub Actions runners hosted on Amazon EC2 or Amazon ECS with GitHub Actions hosted on CodeCatalyst or CodeBuild. Explore integration with other AWS services to enhance workflow automation. Join this talk to learn how GitHub Actions on AWS can take your development processes to the next level.

DOP311 | Building infrastructure as code with AWS CloudFormation
AWS CloudFormation helps you manage your AWS infrastructure as code, increasing automation and supporting infrastructure-as-code best practices. In this chalk talk, learn the fundamentals of CloudFormation, including templates, stacks, change sets, and stack dependencies. See a demo of how to describe your AWS infrastructure in a template format and provision resources in an automated, repeatable way.

DOP312 | Creating custom constructs with AWS CDK
Join this chalk talk to get answers to your questions about creating, publishing, and sharing your AWS CDK constructs publicly and privately. Learn about construct levels, how to test your constructs, how to discover and use constructs in your AWS CDK projects, and explore Construct Hub.

DOP313-R | Multi-account and multi-Region deployments at scale
Many AWS customers are implementing multi-account strategies to more easily manage their cloud infrastructure and improve their security and compliance postures. In this chalk talk, learn about various options for deploying resources into multiple accounts and AWS Regions using AWS developer tools, including AWS CodePipeline, AWS CodeDeploy, and Amazon CodeCatalyst.

DOP314 | Simplifying cloud infrastructure creation with the AWS CDK
The AWS Cloud Development Kit (AWS CDK) is an open source software development framework for defining cloud infrastructure in code and provisioning it through AWS CloudFormation. In this chalk talk, get an introduction to the AWS CDK and see a demo of how it can simplify infrastructure creation. Through code examples and diagrams, see how the AWS CDK lets you use familiar programming languages for declarative infrastructure definition. Also learn how it provides higher-level abstractions and constructs over native CloudFormation.

DOP317 | Applying Amazon’s DevOps culture to your team In this chalk talk, learn how Amazon helps its developers rapidly release and iterate software while maintaining industry-leading standards on security, reliability, and performance. Learn about the culture of two-pizza teams and how to maintain a culture of DevOps in a large enterprise. Also, discover how you can help build such a culture at your own organization.

DOP318 | Testing for resilience with AWS Fault Injection Simulator As cloud-based systems grow in scale and complexity, there is increased need to test distributed systems for resiliency. AWS Fault Injection Simulator (FIS) allows you to stress test your applications to understand failure modes and build more resilient services. Through code examples and diagrams, see how to set up and run fault injection experiments on AWS. By the end of this session, understand how FIS helps identify weaknesses and validate improvements to build more resilient cloud-based systems.

DOP319-R | Zero-downtime deployment strategies AWS services support a wealth of deployment options to meet your needs, ranging from in-place updates to blue/green deployment to continuous configuration with feature flags. In this chalk talk, hear about multiple options for deploying changes to Amazon EC2, Amazon ECS, and AWS Lambda compute platforms using AWS CodeDeploy, AWS AppConfig, AWS CloudFormation, AWS Cloud Development Kit (AWS CDK), and Amazon CodeCatalyst.

DOP320 | Build a path to production with Amazon CodeCatalyst blueprints Amazon CodeCatalyst uses blueprints to configure your software projects in the service. Blueprints instruct CodeCatalyst on how to set up a code repository with working sample code, define cloud infrastructure, and run pre-configured CI/CD workflows for your project. In this session, learn how blueprints in CodeCatalyst can give developers a compliant software service they’ll want to use on AWS.

DOP321-R | Code faster with Amazon CodeWhisperer Traditionally, building applications requires developers to spend a lot of time manually writing code and trying to learn and keep up with new frameworks, SDKs, and libraries. In the last three years, AI models have grown exponentially in complexity and sophistication, enabling the creation of tools like Amazon CodeWhisperer that can generate code suggestions in real time based on a natural language description of the task. In this session, learn how CodeWhisperer can accelerate and enhance your software development with code generation, reference tracking, security scans, and more.

DOP324 | Accelerating application development with AWS client-side tools Did you know AWS has more than just services? There are dozens of AWS client-side tools and libraries designed to make developing quality applications easier. In this chalk talk, explore some of the tools available in your development workspace. Learn more about command line tooling (AWS CLI), libraries (AWS SDK), IDE integrations, and application frameworks that can accelerate your AWS application development. The audience helps set the agenda so there’s sure to be something for every builder.

DevOps and Developer Productivity workshops

What are workshops?

Workshops are two-hour interactive learning sessions where you work in small group teams to solve problems using AWS services. Each workshop starts with a short lecture (10–15 minutes) by the main speaker, and the rest of the time is spent working as a group.

Level 300 — Advanced

DOP301 | Boost your application availability with AIOps on AWS As applications become increasingly distributed and complex, developers and IT operations teams can benefit from more automated practices to maintain application availability and reduce the time and effort spent detecting, debugging, and resolving operational issues manually. In this workshop, learn how AWS AIOps solutions can help you make the shift toward more automation and proactive mechanisms so your IT team can innovate faster. The workshop includes use cases spanning multiple AWS services such as AWS Lambda, Amazon DynamoDB, Amazon API Gateway, Amazon RDS, and Amazon EKS. Learn how you can reduce MTTR and quickly identify issues within your AWS infrastructure. You must bring your laptop to participate.

DOP302 | Build software faster with Amazon CodeCatalyst In this workshop, learn about creating continuous integration and continuous delivery (CI/CD) pipelines using Amazon CodeCatalyst. CodeCatalyst is a unified software development service on AWS that brings together everything teams need to plan, code, build, test, and deploy applications with continuous CI/CD tools. You can utilize AWS services and integrate AWS resources into your projects by connecting your AWS accounts. With all of the stages of an application’s lifecycle in one tool, you can deliver quality software quickly and confidently. You must bring your laptop to participate.

DOP303-R | Continuous integration and delivery on AWS In this workshop, learn to create end-to-end continuous integration and continuous delivery (CI/CD) pipelines using AWS Cloud Development Kit (AWS CDK). Review the fundamental concepts of continuous integration, continuous deployment, and continuous delivery. Then, using TypeScript/Python, define an AWS CodePipeline, AWS CodeBuild, and AWS CodeCommit workflow. You must bring your laptop to participate.

DOP304 | Develop AWS CDK resources to deploy your applications on AWS In this workshop, learn how to build and deploy applications using infrastructure as code with AWS Cloud Development Kit (AWS CDK). Create resources using AWS CDK and learn maintenance and operations tips. In addition, get an introduction to building your own constructs. You must bring your laptop to participate.

DOP305 | Develop AWS CloudFormation templates to manage your infrastructure In this workshop, learn how to develop and test AWS CloudFormation templates. Create CloudFormation templates to deploy and manage resources and learn about CloudFormation language features that allow you to reuse and extend templates for many scenarios. Explore testing tools that can help you validate your CloudFormation templates, including cfn-lint and CloudFormation Guard. You must bring your laptop to participate.

DOP307-R | Hands-on with Amazon CodeWhisperer In this workshop, learn how to build applications faster and more securely with Amazon CodeWhisperer. The workshop begins with several examples highlighting how CodeWhisperer incorporates your comments and existing code to produce results. Then dive into a series of challenges designed to improve your productivity using multiple languages and frameworks. You must bring your laptop to participate.

DOP308 | Enforcing development standards with Amazon CodeCatalyst In this workshop, learn how Amazon CodeCatalyst can accelerate the application development lifecycle within your organization. Discover how your cloud center of excellence (CCoE) can provide standardized code and workflows to help teams get started quickly and securely. In addition, learn how to update projects as organization standards evolve. You must bring your laptop to participate.

Level 400 — Expert

DOP401 | Get better at building AWS CDK constructs In this workshop, dive deep into how to design AWS CDK constructs, which are reusable and shareable cloud components that help you meet your organization’s security, compliance, and governance requirements. Learn how to build, test, and share constructs representing a single AWS resource, as well as how to create higher-level abstractions that include built-in defaults and allow you to provision multiple AWS resources. You must bring your laptop to participate.

DevOps and Developer Productivity builders’ sessions

What are builders’ sessions?

These 60-minute group sessions are led by an AWS expert and provide an interactive learning experience for building on AWS. Builders’ sessions are designed to create a hands-on experience where questions are encouraged.

Level 300 — Advanced

DOP322-R | Accelerate data science coding with Amazon CodeWhisperer Generative AI removes the heavy lifting that developers experience today by writing much of the undifferentiated code, allowing them to build faster. Helping developers code faster could be one of the most powerful uses of generative AI that we will see in the coming years, and this framework can also be applied to data science projects. In this builders’ session, explore how Amazon CodeWhisperer accelerates the completion of data science coding tasks with extensions for JupyterLab and Amazon SageMaker. Learn how to build data processing pipelines and machine learning models with the help of CodeWhisperer and accelerate data science experiments in Python. You must bring your laptop to participate.

Level 400 — Expert

DOP402-R | Manage dev environments at scale with Amazon CodeCatalyst Amazon CodeCatalyst Dev Environments are cloud-based environments that you can use to quickly work on the code stored in the source repositories of your project. They are automatically created with pre-installed dependencies and language-specific packages so you can work on a new or existing project right away. In this session, learn how to create secure, reproducible, and consistent environments for VS Code, AWS Cloud9, and JetBrains IDEs. You must bring your laptop to participate.

DOP403-R | Hands-on with Amazon CodeCatalyst: Automating security in CI/CD pipelines In this session, learn how to build a CI/CD pipeline with Amazon CodeCatalyst and add the necessary steps to secure your pipeline. Learn how to perform tasks such as secret scanning, software composition analysis (SCA), static application security testing (SAST), and generating a software bill of materials (SBOM). You must bring your laptop to participate.

DevOps and Developer Productivity lightning talks

What are lightning talks?

Lightning talks are short, 20-minute demos led from a stage.

DOP221 | Amazon CodeCatalyst in real time: Deploying to production in minutes In this follow-up demonstration to DOP210, see how you can use an Amazon CodeCatalyst blueprint to build a production-ready application that is set up for long-term success. See in real time how to create a project using a CodeCatalyst Dev Environment and deploy it to production using a CodeCatalyst workflow.

DevOps and Developer Productivity code talks

What are code talks?

Code talks are 60-minute, highly-interactive discussions featuring live coding. Attendees are encouraged to dig in and ask questions about the speaker’s approach.

DOP203 | The future of development on AWS This code talk includes a live demo and an open discussion about how builders can use the latest AWS developer tools and generative AI to build production-ready applications in minutes. Starting at an Amazon CodeCatalyst blueprint and using integrated AWS productivity and security capabilities, see a glimpse of what the future holds for developing on AWS.

DOP204 | Tips and tricks for coding with Amazon CodeWhisperer Generative AI tools that can generate code suggestions, such as Amazon CodeWhisperer, are growing rapidly in popularity. Join this code talk to learn how CodeWhisperer can accelerate and enhance your software development with code generation, reference tracking, security scans, and more. Learn best practices for prompt engineering, and get tips and tricks that can help you be more productive when building applications.

Want to stay connected?

Get the latest updates for DevOps and Developer Productivity by following us on Twitter and visiting the AWS DevOps Blog.

Blue/Green deployments using AWS CDK Pipelines and AWS CodeDeploy

2023-10-10 Luiz Decaro

Post Syndicated from Luiz Decaro original https://aws.amazon.com/blogs/devops/blue-green-deployments-using-aws-cdk-pipelines-and-aws-codedeploy/

Customers often ask for help with implementing Blue/Green deployments to Amazon Elastic Container Service (Amazon ECS) using AWS CodeDeploy. Their use cases usually involve cross-Region and cross-account deployment scenarios. These requirements are challenging enough on their own, but in addition to those, there are specific design decisions that need to be considered when using CodeDeploy. These include how to configure CodeDeploy, when and how to create CodeDeploy resources (such as Application and Deployment Group), and how to write code that can be used to deploy to any combination of account and Region.

Today, I will discuss those design decisions in detail and how to use CDK Pipelines to implement a self-mutating pipeline that deploys services to Amazon ECS in cross-account and cross-Region scenarios. At the end of this blog post, I also introduce a demo application, available in Java, that follows best practices for developing and deploying cloud infrastructure using AWS Cloud Development Kit (AWS CDK).

The Pipeline

CDK Pipelines is an opinionated construct library used for building pipelines with different deployment engines. It abstracts implementation details that developers or infrastructure engineers need to solve when implementing a cross-Region or cross-account pipeline. For example, in cross-Region scenarios, AWS CloudFormation needs artifacts to be replicated to the target Region. For that reason, AWS Key Management Service (AWS KMS) keys, an Amazon Simple Storage Service (Amazon S3) bucket, and policies need to be created for the secondary Region. This enables artifacts to be moved from one Region to another. In cross-account scenarios, CodeDeploy requires a cross-account role with access to the KMS key used to encrypt configuration files. This is the sort of detail that our customers want to avoid dealing with manually.

AWS CodeDeploy is a deployment service that automates application deployment across different scenarios. It deploys to Amazon EC2 instances, On-Premises instances, serverless Lambda functions, or Amazon ECS services. It integrates with AWS Identity and Access Management (AWS IAM), to implement access control to deploy or re-deploy old versions of an application. In the Blue/Green deployment type, it is possible to automate the rollback of a deployment using Amazon CloudWatch Alarms.

CDK Pipelines was designed to automate AWS CloudFormation deployments. Using AWS CDK, these CloudFormation deployments may include deploying application software to instances or containers. However, some customers prefer using CodeDeploy to deploy application software. In this blog post, CDK Pipelines will deploy using CodeDeploy instead of CloudFormation.

A pipeline built with CDK Pipelines that deploys to Amazon ECS using AWS CodeDeploy. It contains at least 5 stages: Source, Build, UpdatePipeline, Assets, and at least one Deployment stage.

Design Considerations

In this post, I consider the use of CDK Pipelines to implement different use cases for deploying a service to any combination of accounts (single-account and cross-account) and Regions (single-Region and cross-Region) using CodeDeploy. More specifically, there are four problems that need to be solved, and the sections that follow address them.

CodeDeploy Configuration

The most popular options for implementing a Blue/Green deployment type using CodeDeploy are using CloudFormation Hooks or using a CodeDeploy construct. I decided to operate CodeDeploy using its configuration files. This is a flexible design that doesn’t rely on custom resources, which is another technique customers have used to solve this problem. On each run, the pipeline pushes a container image to a repository on Amazon Elastic Container Registry (Amazon ECR) and creates a tag. CodeDeploy needs that information to deploy the container.

I recommend creating a pipeline action to scan the AWS CDK cloud assembly and retrieve the repository and tag information. The same action can create the CodeDeploy configuration files. Three configuration files are required to configure CodeDeploy: appspec.yaml, taskdef.json and imageDetail.json. This pipeline action should be executed before the CodeDeploy deployment action. I recommend creating template files for appspec.yaml and taskdef.json. The following script can be used to implement the pipeline action:

#!/bin/sh
##
# Action: Configure AWS CodeDeploy
# Customizes the files template-appspec.yaml and template-taskdef.json for the target environment.
#
# Positional arguments, in order:
# Account = The target account ID
# Region = Name of the Region (us-east-1, us-east-2)
# AppName = Name of the application
# StageName = Name of the stage
# PipelineId = ID of the pipeline
# ServiceName = Name of the service. It is used to define the role and the task definition name
#
# The output directory is codedeploy/. All three files created (appspec.yaml, imageDetail.json, and
# taskdef.json) are written to the codedeploy/ directory.
##
Account=$1
Region=$2
AppName=$3
StageName=$4
PipelineId=$5
ServiceName=$6
# Read the ECR repository name and image tag from the asset manifest in the AWS CDK cloud assembly
repo_name=$(cat assembly*"$PipelineId-$StageName"/*.assets.json | jq -r '.dockerImages[] | .destinations[] | .repositoryName' | head -1)
tag_name=$(cat assembly*"$PipelineId-$StageName"/*.assets.json | jq -r '.dockerImages | to_entries[0].key')
echo "${repo_name}"
echo "${tag_name}"
# Write the URI of the container image that was pushed by this pipeline run
printf '{"ImageURI":"%s"}' "$Account.dkr.ecr.$Region.amazonaws.com/${repo_name}:${tag_name}" > codedeploy/imageDetail.json
# Fill in the templates with the application name, task execution role, and task definition name
sed "s#APPLICATION#${AppName}#g" codedeploy/template-appspec.yaml > codedeploy/appspec.yaml
sed "s#APPLICATION#${AppName}#g" codedeploy/template-taskdef.json | sed "s#TASK_EXEC_ROLE#arn:aws:iam::${Account}:role/${ServiceName}#g" | sed "s#fargate-task-definition#${ServiceName}#g" > codedeploy/taskdef.json
cat codedeploy/appspec.yaml
cat codedeploy/taskdef.json
cat codedeploy/imageDetail.json

Using a Toolchain

A good strategy is to encapsulate the pipeline inside a Toolchain to abstract how to deploy to different accounts and Regions. This helps decouple clients from details such as how the pipeline is created, how CodeDeploy is configured, and how cross-account and cross-Region deployments are implemented. To create the pipeline, deploy a Toolchain stack. Out of the box, it allows different environments to be added as needed. Depending on the requirements, the pipeline may be customized to reflect the different stages or waves that different components might require. For more information, please refer to our best practices on how to automate safe, hands-off deployments and its reference implementation.

In detail, the Toolchain stack follows the builder pattern used throughout the CDK for Java. This is a convenience that allows complex objects to be created using a single statement:

 Toolchain.Builder.create(app, Constants.APP_NAME+"Toolchain")
        .stackProperties(StackProps.builder()
                .env(Environment.builder()
                        .account(Demo.TOOLCHAIN_ACCOUNT)
                        .region(Demo.TOOLCHAIN_REGION)
                        .build())
                .build())
        .setGitRepo(Demo.CODECOMMIT_REPO)
        .setGitBranch(Demo.CODECOMMIT_BRANCH)
        .addStage(
                "UAT",
                EcsDeploymentConfig.CANARY_10_PERCENT_5_MINUTES,
                Environment.builder()
                        .account(Demo.SERVICE_ACCOUNT)
                        .region(Demo.SERVICE_REGION)
                        .build())                                                                                                             
        .build();

In the statement above, the continuous deployment pipeline is created in the TOOLCHAIN_ACCOUNT and TOOLCHAIN_REGION. It implements a stage that builds the source code and creates the Java archive (JAR) using Apache Maven. The pipeline then creates a Docker image containing the JAR file.

The UAT stage will deploy the service to the SERVICE_ACCOUNT and SERVICE_REGION using the deployment configuration CANARY_10_PERCENT_5_MINUTES. This means 10 percent of the traffic is shifted in the first increment and the remaining 90 percent is deployed 5 minutes later.
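To make the canary arithmetic concrete, here is a minimal sketch in plain Java with no AWS CDK dependency. The class and method names are hypothetical and exist only for illustration; the sketch models the two-step shift described above rather than anything CodeDeploy exposes as an API.

```java
// Hypothetical helper for illustration; not part of the AWS CDK or CodeDeploy APIs.
final class CanaryShift {

    // Returns the percentage of traffic served by the new (green) version
    // a given number of minutes after the traffic shift begins.
    static int greenTrafficPercent(int canaryPercent, int intervalMinutes, int minutesElapsed) {
        if (canaryPercent < 0 || canaryPercent > 100 || intervalMinutes < 0 || minutesElapsed < 0) {
            throw new IllegalArgumentException("arguments out of range");
        }
        // Until the interval has passed, only the canary share is shifted;
        // afterwards all remaining traffic moves to the new version.
        return minutesElapsed < intervalMinutes ? canaryPercent : 100;
    }

    public static void main(String[] args) {
        // Models CANARY_10_PERCENT_5_MINUTES: 10 percent first, the remaining 90 percent after 5 minutes.
        for (int minute : new int[] {0, 4, 5}) {
            System.out.println("minute " + minute + ": "
                    + greenTrafficPercent(10, 5, minute) + "% on the new version");
        }
    }
}
```

In this model the first increment keeps 90 percent of requests on the current version, which is what gives a rollback triggered during those 5 minutes a limited blast radius.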

To create additional deployment stages, you need a stage name, a CodeDeploy deployment configuration and an environment where it should deploy the service. As mentioned, the pipeline is, by default, a self-mutating pipeline. For example, to add a Prod stage, update the code that creates the Toolchain object and submit this change to the code repository. The pipeline will run and update itself adding a Prod stage after the UAT stage. Next, I show in detail the statement used to add a new Prod stage. The new stage deploys to the same account and Region as in the UAT environment:

... 
        .addStage(
                "Prod",
                EcsDeploymentConfig.CANARY_10_PERCENT_5_MINUTES,
                Environment.builder()
                        .account(Demo.SERVICE_ACCOUNT)
                        .region(Demo.SERVICE_REGION)
                        .build())                                                                                                                                      
        .build();

In the statement above, the Prod stage will deploy new versions of the service using the CodeDeploy deployment configuration CANARY_10_PERCENT_5_MINUTES. This means that 10 percent of the traffic is shifted to the new version in the first increment. After 5 minutes, the remaining 90 percent of the traffic is shifted to the new version of the application. Please refer to the Organizing Your AWS Environment Using Multiple Accounts whitepaper for best practices on how to isolate and manage your business applications.

Some customers might find this approach interesting and decide to provide this as an abstraction to their application development teams. In this case, I advise creating a construct that builds such a pipeline. Using a construct would allow for further customization. Examples are stages that promote quality assurance or deploy the service in a disaster recovery scenario.

The implementation creates a stack for the toolchain and another stack for each deployment stage. As an example, consider a toolchain created with a single deployment stage named UAT. After running successfully, the DemoToolchain and DemoService-UAT stacks should be created as in the next image:

Two stacks are needed to create a Pipeline that deploys to a single environment. One stack deploys the Toolchain with the Pipeline and another stack deploys the Service compute infrastructure and CodeDeploy Application and DeploymentGroup. In this example, for an application named Demo that deploys to an environment named UAT, the stacks deployed are: DemoToolchain and DemoService-UAT

CodeDeploy Application and Deployment Group

CodeDeploy configuration requires an application and a deployment group. Depending on the use case, you need to create these in the same account as the toolchain (pipeline) or in a different one. The pipeline includes the CodeDeploy deployment action that performs the blue/green deployment. My recommendation is to create the CodeDeploy application and deployment group as part of the Service stack. This approach aligns the lifecycle of the CodeDeploy application and deployment group with that of the related Service stack instance.

CodePipeline allows you to create a CodeDeploy deployment action that references a CodeDeploy application and deployment group that do not exist yet. This allows us to implement the following approach:

1. The Toolchain stack deploys the pipeline with a CodeDeploy deployment action that references a CodeDeploy application and deployment group that do not exist yet.
2. When the pipeline executes, it first deploys the Service stack, which creates the related CodeDeploy application and deployment group.
3. The next pipeline action executes the CodeDeploy deployment action. By the time the pipeline executes this action, the related CodeDeploy application and deployment group already exist.

Below is the pipeline code that references the (initially non-existing) CodeDeploy application and deployment group.

private IEcsDeploymentGroup referenceCodeDeployDeploymentGroup(
        final Environment env, 
        final String serviceName, 
        final IEcsDeploymentConfig ecsDeploymentConfig, 
        final String stageName) {

    IEcsApplication codeDeployApp = EcsApplication.fromEcsApplicationArn(
            this,
            Constants.APP_NAME + "EcsCodeDeployApp-"+stageName,
            Arn.format(ArnComponents.builder()
                    .arnFormat(ArnFormat.COLON_RESOURCE_NAME)
                    .partition("aws")
                    .region(env.getRegion())
                    .service("codedeploy")
                    .account(env.getAccount())
                    .resource("application")
                    .resourceName(serviceName)
                    .build()));

    IEcsDeploymentGroup deploymentGroup = EcsDeploymentGroup.fromEcsDeploymentGroupAttributes(
            this,
            Constants.APP_NAME + "-EcsCodeDeployDG-"+stageName,
            EcsDeploymentGroupAttributes.builder()
                    .deploymentGroupName(serviceName)
                    .application(codeDeployApp)
                    .deploymentConfig(ecsDeploymentConfig)
                    .build());

    return deploymentGroup;
}

To make this work, you should use the same application name and deployment group name values when creating the CodeDeploy deployment action in the pipeline and when creating the CodeDeploy application and deployment group in the Service stack (where the Amazon ECS infrastructure is deployed). This approach is necessary to avoid a circular dependency error when trying to create the CodeDeploy application and deployment group inside the Service stack and reference these objects to configure the CodeDeploy deployment action inside the pipeline. Below is the code that uses Service stack construct ID to name the CodeDeploy application and deployment group. I set the Service stack construct ID to the same name I used when creating the CodeDeploy deployment action in the pipeline.

   // configure AWS CodeDeploy Application and DeploymentGroup
   EcsApplication app = EcsApplication.Builder.create(this, "BlueGreenApplication")
           .applicationName(id)
           .build();

   EcsDeploymentGroup.Builder.create(this, "BlueGreenDeploymentGroup")
           .deploymentGroupName(id)
           .application(app)
           .service(albService.getService())
           .role(createCodeDeployExecutionRole(id))
           .blueGreenDeploymentConfig(EcsBlueGreenDeploymentConfig.builder()
                   .blueTargetGroup(albService.getTargetGroup())
                   .greenTargetGroup(tgGreen)
                   .listener(albService.getListener())
                   .testListener(listenerGreen)
                   .terminationWaitTime(Duration.minutes(15))
                   .build())
           .deploymentConfig(deploymentConfig)
           .build();
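One way to keep the two sides in sync is to derive the shared name in a single place and use it both for the Service stack construct ID and for the pipeline's reference. The sketch below is plain Java with no AWS CDK dependency. The class, the method names, and the naming scheme (application name, then "Service-", then stage name, which matches the DemoService-UAT stack shown earlier) are assumptions made for illustration, and the account ID is a placeholder.

```java
// Hypothetical helper for illustration; the naming scheme is an assumption, not the demo's actual code.
final class CodeDeployNames {

    // Single source of truth for the name shared by the Service stack construct ID,
    // the CodeDeploy application, and the CodeDeploy deployment group.
    static String serviceName(String appName, String stageName) {
        return appName + "Service-" + stageName;
    }

    // Builds a CodeDeploy application ARN in the colon-separated form
    // arn:aws:codedeploy:<region>:<account>:application:<name>.
    static String applicationArn(String account, String region, String serviceName) {
        return "arn:aws:codedeploy:" + region + ":" + account + ":application:" + serviceName;
    }

    public static void main(String[] args) {
        String name = serviceName("Demo", "UAT");
        System.out.println(name);
        // 111111111111 is a placeholder account ID.
        System.out.println(applicationArn("111111111111", "us-east-1", name));
    }
}
```

Because both the pipeline and the Service stack would call the same function, a renamed stage cannot leave the deployment action pointing at a stale application name.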

CDK Pipelines roles and permissions

CDK Pipelines creates the roles and permissions that the pipeline uses to execute deployments across different combinations of Regions and accounts. When using CodeDeploy in cross-account scenarios, CDK Pipelines deploys a cross-account support stack that creates a pipeline action role for the CodeDeploy action. This cross-account support stack is defined in a JSON file that needs to be published to the AWS CDK assets bucket in the target account. If the pipeline has the self-mutation feature on (the default), the UpdatePipeline stage will run cdk deploy to deploy changes to the pipeline. In cross-account scenarios, this deployment also involves deploying or updating the cross-account support stack. For this, the SelfMutate action in the UpdatePipeline stage needs to assume the CDK file-publishing and deploy roles in the remote account.

The IAM role associated with the AWS CodeBuild project that runs the UpdatePipeline stage does not have these permissions by default. CDK Pipelines cannot grant these permissions automatically, because the information about the permissions that the cross-account stack needs is only available after the AWS CDK app finishes synthesizing. At that point, the permissions that the pipeline has are already locked-in. Hence, for cross-account scenarios, the toolchain should extend the permissions of the pipeline’s UpdatePipeline stage to include the file-publishing and deploy roles.

In cross-account environments, it is possible to manually add these permissions to the UpdatePipeline stage. The Toolchain stack can be used to hide this sort of implementation detail, and a method like the one below can add the missing permissions. For each mapping of stage and environment in the pipeline, it checks whether the target account is different from the account where the pipeline is deployed. When that criterion is met, it should grant the UpdatePipeline stage permission to assume the CDK bootstrap roles (tagged using the key aws-cdk:bootstrap-role) in the target account (with the tag value file-publishing or deploy). The example below shows how to add permissions to the UpdatePipeline stage:

private void grantUpdatePipelineCrossAccoutPermissions(Map<String, Environment> stageNameEnvironment) {

    if (!stageNameEnvironment.isEmpty()) {

        this.pipeline.buildPipeline();
        for (String stage : stageNameEnvironment.keySet()) {

            HashMap<String, String[]> condition = new HashMap<>();
            condition.put(
                    "iam:ResourceTag/aws-cdk:bootstrap-role",
                    new String[] {"file-publishing", "deploy"});
            pipeline.getSelfMutationProject()
                    .getRole()
                    .addToPrincipalPolicy(PolicyStatement.Builder.create()
                            .actions(Arrays.asList("sts:AssumeRole"))
                            .effect(Effect.ALLOW)
                            .resources(Arrays.asList("arn:*:iam::"
                                    + stageNameEnvironment.get(stage).getAccount() + ":role/*"))
                            .conditions(new HashMap<String, Object>() {{
                                    put("ForAnyValue:StringEquals", condition);
                            }})
                            .build());
        }
    }
}
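The account comparison described above can be sketched separately from the CDK calls. The following plain Java example assumes a simplified map of stage name to target account ID instead of the CDK Environment type. The class and method names are hypothetical, and the account IDs are placeholders.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; plain Java with no AWS CDK dependency.
final class CrossAccountRoles {

    // Returns one IAM role ARN pattern per distinct target account that differs
    // from the account where the pipeline itself is deployed.
    static List<String> roleArnPatterns(String pipelineAccount, Map<String, String> stageToAccount) {
        List<String> patterns = new ArrayList<>();
        for (String account : stageToAccount.values()) {
            String pattern = "arn:*:iam::" + account + ":role/*";
            // Same-account stages need no extra sts:AssumeRole permission; duplicates are skipped.
            if (!account.equals(pipelineAccount) && !patterns.contains(pattern)) {
                patterns.add(pattern);
            }
        }
        return patterns;
    }

    public static void main(String[] args) {
        Map<String, String> stages = new LinkedHashMap<>();
        stages.put("UAT", "111111111111");  // same account as the pipeline: skipped
        stages.put("Prod", "222222222222"); // different account: needs sts:AssumeRole
        System.out.println(roleArnPatterns("111111111111", stages));
    }
}
```

Each returned pattern corresponds to the resource of one policy statement like the one built above, still narrowed by the condition on the aws-cdk:bootstrap-role tag.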

The Deployment Stage

Let’s consider a pipeline that has a single deployment stage, UAT. The UAT stage deploys a DemoService. For that, it requires four actions: DemoService-UAT (Prepare and Deploy), ConfigureBlueGreenDeploy and Deploy.

The DemoService-UAT.Deploy action will create the ECS resources and the CodeDeploy application and deployment group. The ConfigureBlueGreenDeploy action will read the AWS CDK cloud assembly. It uses the configuration files to identify the Amazon Elastic Container Registry (Amazon ECR) repository and the container image tag pushed. The pipeline will send this information to the Deploy action. The Deploy action starts the deployment using CodeDeploy.

Solution Overview

As a convenience, I created an application, written in Java, that solves all these challenges and can be used as an example. The application deployment follows the same 5 steps for all deployment scenarios of account and Region, and this includes the scenarios represented in the following design:

A pipeline created by a Toolchain should be able to deploy to any combination of accounts and Regions. This includes four scenarios: single-account and single-Region, single-account and cross-Region, cross-account and single-Region, and cross-account and cross-Region.
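As a rough illustration of how the four scenarios follow from two comparisons, the sketch below classifies a deployment from the pipeline's and the target's account and Region. It is plain Java, the class and method names are hypothetical, and the account IDs are placeholders.

```java
// Hypothetical helper for illustration; not part of the demo application.
final class DeploymentScenario {

    // Compares the pipeline's environment with the target environment and
    // names which of the four account and Region combinations applies.
    static String classify(String pipelineAccount, String pipelineRegion,
                           String targetAccount, String targetRegion) {
        String accountPart = pipelineAccount.equals(targetAccount) ? "single-account" : "cross-account";
        String regionPart = pipelineRegion.equals(targetRegion) ? "single-Region" : "cross-Region";
        return accountPart + " and " + regionPart;
    }

    public static void main(String[] args) {
        System.out.println(classify("111111111111", "us-east-1", "111111111111", "us-east-1"));
        System.out.println(classify("111111111111", "us-east-1", "222222222222", "us-east-2"));
    }
}
```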

Conclusion

In this post, I identified, explained, and solved challenges associated with the creation of a pipeline that deploys a service to Amazon ECS using CodeDeploy in different combinations of accounts and Regions. I also introduced a demo application that implements these recommendations. The sample code can be extended to implement more elaborate scenarios. These scenarios might include automated testing, automated deployment rollbacks, or disaster recovery. I wish you success in your transformative journey.

How GitHub uses GitHub Actions and Actions larger runners to build and test GitHub.com

2023-09-26 Max Wagner

Post Syndicated from Max Wagner original https://github.blog/2023-09-26-how-github-uses-github-actions-and-actions-larger-runners-to-build-and-test-github-com/

The Developer Experience (DX) team at GitHub collaborated with a number of other teams to work on moving our continuous integration (CI) system to GitHub Actions to support the development and scaling demands of our engineering team. Our goal as a team is to enable our engineers to confidently and quickly ship software. To that end, we’ve worked on providing paved paths, a suite of automated tools and applications to streamline our development, runtime platforms, and deployments. Recently, we’ve been working to make our CI experience better by leveraging the newly released GitHub feature, Actions larger runners, to run our CI.

Read on to see how we run 15,000 CI jobs within an hour across 150,000 cores of compute!

Brief history of CI at GitHub

GitHub has invested in a variety of different CI systems throughout its history. With each system, our aim has been to enhance the development experience for both GitHub engineers writing and deploying code and for engineers maintaining the systems.

However, with past CI systems we faced challenges with scaling the system to meet the needs of our engineering team and with providing both stable and ephemeral build environments. These challenges prevented us from providing the optimal developer experience.

Then, GitHub released GitHub Actions larger runners. This gave us an opportunity not only to transition to a fully featured CI system, but also to develop, experience, and utilize the systems we are creating for our customers and to drive feedback to help build the product. For the GitHub DX team, this transition was a great opportunity to move away from maintaining our past CI systems while delivering a superior developer experience.

What are larger runners?

Larger runners are GitHub Actions runners that are hosted by GitHub. They are managed virtual machines (VMs) with more RAM, CPU, and disk space than standard GitHub-hosted runners. There are a variety of different machine sizes offered for the runners as well as some additional features compared to the standard GitHub-hosted runners.

Larger runners are available to GitHub Team and GitHub Enterprise Cloud customers. Check out these docs to learn more about larger runners.

Why did we pick larger runners?

Autoscaling and managed

Coming from previous iterations of GitHub’s CI systems, we needed the ability to create CI machines on demand to meet the fast feedback cycles needed by GitHub engineers and to scale with the rate of change of the site.

With larger runners, we maintain the ability to autoscale our CI system because GitHub will automatically create multiple instances of a runner that scale up and down to match the job demands of our engineers. An added benefit is that the GitHub DX team no longer has to worry about the scaling of the runners since all of those complexities are handled by GitHub itself!

We wanted to share some raw numbers on our current peak utilization of larger runners:

Uses 4,500 concurrent 32-core runners
Runs 125,000 build minutes per hour
Queues and runs approximately 15,000 jobs within an hour
Allocates around 150,000 cores of compute
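As a rough sanity check, the figures above can be related to each other with simple arithmetic. The sketch below assumes that "build minutes" is the sum of job durations, which the post does not state, so the average job length is only an estimate.

```python
# Peak utilization figures quoted in the post.
concurrent_runners = 4_500
cores_per_runner = 32
build_minutes_per_hour = 125_000
jobs_per_hour = 15_000

# 4,500 runners x 32 cores each, consistent with "around 150,000 cores".
total_cores = concurrent_runners * cores_per_runner

# Assumption: build minutes are the summed durations of the jobs in that hour.
avg_minutes_per_job = build_minutes_per_hour / jobs_per_hour

print(f"{total_cores:,} cores, about {avg_minutes_per_job:.1f} minutes per job")
```

Under that assumption, the numbers work out to 144,000 cores and a little over eight build minutes per job.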

(Beta) Custom VM image support

GitHub Actions provides runners with a lot of tools already baked in, which is sufficient and convenient for a variety of projects across the company. However, for some complex production GitHub services, the prebuilt runners did not satisfy all our requirements.

To maintain an efficient and fast CI system, the DX team needed the ability to provide machines with all the tools needed to build those production services. We didn’t want to spend extra time installing tools or compiling projects during CI jobs.

We are currently building features into larger runners so they have the ability to be launched from a custom VM image, called custom images. While this feature is still in beta, using custom images is a huge benefit to GitHub’s CI lifecycle for a couple of reasons.

First, custom images allow GitHub to bundle all the required software and tools needed to build and test complex production-bearing services. Anything that is unique to GitHub or one of our projects can be pre-installed on the image before a GitHub Actions workflow even starts.

Second, custom images enable GitHub to dramatically speed up our GitHub Actions workflows by acting as a bootstrapping cache for some projects. During custom image creation, we bundle a pre-built version of a project’s source code into the image. Subsequently, when the project starts a GitHub Actions workflow, it can utilize a cached version of its source code, and any other build artifacts, to speed up its build process.

The cached project source code on the custom VM image can quickly become out of date due to the rapid rate of development within GitHub. This, in turn, causes workflow durations to increase. The DX team worked with the GitHub Actions engineering team to create an API on GitHub to regularly update the custom image multiple times a day to keep the project source up to date.

In practice, this has reduced the bootstrapping time of our projects significantly. Without custom images, our workflows would take around 50 minutes from start to finish, versus the 12 minutes they take today. This is a game changer for our engineers.

We’re working on a way to offer this functionality at scale. If you are interested in custom images for your CI/CD workflows, please reach out to your account manager to learn more!

Important GitHub Actions features

There are thousands of projects at GitHub — from services that run production workloads to small tools that need to run CI to perform their daily operations. To make this a reality, GitHub leverages several important features in GitHub Actions that enable us to use the platform efficiently and securely across the company at scale.

Reusable workflows

One of the DX team’s driving goals is to pave paths for all repositories to run CI without introducing unnecessary repetition across repositories. Prior to GitHub Actions, we created single job configurations that could be used across multiple projects. In GitHub Actions, this was not as easy because any repository can define its own workflows. Reusable workflows to the rescue!

The reusable workflows feature in GitHub Actions provides a way to centrally manage a workflow in a repository that can be utilized by many other repositories in an organization. This was critical in our transition from our previous CI system to GitHub Actions. We were able to create several prebuilt workflows in a single repository, and many repositories could then use those workflows. This makes the process of adding CI to an existing or new project very much plug and play.

In our central repository hosting our reusable workflows, we can have workflows defined like:

on:
  workflow_call:
    inputs:
      cibuild-script:
        description: 'Which cibuild script to run.'
        type: string
        required: false
        default: "script/cibuild"
    secrets:
      service-api-key:
        required: true

jobs:
  reusable_workflow_job:
    runs-on: gh-larger-runner-medium
    name: Simple Workflow Job
    timeout-minutes: 20
    steps:
      - name: Checkout Project
        uses: actions/checkout@v3
      - name: Run cibuild script
        run: |
          bash ${{ inputs.cibuild-script }}
        shell: bash

And in consuming repositories, they can simply utilize the reusable workflow, with just a few lines of code!

name: my-new-project
on:
  workflow_dispatch:
  push:

jobs:
  call-reusable-workflow:
    uses: github/internal-actions/.github/workflows/default.yml@main
    with:
      cibuild-script: "script/cibuild-my-tests"
    secrets:
      service-api-key: ${{ secrets.SERVICE_API_KEY }}

Another great benefit of the reusable workflows feature is that the runner can be defined in the Reusable Workflow, meaning that we can guarantee all users of the workflow will run on our designated larger runner pool. Now, projects don’t need to worry about which runner they need to use!

(Beta) Reusing previous workflow outcomes

To optimize our developer experience, the DX team worked with our engineering team to create a feature for GitHub Actions that allows workflows to reuse the outcome of a previous workflow run where the outcomes would be the same.

In some cases, the file contents of a repository are exactly the same between workflow runs that run on different commits. That is, the Git tree ID for the current commit is the same as that of the previous commit (there are no file differences). In these cases, we can bypass CI checks by reusing the previous workflow outcomes, so engineers do not have to wait for CI to run again.

This feature saves GitHub engineers from running anywhere from 300 to 500 workflow runs a day!
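The tree-ID comparison can be illustrated with a small sketch. This is not how the GitHub Actions feature is implemented; it only shows the idea, using the standard `git rev-parse <commit>^{tree}` syntax to resolve a commit to its tree ID. The function names and the shape of `previous_runs` are hypothetical.

```python
import subprocess


def tree_id(commit, repo_dir="."):
    """Resolve a commit to its Git tree ID using `git rev-parse <commit>^{tree}`."""
    result = subprocess.run(
        ["git", "rev-parse", f"{commit}^{{tree}}"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def reusable_outcome(current_tree, previous_runs):
    """Return the outcome of the newest earlier run on an identical tree, or None.

    previous_runs is an iterable of (tree_id, outcome) pairs, newest first.
    """
    for previous_tree, outcome in previous_runs:
        if previous_tree == current_tree:
            return outcome
    return None
```

A caller would compute `tree_id("HEAD")` for the new commit and skip CI only when `reusable_outcome` returns a stored result.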

Other challenges faced

Private service access

During some internal GitHub Actions workflow runs, the workflows need the ability to access some GitHub private services, within a GitHub virtual private cloud (VPC), over the network. These could be resources such as artifact storage, application metadata services, and other services that enable invocation of our test harness.

When we moved to larger runners, this requirement to access private services became a top-of-mind concern. In previous iterations of our CI infrastructure, these private services were accessible through other cloud and network configurations. However, larger runners are isolated from other production environments, meaning they cannot access our private services.

Like all companies, we need to focus on both the security of our platform as well as the developer experience. To satisfy these two requirements, GitHub developed a remote access solution that allows clients residing outside of our VPCs (larger runners) to securely access select private services.

This remote access solution works on the principle of minting an OIDC token in GitHub Actions, passing the OIDC token to a remote access gateway that authorizes the request by validating the OIDC token, and then proxying the request to the private service residing in a private network.

Flow diagram showing an OIDC token being minted in GitHub Actions, passed to a remote access gateway that authorizes the request by validating the OIDC token, and then proxying the request to the private service residing in a private network.

With this solution we are able to securely provide remote access from larger runners running GitHub Actions to our private resources within our VPC.
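To make the authorization step more concrete, here is a minimal sketch of the claim checks a gateway might apply once it has a token. It is an illustration under assumptions, not GitHub's gateway: the audience value and the repository allowlist are hypothetical, and signature verification against the issuer's published keys, which a real gateway must perform first, is deliberately omitted.

```python
import base64
import json

# Issuer used by GitHub Actions OIDC tokens.
GITHUB_OIDC_ISSUER = "https://token.actions.githubusercontent.com"


def decode_claims(jwt_token):
    """Decode the JWT payload. This does NOT verify the signature."""
    payload_b64 = jwt_token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))


def is_authorized(claims, expected_audience, allowed_repositories):
    """Check issuer, audience, and repository claims against the gateway policy."""
    return (
        claims.get("iss") == GITHUB_OIDC_ISSUER
        and claims.get("aud") == expected_audience
        and claims.get("repository") in allowed_repositories
    )
```

Only requests that pass these checks, after signature validation, would be proxied to the private service.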

GitHub has open sourced the basic scaffolding of this remote access gateway in the github/actions-oidc-gateway-example repository, so be sure to check it out!

Conclusion

GitHub Actions provides a robust and smooth developer experience for GitHub engineers working on GitHub.com. We have been able to accomplish this by using the power of GitHub Actions features, such as reusable workflows and reusable workflow outcomes, and by leveraging the scalability and manageability of the GitHub Actions larger runners. We have also used this effort to enhance the GitHub Actions product. To put it simply, GitHub runs on GitHub.

The post How GitHub uses GitHub Actions and Actions larger runners to build and test GitHub.com appeared first on The GitHub Blog.

How to add notifications and manual approval to an AWS CDK Pipeline

2023-08-17 Jehu Gray

Post Syndicated from Jehu Gray original https://aws.amazon.com/blogs/devops/how-to-add-notifications-and-manual-approval-to-an-aws-cdk-pipeline/

A deployment pipeline typically comprises several stages such as dev, test, and prod, which ensure that changes undergo testing before reaching the production environment. To improve the reliability and stability of release processes, DevOps teams must review Infrastructure as Code (IaC) changes before applying them in production. As a result, implementing a mechanism for notification and manual approval that grants stakeholders improved access to changes in their release pipelines has become a popular practice for DevOps teams.

Notifications keep development teams and stakeholders informed in real-time about updates and changes to deployment status within release pipelines. Manual approvals establish thresholds for transitioning a change from one stage to the next in the pipeline. They also act as a guardrail to mitigate risks arising from errors and rework because of faulty deployments.

Please note that manual approvals, as described in this post, are not a replacement for the use of automation. Instead, they complement automated checks within the release pipeline.

In this blog post, we describe how to set up notifications and add a manual approval stage to AWS Cloud Development Kit (AWS CDK) Pipeline.

Concepts

CDK Pipeline

CDK Pipelines is a construct library for painless continuous delivery of CDK applications. CDK Pipelines can automatically build, test, and deploy changes to CDK resources. CDK Pipelines are self-mutating, which means that as application stages or stacks are added, the pipeline automatically reconfigures itself to deploy those new stages or stacks. Pipelines need only be manually deployed once. Afterwards, the pipeline keeps itself up to date from the source code repository by pulling the changes pushed to the repository.

Notifications

Adding notifications to a pipeline, through the NotificationRule construct, provides visibility into changes made to the environment. You can also use this rule to notify pipeline users of important changes, such as when a pipeline starts execution. Notification rules specify both the events and the targets, such as an Amazon Simple Notification Service (Amazon SNS) topic or AWS Chatbot clients configured for Slack, which represent the nominated recipients of the notifications. An SNS topic is a logical access point that acts as a communication channel, while Chatbot is an AWS service that enables DevOps and software development teams to use messaging program chat rooms to monitor and respond to operational events.

Manual Approval

In a CDK pipeline, you can incorporate an approval action at a specific stage, where the pipeline should pause, allowing a team member or designated reviewer to manually approve or reject the action. When an approval action is ready for review, a notification is sent out to alert the relevant parties. This combination of notifications and approvals ensures timely and efficient decision-making regarding crucial actions within the pipeline.

Solution Overview

The solution uses a simple web service composed of an AWS Lambda function that returns a static web page served by Amazon API Gateway. Since Continuous Integration and Continuous Deployment (CI/CD) are important components of most web projects, the team implements a CDK Pipeline for their web project.

There are two important stages in this CDK pipeline: the Pre-production stage for testing and the Production stage, which contains the end product for users.

The flow of the CI/CD process to update the website starts when a developer pushes a change to the repository using their Integrated Development Environment (IDE). An Amazon CloudWatch event triggers the CDK Pipeline. Once the changes reach the pre-production stage for testing, the CI/CD process halts. This is because a manual approval gate is between the pre-production and production stages. So, it becomes a stakeholder’s responsibility to review the changes in the pre-production stage before approving them for production. The pipeline includes an SNS notification that notifies the stakeholder whenever the pipeline requires manual approval.

After approving the changes, the CI/CD process proceeds to the production stage and the updated version of the website becomes available to the end user. If the approver rejects the changes, the process ends at the pre-production stage with no impact to the end user.

The following diagram illustrates the solution architecture.

Figure 1. This image shows the CDK pipeline process in our solution and how applications or updates are deployed using AWS Lambda Function to end users.

Prerequisites

For this walkthrough, you should have the following prerequisites:

An AWS account
Install Python version 3.6 or later
A basic understanding of CDK and CDK Pipelines. Please go through the Python Workshop on cdkworkshop.com to follow along with the code examples and get hands-on learning about CDK and related concepts.
Install AWS CDK version 2.73.0 or later
Set up a CDK pipeline, and have a basic understanding of how SNS works.
Since the pipeline stack is being modified, there may be a need to run cdk deploy locally again.
NOTE: The CDK Pipeline code structure used in the CDK workshop can be found here: Pipeline stack Code.

Add notification to the pipeline

In this tutorial, perform the following steps:

Add the import statements for AWS CodeStar notifications and SNS to the import section of pipeline_stack.py:

import aws_cdk.aws_codestarnotifications as notifications
import aws_cdk.pipelines as pipelines
import aws_cdk.aws_sns as sns
import aws_cdk.aws_sns_subscriptions as subs

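Ensure the pipeline is built by calling the build_pipeline() method. The pipeline.pipeline property, which is used as the notification source, is only available after the pipeline has been built.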

pipeline.build_pipeline()

Create an SNS topic.

topic = sns.Topic(self, "MyTopic1")

Add a subscription to the topic. This specifies where the notifications are sent (Add the stakeholders’ email here).

topic.add_subscription(subs.EmailSubscription("[email protected]"))

Define a rule. This contains the source for notifications, the event trigger, and the target.

rule = notifications.NotificationRule(self, "NotificationRule", )

Set source to pipeline.pipeline. The first pipeline is the variable that holds the CDK pipeline, and .pipeline is the property that exposes the underlying AWS CodePipeline pipeline.

source=pipeline.pipeline,

Define the events to be monitored. Specify notifications for when the pipeline starts, when it fails, when the execution succeeds, and finally when manual approval is needed.

events=["codepipeline-pipeline-pipeline-execution-started", "codepipeline-pipeline-pipeline-execution-failed","codepipeline-pipeline-pipeline-execution-succeeded", 
"codepipeline-pipeline-manual-approval-needed"],

For the complete list of supported event types for pipelines, see here.
Finally, add the target. The target here is the topic created previously.

targets=[topic]

The combination of all the steps becomes:

pipeline.build_pipeline()
topic = sns.Topic(self, "MyTopic1")
topic.add_subscription(subs.EmailSubscription("[email protected]"))
rule = notifications.NotificationRule(self, "NotificationRule",
    source=pipeline.pipeline,
    events=[
        "codepipeline-pipeline-pipeline-execution-started",
        "codepipeline-pipeline-pipeline-execution-failed",
        "codepipeline-pipeline-pipeline-execution-succeeded",
        "codepipeline-pipeline-manual-approval-needed",
    ],
    targets=[topic]
)

Adding Manual Approval

Add the ManualApprovalStep import to the aws_cdk.pipelines import statement.

from aws_cdk.pipelines import (
    CodePipeline,
    CodePipelineSource,
    ShellStep,
    ManualApprovalStep
)

Add the ManualApprovalStep to the production stage. Pass it in the pre argument of the add_stage() call.

prod = WorkshopPipelineStage(self, "Prod")
prod_stage = pipeline.add_stage(prod,
    pre=[ManualApprovalStep('PromoteToProduction')])

When a stage is added to a pipeline, you can specify the pre and post steps, which are arbitrary steps that run before or after the contents of the stage. You can use them to add validations like manual or automated gates to the pipeline. It is recommended to put manual approval gates in the set of pre steps, and automated approval gates in the set of post steps. So, the manual approval action is added as a pre step that runs after the pre-production stage and before the production stage.

The final version of the pipeline_stack.py becomes:

from constructs import Construct
import aws_cdk as cdk
import aws_cdk.aws_codestarnotifications as notifications
import aws_cdk.aws_sns as sns
import aws_cdk.aws_sns_subscriptions as subs
from aws_cdk import (
    Stack,
    aws_codecommit as codecommit,
    aws_codepipeline as codepipeline,
    pipelines as pipelines,
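    aws_codepipeline_actions as cpactions,
)
from aws_cdk.pipelines import (
    CodePipeline,
    CodePipelineSource,
    ShellStep,
    ManualApprovalStep
)
# WorkshopPipelineStage comes from the CDK workshop project. The module path
# below assumes the workshop's layout and may differ in your project.
from cdk_workshop.pipeline_stage import WorkshopPipelineStage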


class WorkshopPipelineStack(cdk.Stack):
    def __init__(self, scope: Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)
        
        # Creates a CodeCommit repository called 'WorkshopRepo'
        repo = codecommit.Repository(
            self, "WorkshopRepo", repository_name="WorkshopRepo",
            
        )
        
        #Create the Cdk pipeline
        pipeline = pipelines.CodePipeline(
            self,
            "Pipeline",
            
            synth=pipelines.ShellStep(
                "Synth",
                input=pipelines.CodePipelineSource.code_commit(repo, "main"),
                commands=[
                    "npm install -g aws-cdk",  # Installs the cdk cli on Codebuild
                    "pip install -r requirements.txt",  # Instructs Codebuild to install required packages
                    "npx cdk synth",
                ]
                
            ),
        )

        
        # Create the Pre-Prod Stage and its API endpoint
        deploy = WorkshopPipelineStage(self, "Pre-Prod")
        deploy_stage = pipeline.add_stage(deploy)
    
        deploy_stage.add_post(
            
            pipelines.ShellStep(
                "TestViewerEndpoint",
                env_from_cfn_outputs={
                    "ENDPOINT_URL": deploy.hc_viewer_url
                },
                commands=["curl -Ssf $ENDPOINT_URL"],
            )
    
        
        )
        deploy_stage.add_post(
            pipelines.ShellStep(
                "TestAPIGatewayEndpoint",
                env_from_cfn_outputs={
                    "ENDPOINT_URL": deploy.hc_endpoint
                },
                commands=[
                    "curl -Ssf $ENDPOINT_URL",
                    "curl -Ssf $ENDPOINT_URL/hello",
                    "curl -Ssf $ENDPOINT_URL/test",
                ],
            )
            
        )
        
        # Create the Prod Stage with the Manual Approval Step
        prod = WorkshopPipelineStage(self, "Prod")
        prod_stage = pipeline.add_stage(prod,
            pre = [ManualApprovalStep('PromoteToProduction')])
        
        prod_stage.add_post(
            
            pipelines.ShellStep(
                "ViewerEndpoint",
                env_from_cfn_outputs={
                    "ENDPOINT_URL": prod.hc_viewer_url
                },
                commands=["curl -Ssf $ENDPOINT_URL"],
                
            )
            
        )
        prod_stage.add_post(
            pipelines.ShellStep(
                "APIGatewayEndpoint",
                env_from_cfn_outputs={
                    "ENDPOINT_URL": prod.hc_endpoint
                },
                commands=[
                    "curl -Ssf $ENDPOINT_URL",
                    "curl -Ssf $ENDPOINT_URL/hello",
                    "curl -Ssf $ENDPOINT_URL/test",
                ],
            )
            
        )
        
        # Create The SNS Notification for the Pipeline
        
        pipeline.build_pipeline()
        
        topic = sns.Topic(self, "MyTopic")
        topic.add_subscription(subs.EmailSubscription("[email protected]"))
        rule = notifications.NotificationRule(self, "NotificationRule",
            source = pipeline.pipeline,
            events = ["codepipeline-pipeline-pipeline-execution-started", "codepipeline-pipeline-pipeline-execution-failed", "codepipeline-pipeline-manual-approval-needed", "codepipeline-pipeline-manual-approval-succeeded"],
            targets=[topic]
            )

When a commit is made with git commit -am "Add manual Approval" and changes are pushed with git push, the pipeline automatically self-mutates to add the new approval stage.

Now when the developer pushes changes to update the build environment or the end user application, the pipeline execution stops at the point where the approval action was added. The pipeline won’t resume unless a manual approval action is taken.

Figure 2. This image shows the pipeline with the added Manual Approval action.

Since there is a notification rule that includes the approval action, an email notification is sent with the pipeline information and approval status to the stakeholder(s) subscribed to the SNS topic.

Figure 3. This image shows the SNS email notification sent when the pipeline starts.
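If the same SNS topic also feeds an automated subscriber, the notification body can be summarized in a few lines. The sketch below assumes the message is a JSON event with `detailType` and `detail` fields (including `pipeline` and `state`); that shape is an assumption made for illustration, so check a real message from your topic before relying on it. The function name is hypothetical.

```python
import json


def summarize_notification(message_body):
    """Build a one-line summary from a notification message (assumed JSON shape)."""
    event = json.loads(message_body)
    detail = event.get("detail", {})
    return "{pipeline}: {kind} -> {state}".format(
        pipeline=detail.get("pipeline", "unknown"),
        kind=event.get("detailType", "unknown event"),
        state=detail.get("state", "unknown"),
    )
```

Missing fields fall back to "unknown" rather than raising, which keeps the summary usable if the payload differs from the assumed shape.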

After the updates are pushed to the pipeline, the reviewer or stakeholder can use the AWS Management Console to access the pipeline and approve or reject the changes based on their assessment. This process helps catch potential issues or errors and ensures that only changes deemed relevant are made.

Figure 4. This image shows the review action that gives the stakeholder the ability to approve or reject any changes.
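The post uses the console for the review. As an alternative sketch, the same decision could be recorded through the CodePipeline API. The helper below assumes `client` is a boto3 CodePipeline client (`boto3.client("codepipeline")`) and uses its `get_pipeline_state` and `put_approval_result` operations; the helper name is hypothetical, and the stage and action names you pass must match what your deployed pipeline actually shows.

```python
def record_approval(client, pipeline_name, stage_name, action_name, approve, summary):
    """Approve or reject a pending manual approval action.

    Returns True if a pending approval token was found and a result submitted,
    False otherwise. `client` is assumed to be a boto3 CodePipeline client.
    """
    state = client.get_pipeline_state(name=pipeline_name)
    for stage in state["stageStates"]:
        if stage["stageName"] != stage_name:
            continue
        for action in stage.get("actionStates", []):
            # A token is only present while the approval is waiting for a response.
            token = action.get("latestExecution", {}).get("token")
            if action["actionName"] == action_name and token:
                client.put_approval_result(
                    pipelineName=pipeline_name,
                    stageName=stage_name,
                    actionName=action_name,
                    result={
                        "summary": summary,
                        "status": "Approved" if approve else "Rejected",
                    },
                    token=token,
                )
                return True
    return False
```

Passing the client in as a parameter keeps the helper easy to exercise with a stub before pointing it at a real pipeline.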

If a reviewer rejects the action, or if no approval response is received within seven days of the pipeline stopping for the review action, the pipeline status is “Failed.”

Figure 5. This image depicts when a stakeholder rejects the action.

If a reviewer approves the changes, the pipeline continues its execution.

Figure 6. This image depicts when a stakeholder approves the action.

Considerations

It is important to consider any potential drawbacks before integrating a manual approval process into a CDK pipeline. One such consideration is that it may delay the delivery of updates to end users. For example, the pipeline might be constrained by the availability of stakeholders during business hours. This can result in delays if changes are made outside regular working hours and require approval when stakeholders are not immediately accessible.

Clean up

To avoid incurring future charges, delete the resources. Use cdk destroy via the command line to delete the created stack.

Conclusion

Adding notifications and manual approval to CDK Pipelines provides better visibility and control over the changes made to the pipeline environment. These features ideally complement the existing automated checks to ensure that all updates are reviewed before deployment. This reduces the risk of potential issues arising from bugs or errors. The ability to approve or deny changes through the AWS Management Console makes the review process simple and straightforward. Additionally, SNS notifications keep stakeholders updated on the status of the pipeline, ensuring a smooth and seamless deployment process.

Deploy serverless applications in a multicloud environment using Amazon CodeCatalyst

2023-07-31 Deepak Kovvuri

Post Syndicated from Deepak Kovvuri original https://aws.amazon.com/blogs/devops/deploy-serverless-applications-in-a-multicloud-environment-using-amazon-codecatalyst/

Amazon CodeCatalyst is an integrated service for software development teams adopting continuous integration and deployment practices into their software development process. CodeCatalyst puts the tools you need all in one place. You can plan work, collaborate on code, and build, test, and deploy applications by leveraging CodeCatalyst Workflows.

Introduction

In the first post of the blog series, we showed you how organizations can deploy workloads to instances and virtual machines (VMs) across hybrid and multicloud environments. The second post of the series covered deploying containerized applications in a multicloud environment. Finally, in this post, we explore how organizations can deploy modern, cloud-native, serverless applications across multiple cloud platforms. Figure 1 shows the solution which we walk through in the post.

Figure 1 – Architecture diagram

The post walks through how to develop, deploy, and test an HTTP RESTful API on Azure Functions using Amazon CodeCatalyst. The solution covers the following steps:

Set up CodeCatalyst development environment and develop your application using the Serverless Framework.
Build a CodeCatalyst workflow to test and then deploy to Azure Functions using GitHub Actions in Amazon CodeCatalyst.

An Amazon CodeCatalyst workflow is an automated procedure that describes how to build, test, and deploy your code as part of a continuous integration and continuous delivery (CI/CD) system. You can use GitHub Actions alongside native CodeCatalyst actions in a CodeCatalyst workflow.

Pre-requisites

An AWS Builder ID for signing into CodeCatalyst.
A CodeCatalyst space that you create, or an existing space that you have been added to with the space administrator role assigned to you. For more information, see Creating a space in CodeCatalyst, Managing members of your space, and Space administrator role.
Access to an Azure account and credentials for a service principal that has permissions to create and manage Azure Functions.

Walkthrough

In this post, we will create a hello world RESTful API using the Serverless Framework. As we progress through the solution, we will focus on building a CodeCatalyst workflow that deploys and tests the functionality of the application. At the end of the post, the workflow will look similar to the one shown in Figure 2.

Figure 2 – CodeCatalyst CI/CD workflow

Environment Setup

Before we start developing the application, we need to set up a CodeCatalyst project and then link a code repository to the project. The code repository can be a CodeCatalyst repository or a GitHub repository. In this scenario, we’ve used a GitHub repository. Once the solution is developed, the repository should look as shown below.

Figure 3 – Files in GitHub repository

In Amazon CodeCatalyst, there’s an option to create Dev Environments, which can be used to work on the code stored in the source repositories of a project. In the post, we create a Dev Environment, associate it with the source repository created above, and work from it. You may choose not to use a Dev Environment; in that case, run the following commands and commit to the repository. The /projects directory of a Dev Environment stores the files that are pulled from the source repository. In the Dev Environment, install the Serverless Framework using this command:

npm install -g serverless

and then initialize a serverless project in the source repository folder:

├── README.md
├── host.json
├── package.json
├── serverless.yml
└── src
    └── handlers
        ├── goodbye.js
        └── hello.js

We can push the code to the CodeCatalyst project using git. Now that we have the code in CodeCatalyst, we can turn our focus to building the workflow using the CodeCatalyst console.

CI/CD Setup in CodeCatalyst

Configure access to the Azure Environment

We’ll use the GitHub action for Serverless to create and manage Azure Functions. For the action to be able to access the Azure environment, it requires credentials associated with a Service Principal, passed to the action as environment variables.

Service Principals in Azure are identified by the CLIENT_ID, CLIENT_SECRET, SUBSCRIPTION_ID, and TENANT_ID properties. Storing these values in plaintext anywhere in your repository should be avoided because anyone with access to the repository which contains the secret can see them. Similarly, these values shouldn’t be used directly in any workflow definitions because they will be visible as files in your repository. With CodeCatalyst, we can protect these values by storing them as secrets within the project, and then reference the secret in the CI\CD workflow.

We can create a secret by choosing Secrets (1) under CI\CD and then selecting ‘Create Secret’ (2) as shown in Figure 4. Now, we can key in the secret name and value of each of the identifiers described above.

Figure 4 – CodeCatalyst Secrets

Building the workflow

To create a new workflow, select CI/CD from the navigation on the left and then select Workflows (1). Then, select Create workflow (2), leave the default options, and select Create (3) as shown in Figure 5.

Figure 5 – Create CI/CD workflow

If the workflow editor opens in YAML mode, select Visual to open the visual designer. Now, we can start adding actions to the workflow.

Configure the Deploy action

We’ll begin by adding a GitHub action for deploying to Azure. Select “+ Actions” to open the actions list and choose GitHub from the dropdown menu. Find the Build action and click “+” to add a new GitHub action to the workflow.

Next, configure the GitHub action from the configurations tab by adding the following snippet to the GitHub Actions YAML property (replace <version> with the release of the Serverless GitHub Action that you want to pin):

- name: Deploy to Azure Functions
  uses: serverless/github-action@<version>
  with:
    args: -c "serverless plugin install --name serverless-azure-functions && serverless deploy"
    entrypoint: /bin/sh
  env:
    AZURE_SUBSCRIPTION_ID: ${Secrets.SUBSCRIPTION_ID}
    AZURE_TENANT_ID: ${Secrets.TENANT_ID}
    AZURE_CLIENT_ID: ${Secrets.CLIENT_ID}
    AZURE_CLIENT_SECRET: ${Secrets.CLIENT_SECRET}

The above workflow configuration makes use of the Serverless GitHub Action, which wraps the Serverless Framework to run serverless commands. The action is configured to package and deploy the source code to Azure Functions using the serverless deploy command.

Note how we were able to pass the secrets to the GitHub action by referencing the secret identifiers in the above configuration.

Configure the Test action

Similar to the previous step, we add another GitHub action, which uses the Serverless Framework’s serverless invoke command to test the API deployed to Azure Functions (replace <version> with the release of the Serverless GitHub Action that you want to pin).

- name: Test Function
  uses: serverless/github-action@<version>
  with:
    args: |
      -c "serverless plugin install --name serverless-azure-functions && \
          serverless invoke -f hello -d '{\"name\": \"CodeCatalyst\"}' && \
          serverless invoke -f goodbye -d '{\"name\": \"CodeCatalyst\"}'"
    entrypoint: /bin/sh
  env:
    AZURE_SUBSCRIPTION_ID: ${Secrets.SUBSCRIPTION_ID}
    AZURE_TENANT_ID: ${Secrets.TENANT_ID}
    AZURE_CLIENT_ID: ${Secrets.CLIENT_ID}
    AZURE_CLIENT_SECRET: ${Secrets.CLIENT_SECRET}

The workflow is now ready and can be validated by choosing ‘Validate’ and then saved to the repository by choosing ‘Commit’. The workflow should start automatically after the commit and deploy the application to Azure Functions.

The functionality of the API can now be verified from the logs of the test action of the workflow as shown in Figure 6.

Figure 6 – CI/CD workflow Test action

Cleanup

If you have been following along with this workflow, you should delete the resources you deployed so you do not continue to incur charges. First, delete the Azure Function App (usually prefixed ‘sls’) using the Azure portal. Second, if you no longer need it, delete the project from CodeCatalyst by navigating to Project settings and choosing Delete project. There’s no cost associated with the CodeCatalyst project itself, so you can also keep using it.
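
If you prefer the command line to the portal, the Function App can also be removed with the Azure CLI. The following is a minimal sketch, assuming the Azure CLI is installed and logged in. The function names list_sls_function_apps and delete_function_app are hypothetical helpers, and the ‘sls’ prefix filter simply mirrors the naming noted above.

```shell
# Hypothetical helpers; they only wrap standard "az functionapp" commands.

# Print the name and resource group of every Function App whose name
# starts with "sls", one app per line (tab-separated).
list_sls_function_apps() {
  az functionapp list \
    --query "[?starts_with(name, 'sls')].[name, resourceGroup]" \
    --output tsv
}

# Delete a single Function App.
# $1: Function App name, $2: resource group that contains it
delete_function_app() {
  az functionapp delete --name "$1" --resource-group "$2"
}
```

For example, run list_sls_function_apps to review the candidates, then run delete_function_app <app_name> <resource_group> for each app you want to remove.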

Conclusion

In summary, this post highlighted how Amazon CodeCatalyst can help organizations deploy cloud-native, serverless workloads into a multicloud environment. The post also walked through the process of setting up Amazon CodeCatalyst to deploy a serverless application to Azure Functions by leveraging GitHub Actions. Though we showed an application deployment to Azure Functions, you can follow a similar process and leverage CodeCatalyst to deploy any type of application to almost any cloud platform. Learn more and get started with your Amazon CodeCatalyst journey!

We would love to hear your thoughts and experiences on deploying serverless applications to multiple cloud platforms. Reach out to us if you have any questions, or provide your feedback in the comments section.

Deploy container applications in a multicloud environment using Amazon CodeCatalyst

2023-07-31 Pawan Shrivastava

Post Syndicated from Pawan Shrivastava original https://aws.amazon.com/blogs/devops/deploy-container-applications-in-a-multicloud-environment-using-amazon-codecatalyst/

In the previous post of this blog series, we saw how organizations can deploy workloads to virtual machines (VMs) in a hybrid and multicloud environment. This post shows how organizations can address the requirement of deploying containers, and containerized applications to hybrid and multicloud platforms using Amazon CodeCatalyst. CodeCatalyst is an integrated DevOps service which enables development teams to collaborate on code, and build, test, and deploy applications with continuous integration and continuous delivery (CI/CD) tools.

One prominent scenario where multicloud container deployment is useful is when organizations want to leverage AWS’ broadest and deepest set of Artificial Intelligence (AI) and Machine Learning (ML) capabilities by developing and training AI/ML models in AWS using Amazon SageMaker, and deploying the model package to a Kubernetes platform on other cloud platforms, such as Azure Kubernetes Service (AKS) for inference. As shown in this workshop for operationalizing the machine learning pipeline, we can train an AI/ML model, push it to Amazon Elastic Container Registry (ECR) as an image, and later deploy the model as a container application.

Scenario description

The solution described in the post covers the following steps:

Set up the Amazon CodeCatalyst environment.
Create a Dockerfile along with a manifest for the application, and a repository in Amazon ECR.
Create an Azure service principal that has permissions to deploy resources to Azure Kubernetes Service (AKS), and store the credentials securely in an Amazon CodeCatalyst secret.
Create a CodeCatalyst workflow to build, test, and deploy the containerized application to the AKS cluster using GitHub Actions.

The architecture diagram for the scenario is shown in Figure 1.

Figure 1 – Solution Architecture

Solution Walkthrough

This section shows how to set up the environment and deploy an HTML application to an AKS cluster.

Set up Amazon ECR and GitHub code repository

Create a new Amazon ECR repository and a code repository. In this case we’re using GitHub for the code repository, but you can create a source repository in CodeCatalyst, or link an existing source repository hosted by another service if that service is supported by an installed extension. Then follow the application and Docker image creation steps outlined in Step 1 of the environment creation process in exposing Multiple Applications on Amazon EKS. Create a file named manifest.yaml as shown, and set the “image” parameter to the URL of the Amazon ECR repository created above.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: multicloud-container-deployment-app
  labels:
    app: multicloud-container-deployment-app
spec:
  selector:
    matchLabels:
      app: multicloud-container-deployment-app
  replicas: 2
  template:
    metadata:
      labels:
        app: multicloud-container-deployment-app
    spec:
      nodeSelector:
        "kubernetes.io/os": linux
      containers:
      - name: ecs-web-page-container
        image: <aws_account_id>.dkr.ecr.us-west-2.amazonaws.com/<my_repository>
        imagePullPolicy: Always
        ports:
            - containerPort: 80
        resources:
          limits:
            memory: "100Mi"
            cpu: "200m"
      imagePullSecrets:
          - name: ecrsecret
---
apiVersion: v1
kind: Service
metadata:
  name: multicloud-container-deployment-service
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: multicloud-container-deployment-app

Push the files to the GitHub code repository. The multicloud-container-app GitHub repository should look similar to Figure 2 below.

Figure 2 – Files in Github repository

Configure Azure Kubernetes Service (AKS) cluster to pull private images from ECR repository

To let the AKS cluster pull Docker images from a private ECR repository, create an image pull secret in the cluster. The azure/k8s-deploy GitHub Action in the CI/CD workflow relies on this secret. Run the following command in a shell where the AWS CLI is configured and kubectl is connected to the AKS cluster. The command authenticates to the Amazon ECR registry by using aws ecr get-login-password and creates a secret called ecrsecret, which is used to pull the image from the private ECR repository.

kubectl create secret docker-registry ecrsecret \
 --docker-server=<aws_account_id>.dkr.ecr.us-west-2.amazonaws.com/<my_repository> \
 --docker-username=AWS \
 --docker-password="$(aws ecr get-login-password --region us-west-2)"

Provide the ECR URI as the value of the --docker-server flag.
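
Note that the password returned by aws ecr get-login-password is an authorization token that is valid for 12 hours, so a secret created once stops working for new image pulls after that. One way to handle this is to recreate the secret on a schedule. The sketch below assumes the AWS CLI is configured and kubectl is connected to the AKS cluster. The function name refresh_ecr_pull_secret is hypothetical.

```shell
# Hypothetical helper: create or update the image pull secret with a
# fresh ECR token.
# $1: ECR URI, $2: AWS Region, $3: secret name (defaults to ecrsecret)
refresh_ecr_pull_secret() {
  registry="$1"
  region="$2"
  secret_name="${3:-ecrsecret}"

  # Request a new authorization token from ECR.
  token="$(aws ecr get-login-password --region "$region")" || return 1

  # Render the secret locally, then apply it so that an existing
  # secret is updated in place instead of causing an error.
  kubectl create secret docker-registry "$secret_name" \
    --docker-server="$registry" \
    --docker-username=AWS \
    --docker-password="$token" \
    --dry-run=client -o yaml | kubectl apply -f -
}
```

For example, refresh_ecr_pull_secret <ecr_uri> us-west-2 could be run from a scheduled job at an interval shorter than 12 hours.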

CodeCatalyst setup

Follow these steps to set up the CodeCatalyst environment:

Create a CodeCatalyst space and associated AWS account.
Create a CodeCatalyst project.
Create a CodeCatalyst environment connected to the AWS account where the ECR repository is configured.

Configure access to the AKS cluster

In this solution, we use three GitHub Actions – azure/login, azure/aks-set-context and azure/k8s-deploy – to log in, set the AKS cluster context, and deploy the manifest file to the AKS cluster respectively. For the GitHub Actions to access the Azure environment, they require credentials associated with an Azure Service Principal.

Service Principals in Azure are identified by the CLIENT_ID, CLIENT_SECRET, SUBSCRIPTION_ID, and TENANT_ID properties. Create the Service Principal by running the following command in the Azure Cloud Shell:

az ad sp create-for-rbac \
    --name "ghActionHTMLapplication" \
    --scopes /subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP> \
    --role Contributor \
    --sdk-auth

The command generates JSON output (shown in Figure 3), which you store in a CodeCatalyst secret called AZURE_CREDENTIALS. This credential is used by the azure/login GitHub Action.

Figure 3 – JSON output

Configure secrets inside CodeCatalyst Project

Create three secrets, CLUSTER_NAME (name of the AKS cluster), RESOURCE_GROUP (name of the Azure resource group), and AZURE_CREDENTIALS (described in the previous step), as described in the working with secrets documentation. The secrets are shown in Figure 4.

Figure 4 – CodeCatalyst Secrets

CodeCatalyst CI/CD Workflow

To create a new CodeCatalyst workflow, select CI/CD from the navigation on the left and select Workflows (1). Then, select Create workflow (2), leave the default options, and select Create (3) as shown in Figure 5.

Figure 5 – Create CodeCatalyst CI/CD workflow

Add “Push to Amazon ECR” Action

Add the Push to Amazon ECR action, and configure the environment where you created the ECR repository as shown in Figure 6. Refer to adding an action to learn how to add a CodeCatalyst action.

Figure 6 – Create ‘Push to ECR’ Action

Select the Configuration tab and specify the configurations as shown in Figure 7.

Figure 7 – Configure ‘Push to ECR’ Action

Configure the Deploy action

1. Add a GitHub action for deploying to AKS as shown in Figure 8.

Figure 8 – Github action to deploy to AKS

2. Configure the GitHub action from the configurations tab by adding the following snippet to the GitHub Actions YAML property (replace <version> with the release of azure/login that you want to pin):

- name: Install Azure CLI
  run: pip install azure-cli
- name: Azure login
  id: login
  uses: azure/login@<version>
  with:
    creds: ${Secrets.AZURE_CREDENTIALS}
- name: Set AKS context
  id: set-context
  uses: azure/aks-set-context@v3
  with:
    resource-group: ${Secrets.RESOURCE_GROUP}
    cluster-name: ${Secrets.CLUSTER_NAME}
- name: Setup kubectl
  id: install-kubectl
  uses: azure/setup-kubectl@v3
- name: Deploy to AKS
  id: deploy-aks
  uses: Azure/k8s-deploy@v4
  with:
    namespace: default
    manifests: manifest.yaml
    pull-images: true

Figure 9 – Github action configuration

3. The workflow is now ready and can be validated by choosing ‘Validate’ and then saved to the repository by choosing ‘Commit’.

We have implemented an automated CI/CD workflow (see Figure 10) that builds the container image of the application, pushes the image to ECR, and deploys the application to the AKS cluster. This CI/CD workflow is triggered whenever application code is pushed to the repository.

Figure 10 – Automated CI/CD workflow

Test the deployment

When the HTML application runs, Kubernetes exposes the application using a public-facing load balancer. To find the external IP of the load balancer, connect to the AKS cluster and run the following command:

kubectl get service multicloud-container-deployment-service

The output of the above command should look like the image in Figure 11.

Figure 11 – Output of kubectl get service
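
The load balancer can take a few minutes to receive its external IP, during which the command above shows the address as pending. The following is a minimal sketch of a script that waits for the address, assuming kubectl is connected to the AKS cluster. The function names get_external_ip and wait_for_external_ip, and the default of 30 attempts at 10-second intervals, are illustrative choices rather than part of the original workflow.

```shell
# Hypothetical helpers built on "kubectl get service".

# Print the external IP of a LoadBalancer service (empty while pending).
get_external_ip() {
  kubectl get service "$1" \
    --output jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Poll until the service has an external IP, then print it.
# $1: service name, $2: max attempts (default 30), $3: delay in seconds (default 10)
wait_for_external_ip() {
  service="$1"
  attempts="${2:-30}"
  delay="${3:-10}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    ip="$(get_external_ip "$service" 2>/dev/null || true)"
    if [ -n "$ip" ]; then
      echo "$ip"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "Timed out waiting for an external IP on $service" >&2
  return 1
}
```

For example, curl -fsS "http://$(wait_for_external_ip multicloud-container-deployment-service)/" fetches the page once the address is available.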

Paste the External IP into a browser to see the running HTML application as shown in Figure 12.

Figure 12 – Application running in AKS

Cleanup

If you have been following along with the workflow described in the post, you should delete the resources you deployed so you do not continue to incur charges. First, delete the Amazon ECR repository using the AWS console. Second, if you no longer need it, delete the project from CodeCatalyst by navigating to Project settings and choosing Delete project. There’s no cost associated with the CodeCatalyst project itself, so you can also keep using it. Finally, if you deployed the application on a new AKS cluster, delete the cluster from the Azure portal. If you deployed the application to an existing AKS cluster, run the following commands to delete the application resources.

kubectl delete deployment multicloud-container-deployment-app
kubectl delete services multicloud-container-deployment-service

Conclusion

In summary, this post showed how Amazon CodeCatalyst can help organizations deploy containerized workloads in a hybrid and multicloud environment. It demonstrated in detail how to set up and configure Amazon CodeCatalyst to deploy a containerized application to Azure Kubernetes Service, leveraging a CodeCatalyst workflow and GitHub Actions. Learn more and get started with your Amazon CodeCatalyst journey!

If you have any questions or feedback, leave them in the comments section.

How to deploy workloads in a multicloud environment with AWS developer tools

2023-06-08 Brent Van Wynsberge

Post Syndicated from Brent Van Wynsberge original https://aws.amazon.com/blogs/devops/how-to-deploy-workloads-in-a-multicloud-environment-with-aws-developer-tools/

As organizations embrace cloud computing as part of a “cloud first” strategy and migrate to the cloud, some enterprises end up in a multicloud environment. We see that enterprise customers get the best experience, performance, and cost structure when they choose a primary cloud provider. However, for a variety of reasons, some organizations end up operating in a multicloud environment. For example, in a merger or acquisition, an organization may acquire an entity that runs on a different cloud platform. Another example is an ISV (Independent Software Vendor) that provides services to customers operating on different cloud providers. A third example is an organization that needs to adhere to data residency and data sovereignty requirements and ends up with workloads deployed to multiple cloud platforms across locations.

In the scenarios described above, one of the challenges organizations face in operating such a complex environment is managing the release process (building, testing, and deploying applications at scale) across multiple cloud platforms. If an organization’s primary cloud provider is AWS, it may want to continue using AWS developer tools to deploy workloads in other cloud platforms. Organizations facing such scenarios can leverage AWS services to develop their end-to-end CI/CD and release process instead of developing a release pipeline for each platform, which is complex and not sustainable in the long run.

In this post, we show how organizations can continue using AWS developer tools in a hybrid and multicloud environment. We walk through a scenario where we deploy an application to VMs running on premises and on Azure, showcasing AWS’ hybrid and multicloud DevOps capabilities.

Solution and scenario overview

In this post we’re demonstrating the following steps:

Set up a CI/CD pipeline using AWS CodePipeline, and show how it runs when application code is updated and checked into the code repository (GitHub).
Check out application code from the code repository, use an IDE (Visual Studio Code) to make changes, and check in the code to the code repository.
Check in the modified application code to automatically run the release process built using AWS CodePipeline. It makes use of AWS CodeBuild to retrieve the latest version of the code from the code repository, compile it, build the deployment package, and test the application.
Deploy the updated application to VMs across on-premises and Azure environments using AWS CodeDeploy.

The high-level solution is shown below. This post does not show all of the possible combinations and integrations available to build the CI/CD pipeline. As an example, you can integrate the pipeline with your existing tools for test and build such as Selenium, Jenkins, SonarQube etc.

This post focuses on deploying an application in a multicloud environment, and on how AWS Developer Tools can support virtually any scenario or use case specific to your organization. We will be deploying a sample application from this AWS tutorial to an on-premises server and an Azure Virtual Machine (VM) running Red Hat Enterprise Linux (RHEL). In future posts in this series, we will cover how you can deploy any type of workload using AWS tools, including containers and serverless applications.

Architecture Diagram

CI/CD pipeline setup

This section describes instructions for setting up a multicloud CI/CD pipeline.

Note: The CI/CD pipeline setup, including the related sub-sections in this post, is a one-time activity. You do not need to perform these steps every time an application is deployed or modified.

Install CodeDeploy agent

The AWS CodeDeploy agent is a software package that is used to execute deployments on an instance. You can install the CodeDeploy agent on an on-premises server and Azure VM by either using the command line, or AWS Systems Manager.
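
As an illustration of the command-line route, the following sketch follows the general agent installation steps for RHEL. It assumes that yum, wget, and sudo are available on the VM and that the installer is downloaded from the regional aws-codedeploy-<region> bucket. The function names codedeploy_installer_url and install_codedeploy_agent are hypothetical.

```shell
# Hypothetical helpers for installing the CodeDeploy agent on RHEL.

# Print the download URL of the agent installer for an AWS Region.
codedeploy_installer_url() {
  region="$1"
  echo "https://aws-codedeploy-${region}.s3.${region}.amazonaws.com/latest/install"
}

# Install the agent and print its service status.
# $1: AWS Region that the instance will be registered in
install_codedeploy_agent() {
  # The installer is a Ruby script, so Ruby must be present first.
  sudo yum install -y ruby wget &&
    wget -O /tmp/codedeploy-install "$(codedeploy_installer_url "$1")" &&
    chmod +x /tmp/codedeploy-install &&
    sudo /tmp/codedeploy-install auto &&
    sudo service codedeploy-agent status
}
```

For example, install_codedeploy_agent us-west-2 would be run once on the on-premises server and once on the Azure VM.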

Set up GitHub code repository

Set up the GitHub code repository using the following steps:

Create a new GitHub code repository or use a repository that already exists.
Copy the Sample_App_Linux app (zip) from Amazon S3 as described in Step 3 of Upload a sample application to your GitHub repository tutorial.

Commit the files to the code repository:

git add .
git commit -m 'Initial Commit'
git push

You will use this repository to deploy your code across environments.

Configure AWS CodePipeline

Follow the steps outlined below to set up and configure CodePipeline to orchestrate the CI/CD pipeline of our application.

Navigate to CodePipeline in the AWS console and click on ‘Create pipeline’.
Give your pipeline a name (e.g., MyWebApp-CICD) and allow CodePipeline to create a service role on your behalf.
For the source stage, select GitHub (v2) as your source provider and click on the Connect to GitHub button to give CodePipeline access to your git repository.
Create a new GitHub connection and click on the Install a new App button to install the AWS Connector in your GitHub account.
Back in the CodePipeline console, select the repository and branch you would like to build and deploy.

Image showing the configured source stage

Now create the build stage. Select AWS CodeBuild as the build provider.
Click on the ‘Create project’ button to create the project for your build stage, and give your project a name.
Select Ubuntu as the operating system for your managed image, choose the standard runtime, and select the ‘aws/codebuild/standard’ image with the latest version.

In the Buildspec section, select “Insert build commands” and click on “Switch to editor”. Enter the following YAML code as your build commands:

version: 0.2
phases:
    build:
        commands:
            - echo "This is a dummy build command"
artifacts:
    files:
        - "**/*"

Note: you can also integrate build commands to your git repository by using a buildspec yaml file. More information can be found at Build specification reference for CodeBuild.

Leave all other options as default and click on ‘Continue to CodePipeline’.

Image showing the configured buildspec

Back in the CodePipeline console your Project name will automatically be filled in. You can now continue to the next step.
Click the “Skip deploy stage” button. We will create this stage in the next section.
Review your changes and click “Create pipeline”. Your newly created pipeline will now build for the first time!

Image showing the first execution of the CI/CD pipeline

Configure AWS CodeDeploy on Azure and on-premises VMs

Now that we have built our application, we want to deploy it to both environments: Azure and on-premises. In the “Install CodeDeploy agent” section we’ve already installed the CodeDeploy agent. As a one-time step, we now have to give the CodeDeploy agents access to the AWS environment. You can leverage AWS Identity and Access Management (IAM) Roles Anywhere in combination with the code-deploy-session-helper to give access to the AWS resources needed.
The IAM role should have at least the AWSCodeDeployFullAccess AWS managed policy and read-only access to the CodePipeline S3 bucket in your account (called codepipeline-<region>-<account-id>).

For more information on how to set up IAM Roles Anywhere, refer to how to extend AWS IAM roles to workloads outside of AWS with IAM Roles Anywhere. Alternative ways to configure access can be found in the AWS CodeDeploy user guide. Follow the steps below for each instance you want to configure.

Configure your CodeDeploy agent as described in the user guide. Ensure the AWS Command Line Interface (CLI) is installed on your VM and execute the following command to register the instance with CodeDeploy.
```
aws deploy register-on-premises-instance --instance-name <name_for_your_instance> --iam-role-arn <arn_of_your_iam_role>
```

Tag the instance as follows:

aws deploy add-tags-to-on-premises-instances --instance-names <name_for_your_instance> --tags Key=Application,Value=MyWebApp

You should now see both instances registered in the “CodeDeploy > On-premises instances” panel. You can now deploy the application to your Azure VM and on-premises VMs!

Image showing the registered instances
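
The registration can also be checked from the command line. The sketch below assumes the AWS CLI is configured for the account and Region used above. It filters on the Application=MyWebApp tag applied in the previous step, and the function names list_registered_instances and is_instance_registered are hypothetical.

```shell
# Hypothetical helpers built on "aws deploy list-on-premises-instances".

# Print the names of registered on-premises instances that carry the
# Application=MyWebApp tag (tab-separated on one line).
list_registered_instances() {
  aws deploy list-on-premises-instances \
    --registration-status Registered \
    --tag-filters Key=Application,Value=MyWebApp,Type=KEY_AND_VALUE \
    --query 'instanceNames' \
    --output text
}

# Succeed only if the given instance name is in the registered list.
is_instance_registered() {
  # tr puts each name on its own line so that grep can match it exactly.
  list_registered_instances | tr '\t' '\n' | grep -qxF -- "$1"
}
```

For example, is_instance_registered <name_for_your_instance> && echo "registered" confirms a single instance.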

Configure AWS CodeDeploy to deploy WebApp

Follow the steps mentioned below to modify the CI/CD pipeline to deploy the application to Azure, and on-premises environments.

Create an IAM role named CodeDeployServiceRole and select CodeDeploy > CodeDeploy as your use case. IAM will automatically select the right policy for you. CodeDeploy will use this role to manage the deployments of your application.
In the AWS console navigate to CodeDeploy > Applications. Click on “Create application”.
Give your application a name and choose “EC2/On-premises” as the compute platform.
Configure the instances we want to deploy to. In the detail view of your application click on “Create deployment group”.
Give your deployment group a name and select the CodeDeployServiceRole.
In the environment configuration section choose On-premises Instances.
Configure the tag key-value pair Application, MyWebApp.
Disable load balancing and leave all other options default.
Click on create deployment group. You should now see your newly created deployment group.

Image showing the created CodeDeploy Application and Deployment group

We can now edit our pipeline to deploy to the newly created deployment group.
Navigate to your previously created pipeline in the CodePipeline section and click Edit. Add the deploy stage by clicking on Add stage and name it Deploy. Afterwards, click Add action.
Name your action and choose CodeDeploy as your action provider.
Select “BuildArtifact” as your input artifact and select your newly created application and deployment group.
Click on Done and on Save in your pipeline to confirm the changes. You have now added the deploy step to your pipeline!

Image showing the updated pipeline

This completes the one-time DevOps pipeline setup, and you will not need to repeat the process.

Automated DevOps pipeline in action

This section demonstrates how the DevOps pipeline operates end-to-end and automatically deploys the application to the Azure VM and the on-premises server when the application code changes.

Click on Release Change to deploy your application for the first time. The release change button manually triggers CodePipeline to update your code. In the next section we will make changes to the repository which triggers the pipeline automatically.
During the “Source” stage your pipeline fetches the latest version from GitHub.
During the “Build” stage your pipeline uses CodeBuild to build your application and generate the deployment artifacts for your pipeline. It uses the build commands defined in the buildspec to determine the build steps.
During the “Deploy” stage your pipeline uses CodeDeploy to deploy the build artifacts to the configured Deployment group – Azure VM and on-premises VM. Navigate to the url of your application to see the results of the deployment process.
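
The Release change button has a command-line equivalent, which can be handy for scripting. The following is a minimal sketch, assuming the AWS CLI is configured and the pipeline is named as in the earlier example (MyWebApp-CICD). The function names release_change and stage_statuses are hypothetical.

```shell
# Hypothetical helpers built on the "aws codepipeline" commands.

# Start a pipeline execution and print its execution ID.
# $1: pipeline name
release_change() {
  aws codepipeline start-pipeline-execution \
    --name "$1" \
    --query 'pipelineExecutionId' \
    --output text
}

# Print each stage name with the status of its latest execution.
# $1: pipeline name
stage_statuses() {
  aws codepipeline get-pipeline-state \
    --name "$1" \
    --query 'stageStates[].[stageName, latestExecution.status]' \
    --output text
}
```

For example, release_change MyWebApp-CICD followed by stage_statuses MyWebApp-CICD shows the Source, Build, and Deploy stages as they progress.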

Update application code in IDE

You can modify the application code using your favorite IDE. In this example we will change the background color and a paragraph of the sample application.

Once you’ve modified the code, save the updated file and push the code to the code repository.

git add .
git commit -m "I made changes to the index.html file "
git push

DevOps pipeline (CodePipeline) – compile, build, and test

Once the code is updated and pushed to GitHub, the DevOps pipeline (CodePipeline) automatically compiles, builds, and tests the modified application. You can navigate to your pipeline (CodePipeline) in the AWS Console, where you should see the pipeline running or recently completed. CodePipeline automatically executes the Build and Deploy steps. In this case we’re not adding any complex logic, but based on your organization’s requirements you can add any build step, or integrate with other tools.

Deployment process using CodeDeploy

In this section, we describe how the modified application is deployed to the Azure and on-premises VMs.

Open your pipeline in the CodePipeline console, and click on the “AWS CodeDeploy” link in the Deploy step to navigate to your deployment group. Open the “Deployments” tab.

Image showing application deployment history

Click on the first deployment in the Application deployment history section. This will show the details of your latest deployment.

In the “Deployment lifecycle events” section click on one of the “View events” links. This shows you the lifecycle steps executed by CodeDeploy and will display the error log output if any of the steps have failed.

Image showing deployment events on instance

Navigate back to your application. You should now see your changes in the application. You’ve successfully set up a multicloud DevOps pipeline!

Conclusion

In summary, the post demonstrated how AWS DevOps tools and services can help organizations build a single release pipeline to deploy applications and workloads in a hybrid and multicloud environment. The post also showed how to set up a CI/CD pipeline to deploy applications to AWS, on-premises, and Azure VMs.

If you have any questions or feedback, leave them in the comments section.

Best practices to optimize your Amazon EC2 Spot Instances usage

2023-05-15 Sheila Busser

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/best-practices-to-optimize-your-amazon-ec2-spot-instances-usage/

This blog post is written by Pranaya Anshu, EC2 PMM, and Sid Ambatipudi, EC2 Compute GTM Specialist.

Amazon EC2 Spot Instances are a powerful tool that thousands of customers use to optimize their compute costs. The National Football League (NFL) is an example of a customer using Spot Instances, leveraging 4000 EC2 Spot Instances across more than 20 instance types to build its season schedule. By using Spot Instances, it saves 2 million dollars every season! Virtually any organization – small or big – can benefit from using Spot Instances by following best practices.

Overview of Spot Instances

Spot Instances let you take advantage of unused EC2 capacity in the AWS cloud and are available at up to a 90% discount compared to On-Demand prices. Through Spot Instances, you can take advantage of the massive operating scale of AWS and run hyperscale workloads at a significant cost saving. In exchange for these discounts, AWS has the option to reclaim Spot Instances when EC2 requires the capacity. AWS provides a two-minute notification before reclaiming Spot Instances, allowing workloads running on those instances to be gracefully shut down.
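
One way a workload can react to the two-minute notification is to poll the instance metadata service from the instance itself. The sketch below assumes it runs on an EC2 Spot Instance with curl available and uses the IMDSv2 token flow. The spot/instance-action path returns an error until an interruption is scheduled, and then returns a small JSON document. The function names and the 5-second polling interval are illustrative choices.

```shell
# Hypothetical helpers for detecting a Spot interruption notice.

# Print the interruption notice JSON; fails while no notice exists.
fetch_instance_action() {
  # IMDSv2: request a short-lived session token first.
  token="$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 60")" || return 1
  # -f makes curl fail on the HTTP 404 returned before any notice.
  curl -sf -H "X-aws-ec2-metadata-token: $token" \
    "http://169.254.169.254/latest/meta-data/spot/instance-action"
}

# Read notice JSON on stdin and print the value of its "action" field.
parse_action() {
  sed -n 's/.*"action"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Block until a notice appears, then print the announced action.
wait_for_interruption() {
  while :; do
    notice="$(fetch_instance_action)" && [ -n "$notice" ] && break
    sleep 5
  done
  printf '%s\n' "$notice" | parse_action
}
```

A shutdown hook could call wait_for_interruption in the background and start draining work as soon as it returns.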

In this blog post, we explore four best practices that can help you optimize your Spot Instances usage and minimize the impact of Spot Instances interruptions: diversifying your instances, considering attribute-based instance type selection, leveraging Spot placement scores, and using the price-capacity-optimized allocation strategy. By applying these best practices, you’ll be able to leverage Spot Instances for appropriate workloads and ultimately reduce your compute costs. Note that for the purposes of this post, we focus on the integration of Spot Instances with Amazon EC2 Auto Scaling groups.

Pre-requisites

Spot Instances can be used for various stateless, fault-tolerant, or flexible applications such as big data, containerized workloads, CI/CD, web servers, high-performance computing (HPC), and AI/ML workloads. However, as previously mentioned, AWS can interrupt Spot Instances with a two-minute notification, so it is best not to use Spot Instances for workloads that cannot handle individual instance interruption — that is, workloads that are inflexible, stateful, fault-intolerant, or tightly coupled.

Best practices

Diversify your instances

The fundamental best practice when using Spot Instances is to be flexible. A Spot capacity pool is a set of unused EC2 instances of the same instance type (for example, m6i.large) within the same AWS Region and Availability Zone (for example, us-east-1a). When you request Spot Instances, you are requesting instances from a specific Spot capacity pool. Since Spot Instances are spare EC2 capacity, you want to base your selection (request) on as many spare pools of capacity as possible in order to increase your likelihood of getting Spot Instances. You should diversify across instance sizes, generations, instance types, and Availability Zones to maximize your savings with Spot Instances. For example, if you are currently using c5a.large in us-east-1a, consider including c6a instances (newer generation of instances), c5a.xlarge (larger size), or us-east-1b (different Availability Zone) to increase your overall flexibility. Instance diversification is beneficial not only for selecting Spot Instances, but also for scaling, resilience, and cost optimization.

To get hands-on experience with Spot Instances and to practice instance diversification, check out Amazon EC2 Spot Instances workshops. And once you’ve diversified your instances, you can leverage AWS Fault Injection Simulator (AWS FIS) to test your applications’ resilience to Spot Instance interruptions to ensure that they can maintain target capacity while still benefiting from the cost savings offered by Spot Instances. To learn more about stress testing your applications, check out the Back to Basics: Chaos Engineering with AWS Fault Injection Simulator video and AWS FIS documentation.

Consider attribute-based instance type selection

We have established that flexibility is key when it comes to getting the most out of Spot Instances. Similarly, we have said that in order to access your desired Spot Instances capacity, you should select multiple instance types. While building and maintaining instance type configurations in a flexible way may seem daunting or time-consuming, it doesn’t have to be if you use attribute-based instance type selection. With attribute-based instance type selection, you can specify instance attributes (for example, CPU, memory, and storage) and EC2 Auto Scaling will automatically identify and launch instances that meet your defined attributes. This removes the manual lift of configuring and updating instance types. Moreover, this selection method enables you to automatically use newly released instance types as they become available so that you can continuously have access to an increasingly broad range of Spot Instance capacity. Attribute-based instance type selection is ideal for workloads and frameworks that are instance agnostic, such as HPC and big data workloads, and can help to reduce the work involved with selecting specific instance types to meet specific requirements.

For more information on how to configure attribute-based instance selection for your EC2 Auto Scaling group, refer to Create an Auto Scaling Group Using Attribute-Based Instance Type Selection documentation. To learn more about attribute-based instance type selection, read the Attribute-Based Instance Type Selection for EC2 Auto Scaling and EC2 Fleet news blog or check out the Using Attribute-Based Instance Type Selection and Mixed Instance Groups section of the Launching Spot Instances workshop.
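As a rough illustration, the sketch below builds the MixedInstancesPolicy argument that boto3's create_auto_scaling_group call accepts, using InstanceRequirements instead of a fixed list of instance types. The helper name, the launch template name, and the attribute values are assumptions chosen for the example; check the linked documentation for the full set of supported attributes.

```python
def mixed_instances_policy(launch_template_name, min_vcpus, max_vcpus, min_memory_mib):
    """Build a MixedInstancesPolicy dict that selects instances by attributes."""
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": launch_template_name,
                "Version": "$Latest",
            },
            "Overrides": [
                {
                    # Attributes replace an explicit list of instance types.
                    "InstanceRequirements": {
                        "VCpuCount": {"Min": min_vcpus, "Max": max_vcpus},
                        "MemoryMiB": {"Min": min_memory_mib},
                    }
                }
            ],
        },
        "InstancesDistribution": {
            # 0 percent On-Demand above the base capacity means Spot only.
            "OnDemandPercentageAboveBaseCapacity": 0,
            "SpotAllocationStrategy": "price-capacity-optimized",
        },
    }


# "my-launch-template" is a hypothetical launch template name.
policy = mixed_instances_policy("my-launch-template", 2, 8, 4096)

# The dict could then be passed on, for example:
#   boto3.client("autoscaling").create_auto_scaling_group(
#       AutoScalingGroupName="my-asg", MinSize=0, MaxSize=10,
#       VPCZoneIdentifier="<subnet ids>", MixedInstancesPolicy=policy)
```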

Leverage Spot placement scores

Now that we’ve stressed the importance of flexibility when it comes to Spot Instances and covered the best way to select instances, let’s dive into how to find preferred times and locations to launch Spot Instances. Because Spot Instances are unused EC2 capacity, Spot Instances capacity fluctuates. Correspondingly, it is possible that you won’t always get the exact capacity at a specific time that you need through Spot Instances. Spot placement scores are a feature of Spot Instances that indicates how likely it is that you will be able to get the Spot capacity that you require in a specific Region or Availability Zone. Your Spot placement score can help you reduce Spot Instance interruptions, acquire greater capacity, and identify optimal configurations to run workloads on Spot Instances. However, it is important to note that Spot placement scores serve only as point-in-time recommendations (scores can vary depending on current capacity) and do not provide any guarantees in terms of available capacity or risk of interruption. To learn more about how Spot placement scores work and to get started with them, see the Identifying Optimal Locations for Flexible Workloads With Spot Placement Score blog and Spot placement scores documentation.
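The following sketch shows one way to request Spot placement scores with boto3 and rank the results. The two helper functions and the sample response are assumptions for illustration; the sample scores are made up and do not reflect real capacity.

```python
def fetch_spot_placement_scores(instance_types, target_capacity, regions):
    """Call the EC2 GetSpotPlacementScores API (requires AWS credentials)."""
    import boto3  # imported lazily so the ranking helper works without AWS access

    ec2 = boto3.client("ec2")
    return ec2.get_spot_placement_scores(
        InstanceTypes=instance_types,
        TargetCapacity=target_capacity,
        RegionNames=regions,
    )


def rank_by_score(response):
    """Sort the returned entries from the highest score to the lowest."""
    scores = response.get("SpotPlacementScores", [])
    return sorted(scores, key=lambda entry: entry["Score"], reverse=True)


# Made-up response in the shape returned by the API, for local experimentation.
sample_response = {
    "SpotPlacementScores": [
        {"Region": "us-east-1", "Score": 7},
        {"Region": "eu-west-1", "Score": 9},
        {"Region": "ap-south-1", "Score": 4},
    ]
}

ranked = rank_by_score(sample_response)
print([entry["Region"] for entry in ranked])
```

Because scores are point-in-time recommendations, any ranking like this is only meaningful shortly after the request is made.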

As a near real-time tool, Spot placement scores are often integrated into deployment automation. However, because they can be logged and graphed, you may find them to be a valuable resource even before you launch a workload in the cloud. If you are looking to understand historical Spot placement scores for your workload, you should check out the Spot placement score tracker, a tool that automates the capture of Spot placement scores and stores Spot placement score metrics in Amazon CloudWatch. The tracker is available through AWS Labs, a GitHub repository hosting tools. Learn more about the tracker through the Optimizing Amazon EC2 Spot Instances with Spot Placement Scores blog.

When considering ideal times to launch Spot Instances and exploring different options via Spot placement scores, be sure to consider running Spot Instances at off-peak hours – or hours when there is less demand for EC2 Instances. As you may assume, there is less unused capacity – Spot Instances – available during typical business hours than after business hours. So, in order to leverage as much Spot capacity as you can, explore the possibility of running your workload at hours when there is reduced demand for EC2 instances and thus greater availability of Spot Instances. Similarly, consider running your Spot Instances in “off-peak Regions” – or Regions that are not experiencing business hours at that certain time.

On a related note, to maximize your usage of Spot Instances, you should consider using previous-generation instances if they meet your workload needs. This is because, as with off-peak vs. peak hours, there is typically greater capacity available for previous-generation instances than for current-generation instances, as most people tend to use current-generation instances for their compute needs.

Use the price-capacity-optimized allocation strategy

Once you’ve selected a diversified and flexible set of instances, you should select your allocation strategy. When launching instances, your Auto Scaling group uses the allocation strategy that you specify to pick the specific Spot pools from all your possible pools. Spot offers four allocation strategies: price-capacity-optimized, capacity-optimized, capacity-optimized-prioritized, and lowest-price. Each of these allocation strategies selects Spot Instances in pools based on price, capacity, a prioritized list of instances, or a combination of these factors.

The price-capacity-optimized strategy launched in November 2022. This strategy makes Spot Instance allocation decisions based on the most capacity at the lowest price. It essentially enables Auto Scaling groups to identify the Spot pools with the highest capacity availability for the number of instances that are launching. In other words, if you select this allocation strategy, we will find the Spot capacity pools that we believe have the lowest chance of interruption in the near term. Your Auto Scaling groups then request Spot Instances from the lowest priced of these pools.

We recommend you leverage the price-capacity-optimized allocation strategy for the majority of your workloads that run on Spot Instances. To see how the price-capacity-optimized allocation strategy selects Spot Instances in comparison with lowest-price and capacity-optimized allocation strategies, read the Introducing the Price-Capacity-Optimized Allocation Strategy for EC2 Spot Instances blog post.
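The two-step idea described above (first narrow down to the pools with the best capacity availability, then take the lowest priced of them) can be pictured with a toy model. This is not the algorithm AWS uses: the availability numbers, prices, and shortlist size below are invented purely to show why the result can differ from picking the cheapest pool outright.

```python
def pick_pool(pools, shortlist_size=2):
    """Toy two-step selection: shortlist by availability, then take the cheapest."""
    by_availability = sorted(pools, key=lambda p: p["availability"], reverse=True)
    shortlist = by_availability[:shortlist_size]
    return min(shortlist, key=lambda p: p["price"])


# Hypothetical pools; "availability" is an invented score, not an AWS metric.
pools = [
    {"pool": "c5a.large/us-east-1a", "availability": 3, "price": 0.030},
    {"pool": "c6a.large/us-east-1a", "availability": 9, "price": 0.036},
    {"pool": "c5a.large/us-east-1b", "availability": 8, "price": 0.033},
    {"pool": "c6a.large/us-east-1b", "availability": 5, "price": 0.031},
]

choice = pick_pool(pools)
cheapest = min(pools, key=lambda p: p["price"])
print(choice["pool"], cheapest["pool"])
```

In this made-up data the cheapest pool also has the least spare capacity, so the two-step selection settles on a slightly more expensive pool with better availability.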

Clean-up

If you’ve explored the different Spot Instances workshops we recommended throughout this blog post and spun up resources, please remember to delete resources that you are no longer using to avoid incurring future costs.

Conclusion

Spot Instances can be leveraged to reduce costs across a wide variety of use cases, including containers, big data, machine learning, HPC, and CI/CD workloads. In this blog, we discussed four Spot Instances best practices that can help you optimize your Spot Instance usage to maximize savings: diversifying your instances, considering attribute-based instance type selection, leveraging Spot placement scores, and using the price-capacity-optimized allocation strategy.

To learn more about Spot Instances, check out Spot Instances getting started resources. Or to learn of other ways of reducing costs and improving performance, including leveraging other flexible purchase models such as AWS Savings Plans, read the Increase Your Application Performance at Lower Costs eBook or watch the Seven Steps to Lower Costs While Improving Application Performance webinar.

Multi-branch pipeline management and infrastructure deployment using AWS CDK Pipelines

2022-12-22 Iris Kraja

Post Syndicated from Iris Kraja original https://aws.amazon.com/blogs/devops/multi-branch-pipeline-management-and-infrastructure-deployment-using-aws-cdk-pipelines/

This post describes how to use the AWS CDK Pipelines module to follow a Gitflow development model using AWS Cloud Development Kit (AWS CDK). Software development teams often follow a strict branching strategy during a solution’s development lifecycle. Newly created branches commonly need their own isolated copy of infrastructure resources to develop new features.

CDK Pipelines is a construct library module for continuous delivery of AWS CDK applications. CDK Pipelines are self-updating: if you add application stages or stacks, then the pipeline automatically reconfigures itself to deploy those new stages and/or stacks.

The following solution creates a new AWS CDK Pipeline within a development account for every new branch created in the source repository (AWS CodeCommit). When a branch is deleted, the pipeline and all related resources are also destroyed from the account. This GitFlow model for infrastructure provisioning allows developers to work independently from each other, concurrently, even in the same stack of the application.

Solution overview

The following diagram provides an overview of the solution. There is one default pipeline responsible for deploying resources to the different application environments (e.g., Development, Pre-Prod, and Prod). The code is stored in CodeCommit. When new changes are pushed to the default CodeCommit repository branch, AWS CodePipeline runs the default pipeline. When the default pipeline is deployed, it creates two AWS Lambda functions.

These two Lambda functions are invoked by CodeCommit CloudWatch events when a new branch in the repository is created or deleted. The Create Lambda function uses the boto3 CodeBuild module to create an AWS CodeBuild project that builds the pipeline for the feature branch. This feature pipeline consists of a build stage and an optional update pipeline stage for itself. The Destroy Lambda function creates another CodeBuild project which cleans all of the feature branch’s resources and the feature pipeline.

Figure 1. Architecture diagram.

Prerequisites

Before beginning this walkthrough, you should have the following prerequisites:

An AWS account
AWS CDK installed
Python3 installed
Jq (JSON processor) installed
Basic understanding of continuous integration/continuous delivery (CI/CD) pipelines

Initial setup

Download the repository from GitHub:

# Command to clone the repository
git clone https://github.com/aws-samples/multi-branch-cdk-pipelines.git
cd multi-branch-cdk-pipelines

Create a new CodeCommit repository in the AWS Account and region where you want to deploy the pipeline and upload the source code from above to this repository. In the config.ini file, change the repository_name and region variables accordingly.
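For orientation, the sketch below reads repository_name and region from an INI-formatted string with Python's configparser. The section name `general` and the sample values are assumptions; check the config.ini shipped with the repository for the actual layout.

```python
import configparser

# Hypothetical file content; only the two variable names come from the walkthrough.
SAMPLE_CONFIG = """
[general]
repository_name = my-cdk-repo
region = eu-west-1
"""


def read_settings(text):
    """Return (repository_name, region) from INI-formatted text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["general"]
    return section["repository_name"], section["region"]


repository_name, region = read_settings(SAMPLE_CONFIG)
print(repository_name, region)
```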

Make sure that you set up a fresh Python environment. Install the dependencies:

pip install -r requirements.txt

Run the initial-deploy.sh script to bootstrap the development and production environments and to deploy the default pipeline. You’ll be asked to provide the following parameters: (1) Development account ID, (2) Development account AWS profile name, (3) Production account ID, and (4) Production account AWS profile name.

sh ./initial-deploy.sh --dev_account_id <YOUR DEV ACCOUNT ID> \
  --dev_profile_name <YOUR DEV PROFILE NAME> \
  --prod_account_id <YOUR PRODUCTION ACCOUNT ID> \
  --prod_profile_name <YOUR PRODUCTION PROFILE NAME>

Default pipeline

In the CI/CD pipeline, we set up an if condition to deploy the default branch resources only if the current branch is the default one. The default branch is retrieved programmatically from the CodeCommit repository. We deploy an Amazon Simple Storage Service (Amazon S3) Bucket and two Lambda functions. The bucket is responsible for storing the feature branches’ CodeBuild artifacts. The first Lambda function is triggered when a new branch is created in CodeCommit. The second one is triggered when a branch is deleted.

if branch == default_branch:
    ...

    # Artifact bucket for feature AWS CodeBuild projects
    artifact_bucket = Bucket(
        self,
        'BranchArtifacts',
        encryption=BucketEncryption.KMS_MANAGED,
        removal_policy=RemovalPolicy.DESTROY,
        auto_delete_objects=True
    )
...
    # AWS Lambda function triggered upon branch creation
    create_branch_func = aws_lambda.Function(
        self,
        'LambdaTriggerCreateBranch',
        runtime=aws_lambda.Runtime.PYTHON_3_8,
        function_name='LambdaTriggerCreateBranch',
        handler='create_branch.handler',
        code=aws_lambda.Code.from_asset(path.join(this_dir, 'code')),
        environment={
            "ACCOUNT_ID": dev_account_id,
            "CODE_BUILD_ROLE_ARN": iam_stack.code_build_role.role_arn,
            "ARTIFACT_BUCKET": artifact_bucket.bucket_name,
            "CODEBUILD_NAME_PREFIX": codebuild_prefix
        },
        role=iam_stack.create_branch_role)


    # AWS Lambda function triggered upon branch deletion
    destroy_branch_func = aws_lambda.Function(
        self,
        'LambdaTriggerDestroyBranch',
        runtime=aws_lambda.Runtime.PYTHON_3_8,
        function_name='LambdaTriggerDestroyBranch',
        handler='destroy_branch.handler',
        role=iam_stack.delete_branch_role,
        environment={
            "ACCOUNT_ID": dev_account_id,
            "CODE_BUILD_ROLE_ARN": iam_stack.code_build_role.role_arn,
            "ARTIFACT_BUCKET": artifact_bucket.bucket_name,
            "CODEBUILD_NAME_PREFIX": codebuild_prefix,
            "DEV_STAGE_NAME": f'{dev_stage_name}-{dev_stage.main_stack_name}'
        },
        code=aws_lambda.Code.from_asset(path.join(this_dir,
                                                  'code')))

Then, the CodeCommit repository is configured to trigger these Lambda functions based on two events:

(1) Reference created

# Configure AWS CodeCommit to trigger the Lambda function when a new branch is created
repo.on_reference_created(
    'BranchCreateTrigger',
    description="AWS CodeCommit reference created event.",
    target=aws_events_targets.LambdaFunction(create_branch_func))

(2) Reference deleted

# Configure AWS CodeCommit to trigger the Lambda function when a branch is deleted
repo.on_reference_deleted(
    'BranchDeleteTrigger',
    description="AWS CodeCommit reference deleted event.",
    target=aws_events_targets.LambdaFunction(destroy_branch_func))

Lambda functions

The two Lambda functions build and destroy application environments mapped to each feature branch. An Amazon CloudWatch event triggers the LambdaTriggerCreateBranch function whenever a new branch is created. The CodeBuild client from boto3 creates the build phase and deploys the feature pipeline.

Create function

The create function deploys a feature pipeline which consists of a build stage and an optional update pipeline stage for itself. The pipeline downloads the feature branch code from the CodeCommit repository, initiates the Build and Test action using CodeBuild, and securely saves the built artifact on the S3 bucket.

The Lambda function handler code is as follows:

def handler(event, context):
    """Lambda function handler"""
    logger.info(event)

    reference_type = event['detail']['referenceType']

    try:
        if reference_type == 'branch':
            branch = event['detail']['referenceName']
            repo_name = event['detail']['repositoryName']

            client.create_project(
                name=f'{codebuild_name_prefix}-{branch}-create',
                description="Build project to deploy branch pipeline",
                source={
                    'type': 'CODECOMMIT',
                    'location': f'https://git-codecommit.{region}.amazonaws.com/v1/repos/{repo_name}',
                    'buildspec': generate_build_spec(branch)
                },
                sourceVersion=f'refs/heads/{branch}',
                artifacts={
                    'type': 'S3',
                    'location': artifact_bucket_name,
                    'path': f'{branch}',
                    'packaging': 'NONE',
                    'artifactIdentifier': 'BranchBuildArtifact'
                },
                environment={
                    'type': 'LINUX_CONTAINER',
                    'image': 'aws/codebuild/standard:4.0',
                    'computeType': 'BUILD_GENERAL1_SMALL'
                },
                serviceRole=role_arn
            )

            client.start_build(
                projectName=f'{codebuild_name_prefix}-{branch}-create'
            )
    except Exception as e:
        logger.error(e)
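To try the event handling locally, it can help to isolate the parsing step. The sketch below extracts the repository and branch names from an event using the same detail fields the handler reads. The helper name and the sample event are assumptions; the sample contains only the fields used here, not the full event.

```python
def branch_from_event(event):
    """Return (repository, branch) for branch events, or None for other references."""
    detail = event.get("detail", {})
    if detail.get("referenceType") != "branch":
        return None  # for example, a tag was created rather than a branch
    return detail["repositoryName"], detail["referenceName"]


# Trimmed-down sample event with hypothetical repository and branch names.
sample_event = {
    "source": "aws.codecommit",
    "detail": {
        "event": "referenceCreated",
        "referenceType": "branch",
        "referenceName": "user-feature-123",
        "repositoryName": "my-cdk-repo",
    },
}

tag_event = {"detail": {"referenceType": "tag", "referenceName": "v1.0"}}

print(branch_from_event(sample_event), branch_from_event(tag_event))
```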

Create branch CodeBuild project’s buildspec.yaml content:

version: 0.2
env:
  variables:
    BRANCH: {branch}
    DEV_ACCOUNT_ID: {account_id}
    PROD_ACCOUNT_ID: {account_id}
    REGION: {region}
phases:
  pre_build:
    commands:
      - npm install -g aws-cdk && pip install -r requirements.txt
  build:
    commands:
      - cdk synth
      - cdk deploy --require-approval=never
artifacts:
  files:
    - '**/*'
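The handler calls generate_build_spec(branch), which is not shown in this post. A minimal sketch of such a function, assuming it simply fills the placeholders of the template above, might look as follows. The extra parameters are an assumption made to keep the example self-contained; the repository's version takes only the branch and presumably obtains the other values elsewhere.

```python
def generate_build_spec(branch, dev_account_id, prod_account_id, region):
    """Render the create-branch buildspec as a string for the CodeBuild project."""
    return f"""version: 0.2
env:
  variables:
    BRANCH: {branch}
    DEV_ACCOUNT_ID: {dev_account_id}
    PROD_ACCOUNT_ID: {prod_account_id}
    REGION: {region}
phases:
  pre_build:
    commands:
      - npm install -g aws-cdk && pip install -r requirements.txt
  build:
    commands:
      - cdk synth
      - cdk deploy --require-approval=never
artifacts:
  files:
    - '**/*'
"""


# Placeholder account IDs and Region, for illustration only.
spec = generate_build_spec("user-feature-123", "111111111111", "222222222222", "eu-west-1")
print(spec)
```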

Destroy function

The second Lambda function is responsible for the destruction of a feature branch’s resources. Upon the deletion of a feature branch, an Amazon CloudWatch event triggers this Lambda function. The function creates a CodeBuild Project which destroys the feature pipeline and all of the associated resources created by that pipeline. The source property of the CodeBuild Project is the feature branch’s source code saved as an artifact in Amazon S3.

The Lambda function handler code is as follows:

def handler(event, context):
    logger.info(event)
    reference_type = event['detail']['referenceType']

    try:
        if reference_type == 'branch':
            branch = event['detail']['referenceName']
            client.create_project(
                name=f'{codebuild_name_prefix}-{branch}-destroy',
                description="Build project to destroy branch resources",
                source={
                    'type': 'S3',
                    'location': f'{artifact_bucket_name}/{branch}/{codebuild_name_prefix}-{branch}-create/',
                    'buildspec': generate_build_spec(branch)
                },
                artifacts={
                    'type': 'NO_ARTIFACTS'
                },
                environment={
                    'type': 'LINUX_CONTAINER',
                    'image': 'aws/codebuild/standard:4.0',
                    'computeType': 'BUILD_GENERAL1_SMALL'
                },
                serviceRole=role_arn
            )

            client.start_build(
                projectName=f'{codebuild_name_prefix}-{branch}-destroy'
            )

            client.delete_project(
                name=f'{codebuild_name_prefix}-{branch}-destroy'
            )

            client.delete_project(
                name=f'{codebuild_name_prefix}-{branch}-create'
            )
    except Exception as e:
        logger.error(e)

Destroy the branch CodeBuild project’s buildspec.yaml content:

version: 0.2
env:
  variables:
    BRANCH: {branch}
    DEV_ACCOUNT_ID: {account_id}
    PROD_ACCOUNT_ID: {account_id}
    REGION: {region}
phases:
  pre_build:
    commands:
      - npm install -g aws-cdk && pip install -r requirements.txt
  build:
    commands:
      - cdk destroy cdk-pipelines-multi-branch-{branch} --force
      - aws cloudformation delete-stack --stack-name {dev_stage_name}-{branch}
      - aws s3 rm s3://{artifact_bucket_name}/{branch} --recursive

Create a feature branch

On your machine’s local copy of the repository, create a new feature branch using the following git commands. Replace user-feature-123 with a unique name for your feature branch. Note that this feature branch name must comply with the CodePipeline naming restrictions, as it will be used to name a unique pipeline later in this walkthrough.

# Create the feature branch
git checkout -b user-feature-123
git push origin user-feature-123
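Since the branch name ends up in a pipeline name, a quick check before pushing can save a failed deployment. The sketch below tests a name against the character set and length limit that, to our understanding, the CodePipeline API reference gives for pipeline names (letters, digits, and the characters . @ - _, up to 100 characters). Treat the pattern as an assumption and verify it against the current documentation; the helper name is hypothetical.

```python
import re

# Assumed pipeline-name rule: 1-100 characters from A-Z, a-z, 0-9, '.', '@', '-', '_'.
_PIPELINE_NAME = re.compile(r"[A-Za-z0-9.@\-_]+")


def is_valid_pipeline_name(name):
    """Return True if the name fits the assumed CodePipeline naming rule."""
    return 1 <= len(name) <= 100 and _PIPELINE_NAME.fullmatch(name) is not None


for candidate in ["user-feature-123", "feature/login", ""]:
    print(repr(candidate), is_valid_pipeline_name(candidate))
```

Note that a branch name containing a slash, such as feature/login, would not pass this check, and that any prefix added to the branch name also counts toward the length limit.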

The first Lambda function will deploy the CodeBuild project, which then deploys the feature pipeline. This can take a few minutes. You can log in to the AWS Console and see the CodeBuild project running under CodeBuild.

Figure 2. AWS Console – CodeBuild projects.

After the build has finished successfully, you can see the deployed feature pipeline under CodePipeline.

Figure 3. AWS Console – CodePipeline pipelines.

The Lambda S3 trigger project from AWS CDK Samples is used as the infrastructure resources to demonstrate this solution. The content is placed inside the src directory and is deployed by the pipeline. When visiting the Lambda console page, you can see two functions: one by the default pipeline and one by our feature pipeline.

Figure 4. AWS Console – Lambda functions.

Destroy a feature branch

There are two common ways to remove feature branches. The first one is related to a pull request, also known as a “PR”. This occurs when merging a feature branch back into the default branch. Once it’s merged, the feature branch will be automatically closed. The second way is to delete the feature branch explicitly by running the following git commands:

# delete branch local
git branch -d user-feature-123

# delete branch remote
git push origin --delete user-feature-123

The CodeBuild project responsible for destroying the feature resources is now triggered. You can see the project’s logs while the resources are being destroyed in CodeBuild, under Build history.

Figure 5. AWS Console – CodeBuild projects.

Cleaning up

To avoid incurring future charges, log into the AWS console of the different accounts you used, go to the AWS CloudFormation console of the Region(s) where you chose to deploy, and select and click Delete on the main and branch stacks.

Conclusion

This post showed how you can work with an event-driven strategy and AWS CDK to implement a multi-branch pipeline flow using AWS CDK Pipelines. The described solution leverages Lambda and CodeBuild to provide a dynamic orchestration of resources for multiple branches and pipelines.
For more information on CDK Pipelines and all the ways it can be used, see the CDK Pipelines reference documentation.


Build, Test and Deploy ETL solutions using AWS Glue and AWS CDK based CI/CD pipelines

2022-10-03 Puneet Babbar

Post Syndicated from Puneet Babbar original https://aws.amazon.com/blogs/big-data/build-test-and-deploy-etl-solutions-using-aws-glue-and-aws-cdk-based-ci-cd-pipelines/

AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning (ML), and application development. It’s serverless, so there’s no infrastructure to set up or manage.

This post provides a step-by-step guide to build a continuous integration and continuous delivery (CI/CD) pipeline using AWS CodeCommit, AWS CodeBuild, and AWS CodePipeline to define, test, provision, and manage changes of AWS Glue based data pipelines using the AWS Cloud Development Kit (AWS CDK).

The AWS CDK is an open-source software development framework for defining cloud infrastructure as code using familiar programming languages and provisioning it through AWS CloudFormation. It provides you with high-level components called constructs that preconfigure cloud resources with proven defaults, cutting down boilerplate code and allowing for faster development in a safe, repeatable manner.

Solution overview

The solution constructs a CI/CD pipeline with multiple stages. The CI/CD pipeline constructs a data pipeline using COVID-19 Harmonized Data managed by Talend / Stitch. The data pipeline crawls the datasets provided by neherlab from the public Amazon Simple Storage Service (Amazon S3) bucket, exposes the public datasets in the AWS Glue Data Catalog so they’re available for SQL queries using Amazon Athena, performs ETL (extract, transform, and load) transformations to denormalize the datasets to a table, and makes the denormalized table available in the Data Catalog.

The solution is designed as follows:

A data engineer deploys the initial solution. The solution creates two stacks:
- cdk-covid19-glue-stack-pipeline – This stack creates the CI/CD infrastructure as shown in the architectural diagram (labeled Tool Chain).
- cdk-covid19-glue-stack – The cdk-covid19-glue-stack-pipeline stack deploys the cdk-covid19-glue-stack stack to create the AWS Glue based data pipeline as shown in the diagram (labeled ETL).
The data engineer makes changes on cdk-covid19-glue-stack (when a change in the ETL application is required).
The data engineer pushes the change to a CodeCommit repository (generated in the cdk-covid19-glue-stack-pipeline stack).
The pipeline is automatically triggered by the push, and deploys and updates all the resources in the cdk-covid19-glue-stack stack.

At the time of publishing of this post, the AWS CDK has two versions of the AWS Glue module: @aws-cdk/aws-glue and @aws-cdk/aws-glue-alpha, containing L1 constructs and L2 constructs, respectively. At this time, the @aws-cdk/aws-glue-alpha module is still in an experimental stage. We use the stable @aws-cdk/aws-glue module for the purpose of this post.

The following diagram shows all the components in the solution.

Figure 1 – Architecture diagram

The data pipeline consists of an AWS Glue workflow, triggers, jobs, and crawlers. The AWS Glue job uses an AWS Identity and Access Management (IAM) role with appropriate permissions to read and write data to an S3 bucket. AWS Glue crawlers crawl the data available in the S3 bucket, update the AWS Glue Data Catalog with the metadata, and create tables. You can run SQL queries on these tables using Athena. For ease of identification, we followed the naming convention for triggers to start with t_*, crawlers with c_*, and jobs with j_*. A CI/CD pipeline based on CodeCommit, CodeBuild, and CodePipeline builds, tests and deploys the solution. The complete infrastructure is created using the AWS CDK.

The following table lists the tables created by this solution that you can query using Athena.

Table Name	Description	Dataset Location	Access	Location
`neherlab_case_counts`	Total number of cases	s3://covid19-harmonized-dataset/covid19tos3/neherlab_case_counts/	Read	Public
`neherlab_country_codes`	Country code	s3://covid19-harmonized-dataset/covid19tos3/neherlab_country_codes/	Read	Public
`neherlab_icu_capacity`	Intensive Care Unit (ICU) capacity	s3://covid19-harmonized-dataset/covid19tos3/neherlab_icu_capacity/	Read	Public
`neherlab_population`	Population	s3://covid19-harmonized-dataset/covid19tos3/neherlab_population/	Read	Public
`neherlab_denormalized`	Denormalized table that combines all the preceding tables into one table	s3://<your-S3-bucket-name>/neherlab_denormalized	Read/Write	Reader’s AWS account

Anatomy of the AWS CDK application

In this section, we visit key concepts and anatomy of the AWS CDK application, review the important sections of the code, and discuss how the AWS CDK reduces complexity of the solution as compared to AWS CloudFormation.

An AWS CDK app defines one or more stacks. Stacks (equivalent to CloudFormation stacks) contain constructs, each of which defines one or more concrete AWS resources. Each stack in the AWS CDK app is associated with an environment. An environment is the target AWS account ID and Region into which the stack is intended to be deployed.

In the AWS CDK, the top-most object is the AWS CDK app, which contains multiple stacks, whereas in AWS CloudFormation the top-level object is the stack. Given this difference, you can define all the stacks required for the application in the AWS CDK app. In AWS Glue based ETL projects, developers need to define multiple data pipelines by subject area or business logic. In AWS CloudFormation, we can achieve this by writing multiple CloudFormation stacks, which are often deployed independently. In some cases, developers write nested stacks, which over time become very large and complicated to maintain. In the AWS CDK, all stacks are deployed from the AWS CDK app, increasing modularity of the code and allowing developers to identify all the data pipelines associated with an application easily.

Our AWS CDK application consists of four main files:

app.py – This is the AWS CDK app and the entry point for the AWS CDK application
pipeline.py – The pipeline.py stack, invoked by app.py, creates the CI/CD pipeline
etl/infrastructure.py – The etl/infrastructure.py stack, invoked by pipeline.py, creates the AWS Glue based data pipeline
default-config.yaml – The configuration file contains the AWS account ID and Region.

The AWS CDK application reads the configuration from the default-config.yaml file, sets the environment information (AWS account ID and Region), and invokes the PipelineCDKStack class in pipeline.py. Let’s break down the preceding line and discuss the benefits of this design.

For every application, we want to deploy in pre-production environments and a production environment. The application in all the environments will have different configurations, such as the size of the deployed resources. In the AWS CDK, every stack has a property called env, which defines the stack’s target environment. This property receives the AWS account ID and Region for the given stack.

Lines 26–34 in app.py show the aforementioned details:

# Initiating the CodePipeline stack
PipelineCDKStack(
    app,
    "PipelineCDKStack",
    config=config,
    env=env,
    stack_name=config["codepipeline"]["pipelineStackName"]
)

The env=env line sets the target AWS account ID and Region for PipelineCDKStack. This design allows an AWS CDK app to be deployed in multiple environments at once and increases the parity of the application across all environments. For our example, if we want to deploy PipelineCDKStack in multiple environments, such as development, test, and production, we simply call the PipelineCDKStack stack after populating the env variable appropriately with the target AWS account ID and Region. This was more difficult in AWS CloudFormation, where developers usually needed to deploy the stack for each environment individually. The AWS CDK also provides features to pass the stage at the command line. We look into this option and its usage in a later section.
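As a rough sketch of the multi-environment idea, the helper below turns a configuration dictionary into one (stack ID, environment) pair per target. The `environments` key, the stack ID pattern, and the account IDs are hypothetical; the post's default-config.yaml may be organized differently.

```python
def stack_targets(config):
    """Return a (stack_id, env_kwargs) pair for each environment in the config."""
    targets = []
    for stage, settings in config["environments"].items():
        env_kwargs = {"account": str(settings["account"]), "region": settings["region"]}
        targets.append((f"PipelineCDKStack-{stage}", env_kwargs))
    return targets


# Hypothetical configuration with placeholder account IDs.
sample_config = {
    "environments": {
        "dev": {"account": "111111111111", "region": "us-east-1"},
        "prod": {"account": "222222222222", "region": "eu-west-1"},
    }
}

targets = stack_targets(sample_config)
print(targets)

# In app.py, each pair could then be used along these lines:
#   for stack_id, env_kwargs in stack_targets(config):
#       PipelineCDKStack(app, stack_id, config=config,
#                        env=cdk.Environment(**env_kwargs))
```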

Coming back to the AWS CDK application, the PipelineCDKStack class in pipeline.py uses the aws_cdk.pipelines construct library to create continuous delivery of AWS CDK applications. The AWS CDK provides multiple opinionated construct libraries like aws_cdk.pipelines to reduce boilerplate code from an application. The pipeline.py file creates the CodeCommit repository, populates the repository with the sample code, and creates a pipeline with the necessary AWS CDK stages for CodePipeline to run the CdkGlueBlogStack class from the etl/infrastructure.py file.

Line 99 in pipeline.py invokes the CdkGlueBlogStack class.

The CdkGlueBlogStack class in etl/infrastructure.py creates the crawlers, jobs, database, triggers, and workflow to provision the AWS Glue based data pipeline.

Refer to line 539 for creating a crawler using the CfnCrawler construct, line 564 for creating jobs using the CfnJob construct, and line 168 for creating the workflow using the CfnWorkflow construct. We use the CfnTrigger construct to stitch together multiple triggers to create the workflow. The AWS CDK L1 constructs expose all the available AWS CloudFormation resources and entities using methods from popular programming languages. This allows developers to use popular programming languages to provision resources instead of working with JSON or YAML files in AWS CloudFormation.
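To give a feel for what an L1 construct call looks like, here is a minimal sketch that defines a crawler with CfnCrawler and applies the c_* naming convention mentioned earlier. It is not the code from etl/infrastructure.py: the helper names and parameters are assumptions, and the role ARN, database name, and S3 path are whatever your stack supplies.

```python
def crawler_name(dataset):
    """Apply the c_* naming convention used for crawlers in this solution."""
    return f"c_{dataset}"


def add_crawler(scope, dataset, role_arn, database_name, s3_path):
    """Define an AWS Glue crawler for one S3 location inside the given stack."""
    # aws_cdk is imported lazily so that crawler_name() works without the CDK installed.
    from aws_cdk import aws_glue as glue

    return glue.CfnCrawler(
        scope,
        crawler_name(dataset),
        name=crawler_name(dataset),
        role=role_arn,
        database_name=database_name,
        targets=glue.CfnCrawler.TargetsProperty(
            s3_targets=[glue.CfnCrawler.S3TargetProperty(path=s3_path)]
        ),
    )


print(crawler_name("neherlab_case_counts"))
```

Inside a stack class, add_crawler(self, ...) would be called once per dataset. CfnJob and CfnTrigger follow the same pattern of passing CloudFormation properties as keyword arguments.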

Refer to etl/infrastructure.py for additional details.

Walkthrough of the CI/CD pipeline

In this section, we walk through the various stages of the CI/CD pipeline. Refer to CDK Pipelines: Continuous delivery for AWS CDK applications for additional information.

Source – This stage fetches the source of the AWS CDK app from the CodeCommit repo and triggers the pipeline every time a new commit is made.
Build – This stage compiles the code (if necessary), runs the tests, and performs a cdk synth. The output of the step is a cloud assembly, which is used to perform all the actions in the rest of the pipeline. The pytest is run using the amazon/aws-glue-libs:glue_libs_3.0.0_image_01 Docker image. This image comes with all the required libraries to run tests for AWS Glue version 3.0 jobs using a Docker container. Refer to Develop and test AWS Glue version 3.0 jobs locally using a Docker container for additional information.
UpdatePipeline – This stage modifies the pipeline if necessary. For example, if the code is updated to add a new deployment stage to the pipeline or add a new asset to your application, the pipeline is automatically updated to reflect the changes.
Assets – This stage prepares and publishes all AWS CDK assets of the app to Amazon S3 and all Docker images to Amazon Elastic Container Registry (Amazon ECR). When the AWS CDK deploys an app that references assets (either directly by the app code or through a library), the AWS CDK CLI first prepares and publishes the assets to Amazon S3 using a CodeBuild job. This AWS Glue solution creates four assets.
CDKGlueStage – This stage deploys the assets to the AWS account. In this case, the pipeline deploys the AWS CDK template etl/infrastructure.py to create all the AWS Glue artifacts.

Code

The code can be found at AWS Samples on GitHub.

Prerequisites

This post assumes you have the following:

An AWS account
The AWS Command Line Interface (AWS CLI) installed
The Git command line interface (Git CLI) installed
The AWS CDK Toolkit (cdk command) installed
Python 3 installed
Permissions to create AWS resources

Deploy the solution

To deploy the solution, complete the following steps:

Download the source code from the AWS Samples GitHub repository to the client machine:

$ git clone git@github.com:aws-samples/aws-glue-cdk-cicd.git

Create the virtual environment:

$ cd aws-glue-cdk-cicd 
$ python3 -m venv .venv

This step creates a Python virtual environment specific to the project on the client machine. We use a virtual environment in order to isolate the Python environment for this project and not install software globally.

Activate the virtual environment according to your OS:
- On MacOS and Linux, use the following code:

$ source .venv/bin/activate

- On a Windows platform, use the following code:

% .venv\Scripts\activate.bat

After this step, the subsequent steps run within the bounds of the virtual environment on the client machine and interact with the AWS account as needed.

Install the required dependencies described in requirements.txt to the virtual environment:

$ pip install -r requirements.txt

Bootstrap the AWS CDK app:

$ cdk bootstrap

This step populates a given environment (AWS account ID and Region) with resources required by the AWS CDK to perform deployments into the environment. Refer to Bootstrapping for additional information. At this step, you can see the CloudFormation stack CDKToolkit on the AWS CloudFormation console.

Synthesize the CloudFormation template for the specified stacks:

$ cdk synth # optional if not default (-c stage=default)

You can verify the CloudFormation templates to identify the resources to be deployed in the next step.

Deploy the AWS resources (CI/CD pipeline and AWS Glue based data pipeline):

$ cdk deploy # optional if not default (-c stage=default)

At this step, you can see CloudFormation stacks cdk-covid19-glue-stack-pipeline and cdk-covid19-glue-stack on the AWS CloudFormation console. The cdk-covid19-glue-stack-pipeline stack gets deployed first, which in turn deploys cdk-covid19-glue-stack to create the AWS Glue pipeline.

Verify the solution

When all the previous steps are complete, you can check for the created artifacts.

CloudFormation stacks

You can confirm the existence of the stacks on the AWS CloudFormation console. As shown in the following screenshot, the CloudFormation stacks have been created and deployed by cdk bootstrap and cdk deploy.

Figure 2 – AWS CloudFormation stacks

CodePipeline pipeline

On the CodePipeline console, check for the cdk-covid19-glue pipeline.

Figure 3 – AWS CodePipeline summary view

You can open the pipeline for a detailed view.

Figure 4 – AWS CodePipeline detailed view

AWS Glue workflow

To validate the AWS Glue workflow and its components, complete the following steps:

On the AWS Glue console, choose Workflows in the navigation pane.
Confirm the presence of the Covid_19 workflow.

Figure 5 – AWS Glue Workflow summary view

You can select the workflow for a detailed view.

Figure 6 – AWS Glue Workflow detailed view

Choose Triggers in the navigation pane and check for the presence of seven t-* triggers.

Figure 7 – AWS Glue Triggers

Choose Jobs in the navigation pane and check for the presence of three j_* jobs.

Figure 8 – AWS Glue Jobs

The jobs perform the following tasks:

- etlScripts/j_emit_start_event.py – A Python job that starts the workflow and creates the event
- etlScripts/j_neherlab_denorm.py – A Spark ETL job to transform the data and create a denormalized view by combining all the base data together in Parquet format
- etlScripts/j_emit_ended_event.py – A Python job that ends the workflow and creates the specific event

Choose Crawlers in the navigation pane and check for the presence of five neherlab-* crawlers.

Figure 9 – AWS Glue Crawlers

Execute the solution

The solution creates a scheduled AWS Glue workflow that runs at 10:00 AM UTC on day 1 of every month. A scheduled workflow can also be triggered on demand. For the purpose of this post, we run the workflow on demand using the following command from the AWS CLI. If the workflow starts successfully, the command returns the run ID. For instructions on how to run and monitor a workflow in AWS Glue, refer to Running and monitoring a workflow in AWS Glue.

aws glue start-workflow-run --name Covid_19

You can verify the status of a workflow run by executing the following command from the AWS CLI. Use the run ID returned by the previous command. A successfully executed Covid_19 workflow should return a value of 7 for SucceededActions and 0 for FailedActions.

aws glue get-workflow-run --name Covid_19 --run-id <run_ID>

A sample output of the above command is provided below.

{
  "Run": {
    "Name": "Covid_19",
    "WorkflowRunId": "wr_c8855e82ab42b2455b0e00cf3f12c81f957447abd55a573c087e717f54a4e8be",
    "WorkflowRunProperties": {},
    "StartedOn": "2022-09-20T22:13:40.500000-04:00",
    "CompletedOn": "2022-09-20T22:21:39.545000-04:00",
    "Status": "COMPLETED",
    "Statistics": {
      "TotalActions": 7,
      "TimeoutActions": 0,
      "FailedActions": 0,
      "StoppedActions": 0,
      "SucceededActions": 7,
      "RunningActions": 0
    }
  }
}
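
If you want to automate this check, for example in a script that waits for the run to finish, a minimal Python sketch could parse the command output and compare the statistics. The workflow_succeeded function name and the default of seven expected actions are our assumptions for illustration; the field names are taken from the sample output.

```python
import json


def workflow_succeeded(response_text: str, expected_actions: int = 7) -> bool:
    """Return True when the run completed and every action succeeded."""
    run = json.loads(response_text)["Run"]
    stats = run["Statistics"]
    return (
        run["Status"] == "COMPLETED"
        and stats["FailedActions"] == 0
        and stats["SucceededActions"] == expected_actions
    )


# Trimmed copy of the sample get-workflow-run output.
SAMPLE = """
{
  "Run": {
    "Name": "Covid_19",
    "Status": "COMPLETED",
    "Statistics": {
      "TotalActions": 7,
      "TimeoutActions": 0,
      "FailedActions": 0,
      "StoppedActions": 0,
      "SucceededActions": 7,
      "RunningActions": 0
    }
  }
}
"""

if __name__ == "__main__":
    print(workflow_succeeded(SAMPLE))
```

In practice, the text passed to the function would be the captured output of the aws glue get-workflow-run command.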

(Optional) To verify the status of the workflow run using the AWS Glue console, choose Workflows in the navigation pane, select the Covid_19 workflow, choose the History tab, select the latest row, and choose View run details. A successfully completed workflow is marked with green check marks. Refer to the Legend section in the following screenshot for additional statuses.

Figure 10 – AWS Glue Workflow successful run

Check the output

When the workflow is complete, navigate to the Athena console to confirm that the neherlab_denormalized table was created and populated. You can run SQL queries against all five tables to check the data. A sample SQL query is provided below.

SELECT "country", "location", "date", "cases", "deaths", "ecdc-countries",
        "acute_care", "acute_care_per_100K", "critical_care", "critical_care_per_100K" 
FROM "AwsDataCatalog"."covid19db"."neherlab_denormalized"
limit 10;

Figure 11 – Amazon Athena

Clean up

To clean up the resources created in this post, delete the AWS CloudFormation stacks in the following order:

cdk-covid19-glue-stack
cdk-covid19-glue-stack-pipeline
CDKToolkit
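
If you prefer the command line over the console, the following Python sketch prints AWS CLI commands that delete the stacks in the order listed above and wait for each deletion to finish. It only builds and prints the commands; the delete_commands helper is a hypothetical name, and you should review the output before running it in a shell.

```python
import shlex

# Stack names in the deletion order given above.
STACKS_IN_DELETE_ORDER = [
    "cdk-covid19-glue-stack",
    "cdk-covid19-glue-stack-pipeline",
    "CDKToolkit",
]


def delete_commands(stacks):
    """Build AWS CLI commands that delete each stack and wait for completion."""
    commands = []
    for name in stacks:
        quoted = shlex.quote(name)
        commands.append(f"aws cloudformation delete-stack --stack-name {quoted}")
        commands.append(
            f"aws cloudformation wait stack-delete-complete --stack-name {quoted}"
        )
    return commands


if __name__ == "__main__":
    # Print only; nothing is deleted by this script.
    print("\n".join(delete_commands(STACKS_IN_DELETE_ORDER)))
```

Waiting after each delete keeps the order intact, because the next stack is not touched until the previous one is gone.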

Then delete all associated S3 buckets:

cdk-covid19-glue-stack-p-pipelineartifactsbucketa-*
cdk-*-assets-<AWS_ACCOUNT_ID>-<AWS_REGION>
covid19-glue-config-<AWS_ACCOUNT_ID>-<AWS_REGION>
neherlab-denormalized-dataset-<AWS_ACCOUNT_ID>-<AWS_REGION>

Conclusion

In this post, we demonstrated a step-by-step guide to define, test, provision, and manage changes to an AWS Glue based ETL solution using the AWS CDK. We used an AWS Glue example, which has all the components to build a complex ETL solution, and demonstrated how to integrate individual AWS Glue components into a frictionless CI/CD pipeline. We encourage you to use this post and associated code as the starting point to build your own CI/CD pipelines for AWS Glue based ETL solutions.

About the authors

Puneet Babbar is a Data Architect at AWS, specialized in big data and AI/ML. He is passionate about building products, in particular products that help customers get more out of their data. During his spare time, he loves to spend time with his family and engage in outdoor activities including hiking, running, and skating. Connect with him on LinkedIn.

Suvojit Dasgupta is a Sr. Lakehouse Architect at Amazon Web Services. He works with customers to design and build data solutions on AWS.

Justin Kuskowski is a Principal DevOps Consultant at Amazon Web Services. He works directly with AWS customers to provide guidance and technical assistance around improving their value stream, which ultimately reduces product time to market and leads to a better customer experience. Outside of work, Justin enjoys traveling the country to watch his two kids play soccer and spending time with his family and friends wake surfing on the lakes in Michigan.

6 strategic ways to level up your CI/CD pipeline

2022-07-19 Damian Brady

Post Syndicated from Damian Brady original https://github.blog/2022-07-19-6-strategic-ways-to-level-up-your-ci-cd-pipeline/

In today’s world, a well-tuned CI/CD pipeline is a critical component for any development team looking to build and ship high-quality software fast. But here’s the thing: It’s rare you’ll find two CI/CD pipelines that are exactly the same. And that’s by design. Every CI/CD pipeline should be built to meet a team’s specific needs.

Despite this, there are levels of maturity when building a CI/CD pipeline that range from basic implementations to more advanced automation workflows. But wherever you are on your CI/CD journey, there are a few things you can do to level up your CI/CD pipeline.

With that, here are six strategic things I often see missing from CI/CD pipelines that can help any developer or team advance and improve their workflows.

Need a primer on how to build a CI/CD pipeline on GitHub? Check out our guide

1. Add performance, device compatibility, and accessibility testing

Performance, device compatibility, and accessibility testing are often a manual exercise—and something that some teams are only partially doing. Manually testing for these things can slow down your delivery cycle, so many teams either eat the costs or just don’t do it.

But if these things are important to you—and they should be—there are tools that can be included in your CI/CD pipeline to automate the testing for and discovery of any issues.

Performance and device compatibility testing

One tool, for example, is Playwright which can do end-to-end testing, automated testing, and everything in between. You can also use it to do UI testing so you can catch issues in your product.

Visual regression testing

There’s another class of tools that can help you automate visual regression testing to make sure you haven’t changed the UI when you weren’t intending to do so. That means you haven’t introduced any unexpected UI changes. This can be super useful for device compatibility testing too. If something looks bad on one device, you can quickly correct it.

Accessibility testing

This is another incredibly impactful class of automated tests to add to your CI/CD pipeline. Why? Because every one of your customers should be valuable to you—and if even just a fraction of your customers have trouble using your product, that matters.

There are a ton of accessibility testing tools that can tell you things like if you have appropriate content for screen readers or if the colors on your website make sense to someone with color blindness. A great example is Pa11y, an open source tool you can use to run automated accessibility tests via the command line or Node.js.

2. Incorporate more automated security testing

Security should always be part of your software delivery pipeline, and it’s incredibly vital in today’s environments. Even still, I’ve seen a number of teams and companies who aren’t incorporating automated security tests in their CI/CD pipelines and instead treat security as something that happens after the DevOps process takes place.

Here’s the good news: There are a lot of tools that can help you do this without too much effort—including GitHub-native tools like Dependabot, code scanning, secret scanning, and if you’re a GitHub Enterprise user, you can bundle all the security functionality GitHub offers and more with GitHub Advanced Security. But even with a free GitHub account, you still can use Dependabot on any public or private repository, and code scanning and secret scanning are available on all public repositories, too.

Dependabot, for example, can help you mitigate any potential issues in your dependencies by scanning them for outdated packages and automatically creating pull requests for teams to fix them. It can also be configured to automatically update any project dependencies, too.

This is super impactful. Developers and teams often don’t update their dependencies because of the time it takes—or, sometimes they even just forget to update their dependencies. Dependencies are a legitimate source of vulnerabilities that are all too often overlooked.

Additionally, code scanning and secret scanning are offered on the GitHub platform and can be built into your CI/CD pipeline to improve your security profile. Where code scanning offers SAST capabilities that show if your code itself contains any known vulnerabilities, secret scanning makes sure you’re not leaking any credentials to your repositories. It can also be used to prevent any pushes to your repository if there are any exposed credentials.
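
To give a feel for what a credential check does, here is a toy Python sketch that scans text for strings shaped like AWS access key IDs. It is not how GitHub secret scanning is implemented; the pattern, the find_exposed_keys function name, and the sample input are assumptions for illustration only, and a real scanner covers many more credential types.

```python
import re

# Long-term AWS access key IDs are commonly matched as "AKIA" followed by
# 16 uppercase letters or digits. This is a toy pattern, not GitHub's logic.
AWS_ACCESS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_exposed_keys(text: str) -> list[tuple[int, str]]:
    """Return (line_number, match) pairs for suspected AWS access key IDs."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in AWS_ACCESS_KEY_ID.finditer(line):
            findings.append((line_number, match.group()))
    return findings


if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
    sample = 'region = "us-east-1"\naws_access_key_id = "AKIAIOSFODNN7EXAMPLE"\n'
    for line_number, key in find_exposed_keys(sample):
        # Print only a prefix so the finding itself does not leak the key.
        print(f"line {line_number}: possible access key {key[:4]}...")
```

A check like this could run as an early pipeline step and fail the build when it reports any findings.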

The biggest thing is that teams should treat security as something you do throughout the SDLC, not just before and after something goes to production. You should, of course, always be checking for security issues. But the earlier you can catch issues, the better (hello DevSecOps). So including security testing within your CI/CD pipeline is an essential practice.

A screenshot of automated security testing workflows on GitHub.

3. Build a phased testing strategy

Phased testing is a great strategy for making sure you’re able to deliver secure software fast and at scale. But it’s also something that takes time to build. And consequently, a lot of teams just aren’t doing it.

Often, developers will put all or most of their automated testing at the build phase in their CI/CD pipelines. That means the build can take a long time to execute. And while there’s nothing necessarily wrong with this, you may find that it takes longer to get feedback on your code.

With phased testing, you can catch the big things early and get faster feedback on your codebase. The goal is to have a quick build that rapidly tests the fundamentals with simpler tests such as unit tests. After this, you may then perhaps deploy your build to a test environment to execute additional tests such as some accessibility testing, user testing, and other things that may take longer to execute. This means you’re working your way through a number of possible issues starting with the most critical elements first.

As you get closer to production in a phased testing model, you’ll want to test more and more things. This will likely include key items such as regression testing to make sure previous bugs aren’t reappearing in your codebase. At this stage, things are less likely to go wrong. But you’ll want to effectively catch the big things early and then narrow your testing down to ensure you’re shipping a very high-quality application.
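
As a minimal sketch of the idea, the Python snippet below runs test phases in order, fastest first, and stops at the first failure so later, slower phases are skipped. The phase names and the hard-coded pass/fail results are placeholders; in a real pipeline each callable would invoke an actual test command.

```python
from typing import Callable, List, Tuple

Phase = Tuple[str, Callable[[], bool]]


def run_phases(phases: List[Phase]) -> List[str]:
    """Run phases in order and stop at the first failure.

    Returns the names of the phases that passed.
    """
    passed = []
    for name, check in phases:
        if not check():
            print(f"{name}: FAILED, skipping later phases")
            break
        print(f"{name}: passed")
        passed.append(name)
    return passed


if __name__ == "__main__":
    # Each lambda stands in for a real test command; results are hard-coded.
    pipeline = [
        ("unit tests", lambda: True),
        ("accessibility tests", lambda: True),
        ("regression tests", lambda: False),
        ("post-deployment tests", lambda: True),
    ]
    run_phases(pipeline)
```

Ordering the list from cheapest to most expensive is what gives you the fast feedback: a broken unit test is reported before any environment is provisioned.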

Oh, and of course, there’s also testing in production, which is its own thing. But you can incorporate post-deployment tests into your production environment. You may have a hypothesis you want to test about if something works in production and execute tests to find out. At GitHub, we do this a lot by releasing new features behind feature flags and then enabling that flag for a subset of our user base to collect feedback.

4. Invest in blue-green deployments for easier rollouts

When it comes to releasing a new version of an application, what’s one word you think of? For me, the big word is “stress” (although “excitement” and “relief” are a close second and third). Blue-green deployments are one way to improve how you roll out a new version of an application in your CI/CD pipeline, but it can also be a bit more complex, too.

In the simplest terms, a blue-green deployment involves having two or more versions of your application in production and slowly moving your users from an older version to a newer one. This means that when you need to update or deploy a new version of an application, it goes to an “unused” production environment, and you can slowly move your users across safely.

The benefit of this is you can quickly roll back any changes by redirecting users to another prod environment. It also leads to drastically reduced downtime while you’re deploying a new application version. You can get everything set up in the environment and then just point people to a new one.
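
To make the traffic-shifting idea concrete, here is a toy Python sketch of a router that gradually moves requests from the old ("blue") environment to the new ("green") one and can send everything back in a single step. The BlueGreenRouter class and its methods are hypothetical; a real setup would adjust weights on a load balancer or DNS record instead.

```python
import random


class BlueGreenRouter:
    """Toy router that splits traffic between two environments by weight."""

    def __init__(self) -> None:
        self.green_weight = 0.0  # share of traffic sent to the new version

    def shift(self, step: float) -> None:
        """Move more traffic to green, capped at 100%."""
        self.green_weight = min(1.0, self.green_weight + step)

    def rollback(self) -> None:
        """Send all traffic back to blue."""
        self.green_weight = 0.0

    def route(self, rng: random.Random) -> str:
        """Pick an environment for one request."""
        return "green" if rng.random() < self.green_weight else "blue"


if __name__ == "__main__":
    rng = random.Random(0)
    router = BlueGreenRouter()
    for _ in range(4):
        router.shift(0.25)
        sample = [router.route(rng) for _ in range(1000)]
        print(f"green weight {router.green_weight:.2f}: "
              f"{sample.count('green')} of 1000 requests hit green")
    router.rollback()
    print(f"after rollback: green weight {router.green_weight:.2f}")
```

The rollback is cheap because the blue environment keeps running throughout; only the weight changes.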

Blue-green deployments are perfect when you have two environments that are interchangeable. In reality with larger systems, you may have a suite of web servers or a number of serverless applications running. In practice, this means you might be using a load balancer that can distribute traffic across multiple locations. The canonical example of a load balancer is nginx—but every cloud has its own offerings (like Azure Front Door or Elastic Load Balancing on AWS).

This kind of strategy is common among organizations using Kubernetes. You may have a number of pods that are running, and when you do a deployment, Kubernetes will deploy updates to new instances and redirect traffic. The management of which ones are up and running operates under the same principles as blue-green deployments, but you’re also navigating a far more complex architecture.

5. Adopt infrastructure-as-code for greater flexibility

Infrastructure provisioning is the practice of building IT infrastructure as you need it—and some teams will adopt infrastructure-as-code (IaC) in their CI/CD pipelines to provision resources automatically at specific points in the pipeline.

I strongly recommend doing this. The goal of IaC is that when you’re deploying your application, you’re also deploying your infrastructure. That means you always know what your infrastructure looks like in production, and your testing environment is also replicable to what’s in production.

There are two benefits to building IaC into your CI/CD pipeline:

It helps you make sure that your application and the infrastructure it runs on are routinely being tested in tandem. The old school way of doing things was to say that this is a production machine and it looks like this—and this is our testing machine and we want it to be as close to production as possible. But almost always, you’ll find that production environments change over time—and it makes it harder to know what your production environment is.
It helps you mitigate any real-time issues with your infrastructure. That means if your production server goes down, it’s not a disaster—you can just re-deploy it (and even automate your redeployment at that).

Last but not least: building IaC into your CI/CD pipeline means you can more effectively do things like blue-green deployments. You can deploy a new version of an application—code and infrastructure included—and reroute your DNS to go to that version. If it doesn’t work, that’s fine—you can quickly roll back to your previous version.

A screenshot of a GitHub Actions Terraform workflow.

6. Create checkpoints for automated rollbacks

Ideally, you want to avoid ever having to roll back a software release. But let’s be honest. We all make mistakes and sometimes code that worked in your development or test environment doesn’t work perfectly in production.

When you need to roll back a release to a previous application version, automation makes it much easier to do so quickly. I think of a rollback as a general term for mitigating production problems by reverting to a previous version, whether that’s redeploying or restoring from backup. If you have a great CI/CD pipeline, you can ideally fix a problem and roll out an update immediately—so you can avoid having to go to a previous app version.
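
A checkpointed rollback can be sketched in a few lines of Python: keep a history of versions that passed their health check, and fall back to the most recent one when a new release fails. The deploy_with_rollback function, the version strings, and the health check callable are hypothetical stand-ins for real deployment and monitoring steps.

```python
from typing import Callable, List


def deploy_with_rollback(
    history: List[str],
    new_version: str,
    is_healthy: Callable[[str], bool],
) -> str:
    """Deploy new_version; revert to the previous version if it is unhealthy.

    history holds previously deployed versions, oldest first, and is updated
    in place only when the new version passes its health check.
    Returns the version that ends up live.
    """
    if is_healthy(new_version):
        history.append(new_version)
        return new_version
    if not history:
        raise RuntimeError(
            "new version is unhealthy and there is nothing to roll back to"
        )
    return history[-1]


if __name__ == "__main__":
    checkpoints = ["v1.4.0"]
    # The lambda simulates a failing health check for the new release.
    live = deploy_with_rollback(checkpoints, "v1.5.0", lambda version: False)
    print(f"live version: {live}")
```

Recording a version only after it proves healthy means the last entry in the history is always a safe rollback target.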

Looking for more ways to improve your CI/CD pipeline?

Try exploring the GitHub Marketplace for CI/CD and automation workflow templates. At the time I’m writing this, there are more than 14,000 pre-built, community-developed CI/CD and automation actions in the GitHub Marketplace. And, of course, you can always build your own custom workflows with GitHub Actions.

Explore the GitHub Marketplace

Jenkins high availability and disaster recovery on AWS

2022-06-29 James Bland

Post Syndicated from James Bland original https://aws.amazon.com/blogs/devops/jenkins-high-availability-and-disaster-recovery-on-aws/

We often hear from customers about their challenges architecting Jenkins for scale and high availability (HA). Jenkins was originally built as a continuous integration (CI) system to test software before it was committed to a repository. Since its beginning, Jenkins has grown out of necessity rather than according to a grand master plan. Developers who extended Jenkins favored speed of creating functionality over performance or scalability of the entire system. This is not to say that it’s impossible to scale Jenkins; it’s mentioned here only to highlight the challenges and technical debt that have accumulated because features were prioritized over developing towards a specific architecture. In this post, we discuss these challenges and our proposed solution.

Challenges with Jenkins at scale and HA

Business and customer demand are forcing organizations to increase the speed and agility at which they release features and functionality. As organizations make this transition, the usage of continuous integration and continuous delivery (CI/CD) increases, which drives the need to scale Jenkins. Overlay this with an organization that commits hundreds of changes per day and works around the clock, with developers dispersed globally, and you end up with an operational situation where there is no room for downtime. To mitigate the risk of impacting an organization’s ability to release when they need it, developers require a system that not only scales but is also highly available.

The ability to scale Jenkins and provide HA comes down to two problems. One is the ability to scale compute to handle additional jobs, and the second is storage. To scale compute, we typically do it in one of two ways, horizontally or vertically. Horizontally means we scale Jenkins to add additional compute nodes. Scaling vertically means we scale Jenkins by adding more resources to the compute node.

Let’s start with the storage problem. Jenkins is designed around the local file system. Anyone who has spent time around Jenkins is aware that logs, cloned repos, plugins, and build artifacts are stored into JENKINS_HOME. Local file systems, while good for single-server designs, tend to be a challenge when HA comes into the picture. In on-premises designs, administrators have often used Network File System (NFS) and Storage Area Networks (SAN) to achieve some scale and resiliency. This type of design comes with a trade-off of performance and doesn’t provide the true HA and inherent disaster recovery (DR) required to meet the demands of the business.

Because of the local file system constraint, there are two native families of storage available in AWS: Amazon Elastic Block Store (Amazon EBS) and Amazon Elastic File System (Amazon EFS). Amazon EBS is great for a single-server design in a single Availability Zone. The challenge is trying to scale a single-server design to support HA. Because of the requirement to assign an EBS volume to a specific Availability Zone, you can’t automatically transition the EBS volume to another Availability Zone and attach it to a Jenkins instance. If you don’t mind having an impact on Recovery Time Objective (RTO) and Recovery Point Objective (RPO), a solution using Amazon EBS snapshots copied to additional Availability Zones might work. Although EBS snapshot copy is possible, it’s not a recommended solution because it doesn’t scale and has complexities in building and maintaining this type of solution.

Amazon EFS as an alternative has worked well for customers that don’t have high usage patterns of Jenkins. All Jenkins instances within the Region can access the Amazon EFS file system and data durably stored in multiple Availability Zones. If a single Availability Zone experiences an outage, the Jenkins file system is still accessible from other Availability Zones providing HA for the storage layer. This solution is not recommended for high-usage systems due to the way that Jenkins reads and writes data. Jenkins’s access pattern is skewed towards writing data such as logs, cloned repos, and building artifacts versus reading data. Amazon EFS, on the other hand, is designed for workloads that read more than they write. On high-usage workloads, customers have experienced Jenkins build slowness and Jenkins page load latency. This is why Amazon EFS isn’t recommended for high-usage Jenkins systems.

Solution for Jenkins at scale and HA

Solving the compute problem is relatively straightforward by using Amazon Elastic Kubernetes Service (Amazon EKS). In the context of Jenkins, an organization would run Jenkins in an Amazon EKS cluster that spans multiple Availability Zones, as shown in the following diagram.

Diagram showing Jenkins deployment in Amazon EKS with three availability zones inside a VPC

Figure 1 – Jenkins deployment in Amazon EKS with multiple Availability Zones

Jenkins Controller and Agent would run in an Availability Zone as a Kubernetes pod. Amazon EKS is designed around Desired State Configuration (DSC), which means that it continuously makes sure that the running environment matches the configuration that has been applied to Amazon EKS. In practice, when Amazon EKS is told that you want a single pod of Jenkins running, it monitors and makes sure that pod is always running. If an Availability Zone is unavailable, Amazon EKS launches a new node in another Availability Zone and deploys all pods to meet any necessary constraints defined in Amazon EKS. With this option, we still need to have the data in other Availability Zones, which we cover later in this post.
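
The desired-state idea can be illustrated with a small Python sketch that compares what should be running with what is running and lists the corrective actions. This is a conceptual toy, not how Amazon EKS or Kubernetes is implemented; the reconcile function and the workload name are assumptions for illustration.

```python
from typing import Dict, List


def reconcile(desired: Dict[str, int], running: Dict[str, int]) -> List[str]:
    """Return the actions needed to make `running` match `desired`.

    Both arguments map a workload name to a replica count.
    """
    actions = []
    for name in sorted(set(desired) | set(running)):
        want = desired.get(name, 0)
        have = running.get(name, 0)
        if have < want:
            actions.append(f"start {want - have} x {name}")
        elif have > want:
            actions.append(f"stop {have - want} x {name}")
    return actions


if __name__ == "__main__":
    desired = {"jenkins-controller": 1}
    # Simulate the pod being lost, for example during an Availability Zone outage.
    running = {"jenkins-controller": 0}
    for action in reconcile(desired, running):
        print(action)
```

Running a comparison like this in a loop is what keeps a single Jenkins pod available: whenever the pod disappears, the next pass notices the gap and starts a replacement.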

The only option for scaling Jenkins controllers is vertical. Scaling Jenkins horizontally could lead to an undesirable state because the system wasn’t designed to have multiple instances of Jenkins attached to the same storage layer. There is no exclusive file locking mechanism to ensure data consistency. For organizations that have exhausted the limits of vertical scaling, the recommendation is to run multiple independent Jenkins controllers and separate them per team or group. Vertical scaling of Jenkins is simpler in Amazon EKS. Node sizes and container memory are controlled by configuration. Increasing memory size is as simple as changing a container’s memory setting. Due to the ease of changing configuration, it’s best to start with a lower memory setting, monitor performance, and increase as necessary. You want to find a good balance between price and performance.

For Jenkins agents, there are many options to scale the compute. In the context of scale and HA, the best options are to use AWS CodeBuild, AWS Fargate for Amazon EKS, or Amazon EKS managed node groups. With CodeBuild, you don’t need to provision, manage, or scale your build servers. CodeBuild scales continuously and processes multiple builds concurrently. You can use the Jenkins plugin for CodeBuild to integrate CodeBuild with Jenkins. Fargate is a good option but has some challenges if you’re trying to build container images within a container due to permissions necessary that aren’t exposed in Fargate. For additional information on how to overcome this challenge with Jenkins, refer to How to build container images with Amazon EKS on Fargate.

Now let’s look at the storage layer and see how LINBIT is helping organizations solve this problem with LINSTOR. LINBIT’s LINSTOR is an open-source management tool designed to manage block storage devices. Its primary use case is to provide Linux block storage for Kubernetes and other public and private cloud platforms. LINBIT also provides enterprise subscriptions for LINSTOR, which include technical support with an SLA.

The following diagram illustrates a LINSTOR storage solution running on Amazon EKS using multiple Availability Zones and Amazon Simple Storage Service (Amazon S3) for snapshots.

Diagram showing LINSTOR storage solution running on Amazon EKS across three Availability Zones with snapshots stored in Amazon S3.

Figure 2 – LINSTOR storage solution running on Amazon EKS using multiple Availability Zones and Amazon S3 for snapshots

LINSTOR is composed of a control plane and a data plane. The control plane consists of a set of containers deployed into Amazon EKS and is responsible for managing the data plane. The data plane consists of a collection of open-source block storage software, most importantly LINBIT’s Distributed Replicated Storage System (DRBD) software. DRBD is responsible for provisioning and synchronously replicating storage between Amazon EKS worker instances in different Availability Zones.

LINSTOR is deployed via Helm into Amazon EKS, and the LINSTOR cluster is initialized by the LINSTOR Operator. Once deployed, LINSTOR volumes and volume snapshots are managed via Kubernetes Storage Classes and Snapshot Classes in a Kubernetes native fashion. LINSTOR volumes are backed by LINSTOR objects known as storage pools, which are composed of one or more EBS volumes attached to each Amazon EKS worker instance.

LINSTOR volumes layer DRBD on top of the worker’s attached EBS volume to enable synchronous replication between peers in the Amazon EKS cluster. This ensures that you have an identical copy of your persistent volume on the EBS volumes in each Availability Zone. In the event of an Availability Zone outage or planned migration, Amazon EKS moves the Jenkins deployment to another Availability Zone where the persistent volume copy is available. In terms of scaling, LINBIT DRBD supports up to 32 replicas per volume, with a maximum size of 1 PiB per volume. LINSTOR itself can scale beyond hundreds of nodes, as shown in this case study.

LINSTOR also provides an HA Controller component in its control plane to speed up failover times during outages. LINSTOR’s HA Controller looks for pods with a specific label, and if LINSTOR’s persistent volumes replication network becomes interrupted (like during an Availability Zone outage), LINSTOR reschedules the pod sooner than the default Kubernetes pod-eviction-timeout.

LINBIT provides detailed, full installation instructions for Jenkins HA in AWS. A sample of LINSTOR’s Helm values supporting these features is as follows:

operator:
  satelliteSet:
    storagePools:
      lvmThinPools:
      - name: lvm-thin
        thinVolume: thinpool
        volumeGroup: ""
        devicePaths:
        - /dev/nvme1n1
    kernelModuleInjectionMode: Compile
stork:
  enabled: false
csi:
  enableTopology: true
etcd:
  replicas: 3
haController:
  replicas: 3

After LINSTOR is deployed, you create a Kubernetes StorageClass supporting persistent volumes with three replicas using the following example:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: "linstor-csi-lvm-thin-r3"
provisioner: linstor.csi.linbit.com
parameters:
  allowRemoteVolumeAccess: "false"
  autoPlace: "3"
  storagePool: "lvm-thin"
  DrbdOptions/Disk/disk-flushes: "no"
  DrbdOptions/Disk/md-flushes: "no"
  DrbdOptions/Net/max-buffers: "10000"
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer

Finally, the Jenkins Helm chart is deployed into Amazon EKS with the following Helm values to request a persistent volume (PV) from the LINSTOR StorageClass:

persistence:
  storageClass: linstor-csi-lvm-thin-r3
  size: "200Gi"
controller:
  serviceType: LoadBalancer
  podLabels:
    linstor.csi.linbit.com/on-storage-lost: remove
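Because the StorageClass sets autoPlace to "3", every replica holds a full copy of the volume. The following Python sketch estimates the upper bound on provisioned capacity for the 200Gi Jenkins volume. It is illustrative arithmetic only: it ignores DRBD metadata and the savings from LVM thin provisioning, and the function names are hypothetical.

```python
# Rough capacity estimate for a replicated LINSTOR volume (illustrative only).

def parse_gib(size: str) -> int:
    """Parse a Kubernetes-style size such as '200Gi' into GiB (only 'Gi' is handled)."""
    if not size.endswith("Gi"):
        raise ValueError(f"unsupported unit in {size!r}; this sketch only handles Gi")
    return int(size[:-2])


def replicated_footprint_gib(size: str, replicas: int) -> int:
    """Upper bound on provisioned capacity: one full copy per replica."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return parse_gib(size) * replicas


# Values taken from the Helm values (size) and the StorageClass (autoPlace) above.
footprint = replicated_footprint_gib("200Gi", replicas=3)
print(f"{footprint} GiB provisioned across all replicas")
```

With thin pools, the space actually consumed on each EBS volume is typically lower than this bound, because blocks are allocated only as they are written.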

To protect against entire AWS Region outages and provide disaster recovery, LINSTOR takes volume snapshots and replicates them across Regions using Amazon S3. LINSTOR requires read and write access to the target S3 bucket using AWS credentials provided as a Kubernetes secret:

kind: Secret
apiVersion: v1
metadata:
  name: linstor-csi-s3-access
  namespace: default
type: linstor.csi.linbit.com/s3-credentials.v1
immutable: true
stringData:
  access-key: REDACTED
  secret-key: REDACTED

The target S3 bucket is referenced as a snapshot shipping target using a LINSTOR S3 VolumeSnapshotClass. The following example shows a VolumeSnapshotClass referencing the S3 bucket’s secret and additional configuration for the target S3 bucket:

kind: VolumeSnapshotClass
apiVersion: snapshot.storage.k8s.io/v1
metadata:
  name: linstor-csi-snapshot-class-s3
driver: linstor.csi.linbit.com
deletionPolicy: Delete
parameters:
  snap.linstor.csi.linbit.com/type: S3
  snap.linstor.csi.linbit.com/remote-name: s3-us-west-2
  snap.linstor.csi.linbit.com/allow-incremental: "false"
  snap.linstor.csi.linbit.com/s3-bucket: name-of-bucket-123
  snap.linstor.csi.linbit.com/s3-endpoint: http://s3.us-west-2.amazonaws.com
  snap.linstor.csi.linbit.com/s3-signing-region: us-west-2
  snap.linstor.csi.linbit.com/s3-use-path-style: "false"
  # Secret to store access credentials
  csi.storage.k8s.io/snapshotter-secret-name: linstor-csi-s3-access
  csi.storage.k8s.io/snapshotter-secret-namespace: default

The persistent volume claim (PVC) of the Jenkins deployment is stored as a snapshot in Amazon S3 by using a standard Kubernetes VolumeSnapshot definition with LINSTOR’s snapshot class for Amazon S3:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: jenkins-dr-snapshot-0
spec:
  volumeSnapshotClassName: linstor-csi-snapshot-class-s3
  source:
    persistentVolumeClaimName: <jenkins-pvc-name>
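If you take snapshots repeatedly, the manifest above can be generated rather than edited by hand. The following is a minimal Python sketch that builds the same VolumeSnapshot structure as a dictionary. The function names and the PVC name example-jenkins-pvc are hypothetical placeholders, so substitute the name of your own Jenkins PVC.

```python
import json


def build_volume_snapshot(name: str, pvc_name: str, snapshot_class: str) -> dict:
    """Return a VolumeSnapshot manifest mirroring the YAML example above."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": name},
        "spec": {
            "volumeSnapshotClassName": snapshot_class,
            "source": {"persistentVolumeClaimName": pvc_name},
        },
    }


def numbered_snapshots(prefix: str, pvc_name: str, snapshot_class: str, count: int) -> list:
    """Build manifests named <prefix>-0, <prefix>-1, ... for successive snapshots."""
    return [
        build_volume_snapshot(f"{prefix}-{index}", pvc_name, snapshot_class)
        for index in range(count)
    ]


# 'example-jenkins-pvc' is a hypothetical PVC name used only for illustration.
manifests = numbered_snapshots(
    "jenkins-dr-snapshot", "example-jenkins-pvc", "linstor-csi-snapshot-class-s3", 2
)
print(json.dumps(manifests[0], indent=2))
```

Kubernetes accepts JSON manifests as well as YAML, so the printed output could be saved to a file and applied with kubectl.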

Conclusion

In this post, we explained the challenges of scaling Jenkins for HA and DR. We also reviewed Jenkins storage architecture with Amazon EBS and Amazon EFS and where to apply each. We demonstrated how you can use Amazon EKS to scale Jenkins compute for HA and how AWS partner solutions such as LINBIT LINSTOR can help scale Jenkins storage for HA and DR. Combining both solutions can help organizations maintain their ability to deploy software with speed and agility. We hope you found this post useful as you think through building your CI/CD infrastructure in AWS. To learn more about running Jenkins in Amazon EKS, check out Orchestrate Jenkins Workloads using Dynamic Pod Autoscaling with Amazon EKS. To find out more about LINBIT’s LINSTOR, check the Jenkins technical guide.


Integrating with GitHub Actions – CI/CD pipeline to deploy a Web App to Amazon EC2

2022-03-29 Mahesh Biradar

Post Syndicated from Mahesh Biradar original https://aws.amazon.com/blogs/devops/integrating-with-github-actions-ci-cd-pipeline-to-deploy-a-web-app-to-amazon-ec2/

Many organizations adopt DevOps practices to innovate faster by automating and streamlining their software development and infrastructure management processes. Beyond cultural adoption, DevOps also suggests following certain best practices, and Continuous Integration and Continuous Delivery (CI/CD) is among the most important ones to start with. CI/CD reduces the time it takes to release new software updates by automating deployment activities. Many tools are available to implement this practice. Although AWS has a set of native tools to help achieve your CI/CD goals, it also offers flexibility and extensibility for integrating with numerous third-party tools.

In this post, you will use GitHub Actions to create a CI/CD workflow and AWS CodeDeploy to deploy a sample Java SpringBoot application to Amazon Elastic Compute Cloud (Amazon EC2) instances in an Auto Scaling group.

GitHub Actions is a feature on GitHub’s popular development platform that helps you automate your software development workflows in the same place that you store code and collaborate on pull requests and issues. You can write individual tasks called actions, and then combine them to create a custom workflow. Workflows are custom automated processes that you can set up in your repository to build, test, package, release, or deploy any code project on GitHub.

AWS CodeDeploy is a deployment service that automates application deployments to Amazon EC2 instances, on-premises instances, serverless AWS Lambda functions, or Amazon Elastic Container Service (Amazon ECS) services.

Solution Overview

The solution utilizes the following services:

GitHub Actions – Workflow orchestration tool that hosts the pipeline.
AWS CodeDeploy – AWS service that manages deployment to the Amazon EC2 Auto Scaling group.
AWS Auto Scaling – AWS service that helps maintain application availability and elasticity by automatically adding or removing Amazon EC2 instances.
Amazon EC2 – Destination compute server for the application deployment.
AWS CloudFormation – AWS infrastructure as code (IaC) service used to spin up the initial infrastructure on the AWS side.
IAM OIDC identity provider – Federated authentication service that establishes trust between GitHub and AWS, allowing GitHub Actions to deploy on AWS without maintaining AWS secrets and credentials.
Amazon Simple Storage Service (Amazon S3) – Object storage for the deployment artifacts.

The following diagram illustrates the architecture for the solution:

Architecture Diagram

A developer commits code changes from their local repo to the GitHub repository. In this post, the GitHub action is triggered manually, but this can be automated.
The GitHub action triggers the build stage.
GitHub’s OpenID Connect (OIDC) provider issues tokens that are used to authenticate to AWS and access resources.
The GitHub action uploads the deployment artifacts to Amazon S3.
The GitHub action invokes CodeDeploy.
CodeDeploy triggers the deployment to Amazon EC2 instances in an Auto Scaling group.
CodeDeploy downloads the artifacts from Amazon S3 and deploys them to the Amazon EC2 instances.

Prerequisites

Before you begin, you must complete the following prerequisites:

An AWS account with permissions to create the necessary resources.
A GitHub account with permissions to configure GitHub repositories, create workflows, and configure GitHub secrets.
A Git client to clone the provided source code.

Steps

The following steps provide a high-level overview of the walkthrough:

Clone the project from the AWS code samples repository.
Deploy the AWS CloudFormation template to create the required services.
Update the source code.
Set up GitHub secrets.
Integrate CodeDeploy with GitHub.
Trigger the GitHub Action to build and deploy the code.
Verify the deployment.

Download the source code

Clone the source code repository aws-codedeploy-github-actions-deployment.

git clone https://github.com/aws-samples/aws-codedeploy-github-actions-deployment.git

Create an empty repository in your personal GitHub account. To create a GitHub repository, see Create a repo. Clone this repo to your computer. You can ignore the warning about cloning an empty repository.

git clone https://github.com/<username>/<repoName>.git

Figure 2: GitHub clone

Copy the code. We need contents from the hidden .github folder for the GitHub actions to work.

cp -r aws-codedeploy-github-actions-deployment/. <new repository>

Replace <new repository> with the folder of your new repository, e.g. GitActionsDeploytoAWS.

Now you should have the following folder structure in your local repository.

Repository folder structure

The .github folder contains actions defined in the YAML file.
The aws/scripts folder contains code to run at the different deployment lifecycle events.
The cloudformation folder contains the template.yaml file to create the required AWS resources.
Spring-boot-hello-world-example is a sample application used by GitHub actions to build and deploy.
The root of the repo contains appspec.yml. CodeDeploy requires this file to perform the deployment on Amazon EC2. Find more details here.

The following commands will help make sure that your remote repository points to your personal GitHub repository.

git remote remove origin

git remote add origin <your repository url>

git branch -M main

git push -u origin main

Deploy the CloudFormation template

To deploy the CloudFormation template, complete the following steps:

Open the AWS CloudFormation console. Enter your account ID, user name, and password.
Check your Region, as this solution uses us-east-1.
If you are new to AWS CloudFormation in this account, select Create New Stack. Otherwise, select Create Stack.
Select Template is Ready.
Select Upload a template file.
Select Choose File. Navigate to the template.yaml file in your cloned repository at “aws-codedeploy-github-actions-deployment/cloudformation/template.yaml”.
Select the template.yaml file, and select Next.
In Specify Stack Details, add or modify the values as needed.

- Stack name = CodeDeployStack.
- VPC and Subnets = These are pre-populated for you. You can change these values if you prefer to use your own subnets.
- GitHubThumbprintList = 6938fd4d98bab03faadb97b34396831e3780aea1
- GitHubRepoName = Name of the personal GitHub repository that you created.

On the Options page, select Next.
Select the acknowledgement box to allow for the creation of IAM resources, and then select Create. It will take CloudFormation approximately 10 minutes to create all of the resources. This stack creates the following resources:

- Two Amazon EC2 Linux instances with Tomcat server and the CodeDeploy agent installed
- Auto Scaling group with an internet-facing Application Load Balancer
- CodeDeploy application name and deployment group
- Amazon S3 bucket to store build artifacts
- Identity and Access Management (IAM) OIDC identity provider
- Instance profile for Amazon EC2
- Service role for CodeDeploy
- Security groups for ALB and Amazon EC2

Update the source code

On the AWS CloudFormation console, select the Outputs tab. Note the Amazon S3 bucket name and the ARN of the GitHub IAM role. You will use these in the next steps.

Update the Amazon S3 bucket in the workflow file deploy.yml. Navigate to /.github/workflows/deploy.yml from your Project root directory.

Replace ##s3-bucket## with the name of the Amazon S3 bucket created previously.

Replace ##region## with your AWS Region.

Update the Amazon S3 bucket name in after-install.sh. Navigate to aws/scripts/after-install.sh. This script copies the deployment artifact from the Amazon S3 bucket to the Tomcat webapps folder.
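The placeholder replacement described above can also be scripted. The following Python sketch substitutes ##name## style placeholders in a string and fails loudly if a placeholder has no value. The sample text and the bucket name example-artifact-bucket are hypothetical and do not reproduce the actual contents of deploy.yml or after-install.sh.

```python
import re

# Matches placeholders written as ##name##, such as ##s3-bucket## and ##region##.
PLACEHOLDER = re.compile(r"##([A-Za-z0-9-]+)##")


def fill_placeholders(text: str, values: dict) -> str:
    """Replace every ##name## placeholder; raise if a value is missing."""

    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value supplied for placeholder ##{key}##")
        return values[key]

    return PLACEHOLDER.sub(substitute, text)


# Hypothetical sample text; the real workflow file is laid out differently.
sample = "S3BUCKET: ##s3-bucket##\nAWS_REGION: ##region##\n"
filled = fill_placeholders(
    sample, {"s3-bucket": "example-artifact-bucket", "region": "us-east-1"}
)
print(filled)
```

To apply this to a file, read its contents, pass them through fill_placeholders, and write the result back before committing.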

Remember to save all of the files and push the code to your GitHub repo.

Verify that you’re in your git repository folder by running the following command:

git remote -v

You should see your remote repository address, which is similar to the following:

username@3c22fb075f8a GitActionsDeploytoAWS % git remote -v

origin git@github.com:<username>/GitActionsDeploytoAWS.git (fetch)

origin git@github.com:<username>/GitActionsDeploytoAWS.git (push)

Now run the following commands to push your changes:

git add .

git commit -m "Initial commit"

git push

Set up GitHub Secrets

The GitHub Actions workflows must access resources in your AWS account. Here we use an IAM OpenID Connect (OIDC) identity provider and an IAM role with IAM policies to access CodeDeploy and the Amazon S3 bucket. OIDC lets your GitHub Actions workflows access resources in AWS without needing to store AWS credentials as long-lived GitHub secrets.
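To make the trust relationship concrete, the following Python sketch builds the kind of IAM trust policy commonly used for GitHub OIDC federation. It is an assumption about the general shape, not a copy of the policy in the sample CloudFormation template. The account ID 111122223333 and the repository example-user/GitActionsDeploytoAWS are hypothetical.

```python
import json

# Host name of GitHub's OIDC token issuer.
OIDC_HOST = "token.actions.githubusercontent.com"


def github_oidc_trust_policy(account_id: str, repository: str) -> dict:
    """Trust policy allowing workflows from one GitHub repository to assume a role."""
    if repository.count("/") != 1:
        raise ValueError("repository must be in the form owner/name")
    provider_arn = f"arn:aws:iam::{account_id}:oidc-provider/{OIDC_HOST}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": provider_arn},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    # The token must be issued for AWS STS ...
                    "StringEquals": {f"{OIDC_HOST}:aud": "sts.amazonaws.com"},
                    # ... and must come from the named repository (any branch or tag).
                    "StringLike": {f"{OIDC_HOST}:sub": f"repo:{repository}:*"},
                },
            }
        ],
    }


# Hypothetical account ID and repository, for illustration only.
policy = github_oidc_trust_policy("111122223333", "example-user/GitActionsDeploytoAWS")
print(json.dumps(policy, indent=2))
```

Restricting the sub condition to a single repository keeps workflows in other repositories from assuming the role.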

Only the ARN of the IAM role is stored as a GitHub secret within your GitHub repository, under Settings > Secrets. For more information, see “GitHub Actions secrets”.

Navigate to your GitHub repository. Select the Settings tab.
Select Secrets on the left menu bar.
Select Actions under Secrets.
Select New repository secret.
- Enter the secret name as ‘IAMROLE_GITHUB’.
- Enter the value as the ARN of GitHubIAMRole, which you copied from the CloudFormation output section.

Integrate CodeDeploy with GitHub

For CodeDeploy to be able to perform deployment steps using scripts in your repository, it must be integrated with GitHub.

The CodeDeploy application and deployment group are already created for you. Use them in the next step:

CodeDeploy Application = CodeDeployAppNameWithASG

Deployment group = CodeDeployGroupName

To link a GitHub account to an application in CodeDeploy, follow the instructions on this page through step 10.

You can cancel the process after completing step 10. You don’t need to create a deployment.

Trigger the GitHub Actions Workflow

You now have the required AWS resources and have configured GitHub to build and deploy the code to Amazon EC2 instances.

The workflow defined in GITHUBREPO/.github/workflows/deploy.yml builds and deploys the code. It is currently set up to be run manually.

Complete the following steps to run it manually.

Go to your GitHub repo and select the Actions tab.

Select the Build and Deploy link, and select Run workflow as shown in the following image.

After a few seconds, the workflow will be displayed. Then, select Build and Deploy.

You will see two stages:

Build and Package.
Deploy.

Build and Package

The Build and Package stage builds the sample SpringBoot application, generates the war file, and then uploads it to the Amazon S3 bucket.

You should be able to see the war file in the Amazon S3 bucket.

Deploy

In this stage, the workflow invokes the CodeDeploy service and triggers the deployment.
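The following Python sketch shows what such an invocation could look like as a request to the CodeDeploy CreateDeployment API. It assumes a GitHub revision, because the walkthrough links CodeDeploy to GitHub. The actual workflow in the sample repository may invoke CodeDeploy differently. The repository name and commit ID are hypothetical.

```python
def github_deployment_request(
    application: str, deployment_group: str, repository: str, commit_id: str
) -> dict:
    """Build keyword arguments for the CodeDeploy create_deployment call."""
    if repository.count("/") != 1:
        raise ValueError("repository must be in the form owner/name")
    return {
        "applicationName": application,
        "deploymentGroupName": deployment_group,
        "revision": {
            "revisionType": "GitHub",
            "gitHubLocation": {"repository": repository, "commitId": commit_id},
        },
    }


# Application and group names come from the post; the rest is hypothetical.
request = github_deployment_request(
    "CodeDeployAppNameWithASG",
    "CodeDeployGroupName",
    "example-user/GitActionsDeploytoAWS",
    "0123456789abcdef0123456789abcdef01234567",
)

# With boto3 installed and AWS credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("codedeploy", region_name="us-east-1").create_deployment(**request)
print(request["revision"]["revisionType"])
```

In a GitHub Actions run, the repository and commit ID would normally come from the GITHUB_REPOSITORY and GITHUB_SHA environment variables that GitHub sets for each job.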

Verify the deployment

On the CodeDeploy console, select the application name and deployment group. You will see the status as Succeeded if the deployment is successful.

Point your browser to the URL of the Application Load Balancer.

Note: You can get the URL from the output section of the CloudFormation stack or from the Load Balancers page of the Amazon EC2 console.

Optional – Automate the deployment on Git Push

The workflow can be automated by changing the following line of code in your .github/workflows/deploy.yml file.

From

workflow_dispatch: {}

To

  #workflow_dispatch: {}
  push:
    branches: [ main ]
  pull_request:

GitHub Actions interprets this to automatically run the workflow on every push to the main branch and on every pull request.

After testing the end-to-end flow manually, you can enable the automated deployment.

Clean up

To avoid incurring future charges, you should clean up the resources that you created.

Empty the Amazon S3 bucket.
Delete the CloudFormation stack (CodeDeployStack) from the AWS console.
Delete the GitHub secret (‘IAMROLE_GITHUB’):
1. Go to the repository settings on GitHub.
2. Select Actions under Secrets.
3. Select IAMROLE_GITHUB, and delete it.

Conclusion

In this post, you saw how to use GitHub Actions and CodeDeploy to securely deploy a Java SpringBoot application to Amazon EC2 instances in an Auto Scaling group. You can add further stages to your pipeline, such as testing and security scanning.

Additionally, this solution can be used for other programming languages.

About the Authors

Mahesh Biradar is a Solutions Architect at AWS. He is a DevOps enthusiast and enjoys helping customers implement cost-effective architectures that scale.
Suresh Moolya is a Cloud Application Architect with Amazon Web Services. He works with customers to architect, design, and automate business software at scale on AWS cloud.

Parallel and dynamic SaaS deployments with AWS CDK Pipelines

2021-11-01 Jani Muuriaisniemi

Post Syndicated from Jani Muuriaisniemi original https://aws.amazon.com/blogs/devops/parallel-and-dynamic-saas-deployments-with-cdk-pipelines/

Software as a Service (SaaS) is an increasingly popular business model for independent software vendors (ISVs), offering benefits such as a pay-as-you-go pricing model, scalability, and availability.

SaaS services can be built by using numerous architectural models. The silo model provides each tenant with dedicated resources and a shared-nothing architecture. Silo deployments also provide isolation between tenants’ compute resources and their data, and they help eliminate the noisy-neighbor problem. On the other hand, the pool model offers several benefits, such as lower maintenance overhead, simplified management and operations, and cost-saving opportunities, all due to a more efficient utilization of computing resources and capacity. In the bridge model, both silo and pool models are utilized side-by-side. The bridge model is a hybrid model, where parts of the system can be in a silo model, and parts in a pool.

End-customers benefit from SaaS delivery in numerous ways. For example, the service can be available from multiple locations, letting the customer choose what is best for them. The tenant onboarding process is often real-time and frictionless. To realize these benefits for their end-customers, SaaS providers need methods for reliable, fast, and multi-region capable provisioning and software lifecycle management.

This post describes a deployment system for automating the provisioning and lifecycle management of workload components in pool or silo deployment models by using AWS Cloud Development Kit (AWS CDK) and CDK Pipelines. We will explore the system’s dynamic and database-driven deployment model, as well as its multi-account and multi-region capabilities, and we will provision demo deployments of workload components in both the silo and pool models.

AWS Cloud Development Kit and CDK Pipelines

For this solution, we utilized AWS Cloud Development Kit (AWS CDK) and its CDK Pipelines construct library. AWS CDK is an open-source software development framework for modeling and provisioning cloud application resources by using familiar programming languages. AWS CDK lets you define your infrastructure as code and provision it through AWS CloudFormation.

CDK Pipelines is a high-level construct library with an opinionated implementation of a continuous deployment pipeline for your CDK applications. It is powered by AWS CodePipeline, a fully managed continuous delivery service that helps automate your release pipelines for fast and reliable application as well as infrastructure updates. No servers need to be provisioned or set up, and you only pay for what you use. This solution utilizes the recently released and stable CDK Pipelines modern API.

Business Scenario

As a baseline use case, we consider a fictitious ISV called Unicorn that wants to implement a SaaS business model.

Unicorn operates in several countries, and must store customer data within each customer’s chosen Region. Currently, Unicorn needs two Regions in order to satisfy its main customer base: one in the EU and one in the US. Unicorn expects rapid growth, and it needs a solution that can scale to thousands of tenants. Unicorn plans to have different tenant tiers with different isolation requirements. Their planned deployment model has the majority of tenants in shared pool instances, but they also plan to support dedicated silo instances for the tenants requiring it. The solution must also be easily extendable to new Regions as Unicorn’s business expands.

Unicorn is starting small, with a single development team responsible for what is currently the only component in their SaaS workload architecture. Following industry best practices, Unicorn has designed its workload architecture so that each component has a clear technical ownership boundary. The chosen solution must grow together with Unicorn, and support multiple independently developed and deployed components in the future.

Solution Overview

Today, many customers utilize AWS CodePipeline to build, test, and deploy their cloud applications. For a SaaS provider such as Unicorn, using a single pipeline to manage every deployment raises concerns. At the scale that Unicorn requires, a single pipeline with potentially hundreds of actions runs the risk of becoming throughput limited. Moreover, a single pipeline would offer Unicorn limited control over how changes are released.

Our solution addresses this problem by having a separate, dynamically provisioned pipeline for each pool and silo deployment. The solution is designed to manage multiple deployments of Unicorn’s single workload component, thereby aligning with their current needs and, with small changes, their future needs.

CDK Best Practices state that an AWS CDK application maps to a component as defined by the AWS Well-Architected Framework. A component is the code, configuration, and AWS resources that together deliver against a workload requirement, and it is typically the unit of technical ownership. A component usually includes logical units (e.g., api, database), and can have a continuous deployment pipeline.

Utilizing CDK Pipelines provides a significant benefit: with no additional code, we can deploy cross-account and cross-region just as easily as we would to a single account and region. CDK Pipelines automatically creates and manages the required cross-account encryption keys and cross-region replication buckets. Furthermore, we only need to establish a trust relationship between the accounts during the CDK bootstrapping process.

The following diagram illustrates the solution architecture:

Solution Architecture Diagram

Figure 1: Solution architecture

Let’s look closer at the two primary high level solution flows: silo and pool pipeline provisioning (1 and 2), and component code deployment (3 and 4).

Provisioning is separated into a dedicated flow, so that code deployments do not interfere with tenant onboarding, and vice versa. At the heart of the provisioning flow is the deployment database (1), which is implemented by using an Amazon DynamoDB table.

Utilizing DynamoDB Streams and AWS Lambda Triggers, a new AWS CodeBuild provisioning project build (2) is automatically started after a record is inserted into the deployment database. The provisioning project directly provisions new silo and pool pipelines by using the “cdk deploy” command. Provisioning events are processed in parallel, so that the solution can handle possible bursts in Unicorn’s tenant onboarding volumes.
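The trigger logic described above can be sketched as follows. The solution itself is implemented in TypeScript, so this Python version is only an illustration of the idea. The project name example-provisioning-project and the DEPLOYMENT_* variable names are hypothetical. The sketch also assumes that the stream is configured to include the new item image.

```python
def build_requests(event: dict, project_name: str) -> list:
    """Turn DynamoDB Streams INSERT records into CodeBuild start_build arguments."""
    requests = []
    for record in event.get("Records", []):
        # Only newly inserted deployment records should provision a pipeline.
        if record.get("eventName") != "INSERT":
            continue
        image = record["dynamodb"]["NewImage"]
        overrides = [
            {
                "name": f"DEPLOYMENT_{key.upper()}",  # hypothetical variable names
                "value": image[key]["S"],
                "type": "PLAINTEXT",
            }
            for key in ("id", "type", "account", "region")
        ]
        requests.append(
            {"projectName": project_name, "environmentVariablesOverride": overrides}
        )
    return requests


# A hand-written sample event with one insert and one modification.
sample_event = {
    "Records": [
        {
            "eventName": "INSERT",
            "dynamodb": {
                "NewImage": {
                    "id": {"S": "silo1"},
                    "type": {"S": "silo"},
                    "account": {"S": "111122223333"},
                    "region": {"S": "us-east-1"},
                }
            },
        },
        {"eventName": "MODIFY", "dynamodb": {"NewImage": {}}},
    ]
}

builds = build_requests(sample_event, "example-provisioning-project")
# In a Lambda function, each entry could be passed to
# boto3.client("codebuild").start_build(**entry).
print(len(builds))
```

Because each inserted record produces its own build request, several tenants onboarded at once result in builds that run in parallel.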

CDK best practices suggest that infrastructure and runtime code live in the same package. A single AWS CodeCommit repository (3) contains everything needed: the CI/CD pipeline definitions as well as the workload component code. This repository is the source artifact for every CodePipeline pipeline and CodeBuild project. The chapter “Managing application resources as code” describes related implementation details.

The CI/CD pipeline (4) is a CDK Pipelines pipeline, and it is responsible for the component’s Software Development Life Cycle (SDLC) activities. In addition to implementing the update release process, it is expected that most SaaS providers will also implement additional activities. This includes a variety of tests and pre-production environment deployments. The chapter “Controlling deployment updates” dives deeper into this topic.

Deployments have two parts: the pipeline (5) and the component resource stack(s) (6) that it manages. The pipelines are deployed to the central toolchain account and region, whereas the component resources are deployed to the AWS Account and Region specified in the deployment’s record in the deployment database.

Sample code for the solution is available in GitHub. The sample code is intended for utilization in conjunction with this post. Our solution is implemented in TypeScript.

Deployment Database

Our deployment database is an Amazon DynamoDB table, with the following structure:

Table structure explained in post.

Figure 2: DynamoDB table

‘id’ is a unique identifier for each deployment.
‘account’ is the AWS account ID for the component resources.
‘region’ is the AWS region ID for the component resources.
‘type’ is either ‘silo’ or ‘pool’, which defines the deployment model.

This design supports placing tenants across multiple silo and pool deployments. Each of these can target any available and bootstrapped AWS Account and Region. For example, different pools can support tenants in different regions, with select tenants deployed to dedicated silos. Because a pool may be limited in how many tenants it can serve, the design also supports having multiple pools within a region, and it can easily be extended with an additional attribute to support the tiers concept.

Note that the deployment database does not contain tenant information. It is expected that such mapping is maintained in a separate tenant database, where each tenant record can map to the ID of the deployment that it is associated with.
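As an illustration of how a separate tenant database could use these records, the following Python sketch picks a pool deployment with spare capacity in a given region. The Deployment class mirrors the four table attributes. The tenant counts, the capacity limit, and the pick_pool function are hypothetical and are not part of the sample code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Deployment:
    """One record of the deployment database."""

    id: str
    account: str
    region: str
    type: str  # either 'silo' or 'pool'


def pick_pool(
    deployments: List[Deployment],
    tenant_counts: Dict[str, int],
    region: str,
    capacity: int,
) -> Optional[Deployment]:
    """Return the first pool in the region with spare capacity, or None."""
    for deployment in deployments:
        if (
            deployment.type == "pool"
            and deployment.region == region
            and tenant_counts.get(deployment.id, 0) < capacity
        ):
            return deployment
    return None


# Hypothetical records and tenant counts, for illustration only.
records = [
    Deployment("silo1", "111122223333", "us-east-1", "silo"),
    Deployment("pool1", "111122223333", "us-east-1", "pool"),
    Deployment("pool2", "111122223333", "eu-west-1", "pool"),
]
counts = {"pool1": 100, "pool2": 12}

chosen = pick_pool(records, counts, region="eu-west-1", capacity=100)
print(chosen.id if chosen else "no pool available; insert a new pool record")
```

When no pool has capacity, as for us-east-1 in this example, a new pool record would be inserted into the deployment database so that the provisioning flow creates it.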

Now that we have looked at our solution design and architecture, let’s move to the hands-on section, starting with the deployment requirements for the solution.

Prerequisites

The following tools are required to deploy the solution:

NodeJS version compatible with AWS CDK version 1.124.0
The AWS Command Line Interface (AWS CLI)
Git with git-remote-codecommit extension

To follow this tutorial completely, you should have administrator access to at least one, but preferably two AWS accounts:

Toolchain: Account for the SDLC toolchain: the pipelines, the provisioning project, the repository, and the deployment database.
Workload (optional): Account for the component resources.

If you have only a single account, then the toolchain account can be used for both purposes. Credentials for the account(s) are assumed to be configured in AWS CLI profile(s).

The instructions in this post use the following placeholders, which you must replace with your specific values:

<TOOLCHAIN_ACCOUNT_ID>: The AWS Account ID for the toolchain account
<TOOLCHAIN_PROFILE_NAME>: The AWS CLI profile name for the toolchain account credentials
<WORKLOAD_ACCOUNT_ID>: The AWS Account ID for the workload account
<WORKLOAD_PROFILE_NAME>: The AWS CLI profile name for the workload account credentials

Bootstrapping

The toolchain account, and all workload account(s), must be bootstrapped prior to first-time deployment.

AWS CDK and our solution’s dependencies must be installed to start with. The easiest way to do this is to install them locally with npm. First, we need to download our sample code, so that we have the package.json configuration file available for npm.

Note that throughout these instructions, many commands are broken over multiple lines for readability. Take care to execute the commands completely. It is always safe to execute each code block as a whole.

Clone the sample code repository from GitHub, and then install the dependencies by using npm:

git clone https://github.com/aws-samples/aws-saas-parallel-deployments
cd aws-saas-parallel-deployments
npm ci

CDK Pipelines requires use of modern bootstrapping. To ensure that this is enabled, start by setting the related environment variable:

export CDK_NEW_BOOTSTRAP=1

Then, bootstrap the toolchain account. You must bootstrap both the region where the toolchain stack is deployed, as well as every target region for component resources. Here, we will first bootstrap only the us-east-1 region, and later you can optionally bootstrap additional region(s).

To bootstrap, we use npx to execute the locally installed version of AWS CDK:

npx cdk bootstrap <TOOLCHAIN_ACCOUNT_ID>/us-east-1 --profile <TOOLCHAIN_PROFILE_NAME>

If you have a workload account that is separate from the toolchain account, then that account must also be bootstrapped. When bootstrapping the workload account, we will establish a trust relationship with the toolchain account. Skip this step if you don’t have a separate workload account.

The workload account bootstrapping follows the security best practice of least privilege. First, create an execution policy with the minimum permissions required to deploy our demo component resources. We provide a sample policy file in the solution repository for this purpose. Then, use that policy as the execution policy for the trust relationship between the toolchain account and the workload account:

aws iam create-policy \
  --profile <WORKLOAD_PROFILE_NAME> \
  --policy-name CDK-Exec-Policy \
  --policy-document file://policies/workload-cdk-exec-policy.json
npx cdk bootstrap <WORKLOAD_ACCOUNT_ID>/us-east-1 \
  --profile <WORKLOAD_PROFILE_NAME> \
  --trust <TOOLCHAIN_ACCOUNT_ID> \
  --cloudformation-execution-policies arn:aws:iam::<WORKLOAD_ACCOUNT_ID>:policy/CDK-Exec-Policy
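When several regions must be bootstrapped, the commands above can be generated instead of typed. The following Python sketch assembles the same cdk bootstrap command lines using only the flags shown in this post. The account IDs and profile names are hypothetical placeholders.

```python
import shlex
from typing import Optional


def bootstrap_command(
    account_id: str,
    region: str,
    profile: str,
    trust_account: Optional[str] = None,
    exec_policy_arn: Optional[str] = None,
) -> str:
    """Assemble a 'cdk bootstrap' command line for one account and region."""
    args = ["npx", "cdk", "bootstrap", f"{account_id}/{region}", "--profile", profile]
    if trust_account:
        args += ["--trust", trust_account]
    if exec_policy_arn:
        args += ["--cloudformation-execution-policies", exec_policy_arn]
    return shlex.join(args)


# Hypothetical account IDs and profile names, for illustration only.
TOOLCHAIN, WORKLOAD = "111122223333", "444455556666"
POLICY = f"arn:aws:iam::{WORKLOAD}:policy/CDK-Exec-Policy"

commands = []
for region in ("us-east-1", "eu-west-1"):
    commands.append(bootstrap_command(TOOLCHAIN, region, "toolchain"))
    commands.append(
        bootstrap_command(WORKLOAD, region, "workload", TOOLCHAIN, POLICY)
    )

print("\n".join(commands))
```

The sketch only prints the commands. Review them before running them in a shell where CDK_NEW_BOOTSTRAP=1 is set.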

Toolchain deployment

Prior to being able to deploy for the first time, you must create an AWS CodeCommit repository for the solution. Create this repository in the toolchain account:

aws codecommit create-repository \
  --profile <TOOLCHAIN_PROFILE_NAME> \
  --region us-east-1 \
  --repository-name unicorn-repository

Next, you must push the contents to the CodeCommit repository. For this, use the git command together with the git-remote-codecommit extension in order to authenticate to the repository with your AWS CLI credentials. Our pipelines are configured to use the main branch.

git remote add unicorn codecommit::us-east-1://<TOOLCHAIN_PROFILE_NAME>@unicorn-repository
git push unicorn main

Now we are ready to deploy the toolchain stack:

export AWS_REGION=us-east-1
npx cdk deploy --profile <TOOLCHAIN_PROFILE_NAME>

Workload deployments

At this point, our CI/CD pipeline, provisioning project, and deployment database have been created. The database is initially empty.

Note that the DynamoDB command line interface demonstrated below is not intended to be the SaaS provider’s provisioning interface for production use. SaaS providers typically have online registration portals, wherein the customer signs up for the service. When new deployments are needed, a record should automatically be inserted into the solution’s deployment database.

To demonstrate the solution’s capabilities, first we will provision two deployments, with an optional third cross-region deployment:

A silo deployment (silo1) in the us-east-1 region.
A pool deployment (pool1) in the us-east-1 region.
A pool deployment (pool2) in the eu-west-1 region (optional).

To start, configure the AWS CLI environment variables:

export AWS_REGION=us-east-1
export AWS_PROFILE=<TOOLCHAIN_PROFILE_NAME>

Add the deployment database records for the first two deployments:

aws dynamodb put-item \
  --table-name unicorn-deployments \
  --item '{
    "id": {"S":"silo1"},
    "type": {"S":"silo"},
    "account": {"S":"<WORKLOAD_ACCOUNT_ID>"},
    "region": {"S":"us-east-1"}
  }'
aws dynamodb put-item \
  --table-name unicorn-deployments \
  --item '{
    "id": {"S":"pool1"},
    "type": {"S":"pool"},
    "account": {"S":"<WORKLOAD_ACCOUNT_ID>"},
    "region": {"S":"us-east-1"}
  }'
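A registration portal would insert such records programmatically. As a minimal sketch of that step, the following Python function builds the same DynamoDB item structure as the commands above and rejects unknown deployment types. The function name and the account ID 111122223333 are hypothetical.

```python
import json


def deployment_item(
    deployment_id: str, deployment_type: str, account: str, region: str
) -> dict:
    """Build a deployment record in DynamoDB attribute-value format."""
    if deployment_type not in ("silo", "pool"):
        raise ValueError("type must be either 'silo' or 'pool'")
    return {
        "id": {"S": deployment_id},
        "type": {"S": deployment_type},
        "account": {"S": account},
        "region": {"S": region},
    }


# Hypothetical account ID, for illustration only.
item = deployment_item("pool1", "pool", "111122223333", "us-east-1")

# With boto3 installed and credentials configured, the item could be written as:
#   import boto3
#   boto3.client("dynamodb").put_item(TableName="unicorn-deployments", Item=item)
print(json.dumps(item))
```

The printed JSON has the same shape as the value passed to the --item option of the put-item commands above.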

This will trigger two parallel builds of the provisioning CodeBuild project. Use the CodeBuild Console in order to observe the status and progress of each build.

Cross-region deployment (optional)

Optionally, also try a cross-region deployment. Skip this part if a cross-region deployment is not relevant for your use case.

First, you must bootstrap the target region in the toolchain and the workload accounts. Bootstrapping eu-west-1 here is identical to the bootstrapping of the us-east-1 region earlier. Start with the toolchain account:

npx cdk bootstrap <TOOLCHAIN_ACCOUNT_ID>/eu-west-1 --profile <TOOLCHAIN_PROFILE_NAME>

If you have a separate workload account, then you must also bootstrap it for the new region. Again, skip this if you have only a single account:

npx cdk bootstrap <WORKLOAD_ACCOUNT_ID>/eu-west-1 \
  --profile <WORKLOAD_PROFILE_NAME> \
  --trust <TOOLCHAIN_ACCOUNT_ID> \
  --cloudformation-execution-policies arn:aws:iam::<WORKLOAD_ACCOUNT_ID>:policy/CDK-Exec-Policy

Then, add the cross-region deployment:

aws dynamodb put-item \
  --table-name unicorn-deployments \
  --item '{
    "id": {"S":"pool2"},
    "type": {"S":"pool"},
    "account": {"S":"<WORKLOAD_ACCOUNT_ID>"},
    "region": {"S":"eu-west-1"}
  }'

Validation of deployments

After the builds have completed, use the CodePipeline console to verify that the deployment pipelines were successfully created in the toolchain account:

CodePipeline console showing Pool-pool2-pipeline, Pool-pool1-pipeline and Silo-silo1-pipeline all Succeeded most recent execution.

Figure 3: CodePipeline console

Similarly, in the workload account, stacks containing your component resources will have been deployed to each configured region. In this demo, we deploy a single “hello world” container application that uses AWS App Runner as the runtime environment. You can verify a successful deployment in the CloudFormation console:

Console showing Pool-pool1-resources with status of CREATE_COMPLETE

Figure 4: CloudFormation console

Now that we have successfully finished with our demo deployments, let’s look at how updates to the pipelines and the component resources can be managed.

Managing application resources as code

As highlighted earlier in the Solution Overview, every aspect of our solution shares a single source repository. With all of our code in a single source, we can easily deliver complex changes impacting multiple aspects of our solution. And all of this can be packaged, tested, and released as a single change set. For example, a change can introduce a new stage to the CI/CD pipeline, modify an existing stage in the silo and pool pipelines, and/or make code and resource changes to the component resources.

Managing the pipeline definitions is made simple by the self-mutate capability of the CDK Pipelines. Once initially deployed, each CDK Pipelines pipeline can update its own definition. This is implemented by using a separate SelfMutate stage in the pipeline definition. This stage is executed before any deployment actions, thereby ensuring that the pipeline always executes the latest version that is defined by the source code.

Managing how and when the pipelines are triggered also requires attention. By default, CDK Pipelines configures pipelines to start on events from the source repository. While this is a reasonable default, and it is great for the CI/CD pipeline, it is undesirable for our silo and pool pipelines. If all of these pipelines executed automatically on every commit to the source repository, the CI/CD pipeline could not manage the release flow. To address this, we have set the trigger in the CodeCommitSourceOptions of the silo and pool pipelines to NONE.

Controlling deployment updates

A key aspect of SaaS delivery is controlling how you roll out changes to tenants. Significant business risk can arise if changes are released to all tenants all-at-once in a single big bang.

This risk can be managed by utilizing a combination of silo and pool deployments. Reduce your risk by spreading tenants into multiple pools, and gradually rolling out your changes to these pools. Based on business needs and/or risk assessment, select customers can be provisioned into dedicated silo deployments, thereby allowing update control for those customers separately. Note that while all of a pool’s tenants get the same underlying update simultaneously, you can utilize feature flags to selectively enable new features only for specific tenants in the deployment.

In the demo solution, the CI/CD pipeline contains only a single custom stage “UpdateDeployments”. This CodeBuild action implements a simple “one-at-a-time” strategy. The code has been purposely written so that it is simple and provides you with a starting point to implement your own more complex strategy, as based on your unique business needs. In the default implementation, every silo and pool pipeline tracks the same “main” branch of the repository. Releases are governed by controlling when each pipeline executes to update its resources.
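
To illustrate the “one-at-a-time” idea, here is a minimal Python sketch. It is not the demo's actual UpdateDeployments code. `run_pipeline` is a hypothetical callable that starts one pipeline, waits for it to finish, and returns True on success; in a real toolchain it would wrap the relevant CodePipeline API calls.

```python
from typing import Callable, Iterable, List, Tuple


def update_one_at_a_time(
    pipelines: Iterable[str],
    run_pipeline: Callable[[str], bool],
) -> Tuple[List[str], List[str]]:
    """Run pipelines sequentially and stop at the first failure.

    Returns (updated, not_updated). The failed pipeline and every pipeline
    after it are reported in not_updated.
    """
    pending = list(pipelines)
    updated: List[str] = []
    while pending:
        name = pending[0]
        if not run_pipeline(name):
            break  # halt the rollout; later deployments keep the old version
        updated.append(pending.pop(0))
    return updated, pending
```

Stopping at the first failure keeps the blast radius of a bad change to a single deployment, at the cost of a slower rollout across many deployments.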

When designing your release strategy, look into how the planned process helps implement releases and changes with high quality and frequency. A typical starting point is a CI/CD pipeline with continuous automated deployments via multiple test and staging environments in order to validate your changes prior to deployment to any production tenants.

Furthermore, consider if utilizing a canary release strategy would help identify potential issues with your changes prior to rolling them out across all deployments in production. In a canary release, each change is first deployed only to a small subset of your deployments. Once you are satisfied with the change quality, then the change can either automatically or manually be released to the rest of your deployments. As an example, an AWS Step Functions state machine could be combined with the solution, and then utilized to control the release flow, execute validation tests, implement approval steps (either manual or automatic), and even conduct rollback if necessary.
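
The first decision in a canary release is which deployments form the canary wave. The following Python sketch shows one simple way to make that split. The function name `split_canary` and the default fraction are assumptions for illustration, and a real strategy would likely choose canaries by tenant risk profile rather than by list order.

```python
import math
from typing import List, Sequence, Tuple


def split_canary(
    deployments: Sequence[str], canary_fraction: float = 0.1
) -> Tuple[List[str], List[str]]:
    """Split deployments into a canary wave and the remaining wave.

    At least one deployment is always placed in the canary wave.
    """
    if not 0 < canary_fraction <= 1:
        raise ValueError("canary_fraction must be in (0, 1]")
    items = list(deployments)
    if not items:
        return [], []
    size = max(1, math.floor(len(items) * canary_fraction))
    return items[:size], items[size:]
```

The second wave would be released only after the canary wave has passed validation, either automatically or after a manual approval.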

Further considerations

The example in this post provisions every silo and pool deployment to a single AWS account. However, the solution is not limited to a single account, and it can deploy just as easily to multiple AWS accounts. When operating at scale, it is best practice to spread your workloads across several accounts. The Organizing Your AWS Environment using Multiple Accounts whitepaper has in-depth guidance on strategies for spreading your workloads.

If combined with an AWS account-vending machine implementation, such as an AWS Control Tower Landing Zone, then the demo solution could be adapted so that new AWS accounts are provisioned automatically. This would be useful if your business requires full account-level deployment isolation, and you also want automated provisioning.

To meet Unicorn’s future needs for spreading their solution architecture over multiple separate components, the deployment database and associated Lambda function could be decoupled from the rest of the toolchain components in order to provide a central deployment service. When provisioned as a standalone service and extended, for example, with Amazon Simple Notification Service-based notifications sent to the component deployment systems, this central deployment service could manage the deployments for multiple components.

In addition, you should analyze your deployment lifecycle transitions, and then consider what action should be taken when a tenant is disabled and/or deleted. Implementing a deployment archival/deletion process is not in the scope of this post.

Cleanup

To clean up every resource deployed in this post, conduct the following actions:

In the workload account:
1. In the us-east-1 Region, delete the CloudFormation stacks named “pool-pool1-resources” and “silo-silo1-resources” and the CDK bootstrap stack “CDKToolkit”.
2. In the eu-west-1 Region, delete the CloudFormation stack named “pool-pool2-resources” and the CDK bootstrap stack “CDKToolkit”.
In the toolchain account:
1. In the us-east-1 Region, delete the CloudFormation stacks “toolchain”, “pool-pool1-pipeline”, “pool-pool2-pipeline”, “silo-silo1-pipeline” and the CDK bootstrap stack “CDKToolkit”.
2. In the eu-west-1 Region, delete the CloudFormation stack “pool-pool2-pipeline-support-eu-west-1” and the CDK bootstrap stack “CDKToolkit”.
3. Empty and delete the S3 buckets “toolchain-*”, “pool-pool1-pipeline-*”, “pool-pool2-pipeline-*”, and “silo-silo1-pipeline-*”.

Conclusion

This solution demonstrated an implementation of an automated SaaS application component deployment factory. We covered how an ISV venturing into the SaaS model can utilize AWS CDK and CDK Pipelines in order to avoid a multitude of undifferentiated heavy lifting by leveraging and combining AWS CDK’s cross-region and cross-account capabilities with CDK Pipelines’ self-mutating deployment pipelines. Furthermore, we demonstrated how all of this can be written, managed, and released just like any other code you write. We also demonstrated how a single dynamic provisioning system can be utilized to operate in a mixed mode, with both silo and pool deployments.

Visit the AWS SaaS Factory Program page for further information on how AWS can help you on your SaaS journey — regardless of the stage you are currently in.

AWS Control Tower Account vending through Amazon Lex ChatBot

2021-10-19 Marco Fischer

Post Syndicated from Marco Fischer original https://aws.amazon.com/blogs/devops/aws-control-tower-account-vending-through-amazon-lex-chatbot/

In this blog post you will learn about a multi-environment solution that uses a cloud-native CI/CD pipeline to build, test, and deploy a serverless ChatOps bot that integrates with AWS Control Tower Account Factory for AWS account vending. This solution can be integrated with any request portal or channel that can call a RESTful API endpoint, so that you can offer AWS account vending at scale for your enterprise.

Introduction

Most AWS Control Tower customers use the AWS Control Tower Account Factory (a Service Catalog product) and AWS Service Catalog to vend standardized AWS services and products into AWS accounts. ChatOps is a collaboration model that interconnects a process with people, tools, and automation. It combines a bot that can fulfill service requests (the work needed) with Ops and Engineering staff who can step in for approval processes or corrections in the case of exception requests. A major task in the public cloud is building a proper foundation (the so-called landing zone). The main goals of this foundation are to provide not only AWS account access (with the right permissions), but also the correct Cloud Center of Excellence (CCoE) approved products and services. This post demonstrates how to use the existing AWS Control Tower Account Factory, extend the Service Catalog portfolio in Control Tower with additional products, and execute account vending and product vending through an easy chatbot interface. You will also learn how to use this solution with Slack. It can just as easily be used with Chime, MS Teams, or a normal web frontend, because the integration is channel-agnostic through an API Gateway integration layer. Then, you will combine all of this by integrating a chatbot frontend where users can issue requests to the CCoE and Ops team to fulfill AWS services easily and transparently. As a result, you get a more efficient process for vending AWS accounts and products, and you take a burden off your Cloud Operations team.

Background

An AWS Account Factory account is an AWS account provisioned using Account Factory in AWS Control Tower.
AWS Service Catalog lets you centrally manage commonly deployed IT services. For this blog, Account Factory uses AWS Service Catalog to provision new AWS accounts.
A Control Tower provisioned product is an instance of the Control Tower Account Factory product that is provisioned by AWS Service Catalog. In this post, any new AWS account created through the ChatOps solution will be a provisioned product and visible in Service Catalog.
Amazon Lex is a service for building conversational interfaces into any application using voice and text.

Architecture Overview

The following architecture gives an overview of the solution, which will be built with the code provided through GitHub.

Multi-Environment CICD Architecture

The multi-environment pipeline builds three environments (Dev, Staging, Production) with different quality gates to push changes to this solution from a “Development environment” up to a “Production environment”. This makes sure that your ChatOps bot and the account vending are scalable and fully functional before you release them to production and make them available to your end users.

AWS CodeCommit: Two repositories are used. One holds the code that creates the Amazon Lex bot through a Java Lambda function and is installed in STEP 1. The other holds the Amazon Lex bot APIs that run behind API Gateway, capture the account vending requests, and communicate with the Amazon Lex bot.
AWS CodePipeline: It integrates CodeCommit, CodeBuild, and CodeDeploy to manage your release pipelines moving from Dev to Production.
AWS CodeBuild: Each activity executed inside the pipeline is a CodeBuild activity. The source code repository contains several files with the prefix buildspec-. Each of these files contains the exact commands that CodeBuild must execute in each of the stages: build/test.
AWS CodeDeploy: This AWS service manages the deployment of the serverless application stack. In this solution it implements a canary deployment that first shifts 10% of the requests to the new version and shifts the remaining traffic five minutes later, which allows us to test the scaling of the solution (CodeDeployDefault.LambdaCanary10Percent5Minutes).
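
The traffic split of such a two-step canary can be modeled as a small function of elapsed time. The Python sketch below is only a model for reasoning about the rollout, not code from the solution. The function name and parameters are hypothetical, and it assumes that no rollback occurs during the interval.

```python
def new_version_weight(
    elapsed_minutes: float,
    canary_percent: int = 10,
    interval_minutes: int = 5,
) -> int:
    """Percentage of traffic routed to the new Lambda version.

    Models a two-step canary: `canary_percent` right after the deployment
    starts, then 100% once `interval_minutes` have passed without a rollback.
    """
    if elapsed_minutes < 0:
        return 0  # deployment has not started yet
    if elapsed_minutes < interval_minutes:
        return canary_percent
    return 100
```

During the first interval only a small share of requests can hit faulty code, which is the window in which an alarm can still stop the rollout.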

AWS ControlTower Account Vending integration and ChatOps bot architecture

The serverless application architecture is built with Amazon Lex and application code in Lambda that is accessible through Amazon API Gateway, which allows you to integrate this solution with almost any frontend (Slack, MS Teams, website).

Amazon Lex: With Amazon Lex, the same deep learning technologies that power Amazon Alexa are now available to any developer, enabling you to quickly and easily build sophisticated, natural language, conversational bots (“chatbots”). Because Amazon Lex is not yet available in all AWS regions where AWS Control Tower is supported, you may want to deploy Amazon Lex in a different region from the one where AWS Control Tower is deployed.
Amazon API Gateway / AWS Lambda: API Gateway is used as a central entry point for the Lambda functions (AccountVendor) that capture the account vending requests from a frontend (e.g. Slack or a website). Because these Lambda functions are not exposed directly as a REST service, they need a trigger, which API Gateway provides in this case.
Amazon SNS: Amazon Simple Notification Service (Amazon SNS) is a fully managed messaging service. SNS is used to send notifications via e-mail channel to an approver mailbox.
Amazon DynamoDB: Amazon DynamoDB is a key-value and document database that delivers single-digit millisecond performance at any scale. It’s a fully managed, multi-region, multi-active, durable database. Amazon DynamoDB will store the Account vending requests from the Lambda code that get triggered by the Lex-bot interaction.

Solution Overview and Prerequisites

Solution Overview

Start by building the two main components of the architecture through an automated script. This is split into “STEP 1” and “STEP 2” in this walkthrough. “STEP 3” and “STEP 4” test the solution and then integrate it with a frontend. In this case we use Slack as an example, and we also provide the Slack App manifest file so that you can build the integration quickly.

STEP 1) “Install Amazon Lex Bot”: The key part of the left side of the architecture, the Amazon Lex bot (called the “ChatOps” bot), is built first.
STEP 2) “Build of the multi-environment CI/CD pipeline”: Build and deploy a full load-testing DevOps pipeline that stress tests the Lex bot and its ability to answer requests. This builds the supporting components that are needed to integrate with Amazon Lex and are described in the architecture overview (Amazon API Gateway, AWS Lambda, Amazon DynamoDB, Amazon SNS).
STEP 3) “Testing the ChatOps Bot”: We execute some test scripts through Postman that call Amazon API Gateway and trigger a sample account request, which requires a response from the ChatOps Lex bot.
STEP 4) “Integration with Slack”: The final step is an end-to-end integration with a communication platform such as Slack.

The DevOps pipeline (using CodePipeline, CodeCommit, CodeBuild and CodeDeploy) is triggered automatically when the stack is deployed and the AWS CodeCommit repository is created inside the account. The pipeline builds the Amazon Lex ChatOps bot from the source code. Step 2 integrates the surrounding components with the ChatOps Lex bot in three different environments: Dev, Staging, and Prod. In addition, we use a canary deployment to promote updates to the Lambda code from the AWS CodeCommit repository. For the canary deployment, we implemented the rollback procedure using a log metric filter that scans the CloudWatch log for the word Exception. When the word is found, an alarm is triggered and the deployment is rolled back automatically. Usually, such a rollback occurs during the load test phase, which prevents faulty code from being promoted into the production environment.
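
The rollback rule described above boils down to counting matching log lines and comparing the count with an alarm threshold. The following Python sketch mimics that logic locally. It is an approximation for illustration only: the function names and the threshold of 1 are assumptions, and a plain case-sensitive substring match stands in for the CloudWatch filter pattern.

```python
from typing import Iterable


def count_matches(log_lines: Iterable[str], pattern: str = "Exception") -> int:
    """Count log lines that contain the pattern (case-sensitive)."""
    return sum(1 for line in log_lines if pattern in line)


def should_roll_back(log_lines: Iterable[str], threshold: int = 1) -> bool:
    """Return True when the number of matching lines reaches the threshold."""
    return count_matches(log_lines) >= threshold
```

A rule this simple also fires on handled exceptions that are merely logged, so a production setup may need a narrower pattern.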

Prerequisites

For this walkthrough, you need the following prerequisites:

An AWS account
A ready AWS Control Tower deployment (needs 3 AWS accounts/e-mail addresses)
AWS Cloud9 IDE or a development environment with access to download/run the scripts provided through GitHub
Access to the AWS Control Tower management account with the AWSAdministratorAccess role if you use AWS SSO, or equivalent permissions if you use another federation.

Walkthrough

To get started, you can use Cloud9 IDE or log into your AWS SSO environment within AWS Control Tower.

Prepare: Set up the sample solution

1.1. Clone the GitHub repository to your Cloud9 environment.

The complete solution can be found at the GitHub repository here. The actual deployment and build are scripted in shell, but the Serverless code is in Java and uses Amazon Serverless services to build this solution (Amazon API Gateway, Amazon DynamoDB, Amazon SNS).

git clone https://github.com/aws-samples/multi-environment-chatops-bot-for-controltower

STEP 1: Install Amazon Lex Bot

Amazon Lex is currently not deployable natively with AWS CloudFormation. Therefore, the solution uses a custom Lambda resource in AWS CloudFormation to create the Amazon Lex bot. We will create the Lex bot along with some sample utterances, three custom slots (Account Type, Account E-Mail and Organizational OU) and one main intent (“Control Tower Account Vending Intent”) to capture the request that triggers an AWS account vending process.

2.1. Start the script “deploy.sh” and provide the inputs below. Select a project name and a bucket name. You can override the defaults if you want to choose custom names (we recommend using the default names).

./deploy.sh

Choose a project name [chatops-lex-bot-xyz]:

Choose a bucket name for source code upload [chatops-lex-bot-xyz]:

2.2. To confirm, double-check the AWS region you have specified.

Attention: Make sure you have configured your AWS CLI region! (use either 'aws configure' or set your 'AWS_DEFAULT_REGION' variable).

Using region from $AWS_DEFAULT_REGION: eu-west-1

2.3. Then, choose the region where you want to install Amazon Lex (make sure it is an AWS region where Lex is available), or leave the input empty to use the default. The Amazon Lex region can be different from the one where AWS Control Tower is deployed.

Choose a region where you want to install the chatops-lex-bot [eu-west-1]:

Using region eu-west-1

2.4. The script will create a new S3 bucket in the specified region in order to upload the code to create the Amazon Lex bot.

Creating a new S3 bucket on eu-west-1 for your convenience...
make_bucket: chatops-lex-bot-xyz
Bucket chatops-lex-bot-xyz successfully created!

2.5. We show a summary of the bucket name and the project being used.

Using project name................chatops-lex-bot-xyz
Using bucket name.................chatops-lex-bot-xyz

2.6 If any of these names or outputs are wrong, you can still stop here by pressing Ctrl+C.

If these parameters are wrong press ctrl+c to stop now...

2.7 The script will upload the source code to the S3 bucket specified, you should see a successful upload.

Waiting 9 seconds before continuing
upload: ./chatops-lex-bot-xyz.zip to s3://chatops-lex-bot-xyz/chatops-lex-bot-xyz.zip

2.8 Then, the script runs the aws cloudformation package command, which references the uploaded zip file and generates a CloudFormation yml file that is ready for deployment. The generated package file (devops-packaged.yml) is stored locally and used to execute the aws cloudformation deploy command.

Successfully packaged artifacts and wrote output template to file devops-packaged.yml.

Note: You can ignore this part below as the shell script will execute the “aws cloudformation deploy” command for you.

Execute the following command to deploy the packaged template

aws cloudformation deploy --template-file devops-packaged.yml --stack-name <YOUR STACK NAME>

2.9 The AWS CloudFormation scripts should be running in the background

Waiting for changeset to be created..
Waiting for stack create/update to complete
Successfully created/updated stack - chatops-lex-bot-xyz-cicd

2.10 Once you see the successful output of the CloudFormation script “chatops-lex-bot-xyz-cicd”, everything is ready to continue.

------------------------------------------
ChatOps Lex Bot Pipeline is installed
Will install the ChatOps API as an Add-On to the Vending Machine
------------------------------------------

2.11 Before we continue, confirm the outputs of the AWS CloudFormation stack called “chatops-lex-bot-xyz-cicd”. You should find three outputs from the CloudFormation template.

A CodePipeline pipeline and a CodeCommit repository with the same naming convention (chatops-lex-bot-xyz), and a CodeBuild execution with one stage (Prod). The execution of this pipeline should show as “Succeeded” within CodePipeline.
As a result of the successful pipeline execution, another CloudFormation stack is created, which you can find in the output of CodeBuild or in the CloudFormation console (chatops-lex-bot-xyz-Prod).
The resource created by this stack is the Lambda function (chatops-lex-bot-xyz-Prod-AppFunction-abcdefgh) that creates the Amazon Lex bot. You can find the details under AWS Lambda in the AWS Management Console. For more information on CloudFormation and custom resources, see the CloudFormation documentation.
You can find the successful execution in the CloudWatch Logs:

Adding Slot Type:: AccountTypeValues
Adding Slot Type:: AccountOUValues
Adding Intent:: AWSAccountVending
Adding LexBot:: ChatOps
Adding LexBot Alias:: AWSAccountVending

Check that the Amazon Lex bot has been created in the Amazon Lex console. You should see an Amazon Lex bot called “ChatOps” with the status “READY”.

2.12. This means you have successfully installed the ChatOps Lex Bot. You can now continue with STEP 2.

STEP 2. Build of the multi-environment CI/CD pipeline

In this section, we finalize the setup by creating a full CI/CD pipeline, the API Gateway and Lambda functions that capture account creation requests (AccountVendor) and interact with Amazon Lex, and a Dev-Staging-Production build pipeline that stress tests the entire infrastructure created.

3.1 You should see the same name of the bucket and project as used previously. If not, please override the input here. Otherwise, leave empty (we recommend to use the default names).

Choose a bucket name for source code upload [chatops-lex-xyz]:

3.2. This means that the Amazon Lex Bot was successfully deployed, and we just confirm the deployed AWS region.

ChatOps-Lex-Bot is already deployed in region eu-west-1

3.3 Specify a mailbox that you have access to in order to approve new ChatOps requests (e.g. account vending requests) as a manual approval step.

Choose a mailbox to receive approval e-mails for new accounts: [email protected]

3.4 Make sure you have the right AWS region where AWS Control Tower has deployed its Account Factory Portfolio product in Service Catalog (to double check you can log into AWS Service Catalog and confirm that you see the AWS Control Tower Account Factory)

Choose the AWS region where your vending machine is installed [eu-west-1]:
Using region eu-west-1

Creating a new S3 bucket on eu-west-1 for your convenience...
{
"Location": "http://chatops-lex-xyz.s3.amazonaws.com/"
}

Bucket chatops-lex-xyz successfully created!

3.5 Now the script will identify if you have Control Tower deployed and if it can identify the Control Tower Account Factory Product.

Trying to find the AWS Control Tower Account Factory Portfolio

Using project name....................chatops-lex-xyz
Using bucket name.....................chatops-lex-xyz
Using mailbox for approvals...........approvermail+chatops-lex-bot-xyz@yourdomain.com
Using lexbot region...................eu-west-1
Using service catalog portfolio-id....port-abcdefghijklm

If these parameters are wrong press ctrl+c to stop now…

3.6 If something is wrong or has not been set and you see an empty line for any of these values, stop here and press Ctrl+C. Check the Q&A section to see whether you missed any errors previously. These values need to be filled in to proceed.

Waiting 1 seconds before continuing
[INFO] Scanning for projects...
[INFO] Building Serverless Jersey API 1.0-SNAPSHOT

3.7 You should see a “BUILD SUCCESS” message.

[INFO] BUILD SUCCESS
[INFO] Total time: 0.190 s

3.8 Then the package built locally is uploaded to the S3 bucket and prepared again for AWS CloudFormation to package and deploy.

upload: ./chatops-lex-xyz.zip to s3://chatops-lex-xyz/chatops-lex-xyz.zip

Successfully packaged artifacts and wrote output template to file devops-packaged.yml.
Execute the following command to deploy the packaged template
aws cloudformation deploy --template-file devops-packaged.yml --stack-name <YOUR STACK NAME>

3.9 You can ignore the above message, as the shell script will execute the aws cloudformation deploy command for you. The AWS CloudFormation deployment should be running in the background, and you can double-check it in the AWS Management Console.

Waiting for changeset to be created..
Waiting for stack create/update to complete

Successfully created/updated stack - chatops-lex-xyz-cicd
------------------------------------------
ChatOps Lex Pipeline and Chatops Lex Bot Pipelines successfully installed
------------------------------------------

3.10 This means that the CloudFormation deployment has completed successfully. Let’s confirm in the AWS CloudFormation console and in CodePipeline that we have a successful outcome and a full test run of the CI/CD pipeline. As a reminder, have a look at the architecture overview and the resources and components created.

You should find the following successfully created CloudFormation stacks:

chatops-lex-xyz-cicd: This is the core CloudFormation that we created and uploaded that built a full CI/CD pipeline with three phases (DEV/STAGING/PROD). All three stages will create a similar set of AWS resources (e.g. Amazon API Gateway, AWS Lambda, Amazon DynamoDB), but only the Staging phase will run an additional Load-Test prior to doing the production release.
chatops-lex-xyz-DEV: A successful build, creation and deployment of the DEV environment.
chatops-lex-xyz-STAGING: The staging phase will run a set of load tests, for a full testing and through io (an open-source load testing framework)
chatops-lex-xyz-PROD: A successful build, creation and deployment of the Production environment.

3.11 For further confirmation, check that the Lambda functions (chatops-lex-xyz-pipeline-1-Prod-ChatOpsLexFunction-), the Amazon DynamoDB table (chatops-lex-xyz-pipeline-1_account_vending_), and the Amazon SNS topic (chatops-lex-xyz-pipeline-1_aws_account_vending_topic_Prod) shown in the architecture diagram have all been created.

Within Lambda and/or Amazon API Gateway, you will find the API Gateway execution endpoints, same as in the Output section from CloudFormation:

ApiUrl: https://apiId.execute-api.eu-west-1.amazonaws.com/Prod/account
ApiApproval: https://apiId.execute-api.eu-west-1.amazonaws.com/Prod/account/confirm

3.12 This means you have successfully installed the Amazon Lex ChatOps bot and the surrounding test CI/CD pipeline. Make sure you have accepted the SNS subscription confirmation.

AWS Notification - Subscription Confirmation

You have chosen to subscribe to the topic:
arn:aws:sns:eu-west-1:12345678901:chatops-lex-xyz-pipeline_aws_account_vending_topic_Prod
To confirm this subscription, click or visit the link below (If this was in error no action is necessary)

STEP 3: Testing the ChatOps Bot

In this section, we provided a test script to test if the Amazon Lex Bot is up and if Amazon API Gateway/Lambda are correctly configured to handle the requests.

4.1 Use the Postman script postman-test.json in the /test folder before you start integrating this solution with a chat or web frontend, such as Slack or a custom website, in Production.

4.2. You can import the JSON file into Postman and execute a RESTful test call to the API Gateway endpoint.

4.3 Once the script is imported into Postman, replace the HTTP URLs of the two requests (Vending API and Confirmation API) with the values of the APIs recently created in the Production environment. You can read these values from the Outputs tab of the CloudFormation stack with a name similar to chatops-lex-xyz-Prod, or retrieve them with the two commands below:

aws cloudformation describe-stacks --stack-name <YOUR STACK NAME> --query "Stacks[0].Outputs[?OutputKey=='ApiUrl'].OutputValue" --output text

aws cloudformation describe-stacks --stack-name <YOUR STACK NAME> --query "Stacks[0].Outputs[?OutputKey=='ApiApproval'].OutputValue" --output text
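
If you post-process the describe-stacks result in a script instead of using the --query option, the same lookup can be written in a few lines. This is a minimal Python sketch: the function name `stack_output` is hypothetical, and it assumes the response has already been parsed from JSON into a dictionary.

```python
from typing import Optional


def stack_output(response: dict, key: str) -> Optional[str]:
    """Return the OutputValue for `key` from the first stack, or None if absent."""
    stacks = response.get("Stacks", [])
    if not stacks:
        return None
    # Stacks without outputs have no "Outputs" key, so default to an empty list.
    for output in stacks[0].get("Outputs", []):
        if output.get("OutputKey") == key:
            return output.get("OutputValue")
    return None
```

Returning None for a missing key makes it easy to fail early in a test script before any request is sent to a wrong URL.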

4.4 Execute an API call against the PROD API

Use the Amazon API Gateway endpoint to make a REST call, for example https://apiId.execute-api.eu-west-1.amazonaws.com/Prod/account/. Make sure you replace “apiId” with your Amazon API Gateway API ID, found in the sections above (CloudFormation outputs or within Lambda). The following shows the start of the parameters that you have to change in the postman-test.json file:

"url": {
"raw": "https://apiId.execute-api.us-east-1.amazonaws.com/Prod/account",
"protocol": "https",

Request input (fill out and update the values in each of the JSON fields):

{"UserEmail": "[email protected]", "UserName": "TestUser-Name", "UserLastname": "TestUser-LastName", "UserInput": "Hi, I would like a new account please!"}

If the test response is SUCCESSFUL, you should see the following JSON as a return:

{"response": "Hi TestUser-Name, what account type do you want? Production or Sandbox?","initial-params": "{\"UserEmail\": \"[email protected]\",\"UserName\":\"TestUser-Name\",\"UserLastname\": \"TestUser-LastName\",\"UserInput\": \"Hi, I would like a new account please!\"}"}
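
Note that in the response shown above, the value of initial-params is itself a JSON-encoded string, so a client has to decode it a second time. The following Python sketch builds the request body and unpacks such a response. The function names are hypothetical, any e-mail address passed in is a placeholder, and no HTTP call is made.

```python
import json


def build_request(email, first_name, last_name, user_input):
    """Serialize the request body with the field names used in this walkthrough."""
    return json.dumps({
        "UserEmail": email,
        "UserName": first_name,
        "UserLastname": last_name,
        "UserInput": user_input,
    })


def parse_response(body):
    """Decode the API response; `initial-params` holds a JSON-encoded string."""
    outer = json.loads(body)
    params = json.loads(outer["initial-params"])  # second decode for the nested JSON
    return outer["response"], params
```

The returned parameters can be compared with the request that was sent, which is a simple end-to-end check in an automated test.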

4.5 Test the “confirm” action. To confirm the account vending request, call the /confirm API, which has the same effect as confirming the action through the e-mail that you receive via Amazon SNS.

Make sure you change the following sections in Postman (Production-Confirm-API) and use the ApiApproval endpoint, which has the /confirm path.

https://apiId.execute-api.eu-west-1.amazonaws.com/Prod/account/confirm
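Both endpoints follow the same URL pattern, differing only in the API ID and path. As a small sketch, a helper like the one below (the function name and defaults are our own, not part of the solution) can assemble them so that the Region and stage are not mistyped:

```python
def build_api_url(api_id, region, stage="Prod", path="account"):
    """Compose an API Gateway invoke URL from its parts."""
    return f"https://{api_id}.execute-api.{region}.amazonaws.com/{stage}/{path.strip('/')}"


# "apiId" is the same placeholder used in the walkthrough.
vending_url = build_api_url("apiId", "eu-west-1")
confirm_url = build_api_url("apiId", "eu-west-1", path="account/confirm")
```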

STEP 5: Slack Integration Example

We will show you how to integrate with a Slack channel, but any other request portal (e.g., Jira), website, or app that allows REST API integrations (e.g., Amazon Chime) could be used instead.

5.1 Use the attached YAML Slack app manifest file to create a new Slack application within your organization. Go to “https://api.slack.com/apps?new_app=1” and choose “Create New App”.

5.2 Choose “From an app manifest” to create a new Slack app, and paste the sample code from slack-app-manifest.yml in the /test folder.

Note: Make sure you first overwrite the request_url parameter of your Slack app so that it points to the Production API Gateway endpoint.

request_url: https://apiId.execute-api.us-east-1.amazonaws.com/Prod/account

5.3 Choose to deploy and re-install the Slack App to your workspace and then access the ChatBot Application within your Slack workspace. If everything is successful, you can see a working Serverless ChatBot as shown below.

Slack Example

Conclusion and Cleanup

Conclusion

In this blog post, you have learned how to create a multi-environment CI/CD pipeline that builds a fully serverless AWS account vending solution using an AI-powered Amazon Lex bot integrated with AWS Control Tower Account Factory. This solution helps you standardize account vending on AWS by exposing a chatbot to your AWS consumers across various channels. It can be extended with AWS Service Catalog to launch not just AWS accounts, but almost any AWS service, by using IaC (CloudFormation) templates provided by the CCoE Ops and Architecture teams.

Cleanup

For a proper cleanup, go to AWS CloudFormation, select the deployed stacks, and choose “Delete stack”. If you encounter issues while deleting, see the troubleshooting solutions below. Also make sure you delete your integration apps (e.g., Slack) for a full cleanup.

Troubleshooting

An error occurred (BucketAlreadyOwnedByYou) when calling the CreateBucket operation: Your previous request to create the named bucket succeeded and you already own it.
Solution: Make sure you use a distinct name for the S3 bucket used in this project, for the Amazon Lex bot and the CI/CD pipeline.
When you delete or roll back the CloudFormation stacks, you get an error (Code: 409; Error Code: BucketNotEmpty).
Solution: Permanently delete the S3 build bucket and its contents, and then delete the associated CloudFormation stack that created the CI/CD pipeline.

Use Amazon ECS Fargate Spot with CircleCI to deploy and manage applications in a cost-effective way

2021-08-26 Pritam Pal

Post Syndicated from Pritam Pal original https://aws.amazon.com/blogs/devops/deploy-apps-cost-effective-way-with-ecs-fargate-spot-and-circleci/

This post is written by Pritam Pal, Sr EC2 Spot Specialist SA & Dan Kelly, Sr EC2 Spot GTM Specialist

Customers are using Amazon Web Services (AWS) to build CI/CD pipelines and follow DevOps best practices in order to deliver products rapidly and reliably. AWS services simplify infrastructure provisioning and management, application code deployment, software release processes automation, and application and infrastructure performance monitoring. Builders are taking advantage of low-cost, scalable compute with Amazon EC2 Spot Instances, as well as AWS Fargate Spot to build, deploy, and manage microservices or container-based workloads at a discounted price.

Amazon EC2 Spot Instances let you take advantage of unused Amazon Elastic Compute Cloud (Amazon EC2) capacity at steep discounts as compared to on-demand pricing. Fargate Spot is an AWS Fargate capability that can run interruption-tolerant Amazon Elastic Container Service (Amazon ECS) tasks at up to a 70% discount off the Fargate price. Since tasks can still be interrupted, only fault tolerant applications are suitable for Fargate Spot. However, for flexible workloads that can be interrupted, this feature enables significant cost savings over on-demand pricing.
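To get a feel for what this means when only part of a workload runs on Fargate Spot, the arithmetic can be sketched as follows. The hourly rate and task count are hypothetical numbers picked for easy calculation, and the 70% figure is the upper bound quoted above, so actual savings may be lower.

```python
def blended_hourly_cost(task_count, on_demand_rate, spot_fraction, spot_discount):
    """Hourly cost when spot_fraction of the tasks run at a discounted rate."""
    spot_tasks = task_count * spot_fraction
    on_demand_tasks = task_count - spot_tasks
    spot_rate = on_demand_rate * (1.0 - spot_discount)
    return on_demand_tasks * on_demand_rate + spot_tasks * spot_rate


# Hypothetical: 4 tasks at 0.05 per task-hour, best-case 70% Spot discount.
all_on_demand = blended_hourly_cost(4, 0.05, spot_fraction=0.0, spot_discount=0.70)
half_on_spot = blended_hourly_cost(4, 0.05, spot_fraction=0.5, spot_discount=0.70)
savings_percent = (1.0 - half_on_spot / all_on_demand) * 100.0
```

With half of the tasks on Spot, the best-case saving on the whole workload is half of the Spot discount, which is why the split between capacity providers matters as much as the discount itself.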

CircleCI provides continuous integration and delivery for any platform, as well as your own infrastructure. CircleCI can automatically trigger low-cost, serverless tasks with AWS Fargate Spot in Amazon ECS. Moreover, CircleCI Orbs are reusable packages of CircleCI configuration that help automate repeated processes, accelerate project setup, and ease third-party tool integration. Currently, over 1,100 organizations are utilizing the CircleCI Amazon ECS Orb to power/run 250,000+ jobs per month.

Customers are utilizing Fargate Spot for a wide variety of workloads, such as Monte Carlo simulations and genomic processing. In this blog, we use Python code with the TensorFlow library that runs as a container image in order to train a simple linear model. It runs the training steps in a loop on a data batch and periodically writes checkpoints to Amazon S3. If there is a Fargate Spot interruption, then it restores the checkpoint from S3 (when a new Fargate instance comes up) and continues training. We will deploy this on Amazon ECS with Fargate Spot for low-cost, serverless task deployment utilizing CircleCI.

Concepts

Before looking at the solution, let’s revisit some of the concepts we’ll be using.

Capacity Providers: Capacity providers let you manage computing capacity for Amazon ECS containers. This allows the application to define its requirements for how it utilizes the capacity. With capacity providers, you can define flexible rules for how containerized workloads run on different compute capacity types and manage the capacity scaling. Furthermore, capacity providers improve the availability, scalability, and cost of running tasks and services on Amazon ECS. In order to run tasks, the default capacity provider strategy will be utilized, or an alternative strategy can be specified if required.

AWS Fargate and AWS Fargate Spot capacity providers don’t need to be created. They are available to all accounts and only need to be associated with a cluster for utilization. When a new cluster is created via the Amazon ECS console, along with the Networking-only cluster template, the FARGATE and FARGATE_SPOT capacity providers are automatically associated with the new cluster.

CircleCI Orbs: Orbs are reusable CircleCI configuration packages that help automate repeated processes, accelerate project setup, and ease third-party tool integration. Orbs can be found in the developer hub on the CircleCI orb registry. Each orb listing has usage examples that can be referenced. Moreover, each orb includes a library of documented components that can be utilized within your config for more advanced purposes. Since the 2.0.0 release, the AWS ECS Orb supports the capacity provider strategy parameter for running tasks allowing you to efficiently run any ECS task against your new or existing clusters via Fargate Spot capacity providers.

Solution overview

Fargate Spot helps cost-optimize services that can handle interruptions like Containerized workloads, CI/CD, or Web services behind a load balancer. When Fargate Spot needs to interrupt a running task, it sends a SIGTERM signal. It is best practice to build applications capable of responding to the signal and shut down gracefully.
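One way to respond to the signal in a Python application is sketched below. This is a minimal illustration rather than the code used in this walkthrough: the class and function names are our own, and the interruption is simulated by calling the handler directly instead of waiting for a real SIGTERM.

```python
import signal


class GracefulShutdown:
    """Remembers that SIGTERM arrived so a work loop can stop at a safe point."""

    def __init__(self):
        self.requested = False

    def install(self):
        # Register the handler; this must be called from the main thread.
        signal.signal(signal.SIGTERM, self.handle)

    def handle(self, signum, frame):
        self.requested = True


def run_steps(total_steps, shutdown, do_step, save_state):
    """Run up to total_steps, stopping early once shutdown is requested."""
    completed = 0
    for step in range(total_steps):
        if shutdown.requested:
            break
        do_step(step)
        completed += 1
    # Persist progress whether the loop finished or was interrupted.
    save_state(completed)
    return completed


shutdown = GracefulShutdown()
shutdown.install()
saved_progress = []


def simulated_step(step):
    # Pretend the interruption notice arrives while step 2 is running.
    if step == 2:
        shutdown.handle(signal.SIGTERM, None)


completed = run_steps(10, shutdown, simulated_step, saved_progress.append)
```

The handler only sets a flag; the loop finishes the step in progress and saves its state before exiting, which keeps the shutdown path short and predictable.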

This walkthrough will utilize a capacity provider strategy leveraging Fargate and Fargate Spot, which mitigates risk if multiple Fargate Spot tasks get terminated simultaneously. If you’re unfamiliar with Fargate Spot, capacity providers, or capacity provider strategies, read our previous blog about Fargate Spot best practices here.

Prerequisites

Our walkthrough will utilize the following services:

GitHub as a code repository
AWS Fargate/Fargate Spot for running your containers as ECS tasks
CircleCI for demonstrating a CI/CD pipeline. We will utilize CircleCI Cloud Free version, which allows 2,500 free credits/week and can run 1 job at a time.

We will run a Job with CircleCI ECS Orb in order to deploy 4 ECS Tasks on Fargate and Fargate Spot. You should have the following prerequisites:

An AWS account
A GitHub account

Walkthrough

Step 1: Create AWS Keys for Circle CI to utilize.

Head to the AWS IAM console, create a new user (e.g., circleci), and select only the Programmatic access checkbox. On the Set permissions page, select Attach existing policies directly. For the sake of simplicity, we added the managed policy AmazonECS_FullAccess to this user. However, for production workloads, apply a stricter least-privilege access model. Download the access key file, which will be used to connect CircleCI to AWS in the next steps.

Step 2: Create an ECS Cluster, Task definition, and ECS Service

2.1 Open the Amazon ECS console

2.2 From the navigation bar, select the Region to use

2.3 In the navigation pane, choose Clusters

2.4 On the Clusters page, choose Create Cluster

2.5 Create a Networking only Cluster ( Powered by AWS Fargate)

Amazon ECS Create Cluster

This option lets you launch a cluster in your existing VPC to utilize for Fargate tasks. The FARGATE and FARGATE_SPOT capacity providers are automatically associated with the cluster.

2.6 Click Update Cluster to define a default capacity provider strategy for the cluster, then add the FARGATE and FARGATE_SPOT capacity providers, each with a weight of 1. This ensures that tasks are divided equally between the two capacity providers. You can also define other ratios for splitting your tasks between Fargate and Fargate Spot, e.g., 1:2 or 3:1.

ECS Update Cluster Capacity Providers

2.7 Here we will create a Task Definition by using the Fargate launch type, give it a name, and specify the task Memory and CPU needed to run the task. Feel free to utilize any Fargate task definition. You can use your own code, add the code in a container, or host the container in Docker hub or Amazon ECR. Provide a name and image URI that we copied in the previous step and specify the port mappings. Click Add and then click Create.

We also provide an example of Python code using the TensorFlow library that can run as a container image in order to train a simple linear model. It runs the training steps in a loop on a batch of data, and it periodically writes checkpoints to S3. Please find the complete code here. Utilize a Dockerfile to create a container from the code.

Sample Dockerfile to create a container image from the code mentioned above:

FROM ubuntu:18.04
WORKDIR /app
COPY . /app
RUN pip install -r requirements.txt
EXPOSE 5000
CMD python tensorflow_checkpoint.py

Below is the code snippet we use with TensorFlow to train and checkpoint a training job.


import glob
import os

# ckpt, iterator, opt, train_step, and upload_file are defined elsewhere in the complete script.
def train_and_checkpoint(net, manager):
  # Resume from the most recent checkpoint if one exists.
  ckpt.restore(manager.latest_checkpoint).expect_partial()
  if manager.latest_checkpoint:
    print("Restored from {}".format(manager.latest_checkpoint))
  else:
    print("Initializing from scratch.")
  for _ in range(5000):
    example = next(iterator)
    loss = train_step(net, example, opt)
    ckpt.step.assign_add(1)
    # Every 10 steps, save a checkpoint and upload its newest files to S3.
    if int(ckpt.step) % 10 == 0:
      save_path = manager.save()
      list_of_files = glob.glob('tf_ckpts/*.index')
      latest_file = max(list_of_files, key=os.path.getctime)
      upload_file(latest_file, 'pythontfckpt', object_name=None)
      list_of_files = glob.glob('tf_ckpts/*.data*')
      latest_file = max(list_of_files, key=os.path.getctime)
      upload_file(latest_file, 'pythontfckpt', object_name=None)
      upload_file('tf_ckpts/checkpoint', 'pythontfckpt', object_name=None)
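The snippet above picks the newest checkpoint file twice with the same glob-and-max pattern, which raises a ValueError if no file matches yet. A small helper can isolate that step; the sketch below is our own illustration, not part of the original script, and it sorts by modification time (an assumption made so the demo can set timestamps explicitly) instead of the creation time used above.

```python
import glob
import os
import tempfile


def newest_matching_file(directory, pattern):
    """Return the most recently modified file matching pattern, or None."""
    candidates = glob.glob(os.path.join(directory, pattern))
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)


# Demo with throwaway files standing in for TensorFlow checkpoint files.
with tempfile.TemporaryDirectory() as ckpt_dir:
    for name, mtime in (("ckpt-10.index", 1000), ("ckpt-20.index", 2000)):
        path = os.path.join(ckpt_dir, name)
        open(path, "w").close()
        os.utime(path, (mtime, mtime))  # force a known modification time
    newest_index = os.path.basename(newest_matching_file(ckpt_dir, "*.index"))
    no_data_file = newest_matching_file(ckpt_dir, "*.data*")
```

Returning None for an empty match lets the caller skip the upload instead of failing in the middle of a training loop.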

2.8 Next, we will create an ECS Service, which will be used to fetch Cluster information while running the job from CircleCI. In the ECS console, navigate to your cluster, open the Services tab, and click Create. Create an ECS service by choosing Cluster default strategy from the Capacity provider strategy dropdown. For the Task Definition field, choose webapp-fargate-task, which is the one we created earlier, enter a service name, set the number of tasks to zero at this point, and then leave everything else as default. Click Next step, select an existing VPC and two or more Subnets, keep everything else default, and create the service.

Step 3: GitHub and CircleCI Configuration

Create a GitHub repository (e.g., circleci-fargate-spot), and then create a .circleci folder containing a config file named config.yml. If you’re unfamiliar with GitHub or adding a repository, check the user guide here.

For this project, the config.yml file contains the following lines of code that configure and run your deployments.

version: '2.1'
orbs:
  aws-ecs: circleci/[email protected]
  aws-cli: circleci/[email protected]
  orb-tools: circleci/[email protected]
  shellcheck: circleci/[email protected]
  jq: circleci/[email protected]

jobs:  

  test-fargatespot:
      docker:
        - image: cimg/base:stable
      steps:
        - aws-cli/setup
        - jq/install
        - run:
            name: Get cluster info
            command: |
              SERVICES_OBJ=$(aws ecs describe-services --cluster "${ECS_CLUSTER_NAME}" --services "${ECS_SERVICE_NAME}")
              VPC_CONF_OBJ=$(echo $SERVICES_OBJ | jq '.services[].networkConfiguration.awsvpcConfiguration')
              SUBNET_ONE=$(echo "$VPC_CONF_OBJ" |  jq '.subnets[0]')
              SUBNET_TWO=$(echo "$VPC_CONF_OBJ" |  jq '.subnets[1]')
              SECURITY_GROUP_IDS=$(echo "$VPC_CONF_OBJ" |  jq '.securityGroups[0]')
              CLUSTER_NAME=$(echo "$SERVICES_OBJ" |  jq '.services[].clusterArn')
              echo "export SUBNET_ONE=$SUBNET_ONE" >> $BASH_ENV
              echo "export SUBNET_TWO=$SUBNET_TWO" >> $BASH_ENV
              echo "export SECURITY_GROUP_IDS=$SECURITY_GROUP_IDS" >> $BASH_ENV
              echo "export CLUSTER_NAME=$CLUSTER_NAME" >> $BASH_ENV
        - run:
            name: Associate cluster
            command: |
              aws ecs put-cluster-capacity-providers \
                --cluster "${ECS_CLUSTER_NAME}" \
                --capacity-providers FARGATE FARGATE_SPOT  \
                --default-capacity-provider-strategy capacityProvider=FARGATE,weight=1 capacityProvider=FARGATE_SPOT,weight=1 \
                --region ${AWS_DEFAULT_REGION}
        - aws-ecs/run-task:
              cluster: $CLUSTER_NAME
              capacity-provider-strategy: capacityProvider=FARGATE,weight=1 capacityProvider=FARGATE_SPOT,weight=1
              launch-type: ""
              task-definition: webapp-fargate-task
              subnet-ids: '$SUBNET_ONE, $SUBNET_TWO'
              security-group-ids: $SECURITY_GROUP_IDS
              assign-public-ip : ENABLED
              count: 4

workflows:
  run-task:
    jobs:
      - test-fargatespot

Now, create a CircleCI account. Choose Log in with GitHub. Once you’re logged in, click Add Project on the CircleCI dashboard and add the project circleci-fargate-spot from the list shown.

When working with CircleCI Orbs, you will need the config.yml file and environment variables under Project Settings.

The config file utilizes CircleCI version 2.1 and various Orbs, e.g., AWS-ECS, AWS-CLI, and JQ. We use a job named test-fargatespot, which runs in a Docker image and sets up the environment. In config.yml we use the jq tool to parse JSON and fetch the ECS cluster information, such as the VPC configuration, Subnets, and Security Groups, needed to run an ECS task. As we are utilizing the capacity-provider-strategy, we set the launch type parameter to an empty string.
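For readers less familiar with jq, the extraction performed in the Get cluster info step can be sketched in Python. The function name is our own and the subnet, security group, and cluster identifiers are invented sample values; only the field names follow the structure of the describe-services response.

```python
def extract_network_config(services_obj):
    """Pull the first two subnets, first security group, and cluster ARN."""
    service = services_obj["services"][0]
    vpc_conf = service["networkConfiguration"]["awsvpcConfiguration"]
    return {
        "subnet_one": vpc_conf["subnets"][0],
        "subnet_two": vpc_conf["subnets"][1],
        "security_group": vpc_conf["securityGroups"][0],
        "cluster_arn": service["clusterArn"],
    }


# Hypothetical describe-services response, trimmed to the fields used here.
sample_services = {
    "services": [
        {
            "clusterArn": "arn:aws:ecs:us-west-2:111111111111:cluster/demo-cluster",
            "networkConfiguration": {
                "awsvpcConfiguration": {
                    "subnets": ["subnet-aaa", "subnet-bbb", "subnet-ccc"],
                    "securityGroups": ["sg-123"],
                    "assignPublicIp": "ENABLED",
                }
            },
        }
    ]
}

network = extract_network_config(sample_services)
```

Like the jq expressions, this assumes the service was created with at least two subnets and one security group, as described in step 2.8.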

In order to run a task, we will demonstrate how to override the default Capacity Provider strategy with Fargate & Fargate Spot, both with a weight of 1, and to divide tasks equally among Fargate & Fargate Spot. In our example, we are running 4 tasks, so 2 should run on Fargate and 2 on Fargate Spot.
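The expected split can be worked out with simple proportional arithmetic. The sketch below is only an approximation for intuition: it ignores the base setting and any tasks already running, so it is not a description of how Amazon ECS places tasks internally, and the function name is our own.

```python
def split_tasks_by_weight(task_count, weights):
    """Divide task_count across providers in proportion to integer weights."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")
    allocation, remainders = {}, {}
    for name, weight in weights.items():
        allocation[name], remainders[name] = divmod(task_count * weight, total_weight)
    # Hand any tasks left over to the providers with the largest remainders.
    leftover = task_count - sum(allocation.values())
    for name in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        allocation[name] += 1
    return allocation


equal_split = split_tasks_by_weight(4, {"FARGATE": 1, "FARGATE_SPOT": 1})
three_to_one = split_tasks_by_weight(4, {"FARGATE": 3, "FARGATE_SPOT": 1})
one_to_two = split_tasks_by_weight(4, {"FARGATE": 1, "FARGATE_SPOT": 2})
```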

Parameters like ECS_SERVICE_NAME, ECS_CLUSTER_NAME and other AWS access specific details are added securely under Project Settings and can be utilized by other jobs running within the project.

Add the following environment variables under Project Settings

- AWS_ACCESS_KEY_ID – From Step 1
- AWS_SECRET_ACCESS_KEY – From Step 1
- AWS_DEFAULT_REGION – e.g., us-west-2
- ECS_CLUSTER_NAME – From Step 2
- ECS_SERVICE_NAME – From Step 2
- SECURITY_GROUP_IDS – Security Group that will be used to run the task

Circle CI Environment Variables
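A job fails late and with an unhelpful message if one of these variables is missing, so it can be worth checking them up front. The sketch below is an optional illustration and not part of the walkthrough's config; the helper names and the example values are our own.

```python
import os

REQUIRED_VARIABLES = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DEFAULT_REGION",
    "ECS_CLUSTER_NAME",
    "ECS_SERVICE_NAME",
    "SECURITY_GROUP_IDS",
)


def missing_variables(environment, required=REQUIRED_VARIABLES):
    """Return the required names that are unset or empty in environment."""
    return [name for name in required if not environment.get(name)]


def check_current_environment():
    """Convenience wrapper that checks the real process environment."""
    return missing_variables(os.environ)


# Hypothetical settings with two entries deliberately left out.
example_environment = {
    "AWS_ACCESS_KEY_ID": "example-key-id",
    "AWS_SECRET_ACCESS_KEY": "example-secret",
    "AWS_DEFAULT_REGION": "us-west-2",
    "ECS_CLUSTER_NAME": "demo-cluster",
}
missing = missing_variables(example_environment)
```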

Step 4: Run Job

Now in the CircleCI console, navigate to your project, choose the branch, and click Edit Config to verify that config.yml is correctly populated. Check for the ribbon at the bottom. A green ribbon means that the config file is valid and ready to run. Click Commit & Run from the top-right menu.

Click build Status to check its progress as it runs.

CircleCI Project Dashboard

A successful build should look like the one below. Expand each section to see the output.

CircleCI Job Configuration

Return to the ECS console, go to the Tasks Tab, and check that 4 new tasks are running. Click each task for the Capacity provider details. Two tasks should have run with FARGATE_SPOT as a Capacity provider, and two should have run with FARGATE.

Congratulations!

You have successfully deployed ECS tasks utilizing CircleCI on AWS Fargate and Fargate Spot. If you have used any sample web applications, then please use the public IP address to see the page. If you have used the sample code that we provided, then you should see Tensorflow training jobs running on Fargate instances. If there is a Fargate Spot interruption, then it restores the checkpoint from S3 when a new Fargate Instance comes up and continues training.

Cleaning up

In order to avoid incurring future charges, delete the resources utilized in the walkthrough. Go to the ECS console and Task tab.

Delete any running Tasks.
Delete ECS cluster.
Delete the circleci user from IAM console.

Cost analysis in Cost Explorer

In order to demonstrate a cost breakdown between the tasks running on Fargate and Fargate Spot, we left the tasks running for a day. Then, we utilized Cost Explorer with the following filters and groups in order to discover the savings from running Fargate Spot.

Apply a filter on Service for ECS on the right-side filter, set Group by to Usage Type, and change the time period to the specific day.

Cost analysis in Cost Explorer

The cost breakdown demonstrates how Fargate Spot usage (indicated by “SpotUsage”) was significantly less expensive than non-Spot Fargate usage. Current Fargate Spot Pricing can be found here.

Conclusion

In this blog post, we have demonstrated how to utilize CircleCI to deploy and manage ECS tasks and run applications in a cost-effective serverless approach by using Fargate Spot.

Author bio

	Pritam is a Sr. Specialist Solutions Architect on the EC2 Spot team. For the last 15 years, he evangelized DevOps and Cloud adoption across industries and verticals. He likes to deep dive and find solutions to everyday problems.
	Dan is a Sr. Spot GTM Specialist on the EC2 Spot Team. He works closely with Amazon Partners to ensure that their customers can optimize and modernize their compute with EC2 Spot.

Deploy data lake ETL jobs using CDK Pipelines

2021-07-30 Ravi Itha

Post Syndicated from Ravi Itha original https://aws.amazon.com/blogs/devops/deploying-data-lake-etl-jobs-using-cdk-pipelines/

Many organizations are building data lakes on AWS, which provides the most secure, scalable, comprehensive, and cost-effective portfolio of services. Like any application development project, a data lake must answer a fundamental question: “What is the DevOps strategy?” Defining a DevOps strategy for a data lake requires extensive planning and multiple teams. This typically requires multiple development and test cycles before maturing enough to support a data lake in a production environment. If an organization doesn’t have the right people, resources, and processes in place, this can quickly become daunting.

What if your data engineering team uses basic building blocks to encapsulate data lake infrastructure and data processing jobs? This is where CDK Pipelines brings the full benefit of infrastructure as code (IaC). CDK Pipelines is a high-level construct library within the AWS Cloud Development Kit (AWS CDK) that makes it easy to set up a continuous deployment pipeline for your AWS CDK applications. The AWS CDK provides essential automation for your release pipelines so that your development and operations team remain agile and focus on developing and delivering applications on the data lake.

In this post, we discuss a centralized deployment solution utilizing CDK Pipelines for data lakes. This implements a DevOps-driven data lake that delivers benefits such as continuous delivery of data lake infrastructure, data processing, and analytical jobs through a configuration-driven multi-account deployment strategy. Let’s dive in!

Data lakes on AWS

A data lake is a centralized repository where you can store all of your structured and unstructured data at any scale. Store your data as is, without having to first structure it, and run different types of analytics—from dashboards and visualizations to big data processing, real-time analytics, and machine learning in order to guide better decisions. To further explore data lakes, refer to What is a data lake?

We design a data lake with the following elements:

Secure data storage
Data cataloging in a central repository
Data movement
Data analysis

The following figure represents our data lake.

Data Lake on AWS

We use three Amazon Simple Storage Service (Amazon S3) buckets:

raw – Stores the input data in its original format
conformed – Stores the data that meets the data lake quality requirements
purpose-built – Stores the data that is ready for consumption by applications or data lake consumers

The data lake has a producer where we ingest data into the raw bucket at periodic intervals. We utilize the following tools: AWS Glue processes and analyzes the data. AWS Glue Data Catalog persists metadata in a central repository. AWS Lambda and AWS Step Functions schedule and orchestrate AWS Glue extract, transform, and load (ETL) jobs. Amazon Athena is used for interactive queries and analysis. Finally, we engage various AWS services for logging, monitoring, security, authentication, authorization, alerting, and notification.

A common data lake practice is to have multiple environments such as dev, test, and production. Applying the IaC principle for data lakes brings the benefit of consistent and repeatable runs across multiple environments, self-documenting infrastructure, and greater flexibility with resource management. The AWS CDK offers high-level constructs for use with all of our data lake resources. This simplifies usage and streamlines implementation.

Before exploring the implementation, let’s gain further scope of how we utilize our data lake.

The solution

Our goal is to implement a CI/CD solution that automates the provisioning of data lake infrastructure resources and deploys ETL jobs interactively. We accomplish this as follows: 1) applying separation of concerns (SoC) design principle to data lake infrastructure and ETL jobs via dedicated source code repositories, 2) a centralized deployment model utilizing CDK pipelines, and 3) AWS CDK enabled ETL pipelines from the start.

Data lake infrastructure

Our data lake infrastructure provisioning includes Amazon S3 buckets, S3 bucket policies, AWS Key Management Service (KMS) encryption keys, Amazon Virtual Private Cloud (Amazon VPC), subnets, route tables, security groups, VPC endpoints, and secrets in AWS Secrets Manager. The following diagram illustrates this.

Data Lake Infrastructure

Data lake ETL jobs

For our ETL jobs, we process New York City TLC Trip Record Data. The following figure displays our ETL process, wherein we run two ETL jobs within a Step Functions state machine.

AWS Glue ETL Jobs

Here are a few important details:

A file server uploads files to the S3 raw bucket of the data lake. The file server is a data producer and source for the data lake. We assume that the data is pushed to the raw bucket.
Amazon S3 triggers an event notification to the Lambda function.
The function inserts an item in the Amazon DynamoDB table in order to track the file processing state. The first state written indicates the AWS Step Function start.
The function starts the state machine.
The state machine runs an AWS Glue job (Apache Spark).
The job processes input data from the raw zone to the data lake conformed zone. The job also converts CSV input data to Parquet formatted data.
The job updates the Data Catalog table with the metadata of the conformed Parquet file.
A second AWS Glue job (Apache Spark) processes the input data from the conformed zone to the purpose-built zone of the data lake.
The job fetches ETL transformation rules from the Amazon S3 code bucket and transforms the input data.
The job stores the result in Parquet format in the purpose-built zone.
The job updates the Data Catalog table with the metadata of the purpose-built Parquet file.
The job updates the DynamoDB table and updates the job status to completed.
An Amazon Simple Notification Service (Amazon SNS) notification is sent to subscribers that states the job is complete.
Data engineers or analysts can now analyze data via Athena.
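The first steps of this flow, reading the S3 event notification and recording the initial processing state, can be sketched in Python. The S3 event fields follow the standard notification structure, but the audit item's attribute names and the STARTED status are assumptions made for illustration, and nothing here calls DynamoDB or Step Functions; the actual implementation lives in the ETL repository.

```python
from datetime import datetime, timezone
from urllib.parse import unquote_plus


def parse_s3_event(event):
    """Return a list of (bucket, key) pairs from an S3 event notification."""
    uploads = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded, with spaces encoded as '+'.
        key = unquote_plus(record["s3"]["object"]["key"])
        uploads.append((bucket, key))
    return uploads


def build_audit_item(bucket, key, status="STARTED", now=None):
    """Build a tracking record; attribute names are hypothetical."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return {
        "source_bucket": bucket,
        "source_key": key,
        "job_status": status,
        "updated_at": timestamp,
    }


# Hypothetical event for a file landing in the raw bucket.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-raw-bucket"},
                "object": {"key": "tlc/yellow+trip+data.csv"}}}
    ]
}

uploads = parse_s3_event(sample_event)
fixed_time = datetime(2021, 7, 30, tzinfo=timezone.utc)
audit_item = build_audit_item(*uploads[0], now=fixed_time)
```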

Data formats, Glue jobs, ETL transformation logic, data cataloging, auditing, notification, orchestration, and data analysis are covered in more detail in the AWS CDK Pipelines for Data Lake ETL Deployment GitHub repository, which is introduced in a subsequent section.

Centralized deployment

Now that we have data lake infrastructure and ETL jobs ready, let’s define our deployment model. This model is based on the following design principles:

A dedicated AWS account to run CDK pipelines.
One or more AWS accounts into which the data lake is deployed.
The data lake infrastructure has a dedicated source code repository. Typically, data lake infrastructure is a one-time deployment and rarely evolves. Therefore, a dedicated code repository provides a landing zone for your data lake.
Each ETL job has a dedicated source code repository. Each ETL job may have unique AWS service, orchestration, and configuration requirements. Therefore, a dedicated source code repository will help you more flexibly build, deploy, and maintain ETL jobs.

We organize our source code repo into three branches: dev (main), test, and prod. In the deployment account, we manage three separate CDK Pipelines and each pipeline is sourced from a dedicated branch. Here we choose a branch-based software development method in order to demonstrate the strategy in more complex scenarios where integration testing and validation layers require human intervention. As well, these may not immediately follow with a corresponding release or deployment due to their manual nature. This facilitates the propagation of changes through environments without blocking independent development priorities. We accomplish this by isolating resources across environments in the central deployment account, allowing for the independent management of each environment, and avoiding cross-contamination during each pipeline’s self-mutating updates. The following diagram illustrates this method.

Centralized deployment

Note: This centralized deployment strategy can be adopted for trunk-based software development with minimal solution modification.

Deploying data lake ETL jobs

The following figure illustrates how we utilize CDK Pipelines to deploy data lake infrastructure and ETL jobs from a central deployment account. This model follows standard nomenclature from the AWS CDK. Each repository represents a cloud infrastructure code definition. This includes the pipelines construct definition. Pipelines have one or more actions, such as cloning the source code (source action) and synthesizing the stack into an AWS CloudFormation template (synth action). Each pipeline has one or more stages, such as testing and deploying. In an AWS CDK app context, the pipelines construct is a stack like any other stack. Therefore, when the AWS CDK app is deployed, a new pipeline is created in AWS CodePipeline.

This provides incredible flexibility regarding DevOps. In other words, as a developer with an understanding of AWS CDK APIs, you can harness the power and scalability of AWS services such as CodePipeline, AWS CodeBuild, and AWS CloudFormation.

Deploying data lake ETL jobs using CDK Pipelines

Here are a few important details:

The DevOps administrator checks in the code to the repository.
The DevOps administrator (with elevated access) facilitates a one-time manual deployment on a target environment. Elevated access includes administrative privileges on the central deployment account and target AWS environments.
CodePipeline periodically listens to commit events on the source code repositories. This is the self-mutating nature of CodePipeline. It’s configured to work with and can update itself according to the provided definition.
Code changes made to the main repo branch are automatically deployed to the data lake dev environment.
Code changes to the repo test branch are automatically deployed to the test environment.
Code changes to the repo prod branch are automatically deployed to the prod environment.
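The branch-to-environment relationship described above can be captured as plain configuration. The following is a sketch under assumptions: the account IDs and Region are placeholders, and the dictionary layout is our own rather than the format used by the starter kits.

```python
# Placeholder account IDs and Region; replace with your own values.
DEPLOYMENT_TARGETS = {
    "main": {"environment": "dev", "account_id": "111111111111", "region": "us-west-2"},
    "test": {"environment": "test", "account_id": "222222222222", "region": "us-west-2"},
    "prod": {"environment": "prod", "account_id": "333333333333", "region": "us-west-2"},
}


def resolve_target(branch):
    """Return the deployment target for a branch, or fail loudly."""
    if branch not in DEPLOYMENT_TARGETS:
        raise ValueError(f"No deployment target configured for branch '{branch}'")
    return DEPLOYMENT_TARGETS[branch]


dev_target = resolve_target("main")
prod_target = resolve_target("prod")
```

Rejecting unknown branches explicitly avoids a feature branch being deployed to an environment by accident.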

CDK Pipelines starter kits for data lakes

Want to get going quickly with CDK Pipelines for your data lake? Start by cloning our two GitHub repositories. Here is a summary:

AWS CDK Pipelines for Data Lake Infrastructure Deployment

This repository contains the following reusable resources:

CDK Application
CDK Pipelines stack
CDK Pipelines deploy stage
Amazon VPC stack
Amazon S3 stack

It also contains the following automation scripts:

AWS environments configuration
Deployment account bootstrapping
Target account bootstrapping
Account secrets configuration (e.g., GitHub access tokens)

AWS CDK Pipelines for Data Lake ETL Deployment

This repository contains the following reusable resources:

CDK Application
CDK Pipelines stack
CDK Pipelines deploy stage
Amazon DynamoDB stack
AWS Glue stack
AWS Step Functions stack

It also contains the following:

AWS Lambda scripts
AWS Glue scripts
AWS Step Functions State machine script

Advantages

This section summarizes some of the advantages offered by this solution.

Scalable and centralized deployment model

We utilize a scalable and centralized deployment model to deliver end-to-end automation. This allows DevOps and data engineers to follow the single responsibility principle while maintaining precise control over the deployment strategy and code quality. The model can readily be expanded to more accounts, and the pipelines are responsive to custom controls within each environment, such as a production approval layer.

Configuration-driven deployment

Configuration in the source code and AWS Secrets Manager allows deployments to utilize targeted values that are declared globally in a single location. This provides consistent management of global configurations and dependencies such as resource names, AWS account IDs, Regions, and VPC CIDR ranges. Similarly, CDK Pipelines exports outputs from CloudFormation stacks for later consumption by other resources.

Repeatable and consistent deployment of new ETL jobs

Continuous integration and continuous delivery (CI/CD) pipelines allow teams to deploy to production more frequently. Code changes can be safely and securely propagated through environments and released for deployment. This allows rapid iteration on data processing jobs, and these jobs can be changed in isolation from pipeline changes, resulting in reliable workflows.

Cleaning up

You may delete the resources provisioned by utilizing the starter kits. You can do this by running the cdk destroy command using AWS CDK Toolkit. For detailed instructions, refer to the Clean up sections in the starter kit README files.

Conclusion

In this post, we showed how to utilize CDK Pipelines to deploy infrastructure and data processing ETL jobs of your data lake in dev, test, and production AWS environments. We provided two GitHub repositories for you to test and realize the full benefits of this solution firsthand. We encourage you to fork the repositories, bring your ETL scripts, bootstrap your accounts, configure account parameters, and continuously deliver your data lake ETL jobs.

Let’s stay in touch via the GitHub repositories: AWS CDK Pipelines for Data Lake Infrastructure Deployment and AWS CDK Pipelines for Data Lake ETL Deployment.

About the authors

Ravi Itha

Ravi Itha is a Sr. Data Architect at AWS. He works with customers to design and implement Data Lakes, Analytics, and Microservices on AWS. He is an open-source committer and has published more than a dozen solutions using AWS CDK, AWS Glue, AWS Lambda, AWS Step Functions, Amazon ECS, Amazon MQ, Amazon SQS, Amazon Kinesis Data Streams, and Amazon Kinesis Data Analytics for Apache Flink. His solutions can be found at his GitHub handle. Outside of work, he is passionate about books, cooking, movies, and yoga.

Isaiah Grant

Isaiah Grant is a Cloud Consultant at 2nd Watch. His primary function is to design architectures and build cloud-based applications and services. He leads customer engagements and helps customers with enterprise cloud adoptions. In his free time, he is engaged in local community initiatives and enjoys being outdoors with his family.

Zahid Ali

Zahid Ali is a Data Architect at AWS. He helps customers design, develop, and implement data warehouse and Data Lake solutions on AWS. Outside of work he enjoys playing tennis, spending time outdoors, and traveling.

Blue/Green deployment with AWS Developer tools on Amazon EC2 using Amazon EFS to host application source code

2021-07-28 Rakesh Singh

Post Syndicated from Rakesh Singh original https://aws.amazon.com/blogs/devops/blue-green-deployment-with-aws-developer-tools-on-amazon-ec2-using-amazon-efs-to-host-application-source-code/

Many organizations building modern applications require a shared and persistent storage layer for hosting and deploying data-intensive enterprise applications, such as content management systems, media and entertainment, distributed applications like machine learning training, etc. These applications demand a centralized file share that scales to petabytes without disrupting running applications and remains concurrently accessible from potentially thousands of Amazon EC2 instances.

Simultaneously, customers want to automate the end-to-end deployment workflow and leverage continuous methodologies utilizing AWS developer tools services for performing a blue/green deployment with zero downtime. A blue/green deployment is a deployment strategy wherein you create two separate, but identical environments. One environment (blue) is running the current application version, and one environment (green) is running the new application version. The blue/green deployment strategy increases application availability by generally isolating the two application environments and ensuring that spinning up a parallel green environment won’t affect the blue environment resources. This isolation reduces deployment risk by simplifying the rollback process if a deployment fails.

Amazon Elastic File System (Amazon EFS) provides a simple, scalable, and fully-managed elastic NFS file system for use with AWS Cloud services and on-premises resources. It scales on demand, thereby eliminating the need to provision and manage capacity in order to accommodate growth. Utilize Amazon EFS to create a shared directory that stores and serves code and content for numerous applications. Your application can treat a mounted Amazon EFS volume like local storage. This means you don’t have to deploy your application code every time the environment scales up to multiple instances to distribute load.

In this blog post, I will guide you through an automated process to deploy a sample web application on Amazon EC2 instances utilizing Amazon EFS mount to host application source code, and utilizing a blue/green deployment with AWS code suite services in order to deploy the application source code with no downtime.

How this solution works

This blog post includes a CloudFormation template to provision all of the resources needed for this solution. The CloudFormation stack deploys a Hello World application on Amazon Linux 2 EC2 Instances running behind an Application Load Balancer and utilizes Amazon EFS mount point to store the application content. The AWS CodePipeline project utilizes AWS CodeCommit as the version control, AWS CodeBuild for installing dependencies and creating artifacts, and AWS CodeDeploy to conduct deployment on EC2 instances running in an Amazon EC2 Auto Scaling group.

Figure 1 below illustrates our solution architecture.

Figure 1: Sample solution architecture

The event flow in Figure 1 is as follows:

A developer commits code changes from their local repo to the CodeCommit repository. The commit triggers CodePipeline execution.
CodeBuild execution begins to compile source code, install dependencies, run custom commands, and create deployment artifact as per the instructions in the Build specification reference file.
During the build phase, CodeBuild copies the source-code artifact to Amazon EFS file system and maintains two different directories for current (green) and new (blue) deployments.
After successfully completing the build step, CodeDeploy deployment kicks in to conduct a Blue/Green deployment to a new Auto Scaling Group.
During the deployment phase, CodeDeploy mounts the EFS file system on new EC2 instances as per the CodeDeploy AppSpec file reference and conducts other deployment activities.
After successful deployment, a Lambda function triggers in order to store a deployment environment parameter in Systems Manager parameter store. The parameter stores the current EFS mount name that the application utilizes.
The AWS Lambda function updates the parameter value during every successful deployment with the current EFS location.

Prerequisites

For this walkthrough, the following are required:

An AWS account
Access to an AWS account with administrator or PowerUser (or equivalent) AWS Identity and Access Management (IAM) role policies attached
Git Command Line installed and configured in your local environment

Deploy the solution

Once you’ve assembled the prerequisites, download or clone the GitHub repo and store the files on your local machine. Utilize the commands below to clone the repo:

mkdir -p ~/blue-green-sample/
cd ~/blue-green-sample/
git clone https://github.com/aws-samples/blue-green-deployment-pipeline-for-efs

Once completed, utilize the following steps to deploy the solution in your AWS account:

Create a private Amazon Simple Storage Service (Amazon S3) bucket by using this documentation

Figure 2: AWS S3 console view when creating a bucket
Upload the cloned or downloaded GitHub repo files to the root of the S3 bucket. The S3 bucket object structure should look similar to Figure 3:

Figure 3: AWS S3 bucket object structure
Go to the S3 bucket and select the template name solution-stack-template.yml, and then copy the object URL.
Open the CloudFormation console. Choose the appropriate AWS Region, and then choose Create Stack. Select With new resources.
Select Amazon S3 URL as the template source, paste the object URL that you copied in Step 3, and then choose Next.
On the Specify stack details page, enter a name for the stack and provide the following input parameter. Modify the default values for other parameters in order to customize the solution for your environment. You can leave everything as default for this walkthrough.

ArtifactBucket– The name of the S3 bucket that you created in the first step of the solution deployment. This is a mandatory parameter with no default value.

Figure 4: Defining the stack name and input parameters for the CloudFormation stack

Choose Next.
On the Options page, keep the default values and then choose Next.
On the Review page, confirm the details, acknowledge that CloudFormation might create IAM resources with custom names, and then choose Create Stack.
Once the stack creation is marked as CREATE_COMPLETE, the following resources are created:

A virtual private cloud (VPC) configured with two public and two private subnets.
NAT Gateway, an EIP address, and an Internet Gateway.
Route tables for private and public subnets.
Auto Scaling Group with a single EC2 Instance.
Application Load Balancer and a Target Group.
Three security groups—one each for ALB, web servers, and EFS file system.
Amazon EFS file system with a mount target for each Availability Zone.
CodePipeline project with CodeCommit repository, CodeBuild, and CodeDeploy resources.
SSM parameter to store the environment current deployment status.
Lambda function to update the SSM parameter for every successful pipeline execution.
Required IAM Roles and policies.

Note: It may take anywhere from 10-20 minutes to complete the stack creation.

Test the solution

Now that the solution stack is deployed, follow the steps below to test the solution:

Validate CodePipeline execution status

After successfully creating the CloudFormation stack, a CodePipeline execution automatically triggers to deploy the default application code version from the CodeCommit repository.

In the AWS console, choose Services and then CloudFormation. Select your stack name. On the stack Outputs tab, look for the CodePipelineURL key and click on the URL.
Validate that all steps have successfully completed. For a successful CodePipeline execution, you should see something like Figure 5. Wait for the execution to complete in case it is still in progress.

Figure 5: CodePipeline console showing execution status of all stages

Validate the Website URL

After completing the pipeline execution, hit the website URL on a browser to check if it’s working.

On the stack Outputs tab, look for the WebsiteURL key and click on the URL.
For a successful deployment, it should open a default page similar to Figure 6.

Figure 6: Sample “Hello World” application (Green deployment)

Validate the EFS share

After the website deployed successfully, we will get into the application server and validate the EFS mount point and the application source code directory.

Open the Amazon EC2 console, and then choose Instances in the left navigation pane.
Select the instance named bg-sample and choose Connect.
For Connection method, choose Session Manager, and then choose Connect.

After the connection is made, run the following bash commands to validate the EFS mount and the deployed content. Figure 7 shows a sample output from running the bash commands.

sudo df -h | grep efs
ls -la /efs/green
ls -la /var/www/

Figure 7: Sample output from the bash command (Green deployment)

Deploy a new revision of the application code

After verifying the application status and the deployed code on the EFS share, commit some changes to the CodeCommit repository in order to trigger a new deployment.

On the stack Outputs tab, look for the CodeCommitURL key and click on the corresponding URL.
Click on the file html.
Click on
Uncomment line 9 and comment line 10, so that the new lines look like those below after the changes:

background-color: #0188cc; 
#background-color: #90ee90;

Add Author name, Email address, and then choose Commit changes.

After you commit the code, the CodePipeline triggers and executes Source, Build, Deploy, and Lambda stages. Once the execution completes, hit the Website URL and you should see a new page like Figure 8.

Figure 8: New Application version (Blue deployment)

On the EFS side, the application directory on the new EC2 instance now points to /efs/blue as shown in Figure 9.

Figure 9: Sample output from the bash command (Blue deployment)

Solution review

Let’s review the pipeline stages details and what happens during the Blue/Green deployment:

1) Build stage

For this sample application, the CodeBuild project is configured to mount the EFS file system and utilize the buildspec.yml file present in the source code root directory to run the build. Following is the sample build spec utilized in this solution:

version: 0.2
phases:
  install:
    runtime-versions:
      php: latest   
  build:
    commands:
      - current_deployment=$(aws ssm get-parameter --name $SSM_PARAMETER --query "Parameter.Value" --region $REGION --output text)
      - echo $current_deployment
      - echo $SSM_PARAMETER
      - echo $EFS_ID $REGION
      - if [[ "$current_deployment" == "null" ]]; then echo "this is the first GREEN deployment for this project" ; dir='/efs/green' ; fi
      - if [[ "$current_deployment" == "green" ]]; then dir='/efs/blue' ; else dir='/efs/green' ; fi
      - if [ ! -d $dir ]; then  mkdir $dir >/dev/null 2>&1 ; fi
      - echo $dir
      - rsync -ar $CODEBUILD_SRC_DIR/ $dir/
artifacts:
  files:
      - '**/*'

During the build job, the following activities occur:

Installs latest php runtime version.
Reads the SSM parameter value in order to know the current deployment and decide which directory to utilize. The SSM parameter value flips between green and blue for every successful deployment.
Synchronizes the latest source code to the EFS mount point.
Creates artifacts to be utilized in subsequent stages.

Note: Utilize the default buildspec.yml as a reference and customize it further as per your requirement. See this link for more examples.
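The directory selection in the build spec above can be easier to reason about in isolation. The following Python sketch mirrors the two shell `if` statements; the function name `next_deployment_dir` is hypothetical and is not part of the sample repository.

```python
def next_deployment_dir(current_deployment: str) -> str:
    """Return the EFS directory that the next build should sync to.

    Mirrors the shell logic in the sample buildspec.yml: when the SSM
    parameter says "green" is live, the new revision goes to /efs/blue.
    Any other value, including the initial "null", selects /efs/green.
    """
    if current_deployment == "green":
        return "/efs/blue"
    return "/efs/green"


# Show the target directory for each value the parameter can hold.
for value in ("null", "green", "blue"):
    print(f"{value!r} -> {next_deployment_dir(value)}")
```

Because the build always writes to the directory that is not currently live, the running application keeps serving from its own directory until traffic is shifted.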

2) Deploy Stage

The solution is utilizing CodeDeploy blue/green deployment type for EC2/On-premises. The deployment environment is configured to provision a new EC2 Auto Scaling group for every new deployment in order to deploy the new application revision. CodeDeploy creates the new Auto Scaling group by copying the current one. See this link for more details on blue/green deployment configuration with CodeDeploy. During each deployment event, CodeDeploy utilizes the appspec.yml file to run the deployment steps as per the defined life cycle hooks. Following is the sample AppSpec file utilized in this solution.

version: 0.0
os: linux
hooks:
  BeforeInstall:
    - location: scripts/install_dependencies
      timeout: 180
      runas: root
  AfterInstall:
    - location: scripts/app_deployment
      timeout: 180
      runas: root
  BeforeAllowTraffic:
    - location: scripts/check_app_status
      timeout: 180
      runas: root

Note: The scripts mentioned in the AppSpec file are available in the scripts directory of the CodeCommit repository. Utilize these sample scripts as a reference and modify as per your requirement.

For this sample, the following steps are conducted during a deployment:

BeforeInstall:
- Installs required packages on the EC2 instance.
- Mounts the EFS file system.
- Creates a symbolic link to point the apache home directory /var/www/html to the appropriate EFS mount point. It also ensures that the new application version deploys to a different EFS directory without affecting the current running application.

AfterInstall:
- Stops apache web server.
- Fetches current EFS directory name from Systems Manager.
- Runs some clean up commands.
- Restarts apache web server.

BeforeAllowTraffic:
- Checks whether the application is running properly.
- Exits the deployment with an error if the app returns a non-200 HTTP status code.
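The BeforeAllowTraffic check described above can be sketched in a few lines. The sample repository implements it as a script in the scripts directory; the Python version below is only an illustration, and the function name `app_is_healthy` and the injectable `opener` parameter are assumptions made so that the logic can be exercised without a running web server.

```python
import urllib.error
import urllib.request


def app_is_healthy(url: str, timeout: float = 5.0,
                   opener=urllib.request.urlopen) -> bool:
    """Return True only if the URL answers with HTTP 200."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection errors, timeouts, and HTTP error statuses
        # (HTTPError is a subclass of URLError) all count as unhealthy.
        return False
```

A hook script built on this could finish with `sys.exit(0 if app_is_healthy("http://localhost/") else 1)`, since CodeDeploy treats a non-zero exit code from a lifecycle hook script as a failed deployment step.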

3) Lambda Stage

After completing the deploy stage, CodePipeline triggers a Lambda function in order to update the SSM parameter value with the updated EFS directory name. This parameter value alternates between “blue” and “green” to help CodePipeline identify the right EFS file system path during the next deployment.
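As a rough illustration of this stage, the sketch below flips the parameter value and reports the result back to CodePipeline. It is not the function shipped with the sample template: the helper names and the `SSM_PARAMETER` environment variable on the Lambda function are assumptions. The boto3 calls used (`get_parameter`, `put_parameter`, `put_job_success_result`, `put_job_failure_result`) are standard client methods.

```python
import os


def flip_color(current: str) -> str:
    """Return the color that just went live, given the previous value."""
    return "blue" if current == "green" else "green"


def update_deployment_parameter(ssm_client, parameter_name: str) -> str:
    """Read the current value, flip it, and write it back."""
    response = ssm_client.get_parameter(Name=parameter_name)
    new_value = flip_color(response["Parameter"]["Value"])
    ssm_client.put_parameter(
        Name=parameter_name, Value=new_value, Type="String", Overwrite=True
    )
    return new_value


def lambda_handler(event, context):
    import boto3  # imported here so the helpers above stay testable offline

    # CodePipeline passes the job ID in the invocation event and waits
    # for the function to report success or failure for that job.
    job_id = event["CodePipeline.job"]["id"]
    pipeline = boto3.client("codepipeline")
    try:
        update_deployment_parameter(
            boto3.client("ssm"), os.environ["SSM_PARAMETER"]
        )
        pipeline.put_job_success_result(jobId=job_id)
    except Exception as error:
        pipeline.put_job_failure_result(
            jobId=job_id,
            failureDetails={"type": "JobFailed", "message": str(error)},
        )
```

Starting from the initial value "null", the first successful run stores "green", which matches the build stage writing the first revision to /efs/green.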

CodeDeploy Blue/Green deployment

Let’s review the sequence of events during the CodeDeploy deployment:

CodeDeploy creates a new Auto Scaling group by copying the original one.
Provisions a replacement EC2 instance in the new Auto Scaling Group.
Conducts the deployment on the new instance as per the instructions in the yml file.
Sets up health checks and redirects traffic to the new instance.
Terminates the original instance along with the Auto Scaling Group.
After completing the deployment, it should appear as shown in Figure 10.

Figure 10: AWS console view of a Blue/Green CodeDeploy deployment on EC2

Troubleshooting

To troubleshoot any service-related issues, see the following links:

More information

Now that you have tested the solution, here are some additional points worth noting:

The sample template and code utilized in this blog can work in any AWS region and are mainly intended for demonstration purposes. Utilize the sample as a reference and modify it further as per your requirement.
This solution works with a single account, Region, and VPC combination.
For this sample, we have utilized AWS CodeCommit as version control, but you can also utilize any other source supported by AWS CodePipeline, such as Bitbucket, GitHub, or GitHub Enterprise Server.

Clean up

Follow these steps to delete the components and avoid incurring future charges:

Open the AWS CloudFormation console.
On the Stacks page in the CloudFormation console, select the stack that you created for this blog post. The stack must be currently running.
In the stack details pane, choose Delete.
Select Delete stack when prompted.
Empty and delete the S3 bucket created during deployment step 1.

Conclusion

In this blog post, you learned how to set up a complete CI/CD pipeline for conducting a blue/green deployment on EC2 instances utilizing Amazon EFS file share as mount point to host application source code. The EFS share will be the central location hosting your application content, and it will help reduce your overall deployment time by eliminating the need for deploying a new revision on every EC2 instance local storage. It also helps to preserve any dynamically generated content when the life of an EC2 instance ends.

Author bio

Rakesh Singh

Rakesh is a Senior Technical Account Manager at Amazon. He loves automation and enjoys working directly with customers to solve complex technical issues and provide architectural guidance. Outside of work, he enjoys playing soccer, singing karaoke, and watching thriller movies.

Enforcing AWS CloudFormation scanning in CI/CD Pipelines at scale using Trend Micro Cloud One Conformity

2021-07-09 Chris Dorrington

Post Syndicated from Chris Dorrington original https://aws.amazon.com/blogs/devops/cloudformation-scanning-cicd-pipeline-cloud-conformity/

Integrating AWS CloudFormation template scanning into CI/CD pipelines is a great way to catch security infringements before application deployment. However, implementing and enforcing this in a multi team, multi account environment can present some challenges, especially when the scanning tools used require external API access.

This blog will discuss those challenges and offer a solution using Trend Micro Cloud One Conformity (formerly Cloud Conformity) as the worked example. Accompanying this blog is the end to end sample solution and detailed install steps which can be found on GitHub here.

We will explore the following topics in detail:

When to detect security vulnerabilities
- Where can template scanning be enforced?
Managing API Keys for accessing third party APIs
- How can keys be obtained and distributed between teams?
- How easy is it to rotate keys with multiple teams relying upon them?
Viewing the results easily
- How do teams easily view the results of any scan performed?
Solution maintainability
- How can a fix or update be rolled out?
- How easy is it to change scanner provider? (e.g. from Cloud Conformity to an in-house tool)
Enforcing the template validation
- How to prevent teams from circumventing the checks?
Managing exceptions to the rules
- How can the teams proceed with deployment if there is a valid reason for a check to fail?

When to detect security vulnerabilities

During the DevOps life-cycle, there are multiple opportunities to test cloud applications for security best practice violations. The Shift-left approach moves testing as far left in the life-cycle as possible, so as to catch bugs as early as possible. It is much easier and less costly to fix an issue on a local developer machine than it is to patch it in production.

Diagram showing Shift-left approach

Figure 1 – depicting the stages that an app will pass through before being deployed into an AWS account

At the very left of the cycle is where developers perform the traditional software testing responsibilities (such as unit tests). With cloud applications, there is also a responsibility at this stage to ensure there are no AWS security, configuration, or compliance vulnerabilities. Developers and subsequent peer reviewers looking at the code can do this by eye, but in this way it is hard to catch every piece of bad code or misconfigured resource.

For example, you might define an AWS Lambda function that contains an access policy making it accessible from the world, but this can be hard to spot during coding or peer review. Once deployed, potential security risks are now live. Without proper monitoring, these misconfigurations can go undetected, with potentially dire consequences if exploited by a bad actor.

There are a number of tools and SaaS offerings on the market which can scan AWS CloudFormation templates and detect infringements against security best practices, such as Stelligent’s cfn_nag, AWS CloudFormation Guard, and Trend Micro Cloud One Conformity. These can all be run from the command line on a developer’s machine, inside the IDE or during a git commit hook. These options are discussed in detail in Using Shift-Left to Find Vulnerabilities Before Deployment with Trend Micro Template Scanner.

Whilst this is the furthest left that testing can be moved, it is hard to enforce this early in the development process. Mandating that scan commands be integrated into git commit hooks or IDE tools can significantly increase the commit time and quickly become frustrating for the developer. Because developers are responsible for creating these hooks or installing IDE extensions, you cannot guarantee that a template scan is performed before deployment: a developer could easily turn off the scans or not install the tools in the first place.

Another consideration for very-left testing of templates is that when applications are written using AWS CDK or AWS Serverless Application Model (SAM), the actual AWS CloudFormation template that is submitted to AWS isn’t available in source control; it’s created during the build or package stage. Therefore, moving template scanning this far to the left is just not possible in these situations. Developers have to run a command such as cdk synth or sam package to obtain the final AWS CloudFormation templates.

If we now look at the far right of Figure 1, when an application has been deployed, real time monitoring of the account can pick up security issues very quickly. Conformity performs excellently in this area by providing central visibility and real-time monitoring of your cloud infrastructure with a single dashboard. Accounts are checked against over 400 best practices, which allows you to find and remediate non-compliant resources. This real time alerting is fast – you can be assured of an email stating non-compliance in no time at all! However, remediation does take time. Following the correct process, a fix to code will need to go through the CI/CD pipeline again before a patch is deployed. Relying on account scanning only at the far right is sub-optimal.

The best place to scan templates is at the leftmost enforceable part of the process – inside the CI/CD pipeline. Conformity provides their Template Scanner API for this exact purpose. Templates can be submitted to the API, and the same Conformity checks that are being performed in real time on the account are run against the submitted AWS CloudFormation template. When integrated programmatically into a build, failing checks can prevent a deployment from occurring.

Whilst it may seem a simple task to incorporate the Template Scanner API call into a CI/CD pipeline, there are many considerations for doing this successfully in an enterprise environment. The remainder of this blog will address each consideration in detail, and the accompanying GitHub repo provides a working sample solution to use as a base in your own organization.

View failing checks as AWS CodeBuild test reports

Treating failing Conformity checks the same as unit test failures within the build will make the process feel natural to the developers. A failing unit test will break the build, and so will a failing Conformity check.

AWS CodeBuild provides test reporting for common unit test frameworks, such as NUnit, JUnit, and Cucumber. This allows developers to easily and very visually see what failing tests have occurred within their builds, allowing for quicker remediation than having to trawl through test log files. This same principle can be applied to failing Conformity checks—this allows developers to quickly see what checks have failed, rather than looking into AWS CodeBuild logs. However, the AWS CodeBuild test reporting feature doesn’t natively support the JSON schema that the Conformity Template Scanner API returns. Instead, you need custom code to turn the Conformity response into a usable format. Later in this blog we will explore how the conversion occurs.

Cloud conformity failed checks displayed as CodeBuild Reports

Figure 2 – Cloud Conformity failed checks appearing as failed test cases in AWS CodeBuild reports

Enterprise speed bumps

Teams wishing to use template scanning as part of their AWS CodePipeline currently need to create an AWS CodeBuild project that calls the external API, and then performs the custom translation code. If placed inside a buildspec file, it can easily become bloated with many lines of code, leading to maintainability issues arising as copies of the same buildspec file are distributed across teams and accounts. Additionally, third-party APIs such as Conformity are often authorized by an API key. In some enterprises, not all teams have access to the Conformity console, further compounding the problem for API key management.

Below are some factors to consider when implementing template scanning in the enterprise:

How can keys be obtained and distributed between teams?
How easy is it to rotate keys when multiple teams rely upon them?
How can a fix or update be rolled out?
How easy is it to change scanner provider? (e.g. from Cloud Conformity to an in-house tool)

Overcome scaling issues, use a centralized Validation API

An approach to overcoming these issues is to create a single AWS Lambda function fronted by Amazon API Gateway within your organization that runs the call to the Template Scanner API, and performs the transform of results into a format usable by AWS CodeBuild reports. A good place to host this API is within the Cloud Ops team account or similar shared services account. This way, you only need to issue one API key (stored in AWS Secrets Manager) and it’s not available for viewing by any developers. Maintainability for the code performing the Template Scanner API calls is also very easy, because it resides in one location only. Key rotation is now simple (due to only one key in one location requiring an update) and can be automated through AWS Secrets Manager.

The following diagram illustrates a typical setup of a multi-account, multi-dev team scenario in which a team’s AWS CodePipeline uses a centralized Validation API to call Conformity’s Template Scanner.

architecture diagram central api for cloud conformity template scanning

Figure 3 – Example of an AWS CodePipeline utilizing a centralized Validation API to call Conformity’s Template Scanner

Providing a wrapper API around the Conformity Template Scanner API encapsulates the code required to create the CodeBuild reports. Enabling template scanning within teams’ CI/CD pipelines now requires only a small piece of code within their CodeBuild buildspec file. It performs the following three actions:

Post the AWS CloudFormation templates to the centralized Validation API
Write the results to file (which are already in a format readable by CodeBuild test reports)
Stop the build if it detects failed checks within the results
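The three actions above can be sketched as follows. The accompanying repository provides the actual buildspec; this Python version is only an illustration. The function names, the request content type, and the assumption that the Validation API returns a Cucumber JSON array directly are all hypothetical.

```python
import json
import urllib.request


def post_template(api_url: str, template_body: str, timeout: float = 30.0) -> list:
    """POST a template to the (hypothetical) Validation API endpoint."""
    request = urllib.request.Request(
        api_url,
        data=template_body.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def save_report(cucumber_report: list, path: str) -> None:
    """Write the results where the CodeBuild reports section can find them."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(cucumber_report, handle, indent=2)


def count_failed_steps(cucumber_report: list) -> int:
    """Count steps whose result status is "failed" in a Cucumber JSON report."""
    failed = 0
    for feature in cucumber_report:
        for element in feature.get("elements", []):
            for step in element.get("steps", []):
                if step.get("result", {}).get("status") == "failed":
                    failed += 1
    return failed
```

A build command would call these in order and exit with a non-zero status when `count_failed_steps` returns more than zero, which causes CodeBuild to fail the build phase.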

The centralized Validation API in the shared services account can be hosted with a private API in Amazon API Gateway, fronted by a VPC endpoint. Using a private API denies any public access but does allow access from any internal address allowed by the VPC endpoint security group and endpoint policy. The developer teams can run their AWS CodeBuild validation phase within a VPC, thereby giving it access to the VPC endpoint.

A working example of the code required, along with an AWS CodeBuild buildspec file, is provided in the GitHub repository.

Converting 3rd party tool results to CodeBuild Report format

With a centralized API, there is now only one place where the conversion code needs to reside (as opposed to copies embedded in each team’s CodePipeline). AWS CodeBuild Reports are primarily designed for test framework outputs and displaying test case results. In our case, we want to display Conformity checks – which are not unit test case results. The accompanying GitHub repository contains code to convert from Conformity Template Scanner API results, but we will discuss mappings between the formats so that bespoke conversions for other 3rd party tools, such as cfn_nag, can be created if required.

AWS CodeBuild provides out-of-the-box compatibility for common unit test frameworks, such as NUnit, JUnit and Cucumber. Of the supported formats, Cucumber JSON is the easiest to read and manipulate due to native JSON support in languages such as Python (all other formats being in XML).

Figure 4 depicts where the Cucumber JSON fields will appear in the AWS CodeBuild reports page and Figure 5 below shows a valid Cucumber snippet, with relevant fields highlighted in yellow.

CodeBuild Reports page with fields highlighted that correspond to cucumber JSON fields

Figure 4 – AWS CodeBuild report test case field mappings utilized by Cucumber JSON

Cucumber JSON snippet showing CodeBuild Report field mappings

Figure 5 – Cucumber JSON with mappings to AWS CodeBuild report table

Note that in Figure 5, there are additional fields (e.g. id, description) that are required to make the file valid Cucumber JSON – even though this data is not displayed in the CodeBuild Reports page. However, raw reports are still available as AWS CodeBuild artifacts, and therefore it is still useful to populate these fields with data that could aid deeper troubleshooting.

Conversion code for Conformity results is provided in the accompanying GitHub repo, within file app.py, line 376 onwards.
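To give a feel for what such a conversion involves, here is a minimal sketch that maps a list of check results onto the Cucumber JSON structure (features containing elements containing steps). It is not the code from app.py. The input shape (`rule_id`, `title`, `status`, `message`) is a simplified stand-in rather than the real Conformity response schema, and the choice of which value goes into which Cucumber field is an assumption.

```python
def checks_to_cucumber(template_name: str, checks: list) -> list:
    """Convert simplified check results into a Cucumber JSON report."""
    elements = []
    for check in checks:
        failed = check["status"] != "SUCCESS"
        result = {"status": "failed" if failed else "passed", "duration": 0}
        if failed:
            # The error message is what a reader sees for a failed case.
            result["error_message"] = check["message"]
        elements.append({
            "id": check["rule_id"],
            "keyword": "Scenario",
            "type": "scenario",
            "name": check["title"],
            "description": check["message"],
            "steps": [{
                "keyword": "Then ",
                "name": check["title"],
                "result": result,
            }],
        })
    # One feature per scanned template, one scenario per check.
    return [{
        "uri": template_name,
        "id": template_name,
        "keyword": "Feature",
        "name": f"Template scan of {template_name}",
        "description": "Checks reported by the template scanner",
        "elements": elements,
    }]
```

Modelling each check as its own scenario with a single step keeps one check per row in the report; a converter for another tool, such as cfn_nag, would only need to change how the input fields are read.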

Making the validation phase mandatory in AWS CodePipeline

The Shift-Left philosophy states that we should shift testing as much as possible to the left. The furthest left would be before any CI/CD pipeline is triggered. Developers could and should have the ability to perform template validation from their own machines. However, as discussed earlier this is rarely enforceable – a scan during a pipeline deployment is the only true way to know that templates have been validated. But how can we mandate this and truly secure the validation phase against circumvention?

Preventing updates to deployed CI/CD pipelines

Using a centralized API approach to make the call to the validation API means that this code is now only accessible by the Cloud Ops team, and not the developer teams. However, the code that calls this API has to reside within the developer teams’ CI/CD pipelines, so that it can stop the build if failures are found. With CI/CD pipelines defined as AWS CloudFormation, and without any preventative measures in place, a team could move to disable the phase and deploy code without any checks performed.

Fortunately, there are a number of approaches to prevent this from happening, and to enforce the validation phase. We shall now look at one of them from the AWS CloudFormation Best Practices.

IAM to control access

Use AWS IAM to control access to the stacks that define the pipeline, and then also to the AWS CodePipeline/AWS CodeBuild resources within them.

IAM policies can generically restrict a team from updating a CI/CD pipeline provided to them if a naming convention is used in the stacks that create them. By using a naming convention, coupled with the wildcard “*”, these policies can be applied to a role even before any pipelines have been deployed.

For example, let's assume the pipeline depicted in Figure 6 is defined and deployed in AWS CloudFormation as follows:

Stack name is “cicd-pipeline-team-X”
AWS CodePipeline resource within the stack has logical name with prefix “CodePipelineCICD”
AWS CodeBuild Project for validation phase is prefixed with “CodeBuildValidateProject”

Creating an IAM policy with the statements below and attaching it to the developer teams’ IAM role will prevent them from modifying the resources mentioned above. The AWS CloudFormation stack and resource names will match the wildcards in the statements, and the policy will deny the user any update actions on them.

Example IAM policy highlighting how to deny updates to stacks and pipeline resources

Figure 6 – Example of how an IAM policy can restrict updates to AWS CloudFormation stacks and deployed resources
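For readers who cannot see the figure, the following is a rough sketch of what such a policy could look like, built as a Python dictionary and printed as JSON. It is not a copy of the policy in Figure 6. The function name build_deny_policy, the statement IDs, the chosen actions, and the exact wildcard patterns are assumptions based on the naming convention described above, and they should be checked against how your stacks actually name their resources.

```python
import json


def build_deny_policy(stack_prefix="cicd-pipeline-",
                      pipeline_prefix="CodePipelineCICD",
                      project_prefix="CodeBuildValidateProject"):
    """Return an IAM policy document denying changes to pipeline resources.

    The prefixes follow the naming convention in the example above;
    the wildcard placement is an assumption, not the original policy.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Stacks whose names start with the agreed prefix.
                "Sid": "DenyPipelineStackChanges",
                "Effect": "Deny",
                "Action": [
                    "cloudformation:UpdateStack",
                    "cloudformation:DeleteStack",
                ],
                "Resource": f"arn:aws:cloudformation:*:*:stack/{stack_prefix}*/*",
            },
            {
                # Pipelines whose generated names contain the logical ID prefix.
                "Sid": "DenyPipelineChanges",
                "Effect": "Deny",
                "Action": [
                    "codepipeline:UpdatePipeline",
                    "codepipeline:DeletePipeline",
                ],
                "Resource": f"arn:aws:codepipeline:*:*:*{pipeline_prefix}*",
            },
            {
                # The CodeBuild project that runs the validation phase.
                "Sid": "DenyValidationProjectChanges",
                "Effect": "Deny",
                "Action": [
                    "codebuild:UpdateProject",
                    "codebuild:DeleteProject",
                ],
                "Resource": f"arn:aws:codebuild:*:*:project/*{project_prefix}*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_deny_policy(), indent=2))
```

Because an explicit Deny overrides any Allow, the statements take effect even if the developer role is otherwise broadly permissive. Depending on how teams deploy, it may also be worth denying change set actions on the same stacks.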

Preventing valid failing checks from being a bottleneck

When centralizing anything, and forcing developers to use tooling or features such as template scanners, it is imperative that it (or the team owning it) does not become a bottleneck and slow the developers down. This is just as true for our centralized API solution.

It is sometimes the case that a developer team has a valid reason for a template to yield a failing check. For instance, Conformity will report a HIGH severity alert if a load balancer does not have an HTTPS listener. If a team is migrating an older application which will only work on port 80 and not 443, the team may be able to obtain an exception from their cyber security team. It would not be desirable to turn off the rule completely in the real-time scanning of the account, because for other deployments this HIGH severity alert could be perfectly valid. The team now faces an issue because the validation phase of their pipeline will fail, preventing them from deploying their application, even though they have cyber approval to fail this one check.

When template scanning is enforced on a team, it must not become a bottleneck. Functionality and workflows must accompany such a pipeline feature to allow for quick resolution.

Screenshot of Trend Micro Cloud One Conformity rule from their website

Figure 7 – Screenshot of a Conformity rule from their website

Therefore the centralized validation API must provide a way to allow for exceptions on a case-by-case basis. Any exception should be tied to a unique combination of AWS account number + filename + rule ID, which ensures that exceptions are only valid for the specific instance of violation, and not for any other. This can be achieved by extending the centralized API with a set of endpoints for exception requests and approvals. These can then be integrated into existing or new tooling and workflows to give teams a self-service method for requesting exceptions. Cyber security teams should be able to quickly approve or deny the requests.

The exception request/approve functionality can be implemented by extending the centralized private API to provide an /exceptions endpoint, and using DynamoDB as a data store. During a build and template validation, failed checks returned from Conformity are looked up in the DynamoDB table to see if an approved exception is available. If it is, the check is not returned as an actual failing check, but rather as an exempted check. The build can then continue and deploy to the AWS account.
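The lookup described above can be sketched as follows. This is an illustration of the idea rather than the implementation in the sample repository: an in-memory set stands in for the DynamoDB table, and the names ExceptionKey and split_checks, along with the shape of each check, are hypothetical.

```python
from typing import NamedTuple


class ExceptionKey(NamedTuple):
    """An exception is valid only for this exact combination."""
    account_id: str
    filename: str
    rule_id: str


def split_checks(account_id, filename, failed_checks, approved):
    """Separate failed checks into real failures and exempted checks.

    `approved` is a set of ExceptionKey values; in a real deployment this
    would be a lookup against the exceptions table instead.
    """
    failing, exempted = [], []
    for check in failed_checks:
        key = ExceptionKey(account_id, filename, check["rule_id"])
        if key in approved:
            exempted.append(check)
        else:
            failing.append(check)
    return failing, exempted


if __name__ == "__main__":
    # Hypothetical account number and rule IDs, for illustration only.
    approved = {ExceptionKey("111122223333", "template.yaml", "EXAMPLE-001")}
    failed = [{"rule_id": "EXAMPLE-001"}, {"rule_id": "EXAMPLE-002"}]
    failing, exempted = split_checks(
        "111122223333", "template.yaml", failed, approved)
    print("failing:", failing)
    print("exempted:", exempted)
```

The build would stop only when the list of remaining failing checks is non-empty. Because the key includes the account number and filename, the same rule still fails for any other template or account.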

Figure 8 and Figure 9 depict the /exceptions endpoints that are provided as part of the sample solution in the accompanying GitHub repository.

screenshot of API gateway for centralized template scanner api

Figure 8 – Screenshot of API Gateway depicting the endpoints available as part of the accompanying solution

The /exceptions endpoint methods provide the following functionality:

Table containing HTTP verbs for exceptions endpoint

Figure 9 – HTTP verbs implementing exception functionality

Important note regarding endpoint authorization: Whilst the “validate” private endpoint may be left with no auth so that any call from within a VPC is accepted, the same is not true for the “exception” approval endpoint. It would be prudent to use the AWS IAM authentication available in API Gateway to restrict approvals on this endpoint to certain users only (i.e. the cyber and cloud ops teams).

With the ability to raise and approve exception requests, the mandatory scanning phase of the developer teams’ pipelines is no longer a bottleneck.

Conclusion

Enforcing template validation in multi-team, multi-account environments can present challenges when using third-party APIs, such as Conformity Template Scanner, at scale. We have talked through each hurdle that can arise, and described how creating a centralized Validation API and exception approval process can overcome those obstacles and keep the teams deploying without unwarranted speed bumps.

Shifting left and integrating scanning into the pipeline process can leave the cyber team and developers confident that no offending code is deployed into an account, whether it was written in AWS CDK, AWS SAM or AWS CloudFormation.

Additionally, we talked in depth about how to use CodeBuild reports to display the vulnerabilities found, helping developers to quickly identify where attention is required to remediate.

Getting started

This blog post has described real-life challenges and the theory in detail. A complete sample of the described centralized validation API is available in the accompanying GitHub repo, along with a sample CodePipeline for easy testing. Step-by-step instructions are provided for you to deploy it and enhance it for use in your own organization. Figure 10 depicts the sample solution available in GitHub.

https://github.com/aws-samples/aws-cloudformation-template-scanning-with-cloud-conformity

NOTE: Remember to tear down any stacks after experimenting with the provided solution, to ensure ongoing costs are not charged to your AWS account. Notes on how to do this are included inside the repo Readme.

example codepipeline architecture provided by the accompanying github solution

Figure 10 – The solution available for use in the accompanying GitHub repository

Find out more

Other blog posts are available that cover aspects when dealing with template scanning in AWS:

For more information on Trend Micro Cloud One Conformity, use the links below.


Chris Dorrington

Chris Dorrington is a Senior Cloud Architect with AWS Professional Services in Perth, Western Australia. Chris loves working closely with AWS customers to help them achieve amazing outcomes. He has over 25 years of software development experience and has a passion for Serverless technologies and all things DevOps.