Tag Archives: AWS CodePipeline

Parallel and dynamic SaaS deployments with AWS CDK Pipelines

2021-11-01 Jani Muuriaisniemi

Post Syndicated from Jani Muuriaisniemi original https://aws.amazon.com/blogs/devops/parallel-and-dynamic-saas-deployments-with-cdk-pipelines/

Software as a Service (SaaS) is an increasingly popular business model for independent software vendors (ISVs), including benefits such as a pay-as-you-go pricing model, scalability, and availability.

SaaS services can be built by using numerous architectural models. The silo model provides each tenant with dedicated resources and a shared-nothing architecture. Silo deployments also provide isolation between tenants’ compute resources and their data, and they help eliminate the noisy-neighbor problem. On the other hand, the pool model offers several benefits, such as lower maintenance overhead, simplified management and operations, and cost-saving opportunities, all due to a more efficient utilization of computing resources and capacity. In the bridge model, both silo and pool models are utilized side-by-side. The bridge model is a hybrid model, where parts of the system can be in a silo model, and parts in a pool.

End-customers benefit from SaaS delivery in numerous ways. For example, the service can be available from multiple locations, letting the customer choose what is best for them. The tenant onboarding process is often real-time and frictionless. To realize these benefits for their end-customers, SaaS providers need methods for reliable, fast, and multi-region capable provisioning and software lifecycle management.

This post will describe a deployment system for automating the provision and lifecycle management of workload components in pool or silo deployment models by using AWS Cloud Development Kit (AWS CDK) and CDK Pipelines. We will explore the system’s dynamic and database driven deployment model, as well as its multi-account and multi-region capabilities, and we will provision demo deployments of workload components in both the silo and pool models.

AWS Cloud Development Kit and CDK Pipelines

For this solution, we utilized AWS Cloud Development Kit (AWS CDK) and its CDK Pipelines construct library. AWS CDK is an open-source software development framework for modeling and provisioning cloud application resources by using familiar programming languages. AWS CDK lets you define your infrastructure as code and provision it through AWS CloudFormation.

CDK Pipelines is a high-level construct library with an opinionated implementation of a continuous deployment pipeline for your CDK applications. It is powered by AWS CodePipeline, a fully managed continuous delivery service that helps automate your release pipelines for fast and reliable application as well as infrastructure updates. No servers need to be provisioned or setup, and you only pay for what you use. This solution utilizes the recently released and stable CDK Pipelines modern API.

Business Scenario

As a baseline use case, we have selected the consideration of a fictitious ISV called Unicorn that wants to implement an SaaS business model.

Unicorn operates in several countries, and requires the storing of customer data within the customers’ chosen region. Currently, Unicorn needs two regions in order to satisfy its main customer base: one in EU and one in US. Unicorn expects rapid growth, and it needs a solution that can scale to thousands of tenants. Unicorn plans to have different tenant tiers with different isolation requirements. Their planned deployment model has the majority of tenants in shared pool instances, but they also plan to support dedicated silo instances for the tenants requiring it. The solution must also be easily extendable to new Regions as Unicorn’s business expands.

Unicorn is starting small with just a single development team responsible for currently the only component in their SaaS workload architecture. Following industry best practices, Unicorn has designed its workload architecture so that each component has a clear technical ownership boundary. The chosen solution must grow together with Unicorn, and support multiple independently developed and deployed components in the future.

Solution Overview

Today, many customers utilize AWS CodePipeline to build, test, and deploy their cloud applications. For an SaaS provider such as Unicorn, considering utilizing a single pipeline for managing every deployment presented concerns. At the scale that Unicorn requires, a single pipeline with potentially hundreds of actions runs the risk of becoming throughput limited. Moreover, a single pipeline would offer Unicorn limited control over how changes are released.

Our solution addresses this problem by having a separate dynamically provisioned pipeline for each pool and silo deployment. The solution is designed to manage multiple deployments of Unicorn’s single workload component, thereby aligning with their current needs — and with small changes, including future needs.

CDK Best Practices state that an AWS CDK application maps to a component as defined by the AWS Well-Architected Framework. A component is the code, configuration, and AWS Resources that together deliver against a workload requirement. And this is typically the unit of technical ownership. A component usually includes logical units (e.g., api, database), and can have a continuous deployment pipeline.

Utilizing CDK Pipelines provides a significant benefit: with no additional code, we can deploy cross-account and cross-region just as easily as we would to a single account and region. CDK Pipelines automatically creates and manages the required cross-account encryption keys and cross-region replication buckets. Furthermore, we only need to establish a trust relationship between the accounts during the CDK bootstrapping process.

The following diagram illustrates the solution architecture:

Solution Architecture Diagram

Figure 1: Solution architecture

Let’s look closer at the two primary high level solution flows: silo and pool pipeline provisioning (1 and 2), and component code deployment (3 and 4).

Provisioning is separated into a dedicated flow, so that code deployments do not interfere with tenant onboarding, and vice versa. At the heart of the provisioning flow is the deployment database (1), which is implemented by using an Amazon DynamoDB table.

Utilizing DynamoDB Streams and AWS Lambda Triggers, a new AWS CodeBuild provisioning project build (2) is automatically started after a record is inserted into the deployment database. The provisioning project directly provisions new silo and pool pipelines by using the “cdk deploy” command. Provisioning events are processed in parallel, so that the solution can handle possible bursts in Unicorn’s tenant onboarding volumes.

CDK best practices suggest that infrastructure and runtime code live in the same package. A single AWS CodeCommit repository (3) contains everything needed: the CI/CD pipeline definitions as well as the workload component code. This repository is the source artifact for every CodePipeline pipeline and CodeBuild project. The chapter “Managing application resources as code” describes related implementation details.

The CI/CD pipeline (4) is a CDK Pipelines pipeline, and it is responsible for the component’s Software Development Life Cycle (SDLC) activities. In addition to implementing the update release process, it is expected that most SaaS providers will also implement additional activities. This includes a variety of tests and pre-production environment deployments. The chapter “Controlling deployment updates” dives deeper into this topic.

Deployments have two parts: The pipeline (5) and the component resource stack(s) (6) that it manages. The pipelines are deployed to the central toolchain account and region, whereas the component resources are deployed to the AWS Account and Region, as specified in the deployments’ record in the deployment database.

Sample code for the solution is available in GitHub. The sample code is intended for utilization in conjunction with this post. Our solution is implemented in TypeScript.

Deployment Database

Our deployment database is an Amazon DynamoDB table, with the following structure:

Table structure explained in post.

Figure 2: DynamoDB table

‘id’ is a unique identifier for each deployment.
‘account’ is the AWS account ID for the component resources.
‘region’ is the AWS region ID for the component resources.
‘type’ is either ‘silo’ or ‘pool’, which defines the deployment model.

This design supports tenant deployment to multiple silo and pool deployments. Each of these can target any available and bootstrapped AWS Account and Region. For example, different pools can support tenants in different regions, with select tenants deployed to dedicated silos. As pools may be limited to how many tenants they can serve, the design also supports having multiple pools within a region, and it can easily be extended with an additional attribute to support the tiers concept.

Note that the deployment database does not contain tenant information. It is expected that such mapping is maintained in a separate tenant database, where each tenant record can map to the ID of the deployment that it is associated with.

Now that we have looked at our solution design and architecture, let’s move to the hands-on section, starting with the deployment requirements for the solution.

Prerequisites

The following tools are required to deploy the solution:

NodeJS version compatible with AWS CDK version 1.124.0
The AWS Command Line Interface (AWS CLI)
Git with git-remote-codecommit extension

To follow this tutorial completely, you should have administrator access to at least one, but preferably two AWS accounts:

Toolchain: Account for the SDLC toolchain: the pipelines, the provisioning project, the repository, and the deployment database.
Workload (optional): Account for the component resources.

If you have only a single account, then the toolchain account can be used for both purposes. Credentials for the account(s) are assumed to be configured in AWS CLI profile(s).

The instructions in this post use the following placeholders, which you must replace with your specific values:

<TOOLCHAIN_ACCOUNT_ID>: The AWS Account ID for the toolchain account
<TOOLCHAIN_PROFILE_NAME>: The AWS CLI profile name for the toolchain account credentials
<WORKLOAD_ACCOUNT_ID>: The AWS Account ID for the workload account
<WORKLOAD_PROFILE_NAME>: The AWS CLI profile name for the workload account credentials

Bootstrapping

The toolchain account, and all workload account(s), must be bootstrapped prior to first-time deployment.

AWS CDK and our solutions’ dependencies must be installed to start with. The easiest way to do this is to install them locally with npm. First, we need to download our sample code, so that the we have the package.json configuration file available for npm.

Note that throughout these instructions, many commands are broken over multiple lines for readability. Take care to execute the commands completely. It is always safe to execute each code block as a whole.

Clone the sample code repository from GitHub, and then install the dependencies by using npm:

git clone https://github.com/aws-samples/aws-saas-parallel-deployments
cd aws-saas-parallel-deployments
npm ci

CDK Pipelines requires use of modern bootstrapping. To ensure that this is enabled, start by setting the related environment variable:

export CDK_NEW_BOOTSTRAP=1

Then, bootstrap the toolchain account. You must bootstrap both the region where the toolchain stack is deployed, as well as every target region for component resources. Here, we will first bootstrap only the us-east-1 region, and later you can optionally bootstrap additional region(s).

To bootstrap, we use npx to execute the locally installed version of AWS CDK:

npx cdk bootstrap <TOOLCHAIN_ACCOUNT_ID>/us-east-1 --profile <TOOLCHAIN_PROFILE_NAME>

If you have a workload account that is separate from the toolchain account, then that account must also be bootstrapped. When bootstrapping the workload account, we will establish a trust relationship with the toolchain account. Skip this step if you don’t have a separate workload account.

The workload account boostrappings follows the security best practice of least privilege. First create an execution policy with the minimum permissions required to deploy our demo component resources. We provide a sample policy file in the solution repository for this purpose. Then, use that policy as the execution policy for the trust relationship between the toolchain account and the workload account

aws iam create-policy \
  --profile <WORKLOAD_PROFILE_NAME> \
  --policy-name CDK-Exec-Policy \
  --policy-document file://policies/workload-cdk-exec-policy.json
npx cdk bootstrap <WORKLOAD_ACCOUNT_ID>/us-east-1 \
  --profile <WORKLOAD_PROFILE_NAME> \
  --trust <TOOLCHAIN_ACCOUNT_ID> \
  --cloudformation-execution-policies arn:aws:iam::<WORKLOAD_ACCOUNT_ID>:policy/CDK-Exec-Policy

Toolchain deployment

Prior to being able to deploy for the first time, you must create an AWS CodeCommit repository for the solution. Create this repository in the toolchain account:

aws codecommit create-repository \
  --profile <TOOLCHAIN_PROFILE_NAME> \
  --region us-east-1 \
  --repository-name unicorn-repository

Next, you must push the contents to the CodeCommit repository. For this, use the git command together with the git-remote-codecommit extension in order to authenticate to the repository with your AWS CLI credentials. Our pipelines are configured to use the main branch.

git remote add unicorn codecommit::us-east-1://<TOOLCHAIN_PROFILE_NAME>@unicorn-repository
git push unicorn main

Now we are ready to deploy the toolchain stack:

export AWS_REGION=us-east-1
npx cdk deploy --profile <TOOLCHAIN_PROFILE_NAME>

Workload deployments

At this point, our CI/CD pipeline, provisioning project, and deployment database have been created. The database is initially empty.

Note that the DynamoDB command line interface demonstrated below is not intended to be the SaaS providers provisioning interface for production use. SaaS providers typically have online registration portals, wherein the customer signs up for the service. When new deployments are needed, then a record should automatically be inserted into the solution’s deployment database.

To demonstrate the solution’s capabilities, first we will provision two deployments, with an optional third cross-region deployment:

A silo deployment (silo1) in the us-east-1 region.
A pool deployment (pool1) in the us-east-1 region.
A pool deployment (pool2) in the eu-west-1 region (optional).

To start, configure the AWS CLI environment variables:

export AWS_REGION=us-east-1
export AWS_PROFILE=<TOOLCHAIN_PROFILE_NAME>

Add the deployment database records for the first two deployments:

aws dynamodb put-item \
  --table-name unicorn-deployments \
  --item '{
    "id": {"S":"silo1"},
    "type": {"S":"silo"},
    "account": {"S":"<WORKLOAD_ACCOUNT_ID>"},
    "region": {"S":"us-east-1"}
  }'
aws dynamodb put-item \
  --table-name unicorn-deployments \
  --item '{
    "id": {"S":"pool1"},
    "type": {"S":"pool"},
    "account": {"S":"<WORKLOAD_ACCOUNT_ID>"},
    "region": {"S":"us-east-1"}
  }'

This will trigger two parallel builds of the provisioning CodeBuild project. Use the CodeBuild Console in order to observe the status and progress of each build.

Cross-region deployment (optional)

Optionally, also try a cross-region deployment. Skip this part if a cross-region deployment is not relevant for your use case.

First, you must bootstrap the target region in the toolchain and the workload accounts. Bootstrapping of eu-west-1 here is identical to the bootstrapping of the us-east-1 region earlier. First bootstrap the toolchain account:

npx cdk bootstrap <TOOLCHAIN_ACCOUNT_ID>/eu-west-1 --profile <TOOLCHAIN_PROFILE_NAME>

If you have a separate workload account, then we must also bootstrap it for the new region. Again, please skip this if you have only a single account:

npx cdk bootstrap <WORKLOAD_ACCOUNT_ID>/eu-west-1 \
  --profile <WORKLOAD_PROFILE_NAME> \
  --trust <TOOLCHAIN_ACCOUNT_ID> \
  --cloudformation-execution-policies arn:aws:iam::<WORKLOAD_ACCOUNT_ID>:policy/CDK-Exec-Policy

Then, add the cross-region deployment:

aws dynamodb put-item \
  --table-name unicorn-deployments \
  --item '{
    "id": {"S":"pool2"},
    "type": {"S":"pool"},
    "account": {"S":"<WORKLOAD_ACCOUNT_ID>"},
    "region": {"S":"eu-west-1"}
  }'

Validation of deployments

After the builds have completed, use the CodePipeline console to verify that the deployment pipelines were successfully created in the toolchain account:

CodePipeline console showing Pool-pool2-pipeline, Pool-pool1-pipeline and Silo-silo1-pipeline all Succeeded most recent execution.

Figure 3: CodePipeline console

Similarly, in the workload account, stacks containing your component resources will have been deployed to each configured region for the deployments. In this demo, we are deploying a single “hello world” container application utilizing AWS App Runner as runtime environment. Successful deployment can be verified by using CloudFormation Console:

Console showing Pool-pool1-resources with status of CREATE_COMPLETE

Figure 4: CloudFormation console

Now that we have successfully finished with our demo deployments, let’s look at how updates to the pipelines and the component resources can be managed.

Managing application resources as code

As highlighted earlier in the Solution Overview, every aspect of our solution shares a single source repository. With all of our code in a single source, we can easily deliver complex changes impacting multiple aspects of our solution. And all of this can be packaged, tested, and released as a single change set. For example, a change can introduce a new stage to the CI/CD pipeline, modify an existing stage in the silo and pool pipelines, and/or make code and resource changes to the component resources.

Managing the pipeline definitions is made simple by the self-mutate capability of the CDK Pipelines. Once initially deployed, each CDK Pipelines pipeline can update its own definition. This is implemented by using a separate SelfMutate stage in the pipeline definition. This stage is executed before any deployment actions, thereby ensuring that the pipeline always executes the latest version that is defined by the source code.

Managing how and when the pipelines trigger to execute also required attention. CDK Pipelines configures pipelines by default to utilize event-based polling of the source repository. While this is a reasonable default, and it is great for the CI/CD pipeline, it is undesired for our silo and pool pipelines. If all of these pipelines would execute automatically on code commits to the source repository, the CI/CD pipeline could not manage the release flow. To address this, we have configured the silo and pool pipelines with the trigger in the CodeCommitSourceOptions to NONE.

Controlling deployment updates

A key aspect of SaaS delivery is controlling how you roll out changes to tenants. Significant business risk can arise if changes are released to all tenants all-at-once in a single big bang.

This risk can be managed by utilizing a combination of silo and pool deployments. Reduce your risk by spreading tenants into multiple pools, and gradually rolling out your changes to these pools. Based on business needs and/or risk assessment, select customers can be provisioned into dedicated silo deployments, thereby allowing update control for those customers separately. Note that while all of a pool’s tenants get the same underlying update simultaneously, you can utilize feature flags to selectively enable new features only for specific tenants in the deployment.

In the demo solution, the CI/CD pipeline contains only a single custom stage “UpdateDeployments”. This CodeBuild action implements a simple “one-at-a-time” strategy. The code has been purposely written so that it is simple and provides you with a starting point to implement your own more complex strategy, as based on your unique business needs. In the default implementation, every silo and pool pipeline tracks the same “main” branch of the repository. Releases are governed by controlling when each pipeline executes to update its resources.

When designing your release strategy, look into how the planned process helps implement releases and changes with high quality and frequency. A typical starting point is a CI/CD pipeline with continuous automated deployments via multiple test and staging environments in order to validate your changes prior to deployment to any production tenants.

Furthermore, consider if utilizing a canary release strategy would help identify potential issues with your changes prior to rolling them out across all deployments in production. In a canary release, each change is first deployed only to a small subset of your deployments. Once you are satisfied with the change quality, then the change can either automatically or manually be released to the rest of your deployments. As an example, an AWS Step Functions state machine could be combined with the solution, and then utilized to control the release flow, execute validation tests, implement approval steps (either manual or automatic), and even conduct rollback if necessary.

Further considerations

The example in this post provisions every silo and pool deployment to a single AWS account. However, the solution is not limited to a single account, and it can deploy equally easily to multiple AWS accounts. When operating at scale, it is best-practice to spread your workloads to several accounts. The Organizing Your AWS Environment using Multiple Accounts whitepaper has in-depth guidance on strategies for spreading your workloads.

If combined with an AWS account-vending machine implementation, such as an AWS Control Tower Landing Zone, then the demo solution could be adapted so that new AWS accounts are provisioned automatically. This would be useful if your business requires full account-level deployment isolation, and you also want automated provisioning.

To meet Unicorn’s future needs for spreading their solution architecture over multiple separate components, the deployment database and associated lambda function could be decoupled from the rest of the toolchain components in order to provide a central deployment service. When provisioned as standalone, and amended with Amazon Simple Notification Service-based notifications sent to the component deployment systems for example, this central deployment service could be utilized for managing the deployments for multiple components.

In addition, you should analyze your deployment lifecycle transitions, and then consider what action should be taken when a tenant is disabled and/or deleted. Implementing a deployment archival/deletion process is not in the scope of this post.

Cleanup

To cleanup every resource deployed in this post, conduct the following actions:

In the workload account:
1. In us-east-1 Region, delete CloudFormation stacks named “pool-pool1-resources” and “silo-silo1-resources” and the CDK bootstrap stack “CDKToolKit”.
2. In eu-west-1 Region, delete CloudFormation stack named “pool-pool2-resources” and the CDK Bootstrap stack “CDKToolKit”
In the toolchain account:
1. In us-east-1 Region, delete CloudFormation stacks “toolchain”, “pool-pool1-pipeline”, “pool-pool2-pipeline”, “silo-silo1-pipeline” and the CDK bootstrap stack “CDKToolKit”.
2. In eu-west-1 Region, delete CloudFormation stack “pool-pool2-pipeline-support-eu-west-1” and the CDK bootstrap stack “CDKToolKit”
3. Cleanup and delete S3 buckets “toolchain-*”, “pool-pool1-pipeline-*”, “pool-pool2-pipeline-*”, and “silo-silo1-pipeline-*”.

Conclusion

This solution demonstrated an implementation of an automated SaaS application component deployment factory. We covered how an ISV venturing into the SaaS model can utilize AWS CDK and CDK Pipelines in order to avoid a multitude of undifferentiated heavy lifting by leveraging and combining AWS CDK’s cross-region and cross-account capabilities with CDK Pipelines’ self-mutating deployment pipelines. Furthermore, we demonstrated how all of this can be written, managed, and released just like any other code you write. We also demonstrated how a single dynamic provisioning system can be utilized to operate in a mixed mode, with both silo and pool deployments.

Visit the AWS SaaS Factory Program page for further information on how AWS can help you on your SaaS journey — regardless of the stage you are currently in.

About the authors

Building an InnerSource ecosystem using AWS DevOps tools

2021-10-07 Debashish Chakrabarty

Post Syndicated from Debashish Chakrabarty original https://aws.amazon.com/blogs/devops/building-an-innersource-ecosystem-using-aws-devops-tools/

InnerSource is the term for the emerging practice of organizations adopting the open source methodology, albeit to develop proprietary software. This blog discusses the building of a model InnerSource ecosystem that leverages multiple AWS services, such as CodeBuild, CodeCommit, CodePipeline, CodeArtifact, and CodeGuru, along with other AWS services and open source tools.

What is InnerSource and why is it gaining traction?

Most software companies leverage open source software (OSS) in their products, as it is a great mechanism for standardizing software and bringing in cost effectiveness via the re-use of high quality, time-tested code. Some organizations may allow its use as-is, while others may utilize a vetting mechanism to ensure that the OSS adheres to the organization standards of security, quality, etc. This confidence in OSS stems from how these community projects are managed and sustained, as well as the culture of openness, collaboration, and creativity that they nurture.

Many organizations building closed source software are now trying to imitate these development principles and practices. This approach, which has been perhaps more discussed than adopted, is popularly called “InnerSource”. InnerSource serves as a great tool for collaborative software development within the organization perimeter, while keeping its concerns for IP & Legality in check. It provides collaboration and innovation avenues beyond the confines of organizational silos through knowledge and talent sharing. Organizations reap the benefits of better code quality and faster time-to-market, yet at only a fraction of the cost.

What constitutes an InnerSource ecosystem?

Infrastructure and processes that harbor collaboration stand at the heart of InnerSource ecology. These systems (refer Figure 1) would typically include tools supporting features such as code hosting, peer reviews, Pull Request (PR) approval flow, issue tracking, documentation, communication & collaboration, continuous integration, and automated testing, among others. Another major component of this system is an entry portal that enables the employees to discover the InnerSource projects and join the community, beginning as ordinary users of the reusable code and later graduating to contributors and committers.

A typical InnerSource ecosystem

Figure 1: A typical InnerSource ecosystem

More to InnerSource than meets the eye

This blog focuses on detailing a technical solution for establishing the required tools for an InnerSource system primarily to enable a development workflow and infrastructure. But the secret sauce of an InnerSource initiative in an enterprise necessitates many other ingredients.

Figure 2: InnerSource Roles & Use Cases

InnerSource thrives on community collaboration and a low entry barrier to enable adoptability. In turn, that demands a cultural makeover. While strategically deciding on the projects that can be inner sourced as well as the appropriate licensing model, enterprises should bootstrap the initiative with a seed product that draws the community, with maintainers and the first set of contributors. Many of these users would eventually be promoted, through a meritocracy-based system, to become the trusted committers.

Over a set period, the organization should plan to move from an infra specific model to a project specific model. In a Project-specific InnerSource model, the responsibility for a particular software asset is owned by a dedicated team funded by other business units. Whereas in the Infrastructure-based InnerSource model, the organization provides the necessary infrastructure to create the ecosystem with code & document repositories, communication tools, etc. This enables anybody in the organization to create a new InnerSource project, although each project initiator maintains their own projects. They could begin by establishing a community of practice, and designating a core team that would provide continuing support to the InnerSource projects’ internal customers. Having a team of dedicated resources would clearly indicate the organization’s long-term commitment to sustaining the initiative. The organization should promote this culture through regular boot camps, trainings, and a recognition program.

Lastly, the significance of having a modular architecture in the InnerSource projects cannot be understated. This architecture helps developers understand the code better, as well as aids code reuse and parallel development, where multiple contributors could work on different code modules while avoiding conflicts during code merges.

A model InnerSource solution using AWS services

This blog discusses a solution that weaves various services together to create the necessary infrastructure for an InnerSource system. While it is not a full-blown solution, and it may lack some other components that an organization may desire in its own system, it can provide you with a good head start.

The ultimate goal of the model solution is to enable a developer workflow as depicted in Figure 3.

Typical developer workflow at InnerSource

Figure 3: Typical developer workflow at InnerSource

At the core of the InnerSource-verse is the distributed version control (AWS CodeCommit in our case). To maintain system transparency, openness, and participation, we must have a discovery mechanism where users could search for the projects and receive encouragement to contribute to the one they prefer (Step 1 in Figure 4).

Architecture diagram for the model InnerSource system

Figure 4: Architecture diagram for the model InnerSource system

For this purpose, the model solution utilizes an open source reference implantation of InnerSource Portal. The portal indexes data from AWS CodeCommit by using a crawler, and it lists available projects with associated metadata, such as the skills required, number of active branches, and average number of commits. For CodeCommit, you can use the crawler implementation that we created in the open source code repo at https://github.com/aws-samples/codecommit-crawler-innersource.

The major portal feature is providing an option to contribute to a project by using a “Contribute” link. This can present a pop-up form to “apply as a contributor” (Step 2 in Figure 4), which when submitted sends an email (or creates a ticket) to the project maintainer/committer who can create an IAM (Step 3 in Figure 4) user with access to the particular repository. Note that the pop-up form functionality is built into the open source version of the portal. However, it would be trivial to add one with an associated action (send an email, cut a ticket, etc.).

InnerSource portal indexes CodeCommit repos and provides a bird’s eye view

Figure 5: InnerSource portal indexes CodeCommit repos and provides a bird’s eye view

The contributor, upon receiving access, logs in to CodeCommit, clones the mainline branch of the InnerSource project (Step 4 in Figure 4) into a fix or feature branch, and starts altering/adding the code. Once completed, the contributor commits the code to the branch and raises a PR (Step 5 in Figure 4). A Pull Request is a mechanism to offer code to an existing repository, which is then peer-reviewed and tested before acceptance for inclusion.

The PR triggers a CodeGuru review (Step 6 in Figure 4) that adds the recommendations as comments on the PR. Furthermore, it triggers a CodeBuild process (Steps 7 to 10 in Figure 4) and logs the build result in the PR. At this point, the code can be peer reviewed by Trusted Committers or Owners of the project repository. The number of approvals would depend on the approval template rule configured in CodeCommit. The Committer(s) can approve the PR (Step 12 in Figure 4) and merge the code to the mainline branch – that is once they verify that the code serves its purpose, has passed the required tests, and doesn’t break the build. They could also rely on the approval vote from a sanity test conducted by a CodeBuild process. Optionally, a build process could deploy the latest mainline code (Step 14 in Figure 4) on the PR merge commit.

To maintain transparency in all communications related to progress, bugs, and feature requests to downstream users and contributors, a communication tool may be needed. This solution does not show integration with any Issue/Bug tracking tool out of the box. However, multiple of these tools are available at the AWS marketplace, with some offering forum and Wiki add-ons in order to elicit discussions. Standard project documentation can be kept within the repository by using the constructs of the README.md file to provide project mission details and the CONTRIBUTING.md file to guide the potential code contributors.

An overview of the AWS services used in the model solution

The model solution employs the following AWS services:

Amazon CodeCommit: a fully managed source control service to host secure and highly scalable private Git repositories.
Amazon CodeBuild: a fully managed build service that compiles source code, runs tests, and produces software packages that are ready to deploy.
Amazon CodeDeploy: a service that automates code deployments to any instance, including EC2 instances and instances running on-premises.
Amazon CodeGuru: a developer tool providing intelligent recommendations to improve code quality and identify an application’s most expensive lines of code.
Amazon CodePipeline: a fully managed continuous delivery service that helps automate release pipelines for fast and reliable application and infrastructure updates.
Amazon CodeArtifact: a fully managed artifact repository service that makes it easy to securely store, publish, and share software packages utilized in their software development process.
Amazon S3: an object storage service that offers industry-leading scalability, data availability, security, and performance.
Amazon EC2: a web service providing secure, resizable compute capacity in the cloud. It is designed to ease web-scale computing for developers.
Amazon EventBridge: a serverless event bus that eases the building of event-driven applications at scale by using events generated from applications and AWS services.
Amazon Lambda: a serverless compute service that lets you run code without provisioning or managing servers.

The journey of a thousand miles begins with a single step

InnerSource might not be the right fit for every organization, but is a great step for those wanting to encourage a culture of quality and innovation, as well as purge silos through enhanced collaboration. It requires backing from leadership to sponsor the engineering initiatives, as well as champion the establishment of an open and transparent culture granting autonomy to the developers across the org to contribute to projects outside of their teams. The organizations best-suited for InnerSource have already participated in open source initiatives, have engineering teams that are adept with CI/CD tools, and are willing to adopt OSS practices. They should start small with a pilot and build upon their successes.

Conclusion

Ever more enterprises are adopting the open source culture to develop proprietary software by establishing an InnerSource. This instills innovation, transparency, and collaboration that result in cost effective and quality software development. This blog discussed a model solution to build the developer workflow inside an InnerSource ecosystem, from project discovery to PR approval and deployment. Additional features, like an integrated Issue Tracker, Real time chat, and Wiki/Forum, can further enrich this solution.

If you need helping hands, AWS Professional Services can help adapt and implement this model InnerSource solution in your enterprise. Moreover, our Advisory services can help establish the governance model to accelerate OSS culture adoption through Experience Based Acceleration (EBA) parties.

References

An introduction to InnerSource
The InnerSource Commons is a forum for sharing experiences and best practices to advance the InnerSource movement. Some proven approaches have been provided as patterns. In addition, several free books are also available on the topic.

About the authors

Validate IAM policies in CloudFormation templates using IAM Access Analyzer

2021-09-29 Matt Luttrell

Post Syndicated from Matt Luttrell original https://aws.amazon.com/blogs/security/validate-iam-policies-in-cloudformation-templates-using-iam-access-analyzer/

In this blog post, I introduce IAM Policy Validator for AWS CloudFormation (cfn-policy-validator), an open source tool that extracts AWS Identity and Access Management (IAM) policies from an AWS CloudFormation template, and allows you to run existing IAM Access Analyzer policy validation APIs against the template. I also show you how to run the tool in a continuous integration and continuous delivery (CI/CD) pipeline to validate IAM policies in a CloudFormation template before they are deployed to your AWS environment.

Embedding this validation in a CI/CD pipeline can help prevent IAM policies that have IAM Access Analyzer findings from being deployed to your AWS environment. This tool acts as a guardrail that can allow you to delegate the creation of IAM policies to the developers in your organization. You can also use the tool to provide additional confidence in your existing policy authoring process, enabling you to catch mistakes prior to IAM policy deployment.

What is IAM Access Analyzer?

IAM Access Analyzer mathematically analyzes access control policies that are attached to resources, and determines which resources can be accessed publicly or from other accounts. IAM Access Analyzer can also validate both identity and resource policies against over 100 checks, each designed to improve your security posture and to help you to simplify policy management at scale.

The IAM Policy Validator for AWS CloudFormation tool

IAM Policy Validator for AWS CloudFormation (cfn-policy-validator) is a new command-line tool that parses resource-based and identity-based IAM policies from your CloudFormation template, and runs the policies through IAM Access Analyzer checks. The tool is designed to run in the CI/CD pipeline that deploys your CloudFormation templates, and to prevent a deployment when an IAM Access Analyzer finding is detected. This ensures that changes made to IAM policies are validated before they can be deployed.

The cfn-policy-validator tool looks for all identity-based policies, and a subset of resource-based policies, from your templates. For the full list of supported resource-based policies, see the cfn-policy-validator GitHub repository.

Parsing IAM policies from a CloudFormation template

One of the challenges you can face when parsing IAM policies from a CloudFormation template is that these policies often contain CloudFormation intrinsic functions (such as Ref and Fn::GetAtt) and pseudo parameters (such as AWS::AccountId and AWS::Region). As an example, it’s common for least privileged IAM policies to reference the Amazon Resource Name (ARN) of another CloudFormation resource. Take a look at the following example CloudFormation resources that create an Amazon Simple Queue Service (Amazon SQS) queue, and an IAM role with a policy that grants access to perform the sqs:SendMessage action on the SQS queue.

Figure 1- Example policy in CloudFormation template

As you can see in Figure 1, line 21 uses the function Fn::Sub to restrict this policy to MySQSQueue created earlier in the template.

In this example, if you were to pass the root policy (lines 15-21) as written to IAM Access Analyzer, you would get an error because !Sub ${MySQSQueue.Arn} is syntax that is specific to CloudFormation. The cfn-policy-validator tool takes the policy and translates the CloudFormation syntax to valid IAM policy syntax that Access Analyzer can parse.

The cfn-policy-validator tool recognizes when an intrinsic function such as !Sub ${MySQSQueue.Arn} evaluates to a resource’s ARN, and generates a valid ARN for the resource. The tool creates the ARN by mapping the type of CloudFormation resource (in this example AWS::SQS::Queue) to a pattern that represents what the ARN of the resource will look like when it is deployed. For example, the following is what the mapping looks like for the SQS queue referenced previously:

AWS::SQS::Queue.Arn -> arn:${Partition}:sqs:${Region}:${Account}:${QueueName}

For some CloudFormation resources, the Ref intrinsic function also returns an ARN. The cfn-policy-validator tool handles these cases as well.

Cfn-policy-validator walks through each of the six parts of an ARN and substitutes values for variables in the ARN pattern (any text contained within ${}). The values of ${Partition} and ${Account} are taken from the identity of the role that runs the cfn-policy-validator tool, and the value for ${Region} is provided as an input flag. The cfn-policy-validator tool performs a best-effort resolution of the QueueName, but typically defaults it to the name of the CloudFormation resource (in the previous example, MySQSQueue). Validation of policies with IAM Access Analyzer does not rely on the name of the resource, so the cfn-policy-validator tool is able to substitute a replacement name without affecting the policy checks.

The final ARN generated for the MySQSQueue resource looks like the following (for an account with ID of 111111111111):

arn:aws:sqs:us-east-1:111111111111:MySQSQueue

The cfn-policy-validator tool substitutes this generated ARN for !Sub ${MySQSQueue.Arn}, which allows the cfn-policy-validator tool to parse a policy from the template that can be fed into IAM Access Analyzer for validation. The cfn-policy-validator tool walks through your entire CloudFormation template and performs this ARN substitution until it has generated ARNs for all policies in your template.

Validating the policies with IAM Access Analyzer

After the cfn-policy-validator tool has your IAM policies in a valid format (with no CloudFormation intrinsic functions or pseudo parameters), it can take those policies and feed them into IAM Access Analyzer for validation. The cfn-policy-validator tool runs resource-based and identity-based policies in your CloudFormation template through the ValidatePolicy action of the IAM Access Analyzer. ValidatePolicy is what ensures that your policies have correct grammar and follow IAM policy best practices (for example, not allowing iam:PassRole to all resources). The cfn-policy-validator tool also makes a call to the CreateAccessPreview action for supported resource policies to determine if the policy would grant unintended public or cross-account access to your resource.

The cfn-policy-validator tool categorizes findings from IAM Access Analyzer into the categories blocking or non-blocking. Findings categorized as blocking cause the tool to exit with a non-zero exit code, thereby causing your deployment to fail and preventing your CI/CD pipeline from continuing. If there are no findings, or only non-blocking findings detected, the tool will exit with an exit code of zero (0) and your pipeline can to continue to the next stage. For more information about how the cfn-policy-validator tool decides what findings to categorize as blocking and non-blocking, as well as how to customize the categorization, see the cfn-policy-validator GitHub repository.

Example of running the cfn-policy-validator tool

This section guides you through an example of what happens when you run a CloudFormation template that has some policy violations through the cfn-policy-validator tool.

The following template has two CloudFormation resources with policy findings: an SQS queue policy that grants account 111122223333 access to the SQS queue, and an IAM role with a policy that allows the role to perform a misspelled sqs:ReceiveMessages action. These issues are highlighted in the policy below.

Important: The policy in Figure 2 is written to illustrate a CloudFormation template with potentially undesirable IAM policies. You should be careful when setting the Principal element to an account that is not your own.

Figure 2: CloudFormation template with undesirable IAM policies

When you pass this template as a parameter to the cfn-policy-validator tool, you specify the AWS Region that you want to deploy the template to, as follows:

cfn-policy-validator validate --template-path ./template.json --region us-east-1

After the cfn-policy-validator tool runs, it returns the validation results, which includes the actual response from IAM Access Analyzer:

{
    "BlockingFindings": [
        {
            "findingType": "ERROR",
            "code": "INVALID_ACTION",
            "message": "The action sqs:ReceiveMessages does not exist.",
            "resourceName": "MyRole",
            "policyName": "root",
            "details": …
        },
        {
            "findingType": "SECURITY_WARNING",
            "code": "EXTERNAL_PRINCIPAL",
            "message": "Resource policy allows access from external principals.",
            "resourceName": "MyQueue",
            "policyName": "QueuePolicy",
            "details": …
        }
    ],
    "NonBlockingFindings": []
}

The output from the cfn-policy-validator tool includes the type of finding, the code for the finding, and a message that explains the finding. It also includes the resource and policy name from the CloudFormation template, to allow you to quickly track down and resolve the finding in your template. In the previous example, you can see that IAM Access Analyzer has detected two findings, one security warning and one error, which the cfn-policy-validator tool has classified as blocking. The actual response from IAM Access Analyzer is returned under details, but is excluded above for brevity.

If account 111122223333 is an account that you trust and you are certain that it should have access to the SQS queue, then you can suppress the finding for external access from the 111122223333 account in this example. Modify the call to the cfn-policy-validator tool to ignore this specific finding by using the –-allow-external-principals flag, as follows:

cfn-policy-validator validate --template-path ./template.json --region us-east-1 --allow-external-principals 111122223333

When you look at the output that follows, you’re left with only the blocking finding that states that sqs:ReceiveMessages does not exist.

{
    "BlockingFindings": [
        {
            "findingType": "ERROR",
            "code": "INVALID_ACTION",
            "message": "The action sqs:ReceiveMessages does not exist.",
            "resourceName": "MyRole",
            "policyName": "root",
            "details": …
    ],
    "NonBlockingFindings": []
}

To resolve this finding, update the template to have the correct spelling, sqs:ReceiveMessage (without trailing s).

For the full list of available flags and commands supported, see the cfn-policy-validator GitHub repository.

Now that you’ve seen an example of how you can run the cfn-policy-validator tool to validate policies that are on your local machine, you will take it a step further and see how you can embed the cfn-policy-validator tool in a CI/CD pipeline. By embedding the cfn-policy-validator tool in your CI/CD pipeline, you can ensure that your IAM policies are validated each time a commit is made to your repository.

Embedding the cfn-policy-validator tool in a CI/CD pipeline

The CI/CD pipeline you will create in this post uses AWS CodePipeline and AWS CodeBuild. AWS CodePipeline is a continuous delivery service that enables you to model, visualize, and automate the steps required to release your software. AWS CodeBuild is a fully-managed continuous integration service that compiles source code, runs tests, and produces software packages that are ready to deploy. You will also use AWS CodeCommit as the source repository where your CloudFormation template is stored. AWS CodeCommit is a fully managed source-control service that hosts secure Git-based repositories.

To deploy the pipeline

To deploy the CloudFormation template that builds the source repository and AWS CodePipeline pipeline, select the following Launch Stack button.
In the CloudFormation console, choose Next until you reach the Review page.
Select I acknowledge that AWS CloudFormation might create IAM resources and choose Create stack.
Open the AWS CodeCommit console and choose Repositories.
Select cfn-policy-validator-source-repository.
Download the template.json and template-configuration.json files to your machine.
In the cfn-policy-validator-source-repository, on the right side, select Add file and choose Upload file.
Choose Choose File and select the template.json file that you downloaded previously.
Enter an Author name and an E-mail address and choose Commit changes.
In the cfn-policy-validator-source-repository, repeat steps 7-9 for the template-configuration.json file.

To view validation in the pipeline

In the AWS CodePipeline console choose the IAMPolicyValidatorPipeline.
Watch as your commit travels through the pipeline. If you followed the previous instructions and made two separate commits, you can ignore the failed results of the first pipeline execution. As shown in Figure 3, you will see that the pipeline fails in the Validation stage on the CfnPolicyValidator action, because it detected a blocking finding in the template you committed, which prevents the invalid policy from reaching your AWS environment.

Figure 3: Validation failed on the CfnPolicyValidator action
Under CfnPolicyValidator, choose Details, as shown in Figure 3.
In the Action execution failed pop-up, choose Link to execution details to view the cfn-policy-validator tool output in AWS CodeBuild.

Architectural overview of deploying cfn-policy-validator in your pipeline

You can see the architecture diagram for the CI/CD pipeline you deployed in Figure 4.

Figure 4: CI/CD pipeline that performs IAM policy validation using the AWS CloudFormation Policy Validator and IAM Access Analyzer

Figure 4 shows the following steps, starting with the CodeCommit source action on the left:

The pipeline starts when you commit to your AWS CodeCommit source code repository. The AWS CodeCommit repository is what contains the CloudFormation template that has the IAM policies that you would like to deploy to your AWS environment.
AWS CodePipeline detects the change in your source code repository and begins the Validation stage. The first step it takes is to start an AWS CodeBuild project that runs the CloudFormation template through the AWS CloudFormation Linter (cfn-lint). The cfn-lint tool validates your template against the CloudFormation resource specification. Taking this initial step ensures that you have a valid CloudFormation template before validating your IAM policies. This is an optional step, but a recommended one. Early schema validation provides fast feedback for any typos or mistakes in your template. There’s little benefit to running additional static analysis tools if your template has an invalid schema.
If the cfn-lint tool completes successfully, you then call a separate AWS CodeBuild project that invokes the IAM Policy Validator for AWS CloudFormation (cfn-policy-validator). The cfn-policy-validator tool then extracts the identity-based and resource-based policies from your template, as described earlier, and runs the policies through IAM Access Analyzer.

Note: if your template has parameters, then you need to provide them to the cfn-policy-validator tool. You can provide parameters as command-line arguments, or use a template configuration file. It is recommended to use a template configuration file when running validation with an AWS CodePipeline pipeline. The same file can also be used in the deploy stage to deploy the CloudFormation template. By using the template configuration file, you can ensure that you use the same parameters to validate and deploy your template. The CloudFormation template for the pipeline provided with this blog post defaults to using a template configuration file.

If there are no blocking findings found in the policy validation, the cfn-policy-validator tool exits with an exit code of zero (0) and the pipeline moves to the next stage. If any blocking findings are detected, the cfn-policy-validator tool will exit with a non-zero exit code and the pipeline stops, to prevent the deployment of undesired IAM policies.
The final stage of the pipeline uses the AWS CloudFormation action in AWS CodePipeline to deploy the template to your environment. Your template will only make it to this stage if it passes all static analysis checks run in the Validation Stage.

Cleaning Up

To avoid incurring future charges, in the AWS CloudFormation console delete the validate-iam-policy-pipeline stack. This will remove the validation pipeline from your AWS account.

Summary

In this blog post, I introduced the IAM Policy Validator for AWS CloudFormation (cfn-policy-validator). The cfn-policy-validator tool automates the parsing of identity-based and resource-based IAM policies from your CloudFormation templates and runs those policies through IAM Access Analyzer. This enables you to validate that the policies in your templates follow IAM best practices and do not allow unintended external access to your AWS resources.

I showed you how the IAM Policy Validator for AWS CloudFormation can be included in a CI/CD pipeline. This allows you to run validation on your IAM policies on every commit to your repository and only deploy the template if validation succeeds.

For more information, or to provide feedback and feature requests, see the cfn-policy-validator GitHub repository.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

17 additional AWS services authorized for DoD workloads in the AWS GovCloud Regions

2021-09-09 Tyler Harding

Post Syndicated from Tyler Harding original https://aws.amazon.com/blogs/security/17-additional-aws-services-authorized-for-dod-workloads-in-the-aws-govcloud-regions/

I’m pleased to announce that the Defense Information Systems Agency (DISA) has authorized 17 additional Amazon Web Services (AWS) services and features in the AWS GovCloud (US) Regions, bringing the total to 105 services and major features that are authorized for use by the U.S. Department of Defense (DoD). AWS now offers additional services to DoD mission owners in these categories: business applications; computing; containers; cost management; developer tools; management and governance; media services; security, identity, and compliance; and storage.

Why does authorization matter?

DISA authorization of 17 new cloud services enables mission owners to build secure innovative solutions to include systems that process unclassified national security data (for example, Impact Level 5). DISA’s authorization demonstrates that AWS effectively implemented more than 421 security controls by using applicable criteria from NIST SP 800-53 Revision 4, the US General Services Administration’s FedRAMP High baseline, and the DoD Cloud Computing Security Requirements Guide.

Recently authorized AWS services at DoD Impact Levels (IL) 4 and 5 include the following:

Business Applications

Amazon Simple Email Service (Amazon SES) – Inbound and outbound cloud email
Amazon Pinpoint – Multichannel marketing communication
AWS Marketplace – A digital catalog with thousands of software listings from independent software vendors that you can use to find, test, buy, and deploy software that runs on AWS

Compute

AWS Fargate (a feature of Amazon Elastic Container Service (Amazon ECS) and Amazon Elastic Kubernetes Service (Amazon EKS)) – A serverless compute engine for containers

Containers

Amazon Elastic Kubernetes Service (Amazon EKS) – A trusted way to run Kubernetes

Cost Management

AWS Budgets – Set custom budgets to track your cost and usage, from the simplest to the most complex use cases
AWS Cost Explorer – An interface that lets you visualize, understand, and manage your AWS costs and usage over time
AWS Cost & Usage Report – Itemize usage at the account or organization level by product code, usage type, and operation

Developer Tools

AWS CodePipeline – Automate continuous delivery pipelines for fast and reliable updates
AWS X-Ray – Analyze and debug production and distributed applications, such as those built using a microservices architecture

Management & Governance

AWS License Manager – Manage your software licenses from vendors
AWS Personal Health Dashboard – Provide alerts and guidance for AWS events that might affect your environment
AWS Systems Manager – An operations hub for AWS

Media Services

Amazon Textract – Extract printed text, handwriting, and data from virtually any document

Security, Identity & Compliance

Amazon Cognito – Secure user sign-up, sign-in, and access control
AWS Security Hub – Centrally view and manage security alerts and automate security checks

Storage

AWS Backup – Centrally manage and automate backups across AWS services

Figure 1 shows the IL 4 and IL 5 AWS services that are now authorized for DoD workloads, broken out into functional categories.

Figure 1: The AWS services newly authorized by DISA

To learn more about AWS solutions for the DoD, see our AWS solution offerings. Follow the AWS Security Blog for updates on our Services in Scope by Compliance Program. If you have feedback about this blog post, let us know in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

How MOIA built a fully automated GDPR compliant data lake using AWS Lake Formation, AWS Glue, and AWS CodePipeline

2021-09-02 Leonardo Pêpe

Post Syndicated from Leonardo Pêpe original https://aws.amazon.com/blogs/big-data/how-moia-built-a-fully-automated-gdpr-compliant-data-lake-using-aws-lake-formation-aws-glue-and-aws-codepipeline/

This is a guest blog post co-written by Leonardo Pêpe, a Data Engineer at MOIA.

MOIA is an independent company of the Volkswagen Group with locations in Berlin and Hamburg, and operates its own ride pooling services in Hamburg and Hanover. The company was founded in 2016 and develops mobility services independently or in partnership with cities and existing transport systems. MOIA’s focus is on ride pooling and the holistic development of the software and hardware for it. In October 2017, MOIA started a pilot project in Hanover to test a ride pooling service, which was brought into public operation in July 2018. MOIA covers the entire value chain in the area of ride pooling. MOIA has developed a ridesharing system to avoid individual car traffic and use the road infrastructure more efficiently.

In this post, we discuss how MOIA uses AWS Lake Formation, AWS Glue, and AWS CodePipeline to store and process gigabytes of data on a daily basis to serve 20 different teams with individual user data access and implement fine-grained control of the data lake to comply with General Data Protection Regulation (GDPR) guidelines. This involves controlling access to data at a granular level. The solution enables MOIA’s fast pace of innovation to automatically adapt user permissions to new tables and datasets as they become available.

Background

Each MOIA vehicle can carry six passengers. Customers interact with the MOIA app to book a trip, cancel a trip, and give feedback. The highly distributed system prepares multiple offers to reach their destination with different pickup points and prices. Customers select an option and are picked up from their chosen location. All interactions between the customers and the app, as well as all the interactions between internal components and systems (the backend’s and vehicle’s IoT components), are sent to MOIA’s data lake.

Data from the vehicle, app, and backend must be centralized to have the synchronization between trips planned and implemented, and then to collect passenger feedback. To provide different pricing and routing options, MOIA needed centralized data. MOIA decided to build and secure its Amazon Simple Storage Service (Amazon S3) based Data Lake using AWS Lake Formation.

Different MOIA teams that includes Data Analysts, Data Scientists and Data Engineers need to access centralized data from different sources for the development and operations of the application workloads. It’s a legal requirement to control the access and format of the data to these different teams. The app development team needs to understand customer feedback in an anonymized way, pricing-related data must be accessed only by the business analytics team, vehicle data is meant to be used only by the vehicle maintenance team, and the routing team needs access to customer location and destination.

Architecture overview

The following diagram illustrates MOIA solution architecture.

The solution has the following components:

Data input – The apps, backend, and IoT devices are continuously streaming event messages via Amazon Kinesis Data Streams in a nested information in a JSON format.
Data transformation – When the raw data is received, the data pipeline orchestrator (Apache Airflow) starts the data transformation jobs to meet requirements for respective data formats. Amazon EMR, AWS Lambda, or AWS Fargate cleans, transforms, and loads the events in the S3 data lake.
Persistence layer – After the data is transformed and normalized, AWS Glue crawlers classify the data to determine the format, schema, and associated properties. They group data into tables or partitions and write metadata to the AWS Glue Data Catalog based on the structured partitions created in Amazon S3. The data is written on Amazon S3 partitioned by its event type and timestamp so it can be easily accessed via Amazon Athena and the correct permissions can be applied using Lake Formation. The crawlers are run at a fixed interval in accordance with the batch jobs, so no manual runs are required.
Governance layer – Access on the data lake is managed using the Lake Formation governance layer. MOIA uses Lake Formation to centrally define security, governance, and auditing policies in one place. These policies are consistently implemented, which eliminates the need to manually configure them across security services like AWS Identity and Access Management (IAM), AWS Key Management Service (AWS KMS), storage services like Amazon S3, or analytics and machine learning (ML) services like Amazon Redshift, Athena, Amazon EMR for Apache Spark. Lake Formation reduces the effort in configuring policies across services and provides consistent assistance for GDPR requirements and compliance. When a user makes a request to access Data Catalog resources or underlying data from the data lake, for the request to succeed, it must pass permission checks by both IAM and Lake Formation. Lake Formation permissions control access to Data Catalog resources, Amazon S3 locations, and the underlying data at those locations. User permissions to the data on the table, row, or column level are defined in Lake Formation, so that users only access the data if they’re authorized.
Data access layer – After the flattened and structured data is available to be accessed by the data analysts, scientists, and engineers they can use Amazon Redshift or Athena to perform SQL analysis, or AWS Data Wrangler and Apache Spark to perform programmatic analysis and build ML use cases.
Infrastructure orchestration and data pipeline – After the data is ingested and transformed, the infrastructure orchestration layer (CodePipeline and Lake Formation) creates or updates the governance layer with the proper and predefined permissions periodically every hour (see more about the automation in the governance layer later in this post).

Governance challenge

MOIA wants to evolve their ML models for routing, demand prediction, and business models continuously. This requires MOIA to constantly review models and update them, therefore power users such as data administrators and engineers frequently redesign the table schemas in the AWS Glue Data Catalog as part of the data engineering workflow. This highly dynamic metadata transformation requires an equally dynamic governance layer pipeline that can assign the right user permissions to all tables and adapt to these changes transparently without disruptions to end-users. Due to GDPR requirements, there is no room for error, and manual work is required to assign the right permissions according to GDPR compliance on the tables in the data lake with many terabytes of data. Without automation, many developers are needed for administration; this adds human error into the workflows, which is not acceptable. Manual administration and access management isn’t a scalable solution. MOIA needed to innovate faster with GDPR compliance with a small team of developers.

Data schema and data structure often changes at MOIA, resulting in new tables being created in the data lake. To guarantee that new tables inherit the permissions granted to the same group of users who already have access to the entire database, MOIA uses an automated process that grants Lake Formation permission to newly created tables, columns, and databases, as described in the next section.

Solution overview

The following diagram illustrates the continuous deployment loop using AWS CloudFormation.

The workflow contains the following steps:

MOIA data scientists and engineers create or modify new tables to evolve the business and ML models. This results in new tables or changes in the existing schema of the table.
The data pipeline is orchestrated via Apache Airflow, which controls the whole cycle of ingestion, cleansing, and data transformation using Spark jobs (as shown in Step 1). When the data is in its desired format and state, Apache Airflow triggers the AWS Glue crawler job to discover the data schema in the S3 bucket and add the new partitions to the Data Catalog. Apache Airflow also triggers CodePipeline to assign the right permissions on all the tables.
Apache Airflow triggers CodePipeline every hour, which uses MOIA’s libraries to discover the new tables, columns, and databases, and grants Lake Formation permissions in the AWS CloudFormation template for all tables, including new and modified. This orchestration layer ensures Lake Formation permissions are granted. Without this step, nobody can access new tables, and new data isn’t visible to data scientists, business analysts, and others who need it.
MOIA uses Stacker and Troposphere to identify all the tables, assign the tables the right permissions, and deploy the CloudFormation stack. Troposphere is the infrastructure as code (IaC) framework that renders the CloudFormation templates, and stacker helps with the parameterization and deployment of the templates on AWS. Stacker also helps produce reusable templates in the form of blueprints as pure Python code, which can be easily parameterized using YAML files. The Python code uses Boto3 clients to lookup the Data Catalog and search for databases and tables.
The stacker blueprint libraries developed by MOIA (which internally use Boto3 and troposphere) are used to discover newly created databases or tables in the Data Catalog.
A Permissions configuration file allows MOIA to predefine data lake tables (Can be a wildcard indicating all tables available in one database) and specific personas who have access to these tables. These roles are split into different categories called lake personas, and have specific access levels. Example lake personas include data domain engineers, data domain technical users, data administrators, and data analysts. Users and their permissions are defined in the file. In case access to personally identifiable information (PII) data is granted, the GDPR officer ensures that a record of processing activities is in place and the usage of personal data is documented according to the law.
MOIA uses stacker to read the predefined permissions file, and uses the customized stacker blueprints library containing the logic to assign permissions to lake personas for each of the tables discovered in Step 5.
The discovered and modified tables are matched with predefined permissions in Step 6 and Step 7. Stacker uses YAML format as the input for the CloudFormation template parameters to define users and data access permissions. These parameters are related to the user’s roles and define access to the tables. Based on this, MOIA creates a customized stacker that creates AWS CloudFormation resources dynamically. The following are the example stacks in AWS CloudFormation:
1. data-lake-domain-database – Holds all resources that are relevant during the creation of the database, which includes lake permissions at the database level that allow the domain database admin personas to perform operations in the database.
2. data-lake-domain-crawler – Scans Amazon S3 constantly to discover and create new tables and updates.
3. data-lake-lake-table-permissions – Uses Boto3 combined with troposphere to list the available tables and grant lake permissions to the domain roles that access it.

This way, new CloudFormation templates are created, containing permissions for new or modified tables. This process guarantees a fully automated governance layer for the data lake. The generated CloudFormation template contains Lake Formation permission resources for each table, database, or column. The process of managing Lake Formation permissions on Data Catalog databases, tables, and columns is simplified by granting Data Catalog permissions using the Lake Formation tag-based access control method. The advantage of generating a CloudFormation template is audibility. When the new version of the CloudFormation stack is prepared with an access control set on new or modified tables, administrators can compare that stack with the older version to discover newly prepared and modified tables. MOIA can view the differences via the AWS CloudFormation console before new stack deployment.

Either the CloudFormation stack is deployed in the account with the right permissions for all users, or a stack deployment failure notice is sent to the developers via a Slack channel (Step 10). This ensures the visibility in the deployment process and failed pipelines in permission assignment.
Whenever the pipeline fails, MOIA receives the notification via Slack so the engineers can promptly fix the errors. This loop is very important to guarantee that whenever the schema changes, those changes are reflected in the data lake without manual intervention.
Following this automated pipeline, all the tables are assigned right permissions.

Benefits

This solution delivers the following benefits:

Automated enforcement of data access permissions allows people to access the data without manual interventions in a scalable way. With automation, 1,000 hours of manual work are saved every year.
The GDPR team does the assessment internally every month. The GDPR team constantly provides guidance and approves each change in permissions. This audit trail for records of processing activities is automated with the help of Lake Formation (otherwise it needs a dedicated human resource).
The automated workflow of permission assignment is integrated into existing CI/CD processes, resulting in faster onboarding of new teams, features, and dataset releases. An average of 48 releases are done each month (including major and minor version releases and new event types). Onboarding new teams and forming internal new teams is very easy now.
This solution enables you to create a data lake using IaC processes that are easy and GDPR-compliant.

Conclusion

MOIA has created scalable, automated, and versioned permissions with a GDPR-supported, governed data lake using Lake Formation. This solution helps them bring new features and models to market faster, and reduces administrative and repetitive tasks. MOIA can focus on 48 average releases every month, contributing to a great customer experience and new data insights.

About the Authors

Leonardo Pêpe is a Data Engineer at MOIA. With a strong background in infrastructure and application support and operations, he is immersed in the DevOps philosophy. He’s helping MOIA build automated solutions for its data platform and enabling the teams to be more data-driven and agile. Outside of MOIA, Leonardo enjoys nature, Jiu-Jitsu and martial arts, and explores the good of life with his family.

Sushant Dhamnekar is a Solutions Architect at AWS. As a trusted advisor, Sushant helps automotive customers to build highly scalable, flexible, and resilient cloud architectures, and helps them follow the best practices around advanced cloud-based solutions. Outside of work, Sushant enjoys hiking, food, travel, and CrossFit workouts.

Shiv Narayanan is Global Business Development Manager for Data Lakes and Analytics solutions at AWS. He works with AWS customers across the globe to strategize, build, develop and deploy modern data platforms. Shiv loves music, travel, food and trying out new tech.

Field Notes: How to Deploy End-to-End CI/CD in the China Regions Using AWS CodePipeline

2021-09-01 Ashutosh Pateriya

Post Syndicated from Ashutosh Pateriya original https://aws.amazon.com/blogs/architecture/field-notes-how-to-deploy-end-to-end-ci-cd-in-the-china-regions-using-aws-codepipeline/

This post was co-authored by Ravi Intodia, Cloud Archiect, Infosys Technologies Ltd, Nirmal Tomar, Principal Consultant, Infosys Technologies Ltd and Ashutosh Pateriya, Solution Architect, AWS.

Today’s businesses must contend with fast-changing competitive environments, expanding security needs, and scalability issues. Businesses must find a way to reconcile the need for operational stability with the need for quick product development. Continuous integration and continuous delivery (CI/CD) enables rapid software iterations while maintaining system stability and security.

With an increase in AWS Cloud and DevOps adoption, many organizations seek solutions which go beyond geographical boundaries. AWS CodePipeline, along with its related services, lets you integrate and deploy your solutions across multiple AWS accounts and Regions. However, it becomes more challenging when you want to deploy your application in multiple AWS Regions as well as in China, due to the unavailability of AWS CodePipeline in the Beijing and Ningxia Regions.

In this blog post, you will learn how to overcome the unique challenges when deploying applications across many parts of the world, including China. For this solution, we will use the power and flexibility of AWS CodeBuild to implement AWS Command Line Interface (AWS CLI) commands to perform custom actions that are not directly supported by CodePipeline or AWS CodeDeploy.

CodePipeline for multi-account and multi-Region deployment consists of the following components:

ArtifactStore and encryption keys – In the AWS account which hosts CodePipeline, there should be an Amazon Simple Storage Service (Amazon S3) bucket and an AWS Key Management Service (AWS KMS) key for each Region where resources need to be deployed.
CodeBuild and CodePipeline roles – In the AWS account which hosts CodePipeline there should be roles created that can be used or assumed by CodeBuild and CodePipeline projects for performing required actions.
Cross-account roles – In each AWS account where cross-account deployments are required, an AWS role with the necessary permissions must be created. The CodePipeline role of the deploying account must be allowed to assume this role for all required accounts. Cross-account roles will also have access to the required S3 buckets and AWS KMS keys for deploying accounts.

Figure 1. High-level solution for AWS Regions

Although the solution works for most Regions, we encounter challenges when we try to expand our current worldwide solutions into the China Regions.

The challenges are as follows:

Cross-account roles – Cross-account roles cannot be created between accounts in non-China Regions and the China Regions. This means that CodeDeploy will be unable to assume the target account role necessary to complete component deployment.
Availability of services – Services required to configure a cloud native CI/CD pipeline are unavailable in the China Regions.
Connectivity – There is no direct network connectivity available between the China Regions and other AWS Regions.
User management – Accounts by users in China are distinct from AWS Region user accounts, and must be maintained independently.

Due to the lack of cross-account roles and the CodePipeline service, setting up a worldwide CI/CD pipeline that includes the China Regions is not automatically supported.

High-level solution

In the proposed solution, we will build and deploy the application to both Regions using the AWS CI/CD services from the non-China Region, and we will create an access key in a China Region with access to deploy the application using services, such as AWS Lambda, AWS Elastic Beanstalk, Amazon Elastic Container Service, and Amazon Elastic Kubernetes Service. This access key is stored in a non-China account as an SSM parameter after encryption. On committing, the CodePipeline in the non-China Region is initiated, and it builds the package and deploys the application in both Regions from a single place.

Solution architecture

Figure 2. High-level solution for cross-account deployment from AWS Regions to a China Region

In this architecture, AWS CLI commands are used to set an AWS profile of CodeBuild instance with China credentials (retrieved from the AWS Systems Manager Parameter Store). This enables a CodeBuild instance to run an AWS CloudFormation package and deploy commands directly on the China account, thereby deploying required resources in the desired China Region.

This solution is not relying on any AWS CI/CD services like CodeDeploy in the China Region. With this solution we can create a complete CI/CD pipeline running in an AWS Region that can deploy an application in both Regions.

The following key components are needed for deployment:

AWS Identity and Access Management (IAM) user credentials – An IAM user needs to be created in the target account in China.
SSM parameter (secure string) – China IAM user access key (secret access key needs to be saved as a secure string SSM parameter in the deployment AWS account).
Update CloudFormation templates – CloudFormation templates need to be updated to support China Region mappings (such as using “arn:aws-cn” instead of “arn:aws”).
Enhance CodeBuild to support build and deployment – CodeBuild buildspec.yml needs to be enhanced to perform build and deployment to China accounts, as mentioned in the following.

Prerequisites

Two AWS accounts: One AWS account outside of China, and one account in China.
Practical experience in deploying Lambda functions using CodeBuild, CodeDeploy, and CodePipeline, and using AWS CLI. Because this example focuses specifically on extending CodePipeline from Regions outside of China to deploy in China Region, we are not going to explore a standard CodePipeline set up.

Detailed Implementation

This solution is built using CodePipeline, CodeCommit, CodeBuild, AWS CloudFormation templates, and IAM.

Steps

One-time key generation in an account in China with necessary access to deploy application, including creation of one S3 bucket for CodeBuild artifacts.
Note: As a best practice, we suggest rotation of the access key every 30 days.
Complete the setup of CodePipeline to deploy application in Regions outside of China, as well as including China Region.

As a demonstration, let’s deploy a Lambda function in us-east-1 and cn-north-1 and discuss the steps in detail. The same steps can be followed to deploy any other AWS service.

Part 1 – In the account based in China Region: cn-north-1

Create an S3 bucket with default encryption enabled for CodeBuild artifacts.

Create an IAM user (with programmatic access only) with the required permissions to deploy Lambda functions and related resources. The IAM user will also have access to the S3 bucket created for CodeBuild artifacts.To create an IAM policy, refer to the AWS IAM Policy resource.

Part 2 – In AWS account based in non-China Region: us-east-1

Create two SSM parameters of type secure string.

SSM Parameter Name	SSM Parameter Value
/China/Dev/UserAccessKey	<Value of China IAM User Access Key>
/China/Dev/UserAccessKey	<Value of China IAM User Access Key>

To create secure SSM parameters using the AWS CLI, refer to the Create a Systems Manager parameter (AWS CLI) tutorial.

To create secure SSM parameters using the AWS Management Console, refer to the Create a Systems Manager parameter (console) tutorial.

Note: Creating secure SSM parameters is not supported by CloudFormation templates. Also, as a security best practice, you should not have any sensitive information as part of CloudFormation templates to avoid any possible security breach.

Create an AWS KMS key for encrypting CodeBuild or CodePipeline artifacts (for cross-Region deployments, create AWS KMS key in all Regions, and create SSM parameters for each in the Region having CodePipeline).
Create artifacts S3 bucket for CodeBuild or CodePipeline artifacts.
Create CI/CD related roles. For CI/CD service roles, refer to:
https://docs.aws.amazon.com/codepipeline/latest/userguide/pipelines-create-service-role.html
https://docs.aws.amazon.com/codebuild/latest/userguide/setting-up.html#setting-up-service-role
Create a CodeCommit repository.
Create SSM parameters for the following.

SSM Parameter Name	SSM Parameter Value
/China/Dev/DeploymentS3Bucket	<Artifact Bucket Name in China Region>
/US/Dev/CodeBuildRole	<Role ARN of CodeBuild Service Role>
/US/Dev/CodePipelineRole	<Role ARN of Codepipeline Service Role>
/US/Dev/CloudformationRole	<Role ARN of Cloudformation Service Role>
/US/Dev/DeploymentS3Bucket	<Artifact Bucket Name in Pipeline Region>
/US/Dev/CodeBuildImage	<Code Build Image Details>
/US/Stage/CrossAccountStageRole	<Role ARN for Cross Account Service Role for Stage>
/US/Prod/CrossAccountStageRole	<Role ARN for Cross Account Service Role for Prod>

In CodeCommit, push the Lambda code and CloudFormation template for deploying Lambda resources (Lambda function, Lambda role, Lambda log group, and so forth).
In CodeCommit, push two buildspec yml files, one for us-east-1, and one for cn-north-1.
1. buildspec.yml: For us-east-1

# Buildspec Reference Doc: https://docs.aws.amazon.com/codebuild/latest/userguide/build-spec-ref.html
version: 0.2
phases:
  install:
    runtime-versions:
      python: 3.7
  pre_build:
    commands:
      - echo "[+] Updating PIP...."
      - pip install --upgrade pip
      - echo "[+] Installing dependencies...."
      #- Commands To Install required dependencies
      - yum install zip unzip -y -q
      - pip install awscli --upgrade
      
  build:
    commands:
      - echo "Starting build `date` in `pwd`"
      - echo "Starting SAM packaging `date` in `pwd`"
      - aws cloudformation package --template-file cloudformation_template.yaml --s3-bucket ${S3_BUCKET} --output-template-file transform-packaged.yaml
      # Additional package commands for cross-region deployments
      - echo "SAM packaging completed on `date`"
      - echo "Build completed `date` in `pwd`"
     
artifacts:
  type: zip
  files:
    - transform-packaged.yaml
    # - additional artifacts for cross-region deployments 

  discard-paths: yes

cache:
  paths:
    - '/root/.cache/pip'

1. buildspec-china.yml: For cn-north-1
  buildspec-china.yml will be customized for performing build and deployment both. Refer to the following for details.

# Buildspec Reference Doc: https://docs.aws.amazon.com/codebuild/latest/userguide/build-spec-ref.html
version: 0.2    
phases:
  install:
    runtime-versions:
      python: 3.7

  pre_build:
    commands:
      - echo "[+] Updating PIP...."
      - pip install --upgrade pip
      - echo "[+] Installing dependencies...."
      #- Commands To Install required dependencies
      - yum install zip unzip -y -q
      - pip install awscli --upgrade
      # Setting China Region IAM User Profile
      - echo "Start setting User Profile  `date` in `pwd`"
      - USER_ACCESS_KEY=`aws ssm get-parameter --name ${USER_ACCESS_KEY_SSM} --with-decryption --query Parameter.Value --output text`
      - USER_SECRET_KEY=`aws ssm get-parameter --name ${USER_SECRET_KEY_SSM} --with-decryption --query Parameter.Value --output text`
      - aws configure --profile china set aws_access_key_id ${USER_ACCESS_KEY}
      - aws configure --profile china set aws_secret_access_key ${USER_SECRET_KEY}
      - echo "Setting User Profile Completed `date` in `pwd`"

  build:
    commands:
      # Creating Deployment Package 
      - echo "Start build/packaging  `date` in `pwd`"
      - S3_BUCKET=`aws ssm get-parameter --name ${S3_BUCKET_SSM} --query Parameter.Value --output text`
      - zip -q -r package.zip *
      - >
        bash -c '
        aws cloudformation package 
        --template-file cloudformation_template_china.yaml
        --s3-bucket ${S3_BUCKET}
        --output-template-file transformed-template-china.yaml
        --profile china
        --region cn-north-1'
      - echo "Completed build/packaging  `date` in `pwd`"
  post_build:
    commands:
      # Deploying 
      - echo "Start deployment  `date` in `pwd`"
      - >
        bash -c '
        aws cloudformation deploy 
        --capabilities CAPABILITY_NAMED_IAM
        --template-file transformed-template-china.yaml
        --stack-name ${ProjectName}-app-stack-dev
        --profile china
        --region cn-north-1'
      - echo "Completed deployment  `date` in `pwd`"
artifacts:
  type: zip
  files:
    - package.zip
    - transformed-template-china.yaml

Environment Variables: USER_ACCESS_KEY_SSM, USER_SECRET_KEY_SSM and S3_BUCKET_SSM

After creating and committing the previous files, your CodeCommit repository will look like the following.

Now that we have a CodeCommit repository, next we will create a CodePipeline for Lambda with the following stages:

Source – Use the previously created CodeCommit repository (CodeRepository-US-East-1) as the source.
Build – The CodeBuild project uses buildspec.yml by default, and takes output of Source stage as input and builds artifacts for us-east-1.
Deploy
1. Code-Deploy Project for deploying to us-east-1
  This takes output of the previous CodeBuild stage as input and performs deployment in two steps: create-changeset and execute-changeset (assuming the required role attached to the Code-Deploy for deployment).
2. CodeBuild Project for deploying to cn-north-1
  This takes output of Source stage as input and performs build and deployment both to cn-north-1 using buildspec-china.yml. Also, it uses China IAM user credentials and bucket SSM parameters from environment variables.CodeBuild project details are outlined in the following image.

Optional – Add further steps like manual approval, deployment to higher environment, and so forth, as required.

Congratulations! you have just created a CodePipeline with Lambda deployed in both a non-China Region and a China Region. Your CodePipeline should appear similar to the following.

Figure 3. CodePipeline implementation for both Regions

Note: Actual CodePipeline view will be vertical only where all environment deployment will be one after the other. For the purpose of this example, we have placed them side-by-side to more easily showcase multiple environments.

CodePipeline Implementation Steps

We have created this pipeline with the following high-level steps, and you can add or remove steps as needed.

Step 1. After you commit the source code, CodePipeline will launch in the non-China Region and fetch the source code.

Step 2. Build the package using using buildspec.yml.

Step 3. Deploy the application in both Regions by following the subsection steps for development environment.

1. Create the changeset for the development environment.
2. Implement the changeset for the development environment.

Step 4. Repeat step 3, but deploy the application in the staging environment.

Step 5. Wait for approval from your administrator or application owner before deploying application in production environment.

Step 6. Repeat steps 3 and 4 to deploy the application in the production environment.

Cleaning up

To avoid incurring future charges, clean up the resources created as part of this blog post.

Delete the CloudFormation stack created in the non-China Region.
Delete the SSM parameter created to store the access key.
Delete the access created in the China Region.

Conclusion

In this blog post, we have explored the question: how can you use AWS services to implement CI/CD in a China Region and keep them in sync with an AWS Region? Although we are using us-east-1 as an example here, this solution will work for any Region where CodePipeline services are available, including the China Region.

The question has been answered by dividing it into three problem statements as follows.

Problem 1: CodePipeline is not available in the China Regions.
Solution: Set up CodePipeline in a non-China Region and deploy to a China Region.

Problem 2: AWS cross-account roles are not possible between a non-China Region and the China Regions.
Solution: Use the power and flexibility of CodeBuild to build your application and also deploy your application using the AWS CLI.

Problem 3: Keep a non-China Region and the China Regions in sync.
Solution: Maintain all code and managing deployments from a common deployment AWS account.

Reference:

AWS CodeCommit | Managed Source Control Service

AWS CodePipeline | Continuous Integration and Continuous Delivery

AWS CodeBuild – Fully Managed Build Service

Using AWS CodePipeline for deploying container images to AWS Lambda Functions

2021-08-20 Kirankumar Chandrashekar

Post Syndicated from Kirankumar Chandrashekar original https://aws.amazon.com/blogs/devops/using-aws-codepipeline-for-deploying-container-images-to-aws-lambda-functions/

AWS Lambda launched support for packaging and deploying functions as container images at re:Invent 2020. In the post working with Lambda layers and extensions in container images, we demonstrated packaging Lambda Functions with layers while using container images. This post will teach you to use AWS CodePipeline to deploy docker images for microservices architecture involving multiple Lambda Functions and a common layer utilized as a separate container image. Lambda functions packaged as container images do not support adding Lambda layers to the function configuration. Alternatively, we can use a container image as a common layer while building other container images along with Lambda Functions shown in this post. Packaging Lambda functions as container images enables familiar tooling and larger deployment limits.

Here are some advantages of using container images for Lambda:

Easier dependency management and application building with container
- Install native operating system packages
- Install language-compatible dependencies
Consistent tool set for containers and Lambda-based applications
- Utilize the same container registry to store application artifacts (Amazon ECR, Docker Hub)
- Utilize the same build and pipeline tools to deploy
- Tools that can inspect Dockerfile work the same
Deploy large applications with AWS-provided or third-party images up to 10 GB
- Include larger application dependencies that previously were impossible

When using container images with Lambda, CodePipeline automatically detects code changes in the source repository in AWS CodeCommit, then passes the artifact to the build server like AWS CodeBuild and pushes the container images to ECR, which is then deployed to Lambda functions.

Architecture diagram

DevOps Architecture

Lambda-docker-images-DevOps-Architecture

Application Architecture

lambda-docker-image-microservices-app

In the above architecture diagram, two architectures are combined, namely 1, DevOps Architecture and 2, Microservices Application Architecture. DevOps architecture demonstrates the use of AWS Developer services such as AWS CodeCommit, AWS CodePipeline, AWS CodeBuild along with Amazon Elastic Container Repository (ECR) and AWS CloudFormation. These are used to support Continuous Integration and Continuous Deployment/Delivery (CI/CD) for both infrastructure and application code. Microservices Application architecture demonstrates how various AWS Lambda Functions that are part of microservices utilize container images for application code. This post will focus on performing CI/CD for Lambda functions utilizing container containers. The application code used in here is a simpler version taken from Serverless DataLake Framework (SDLF). For more information, refer to the AWS Samples GitHub repository for SDLF here.

DevOps workflow

There are two CodePipelines: one for building and pushing the common layer docker image to Amazon ECR, and another for building and pushing the docker images for all the Lambda Functions within the microservices architecture to Amazon ECR, as well as deploying the microservices architecture involving Lambda Functions via CloudFormation. Common layer container image functions as a common layer in all other Lambda Function container images, therefore its code is maintained in a separate CodeCommit repository used as a source stage for a CodePipeline. Common layer CodePipeline takes the code from the CodeCommit repository and passes the artifact to a CodeBuild project that builds the container image and pushes it to an Amazon ECR repository. This common layer ECR repository functions as a source in addition to the CodeCommit repository holding the code for all other Lambda Functions and resources involved in the microservices architecture CodePipeline.

Due to all or the majority of the Lambda Functions in the microservices architecture requiring the common layer container image as a layer, any change made to it should invoke the microservices architecture CodePipeline that builds the container images for all Lambda Functions. Moreover, a CodeCommit repository holding the code for every resource in the microservices architecture is another source to that CodePipeline to get invoked. This has two sources, because the container images in the microservices architecture should be built for changes in the common layer container image as well as for the code changes made and pushed to the CodeCommit repository.

Below is the sample dockerfile that uses the common layer container image as a layer:

ARG ECR_COMMON_DATALAKE_REPO_URL
FROM ${ECR_COMMON_DATALAKE_REPO_URL}:latest AS layer
FROM public.ecr.aws/lambda/python:3.8
# Layer Code
WORKDIR /opt
COPY --from=layer /opt/ .
# Function Code
WORKDIR /var/task
COPY src/lambda_function.py .
CMD ["lambda_function.lambda_handler"]

where the argument ECR_COMMON_DATALAKE_REPO_URL should resolve to the ECR url for common layer container image, which is provided to the --build-args along with docker build command. For example:

export ECR_COMMON_DATALAKE_REPO_URL="0123456789.dkr.ecr.us-east-2.amazonaws.com/dev-routing-lambda"
docker build --build-arg ECR_COMMON_DATALAKE_REPO_URL=$ECR_COMMON_DATALAKE_REPO_URL .

Deploying a Sample

Step1: Clone the repository Codepipeline-lambda-docker-images to your workstation. If using the zip file, then unzip the file to a local directory.
- ```
git clone https://github.com/aws-samples/codepipeline-lambda-docker-images.git
```
Step 2: Change the directory to the cloned directory or extracted directory. The local code repository structure should appear as follows:
- ```
cd codepipeline-lambda-docker-images
```

code-repository-structure

Step 3: Deploy the CloudFormation stack used in the template file CodePipelineTemplate/codepipeline.yaml to your AWS account. This deploys the resources required for DevOps architecture involving AWS CodePipelines for common layer code and microservices architecture code. Deploy CloudFormation stacks using the AWS console by following the documentation here, providing the name for the stack (for example datalake-infra-resources) and passing the parameters while navigating the console. Furthermore, use the AWS CLI to deploy a CloudFormation stack by following the documentation here.
Step 4: When the CloudFormation Stack deployment completes, navigate to the AWS CloudFormation console and to the Outputs section of the deployed stack, then note the CodeCommit repository urls. Three CodeCommit repo urls are available in the CloudFormation stack outputs section for each CodeCommit repository. Choose one of them based on the way you want to access it. Refer to the following documentation Setting up for AWS CodeCommit. I will be using the git-remote-codecommit (grc) method throughout this post for CodeCommit access.
Step 5: Clone the CodeCommit repositories and add code:
- - Common Layer CodeCommit repository: Take the value of the Output for the key oCommonLayerCodeCommitHttpsGrcRepoUrl from datalake-infra-resources CloudFormation Stack Outputs section which looks like below:
- - Clone the repository:
    - git clone codecommit::us-east-2://dev-CommonLayerCode
  - Change the directory to dev-CommonLayerCode
    - cd dev-CommonLayerCode
  - Add contents to the cloned repository from the source code downloaded in Step 1. Copy the code from the CommonLayerCode directory and the repo contents should appear as follows:
- - Create the main branch and push to the remote repository
```
git checkout -b main
git add ./
git commit -m "Initial Commit"
git push -u origin main
```
  - Application CodeCommit repository: Take the value of the Output for the key oAppCodeCommitHttpsGrcRepoUrl from datalake-infra-resources CloudFormation Stack Outputs section which looks like below:
- - Clone the repository:
    - git clone codecommit::us-east-2://dev-AppCode
  - Change the directory to dev-CommonLayerCode
    - cd dev-AppCode
  - Add contents to the cloned repository from the source code downloaded in Step 1. Copy the code from the ApplicationCode directory and the repo contents should appear as follows from the root:
- Create the main branch and push to the remote repository
```
git checkout -b main
git add ./
git commit -m "Initial Commit"
git push -u origin main
```

What happens now?

Now the Common Layer CodePipeline goes to the InProgress state and invokes the Common Layer CodeBuild project that builds the docker image and pushes it to the Common Layer Amazon ECR repository. The image tag utilized for the container image is the value resolved for the environment variable available in the AWS CodeBuild project CODEBUILD_RESOLVED_SOURCE_VERSION. This is the CodeCommit git Commit Id in this case.
For example, if the CommitId in CodeCommit is f1769c87, then the pushed docker image will have this tag along with latest

buildspec.yaml files appears as follows:

version: 0.2
phases:
  install:
    runtime-versions:
      docker: 19
  pre_build:
    commands:
      - echo Logging in to Amazon ECR...
      - aws --version
      - $(aws ecr get-login --region $AWS_DEFAULT_REGION --no-include-email)
      - REPOSITORY_URI=$ECR_COMMON_DATALAKE_REPO_URL
      - COMMIT_HASH=$(echo $CODEBUILD_RESOLVED_SOURCE_VERSION | cut -c 1-7)
      - IMAGE_TAG=${COMMIT_HASH:=latest}
  build:
    commands:
      - echo Build started on `date`
      - echo Building the Docker image...          
      - docker build -t $REPOSITORY_URI:latest .
      - docker tag $REPOSITORY_URI:latest $REPOSITORY_URI:$IMAGE_TAG
  post_build:
    commands:
      - echo Build completed on `date`
      - echo Pushing the Docker images...
      - docker push $REPOSITORY_URI:latest
      - docker push $REPOSITORY_URI:$IMAGE_TAG

Now the microservices architecture CodePipeline goes to the InProgress state and invokes all of the application image builder CodeBuild project that builds the docker images and pushes them to the Amazon ECR repository.

To improve the performance, every docker image is built in parallel within the codebuild project. The buildspec.yaml executes the build.sh script. This has the logic to build docker images required for each Lambda Function part of the microservices architecture. The docker images used for this sample architecture took approximately 4 to 5 minutes when the docker images were built serially. After switching to parallel building, it took approximately 40 to 50 seconds.

buildspec.yaml files appear as follows:

version: 0.2
phases:
  install:
    runtime-versions:
      docker: 19
    commands:
      - uname -a
      - set -e
      - chmod +x ./build.sh
      - ./build.sh
artifacts:
  files:
    - cfn/**/*
  name: builds/$CODEBUILD_BUILD_NUMBER/cfn-artifacts

build.sh file appears as follows:

#!/bin/bash
set -eu
set -o pipefail

RESOURCE_PREFIX="${RESOURCE_PREFIX:=stg}"
AWS_DEFAULT_REGION="${AWS_DEFAULT_REGION:=us-east-1}"
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text 2>&1)
ECR_COMMON_DATALAKE_REPO_URL="${ECR_COMMON_DATALAKE_REPO_URL:=$ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com\/$RESOURCE_PREFIX-common-datalake-library}"
pids=()
pids1=()

PROFILE='new-profile'
aws configure --profile $PROFILE set credential_source EcsContainer

aws --version
$(aws ecr get-login --region $AWS_DEFAULT_REGION --no-include-email)
COMMIT_HASH=$(echo $CODEBUILD_RESOLVED_SOURCE_VERSION | cut -c 1-7)
BUILD_TAG=build-$(echo $CODEBUILD_BUILD_ID | awk -F":" '{print $2}')
IMAGE_TAG=${BUILD_TAG:=COMMIT_HASH:=latest}

cd dockerfiles;
mkdir ../logs
function pwait() {
    while [ $(jobs -p | wc -l) -ge $1 ]; do
        sleep 1
    done
}

function build_dockerfiles() {
    if [ -d $1 ]; then
        directory=$1
        cd $directory
        echo $directory
        echo "---------------------------------------------------------------------------------"
        echo "Start creating docker image for $directory..."
        echo "---------------------------------------------------------------------------------"
            REPOSITORY_URI=$ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$RESOURCE_PREFIX-$directory
            docker build --build-arg ECR_COMMON_DATALAKE_REPO_URL=$ECR_COMMON_DATALAKE_REPO_URL . -t $REPOSITORY_URI:latest -t $REPOSITORY_URI:$IMAGE_TAG -t $REPOSITORY_URI:$COMMIT_HASH
            echo Build completed on `date`
            echo Pushing the Docker images...
            docker push $REPOSITORY_URI
        cd ../
        echo "---------------------------------------------------------------------------------"
        echo "End creating docker image for $directory..."
        echo "---------------------------------------------------------------------------------"
    fi
}

for directory in *; do 
   echo "------Started processing code in $directory directory-----"
   build_dockerfiles $directory 2>&1 1>../logs/$directory-logs.log | tee -a ../logs/$directory-logs.log &
   pids+=($!)
   pwait 20
done

for pid in "${pids[@]}"; do
  wait "$pid"
done

cd ../cfn/
function build_cfnpackages() {
    if [ -d ${directory} ]; then
        directory=$1
        cd $directory
        echo $directory
        echo "---------------------------------------------------------------------------------"
        echo "Start packaging cloudformation package for $directory..."
        echo "---------------------------------------------------------------------------------"
        aws cloudformation package --profile $PROFILE --template-file template.yaml --s3-bucket $S3_BUCKET --output-template-file packaged-template.yaml
        echo "Replace the parameter 'pEcrImageTag' value with the latest built tag"
        echo $(jq --arg Image_Tag "$IMAGE_TAG" '.Parameters |= . + {"pEcrImageTag":$Image_Tag}' parameters.json) > parameters.json
        cat parameters.json
        ls -al
        cd ../
        echo "---------------------------------------------------------------------------------"
        echo "End packaging cloudformation package for $directory..."
        echo "---------------------------------------------------------------------------------"
    fi
}

for directory in *; do
    echo "------Started processing code in $directory directory-----"
    build_cfnpackages $directory 2>&1 1>../logs/$directory-logs.log | tee -a ../logs/$directory-logs.log &
    pids1+=($!)
    pwait 20
done

for pid in "${pids1[@]}"; do
  wait "$pid"
done

cd ../logs/
ls -al
for f in *; do
  printf '%s\n' "$f"
  paste /dev/null - < "$f"
done

cd ../

The function build_dockerfiles() loops through each directory within the dockerfiles directory and runs the docker build command in order to build the docker image. The name for the docker image and then the ECR repository is determined by the directory name in which the DockerFile is used from. For example, if the DockerFile directory is routing-lambda and the environment variables take the below values,

ACCOUNT_ID=0123456789
AWS_DEFAULT_REGION=us-east-2
RESOURCE_PREFIX=dev
directory=routing-lambda
REPOSITORY_URI=$ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$RESOURCE_PREFIX-$directory

Then REPOSITORY_URI becomes 0123456789.dkr.ecr.us-east-2.amazonaws.com/dev-routing-lambda
And the docker image is pushed to this resolved REPOSITORY_URI. Similarly, docker images for all other directories are built and pushed to Amazon ECR.

Important Note: The ECR repository names match the directory names where the DockerFiles exist and was already created as part of the CloudFormation template codepipeline.yaml that was deployed in step 3. In order to add more Lambda Functions to the microservices architecture, make sure that the ECR repository name added to the new repository in the codepipeline.yaml template matches the directory name within the AppCode repository dockerfiles directory.

Every docker image is built in parallel in order to save time. Each runs as a separate operating system process and is pushed to the Amazon ECR repository. This also controls the number of processes that could run in parallel by setting a value for the variable pwait within the loop. For example, if pwait 20, then the maximum number of parallel processes is 20 at a given time. The image tag for all docker images used for Lambda Functions is constructed via the CodeBuild BuildId, which is available via environment variable $CODEBUILD_BUILD_ID, in order to ensure that a new image gets a new tag. This is required for CloudFormation to detect changes and update Lambda Functions with the new container image tag.

Once every docker image is built and pushed to Amazon ECR in the CodeBuild project, it builds every CloudFormation package by uploading all local artifacts to Amazon S3 via AWS Cloudformation package CLI command for the templates available in its own directory within the cfn directory. Moreover, it updates every parameters.json file for each directory with the ECR image tag to the parameter value pEcrImageTag. This is required for CloudFormation to detect changes and update the Lambda Function with the new image tag.

After this, the CodeBuild project will output the packaged CloudFormation templates and parameters files as an artifact to AWS CodePipeline so that it can be deployed via AWS CloudFormation in further stages. This is done by first creating a ChangeSet and then deploying it at the next stage.

Testing the microservices architecture

As stated above, the sample application utilized for microservices architecture involving multiple Lambda Functions is a modified version of the Serverless Data Lake Framework. The microservices architecture CodePipeline deployed every AWS resource required to run the SDLF application via AWS CloudFormation stages. As part of SDLF, it also deployed a set of DynamoDB tables required for the applications to run. I utilized the meteorites sample for this, thereby the DynamoDb tables should be added with the necessary data for the application to run for this sample.

Utilize the AWS console to write data to the AWS DynamoDb Table. For more information, refer to this documentation. The sample json files are in the utils/DynamoDbConfig/ directory.

1. Add the record below to the octagon-Pipelines-dev DynamoDB table:

{
"description": "Main Pipeline to Ingest Data",
"ingestion_frequency": "WEEKLY",
"last_execution_date": "2020-03-11",
"last_execution_duration_in_seconds": 4.761,
"last_execution_id": "5445249c-a097-447a-a957-f54f446adfd2",
"last_execution_status": "COMPLETED",
"last_execution_timestamp": "2020-03-11T02:34:23.683Z",
"last_updated_timestamp": "2020-03-11T02:34:23.683Z",
"modules": [
{
"name": "pandas",
"version": "0.24.2"
},
{
"name": "Python",
"version": "3.7"
}
],
"name": "engineering-main-pre-stage",
"owner": "Yuri Gagarin",
"owner_contact": "y.gagarin@",
"status": "ACTIVE",
"tags": [
{
"key": "org",
"value": "VOSTOK"
}
],
"type": "INGESTION",
"version": 127
}

2. Add the record below to the octagon-Pipelines-dev DynamoDB table:

{
"description": "Main Pipeline to Merge Data",
"ingestion_frequency": "WEEKLY",
"last_execution_date": "2020-03-11",
"last_execution_duration_in_seconds": 570.559,
"last_execution_id": "0bb30d20-ace8-4cb2-a9aa-694ad018694f",
"last_execution_status": "COMPLETED",
"last_execution_timestamp": "2020-03-11T02:44:36.069Z",
"last_updated_timestamp": "2020-03-11T02:44:36.069Z",
"modules": [
{
"name": "PySpark",
"version": "1.0"
}
],
"name": "engineering-main-post-stage",
"owner": "Neil Armstrong",
"owner_contact": "n.armstrong@",
"status": "ACTIVE",
"tags": [
{
"key": "org",
"value": "NASA"
}
],
"type": "TRANSFORM",
"version": 4
}

3. Add the record below to the octagon-Datsets-dev DynamoDB table:

{
"classification": "Orange",
"description": "Meteorites Name, Location and Classification",
"frequency": "DAILY",
"max_items_process": 250,
"min_items_process": 1,
"name": "engineering-meteorites",
"owner": "NASA",
"owner_contact": "[email protected]",
"pipeline": "main",
"tags": [
{
"key": "cost",
"value": "meteorites division"
}
],
"transforms": {
"stage_a_transform": "light_transform_blueprint",
"stage_b_transform": "heavy_transform_blueprint"
},
"type": "TRANSACTIONAL",
"version": 1
}

If you want to create these samples using AWS CLI, please refer to this documentation.

Record 1:

aws dynamodb put-item --table-name octagon-Pipelines-dev --item '{"description":{"S":"Main Pipeline to Merge Data"},"ingestion_frequency":{"S":"WEEKLY"},"last_execution_date":{"S":"2021-03-16"},"last_execution_duration_in_seconds":{"N":"930.097"},"last_execution_id":{"S":"e23b7dae-8e83-4982-9f97-5784a9831a14"},"last_execution_status":{"S":"COMPLETED"},"last_execution_timestamp":{"S":"2021-03-16T04:31:16.968Z"},"last_updated_timestamp":{"S":"2021-03-16T04:31:16.968Z"},"modules":{"L":[{"M":{"name":{"S":"PySpark"},"version":{"S":"1.0"}}}]},"name":{"S":"engineering-main-post-stage"},"owner":{"S":"Neil Armstrong"},"owner_contact":{"S":"n.armstrong@"},"status":{"S":"ACTIVE"},"tags":{"L":[{"M":{"key":{"S":"org"},"value":{"S":"NASA"}}}]},"type":{"S":"TRANSFORM"},"version":{"N":"8"}}'

Record 2:

aws dynamodb put-item --table-name octagon-Pipelines-dev --item '{"description":{"S":"Main Pipeline to Ingest Data"},"ingestion_frequency":{"S":"WEEKLY"},"last_execution_date":{"S":"2021-03-28"},"last_execution_duration_in_seconds":{"N":"1.75"},"last_execution_id":{"S":"7e0e04e7-b05e-41a6-8ced-829d47866a6a"},"last_execution_status":{"S":"COMPLETED"},"last_execution_timestamp":{"S":"2021-03-28T20:23:06.031Z"},"last_updated_timestamp":{"S":"2021-03-28T20:23:06.031Z"},"modules":{"L":[{"M":{"name":{"S":"pandas"},"version":{"S":"0.24.2"}}},{"M":{"name":{"S":"Python"},"version":{"S":"3.7"}}}]},"name":{"S":"engineering-main-pre-stage"},"owner":{"S":"Yuri Gagarin"},"owner_contact":{"S":"y.gagarin@"},"status":{"S":"ACTIVE"},"tags":{"L":[{"M":{"key":{"S":"org"},"value":{"S":"VOSTOK"}}}]},"type":{"S":"INGESTION"},"version":{"N":"238"}}'

Record 3:

aws dynamodb put-item --table-name octagon-Pipelines-dev --item '{"description":{"S":"Main Pipeline to Ingest Data"},"ingestion_frequency":{"S":"WEEKLY"},"last_execution_date":{"S":"2021-03-28"},"last_execution_duration_in_seconds":{"N":"1.75"},"last_execution_id":{"S":"7e0e04e7-b05e-41a6-8ced-829d47866a6a"},"last_execution_status":{"S":"COMPLETED"},"last_execution_timestamp":{"S":"2021-03-28T20:23:06.031Z"},"last_updated_timestamp":{"S":"2021-03-28T20:23:06.031Z"},"modules":{"L":[{"M":{"name":{"S":"pandas"},"version":{"S":"0.24.2"}}},{"M":{"name":{"S":"Python"},"version":{"S":"3.7"}}}]},"name":{"S":"engineering-main-pre-stage"},"owner":{"S":"Yuri Gagarin"},"owner_contact":{"S":"y.gagarin@"},"status":{"S":"ACTIVE"},"tags":{"L":[{"M":{"key":{"S":"org"},"value":{"S":"VOSTOK"}}}]},"type":{"S":"INGESTION"},"version":{"N":"238"}}'

Now upload the sample json files to the raw s3 bucket. The raw S3 bucket name can be obtained in the output of the common-cloudformation stack deployed as part of the microservices architecture CodePipeline. Navigate to the CloudFormation console in the region where the CodePipeline was deployed and locate the stack with the name common-cloudformation, navigate to the Outputs section, and then note the output bucket name with the key oCentralBucket. Navigate to the Amazon S3 Bucket console and locate the bucket for oCentralBucket, create two path directories named engineering/meteorites, and upload every sample json file to this directory. Meteorites sample json files are available in the utils/meteorites-test-json-files directory of the previously cloned repository. Wait a few minutes and then navigate to the stage bucket noted from the common-cloudformation stack output name oStageBucket. You can see json files converted into csv in pre-stage/engineering/meteorites folder in S3. Wait a few more minutes and then navigate to the post-stage/engineering/meteorites folder in the oStageBucket to see the csv files converted to parquet format.

Cleanup

Navigate to the AWS CloudFormation console, note the S3 bucket names from the common-cloudformation stack outputs, and empty the S3 buckets. Refer to Emptying the Bucket for more information.

Delete the CloudFormation stacks in the following order:
1. Common-Cloudformation
2. stagea
3. stageb
4. sdlf-engineering-meteorites
Then delete the infrastructure CloudFormation stack datalake-infra-resources deployed using the codepipeline.yaml template. Refer to the following documentation to delete CloudFormation Stacks: Deleting a stack on the AWS CloudFormation console or Deleting a stack using AWS CLI.

Conclusion

This method lets us use CI/CD via CodePipeline, CodeCommit, and CodeBuild, along with other AWS services, to automatically deploy container images to Lambda Functions that are part of the microservices architecture. Furthermore, we can build a common layer that is equivalent to the Lambda layer that could be built independently via its own CodePipeline, and then build the container image and push to Amazon ECR. Then, the common layer container image Amazon ECR functions as a source along with its own CodeCommit repository which holds the code for the microservices architecture CodePipeline. Having two sources for microservices architecture codepipeline lets us build every docker image. This is due to a change made to the common layer docker image that is referred to in other docker images, and another source that holds the code for other microservices including Lambda Function.

About the Author

Kirankumar Chandrashekar is a Sr.DevOps consultant at AWS Professional Services. He focuses on leading customers in architecting DevOps technologies. Kirankumar is passionate about DevOps, Infrastructure as Code, and solving complex customer issues. He enjoys music, as well as cooking and traveling.

Chaos Testing with AWS Fault Injection Simulator and AWS CodePipeline

2021-08-11 Matt Chastain

Post Syndicated from Matt Chastain original https://aws.amazon.com/blogs/architecture/chaos-testing-with-aws-fault-injection-simulator-and-aws-codepipeline/

The COVID-19 pandemic has proven to be the largest stress test of our technology infrastructures in generations. A meteoric increase in internet consumption followed, due in large part to working and schooling from home. The chaotic, early months of the pandemic have clearly demonstrated the value of resiliency in production. How can we better prepare our critical systems for these global events in the future? A more modern approach to testing and validating your application architecture is needed. Chaos engineering has emerged as an innovative approach to solving these types of challenges.

This blog shows an architecture pattern for automating chaos testing as part of your continuous integration/continuous delivery (CI/CD) process. By automating the implementation of chaos experiments inside CI/CD pipelines, complex risks and modeled failure scenarios can be tested against application environments with every deployment. Application teams can use the results of these experiments to prioritize improvements in their architecture. These results will give your team the confidence they need to operate in an unpredictable production environment.

AWS Fault Injection Simulator (FIS) is a managed service that enables you to perform fault injection experiments on your AWS workloads. Fault injection is based on the principles of chaos engineering. These experiments stress an application by creating disruptive events so that you can observe how your application responds. You can then use this information to improve the performance and resiliency of your applications. With AWS FIS, you set up and run experiments that help you create the real-world conditions needed to uncover application issues.

AWS CodePipeline is a fully managed continuous delivery service for fast and reliable application and infrastructure updates. You can use AWS CodePipeline to model and automate your software release processes. Automating your build, test, and release process allows you to quickly test each code change. You can ensure the quality of your application or infrastructure code by running each change through your staging and release process.

Continuous chaos testing

Figure 1. High-level architecture pattern for automating chaos engineering

Create FIS experiments

Begin with creating an FIS experiment template by configuring one or more actions (action set) to run against the target resources of the application architecture. Here we have created an action to stop Amazon EC2 instances in our Amazon Elastic Container Service (ECS) cluster identified by a tag. Target resources can be identified by resource IDs, filters, or tags. We can also set up the action parameters for running the actions before or after the actions/duration. Additionally, you can set up Amazon CloudWatch alarms to stop running one or more fault experiments once a particular threshold or boundary has been reached. In Increase your e-commerce website reliability using chaos engineering and AWS Fault Injection Simulator, Bastien Leblanc shares how to set up CloudWatch metric thresholds as stop conditions for experiments.

Figure 2. AWS FIS experiment template

Author AWS Lambda to initiate FIS experiments

Create/Add a FIS IAM role to the Lambda function in the configuration permissions section. To start a specific FIS experiment, we use the experimentTemplateId parameter in our Lambda code. Refer to the AWS FIS API Reference when writing your Lambda code. When integrating the Lambda function into your pipeline, a new AWS CodePipeline can be created or an existing one can be used. A new pipeline stage is added at the point we initiate our Lambda function (post deployment stage), which launches our FIS experiment.

Figure 3. AWS Lambda function initiating AWS FIS experiment

Figure 4. AWS CodePipeline with AWS FIS experiment stage

The experimentTemplateId parameter can also be staged as a key/value ‘environment variable’ in your Lambda function configuration. This is useful as it allows you to change your FIS experiment template without having to adjust your function code. You can use the same Lambda function code by dynamically injecting the experimentTemplateId in multiple environments on your way to production.

Verify FIS experiment results on deployed application

By continuously performing fault injection post-deployment in AWS CodePipeline, you learn about complex failure conditions, which you must solve. User experience and availability testing on your application during the runtime of the FIS experiment can be started by a notification rule. In AWS CodePipeline, you can use an Amazon Simple Notification Service (SNS) topic or chatbot integration. CloudWatch Synthetics can be used for those looking to automate experience testing on the candidate application while other FIS experiments are running.

Figure 5. AWS CodePipeline notification rule setting

Summary

Using AWS CodePipeline to automate chaos engineering experiments on application architecture with AWS FIS is straightforward. Following are some benefits from automating fault injection testing in our CI/CD pipelines:

Our team can achieve a higher degree of confidence in meeting the resiliency requirements of our application. We use a more modern approach to testing and automating this experimentation inside our existing CI/CD process with AWS CodePipeline.
We know more about the unknown risks to our application. All testing results we receive provide benefit and learning opportunities for our team. We use these results to understand what we do well, where we need to improve, or what we are willing to tolerate based on our application requirements.
Continuously evaluating our architectural fitness inside CI/CD allows our team to validate the impact each feature or component iteration has on the resiliency of our application.

However, the sole value of automating chaos testing is not limited to finding, fixing, or documenting the risks that surface in our application. Additional confidence is gained through constantly validating your operational practices, such as alerts and alarms, monitoring, and notifications.

FIS gives you a controlled and repeatable way to reproduce necessary conditions to fine-tune your operational procedures and runbooks. Automating this testing inside a CI/CD pipeline ensures a nearly continuous feedback loop for these operational practices.

If you want to review additional FIS experiment examples, check out our AWS Samples GitHub
More blog posts on FIS

CICD on Serverless Applications using AWS CodeArtifact

2021-08-07 Anand Krishna

Post Syndicated from Anand Krishna original https://aws.amazon.com/blogs/devops/cicd-on-serverless-applications-using-aws-codeartifact/

Developing and deploying applications rapidly to users requires a working pipeline that accepts the user code (usually via a Git repository). AWS CodeArtifact was announced in 2020. It’s a secure and scalable artifact management product that easily integrates with other AWS products and services. CodeArtifact allows you to publish, store, and view packages, list package dependencies, and share your application’s packages.

In this post, I will show how we can build a simple DevOps pipeline for a sample JAVA application (JAR file) to be built with Maven.

Solution Overview

We utilize the following AWS services/Tools/Frameworks to set up our continuous integration, continuous deployment (CI/CD) pipeline:

The following diagram illustrates the pipeline architecture and flow:

aws-codeartifact-pipeline

Our pipeline is built on CodePipeline with CodeCommit as the source (CodePipeline Source Stage). This triggers the pipeline via a CloudWatch Events rule. Then the code is fetched from the CodeCommit repository branch (main) and sent to the next pipeline phase. This CodeBuild phase is specifically for compiling, packaging, and publishing the code to CodeArtifact by utilizing a package manager—in this case Maven.

After Maven publishes the code to CodeArtifact, the pipeline asks for a manual approval to be directly approved in the pipeline. It can also optionally trigger an email alert via Amazon Simple Notification Service (Amazon SNS). After approval, the pipeline moves to another CodeBuild phase. This downloads the latest packaged JAR file from a CodeArtifact repository and deploys to the AWS Lambda function.

Clone the Repository

Clone the GitHub repository as follows:

git clone https://github.com/aws-samples/aws-cdk-codeartifact-pipeline-sample.git

Code Deep Dive

After the Git repository is cloned, the directory structure is shown as in the following screenshot :

aws-codeartifact-pipeline-code

Let’s study the files and code to understand how the pipeline is built.

The directory java-events is a sample Java Maven project. Find numerous sample applications on GitHub. For this post, we use the sample application java-events.

To add your own application code, place the pom.xml and settings.xml files in the root directory for the AWS CDK project.

Let’s study the code in the file lib/cdk-pipeline-codeartifact-new-stack.ts of the stack CdkPipelineCodeartifactStack. This is the heart of the AWS CDK code that builds the whole pipeline. The stack does the following:

Creates a CodeCommit repository called ca-pipeline-repository.
References a CloudFormation template (lib/ca-template.yaml) in the AWS CDK code via the module @aws-cdk/cloudformation-include.
Creates a CodeArtifact domain called cdkpipelines-codeartifact.
Creates a CodeArtifact repository called cdkpipelines-codeartifact-repository.
Creates a CodeBuild project called JarBuild_CodeArtifact. This CodeBuild phase does all of the code compiling, packaging, and publishing to CodeArtifact into a repository called cdkpipelines-codeartifact-repository.
Creates a CodeBuild project called JarDeploy_Lambda_Function. This phase fetches the latest artifact from CodeArtifact created in the previous step (cdkpipelines-codeartifact-repository) and deploys to the Lambda function.
Finally, creates a pipeline with four phases:
- Source as CodeCommit (ca-pipeline-repository).
- CodeBuild project JarBuild_CodeArtifact.
- A Manual approval Stage.
- CodeBuild project JarDeploy_Lambda_Function.

CodeArtifact shows the domain-specific and repository-specific connection settings to mention/add in the application’s pom.xml and settings.xml files as below:

aws-codeartifact-repository-connections

Deploy the Pipeline

The AWS CDK code requires the following packages in order to build the CI/CD pipeline:

@aws-cdk/core
@aws-cdk/aws-codepipeline
@aws-cdk/aws-codepipeline-actions
@aws-cdk/aws-codecommit
@aws-cdk/aws-codebuild
@aws-cdk/aws-iam
@aws-cdk/cloudformation-include

Install the required AWS CDK packages as below:

npm i @aws-cdk/core @aws-cdk/aws-codepipeline @aws-cdk/aws-codepipeline-actions @aws-cdk/aws-codecommit @aws-cdk/aws-codebuild @aws-cdk/pipelines @aws-cdk/aws-iam @ @aws-cdk/cloudformation-include

Compile the AWS CDK code:

npm run build

Deploy the AWS CDK code:

cdk synth cdk deploy

After the AWS CDK code is deployed, view the final output on the stack’s detail page on the AWS CloudFormation :

aws-codeartifact-pipeline-cloudformation-stack

How the pipeline works with artifact versions (using SNAPSHOTS)

In this demo, I publish SNAPSHOT to the repository. As per the documentation here and here, a SNAPSHOT refers to the most recent code along a branch. It’s a development version preceding the final release version. Identify a snapshot version of a Maven package by the suffix SNAPSHOT appended to the package version.

The application settings are defined in the pom.xml file. For this post, we define the following:

The version to be used, called 1.0-SNAPSHOT.
The specific packaging, called jar.
The specific project display name, called JavaEvents.
The specific group ID, called JavaEvents.

The screenshot below shows the pom.xml settings we utilised in the application:

aws-codeartifact-pipeline-pom-xml

You can’t republish a package asset that already exists with different content, as per the documentation here.

When a Maven snapshot is published, its previous version is preserved in a new version called a build. Each time a Maven snapshot is published, a new build version is created.

When a Maven snapshot is published, its status is set to Published, and the status of the build containing the previous version is set to Unlisted. If you request a snapshot, the version with status Published is returned. This is always the most recent Maven snapshot version.

For example, the image below shows the state when the pipeline is run for the FIRST RUN. The latest version has the status Published and previous builds are marked Unlisted.

aws-codeartifact-repository-package-versions

For all subsequent pipeline runs, multiple Unlisted versions will occur every time the pipeline is run, as all previous versions of a snapshot are maintained in its build versions.

aws-codeartifact-repository-package-versions

Fetching the Latest Code

Retrieve the snapshot from the repository in order to deploy the code to an AWS Lambda Function. I have used AWS CLI to list and fetch the latest asset of package version 1.0-SNAPSHOT.

Listing the latest snapshot

export ListLatestArtifact = `aws codeartifact list-package-version-assets —domain cdkpipelines-codeartifact --domain-owner $Account_Id --repository cdkpipelines-codeartifact-repository --namespace JavaEvents --format maven --package JavaEvents --package-version "1.0-SNAPSHOT"| jq ".assets[].name"|grep jar|sed ’s/“//g’`

NOTE : Please note the dynamic CDK variable $Account_Id which represents AWS Account ID.

Fetching the latest code using Package Version

aws codeartifact get-package-version-asset --domain cdkpipelines-codeartifact --repository cdkpipelines-codeartifact-repository --format maven --package JavaEvents --package-version 1.0-SNAPSHOT --namespace JavaEvents --asset $ListLatestArtifact demooutput

Notice that I’m referring the last code by using variable $ListLatestArtifact. This always fetches the latest code, and demooutput is the outfile of the AWS CLI command where the content (code) is saved.

Testing the Pipeline

Now clone the CodeCommit repository that we created with the following code:

git clone https://git-codecommit.<region>.amazonaws.com/v1/repos/codeartifact-pipeline-repository

Enter the following code to push the code to the CodeCommit repository:

cp -rp cdk-pipeline-codeartifact-new /* ca-pipeline-repository cd ca-pipeline-repository git checkout -b main git add . git commit -m “testing the pipeline” git push origin main

Once the code is pushed to Git repository, the pipeline is automatically triggered by Amazon CloudWatch events.

The following screenshots shows the second phase (AWS CodeBuild Phase – JarBuild_CodeArtifact) of the pipeline, wherein the asset is successfully compiled and published to the CodeArtifact repository by Maven:

aws-codeartifact-pipeline-codebuild-jarbuild

aws-codeartifact-pipeline-codebuild-screenshot

aws-codeartifact-pipeline-codebuild-screenshot2

The following screenshots show the last phase (AWS CodeBuild Phase – Deploy-to-Lambda) of the pipeline, wherein the latest asset is successfully pulled and deployed to AWS Lambda Function.

Asset JavaEvents-1.0-20210618.131629-5.jar is the latest snapshot code for the package version 1.0-SNAPSHOT. This is the same asset version code that will be deployed to AWS Lambda Function, as seen in the screenshots below:

aws-codeartifact-pipeline-codebuild-jardeploy

aws-codeartifact-pipeline-codebuild-screenshot-jarbuild

The following screenshot of the pipeline shows a successful run. The code was fetched and deployed to the existing Lambda function (codeartifact-test-function).

aws-codeartifact-pipeline-codepipeline

Cleanup

To clean up, You can either delete the entire stack through the AWS CloudFormation console or use AWS CDK command like below –

cdk destroy

For more information on the AWS CDK commands, please check the here or sample here.

Summary

In this post, I demonstrated how to build a CI/CD pipeline for your serverless application with AWS CodePipeline by utilizing AWS CDK with AWS CodeArtifact. Please check the documentation here for an in-depth explanation regarding other package managers and the getting started guide.

Deploy data lake ETL jobs using CDK Pipelines

2021-07-30 Ravi Itha

Post Syndicated from Ravi Itha original https://aws.amazon.com/blogs/devops/deploying-data-lake-etl-jobs-using-cdk-pipelines/

Many organizations are building data lakes on AWS, which provides the most secure, scalable, comprehensive, and cost-effective portfolio of services. Like any application development project, a data lake must answer a fundamental question: “What is the DevOps strategy?” Defining a DevOps strategy for a data lake requires extensive planning and multiple teams. This typically requires multiple development and test cycles before maturing enough to support a data lake in a production environment. If an organization doesn’t have the right people, resources, and processes in place, this can quickly become daunting.

What if your data engineering team uses basic building blocks to encapsulate data lake infrastructure and data processing jobs? This is where CDK Pipelines brings the full benefit of infrastructure as code (IaC). CDK Pipelines is a high-level construct library within the AWS Cloud Development Kit (AWS CDK) that makes it easy to set up a continuous deployment pipeline for your AWS CDK applications. The AWS CDK provides essential automation for your release pipelines so that your development and operations team remain agile and focus on developing and delivering applications on the data lake.

In this post, we discuss a centralized deployment solution utilizing CDK Pipelines for data lakes. This implements a DevOps-driven data lake that delivers benefits such as continuous delivery of data lake infrastructure, data processing, and analytical jobs through a configuration-driven multi-account deployment strategy. Let’s dive in!

Data lakes on AWS

A data lake is a centralized repository where you can store all of your structured and unstructured data at any scale. Store your data as is, without having to first structure it, and run different types of analytics—from dashboards and visualizations to big data processing, real-time analytics, and machine learning in order to guide better decisions. To further explore data lakes, refer to What is a data lake?

We design a data lake with the following elements:

Secure data storage
Data cataloging in a central repository
Data movement
Data analysis

The following figure represents our data lake.

Data Lake on AWS

We use three Amazon Simple Storage Service (Amazon S3) buckets:

raw – Stores the input data in its original format
conformed – Stores the data that meets the data lake quality requirements
purpose-built – Stores the data that is ready for consumption by applications or data lake consumers

The data lake has a producer where we ingest data into the raw bucket at periodic intervals. We utilize the following tools: AWS Glue processes and analyzes the data. AWS Glue Data Catalog persists metadata in a central repository. AWS Lambda and AWS Step Functions schedule and orchestrate AWS Glue extract, transform, and load (ETL) jobs. Amazon Athena is used for interactive queries and analysis. Finally, we engage various AWS services for logging, monitoring, security, authentication, authorization, alerting, and notification.

A common data lake practice is to have multiple environments such as dev, test, and production. Applying the IaC principle for data lakes brings the benefit of consistent and repeatable runs across multiple environments, self-documenting infrastructure, and greater flexibility with resource management. The AWS CDK offers high-level constructs for use with all of our data lake resources. This simplifies usage and streamlines implementation.

Before exploring the implementation, let’s gain further scope of how we utilize our data lake.

The solution

Our goal is to implement a CI/CD solution that automates the provisioning of data lake infrastructure resources and deploys ETL jobs interactively. We accomplish this as follows: 1) applying separation of concerns (SoC) design principle to data lake infrastructure and ETL jobs via dedicated source code repositories, 2) a centralized deployment model utilizing CDK pipelines, and 3) AWS CDK enabled ETL pipelines from the start.

Data lake infrastructure

Our data lake infrastructure provisioning includes Amazon S3 buckets, S3 bucket policies, AWS Key Management Service (KMS) encryption keys, Amazon Virtual Private Cloud (Amazon VPC), subnets, route tables, security groups, VPC endpoints, and secrets in AWS Secrets Manager. The following diagram illustrates this.

Data Lake Infrastructure

Data lake ETL jobs

For our ETL jobs, we process New York City TLC Trip Record Data. The following figure displays our ETL process, wherein we run two ETL jobs within a Step Functions state machine.

AWS Glue ETL Jobs

Here are a few important details:

A file server uploads files to the S3 raw bucket of the data lake. The file server is a data producer and source for the data lake. We assume that the data is pushed to the raw bucket.
Amazon S3 triggers an event notification to the Lambda function.
The function inserts an item in the Amazon DynamoDB table in order to track the file processing state. The first state written indicates the AWS Step Function start.
The function starts the state machine.
The state machine runs an AWS Glue job (Apache Spark).
The job processes input data from the raw zone to the data lake conformed zone. The job also converts CSV input data to Parquet formatted data.
The job updates the Data Catalog table with the metadata of the conformed Parquet file.
A second AWS Glue job (Apache Spark) processes the input data from the conformed zone to the purpose-built zone of the data lake.
The job fetches ETL transformation rules from the Amazon S3 code bucket and transforms the input data.
The job stores the result in Parquet format in the purpose-built zone.
The job updates the Data Catalog table with the metadata of the purpose-built Parquet file.
The job updates the DynamoDB table and updates the job status to completed.
An Amazon Simple Notification Service (Amazon SNS) notification is sent to subscribers that states the job is complete.
Data engineers or analysts can now analyze data via Athena.

We will discuss data formats, Glue jobs, ETL transformation logics, data cataloging, auditing, notification, orchestration, and data analysis in more detail in AWS CDK Pipelines for Data Lake ETL Deployment GitHub repository. This will be discussed in the subsequent section.

Centralized deployment

Now that we have data lake infrastructure and ETL jobs ready, let’s define our deployment model. This model is based on the following design principles:

A dedicated AWS account to run CDK pipelines.
One or more AWS accounts into which the data lake is deployed.
The data lake infrastructure has a dedicated source code repository. Typically, data lake infrastructure is a one-time deployment and rarely evolves. Therefore, a dedicated code repository provides a landing zone for your data lake.
Each ETL job has a dedicated source code repository. Each ETL job may have unique AWS service, orchestration, and configuration requirements. Therefore, a dedicated source code repository will help you more flexibly build, deploy, and maintain ETL jobs.

We organize our source code repo into three branches: dev (main), test, and prod. In the deployment account, we manage three separate CDK Pipelines and each pipeline is sourced from a dedicated branch. Here we choose a branch-based software development method in order to demonstrate the strategy in more complex scenarios where integration testing and validation layers require human intervention. As well, these may not immediately follow with a corresponding release or deployment due to their manual nature. This facilitates the propagation of changes through environments without blocking independent development priorities. We accomplish this by isolating resources across environments in the central deployment account, allowing for the independent management of each environment, and avoiding cross-contamination during each pipeline’s self-mutating updates. The following diagram illustrates this method.

Centralized deployment

Note: This centralized deployment strategy can be adopted for trunk-based software development with minimal solution modification.

Deploying data lake ETL jobs

The following figure illustrates how we utilize CDK Pipelines to deploy data lake infrastructure and ETL jobs from a central deployment account. This model follows standard nomenclature from the AWS CDK. Each repository represents a cloud infrastructure code definition. This includes the pipelines construct definition. Pipelines have one or more actions, such as cloning the source code (source action) and synthesizing the stack into an AWS CloudFormation template (synth action). Each pipeline has one or more stages, such as testing and deploying. In an AWS CDK app context, the pipelines construct is a stack like any other stack. Therefore, when the AWS CDK app is deployed, a new pipeline is created in AWS CodePipeline.

This provides incredible flexibility regarding DevOps. In other words, as a developer with an understanding of AWS CDK APIs, you can harness the power and scalability of AWS services such as CodePipeline, AWS CodeBuild, and AWS CloudFormation.

Deploying data lake ETL jobs using CDK Pipelines

Here are a few important details:

The DevOps administrator checks in the code to the repository.
The DevOps administrator (with elevated access) facilitates a one-time manual deployment on a target environment. Elevated access includes administrative privileges on the central deployment account and target AWS environments.
CodePipeline periodically listens to commit events on the source code repositories. This is the self-mutating nature of CodePipeline. It’s configured to work with and can update itself according to the provided definition.
Code changes made to the main repo branch are automatically deployed to the data lake dev environment.
Code changes to the repo test branch are automatically deployed to the test environment.
Code changes to the repo prod branch are automatically deployed to the prod environment.

CDK Pipelines starter kits for data lakes

Want to get going quickly with CDK Pipelines for your data lake? Start by cloning our two GitHub repositories. Here is a summary:

AWS CDK Pipelines for Data Lake Infrastructure Deployment

This repository contains the following reusable resources:

CDK Application
CDK Pipelines stack
CDK Pipelines deploy stage
Amazon VPC stack
Amazon S3 stack

It also contains the following automation scripts:

AWS environments configuration
Deployment account bootstrapping
Target account bootstrapping
Account secrets configuration (e.g., GitHub access tokens)

AWS CDK Pipelines for Data Lake ETL Deployment

This repository contains the following reusable resources:

CDK Application
CDK Pipelines stack
CDK Pipelines deploy stage
Amazon DynamoDB stack
AWS Glue stack
AWS Step Functions stack

It also contains the following:

AWS Lambda scripts
AWS Glue scripts
AWS Step Functions State machine script

Advantages

This section summarizes some of the advantages offered by this solution.

Scalable and centralized deployment model

We utilize a scalable and centralized deployment model to deliver end-to-end automation. This allows DevOps and data engineers to use the single responsibility principal while maintaining precise control over the deployment strategy and code quality. The model can readily be expanded to more accounts, and the pipelines are responsive to custom controls within each environment, such as a production approval layer.

Configuration-driven deployment

Configuration in the source code and AWS Secrets Manager allow deployments to utilize targeted values that are declared globally in a single location. This provides consistent management of global configurations and dependencies such as resource names, AWS account Ids, Regions, and VPC CIDR ranges. Similarly, the CDK Pipelines export outputs from CloudFormation stacks for later consumption via other resources.

Repeatable and consistent deployment of new ETL jobs

Continuous integration and continuous delivery (CI/CD) pipelines allow teams to deploy to production more frequently. Code changes can be safely and securely propagated through environments and released for deployment. This allows rapid iteration on data processing jobs, and these jobs can be changed in isolation from pipeline changes, resulting in reliable workflows.

Cleaning up

You may delete the resources provisioned by utilizing the starter kits. You can do this by running the cdk destroy command using AWS CDK Toolkit. For detailed instructions, refer to the Clean up sections in the starter kit README files.

Conclusion

In this post, we showed how to utilize CDK Pipelines to deploy infrastructure and data processing ETL jobs of your data lake in dev, test, and production AWS environments. We provided two GitHub repositories for you to test and realize the full benefits of this solution first hand. We encourage you to fork the repositories, bring your ETL scripts, bootstrap your accounts, configure account parameters, and continuously delivery your data lake ETL jobs.

Let’s stay in touch via the GitHub—AWS CDK Pipelines for Data Lake Infrastructure Deployment and AWS CDK Pipelines for Data Lake ETL Deployment.

About the authors

Ravi Itha

Ravi Itha is a Sr. Data Architect at AWS. He works with customers to design and implement Data Lakes, Analytics, and Microservices on AWS. He is an open-source committer and has published more than a dozen solutions using AWS CDK, AWS Glue, AWS Lambda, AWS Step Functions, Amazon ECS, Amazon MQ, Amazon SQS, Amazon Kinesis Data Streams, and Amazon Kinesis Data Analytics for Apache Flink. His solutions can be found at his GitHub handle. Outside of work, he is passionate about books, cooking, movies, and yoga.

Isaiah Grant

Isaiah Grant is a Cloud Consultant at 2nd Watch. His primary function is to design architectures and build cloud-based applications and services. He leads customer engagements and helps customers with enterprise cloud adoptions. In his free time, he is engaged in local community initiatives and enjoys being outdoors with his family.

Zahid Ali

Zahid Ali is a Data Architect at AWS. He helps customers design, develop, and implement data warehouse and Data Lake solutions on AWS. Outside of work he enjoys playing tennis, spending time outdoors, and traveling.

Secure and analyse your Terraform code using AWS CodeCommit, AWS CodePipeline, AWS CodeBuild and tfsec

2021-07-30 César Prieto Ballester

Post Syndicated from César Prieto Ballester original https://aws.amazon.com/blogs/devops/secure-and-analyse-your-terraform-code-using-aws-codecommit-aws-codepipeline-aws-codebuild-and-tfsec/

Introduction

More and more customers are using Infrastructure-as-Code (IaC) to design and implement their infrastructure on AWS. This is why it is essential to have pipelines with Continuous Integration/Continuous Deployment (CI/CD) for infrastructure deployment. HashiCorp Terraform is one of the popular IaC tools for customers on AWS.

In this blog, I will guide you through building a CI/CD pipeline on AWS to analyze and identify possible configurations issues in your Terraform code templates. This will help mitigate security risks within our infrastructure deployment pipelines as part of our CI/CD. To do this, we utilize AWS tools and the Open Source tfsec tool, a static analysis security scanner for your Terraform code, including more than 90 preconfigured checks with the ability to add custom checks.

Solutions Overview

The architecture goes through a CI/CD pipeline created on AWS using AWS CodeCommit, AWS CodePipeline, AWS CodeBuild, and Amazon ECR.

Our demo has two separate pipelines:

CI/CD Pipeline to build and push our custom Docker image to Amazon ECR
CI/CD Pipeline where our tfsec analysis is executed and Terraform provisions infrastructure

The tfsec configuration and Terraform goes through a buildspec specification file defined within an AWS CodeBuild action. This action will calculate how many potential security risks we currently have within our Terraform templates, which will be displayed in our manual acceptance process for verification.

Architecture diagram

Provisioning the infrastructure

We have created an AWS Cloud Development Kit (AWS CDK) app hosted in a Git Repository written in Python. Here you can deploy the two main pipelines in order to manage this scenario. For a list of the deployment prerequisites, see the README.md file.

Clone the repo in your local machine. Then, bootstrap and deploy the CDK stack:

git clone https://github.com/aws-samples/aws-cdk-tfsec
cd aws-cdk-tfsec
pip install -r requirements.txt
cdk bootstrap aws://account_id/eu-west-1
cdk deploy --all

The infrastructure creation takes around 5-10 minutes due the AWS CodePipelines and referenced repository creation. Once the CDK has deployed the infrastructure, clone the two new AWS CodeCommit repos that have already been created and push the example code. First, one for the custom Docker image, and later for your Terraform code, like this:

git clone https://git-codecommit.eu-west-1.amazonaws.com/v1/repos/awsome-terraform-example-container
cd awsome-terraform-example-container
git checkout -b main
cp repos/docker_image/* .
git add .
git commit -am "First commit"
git push origin main

Once the Docker image is built and pushed to the Amazon ECR, proceed with Terraform repo. Check the pipeline process on the AWS CodePipeline console.

Screenshot of CI/CD Pipeline to build Docker Image

git clone https://git-codecommit.eu-west-1.amazonaws.com/v1/repos/awsome-terraform-example
cd awsome-terraform-example
git checkout -b main
cp -aR repos/terraform_code/* .
git add .
git commit -am "First commit"
git push origin main

The Terraform provisioning AWS CodePipeline has the following aspect:

Screenshot of CodePipeline to run security and orchestrate IaC

The pipeline has three main stages:

Source – AWS CodeCommit stores the Terraform repository infrastructure and every time we push code to the main branch the AWS CodePipeline will be triggered.
tfsec analysis – AWS CodeBuild looks for a buildspec to execute the tfsec actions configured on the same buildspec.

Screenshot showing tfsec analysis

The output shows the potential security issues detected by tfsec for our Terraform code. The output is linking to the different security issues already defined on tfsec. Check the security checks defined by tfsec here. After tfsec execution, a manual approval action is set up to decide if we should go for the next steps or if we reject and stop the AWS CodePipeline execution.

The URL for review is linking to our tfsec output console.

Screenshot of tfsec output

Terraform plan and Terraform apply – This will be applied to our infrastructure plan. After the Terraform plan command and before the Terraform apply, a manual action is set up to decide if we can apply the changes.

After going through all of the stages, our Terraform infrastructure should be created.

Clean up

After completing your demo, feel free to delete your stack using the CDK cli:

cdk destroy --all

Conclusion

At AWS, security is our top priority. This post demonstrates how to build a CI/CD pipeline by using AWS Services to automate and secure your infrastructure as code via Terraform and tfsec.

Learn more about tfsec through the official documentation: https://tfsec.dev/

About the authors

César Prieto Ballester is a DevOps Consultant at Amazon Web Services. He enjoys automating everything and building infrastructure using code. Apart from work, he plays electric guitar and loves riding his mountain bike.

Bruno Bardelli is a Senior DevOps Consultant at Amazon Web Services. He loves to build applications and in his free time plays video games, practices aikido, and goes on walks with his dog.

Blue/Green deployment with AWS Developer tools on Amazon EC2 using Amazon EFS to host application source code

2021-07-28 Rakesh Singh

Post Syndicated from Rakesh Singh original https://aws.amazon.com/blogs/devops/blue-green-deployment-with-aws-developer-tools-on-amazon-ec2-using-amazon-efs-to-host-application-source-code/

Many organizations building modern applications require a shared and persistent storage layer for hosting and deploying data-intensive enterprise applications, such as content management systems, media and entertainment, distributed applications like machine learning training, etc. These applications demand a centralized file share that scales to petabytes without disrupting running applications and remains concurrently accessible from potentially thousands of Amazon EC2 instances.

Simultaneously, customers want to automate the end-to-end deployment workflow and leverage continuous methodologies utilizing AWS developer tools services for performing a blue/green deployment with zero downtime. A blue/green deployment is a deployment strategy wherein you create two separate, but identical environments. One environment (blue) is running the current application version, and one environment (green) is running the new application version. The blue/green deployment strategy increases application availability by generally isolating the two application environments and ensuring that spinning up a parallel green environment won’t affect the blue environment resources. This isolation reduces deployment risk by simplifying the rollback process if a deployment fails.

Amazon Elastic File System (Amazon EFS) provides a simple, scalable, and fully-managed elastic NFS file system for use with AWS Cloud services and on-premises resources. It scales on demand, thereby eliminating the need to provision and manage capacity in order to accommodate growth. Utilize Amazon EFS to create a shared directory that stores and serves code and content for numerous applications. Your application can treat a mounted Amazon EFS volume like local storage. This means you don’t have to deploy your application code every time the environment scales up to multiple instances to distribute load.

In this blog post, I will guide you through an automated process to deploy a sample web application on Amazon EC2 instances utilizing Amazon EFS mount to host application source code, and utilizing a blue/green deployment with AWS code suite services in order to deploy the application source code with no downtime.

How this solution works

This blog post includes a CloudFormation template to provision all of the resources needed for this solution. The CloudFormation stack deploys a Hello World application on Amazon Linux 2 EC2 Instances running behind an Application Load Balancer and utilizes Amazon EFS mount point to store the application content. The AWS CodePipeline project utilizes AWS CodeCommit as the version control, AWS CodeBuild for installing dependencies and creating artifacts, and AWS CodeDeploy to conduct deployment on EC2 instances running in an Amazon EC2 Auto Scaling group.

Figure 1 below illustrates our solution architecture.

Figure 1: Sample solution architecture

The event flow in Figure 1 is as follows:

A developer commits code changes from their local repo to the CodeCommit repository. The commit triggers CodePipeline execution.
CodeBuild execution begins to compile source code, install dependencies, run custom commands, and create deployment artifact as per the instructions in the Build specification reference file.
During the build phase, CodeBuild copies the source-code artifact to Amazon EFS file system and maintains two different directories for current (green) and new (blue) deployments.
After successfully completing the build step, CodeDeploy deployment kicks in to conduct a Blue/Green deployment to a new Auto Scaling Group.
During the deployment phase, CodeDeploy mounts the EFS file system on new EC2 instances as per the CodeDeploy AppSpec file reference and conducts other deployment activities.
After successful deployment, a Lambda function triggers in order to store a deployment environment parameter in Systems Manager parameter store. The parameter stores the current EFS mount name that the application utilizes.
The AWS Lambda function updates the parameter value during every successful deployment with the current EFS location.

Prerequisites

For this walkthrough, the following are required:

An AWS account
Access to an AWS account with administrator or PowerUser (or equivalent) AWS Identity and Access Management(IAM) role policies attached
Git Command Line installed and configured in your local environment

Deploy the solution

Once you’ve assembled the prerequisites, download or clone the GitHub repo and store the files on your local machine. Utilize the commands below to clone the repo:

mkdir -p ~/blue-green-sample/
cd ~/blue-green-sample/
git clone https://github.com/aws-samples/blue-green-deployment-pipeline-for-efs

Once completed, utilize the following steps to deploy the solution in your AWS account:

Create a private Amazon Simple Storage Service (Amazon S3) bucket by using this documentation

Figure 2: AWS S3 console view when creating a bucket
Upload the cloned or downloaded GitHub repo files to the root of the S3 bucket. the S3 bucket objects structure should look similar to Figure 3:

Figure 3: AWS S3 bucket object structure
Go to the S3 bucket and select the template name solution-stack-template.yml, and then copy the object URL.
Open the CloudFormation console. Choose the appropriate AWS Region, and then choose Create Stack. Select With new resources.
Select Amazon S3 URL as the template source, paste the object URL that you copied in Step 3, and then choose Next.
On the Specify stack details page, enter a name for the stack and provide the following input parameter. Modify the default values for other parameters in order to customize the solution for your environment. You can leave everything as default for this walkthrough.

ArtifactBucket– The name of the S3 bucket that you created in the first step of the solution deployment. This is a mandatory parameter with no default value.

Figure 4: Defining the stack name and input parameters for the CloudFormation stack

Choose Next.
On the Options page, keep the default values and then choose Next.
On the Review page, confirm the details, acknowledge that CloudFormation might create IAM resources with custom names, and then choose Create Stack.
Once the stack creation is marked as CREATE_COMPLETE, the following resources are created:

A virtual private cloud (VPC) configured with two public and two private subnets.
NAT Gateway, an EIP address, and an Internet Gateway.
Route tables for private and public subnets.
Auto Scaling Group with a single EC2 Instance.
Application Load Balancer and a Target Group.
Three security groups—one each for ALB, web servers, and EFS file system.
Amazon EFS file system with a mount target for each Availability Zone.
CodePipeline project with CodeCommit repository, CodeBuild, and CodeDeploy resources.
SSM parameter to store the environment current deployment status.
Lambda function to update the SSM parameter for every successful pipeline execution.
Required IAM Roles and policies.

Note: It may take anywhere from 10-20 minutes to complete the stack creation.

Test the solution

Now that the solution stack is deployed, follow the steps below to test the solution:

Validate CodePipeline execution status

After successfully creating the CloudFormation stack, a CodePipeline execution automatically triggers to deploy the default application code version from the CodeCommit repository.

In the AWS console, choose Services and then CloudFormation. Select your stack name. On the stack Outputs tab, look for the CodePipelineURL key and click on the URL.
Validate that all steps have successfully completed. For a successful CodePipeline execution, you should see something like Figure 5. Wait for the execution to complete in case it is still in progress.

Figure 5: CodePipeline console showing execution status of all stages

Validate the Website URL

After completing the pipeline execution, hit the website URL on a browser to check if it’s working.

On the stack Outputs tab, look for the WebsiteURL key and click on the URL.
For a successful deployment, it should open a default page similar to Figure 6.

Figure 6: Sample “Hello World” application (Green deployment)

Validate the EFS share

After the website deployed successfully, we will get into the application server and validate the EFS mount point and the application source code directory.

Open the Amazon EC2 console, and then choose Instances in the left navigation pane.
Select the instance named bg-sample and choose
For Connection method, choose Session Manager, and then choose connect

After the connection is made, run the following bash commands to validate the EFS mount and the deployed content. Figure 7 shows a sample output from running the bash commands.

sudo df –h | grep efs
ls –la /efs/green
ls –la /var/www/

Figure 7: Sample output from the bash command (Green deployment)

Deploy a new revision of the application code

After verifying the application status and the deployed code on the EFS share, commit some changes to the CodeCommit repository in order to trigger a new deployment.

On the stack Outputs tab, look for the CodeCommitURL key and click on the corresponding URL.
Click on the file html.
Click on
Uncomment line 9 and comment line 10, so that the new lines look like those below after the changes:

background-color: #0188cc; 
#background-color: #90ee90;

Add Author name, Email address, and then choose Commit changes.

After you commit the code, the CodePipeline triggers and executes Source, Build, Deploy, and Lambda stages. Once the execution completes, hit the Website URL and you should see a new page like Figure 8.

Figure 8: New Application version (Blue deployment)

On the EFS side, the application directory on the new EC2 instance now points to /efs/blue as shown in Figure 9.

Figure 9: Sample output from the bash command (Blue deployment)

Solution review

Let’s review the pipeline stages details and what happens during the Blue/Green deployment:

1) Build stage

For this sample application, the CodeBuild project is configured to mount the EFS file system and utilize the buildspec.yml file present in the source code root directory to run the build. Following is the sample build spec utilized in this solution:

version: 0.2
phases:
  install:
    runtime-versions:
      php: latest   
  build:
    commands:
      - current_deployment=$(aws ssm get-parameter --name $SSM_PARAMETER --query "Parameter.Value" --region $REGION --output text)
      - echo $current_deployment
      - echo $SSM_PARAMETER
      - echo $EFS_ID $REGION
      - if [[ "$current_deployment" == "null" ]]; then echo "this is the first GREEN deployment for this project" ; dir='/efs/green' ; fi
      - if [[ "$current_deployment" == "green" ]]; then dir='/efs/blue' ; else dir='/efs/green' ; fi
      - if [ ! -d $dir ]; then  mkdir $dir >/dev/null 2>&1 ; fi
      - echo $dir
      - rsync -ar $CODEBUILD_SRC_DIR/ $dir/
artifacts:
  files:
      - '**/*'

During the build job, the following activities occur:

Installs latest php runtime version.
Reads the SSM parameter value in order to know the current deployment and decide which directory to utilize. The SSM parameter value flips between green and blue for every successful deployment.
Synchronizes the latest source code to the EFS mount point.
Creates artifacts to be utilized in subsequent stages.

Note: Utilize the default buildspec.yml as a reference and customize it further as per your requirement. See this link for more examples.

2) Deploy Stage

The solution is utilizing CodeDeploy blue/green deployment type for EC2/On-premises. The deployment environment is configured to provision a new EC2 Auto Scaling group for every new deployment in order to deploy the new application revision. CodeDeploy creates the new Auto Scaling group by copying the current one. See this link for more details on blue/green deployment configuration with CodeDeploy. During each deployment event, CodeDeploy utilizes the appspec.yml file to run the deployment steps as per the defined life cycle hooks. Following is the sample AppSpec file utilized in this solution.

version: 0.0
os: linux
hooks:
  BeforeInstall:
    - location: scripts/install_dependencies
      timeout: 180
      runas: root
  AfterInstall:
    - location: scripts/app_deployment
      timeout: 180
      runas: root
  BeforeAllowTraffic :
     - location: scripts/check_app_status
       timeout: 180
       runas: root

Note: The scripts mentioned in the AppSpec file are available in the scripts directory of the CodeCommit repository. Utilize these sample scripts as a reference and modify as per your requirement.

For this sample, the following steps are conducted during a deployment:

BeforeInstall:
- Installs required packages on the EC2 instance.
- Mounts the EFS file system.
- Creates a symbolic link to point the apache home directory /var/www/html to the appropriate EFS mount point. It also ensures that the new application version deploys to a different EFS directory without affecting the current running application.

AfterInstall:
- Stops apache web server.
- Fetches current EFS directory name from Systems Manager.
- Runs some clean up commands.
- Restarts apache web server.

BeforeAllowTraffic:
- Checks application status if running fine.
- Exits the deployment with error if the app returns a non 200 HTTP status code.

3) Lambda Stage

After completing the deploy stage, CodePipeline triggers a Lambda function in order to update the SSM parameter value with the updated EFS directory name. This parameter value alternates between “blue” and “green” to help CodePipeline identify the right EFS file system path during the next deployment.

CodeDeploy Blue/Green deployment

Let’s review the sequence of events flow during the CodeDeploy deployment:

CodeDeploy creates a new Auto Scaling group by copying the original one.
Provisions a replacement EC2 instance in the new Auto Scaling Group.
Conducts the deployment on the new instance as per the instructions in the yml file.
Sets up health checks and redirects traffic to the new instance.
Terminates the original instance along with the Auto Scaling Group.
After completing the deployment, it should appear as shown in Figure 10.

AWS CodeDeploy console view of a Blue/Green CodeDeploy deployment on Ec2

Figure 10: AWS console view of a Blue/Green CodeDeploy deployment on Ec2

Troubleshooting

To troubleshoot any service-related issues, see the following links:

More information

Now that you have tested the solution, here are some additional points worth noting:

The sample template and code utilized in this blog can work in any AWS region and are mainly intended for demonstration purposes. Utilize the sample as a reference and modify it further as per your requirement.
This solution works with single account, Region, and VPC combination.
For this sample, we have utilized AWS CodeCommit as version control, but you can also utilize any other source supported by AWS CodePipeline like Bitbucket, GitHub, or GitHub Enterprise Server

Clean up

Follow these steps to delete the components and avoid any future incurring charges:

Open the AWS CloudFormation console.
On the Stacks page in the CloudFormation console, select the stack that you created for this blog post. The stack must be currently running.
In the stack details pane, choose Delete.
Select Delete stack when prompted.
Empty and delete the S3 bucket created during deployment step 1.

Conclusion

In this blog post, you learned how to set up a complete CI/CD pipeline for conducting a blue/green deployment on EC2 instances utilizing Amazon EFS file share as mount point to host application source code. The EFS share will be the central location hosting your application content, and it will help reduce your overall deployment time by eliminating the need for deploying a new revision on every EC2 instance local storage. It also helps to preserve any dynamically generated content when the life of an EC2 instance ends.

Author bio

Rakesh Singh

Rakesh is a Senior Technical Account Manager at Amazon. He loves automation and enjoys working directly with customers to solve complex technical issues and provide architectural guidance. Outside of work, he enjoys playing soccer, singing karaoke, and watching thriller movies.

Implementing Multi-Region Disaster Recovery Using Event-Driven Architecture

2021-07-27 Vaibhav Shah

Post Syndicated from Vaibhav Shah original https://aws.amazon.com/blogs/architecture/implementing-multi-region-disaster-recovery-using-event-driven-architecture/

In this blog post, we share a reference architecture that uses a multi-Region active/passive strategy to implement a hot standby strategy for disaster recovery (DR).

We highlight the benefits of performing DR failover using event-driven, serverless architecture, which provides high reliability, one of the pillars of AWS Well Architected Framework.

With the multi-Region active/passive strategy, your workloads operate in primary and secondary Regions with full capacity. The main traffic flows through the primary and the secondary Region acts as a recovery Region in case of a disaster event. This makes your infrastructure more resilient and highly available and allows business continuity with minimal impact on production workloads. This blog post aligns with the Disaster Recovery Series that explains various DR strategies that you can implement based on your goals for recovery time objectives (RTO), recovery point objectives (RPO), and cost.

Figure 1. DR strategies

Keeping RTO and RPO low

DR allows you to recover from various unforeseen failures that may make a Region unusable, including human errors causing misconfiguration, technical failures, natural disasters, etc. DR also mitigates the impact of disaster events and improves resiliency, which keeps Service Level Agreements high with minimum impact on business continuity.

As shown in Figure 2, the multi-Region active-passive strategy switches request traffic from primary to secondary Region via DNS records via Amazon Route 53 routing policies. This keeps RTO and RPO low.

Figure 2. DR implementation architecture on multi-Region active/passive workloads

Deploying your multi-Region workload with AWS CodePipeline

In the multi-Region active/passive strategy, your workload handles full capacity in primary and secondary AWS Regions using AWS CloudFormation. By using AWS CodePipeline, one deploy stage within the pipeline will deploy the stack to the primary Region (Figure 3). After that, the same stack is copied to the secondary Region.

The workloads in the primary and secondary Regions will be treated as two different environments. However, they will run the same version of your application for consistency and availability in event of a failure.

Figure 3. Deploying new versions to Lambda using CodePipeline and CloudFormation in two Regions

Fail over with event-driven serverless architecture

The event-driven serverless architecture performs failover by updating the weights of the Route 53 record. This shifts the traffic flow from the primary to the secondary Region. This operation specifies the source Region from where the failover is happening to the destination Region.

For a given application, there will be two Route 53 records with the same name. The two records will point at two different endpoints for the application deployed in two different Regions.

The record will use a weighted policy with the weight as 100 for the record pointing at the endpoint in the primary Region. This means that all the request traffic will be served by the endpoint in the primary Region. Similarly, the second record will have weight as 0 and will be pointing to the endpoint in the secondary Region. This means none of the request traffic will reach that endpoint. This process can be repeated for multiple API endpoints. The information about the applications, like DNS records, endpoints, hosted zone IDs, Regions, and weights will all be stored in a DynamoDB table.

Then the Amazon API Gateway calls an AWS Lambda function that scans each item in an Amazon DynamoDB table. The API Gateway also updates the weighted policy of the Route 53 record and the DynamoDB table weight attribute.

The API Gateway, Lambda, function and global DynamoDB table will all be deployed in the primary and secondary Regions.

Ensuring workload availability after a disaster

In the event of disaster, the data in the affected Region must be available in the recovery Region. In this section, we talk about fail over of databases like DynamoDB tables and Amazon Relational Database Service (Amazon RDS) databases.

Global DynamoDB tables

If you have two tables coexisting in two different Regions, any changes made to the table in the primary Region will be replicated to the secondary Region and vice versa. Once failover occurs, the request traffic moves through the recovery Region and connects to its databases, meaning your data and workloads are still working and available.

Amazon RDS database

Amazon RDS does not offer the same failover features that DynamoDB tables do. However, Amazon Aurora does offer a replication feature to read replicas in other Regions.

To use this feature, you’ll create an RDS database in the primary Region and enable backup replication in the configuration. In case of a DR event, you can choose to restore the replicated backup on the Amazon RDS instance in the destination Region.

Approach

Initiating DR

The DR process can be initiated manually or automatically based on certain metrics like status checks, error rates, etc. If the established thresholds are reached for these metrics, it signifies the workloads in the primary Region are failing.

You can initiate the DR process automatically by invoking an API call that can initiate backend automation. This allows you to measure how resilient and reliable your workload is and how quickly you can switch traffic to another Region if a real disaster happens.

Failover

The failover process is initiated by the API Gateway invoking a Lambda function, as shown on the left side of Figure 2. The Lambda function then performs failover by updating the weights of the Route 53 and DynamoDB table records. Similar steps can be performed for failing over database endpoints.

Once failover is complete, you’ll want to monitor traffic using Amazon CloudWatch. The List the available CloudWatch metrics for your instances user guide provides common metrics for you to monitor for Amazon Elastic Compute Cloud (Amazon EC2). The Amazon ECS CloudWatch metrics user guide provides common metrics to monitor for Amazon Elastic Container Service (Amazon ECS).

Failback

Once failover is successful and you’ve proven that traffic is being successfully routed to the new Region, you’ll failback to the primary Region. Similar metrics can be monitored in the secondary Region like you did in the Failover section.

Testing and results

Regularly test your DR process and evaluate the results and metrics. Based on the success and misses, you can make nearly continuous improvements in the DR process.

The critical applications within your organization will likely change over time, so it is important to evaluate which applications are mission critical and require an active/passive DR strategy. If an application is no longer mission critical, another DR strategy may be more appropriate.

Conclusion

The multi-Region active/passive strategy is one way to implement DR for applications hosted on AWS. It can fail over several applications in a short period of time by using serverless capabilities.

By using this strategy, your applications will be highly available and resilient to issues impacting Regional failures. It provides high business continuity, limits losses due to revenue reduction, and your customers will see minimum impact on performance efficiency, which is one of the pillars of AWS Well Architected Framework. By using this strategy, you can significantly reduce DR time by trading lower RTO and RPO for higher costs for critical applications.

Related information

Use the Snyk CLI to scan Python packages using AWS CodeCommit, AWS CodePipeline, and AWS CodeBuild

2021-07-27 BK Das

Post Syndicated from BK Das original https://aws.amazon.com/blogs/devops/snyk-cli-scan-python-codecommit-codepipeline-codebuild/

One of the primary advantages of working in the cloud is achieving agility in product development. You can adopt practices like continuous integration and continuous delivery (CI/CD) and GitOps to increase your ability to release code at quicker iterations. Development models like these demand agility from security teams as well. This means your security team has to provide the tooling and visibility to developers for them to fix security vulnerabilities as quickly as possible.

Vulnerabilities in cloud-native applications can be roughly classified into infrastructure misconfigurations and application vulnerabilities. In this post, we focus on enabling developers to scan vulnerable data around Python open-source packages using the Snyk Command Line Interface (CLI).

The world of package dependencies

Traditionally, code scanning is performed by the security team; they either ship the code to the scanning instance, or in some cases ship it to the vendor for vulnerability scanning. After the vendor finishes the scan, the results are provided to the security team and forwarded to the developer. The end-to-end process of organizing the repositories, sending the code to security team for scanning, getting results back, and remediating them is counterproductive to the agility of working in the cloud.

Let’s take an example of package A, which uses package B and C. To scan package A, you scan package B and C as well. Similar to package A having dependencies on B and C, packages B and C can have their individual dependencies too. So the dependencies for each package get complex and cumbersome to scan over time. The ideal method is to scan all the dependencies in one go, without having manual intervention to understand the dependencies between packages.

Building on the foundation of GitOps and Gitflow

GitOps was introduced in 2017 by Weaveworks as a DevOps model to implement continuous deployment for cloud-native applications. It focuses on the developer ability to ship code faster. Because security is a non-negotiable piece of any application, this solution includes security as part of the deployment process. We define the Snyk scanner as declarative and immutable AWS Cloud Development Kit (AWS CDK) code, which instructs new Python code committed to the repository to be scanned.

Another continuous delivery practice that we base this solution on is Gitflow. Gitflow is a strict branching model that enables project release by enforcing a framework for managing Git projects. As a brief introduction on Gitflow, typically you have a main branch, which is the code sent to production, and you have a development branch where new code is committed. After the code in development branch passes all tests, it’s merged to the main branch, thereby becoming the code in production. In this solution, we aim to provide this scanning capability in all your branches, providing security observability through your entire Gitflow.

AWS services used in this solution

We use the following AWS services as part of this solution:

AWS CDK – The AWS CDK is an open-source software development framework to define your cloud application resources using familiar programming languages. In this solution, we use Python to write our AWS CDK code.
AWS CodeBuild – CodeBuild is a fully managed build service in the cloud. CodeBuild compiles your source code, runs unit tests, and produces artifacts that are ready to deploy. CodeBuild eliminates the need to provision, manage, and scale your own build servers.
AWS CodeCommit – CodeCommit is a fully managed source control service that hosts secure Git-based repositories. It makes it easy for teams to collaborate on code in a secure and highly scalable ecosystem. CodeCommit eliminates the need to operate your own source control system or worry about scaling its infrastructure. You can use CodeCommit to securely store anything from source code to binaries, and it works seamlessly with your existing Git tools.
AWS CodePipeline – CodePipeline is a continuous delivery service you can use to model, visualize, and automate the steps required to release your software. You can quickly model and configure the different stages of a software release process. CodePipeline automates the steps required to release your software changes continuously.
Amazon EventBridge – EventBridge rules deliver a near-real-time stream of system events that describe changes in AWS resources. With simple rules that you can quickly set up, you can match events and route them to one or more target functions or streams.
AWS Systems Manager Parameter Store – Parameter Store, a capability of AWS Systems Manager, provides secure, hierarchical storage for configuration data management and secrets management. You can store data such as passwords, database strings, Amazon Machine Image (AMI) IDs, and license codes as parameter values.

Prerequisites

Before you get started, make sure you have the following prerequisites:

An AWS account (use a Region that supports CodeCommit, CodeBuild, Parameter Store, and CodePipeline)
A Snyk account
An existing CodeCommit repository you want to test on

Architecture overview

After you complete the steps in this post, you will have a working pipeline that scans your Python code for open-source vulnerabilities.

We use the Snyk CLI, which is available to customers on all plans, including the Free Tier, and provides the ability to programmatically scan repositories for vulnerabilities in open-source dependencies as well as base image recommendations for container images. The following reference architecture represents a general workflow of how Snyk performs the scan in an automated manner. The design uses DevSecOps principles of automation, event-driven triggers, and keeping humans out of the loop for its run.

As developers keep working on their code, they continue to commit their code to the CodeCommit repository. Upon each commit, a CodeCommit API call is generated, which is then captured using the EventBridge rule. You can customize this event rule for a specific event or feature branch you want to trigger the pipeline for.

When the developer commits code to the specified branch, that EventBridge event rule triggers a CodePipeline pipeline. This pipeline has a build stage using CodeBuild. This stage interacts with the Snyk CLI, and uses the token stored in Parameter Store. The Snyk CLI uses this token as authentication and starts scanning the latest code committed to the repository. When the scan is complete, you can review the results on the Snyk console.

This code is built for Python pip packages. You can edit the buildspec.yml to incorporate for any other language that Snyk supports.

The following diagram illustrates our architecture.

snyk architecture codepipeline

Code overview

The code in this post is written using the AWS CDK in Python. If you’re not familiar with the AWS CDK, we recommend reading Getting started with AWS CDK before you customize and deploy the code.

Repository URL: https://github.com/aws-samples/aws-cdk-codecommit-snyk

This AWS CDK construct uses the Snyk CLI within the CodeBuild job in the pipeline to scan the Python packages for open-source package vulnerabilities. The construct uses CodePipeline to create a two-stage pipeline: one source, and one build (the Snyk scan stage). The construct takes the input of the CodeCommit repository you want to scan, the Snyk organization ID, and Snyk auth token.

Resources deployed

This solution deploys the following resources:

An EventBridge rule
A CodeBuild project
Four AWS Identity and Access Management (IAM) roles with inline policies
A CodePipeline pipeline
An Amazon Simple Storage Service (Amazon S3) bucket
An AWS Key Management Service (AWS KMS) key and alias

For the deployment, we use the AWS CDK construct in the codebase cdk_snyk_construct/cdk_snyk_construct_stack.py in the AWS CDK stack cdk-snyk-stack. The construct requires the following parameters:

ARN of the CodeCommit repo you want to scan
Name of the repository branch you want to be monitored
Parameter Store name of the Snyk organization ID
Parameter Store name for the Snyk auth token

Set up the organization ID and auth token before deploying the stack. Because these are confidential and sensitive data, you should deploy them as a separate stack or manual process. In this solution, the parameters have been stored as a SecureString parameter type and encrypted using the AWS-managed KMS key.

You create the organization ID and auth token on the Snyk console. On the Settings page, choose General in the navigation page to add these parameters.

snyk settings console

You can retrieve the names of the parameters on the Systems Manager console by navigating to Parameter Store and finding the name on the Overview tab.

SSM Parameter Store

Create a requirements.txt file in the CodeCommit repository

We now create a repository in CodeCommit to store the code. For simplicity, we primarily store the requirements.txt file in our repository. In Python, a requirements file stores the packages that are used. Having clearly defined packages and versions makes it easier for development, especially in virtual environments.

For more information on the requirements file in Python, see Requirement Specifiers.

To create a CodeCommit repository, run the following AWS Command Line Interface (AWS CLI) command in your AWS accounts:

aws codecommit create-repository --repository-name snyk-repo \
--repository-description "Repository for Snyk to scan Python packages"

Now let’s create a branch called main in the repository using the following command:

aws codecommit create-branch --repository-name snyk-repo \
--branch-name main

After you create the repository, commit a file named requirements.txt with the following content. The following packages are pinned to a particular version that they have a vulnerability with. This file is our hypothetical vulnerable set of packages that have been committed into your development code.

PyYAML==5.3.1
Pillow==7.1.2
pylint==2.5.3
urllib3==1.25.8

For instructions on committing files in CodeCommit, see Connect to an AWS CodeCommit repository.

When you store the Snyk auth token and organization ID in Parameter Store, note the parameter names—you need to pass them as parameters during the deployment step.

Now clone the CDK code from the GitHub repository with the command below:

git clone https://github.com/aws-samples/aws-cdk-codecommit-snyk.git

After the cloning is complete you should see a directory named aws-cdk-codecommit-snyk on your machine.

When you’re ready to deploy, enter the aws-cdk-codecommit-snyk directory, and run the following command with the appropriate values:

cdk deploy cdk-snyk-stack \
--parameters RepoName=<name-of-codecommit-repo> \
--parameters RepoBranch=<branch-to-be-scanned> \
--parameters SnykOrgId=<value> \
--parameters SnykAuthToken=<value>

After the stack deployment is complete, you can see a new pipeline in your AWS account, which is configured to be triggered every time a commit occurs on the main branch.

You can view the results of the scan on the Snyk console. After the pipeline runs, log in to snyk.io and you should see a project named as per your repository (see the following screenshot).

snyk dashboard

Choose the repo name to get a detailed view of the vulnerabilities found. Depending on what packages you put in your requirements.txt, your report will differ from the following screenshot.

snyk-vuln-details

To fix the vulnerability identified, you can change the version of these packages in the requirements.txt file. The edited requirements file should look like the following:

PyYAML==5.4
Pillow==8.2.0
pylint==2.6.1
urllib3==1.25.9

After you update the requirements.txt file in your repository, push your changes back to the CodeCommit repository you created earlier on the main branch. The push starts the pipeline again.

After the commit is performed to the targeted branch, you don’t see the vulnerability reported on the Snyk dashboard because the pinned version 5.4 doesn’t contain that vulnerability.

Clean up

To avoid accruing further cost for the resources deployed in this solution, run cdk destroy to remove all the AWS resources you deployed through CDK.

As the CodeCommit repository was created using AWS CLI, the following command deletes the CodeCommit repository:

aws codecommit delete-repository --repository-name snyk-repo

Conclusion

In this post, we provided a solution so developers can self- remediate vulnerabilities in their code by monitoring it through Snyk. This solution provides observability, agility, and security for your Python application by following DevOps principles.

A similar architecture has been used at NFL to shift-left the security of their code. According to the shift-left design principle, security should be moved closer to the developers to identify and remediate security issues earlier in the development cycle. NFL has implemented a similar architecture which made the total process, from committing code on the branch to remediating 15 times faster than their previous code scanning setup.

Here’s what NFL has to say about their experience:

“NFL used Snyk to scan Python packages for a service launch. Traditionally it would have taken 10days to scan the packages through our existing process but with Snyk we were able to follow DevSecOps principles and get the scans completed, and reviewed within matter of days. This simplified our time to market while maintaining visibility into our security posture.” – Joe Steinke (Director, Data Solution Architect)

Enforcing AWS CloudFormation scanning in CI/CD Pipelines at scale using Trend Micro Cloud One Conformity

2021-07-09 Chris Dorrington

Post Syndicated from Chris Dorrington original https://aws.amazon.com/blogs/devops/cloudformation-scanning-cicd-pipeline-cloud-conformity/

Integrating AWS CloudFormation template scanning into CI/CD pipelines is a great way to catch security infringements before application deployment. However, implementing and enforcing this in a multi team, multi account environment can present some challenges, especially when the scanning tools used require external API access.

This blog will discuss those challenges and offer a solution using Trend Micro Cloud One Conformity (formerly Cloud Conformity) as the worked example. Accompanying this blog is the end to end sample solution and detailed install steps which can be found on GitHub here.

We will explore explore the following topics in detail:

When to detect security vulnerabilities
- Where can template scanning be enforced?
Managing API Keys for accessing third party APIs
- How can keys be obtained and distributed between teams?
- How easy is it to rotate keys with multiple teams relying upon them?
Viewing the results easily
- How do teams easily view the results of any scan performed?
Solution maintainability
- How can a fix or update be rolled out?
- How easy is it to change scanner provider? (i.e. from Cloud Conformity to in house tool)
Enforcing the template validation
- How to prevent teams from circumventing the checks?
Managing exceptions to the rules
- How can the teams proceed with deployment if there is a valid reason for a check to fail?

When to detect security vulnerabilities

During the DevOps life-cycle, there are multiple opportunities to test cloud applications for best practice violations when it comes to security. The Shift-left approach is to move testing to as far left in the life-cycle, so as to catch bugs as early as possible. It is much easier and less costly to fix on a local developer machine than it is to patch in production.

Diagram showing Shift-left approach

Figure 1 – depicting the stages that an app will pass through before being deployed into an AWS account

At the very left of the cycle is where developers perform the traditional software testing responsibilities (such as unit tests), With cloud applications, there is also a responsibility at this stage to ensure there are no AWS security, configuration, or compliance vulnerabilities. Developers and subsequent peer reviewers looking at the code can do this by eye, but in this way it is hard to catch every piece of bad code or misconfigured resource.

For example, you might define an AWS Lambda function that contains an access policy making it accessible from the world, but this can be hard to spot when coding or peer review. Once deployed, potential security risks are now live. Without proper monitoring, these misconfigurations can go undetected, with potentially dire consequences if exploited by a bad actor.

There are a number of tools and SaaS offerings on the market which can scan AWS CloudFormation templates and detect infringements against security best practices, such as Stelligent’s cfn_nag, AWS CloudFormation Guard, and Trend Micro Cloud One Conformity. These can all be run from the command line on a developer’s machine, inside the IDE or during a git commit hook. These options are discussed in detail in Using Shift-Left to Find Vulnerabilities Before Deployment with Trend Micro Template Scanner.

Whilst this is the most left the testing can be moved, it is hard to enforce it this early on in the development process. Mandating that scan commands be integrated into git commit hooks or IDE tools can significantly increase the commit time and quickly become frustrating for the developer. Because they are responsible for creating these hooks or installing IDE extensions, you cannot guarantee that a template scan is performed before deployment, because the developer could easily turn off the scans or not install the tools in the first place.

Another consideration for very-left testing of templates is that when applications are written using AWS CDK or AWS Serverless Application Model (SAM), the actual AWS CloudFormation template that is submitted to AWS isn’t available in source control; it’s created during the build or package stage. Therefore, moving template scanning as far to the left is just not possible in these situations. Developers have to run a command such as cdk synth or sam package to obtain the final AWS CloudFormation templates.

If we now look at the far right of Figure 1, when an application has been deployed, real time monitoring of the account can pick up security issues very quickly. Conformity performs excellently in this area by providing central visibility and real-time monitoring of your cloud infrastructure with a single dashboard. Accounts are checked against over 400 best practices, which allows you to find and remediate non-compliant resources. This real time alerting is fast – you can be assured of an email stating non-compliance in no time at all! However, remediation does takes time. Following the correct process, a fix to code will need to go through the CI/CD pipeline again before a patch is deployed. Relying on account scanning only at the far right is sub-optimal.

The best place to scan templates is at the most left of the enforceable part of the process – inside the CI/CD pipeline. Conformity provides their Template Scanner API for this exact purpose. Templates can be submitted to the API, and the same Conformity checks that are being performed in real time on the account are run against the submitted AWS CloudFormation template. When integrated programmatically into a build, failing checks can prevent a deployment from occurring.

Whilst it may seem a simple task to incorporate the Template Scanner API call into a CI/CD pipeline, there are many considerations for doing this successfully in an enterprise environment. The remainder of this blog will address each consideration in detail, and the accompanying GitHub repo provides a working sample solution to use as a base in your own organization.

View failing checks as AWS CodeBuild test reports

Treating failing Conformity checks the same as unit test failures within the build will make the process feel natural to the developers. A failing unit test will break the build, and so will a failing Conformity check.

AWS CodeBuild provides test reporting for common unit test frameworks, such as NUnit, JUnit, and Cucumber. This allows developers to easily and very visually see what failing tests have occurred within their builds, allowing for quicker remediation than having to trawl through test log files. This same principle can be applied to failing Conformity checks—this allows developers to quickly see what checks have failed, rather than looking into AWS CodeBuild logs. However, the AWS CodeBuild test reporting feature doesn’t natively support the JSON schema that the Conformity Template Scanner API returns. Instead, you need custom code to turn the Conformity response into a usable format. Later in this blog we will explore how the conversion occurs.

Cloud conformity failed checks displayed as CodeBuild Reports

Figure 2 – Cloud Conformity failed checks appearing as failed test cases in AWS CodeBuild reports

Enterprise speed bumps

Teams wishing to use template scanning as part of their AWS CodePipeline currently need to create an AWS CodeBuild project that calls the external API, and then performs the custom translation code. If placed inside a buildspec file, it can easily become bloated with many lines of code, leading to maintainability issues arising as copies of the same buildspec file are distributed across teams and accounts. Additionally, third-party APIs such as Conformity are often authorized by an API key. In some enterprises, not all teams have access to the Conformity console, further compounding the problem for API key management.

Below are some factors to consider when implementing template scanning in the enterprise:

How can keys be obtained and distributed between teams?
How easy is it to rotate keys when multiple teams rely upon them?
How can a fix or update be rolled out?
How easy is it to change scanner provider? (i.e. From Cloud Conformity to in house tool)

Overcome scaling issues, use a centralized Validation API

An approach to overcoming these issues is to create a single AWS Lambda function fronted by Amazon API Gateway within your organization that runs the call to the Template Scanner API, and performs the transform of results into a format usable by AWS CodeBuild reports. A good place to host this API is within the Cloud Ops team account or similar shared services account. This way, you only need to issue one API key (stored in AWS Secrets Manager) and it’s not available for viewing by any developers. Maintainability for the code performing the Template Scanner API calls is also very easy, because it resides in one location only. Key rotation is now simple (due to only one key in one location requiring an update) and can be automated through AWS Secrets Manager

The following diagram illustrates a typical setup of a multi-account, multi-dev team scenario in which a team’s AWS CodePipeline uses a centralized Validation API to call Conformity’s Template Scanner.

architecture diagram central api for cloud conformity template scanning

Figure 3 – Example of an AWS CodePipeline utilizing a centralized Validation API to call Conformity’s Template Scanner

Providing a wrapper API around the Conformity Template Scanner API encapsulates the code required to create the CodeBuild reports. Enabling template scanning within teams’ CI/CD pipelines now requires only a small piece of code within their CodeBuild buildspec file. It performs the following three actions:

Post the AWS CloudFormation templates to the centralized Validation API
Write the results to file (which are already in a format readable by CodeBuild test reports)
Stop the build if it detects failed checks within the results

The centralized Validation API in the shared services account can be hosted with a private API in Amazon API Gateway, fronted by a VPC endpoint. Using a private API denies any public access but does allow access from any internal address allowed by the VPC endpoint security group and endpoint policy. The developer teams can run their AWS CodeBuild validation phase within a VPC, thereby giving it access to the VPC endpoint.

A working example of the code required, along with an AWS CodeBuild buildspec file, is provided in the GitHub repository

Converting 3rd party tool results to CodeBuild Report format

With a centralized API, there is now only one place where the conversion code needs to reside (as opposed to copies embedded in each teams’ CodePipeline). AWS CodeBuild Reports are primarily designed for test framework outputs and displaying test case results. In our case, we want to display Conformity checks – which are not unit test case results. The accompanying GitHub repository to convert from Conformity Template Scanner API results, but we will discuss mappings between the formats so that bespoke conversions for other 3rd party tools, such as cfn_nag can be created if required.

AWS CodeBuild provides out of the box compatibility for common unit test frameworks, such as NUnit, JUnit and Cucumber. Out of the supported formats, Cucumber JSON is the most readable format to read and manipulate due to native support in languages such as Python (all other formats being in XML).

Figure 4 depicts where the Cucumber JSON fields will appear in the AWS CodeBuild reports page and Figure 5 below shows a valid Cucumber snippet, with relevant fields highlighted in yellow.

CodeBuild Reports page with fields highlighted that correspond to cucumber JSON fields

Figure 4 – AWS CodeBuild report test case field mappings utilized by Cucumber JSON

Cucumber JSON snippet showing CodeBuild Report field mappings

Figure 5 – Cucumber JSON with mappings to AWS CodeBuild report table

Note that in Figure 5, there are additional fields (eg. id, description etc) that are required to make the file valid Cucumber JSON – even though this data is not displayed in CodeBuild Reports page. However, raw reports are still available as AWS CodeBuild artifacts, and therefore it is useful to still populate these fields with data that could be useful to aid deeper troubleshooting.

Conversion code for Conformity results is provided in the accompanying GitHub repo, within file app.py, line 376 onwards

Making the validation phase mandatory in AWS CodePipeline

The Shift-Left philosophy states that we should shift testing as much as possible to the left. The furthest left would be before any CI/CD pipeline is triggered. Developers could and should have the ability to perform template validation from their own machines. However, as discussed earlier this is rarely enforceable – a scan during a pipeline deployment is the only true way to know that templates have been validated. But how can we mandate this and truly secure the validation phase against circumvention?

Preventing updates to deployed CI/CD pipelines

Using a centralized API approach to make the call to the validation API means that this code is now only accessible by the Cloud Ops team, and not the developer teams. However, the code that calls this API has to reside within the developer teams’ CI/CD pipelines, so that it can stop the build if failures are found. With CI/CD pipelines defined as AWS CloudFormation, and without any preventative measures in place, a team could move to disable the phase and deploy code without any checks performed.

Fortunately, there are a number of approaches to prevent this from happening, and to enforce the validation phase. We shall now look at one of them from the AWS CloudFormation Best Practices.

IAM to control access

Use AWS IAM to control access to the stacks that define the pipeline, and then also to the AWS CodePipeline/AWS CodeBuild resources within them.

IAM policies can generically restrict a team from updating a CI/CD pipeline provided to them if a naming convention is used in the stacks that create them. By using a naming convention, coupled with the wildcard “*”, these policies can be applied to a role even before any pipelines have been deployed..

For example, lets assume the pipeline depicted in Figure 6 is defined and deployed in AWS CloudFormation as follows:

Stack name is “cicd-pipeline-team-X”
AWS CodePipeline resource within the stack has logical name with prefix “CodePipelineCICD”
AWS CodeBuild Project for validation phase is prefixed with “CodeBuildValidateProject”

Creating an IAM policy with the statements below and attaching to the developer teams’ IAM role will prevent them from modifying the resources mentioned above. The AWS CloudFormation stack and resource names will match the wildcards in the statements and Deny the user to any update actions.

Example IAM policy highlighting how to deny updates to stacks and pipeline resources

Figure 6 – Example of how an IAM policy can restrict updates to AWS CloudFormation stacks and deployed resources

Preventing valid failing checks from being a bottleneck

When centralizing anything, and forcing developers to use tooling or features such as template scanners, it is imperative that it (or the team owning it) does not become a bottleneck and slow the developers down. This is just as true for our centralized API solution.

It is sometimes the case that a developer team has a valid reason for a template to yield a failing check. For instance, Conformity will report a HIGH severity alert if a load balancer does not have an HTTPS listener. If a team is migrating an older application which will only work on port 80 and not 443, the team may be able to obtain an exception from their cyber security team. It would not desirable to turn off the rule completely in the real time scanning of the account, because for other deployments this HIGH severity alert could be perfectly valid. The team faces an issue now because the validation phase of their pipeline will fail, preventing them from deploying their application – even though they have cyber approval to fail this one check.

It is imperative that when enforcing template scanning on a team that it must not become a bottleneck. Functionality and workflows must accompany such a pipeline feature to allow for quick resolution.

Screenshot of Trend Micro Cloud One Conformity rule from their website

Figure 7 – Screenshot of a Conformity rule from their website

Therefore the centralized validation API must provide a way to allow for exceptions on a case by case basis. Any exception should be tied to a unique combination of AWS account number + filename + rule ID, which ensures that exceptions are only valid for the specific instance of violation, and not for any other. This can be achieved by extending the centralized API with a set of endpoints to allow for exception request and approvals. These can then be integrated into existing or new tooling and workflows to be able to provide a self service method for teams to be able to request exceptions. Cyber security teams should be able to quickly approve/deny the requests.

The exception request/approve functionality can be implemented by extending the centralized private API to provide an /exceptions endpoint, and using DynamoDB as a data store. During a build and template validation, failed checks returned from Conformity are then looked up in the Dynamo table to see if an approved exception is available – if it is, then the check is not returned as a actual failing check, but rather an exempted check. The build can then continue and deploy to the AWS account.

Figure 8 and figure 9 depict the /exceptions endpoints that are provided as part of the sample solution in the accompanying GitHub repository.

screenshot of API gateway for centralized template scanner api

Figure 8 – Screenshot of API Gateway depicting the endpoints available as part of the accompanying solution

The /exceptions endpoint methods provides the following functionality:

Table containing HTTP verbs for exceptions endpoint

Figure 9 – HTTP verbs implementing exception functionality

Important note regarding endpoint authorization: Whilst the “validate” private endpoint may be left with no auth so that any call from within a VPC is accepted, the same is not true for the “exception” approval endpoint. It would be prudent to use AWS IAM authentication available in API Gateway to restrict approvals to this endpoint for certain users only (i.e. the cyber and cloud ops team only)

With the ability to raise and approve exception requests, the mandatory scanning phase of the developer teams’ pipelines is no longer a bottleneck.

Conclusion

Enforcing template validation into multi developer team, multi account environments can present challenges with using 3rd party APIs, such as Conformity Template Scanner, at scale. We have talked through each hurdle that can be presented, and described how creating a centralized Validation API and exception approval process can overcome those obstacles and keep the teams deploying without unwarranted speed bumps.

By shifting left and integrating scanning as part of the pipeline process, this can leave the cyber team and developers sure that no offending code is deployed into an account – whether they were written in AWS CDK, AWS SAM or AWS CloudFormation.

Additionally, we talked in depth on how to use CodeBuild reports to display the vulnerabilities found, aiding developers to quickly identify where attention is required to remediate.

Getting started

The blog has described real life challenges and the theory in detail. A complete sample for the described centralized validation API is available in the accompanying GitHub repo, along with a sample CodePipeline for easy testing. Step by step instructions are provided for you to deploy, and enhance for use in your own organization. Figure 10 depicts the sample solution available in GitHub.

https://github.com/aws-samples/aws-cloudformation-template-scanning-with-cloud-conformity

NOTE: Remember to tear down any stacks after experimenting with the provided solution, to ensure ongoing costs are not charged to your AWS account. Notes on how to do this are included inside the repo Readme.

example codepipeline architecture provided by the accompanying github solution

Figure 10 depicts the solution available for use in the accompanying GitHub repository

Find out more

Other blog posts are available that cover aspects when dealing with template scanning in AWS:

For more information on Trend Micro Cloud One Conformity, use the links below.

Avatar for Chris Dorrington

Chris Dorrington

Chris Dorrington is a Senior Cloud Architect with AWS Professional Services in Perth, Western Australia. Chris loves working closely with AWS customers to help them achieve amazing outcomes. He has over 25 years software development experience and has a passion for Serverless technologies and all things DevOps

How Banks Can Use AWS to Meet Compliance

2021-06-29 Jiwan Panjiker

Post Syndicated from Jiwan Panjiker original https://aws.amazon.com/blogs/architecture/how-banks-can-use-aws-to-meet-compliance/

Since the 2008 financial crisis, banking supervisory institutions such as the Basel Committee on Banking Supervision (BCBS) have strengthened regulations. There is now increased oversight over the financial services industry. For banks, making the necessary changes to comply with these rules is a challenging, multi-year effort.

Basel IV, a massive update to existing rules, is due for implementation in January 2023. Basel IV standardizes the approach to calculating credit risk, increases the impact of risk-weighted assets (RWAs) and emphasizes data transparency.

Given the complexity of data, modeling, and numerous assumptions that have to be made, compliance under Basel IV implementation will be challenging. Standardization omits nuances unique to your business, which can drive up costs, but violating guidelines will result in steep penalties.

This post will address these challenges by outlining a mechanism that facilitates a healthy, data-driven dialogue between banks and regulators to better achieve compliance objectives. The reference architecture will focus on enabling fast, iterative releases with the help of serverless AWS services.

There are four key actions to take in order to support this mechanism:

Automate data management
Establish a continuous integration/continuous delivery (CI/CD) pipeline
Enable fast, point-in-time audit replays
Set up proactive monitoring and notifications

Automate data management

Due to frequent merger activity, banks are typically comprised of a web of integrated systems and siloed business units, making it difficult to consolidate data. Under Basel IV guidelines, auditors want banks to provide detailed data in a presentable way.

You can tackle this first challenge by establishing a data pipeline as shown in Figure 1. Take inventory of each data source as it is incorporated into the pipeline. Identify the critical internal and external data sources that will be used to populate the initial landing area. Amazon Simple Storage Service (S3) is a great choice for this.

Figure 1. Data pipeline that cleans, processes, and segments data

Amazon S3 is a highly available, durable service that is a popular data lake solution. S3 offers WORM storage capabilities like S3 Glacier Vault and S3 Object Lock to protect the integrity of your archived data in accordance with U.S. SEC and FINRA rules.

Basel IV regulations also require banks to use many attributes to develop accurate credit risk models. The attributes can be a mix of datasets such as financial statements, internal balanced scorecards, macro-economic data, and credit ratings. The risk models themselves can also be segmented by portfolio types, industry segments, asset types and much more.

You can split data into different domains and designate data owners with separate S3 buckets. Credit risk model developers, analyst, and data scientists can then use the structure of the S3 buckets to pull together relevant datasets. They can then store the outputs into S3 buckets.

To support fast, automated data retrieval, store object metadata in a highly scalable, and queryable database. You can set up Amazon S3 so that an event can initiate a function to populate Amazon DynamoDB. Developers can use AWS Lambda to write these functions using popular languages like Python.

With AWS Glue, you can automate Extract/Load/Transform (ETL) processes to clean and move data to the different S3 buckets. AWS Glue can also support data operations by automatically cataloging your various data sources.

Taking on a structured approach will simplify data governance and transparency as the business continues to grow and operate.

Establish a CI/CD pipeline

Adopt tools that machine learning teams can use to build a streamlined CI/CD solution as demonstrated in Figure 2.

Figure 2. An end-to-end machine learning development and deployment pipeline

Using tightly integrated AWS services, your teams can minimize time spent managing tools and deployment processes, and instead, focus on tuning the models and analyzing the results.

Amazon SageMaker brings together a powerful set of machine learning capabilities on the AWS Cloud. It helps data scientists and engineers build insightful models. Figure 2 depicts the high-level architecture and shows how Amazon SageMaker Pipelines helps teams orchestrate the automation and deployment processes.

The core of the pipeline uses a set of AWS deployment services so that your teams can collaborate and review effectively. With AWS CodeCommit, your teams can set up git-based repository to store and version models for data processing, training, and evaluation. The repository can also store code and configuration files using AWS CloudFormation for deployment. You can use AWS CodePipeline and AWS CodeBuild to create and update a model endpoint based on the approved/reviewed changes.

Any updates detected in the AWS CodeCommit repository initiate a deployment whenever a new model version is added to the Model Registry. Amazon S3 can be used to store generated model artifacts, historical data, and models.

Enable fast, point-in-time audit replays

Figure 3. Containers offer a lightweight, powerful solution to run audits using historical assets

One of the main themes of Basel IV is transparency. Figure 3 illustrates a solution to build trust with regulators by allowing them to verify and understand modeling activity.

A lightweight application is hosted in AWS Fargate and enables auditors to re-run Basel credit risk models under specified conditions. With AWS Fargate, you don’t need to manually manage instances or container orchestration. Configure the CPU or memory specifications at the task level and set guidelines around scalability for your service. Your tasks then scale up and down automatically, based on demand, and will optimize cost efficiency and availability.

Figure 3 shows the following:

The application takes inputs such as date, release version, and model type.
It then queries DynamoDB with this information.
The query will return the data necessary to retrieve model artifacts from previous CI/CD deployments and relevant datasets from historical S3 buckets.
Using this information, it can spin up as many containers as needed to run the model.
It then stores the outputs in a separate S3 bucket.
Auditors will have a detailed trace of all the attributes, assumptions, and data that went into the modeling effort. To streamline this process, the app can also compare the outputs of the historical runs to the recent replay and highlight any significant deviations.

Though internal models will be de-emphasized under Basel IV, banks will continue to run internal models as a benchmark against the broader standards. Schedule AWS Fargate tasks to run these models regularly to capitalize on highly performant compute services while minimizing costs.

Set up proactive monitoring and notifications

Figure 4. Scheduled jobs can send out notifications using Amazon SNS when certain thresholds are breached

The last principle is based around establishing an early warning system, enabling banks to take on a more proactive role in maintaining compliance.

With automated monitoring and notifications, banks will be able to respond quickly to potential concerns. For instance, there can be a daily scheduled job that launches containers and runs the models against the latest data. If any thresholds are breached, alerts can be sent out via SMS or email. Operational teams can be subscribed to certain message topics using Amazon Simple Notification Service (SNS). They can then respond before actual compliance issues emerge.

Conclusion

With a Well-Architected approach, AWS helps you control your data, deploy new features, and embrace a serverless approach. This frees you to innovate quickly and address regulatory challenges.

You can iterate with new AWS services and bring machine learning to bear on various streams of data to identify high impact pools of value. You can get a clearer picture of the data to make it easier to identify areas where you can reduce RWAs. Using Amazon S3, you can turn on AWS analytics services such as Amazon QuickSight and Amazon Athena to visualize the data. You’ll be able to fulfill reporting requirements such as those found in regulatory studies like CCAR, DFAST, CECL, and IFRS9.

For more information about establishing a data pipeline, read Lake House Formation Architecture. It is a powerful pattern that combines a few concepts that will help bring your data together cohesively. To set up a robust CI/CD pipeline, explore the AWS Serverless CI/CD Reference Architecture.

Continuous Compliance Workflow for Infrastructure as Code: Part 2

2021-06-26 DAMODAR SHENVI WAGLE

Post Syndicated from DAMODAR SHENVI WAGLE original https://aws.amazon.com/blogs/devops/continuous-compliance-workflow-for-infrastructure-as-code-part-2/

In the first post of this series, we introduced a continuous compliance workflow in which an enterprise security and compliance team can release guardrails in a continuous integration, continuous deployment (CI/CD) fashion in your organization.

In this post, we focus on the technical implementation of the continuous compliance workflow. We demonstrate how to use AWS Developer Tools to create a CI/CD pipeline that releases guardrails for Terraform application workloads.

We use the Terraform-Compliance framework to define the guardrails. Terraform-Compliance is a lightweight, security and compliance-focused test framework for Terraform to enable the negative testing capability for your infrastructure as code (IaC).

With this compliance framework, we can ensure that the implemented Terraform code follows security standards and your own custom standards. Currently, HashiCorp provides Sentinel (a policy as code framework) for enterprise products. AWS has CloudFormation Guard an open-source policy-as-code evaluation tool for AWS CloudFormation templates. Terraform-Compliance allows us to build a similar functionality for Terraform, and is open source.

This post is from the perspective of a security and compliance engineer, and assumes that the engineer is familiar with the practices of IaC, CI/CD, behavior-driven development (BDD), and negative testing.

Solution overview

You start by building the necessary resources as listed in the workload (application development team) account:

An AWS CodeCommit repository for the Terraform workload
A CI/CD pipeline built using AWS CodePipeline to deploy the workload
A cross-account AWS Identity and Access Management (IAM) role that gives the security and compliance account the permissions to pull the Terraform workload from the workload account repository for testing their guardrails in observation mode

Next, we build the resources in the security and compliance account:

A CodeCommit repository to hold the security and compliance standards (guardrails)
A CI/CD pipeline built using CodePipeline to release new guardrails
A cross-account role that gives the workload account the permissions to pull the activated guardrails from the main branch of the security and compliance account repository.

The following diagram shows our solution architecture.

solution architecture diagram

The architecture has two workflows: security and compliance (Steps 1–4) and application delivery (Steps 5–7).

When a new security and compliance guardrail is introduced into the develop branch of the compliance repository, it triggers the security and compliance pipeline.
The pipeline pulls the Terraform workload.
The pipeline tests this compliance check guardrail against the Terraform workload in the workload account repository.
If the workload is compliant, the guardrail is automatically merged into the main branch. This activates the guardrail by making it available for all Terraform application workload pipelines to consume. By doing this, we make sure that we don’t break the Terraform application deployment pipeline by introducing new guardrails. It also provides the security and compliance team visibility into the resources in the application workload that are noncompliant. The security and compliance team can then reach out to the application delivery team and suggest appropriate remediation before the new standards are activated. If the compliance check fails, the automatic merge to the main branch is stopped. The security and compliance team has an option to force merge the guardrail into the main branch if it’s deemed critical and they need to activate it immediately.
The Terraform deployment pipeline in the workload account always pulls the latest security and compliance checks from the main branch of the compliance repository.
Checks are run against the Terraform workload to ensure that it meets the organization’s security and compliance standards.
Only secure and compliant workloads are deployed by the pipeline. If the workload is noncompliant, the security and compliance checks fail and break the pipeline, forcing the application delivery team to remediate the issue and recheck-in the code.

Prerequisites

Before proceeding any further, you need to identify and designate two AWS accounts required for the solution to work:

Security and Compliance – In which you create a CodeCommit repository to hold compliance standards that are written based on Terraform-Compliance framework. You also create a CI/CD pipeline to release new compliance guardrails.
Workload – In which the Terraform workload resides. The pipeline to deploy the Terraform workload enforces the compliance guardrails prior to the deployment.

You also need to create two AWS account profiles in ~/.aws/credentials for the tools and target accounts, if you don’t already have them. These profiles need to have sufficient permissions to run an AWS Cloud Development Kit (AWS CDK) stack. They should be your private profiles and only be used during the course of this use case. Therefore, it should be fine if you want to use admin privileges. Don’t share the profile details, especially if it has admin privileges. I recommend removing the profile when you’re finished with this walkthrough. For more information about creating an AWS account profile, see Configuring the AWS CLI.

In addition, you need to generate a cucumber-sandwich.jar file by following the steps in the cucumber-sandwich GitHub repo. The JAR file is needed to generate pretty HTML compliance reports. The security and compliance team can use these reports to make sure that the standards are met.

To implement our solution, we complete the following high-level steps:

Create the security and compliance account stack.
Create the workload account stack.
Test the compliance workflow.

Create the security and compliance account stack

We create the following resources in the security and compliance account:

A CodeCommit repo to hold the security and compliance guardrails
A CI/CD pipeline to roll out the Terraform compliance guardrails
An IAM role that trusts the application workload account and allows it to pull compliance guardrails from its CodeCommit repo

In this section, we set up the properties for the pipeline and cross-account role stacks, and run the deployment scripts.

Set up properties for the pipeline stack

Clone the GitHub repo aws-continuous-compliance-for-terraform and navigate to the folder security-and-compliance-account/stacks. This contains the folder pipeline_stack/, which holds the code and properties for creating the pipeline stack.

The folder has a JSON file cdk-stack-param.json, which has the parameter TERRAFORM_APPLICATION_WORKLOADS, which represents the list of application workloads that the security and compliance pipeline pulls and runs tests against to make sure that the workloads are compliant. In the workload list, you have the following parameters:

GIT_REPO_URL – The HTTPS URL of the CodeCommit repository in the workload account against which the security and compliance check pipeline runs compliance guardrails.
CROSS_ACCOUNT_ROLE_ARN – The ARN for the cross-account role we create in the next section. This role gives the security and compliance account permissions to pull Terraform code from the workload account.

For CROSS_ACCOUNT_ROLE_ARN, replace <workload-account-id> with the account ID for your designated AWS workload account. For GIT_REPO_URL, replace <region> with AWS Region where the repository resides.

security and compliance pipeline stack parameters

Set up properties for the cross-account role stack

In the cloned GitHub repo aws-continuous-compliance-for-terraform from the previous step, navigate to the folder security-and-compliance-account/stacks. This contains the folder cross_account_role_stack/, which holds the code and properties for creating the cross-account role.

The folder has a JSON file cdk-stack-param.json, which has the parameter TERRAFORM_APPLICATION_WORKLOAD_ACCOUNTS, which represents the list of Terraform workload accounts that intend to integrate with the security and compliance account for running compliance checks. All these accounts are trusted by the security and compliance account and given permissions to pull compliance guardrails. Replace <workload-account-id> with the account ID for your designated AWS workload account.

security and compliance cross account role stack parameters

Run the deployment script

Run deploy.sh by passing the name of the AWS security and compliance account profile you created earlier. The script uses the AWS CDK CLI to bootstrap and deploy the two stacks we discussed. See the following code:

cd aws-continuous-compliance-for-terraform/security-and-compliance-account/
./deploy.sh "<AWS-COMPLIANCE-ACCOUNT-PROFILE-NAME>"

You should now see three stacks in the tools account:

CDKToolkit – AWS CDK creates the CDKToolkit stack when we bootstrap the AWS CDK app. This creates an Amazon Simple Storage Service (Amazon S3) bucket needed to hold deployment assets such as an AWS CloudFormation template and AWS Lambda code package.
cf-CrossAccountRoles – This stack creates the cross-account IAM role.
cf-SecurityAndCompliancePipeline – This stack creates the pipeline. On the Outputs tab of the stack, you can find the CodeCommit source repo URL from the key OutSourceRepoHttpUrl. Record the URL to use later.

security and compliance stack

Create a workload account stack

We create the following resources in the workload account:

A CodeCommit repo to hold the Terraform workload to be deployed
A CI/CD pipeline to deploy the Terraform workload
An IAM role that trusts the security and compliance account and allows it to pull Terraform code from its CodeCommit repo for testing

We follow similar steps as in the previous section to set up the properties for the pipeline stack and cross-account role stack, and then run the deployment script.

Set up properties for the pipeline stack

In the already cloned repo, navigate to the folder workload-account/stacks. This contains the folder pipeline_stack/, which holds the code and properties for creating the pipeline stack.

The folder has a JSON file cdk-stack-param.json, which has the parameter COMPLIANCE_CODE, which provides details on where to pull the compliance guardrails from. The pipeline pulls and runs compliance checks prior to deployment, to make sure that application workload is compliant. You have the following parameters:

GIT_REPO_URL – The HTTPS URL of the CodeCommit repositoryCode in the security and compliance account, which contains compliance guardrails that the pipeline in the workload account pulls to carry out compliance checks.
CROSS_ACCOUNT_ROLE_ARN – The ARN for the cross-account role we created in the previous step in the security and compliance account. This role gives the workload account permissions to pull the Terraform compliance code from its respective security and compliance account.

For CROSS_ACCOUNT_ROLE_ARN, replace <compliance-account-id> with the account ID for your designated AWS security and compliance account. For GIT_REPO_URL, replace <region> with Region where the repository resides.

workload pipeline stack config

Set up the properties for cross-account role stack

In the already cloned repo, navigate to folder workload-account/stacks. This contains the folder cross_account_role_stack/, which holds the code and properties for creating the cross-account role stack.

The folder has a JSON file cdk-stack-param.json, which has the parameter COMPLIANCE_ACCOUNT, which represents the security and compliance account that intends to integrate with the workload account for running compliance checks. This account is trusted by the workload account and given permissions to pull compliance guardrails. Replace <compliance-account-id> with the account ID for your designated AWS security and compliance account.

workload cross account role stack config

Run the deployment script

Run deploy.sh by passing the name of the AWS workload account profile you created earlier. The script uses the AWS CDK CLI to bootstrap and deploy the two stacks we discussed. See the following code:

cd aws-continuous-compliance-for-terraform/workload-account/
./deploy.sh "<AWS-WORKLOAD-ACCOUNT-PROFILE-NAME>"

You should now see three stacks in the tools account:

CDKToolkit –AWS CDK creates the CDKToolkit stack when we bootstrap the AWS CDK app. This creates an S3 bucket needed to hold deployment assets such as a CloudFormation template and Lambda code package.
cf-CrossAccountRoles – This stack creates the cross-account IAM role.
cf-TerraformWorkloadPipeline – This stack creates the pipeline. On the Outputs tab of the stack, you can find the CodeCommit source repo URL from the key OutSourceRepoHttpUrl. Record the URL to use later.

workload pipeline stack

Test the compliance workflow

In this section, we walk through the following steps to test our workflow:

Push the application workload code into its repo.
Push the security and compliance code into its repo and run its pipeline to release the compliance guardrails.
Run the application workload pipeline to exercise the compliance guardrails.
Review the generated reports.

Push the application workload code into its repo

Clone the empty CodeCommit repo from workload account. You can find the URL from the variable OutSourceRepoHttpUrl on the Outputs tab of the cf-TerraformWorkloadPipeline stack we deployed in the previous section.

Create a new branch main and copy the workload code into it.
Copy the cucumber-sandwich.jar file you generated in the prerequisites section into a new folder /lib.
Create a directory called reports with an empty file dummy. The reports directory is where Terraform-Compliance framework create compliance reports.
Push the code to the remote origin.

See the following sample script

git checkout -b main
# Copy the code from git repo location
# Create reports directory and a dummy file.
mkdir reports
touch reports/dummy
git add .
git commit -m “Initial commit”
git push origin main

The folder structure of workload code repo should match the structure shown in the following screenshot.

workload code folder structure

The first commit triggers the pipeline-workload-main pipeline, which fails in the stage RunComplianceCheck due to the security and compliance repo not being present (which we add in the next section).

Push the security and compliance code into its repo and run its pipeline

Clone the empty CodeCommit repo from the security and compliance account. You can find the URL from the variable OutSourceRepoHttpUrl on the Outputs tab of the cf-SecurityAndCompliancePipeline stack we deployed in the previous section.

Create a new local branch main and check in the empty branch into the remote origin so that the main branch is created in the remote origin. Skipping this step leads to failure in the code merge step of the pipeline due to the absence of the main branch.
Create a new branch develop and copy the security and compliance code into it. This is required because the security and compliance pipeline is configured to be triggered from the develop branch for the purposes of this post.
Copy the cucumber-sandwich.jar file you generated in the prerequisites section into a new folder /lib.

See the following sample script:

cd security-and-compliance-code
git checkout -b main
git add .
git commit --allow-empty -m “initial commit”
git push origin main
git checkout -b develop main
# Here copy the code from git repo location
# You also copy cucumber-sandwich.jar into a new folder /lib
git add .
git commit -m “Initial commit”
git push origin develop

The folder structure of security and compliance code repo should match the structure shown in the following screenshot.

security and compliance code folder structure

The code push to the develop branch of the security-and-compliance-code repo triggers the security and compliance pipeline. The pipeline pulls the code from the workload account repo, then runs the compliance guardrails against the Terraform workload to make sure that the workload is compliant. If the workload is compliant, the pipeline merges the compliance guardrails into the main branch. If the workload fails the compliance test, the pipeline fails. The following screenshot shows a sample run of the pipeline.

security and compliance pipeline

Run the application workload pipeline to exercise the compliance guardrails

After we set up the security and compliance repo and the pipeline runs successfully, the workload pipeline is ready to proceed (see the following screenshot of its progress).

workload pipeline

The service delivery teams are now being subjected to the security and compliance guardrails being implemented (RunComplianceCheck stage), and their pipeline breaks if any resource is noncompliant.

Review the generated reports

CodeBuild supports viewing reports generated in cucumber JSON format. In our workflow, we generate reports in cucumber JSON and BDD XML formats, and we use this capability of CodeBuild to generate and view HTML reports. Our implementation also generates report directly in HTML using the cucumber-sandwich library.

The following screenshot is snippet of the script compliance-check.sh, which implements report generation.

compliance check script

The bug noted in the screenshot is in the radish-bdd library that Terraform-Compliance uses for the cucumber JSON format report generation. For more information, you can review the defect logged against radish-bdd for this issue.

After the script generates the reports, CodeBuild needs to be configured to access them to generate HTML reports. The following screenshot shows a snippet from buildspec-compliance-check.yml, which shows how the reports section is set up for report generation:

buildspec compliance check

For more details on how to set up buildspec file for CodeBuild to generate reports, see Create a test report.

CodeBuild displays the compliance run reports as shown in the following screenshot.

code build cucumber report

We can also view a trending graph for multiple runs.

code build cucumber report

The other report generated by the workflow is the pretty HTML report generated by the cucumber-sandwich library.

code build cucumber report

The reports are available for download from the S3 bucket <OutPipelineBucketName>/pipeline-security-an/report_App/<zip file>.

The cucumber-sandwich generated report marks scenarios with skipped tests as failed scenarios. This is the only noticeable difference between the CodeBuild generated HTML and cucumber-sandwich generated HTML reports.

Clean up

To remove all the resources from the workload account, complete the following steps in order:

Go to the folder where you cloned the workload code and edit buildspec-workload-deploy.yml:
- Comment line 44 (- ./workload-deploy.sh).
- Uncomment line 45 (- ./workload-deploy.sh --destroy).
- Commit and push the code change to the remote repo. The workload pipeline is triggered, which cleans up the workload.
Delete the CloudFormation stack cf-CrossAccountRoles. This step removes the cross-account role from the workload account, which gives permission to the security and compliance account to pull the Terraform workload.
Go to the CloudFormation stack cf-TerraformWorkloadPipeline and note the OutPipelineBucketName and OutStateFileBucketName on the Outputs tab. Empty the two buckets and then delete the stack. This removes pipeline resources from workload account.
Go to the CDKToolkit stack and note the BucketName on the Outputs tab. Empty that bucket and then delete the stack.

To remove all the resources from the security and compliance account, complete the following steps in order:

Delete the CloudFormation stack cf-CrossAccountRoles. This step removes the cross-account role from the security and compliance account, which gives permission to the workload account to pull the compliance code.
Go to CloudFormation stack cf-SecurityAndCompliancePipeline and note the OutPipelineBucketName on the Outputs tab. Empty that bucket and then delete the stack. This removes pipeline resources from the security and compliance account.
Go to the CDKToolkit stack and note the BucketName on the Outputs tab. Empty that bucket and then delete the stack.

Security considerations

Cross-account IAM roles are very powerful and need to be handled carefully. For this post, we strictly limited the cross-account IAM role to specific CodeCommit permissions. This makes sure that the cross-account role can only do those things.

Conclusion

In this post in our two-part series, we implemented a continuous compliance workflow using CodePipeline and the open-source Terraform-Compliance framework. The Terraform-Compliance framework allows you to build guardrails for securing Terraform applications deployed on AWS.

We also showed how you can use AWS developer tools to seamlessly integrate security and compliance guardrails into an application release cycle and catch noncompliant AWS resources before getting deployed into AWS.

Try implementing the solution in your enterprise as shown in this post, and leave your thoughts and questions in the comments.

About the authors

sumit mishra

Sumit Mishra is Senior DevOps Architect at AWS Professional Services. His area of expertise include IaC, Security in pipeline, CI/CD and automation.

Damodar Shenvi Wagle

Damodar Shenvi Wagle is a Cloud Application Architect at AWS Professional Services. His areas of expertise include architecting serverless solutions, CI/CD and automation.

Building a CI/CD pipeline to update an AWS CloudFormation StackSets

2021-06-26 Karim Afifi

Post Syndicated from Karim Afifi original https://aws.amazon.com/blogs/devops/building-a-ci-cd-pipeline-to-update-an-aws-cloudformation-stacksets/

AWS CloudFormation StackSets can extend the functionality of CloudFormation Stacks by enabling you to create, update, or delete one or more stack across multiple accounts. As a developer working in a large enterprise or for a group that supports multiple AWS accounts, you may often find yourself challenged with updating AWS CloudFormation StackSets. If you’re building a CI/CD pipeline to automate the process of updating CloudFormation stacks, you can do so natively. AWS CodePipeline can initiate a workflow that builds and tests a stack, and then pushes it to production. The workflow can either create or manipulate an existing stack; however, working with AWS CloudFormation StackSets is currently not a supported action at the time of this writing.

You can update an existing CloudFormation stack using one of two methods:

Directly updating the stack – AWS immediately deploys the changes that you submit. You can use this method when you want to quickly deploy your updates.
Running change sets – You can preview the changes AWS CloudFormation will make to the stack, and decide whether to proceed with the changes.

You have several options when building a CI/CD pipeline to automate creating or updating a stack. You can create or update a stack, delete a stack, create or replace a change set, or run a change set. Creating or updating a CloudFormation StackSet, however, is not a supported action.

The following screenshot shows the existing actions supported by CodePipeline against AWS CloudFormation on the CodePipeline console.

CodePipeline console

This post explains how to use CodePipeline to update an existing CloudFormation StackSet. For this post, we update the StackSet’s parameters. Parameters enable you to input custom values to your template each time you create or update a stack.

Overview of solution

To implement this solution, we walk you through the following high-level steps:

Update a parameter for a StackSet by passing a parameter key and its associated value via an AWS CodeCommit
Create an AWS CodeBuild
Build a CI/CD pipeline.
Run your pipeline and monitor its status.

After completing all the steps in this post, you will have a fully functional CI/CD that updates the CloudFormation StackSet parameters. The pipeline starts automatically after you apply the intended changes into the CodeCommit repository.

The following diagram illustrates the solution architecture.

Solution Architecture

The solution workflow is as follows:

Developers integrate changes into a main branch hosted within a CodeCommit repository.
CodePipeline polls the source code repository and triggers the pipeline to run when a new version is detected.
CodePipeline runs a build of the new revision in CodeBuild.
CodeBuild runs the changes in the yml file, which includes the changes against the StackSets. (To update all the stack instances associated with this StackSet, do not specify DeploymentTargets or Regions in the buildspec.yml file.)
Verify that the changes were applied successfully.

Prerequisites

To complete this tutorial, you should have the following prerequisites:

An AWS account.
Access to either AWS Cloud9 or the AWS Command Line Interface (AWS CLI).
Basic knowledge of AWS CloudFormation.
A CloudFormation StackSet. You can create a StackSet via the AWS Management Console or the AWS CLI. For instructions, see Create a StackSet. You can also use one of the sample templates provided by AWS CloudFormation device to create a StackSet. For more information, please visit the Create a StackSet
An AWS Identity and Access Management (IAM) service role to allow CodeBuild to build this project.

Retrieving your StackSet parameters

Your first step is to verify that you have a StackSet in the AWS account you intend to use. If not, create one before proceeding. For this post, we use an existing StackSet called StackSet-Test.

Sign in to your AWS account.
On the CloudFormation console, choose StackSets.
Choose your StackSet.

StackSet

For this post, we modify the value of the parameter with the key KMSId.

On the Parameters tab, note the value of the key KMSId.

Parameters

Creating a CodeCommit repository

To create your repository, complete the following steps:

On the CodeCommit console, choose Repositories.
Choose Create repository.

Repositories name

For Repository name, enter a name (for example, Demo-Repo).
Choose Create.

Repositories Description

Choose Create file to populate the repository with the following artifacts.

Create file

A buildspec.yml file informs CodeBuild of all the actions that should be taken during a build run for our application. We divide the build run into separate predefined phases for logical organization, and list the commands that run on the provisioned build server performing a build job.

Enter the following code in the code editor:

YAML

phases:

pre_build:

commands:

- aws cloudformation update-stack-set --stack-set-name StackSet-Test --use-previous-template --parameters ParameterKey=KMSId,ParameterValue=newCustomValue

The preceding AWS CloudFormation command updates a StackSet with the name StackSet-Test. The command results in updating the parameter value of the parameter key KMSId to newCustomValue.

Name the file yml.
Provide an author name and email address.
Choose Commit changes.

Creating a CodeBuild project

To create your CodeBuild project, complete the following steps:

On the CodeBuild console, choose Build projects.
Choose Create build project.

create build project

For Project name, enter your project name (for example, Demo-Build).
For Description, enter an optional description.

project name

For Source provider, choose AWS CodeCommit.
For Repository, choose the CodeCommit repository you created in the previous step.
For Reference type, keep default selection Branch.
For Branch, choose master.

Source configuration

To set up the CodeBuild environment, we use a managed image based on Amazon Linux 2.

For Environment Image, select Managed image.
For Operating system, choose Amazon Linux 2.
For Runtime(s), choose Standard.
For Image, choose amazonlinux2-aarch64-standard:1.0.
For Image version, choose Always use the latest for this runtime version.

Environment

For Service role¸ select New service role.
For Role name, enter your service role name.

Service Role

Chose Create build project.

Creating a CodePipeline pipeline

To create your pipeline, complete the following steps:

On the CodePipeline console, choose Pipelines.
Choose Create pipeline

Code Pipeline

For Pipeline name, enter a name for the pipeline (for example, DemoPipeline).
For Service role, select New service role.
For Role name, enter your service role name.

Pipeline name

Choose Next.
For Source provider, choose AWS CodeCommit.
For Repository name, choose the repository you created.
For Branch name, choose master.

Source Configurations

Choose Next.
For Build provider, choose AWS CodeBuild.
For Region, choose your Region.
For Project name, choose the build project you created.

CodeBuild

Choose Next.
Choose Skip deploy stage.
Choose Skip
Choose Create pipeline.

The pipeline is now created successfully.

Running and monitoring your pipeline

We use the pipeline to release changes. By default, a pipeline starts automatically when it’s created and any time a change is made in a source repository. You can also manually run the most recent revision through your pipeline, as in the following steps:

On the CodePipeline console, choose the pipeline you created.
On the pipeline details page, choose Release change.

The following screenshot shows the status of the run from the pipeline.

Release change

Under Build, choose Details to view build logs, phase details, reports, environment variables, and build details.

Build details

Choose the Build logs tab to view the logs generated as a result of the build in more detail.

The following screenshot shows that we ran the AWS CloudFormation command that was provided in the buildspec.yml file. It also shows that all phases of the build process are successfully complete.

Phase Details

The StackSet parameter KMSId has been updated successfully with the new value newCustomValue as a result of running the pipeline. Please note that we used the parameter KMSId as an example for demonstration purposes. Any other parameter that is part of your StackSet could have been used instead.

Cleaning up

You may delete the resources that you created during this post:

AWS CloudFormation StackSet.
AWS CodeCommit repository.
AWS CodeBuild project.
AWS CodePipeline.

Conclusion

In this post, we explored how to use CodePipeline, CodeBuild, and CodeCommit to update an existing CloudFormation StackSet. Happy coding!

About the author

Karim Afifi is a Solutions Architect Leader with Amazon Web Services. He is part of the Global Life Sciences Solution Architecture team. team. He is based out of New York, and enjoys helping customers throughout their journey to innovation.

Building an end-to-end Kubernetes-based DevSecOps software factory on AWS

2021-06-26 Srinivas Manepalli

Post Syndicated from Srinivas Manepalli original https://aws.amazon.com/blogs/devops/building-an-end-to-end-kubernetes-based-devsecops-software-factory-on-aws/

DevSecOps software factory implementation can significantly vary depending on the application, infrastructure, architecture, and the services and tools used. In a previous post, I provided an end-to-end DevSecOps pipeline for a three-tier web application deployed with AWS Elastic Beanstalk. The pipeline used cloud-native services along with a few open-source security tools. This solution is similar, but instead uses a containers-based approach with additional security analysis stages. It defines a software factory using Kubernetes along with necessary AWS Cloud-native services and open-source third-party tools. Code is provided in the GitHub repo to build this DevSecOps software factory, including the integration code for third-party scanning tools.

DevOps is a combination of cultural philosophies, practices, and tools that combine software development with information technology operations. These combined practices enable companies to deliver new application features and improved services to customers at a higher velocity. DevSecOps takes this a step further by integrating and automating the enforcement of preventive, detective, and responsive security controls into the pipeline.

In a DevSecOps factory, security needs to be addressed from two aspects: security of the software factory, and security in the software factory. In this architecture, we use AWS services to address the security of the software factory, and use third-party tools along with AWS services to address the security in the software factory. This AWS DevSecOps reference architecture covers DevSecOps practices and security vulnerability scanning stages including secret analysis, SCA (Software Composite Analysis), SAST (Static Application Security Testing), DAST (Dynamic Application Security Testing), RASP (Runtime Application Self Protection), and aggregation of vulnerability findings into a single pane of glass.

The focus of this post is on application vulnerability scanning. Vulnerability scanning of underlying infrastructure such as the Amazon Elastic Kubernetes Service (Amazon EKS) cluster and network is outside the scope of this post. For information about infrastructure-level security planning, refer to Amazon Guard Duty, Amazon Inspector, and AWS Shield.

You can deploy this pipeline in either the AWS GovCloud (US) Region or standard AWS Regions. All listed AWS services are authorized for FedRamp High and DoD SRG IL4/IL5.

Security and compliance

Thoroughly implementing security and compliance in the public sector and other highly regulated workloads is very important for achieving an ATO (Authority to Operate) and continuously maintain an ATO (c-ATO). DevSecOps shifts security left in the process, integrating it at each stage of the software factory, which can make ATO a continuous and faster process. With DevSecOps, an organization can deliver secure and compliant application changes rapidly while running operations consistently with automation.

Security and compliance are shared responsibilities between AWS and the customer. Depending on the compliance requirements (such as FedRamp or DoD SRG), a DevSecOps software factory needs to implement certain security controls. AWS provides tools and services to implement most of these controls. For example, to address NIST 800-53 security controls families such as access control, you can use AWS Identity Access and Management (IAM) roles and Amazon Simple Storage Service (Amazon S3) bucket policies. To address auditing and accountability, you can use AWS CloudTrail and Amazon CloudWatch. To address configuration management, you can use AWS Config rules and AWS Systems Manager. Similarly, to address risk assessment, you can use vulnerability scanning tools from AWS.

The following table is the high-level mapping of the NIST 800-53 security control families and AWS services that are used in this DevSecOps reference architecture. This list only includes the services that are defined in the AWS CloudFormation template, which provides pipeline as code in this solution. You can use additional AWS services and tools or other environmental specific services and tools to address these and the remaining security control families on a more granular level.

#	NIST 800-53 Security Control Family – Rev 5	AWS Services Used (In this DevSecOps Pipeline)
1	AC – Access Control	AWS IAM, Amazon S3, and Amazon CloudWatch are used. AWS::IAM::ManagedPolicy AWS::IAM::Role AWS::S3::BucketPolicy AWS::CloudWatch::Alarm
2	AU – Audit and Accountability	AWS CloudTrail, Amazon S3, Amazon SNS, and Amazon CloudWatch are used. AWS::CloudTrail::Trail AWS::Events::Rule AWS::CloudWatch::LogGroup AWS::CloudWatch::Alarm AWS::SNS::Topic
3	CM – Configuration Management	AWS Systems Manager, Amazon S3, and AWS Config are used. AWS::SSM::Parameter AWS::S3::Bucket AWS::Config::ConfigRule
4	CP – Contingency Planning	AWS CodeCommit and Amazon S3 are used. AWS::CodeCommit::Repository AWS::S3::Bucket
5	IA – Identification and Authentication	AWS IAM is used. AWS:IAM:User AWS::IAM::Role
6	RA – Risk Assessment	AWS Config, AWS CloudTrail, AWS Security Hub, and third party scanning tools are used. AWS::Config::ConfigRule AWS::CloudTrail::Trail AWS::SecurityHub::Hub Vulnerability Scanning Tools (AWS/AWS Partner/3rd party)
7	CA – Assessment, Authorization, and Monitoring	AWS CloudTrail, Amazon CloudWatch, and AWS Config are used. AWS::CloudTrail::Trail AWS::CloudWatch::LogGroup AWS::CloudWatch::Alarm AWS::Config::ConfigRule
8	SC – System and Communications Protection	AWS KMS and AWS Systems Manager are used. AWS::KMS::Key AWS::SSM::Parameter SSL/TLS communication
9	SI – System and Information Integrity	AWS Security Hub, and third party scanning tools are used. AWS::SecurityHub::Hub Vulnerability Scanning Tools (AWS/AWS Partner/3rd party)
10	AT – Awareness and Training	N/A
11	SA – System and Services Acquisition	N/A
12	IR – Incident Response	Not implemented, but services like AWS Lambda, and Amazon CloudWatch Events can be used.
13	MA – Maintenance	N/A
14	MP – Media Protection	N/A
15	PS – Personnel Security	N/A
16	PE – Physical and Environmental Protection	N/A
17	PL – Planning	N/A
18	PM – Program Management	N/A
19	PT – PII Processing and Transparency	N/A
20	SR – SupplyChain Risk Management	N/A

Services and tools

In this section, we discuss the various AWS services and third-party tools used in this solution.

CI/CD services

For continuous integration and continuous delivery (CI/CD) in this reference architecture, we use the following AWS services:

AWS CodeBuild – A fully managed continuous integration service that compiles source code, runs tests, and produces software packages that are ready to deploy.
AWS CodeCommit – A fully managed source control service that hosts secure Git-based repositories.
AWS CodeDeploy – A fully managed deployment service that automates software deployments to a variety of compute services such as Amazon Elastic Compute Cloud (Amazon EC2), AWS Fargate, AWS Lambda, and your on-premises servers.
AWS CodePipeline – A fully managed continuous delivery service that helps you automate your release pipelines for fast and reliable application and infrastructure updates.
AWS Lambda – A service that lets you run code without provisioning or managing servers. You pay only for the compute time you consume.
Amazon Simple Notification Service – Amazon SNS is a fully managed messaging service for both application-to-application (A2A) and application-to-person (A2P) communication.
Amazon S3 – Amazon S3 is storage for the internet. You can use Amazon S3 to store and retrieve any amount of data at any time, from anywhere on the web.
AWS Systems Manager Parameter Store – Parameter Store provides secure, hierarchical storage for configuration data management and secrets management.

Continuous testing tools

The following are open-source scanning tools that are integrated in the pipeline for the purpose of this post, but you could integrate other tools that meet your specific requirements. You can use the static code review tool Amazon CodeGuru for static analysis, but at the time of this writing, it’s not yet available in AWS GovCloud and currently supports Java and Python.

Anchore (SCA and SAST) – Anchore Engine is an open-source software system that provides a centralized service for analyzing container images, scanning for security vulnerabilities, and enforcing deployment policies.
Amazon Elastic Container Registry image scanning – Amazon ECR image scanning helps in identifying software vulnerabilities in your container images. Amazon ECR uses the Common Vulnerabilities and Exposures (CVEs) database from the open-source Clair project and provides a list of scan findings.
Git-Secrets (Secrets Scanning) – Prevents you from committing sensitive information to Git repositories. It is an open-source tool from AWS Labs.
OWASP ZAP (DAST) – Helps you automatically find security vulnerabilities in your web applications while you’re developing and testing your applications.
Snyk (SCA and SAST) – Snyk is an open-source security platform designed to help software-driven businesses enhance developer security.
Sysdig Falco (RASP) – Falco is an open source cloud-native runtime security project that detects unexpected application behavior and alerts on threats at runtime. It is the first runtime security project to join CNCF as an incubation-level project.

You can integrate additional security stages like IAST (Interactive Application Security Testing) into the pipeline to get code insights while the application is running. You can use AWS partner tools like Contrast Security, Synopsys, and WhiteSource to integrate IAST scanning into the pipeline. Malware scanning tools, and image signing tools can also be integrated into the pipeline for additional security.

Continuous logging and monitoring services

The following are AWS services for continuous logging and monitoring used in this reference architecture:

Amazon CloudWatch Events – Delivers a near-real-time stream of system events that describe changes in AWS resources
Amazon CloudWatch Logs – Allows you to monitor, store, and access your log files from EC2 instances, CloudTrail, Amazon Route 53, and other sources

Auditing and governance services

The following are AWS auditing and governance services used in this reference architecture:

AWS CloudTrail – Enables governance, compliance, operational auditing, and risk auditing of your AWS account.
AWS Config – Allows you to assess, audit, and evaluate the configurations of your AWS resources.
AWS Identity and Access Management – Enables you to manage access to AWS services and resources securely. With IAM, you can create and manage AWS users and groups, and use permissions to allow and deny their access to AWS resources.

Operations services

The following are the AWS operations services used in this reference architecture:

AWS CloudFormation – Gives you an easy way to model a collection of related AWS and third-party resources, provision them quickly and consistently, and manage them throughout their lifecycles, by treating infrastructure as code.
Amazon ECR – A fully managed container registry that makes it easy to store, manage, share, and deploy your container images and artifacts anywhere.
Amazon EKS – A managed service that you can use to run Kubernetes on AWS without needing to install, operate, and maintain your own Kubernetes control plane or nodes. Amazon EKS runs up-to-date versions of the open-source Kubernetes software, so you can use all of the existing plugins and tooling from the Kubernetes community.
AWS Security Hub – Gives you a comprehensive view of your security alerts and security posture across your AWS accounts. This post uses Security Hub to aggregate all the vulnerability findings as a single pane of glass.
AWS Systems Manager Parameter Store – Provides secure, hierarchical storage for configuration data management and secrets management. You can store data such as passwords, database strings, Amazon Machine Image (AMI) IDs, and license codes as parameter values.

Pipeline architecture

The following diagram shows the architecture of the solution. We use AWS CloudFormation to describe the pipeline as code.

Kubernetes DevSecOps Pipeline Architecture

The main steps are as follows:

1. When a user commits the code to CodeCommit repository, a CloudWatch event is generated, which triggers CodePipeline to orchestrate the events.
2. CodeBuild packages the build and uploads the artifacts to an S3 bucket.
3. CodeBuild scans the code with git-secrets. If there is any sensitive information in the code such as AWS access keys or secrets keys, CodeBuild fails the build.
4. CodeBuild creates the container image and perform SCA and SAST by scanning the image with Snyk or Anchore. In the provided CloudFormation template, you can pick one of these tools during the deployment. Please note, CodeBuild is fully enabled for a “bring your own tool” approach.
  - (4a) If there are any vulnerabilities, CodeBuild invokes the Lambda function. The function parses the results into AWS Security Finding Format (ASFF) and posts them to Security Hub. Security Hub helps aggregate and view all the vulnerability findings in one place as a single pane of glass. The Lambda function also uploads the scanning results to an S3 bucket.
  - (4b) If there are no vulnerabilities, CodeBuild pushes the container image to Amazon ECR and triggers another scan using built-in Amazon ECR scanning.
5. CodeBuild retrieves the scanning results.
  - (5a) If there are any vulnerabilities, CodeBuild invokes the Lambda function again and posts the findings to Security Hub. The Lambda function also uploads the scan results to an S3 bucket.
  - (5b) If there are no vulnerabilities, CodeBuild deploys the container image to an Amazon EKS staging environment.
6. After the deployment succeeds, CodeBuild triggers the DAST scanning with the OWASP ZAP tool (again, this is fully enabled for a “bring your own tool” approach).
  - (6a) If there are any vulnerabilities, CodeBuild invokes the Lambda function, which parses the results into ASFF and posts it to Security Hub. The function also uploads the scan results to an S3 bucket (similar to step 4a).
7. If there are no vulnerabilities, the approval stage is triggered, and an email is sent to the approver for action via Amazon SNS.
8. After approval, CodeBuild deploys the code to the production Amazon EKS environment.
9. During the pipeline run, CloudWatch Events captures the build state changes and sends email notifications to subscribed users through Amazon SNS.
10. CloudTrail tracks the API calls and sends notifications on critical events on the pipeline and CodeBuild projects, such as UpdatePipeline, DeletePipeline, CreateProject, and DeleteProject, for auditing purposes.
11. AWS Config tracks all the configuration changes of AWS services. The following AWS Config rules are added in this pipeline as security best practices:
  1. CODEBUILD_PROJECT_ENVVAR_AWSCRED_CHECK – Checks whether the project contains environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. The rule is NON_COMPLIANT when the project environment variables contain plaintext credentials. This rule ensures that sensitive information isn’t stored in the CodeBuild project environment variables.
  2. CLOUD_TRAIL_LOG_FILE_VALIDATION_ENABLED – Checks whether CloudTrail creates a signed digest file with logs. AWS recommends that the file validation be enabled on all trails. The rule is noncompliant if the validation is not enabled. This rule ensures that pipeline resources such as the CodeBuild project aren’t altered to bypass critical vulnerability checks.

Security of the pipeline is implemented using IAM roles and S3 bucket policies to restrict access to pipeline resources. Pipeline data at rest and in transit is protected using encryption and SSL secure transport. We use Parameter Store to store sensitive information such as API tokens and passwords. To be fully compliant with frameworks such as FedRAMP, other things may be required, such as MFA.

Security in the pipeline is implemented by performing the Secret Analysis, SCA, SAST, DAST, and RASP security checks. Applicable AWS services provide encryption at rest and in transit by default. You can enable additional controls on top of these wherever required.

In the next section, I explain how to deploy and run the pipeline CloudFormation template used for this example. As a best practice, we recommend using linting tools like cfn-nag and cfn-guard to scan CloudFormation templates for security vulnerabilities. Refer to the provided service links to learn more about each of the services in the pipeline.

Prerequisites

Before getting started, make sure you have the following prerequisites:

An EKS cluster environment with your application deployed. In this post, we use PHP WordPress as a sample application, but you can use any other application.
Sysdig Falco installed on an EKS cluster. Sysdig Falco captures events on the EKS cluster and sends those events to CloudWatch using AWS FireLens. For implementation instructions, see Implementing Runtime security in Amazon EKS using CNCF Falco. This step is required only if you need to implement RASP in the software factory.
A CodeCommit repo with your application code and a Dockerfile. For more information, see Create an AWS CodeCommit repository.
An Amazon ECR repo to store container images and scan for vulnerabilities. Enable vulnerability scanning on image push in Amazon ECR. You can enable or disable the automatic scanning on image push via the Amazon ECR
The provided buildspec-*.yml files for git-secrets, Anchore, Snyk, Amazon ECR, OWASP ZAP, and your Kubernetes deployment .yml files uploaded to the root of the application code repository. Please update the Kubernetes (kubectl) commands in the buildspec files as needed.
A Snyk API key if you use Snyk as a SAST tool.
The Lambda function uploaded to an S3 bucket. We use this function to parse the scan reports and post the results to Security Hub.
An OWASP ZAP URL and generated API key for dynamic web scanning.
An application web URL to run the DAST testing.
An email address to receive approval notifications for deployment, pipeline change notifications, and CloudTrail events.
AWS Config and Security Hub services enabled. For instructions, see Managing the Configuration Recorder and Enabling Security Hub manually, respectively.

Deploying the pipeline

To deploy the pipeline, complete the following steps:

Download the CloudFormation template and pipeline code from the GitHub repo.
Sign in to your AWS account if you have not done so already.
On the CloudFormation console, choose Create Stack.
Choose the CloudFormation pipeline template.
Choose Next.
Under Code, provide the following information:
1. Code details, such as repository name and the branch to trigger the pipeline.
2. The Amazon ECR container image repository name.
Under SAST, provide the following information:
1. Choose the SAST tool (Anchore or Snyk) for code analysis.
2. If you select Snyk, provide an API key for Snyk.
Under DAST, choose the DAST tool (OWASP ZAP) for dynamic testing and enter the API token, DAST tool URL, and the application URL to run the scan.
Under Lambda functions, enter the Lambda function S3 bucket name, filename, and the handler name.
For STG EKS cluster, enter the staging EKS cluster name.
For PRD EKS cluster, enter the production EKS cluster name to which this pipeline deploys the container image.
Under General, enter the email addresses to receive notifications for approvals and pipeline status changes.
Choose Next.
Complete the stack.
After the pipeline is deployed, confirm the subscription by choosing the provided link in the email to receive notifications.

Pipeline CloudFormation Parameters

The provided CloudFormation template in this post is formatted for AWS GovCloud. If you’re setting this up in a standard Region, you have to adjust the partition name in the CloudFormation template. For example, change ARN values from arn:aws-us-gov to arn:aws.

Running the pipeline

To trigger the pipeline, commit changes to your application repository files. That generates a CloudWatch event and triggers the pipeline. CodeBuild scans the code and if there are any vulnerabilities, it invokes the Lambda function to parse and post the results to Security Hub.

When posting the vulnerability finding information to Security Hub, we need to provide a vulnerability severity level. Based on the provided severity value, Security Hub assigns the label as follows. Adjust the severity levels in your code based on your organization’s requirements.

0 – INFORMATIONAL
1–39 – LOW
40– 69 – MEDIUM
70–89 – HIGH
90–100 – CRITICAL

The following screenshot shows the progression of your pipeline.

DevSecOps Kubernetes CI/CD Pipeline

Secrets analysis scanning

In this architecture, after the pipeline is initiated, CodeBuild triggers the Secret Analysis stage using git-secrets and the buildspec-gitsecrets.yml file. Git-Secrets looks for any sensitive information such as AWS access keys and secret access keys. Git-Secrets allows you to add custom strings to look for in your analysis. CodeBuild uses the provided buildspec-gitsecrets.yml file during the build stage.

SCA and SAST scanning

In this architecture, CodeBuild triggers the SCA and SAST scanning using Anchore, Snyk, and Amazon ECR. In this solution, we use the open-source versions of Anchore and Snyk. Amazon ECR uses open-source Clair under the hood, which comes with Amazon ECR for no additional cost. As mentioned earlier, you can choose Anchore or Snyk to do the initial image scanning.

Scanning with Anchore

If you choose Anchore as a SAST tool during the deployment, the build stage uses the buildspec-anchore.yml file to scan the container image. If there are any vulnerabilities, it fails the build and triggers the Lambda function to post those findings to Security Hub. If there are no vulnerabilities, it proceeds to next stage.

Anchore Lambda Code Snippet

Scanning with Snyk

If you choose Snyk as a SAST tool during the deployment, the build stage uses the buildspec-snyk.yml file to scan the container image. If there are any vulnerabilities, it fails the build and triggers the Lambda function to post those findings to Security Hub. If there are no vulnerabilities, it proceeds to next stage.

Snyk Lambda Code Snippet

Scanning with Amazon ECR

If there are no vulnerabilities from Anchore or Snyk scanning, the image is pushed to Amazon ECR, and the Amazon ECR scan is triggered automatically. Amazon ECR lists the vulnerability findings on the Amazon ECR console. To provide a single pane of glass view of all the vulnerability findings and for easy administration, we retrieve those findings and post them to Security Hub. If there are no vulnerabilities, the image is deployed to the EKS staging cluster and next stage (DAST scanning) is triggered.

ECR Lambda Code Snippet

DAST scanning with OWASP ZAP

In this architecture, CodeBuild triggers DAST scanning using the DAST tool OWASP ZAP.

After deployment is successful, CodeBuild initiates the DAST scanning. When scanning is complete, if there are any vulnerabilities, it invokes the Lambda function, similar to SAST analysis. The function parses and posts the results to Security Hub. The following is the code snippet of the Lambda function.

Zap Lambda Code Snippet

The following screenshot shows the results in Security Hub. The highlighted section shows the vulnerability findings from various scanning stages.

Vulnerability Findings in Security Hub

We can drill down to individual resource IDs to get the list of vulnerability findings. For example, if we drill down to the resource ID of SASTBuildProject*, we can review all the findings from that resource ID.

SAST Vulnerabilities in Security Hub

If there are no vulnerabilities in the DAST scan, the pipeline proceeds to the manual approval stage and an email is sent to the approver. The approver can review and approve or reject the deployment. If approved, the pipeline moves to next stage and deploys the application to the production EKS cluster.

Aggregation of vulnerability findings in Security Hub provides opportunities to automate the remediation. For example, based on the vulnerability finding, you can trigger a Lambda function to take the needed remediation action. This also reduces the burden on operations and security teams because they can now address the vulnerabilities from a single pane of glass instead of logging into multiple tool dashboards.

Along with Security Hub, you can send vulnerability findings to your issue tracking systems such as JIRA, Systems Manager SysOps, or can automatically create an incident management ticket. This is outside the scope of this post, but is one of the possibilities you can consider when implementing DevSecOps software factories.

RASP scanning

Sysdig Falco is an open-source runtime security tool. Based on the configured rules, Falco can detect suspicious activity and alert on any behavior that involves making Linux system calls. You can use Falco rules to address security controls like NIST SP 800-53. Falco agents on each EKS node continuously scan the containers running in pods and send the events as STDOUT. These events can be then sent to CloudWatch or any third-party log aggregator to send alerts and respond. For more information, see Implementing Runtime security in Amazon EKS using CNCF Falco. You can also use Lambda to trigger and automatically remediate certain security events.

The following screenshot shows Falco events on the CloudWatch console. The highlighted text describes the Falco event that was triggered based on the default Falco rules on the EKS cluster. You can add additional custom rules to meet your security control requirements. You can also trigger responsive actions from these CloudWatch events using services like Lambda.

Falco alerts in CloudWatch

Cleanup

This section provides instructions to clean up the DevSecOps pipeline setup:

Conclusion

In this post, I presented an end-to-end Kubernetes-based DevSecOps software factory on AWS with continuous testing, continuous logging and monitoring, auditing and governance, and operations. I demonstrated how to integrate various open-source scanning tools, such as Git-Secrets, Anchore, Snyk, OWASP ZAP, and Sysdig Falco for Secret Analysis, SCA, SAST, DAST, and RASP analysis, respectively. To reduce operations overhead, I explained how to aggregate and manage vulnerability findings in Security Hub as a single pane of glass. This post also talked about how to implement security of the pipeline and in the pipeline using AWS Cloud-native services. Finally, I provided the DevSecOps software factory as code using AWS CloudFormation.

To get started with DevSecOps on AWS, see AWS DevOps and the DevOps blog.

Srinivas Manepalli is a DevSecOps Solutions Architect in the U.S. Fed SI SA team at Amazon Web Services (AWS). He is passionate about helping customers, building and architecting DevSecOps and highly available software systems. Outside of work, he enjoys spending time with family, nature and good food.

Choosing a CI/CD approach: AWS Services with BigHat Biosciences

2021-06-10 Mike Apted

Post Syndicated from Mike Apted original https://aws.amazon.com/blogs/devops/choosing-ci-cd-aws-services-bighat-biosciences/

Founded in 2019, BigHat Biosciences’ mission is to improve human health by reimagining antibody discovery and engineering to create better antibodies faster. Their integrated computational + experimental approach speeds up antibody design and discovery by combining high-speed molecular characterization with machine learning technologies to guide the search for better antibodies. They apply these design capabilities to develop new generations of safer and more effective treatments for patients suffering from today’s most challenging diseases. Their platform, from wet lab robots to cloud-based data and logistics plane, is woven together with rapidly changing BigHat-proprietary software. BigHat uses continuous integration and continuous deployment (CI/CD) throughout their data engineering workflows and when training and evaluating their machine learning (ML) models.

BigHat Biosciences Logo

In a previous post, we discussed the key considerations when choosing a CI/CD approach. In this post, we explore BigHat’s decisions and motivations in adopting managed AWS CI/CD services. You may find that your organization has commonalities with BigHat and some of their insights may apply to you. Throughout the post, considerations are informed and choices are guided by the best practices in the AWS Well-Architected Framework.

How did BigHat decide what they needed?

Making decisions on appropriate (CI/CD) solutions requires understanding the characteristics of your organization, the environment you operate in, and your current priorities and goals.

“As a company designing therapeutics for patients rather than software, the role of technology at BigHat is to enable a radically better approach to develop therapeutic molecules,” says Eddie Abrams, VP of Engineering at BigHat. “We need to automate as much as possible. We need the speed, agility, reliability and reproducibility of fully automated infrastructure to enable our company to solve complex problems with maximum scientific rigor while integrating best in class data analysis. Our engineering-first approach supports that.”

BigHat possesses a unique insight to an unsolved problem. As an early stage startup, their core focus is optimizing the fully integrated platform that they built from the ground-up to guide the design for better molecules. They respond to feedback from partners and learn from their own internal experimentation. With each iteration, the quality of what they’re creating improves, and they gain greater insight and improved models to support the next iteration. More than anything, they need to be able to iterate rapidly. They don’t need any additional complexity that would distract from their mission. They need uncomplicated and enabling solutions.

They also have to take into consideration the regulatory requirements that apply to them as a company, the data they work with and its security requirements; and the market segment they compete in. Although they don’t control these factors, they can control how they respond to them, and they want to be able to respond quickly. It’s not only speed that matters in designing for security and compliance, but also visibility and track-ability. These often overlooked and critical considerations are instrumental in choosing a CI/CD strategy and platform.

“The ability to learn faster than your competitors may be the only sustainable competitive advantage,” says Cindy Alvarez in her book Lean Customer Development.

The tighter the feedback loop, the easier it is to make a change. Rapid iteration allows BigHat to easily build upon what works, and make adjustments as they identify avenues that won’t lead to success.

Feature set

CI/CD is applicable to more than just the traditional use case. It doesn’t have to be software delivered in a classic fashion. In the case of BigHat, they apply CI/CD in their data engineering workflows and in training their ML models. BigHat uses automated solutions in all aspects of their workflow. Automation further supports taking what they have created internally and enabling advances in antibody design and development for safer, more effective treatments of conditions.

“We see a broadening of the notion of what can come under CI/CD,” says Abrams. “We use automated solutions wherever possible including robotics to perform scaled assays. The goal in tightening the loop is to improve precision and speed, and reduce latency and lag time.”

BigHat reached the conclusion that they would adopt managed service offerings wherever possible, including in their CI/CD tooling and other automation initiatives.

“The phrase ‘undifferentiated heavy lifting’ has always resonated,” says Abrams. “Building, scaling, and operating core software and infrastructure are hard problems, but solving them isn’t itself a differentiating advantage for a therapeutics company. But whether we can automate that infrastructure, and how we can use that infrastructure at scale on a rock solid control plane to provide our custom solutions iteratively, reliably and efficiently absolutely does give us an edge. We need an end-to-end, complete infrastructure solution that doesn’t force us to integrate a patchwork of solutions ourselves. AWS provides exactly what we need in this regard.”

Reducing risk

Startups can be full of risk, with the upside being potential future reward. They face risk in finding the right problem, in finding a solution to that problem, and in finding a viable customer base to buy that solution.

A key priority for early stage startups is removing risk from as many areas of the business as possible. Any steps an early stage startup can take to remove risk without commensurately limiting reward makes them more viable. The more risk a startup can drive out of their hypothesis the more likely their success, in part because they’re more attractive to customers, employees, and investors alike. The more likely their product solves their problem, the more willing a customer is to give it a chance. Likewise, the more attractive they are to investors when compared to alternative startups with greater risk in reaching their next major milestone.

Adoption of managed services for CI/CD accomplishes this goal in several ways. The most important advantage remains speed. The core functionality required can be stood up very quickly, as it’s an existing service. Customers have a large body of reference examples and documentation available to demonstrate how to use that service. They also insulate teams from the need to configure and then operate the underlying infrastructure. The team remains focused on their differentiation and their core value proposition.

“We are automated right up to the organizational level and because of this, running those services ourselves represents operational risk,” says Abrams. “The largest day-to-day infrastructure risk to us is having the business stalled while something is not working. Do I want to operate these services, and focus my staff on that? There is no guarantee I can just throw more compute at a self-managed software service I’m running and make it scale effectively. There is no guarantee that if one datacenter is having a network or electrical problem that I can simply switch to another datacenter. I prefer AWS manages those scale and uptime problems.”

Embracing an opinionated model

BigHat is a startup with a singular focus on using ML to reduce the time and difficulty of designing antibodies and other therapeutic proteins. By adopting managed services, they have removed the burden of implementing and maintaining CI/CD systems.

Accepting the opinionated guardrails of the managed service approach allows, and to a degree reinforces, the focus on what makes a startup unique. Rather than being focused on performance tuning, making decisions on what OS version to use, or which of the myriad optional puzzle pieces to put together, they can use a well-integrated set of tools built to work with each other in a defined fashion.

The opinionated model means best practices are baked into the toolchain. Instead of hiring for specialized administration skills they’re hiring for specialized biotech skills.

“The only degrees of freedom I care about are the ones that improve our technologies and reduce the time, cost, and risk of bringing a therapeutic to market,” says Abrams. “We focus on exactly where we can gain operational advantages by simply adopting managed services that already embrace the Well-Architected Framework. If we had to tackle all of these engineering needs with limited resources, we would be spending into a solved problem. Before AWS, startups just didn’t do these sorts of things very well. Offloading this effort to a trusted partner is pretty liberating.”

Beyond the reduction in operational concerns, BigHat can also expect continuous improvement of that service over time to be delivered automatically by the provider. For their use case they will likely derive more benefit for less cost over time without any investment required.

Overview of solution

BigHat uses the following key services:

Code – GitHub for source code management.
Test and build – AWS CodePipeline as the orchestration layer to manage build and test flows through AWS CodeBuild (compute workers for CI jobs).
Publish stage – Amazon Elastic Container Registry (Amazon ECR) stores the infrastructure containers for the runners and product containers. AWS Code Artifact stores and serves our development and production distributables.
Observability – Amazon CloudWatch Logs, Insights, Dashboards, and AWS X-Ray provide logging, metrics, and tracing. AWS CloudTrail and Amazon Athena provide deep insight into our API and object level security. AWS Security Hub is our window into our security posture.
Infrastructure As Code – AWS CloudFormation is used to define and deploy all resources.

BigHat Reference Architecture

Security

Managed services are supported, owned and operated by the provider . This allows BigHat to leave concerns like patching and security of the underlying infrastructure and services to the provider. BigHat continues to maintain ownership in the shared responsibility model, but their scope of concern is significantly narrowed. The surface area the’re responsible for is reduced, helping to minimize risk. Choosing a partner with best in class observability, tracking, compliance and auditing tools is critical to any company that manages sensitive data.

Cost advantages

A startup must also make strategic decisions about where to deploy the capital they have raised from their investors. The vendor managed services bring a model focused on consumption, and allow the startup to make decisions about where they want to spend. This is often referred to as an operational expense (OpEx) model, in other words “pay as you go”, like a utility. This is in contrast to a large upfront investment in both time and capital to build these tools. The lack of need for extensive engineering efforts to stand up these tools, and continued investment to evolve them, acts as a form of capital expenditure (CapEx) avoidance. Startups can allocate their capital where it matters most for them.

“This is corporate-level changing stuff,” says Abrams. “We engage in a weekly leadership review of cost budgets. Operationally I can set the spending knob where I want it monthly, weekly or even daily, and avoid the risks involved in traditional capacity provisioning.”

The right tool for the right time

A key consideration for BigHat was the ability to extend the provider managed tools, where needed, to incorporate extended functionality from the ecosystem. This allows for additional functionality that isn’t covered by the core managed services, while maintaining a focus on their product development versus operating these tools.

Startups must also ask themselves what they need now, versus what they need in the future. As their needs change and grow, they can augment, extend, and replace the tools they have chosen to meet the new requirements. Starting with a vendor-managed service is not a one-way door; it’s an opportunity to defer investment in building and operating these capabilities yourself until that investment is justified. The time to value in using managed services initially doesn’t leave a startup with a sunk cost that limits future options.

“You have to think about the degree you want to adopt a hybrid model for the services you run. Today we aren’t running any software or services that require us to run our own compute instances. It’s very rare we run into something that is hard to do using just the services AWS already provides. Where our needs are not met, we can communicate them to AWS and we can choose to wait for them on their roadmap, which we have done in several cases, or we can elect to do it ourselves,” says Abrams. “This freedom to tweak and expand our service model at will is incomparably liberating.”

Conclusion

BigHat Biosciences was able to make an informed decision by considering the priorities of the business at this stage of its lifecycle. They adopted and embraced opinionated and service provider-managed tooling, which allowed them to inherit a largely best practice set of technology and practices, de-risk their operations, and focus on product velocity and customer feedback. This maintains future flexibility, which delivers significantly more value to the business in its current stage.

“We believe that the underlying engineering, the underlying automation story, is an advantage that applies to every aspect of what we do for our customers,” says Abrams. “By taking those advantages into every aspect of the business, we deliver on operations in a way that provides a competitive advantage a lot of other companies miss by not thinking about it this way.”

About the authors

Mike is a Principal Solutions Architect with the Startup Team at Amazon Web Services. He is a former founder, current mentor, and enjoys helping startups live their best cloud life.

Sean is a Senior Startup Solutions Architect at AWS. Before AWS, he was Director of Scientific Computing at the Howard Hughes Medical Institute.