Tag Archives: devops

Policy-based access control in application development with Amazon Verified Permissions

Post Syndicated from Marc von Mandel original https://aws.amazon.com/blogs/devops/policy-based-access-control-in-application-development-with-amazon-verified-permissions/

Today, accelerating application development while shifting security and assurance left in the development lifecycle is essential. One of the most critical components of application security is access control. While traditional access control mechanisms such as role-based access control (RBAC) and access control lists (ACLs) are still prevalent, policy-based access control (PBAC) is gaining momentum. PBAC is a more powerful and flexible access control model that allows developers to apply any combination of coarse-, medium-, and fine-grained access control over resources and data within an application. In this post, we will explore PBAC, how it can be used in application development with Amazon Verified Permissions, and how you can define permissions as policies using Cedar, an expressive and analyzable open-source policy language. We will also briefly describe how developers and administrators can define policy-based access controls using roles and attributes for fine-grained access.

What is Policy-Based Access Control?

PBAC is an access control model that uses permissions expressed as policies to determine who can access what within an application. Administrators and developers can define application access statically as admin-time authorization, where access is based on users and groups defined by roles and responsibilities. Developers can also set up run-time, or dynamic, authorization, which applies access controls at the moment a user attempts to access a particular application resource. Run-time authorization takes in attributes of application resources, as well as contextual elements such as time or location, to determine whether access should be granted or denied. This combination of policy types makes policy-based access control a more powerful authorization engine.

A central policy store and policy engine evaluates these policies continuously and in real time to determine access to resources. PBAC is a more dynamic access control model because it allows developers and administrators to create and modify policies according to their needs, such as defining custom roles within an application or enabling secure, delegated authorization. Developers can use PBAC to apply role- and attribute-based access controls across many different types of applications, such as customer-facing web applications, internal workforce applications, multi-tenant software-as-a-service (SaaS) applications, edge device access, and more. PBAC brings together RBAC and attribute-based access control (ABAC), which have been the two most widely used access control models for the past couple of decades (see the figure below).

Policy-based access control with admin-time and run-time authorization

Figure 1: Overview of policy-based access control (PBAC)

Before we try to understand how to modernize permissions, let's look at how developers implement them in a traditional development process. We typically see developers hardcode access control into each and every application. This creates four primary challenges.

  1. First, you need to update code every time access control policies change. This is time-consuming for a developer and comes at the expense of working on the business logic of the application.
  2. Second, you need to implement these permissions in each and every application you build.
  3. Third, application audits are challenging: you need to run a battery of tests or dig through thousands of lines of code spread across multiple files to demonstrate who has access to application resources. For example, providing evidence to auditors that only authorized users can access a patient's health record.
  4. Finally, developing hardcoded application access control is often time-consuming and error prone.

Amazon Verified Permissions simplifies this process by externalizing access control rules from the application code to a central policy store within the service. Now, when a user tries to take an action in your application, you call Verified Permissions to check if it is authorized. Policy admins can respond faster to changing business requirements, as they no longer need to depend on the development team when updating access controls. They can use a central policy store to make updates to authorization policies. This means that developers can focus on the core application logic, and access control policies can be created, customized, and managed separately or collectively across applications. Developers can use PBAC to define authorization rules for users, user groups, or attributes based on the entity type accessing the application. Restricting access to data and resources using PBAC protects against unintended access to application resources and data.

For example, a developer can define a role-based and attribute-based access control policy that allows only certain users or roles to access a particular API. Imagine a group of users within a Marketing department that can only view specific photos within a photo sharing application. The policy might look something like the following using Cedar.

permit (
    principal in Role::"expo-speakers",
    action == Action::"view",
    resource == Photo::"expoPhoto94.jpg"
)
when { principal.department == "Marketing" };

How do I get started using PBAC in my applications?

PBAC can be integrated into the application development process in several ways when using Amazon Verified Permissions. Developers begin by defining an authorization model for their application and use this to describe the scope of authorization requests made by the application and the basis for evaluating those requests. Think of this as a narrative or structure for authorization requests. Developers then write a schema, which documents the form of the authorization model in a machine-readable syntax. This schema document describes each entity type, including principal types, actions, resource types, and conditions. Developers can then craft policies, as statements, that permit or forbid a principal to take one or more actions on a resource.

Next, you define a set of application policies which define the overall framework and guardrails for access controls in your application. For example, a guardrail policy might be that only the owner can access photos that are marked 'private'. These policies apply to a large set of users or resources, and are not user or resource specific. You create these policies in the code of your applications, instantiate them in your CI/CD pipeline using CloudFormation, and test them in beta stages before deploying them to production.

Lastly, you define the shape of your end-user policies using policy templates. These end-user policies are specific to a user (or user group). For example, a policy that states “Alice” can view “expoPhoto94.jpg”. Policy templates simplify managing end-user policies as a group. Now, every time a user in your application tries to take an action, you call Verified Permissions to confirm that the action is authorized.
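
To make that call concrete, here is a minimal sketch of the authorization check using the Verified Permissions IsAuthorized API via boto3 (Python). The policy store ID, the PhotoApp:: namespace, and the helper function are assumptions for illustration and must match the schema and policies you actually define.

import boto3

# Hypothetical policy store ID used for illustration only.
POLICY_STORE_ID = "PSEXAMPLEabcdefg111111"

avp = boto3.client("verifiedpermissions")

def can_view_photo(user_id: str, photo_id: str) -> bool:
    """Ask Verified Permissions whether the user may view the photo."""
    response = avp.is_authorized(
        policyStoreId=POLICY_STORE_ID,
        principal={"entityType": "PhotoApp::User", "entityId": user_id},
        action={"actionType": "PhotoApp::Action", "actionId": "view"},
        resource={"entityType": "PhotoApp::Photo", "entityId": photo_id},
        # Entity data (for example, the principal's department attribute or
        # role membership) can be supplied via the entities parameter so that
        # conditions like principal.department == "Marketing" can be evaluated.
    )
    return response["decision"] == "ALLOW"

if can_view_photo("alice", "expoPhoto94.jpg"):
    print("Access granted")
else:
    print("Access denied")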

Benefits of using Amazon Verified Permissions policies in application development

Amazon Verified Permissions offers several benefits when it comes to application development.

  1. One of the most significant benefits is the flexibility in using the PBAC model. Amazon Verified Permissions allows application administrators or developers to create and modify policies at any time without going into application code, making it easier to respond to changing security needs.
  2. Secondly, it simplifies the application development process by externalizing access control rules from the application code. Developers can reuse PBAC controls for newly built or acquired applications. This allows developers to focus on the core application logic and mitigates security risks within applications by applying fine-grained access controls.
  3. Lastly, developers can add secure, delegated authorization using PBAC and Amazon Verified Permissions. This lets developers give a group, role, or resource owner the ability to manage data sharing within application resources or between services. This has exciting implications for developers wanting to add privacy and consent capabilities for end users while still enforcing guardrails defined within a centralized policy store.

In Summary

PBAC is a more flexible access control model that enables fine-grained control over access to resources in an application. By externalizing access control rules from the application code, PBAC simplifies the application development process and reduces the risks of security vulnerabilities in the application. PBAC also offers flexibility, aligns with compliance mandates for access control, and developers and administrators benefit from centralized permissions across various stages of the DevOps process. By adopting PBAC in application development, organizations can improve their application security and better align with industry regulations.

Amazon Verified Permissions is a scalable permissions management and fine-grained authorization service for the applications that developers build. The service helps developers build secure applications faster by externalizing authorization and centralizing policy management and administration. Developers can align their application access with Zero Trust principles by implementing least privilege and continuous verification within applications. Security and audit teams can better analyze and audit who has access to what within applications.

How to deploy workloads in a multicloud environment with AWS developer tools

Post Syndicated from Brent Van Wynsberge original https://aws.amazon.com/blogs/devops/how-to-deploy-workloads-in-a-multicloud-environment-with-aws-developer-tools/

As organizations embrace cloud computing as part of a "cloud first" strategy and migrate to the cloud, some enterprises end up in a multicloud environment. We see that enterprise customers get the best experience, performance, and cost structure when they choose a primary cloud provider. However, for a variety of reasons, some organizations end up operating in a multicloud environment. For example, in the case of mergers and acquisitions, an organization may acquire an entity that runs on a different cloud platform. Another example is an ISV (Independent Software Vendor) that provides services to customers operating on different cloud providers. Yet another is an organization that needs to adhere to data residency and data sovereignty requirements and ends up with workloads deployed to multiple cloud platforms across locations.

In the scenarios described above, one of the challenges organizations face in operating such a complex environment is managing the release process (building, testing, and deploying applications at scale) across multiple cloud platforms. If an organization's primary cloud provider is AWS, they may want to continue using AWS developer tools to deploy workloads on other cloud platforms. Organizations facing such scenarios can leverage AWS services to develop their end-to-end CI/CD and release process instead of developing a separate release pipeline for each platform, which is complex and not sustainable in the long run.

In this post we show how organizations can continue using AWS developer tools in a hybrid and multicloud environment. We walk through a scenario where we deploy an application to VMs running on-premises and in Azure, showcasing AWS' hybrid and multicloud DevOps capabilities.

Solution and scenario overview

In this post we’re demonstrating the following steps:

  • Set up a CI/CD pipeline using AWS CodePipeline, and show how it runs when application code is updated and checked into the code repository (GitHub).
  • Check out application code from the code repository, use an IDE (Visual Studio Code) to make changes, and check the code back in to the code repository.
  • Check in the modified application code to automatically run the release process built using AWS CodePipeline. It uses AWS CodeBuild to retrieve the latest version of code from the code repository, compile it, build the deployment package, and test the application.
  • Deploy the updated application to VMs across on-premises and Azure using AWS CodeDeploy.

The high-level solution is shown below. This post does not show all of the possible combinations and integrations available to build the CI/CD pipeline. As an example, you can integrate the pipeline with your existing test and build tools such as Selenium, Jenkins, or SonarQube.

This post focuses on deploying an application in a multicloud environment, and on how AWS Developer Tools can support virtually any scenario or use case specific to your organization. We will deploy a sample application from this AWS tutorial to an on-premises server and an Azure Virtual Machine (VM) running Red Hat Enterprise Linux (RHEL). In future posts in this series, we will cover how you can deploy any type of workload using AWS tools, including containers and serverless applications.

Architecture Diagram

CI/CD pipeline setup

This section describes instructions for setting up a multicloud CI/CD pipeline.

Note: The CI/CD pipeline setup and the related sub-sections in this post are a one-time activity; you will not need to perform these steps every time an application is deployed or modified.

Install CodeDeploy agent

The AWS CodeDeploy agent is a software package that is used to execute deployments on an instance. You can install the CodeDeploy agent on an on-premises server and Azure VM by either using the command line, or AWS Systems Manager.

Setup GitHub code repository

Setup GitHub code repository using the following steps:

  1. Create a new GitHub code repository or use a repository that already exists.
  2. Copy the Sample_App_Linux app (zip) from Amazon S3 as described in Step 3 of Upload a sample application to your GitHub repository tutorial.
  3. Commit the files to the code repository:
    git add .
    git commit -m 'Initial Commit'
    git push

You will use this repository to deploy your code across environments.

Configure AWS CodePipeline

Follow the steps outlined below to set up and configure CodePipeline to orchestrate the CI/CD pipeline of our application.

  1. Navigate to CodePipeline in the AWS console and click on 'Create pipeline'.
  2. Give your pipeline a name (e.g., MyWebApp-CICD) and allow CodePipeline to create a service role on your behalf.
  3. For the source stage, select GitHub (v2) as your source provider and click on the Connect to GitHub button to give CodePipeline access to your Git repository.
  4. Create a new GitHub connection and click on the Install a new App button to install the AWS Connector in your GitHub account.
  5. Back in the CodePipeline console, select the repository and branch you would like to build and deploy.

Image showing the configured source stage

  6. Now we create the build stage; select AWS CodeBuild as the build provider.
  7. Click on the 'Create project' button to create the project for your build stage, and give your project a name.
  8. Select Ubuntu as the operating system for your managed image, choose the standard runtime, and select the 'aws/codebuild/standard' image with the latest version.

Image showing the configured environment

  9. In the Buildspec section select "Insert build commands" and click on switch to editor. Enter the following YAML code as your build commands:
version: 0.2
phases:
    build:
        commands:
            - echo "This is a dummy build command"
artifacts:
    files:
        - "*/*"

Note: you can also integrate build commands into your Git repository by using a buildspec.yml file. More information can be found at Build specification reference for CodeBuild.

  10. Leave all other options as default and click on 'Continue to CodePipeline'.

Image showing the configured buildspec

  11. Back in the CodePipeline console your Project name will automatically be filled in. You can now continue to the next step.
  12. Click the "Skip deploy stage" button; we will create this stage in the next section.
  13. Review your changes and click "Create pipeline". Your newly created pipeline will now build for the first time!

Image showing the first execution of the CI/CD pipeline

Configure AWS CodeDeploy on Azure and on-premises VMs

Now that we have built our application, we want to deploy it to both environments: Azure and on-premises. In the "Install CodeDeploy agent" section we already installed the CodeDeploy agent. As a one-time step, we now have to give the CodeDeploy agents access to the AWS environment. You can leverage AWS Identity and Access Management (IAM) Roles Anywhere in combination with the code-deploy-session-helper to give access to the AWS resources needed.
The IAM role should at least have the AWSCodeDeployFullAccess AWS managed policy and read-only access to the CodePipeline S3 bucket in your account (called codepipeline-<region>-<account-id>).

For more information on how to set up IAM Roles Anywhere, please refer to How to extend AWS IAM roles to workloads outside of AWS with IAM Roles Anywhere. Alternative ways to configure access can be found in the AWS CodeDeploy user guide. Follow the steps below for each instance you want to configure.

  1. Configure your CodeDeploy agent as described in the user guide. Ensure the AWS Command Line Interface (CLI) is installed on your VM and execute the following command to register the instance with CodeDeploy.
    aws deploy register-on-premises-instance --instance-name <name_for_your_instance> --iam-role-arn <arn_of_your_iam_role>
  2. Tag the instance as follows:
    aws deploy add-tags-to-on-premises-instances --instance-names <name_for_your_instance> --tags Key=Application,Value=MyWebApp
  3. You should now see both instances registered in the "CodeDeploy > On-premises instances" panel. You can now deploy applications to your Azure VM and on-premises VM! You can also verify the registration programmatically, as sketched after the screenshot below.

Image showing the registered instances
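
If you would rather confirm the registration from a script than from the console, the following is a minimal sketch using boto3 (Python); it assumes the instances were tagged as shown above.

import boto3

codedeploy = boto3.client("codedeploy")

# List on-premises/Azure instances that completed registration.
registered = codedeploy.list_on_premises_instances(registrationStatus="Registered")
print("Registered instances:", registered["instanceNames"])

# Inspect the tags on each registered instance.
details = codedeploy.batch_get_on_premises_instances(
    instanceNames=registered["instanceNames"]
)
for info in details["instanceInfos"]:
    print(info["instanceName"], info.get("tags"))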

Configure AWS CodeDeploy to deploy WebApp

Follow the steps mentioned below to modify the CI/CD pipeline to deploy the application to the Azure and on-premises environments.

  1. Create an IAM role named CodeDeployServiceRole and select CodeDeploy > CodeDeploy as your use case. IAM will automatically select the right policy for you. CodeDeploy will use this role to manage the deployments of your application.
  2. In the AWS console navigate to CodeDeploy > Applications. Click on “Create application”.
  3. Give your application a name and choose “EC2/On-premises” as the compute platform.
  4. Configure the instances we want to deploy to. In the detail view of your application click on “Create deployment group”.
  5. Give your deployment group a name and select the CodeDeployServiceRole.
  6. In the environment configuration section choose On-premises Instances.
  7. Configure the Application, MyWebApp key value pair.
  8. Disable load balancing and leave all other options default.
  9. Click on create deployment group. You should now see your newly created deployment group.

Image showing the created CodeDeploy Application and Deployment group

  10. We can now edit our pipeline to deploy to the newly created deployment group.
  11. Navigate to your previously created pipeline in the CodePipeline section and click edit. Add the deploy stage by clicking on Add stage and name it Deploy. Afterwards, click Add action.
  12. Name your action and choose CodeDeploy as your action provider.
  13. Select "BuildArtifact" as your input artifact and select your newly created application and deployment group.
  14. Click on Done and on Save in your pipeline to confirm the changes. You have now added the deploy step to your pipeline!

Image showing the updated pipeline

This completes the one-time DevOps pipeline setup; you will not need to repeat this process.

Automated DevOps pipeline in action

This section demonstrates how the DevOps pipeline operates end-to-end and automatically deploys the application to the Azure VM and on-premises server when the application code changes.

  1. Click on Release change to deploy your application for the first time. The Release change button manually triggers CodePipeline to run with the current version of your code. In the next section we will make changes to the repository, which triggers the pipeline automatically.
  2. During the "Source" stage your pipeline fetches the latest version from GitHub.
  3. During the "Build" stage your pipeline uses CodeBuild to build your application and generate the deployment artifacts for your pipeline. It uses the buildspec.yml file to determine the build steps.
  4. During the "Deploy" stage your pipeline uses CodeDeploy to deploy the build artifacts to the configured deployment group – the Azure VM and on-premises VM. Navigate to the URL of your application to see the results of the deployment process.

Image showing the deployed sample application

 

Update application code in IDE

You can modify the application code using your favorite IDE. In this example we will change the background color and a paragraph of the sample application.

Image showing modifications being made to the file

Once you've modified the code, save the updated file and push the code to the code repository:

git add .
git commit -m "I made changes to the index.html file "
git push

DevOps pipeline (CodePipeline) – compile, build, and test

Once the code is updated and pushed to GitHub, the DevOps pipeline (CodePipeline) automatically compiles, builds, and tests the modified application. You can navigate to your pipeline (CodePipeline) in the AWS Console, where you should see the pipeline running (or recently completed). CodePipeline automatically executes the Build and Deploy steps. In this case we're not adding any complex logic, but based on your organization's requirements you can add any build step or integrate with other tools. You can also check the pipeline's progress programmatically, as sketched after the screenshot below.

Image showing CodePipeline in action
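
For a scripted status check, a minimal sketch using boto3 (Python) is shown below; the pipeline name MyWebApp-CICD is the example name used earlier and is otherwise an assumption.

import boto3

codepipeline = boto3.client("codepipeline")

# Print the latest execution status of each stage in the pipeline.
state = codepipeline.get_pipeline_state(name="MyWebApp-CICD")
for stage in state["stageStates"]:
    latest = stage.get("latestExecution", {})
    print(stage["stageName"], latest.get("status", "NOT_STARTED"))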

Deployment process using CodeDeploy

In this section, we describe how the modified application is deployed to the Azure and on-premises VMs.

  1. Open your pipeline in the CodePipeline console, and click on the “AWS CodeDeploy” link in the Deploy step to navigate to your deployment group. Open the “Deployments” tab.

Image showing application deployment history

  2. Click on the first deployment in the Application deployment history section. This will show the details of your latest deployment.

Image showing deployment lifecycle events for the deployment

  3. In the "Deployment lifecycle events" section click on one of the "View events" links. This shows you the lifecycle steps executed by CodeDeploy and will display the error log output if any of the steps have failed.

Image showing deployment events on instance

  4. Navigate back to your application. You should now see your changes in the application. You've successfully set up a multicloud DevOps pipeline!

Image showing a new version of the deployed application

Conclusion

In summary, this post demonstrated how AWS DevOps tools and services can help organizations build a single release pipeline to deploy applications and workloads in a hybrid and multicloud environment. The post also showed how to set up a CI/CD pipeline to deploy applications to AWS, on-premises, and Azure VMs.

If you have any questions or feedback, leave them in the comments section.

About the Authors

Picture of Amandeep

Amandeep Bajwa

Amandeep Bajwa is a Senior Solutions Architect at AWS supporting Financial Services enterprises. He helps organizations achieve their business outcomes by identifying the appropriate cloud transformation strategy based on industry trends, and organizational priorities. Some of the areas Amandeep consults on are cloud migration, cloud strategy (including hybrid & multicloud), digital transformation, data & analytics, and technology in general.

Picture of Pawan

Pawan Shrivastava

Pawan Shrivastava is a Partner Solution Architect at AWS in the WWPS team. He focuses on working with partners to provide technical guidance on AWS, collaborating with them to understand their technical requirements, and designing solutions to meet their specific needs. Pawan is passionate about DevOps, automation, and CI/CD pipelines. He enjoys watching MMA, playing cricket, and working out in the gym.

Picture of Brent

Brent Van Wynsberge

Brent Van Wynsberge is a Solutions Architect at AWS supporting enterprise customers. He guides organizations in their digital transformation and innovation journey and accelerates cloud adoption. Brent is an IoT enthusiast, specifically in the application of IoT in manufacturing; he is also interested in DevOps, data analytics, containers, and innovative technologies in general.

Picture of Mike

Mike Strubbe

Mike is a Cloud Solutions Architect Manager at AWS with a strong focus on cloud strategy, digital transformation, business value, leadership, and governance. He helps Enterprise customers achieve their business goals through cloud expertise, coupled with strong business acumen skills. Mike is passionate about implementing cloud strategies that enable cloud transformations, increase operational efficiency and drive business value.

Optimize software development with Amazon CodeWhisperer

Post Syndicated from Dhaval Shah original https://aws.amazon.com/blogs/devops/optimize-software-development-with-amazon-codewhisperer/

Businesses differentiate themselves by delivering new capabilities to their customers faster. They must leverage automation to accelerate their software development by optimizing code quality, improving performance, and ensuring their software meets security and compliance requirements. Trained on billions of lines of Amazon and open-source code, Amazon CodeWhisperer is an AI coding companion that helps developers write code by generating real-time whole-line and full-function code suggestions in their IDEs. Amazon CodeWhisperer has two tiers: the Individual tier is free for individual use, and the Professional tier provides administrative capabilities for organizations seeking to grant their developers access to CodeWhisperer. This blog provides a high-level overview of how developers can use CodeWhisperer.

Getting Started

Getting started with CodeWhisperer is straightforward and documented here. After setup, CodeWhisperer integrates with the IDE and provides code suggestions based on comments written in the IDE. Use TAB to accept a suggestion, ESC to reject a suggestion, ALT+C (Windows) / Option+C (Mac) to force a suggestion, and the left and right arrow keys to switch between suggestions.

CodeWhisperer supports code generation for 15 programming languages. CodeWhisperer can be used in various IDEs such as Amazon SageMaker Studio, Visual Studio Code, AWS Cloud9, AWS Lambda, and many JetBrains IDEs. Refer to the Amazon CodeWhisperer documentation for the latest updates on supported languages and IDEs.

Contextual Code Suggestions

CodeWhisperer continuously examines code and comments for contextual code suggestions. It will generate code snippets using this contextual information and the location of your cursor. Illustrated below is an example of a code suggestion from inline comments in Visual Studio Code that demonstrates how CodeWhisperer can provide context-specific code suggestions without requiring the user to manually replace variables or parameters. In the comment, the file and Amazon Simple Storage Service (Amazon S3) bucket are specified, and CodeWhisperer uses this context to suggest relevant code.

Image depicts a person typing on a computer keyboard, with a code editor window on the screen. The code shows a function for uploading a file from a local directory to an Amazon S3 bucket
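
The exact code CodeWhisperer generates depends on your comment and surrounding context, but a suggestion for the comment described above might resemble the following sketch using boto3; the function name, bucket, and file path here are illustrative assumptions rather than actual CodeWhisperer output.

import boto3

def upload_file_to_s3(local_path: str, bucket_name: str, object_key: str) -> None:
    """Upload a file from the local directory to an Amazon S3 bucket."""
    s3 = boto3.client("s3")
    s3.upload_file(local_path, bucket_name, object_key)

# Example usage with illustrative values
upload_file_to_s3("data/report.csv", "my-example-bucket", "reports/report.csv")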

CodeWhisperer also supports and recommends writing declarative code and procedural code, such as shell scripting and query languages. The following example shows how CodeWhisperer recommends the blocks of code in a shell script that loop through servers to execute the hostname command and save the responses to an output file.

Image is a gif of a person typing on a computer keyboard, with a terminal window on the screen displaying a shell script named 'shell_script.sh.' The code defines a list of servers and outputs the file path. As the person types, the code updates with the output path displayed below.

In the following example, based on the comment, CodeWhisperer suggests Structured Query Language (SQL) code that uses a common table expression.

"Image is a gif of a person typing on a computer keyboard, with a code editor window on the screen displaying a SQL query. The query uses common table expressions to find the age of a product from an inventory table. As the person types, the query updates with the output displayed below in the form of SQL code. The background is a blurred office environment

CodeWhisperer works with popular Integrated Development Environments (IDEs); for more information on supported IDEs, please refer to CodeWhisperer's documentation. Illustrated below is CodeWhisperer integrated with the AWS Lambda console.

"Image is a gif of a person typing on a computer keyboard, with an AWS Lambda console on the screen. The person is entering a prompt to list all the Amazon S3 buckets. As the person types, the console updates with the output code displayed below, which can be executed to show all the S3 buckets."

Amazon CodeWhisperer is a versatile AI coding assistant that can aid in a variety of tasks, including AWS-related tasks and API integrations, as well as external (non-AWS) API integrations. For example, illustrated below is CodeWhisperer suggesting code for Twilio's APIs.

"Image is a gif of a person typing on a computer keyboard, with an integrated development environment (IDE) on the screen. The person is entering a prompt to write a code that uses the Twilio API to make a voice call. As the person types, the IDE updates with the output function displayed below, which can be executed to make the voice call."

Now that we have seen how CodeWhisperer can help with writing code faster, the next section explores how to use AI responsibly.

Use AI responsibly

Developers often leverage open-source code; however, they run into license attribution challenges, such as attributing the original authors or maintaining the license text. The challenge lies in properly identifying and attributing the relevant open-source components used within a project. With the abundance of open-source libraries and frameworks available, it can be time-consuming and complex to track and attribute each piece of code accurately. Failure to meet license attribution requirements can result in legal issues, violation of intellectual property rights, and damage to a developer's reputation. CodeWhisperer's reference tracking continuously monitors suggested code for similarities with known open-source code, allowing developers to make informed decisions about incorporating it into their project and ensuring proper attribution.

"Image is a gif of a code editor window displaying a counting sort function, with a section of the code highlighted. The highlighted section is the implementation of counting sort by digit, suggested by CodeWhisperer. The gif includes a caption mentioning that the implementation is being referenced from MIT. This showcases the capability of CodeWhisperer's reference tracking."

Shift left application security

CodeWhisperer can scan code for hard-to-find vulnerabilities such as those in the Open Web Application Security Project (OWASP) top ten, or those that don't meet crypto library best practices, AWS internal security best practices, and others. As of this writing, CodeWhisperer supports security scanning for the Python, Java, and JavaScript languages. Below is an illustration of identifying well-known CWEs (Common Weakness Enumerations), along with the ability to dive deep into the problematic line of code with the click of a button.

"Image is a gif of a code editor window displaying a code to download a file, with a section of the code highlighted. Below the code, there is an illustration of the identification of the most common Common Weakness Enumerations (CWEs) found in the code. However, it is mentioned that not all CWEs have been identified. Additionally, the illustration showcases the feature of being able to dive deep into the problematic line of code by clicking a button."

In the following example, CodeWhisperer provides a file-by-file analysis of CWEs and highlights the top 10 OWASP CWEs, such as unsanitized input run as code, cross-site scripting, resource leaks, hardcoded credentials, SQL injection, OS command injection, and insecure hashing.

Image displays a screen with a proceeding from CodeWhisperer. The text highlights the file-by-file analysis of Common Weakness Enumerations (CWEs) and emphasizes the top 10 OWASP CWEs. These include CWE-94, CWE-95, and CWE-96, which pertain to the unsanitized input being executed as code. Additionally, CWE-20, CWE-79, and CWE-80 are related to cross-site scripting. Furthermore, CWE-400 and CWE-664 are associated with resource leaks, while CWE-798 relates to hardcoded credentials. CWE-89 refers to SQL injection, and CWE-77, CWE-78, and CWE-88 are connected to OS command injection. Lastly, CWE-327 and CWE-328 relate to insecure hashing.

Generating Test Cases

A good developer always writes tests. CodeWhisperer can help suggest test cases and verify the code's functionality. CodeWhisperer considers boundary values, edge cases, and other potential issues that may need to be tested. In the example below, a comment referring to the fact_demo() function leads CodeWhisperer to suggest a unit test for fact_demo() while leveraging contextual details.

"Image is a gif displaying a code editor window, with a section of code highlighted. A comment within the code refers to the use of the fact_demo() function. CodeWhisperer is seen suggesting code for unit testing, leveraging contextual details related to the fact_demo() function. The background is a blurred office environment."

Also, CodeWhisperer can simplify creating repetitive code for unit testing. For example, if you need to create sample data using INSERT statements, CodeWhisperer can generate the necessary inserts based on a pattern.

"Image is a gif of a person typing on a computer keyboard, with an integrated development environment (IDE) on the screen. The person is entering a prompt to insert sample users into a table, with details such as username, password, and status. As the person types, CodeWhisperer builds out the insert query for the user. The IDE updates with the output query displayed below, which can be executed to insert the sample users into the table."

CodeWhisperer with Amazon SageMaker Studio and Jupyter Lab

CodeWhisperer works with SageMaker Studio and Jupyter Lab, providing code completion support for Python in code cells. To utilize CodeWhisperer, follow the setup instructions to activate it in Amazon SageMaker Studio and Jupyter Lab. To begin coding, see User actions.
The following illustration showcases CodeWhisperer’s code recommendations in SageMaker Studio. It demonstrates the suggested code based on comments for loading and analyzing a dataset.

"Image is a gif of an illustration showcasing CodeWhisperer's code recommendations in SageMaker Studio. The illustration shows a code editor window with a section of code highlighted. The code pertains to loading and analyzing a dataset. CodeWhisperer is seen providing code recommendations based on comments within the code. The recommendations appear in the form of a pop-up, with suggested changes displayed."

Conclusion

In conclusion, this blog has highlighted the numerous ways in which developers can leverage CodeWhisperer to increase productivity, streamline workflows, and ensure the development of secure code. By adopting CodeWhisperer's AI-powered features, developers can experience enhanced productivity, accelerated learning, and significant time savings.

To take advantage of CodeWhisperer and optimize your coding process, here are the next steps:

1. Visit the feature page to learn more about the benefits of CodeWhisperer.

2. Sign up and start using CodeWhisperer.

3. Read about CodeWhisperer success stories.

About the Authors

Headshot of a person on a black background with a natural expression

Vamsi Cherukuri

Vamsi Cherukuri is a Senior Technical Account Manager at Amazon Web Services (AWS), leveraging over 15 years of developer experience in Analytics, application modernization, and data platforms. With a passion for technology, Vamsi takes joy in helping customers achieve accelerated business outcomes through their cloud transformation journey. In his free time, he finds peace in the pursuits of running and biking, frequently immersing himself in the thrilling realm of marathons.

Headshot of a person on a black background, smiling and wearing a navy blue t-shirt with stripes

Dhaval Shah

Dhaval Shah is a Senior Solutions Architect at AWS, specializing in Machine Learning. With a strong focus on digital native businesses, he empowers customers to leverage AWS and drive their business growth. As an ML enthusiast, Dhaval is driven by his passion for creating impactful solutions that bring positive change. In his leisure time, he indulges in his love for travel and cherishes quality moments with his family.

Headshot of a person on a black background with a grey shirt and spectacles, with a natural expression

Nikhil Sharma

Nikhil Sharma is a Solutions Architecture Leader at Amazon Web Services (AWS) where he and his team of Solutions Architects help AWS customers solve critical business challenges using AWS cloud technologies and services.

The history and future roadmap of the AWS CloudFormation Registry

Post Syndicated from Eric Z. Beard original https://aws.amazon.com/blogs/devops/cloudformation-coverage/

AWS CloudFormation is an Infrastructure as Code (IaC) service that allows you to model your cloud resources in template files that can be authored or generated in a variety of languages. You can manage stacks that deploy those resources via the AWS Management Console, the AWS Command Line Interface (AWS CLI), or the API. CloudFormation helps customers to quickly and consistently deploy and manage cloud resources, but like all IaC tools, it faced challenges keeping up with the rapid pace of innovation of AWS services. In this post, we will review the history of the CloudFormation registry, which is the result of a strategy we developed to address scaling and standardization, as well as integration with other leading IaC tools and partner products. We will also give an update on the current state of CloudFormation resource coverage and review the future state, which has a goal of keeping CloudFormation and other IaC tools up to date with the latest AWS services and features.

History

The CloudFormation service was first announced in February of 2011, with sample templates that showed how to deploy common applications like blogs and wikis. At launch, CloudFormation supported 13 out of 15 available AWS services with 48 total resource types. At first, resource coverage was tightly coupled to the core CloudFormation engine, and all development on those resources was done by the CloudFormation team itself. Over the past decade, AWS has grown at a rapid pace, and there are currently 200+ services in total. A challenge over the years has been the coverage gap between what was possible for a customer to achieve using AWS services, and what was possible to define in a CloudFormation template.

It became obvious that we needed a change in strategy to scale resource development in a way that could keep up with the rapid pace of innovation set by hundreds of service teams delivering new features on a daily basis. Over the last decade, our pace of innovation has increased nearly 40-fold, with 80 significant new features launched in 2011 versus more than 3,000 in 2021. Since CloudFormation was a key adoption driver (or blocker) for new AWS services, those teams needed a way to create and manage their own resources. The goal was to enable day one support of new services at the time of launch with complete CloudFormation resource coverage.

In 2016, we launched an internal self-service platform that allowed service teams to control their own resources. This began to solve the scaling problems inherent in the prior model where the core CloudFormation team had to do all the work themselves. The benefits went beyond simply distributing developer effort, as the service teams have deep domain knowledge on their products, which allowed them to create more effective IaC components. However, as we developed resources on this model, we realized that additional design features were needed, such as standardization that could enable automatic support for features like drift detection and resource imports.

We embarked on a new project to address these concerns, with the goal of improving the internal developer experience as well as providing a public registry where customers could use the same programming model to define their own resource types. We realized that it wasn’t enough to simply make the new model available—we had to evangelize it with a training campaign, conduct engineering boot-camps, build better tooling like dashboards and deployment pipeline templates, and produce comprehensive on-boarding documentation. Most importantly, we made CloudFormation support a required item on the feature launch checklist for new services, a requirement that goes beyond documentation and is built into internal release tooling (exceptions to this requirement are rare as training and awareness around the registry have improved over time). This was a prime example of one of the maxims we repeat often at Amazon: good mechanisms are better than good intentions.

In 2019, we made this new functionality available to customers when we announced the CloudFormation registry, a capability that allowed developers to create and manage private resource types. We followed up in 2021 with the public registry where third parties, such as partners in the AWS Partner Network (APN), can publish extensions. The open source resource model that customers and partners use to publish third-party registry extensions is the same model used by AWS service teams to provide CloudFormation support for their features.

Once a service team on-boards their resources to the new resource model and builds the expected Create, Read, Update, Delete, and List (CRUDL) handlers, managed experiences like drift detection and resource import are all supported with no additional development effort. One recent example of day-1 CloudFormation support for a popular new feature was Lambda Function URLs, which offered a built-in HTTPS endpoint for single-function micro-services. We also migrated the Amazon Relational Database Service (Amazon RDS) Database Instance resource (AWS::RDS::DBInstance) to the new resource model in September 2022, and within a month, Amazon RDS delivered support for Amazon Aurora Serverless v2 in CloudFormation. This accelerated delivery is possible because teams can now publish independently by taking advantage of the de-centralized Registry ownership model.

Current State

We are building out future innovations for the CloudFormation service on top of this new standardized resource model so that customers can benefit from a consistent implementation of event handlers. We built AWS Cloud Control API on top of this new resource model. Cloud Control API takes the Create-Read-Update-Delete-List (CRUDL) handlers written for the new resource model and makes them available as a consistent API for provisioning resources. APN partner products such as HashiCorp Terraform, Pulumi, and Red Hat Ansible use Cloud Control API to stay in sync with AWS service launches without recurring development effort.

Figure 1. Cloud Control API Resource Handler Diagram

Besides third-party application support, the public registry can also be used by the developer community to create useful extensions on top of AWS services. A common solution to extending the capabilities of CloudFormation resources is to write a custom resource, which generally involves inline AWS Lambda function code that runs in response to CREATE, UPDATE, and DELETE signals during stack operations. Some of those use cases can now be solved by writing a registry extension resource type instead. For more information on custom resources and resource types, and the differences between the two, see Managing resources using AWS CloudFormation Resource Types.

CloudFormation Registry modules, which are building blocks authored in JSON or YAML, give customers a way to replace fragile copy-paste template reuse with template snippets that are published in the registry and consumed as if they were resource types. Best practices can be encapsulated and shared across an organization, which allows infrastructure developers to easily adhere to those best practices using modular components that abstract away the intricate details of resource configuration.

CloudFormation Registry hooks give security and compliance teams a vital tool to validate stack deployments before any resources are created, modified, or deleted. An infrastructure team can activate hooks in an account to ensure that stack deployments cannot avoid or suppress preventative controls implemented in hook handlers. Provisioning tools that are strictly client-side do not have this level of enforcement.

A useful by-product of publishing a resource type to the public registry is that you get automatic support for the AWS Cloud Development Kit (CDK) via an experimental open source repository on GitHub called cdk-cloudformation. In large organizations it is typical to see a mix of CloudFormation deployments using declarative templates and deployments that make use of the CDK in languages like TypeScript and Python. By publishing re-usable resource types to the registry, all of your developers can benefit from higher level abstractions, regardless of the tool they choose to create and deploy their applications. (Note that this project is still considered a developer preview and is subject to change)

If you want to see if a given CloudFormation resource is on the new registry model or not, check if the provisioning type is either Fully Mutable or Immutable by invoking the DescribeType API and inspecting the ProvisioningType response element.

Here is a sample CLI command that gets a description for the AWS::Lambda::Function resource, which is on the new registry model.

$ aws cloudformation describe-type --type RESOURCE \
    --type-name AWS::Lambda::Function | grep ProvisioningType

   "ProvisioningType": "FULLY_MUTABLE",

The difference between FULLY_MUTABLE and IMMUTABLE is the presence of the Update handler. FULLY_MUTABLE types include an update handler to process updates to the type during stack update operations. IMMUTABLE types do not include an update handler, so the type can't be updated and must instead be replaced during stack update operations. Legacy resource types will be NON_PROVISIONABLE.
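
The same check can be scripted. Below is a minimal sketch using boto3 (Python); the list of type names is just an example.

import boto3

cfn = boto3.client("cloudformation")

# Example type names to inspect; substitute the resources you care about.
for type_name in ["AWS::Lambda::Function", "AWS::RDS::DBInstance"]:
    response = cfn.describe_type(Type="RESOURCE", TypeName=type_name)
    # FULLY_MUTABLE or IMMUTABLE indicates the new registry model;
    # NON_PROVISIONABLE indicates a legacy resource type.
    print(type_name, response["ProvisioningType"])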

Opportunities for improvement

As we continue to strive towards our ultimate goal of achieving full feature coverage and a complete migration away from the legacy resource model, we are constantly identifying opportunities for improvement. We are currently addressing feature gaps in supported resources, such as tagging support for EC2 VPC Endpoints and boosting coverage for resource types to support drift detection, resource import, and Cloud Control API. We have fully migrated more than 130 resources, and acknowledge that there are many left to go, and the migration has taken longer than we initially anticipated. Our top priority is to maintain the stability of existing stacks—we simply cannot break backwards compatibility in the interest of meeting a deadline, so we are being careful and deliberate. One of the big benefits of a server-side provisioning engine like CloudFormation is operational stability—no matter how long ago you deployed a stack, any future modifications to it will work without needing to worry about upgrading client libraries. We remain committed to streamlining the migration process for service teams and making it as easy and efficient as possible.

The developer experience for creating registry extensions has some rough edges, particularly for languages other than Java, which is the language of choice on AWS service teams for their resource types. It needs to be easier to author schemas, write handler functions, and test the code to make sure it performs as expected. We are devoting more resources to the maintenance of the CLI and plugins for Python, TypeScript, and Go. Our response times to issues and pull requests in these and other repositories in the aws-cloudformation GitHub organization have not been as fast as they should be, and we are making improvements. One example is the cloudformation-cli repository, where we have merged more than 30 pull requests since October of 2022.

To keep up with progress on resource coverage, check out the CloudFormation Coverage Roadmap, a GitHub project where we catalog all of the open issues to be resolved. You can submit bug reports and feature requests related to resource coverage in this repository and keep tabs on the status of open requests. One of the steps we took recently to improve responses to feature requests and bugs reported on GitHub is to create a system that converts GitHub issues into tickets in our internal issue tracker. These tickets go directly to the responsible service teams—an example is the Amazon RDS resource provider, which has hundreds of merged pull requests.

We have recently announced a new GitHub repository called community-registry-extensions where we are managing a namespace for public registry extensions. You can submit and discuss new ideas for extensions and contribute to any of the related projects. We handle the testing, validation, and deployment of all resources under the AwsCommunity:: namespace, which can be activated in any AWS account for use in your own templates.

To get started with the CloudFormation registry, visit the user guide, and then dive in to the detailed developer guide for information on how to use the CloudFormation Command Line Interface (CFN-CLI) to write your own resource types, modules, and hooks.

We recently created a new Discord server dedicated to CloudFormation. Please join us to ask questions, discuss best practices, provide feedback, or just hang out! We look forward to seeing you there.

Conclusion

In this post, we hope you gained some insights into the history of the CloudFormation registry, and the design decisions that were made during our evolution towards a standardized, scalable model for resource development that can be shared by AWS service teams, customers, and APN partners. Some of the lessons that we learned along the way might be applicable to complex design initiatives at your own company. We hope to see you on Discord and GitHub as we build out a rich set of registry resources together!

About the authors:

Eric Beard

Eric is a Solutions Architect at Amazon Web Services in Seattle, Washington, where he leads the field specialist group for Infrastructure as Code. His technology career spans two decades, preceded by service in the United States Marine Corps as a Russian interpreter and arms control inspector.

Rahul Sharma

Rahul is a Senior Product Manager-Technical at Amazon Web Services with over two years of product management spanning AWS CloudFormation and AWS Cloud Control API.

Integrating DevOps Guru Insights with CloudWatch Dashboard

Post Syndicated from Suresh Babu original https://aws.amazon.com/blogs/devops/integrating-devops-guru-insights-with-cloudwatch-dashboard/

Many customers use Amazon CloudWatch dashboards to monitor applications and often ask how they can integrate Amazon DevOps Guru insights in order to have a unified dashboard for monitoring. This blog post showcases integrating DevOps Guru proactive and reactive insights into a CloudWatch dashboard by using custom widgets. Displaying related data from different sources side by side helps you correlate trends over time, spot issues more efficiently, and get a single-pane-of-glass view in the CloudWatch dashboard.

Amazon DevOps Guru is a machine learning (ML) powered service that helps developers and operators automatically detect anomalies and improve application availability. DevOps Guru’s anomaly detectors can proactively detect anomalous behavior even before it occurs, helping you address issues before they happen; detailed insights provide recommendations to mitigate that behavior.

An Amazon CloudWatch dashboard is a customizable home page in the CloudWatch console that monitors multiple resources in a single view. You can use CloudWatch dashboards to create customized views of the metrics and alarms for your AWS resources.

Solution overview

This post will help you create a custom widget for an Amazon CloudWatch dashboard that displays DevOps Guru insights. A custom widget is a part of your CloudWatch dashboard that calls an AWS Lambda function containing your custom code. The Lambda function accepts custom parameters, generates your dataset or visualization, and then returns HTML to the CloudWatch dashboard. The CloudWatch dashboard displays this HTML as a widget. In this post, we provide sample code for the Lambda function that calls DevOps Guru APIs to retrieve the insights information and displays it as a widget in the CloudWatch dashboard. The architecture diagram of the solution is shown below.

Solution Architecture

Figure 1: Reference architecture diagram

Prerequisites and Assumptions

  • An AWS account.
  • DevOps Guru should be enabled in the account. For enabling DevOps Guru, see DevOps Guru Setup.
  • Follow this workshop to deploy a sample application in your AWS account, which can help generate some DevOps Guru insights.

Solution Deployment

We are providing two options to deploy the solution: using the AWS console and using AWS CloudFormation. The first section has instructions to deploy using the AWS console, followed by instructions for using CloudFormation. The key difference is that we will create one widget when using the console, but three widgets are created when we use AWS CloudFormation.

Using the AWS Console:

We will first create a Lambda function that will retrieve the DevOps Guru insights. We will then modify the default IAM role associated with the Lambda function to add DevOps Guru permissions. Finally we will create a CloudWatch dashboard and add a custom widget to display the DevOps Guru insights.

  1. Navigate to the Lambda console after logging in to your AWS account and click on Create function.

    Figure 2a: Create Lambda Function

  2. Choose Author from scratch and use the Node.js 16.x runtime. Leave the rest of the settings at their defaults and create the function.

    Figure 2b: Create Lambda Function

  3. After a few seconds, the Lambda function will be created and you will see a code source box. Copy the code from the text box below and replace the code present in the code source, as shown in the screenshot below.
    // SPDX-License-Identifier: MIT-0
    // CloudWatch Custom Widget sample: displays count of Amazon DevOps Guru Insights
    const aws = require('aws-sdk');
    
    const DOCS = `## DevOps Guru Insights Count
    Displays the total counts of Proactive and Reactive Insights in DevOps Guru.
    `;
    
    async function getProactiveInsightsCount(DevOpsGuru, StartTime, EndTime) {
        let NextToken = null;
        let proactivecount=0;
    
        do {
            const args = { StatusFilter: { Any : { StartTimeRange: { FromTime: StartTime, ToTime: EndTime }, Type: 'PROACTIVE'  }}}
            const result = await DevOpsGuru.listInsights(args).promise();
            console.log(result)
            NextToken = result.NextToken;
            result.ProactiveInsights.forEach(res => {
            console.log(result.ProactiveInsights[0].Status)
            proactivecount++;
            });
            } while (NextToken);
        return proactivecount;
    }
    
    async function getReactiveInsightsCount(DevOpsGuru, StartTime, EndTime) {
        let NextToken = null;
        let reactivecount=0;
    
        do {
            const args = { StatusFilter: { Any : { StartTimeRange: { FromTime: StartTime, ToTime: EndTime }, Type: 'REACTIVE'  }}}
            const result = await DevOpsGuru.listInsights(args).promise();
            NextToken = result.NextToken;
            result.ReactiveInsights.forEach(res => {
            reactivecount++;
            });
            } while (NextToken);
        return reactivecount;
    }
    
    function getHtmlOutput(proactivecount, reactivecount, region, event, context) {
    
        return `DevOps Guru Proactive Insights<br><font size="+10" color="#FF9900">${proactivecount}</font>
        <p>DevOps Guru Reactive Insights</p><font size="+10" color="#FF9900">${reactivecount}`;
    }
    
    exports.handler = async (event, context) => {
        if (event.describe) {
            return DOCS;
        }
        const widgetContext = event.widgetContext;
        const timeRange = widgetContext.timeRange.zoom || widgetContext.timeRange;
        const StartTime = new Date(timeRange.start);
        const EndTime = new Date(timeRange.end);
        const region = event.region || process.env.AWS_REGION;
        const DevOpsGuru = new aws.DevOpsGuru({ region });
    
        const proactivecount = await getProactiveInsightsCount(DevOpsGuru, StartTime, EndTime);
        const reactivecount = await getReactiveInsightsCount(DevOpsGuru, StartTime, EndTime);
    
        return getHtmlOutput(proactivecount, reactivecount, region, event, context);
        
    };


    Figure 3: Lambda Function Source Code

  4. Click on Deploy to save the function code
  5. Since we used the default settings while creating the function, a default Execution role is created and associated with the function. We will need to modify the IAM role to grant DevOps Guru permissions to retrieve Proactive and Reactive insights.
  6. Click on the Configuration tab and select Permissions from the left side option list. You can see the IAM execution role associated with the function as shown in figure 4.


    Figure 4: Lambda function execution role

  7. Click on the IAM role name to open the role in the IAM console. Click on Add Permissions and select Attach policies.


    Figure 5: IAM Role Update

  8. Search for DevOps and select the AmazonDevOpsGuruReadOnlyAccess policy. Click on Add permissions to update the IAM role.


    Figure 6: IAM Role Policy Update

  9. Now that we have created the Lambda function for our custom widget and assigned appropriate permissions, we can navigate to CloudWatch to create a Dashboard.
  10. Navigate to CloudWatch and click on Dashboards in the left side list. You can choose to create a new dashboard or add the widget to an existing dashboard.
  11. We will choose to create a new dashboard.


    Figure 7: Create New CloudWatch dashboard

  12. Choose Custom Widget in the Add widget page


    Figure 8: Add widget

  13. Click Next on the custom widget page without choosing a sample.


    Figure 9: Custom Widget Selection

  14. Choose the region where DevOps Guru is enabled. Select the Lambda function that we created earlier. In the preview pane, click on Preview to view DevOps Guru metrics. Once the preview is successful, create the widget.


    Figure 10: Create Custom Widget

  15. Congratulations, you have now successfully created a CloudWatch dashboard with a custom widget to get insights from DevOps Guru. The sample code that we provided can be customized to suit your needs.
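If you want to exercise the function locally before wiring it into a dashboard (as mentioned in step 3), the following minimal sketch shows one way to invoke the handler with the event shape the code above expects: an optional describe flag and a widgetContext.timeRange with millisecond timestamps. It assumes the function code was saved locally as index.js and that AWS credentials with DevOps Guru read access are available in your environment; the file name and time window are illustrative only.

// test-widget.js - minimal local test sketch for the custom widget Lambda (assumptions noted above)
const { handler } = require('./index');

const sampleEvent = {
  region: 'us-east-1',
  widgetContext: {
    timeRange: {
      start: Date.now() - 3 * 60 * 60 * 1000, // last three hours, in milliseconds
      end: Date.now(),
    },
  },
};

handler(sampleEvent, {})
  .then((html) => console.log(html)) // the widget HTML that CloudWatch would render
  .catch((err) => console.error(err));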

Using AWS CloudFormation

You may skip this section if you have already created the resources using the AWS console.

In this step, we will show you how to deploy the solution using AWS CloudFormation. AWS CloudFormation lets you model, provision, and manage AWS and third-party resources by treating infrastructure as code. Customers define an initial template and then revise it as their requirements change. For more information on CloudFormation stack creation, refer to this blog post.

The following resources are created.

  • Three Lambda functions that will support CloudWatch Dashboard custom widgets
  • An AWS Identity and Access Management (IAM) role that allows the Lambda functions to access DevOps Guru insights and to publish logs to CloudWatch
  • Three Log Groups under CloudWatch
  • A CloudWatch dashboard with widgets to pull data from the Lambda Functions

To deploy the solution by using the CloudFormation template

  1. You can use this downloadable template to set up the resources. To launch directly through the console, choose the Launch Stack button, which creates the stack in the us-east-1 AWS Region. (A scripted alternative is sketched after these steps.)
  2. Choose Next to go to the Specify stack details page.
  3. (Optional) On the Configure Stack Options page, enter any tags, and then choose Next.
  4. On the Review page, select I acknowledge that AWS CloudFormation might create IAM resources.
  5. Choose Create stack.

It takes approximately 2-3 minutes for the provisioning to complete. After the stack status is CREATE_COMPLETE, proceed to validate the resources as listed below.
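If you prefer to create the stack from a script rather than the Launch Stack button (referenced in step 1 above), the rough sketch below uses the AWS SDK for JavaScript. The TemplateURL is a placeholder for wherever you stored the downloaded template; the stack name matches the one referenced later in the Cleanup section.

// create-stack.js - scripted alternative sketch (template URL is a placeholder)
const AWS = require('aws-sdk');
const cloudformation = new AWS.CloudFormation({ region: 'us-east-1' });

cloudformation.createStack({
  StackName: 'devopsguru-cloudwatch-dashboard',
  TemplateURL: 'https://YOUR_BUCKET.s3.amazonaws.com/devopsguru-dashboard.yaml', // placeholder
  Capabilities: ['CAPABILITY_IAM'], // the template creates IAM resources
}, (err, data) => {
  if (err) console.error(err);
  else console.log('Stack creation started:', data.StackId);
});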

Validate the resources

Now that the stack creation has completed successfully, you should validate the resources that were created.

  • On the AWS console, head to CloudWatch; under Dashboards there will be a dashboard created with the name <StackName-Region>.
  • On the AWS console, head to CloudWatch; under Log groups there will be three new log groups created, named:
    • lambdaProactiveLogGroup
    • lambdaReactiveLogGroup
    • lambdaSummaryLogGroup
  • On the AWS console, head to Lambda; there will be three Lambda functions, named:
    • lambdaFunctionDGProactive
    • lambdaFunctionDGReactive
    • lambdaFunctionDGSummary
  • On the AWS console, head to IAM; under Roles there will be a new role created with the name "lambdaIAMRole"

To View Results/Outcome

With the appropriate time-range setup on CloudWatch Dashboard, you will be able to navigate through the insights that have been generated from DevOps Guru on the CloudWatch Dashboard.


Figure 11: DevOpsGuru Insights in Cloudwatch Dashboard

Cleanup

For cost optimization, after you complete and test this solution, clean up the resources. If you used the AWS console, delete the resources manually; if you used AWS CloudFormation, delete the stack named devopsguru-cloudwatch-dashboard.

For more information on deleting the stacks, see Deleting a stack on the AWS CloudFormation console.

Conclusion

This blog post outlined how you can integrate DevOps Guru insights into a CloudWatch Dashboard. As a customer, you can start leveraging CloudWatch Custom Widgets to include DevOps Guru Insights in an existing Operational dashboard.

AWS Customers are now using Amazon DevOps Guru to monitor and improve application performance. You can start monitoring your applications by following the instructions in the product documentation. Head over to the Amazon DevOps Guru console to get started today.

To learn more about AIOps for Serverless using Amazon DevOps Guru check out this video.

Suresh Babu

Suresh Babu is a DevOps Consultant at Amazon Web Services (AWS) with 21 years of experience in designing and implementing software solutions from various industries. He helps customers in Application Modernization and DevOps adoption. Suresh is a passionate public speaker and often speaks about DevOps and Artificial Intelligence (AI)

Venkat Devarajan

Venkat Devarajan is a Senior Solutions Architect at Amazon Web Services (AWS) supporting enterprise automotive customers. He has over 18 years of industry experience in helping customers design, build, implement, and operate enterprise applications.

Ashwin Bhargava

Ashwin is a DevOps Consultant at AWS working in Professional Services Canada. He is a DevOps expert and a security enthusiast with more than 15 years of development and consulting experience.

Murty Chappidi

Murty is an APJ Partner Solutions Architecture Lead at Amazon Web Services with a focus on helping customers with accelerated and seamless journey to AWS by providing solutions through our GSI partners. He has more than 25 years’ experience in software and technology and has worked in multiple industry verticals. He is the APJ SME for AI for DevOps Focus Area. In his free time, he enjoys gardening and cooking.

DevSecOps with Amazon CodeGuru Reviewer CLI and Bitbucket Pipelines

Post Syndicated from Bineesh Ravindran original https://aws.amazon.com/blogs/devops/devsecops-with-amazon-codeguru-reviewer-cli-and-bitbucket-pipelines/

DevSecOps refers to a set of best practices that integrate security controls into the continuous integration and delivery (CI/CD) workflow. One of the first controls is Static Application Security Testing (SAST). SAST tools run on every code change and search for potential security vulnerabilities before the code is executed for the first time. Catching security issues early in the development process significantly reduces the cost of fixing them and the risk of exposure.

This blog post shows how we can set up a CI/CD pipeline using Bitbucket Pipelines and Amazon CodeGuru Reviewer. Bitbucket Pipelines is a cloud-based continuous delivery system that allows developers to automate builds, tests, and security checks with just a few lines of code. CodeGuru Reviewer is a cloud-based static analysis tool that uses machine learning and automated reasoning to generate code quality and security recommendations for Java and Python code.

We demonstrate step-by-step how to set up a pipeline with Bitbucket Pipelines, and how to call CodeGuru Reviewer from there. We then show how to view the recommendations produced by CodeGuru Reviewer in Bitbucket Code Insights, and how to triage and manage recommendations during the development process.

Bitbucket Overview

Bitbucket is a Git-based code hosting and collaboration tool built for teams. Bitbucket's best-in-class Jira and Trello integrations are designed to bring the entire software team together to execute a project. Bitbucket provides one place for a team to collaborate on code from concept to cloud, build quality code through automated testing, and deploy code with confidence. Bitbucket makes it easy for teams to collaborate and reduce issues found during integration by providing a way to easily merge and test code frequently. Bitbucket gives teams easy access to tools needed in other parts of the feedback loop, from creating an issue to deploying on your hardware of choice. It also provides more advanced features for those customers that need them, like SAML authentication and secrets storage.

Solution Overview

Bitbucket Pipelines uses a Docker container to perform the build steps. You can specify any Docker image accessible by Bitbucket, including private images, if you specify credentials to access them. The container starts and then runs the build steps in the order specified in your configuration file. The build steps specified in the configuration file are nothing more than shell commands executed on the Docker image. Therefore, you can run scripts, in any language supported by the Docker image you choose, as part of the build steps. These scripts can be stored either directly in your repository or in an Internet-accessible location. This solution demonstrates an easy way to integrate Bitbucket Pipelines with Amazon CodeGuru Reviewer using the bitbucket-pipelines.yml file.

You can interact with your Amazon Web Services (AWS) account from your Bitbucket pipeline using the OpenID Connect (OIDC) feature. OpenID Connect is an identity layer on top of the OAuth 2.0 protocol.

Now that you understand how Bitbucket and your AWS Account securely communicate with each other, let’s look into the overall summary of steps to configure this solution.

  1. Fork the repository
  2. Configure Bitbucket Pipelines as an IdP on AWS.
  3. Create an IAM role.
  4. Add repository variables needed for pipeline
  5. Adding the CodeGuru Reviewer CLI to your pipeline
  6. Review CodeGuru recommendations

Now let's look at each step in detail. To configure the solution, follow the steps below.

Step 1: Fork this repo

Log in to Bitbucket and choose Fork to fork this example app to your Bitbucket account.

https://bitbucket.org/aws-samples/amazon-codeguru-samples


Figure 1 : Fork amazon-codeguru-samples bitbucket repository.

Step 2: Configure Bitbucket Pipelines as an Identity Provider on AWS

Configuring Bitbucket Pipelines as an IdP in IAM enables Bitbucket Pipelines to issue authentication tokens to users to connect to AWS.
In your Bitbucket repo, go to Repository Settings > OpenID Connect. Note the provider URL and the Audience variable on that screen.

The Identity Provider URL will look like this:

https://api.bitbucket.org/2.0/workspaces/YOUR_WORKSPACE/pipelines-config/identity/oidc – This is the issuer URL for authentication requests. This URL automatically issues a token to a requester as part of the workflow. See more detail about the issuer URL in the RFC. Here, "YOUR_WORKSPACE" needs to be replaced with the name of your Bitbucket workspace.

And the Audience will look like:

ari:cloud:bitbucket::workspace/ari:cloud:bitbucket::workspace/84c08677-e352-4a1c-a107-6df387cfeef7 – This is the recipient the token is intended for. See more detail about the audience in the Request for Comments (RFC), a memorandum published by the Internet Engineering Task Force (IETF) describing methods and behavior for securely transmitting information between two parties using a JSON Web Token (JWT).


Figure 2 : Configure Bitbucket Pipelines as an Identity Provider on AWS

Next, navigate to the IAM dashboard > Identity Providers > Add provider, and paste in the above info. This tells AWS that Bitbucket Pipelines is a token issuer.

Step 3: Create a custom policy

You can always use the CLI with Admin credentials but if you want to have a specific role to use the CLI, your credentials must have at least the following permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "codeguru-reviewer:ListRepositoryAssociations",
                "codeguru-reviewer:AssociateRepository",
                "codeguru-reviewer:DescribeRepositoryAssociation",
                "codeguru-reviewer:CreateCodeReview",
                "codeguru-reviewer:DescribeCodeReview",
                "codeguru-reviewer:ListRecommendations",
                "iam:CreateServiceLinkedRole"
            ],
            "Resource": "*",
            "Effect": "Allow"
        },
        {
            "Action": [
                "s3:CreateBucket",
                "s3:GetBucket*",
                "s3:List*",
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::codeguru-reviewer-cli-<AWS ACCOUNT ID>*",
                "arn:aws:s3:::codeguru-reviewer-cli-<AWS ACCOUNT ID>*/*"
            ],
            "Effect": "Allow"
        }
    ]
}

To create an IAM policy, navigate to the IAM dashboard > Policies > Create Policy

Then paste the above JSON document into the JSON tab as shown in the screenshot below, and replace <AWS ACCOUNT ID> with your own AWS account ID.


Figure 3 : Create a Policy.

Name your policy; in our example, we name it CodeGuruReviewerOIDC.


Figure 4 : Review and Create a IAM policy.
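If you would rather script this step, a minimal sketch with the AWS SDK for JavaScript could create the same policy. It assumes you saved the JSON document above (with <AWS ACCOUNT ID> already substituted) to a local file; the file name is illustrative.

// create-policy.js - sketch only; the console flow above is the documented path
const AWS = require('aws-sdk');
const fs = require('fs');
const iam = new AWS.IAM();

async function createReviewerPolicy() {
  // The permissions JSON shown earlier, saved locally with your account ID filled in
  const policyDocument = fs.readFileSync('./codeguru-reviewer-policy.json', 'utf8');
  const result = await iam.createPolicy({
    PolicyName: 'CodeGuruReviewerOIDC',
    PolicyDocument: policyDocument,
  }).promise();
  console.log('Created policy:', result.Policy.Arn);
}

createReviewerPolicy().catch(console.error);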

Step 4: Create an IAM Role

Once you’ve enabled Bitbucket Pipelines as a token issuer, you need to configure permissions for those tokens so they can execute actions on AWS.
To create an IAM web identity role, navigate to the IAM dashboard > Roles > Create Role, and choose the IdP and audience you just created.


Figure 5 : Create an IAM role

Next, select the "CodeGuruReviewerOIDC" policy to attach to the role.


Figure 6 : Assign policy to role


Figure 7 : Review and Create role

Name your role; in our example, we name it CodeGuruReviewerOIDCRole.

After adding a role, copy the Amazon Resource Name (ARN) of the role created:

The Amazon Resource Name (ARN) will look like this:

arn:aws:iam::000000000000:role/CodeGuruReviewerOIDCRole

We will need this in a later step when we create AWS_OIDC_ROLE_ARN as a repository variable.
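As a rough, optional sketch of the same step in code, the role could also be created with the AWS SDK for JavaScript. The trust policy below follows the standard web identity format; the provider path, workspace name, audience value, account ID, and condition key naming are assumptions based on the values noted in Step 2 and must be replaced with your own.

// create-role.js - sketch only; replace every placeholder with your own values
const AWS = require('aws-sdk');
const iam = new AWS.IAM();

const provider = 'api.bitbucket.org/2.0/workspaces/YOUR_WORKSPACE/pipelines-config/identity/oidc';

const trustPolicy = {
  Version: '2012-10-17',
  Statement: [
    {
      Effect: 'Allow',
      Principal: { Federated: `arn:aws:iam::000000000000:oidc-provider/${provider}` },
      Action: 'sts:AssumeRoleWithWebIdentity',
      // Condition key format is an assumption; it limits the role to tokens issued for your audience
      Condition: { StringEquals: { [`${provider}:aud`]: 'YOUR_AUDIENCE_VALUE' } },
    },
  ],
};

async function createReviewerRole() {
  await iam.createRole({
    RoleName: 'CodeGuruReviewerOIDCRole',
    AssumeRolePolicyDocument: JSON.stringify(trustPolicy),
  }).promise();

  // Attach the customer managed policy created in Step 3
  await iam.attachRolePolicy({
    RoleName: 'CodeGuruReviewerOIDCRole',
    PolicyArn: 'arn:aws:iam::000000000000:policy/CodeGuruReviewerOIDC', // replace the account ID
  }).promise();
}

createReviewerRole().catch(console.error);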

Step 5: Add repository variables needed for pipeline

Variables are configured as environment variables in the build container. You can access the variables from the bitbucket-pipelines.yml file or any script that you invoke by referring to them. Pipelines provides a set of default variables that are available for builds and can be used in scripts. Along with the default variables, we need to configure a few additional variables, called repository variables, which are used to pass special parameters to the pipeline.

Figure 8 : Create repository variables

The following repository variables need to be configured for this solution.

1. AWS_DEFAULT_REGION: Create a repository variable AWS_DEFAULT_REGION with the value "us-east-1".

2. BB_API_TOKEN: Create a new repository variable BB_API_TOKEN and paste the App password created below as the value.

App passwords are user-based access tokens for scripting tasks and integrating tools (such as CI/CD tools) with Bitbucket Cloud. These access tokens have reduced user access (specified at the time of creation) and can be useful for scripting, CI/CD tools, and testing Bitbucket-connected applications while they are in development.
To create an App password:

    • Select your avatar (Your profile and settings) from the navigation bar at the top of the screen.
    • Under Settings, select Personal settings.
    • On the sidebar, select App passwords.
    • Select Create app password.
    • Give the App password a name, usually related to the application that will use the password.
    • Select the permissions the App password needs. For detailed descriptions of each permission, see: App password permissions.
    • Select the Create button. The page will display the New app password dialog.
    • Copy the generated password and either record or paste it into the application you want to give access. The password is only displayed once and can’t be retrieved later.

3. BB_USERNAME: Create a repository variable BB_USERNAME and add your Bitbucket username as the value of this variable.

4. AWS_OIDC_ROLE_ARN

After adding a role in Step 4, copy the Amazon Resource Name (ARN) of the role created:

The Amazon Resource Name (ARN) will look something like this:

    arn:aws:iam::000000000000:role/CodeGuruReviewerOIDCRole

and create AWS_OIDC_ROLE_ARN as a repository variable in the target Bitbucket repository.

Step 6: Adding the CodeGuru Reviewer CLI to your pipeline

In order to add the CodeGuru Reviewer CLI to your pipeline, update the bitbucket-pipelines.yml file as shown below.

#  Template maven-build

 #  This template allows you to test and build your Java project with Maven.
 #  The workflow allows running tests, code checkstyle and security scans on the default branch.

 # Prerequisites: pom.xml and appropriate project structure should exist in the repository.

 image: docker-public.packages.atlassian.com/atlassian/bitbucket-pipelines-mvn-python3-awscli

 pipelines:
  default:
    - step:
        name: Build Source Code
        caches:
          - maven
        script:
          - cd $BITBUCKET_CLONE_DIR
          - chmod 777 ./gradlew
          - ./gradlew build
        artifacts:
          - build/**
    - step: 
        name: Download and Install CodeReviewer CLI   
        script:
          - curl -OL https://github.com/aws/aws-codeguru-cli/releases/download/0.2.3/aws-codeguru-cli.zip
          - unzip aws-codeguru-cli.zip
        artifacts:
          - aws-codeguru-cli/**
    - step:
        name: Run CodeGuruReviewer 
        oidc: true
        script:
          - export AWS_DEFAULT_REGION=$AWS_DEFAULT_REGION
          - export AWS_ROLE_ARN=$AWS_OIDC_ROLE_ARN
          - export S3_BUCKET=$S3_BUCKET

          # Setup aws cli
          - export AWS_WEB_IDENTITY_TOKEN_FILE=$(pwd)/web-identity-token
          - echo $BITBUCKET_STEP_OIDC_TOKEN > $(pwd)/web-identity-token
          - aws configure set web_identity_token_file "${AWS_WEB_IDENTITY_TOKEN_FILE}"
          - aws configure set role_arn "${AWS_ROLE_ARN}"
          - aws sts get-caller-identity

          # setup codegurureviewercli
          - export PATH=$PATH:./aws-codeguru-cli/bin
          - chmod 777 ./aws-codeguru-cli/bin/aws-codeguru-cli

          - export SRC=$BITBUCKET_CLONE_DIR/src
          - export OUTPUT=$BITBUCKET_CLONE_DIR/test-reports
          - export CODE_INSIGHTS=$BITBUCKET_CLONE_DIR/bb-report

          # Calling Code Reviewer CLI
          - ./aws-codeguru-cli/bin/aws-codeguru-cli --region $AWS_DEFAULT_REGION  --root-dir $BITBUCKET_CLONE_DIR --build $BITBUCKET_CLONE_DIR/build/classes/java --src $SRC --output $OUTPUT --no-prompt --bitbucket-code-insights $CODE_INSIGHTS        
        artifacts:
          - test-reports/*.* 
          - target/**
          - bb-report/**
    - step: 
        name: Upload Code Insights Artifacts to Bitbucket Reports 
        script:
          - chmod 777 upload.sh
          - ./upload.sh bb-report/report.json bb-report/annotations.json
    - step:
        name: Upload Artifacts to Bitbucket Downloads       # Optional Step
        script:
          - pipe: atlassian/bitbucket-upload-file:0.3.3
            variables:
              BITBUCKET_USERNAME: $BB_USERNAME
              BITBUCKET_APP_PASSWORD: $BB_API_TOKEN
              FILENAME: '**/*.json'
    - step:
          name: Validate Findings     #Optional Step
          script:
            # Looking into CodeReviewer results and failing if there are Critical recommendations
            - grep -o "Critical" test-reports/recommendations.json | wc -l
            - count="$(grep -o "Critical" test-reports/recommendations.json | wc -l)"
            - echo $count
            - if (( $count > 0 )); then
            - echo "Critical findings discovered. Failing."
            - exit 1
            - fi
          artifacts:
            - '**/*.json'

Let's look at the pipeline file to understand the various steps defined in this pipeline.


Figure 9 : Bitbucket pipeline execution steps

Step 1) Build Source Code

In this step, the source code is downloaded into a working directory and built using Gradle. All the build artifacts are then passed on to the next step.

Step 2) Download and Install Amazon CodeGuru Reviewer CLI
In this step, the Amazon CodeGuru Reviewer CLI is downloaded from a public GitHub repo and extracted into the working directory. All artifacts downloaded and extracted are then passed on to the next step.

Step 3) Run CodeGuruReviewer

This step uses the flag oidc: true, which declares that you are using the OIDC authentication method, while AWS_OIDC_ROLE_ARN declares the role created in the previous step that contains all of the necessary permissions to deal with AWS resources.
Further, repository variables are exported and used to set up the AWS CLI. The Amazon CodeGuru Reviewer CLI, which was downloaded and extracted in the previous step, is then used to invoke CodeGuru Reviewer with a number of parameters.

The following parameters are passed to the CodeGuru Reviewer CLI:
--region $AWS_DEFAULT_REGION   The AWS region in which CodeGuru Reviewer will run (in this blog we used us-east-1).

--root-dir $BITBUCKET_CLONE_DIR The root directory of the repository that CodeGuru Reviewer should analyze.

--build $BITBUCKET_CLONE_DIR/build/classes/java Points to the build artifacts. Passing the Java build artifacts allows CodeGuru Reviewer to perform more in-depth bytecode analysis, but passing the build artifacts is not required.

--src $SRC Points the source code that should be analyzed. This can be used to focus the analysis on certain source files, e.g., to exclude test files. This parameter is optional, but focusing on relevant code can shorten analysis time and cost.

--output $OUTPUT The directory where CodeGuru Reviewer will store its recommendations.

--no-prompt This ensures that CodeGuru Reviewer does not run in interactive mode, where it pauses for user input.

--bitbucket-code-insights $CODE_INSIGHTS The location where recommendations in Bitbucket Code Insights format should be written to.

Once Amazon CodeGuru Reviewer scans the code based on the above parameters, it generates two JSON files (report.json and annotations.json) as Code Insights reports, which are then passed on as artifacts to the next step.

Step 4) Upload Code Insights Artifacts to Bitbucket Reports
In this step, the Code Insights report generated by Amazon CodeGuru Reviewer is uploaded to Bitbucket Reports. This makes the report available in the Reports section of the pipeline, as displayed in the screenshot.


Figure 10 : CodeGuru Reviewer Report

Step 5) [Optional] Upload a copy of these reports to Bitbucket Downloads
This is an optional step where you can upload the artifacts to Bitbucket Downloads. This is especially useful because the artifacts inside a build pipeline get deleted 14 days after the pipeline run. Using Bitbucket Downloads, you can store these artifacts for a much longer duration.


Figure 11 : Bitbucket downloads

Step 6) [Optional] Validate findings by looking into the results and failing if there are any critical recommendations
This is an optional step showcasing how the results from CodeGuru Reviewer can be used to trigger the success or failure of a Bitbucket pipeline. In this step, the pipeline fails if a critical recommendation exists in the report.
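The pipeline above uses grep to count occurrences of "Critical" in recommendations.json. As a rough Node.js equivalent (a sketch that mirrors the string match rather than assuming a particular report schema), you could run something like this in the same step:

// validate-findings.js - sketch; mirrors the grep-based check used in the pipeline
const fs = require('fs');

const report = fs.readFileSync('test-reports/recommendations.json', 'utf8');
const criticalCount = (report.match(/Critical/g) || []).length;
console.log(`Critical findings: ${criticalCount}`);

if (criticalCount > 0) {
  console.error('Critical findings discovered. Failing.');
  process.exit(1); // non-zero exit fails the Bitbucket pipeline step
}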

Step 7: Review CodeGuru recommendations

CodeGuru Reviewer supports different recommendation formats, including CodeGuru recommendation summaries, SARIF, and Bitbucket CodeInsights.

Keeping your Pipeline Green

Now that CodeGuru Reviewer is running in our pipeline, we need to learn how to unblock ourselves if there are recommendations. The easiest way to unblock a pipeline is to address the CodeGuru recommendation. If we want to validate on our local machine that a change addresses a recommendation, we can run the same CLI that we use as part of our pipeline.
Sometimes it is not convenient to address a recommendation, for example because there are mitigations outside of the code that make the recommendation less relevant, or simply because the team agrees that they don't want to block deployments on recommendations unless they are critical. For these cases, developers can add a .codeguru-ignore.yml file to their repository where they can use a variety of criteria under which a recommendation should not be reported. Below we explain all available criteria to filter recommendations. Developers can use any subset of those criteria in their .codeguru-ignore.yml file. We will give a specific example in the following sections.

version: 1.0 # The version number is mandatory. All other entries are optional.

# The CodeGuru Reviewer CLI produces a recommendations.json file which contains deterministic IDs for each
# recommendation. This ID can be excluded so that this recommendation will not be reported in future runs of the
# CLI.
 ExcludeById:
 - '4d2c43618a2dac129818bef77093730e84a4e139eef3f0166334657503ecd88d'
# We can tell the CLI to exclude all recommendations below a certain severity. This can be useful in CI/CD integration.
 ExcludeBelowSeverity: 'HIGH'
# We can exclude all recommendations that have a certain tag. Available Tags can be found here:
# https://docs.aws.amazon.com/codeguru/detector-library/java/tags/
# https://docs.aws.amazon.com/codeguru/detector-library/python/tags/
 ExcludeTags:
  - 'maintainability'
# We can also exclude recommendations by Detector ID. Detector IDs can be found here:
# https://docs.aws.amazon.com/codeguru/detector-library
 ExcludeRecommendations:
# Ignore all recommendations for a given Detector ID 
  - detectorId: 'java/[email protected]'
# Ignore all recommendations for a given Detector ID in a provided set of locations.
# Locations can be written as Unix GLOB expressions using wildcard symbols.
  - detectorId: 'java/[email protected]'
    Locations:
      - 'src/main/java/com/folder01/*.java'
# Excludes all recommendations in the provided files. Files can be provided as Unix GLOB expressions.
 ExcludeFiles:
  - tst/**

The recommendations will still be reported in the CodeGuru Reviewer console, but not by the CodeGuru Reviewer CLI and thus they will not block the pipeline anymore.

Conclusion

In this post, we outlined how you can set up a CI/CD pipeline using Bitbucket Pipelines and Amazon CodeGuru Reviewer, and how you can integrate the Amazon CodeGuru Reviewer CLI with Bitbucket's cloud-based continuous delivery system that allows developers to automate builds, tests, and security checks with just a few lines of code. We showed you how to create a Bitbucket pipeline job and integrate the CodeGuru Reviewer CLI to detect issues in your Java and Python code, and how to access the recommendations for remediating these issues.

We presented an example where you can stop the build upon finding critical violations. Furthermore, we discussed how you can upload these artifacts to Bitbucket Downloads and store them for a much longer duration. The CodeGuru Reviewer CLI offers you a one-line command to scan any code on your machine and retrieve recommendations. You can use the CLI to integrate CodeGuru Reviewer into your favorite CI tool or as a pre-commit hook in your workflow. In turn, you can combine CodeGuru Reviewer with Dynamic Application Security Testing (DAST) and Software Composition Analysis (SCA) tools to achieve a hybrid application security testing method that helps you combine the inside-out and outside-in testing approaches, cross-reference results, and detect vulnerabilities that both exist and are exploitable.

If you need hands-on keyboard support, then AWS Professional Services can help implement this solution in your enterprise, and introduce you to our AWS DevOps services and offerings.

About the authors:


Bineesh Ravindran

Bineesh is a Solutions Architect at Amazon Web Services (AWS) who is passionate about technology and loves to help customers solve problems. Bineesh has over 20 years of experience in designing and implementing enterprise applications. He works with AWS partners and customers to provide them with architectural guidance for building scalable architectures and executing strategies to drive adoption of AWS services. When he's not working, he enjoys biking, aquascaping, and playing badminton.


Martin Schaef

Martin Schaef has been an Applied Scientist on the AWS CodeGuru team since 2017. Prior to that, he worked at SRI International in Menlo Park, CA, and at the United Nations University in Macau. He received his PhD from the University of Freiburg in 2011.

10 ways to build applications faster with Amazon CodeWhisperer

Post Syndicated from Kris Schultz original https://aws.amazon.com/blogs/devops/10-ways-to-build-applications-faster-with-amazon-codewhisperer/

Amazon CodeWhisperer is a powerful generative AI tool that gives me coding superpowers. Ever since I have incorporated CodeWhisperer into my workflow, I have become faster, smarter, and even more delighted when building applications. However, learning to use any generative AI tool effectively requires a beginner’s mindset and a willingness to embrace new ways of working.

Best practices for tapping into CodeWhisperer's power are still emerging. But, as an early explorer, I've discovered several techniques that have allowed me to get the most out of this amazing tool. In this article, I'm excited to share these techniques with you, using practical examples to illustrate just how CodeWhisperer can enhance your programming workflow. I'll explore these techniques in the sections that follow.

Before we begin

If you would like to try these techniques for yourself, you will need to use a code editor with the AWS Toolkit extension installed. VS Code, AWS Cloud9, and most editors from JetBrains will work. Refer to the CodeWhisperer “Getting Started” resources for setup instructions.

CodeWhisperer will present suggestions automatically as you type. If you aren’t presented with a suggestion, you can always manually trigger a suggestion using the Option + C (Mac) or Alt + C (Windows) shortcut. CodeWhisperer will also sometimes present you with multiple suggestions to choose from. You can press the → and ← keys to cycle through all available suggestions.

The suggestions CodeWhisperer offers are non-deterministic, which means you may receive slightly different suggestions than the ones shown in this article. If you receive a suggestion that doesn’t suit your needs, try cycling through all available suggestions using the arrow keys or try rephrasing any code comment prompts slightly. Even punctuation changes can produce fresh suggestions.

The examples in this article are written in JavaScript and TypeScript. However, the techniques I’ll demonstrate are equally suited to many of the programming languages supported by CodeWhisperer. Now, onto the examples!

Typing less

At the most basic level, CodeWhisperer can be used as a supercharged auto-complete, significantly reducing your typing when writing individual lines or blocks of code. To use CodeWhisperer in this way, just write code as you normally would. CodeWhisperer will make automatic suggestions as you type. Accept suggestions when they’re helpful, and ignore suggestions when they’re not. In my experience, using CodeWhisperer in this way alone reduces my typing by 50% or more.

Generating functions

Automatically generating function bodies from signatures

Skilled programmers use clear, descriptive names for their functions and parameters. If you follow this practice in your own programming, CodeWhisperer will have all the context it needs to implement many functions for you.

After I enter the function signature below, CodeWhisperer generates the function body for me.

User input (JavaScript):

function standardDeviation(numbers)

Result:

function standardDeviation(numbers) {
  let mean = numbers.reduce((a, b) => a + b) / numbers.length;
  let variance =
    numbers.reduce((a, b) => a + Math.pow(b - mean, 2)) / numbers.length;
  return Math.sqrt(variance);
}

Generating functions from comments

Sometimes a good function signature alone isn’t enough to convey the desired behavior of a function. In those cases, you can use a detailed code comment as a prompt to CodeWhisperer. I recommend using your language’s doc comment delimiter when using this technique.

User Input (JavaScript):

/**
 * Given a list that contains some numbers and strings, format them into a string
 * in which the numbers are prepended with "#" and the strings are wrapped in
 * double quotes.
 */

Result:

/**
 * Given a list that contains some numbers and strings, format them into a string
 * in which the numbers are prepended with "#" and the strings are wrapped in
 * double quotes.
 */
function formatList(list) {
  return list
    .map((item) => {
      if (typeof item === "number") {
        return `#${item}`;
      } else {
        return `"${item}"`;
      }
    })
    .join(", ");
}

Generating classes

You can generate full or partial class implementations from a single code comment. The comment should describe the key properties and behaviors of the class. Simple descriptions are often all that is required.

User input (TypeScript):

/**
 * A folder has a name, children, and an optional parent. A child can't be added
 * twice.
 */

Result:

After I write the above block comment, CodeWhisperer provides multiple suggestions, including the following implementation, complete with documentation comments for each method!

/**
 * A folder has a name, children, and an optional parent. A child can't be added
 * twice.
 */
export class Folder {
  name: string;
  children: Folder[];
  parent?: Folder;

  constructor(name: string, parent?: Folder) {
    this.name = name;
    this.children = [];
    this.parent = parent;
  }

  /**
   * Add a child to the folder.
   * @param child The child to add.
   * @returns True if the child was added, false if it was already present.
   */
  addChild(child: Folder): boolean {
    if (this.children.includes(child)) {
      return false;
    }
    this.children.push(child);
    child.parent = this;
    return true;
  }

  /**
   * Remove a child from the folder.
   * @param child The child to remove.
   * @returns True if the child was removed, false if it wasn't present.
   */
  removeChild(child: Folder): boolean {
    const index = this.children.indexOf(child);
    if (index === -1) {
      return false;
    }
    this.children.splice(index, 1);
    child.parent = undefined;
    return true;
  }
}

Implementing algorithms

CodeWhisperer enables you to implement well-known algorithms with ease. Simply tell CodeWhisperer which algorithm you want to apply to a particular coding problem, and it will oblige.

User input (JavaScript):

const actors = [ ... ];

// Sort the actors by age using the merge sort algorithm.

Result:

const actors = [ ... ];

// Sort the actors by age using the merge sort algorithm.
function mergeSort(arr) {
  if (arr.length < 2) {
    return arr;
  }

  const middle = Math.floor(arr.length / 2);
  const left = arr.slice(0, middle);
  const right = arr.slice(middle);

  return merge(mergeSort(left), mergeSort(right));
}

function merge(left, right) {
  const result = [];

  while (left.length && right.length) {
    if (left[0].age < right[0].age) {
      result.push(left.shift());
    } else {
      result.push(right.shift());
    }
  }

  while (left.length) {
    result.push(left.shift());
  }

  while (right.length) {
    result.push(right.shift());
  }

  return result;
}

console.log(mergeSort(actors));

Writing unit tests

CodeWhisperer accelerates the implementation of unit tests. Not only is it able to implement the logic for most of the explicit unit tests I ask for, but it often suggests unit tests for edge cases that hadn’t even occurred to me.

When I want CodeWhisperer to implement a unit test, I start with a declaration of the condition I’m testing and the expected result as in the following example.

User input (JavaScript):

describe("Folder.addChild()", () => {
  it("when adding an existing folder should not add duplicate"

Result:

describe("Folder.addChild()", () => {
  it("when adding an existing folder should not add duplicate", () => {
    const folder = new Folder("root");
    const child = new Folder("child");
    folder.addChild(child);
    folder.addChild(child);
    expect(folder.children.length).toBe(1);
  });
});

After you’ve written a test or two, CodeWhisperer is likely to suggest more tests, even without explicit prompting.

Tip: You’ll notice that my test description includes 1) the function I’m testing, 2) the condition I’m testing, and 3) what I expect the result to be. Following this three-part pattern is a great way to ensure CodeWhisperer has the context it needs to implement high quality test logic.

Creating sample data

One of the delightful discoveries I’ve made is that CodeWhisperer can generate dummy data—or really any code that has a repeating pattern. As an illustration, for the following list I entered only the first “user” object as an example, and CodeWhisperer helpfully continued the pattern.

User input (JavaScript):

const fakeUsers = [
  { name: "User 1", id: "user1" },

Result:

const fakeUsers = [
  { name: "User 1", id: "user1" },
  { name: "User 2", id: "user2" },
  { name: "User 3", id: "user3" },  
  { name: "User 4", id: "user4" },
  { name: "User 5", id: "user5" },
  { name: "User 6", id: "user6" }
]

It even works for generating random but plausible values like the names seen below.

User input (JavaScript):

const fakeUsers = [
  { name: "Sally", id: "user1" },

Result:

const fakeUsers = [
  { name: "Sally", id: "user1" },
  { name: "Bob", id: "user2" },
  { name: "Joe", id: "user3" },
  { name: "Jane", id: "user4" },
  { name: "Jill", id: "user5" },
  { name: "Jack", id: "user6" }
]

Simplifying regular expressions

While CodeWhisperer can’t translate natural language into custom regular expressions (yet), it does know a lot of commonly used regular expressions. This saves you time by putting these regular expressions at your fingertips whenever you need them.

Let’s look at one notoriously complex but common regular expression.

User input (JavaScript):

/**
 * Function that validates whether a string is a valid email address.
 */

Result:

/**
 * Function that validates whether a string is a valid email address.
 */
function validateEmail(email: string): boolean {
  const re =
    /^(([^<>()[\]\\.,;:\s@"]+(\.[^<>()[\]\\.,;:\s@"]+)*)|(".+"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/;
  return re.test(String(email).toLowerCase());
}

Learning third-party code libraries faster

CodeWhisperer helps accelerate my learning when I’m using an unfamiliar library, module, or framework. In the example below, I needed to write a utility function that would take a gzip compressed value coming from an API and translate it into a value that the rest of my code could understand. I knew the “pako” library would help me, but I had no idea how to use it. I just prompted CodeWhisperer with an import statement and a comment describing my goal.

import pako from "pako";

/**
 * Accepts a base64-encoded, gzip-compressed JSON string and returns
 * a parsed JSON object.
 */

Result:

import pako from "pako";

/**
 * Accepts a base64-encoded, gzip-compressed JSON string and returns
 * a parsed JSON object.
 */
export function decodeGzipJson(base64String) {
  const compressed = atob(base64String);
  const uncompressed = pako.inflate(compressed, { to: "string" });
  return JSON.parse(uncompressed);
}

Documenting code

CodeWhisperer is capable of generating docstrings and comments for the code it generates, as well as for your existing code. For example, let’s say I want CodeWhisperer to document the matches() method of this FavoritesFilter TypeScript class I’ve implemented (I’ve omitted some implementation details for brevity).

class FavoritesFilter implements IAssetFilter {
  ...
  matches(asset: Asset): boolean {
    ...
  }
}

I can just type a doc comment delimiter (/** */) immediately above the method name and CodeWhisperer will generate the body of the doc comment for me.

Note: When using CodeWhisperer in this way you may have to manually trigger a suggestion using Option + C (Mac) or Alt + C (Windows).

class FavoritesFilter implements IAssetFilter {
  ...
  /**
   * Determines whether the asset matches the filter.
   */
  matches(asset: Asset): boolean {
    ...
  }
}

Conclusion

I hope the techniques above inspire ideas for how CodeWhisperer can make you a more productive coder. Install CodeWhisperer today to start using these time-saving techniques in your own projects. These examples only scratch the surface. As additional creative minds start applying CodeWhisperer to their daily workflows, I’m sure new techniques and best practices will continue to emerge. If you discover a novel approach that you find useful, post a comment to share what you’ve discovered. Perhaps your technique will make it into a future article and help others in the CodeWhisperer community enhance their superpowers.


Kris Schultz (he/him)

Kris Schultz has spent over 25 years bringing engaging user experiences to life by combining emerging technologies with world class design. In his role as 3D Specialist Solutions Architect, Kris helps customers leverage AWS services to power 3D applications of all sorts.

Monitoring Amazon DevOps Guru insights using Amazon Managed Grafana

Post Syndicated from MJ Kubba original https://aws.amazon.com/blogs/devops/monitoring-amazon-devops-guru-insights-using-amazon-managed-grafana/

As organizations operate day-to-day, having insights into their cloud infrastructure state can be crucial for the durability and availability of their systems. Industry research estimates[1] that downtime costs small businesses around $427 per minute of downtime, and medium to large businesses an average of $9,000 per minute of downtime. Amazon DevOps Guru customers want to monitor and generate alerts using a single dashboard. This allows them to reduce context switching between applications, providing them an opportunity to respond to operational issues faster.

DevOps Guru can integrate with Amazon Managed Grafana to create and display operational insights. Alerts can be created and communicated for any critical events captured by DevOps Guru, and notifications can be sent to operations teams to respond to these events. The key telemetry data types of logs and metrics are parsed and filtered to provide the necessary insights into observability.

Furthermore, it provides plug-ins to popular open-source databases, third-party ISV monitoring tools, and other cloud services. With Amazon Managed Grafana, you can easily visualize information from multiple AWS services, AWS accounts, and Regions in a single Grafana dashboard.

In this post, we will walk you through integrating the insights generated from DevOps Guru with Amazon Managed Grafana.

Solution Overview:

This architecture diagram shows the flow of the logs and metrics that will be utilized by Amazon Managed Grafana. Insights originate from DevOps Guru, each insight generating an event. These events are captured by Amazon EventBridge, and then saved as logs to Amazon CloudWatch Log Group DevOps Guru service metrics, and then parsed by Amazon Managed Grafana to create new dashboards.


Now we will walk you through how to do this and set up notifications to your operations team.

Prerequisites:

The following prerequisites are required for this walkthrough:

  • An AWS Account
  • DevOps Guru enabled on your account, monitoring either a CloudFormation stack or tagged resources.

Using Amazon CloudWatch Metrics

 

DevOps Guru sends service metrics to CloudWatch Metrics. We will use these to track metrics for insights and metrics for your DevOps Guru usage; the DevOps Guru service reports the metrics to the AWS/DevOps-Guru namespace in CloudWatch by default.

First, we will provision an Amazon Managed Grafana workspace and then create a Dashboard in the workspace that uses Amazon CloudWatch as a data source.

Setting up Amazon CloudWatch Metrics

  1. Create Grafana Workspace
    Navigate to Amazon Managed Grafana from AWS console, then click Create workspace

a. Select the Authentication mechanism

i. AWS IAM Identity Center (AWS SSO) or SAML v2 based Identity Providers

ii. Service Managed Permission or Customer Managed

iii. Choose Next

b. Under “Data sources and notification channels”, choose Amazon CloudWatch

c. Create the Service.

You can use this post for more information on how to create and configure the Grafana workspace with SAML based authentication.

Next, we will show you how to create a dashboard and parse the Logs and Metrics to display the DevOps Guru insights and recommendations.

2. Configure Amazon Managed Grafana

a. Add CloudWatch as a data source:
From the left bar navigation menu, hover over AWS and select Data sources.

b. From the Services dropdown select and configure CloudWatch.

3. Create a Dashboard

a. From the left navigation bar, click on add a new Panel.

b. You will see a demo panel.

c. In the demo panel – Click on Data source and select Amazon CloudWatch.

The Amazon Grafana Workspace dashboard with the Grafana data source dropdown menu open. The drop down has 'Amazon CloudWatch (region name)' highlighted, other options include 'Mixed, 'Dashboard', and 'Grafana'.

d. For this panel we will use CloudWatch metrics to display the number of insights.

e. For Namespace, select AWS/DevOps-Guru; for Metric name, select Insights; and for Statistic, select Average. (A quick SDK sanity check is sketched after these steps.)

In the Amazon Grafana Workspace dashboard the user has entered values in three fields. "Grafana Query with Namespace" has the chosen value: AWS/DevOps-Guru. "Metric name" has the chosen value: Insights. "Statistic" has the chosen value: Average.

Click Apply.

Time series graph contains a single new data point, indicating a recent event.

f. This is our first panel. We can change the panel name from the right-side bar under Title. We will name this panel "Insights".

g. From the top right menu, click Save dashboard and give your new dashboard a name.
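As a quick, optional sanity check (referenced in step e above), you can confirm from code that the AWS/DevOps-Guru namespace is reporting the Insights metric before wiring it into a panel. The sketch below uses the AWS SDK for JavaScript; depending on how the metric is dimensioned in your account, you may need to add Dimensions to the request.

// check-insights-metric.js - optional sketch, not part of the console walkthrough
const AWS = require('aws-sdk');
const cloudwatch = new AWS.CloudWatch({ region: 'us-east-1' });

cloudwatch.getMetricStatistics({
  Namespace: 'AWS/DevOps-Guru',
  MetricName: 'Insights',
  StartTime: new Date(Date.now() - 24 * 60 * 60 * 1000), // last 24 hours
  EndTime: new Date(),
  Period: 3600,            // one datapoint per hour
  Statistics: ['Average'], // the statistic chosen in the Grafana panel
}, (err, data) => {
  if (err) console.error(err);
  else console.log(data.Datapoints);
});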

Using Amazon CloudWatch Logs via Amazon EventBridge

For other insights outside of the service metrics, such as a number of insights per specific service or the average for a region or for a specific AWS account, we will need to parse the event logs. These logs first need to be sent to Amazon CloudWatch Logs. We will go over the details on how to set this up and how we can parse these logs in Amazon Managed Grafana using CloudWatch Logs Query Syntax. In this post, we will show a couple of examples. For more details, please check out this User Guide documentation. This is not done by default and we will need to use Amazon EventBridge to pass these logs to CloudWatch.

DevOps Guru logs include other details that can be helpful when building dashboards, such as region, insight severity (High, Medium, or Low), associated resources, and the DevOps Guru dashboard URL, among other things. For more information, please check out this User Guide documentation.

EventBridge offers a serverless event bus that helps you receive, filter, transform, route, and deliver events. It provides one to many messaging solutions to support decoupled architectures, and it is easy to integrate with AWS Services and 3rd-party tools. Using Amazon EventBridge with DevOps Guru provides a solution that is easy to extend to create a ticketing system through integrations with ServiceNow, Jira, and other tools. It also makes it easy to set up alert systems through integrations with PagerDuty, Slack, and more.
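For reference, the same rule and target that we create in the console steps below could also be set up programmatically. The following sketch uses the AWS SDK for JavaScript with the event pattern shown later in this walkthrough; the rule name, target ID, region, account ID, and log group ARN are placeholders and assumptions to replace with your own values.

// create-devopsguru-rule.js - sketch of the EventBridge rule and CloudWatch Logs target
const AWS = require('aws-sdk');
const events = new AWS.EventBridge({ region: 'us-east-1' });

async function createDevOpsGuruRule() {
  // Match every event emitted by DevOps Guru (same pattern used in the console steps)
  await events.putRule({
    Name: 'devops-guru-to-cloudwatch-logs',
    EventPattern: JSON.stringify({ source: ['aws.devops-guru'] }),
  }).promise();

  // Deliver matched events to the log group used in this walkthrough
  await events.putTargets({
    Rule: 'devops-guru-to-cloudwatch-logs',
    Targets: [
      {
        Id: 'devops-guru-log-group',
        Arn: 'arn:aws:logs:us-east-1:111122223333:log-group:/aws/events/devops-guru', // placeholder ARN
      },
    ],
  }).promise();
}

createDevOpsGuruRule().catch(console.error);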

 

Setting up Amazon CloudWatch Logs

  1. Let's dive into creating the EventBridge rule and enhance our Grafana dashboard:

a. First head to Amazon EventBridge in the AWS console.

b. Click Create rule.

     Type in the rule Name and Description. You can leave the Event bus at the default and the Rule type as Rule with an event pattern.

c. Select AWS events or EventBridge partner events.

    For the Event pattern, change to Custom pattern (JSON editor) and use:

{"source": ["aws.devops-guru"]}

This filters for all events generated from DevOps Guru. You can use the same mechanism to filter specific messages, such as new insights or closed insights, to a different channel. For this demonstration, let's extract all events.

As the user configures their EventBridge Rule, for the Creation method they have chosen "Custom pattern (JSON editor) write an event pattern in JSON." For the Event pattern editor just below they have entered {"source":["aws.devops-guru"]}

d. Next, for Target, select AWS service.

    Then use CloudWatch log Group.

    For the Log Group, give your group a name, such as “devops-guru”.

In the prompt for the new Target's configurations, the user has chosen AWS service as the Target type. For the Select a target drop down, they chose CloudWatch log Group. For the log group, they selected the /aws/events radio option, and then filled in the following input text box with the kebab case group name devops-guru.

e. Click Create rule.

f. Navigate back to Amazon Managed Grafana.
It's time to add a couple more panels to our dashboard. Click Add panel.
    Then select Amazon CloudWatch, change from metrics to CloudWatch Logs, and select the Log Group we created previously.

In the Grafana Workspace, the user has "Data source" selected as Amazon CloudWatch us-east-1. Underneath that they have chosen to use the default region and CloudWatch Logs. Below that, for the Log Groups they have entered /aws/events/DevOpsGuru

g. For the query use the following to get the number of closed insights:

fields @detail.messageType
| filter detail.messageType="CLOSED_INSIGHT"
| count(detail.messageType)

You’ll see the new dashboard get updated with “Data is missing a time field”.

New panel suggestion with switch to table or open visualization suggestions

You can either open the suggestions and select a gauge that makes sense;

New Suggestions display a dial graph, a bar graph, and a count numerical tracker

Or choose from multiple visualization options.

Now we have 2 panels:

Two panels are shown, one is the new dial graph, and the other is the time series graph that was created earlier.

h. You can repeat the same process to create a third panel for the new insights using this query:

fields @detail.messageType 
| filter detail.messageType="NEW_INSIGHT" 
| count(detail.messageType)

Now we have 3 panels:

Grafana now shows three 3 panels. Two dial graphs, and the time series graph.

Next, depending on the visualizations, you can work with the Logs and metrics data types to parse and filter the data.

Setting up a 4th panel as table. Under the Query tab, in the query editor, the user has entered the text: fields detail.messageType, detail.insightSeverity, detail.insightUrlfilter | filter detail.messageType="CLOSED_INSIGHT" or detail.messageType="NEW_INSIGHT"

i. For our fourth panel, we will add a direct link to the DevOps Guru dashboard in the AWS console.

Repeat the same process as demonstrated previously one more time with this query:

fields detail.messageType, detail.insightSeverity, detail.insightUrlfilter 
| filter detail.messageType="CLOSED_INSIGHT" or detail.messageType="NEW_INSIGHT"                       

                        Switch to table when prompted on the panel.

Grafana now shows 4 panels. The new panel displays a data table that contains information about the most recent DevOps Guru insights. There are also the two dial graphs, and the time series graph from before.

This will give us a direct link to the DevOps Guru dashboard and help us get to the insight details and Recommendations.


Save your dashboard.

  1. You can extend observability by sending notifications through alerts on dashboard panels that display metrics. The alerts are triggered when a condition is met and are communicated with the Amazon SNS notification mechanism. This is our SNS notification channel setup; a topic-creation sketch follows below.

Screenshot: notification settings show Name: DevopsGuruAlertsFromGrafana and Type: SNS

A previously created notification is used next to communicate any alerts when the condition is met across the metrics being observed.

Screenshot: notification setting with condition when count of query is above 5, a notification is sent to DevopsGuruAlertsFromGrafana with message, "More than 5 insights in the past 1 hour"
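If you have not already created the SNS notification channel, a minimal sketch with the AWS SDK for JavaScript could create the topic and an email subscription. The topic name matches the screenshot above; the email endpoint is a placeholder.

// create-alert-topic.js - sketch; confirm the email subscription from the message you receive
const AWS = require('aws-sdk');
const sns = new AWS.SNS({ region: 'us-east-1' });

async function createAlertChannel() {
  const topic = await sns.createTopic({ Name: 'DevopsGuruAlertsFromGrafana' }).promise();
  await sns.subscribe({
    TopicArn: topic.TopicArn,
    Protocol: 'email',
    Endpoint: 'ops-team@example.com', // placeholder
  }).promise();
  console.log('Topic ready:', topic.TopicArn);
}

createAlertChannel().catch(console.error);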

Cleanup

To avoid incurring future charges, delete the resources.

  • Navigate to EventBridge in AWS console and delete the rule created in step 4 (a-e) “devops-guru”.
  • Navigate to CloudWatch logs in AWS console and delete the log group created as results of step 4 (a-e) named “devops-guru”.
  • Amazon Managed Grafana: Navigate to Amazon Managed Grafana service and delete the Grafana services you created in step 1.

Conclusion

In this post, we have demonstrated how to incorporate Amazon DevOps Guru insights into Amazon Managed Grafana and use Grafana as the observability tool. This will allow operations teams to observe the state of their AWS resources and be notified through alarms when preset thresholds on DevOps Guru metrics and logs are crossed. You can expand on this to create other panels and dashboards specific to your needs. If you don't have DevOps Guru, you can start monitoring your AWS applications with AWS DevOps Guru today using this link.


About the authors:

MJ Kubba

MJ Kubba is a Solutions Architect who enjoys working with public sector customers to build solutions that meet their business needs. MJ has over 15 years of experience designing and implementing software solutions. He has a keen passion for DevOps and cultural transformation.

David Ernst

David is a Sr. Specialist Solution Architect – DevOps, with 20+ years of experience in designing and implementing software solutions for various industries. David is an automation enthusiast and works with AWS customers to design, deploy, and manage their AWS workloads/architectures.

Sofia Kendall

Sofia Kendall is a Solutions Architect who helps small and medium businesses achieve their goals as they utilize the cloud. Sofia has a background in Software Engineering and enjoys working to make systems reliable, efficient, and scalable.

3 benefits of migrating and consolidating your source code

Post Syndicated from Mark Paulsen original https://github.blog/2023-04-14-3-benefits-of-migrating-and-consolidating-your-source-code/

In a previous blog on consolidating your toolkit, we shared strategies to help you simplify your tech stack, which ultimately helps developers be more productive. In fact, developers are almost 60% more likely to feel equipped to do their job when they can easily find what they need.

But there are other benefits of consolidating and simplifying your toolkit that may be surprising–especially when migrating your source code and collaboration history to GitHub.

Today, we’ll explore three benefits that will support enterprises in a business climate where everyone is being asked to do more with less, as well as some resources to help get started on the migration journey.

1. Enable developer self-service culture

Some of the benefits enterprises can achieve with DevOps are improved productivity, security, and collaboration. Silos should be broken down and traditionally separated teams should be working in a cohesive and cloud native way.

Another benefit that DevOps enables, which is a key part of the Platform Engineering technology approach, is the ability for development teams to self-service workflows, processes, and controls which traditionally have either been manual, or tightly-coupled with other teams. A great example of this was covered in a previous blog where we described how to build consistent and shared IaC workflows. IaC workflows can be created by operations teams, if your enterprise separation of duties governance policies require this, but self-serviced when needed by development teams.

But this type of consistent, managed, and governable, self-service culture would not be possible if you have multiple source code management tools in your enterprise. If development teams have to spend time figuring out which tool has the source of truth for the workflow they need to execute, the benefits of DevOps and Platform Engineering quickly deteriorate.

There is no better place to migrate the core of your self-service culture to than GitHub–which is the home to 100 million developers and counting. Your source code management tool should be an enabler for developer productivity and happiness or else they will be reluctant to use it. And if they don’t use it, you won’t have a self-service culture within your enterprise.

2. Save time and money during audits

The latest Forrester report on the economic impact of GitHub Enterprise Cloud and GitHub Advanced Security determined a 75% improvement in time spent managing tools and code infrastructure. But one of the potentially surprising benefits is related to implementing DevOps and cloud native processes that help both developers and auditors save time and money.

If your tech stack includes multiple source code tools and other development tools which may not be integrated or have overlapping capabilities, each time your security, compliance, and audit teams need to review the source of truth for your delivery artifacts, you will need to gather artifacts and set up walkthroughs for each of the tools. This can lead to days and even weeks of lost time and money on simply preparing and executing audits–taking your delivery teams away from creating business value.

Working with GitHub customers, Forrester identified and quantified key benefits of investing in GitHub Enterprise Cloud and GitHub Advanced Security. The corresponding GitHub Ent ROI Estimate Calculator includes factors for time saving on IT Audit preparations related to the number of non-development security or audit staff involved in software development. This itself can lead to hundreds of thousands if not millions of dollars of time savings.

What is not factored into the calculator is the potential time savings for development teams who have a single source of truth for their code and collaboration history. A simplified and centrally auditable tech stack with a single developer-friendly core source code management platform will enable consistent productivity even during traditionally time-consuming audit and compliance reviews–for both developers and non-developers.

3. Keep up with innovation

If you are using another source code platform besides GitHub, or if GitHub is one of several tools that are providing the overlapping functionality, some of your teams may be missing out on the amazing innovations that have been happening lately.

Generative AI is enabling some amazing capabilities and GitHub is at the forefront with our AI pair-programmer, GitHub Copilot. The improvements to developer productivity are truly amazing and continue to improve.

A graphic showing how many developers and companies have already used GitHub Copilot and how it's helping improve productivity and happiness.

GitHub continues to innovate with the news about GitHub Copilot X, which is not only adopting OpenAI’s new GPT-4 model, but introducing chat and voice for GitHub Copilot, and bringing GitHub Copilot to pull requests, the command line, and docs to answer questions on your projects.

Innovations like this need to be rolled out in a controlled and governable manner within many enterprises. But if your tech stack is overly complex and you have several source code management tools, the roll-out may take a long time or may be stalled while security and compliance reviews take place.

However, if your development core is GitHub, security and compliance reviews can happen once, on a centrally managed platform that is well understood and secure. And you’ll be front row for all of the amazing new innovations that GitHub will be releasing down the road.

Get started today

If you are planning on migrating your source code and collaboration history to GitHub and have questions, thankfully, many other enterprises have done this already with great success and there are resources to help you through the process.

Visit our GitHub Enterprise Importer documentation for details on supported migration paths, guides for different migration sources, and more.

If you want to learn more about how GitHub can benefit your business, while increasing developer velocity and collaboration, see how GitHub Enterprise can help.

Enabling DevSecOps with Amazon CodeCatalyst

Post Syndicated from Imtranur Rahman original https://aws.amazon.com/blogs/devops/enabling-devsecops-with-amazon-codecatalyst/

DevSecOps is the practice of integrating security testing at every stage of the software development process. Amazon CodeCatalyst includes tools that encourage collaboration between developers, security specialists, and operations teams to build software that is both efficient and secure. DevSecOps brings cultural transformation that makes security a shared responsibility for everyone who is building the software.

Introduction

In a prior post in this series, Maintaining Code Quality with Amazon CodeCatalyst Reports, I discussed how developers can quickly configure test cases, run unit tests, set up code coverage, and generate reports using CodeCatalyst’s workflow actions. This was done through the lens of Maxine, the main character of Gene Kim’s The Unicorn Project. In the story, Maxine meets Purna – the QA and Release Manager and Shannon – a Security Engineer. Everyone has the same common goal to integrate security into every stage of the Software Development Lifecycle (SDLC) to ensure secure code deployments. The issue Maxine faces is that security testing is not automated and the separation of responsibilities by role leads to project stagnation.

In this post, I will focus on how DevSecOps teams can use Amazon CodeCatalyst to easily integrate and automate security using CodeCatalyst workflows. I’ll start by checking for vulnerabilities using OWASP dependency checker and Mend SCA. Then, I’ll conduct Static Analysis (SA) of source code using Pylint. I will also outline how DevSecOps teams can influence the outcome of a build by defining success criteria for Software Composition Analysis (SCA) and Static Analysis actions in the workflow. Last, I’ll show you how to gain insights from CodeCatalyst reports and surface potential issues to development teams through CodeCatalyst Issues for faster remediation.

Prerequisites

If you would like to follow along with this walkthrough, you will need to:

Walkthrough

To follow along, you can re-use a project you created previously, or you can refer to a previous post that walks through creating a project using the Modern Three-tier Web Application blueprint. Blueprints provide sample code and CI/CD workflows to help you get started easily across different combinations of programming languages and architectures. The back-end code for this project is written in Python and the front-end code is written in JavaScript.

Modern Three-tier Web Application architecture including a presentation, application and data layer

Figure 1. Modern Three-tier Web Application architecture including a presentation, application and data layer

Once the project is deployed, CodeCatalyst opens the project overview. Select CI/CD → Workflows → ApplicationDeploymentPipeline to view the current workflow.

Six step Workflow described in the prior paragraph

Figure 2. ApplicationDeploymentPipeline

Modern applications use a wide array of open-source dependencies to speed up feature development, but sometimes these dependencies have unknown exploits within them. As a DevSecOps engineer, I can easily edit this workflow to scan for those vulnerable dependencies to ensure I’m delivering secure code.

Software Composition Analysis (SCA)

Software composition analysis (SCA) is a practice in the fields of information technology and software engineering for analyzing custom-built software applications to detect embedded open-source software and determine whether it is up-to-date, contains security flaws, or has licensing requirements. For this walkthrough, I’ll highlight two SCA methods: OWASP Dependency-Check and Mend SCA.

Note that developers can replace either of these with a tool of their choice so long as that tool outputs an SCA report format supported by CodeCatalyst.

Software Composition Analysis using OWASP Dependency Checker

To get started, I select Edit at the top-right of the workflows tab. By default, CodeCatalyst opens the YAML tab. I change to the Visual tab to visually edit the workflow and add a CodeCatalyst Action by selecting “+Actions” (1) and then “+” (2). Next select the Configuration (3) tab and edit the Action Name (4). Make sure to select the check mark after you’re done.

New action configuration showing steps to add a build action

Figure 3. New Action Initial Configuration

Scroll down in the Configuration tab to Shell commands. Here, copy and paste the following command snippets, which run when the action is invoked.

#Set Source Repo Directory to variable
- Run: sourceRepositoryDirectory=$(pwd)
#Install Node Dependencies
- Run: cd web && npm install
#Install known vulnerable dependency (This is for Demonstrative Purposes Only)
- Run: npm install [email protected]
#Go to parent directory and download OWASP dependency-check CLI tool
- Run: cd .. && wget https://github.com/jeremylong/DependencyCheck/releases/download/v8.1.2/dependency-check-8.1.2-release.zip
#Unzip file
- Run: unzip dependency-check-8.1.2-release.zip
#Navigate to dependency-check script location
- Run: cd dependency-check/bin
#Execute dependency-check shell script. Outputs in SARIF format
- Run: ./dependency-check.sh --scan $sourceRepositoryDirectory/web -o $sourceRepositoryDirectory/web/vulnerabilities -f SARIF --disableYarnAudit

These commands will install the node dependencies, download the OWASP dependency-check tool, and run it to generate findings in a SARIF file. Note the third command, which installs a module with known vulnerabilities (This is for demonstrative purposes only).

On the Outputs (1) tab, I change the Report prefix (2) to owasp-frontend. Then I set the Success criteria (3) for Vulnerabilities to 0 – Critical (4). This configuration will stop the workflow if any critical vulnerabilities are found.

Report configuration showing SCA configuration

Figure 4: owasp-dependency-check-frontend

It is a best practice to scan for vulnerable dependencies before deploying resources so I’ll set my owasp-dependency-check-frontend action as the first step in the workflow. Otherwise, I might accidentally deploy vulnerable code. To do this, I select the Build (1) action group and set the Depends on (2) dropdown to my owasp-dependency-check-frontend action. Now, my action will run before any resources are built and deployed to my AWS environment. To save my changes and run the workflow, I select Commit (3) and provide a commit message.

Setting OWASP as the First Action

Figure 5: Setting OWASP as the First Workflow Action

Amazon CodeCatalyst shows me the state of the workflow run in real-time. After the workflow completes, I see that the action has entered a failed state. If I were a QA Manager like Purna from the Unicorn Project, I would want to see why the action failed. On the left-hand navigation bar, I select Reports, and then open owasp-frontend-web/vulnerabilities/dependency-check-report.sarif for more details.

SCA report showing 1 critical and 7 medium findings

Figure 6: SCA Report Overview

This report view provides metadata such as the workflow name, run ID, action name, repository, and the commit ID. I can also see the report status, a bar graph of vulnerabilities grouped by severity, the number of libraries scanned, and a Findings panel. I had set the success criteria for this report to 0 – Critical so it failed because 1 Critical vulnerability was found. If I select a specific finding ID, I can learn more about that specific finding and even view it on the National Vulnerability Database website.

Dialog showing CVE details for the critical vulnerability

Figure 7: Critical Vulnerability CVE Finding

Now I can raise this issue with the development team through the Issues board on the left-hand navigation panel. See this previous post to learn more about how teams can collaborate in CodeCatalyst.

Note: Remove the [email protected] install command from the owasp-dependency-check-frontend action’s list of commands to allow the workflow to proceed and finish successfully.

Software Composition Analysis using Mend

Mend, formerly known as WhiteSource, is an application security company built to secure today’s digital world. Mend secures all aspects of software, providing automated remediation, prevention, and protection from problem to solution versus only detection and suggested fixes. Find more information about Mend here.

Mend Software Composition Analysis (SCA) can be run as an action within Amazon CodeCatalyst CI/CD workflows, making it easy for developers to perform open-source software vulnerability detection when building and deploying their software projects. This makes it easier for development teams to quickly build and deliver secure applications on AWS.

Getting started with CodeCatalyst and Mend is very easy. After logging in to my Mend Account, I need to create a new Mend Product named Amazon-CodeCatalyst and a Project named mythical-misfits.

Next, I navigate back to my existing workflow in CodeCatalyst and add a new action. However, this time I’ll select the Mend SCA action.

Adding the Mend action

Figure 8: Mend Action

All I need to do now is go to the Configuration tab and set the following values:

  • Mend Project Name: mythical-misfits
  • Mend Product Name: Amazon-CodeCatalyst
  • Mend License Key: You can get the License Key from your Mend account in the CI/CD Integration section. You can get more information from here.

Mend Action Configuration

Figure 9: Mend Action Configuration

Then I commit the changes and return to Mend.

Mend console showing analysis of the Mythical Mysfits app

Figure 10: Mend Console

After successful execution, Mend will automatically update and show a report similar to the screenshot above. It contains useful information about this project like vulnerabilities, licenses, policy violations, etc. To learn more about the various capabilities of Mend SCA, see the documentation here.

Static Analysis (SA)

Static analysis, also called static code analysis, is a method of debugging that is done by examining the code without executing the program. The process provides an understanding of the code structure and can help ensure that the code adheres to industry standards. Static analysis is used in software engineering by software development and quality assurance teams.

Currently, my workflow does not do static analysis. As a DevSecOps engineer, I can add this as a step to the workflow. For this walkthrough, I’ll create an action that uses Pylint to scan my Python source code for Static Analysis. Note that you can also use other static analysis tools or a GitHub Action like SuperLinter, as covered in this previous post.

Static Analysis using Pylint

After navigating back to CI/CD → Workflows → ApplicationDeploymentPipeline and selecting Edit, I create a new test action. I change the action name to pylint and set the Configuration tab to run the following shell commands:

- Run: pip install pylint 
- Run: pylint $PWD --recursive=y --output-format=json:pylint-report.json --exit-zero

On the Outputs tab, I change the Report prefix to pylint. Then I set the Success criteria for Static analysis as shown in the figure below:

Report configuration tab showing static analysis configuration

Figure 11: Static Analysis Report Configuration

Because static analysis is typically run before any execution, the pylint or OWASP action should be the very first action in the workflow. For the sake of this blog, we will use pylint. I select the OWASP and Mend actions I created before, set their Depends on dropdown to my pylint action, and commit the changes. Once the workflow finishes, I can go to Reports > pylint-pylint-report.json for more details.

Static analysis report showing 7 high findings

Figure 12: Pylint Static Analysis Report

The Report status is Failed because more than 1 high-severity or above bug was detected. On the Results tab I can view each finding in greater detail, including the severity, type of finding, message from the linter, and which specific line the error originates from.
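To reproduce these results locally before committing changes, a minimal sketch is shown below. It assumes that pylint’s error-type findings are the ones surfaced as high severity in the report, and that jq is installed.

# Run pylint with the same flags as the workflow action, then count error-type findings.
pip install pylint
pylint $PWD --recursive=y --output-format=json:pylint-report.json --exit-zero
jq '[.[] | select(.type=="error")] | length' pylint-report.json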

Cleanup

If you have been following along with this workflow, you should delete the resources you deployed so you do not continue to incur charges. First, delete the two stacks that AWS Cloud Development Kit (CDK) deployed using the AWS CloudFormation console in the AWS account you associated when you launched the blueprint. These stacks will have names like mysfitsXXXXXWebStack and mysfitsXXXXXAppStack. Second, delete the project from CodeCatalyst by navigating to Project settings and choosing Delete project.
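If you prefer the AWS CLI for the stack cleanup, a minimal sketch is below; the stack names are placeholders that you should replace with the actual mysfits stack names in your account.

# List the deployed stacks first, then delete the two mysfits stacks by their real names.
aws cloudformation list-stacks --stack-status-filter CREATE_COMPLETE UPDATE_COMPLETE
aws cloudformation delete-stack --stack-name <mysfitsXXXXXWebStack>
aws cloudformation delete-stack --stack-name <mysfitsXXXXXAppStack>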

Conclusion

In this post, I demonstrated how DevSecOps teams can easily integrate security into Amazon CodeCatalyst workflows to automate security testing by checking for vulnerabilities using OWASP dependency checker or Mend through Software Composition Analysis (SCA) of dependencies. I also outlined how DevSecOps teams can configure Static Analysis (SA) reports and use success criteria to influence the outcome of a workflow action.

Imtranur Rahman

Imtranur Rahman is an experienced Sr. Solutions Architect in the WWPS team with 14+ years of experience. Imtranur works with large AWS Global SI partners and helps them build their cloud strategy and broad adoption of Amazon’s cloud computing platform. Imtranur specializes in containers, DevSecOps, GitOps, microservices-based applications, hybrid application solutions, and application modernization, and loves innovating on behalf of his customers. He is highly customer obsessed and takes pride in providing the best solutions through his extensive expertise.

Wasay Mabood

Wasay is a Partner Solutions Architect based out of New York. He works primarily with AWS Partners on migration, training, and compliance efforts but also dabbles in web development. When he’s not working with customers, he enjoys window-shopping, lounging around at home, and experimenting with new ideas.

Extending CloudFormation and CDK with Third-Party Extensions

Post Syndicated from Lucas Chen original https://aws.amazon.com/blogs/devops/extending-cloudformation-and-cdk-with-third-party-extensions/

Did you know you can use CloudFormation to manage third-party resources? The AWS CloudFormation Public Registry provides a searchable collection of CloudFormation extensions and makes it easy to discover and provision them in CloudFormation templates and AWS Cloud Development Kit (CDK) applications. In the past three months, we’ve added a number of new, exciting partners to the Public Registry, including GitLab, Okta, and PagerDuty.

The extensions available on the registry are wide-ranging and include third-party resources from partners such as MongoDB; hooks, which are preventative controls that add safeguards to provisioning; and modules, which are re-usable components that take into account best practices and opinionated definitions of resources. AWS Partner Network (APN), third parties, and the developer community contribute these extensions to the Public Registry. Using extensions, customers no longer need to create and maintain custom provisioning logic for resource types from third-party vendors.

Over the last few months, AWS collaborated with partners to develop and publish over 80 new resources across 14 providers to the Public Registry for CloudFormation. Below is a summary of the new resource type additions.

Recently Updated Third-Party Providers

  • MongoDB Atlas: Manage components in MongoDB Atlas. Add, edit, or delete administrative objects within Atlas, including projects, users, and database deployments. (Note: You cannot read or write data to Atlas clusters with Atlas Admin APIs and AWS CloudFormation resources. To read and write data in Atlas, you must use the Atlas Data API.)
  • GitLab: Manage the users and groups in an organization, set up a new project with the right users, groups, and access token, and tag a project automatically for every active CI/CD deployment.
  • New Relic: Create a new dashboard with custom pages, widgets, and layout; add tags to your data to help improve data organization and findability; and workloads-related tasks.
  • GitHub: Manage the users and groups in an organization, set up a new project with the right users, groups, and access token, and add a webhook to a repo.
  • Dynatrace: Set up a new project with service level objectives, locations, monitors, and metrics.
  • Okta: Onboard a new application into Okta with the right users and groups.
  • PagerDuty: Set up monitoring of a new or existing application.
  • Databricks: Set up a Databricks cluster and jobs.
  • Fastly: Configure Fastly as a CDN for your web app.
  • BigID: Connect S3 and DynamoDB data sources into your BigID application.
  • Rollbar: Set up a new Rollbar project and manage rules, teams, and users.
  • Cloudflare: Configure a DNS record and load balancing using Cloudflare.
  • Lacework: Configure Lacework alert profiles, rules, and channels, and manage queries.
  • Snowflake: Create databases and users, and manage privileges.

Key Benefits

Here are some of the benefits for extension builders and consumers when publishing extensions to the public registry:

  1. Discoverability – Publishing your extensions in the public registry will make them discoverable by 1M+ active CloudFormation and CDK customers.
  2. CDK Support – We’re seeing rapid growth in the adoption of the CDK amongst the developer population. Upon publishing to the registry, L1 CDK Constructs will automatically be created for your third party resources, making them compatible with the CDK with no added work required. These constructs will also be listed on Construct Hub, which aids discoverability for customers. Note: Automated L1 CDK construct generation is currently an experimental feature.
  3. Drift detection – Third-party resource types in the public registry also integrate with drift detection. After creating a resource from a third-party resource type, CloudFormation will detect changes to the third-party resource from its template configuration, known as configuration drift, just as it would with AWS resources.
  4. AWS Config – You can also use AWS Config to manage compliance for third-party resources consumed from the registry. The resource types are automatically tracked as Configuration Items when you have configured AWS Config to record them, and used CloudFormation to create, update, and delete them. Whether the resource types you use are third-party or AWS resources, you can view configuration history for them, in addition to being able to write AWS Config rules to verify configuration best practices.
  5. Abstraction of Best Practices with Modules – Browse and use modules from the registry when creating your CloudFormation templates to ensure you’re provisioning resources while adhering to best practices.
  6. AWS Cloud Control API – The AWS Cloud Control API allows AWS partners and customers to interface with your resource type through API calls using Create, Read, Update, Delete, and List (CRUD-L) operations. Resources in the registry are automatically integrated with the AWS Cloud Control API, which expands your third-party resource compatibility to even more AWS services and IaC tools (a sample CLI call is sketched after this list).
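As an illustration of the CRUD-L interface, the sketch below lists and reads resources of a third-party type through the Cloud Control API CLI. It assumes the MongoDB::Atlas::Project type (used here purely as an example) has already been activated in your account and region, and the identifier is a placeholder.

# List resources of an activated third-party type, then read one by its identifier.
aws cloudcontrol list-resources --type-name "MongoDB::Atlas::Project"
aws cloudcontrol get-resource --type-name "MongoDB::Atlas::Project" --identifier <resource-identifier>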

We’ve seen great momentum from our partners and developer community over the past year. We are looking forward to continued investment and innovation in the Public Registry.

How to Get Started

For Resource Type Users: Explore and Activate Third Party Resource Types

Third party resource types must first be activated before they can be used. You do this by logging in to the AWS Console, navigating to CloudFormation > Registry > Public extensions, and setting the Publisher filter to Third party. This will show you a list of available third-party resources in your region (note that different regions may have a different set of third-party resource types). Select the radio button next to the resource types you want to activate and choose the Activate button at the top of the list.

Figure 1:
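If you prefer the AWS CLI, the sketch below shows one way to discover third-party types and activate one; the type name, publisher ID, and execution role are placeholders you should replace with values from the registry listing.

# Discover third-party resource types in the public registry, then activate one for this account and region.
aws cloudformation list-types --visibility PUBLIC --type RESOURCE --filters Category=THIRD_PARTY
aws cloudformation activate-type \
    --type RESOURCE \
    --type-name "<ThirdParty::Service::Resource>" \
    --publisher-id <publisher-id> \
    --execution-role-arn <execution-role-arn>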

Don’t see the extension you need in the registry?

You can submit requests for new third-party extensions through our Community Registry Extensions Github repo issue tracker! Click the New Issue button and describe the third-party extension along with information about your use case.

For Developers and Publishers: Join the CloudFormation Developer Community and Start Building

You can see several of the community-built registry extensions in the AWS CloudFormation Community Registry Extensions repository and even contribute yourself. You can also read about the experiences and lessons learned from publishing to the Registry through this blog written by Cloudsoft.

For developers looking to create new resource types to add to the Public Registry, follow this creating resource types walkthrough to help you get started. If you need assistance creating or publishing resources, or just want to join the discussion, you can join the conversation today in our CloudFormation Discord Channel. We’d love to hear about your experiences and use cases in developing innovations with registry extensions.

About the authors:

Anuj Sharma

Anuj Sharma is a Sr Container Partner Solution Architect with Amazon Web Services. He works with ISV partners and drives Partner-AWS product development and integrations.

Lucas Chen

Lucas is a Senior Product Manager at Amazon Web Services. He leads the CloudFormation Registry and its integrations with third-party products. Prior to AWS, he spent 9 years at VMware working on its end user computing product, Workspace ONE.

Rahul Sharma

Rahul is a Senior Product Manager-Technical at Amazon Web Services with over two years of product management spanning AWS CloudFormation and AWS Cloud Control API.

Publish Amazon DevOps Guru Insights to ServiceNow for Incident Management

Post Syndicated from Abdullahi Olaoye original https://aws.amazon.com/blogs/devops/publish-amazon-devops-guru-insights-to-servicenow-for-incident-management/

Amazon DevOps Guru is a fully managed AIOps service that uses machine learning (ML) to quickly identify when applications are behaving outside of their normal operating patterns and generates insights from its findings. These insights generated by Amazon DevOps Guru can be used to alert on-call teams to react to anomalies for mission critical workloads. Many customers already use incident management systems like ServiceNow to identify, analyze, and resolve critical incidents which could impact business operations. ServiceNow is an IT Service Management (ITSM) platform that enables enterprise organizations to improve operational efficiencies. Among its products is Incident Management, which provides a single pane view and allows customers to restore services and resolve issues quickly.

This blog post will show you how to integrate Amazon DevOps Guru insights with ServiceNow to automatically create and manage Incidents. We will demonstrate how an insight generated by Amazon DevOps Guru for an anomaly can automatically create a ServiceNow Incident, update the incident when there are new anomalies or recommendations from Amazon DevOps Guru, and close the ServiceNow Incident once the insight is resolved by Amazon DevOps Guru.

Overview of solution

This solution uses a combination of event-driven architecture and serverless technologies to integrate DevOps Guru insights with ServiceNow. When an Amazon DevOps Guru insight is created, an Amazon EventBridge rule captures the insight as an event and routes it to an AWS Lambda function target. The Lambda function interacts with ServiceNow using a REST API to create, update, and close an incident for the corresponding DevOps Guru events captured by EventBridge.
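For reference, the ServiceNow side of this integration boils down to calls against the ServiceNow Table API. A minimal curl sketch of creating an incident is shown below; the instance host, credentials, and field values are placeholders, and the Lambda function in this solution performs the equivalent calls for you.

# Create a ServiceNow incident through the Table API (placeholder host, credentials, and fields).
curl -u "<servicenow-user>:<password>" \
    -X POST "https://<your-instance>.service-now.com/api/now/table/incident" \
    -H "Content-Type: application/json" \
    -d '{"short_description":"DevOps Guru: new insight opened","description":"Insight details and recommendations"}'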

The EventBridge rule can be customized to capture all DevOps Guru insights or narrowed down to specific insights. In this blog, we will be capturing all DevOps Guru insights and will be performing actions on ServiceNow for the DevOps Guru events below (a sample event pattern is sketched after the list):

  • DevOps Guru New Insight Open
  • DevOps Guru New Anomaly Association
  • DevOps Guru Insight Severity Upgraded
  • DevOps Guru New Recommendation Created
  • DevOps Guru Insight Closed
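For illustration, a minimal AWS CLI sketch of an EventBridge rule matching these detail types is shown below. The deployed solution creates an equivalent rule for you; the rule name is a placeholder, and the detail-type strings follow the list above.

# Create a rule that matches the DevOps Guru events listed above (rule name is a placeholder).
aws events put-rule \
    --name devops-guru-insights-to-servicenow \
    --event-pattern '{
      "source": ["aws.devops-guru"],
      "detail-type": [
        "DevOps Guru New Insight Open",
        "DevOps Guru New Anomaly Association",
        "DevOps Guru Insight Severity Upgraded",
        "DevOps Guru New Recommendation Created",
        "DevOps Guru Insight Closed"
      ]
    }'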

    Serverless architecture where Amazon EventBridge receives Amazon DevOps Guru insights and using Lambda function transforms and posts to ServiceNow REST API to create, update, and resolve incidents

    Figure 1: Amazon DevOps Guru Integration with ServiceNow using Amazon EventBridge and AWS Lambda

Solution Implementation Steps

Prerequisites

Before you deploy the solution and proceed with this walkthrough, you should have the following prerequisites:

  • Gather the hostname for your ServiceNow cloud instance. If you do not have a ServiceNow instance, you can request a developer instance through the ServiceNow Developer page.
  • Gather the credentials of a ServiceNow user who has permissions to make REST API calls to ServiceNow, specifically to the Table API. If you don’t have a user provisioned, you can create one by following the steps in Getting started with the REST API in the ServiceNow documentation.
  • Create a secret in Secrets Manager to store the ServiceNow credentials created in the previous step. You can choose any name for the secret, but it should have two key/value pairs, one for the username and the other for the password (see the AWS CLI sketch after this prerequisites list).
  • Enable DevOps Guru for your applications by following these steps or you can follow this blog to deploy a sample serverless application that can be used to generate DevOps Guru insights for anomalies detected in the application.
  • Install and set up SAM CLI – Install the SAM CLI
  • Download and set up Java. The version should match the runtime that you defined in the SAM template.yaml Serverless function configuration – Install the Java SE Development Kit 11
  • Maven – Install Maven
  • Docker – Install Docker community edition
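As referenced in the Secrets Manager prerequisite above, a minimal AWS CLI sketch for creating the secret is shown below; the secret name and credential values are placeholders, and the key names must be username and password as described above.

# Create the ServiceNow credentials secret (name and values are placeholders).
aws secretsmanager create-secret \
    --name <servicenow-credentials-secret-name> \
    --secret-string '{"username":"<servicenow-user>","password":"<servicenow-password>"}'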

You have two options to deploy this solution: deploy it from the AWS Serverless Application Repository, or build and deploy it from the command line with the AWS SAM CLI.

Option 1: Deploy sample ServiceNow Connector App from AWS Serverless Repository

The DevOps Guru ServiceNow Connector application is available in the AWS Serverless Application Repository, which is a managed repository for serverless applications. The application is packaged with an AWS Serverless Application Model (SAM) template, a definition of the AWS resources used, and a link to the source code.

Follow the steps below to quickly deploy this serverless application in your AWS account:

  • Login to the AWS management console of the account to which you plan to deploy this solution.
  • Go to the DevOps Guru ServiceNow Connector application in the AWS Serverless Repository and click on “Deploy”.

    DevOps Guru ServiceNow Connector application page on the AWS Serverless Application Repository with the Deploy button to quickly deploy this solution to your AWS account.

    Figure 2: Deploy solution through AWS Serverless Repository

  • The Lambda application deployment screen will be displayed where you can enter the ServiceNow hostname (do not include the https prefix) and the Secret Name you created in the prerequisite steps. Click on the ‘Deploy’ button.

    Lambda Application Deployment page to enter the ServiceNow hostname and Secret name needed for interacting with your ServiceNow instance before deploying the solution.

    Figure 3: AWS Lambda Application Settings

  • After successful deployment the AWS Lambda Application page will display the “Create complete” status for the serverlessrepo-DevOps-Guru-ServiceNow-Connector application. The CloudFormation template creates four resources:
    1. Lambda function which has the logic to integrate to the ServiceNow
    2. Event Bridge rule for the DevOps Guru Insights
    3. Lambda permission
    4. IAM role
  • Now you can skip Option 2 and follow the steps in the “Test the Solution” section to trigger some DevOps Guru insights and validate that the incidents are created and updated in ServiceNow.

Option 2: Build and Deploy sample ServiceNow Connector App using AWS SAM Command Line Interface

As you have seen above, you can directly deploy the sample serverless application from the Serverless Repository with one click deployment. Alternatively, you can choose to clone the github source repository and deploy using the SAM CLI from your terminal.

The Serverless Application Model Command Line Interface (SAM CLI) is an extension of the AWS CLI that adds functionality for building and testing serverless applications. The CLI provides commands that enable you to verify that AWS SAM template files are written according to the specification, invoke Lambda functions locally, step-through debug Lambda functions, package and deploy serverless applications to the AWS Cloud, and so on. For details about how to use the AWS SAM CLI, including the full AWS SAM CLI Command Reference, see AWS SAM reference – AWS Serverless Application Model.

Before you proceed, make sure you have completed the Prerequisites section in the beginning which should set up the AWS SAM CLI, Maven and Java on your local terminal. You also need to install and set up Docker to run your functions in an Amazon Linux environment that matches Lambda.

Follow the steps below to build and deploy this serverless application using AWS SAM CLI in your AWS account:

  • Clone the source code from the github repo
$ git clone https://github.com/aws-samples/amazon-devops-guru-connector-servicenow.git
  • Before you build the resources defined in the SAM template, you can use the below validate command which will run cfn-lint validations on your SAM JSON/YAML template
$ sam validate --lint --template template.yaml

3.     Build the application with SAM CLI

$ cd amazon-devops-guru-connector-servicenow
$ sam build

If everything is set up correctly, you should have a success message like shown below:

Build Succeeded

Built Artifacts : .aws-sam/build
Built Template : .aws-sam/build/template.yaml

Commands you can use next
=========================
[*] Validate SAM template: sam validate
[*] Invoke Function: sam local invoke
[*] Test Function in the Cloud: sam sync --stack-name {{stack-name}} --watch
[*] Deploy: sam deploy --guided

4.  Deploy the application with SAM CLI

$ sam deploy --guided

This command will package and deploy your application to AWS, with a series of prompts that you should respond to as shown below:

  • Stack Name: The name of the stack to deploy to CloudFormation. This should be unique to your account and region, and a good starting point would be something matching your project name – amazon-devops-guru-connector-servicenow
  • AWS Region: The AWS region you want to deploy your application to.
  • Parameter ServiceNowHost []: The ServiceNow host name/instance URL you set up. Example: dev92031.service-now.com
  • Parameter SecretName []: The secret name that you set up for ServiceNow credentials in the Prerequisites.
  • Confirm changes before deploy: If set to yes, any change sets will be shown to you before execution for manual review. If set to no, the AWS SAM CLI will automatically deploy application changes.
  • Allow SAM CLI IAM role creation: Many AWS SAM templates, including this example, create AWS IAM roles required for the AWS Lambda function(s) included to access AWS services. By default, these are scoped down to minimum required permissions. To deploy an AWS CloudFormation stack which creates or modifies IAM roles, the CAPABILITY_IAM value for capabilities must be provided. If permission isn’t provided through this prompt, to deploy this example you must explicitly pass --capabilities CAPABILITY_IAM to the sam deploy command.
  • Disable rollback [y/N]: If set to Y, preserves the state of previously provisioned resources when an operation fails.
  • Save arguments to configuration file (samconfig.toml): If set to yes, your choices will be saved to a configuration file inside the project, so that in the future you can just re-run sam deploy without parameters to deploy changes to your application.

After you enter your parameters, you should see something like this if you have provided Y to view and confirm ChangeSets. Proceed here by providing ‘Y’ for deploying the resources.

Initiating deployment
=====================
Uploading to amazon-devops-guru-connector-servicenow/46bb4841f8f37fd41d3f40f86f31c4d7.template 1918 / 1918 (100.00%)

Waiting for changeset to be created..
CloudFormation stack changeset
-----------------------------------------------------------------------------------------------------------------------------------------------------
Operation LogicalResourceId ResourceType Replacement
-----------------------------------------------------------------------------------------------------------------------------------------------------
+ Add FunctionsDevOpsGuruPermission AWS::Lambda::Permission N/A
+ Add FunctionsDevOpsGuru AWS::Events::Rule N/A
+ Add FunctionsRole AWS::IAM::Role N/A
+ Add Functions AWS::Lambda::Function N/A
-----------------------------------------------------------------------------------------------------------------------------------------------------

Changeset created successfully. arn:aws:cloudformation:us-east-1:123456789012:changeSet/samcli-deploy1669232233/7c97b7f5-369d-400d-89cd-ebabefaa0b57

Previewing CloudFormation changeset before deployment
======================================================
Deploy this changeset? [y/N]:

Once the deployment succeeds, you should be able to see the successful creation of your resources

CloudFormation events from stack operations (refresh every 0.5 seconds)
-----------------------------------------------------------------------------------------------------------------------------------------------------
ResourceStatus ResourceType LogicalResourceId ResourceStatusReason
-----------------------------------------------------------------------------------------------------------------------------------------------------
CREATE_IN_PROGRESS AWS::CloudFormation::Stack amazon-devops-guru-connector- User Initiated
servicenow
CREATE_IN_PROGRESS AWS::IAM::Role FunctionsRole -
CREATE_IN_PROGRESS AWS::IAM::Role FunctionsRole Resource creation Initiated
CREATE_COMPLETE AWS::IAM::Role FunctionsRole -
CREATE_IN_PROGRESS AWS::Lambda::Function Functions -
CREATE_IN_PROGRESS AWS::Lambda::Function Functions Resource creation Initiated
CREATE_COMPLETE AWS::Lambda::Function Functions -
CREATE_IN_PROGRESS AWS::Events::Rule FunctionsDevOpsGuru -
CREATE_IN_PROGRESS AWS::Events::Rule FunctionsDevOpsGuru Resource creation Initiated
CREATE_COMPLETE AWS::Events::Rule FunctionsDevOpsGuru -
CREATE_IN_PROGRESS AWS::Lambda::Permission FunctionsDevOpsGuruPermission -
CREATE_IN_PROGRESS AWS::Lambda::Permission FunctionsDevOpsGuruPermission Resource creation Initiated
CREATE_COMPLETE AWS::Lambda::Permission FunctionsDevOpsGuruPermission -
CREATE_COMPLETE AWS::CloudFormation::Stack amazon-devops-guru-connector- -
servicenow
-----------------------------------------------------------------------------------------------------------------------------------------------------

Successfully created/updated stack - amazon-devops-guru-connector-servicenow in us-east-1

You can also use the below command to list the resources deployed by passing in the stack name.

$ sam list resources --stack-name amazon-devops-guru-connector-servicenow

You can also choose to test and debug your function locally with sample events using the SAM CLI local functionality. Test a single function by invoking it directly with a test event. An event is a JSON document that represents the input that the function receives from the event source. Refer the Invoking Lambda functions locally – AWS Serverless Application Model link here for more details.

Follow the steps below to test the Lambda function locally with the SAM CLI. You have to create an env.json file with the correct values for your ServiceNow host and the Secrets Manager secret name that was created in the previous step.

  • Make sure you have created the AWS Secrets Manager secret with the desired name as mentioned in the prerequisites, which should be used here for SECRET_NAME.
  • Create env.json as below, by replacing the values for SERVICE_NOW_HOST and SECRET_NAME with your real value. These will be set as the local Lambda execution environment variables.
{"Parameters": {"SERVICE_NOW_HOST": "SNOW_HOST","SECRET_NAME": "SNOW_CREDS"}}
  • Run the command below to invoke the Lambda function locally with a sample DevOps Guru payload. Remember, for this to work you should have a Docker instance running and the secret created in your AWS account.
$ sam local invoke Functions --event Functions/src/test/Events/CreateIncident.json --env-vars Functions/src/test/Events/env.json

Once you are done with the above steps, move on to “Test the Solution” section below to trigger sample DevOps Guru insights and validate that the incidents are created and updated in ServiceNow.

Test the Solution

To test the solution, we will simulate a DevOps Guru insight. You can also simulate an insight by following the steps in this blog. After an anomaly is detected in the application, DevOps Guru creates an insight as seen below.

Sample DevOps Guru insights page with anomalous behavior of DynamoDB ThrottledRequests from the application deployed with the workshop link.

Figure 4: DevOps Guru Insight created for anomalous behavior

For the DevOps Guru insight shown above, a corresponding incident is automatically created in ServiceNow, as shown below. In addition to the incident creation, any new anomalies and recommendations from DevOps Guru are also associated with the incident.

ServiceNow incident detail page with the DevOps Guru insight information.

Figure 5: Corresponding ServiceNow Incident is created for the DevOps Guru Insight

When the anomalous behavior that generated the DevOps Guru insight is resolved, DevOps Guru automatically closes the insight. The corresponding ServiceNow incident that was created for the insight is also closed as seen below

ServiceNow incident Notes section showing Incident as resolved due to the insight being closed in Amazon DevOps Guru.

Figure 6: ServiceNow Incident created for DevOps Guru Insight is resolved due to insight closure

Cleaning up

To avoid incurring future charges, delete the resources.

To delete the sample application that you created, use the AWS CLI command below and pass the stack name you provided in the sam deploy step.

$ aws cloudformation delete-stack --stack-name amazon-devops-guru-connector-servicenow

You could also use the AWS CloudFormation Console to delete the stack:

AWS CloudFormation console with Delete option to clean up the deployed stack.

Figure 7: AWS Stack Console with Delete action

Conclusion

This blog post showcased how DevOps Guru continuously monitors resources in a particular region in your AWS account, automatically detects operational issues, predicts impending resource exhaustion, details the likely cause, and recommends remediation actions. This post described a custom solution using a serverless integration pattern with AWS Lambda and Amazon EventBridge, which enables the integration of DevOps Guru insights with ServiceNow, one of the most popular ITSM and change management tools, thus streamlining Service Management governance and oversight over AWS services. Customers who use ServiceNow can use this solution to improve their operational efficiency and get customized insights and real-time incident alerts directly from DevOps Guru, providing a single pane of glass to restore services and systems quickly.

This solution was created to help customers who already use ServiceNow Incident Management, if you are already using Incident Manager from AWS Systems Manager, check out how that works with Amazon DevOps Guru here.

To learn more about Amazon DevOps Guru, join us for a free hands-on Immersion Day. Events are virtual and hosted at three global time zones. Register here: April 12th.

About the authors:

Abdullahi Olaoye

Abdullahi is a Senior Cloud Infrastructure Architect at AWS Professional Services where he works with enterprise customers to design and build cloud solutions that solve business challenges. When he’s not working, he enjoys travelling, watching documentaries and listening to history podcasts.

Sreenivas Ganesan

Sreenivas Ganesan is a Sr. DevOps Consultant at AWS experienced in architecting and delivering modernized DevOps solutions for enterprise customers in their journey to AWS Cloud, primarily focused on Infrastructure automation, Security and Compliance, Management and Governance, Provisioning and Orchestration. Outside of work, he enjoys watching new TV series, soccer and spending time with his family outdoors.

Mohan Udyavar

Mohan Udyavar is a Principal Technical Account Manager in the Enterprise Support organization of AWS advising customers in successfully migrating and operating their workloads on AWS. He is primarily focused on the Automotive industry providing prescriptive guidance to customers helping them improve the resilience and operational excellence posture of mission-critical applications. Outside of work, he loves cooking and working on tech projects with his son.

Building diversified and cost-optimized EC2 server groups in Spinnaker

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/building-diversified-and-cost-optimized-ec2-server-groups-in-spinnaker/

This blog post is written by Sandeep Palavalasa, Sr. Specialist Containers SA, and Prathibha Datta-Kumar, Software Development Engineer

Spinnaker is an open source continuous delivery platform created by Netflix for releasing software changes rapidly and reliably. It enables teams to automate deployments into pipelines that run whenever a new version is released, using proven deployment strategies that are faster and more dependable, with zero downtime. For many AWS customers, Spinnaker is a critical piece of technology that allows developers to deploy their applications safely and reliably across different AWS managed services.

Listening to customer requests on the Spinnaker open source project and in the Amazon EC2 Spot Instances integrations roadmap, we have further enhanced Spinnaker’s ability to deploy on Amazon Elastic Compute Cloud (Amazon EC2). The enhancements make it easier to combine Spot Instances with On-Demand, Reserved, and Savings Plans Instances to optimize workload costs with performance. You can improve workload availability when using Spot Instances with features such as allocation strategies and proactive Spot capacity rebalancing, when you are flexible about Instance types and Availability Zones. Combinations of these features offer the best possible experience when using Amazon EC2 with Spinnaker.

In this post, we detail the recent enhancements, along with a walkthrough of how you can use them following the best practices.

Amazon EC2 Spot Instances

EC2 Spot Instances are spare compute capacity in the AWS Cloud available at steep discounts of up to 90% when compared to On-Demand Instance prices. The primary difference between an On-Demand Instance and a Spot Instance is that a Spot Instance can be interrupted by Amazon EC2 with a two-minute notification when Amazon EC2 needs the capacity back. Amazon EC2 now sends rebalance recommendation notifications when Spot Instances are at an elevated risk of interruption. This signal can arrive sooner than the two-minute interruption notice. This lets you proactively replace your Spot Instances before it’s interrupted.

The best way to adhere to Spot best practices and manage your instance fleet is by using an Amazon EC2 Auto Scaling group. When using Spot Instances in an Auto Scaling group, enabling Capacity Rebalancing helps you maintain workload availability by proactively augmenting your fleet with a new Spot Instance before a running instance is interrupted by Amazon EC2.
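For example, Capacity Rebalancing can be enabled on an existing Auto Scaling group with a single AWS CLI call; this is a minimal sketch and the group name is a placeholder.

# Enable Capacity Rebalancing on an existing Auto Scaling group (group name is a placeholder).
aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name <server-group-name> \
    --capacity-rebalance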

Spinnaker concepts

Spinnaker uses three key concepts to describe your services: applications, clusters, and server groups. How your services are exposed to users is expressed as load balancers and firewalls.

An application is a collection of clusters, a cluster is a collection of server groups, and a server group identifies the deployable artifact and basic configuration settings such as the number of instances, autoscaling policies, metadata, etc. This corresponds to an Auto Scaling group in AWS. We use Auto Scaling groups and server groups interchangeably in this post.

Spinnaker and Amazon EC2 Integration

In mid-2020, we started looking into customer requests and gaps in the Amazon EC2 feature set supported in Spinnaker. Around the same time, Spinnaker OSS added support for Amazon EC2 Launch Templates. Thanks to that effort, we could follow up and expand the Amazon EC2 feature set supported in Spinnaker (see spinnaker.io for details).

Here are some highlights of the features contributed recently:

Feature and why to use it (example use cases):

  • Multiple instance types: Tap into multiple capacity pools to achieve and maintain the desired scale using Spot Instances.
  • Combining On-Demand and Spot Instances: Control the proportion of On-Demand and Spot Instances launched in your server group, and combine Spot Instances with Amazon EC2 Reserved Instances or Savings Plans.
  • Amazon EC2 Auto Scaling allocation strategies: Reduce overall Spot interruptions by launching from Spot pools that are optimally chosen based on the available Spot capacity, using the capacity-optimized Spot allocation strategy.
  • Capacity Rebalancing: Improve your workload availability by proactively shifting your Spot capacity to optimal pools by enabling Capacity Rebalancing along with the capacity-optimized allocation strategy.
  • Improved support for burstable performance instance types with custom credit specification: Reduce costs by preventing wastage of CPU cycles.

We recommend using Spinnaker stable release 1.28.x for API users and 1.29.x for UI users. Here is the Git issue for related PRs and feature releases.

Now that we understand the new features, let’s look at how to use some of them in the following tutorial.

Example tutorial: Deploy a demo web application on an Auto Scaling group with On-Demand and Spot Instances

In this example tutorial, we set up Spinnaker to deploy to Amazon EC2, create an Application Load Balancer, and deploy a demo application on a server group diversified across multiple instance types and purchase options – in this case, On-Demand and Spot Instances.

We use Spinnaker’s API throughout the tutorial to create new resources, along with a quick guide on how to deploy the same resources using the Spinnaker UI (Deck) and how to use the UI to view them.

Prerequisites

As a prerequisite to complete this tutorial, you must have an AWS Account with an AWS Identity and Access Management (IAM) User that has the AdministratorAccess configured to use with AWS Command Line Interface (AWS CLI).

1. Spinnaker setup

We will use the AWS CloudFormation template setup-spinnaker-with-deployment-vpc.yml to setup Spinnaker and the required resources.

1.1 Create a Secure Shell (SSH) key pair used to connect to Spinnaker and the EC2 instances launched by Spinnaker.

AWS_REGION=us-west-2 # Change the region where you want Spinnaker deployed
EC2_KEYPAIR_NAME=spinnaker-blog-${AWS_REGION}
aws ec2 create-key-pair --key-name ${EC2_KEYPAIR_NAME} --region ${AWS_REGION} --query KeyMaterial --output text > ~/${EC2_KEYPAIR_NAME}.pem
chmod 600 ~/${EC2_KEYPAIR_NAME}.pem

1.2 Deploy the Cloudformation stack.

STACK_NAME=spinnaker-blog
SPINNAKER_VERSION=1.29.1 # Change the version if newer versions are available
NUMBER_OF_AZS=3
AVAILABILITY_ZONES=${AWS_REGION}a,${AWS_REGION}b,${AWS_REGION}c
ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text)
S3_BUCKET_NAME=spin-persitent-store-${ACCOUNT_ID}

# Download template
curl -o setup-spinnaker-with-deployment-vpc.yml https://raw.githubusercontent.com/awslabs/ec2-spot-labs/master/ec2-spot-spinnaker/setup-spinnaker-with-deployment-vpc.yml

# deploy stack
aws cloudformation deploy --template-file setup-spinnaker-with-deployment-vpc.yml \
    --stack-name ${STACK_NAME} \
    --parameter-overrides NumberOfAZs=${NUMBER_OF_AZS} \
    AvailabilityZones=${AVAILABILITY_ZONES} \
    EC2KeyPairName=${EC2_KEYPAIR_NAME} \
    SpinnakerVersion=${SPINNAKER_VERSION} \
    SpinnakerS3BucketName=${S3_BUCKET_NAME} \
    --capabilities CAPABILITY_NAMED_IAM --region ${AWS_REGION}

1.3 Connecting to Spinnaker

1.3.1 Get the SSH command to port forwarding for Deck – the browser-based UI (9000) and Gate – the API Gateway (8084) to access the Spinnaker UI and API.

SPINNAKER_INSTANCE_DNS_NAME=$(aws cloudformation describe-stacks --stack-name ${STACK_NAME} --region ${AWS_REGION} --query "Stacks[].Outputs[?OutputKey=='SpinnakerInstance'].OutputValue" --output text)
echo 'ssh -A -L 9000:localhost:9000 -L 8084:localhost:8084 -L 8087:localhost:8087 -i ~/'${EC2_KEYPAIR_NAME}'.pem ubuntu@'${SPINNAKER_INSTANCE_DNS_NAME}

1.3.2 Open a new terminal and use the SSH command (output from the previous command) to connect to the Spinnaker instance. After you successfully connect to the Spinnaker instance via SSH, access the Spinnaker UI here and API here.

2. Deploy a demo web application

Let’s make sure that we have the environment variables required in the shell before proceeding. If you’re using the same terminal window as before, then you might already have these variables.

STACK_NAME=spinnaker-blog
AWS_REGION=us-west-2 # use the same region as before
EC2_KEYPAIR_NAME=spinnaker-blog-${AWS_REGION}
VPC_ID=$(aws cloudformation describe-stacks --stack-name ${STACK_NAME} --region ${AWS_REGION} --query "Stacks[].Outputs[?OutputKey=='VPCID'].OutputValue" --output text)

2.1 Create a Spinnaker Application

We start by creating an application in Spinnaker, a placeholder for the service that we deploy.

curl 'http://localhost:8084/tasks' \
-H 'Content-Type: application/json;charset=utf-8' \
--data-raw \
'{
   "job":[
      {
         "type":"createApplication",
         "application":{
            "cloudProviders":"aws",
            "instancePort":80,
            "name":"demoapp",
            "email":"[email protected]",
            "providerSettings":{
               "aws":{
                  "useAmiBlockDeviceMappings":true
               }
            }
         }
      }
   ],
   "application":"demoapp",
   "description":"Create Application: demoapp"
}'

Spin Create Server Group

2.2 Create an Application Load Balancer

Let’s create an Application Load Balancer and a target group for port 80, spanning the three Availability Zones in our public subnets. We use the Demo-ALB-SecurityGroup security group to allow public access to the ALB on port 80.

Because Spot Instances are interrupted with a two-minute warning, you must set the target group’s deregistration delay to a lower value; 90 seconds or less is recommended. This allows time for in-flight requests to complete and existing connections to close gracefully before the instance is interrupted.

curl 'http://localhost:8084/tasks' \
-H 'Content-Type: application/json;charset=utf-8' \
--data-binary \
'{
   "application":"demoapp",
   "description":"Create Load Balancer: demoapp",
   "job":[
      {
         "type":"upsertLoadBalancer",
         "name":"demoapp-lb",
         "loadBalancerType":"application",
         "cloudProvider":"aws",
         "credentials":"my-aws-account",
         "region":"'"${AWS_REGION}"'",
         "vpcId":"'"${VPC_ID}"'",
         "subnetType":"public-subnet",
         "idleTimeout":60,
         "targetGroups":[
            {
               "name":"demoapp-targetgroup",
               "protocol":"HTTP",
               "port":80,
               "targetType":"instance",
               "healthCheckProtocol":"HTTP",
               "healthCheckPort":"traffic-port",
               "healthCheckPath":"/",
               "attributes":{
                  "deregistrationDelay":90
               }
            }
         ],
         "regionZones":[
            "'"${AWS_REGION}"'a",
            "'"${AWS_REGION}"'b",
            "'"${AWS_REGION}"'c"
         ],
         "securityGroups":[
            "Demo-ALB-SecurityGroup"
         ],
         "listeners":[
            {
               "protocol":"HTTP",
               "port":80,
               "defaultActions":[
                  {
                     "type":"forward",
                     "targetGroupName":"demoapp-targetgroup"
                 }
               ]
            }
         ]
      }
   ]
}'

Spin Create ALB

2.3 Create a server group

Before creating a server group (Auto Scaling group), here is a brief overview of the features used in the example:

      • onDemandBaseCapacity (default 0): The minimum amount of your ASG’s capacity that must be fulfilled by On-Demand instances (can also be applied toward Reserved Instances or Savings Plans). The example uses an onDemandBaseCapacity of three.
      • onDemandPercentageAboveBaseCapacity (default 100): The percentage of On-Demand Instances (versus Spot Instances) for additional capacity beyond onDemandBaseCapacity. The example uses an onDemandPercentageAboveBaseCapacity of 10% (that is, 90% Spot); see the sketch that follows this overview for how the split works out.
      • spotAllocationStrategy: This indicates how you want to allocate instances across Spot Instance pools in each Availability Zone. The example uses the recommended Capacity Optimized strategy. Instances are launched from optimal Spot pools that are chosen based on the available Spot capacity for the number of instances that are launching.
      • launchTemplateOverridesForInstanceType: The list of instance types that are acceptable for your workload. Specifying multiple instance types enables tapping into multiple instance pools in multiple Availability Zones, designed to enhance your service’s availability. You can use ec2-instance-selector, an open source AWS Command Line Interface (AWS CLI) tool, to narrow down the instance types based on resource criteria like vCPUs and memory.
      • capacityRebalance: When enabled, this feature proactively manages the EC2 Spot Instance lifecycle leveraging the new EC2 Instance rebalance recommendation. This increases the emphasis on availability by automatically attempting to replace Spot Instances in an ASG before they are interrupted by Amazon EC2. We enable this feature in this example.

Learn more on spinnaker.io: feature descriptions and use cases and sample API requests.
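
To see how onDemandBaseCapacity and onDemandPercentageAboveBaseCapacity combine, here is a rough Python sketch of the split for the server group we create next (desired capacity of 12, base of three, 10% above base). The rounding of fractional counts toward On-Demand is an assumption for illustration; the service’s exact rounding may differ.

import math

def capacity_split(desired, on_demand_base, on_demand_pct_above_base):
    # Capacity above the On-Demand base is split by percentage between On-Demand and Spot.
    above_base = max(desired - on_demand_base, 0)
    # Assumption: fractional counts round up in favor of On-Demand.
    on_demand_above = math.ceil(above_base * on_demand_pct_above_base / 100)
    on_demand_total = on_demand_base + on_demand_above
    spot_total = desired - on_demand_total
    return on_demand_total, spot_total

print(capacity_split(12, 3, 10))  # (4, 8): roughly 4 On-Demand and 8 Spot Instances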

Let’s create a server group with a desired capacity of 12 instances diversified across current- and previous-generation instance types, attach the previously created ALB, use the Demo-EC2-SecurityGroup security group (which allows HTTP traffic only from the ALB), and use the following bash script as UserData to install httpd and add instance metadata to index.html.

2.3.1 Save the user data bash script into a file named user-data.sh.

Note that Spinnaker only supports base64-encoded user data. We use the base64 command to encode the file contents in the next step (see the encoding note after the script).

cat << "EOF" > user-data.sh
#!/bin/bash
yum update -y
yum install httpd -y
echo "<html>
    <head>
        <title>Demo Application</title>
        <style>body {margin-top: 40px; background-color: gray;} </style>
    </head>
    <body>
        <h2>You have reached a Demo Application running on</h2>
        <ul>
            <li>instance-id: <b> `curl http://169.254.169.254/latest/meta-data/instance-id` </b></li>
            <li>instance-type: <b> `curl http://169.254.169.254/latest/meta-data/instance-type` </b></li>
            <li>instance-life-cycle: <b> `curl http://169.254.169.254/latest/meta-data/instance-life-cycle` </b></li>
            <li>availability-zone: <b> `curl http://169.254.169.254/latest/meta-data/placement/availability-zone` </b></li>
        </ul>
    </body>
</html>" > /var/www/html/index.html
systemctl start httpd
systemctl enable httpd
EOF
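
Depending on your platform, the base64 utility may wrap long output across multiple lines (GNU coreutils does by default), and the embedded newlines would break the JSON payload in the next step. As an alternative, this small Python sketch produces a single-line encoding that you can substitute for the $(base64 user-data.sh) expression:

# encode_userdata.py - print user-data.sh as single-line base64 for the Spinnaker API payload
import base64
from pathlib import Path

encoded = base64.b64encode(Path("user-data.sh").read_bytes()).decode("ascii")
print(encoded)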

2.3.2 Create the server group by running the following command. Note that we use the key pair that we created in step 1.1.

curl 'http://localhost:8084/tasks' \
-H 'Content-Type: application/json;charset=utf-8' \
-d \
'{
   "job":[
      {
         "type":"createServerGroup",
         "cloudProvider":"aws",
         "account":"my-aws-account",
         "application":"demoapp",
         "stack":"",
         "credentials":"my-aws-account",
	"healthCheckType": "ELB",
	"healthCheckGracePeriod":600,
	"capacityRebalance": true,
         "onDemandBaseCapacity":3, 
         "onDemandPercentageAboveBaseCapacity":10,
         "spotAllocationStrategy":"capacity-optimized",
         "setLaunchTemplate":true,
         "launchTemplateOverridesForInstanceType":[
            {
               "instanceType":"m4.large"
            },
            {
               "instanceType":"m5.large"
            },
            {
               "instanceType":"m5a.large"
            },
            {
               "instanceType":"m5ad.large"
            },
            {
               "instanceType":"m5d.large"
            },
            {
               "instanceType":"m5dn.large"
            },
            {
               "instanceType":"m5n.large"
            }

         ],
         "capacity":{
            "min":6,
            "max":21,
            "desired":12
         },
         "subnetType":"private-subnet",
         "availabilityZones":{
            "'"${AWS_REGION}"'":[
               "'"${AWS_REGION}"'a",
               "'"${AWS_REGION}"'b",
               "'"${AWS_REGION}"'c"
            ]
         },
         "keyPair":"'"${EC2_KEYPAIR_NAME}"'",
         "securityGroups":[
            "Demo-EC2-SecurityGroup"
         ],
         "instanceType":"m5.large",
         "virtualizationType":"hvm",
         "amiName":"'"$(aws ec2 describe-images --owners amazon --filters "Name=name,Values=amzn2-ami-hvm-2*x86_64-gp2" --query 'reverse(sort_by(Images, &CreationDate))[0].Name' --region ${AWS_REGION} --output text)"'",
         "targetGroups":[
            "demoapp-targetgroup"
         ],
         "base64UserData":"'"$(base64 user-data.sh)"'",,
        "associatePublicIpAddress":false,
         "instanceMonitoring":false
      }
   ],
   "application":"demoapp",
   "description":"Create New server group in cluster demoapp"
}'

Spin Create ServerGroup

Spinnaker creates an Amazon EC2 Launch Template and an ASG with specified parameters and waits until the ALB health check passes before sending traffic to the EC2 Instances.

The server group and launch template that we just created will look like this in Spinnaker UI:

Spin View ServerGroup

The UI also displays capacity type, such as the purchase option for each instance type in the Instance Information section:

Spin View ServerGroup Purchase Options 1

Spin View ServerGroup Purchase Options 2

3. Access the application

Copy the Application Load Balancer URL by selecting the tree icon in the top right corner of the server group, and open it in a browser. Refresh the page multiple times to see that requests are served by different instances each time.

Spin Access App

Congratulations! You successfully deployed the demo application on an Amazon EC2 server group diversified across multiple instance types and purchase options.

Moreover, you can clone, modify, disable, and destroy these server groups, as well as use them with Spinnaker pipelines to effectively release new versions of your application.

Cost savings

Check the savings you realized by deploying your demo application on EC2 Spot Instances by going to the EC2 console > Spot Requests > Savings Summary.

Spin Spot Savings

Cleanup

To avoid incurring any additional charges, clean up the resources created in the tutorial.

First, delete the server group, Application Load Balancer, and application in Spinnaker.

curl 'http://localhost:8084/tasks' \
-H 'Content-Type: application/json;charset=utf-8' \
--data-raw \
'{
   "job":[
      {
         "reason":"Cleanup",
         "asgName":"demoapp-v000",
         "moniker":{
            "app":"demoapp",
            "cluster":"demoapp",
            "sequence":0
         },
         "serverGroupName":"demoapp-v000",
         "type":"destroyServerGroup",
         "region":"'"${AWS_REGION}"'",
         "credentials":"my-aws-account",
         "cloudProvider":"aws"
      },
      {
         "cloudProvider":"aws",
         "loadBalancerName":"demoapp-lb",
         "loadBalancerType":"application",
         "regions":[
            "'"${AWS_REGION}"'"
         ],
         "credentials":"my-aws-account",
         "vpcId":"'"${VPC_ID}"'",
         "type":"deleteLoadBalancer"
      },
      {
         "type":"deleteApplication",
         "application":{
            "name":"demoapp",
            "cloudProviders":"aws"
         }
      }
   ],
   "application":"demoapp",
   "description":"Deleting ServerGroup, ALB and Application: demoapp"
}'

Wait for Spinnaker to delete all of the resources before proceeding further. You can confirm this either on the Spinnaker UI or AWS Management Console.

Then delete the Spinnaker infrastructure by running the following command:

aws ec2 delete-key-pair --key-name ${EC2_KEYPAIR_NAME} --region ${AWS_REGION}
rm ~/${EC2_KEYPAIR_NAME}.pem
aws s3api delete-objects \
--bucket ${S3_BUCKET_NAME} \
--delete "$(aws s3api list-object-versions \
--bucket ${S3_BUCKET_NAME} \
--query='{Objects: Versions[].{Key:Key,VersionId:VersionId}}')" # If an error occurs, there are no versions, which is OK
aws s3api delete-objects \
--bucket ${S3_BUCKET_NAME} \
--delete "$(aws s3api list-object-versions \
--bucket ${S3_BUCKET_NAME} \
--query='{Objects: DeleteMarkers[].{Key:Key,VersionId:VersionId}}')" # If an error occurs, there are no delete markers, which is OK
aws s3 rb s3://${S3_BUCKET_NAME} --force #Delete Bucket
aws cloudformation delete-stack --region ${AWS_REGION} --stack-name ${STACK_NAME}

Conclusion

In this post, we learned about the new Amazon EC2 features recently added to Spinnaker, and how to use them to build diversified and optimized Auto Scaling Groups. We also discussed recommended best practices for EC2 Spot and how they can improve your experience with it.

We would love to hear from you! Tell us about other Continuous Integration/Continuous Delivery (CI/CD) platforms that you want to use with EC2 Spot and/or Auto Scaling Groups by adding an issue on the Spot integrations roadmap.

Unit Testing AWS Lambda with Python and Mock AWS Services

Post Syndicated from Kevin Hakanson original https://aws.amazon.com/blogs/devops/unit-testing-aws-lambda-with-python-and-mock-aws-services/

When building serverless event-driven applications using AWS Lambda, it is best practice to validate individual components. Unit testing can quickly identify and isolate issues in AWS Lambda function code. This blog demonstrates unit test techniques for Python-based AWS Lambda functions and their interactions with AWS services.

The full code for this blog is available in the GitHub project as a demonstrative example.

Example use case

Let’s consider unit testing a serverless application which provides an API endpoint to generate a document.  When the API endpoint is called with a customer identifier and document type, the Lambda function retrieves the customer’s name from DynamoDB, then retrieves the document text from DynamoDB for the given document type, finally generating and writing the resulting document to S3.

Figure 1. Example application architecture

  1. Amazon API Gateway provides an endpoint to request the generation of a document for a given customer.  A document type and customer identifier are provided in this API call.
  2. The endpoint invokes an AWS Lambda function that generates a document using the customer identifier and the document type provided.
  3. An Amazon DynamoDB table stores the contents of the documents and the user’s name, which are retrieved by the Lambda function.
  4. The resulting text document is stored to Amazon S3.

Our testing goal is to determine if an isolated “unit” of code works as intended. In this blog, we will be writing tests to provide confidence that the logic written in the above AWS Lambda function behaves as we expect. We will mock the service integrations to Amazon DynamoDB and S3 to isolate and focus our tests on the Lambda function code, and not on the behavior of the AWS Services.

Define the AWS Service resources in the Lambda function

Before writing our first unit test, let’s look at the Lambda function that contains the behavior we wish to test.  The full code for the Lambda function is available in the GitHub repository as src/sample_lambda/app.py.

As part of our best practices for working with AWS Lambda functions, we recommend initializing AWS service resource connections outside of the handler function and in the global scope. Additionally, we can retrieve any relevant environment variables in the global scope so that subsequent invocations of the Lambda function do not repeatedly need to retrieve them. For organization, we can put the resource and variables in a dictionary:

_LAMBDA_DYNAMODB_RESOURCE = { "resource" : resource('dynamodb'), 
                              "table_name" : environ.get("DYNAMODB_TABLE_NAME","NONE") }

However, globally scoped code and global variables are challenging to test in Python, as global statements are executed on import, and outside of the controlled test flow.  To facilitate testing, we define classes for supporting AWS resource connections that we can override (patch) during testing.  These classes will accept a dictionary containing the boto3 resource and relevant environment variables.

For example, we create a DynamoDB resource class with a parameter lambda_dynamodb_resource that accepts a dictionary containing the boto3 DynamoDB resource and the table name:

class LambdaDynamoDBClass:
    def __init__(self, lambda_dynamodb_resource):
        self.resource = lambda_dynamodb_resource["resource"]
        self.table_name = lambda_dynamodb_resource["table_name"]
        self.table = self.resource.Table(self.table_name)
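
The handler shown next also uses an S3 resource class (LambdaS3Class). Its code is not listed in this post, but based on how it is used later (a dictionary with "resource" and "bucket_name" keys, and a .bucket attribute), a minimal sketch would look like the following; refer to the GitHub repository for the actual implementation.

class LambdaS3Class:
    """S3 resource container mirroring LambdaDynamoDBClass (sketch inferred from its usage below)."""
    def __init__(self, lambda_s3_resource):
        self.resource = lambda_s3_resource["resource"]
        self.bucket_name = lambda_s3_resource["bucket_name"]
        self.bucket = self.resource.Bucket(self.bucket_name)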

Build the Lambda Handler

The Lambda function handler is the method in the AWS Lambda function code that processes events. When the function is invoked, Lambda runs the handler method. When the handler exits or returns a response, it becomes available to process another event.

To facilitate unit testing of the handler function, move as much of the logic as possible into other functions that are then called by the Lambda handler entry point. Also, pass the AWS resource classes to these subsequent function calls. This approach enables us to mock and intercept all resources and calls during testing.

In our example, the handler references the global variables, and instantiates the resource classes to setup the connections to specific AWS resources.  (We will be able to override and mock these connections during unit test.)

Then the handler calls the create_letter_in_s3 function to perform the steps of creating the document, passing the resource classes. This downstream function avoids referencing the global context or any AWS resource connections directly.

from typing import Any, Dict
from aws_lambda_powertools.utilities.data_classes import APIGatewayProxyEvent
from aws_lambda_powertools.utilities.typing import LambdaContext

def lambda_handler(event: APIGatewayProxyEvent, context: LambdaContext) -> Dict[str, Any]:

    global _LAMBDA_DYNAMODB_RESOURCE
    global _LAMBDA_S3_RESOURCE

    dynamodb_resource_class = LambdaDynamoDBClass(_LAMBDA_DYNAMODB_RESOURCE)
    s3_resource_class = LambdaS3Class(_LAMBDA_S3_RESOURCE)

    return create_letter_in_s3(
            dynamo_db = dynamodb_resource_class,
            s3 = s3_resource_class,
            doc_type = event["pathParameters"]["docType"],
            cust_id = event["pathParameters"]["customerId"])

Unit testing with mock AWS services

Our Lambda function code has now been written and is ready to be tested, so let’s take a look at the unit test code! The full code for the unit test is available in the GitHub repository as tests/unit/src/test_sample_lambda.py.

In production, our Lambda function code will directly access the AWS resources we defined in our function handler; however, in our unit tests we want to isolate our code and replace the AWS resources with simulations. This lets us run unit tests in an isolated environment and prevents accidental access to actual cloud resources.

Moto is a Python library for mocking AWS services that we will use to simulate AWS resources in our tests. Moto supports many AWS resources, and it allows you to test your code with little or no modification by emulating the functionality of these services.

Moto uses decorators to intercept and simulate responses to and from AWS resources.  By adding a decorator for a given AWS service, subsequent calls from the module to that service will be re-directed to the mock.

@moto.mock_dynamodb
@moto.mock_s3
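
These decorators can be applied to the test class (or to individual test methods) so that every boto3 call made inside it is served by the mock instead of AWS. A minimal sketch, assuming moto 4.x where the per-service decorators are available:

import moto
from unittest import TestCase

@moto.mock_dynamodb
@moto.mock_s3
class TestSampleLambda(TestCase):
    # setUp(), the tests, and tearDown() shown below run entirely against the mocks.
    ...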

Configure Test Setup and Tear-down

The mocked AWS resources will be used during the unit test suite.  Using the setUp() method allows you to define and configure the mocked global AWS Resources before the tests are run.

We define the test class and a setUp() method and initialize the mock AWS resource.  This includes configuring the resource to prepare it for testing, such as defining a mock DynamoDB table or creating a mock S3 Bucket.

class TestSampleLambda(TestCase):
    def setUp(self) -> None:
        dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
        dynamodb.create_table(
            TableName = self.test_ddb_table_name,
            KeySchema = [{"AttributeName": "PK", "KeyType": "HASH"}],
            AttributeDefinitions = [{"AttributeName": "PK",
                                     "AttributeType": "S"}],
            BillingMode = 'PAY_PER_REQUEST')

        s3_client = boto3.client('s3', region_name="us-east-1")
        s3_client.create_bucket(Bucket = self.test_s3_bucket_name)

After creating the mocked resources, the setUp() method creates resource class objects referencing those mocked resources, which will be used during testing.

        mocked_dynamodb_resource = resource("dynamodb")
        mocked_s3_resource = resource("s3")
        mocked_dynamodb_resource = { "resource" : resource('dynamodb'),
                                     "table_name" : self.test_ddb_table_name  }
        mocked_s3_resource = { "resource" : resource('s3'),
                               "bucket_name" : self.test_s3_bucket_name }
        self.mocked_dynamodb_class = LambdaDynamoDBClass(mocked_dynamodb_resource)
        self.mocked_s3_class = LambdaS3Class(mocked_s3_resource)

Test #1: Verify the code writes the document to S3

Our first test will validate our Lambda function writes the customer letter to an S3 bucket in the correct manner.  We will follow the standard test format of arrange, act, assert when writing this unit test.

Arrange the data we need in the DynamoDB table:

def test_create_letter_in_s3(self) -> None:

    self.mocked_dynamodb_class.table.put_item(Item={"PK":"D#UnitTestDoc",
                                                    "data":"Unit Test Doc Corpi"})
    self.mocked_dynamodb_class.table.put_item(Item={"PK":"C#UnitTestCust",
                                                    "data":"Unit Test Customer"})

Act by calling the create_letter_in_s3 function.  During these act calls, the test passes the AWS resources as created in the setUp().

    test_return_value = create_letter_in_s3(
                        dynamo_db = self.mocked_dynamodb_class,
                        s3=self.mocked_s3_class,
                        doc_type = "UnitTestDoc",
                        cust_id = "UnitTestCust"
                        )

Assert by reading the data written to the mock S3 bucket, and testing conformity to what we are expecting:

bucket_key = "UnitTestCust/UnitTestDoc.txt"
    body = self.mocked_s3_class.bucket.Object(bucket_key).get()['Body'].read()

    self.assertEqual(test_return_value["statusCode"], 200)
    self.assertIn("UnitTestCust/UnitTestDoc.txt", test_return_value["body"])
    self.assertEqual(body.decode('ascii'),"Dear Unit Test Customer;\nUnit Test Doc Corpi")

Tests #2 and #3: Data not found error conditions

We can also test error conditions and handling, such as keys not found in the database.  For example, if a customer identifier is submitted, but does not exist in the database lookup, does the logic handle this and return a “Not Found” code of 404?

To test this in test #2, we add data to the mocked DynamoDB table, but then submit a customer identifier that is not in the database.

This test, and a similar test #3 for “Document Types not found”, are implemented in the example test code on GitHub.

Test #4: Validate the handler interface

As the application logic resides in independently tested functions, the Lambda handler function provides only interface validation and function call orchestration.  Therefore, the test for the handler validates that the event is parsed correctly, any functions are invoked as expected, and the return value is passed back.

To emulate the global resource variables and other functions, patch both the global resource classes and logic functions.

    @patch("src.sample_lambda.app.LambdaDynamoDBClass")
    @patch("src.sample_lambda.app.LambdaS3Class")
    @patch("src.sample_lambda.app.create_letter_in_s3")
    def test_lambda_handler_valid_event_returns_200(self,
                            patch_create_letter_in_s3 : MagicMock,
                            patch_lambda_s3_class : MagicMock,
                            patch_lambda_dynamodb_class : MagicMock
                            ):

Arrange for the test by setting return values for the patched objects.

        patch_lambda_dynamodb_class.return_value = self.mocked_dynamodb_class
        patch_lambda_s3_class.return_value = self.mocked_s3_class

        return_value_200 = {"statusCode" : 200, "body":"OK"}
        patch_create_letter_in_s3.return_value = return_value_200

We need to provide event data when invoking the Lambda handler.  A good practice is to save test events as separate JSON files, rather than placing them inline as code. In the example project, test events are located in the folder “tests/events/”. During test execution, the event object is created from the JSON file using the utility function named load_sample_event_from_file.

test_event = self.load_sample_event_from_file("sampleEvent1")

Act by calling the lambda_handler function.

test_return_value = lambda_handler(event=test_event, context=None)

Assert by ensuring the create_letter_in_s3 function is called with the expected parameters based on the event, and a create_letter_in_s3 function return value is passed back to the caller.  In our example, this value is simply passed with no alterations.

    patch_create_letter_in_s3.assert_called_once_with(
        dynamo_db=self.mocked_dynamodb_class,
        s3=self.mocked_s3_class,
        doc_type=test_event["pathParameters"]["docType"],
        cust_id=test_event["pathParameters"]["customerId"])

    self.assertEqual(test_return_value, return_value_200)

Tear Down

The tearDown() method is called immediately after the test method has been run and the result is recorded.  In our example tearDown() method, we clean up any data or state created so the next test won’t be impacted.
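
The post does not list the tearDown() body; a minimal sketch, assuming it simply removes the mocked table and bucket created in setUp() (boto3 is imported as in the earlier snippets):

    def tearDown(self) -> None:
        # Delete the mocked resources so each test starts from a clean slate.
        s3_resource = boto3.resource("s3", region_name="us-east-1")
        s3_resource.Bucket(self.test_s3_bucket_name).objects.all().delete()
        s3_resource.Bucket(self.test_s3_bucket_name).delete()

        dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
        dynamodb.Table(self.test_ddb_table_name).delete()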

Running the unit tests

The unittest unit testing framework can be run using the Python pytest utility. To ensure network isolation and verify that the unit tests are not accidentally connecting to AWS resources, the pytest-socket project provides the ability to disable network communication during a test.

pytest -v --disable-socket -s tests/unit/src/

The pytest command results in a PASSED or FAILED status for each test. A PASSED status verifies that your unit tests, as written, did not encounter errors or issues.

Conclusion

Unit testing is a software development process in which different parts of an application, called units, are individually and independently tested. Tests validate the quality of the code and confirm that it functions as expected. Other developers can gain familiarity with your code base by consulting the tests. Unit tests reduce future refactoring time, help engineers get up to speed on your code base more quickly, and provide confidence in the expected behaviour.

We’ve seen in this blog how to unit test AWS Lambda functions and mock AWS Services to isolate and test individual logic within our code.

AWS Lambda Powertools for Python has been used in the project to validate handler events. Powertools provides a suite of utilities for AWS Lambda functions to ease adopting best practices such as tracing, structured logging, custom metrics, idempotency, batching, and more.

Learn more about AWS Lambda testing in our prescriptive test guidance, and find additional test examples on GitHub.  For more serverless learning resources, visit Serverless Land.

About the authors:

Tom Romano

Tom Romano is a Solutions Architect for AWS World Wide Public Sector from Tampa, FL, and assists GovTech and EdTech customers as they create new solutions that are cloud-native, event driven, and serverless. He is an enthusiastic Python programmer for both application development and data analytics. In his free time, Tom flies remote control model airplanes and enjoys vacationing with his family around Florida and the Caribbean.

Kevin Hakanson

Kevin Hakanson is a Sr. Solutions Architect for AWS World Wide Public Sector based in Minnesota. He works with EdTech and GovTech customers to ideate, design, validate, and launch products using cloud-native technologies and modern development practices. When not staring at a computer screen, he is probably staring at another screen, either watching TV or playing video games with his family.

Integrating with GitHub Actions – Amazon CodeGuru in your DevSecOps Pipeline

Post Syndicated from Mahesh Biradar original https://aws.amazon.com/blogs/devops/integrating-with-github-actions-amazon-codeguru-in-your-devsecops-pipeline/

Many organizations have adopted DevOps practices to streamline and automate software delivery and IT operations. A DevOps model can be adopted without sacrificing security by using automated compliance policies, fine-grained controls, and configuration management techniques. However, one of the key challenges customers face is analyzing code and detecting any vulnerabilities in the code pipeline due to a lack of access to the right tool. Amazon CodeGuru addresses this challenge by using machine learning and automated reasoning to identify critical issues and hard-to-find bugs during application development and deployment, thus improving code quality.

We discussed how you can build a CI/CD pipeline to deploy a web application in our previous post “Integrating with GitHub Actions – CI/CD pipeline to deploy a Web App to Amazon EC2”. In this post, we will use that pipeline to include security checks and integrate it with Amazon CodeGuru Reviewer to analyze and detect potential security vulnerabilities in the code before deploying it.

Amazon CodeGuru Reviewer helps you improve code security and provides recommendations based on common vulnerabilities (OWASP Top 10) and AWS security best practices. CodeGuru analyzes Java and Python code and provides recommendations for remediation. CodeGuru Reviewer detects deviations from best practices when using AWS APIs and SDKs, identifies concurrency issues, resource leaks, and security vulnerabilities, and validates input parameters. For every workflow run, CodeGuru Reviewer’s GitHub Action copies your code and build artifacts into an S3 bucket and calls CodeGuru Reviewer APIs to analyze the artifacts and provide recommendations. Refer to the code detector library here for more information about CodeGuru Reviewer’s security and code quality detectors.

With GitHub Actions, developers can easily integrate CodeGuru Reviewer into their CI workflows, conducting code quality and security analysis. They can view CodeGuru Reviewer recommendations directly within the GitHub user interface to quickly identify and fix code issues and security vulnerabilities. Any pull request or push to the master branch will trigger a scan of the changed lines of code, and scheduled pipeline runs will trigger a full scan of the entire repository, ensuring comprehensive analysis and continuous improvement.

Solution overview

The solution comprises the following components:

  1. GitHub Actions – Workflow Orchestration tool that will host the Pipeline.
  2. AWS CodeDeploy – AWS service to manage deployment on Amazon EC2 Autoscaling Group.
  3. AWS Auto Scaling – AWS service to help maintain application availability and elasticity by automatically adding or removing Amazon EC2 instances.
  4. Amazon EC2 – Destination Compute server for the application deployment.
  5. Amazon CodeGuru – AWS Service to detect security vulnerabilities and automate code reviews.
  6. AWS CloudFormation – AWS infrastructure as code (IaC) service used to orchestrate the infrastructure creation on AWS.
  7. AWS Identity and Access Management (IAM) OIDC identity provider – Federated authentication service to establish trust between GitHub and AWS to allow GitHub Actions to deploy on AWS without maintaining AWS Secrets and credentials.
  8. Amazon Simple Storage Service (Amazon S3) – Amazon S3 to store deployment and code scan artifacts.

The following diagram illustrates the architecture:

Figure 1. Architecture Diagram of the proposed solution in the blog

  1. Developer commits code changes from their local repository to the GitHub repository. In this post, the GitHub action is triggered manually, but this can be automated.
  2. GitHub action triggers the build stage.
  3. GitHub’s OpenID Connect (OIDC) provider uses tokens to authenticate to AWS and access resources.
  4. GitHub action uploads the deployment artifacts to Amazon S3.
  5. GitHub action invokes Amazon CodeGuru.
  6. The source code gets uploaded into an S3 bucket when the CodeGuru scan starts.
  7. GitHub action invokes CodeDeploy.
  8. CodeDeploy triggers the deployment to Amazon EC2 instances in an Autoscaling group.
  9. CodeDeploy downloads the artifacts from Amazon S3 and deploys to Amazon EC2 instances.

Prerequisites

This blog post is a continuation of our previous post – Integrating with GitHub Actions – CI/CD pipeline to deploy a Web App to Amazon EC2. You will need to set up your pipeline by following the instructions in that blog.

After completing the steps, you should have a local repository with the following directory structure, and one completed Actions run.

Figure 2. Directory structure

To enable automated deployment upon git push, you will need to make a change to your .github/workflow/deploy.yml file. Specifically, you can activate the automation by modifying the following line of code in the deploy.yml file:

From:

workflow_dispatch: {}

To:

  #workflow_dispatch: {}
  push:
    branches: [ main ]
  pull_request:

Solution walkthrough

The following steps provide a high-level overview of the walkthrough:

  1. Create an S3 bucket for the Amazon CodeGuru Reviewer.
  2. Update the IAM role to include permissions for Amazon CodeGuru.
  3. Associate the repository in Amazon CodeGuru.
  4. Add Vulnerable code.
  5. Update GitHub Actions Job to run the Amazon CodeGuru Scan.
  6. Push the code to the repository.
  7. Verify the pipeline.
  8. Check the Amazon CodeGuru recommendations in the GitHub user interface.

1. Create an S3 bucket for the Amazon CodeGuru Reviewer

    • When you run a CodeGuru scan, your code is first uploaded to an S3 bucket in your AWS account.

Note that CodeGuru Reviewer expects the S3 bucket name to begin with codeguru-reviewer-.

    • You can create this bucket using the bucket policy outlined in this CloudFormation template (JSON or YAML) or by following these instructions.

2.  Update the IAM role to add permissions for Amazon CodeGuru

  • Locate the role created in the pre-requisite section, named “CodeDeployRoleforGitHub”.
  • Next, create an inline policy by following these steps. Give it a name, such as “codegurupolicy” and add the following permissions to the policy.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "codeguru-reviewer:ListRepositoryAssociations",
                "codeguru-reviewer:AssociateRepository",
                "codeguru-reviewer:DescribeRepositoryAssociation",
                "codeguru-reviewer:CreateCodeReview",
                "codeguru-reviewer:DescribeCodeReview",
                "codeguru-reviewer:ListRecommendations",
                "iam:CreateServiceLinkedRole"
            ],
            "Resource": "*",
            "Effect": "Allow"
        },
        {
            "Action": [
                "s3:CreateBucket",
                "s3:GetBucket*",
                "s3:List*",
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::codeguru-reviewer-*",
                "arn:aws:s3:::codeguru-reviewer-*/*"
            ],
            "Effect": "Allow"
        }
    ]
}

3.  Associate the repository in Amazon CodeGuru

Figure 3. Associate the repository

At this point, you will have completed your initial full analysis run. However, since this is a simple “helloWorld” program, you may not receive any recommendations. In the following steps, you will incorporate vulnerable code and trigger the analysis again, allowing CodeGuru to identify and provide recommendations for potential issues.

4.  Add Vulnerable code

  • Create a file application.conf
    at /aws-codedeploy-github-actions-deployment/spring-boot-hello-world-example
  • Add the following content in application.conf file.
db.default.url="postgres://test-ojxarsxivjuyjc:ubKveYbvNjQ5a0CU8vK4YoVIhl@ec2-54-225-223-40.compute-1.amazonaws.com:5432/dcectn1pto16vi?ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory"

db.default.url=${?DATABASE_URL}

db.default.port="3000"

db.default.datasource.username="root"

db.default.datasource.password="testsk_live_454kjkj4545FD3434Srere7878"

db.default.jpa.generate-ddl="true"

db.default.jpa.hibernate.ddl-auto="create"

5. Update GitHub Actions Job to run Amazon CodeGuru Scan

  • You will need to add a new job definition in the GitHub Actions’ yaml file. This new section should be inserted between the Build and Deploy sections for optimal workflow.
  • Additionally, you will need to adjust the dependency in the deploy section to reflect the new flow: Build -> CodeScan -> Deploy.
  • Review sample GitHub actions code for running security scan on Amazon CodeGuru Reviewer.
codescan:
    needs: build
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
      security-events: write

    steps:
    
    - name: Download an artifact
      uses: actions/download-artifact@v2
      with:
          name: build-file 
    
    - name: Configure AWS credentials
      id: iam-role
      continue-on-error: true
      uses: aws-actions/configure-aws-credentials@v1
      with:
          role-to-assume: ${{ secrets.IAMROLE_GITHUB }}
          role-session-name: GitHub-Action-Role
          aws-region: ${{ env.AWS_REGION }}
    
    - uses: actions/checkout@v2
      if: steps.iam-role.outcome == 'success'
      with:
        fetch-depth: 0 

    - name: CodeGuru Reviewer
      uses: aws-actions/[email protected]
      if: ${{ always() }} 
      continue-on-error: false
      with:          
        s3_bucket: ${{ env.S3bucket_CodeGuru }} 
        build_path: .

    - name: Store SARIF file
      if: steps.iam-role.outcome == 'success'
      uses: actions/upload-artifact@v2
      with:
        name: SARIF_recommendations
        path: ./codeguru-results.sarif.json

    - name: Upload review result
      uses: github/codeql-action/upload-sarif@v2
      with:
        sarif_file: codeguru-results.sarif.json
    

    - run: |
          
          echo "Check for critical volnurability"
          count=$(cat codeguru-results.sarif.json | jq '.runs[].results[] | select(.level == "error") | .level' | wc -l)
          if (( $count > 0 )); then
            echo "There are $count critical findings, hence stopping the pipeline."
            exit 1
          fi
  • Refer to the complete file provided below for your reference. It is important to note that you will need to replace the following environment variables with your specific values.
    • S3bucket_CodeGuru
    • AWS_REGION
    • S3BUCKET
name: Build and Deploy

on:
    #workflow_dispatch: {}
  push:
    branches: [ main ]
  pull_request:

env:
  applicationfolder: spring-boot-hello-world-example
  AWS_REGION: us-east-1 # <replace this with your AWS region>
  S3BUCKET: <replace your bucket name here>
  S3bucket_CodeGuru: codeguru-reviewer-<replace your bucket name here> # S3 Bucket with "codeguru-reviewer-*" prefix


jobs:
  build:
    name: Build and Package
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
    steps:
      - uses: actions/checkout@v2
        name: Checkout Repository

      - uses: aws-actions/configure-aws-credentials@v1
        with:
          role-to-assume: ${{ secrets.IAMROLE_GITHUB }}
          role-session-name: GitHub-Action-Role
          aws-region: ${{ env.AWS_REGION }}

      - name: Set up JDK 1.8
        uses: actions/setup-java@v1
        with:
          java-version: 1.8

      - name: chmod
        run: chmod -R +x ./.github

      - name: Build and Package Maven
        id: package
        working-directory: ${{ env.applicationfolder }}
        run: $GITHUB_WORKSPACE/.github/scripts/build.sh

      - name: Upload Artifact to s3
        working-directory: ${{ env.applicationfolder }}/target
        run: aws s3 cp *.war s3://${{ env.S3BUCKET }}/
      
      - name: Artifacts for codescan action
        uses: actions/upload-artifact@v2
        with:
          name: build-file
          path: ${{ env.applicationfolder }}/target/*.war           

  codescan:
    needs: build
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
      security-events: write

    steps:
    
    - name: Download an artifact
      uses: actions/download-artifact@v2
      with:
          name: build-file 
    
    - name: Configure AWS credentials
      id: iam-role
      continue-on-error: true
      uses: aws-actions/configure-aws-credentials@v1
      with:
          role-to-assume: ${{ secrets.IAMROLE_GITHUB }}
          role-session-name: GitHub-Action-Role
          aws-region: ${{ env.AWS_REGION }}
    
    - uses: actions/checkout@v2
      if: steps.iam-role.outcome == 'success'
      with:
        fetch-depth: 0 

    - name: CodeGuru Reviewer
      uses: aws-actions/[email protected]
      if: ${{ always() }} 
      continue-on-error: false
      with:          
        s3_bucket: ${{ env.S3bucket_CodeGuru }} 
        build_path: .

    - name: Store SARIF file
      if: steps.iam-role.outcome == 'success'
      uses: actions/upload-artifact@v2
      with:
        name: SARIF_recommendations
        path: ./codeguru-results.sarif.json

    - name: Upload review result
      uses: github/codeql-action/upload-sarif@v2
      with:
        sarif_file: codeguru-results.sarif.json
    

    - run: |
          
          echo "Check for critical volnurability"
          count=$(cat codeguru-results.sarif.json | jq '.runs[].results[] | select(.level == "error") | .level' | wc -l)
          if (( $count > 0 )); then
            echo "There are $count critical findings, hence stopping the pipeline."
            exit 1
          fi
  deploy:
    needs: codescan
    runs-on: ubuntu-latest
    environment: Dev
    permissions:
      id-token: write
      contents: read
    steps:
    - uses: actions/checkout@v2
    - uses: aws-actions/configure-aws-credentials@v1
      with:
        role-to-assume: ${{ secrets.IAMROLE_GITHUB }}
        role-session-name: GitHub-Action-Role
        aws-region: ${{ env.AWS_REGION }}
    - run: |
        echo "Deploying branch ${{ env.GITHUB_REF }} to ${{ github.event.inputs.environment }}"
        commit_hash=`git rev-parse HEAD`
        aws deploy create-deployment --application-name CodeDeployAppNameWithASG --deployment-group-name CodeDeployGroupName --github-location repository=$GITHUB_REPOSITORY,commitId=$commit_hash --ignore-application-stop-failures

6.  Push the code to the repository:

  • Remember to save all the files that you have modified.
  • To ensure that you are in your git repository folder, you can run the command:
git remote -v
  • The command should return the remote branch address, which should be similar to the following:
username@3c22fb075f8a GitActionsDeploytoAWS % git remote -v
 origin	[email protected]:<username>/GitActionsDeploytoAWS.git (fetch)
 origin	[email protected]:<username>/GitActionsDeploytoAWS.git (push)
  • To push your code to the remote branch, run the following commands:

git add . 
git commit -m "Adding Security Scan" 
git push

Your code has been pushed to the repository and will trigger the workflow as per the configuration in GitHub Actions.

7.  Verify the pipeline

  • Your pipeline is set up to fail upon the detection of a critical vulnerability. You can also suppress recommendations from CodeGuru Reviewer if you think they are not relevant to your setup. In this example, as there are two critical vulnerabilities, the pipeline will not proceed to the next step.
  • To view the status of the pipeline, navigate to the Actions tab on your GitHub console. You can refer to the following image for guidance.
Figure 4. GitHub Actions pipeline

  • To view the details of the error, you can expand the “codescan” job in the GitHub Actions console. This will provide you with more information about the specific vulnerabilities that caused the pipeline to fail and help you to address them accordingly.
Figure 5. Codescan actions logs

8. Check the Amazon CodeGuru recommendations in the GitHub user interface

Once you have run the CodeGuru Reviewer Action, any security findings and recommendations will be displayed on the Security tab within the GitHub user interface. This will provide you with a clear and convenient way to view and address any issues that were identified during the analysis.

Figure 6. Security tab with results

Clean up

To avoid incurring future charges, you should clean up the resources that you created.

  1. Empty the Amazon S3 bucket.
  2. Delete the CloudFormation stack (CodeDeployStack) from the AWS console.
  3. Delete the CodeGuru Reviewer Amazon S3 bucket.
  4. Disassociate the GitHub repository in CodeGuru Reviewer.
  5. Delete the GitHub Secret (‘IAMROLE_GITHUB’)
    1. Go to the repository settings on GitHub Page.
    2. Select Secrets under Actions.
    3. Select IAMROLE_GITHUB, and delete it.

Conclusion

Amazon CodeGuru is a valuable tool for software development teams looking to improve the quality and efficiency of their code. With its advanced AI capabilities, CodeGuru automates the manual parts of code review and helps identify performance, cost, security, and maintainability issues. CodeGuru also integrates with popular development tools and provides customizable recommendations, making it easy to use within existing workflows. By using Amazon CodeGuru, teams can improve code quality, increase development speed, lower costs, and enhance security, ultimately leading to better software and a more successful overall development process.

In this post, we explained how to integrate Amazon CodeGuru Reviewer into your code build pipeline using GitHub actions. This integration serves as a quality gate by performing code analysis and identifying challenges in your code. Now you can access the CodeGuru Reviewer recommendations directly within the GitHub user interface for guidance on resolving identified issues.

About the author:

Mahesh Biradar

Mahesh Biradar is a Solutions Architect at AWS. He is a DevOps enthusiast and enjoys helping customers implement cost-effective architectures that scale.

Suresh Moolya

Suresh Moolya is a Senior Cloud Application Architect with Amazon Web Services. He works with customers to architect, design, and automate business software at scale on AWS cloud.

Shikhar Mishra

Shikhar is a Solutions Architect at Amazon Web Services. He is a cloud security enthusiast and enjoys helping customers design secure, reliable, and cost-effective solutions on AWS.

AWS Chatbot Now Integrates With Microsoft Teams

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/aws-chatbot-now-integrates-with-microsoft-teams/

I am pleased to announce that, starting today, you can use AWS Chatbot to troubleshoot and operate your AWS resources from Microsoft Teams.

Communicating and collaborating on IT operation tasks through chat channels is known as ChatOps. It allows you to centralize the management of infrastructure and applications, as well as to automate and streamline your workflows. It helps to provide a more interactive and collaborative experience, as you can communicate and work with your colleagues in real time through a familiar chat interface to get the job done.

We launched AWS Chatbot in 2020 with Amazon Chime and Slack integrations. Since then, the landscape of chat platforms has evolved rapidly, and many of you are now using Microsoft Teams.

AWS Chatbot Benefits
When using AWS Chatbot for Microsoft Teams or other chat platforms, you receive notifications from AWS services directly in your chat channels, and you can take action on your infrastructure by typing commands without having to switch to another tool.

Typically you want to receive alerts about your system health, your budget, any new security threat or risk, or the status of your CI/CD pipelines. Sending a message to the chat channel is as simple as sending a message on an Amazon Simple Notification Service (Amazon SNS) topic. Thanks to the native integration between Amazon CloudWatch alarms and SNS, alarms are automatically delivered to your chat channels with no additional configuration step required. Similarly, thanks to the integration between Amazon EventBridge and SNS, any system or service that emits events to EventBridge can send information to your chat channels.
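
For example, here is a minimal boto3 sketch of wiring a CloudWatch alarm to the SNS topic that AWS Chatbot watches (the topic ARN, alarm name, and threshold are placeholders for illustration):

import boto3

TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:alarmme"  # placeholder ARN

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
cloudwatch.put_metric_alarm(
    AlarmName="high-cpu-demo",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Statistic="Average",
    Period=300,
    EvaluationPeriods=1,
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    # When the alarm fires, its notification goes to the topic and is relayed to the chat channel.
    AlarmActions=[TOPIC_ARN],
)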

But ChatOps is more than the ability to spot problems as they arise. AWS Chatbot allows you to receive predefined CloudWatch dashboards interactively and retrieve Logs Insights logs to troubleshoot issues directly from the chat thread. You can also directly type in the chat channel most AWS Command Line Interface (AWS CLI) commands to retrieve additional telemetry data or resource information or to run runbooks to remediate the issues.

Typing and remembering long commands is difficult. With AWS Chatbot, you can define your own aliases to reference frequently used commands and their parameters. It reduces the number of steps to complete a task. Aliases are flexible and can contain one or more custom parameters injected at the time of the query.

And because chat channels are designed for conversation, you can also ask questions in natural language and have AWS Chatbot answer you with relevant extracts from the AWS documentation or support articles. Natural language understanding also allows you to make queries such as “show me my ec2 instances in eu-west-3.”

Let’s Configure the Integration Between AWS Chatbot and Microsoft Teams
Getting started is a two-step process. First, I configure my team in Microsoft Teams. As a Teams administrator, I add the AWS Chatbot application to the team, and I take note of the URL of the channel I want to use for receiving notifications and operating AWS resources from Microsoft Teams channels.

Second, I register Microsoft Teams channels in AWS Chatbot. I also assign IAM permissions on what channel members can do in this channel and associate SNS topics to receive notifications. I may configure AWS Chatbot with the AWS Management Console, an AWS CloudFormation template, or the AWS Cloud Development Kit (AWS CDK). For this demo, I choose to use the console.

I open the Management Console and navigate to the AWS Chatbot section. On the top right side of the screen, in the Configure a chat client box, I select Microsoft Teams and then Configure client.

I enter the Microsoft Teams channel URL I noted in the Teams app.

Add the team channel URL to Chatbot

At this stage, Chatbot redirects my browser to Microsoft Teams for authentication. If I am already authenticated, I will be redirected back to the AWS console immediately. Otherwise, I enter my Microsoft Teams credentials and one-time password and wait to be redirected.

At this stage, my Microsoft Teams team is registered with AWS Chatbot and ready to add Microsoft Teams channels. I select Configure new channel.

Chatbot is now linked to your Microsoft Teams

There are four sections to enter the details of the configuration. In the first section, I enter a Configuration name for my channel. Optionally, I also define the Logging details. In the second section, I paste the Microsoft Teams channel URL again.

Configure chatbot section one and two

In the third section, I configure the Permissions. I can choose between the same set of permissions for all Microsoft Teams users in my team, or I can set User-level roles permission to enable user-specific permissions in the channel. In this demo, I select Channel role, and I assign an IAM role to the channel. The role defines the permissions shared by all users in the channel. For example, I can assign a role that allows users to access configuration data from Amazon EC2 but not from Amazon S3. Under Channel role, I select Use an existing IAM role. Under Existing role, I select a role I created for my 2019 re:Invent talk about ChatOps: chatbot-demo. This role gives read-only access to all AWS services, but I could also assign other roles that would allow Chatbot users to take actions on their AWS resources.

To mitigate the risk that another person in your team accidentally grants more than the necessary privileges to the channel or user-level roles, you might also include Channel guardrail policies. These are the maximum permissions your users might have when using the channel. At runtime, the actual permissions are the intersection of the channel or user-level policies and the guardrail policies. Guardrail policies act like a boundary that channel users will never escape. The concept is similar to permission boundaries for IAM entities or service control policies (SCP) for AWS Organizations. In this example, I attach the ReadOnlyAccess managed policy.

Configure chatbot section three

The fourth and last section allows you to specify the SNS topic that will be the source for notifications sent to your team’s channel. Your applications or AWS services, such as CloudWatch alarms, can send messages to this topic, and AWS Chatbot will relay all messages to the configured Microsoft Teams channel. Thanks to the integration between Amazon EventBridge and SNS, any application able to send a message to EventBridge is able to send a message to Microsoft Teams.

For this demo, I select an existing SNS topic: alarmme in the us-east-1 Region. You can configure multiple SNS topics to receive alarms from various Regions. I then select Configure.

Configure chatbot section four

Let’s Test the Integration
That’s it. Now I am ready to test my setup.

On the AWS Chatbot configuration page, I first select the Send test message. I also have an alarm defined when my estimated billing goes over $500. On the CloudWatch section of the Management Console, I configure the alarm to post a message on the SNS topic shared with Microsoft Teams.

Within seconds, I receive the test message and the alarm message on the Microsoft Teams channel.

AWS Chatbot with Microsoft Teams, first messages received on the channel

Then I type a command to understand where the billing alarm comes from. I want to understand how many EC2 instances are running.

On the chat client channel, I type @aws to select Chatbot as the destination, then the rest of the CLI command, as I would do in a terminal: ec2 describe-instances --region us-east-1 --filters "Name=architecture,Values=arm64_mac" --query "Reservations[].Instances[].InstanceId"

Chatbot answers within seconds.

AWS chatbot describe instances

I can create aliases for commands I frequently use. Aliases may have placeholder parameters that I can give at runtime, such as the Region name for example.

I create an alias to get the list of my macOS instance IDs with the command: aws alias create mac ec2 describe-instances --region $region --filters "Name=architecture,Values=arm64_mac" --query "Reservations[].Instances[].InstanceId"

Now, I can type @aws alias run mac us-east-1 as a shortcut to get the same result as above. I can also manage my aliases with the @aws alias list, @aws alias get, and @aws alias delete commands.

I don’t know about you, but for me it is hard to remember commands. When I use the terminal, I rely on auto-complete to remind me of various commands and their options. AWS Chatbot offers similar command completion and guides me to collect missing parameters.

AWS Chatbot command completion

When using AWS Chatbot, I can also ask questions using natural English language. It can help to find answers from the AWS docs and from support articles by typing questions such as @aws how can I tag my EC2 instances? or @aws how do I configure Lambda concurrency setting?

It can also find resources in my account when AWS Resource Explorer is activated. For example, I asked the bot: @aws what are the tags for my ec2 resources? and @aws what Regions do I have Lambda service?

And I received these responses.

AWS Chatbot NLP Response 1

AWS Chatbot NLP Response 2

Thanks to AWS Chatbot, I realized that I had a rogue Lambda function left in ca-central-1. I used the AWS console to delete it.

Available Now
You can start to use AWS Chatbot with Microsoft Teams today. AWS Chatbot for Microsoft Teams is available to download from the Microsoft Teams app store at no additional cost. AWS Chatbot is available in all public AWS Regions, at no additional charge. You pay for the underlying resources that you use. You might incur charges from your chat client.

Get started today and configure your first integration with Microsoft Teams.

— seb

Proactive Insights with Amazon DevOps Guru for RDS

Post Syndicated from Kishore Dhamodaran original https://aws.amazon.com/blogs/devops/proactive-insights-with-amazon-devops-guru-for-rds/

Today, we are pleased to announce a new Amazon DevOps Guru for RDS capability: Proactive Insights. DevOps Guru for RDS is a fully managed service, powered by machine learning (ML), that uses the data collected by RDS Performance Insights to detect anomalous behaviors within Amazon Aurora databases and alert customers. Since its release, DevOps Guru for RDS has empowered customers with information to quickly react to performance problems and to take corrective actions. Now, Proactive Insights adds recommendations for operational issues so that you can address them before they turn into bigger problems.

Proactive Insights requires no additional setup for customers already using DevOps Guru for RDS, for both Amazon Aurora MySQL-Compatible Edition and Amazon Aurora PostgreSQL-Compatible Edition.

The following are example use cases of operational issues available for Proactive Insights today, with more insights coming over time:

  • Long InnoDB History for Aurora MySQL-Compatible engines – Triggered when the InnoDB history list length becomes very large.
  • Temporary tables created on disk for Aurora MySQL-Compatible engines – Triggered when the ratio of temporary tables created versus all temporary tables breaches a threshold.
  • Idle In Transaction for Aurora PostgreSQL-Compatible engines – Triggered when sessions connected to the database are not performing active work, but can keep database resources blocked.

To get started, navigate to the Amazon DevOps Guru Dashboard where you can see a summary of your system’s overall health, including ongoing proactive insights. In the following screen capture, the number three indicates that there are three ongoing proactive insights. Click on that number to see the listing of the corresponding Proactive Insights, which may include RDS or other Proactive Insights supported by Amazon DevOps Guru.

Amazon DevOps Guru Dashboard where you can see a summary of your system’s overall health, including ongoing proactive insights

Figure 1. Amazon DevOps Guru Dashboard where you can see a summary of your system’s overall health, including ongoing proactive insights.

Ongoing problems (including reactive and proactive insights) are also highlighted against your database instance on the Database list page in the Amazon RDS console.

Proactive and Reactive Insights are highlighted against your database instance on the Database list page in the Amazon RDS console

Figure 2. Proactive and Reactive Insights are highlighted against your database instance on the Database list page in the Amazon RDS console.

In the following sections, we will dive deep on these use cases of DevOps Guru for RDS Proactive Insights.

Long InnoDB History for Aurora MySQL-Compatible engines

The InnoDB history list is a global list of the undo logs for committed transactions. MySQL uses the history list to purge records and log pages when transactions no longer require the history.  If the InnoDB history list length grows too large, indicating a large number of old row versions, queries and even the database shutdown process can become slower.

DevOps Guru for RDS now detects when the history list length exceeds 1 million records and alerts users to close (either by commit or by rollback) any unnecessary long-running transactions before triggering database changes that involve a shutdown (this includes reboots and database version upgrades).
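
If you want to inspect the history list length yourself, one possible check (assuming the MySQL client and connectivity to your cluster endpoint, both placeholders here) is to query the InnoDB metrics and transaction tables:

# Current InnoDB history list length
mysql -h <cluster-endpoint> -u <user> -p -e \
  "SELECT count FROM information_schema.INNODB_METRICS WHERE name = 'trx_rseg_history_len';"

# Oldest open transactions that may be holding back purge
mysql -h <cluster-endpoint> -u <user> -p -e \
  "SELECT trx_id, trx_started, trx_mysql_thread_id FROM information_schema.INNODB_TRX ORDER BY trx_started LIMIT 5;"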

From the DevOps Guru console, navigate to Insights, choose Proactive, then choose “RDS InnoDB History List Length Anomalous” Proactive Insight with an ongoing status. You will notice that Proactive Insights provides an “Insight overview”, “Metrics” and “Recommendations”.

Insight overview provides you basic information on this insight. In our case, the history list for row changes increased significantly, which affects query and shutdown performance.

Long InnoDB History for Aurora MySQL-Compatible engines Insight overview

Figure 3. Long InnoDB History for Aurora MySQL-Compatible engines Insight overview.

The Metrics panel gives you a graphical representation of the history list length and the timeline, allowing you to correlate it with any anomalous application activity that may have occurred during this window.

Long InnoDB History for Aurora MySQL-Compatible engines Metrics panel

Figure 4. Long InnoDB History for Aurora MySQL-Compatible engines Metrics panel.

The Recommendations section suggests actions that you can take to mitigate this issue before it leads to a bigger problem. You will also notice the rationale behind the recommendation under the “Why is DevOps Guru recommending this?” column.

The Recommendations section suggests actions that you can take to mitigate this issue before it leads to a bigger problem

Figure 5. The Recommendations section suggests actions that you can take to mitigate this issue before it leads to a bigger problem.

Temporary tables created on disk for Aurora MySQL-Compatible engines

Sometimes it is necessary for the MySQL database to create an internal temporary table while processing a query. An internal temporary table can be held in memory and processed by the TempTable or MEMORY storage engine, or stored on disk by the InnoDB storage engine. An increase of temporary tables created on disk instead of in memory can impact the database performance.

DevOps Guru for RDS now monitors the rate at which the database creates temporary tables and the percentage of those temporary tables that use disk. When these values cross recommended levels over a given period of time, DevOps Guru for RDS creates an insight exposing this situation before it becomes critical.
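
To look at the raw counters behind this insight directly on the database (again assuming the MySQL client and a placeholder cluster endpoint), you could run:

# Total vs. on-disk internal temporary tables created since the last restart
mysql -h <cluster-endpoint> -u <user> -p -e \
  "SHOW GLOBAL STATUS LIKE 'Created_tmp%';"

# Current in-memory temporary table size limits
mysql -h <cluster-endpoint> -u <user> -p -e \
  "SHOW VARIABLES WHERE Variable_name IN ('tmp_table_size','max_heap_table_size');"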

From the DevOps Guru console, navigate to Insights, choose Proactive, then choose “RDS Temporary Tables On Disk Anomalous” Proactive Insight with an ongoing status. You will notice this Proactive Insight provides an “Insight overview”, “Metrics” and “Recommendations”.

Insight overview provides you basic information on this insight. In our case, more than 58% of the total temporary tables created per second were using disk, with a sustained rate of two temporary tables on disk created every second, which indicates that query performance is degrading.

Temporary tables created on disk insight overview

Figure 6. Temporary tables created on disk insight overview.

The Metrics panel shows you a graphical representation of the information specific for this insight. You will be presented with the evolution of the amount of temporary tables created on disk per second, the percentage of temporary tables on disk (out of the total number of database-created temporary tables), and of the overall rate at which the temporary tables are created (per second).

Temporary tables created on disk evolution of the amount of temporary tables created on disk per second

Figure 7. Temporary tables created on disk – evolution of the amount of temporary tables created on disk per second.

Temporary tables created on disk the percentage of temporary tables on disk (out of the total number of database-created temporary tables)

Figure 8. Temporary tables created on disk – the percentage of temporary tables on disk (out of the total number of database-created temporary tables).

Temporary tables created on disk overall rate at which the temporary tables are created (per second)

Figure 9. Temporary tables created on disk – overall rate at which the temporary tables are created (per second).

The Recommendations section suggests actions to avoid this situation when possible, such as not using BLOB and TEXT data types, tuning tmp_table_size and max_heap_table_size database parameters, data set reduction, columns indexing and more.

Temporary tables created on disk actions to avoid this situation when possible, such as not using BLOB and TEXT data types, tuning tmp_table_size and max_heap_table_size database parameters, data set reduction, columns indexing and more

Figure 10. Temporary tables created on disk – actions to avoid this situation when possible, such as not using BLOB and TEXT data types, tuning tmp_table_size and max_heap_table_size database parameters, data set reduction, columns indexing and more.

Additional explanations on this use case can be found by clicking on the “View troubleshooting doc” link.

Idle In Transaction for Aurora PostgreSQL-Compatible engines

A connection that has been idle in transaction for too long can impact performance by holding locks, blocking other queries, or by preventing VACUUM (including autovacuum) from cleaning up dead rows.

A PostgreSQL database requires periodic maintenance, which is known as vacuuming. Autovacuum in PostgreSQL automates the execution of the VACUUM and ANALYZE commands. This process gathers the table statistics and deletes the dead rows. When vacuuming does not occur, database performance suffers: table and index bloat increases (disk space that was used by a table or index and is available for reuse by the database but has not been reclaimed), statistics become stale, and the database can even reach transaction ID wraparound (when the number of unique transaction IDs reaches its maximum of about two billion).

DevOps Guru for RDS monitors the time spent by sessions in an Aurora PostgreSQL database in the idle in transaction state and initially raises a warning notification, followed by an alarm notification if the idle in transaction state continues (the current thresholds are 1800 seconds for the warning and 3600 seconds for the alarm).
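
To see which sessions are currently idle in a transaction, and for how long, a query against pg_stat_activity such as the following can help (connection details are placeholders):

psql -h <cluster-endpoint> -U <user> -d <database> -c "
  SELECT pid, usename, application_name,
         now() - state_change AS idle_duration
  FROM pg_stat_activity
  WHERE state = 'idle in transaction'
  ORDER BY idle_duration DESC;"

Committing or terminating such sessions, or setting the idle_in_transaction_session_timeout parameter, are common mitigations.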

From the DevOps Guru console, navigate to Insights, choose Proactive, then choose “RDS Idle In Transaction Max Time Anomalous” Proactive Insight with an ongoing status. You will notice this Proactive Insight provides an “Insight overview”, “Metrics” and “Recommendations”.

In our case, a connection has been in “idle in transaction” state for more than 1800 seconds, which could impact the database performance.

A connection has been in “idle in transaction” state for more than 1800 seconds, which could impact the database performance

Figure 11. A connection has been in “idle in transaction” state for more than 1800 seconds, which could impact the database performance.

The Metrics panel shows you a graphical representation of when the long-running “idle in transaction” connections started.

The Metrics panel shows you a graphical representation of when the long-running “idle in transaction” connections started

Figure 12. The Metrics panel shows you a graphical representation of when the long-running “idle in transaction” connections started.

As with the other insights, recommended actions are listed and a troubleshooting doc is linked for even more details on this use case.

Recommended actions are listed and a troubleshooting doc is linked for even more details on this use case

Figure 13. Recommended actions are listed and a troubleshooting doc is linked for even more details on this use case.

Conclusion

With Proactive Insights, DevOps Guru for RDS enhances its ability to help you monitor your databases by notifying you about potential operational issues before they become bigger problems down the road. To get started, ensure that Performance Insights is enabled on the database instance(s) you want monitored, and confirm that DevOps Guru is enabled to monitor those instances (for example, by enabling it at the account level, by monitoring specific CloudFormation stacks, or by using AWS tags for specific Aurora resources). Proactive Insights is available in all Regions where DevOps Guru for RDS is supported. To learn more about Proactive Insights, join us for a free hands-on Immersion Day (available in three time zones) on March 15th or April 12th.
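
As a quick CLI check (not a substitute for the setup steps above), you can verify whether Performance Insights is enabled on your instances:

aws rds describe-db-instances \
  --query "DBInstances[].{Instance:DBInstanceIdentifier,Engine:Engine,PerformanceInsights:PerformanceInsightsEnabled}" \
  --output table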

About the authors:

Kishore Dhamodaran

Kishore Dhamodaran is a Senior Solutions Architect at AWS.

Raluca Constantin

Raluca Constantin is a Senior Database Engineer with the Relational Database Services (RDS) team at Amazon Web Services. She has 16 years of experience in the databases world. She enjoys travels, hikes, arts and is a proud mother of a 12y old daughter and a 7y old son.

Jonathan Vogel

Jonathan is a Developer Advocate at AWS. He was a DevOps Specialist Solutions Architect at AWS for two years prior to taking on the Developer Advocate role. Prior to AWS, he practiced professional software development for over a decade. Jonathan enjoys music, birding and climbing rocks.

How to build a consistent workflow for development and operations teams

Post Syndicated from Mark Paulsen original https://github.blog/2023-02-28-how-to-build-a-consistent-workflow-for-development-and-operations-teams/

In GitHub’s recent 2022 State of the Octoverse report, HashiCorp Configuration Language (HCL) was the fastest growing programming language on GitHub. HashiCorp is a leading provider of Infrastructure as Code (IaC) automation for cloud computing. HCL is HashiCorp’s configuration language used with tools like Terraform and Vault to deliver IaC capabilities in a human-readable configuration file across multi-cloud and on-premises environments.

HCL’s growth shows the importance of bringing together the worlds of infrastructure, operations, and developers. This was always the goal of DevOps. But in reality, these worlds remain siloed for many enterprises.

In this post we’ll look at the business and cultural influences that bring development and operations together, as well as security, governance, and networking teams. Then, we’ll explore how GitHub and HashiCorp can enable consistent workflows and guardrails throughout the entire CI/CD pipeline.

The traditional world of operations (Ops)

Armon Dadgar, co-founder of HashiCorp, uses the analogy of a tree to explain the traditional world of Ops. The trunk includes all of the shared and consistent services you need in an enterprise to get stuff done. Think of things like security requirements, Active Directory, and networking configurations. A branch represents the different lines of business within an enterprise, providing services and products internally or externally. The leaves represent the different environments and technologies where your software or services are deployed: cloud, on-premises, and container environments, among others.

In many enterprises, the communication channels and processes between these different business areas can be cumbersome and expensive. If there is a significant change to the infrastructure or architecture, multiple tickets are typically submitted to multiple teams for reviews and approvals across different parts of the enterprise. Change Advisory Boards are commonly used to protect the organization. The change is usually unable to proceed unless the documentation is complete. Commonly, there’s a set of governance logs and auditable artifacts which are required for future audits.

Wouldn’t it be more beneficial for companies if teams had an optimized, automated workflow that could be used to speed up delivery and empower teams to get the work done within a set of secure guardrails? This could result in significant time and cost savings, leading to added business value.

After all, a recent Forrester report found that over three years, the combined power of GitHub’s enterprise products drove a 433% ROI for a composite organization. Not to mention the potential for time savings and efficiency increases, along with other qualitative benefits that come with consistency and streamlined work.

Your products and services would be deployed through an optimized path with security and governance built-in, rather than a sluggish, manual and error-prone process. After all, isn’t that the dream of DevOps, GitOps, and Cloud Native?

Introducing IaC

Let’s use a different analogy. Think of IaC as the blueprint for resources (such as servers, databases, networking components, or PaaS services) that host our software and services.

If you were architecting a hospital or a school, you wouldn’t use the same overall blueprint for both scenarios as they serve entirely different purposes with significantly different requirements. But there are likely building blocks or foundations that can be reused across the two designs.

IaC solutions, such as HCL, allow us to define and reuse these building blocks, similarly to how we reuse methods, modules, and package libraries in software development. With it being IaC, we can start adopting the same recommended practices for infrastructure that we use when collaborating and deploying on applications.

After all, we know that teams that adopt DevOps methodologies will see improved productivity, cloud-enabled scalability, collaboration, and security.

A better way to deliver

With that context, let’s explore the tangible benefits that we gain in codifying our infrastructure and how they can help us transform our traditional Ops culture.

Storing code in repositories

Let’s start with the lowest-hanging fruit. With it being IaC, we can start storing infrastructure and architectural patterns in source code repositories such as GitHub. This gives us a single source of truth with a complete version history. This allows us to easily rollback changes if needed, or deploy a specific version of the truth from history.

Teams across the enterprise can collaborate in separate branches in a Git repository. Branches allow teams and individuals to be productive in "their own space", away from the "production" source of truth (typically, the main branch), without having to worry about negatively impacting the in-progress work of other teams.

Terraform modules, the reusable building blocks mentioned in the last section, are also stored and versioned in Git repositories. From there, modules can be imported to the private registry in Terraform Cloud to make them easily discoverable by all teams. When a new release version is tagged in GitHub, it is automatically updated in the registry.
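
As a small illustration, publishing a new module version is then just a matter of pushing a semantic version tag (the version below is hypothetical):

git tag -a v1.2.0 -m "Release v1.2.0 of the module"
git push origin v1.2.0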

Collaborate early and often

As we discussed above, teams can make changes in separate branches to not impact the current state. But what happens when you want to bring those changes to the production codebase? If you’re unfamiliar with Git, then you may not have heard of a pull request before. As the name implies, we can “pull” changes from one branch into another.

Pull requests in GitHub are a great way to collaborate with other users in the team, being able to get peer reviews so feedback can be incorporated into your work. The pull request process is deliberately very social, to foster collaboration across the team.

In GitHub, you could consider setting branch protection rules so that direct changes to your main branch are not allowed. That way, all users must go through a pull request to get their code into production. You can even specify the minimum number of reviewers needed in branch protection rules.

Tip: you could use a special type of file, the CODEOWNERS file in GitHub, to automatically add reviewers to a pull request based on the files being edited. For example, all HCL files may need a review by the core infrastructure team. Or IaC configurations for line of business core banking systems might require review by a compliance team.
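
A minimal sketch of such a CODEOWNERS file, created here from the shell for illustration (the organization and team handles are hypothetical):

mkdir -p .github
cat > .github/CODEOWNERS <<'EOF'
# HCL / Terraform changes require a review by the core infrastructure team
*.tf   @example-org/core-infrastructure
*.hcl  @example-org/core-infrastructure

# IaC for core banking systems requires both infrastructure and compliance review
/core-banking/ @example-org/core-infrastructure @example-org/compliance
EOF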

Unlike Change Advisory Boards, which typically take place on a specified cadence, pull requests become a natural part of the process to bring code into production. The quality of the decisions and discussions also evolves. Rather than being a “yes/no” decision with recommendations in an external system, the context and recommendations can be viewed directly in the pull request.

Collaboration is also critical in the provisioning process, and GitHub’s integrations with Terraform Cloud will help you scale these processes across multiple teams. Terraform Cloud offers workflow features like secure storage for your Terraform state and direct integration with your GitHub repositories for a turnkey experience around the pull request and merge lifecycle.

Bringing automated quality reviews into the process

Building on from the previous section, pull requests also allow us to automatically check the quality of the changes that are being proposed. It is common in software to check that the application still compiles correctly, that unit tests pass, that no security vulnerabilities are introduced, and more.

From an IaC perspective, we can bring similar automated checks into our process. This is achieved by using GitHub status checks and gives us a clear understanding of whether certain criteria has been met or not.

GitHub Actions are commonly used to execute some of these automated checks in pull requests on GitHub. To determine the quality of IaC, you could include checks such as the following (a minimal command sketch follows the list):

  • Validating that the code is syntactically correct (for example, Terraform validate).
  • Linting the code to ensure a certain set of standards are being followed (for example, TFLint or Terraform format).
  • Static code analysis to identify any misconfigurations in your infrastructure at “design time” (for example, tfsec or terrascan).
  • Relevant unit or integration tests (using tools such as Terratest).
  • Deploying the infrastructure into a "smoke test" environment to verify that the infrastructure configuration (along with a known set of parameters) deploys into a desired state.
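
A minimal command sketch of these checks, runnable locally or as workflow steps (tool installation is assumed):

terraform init -backend=false   # initialize providers and modules without touching remote state
terraform validate              # syntax and internal consistency
terraform fmt -check -recursive # formatting standards
tflint                          # linting rules
tfsec .                         # static analysis for misconfigurations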

Getting started with Terraform on GitHub is easy. Versions of Terraform are installed on our Linux-based GitHub-hosted runners, and HashiCorp has an official GitHub Action to set up Terraform on a runner using a Terraform version that you specify.

Compliance as an automated check

We recently blogged about building compliance, security, and audit into your delivery pipelines and the benefits of this approach. When you add IaC to your existing development pipelines and workflows, you’ll have the ability to describe previously manual compliance testing and artifacts as code directly in your HCL configuration files.

A natural extension to IaC, policy as code allows your security and compliance teams to centralize the definitions of your organization’s requirements. Terraform Cloud’s built-in support for the HashiCorp Sentinel and Open Policy Agent (OPA) frameworks allows policy sets to be automatically ingested from GitHub repositories and applied consistently across all provisioning runs. This ensures policies are applied before misconfigurations have a chance to make it to production.

An added bonus mentioned in another recent blog is the ability to leverage AI-powered compliance solutions to optimize your delivery even more. Imagine a future where generative AI could create compliance-focused unit-tests across your entire development and infrastructure delivery pipeline with no manual effort.

Security in the background

You may have heard of Dependabot, our handy tool to help you keep your dependencies up to date. But did you know that Dependabot supports Terraform? That means you could rely on Dependabot to help keep your Terraform provider and module versions up to date.

Checks complete, time to deploy

With the checks complete, it’s now time for us to deploy our new infrastructure configuration! Branching and deployment strategies are beyond the scope of this post, so we’ll leave that for another discussion.

However, GitHub Actions can help us with the deployment aspect as well! As we explained earlier, Terraform is already installed on the Linux-based GitHub-hosted runners, and HashiCorp’s official setup-terraform action lets you pin the Terraform version that you need.

But you can take this even further! In Terraform, it is very common to use the command terraform plan to understand the impact of changes before you push them to production. terraform apply is then used to execute the changes.
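
A common pattern, sketched below, is to write the plan to a file so that the exact plan that was reviewed is the one that gets applied:

terraform plan -input=false -out=tfplan   # write the execution plan to a file
terraform show -no-color tfplan           # render the plan for review, e.g. as a pull request comment
terraform apply -input=false tfplan       # apply exactly the reviewed plan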

Reviewing environment changes in a pull request

HashiCorp provides an example of automating Terraform with GitHub Actions. This example orchestrates a release through Terraform Cloud by using GitHub Actions. The example takes the output of the terraform plan command and copies the output into your pull request for approval (again, this depends on the development flow that you’ve chosen).

Reviewing environment changes using GitHub Actions environments

Let’s consider another example, based on the example from HashiCorp. GitHub Actions has a built-in concept of environments. Think of these environments as a logical mapping to a target deployment location. You can associate a protection rule with an environment so that an approval is given before deploying.

So, with that context, let’s create a GitHub Action workflow that has two environments—one which is used for planning purposes, and another which is used for deployment:

name: 'Review and Deploy to EnvironmentA'
on: [push]

jobs:
  review:
    name: 'Terraform Plan'
    environment: environment_a_plan
    runs-on: ubuntu-latest

    steps:
      - name: 'Checkout'
        uses: actions/checkout@v2

      - name: 'Terraform Setup'
        uses: hashicorp/setup-terraform@v2
        with:
          cli_config_credentials_token: ${{ secrets.TF_API_TOKEN }}

      - name: 'Terraform Init'
        run: terraform init


      - name: 'Terraform Format'
        run: terraform fmt -check

      - name: 'Terraform Plan'
        run: terraform plan -input=false

  deploy:
    name: 'Terraform'
    environment: environment_a_deploy
    runs-on: ubuntu-latest
    needs: [review]

    steps:
      - name: 'Checkout'
        uses: actions/checkout@v2

      - name: 'Terraform Setup'
        uses: hashicorp/setup-terraform@v2
        with:
          cli_config_credentials_token: ${{ secrets.TF_API_TOKEN }}

      - name: 'Terraform Init'
        run: terraform init

      - name: 'Terraform Apply'
        run: terraform apply -auto-approve -input=false

Before executing the workflow, we can create an environment in the GitHub repository and associate protection rules with the environment_a_deploy. This means that a review is required before a production deployment.

Learn more

Check out HashiCorp’s Practitioner’s Guide to Using HashiCorp Terraform Cloud with GitHub for some common recommendations on getting started. And find out how we at GitHub are using Terraform to deliver mission-critical functionality faster and at lower cost.

Securely validate business application resilience with AWS FIS and IAM

Post Syndicated from Dr. Rudolf Potucek original https://aws.amazon.com/blogs/devops/securely-validate-business-application-resilience-with-aws-fis-and-iam/

To avoid high costs of downtime, mission critical applications in the cloud need to achieve resilience against degradation of cloud provider APIs and services.

In 2021, AWS launched AWS Fault Injection Simulator (FIS), a fully managed service to perform fault injection experiments on workloads in AWS to improve their reliability and resilience. At the time of writing, FIS allows you to simulate degradation of Amazon Elastic Compute Cloud (EC2) APIs using API fault injection actions, and thus explore the resilience of workflows where EC2 APIs act as a fault boundary.

In this post we show you how to explore additional fault boundaries in your applications by selectively denying access to any AWS API. This technique is particularly useful for fully managed, “black box” services like Amazon Simple Storage Service (S3) or Amazon Simple Queue Service (SQS), where a failure of read or write operations is sufficient to simulate problems in the service. This technique is also useful for injecting failures in serverless applications without needing to modify code. While similar results could be achieved with network disruption or by modifying code with feature flags, this approach provides fine-grained degradation of an AWS API without the need to re-deploy and re-validate code.

Overview

We will explore a common application pattern: user uploads a file, S3 triggers an AWS Lambda function, Lambda transforms the file to a new location and deletes the original:

S3 upload and transform logical workflow: User uploads file to S3, upload triggers AWS Lambda execution, Lambda writes transformed file to a new bucket and deletes original. Workflow can be disrupted at file deletion.

Figure 1. S3 upload and transform logical workflow: User uploads file to S3, upload triggers AWS Lambda execution, Lambda writes transformed file to a new bucket and deletes original. Workflow can be disrupted at file deletion.

We will simulate the user upload with an Amazon EventBridge rate expression triggering an AWS Lambda function which creates a file in S3:

S3 upload and transform implemented demo workflow: Amazon EventBridge triggers a creator Lambda function, Lambda function creates a file in S3, file creation triggers AWS Lambda execution on transformer function, Lambda writes transformed file to a new bucket and deletes original. Workflow can be disrupted at file deletion.

Figure 2. S3 upload and transform implemented demo workflow: Amazon EventBridge triggers a creator Lambda function, Lambda function creates a file in S3, file creation triggers AWS Lambda execution on transformer function, Lambda writes transformed file to a new bucket and deletes original. Workflow can be disrupted at file deletion.

Using this architecture we can explore the effect of S3 API degradation during file creation and deletion. As shown, the API call to delete a file from S3 is an application fault boundary. The failure could occur, with identical effect, because of S3 degradation or because the AWS IAM role of the Lambda function denies access to the API.

To inject failures we use AWS Systems Manager (AWS SSM) automation documents to attach and detach IAM policies at the API fault boundary and FIS to orchestrate the workflow.

Each Lambda function has an IAM execution role that allows S3 write and delete access, respectively. If the processor Lambda fails, the S3 file will remain in the bucket, indicating a failure. Similarly, if the IAM execution role for the processor function is denied the ability to delete a file after processing, that file will remain in the S3 bucket.

Prerequisites

Following this blog post will incur some costs for AWS services. To explore this test application you will need an AWS account. We will also assume that you are using AWS CloudShell or have the AWS CLI installed, and that you have configured a profile with administrator permissions. With that in place you can create the demo application in your AWS account by downloading this template and deploying an AWS CloudFormation stack:

git clone https://github.com/aws-samples/fis-api-failure-injection-using-iam.git
cd fis-api-failure-injection-using-iam
aws cloudformation deploy --stack-name test-fis-api-faults --template-file template.yaml --capabilities CAPABILITY_NAMED_IAM

Fault injection using IAM

Once the stack has been created, navigate to the Amazon CloudWatch Logs console and filter for /aws/lambda/test-fis-api-faults. Under the EventBridgeTimerHandler log group you should find log events once a minute writing a timestamped file to an S3 bucket named fis-api-failure-ACCOUNT_ID. Under the S3TriggerHandler log group you should find matching deletion events for those files.

Once you have confirmed object creation and deletion, let’s take away the S3 trigger handler Lambda function’s permission to delete files. To do this you will attach the FISAPI-DenyS3DeleteObject policy that was created with the template:

ROLE_NAME=FISAPI-TARGET-S3TriggerHandlerRole
ROLE_ARN=$( aws iam list-roles --query "Roles[?RoleName=='${ROLE_NAME}'].Arn" --output text )
echo Target Role ARN: $ROLE_ARN

POLICY_NAME=FISAPI-DenyS3DeleteObject
POLICY_ARN=$( aws iam list-policies --query "Policies[?PolicyName=='${POLICY_NAME}'].Arn" --output text )
echo Impact Policy ARN: $POLICY_ARN

aws iam attach-role-policy \
  --role-name ${ROLE_NAME}\
  --policy-arn ${POLICY_ARN}

With the deny policy in place you should now see object deletion fail and objects should start showing up in the S3 bucket. Navigate to the S3 console and find the bucket starting with fis-api-failure. You should see a new object appearing in this bucket once a minute:

S3 bucket listing showing files not being deleted because IAM permissions DENY file deletion during FIS experiment.

Figure 3. S3 bucket listing showing files not being deleted because IAM permissions DENY file deletion during FIS experiment.

If you would like to graph the results you can navigate to AWS CloudWatch, select “Logs Insights“, select the log group starting with /aws/lambda/test-fis-api-faults-S3CountObjectsHandler, and run this query:

fields @timestamp, @message
| filter NumObjects >= 0
| sort @timestamp desc
| stats max(NumObjects) by bin(1m)
| limit 20

This will show the number of files in the S3 bucket over time:

AWS CloudWatch Logs Insights graph showing the increase in the number of retained files in S3 bucket over time, demonstrating the effect of the introduced failure.

Figure 4. AWS CloudWatch Logs Insights graph showing the increase in the number of retained files in S3 bucket over time, demonstrating the effect of the introduced failure.

You can now detach the policy:

ROLE_NAME=FISAPI-TARGET-S3TriggerHandlerRole
ROLE_ARN=$( aws iam list-roles --query "Roles[?RoleName=='${ROLE_NAME}'].Arn" --output text )
echo Target Role ARN: $ROLE_ARN

POLICY_NAME=FISAPI-DenyS3DeleteObject
POLICY_ARN=$( aws iam list-policies --query "Policies[?PolicyName=='${POLICY_NAME}'].Arn" --output text )
echo Impact Policy ARN: $POLICY_ARN

aws iam detach-role-policy \
  --role-name ${ROLE_NAME}\
  --policy-arn ${POLICY_ARN}

We see that newly written files will once again be deleted but the un-processed files will remain in the S3 bucket. From the fault injection we learned that our system does not tolerate request failures when deleting files from S3. To address this, we should add a dead letter queue or some other retry mechanism.

Note: if the Lambda function does not return a success state on invocation, EventBridge will retry. In our Lambda functions we are cost conscious and explicitly capture the failure states to avoid excessive retries.

Fault injection using SSM

To use this approach from FIS and to always remove the policy at the end of the experiment, we first create an SSM document to automate adding a policy to a role. To inspect this document, open the SSM console, navigate to the “Documents” section, find the FISAPI-IamAttachDetach document under “Owned by me”, and examine the “Content” tab (make sure to select the correct region). This document takes the name of the Role you want to impact and the Policy you want to attach as parameters. It also requires an IAM execution role that grants it the power to list, attach, and detach specific policies to specific roles.

Let’s run the SSM automation document from the console by selecting “Execute Automation”. Determine the ARN of the FISAPI-SSM-Automation-Role and the ARN of the FISAPI-DenyS3DeleteObject policy from CloudFormation or by running:

ASSUME_ROLE_NAME=FISAPI-SSM-Automation-Role
ASSUME_ROLE_ARN=$( aws iam list-roles --query "Roles[?RoleName=='${ASSUME_ROLE_NAME}'].Arn" --output text )
echo Assume Role ARN: $ASSUME_ROLE_ARN
POLICY_NAME=FISAPI-DenyS3DeleteObject
POLICY_ARN=$( aws iam list-policies --query "Policies[?PolicyName=='${POLICY_NAME}'].Arn" --output text )
echo Impact Policy ARN: $POLICY_ARN

Use FISAPI-SSM-Automation-Role, a duration of 2 minutes expressed in ISO8601 format as PT2M, the ARN of the deny policy, and the name of the target role FISAPI-TARGET-S3TriggerHandlerRole:

Image of parameter input field reflecting the instructions in blog text.

Figure 5. Image of parameter input field reflecting the instructions in blog text.

Alternatively execute this from a shell:

ASSUME_ROLE_NAME=FISAPI-SSM-Automation-Role
ASSUME_ROLE_ARN=$( aws iam list-roles --query "Roles[?RoleName=='${ASSUME_ROLE_NAME}'].Arn" --output text )
echo Assume Role ARN: $ASSUME_ROLE_ARN

ROLE_NAME=FISAPI-TARGET-S3TriggerHandlerRole
ROLE_ARN=$( aws iam list-roles --query "Roles[?RoleName=='${ROLE_NAME}'].Arn" --output text )
echo Target Role ARN: $ROLE_ARN

POLICY_NAME=FISAPI-DenyS3DeleteObject
POLICY_ARN=$( aws iam list-policies --query "Policies[?PolicyName=='${POLICY_NAME}'].Arn" --output text )
echo Impact Policy ARN: $POLICY_ARN

aws ssm start-automation-execution \
  --document-name FISAPI-IamAttachDetach \
  --parameters "{
      \"AutomationAssumeRole\": [ \"${ASSUME_ROLE_ARN}\" ],
      \"Duration\": [ \"PT2M\" ],
      \"TargetResourceDenyPolicyArn\": [\"${POLICY_ARN}\" ],
      \"TargetApplicationRoleName\": [ \"${ROLE_NAME}\" ]
    }"

Wait two minutes and then examine the content of the S3 bucket starting with fis-api-failure again. You should now see two additional files in the bucket, showing that the policy was attached for 2 minutes during which files could not be deleted, and confirming that our application is not resilient to S3 API degradation.
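
Instead of the console, you can also verify this from the shell:

ACCOUNT_ID=$( aws sts get-caller-identity --query Account --output text )
aws s3 ls s3://fis-api-failure-${ACCOUNT_ID}/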

Permissions for injecting failures with SSM

Fault injection with SSM is controlled by IAM, which is why you had to specify the FISAPI-SSM-Automation-Role:

Visual representation of IAM permission used for fault injections with SSM. It shows the SSM execution role permitting access to use SSM automation documents as well as modify IAM roles and policies via the SSM document. It also shows the SSM user needing to have a pass-role permission to grant the SSM execution role to the SSM service.

Figure 6. Visual representation of IAM permission used for fault injections with SSM.

This role needs to contain an assume role policy statement for SSM to allow assuming the role:

      AssumeRolePolicyDocument:
        Statement:
          - Action:
             - 'sts:AssumeRole'
            Effect: Allow
            Principal:
              Service:
                - "ssm.amazonaws.com"

The role also needs to contain permissions to describe roles and their attached policies with an optional constraint on which roles and policies are visible:

          - Sid: GetRoleAndPolicyDetails
            Effect: Allow
            Action:
              - 'iam:GetRole'
              - 'iam:GetPolicy'
              - 'iam:ListAttachedRolePolicies'
            Resource:
              # Roles
              - !GetAtt EventBridgeTimerHandlerRole.Arn
              - !GetAtt S3TriggerHandlerRole.Arn
              # Policies
              - !Ref AwsFisApiPolicyDenyS3DeleteObject

Finally the SSM role needs to allow attaching and detaching a policy document. This requires

  1. an ALLOW statement
  2. a constraint on the policies that can be attached
  3. a constraint on the roles that can be attached to

In the role we collapse the first two requirements into an ALLOW statement with a condition constraint for the Policy ARN. We then express the third requirement in a DENY statement that will limit the '*' resource to only the explicit role ARNs we want to modify:

          - Sid: AllowOnlyTargetResourcePolicies
            Effect: Allow
            Action:  
              - 'iam:DetachRolePolicy'
              - 'iam:AttachRolePolicy'
            Resource: '*'
            Condition:
              ArnEquals:
                'iam:PolicyARN':
                  # Policies that can be attached
                  - !Ref AwsFisApiPolicyDenyS3DeleteObject
          - Sid: DenyAttachDetachAllRolesExceptApplicationRole
            Effect: Deny
            Action: 
              - 'iam:DetachRolePolicy'
              - 'iam:AttachRolePolicy'
            NotResource: 
              # Roles that can be attached to
              - !GetAtt EventBridgeTimerHandlerRole.Arn
              - !GetAtt S3TriggerHandlerRole.Arn

We will discuss security considerations in more detail at the end of this post.

Fault injection using FIS

With the SSM document in place you can now create an FIS template that calls the SSM document. Navigate to the FIS console and filter for FISAPI-DENY-S3PutObject. You should see that the experiment template passes the same parameters that you previously used with SSM:

Image of FIS experiment template action summary. This shows the SSM document ARN to be used for fault injection and the JSON parameters passed to the SSM document specifying the IAM Role to modify and the IAM Policy to use.

Figure 7. Image of FIS experiment template action summary. This shows the SSM document ARN to be used for fault injection and the JSON parameters passed to the SSM document specifying the IAM Role to modify and the IAM Policy to use.

You can now run the FIS experiment and after a couple minutes once again see new files in the S3 bucket.
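
You can also start the experiment from the CLI. A sketch, assuming you pick the correct template ID from the list output:

# Find the experiment template ID
aws fis list-experiment-templates \
  --query "experimentTemplates[].{id:id,description:description}" --output table

# Start the experiment with the ID from the previous step
aws fis start-experiment --experiment-template-id <template-id>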

Permissions for injecting failures with FIS and SSM

Fault injection with FIS is controlled by IAM, which is why you had to specify the FISAPI-FIS-Injection-ExperimentRole:

Visual representation of IAM permission used for fault injections with FIS and SSM. It shows the SSM execution role permitting access to use SSM automation documents as well as modify IAM roles and policies via the SSM document. It also shows the FIS execution role permitting access to use FIS templates, as well as the pass-role permission to grant the SSM execution role to the SSM service. Finally it shows the FIS user needing to have a pass-role permission to grant the FIS execution role to the FIS service.

Figure 8. Visual representation of IAM permission used for fault injections with FIS and SSM. It shows the SSM execution role permitting access to use SSM automation documents as well as modify IAM roles and policies via the SSM document. It also shows the FIS execution role permitting access to use FIS templates, as well as the pass-role permission to grant the SSM execution role to the SSM service. Finally it shows the FIS user needing to have a pass-role permission to grant the FIS execution role to the FIS service.

This role needs to contain an assume role policy statement for FIS to allow assuming the role:

      AssumeRolePolicyDocument:
        Statement:
          - Action:
              - 'sts:AssumeRole'
            Effect: Allow
            Principal:
              Service:
                - "fis.amazonaws.com"

The role also needs permissions to list and execute SSM documents:

            - Sid: RequiredReadActionsforAWSFIS
              Effect: Allow
              Action:
                - 'cloudwatch:DescribeAlarms'
                - 'ssm:GetAutomationExecution'
                - 'ssm:ListCommands'
                - 'iam:ListRoles'
              Resource: '*'
            - Sid: RequiredSSMStopActionforAWSFIS
              Effect: Allow
              Action:
                - 'ssm:CancelCommand'
              Resource: '*'
            - Sid: RequiredSSMWriteActionsforAWSFIS
              Effect: Allow
              Action:
                - 'ssm:StartAutomationExecution'
                - 'ssm:StopAutomationExecution'
              Resource: 
                - !Sub 'arn:aws:ssm:${AWS::Region}:${AWS::AccountId}:automation-definition/${SsmAutomationIamAttachDetachDocument}:$DEFAULT'

Finally, remember that the SSM document needs to use a Role of its own to execute the fault injection actions. Because that Role is different from the Role under which we started the FIS experiment, we need to explicitly allow SSM to assume that role with a PassRole statement which will expand to FISAPI-SSM-Automation-Role:

            - Sid: RequiredIAMPassRoleforSSMADocuments
              Effect: Allow
              Action: 'iam:PassRole'
              Resource: !Sub 'arn:aws:iam::${AWS::AccountId}:role/${SsmAutomationRole}'

Secure and flexible permissions

So far, we have used explicit ARNs for our guardrails. To expand flexibility, we can use wildcards in our resource matching. For example, we might change the Policy matching from:

            Condition:
              ArnEquals:
                'iam:PolicyARN':
                  # Explicitly listed policies - secure but inflexible
                  - !Ref AwsFisApiPolicyDenyS3DeleteObject

or the equivalent:

            Condition:
              ArnEquals:
                'iam:PolicyARN':
                  # Explicitly listed policies - secure but inflexible
                  - !Sub 'arn:${AWS::Partition}:iam::${AWS::AccountId}:policy/${FullPolicyName}'

to a wildcard notation like this:

            Condition:
              ArnEquals:
                'iam:PolicyARN':
                  # Wildcard policies - secure and flexible
                  - !Sub 'arn:${AWS::Partition}:iam::${AWS::AccountId}:policy/${PolicyNamePrefix}*'

If we set PolicyNamePrefix to FISAPI-DenyS3 this would now allow invoking FISAPI-DenyS3PutObject and FISAPI-DenyS3DeleteObject but would not allow using a policy named FISAPI-DenyEc2DescribeInstances.

Similarly, we could change the Resource matching from:

            NotResource: 
              # Explicitly listed roles - secure but inflexible
              - !GetAtt EventBridgeTimerHandlerRole.Arn
              - !GetAtt S3TriggerHandlerRole.Arn

to a wildcard equivalent like this:

            NotResource: 
              # Wildcard policies - secure and flexible
              - !Sub 'arn:${AWS::Partition}:iam::${AWS::AccountId}:role/${RoleNamePrefixEventBridge}*'
              - !Sub 'arn:${AWS::Partition}:iam::${AWS::AccountId}:role/${RoleNamePrefixS3}*'

and setting RoleNamePrefixEventBridge to FISAPI-TARGET-EventBridge and RoleNamePrefixS3 to FISAPI-TARGET-S3.

Finally, we would also change the FIS experiment role to allow SSM documents based on a name prefix by changing the constraint on automation execution from:

            - Sid: RequiredSSMWriteActionsforAWSFIS
              Effect: Allow
              Action:
                - 'ssm:StartAutomationExecution'
                - 'ssm:StopAutomationExecution'
              Resource: 
                # Explicitly listed resource - secure but inflexible
                # Note: the $DEFAULT at the end could also be an explicit version number
                # Note: the 'automation-definition' is automatically created from 'document' on invocation
                - !Sub 'arn:aws:ssm:${AWS::Region}:${AWS::AccountId}:automation-definition/${SsmAutomationIamAttachDetachDocument}:$DEFAULT'

to

            - Sid: RequiredSSMWriteActionsforAWSFIS
              Effect: Allow
              Action:
                - 'ssm:StartAutomationExecution'
                - 'ssm:StopAutomationExecution'
              Resource: 
                # Wildcard resources - secure and flexible
                # 
                # Note: the 'automation-definition' is automatically created from 'document' on invocation
                - !Sub 'arn:aws:ssm:${AWS::Region}:${AWS::AccountId}:automation-definition/${SsmAutomationDocumentPrefix}*'

and setting SsmAutomationDocumentPrefix to FISAPI-. Test this by updating the CloudFormation stack with a modified template:

aws cloudformation deploy --stack-name test-fis-api-faults --template-file template2.yaml --capabilities CAPABILITY_NAMED_IAM

Permissions governing users

In production you should not be using administrator access to use FIS. Instead we create two roles FISAPI-AssumableRoleWithCreation and FISAPI-AssumableRoleWithoutCreation for you (see this template). These roles require all FIS and SSM resources to have a Name tag that starts with FISAPI-. Try assuming the role without creation privileges and running an experiment. You will notice that you can only start an experiment if you add a Name tag, e.g. FISAPI-secure-1, and you will only be able to get details of experiments and templates that have proper Name tags.

If you are working with AWS Organizations, you can add further guardrails by defining SCPs that control the use of the FISAPI-* tags, similar to this blog post.

Caveats

For this solution we are choosing to attach policies instead of permission boundaries. The benefit of this is that you can attach multiple independent policies and thus simulate multi-step service degradation. However, this means that it is possible to increase the permission level of a role. While there are situations where this might be of interest, e.g. to simulate security breaches, please implement a thorough security review of any fault injection IAM policies you create. Note that modifying IAM Roles may trigger events in your security monitoring tools.

The AttachRolePolicy and DetachRolePolicy calls from AWS IAM are eventually consistent, meaning that in some cases permission propagation when starting and stopping fault injection may take up to 5 minutes each.

Cleanup

To avoid additional cost, delete the content of the S3 bucket and delete the CloudFormation stack:

# Clean up policy attachments just in case
CLEANUP_ROLES=$(aws iam list-roles --query "Roles[?starts_with(RoleName,'FISAPI-')].RoleName" --output text)
for role in $CLEANUP_ROLES; do
  CLEANUP_POLICIES=$(aws iam list-attached-role-policies --role-name $role --query "AttachedPolicies[?starts_with(PolicyName,'FISAPI-')].PolicyArn" --output text)
  for policy in $CLEANUP_POLICIES; do
    echo Detaching policy $policy from role $role
    aws iam detach-role-policy --role-name $role --policy-arn $policy
  done
done
# Delete S3 bucket content
ACCOUNT_ID=$( aws sts get-caller-identity --query Account --output text )
S3_BUCKET_NAME=fis-api-failure-${ACCOUNT_ID}
aws s3 rm --recursive s3://${S3_BUCKET_NAME}
aws s3 rb s3://${S3_BUCKET_NAME}
# Delete cloudformation stack
aws cloudformation delete-stack --stack-name test-fis-api-faults
aws cloudformation wait stack-delete-complete --stack-name test-fis-api-faults

Conclusion 

AWS Fault Injection Simulator provides the ability to simulate various external impacts to your application to validate and improve resilience. We’ve shown how combining FIS with IAM to selectively deny access to AWS APIs provides a generic path to explore fault boundaries across all AWS services. We’ve shown how this can be used to identify and improve a resilience problem in a common S3 upload workflow. To learn about more ways to use FIS, see this workshop.

About the authors:

Dr. Rudolf Potucek

Dr. Rudolf Potucek is Startup Solutions Architect at Amazon Web Services. Over the past 30 years he gained a PhD and worked in different roles including leading teams in academia and industry, as well as consulting. He brings experience from working with academia, startups, and large enterprises to his current role of guiding startup customers to succeed in the cloud.

Rudolph Wagner

Rudolph Wagner is a Premium Support Engineer at Amazon Web Services who holds the CISSP and OSCP security certifications, in addition to being a certified AWS Solutions Architect Professional. He assists internal and external Customers with multiple AWS services by using his diverse background in SAP, IT, and construction.

Improve collaboration between teams by using AWS CDK constructs

Post Syndicated from Joerg Woehrle original https://aws.amazon.com/blogs/devops/improve-collaboration-between-teams-by-using-aws-cdk-constructs/

There are different ways to organize teams to deliver great software products. There are companies that give the end-to-end responsibility for a product to a single team, like Amazon’s Two-Pizza teams, and there are companies where multiple teams split the responsibility between infrastructure (or platform) teams and application development teams. This post provides guidance on how collaboration efficiency can be improved in the case of a split-team approach with the help of the AWS Cloud Development Kit (CDK).

The AWS CDK is an open-source software development framework to define your cloud application resources. You do this by using familiar programming languages like TypeScript, Python, Java, C# or Go. It allows you to mix code to define your application’s infrastructure, traditionally expressed through infrastructure as code tools like AWS CloudFormation or HashiCorp Terraform, with code to bundle, compile, and package your application.
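
For readers new to the AWS CDK, the day-to-day workflow from the command line looks roughly like this sketch (TypeScript chosen as an example language; Node.js and npm are assumed):

npm install -g aws-cdk               # install the CDK CLI
cdk init app --language typescript   # bootstrap a new CDK application in an empty directory
cdk synth                            # synthesize the CloudFormation template
cdk deploy                           # deploy the stack(s) to your AWS account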

This is great for autonomous teams with end-to-end responsibility, as it helps them to keep all code related to that product in a single place and single programming language. There is no need to separate application code into a different repository than infrastructure code with a single team, but what about the split-team model?

Larger enterprises commonly split the responsibility between infrastructure (or platform) teams and application development teams. We’ll see how to use the AWS CDK to ensure team independence and agility even with multiple teams involved. We’ll have a look at the different responsibilities of the participating teams and their produced artifacts, and we’ll also discuss how to make the teams work together in a frictionless way.

This blog post assumes a basic level of knowledge on the AWS CDK and its concepts. Additionally, a very high level understanding of event driven architectures is required.

Team Topologies

Let’s first have a quick look at the different team topologies and each team’s responsibilities.

One-Team Approach

In this blog post we will focus on the split-team approach described below. However, it’s still helpful to understand what we mean by the “One-Team” approach: a single team owns an application from end to end. This cross-functional team decides on its own which features to implement next, which technologies to use, and how to build and deploy the resulting infrastructure and application code. The team is responsible for the infrastructure, the application code, its deployment, and the operation of the developed service.

If you’re interested in how to structure your AWS CDK application in such an environment, have a look at our colleague Alex Pulver’s blog post Recommended AWS CDK project structure for Python applications.

Split-Team Approach

In reality we see many customers who have separate teams for application development and infrastructure development and deployment.

Infrastructure Team

What I call the infrastructure team is also known as the platform or operations team. It configures, deploys, and operates the shared infrastructure which other teams consume to run their applications on. This can be things like an Amazon SQS queue, an Amazon Elastic Container Service (Amazon ECS) cluster as well as the CI/CD pipelines used to bring new versions of the applications into production.

It is the infrastructure team’s responsibility to get the application package developed by the Application Team deployed and running on AWS, as well as provide operational support for the application.

Application Team

Traditionally the application team just provides the application’s package (for example, a JAR file or an npm package) and it’s the infrastructure team’s responsibility to figure out how to deploy, configure, and run it on AWS. However, this traditional setup often leads to bottlenecks, as the infrastructure team will have to support many different applications developed by multiple teams. Additionally, the infrastructure team often has little knowledge of the internals of those applications. This often leads to solutions which are not optimized for the problem at hand: If the infrastructure team only offers a handful of options to run services on, the application team can’t use options optimized for their workload.

This is why we extend the traditional responsibilities of the application team in this blog post. The team provides the application and additionally the description of the infrastructure required to run the application. With “infrastructure required” we mean the AWS services used to run the application. This infrastructure description needs to be written in a format which can be consumed by the infrastructure team.

While we understand that this shift of responsibility adds additional tasks to the application team, we think that in the long term it is worth the effort. This can be the starting point to introduce DevOps concepts into the organization. However, the concepts described in this blog post are still valid even if you decide that you don’t want to add this responsibility to your application teams. The boundary of who is delivering what would then just move more into the direction of the infrastructure team.

To be successful with the given approach, the two teams need to agree on a common format for handing over the application and its infrastructure definition, and on how to bring it to production. The AWS CDK, with its concept of Constructs, provides a perfect means for that.

Primer: AWS CDK Constructs

In this section we take a look at the concepts the AWS CDK provides for structuring our code base and how these concepts can be used to fit a CDK project into your team topology.

Constructs

Constructs are the basic building blocks of an AWS CDK application. An AWS CDK application is composed of multiple constructs which, in the end, define how and what is deployed by AWS CloudFormation.

The AWS CDK ships with constructs created to deploy AWS services. However, it is important to understand that you are not limited to these out-of-the-box constructs. The true power of the AWS CDK lies in the possibility to create your own abstractions on top of the default constructs to build solutions for your specific requirements. To achieve this you write, publish, and consume your own custom constructs. They codify your specific requirements, create an additional level of abstraction, and allow other teams to consume and use your construct.
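
If you are new to custom constructs, here is a tiny, self-contained sketch of the idea. The construct name QueueWithDlq and its defaults are made up for illustration only; it shows how a team-specific requirement, such as "every queue must have a dead-letter queue", can be codified once and then reused by other teams:

import { aws_sqs as sqs, Duration } from 'aws-cdk-lib';
import { Construct } from 'constructs';

// Illustrative custom construct: an opinionated SQS queue with a
// dead-letter queue baked in, so consuming teams don't have to remember this detail.
export class QueueWithDlq extends Construct {
    public readonly queue: sqs.Queue;

    constructor(scope: Construct, id: string) {
        super(scope, id);

        const dlq = new sqs.Queue(this, 'Dlq', {
            retentionPeriod: Duration.days(14)
        });

        this.queue = new sqs.Queue(this, 'Queue', {
            visibilityTimeout: Duration.seconds(60),
            deadLetterQueue: { queue: dlq, maxReceiveCount: 3 }
        });
    }
}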

We will use a custom construct to separate the responsibilities between the application and the infrastructure team. The application team will release a construct which describes the infrastructure, along with its configuration, required to run the application code. The infrastructure team will consume this construct to deploy and operate the workload on AWS.

How to use the AWS CDK in a Split-Team Setup

Let’s now have a look at how we can use the AWS CDK to split the responsibilities between the application and infrastructure team. I’ll introduce a sample scenario and then illustrate what each team’s responsibility is within this scenario.

Scenario

Our fictitious application development team writes an AWS Lambda function which gets deployed to AWS. Messages in an Amazon SQS queue invoke the function. Let's say the function processes orders (what this means in detail is irrelevant for the example) and each order is represented by a message in the queue.

The application development team has full flexibility when it comes to creating the AWS Lambda function: they can decide which runtime to use and how much memory to configure. The SQS queue which the function acts upon is created by the infrastructure team; the application team does not have to know how the messages end up in the queue.

With that we can have a look at a sample implementation split between the teams.

Application Team

The application team is responsible for two distinct artifacts: the application code (for example, a Java JAR file or an npm module) and the AWS CDK construct used to deploy the infrastructure required to run the application on AWS (an AWS Lambda function along with its configuration).

The lifecycles of these artifacts differ: the application code changes more frequently than the infrastructure it runs on. That's why we want to keep the artifacts separate, so that each of them can be released at its own pace and only when it has changed.

To achieve these separate lifecycles, it is important that a release of the application artifact is completely independent of a release of the CDK construct. This fits our approach of separate teams, in contrast to the standard CDK way of building and packaging application code within the CDK construct.

But how is this done in our example solution? The team builds and publishes an application artifact which does not contain anything related to the CDK.
When a CDK Stack with this construct is synthesized, the construct downloads the pre-built artifact with a given version number from AWS CodeArtifact and uses it to create the input zip file for the Lambda function. No build of the application package happens during CDK synth.

With the separation of construct and application code, we need to find a way to tell the CDK construct which specific version of the application code it should fetch from CodeArtifact. We will pass this information to the construct via a property of its constructor.

For dependencies on infrastructure outside of the responsibility of the application team, I follow the pattern of dependency injection. Those dependencies, for example a shared VPC or an Amazon SQS queue, are passed into the construct from the infrastructure team.

Let’s have a look at an example. We pass in the external dependency on an SQS Queue, along with details on the desired appPackageVersion and its CodeArtifact details:

import * as path from 'path';
import { aws_sqs, aws_lambda as lambda } from 'aws-cdk-lib';
import { SqsEventSource } from 'aws-cdk-lib/aws-lambda-event-sources';
import { Construct } from 'constructs';

export interface OrderProcessingAppConstructProps {
    queue: aws_sqs.Queue,
    appPackageVersion: string,
    codeArtifactDetails: {
        account: string,
        repository: string,
        domain: string,
        region: string
    }
}

export class OrderProcessingAppConstruct extends Construct {

    constructor(scope: Construct, id: string, props: OrderProcessingAppConstructProps) {
        super(scope, id);

        // Bundle the pre-built application package via a Docker build. The build
        // arguments tell the Dockerfile which package version to download from
        // which CodeArtifact repository.
        const lambdaFunction = new lambda.Function(this, 'OrderProcessingLambda', {
            code: lambda.Code.fromDockerBuild(path.join(__dirname, '..', 'bundling'), {
                buildArgs: {
                    'PACKAGE_VERSION' : props.appPackageVersion,
                    'CODE_ARTIFACT_ACCOUNT' : props.codeArtifactDetails.account,
                    'CODE_ARTIFACT_REPOSITORY' : props.codeArtifactDetails.repository,
                    'CODE_ARTIFACT_DOMAIN' : props.codeArtifactDetails.domain,
                    'CODE_ARTIFACT_AWS_REGION' : props.codeArtifactDetails.region
                }
            }),
            runtime: lambda.Runtime.NODEJS_16_X,
            handler: 'node_modules/order-processing-app/dist/index.lambdaHandler'
        });

        // Wire up the externally provided SQS queue as the function's event source.
        const eventSource = new SqsEventSource(props.queue);
        lambdaFunction.addEventSource(eventSource);
    }
}

Note the code lambda.Code.fromDockerBuild(...): we use the AWS CDK's functionality to bundle the code of our Lambda function via a Docker build. The only things which happen inside the provided Dockerfile are:

  • the login to the AWS CodeArtifact repository which holds the pre-built package of the application code
  • the download and installation of the application code’s artifact from AWS CodeArtifact (in this case via npm)

If you are interested in more details on how to build, bundle, and deploy your AWS CDK assets, I highly recommend the blog post by my colleague Cory Hall: Building, bundling, and deploying applications with the AWS CDK. It goes into much more detail than we cover here.

Looking at the example Dockerfile we can see the two steps described above:

FROM public.ecr.aws/sam/build-nodejs16.x:latest

# Build arguments supplied by the CDK construct via `buildArgs`
ARG PACKAGE_VERSION
ARG CODE_ARTIFACT_AWS_REGION
ARG CODE_ARTIFACT_ACCOUNT
ARG CODE_ARTIFACT_DOMAIN
ARG CODE_ARTIFACT_REPOSITORY

# 1. Log in to the AWS CodeArtifact repository which holds the pre-built application package
RUN aws codeartifact login --tool npm --repository $CODE_ARTIFACT_REPOSITORY --domain $CODE_ARTIFACT_DOMAIN --domain-owner $CODE_ARTIFACT_ACCOUNT --region $CODE_ARTIFACT_AWS_REGION

# 2. Download and install the application package into the folder CDK mounts as the asset output
RUN npm install order-processing-app@$PACKAGE_VERSION --prefix /asset

Please note the following:

  • we use --prefix /asset with our npm install command. This tells npm to install the dependencies into the folder which the CDK mounts into the container. All files which should end up in the output of the Docker build need to be placed here.
  • the aws codeartifact login command requires credentials with the appropriate permissions to proceed. If you run this in AWS CodeBuild or inside a CDK Pipeline, for example, you need to make sure that the role in use has the appropriate policies attached (see the sketch after this list).
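
As a rough, non-authoritative sketch of such a policy attachment in CDK code (the buildRole reference is a placeholder for your existing build role, and you would scope the resources down to your own CodeArtifact domain and repository ARNs):

import { aws_iam as iam } from 'aws-cdk-lib';

// Hypothetical sketch: grant an existing build role (e.g. of a CodeBuild project
// running `cdk synth`) read access to CodeArtifact.
declare const buildRole: iam.IRole;

buildRole.addToPrincipalPolicy(new iam.PolicyStatement({
    actions: [
        'codeartifact:GetAuthorizationToken',
        'codeartifact:GetRepositoryEndpoint',
        'codeartifact:ReadFromRepository'
    ],
    resources: ['*'] // scope down to your domain/repository ARNs in a real setup
}));

// `aws codeartifact login` also exchanges the authorization token via STS.
buildRole.addToPrincipalPolicy(new iam.PolicyStatement({
    actions: ['sts:GetServiceBearerToken'],
    resources: ['*'],
    conditions: { StringEquals: { 'sts:AWSServiceName': 'codeartifact.amazonaws.com' } }
}));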

Infrastructure Team

The infrastructure team consumes the AWS CDK construct published by the application team. They own the AWS CDK Stack which composes the whole application. This will possibly be only one of several Stacks owned by the infrastructure team; other Stacks might create shared infrastructure (like VPCs and networking) or host other applications.

Within the stack for our application the infrastructure team consumes and instantiates the application team’s construct, passes any dependencies into it and then deploys the stack by whatever means they see fit (e.g. through AWS CodePipeline, GitHub Actions or any other form of continuous delivery/deployment).

The dependency on the application team’s construct is manifested in the package.json of the infrastructure team’s CDK app:

{
  "name": "order-processing-infra-app",
  ...
  "dependencies": {
    ...
    "order-app-construct" : "1.1.0",
    ...
  }
  ...
}

Within the created CDK Stack we see the dependency version for the application package as well as how the infrastructure team passes in additional information (such as the queue to use):

import * as cdk from 'aws-cdk-lib';
import { Queue } from 'aws-cdk-lib/aws-sqs';
import { Construct } from 'constructs';
// the application team's construct, consumed via the dependency declared in package.json
import { OrderProcessingAppConstruct } from 'order-app-construct';

export class OrderProcessingInfraStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Shared infrastructure owned by the infrastructure team
    const orderProcessingQueue = new Queue(this, 'order-processing-queue');

    // Instantiate the application team's construct and inject its dependencies
    new OrderProcessingAppConstruct(this, 'order-processing-app', {
       appPackageVersion: "2.0.36",
       queue: orderProcessingQueue,
       codeArtifactDetails: { ... }
     });
  }
}

Propagating New Releases

We now have the responsibilities of each team sorted out along with the artifacts owned by each team. But how do we propagate a change done by the application team all the way to production? Or asked differently: how can we invoke the infrastructure team’s CI/CD pipeline with the updated artifact versions of the application team?

We will need to update the infrastructure team's dependencies on the application team's artifacts whenever a new version of either the application package or the AWS CDK construct is published. With the dependencies updated, we can then start the release pipeline.

One approach is to listen and react to events published by AWS CodeArtifact via Amazon EventBridge. On each release, AWS CodeArtifact publishes an event to Amazon EventBridge. We can listen for that event, extract the version number of the new release from its payload, and start a workflow to update either our dependency on the CDK construct (e.g. in the package.json of our CDK application) or the appPackageVersion which the infrastructure team passes into the consumed construct.
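
As a minimal sketch of such a rule (the event pattern below is an assumption to be adjusted to your package, and updateWorkflowTrigger is a placeholder for whatever Lambda function starts your update workflow), the infrastructure team's CDK app could contain something like this:

import { aws_events as events, aws_events_targets as targets, aws_lambda as lambda } from 'aws-cdk-lib';
import { Construct } from 'constructs';

// Hypothetical sketch, placed inside a stack or construct of the infrastructure team.
declare const scope: Construct;                        // e.g. `this` inside a Stack
declare const updateWorkflowTrigger: lambda.IFunction; // placeholder: starts the dependency-update workflow

const onNewAppVersion = new events.Rule(scope, 'OnNewAppPackageVersion', {
    eventPattern: {
        source: ['aws.codeartifact'],
        detailType: ['CodeArtifact Package Version State Change'],
        detail: {
            packageName: ['order-processing-app'], // the application package from our example
            packageFormat: ['npm']
        }
    }
});

onNewAppVersion.addTarget(new targets.LambdaFunction(updateWorkflowTrigger));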

Here’s how a release of a new app version flows through the system:


Figure 1 – A release of the application package triggers a change and deployment of the infrastructure team’s CDK Stack

  1. The application team publishes a new app version into AWS CodeArtifact
  2. CodeArtifact triggers an event on Amazon EventBridge
  3. The infrastructure team listens to this event
  4. The infrastructure team updates its CDK stack to include the latest appPackageVersion
  5. The infrastructure team’s CDK Stack gets deployed

The release of a new version of the CDK construct works very similarly:


Figure 2 – A release of the application team’s CDK construct triggers a change and deployment of the infrastructure team’s CDK Stack

  1. The application team publishes a new CDK construct version into AWS CodeArtifact
  2. CodeArtifact triggers an event on Amazon EventBridge
  3. The infrastructure team listens to this event
  4. The infrastructure team updates its dependency to the latest CDK construct
  5. The infrastructure team’s CDK Stack gets deployed

We will not go into the details of what such a workflow could look like, because it is most likely highly custom for each team (think of the different tools used for code repositories and CI/CD). However, here are some ideas on how it can be accomplished:

Updating the CDK Construct dependency

To update the dependency version of the CDK construct, the infrastructure team's package.json (or another file used for dependency tracking, like pom.xml) needs to be updated. You can build automation which checks out the source code and issues a command like npm install order-app-construct@NEW_VERSION (where NEW_VERSION is the value read from the EventBridge event payload), and which then automatically creates a pull request to incorporate this change into your main branch. For a sample of what this can look like, see the blog post Keeping up with your dependencies: building a feedback loop for shared libraries.
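
One hypothetical way to sketch this automation (the CodeBuild project name update-construct-dependency is a placeholder, and we assume the CodeArtifact event detail carries a packageVersion field) is a small Lambda handler behind the EventBridge rule which hands the new version over to a build project that performs the npm install and opens the pull request:

import { EventBridgeEvent } from 'aws-lambda';
import { CodeBuildClient, StartBuildCommand } from '@aws-sdk/client-codebuild';

const codeBuild = new CodeBuildClient({});

// Hypothetical handler: 'update-construct-dependency' is assumed to be a CodeBuild
// project that checks out the infrastructure repo, runs
// `npm install order-app-construct@$NEW_VERSION` and opens a pull request.
export const handler = async (
    event: EventBridgeEvent<'CodeArtifact Package Version State Change', { packageVersion: string }>
): Promise<void> => {
    const newVersion = event.detail.packageVersion;

    await codeBuild.send(new StartBuildCommand({
        projectName: 'update-construct-dependency',
        environmentVariablesOverride: [
            { name: 'NEW_VERSION', value: newVersion, type: 'PLAINTEXT' }
        ]
    }));
};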

Updating the appPackageVersion

To update the appPackageVersion used inside the infrastructure team's CDK Stack, you can either follow the same approach outlined above, or you can use the CDK's capability to read a value from an AWS Systems Manager (SSM) Parameter Store parameter. With that, you wouldn't put the value for appPackageVersion into source control, but rather read it from SSM Parameter Store. There is a how-to for this in the AWS CDK documentation: Get a value from the Systems Manager Parameter Store. You then start the infrastructure team's pipeline based on the event of a change to the parameter.

To have a clear understanding of what is deployed at any given time, and to see the used parameter value in CloudFormation, I'd recommend the option described in Reading Systems Manager values at synthesis time.
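
A minimal sketch of this synthesis-time read, assuming the version is stored under a hypothetical parameter name such as /order-processing/appPackageVersion, could look as follows:

import { aws_ssm as ssm } from 'aws-cdk-lib';
import { Construct } from 'constructs';

declare const scope: Construct; // e.g. `this` inside OrderProcessingInfraStack

// Read the desired application version at synthesis time so the resolved value
// is visible in the synthesized CloudFormation template.
const appPackageVersion = ssm.StringParameter.valueFromLookup(
    scope,
    '/order-processing/appPackageVersion' // hypothetical parameter name
);

// appPackageVersion can now be passed into OrderProcessingAppConstruct
// instead of the hardcoded "2.0.36".

Keep in mind that valueFromLookup performs a CDK context lookup, so the credentials used during synthesis need permission to read the parameter.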

Conclusion

You’ve seen how the AWS Cloud Development Kit and its Construct concept can help to ensure team independence and agility even though multiple teams (in our case an application development team and an infrastructure team) work together to bring a new version of an application into production. To do so you have put the application team in charge of not only their application code, but also of the parts of the infrastructure they use to run their application on. This is still in line with the discussed split-team approach as all shared infrastructure as well as the final deployment is in control of the infrastructure team and is only consumed by the application team’s construct.

About the Authors

Joerg Woehrle: As a Solutions Architect, Jörg works with manufacturing customers in Germany. Before joining AWS in 2019 he held various roles such as Developer, DevOps Engineer, and SRE. Jörg enjoys building and automating things and fell in love with the AWS Cloud Development Kit.
Mohamed Othman: Mo joined AWS in 2020 as a Technical Account Manager, bringing with him 7 years of hands-on AWS DevOps experience and 6 years as a systems operations administrator. He is a member of two Technical Field Communities at AWS (Cloud Operation and Builder Experience), focusing on supporting customers with CI/CD pipelines and AI for DevOps to ensure they have the right solutions for their business needs.