Tag Archives: ChatOps

How we build containerized services at GitHub using GitHub

Post Syndicated from MV Karan original https://github.blog/2023-08-02-how-we-build-containerized-services-at-github-using-github/

The developer experience engineering team at GitHub works on creating safe, delightful, and inclusive solutions for GitHub engineers to efficiently code, ship, and operate software–setting an example for the world on how to build software with GitHub. To achieve this we provide our developers with a paved path–a comprehensive suite of automated tools and applications to streamline our runtime platforms, deployment, and hosting that helps power some of the microservices on the GitHub.com platform and many internal tools. Let’s take a deeper look at how one of our main paved paths works.

Our development ecosystem

GitHub’s main paved path covers everything that’s needed for running software–creating, deploying, scaling, debugging, and running applications. It is an ecosystem of tools like Kubernetes, Docker, load balancers, and many custom apps that work together to create a cohesive experience for our engineers. It isn’t just infrastructure and isn’t just Kubernetes. Kubernetes is our base layer, and the paved path is a mix of conventions, tools, and settings built on top of it.

The kind of services that we typically run using the paved path include web apps, computation pipelines, batch processors, and monitoring systems.

Kubernetes, which is the base layer of the paved path, runs in a multi-cluster, multi-region topology.

Benefits of the paved path

There are hundreds of services at GitHub, ranging from small internal tools to external APIs supporting production workloads. For a variety of reasons, it would be inefficient to spin up virtual machines for each service:

  • Planning and capacity usage across all services wouldn’t be efficient. We would encounter significant overhead in managing both physical and Kubernetes infrastructure on an ongoing basis.
  • Teams would need to build deep expertise in managing their own Kubernetes clusters and would have less time to focus on their application’s unique needs.
  • We would have less central visibility of applications.
  • Security and compliance would be difficult to standardize and enforce.

With the paved path based on Kubernetes and other runtime apps, we’re instead able to:

  • Plan capacity centrally and only for the Kubernetes nodes, so we can optimally use capacity across nodes, as small workloads and large workloads coexist on the same machines.
  • Scale rapidly thanks to central capacity planning.
  • Easily manage configuration and deployments across services in one central control plane.
  • Consistently provide insights into app and deployment performance for individual services.

Onboarding a service

Onboarding a service whose code lives in its own repository is easy with our ChatOps command service, Hubot, and GitHub Apps. Service owners can generate the basic scaffolding needed to deploy the service by running a command like:

hubot gh-platform app scaffold monalisa-app

A custom GitHub App installed on the service’s GitHub repository will then automatically generate a pull request to add the necessary configurations, which includes:

  • A deployment.yaml file that defines the service’s deployment environments (a hypothetical sketch follows this list).
  • Kubernetes manifests that define Deployment and Service objects for deploying the service.
  • A Debian Dockerfile that runs a trivial web server to start off with, which will be used by the Kubernetes manifests.
  • CI builds, set up as GitHub Checks, that build the Docker image on every push and store it in a container registry, ready for deployment.
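
The exact schema of these files is internal to GitHub, so the sketch below is purely illustrative of the idea: a small declarative file naming the service and its deployment environments.

# Hypothetical deployment.yaml; GitHub's real schema may differ.
app: monalisa-app
environments:        # each maps to the Kubernetes namespace <app-name>-<environment>
  - staging
  - production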

Each service onboarded to the paved path gets its own unique Kubernetes namespace, named <app-name>-<environment>, and generally has a staging and a production environment. This separates the workloads of different services, and also the environments of the same service, since each environment gets its own Kubernetes namespace.
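
In plain Kubernetes terms, the staging environment of the example service above would live in a namespace like this:

apiVersion: v1
kind: Namespace
metadata:
  name: monalisa-app-staging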

Deploying a service

At GitHub, we deploy branches and perform deployments through Hubot ChatOps commands. To deploy a branch named bug-fixes in the monalisa-app repository to the staging environment, a developer would run a ChatOps command like:

hubot deploy monalisa-app/bug-fixes to staging

This triggers a deployment that fetches the Docker image associated with the latest commit in the bug-fixes branch, updates the Kubernetes manifests, and applies those manifests to the clusters in the runtime platform relevant to that environment.

Typically, the Docker image would be deployed to multiple Kubernetes clusters across multiple geographical sites in a region that forms a part of the runtime platform.

To automate pull request merges into the busiest branches and orchestrate the rollout across environments, we also use merge queue and deployment pipelines, which our engineers can observe and interact with during their deployments.

Securing our services

For any company, the security of the platform itself, along with services running within it, is critical. In addition to our engineering-wide practices, such as requiring two-person reviews on every pull request, we also have Security and Platform teams automating security measures, such as:

  • Pre-built Docker images to be used as base images for the Dockerfiles. These base images contain only the necessary packages/dependencies with security updates, a set of installed software that is auditable and curated according to shared needs.
  • Build-time and periodic scanning of all packages and running container images, for any vulnerabilities or dependencies needing a patch, powered by our own products for software supply chain security like Dependabot.
  • Build-time and periodic scanning of the GitHub repositories of services for exposed secrets and vulnerabilities, using GitHub’s native advanced security features like code scanning and secret scanning.
  • Multiple authentication and authorization mechanisms that allow only the relevant individuals to directly access underlying Kubernetes resources.
  • Comprehensive telemetry for threat detection.
  • Services running in the platform are by default accessible only within GitHub’s internal networks and not exposed on the public internet.
  • Branch protection policies are enforced on all production repositories. These policies prevent merging a pull request until designated automated tests pass and the change has been reviewed by a different developer from the one who proposed the change.

Another key aspect of security for an application is how secrets like keys and tokens are managed. At GitHub, we use a centralized secret store to manage secrets. Each service and each environment within the service has its own vault to store secrets. These secrets are then injected into the relevant pods in Kubernetes, which are then exposed to the containers.
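
In generic Kubernetes terms, exposing a vault-synced secret to a container as an environment variable looks like the following fragment of a Pod spec. GitHub's actual wiring is internal, so the names here are illustrative:

containers:
  - name: monalisa-app
    image: monalisa-app:latest                        # illustrative
    env:
      - name: API_TOKEN                               # illustrative variable name
        valueFrom:
          secretKeyRef:
            name: monalisa-app-production-secrets     # illustrative; populated from the service's vault
            key: api-token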

The deployment flow, from merge to rollout

The whole deployment process would look something like this:

  1. A GitHub engineer merges a pull request to a branch in a repository. In the above example, it is the bug-fixes branch in the monalisa-app repository. This repository would also contain the Kubernetes manifest template files for deploying the application.
  2. The pull request merge triggers relevant CI workflows. One of them is a build of the Docker image, which builds the container image based on the Dockerfile specified in the repository, and pushes the image to an internal artifact registry.
  3. Once all the CI workflows have completed successfully, the engineer initiates a deployment by running a ChatOps command like hubot deploy monalisa-app/bug-fixes to staging, which triggers a deployment to the target environment, in this case staging.
  4. The build systems fetch the Kubernetes manifest template files from the repository branch, substitute in the latest image to be deployed from the artifact registry, inject app secrets from the secret store, and run some custom operations; see the sketch after this list. At the end of this stage, a ready-to-deploy Kubernetes manifest is available.
  5. Our deployment systems then apply the Kubernetes manifest to relevant clusters and monitor the rollout status of new changes.
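
To illustrate step 4, a manifest template in the repository might carry an image placeholder that the build system replaces at deploy time. The placeholder convention and field values below are illustrative; GitHub's internal templates may differ:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: monalisa-app
  namespace: monalisa-app-staging
spec:
  replicas: 3
  selector:
    matchLabels:
      app: monalisa-app
  template:
    metadata:
      labels:
        app: monalisa-app
    spec:
      containers:
        - name: monalisa-app
          image: IMAGE_PLACEHOLDER   # replaced with the image built for the latest commit
          ports:
            - containerPort: 8080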

Conclusion

GitHub’s internal paved path helps developers at GitHub focus on building services and delivering value to our users, with minimal focus on the infrastructure. We accomplish this by providing a streamlined path to our GitHub engineers that uses the power of containers and Kubernetes; scalable security, authentication, and authorization mechanisms; and the GitHub.com platform itself.

Want to try some of these for yourself? Learn more about all of GitHub’s features on github.com/features. If you have adopted any of our practices for your own development, give us a shout on Twitter!

AWS Control Tower Account vending through Amazon Lex ChatBot

Post Syndicated from Marco Fischer original https://aws.amazon.com/blogs/devops/aws-control-tower-account-vending-through-amazon-lex-chatbot/

In this blog post you will learn about a multi-environment solution that uses a cloud-native CI/CD pipeline to build, test, and deploy a serverless ChatOps bot that integrates with AWS Control Tower Account Factory for AWS account vending. This solution can be integrated with any request portal or channel that can call a RESTful API endpoint, so that you can offer AWS account vending at scale for your enterprise.

Introduction

Most AWS Control Tower customers use the AWS Control Tower Account Factory (a Service Catalog product) and the Service Catalog service to vend standardized AWS services and products into AWS accounts. ChatOps is a collaboration model that interconnects a process with people, tools, and automation. It combines a bot that can fulfill service requests (the work needed), augmented by Ops and Engineering staff to allow approval processes or corrections in the case of exception requests. A major part of any public cloud effort goes toward building a proper foundation (the so-called landing zone). The main goals of this foundation are to provide not only AWS account access (with the right permissions), but also the correct Cloud Center of Excellence (CCoE) approved products and services. This post demonstrates how to utilize the existing AWS Control Tower Account Factory, extend the Service Catalog portfolio in Control Tower with additional products, and execute account vending and product vending through an easy ChatBot interface. You will also learn how to use this solution with Slack, but it can just as easily be used with Chime, MS Teams, or a normal web front end, as the integration is channel-agnostic through an API Gateway integration layer. Finally, you will combine all of this by integrating a ChatBot front end where users can issue requests for the CCoE and Ops team to fulfill AWS services easily and transparently. As a result, you get a more efficient process for vending AWS accounts and products while taking the burden off your Cloud Operations team.

Background

  • An AWS Account Factory account is an AWS account provisioned using Account Factory in AWS Control Tower.
  • AWS Service Catalog lets you centrally manage commonly deployed IT services. For this blog, Account Factory utilizes AWS Service Catalog to provision new AWS accounts.
  • A Control Tower provisioned product is an instance of the Control Tower Account Factory product that is provisioned by AWS Service Catalog. In this post, any new AWS account created through the ChatOps solution will be a provisioned product and visible in Service Catalog.
  • Amazon Lex is a service for building conversational interfaces into any application using voice and text.

Architecture Overview

The following architecture shows an overview of the solution, which will be built with the code provided through GitHub.

Multi-environment CI/CD architecture

The multi-environment pipeline builds three environments (Dev, Staging, Production) with different quality gates to push changes to this solution from a development environment up to a production environment. This ensures that your AWS ChatBot and the account vending solution are scalable and fully functional before you release them to production and make them available to your end users.

  • AWS CodeCommit: Two repositories are used: one where the Amazon Lex bot is created through a Java Lambda function (installed in STEP 1), and one for the Amazon Lex bot APIs that run behind API Gateway, capture the account vending requests, and then communicate with the Amazon Lex bot.
  • AWS CodePipeline: Integrates CodeCommit, CodeBuild, and CodeDeploy to manage your release pipeline as it moves from Dev to Production.
  • AWS CodeBuild: Each activity executed inside the pipeline is a CodeBuild activity. Inside the source code repository there are different files with the prefix buildspec-. Each of these files contains the exact commands that CodeBuild must execute on each of the stages: build/test.
  • AWS CodeDeploy: This is an AWS service that manages the deployment of the serverless application stack. In this solution it implements a canary deployment that first shifts 10% of requests to the new version for five minutes, which allows the scaling of the solution to be tested (CodeDeployDefaultLambdaCanary10Percent5Minutes).

AWS Control Tower account vending integration and ChatOps bot architecture

The serverless application architecture is built with Amazon Lex and application code in Lambda, exposed through Amazon API Gateway, which allows you to integrate this solution with almost any front end (Slack, MS Teams, a website).

  • Amazon Lex: With Amazon Lex, the same deep learning technologies that power Amazon Alexa are available to any developer, enabling you to quickly and easily build sophisticated, natural language, conversational bots (“chatbots”). Because Amazon Lex is not yet available in all AWS regions where AWS Control Tower is supported, you may want to deploy Amazon Lex in a different region from the one where you have AWS Control Tower deployed.
  • Amazon API Gateway / AWS Lambda: API Gateway is used as a central entry point for the Lambda functions (AccountVendor) that capture the account vending requests from a front end (e.g. Slack or a website). Because Lambda functions cannot be exposed directly as a REST service, they need a trigger, which API Gateway provides here.
  • Amazon SNS: Amazon Simple Notification Service (Amazon SNS) is a fully managed messaging service. SNS is used to send notifications via an e-mail channel to an approver mailbox.
  • Amazon DynamoDB: Amazon DynamoDB is a key-value and document database that delivers single-digit millisecond performance at any scale. It’s a fully managed, multi-region, multi-active, durable database. Amazon DynamoDB stores the account vending requests from the Lambda code triggered by the Lex bot interaction; see the table sketch after this list.
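
For instance, the DynamoDB table backing the vending requests could be declared in CloudFormation along the following lines. This is an illustrative sketch; the real table (including its key schema and generated name) is defined by the pipeline:

AccountVendingTable:
  Type: AWS::DynamoDB::Table
  Properties:
    BillingMode: PAY_PER_REQUEST          # on-demand capacity, no planning needed
    AttributeDefinitions:
      - AttributeName: RequestId          # illustrative partition key
        AttributeType: S
    KeySchema:
      - AttributeName: RequestId
        KeyType: HASH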

Solution Overview and Prerequisites

Solution Overview

Start by building the two main components of the architecture through an automated script. This is split into “STEP 1” and “STEP 2” in this walkthrough. “STEP 3” and “STEP 4” cover testing the solution and then integrating it with a front end; in this case we use Slack as an example and also provide the Slack App manifest file so you can build the solution quickly.

  • STEP 1) “Install Amazon Lex Bot”: The key part of the left side of the architecture, the Amazon Lex bot (the “ChatOps” bot), is built first.
  • STEP 2) “Build the multi-environment CI/CD pipeline”: Build and deploy a full load-testing DevOps pipeline that stress-tests the Lex bot and its ability to answer requests. This builds the supporting components that integrate with Amazon Lex, described below (Amazon API Gateway, AWS Lambda, Amazon DynamoDB, Amazon SNS).
  • STEP 3) “Testing the ChatOps Bot”: We execute test scripts through Postman that trigger Amazon API Gateway with a sample account request requiring feedback from the ChatOps Lex bot.
  • STEP 4) “Integration with Slack”: The final step is an end-to-end integration with a communication platform such as Slack.

The DevOps pipeline (using CodePipeline, CodeCommit, CodeBuild, and CodeDeploy) is automatically triggered when the stack is deployed and the AWS CodeCommit repository is created inside the account. The pipeline builds the Amazon Lex ChatOps bot from the source code. Step 2 integrates the surrounding components with the ChatOps Lex bot in three different environments: Dev/Staging/Prod. In addition, we use canary deployment to promote updates to the Lambda code from the AWS CodeCommit repository. During the canary deployment we implement a rollback procedure using a log metric filter that scans for the word Exception in the log file in CloudWatch. When the word is found, an alarm is triggered and the deployment is automatically rolled back. Usually, the rollback will occur during the load test phase, which prevents faulty code from being promoted into the production environment.
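
For reference, this kind of canary release with automatic rollback can be expressed in an AWS SAM template roughly as follows. This is a hedged sketch: the resource, handler, and alarm names are illustrative, not the exact contents of the repository's template:

AppFunction:
  Type: AWS::Serverless::Function
  Properties:
    Handler: com.example.ChatOpsHandler::handleRequest   # illustrative handler
    Runtime: java11                                      # illustrative runtime
    AutoPublishAlias: live            # publishes a version and alias so traffic can be shifted
    DeploymentPreference:
      Type: Canary10Percent5Minutes   # shift 10% of traffic, wait 5 minutes, then shift the rest
      Alarms:
        - !Ref ExceptionAlarm         # CloudWatch alarm on the "Exception" log metric filter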

Prerequisites

For this walkthrough, you should have the following prerequisites ready. What you’ll need:

  • An AWS account
  • An existing AWS Control Tower deployment (requires three AWS accounts/e-mail addresses)
  • AWS Cloud9 IDE or a development environment with access to download and run the scripts provided through GitHub
  • You need to log in to the AWS Control Tower management account with the AWSAdministratorAccess role if using AWS SSO, or equivalent permissions if you are using other federations.

Walkthrough

To get started, you can use the Cloud9 IDE or log in to your AWS SSO environment within AWS Control Tower.

  1. Prepare: Set up the sample solution

Log in to your AWS account and open Cloud9.

1.1. Clone the GitHub repository to your Cloud9 environment.

The complete solution can be found at the GitHub repository here. The deployment and build are scripted in shell, but the serverless code is in Java and uses AWS serverless services (Amazon API Gateway, Amazon DynamoDB, Amazon SNS).

git clone https://github.com/aws-samples/multi-environment-chatops-bot-for-controltower

  2. STEP 1: Install Amazon Lex Bot

Amazon Lex is currently not natively deployable with AWS CloudFormation. Therefore the solution uses a custom Lambda resource in AWS CloudFormation to create the Amazon Lex bot. We will create the Lex bot, along with some sample utterances, three custom slots (account type, account e-mail, and organizational OU), and one main intent (“Control Tower Account Vending Intent”) to capture the request and trigger an AWS account vending process.
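
Conceptually, the custom resource pattern looks like the following CloudFormation fragment. The bot, intent, and slot type names match the log output shown later in this walkthrough, but the structure itself is an illustrative sketch, not the repository's actual template:

Resources:
  LexBotBuilder:
    Type: AWS::Serverless::Function   # the Java function that calls the Lex Model Building APIs
    Properties:
      Handler: com.example.LexBotBuilder::handleRequest   # illustrative; CodeUri etc. omitted
      Runtime: java11
  ChatOpsLexBot:
    Type: Custom::LexBot              # custom resource backed by the function above
    Properties:
      ServiceToken: !GetAtt LexBotBuilder.Arn
      BotName: ChatOps
      Intent: AWSAccountVending
      SlotTypes:
        - AccountTypeValues
        - AccountOUValues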

2.1. Start the script “deploy.sh” and provide the inputs below. Select a project name. You can override it if you want to choose a custom name, and select the bucket name accordingly (we recommend using the default names).

./deploy.sh

Choose a project name [chatops-lex-bot-xyz]:

Choose a bucket name for source code upload [chatops-lex-bot-xyz]:

2.2. To confirm, double-check the AWS region you have specified.

Attention: Make sure you have configured your AWS CLI region! (use either 'aws configure' or set your 'AWS_DEFAULT_REGION' variable).

Using region from $AWS_DEFAULT_REGION: eu-west-1

2.3. Then choose the region where you want to install Amazon Lex (make sure it is an AWS region where Lex is available), or leave it empty to use the default. The Amazon Lex region can be different from the one where you have AWS Control Tower deployed.

Choose a region where you want to install the chatops-lex-bot [eu-west-1]:

Using region eu-west-1

2.4. The script will create a new S3 bucket in the specified region in order to upload the code to create the Amazon Lex bot.

Creating a new S3 bucket on eu-west-1 for your convenience...
make_bucket: chatops-lex-bot-xyz
Bucket chatops-lex-bot-xyz successfully created!

2.5. The script shows a summary of the bucket name and the project being used.

Using project name................chatops-lex-bot-xyz
Using bucket name.................chatops-lex-bot-xyz

2.6 If any of these names or outputs are wrong, you can still stop here by pressing Ctrl+C.

If these parameters are wrong press ctrl+c to stop now...

2.7 The script will upload the source code to the specified S3 bucket; you should see a successful upload.

Waiting 9 seconds before continuing
upload: ./chatops-lex-bot-xyz.zip to s3://chatops-lex-bot-xyz/chatops-lex-bot-xyz.zip

2.8 Then the script triggers an aws cloudformation package command, which takes the uploaded zip file, references it, and generates a CloudFormation yml file ready for deployment. The generated package file (devops-packaged.yml) is stored locally and used to execute the aws cloudformation deploy command.

Successfully packaged artifacts and wrote output template to file devops-packaged.yml.

Note: You can ignore this part below as the shell script will execute the “aws cloudformation deploy” command for you.

Execute the following command to deploy the packaged template

aws cloudformation deploy --template-file devops-packaged.yml --stack-name <YOUR STACK NAME>

2.9 The AWS CloudFormation scripts should be running in the background

Waiting for changeset to be created..
Waiting for stack create/update to complete
Successfully created/updated stack - chatops-lex-bot-xyz-cicd

2.10 Once you see the successful output of the CloudFormation script “chatops-lex-bot-xyz-cicd”, everything is ready to continue.

------------------------------------------
ChatOps Lex Bot Pipeline is installed
Will install the ChatOps API as an Add-On to the Vending Machine
------------------------------------------

2.11 Before we continue, confirm the outputs of the AWS CloudFormation stack called “chatops-lex-bot-xyz-cicd”. You should find three outputs from the CloudFormation template.

  • A CodePipeline and CodeCommit repository with the same naming convention (chatops-lex-bot-xyz), and a CodeBuild execution with one stage (Prod). The execution of this pipeline should show as “Succeeded” in CodePipeline.
  • As a successful result of the pipeline execution, you should find another CloudFormation stack that was triggered, which you can see in the output of CodeBuild or in the CloudFormation console (chatops-lex-bot-xyz-Prod).
  • The resource created by this CloudFormation stack is the Lambda function (chatops-lex-bot-xyz-Prod-AppFunction-abcdefgh) that creates the Amazon Lex bot. You can find the details in AWS Lambda in the Management Console. For more information on CloudFormation and custom resources, see the CloudFormation documentation.
  • You can find the successful execution in the CloudWatch Logs:

Adding Slot Type:: AccountTypeValues
Adding Slot Type:: AccountOUValues
Adding Intent:: AWSAccountVending
Adding LexBot:: ChatOps
Adding LexBot Alias:: AWSAccountVending

  • Check that the Amazon Lex bot has been created in the Amazon Lex console; you should see an Amazon Lex bot called “ChatOps” with the status “READY”.

2.12. This means you have successfully installed the ChatOps Lex Bot. You can now continue with STEP 2.

  3. STEP 2: Build the multi-environment CI/CD pipeline

In this section, we finalize the setup by creating a full CI/CD pipeline, the API Gateway and Lambda functions that capture requests for account creation (AccountVendor) and interact with Amazon Lex, and a full testing cycle in a Dev-Staging-Production build pipeline that stress-tests the whole set of infrastructure created.

3.1 You should see the same bucket and project names as used previously. If not, override the input here. Otherwise, leave empty (we recommend using the default names).

Choose a bucket name for source code upload [chatops-lex-xyz]:

3.2. This means that the Amazon Lex bot was successfully deployed; we just confirm the deployed AWS region.

ChatOps-Lex-Bot is already deployed in region eu-west-1

3.3 Specify a mailbox that you have access to, in order to approve new ChatOps vending requests (e.g. account vending) as a manual approval step.

Choose a mailbox to receive approval e-mails for new accounts: [email protected]

3.4 Make sure you choose the AWS region where AWS Control Tower has deployed its Account Factory portfolio product in Service Catalog (to double-check, you can log in to AWS Service Catalog and confirm that you see the AWS Control Tower Account Factory).

Choose the AWS region where your vending machine is installed [eu-west-1]:
Using region eu-west-1

Creating a new S3 bucket on eu-west-1 for your convenience...
{
"Location": "http://chatops-lex-xyz.s3.amazonaws.com/"
}

Bucket chatops-lex-xyz successfully created!

3.5 Now the script will check whether you have Control Tower deployed and whether it can identify the Control Tower Account Factory product.

Trying to find the AWS Control Tower Account Factory Portfolio

Using project name....................chatops-lex-xyz
Using bucket name.....................chatops-lex-xyz
Using mailbox for approvals...........approvermail+chatops-lex-bot-xyz@yourdomain.com
Using lexbot region...................eu-west-1
Using service catalog portfolio-id....port-abcdefghijklm

If these parameters are wrong press ctrl+c to stop now…

3.6 If something is wrong or has not been set and you see an empty line for any of these values, stop here and press Ctrl+C. Check the Q&A section in case you missed any earlier errors. These values must be filled in to proceed.

Waiting 1 seconds before continuing
[INFO] Scanning for projects...
[INFO] Building Serverless Jersey API 1.0-SNAPSHOT

3.7 You should see a “BUILD SUCCESS” message.

[INFO] BUILD SUCCESS
[INFO] Total time:  0.190 s

3.8 The package built locally is then uploaded to the S3 bucket and again prepared for Amazon CloudFormation to package and deploy.

upload: ./chatops-lex-xyz.zip to s3://chatops-lex-xyz/chatops-lex-xyz.zip

Successfully packaged artifacts and wrote output template to file devops-packaged.yml.
Execute the following command to deploy the packaged template
aws cloudformation deploy --template-file devops-packaged.yml --stack-name <YOUR STACK NAME>

3.9 You can ignore the above message, as the shell script will execute the CloudFormation deployment for you. The AWS CloudFormation scripts should be running in the background, and you can double-check in the AWS Management Console.

Waiting for changeset to be created..
Waiting for stack create/update to complete

Successfully created/updated stack - chatops-lex-xyz-cicd
------------------------------------------
ChatOps Lex Pipeline and Chatops Lex Bot Pipelines successfully installed
------------------------------------------

3.10 This means that the CloudFormation scripts have executed successfully. Let’s confirm in the AWS CloudFormation console, and in CodePipeline, that we have a successful outcome and a full test run of the CI/CD pipeline. As a reminder, have a look at the AWS architecture overview and the resources/components created.

You should find the successfully created CloudFormation artifacts named:

  • chatops-lex-xyz-cicd: This is the core CloudFormation that we created and uploaded that built a full CI/CD pipeline with three phases (DEV/STAGING/PROD). All three stages will create a similar set of AWS resources (e.g. Amazon API Gateway, AWS Lambda, Amazon DynamoDB), but only the Staging phase will run an additional Load-Test prior to doing the production release.
  • chatops-lex-xyz-DEV: A successful build, creation and deployment of the DEV environment.
  • chatops-lex-xyz-STAGING: The staging phase runs a set of load tests for full testing, using an open-source load-testing framework.
  • chatops-lex-xyz-PROD: A successful build, creation and deployment of the Production environment.

3.11 For further confirmation, you can check the Lambda functions (chatops-lex-xyz-pipeline-1-Prod-ChatOpsLexFunction-), Amazon DynamoDB (chatops-lex-xyz-pipeline-1_account_vending_), and Amazon SNS (chatops-lex-xyz-pipeline-1_aws_account_vending_topic_Prod) to verify that all the resources shown in the architecture diagram have been created.

Within Lambda and/or Amazon API Gateway, you will find the API Gateway execution endpoints, the same as in the Output section of CloudFormation:

  • ApiUrl: https://apiId.execute-api.eu-west-1.amazonaws.com/Prod/account
  • ApiApproval: https://apiId.execute-api.eu-west-1.amazonaws.com/Prod/account/confirm

3.12 This means you have successfully installed the Amazon Lex ChatOps bot and the surrounding test CI/CD pipeline. Make sure you have accepted the SNS subscription confirmation.

AWS Notification - Subscription Confirmation

You have chosen to subscribe to the topic:
arn:aws:sns:eu-west-1:12345678901:chatops-lex-xyz-pipeline_aws_account_vending_topic_Prod
To confirm this subscription, click or visit the link below (If this was in error no action is necessary)

  4. STEP 3: Testing the ChatOps Bot

In this section, we provide a test script to verify that the Amazon Lex bot is up and that Amazon API Gateway and Lambda are correctly configured to handle requests.

4.1 Use the Postman script postman-test.json under the /test folder before you start integrating this solution with a chat or web front end such as Slack or a custom website in production.

4.2. You can import the JSON file into Postman and execute a RESTful test call to the API Gateway endpoint.

4.3 Once the script is imported in Postman, execute the two commands below and replace the HTTP URL of the two requests (vending API and confirmation API) with the values of the APIs recently created in the production environment. Alternatively, you can access these values directly from the Output tab of the CloudFormation stack with a name similar to chatops-lex-xyz-Prod:

aws cloudformation describe-stacks --stack-name chatops-lex-xyz-Prod --query "Stacks[0].Outputs[?OutputKey=='ApiUrl'].OutputValue" --output text

aws cloudformation describe-stacks --stack-name chatops-lex-xyz-Prod --query "Stacks[0].Outputs[?OutputKey=='ApiApproval'].OutputValue" --output text

4.4 Execute an API call against the PROD API

  • Use the Amazon API Gateway endpoint to trigger a REST call, for example https://apiId.execute-api.eu-west-1.amazonaws.com/Prod/account/. Make sure you replace “apiId” with your API Gateway API ID found in the sections above (the CloudFormation Output or within Lambda). Here is the start of the parameters that you have to change in the postman-test.json file:

"url": {
"raw": "https://apiId.execute-api.us-east-1.amazonaws.com/Prod/account",
"protocol": "https",

  • Request input: fill out and update the values in each of the JSON sections:

{ "UserEmail": "[email protected]", "UserName": "TestUser-Name", "UserLastname": "TestUser-LastName", "UserInput": "Hi, I would like a new account please!" }

  • If the test response is successful, you should see the following JSON in return:

{"response": "Hi TestUser-Name, what account type do you want? Production or Sandbox?","initial-params": "{\"UserEmail\": \"[email protected]\",\"UserName\":\"TestUser-Name\",\"UserLastname\": \"TestUser-LastName\",\"UserInput\": \"Hi, I would like a new account please!\"}"}

4.5 Test the “confirm” action. To confirm the account vending request, you can execute the /confirm API, which is similar to confirming the action through the e-mail confirmation that you receive via Amazon SNS.

Make sure you change the following sections in Postman (Production-Confirm-API) and use the ApiApproval apiId that has the /confirm path.

https://apiId.execute-api.eu-west-1.amazonaws.com/Prod/account/confirm

  5. STEP 4: Slack Integration Example

We will demonstrate how to integrate with a Slack channel, but any other request portal (such as Jira), website, or app that allows REST API integrations (for example, Amazon Chime) could be used instead.

5.1 Use the attached YAML Slack App manifest file to create a new Slack application within your organization. Go to https://api.slack.com/apps?new_app=1 and choose “Create New App”.

5.2 Choose “From an app manifest” to create a new Slack App and paste the sample code from the /test folder file slack-app-manifest.yml.

  • Note: Make sure you first overwrite the request_url parameter for your Slack App so that it points to the production API Gateway endpoint.

request_url: https://apiId.execute-api.us-east-1.amazonaws.com/Prod/account
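
For orientation, an abridged Slack app manifest with interactivity enabled looks roughly like the fragment below. Treat it as illustrative; the slack-app-manifest.yml in the /test folder is the authoritative version:

display_information:
  name: AWS Account Vending       # illustrative app name
features:
  bot_user:
    display_name: account-vendor  # illustrative bot name
settings:
  interactivity:
    is_enabled: true
    request_url: https://apiId.execute-api.us-east-1.amazonaws.com/Prod/account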

5.3 Choose to deploy and reinstall the Slack App to your workspace, and then access the ChatBot application within your Slack workspace. If everything is successful, you will see a working serverless ChatBot as shown below.

Slack Example

Conclusion and Cleanup

Conclusion

In this blog post, you have learned how to create a multi-environment CI/CD pipeline that builds a fully serverless AWS account vending solution using an AI-powered Amazon Lex bot integrated with AWS Control Tower Account Factory. This solution helps you enable standardized account vending on AWS in an easy way, by exposing a ChatBot to your AWS consumers across various channels. It can be extended with AWS Service Catalog to launch not just AWS accounts but almost any AWS service, using IaC (CloudFormation) templates provided by the CCoE Ops and Architecture teams.

Cleanup

For a proper cleanup, go to AWS CloudFormation, choose the deployed stacks, and choose “Delete Stack”. If you run into issues while deleting, see the troubleshooting solutions below. Also make sure you delete your integration apps (e.g. Slack) for a full cleanup.

Troubleshooting

  1. An error occurred (BucketAlreadyOwnedByYou) when calling the CreateBucket operation: Your previous request to create the named bucket succeeded and you already own it.
    Solution: Make sure you use a distinct name for the S3 bucket used in this project, for the Amazon Lex Bot and the CICD pipeline
  2. When you delete or roll back the CloudFormation stacks, you may get an error (Code: 409; Error Code: BucketNotEmpty).
    Solution: Delete the S3 build bucket and its content “delete permanently” and then delete the associated CloudFormation stack that has created the CICD pipeline.

Deploy an automated ChatOps solution for remediating Amazon Macie findings

Post Syndicated from Nick Cuneo original https://aws.amazon.com/blogs/security/deploy-an-automated-chatops-solution-for-remediating-amazon-macie-findings/

The amount of data being collected, stored, and processed by Amazon Web Services (AWS) customers is growing at an exponential rate. In order to keep pace with this growth, customers are turning to scalable cloud storage services like Amazon Simple Storage Service (Amazon S3) to build data lakes at the petabyte scale. Customers are looking for new, automated, and scalable ways to address their data security and compliance requirements, including the need to identify and protect their sensitive data. Amazon Macie helps customers address this need by offering a managed data security and data privacy service that uses machine learning and pattern matching to discover and protect your sensitive data that is stored in Amazon S3.

In this blog post, I show you how to deploy a solution that establishes an automated event-driven workflow for notification and remediation of sensitive data findings from Macie. Administrators can review and approve remediation of findings through a ChatOps-style integration with Slack. Slack is a business communication tool that provides messaging functionality, including persistent chat rooms known as channels. With this solution, you can streamline the notification, investigation, and remediation of sensitive data findings in your AWS environment.

Prerequisites

Before you deploy the solution, make sure that your environment is set up with the following prerequisites:

Important: This solution uses various AWS services, and there are costs associated with these resources after the Free Tier usage. See the AWS pricing page for details.

Solution overview

The solution architecture and workflow are detailed in Figure 1.

Figure 1: Solution overview

This solution allows for the configuration of auto-remediation behavior based on finding type and finding severity. For each finding type, you can define whether you want the offending S3 object to be automatically quarantined, or whether you want the finding details to be reviewed and approved by a human in Slack prior to being quarantined. In a similar manner, you can define the minimum severity level (Low, Medium, High) that a finding must have before the solution will take action. By adjusting these parameters, you can manage false positives and tune the volume and type of findings about which you want to be notified and take action. This configurability is important because customers have different security, risk, and regulatory requirements.

Figure 1 details the services used in the solution and the integration points between them. Let’s walk through the full sequence from the detection of sensitive data to the remediation (quarantine) of the offending object.

  1. Macie is configured with sensitive data discovery jobs (scheduled or one-time), which you create and run to detect sensitive data within S3 buckets. When Macie runs a job, it uses a combination of criteria and techniques to analyze objects in S3 buckets that you specify. For a full list of the categories of sensitive data Macie can detect, see the Amazon Macie User Guide.
  2. For each sensitive data finding, an event is sent to Amazon EventBridge that contains the finding details. An EventBridge rule triggers a Lambda function for processing (an example rule pattern follows this list).
  3. The Finding Handler Lambda function parses the event and examines the type of the finding. Based on the auto-remediation configuration, the function either invokes the Finding Remediator function for immediate remediation, or sends the finding details for manual review and remediation approval through Slack.
  4. Delegated security and compliance administrators monitor the configured Slack channel for notifications. Notifications provide high-level finding information, remediation status, and a link to the Macie console for the finding in question. For findings configured for manual review, administrators can choose to approve the remediation in Slack by using an action button on the notification.
  5. After an administrator chooses the Remediate button, Slack issues an API call to an Amazon API Gateway endpoint, supplying both the unique identifier of the finding to be remediated and that of the Slack user. API Gateway proxies the request to a Remediation Handler Lambda function.
  6. The Remediation Handler Lambda function validates the request and request signature, extracts the offending object’s location from the finding, and makes an asynchronous call to the Finding Remediator Lambda function.
  7. The Finding Remediator Lambda function moves the offending object from the source bucket to a designated S3 quarantine bucket with restricted access.
  8. Finally, the Finding Remediator Lambda function uses a callback URL to update the original finding notification in Slack, indicating that the offending object has now been quarantined.
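
To make step 2 concrete, the EventBridge rule can be expressed in CloudFormation roughly as follows. Amazon Macie publishes findings with source aws.macie and detail-type "Macie Finding"; the logical names below are illustrative, since the CDK app generates the real rule:

MacieFindingRule:
  Type: AWS::Events::Rule
  Properties:
    EventPattern:
      source:
        - aws.macie
      detail-type:
        - Macie Finding
    Targets:
      - Arn: !GetAtt FindingHandlerFunction.Arn   # illustrative logical ID
        Id: finding-handler
# An AWS::Lambda::Permission allowing events.amazonaws.com to invoke the
# function is also required; omitted here for brevity.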

Deploy the solution

Now we’ll walk through the steps for configuring Slack and deploying the solution into your AWS environment by using the AWS CDK. The AWS CDK is a software development framework that you can use to define cloud infrastructure in code and provision through AWS CloudFormation.

The deployment steps can be summarized as follows:

  1. Configure a Slack channel and app
  2. Check the project out from GitHub
  3. Set the configuration parameters
  4. Build and deploy the solution
  5. Configure Slack with an API Gateway endpoint

To configure a Slack channel and app

  1. In your browser, make sure you’re logged into the Slack workspace where you want to integrate the solution.
  2. Create a new channel where you will send the notifications, as follows:
    1. Choose the + icon next to the Channels menu, and select Create a channel.
    2. Give your channel a name, for example macie-findings, and make sure you turn on the Make private setting.

      Important: By providing Slack users with access to this configured channel, you’re providing implicit access to review Macie finding details and approve remediations. To avoid unwanted user access, it’s strongly recommended that you make this channel private and by invite only.

  3. On your Apps page, create a new app by selecting Create New App, and then enter the following information:
    1. For App Name, enter a name of your choosing, for example MacieRemediator.
    2. Select your chosen development Slack workspace that you logged into in step 1.
    3. Choose Create App.
    Figure 2: Create a Slack app

  4. You will then see the Basic Information page for your app. Scroll down to the App Credentials section, and note down the Signing Secret. This secret will be used by the Lambda function that handles all remediation requests from Slack. The function uses the secret with Hash-based Message Authentication Code (HMAC) authentication to validate that requests to the solution are legitimate and originated from your trusted Slack channel.

    Figure 3: Signing secret

  5. Scroll back to the top of the Basic Information page, and under Add features and functionality, select the Incoming Webhooks tile. Turn on the Activate Incoming Webhooks setting.
  6. At the bottom of the page, choose Add New Webhook to Workspace.
    1. Select the macie-findings channel you created in step 2, and choose Allow.
    2. You should now see webhook URL details under Webhook URLs for Your Workspace. Use the Copy button to note down the URL, which you will need later.

      Figure 4: Webhook URL

To check the project out from GitHub

The solution source is available on GitHub in AWS Samples. Clone the project to your local machine or download and extract the available zip file.

To set the configuration parameters

In the root directory of the project you’ve just cloned, there’s a file named cdk.json. This file contains configuration parameters to allow integration with the macie-findings channel you created earlier, and also to allow you to control the auto-remediation behavior of the solution. Open this file and make sure that you review and update the following parameters:

  • autoRemediateConfig – This nested attribute allows you to specify for each sensitive data finding type whether you want to automatically remediate and quarantine the offending object, or first send the finding to Slack for human review and authorization. Note that you will still be notified through Slack that auto-remediation has taken place if this attribute is set to AUTO. Valid values are either AUTO or REVIEW. You can use the default values.
  • minSeverityLevel – Macie assigns all findings a Severity level. With this parameter, you can define a minimum severity level that must be met before the solution will trigger action. For example, if the parameter is set to MEDIUM, the solution won’t take any action or send any notifications when a finding has a LOW severity, but will take action when a finding is classified as MEDIUM or HIGH. Valid values are: LOW, MEDIUM, and HIGH. The default value is set to LOW.
  • slackChannel – The name of the Slack channel you created earlier (macie-findings).
  • slackWebHookUrl – For this parameter, enter the webhook URL that you noted down during Slack app setup in the “Configure a Slack channel and app” step.
  • slackSigningSecret – For this parameter, enter the signing secret that you noted down during Slack app setup.

Save your changes to the configuration file.
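
Put together, the relevant portion of cdk.json might look like the following sketch. The parameter names come from the post; the finding-type keys, their nesting under context, and the placeholder values are illustrative, and the file in the cloned repository is authoritative:

{
  "context": {
    "autoRemediateConfig": {
      "SensitiveData:S3Object/Financial": "AUTO",
      "SensitiveData:S3Object/Personal": "REVIEW"
    },
    "minSeverityLevel": "LOW",
    "slackChannel": "macie-findings",
    "slackWebHookUrl": "https://hooks.slack.com/services/T0000/B0000/XXXXXXXX",
    "slackSigningSecret": "your-signing-secret"
  }
}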

To build and deploy the solution

  1. From the command line, make sure that your current working directory is the root directory of the project that you cloned earlier. Run the following commands:
    • npm install – Installs all Node.js dependencies.
    • npm run build – Compiles the CDK TypeScript source.
    • cdk bootstrap – Initializes the CDK environment in your AWS account and Region, as shown in Figure 5.

      Figure 5: CDK bootstrap output

    • cdk deploy – Generates a CloudFormation template and deploys the solution resources.

    The resources created can be reviewed in the CloudFormation console and can be summarized as follows:

    • Lambda functions – Finding Handler, Remediation Handler, and Remediator
    • IAM execution roles and associated policy – The roles and policy associated with each Lambda function and the API Gateway
    • S3 bucket – The quarantine S3 bucket
    • EventBridge rule – The rule that triggers the Lambda function for Macie sensitive data findings
    • API Gateway – A single remediation API with proxy integration to the Lambda handler
  2. After you run the deploy command, you’ll be prompted to review the IAM resources deployed as part of the solution. Press y to continue.
  3. Once the deployment is complete, you’ll be presented with an output parameter, shown in Figure 6, which is the endpoint for the API Gateway that was deployed as part of the solution. Copy this URL.

    Figure 6: CDK deploy output

To configure Slack with the API Gateway endpoint

  1. Open Slack and return to the Basic Information page for the Slack app you created earlier.
  2. Under Add features and functionality, select the Interactive Components tile.
  3. Turn on the Interactivity setting.
  4. In the Request URL box, enter the API Gateway endpoint URL you copied earlier.
  5. Choose Save Changes.

    Figure 7: Slack app interactivity

Now that you have the solution components deployed and Slack configured, it’s time to test things out.

Test the solution

The testing steps can be summarized as follows:

  1. Upload dummy files to S3
  2. Run the Macie sensitive data discovery job
  3. Review and act upon Slack notifications
  4. Confirm that S3 objects are quarantined

To upload dummy files to S3

Two sample text files containing dummy financial and personal data are available in the project you cloned from GitHub. If you haven’t changed the default auto-remediation configurations, these two files will exercise both the auto-remediation and manual remediation review flows.

Find the files under sensitive-data-samples/dummy-financial-data.txt and sensitive-data-samples/dummy-personal-data.txt. Take these two files and upload them to S3 by using either the console, as shown in Figure 8, or AWS CLI. You can choose to use any new or existing bucket, but make sure that the bucket is in the same AWS account and Region that was used to deploy the solution.

Figure 8: Dummy files uploaded to S3

To run a Macie sensitive data discovery job

  1. Navigate to the Amazon Macie console, and make sure that your selected Region is the same as the one that was used to deploy the solution.
    1. If this is your first time using Macie, choose the Get Started button, and then choose Enable Macie.
  2. On the Macie Summary dashboard, you will see a Create Job button at the top right. Choose this button to launch the Job creation wizard. Configure each step as follows:
    1. Select S3 buckets: Select the bucket where you uploaded the dummy sensitive data file. Choose Next.
    2. Review S3 buckets: No changes are required, choose Next.
    3. Scope: For Job type, choose One-time job. Make sure Sampling depth is set to 100%. Choose Next.
    4. Custom data identifiers: No changes are required, choose Next.
    5. Name and description: For Job name, enter any name you like, such as Dummy job, and then choose Next.
    6. Review and create: Review your settings; they should look like the following sample. Choose Submit.
Figure 9: Configure the Macie sensitive data discovery job

Macie will launch the sensitive data discovery job. You can track its status from the Jobs page within the Macie console.

To review and take action on Slack notifications

Within five minutes of submitting the data discovery job, you should expect to see two notifications appear in your configured Slack channel. One notification, similar to the one in Figure 10, is informational only and is related to an auto-remediation action that has taken place.

Figure 10: Slack notification of auto-remediation for the file containing dummy financial data

The other notification, similar to the one in Figure 11, requires end user action and is for a finding that requires administrator review. All notifications will display key information such as the offending S3 object, a description of the finding, the finding severity, and other relevant metadata.

Figure 11: Slack notification for human review of the file containing dummy personal data

(Optional) You can review the finding details by choosing the View Macie Finding in Console link in the notification.

In the Slack notification, choose the Remediate button to quarantine the object. The notification will be updated with confirmation of the quarantine action, as shown in Figure 12.

Figure 12: Slack notification of authorized remediation

To confirm that S3 objects are quarantined

Finally, navigate to the S3 console and validate that the objects have been removed from their original bucket and placed into the quarantine bucket listed in the notification details, as shown in Figure 13. Note that you may need to refresh your S3 object listing in the browser.

Figure 13: Objects moved to the S3 quarantine bucket

Congratulations! You now have a fully operational solution to detect and respond to Macie sensitive data findings through a Slack ChatOps workflow.

Solution cleanup

To remove the solution and avoid incurring additional charges from the AWS resources that you deployed, complete the following steps.

To remove the solution and associated resources

  1. Navigate to the Macie console. Under Settings, choose Suspend Macie.
  2. Navigate to the S3 console and delete all objects in the quarantine bucket.
  3. Run the command cdk destroy from the command line within the root directory of the project. You will be prompted to confirm that you want to remove the solution. Press y.

Summary

In this blog post, I showed you how to integrate Amazon Macie sensitive data findings with an auto-remediation and Slack ChatOps workflow. We reviewed the AWS services used, how they are integrated, and the steps to configure, deploy, and test the solution. With Macie and the solution in this blog post, you can substantially reduce the heavy lifting associated with detecting and responding to sensitive data in your AWS environment.

I encourage you to take this solution and customize it to your needs. Further enhancements could include supporting policy findings, adding additional remediation actions, or integrating with additional findings from AWS Security Hub.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the Amazon Macie forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Nick Cuneo

Nick is an Enterprise Solutions Architect at AWS who works closely with Australia’s largest financial services organisations. His previous roles span operations, software engineering, and design. Nick is passionate about application and network security, automation, microservices, and event driven architectures. Outside of work, he enjoys motorsport and is found most weekends in his garage wrenching on cars.