Tag Archives: automation

Blue/Green deployment with AWS Developer tools on Amazon EC2 using Amazon EFS to host application source code

Post Syndicated from Rakesh Singh original https://aws.amazon.com/blogs/devops/blue-green-deployment-with-aws-developer-tools-on-amazon-ec2-using-amazon-efs-to-host-application-source-code/

Many organizations building modern applications require a shared and persistent storage layer for hosting and deploying data-intensive enterprise applications, such as content management systems, media and entertainment, distributed applications like machine learning training, etc. These applications demand a centralized file share that scales to petabytes without disrupting running applications and remains concurrently accessible from potentially thousands of Amazon EC2 instances.

Simultaneously, customers want to automate the end-to-end deployment workflow and leverage continuous methodologies utilizing AWS developer tools services for performing a blue/green deployment with zero downtime. A blue/green deployment is a deployment strategy wherein you create two separate, but identical environments. One environment (blue) is running the current application version, and one environment (green) is running the new application version. The blue/green deployment strategy increases application availability by generally isolating the two application environments and ensuring that spinning up a parallel green environment won’t affect the blue environment resources. This isolation reduces deployment risk by simplifying the rollback process if a deployment fails.

Amazon Elastic File System (Amazon EFS) provides a simple, scalable, and fully-managed elastic NFS file system for use with AWS Cloud services and on-premises resources. It scales on demand, thereby eliminating the need to provision and manage capacity in order to accommodate growth. Utilize Amazon EFS to create a shared directory that stores and serves code and content for numerous applications. Your application can treat a mounted Amazon EFS volume like local storage. This means you don’t have to deploy your application code every time the environment scales up to multiple instances to distribute load.

In this blog post, I will guide you through an automated process to deploy a sample web application on Amazon EC2 instances that use an Amazon EFS mount to host the application source code, and to perform a blue/green deployment with the AWS Code suite of services so that the application source code is deployed with no downtime.

How this solution works

This blog post includes a CloudFormation template to provision all of the resources needed for this solution. The CloudFormation stack deploys a Hello World application on Amazon Linux 2 EC2 Instances running behind an Application Load Balancer and utilizes Amazon EFS mount point to store the application content. The AWS CodePipeline project utilizes AWS CodeCommit as the version control, AWS CodeBuild for installing dependencies and creating artifacts,  and AWS CodeDeploy to conduct deployment on EC2 instances running in an Amazon EC2 Auto Scaling group.

Figure 1 below illustrates our solution architecture.


Figure 1: Sample solution architecture

The event flow in Figure 1 is as follows:

  1. A developer commits code changes from their local repo to the CodeCommit repository. The commit triggers CodePipeline execution.
  2. CodeBuild execution begins to compile source code, install dependencies, run custom commands, and create deployment artifact as per the instructions in the Build specification reference file.
  3. During the build phase, CodeBuild copies the source-code artifact to Amazon EFS file system and maintains two different directories for current (green) and new (blue) deployments.
  4. After successfully completing the build step, CodeDeploy deployment kicks in to conduct a Blue/Green deployment to a new Auto Scaling Group.
  5. During the deployment phase, CodeDeploy mounts the EFS file system on new EC2 instances as per the CodeDeploy AppSpec file reference and conducts other deployment activities.
  6. After successful deployment, a Lambda function triggers in order to store a deployment environment parameter in Systems Manager parameter store. The parameter stores the current EFS mount name that the application utilizes.
  7. The AWS Lambda function updates the parameter value during every successful deployment with the current EFS location.

Prerequisites

For this walkthrough, the following are required:

Deploy the solution

Once you’ve assembled the prerequisites, download or clone the GitHub repo and store the files on your local machine. Utilize the commands below to clone the repo:

mkdir -p ~/blue-green-sample/
cd ~/blue-green-sample/
git clone https://github.com/aws-samples/blue-green-deployment-pipeline-for-efs

Once completed, utilize the following steps to deploy the solution in your AWS account:

  1. Create a private Amazon Simple Storage Service (Amazon S3) bucket by using this documentation

    Figure 2: AWS S3 console view when creating a bucket

     

  2. Upload the cloned or downloaded GitHub repo files to the root of the S3 bucket. The S3 bucket object structure should look similar to Figure 3:

    Figure 3: AWS S3 bucket object structure

     

  3. Go to the S3 bucket and select the template name solution-stack-template.yml, and then copy the object URL.
  4. Open the CloudFormation console. Choose the appropriate AWS Region, and then choose Create Stack. Select With new resources.
  5. Select Amazon S3 URL as the template source, paste the object URL that you copied in Step 3, and then choose Next.
  6. On the Specify stack details page, enter a name for the stack and provide the following input parameter. Modify the default values for other parameters in order to customize the solution for your environment. You can leave everything as default for this walkthrough.
  • ArtifactBucket – The name of the S3 bucket that you created in the first step of the solution deployment. This is a mandatory parameter with no default value.

Figure 4: Defining the stack name and input parameters for the CloudFormation stack

  7. Choose Next.
  8. On the Options page, keep the default values and then choose Next.
  9. On the Review page, confirm the details, acknowledge that CloudFormation might create IAM resources with custom names, and then choose Create Stack.
  10. Once the stack creation is marked as CREATE_COMPLETE, the following resources are created:
  • A virtual private cloud (VPC) configured with two public and two private subnets.
  • A NAT Gateway, an Elastic IP address, and an Internet Gateway.
  • Route tables for private and public subnets.
  • Auto Scaling Group with a single EC2 Instance.
  • Application Load Balancer and a Target Group.
  • Three security groups—one each for ALB, web servers, and EFS file system.
  • Amazon EFS file system with a mount target for each Availability Zone.
  • CodePipeline project with CodeCommit repository, CodeBuild, and CodeDeploy resources.
  • SSM parameter to store the environment's current deployment status.
  • Lambda function to update the SSM parameter for every successful pipeline execution.
  • Required IAM Roles and policies.

      Note: It may take anywhere from 10-20 minutes to complete the stack creation.

Test the solution

Now that the solution stack is deployed, follow the steps below to test the solution:

  1. Validate CodePipeline execution status

After successfully creating the CloudFormation stack, a CodePipeline execution automatically triggers to deploy the default application code version from the CodeCommit repository.

  • In the AWS console, choose Services and then CloudFormation. Select your stack name. On the stack Outputs tab, look for the CodePipelineURL key and click on the URL.
  • Validate that all steps have successfully completed. For a successful CodePipeline execution, you should see something like Figure 5. Wait for the execution to complete in case it is still in progress.

Figure 5: CodePipeline console showing execution status of all stages

 

  2. Validate the website URL

After the pipeline execution completes, open the website URL in a browser to check that it's working.

  • On the stack Outputs tab, look for the WebsiteURL key and click on the URL.
  • For a successful deployment, it should open a default page similar to Figure 6.

Figure 6: Sample “Hello World” application (Green deployment)

 

  3. Validate the EFS share

After the website deploys successfully, connect to the application server to validate the EFS mount point and the application source code directory.

  • Open the Amazon EC2 console, and then choose Instances in the left navigation pane.
  • Select the instance named bg-sample and choose Connect.
  • For Connection method, choose Session Manager, and then choose Connect.

After the connection is made, run the following bash commands to validate the EFS mount and the deployed content. Figure 7 shows a sample output from running the bash commands.

sudo df -h | grep efs
ls -la /efs/green
ls -la /var/www/

Figure 7: Sample output from the bash command (Green deployment)

 

  4. Deploy a new revision of the application code

After verifying the application status and the deployed code on the EFS share, commit some changes to the CodeCommit repository in order to trigger a new deployment.

  • On the stack Outputs tab, look for the CodeCommitURL key and click on the corresponding URL.
  • Click on the file index.html.
  • Click on Edit.
  • Uncomment line 9 and comment line 10, so that the new lines look like those below after the changes:
background-color: #0188cc; 
#background-color: #90ee90;
  • Add Author name, Email address, and then choose Commit changes.

After you commit the code, CodePipeline triggers and executes the Source, Build, Deploy, and Lambda stages. Once the execution completes, open the website URL again and you should see a new page like Figure 8.


Figure 8: New Application version (Blue deployment)

 

On the EFS side, the application directory on the new EC2 instance now points to /efs/blue as shown in Figure 9.


Figure 9: Sample output from the bash command (Blue deployment)

Solution review

Let’s review the pipeline stages details and what happens during the Blue/Green deployment:

1) Build stage

For this sample application, the CodeBuild project is configured to mount the EFS file system and utilize the buildspec.yml file present in the source code root directory to run the build. Following is the sample build spec utilized in this solution:

version: 0.2
phases:
  install:
    runtime-versions:
      php: latest   
  build:
    commands:
      - current_deployment=$(aws ssm get-parameter --name $SSM_PARAMETER --query "Parameter.Value" --region $REGION --output text)
      - echo $current_deployment
      - echo $SSM_PARAMETER
      - echo $EFS_ID $REGION
      - if [[ "$current_deployment" == "null" ]]; then echo "this is the first GREEN deployment for this project" ; dir='/efs/green' ; fi
      - if [[ "$current_deployment" == "green" ]]; then dir='/efs/blue' ; else dir='/efs/green' ; fi
      - if [ ! -d $dir ]; then  mkdir $dir >/dev/null 2>&1 ; fi
      - echo $dir
      - rsync -ar $CODEBUILD_SRC_DIR/ $dir/
artifacts:
  files:
      - '**/*'

During the build job, the following activities occur:

  • Installs the latest PHP runtime version.
  • Reads the SSM parameter value in order to know the current deployment and decide which directory to utilize. The SSM parameter value flips between green and blue for every successful deployment.
  • Synchronizes the latest source code to the EFS mount point.
  • Creates artifacts to be utilized in subsequent stages.

Note: Utilize the default buildspec.yml as a reference and customize it further as per your requirement. See this link for more examples.

2) Deploy Stage

The solution utilizes the CodeDeploy blue/green deployment type for EC2/On-premises. The deployment environment is configured to provision a new EC2 Auto Scaling group for every new deployment in order to deploy the new application revision. CodeDeploy creates the new Auto Scaling group by copying the current one. See this link for more details on blue/green deployment configuration with CodeDeploy. During each deployment event, CodeDeploy utilizes the appspec.yml file to run the deployment steps as per the defined lifecycle hooks. Following is the sample AppSpec file utilized in this solution.

version: 0.0
os: linux
hooks:
  BeforeInstall:
    - location: scripts/install_dependencies
      timeout: 180
      runas: root
  AfterInstall:
    - location: scripts/app_deployment
      timeout: 180
      runas: root
  BeforeAllowTraffic:
    - location: scripts/check_app_status
      timeout: 180
      runas: root

Note: The scripts mentioned in the AppSpec file are available in the scripts directory of the CodeCommit repository. Utilize these sample scripts as a reference and modify as per your requirement.

For this sample, the following steps are conducted during a deployment:

  • BeforeInstall:
    • Installs required packages on the EC2 instance.
    • Mounts the EFS file system.
    • Creates a symbolic link to point the Apache home directory /var/www/html to the appropriate EFS mount point. It also ensures that the new application version deploys to a different EFS directory without affecting the current running application (a sketch of this idea follows this list).
  • AfterInstall:
    • Stops the Apache web server.
    • Fetches the current EFS directory name from Systems Manager.
    • Runs some clean up commands.
    • Restarts the Apache web server.
  • BeforeAllowTraffic:
    • Checks whether the application is running correctly.
    • Exits the deployment with an error if the app returns a non-200 HTTP status code.
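
The sample deployment scripts in the repository are shell scripts and are not reproduced here. Purely to illustrate the symlink idea described above, the following minimal Python sketch reads the current deployment name from Systems Manager, picks the opposite EFS directory, and re-points the Apache document root; the parameter name is a placeholder, and the real scripts may differ:

import os
import boto3

ssm = boto3.client("ssm")

# Placeholder parameter name -- the CloudFormation stack creates the real one.
current = ssm.get_parameter(Name="/blue-green-sample/deployment-status")["Parameter"]["Value"]

# The new revision goes to the opposite directory of the current deployment.
target_dir = "/efs/blue" if current == "green" else "/efs/green"

# Re-point the Apache document root at the EFS directory for this deployment.
link = "/var/www/html"
if os.path.islink(link):
    os.remove(link)
os.symlink(target_dir, link)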

3) Lambda Stage

After completing the deploy stage, CodePipeline triggers a Lambda function in order to update the SSM parameter value with the updated EFS directory name. This parameter value alternates between “blue” and “green” to help CodePipeline identify the right EFS file system path during the next deployment.
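
The function's code is not listed in this post; the following is only a minimal sketch of what such a handler could look like, assuming the parameter name is supplied through an environment variable and the function is invoked by a CodePipeline Lambda invoke action (which must report success or failure back to the pipeline):

import os
import boto3

ssm = boto3.client("ssm")
codepipeline = boto3.client("codepipeline")

def lambda_handler(event, context):
    job_id = event["CodePipeline.job"]["id"]
    parameter_name = os.environ["SSM_PARAMETER"]  # assumed environment variable
    try:
        current = ssm.get_parameter(Name=parameter_name)["Parameter"]["Value"]
        # Flip the value so the next build writes to the other EFS directory.
        new_value = "blue" if current == "green" else "green"
        ssm.put_parameter(Name=parameter_name, Value=new_value,
                          Type="String", Overwrite=True)
        codepipeline.put_job_success_result(jobId=job_id)
    except Exception as exc:
        codepipeline.put_job_failure_result(
            jobId=job_id,
            failureDetails={"type": "JobFailed", "message": str(exc)})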

CodeDeploy Blue/Green deployment

Let’s review the sequence of events flow during the CodeDeploy deployment:

  1. CodeDeploy creates a new Auto Scaling group by copying the original one.
  2. Provisions a replacement EC2 instance in the new Auto Scaling Group.
  3. Conducts the deployment on the new instance as per the instructions in the appspec.yml file.
  4. Sets up health checks and redirects traffic to the new instance.
  5. Terminates the original instance along with the Auto Scaling Group.
  6. After completing the deployment, it should appear as shown in Figure 10.

Figure 10: AWS console view of a Blue/Green CodeDeploy deployment on EC2

Troubleshooting

To troubleshoot any service-related issues, see the following links:

More information

Now that you have tested the solution, here are some additional points worth noting:

  • The sample template and code utilized in this blog can work in any AWS region and are mainly intended for demonstration purposes. Utilize the sample as a reference and modify it further as per your requirement.
  • This solution works within a single account, Region, and VPC combination.
  • For this sample, we have utilized AWS CodeCommit as version control, but you can also utilize any other source supported by AWS CodePipeline like Bitbucket, GitHub, or GitHub Enterprise Server.

Clean up

Follow these steps to delete the components and avoid incurring any future charges:

  1. Open the AWS CloudFormation console.
  2. On the Stacks page in the CloudFormation console, select the stack that you created for this blog post. The stack must be currently running.
  3. In the stack details pane, choose Delete.
  4. Select Delete stack when prompted.
  5. Empty and delete the S3 bucket created during deployment step 1.

Conclusion

In this blog post, you learned how to set up a complete CI/CD pipeline for conducting a blue/green deployment on EC2 instances utilizing an Amazon EFS file share as the mount point to host application source code. The EFS share will be the central location hosting your application content, and it will help reduce your overall deployment time by eliminating the need to deploy a new revision to every EC2 instance's local storage. It also helps to preserve any dynamically generated content when the life of an EC2 instance ends.

Author bio

Rakesh Singh

Rakesh is a Senior Technical Account Manager at Amazon. He loves automation and enjoys working directly with customers to solve complex technical issues and provide architectural guidance. Outside of work, he enjoys playing soccer, singing karaoke, and watching thriller movies.

Our Journey to Continuous Delivery at Grab (Part 2)

Post Syndicated from Grab Tech original https://engineering.grab.com/our-journey-to-continuous-delivery-at-grab-part2

In the first part of this blog post, you've read about the improvements made to our build and staging deployment process, and how many of the manual tasks routinely performed by engineers have been automated with Conveyor: an in-house continuous delivery solution.

This new post begins with the introduction of the hermeticity principle for our deployments, and how it improves the confidence with promoting changes to production. Changes sent to production via Conveyor’s deployment pipelines are then described in detail.

Overview of Grab delivery process

Finally, looking back at the engineering efficiency improvements around velocity and reliability over the last 2 years, we answer the big question – was the investment on a custom continuous delivery solution like Conveyor the right decision for Grab?

Improving Confidence in our Production Deployments with Hermeticity

The term deployment hermeticity is borrowed from build systems. A build system is called hermetic if builds always produce the same artefacts regardless of changes in the environment they run on. Similarly, we call our deployments hermetic if they always result in the same deployed artefacts regardless of the environment’s change or the number of times they are executed.

The behaviour of a service is rarely controlled by a single variable. The application code that makes up your service is an important driver of its behaviour, but its configuration is an equally important contributor. The behaviour of traditional microservices at Grab is dictated mainly by 3 versioned artefacts: application code, static configuration, and dynamic configuration.

Conveyor has been integrated with the systems that operate changes in each of these parameters. By tracking all 3 parameters at every deployment, Conveyor can reproducibly deploy microservices with similar behaviour: its deployments are hermetic.

Building upon this property, Conveyor can ensure that all deployments made to production have been tested before with the same combination of parameters. This is valuable to us:

  • An outcome of staging deployments for a specific set of parameters is a good predictor of outcomes in production deployments for the same set of parameters and thus it makes testing in staging more relevant.
  • Rollbacks are hermetic; we never rollback to a combination of parameters that has not been used previously.

In the past, incidents had resulted from an application rollback not compatible with the current dynamic configuration version; this was aggravating since rollbacks are expected to be a safe recovery mechanism. The introduction of hermetic deployments has largely eliminated this category of problems.

Hermeticity is maintained by registering the deployment parameters as artefacts after each successfully completed pipeline. Users must then select one of the registered deployment metadata to promote to production.
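
Conveyor's implementation is internal to Grab, so the following is only a minimal sketch of the idea, not its actual code: register the three versioned parameters after each successfully completed staging pipeline, and allow a production promotion only for a combination that was registered before.

from dataclasses import dataclass

@dataclass(frozen=True)
class DeploymentParameters:
    """The 3 versioned artefacts that dictate a service's behaviour."""
    application_version: str
    static_config_version: str
    dynamic_config_version: str

# Combinations registered after successfully completed staging pipelines.
registered_deployments = set()

def register_staging_deployment(params: DeploymentParameters) -> None:
    registered_deployments.add(params)

def can_promote_to_production(params: DeploymentParameters) -> bool:
    # Hermeticity check: only previously tested combinations may reach production.
    return params in registered_deployments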

At this point, you might be wondering: why not use a single pipeline that includes both staging and production deployments? This was indeed how it started, with a single pipeline spanning multiple environments. However, engineers soon complained about it.

The most obvious reason for the complaint was that less than 20% of changes deployed in staging will make their way to production. This meant that engineers would have toil associated with each completed staging deployment since the pipeline must be manually cancelled rather than continued to production.

The other reason is that this multi-environment pipeline approach reduced flexibility when promoting changes to production. There are different ways to apply changes to a cluster. For example, lengthy pipelines that refresh instances can be used to deploy any combination of changes, while there are quicker pipelines restricted to dynamic configuration changes (such as feature flags rollouts). Regardless of the order in which the changes are made and how they are applied, Conveyor tracks the change.

Eventually, engineers promote a deployment artefact to production. However, they do not need to apply changes in the same sequence in which they were applied to staging. Furthermore, to prevent erroneous actions, Conveyor presents only changes that can be applied with the requested pipeline (and sometimes, no changes are available). Not being forced into a specific method of deploying changes is one of the added benefits of hermetic deployments.

Returning to Our Journey Towards Engineering Efficiency

If you can recall, the first part of this blog post series ended with a description of staging deployment. Our deployment to production starts with a verification that we uphold our hermeticity principle, as explained above.

Our production deployment pipelines can run for several hours for large clusters with rolling releases (a few run for days), so we start by acquiring locks to ensure there are no concurrent deployments for any given cluster.

Before making any changes to the environment, we automatically generate release notes, giving engineers a chance to abort if the wrong set of changes are sent to production.

The pipeline next waits for a deployment slot. Early on, engineers adopted deployment windows that coincide with office hours, such that if anything goes wrong, there is always someone on hand to help. Prior to the introduction of Conveyor, however, engineers would manually ask a Slack bot for approval. This interaction is now automated, and the only remaining action left is for the engineer to approve that the deployment can proceed via a single click, in line with our hands-off deployment principle.

When the canary is in production, Conveyor automatically monitors it. This process is similar to the one already discussed in the first part of this blog post: engineers can configure a set of alerts that Conveyor will keep track of. As soon as any one of the alerts is triggered, Conveyor automatically rolls back the service.

If no alert is raised for the duration of the monitoring period, Conveyor waits again for a deployment slot. It then publishes the release notes for that deployment and completes the deployments for the cluster. After the lock is released and the deployment registered, the pipeline finally comes to its successful completion.

Benefits of Our Journey Towards Engineering Efficiency

All these improvements made over the last 2 years have reduced the effort spent by engineers on deployment while also reducing the failure rate of our deployments.

If you are an engineer working on DevOps in your organisation, you know how hard it can be to measure the impact you made on your organisation. To estimate the time saved by our pipelines, we can model the activities that were previously done manually with a rudimentary weighted graph. In this graph, each edge carries a probability of the activity being performed (100% when unspecified), while each vertex carries the time taken for that activity.
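
As an illustration of this model (the activities, durations, and probabilities below are hypothetical placeholders, not Grab's measurements), the expected manual effort per deployment is simply the probability-weighted sum of the activity durations:

# Hypothetical activities: (name, duration in minutes, probability of being performed)
activities = [
    ("prepare and trigger the deployment", 5.0, 1.0),
    ("watch the deployment progress", 10.0, 1.0),
    ("perform a manual rollback", 15.0, 0.1),  # only when something goes wrong
]

# Expected manual effort per deployment = sum(duration * probability).
expected_minutes = sum(duration * probability for _, duration, probability in activities)
print(f"Expected manual effort per deployment: {expected_minutes:.1f} minutes")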

Focusing on our regular staging deployments only, such a graph would look like this:

The overall amount of effort automated by the staging pipelines is represented in the graph above and can be converted into an equation.

This equation shows that for each staging deployment, around 16 minutes of work have been saved. Similarly, for regular production deployments, we find that 67 minutes of work were saved for each deployment.

Moreover, efficiency was not the only benefit brought by the use of deployment pipelines for our traditional microservices. Surprisingly perhaps, the rate of failures related to production changes progressively decreased while the amount of production changes made with Conveyor increased across the organisation (starting at 1.5% of failures per deployment, and finishing at 0.3% on average over the last 3 months of the period of data collected).

Keep Calm and Automate

Since the first draft for this post was written, we’ve made many more improvements to our pipelines. We’ve begun automating Database Migrations; we’ve extended our set of hermetic variables to Amazon Machine Image (AMI) updates; and we’re working towards supporting container deployments.

Through automation, all of Conveyor's deployment pipelines have contributed to saving more than 5,000 man-days of effort in 2020 alone, across all supported teams. That's around 20 man-years worth of effort, which is around 3 times the capacity of the team working on the project! Investments in our automation pipelines have more than paid for themselves, and the gains go up every year as more workflows are automated and more teams are onboarded.

If Conveyor has saved efforts for engineering teams, has it then helped to improve velocity? I had opened the first part of this blog post with figures on the deployment funnel for microservice teams at Grab, towards the end of 2018. So where do the figures stand today for these teams?

In the span of 2 years, the average number of build and staging deployments performed each day has not varied much. However, in the last 3 months of 2020, engineers sent twice as many changes to production as they did over the same period in 2018.

Perhaps the biggest recognition received by the team working on the project was from Grab's engineers themselves. In the 2020 internal NPS survey for engineering experience at Grab, Conveyor received the highest score of any tool (built in-house or not).


All these improvements in efficiency for our engineers would never have been possible without the hard work of all team members involved in the project, past and present: Tanun Chalermsinsuwan, Aufar Gilbran, Deepak Ramakrishnaiah, Repon Kumar Roy (Kowshik), Su Han, Voislav Dimitrijevikj, Stanley Goh, Htet Aung Shine, Evan Sebastian, Qijia Wang, Oscar Ng, Jacob Sunny, Subhodip Mandal and many others who have contributed and collaborated with them.


Join Us

Grab is the leading superapp platform in Southeast Asia, providing everyday services that matter to consumers. More than just a ride-hailing and food delivery app, Grab offers a wide range of on-demand services in the region, including mobility, food, package and grocery delivery services, mobile payments, and financial services across 428 cities in eight countries.

Powered by technology and driven by heart, our mission is to drive Southeast Asia forward by creating economic empowerment for everyone. If this mission speaks to you, join our team today!

How Grab Leveraged Performance Marketing Automation to Improve Conversion Rates by 30%

Post Syndicated from Grab Tech original https://engineering.grab.com/learn-how-grab-leveraged-performance-marketing-automation

Grab, Southeast Asia’s leading superapp, is a hyperlocal three-sided marketplace that operates across hundreds of cities in Southeast Asia. Grab started out as a taxi hailing company back in 2012 and in less than a decade, the business has evolved tremendously and now offers a diverse range of services for consumers’ everyday needs.

To fuel our business growth in newer service offerings such as GrabFood, GrabMart and GrabExpress, user acquisition efforts play a pivotal role in ensuring we create a sustainable Grab ecosystem that balances the marketplace dynamics between our consumers, driver-partners and merchant-partners.

Part of our user growth strategy is centred around our efforts in running direct-response app campaigns to increase trials on our superapp offerings. Executing these campaigns brings about a set of unique challenges against the diverse cultural backdrop present in Southeast Asia, challenging the team to stay hyperlocal in our strategies while driving user volumes at scale. To address these unique challenges, Grab’s performance marketing team is constantly seeking ways to leverage automation and innovate on our operations, improving our marketing efficiency and effectiveness.

Managing Grab’s Ever-expanding Business, Geographical Coverage and New User Acquisition

Grab’s ever-expanding services, extensive geographical coverage and hyperlocal strategies result in an extremely dynamic, yet complex ad account structure. This also means that whenever there is a new business vertical launch or hyperlocal campaign, the team would spend valuable hours rolling out a large volume of new ads across our accounts in the region.

A sample of our Google Ads account structure.

The granular structure of our Google Ads account provided us with flexibility to execute hyperlocal strategies, but this also resulted in thousands of ad groups that had to be individually maintained.

In 2019, Grab’s growth was simply outpacing our team’s resources and we finally hit a bottleneck. This challenged the team to take a step back and make the decision to pursue a fully automated solution built on the following principles for long term sustainability:

  • Building ad-tech solutions in-house instead of acquiring off-the-shelf solutions

    Grab’s unique business model calls for several tailor-made features, none of which the existing ad tech solutions were able to provide.

  • Shifting our mindset to focus on the infinite game

    In order to sustain the exponential volume in the ads we run, we had to seek the path of automation.

For our very first automation project, we decided to look into automating creative refresh and upload for our Google Ads account. With thousands of ad groups running multiple creatives each, this had become a growing problem for the team. Over time, manually monitoring these creatives and refreshing them on a regular basis had become impossible.

The Automation Fundamentals

Grab’s superapp nature means that any automation solution fundamentally needs to be robust:

  • Performance-driven – to maintain and improve conversion efficiency over time
  • Flexibility –  to fit needs across business verticals and hyperlocal execution
  • Inclusivity – to account for future service launches and marketing tech (e.g. product feeds and more)
  • Scalability – to account for future geography/campaign coverage

With these in mind, we incorporated them in our requirements for the custom creative automation tool we planned to build.

  • Performance-driven – while many advertising platforms, such as Google’s App Campaigns, have built-in algorithms to prevent low-performing creatives from being served, the fundamental bottleneck lies in the speed in which these low-performing creatives can be replaced with new assets to improve performance. Thus, solving this bottleneck would become the primary goal of our tool.

  • Flexibility – to accommodate our broad range of services, geographies and marketing objectives, a mapping logic would be required to make sure the right creatives are added into the right campaigns.

    To solve this, we relied on a standardised creative naming convention, using key attributes in the file name to map an asset to a specific campaign and ad group (a minimal parsing sketch follows this list) based on:

    • Market
    • City
    • Service type
    • Language
    • Creative theme
    • Asset type
    • Campaign optimisation goal
  • Inclusivity – to address coverage of future service offerings and interoperability with existing ad-tech vendors, we designed and built our tool conforming to many industry API and platform standards.

  • Scalability – to ensure full coverage of future geographies/campaigns, the in-house solution's frontend and backend had to be robust enough to handle volume. Working hand in glove with Google, we built the solution by leveraging multiple APIs, including Google Ads and YouTube, to host and replace low-performing assets across our ad groups. The solution was then deployed on AWS' serverless compute engine.
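
To make the Flexibility principle concrete, the following minimal sketch shows how such a naming convention could be parsed into campaign attributes; the underscore-delimited scheme and the sample file name are hypothetical, not Grab's actual convention:

# Attribute order is assumed purely for illustration.
ATTRIBUTES = [
    "market", "city", "service_type", "language",
    "creative_theme", "asset_type", "optimisation_goal",
]

def parse_creative_filename(filename):
    """Map an underscore-delimited creative file name to campaign attributes."""
    stem = filename.rsplit(".", 1)[0]  # drop the file extension
    parts = stem.split("_")
    if len(parts) != len(ATTRIBUTES):
        raise ValueError(f"Unexpected creative file name: {filename}")
    return dict(zip(ATTRIBUTES, parts))

# Hypothetical example:
# parse_creative_filename("SG_Singapore_Food_EN_Promo_Video_Installs.mp4")
# -> {"market": "SG", "city": "Singapore", ..., "optimisation_goal": "Installs"}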

Enter CARA

CARA is an automation tool that scans for any low-performing creatives and replaces them with new assets from our creative library:

CARA Workflow
A sneak peek of how CARA works

In a controlled experimental launch, we saw nearly 2,000 underperforming assets automatically replaced across more than 8,000 active ad groups, translating to an 18-30% increase in clickthrough and conversion rates.

A subset of results from CARA’s experimental launch

Through automation, Grab’s performance marketing team has been able to significantly improve clickthrough and conversion rates while saving valuable man-hours. We have also established a scalable foundation for future growth. The best part? We are just getting started.


Authored on behalf of the performance marketing team @ Grab. Special thanks to the CRM data analytics team, particularly Milhad Miah and Vaibhav Vij for making this a reality.


Join Us

Grab is the leading superapp platform in Southeast Asia, providing everyday services that matter to consumers. More than just a ride-hailing and food delivery app, Grab offers a wide range of on-demand services in the region, including mobility, food, package and grocery delivery services, mobile payments, and financial services across 428 cities in eight countries.

Powered by technology and driven by heart, our mission is to drive Southeast Asia forward by creating economic empowerment for everyone. If this mission speaks to you, join our team today!

Integrating AWS Device Farm with your CI/CD pipeline to run cross-browser Selenium tests

Post Syndicated from Mahesh Biradar original https://aws.amazon.com/blogs/devops/integrating-aws-device-farm-with-ci-cd-pipeline-to-run-cross-browser-selenium-tests/

Continuously building, testing, and deploying your web application helps you release new features sooner and with fewer bugs. In this blog, you will create a continuous integration and continuous delivery (CI/CD) pipeline for a web app using AWS CodeStar services and AWS Device Farm’s desktop browser testing service.  AWS CodeStar is a suite of services that help you quickly develop and build your web apps on AWS.

AWS Device Farm’s desktop browser testing service helps developers improve the quality of their web apps by executing their Selenium tests on different desktop browsers hosted in AWS. For each test executed on the service, Device Farm generates action logs, web driver logs, and video recordings to help you quickly identify issues with your app. The service offers pay-as-you-go pricing so you only pay for the time your tests are executing on the browsers with no upfront commitments or additional costs. Furthermore, Device Farm provides a default concurrency of 50 Selenium sessions so you can run your tests in parallel and speed up the execution of your test suites.

After you’ve completed the steps in this blog, you’ll have a working pipeline that will build your web app on every code commit and test it on different versions of desktop browsers hosted in AWS.

Solution overview

In this solution, you use AWS CodeStar to create a sample web application and a CI/CD pipeline. The following diagram illustrates our solution architecture.


Figure 1: Deployment pipeline architecture

Prerequisites

Before you deploy the solution, complete the following prerequisite steps:

  1. On the AWS Management Console, search for AWS CodeStar.
  2. Choose Getting Started.
  3. Choose Create Project.
  4. Choose a sample project.

For this post, we choose a Python web app deployed on Amazon Elastic Compute Cloud (Amazon EC2) servers.

  1. For Templates, select Python (Django).
  2. Choose Next.

Figure 2: Choose a template in CodeStar

  1. For Project name, enter Demo CICD Selenium.
  2. Leave the remaining settings at their default.

You can choose from a list of existing key pairs. If you don’t have a key pair, you can create one.

  1. Choose Next.
  2. Choose Create project.

Figure 3: Verify the project configuration

AWS CodeStar creates project resources for you as listed in the following table.

Service | Resource created
AWS CodePipeline project | demo-cicd-selen-Pipeline
AWS CodeCommit repository | Demo-CICD-Selenium
AWS CodeBuild project | demo-cicd-selen
AWS CodeDeploy application | demo-cicd-selen
AWS CodeDeploy deployment group | demo-cicd-selen-Env
Amazon EC2 server | Tag: Environment = demo-cicd-selen-WebApp
IAM role | AWSCodeStarServiceRole, CodeStarWorker*

Your EC2 instance needs access to AWS Device Farm in order to run the Selenium test scripts. You can use service roles to achieve that.

  1. Attach policy AWSDeviceFarmFullAccess to the IAM role CodeStarWorker-demo-cicd-selen-WebApp.

You’re now ready to create an AWS Cloud9 environment.

Check the Pipelines tab of your AWS CodeStar project; it should show success.

  1. On the IDE tab, under Cloud 9 environments, choose Create environment.
  2. For Environment name, enter Demo-CICD-Selenium.
  3. Choose Create environment.
  4. Wait for the environment to be complete and then choose Open IDE.
  5. In the IDE, follow the instructions to set up your Git and make sure it’s up to date with your repo.

The following screenshot shows the verification that the environment is set up.


Figure 4: Verify AWS Cloud9 setup

You can now verify the deployment.

  1. On the Amazon EC2 console, choose Instances.
  2. Choose demo-cicd-selen-WebApp.
  3. Locate its public IP and choose open address.

Figure 5: Locating IP address of the instance

 

A webpage should open. If it doesn’t, check your VPN and firewall settings.

Now that you have a working pipeline, let’s move on to creating a Device Farm project for browser testing.

Creating a Device Farm project

To create your browser testing project, complete the following steps:

  1. On the Device Farm console, choose Desktop browser testing project.
  2. Choose Create a new project.
  3. For Project name, enter Demo cicd selenium.
  4. Choose Create project.

Figure 6: Creating AWS Device Farm project

 

Note down the project ARN; we use this in the Selenium script for the remote web driver.


Figure 7: Project ARN for AWS Device Farm project

 

Testing the solution

This solution uses the following script to run browser testing. We call this script in the ValidateService lifecycle hook of the CodeDeploy project.

  1. Open your AWS Cloud9 IDE (which you made as a prerequisite).
  2. Create a folder tests under the project root directory.
  3. Add the sample Selenium script browser-test-sel.py under the tests folder with the following content (replace <sample_url> with the URL of your web application, noted in the prerequisite steps):
import boto3
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

devicefarm_client = boto3.client("devicefarm", region_name="us-west-2")
testgrid_url_response = devicefarm_client.create_test_grid_url(
    projectArn="arn:aws:devicefarm:us-west-2:<your project ARN>",
    expiresInSeconds=300)

driver = webdriver.Remote(testgrid_url_response["url"],
                          webdriver.DesiredCapabilities.FIREFOX)
try:
    driver.implicitly_wait(30)
    driver.maximize_window()
    driver.get("<sample_url>")
    if driver.find_element_by_id("Layer_1"):
        print("graphics generated in full screen")
    assert driver.find_element_by_id("Layer_1")
    driver.set_window_position(0, 0)
    driver.set_window_size(1000, 400)
    driver.get("<sample_url>")
    tower = driver.find_element_by_id("tower1")
    if tower.is_displayed():
        print("graphics generated after resizing")
    else:
        print("graphics not generated at this window size")
       # this is where you can fail the script with error if you expect the graphics to load. And pipeline will terminate
except Exception as e:
    print(e)
finally:
    driver.quit()

This script launches a website (in Firefox) created by you in the prerequisite step using the AWS CodeStar template and verifies if a graphic element is loaded.

The following screenshot shows a full-screen application with graphics loaded.


Figure 8: Full screen application with graphics loaded

 

The following screenshot shows the resized window with no graphics loaded.


Figure 9: Resized window application with no graphics loaded

  4. Create the file validate_service in the scripts folder under the root directory with the following content:
#!/bin/bash
if [ "$DEPLOYMENT_GROUP_NAME" == "demo-cicd-selen-Env" ]
then
    cd /home/ec2-user
    source environment/bin/activate
    python tests/browser-test-sel.py
fi

This script is part of CodeDeploy scripts and determines whether to stop the pipeline or continue based on the output from the browser testing script from the preceding step.

  5. Modify the file appspec.yml under the root directory to add the tests directory and the ValidateService hook. The file should look like the following:
version: 0.0
os: linux
files:
 - source: /ec2django/
   destination: /home/ec2-user/ec2django
 - source: /helloworld/
   destination: /home/ec2-user/helloworld
 - source: /manage.py
   destination: /home/ec2-user
 - source: /supervisord.conf
   destination: /home/ec2-user
 - source: /requirements.txt
   destination: /home/ec2-user
 - source: /requirements/
   destination: /home/ec2-user/requirements
 - source: /tests/
   destination: /home/ec2-user/tests

permissions:
  - object: /home/ec2-user/manage.py
    owner: ec2-user
    mode: 644
    type:
      - file
  - object: /home/ec2-user/supervisord.conf
    owner: ec2-user
    mode: 644
    type:
      - file
hooks:
  AfterInstall:
    - location: scripts/install_dependencies
      timeout: 300
      runas: root
    - location: scripts/codestar_remote_access
      timeout: 300
      runas: root
    - location: scripts/start_server
      timeout: 300
      runas: root

  ApplicationStop:
    - location: scripts/stop_server
      timeout: 300
      runas: root

  ValidateService:
    - location: scripts/validate_service
      timeout: 600
      runas: root

This file is used by the AWS CodeDeploy service to perform the deployment and validation steps.

  6. Modify the artifacts section in the buildspec.yml file. The section should look like the following:
artifacts:
  files:
    - 'template.yml'
    - 'ec2django/**/*'
    - 'helloworld/**/*'
    - 'scripts/**/*'
    - 'tests/**/*'
    - 'appspec.yml'
    - 'manage.py'
    - 'requirements/**/*'
    - 'requirements.txt'
    - 'supervisord.conf'
    - 'template-configuration.json'

This file is used by the AWS CodeBuild service to package the code.

  7. Modify the file common.txt in the requirements folder under the root directory. The file should look like the following:
# dependencies common to all environments
Django==2.1.15
selenium==3.141.0
boto3 >= 1.10.44
pytest

  8. Save all the changes. Your folder structure should look like the following:
├── README.md
├── appspec.yml*
├── buildspec.yml*
├── db.sqlite3
├── ec2django
├── helloworld
├── manage.py
├── requirements
│   ├── common.txt*
│   ├── dev.txt
│   ├── prod.txt
├── requirements.txt
├── scripts
│   ├── codestar_remote_access
│   ├── install_dependencies
│   ├── start_server
│   ├── stop_server
│   └── validate_service**
├── supervisord.conf
├── template-configuration.json
├── template.yml
├── tests
│   └── browser-test-sel.py**

**newly added files
*modified file

Running the tests

The Device Farm desktop browsing project is now integrated with your pipeline. All you need to do now is commit the code, and CodePipeline takes care of the rest.

  1. On Cloud9 terminal, go to project root directory.
  2. Run git add . to stage all changed files for commit.
  3. Run git commit -m "<commit message>" to commit the changes.
  4. Run git push to push the changes to the repository; this should trigger the pipeline.
  5. Check the Pipelines tab of your AWS CodeStar project; it should show success.
  6. Go to the AWS Device Farm console and click on Desktop browser testing projects.
  7. Click on your project Demo cicd selenium.

You can verify the running of your Selenium test cases using the recorded run steps shown on the Device Farm console, the video of the run, and the logs, all of which can be downloaded on the Device Farm console, or using the AWS SDK and AWS Command Line Interface (AWS CLI).
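
For example, the following minimal boto3 sketch lists recent sessions for the project and the video artifacts recorded for each (replace the project ARN placeholder with your own):

import boto3

devicefarm = boto3.client("devicefarm", region_name="us-west-2")
project_arn = "arn:aws:devicefarm:us-west-2:<your project ARN>"

# List recent test grid sessions for the desktop browser testing project.
sessions = devicefarm.list_test_grid_sessions(projectArn=project_arn)
for session in sessions["testGridSessions"]:
    print(session["arn"], session["status"])
    # Fetch the video recording artifacts for each session.
    artifacts = devicefarm.list_test_grid_session_artifacts(
        sessionArn=session["arn"], type="VIDEO")
    for artifact in artifacts["artifacts"]:
        print(" ", artifact.get("filename"), artifact.get("url"))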

The following screenshot shows the project run details on the console.


Figure 10: Viewing AWS Device Farm project run details

 

To test the Selenium script locally, you can run the following commands:

1. Create a Python virtual environment for your Django project. This virtual environment allows you to isolate this project and install any packages you need without affecting the system Python installation. At the terminal, go to project root directory and type the following command:

$ python3 -m venv ./venv

2. Activate the virtual environment:

$ source ./venv/bin/activate

3. Install development Python dependencies for this project:

$ pip install -r requirements/dev.txt

4. Run Selenium Script:

$ python tests/browser-test-sel.py

Testing the failure scenario (optional)

To test the failure scenario, you can modify the sample script browser-test-sel.py at the else statement.

The following code shows the lines to change:

    else:
        print("graphics not generated at this window size")
        # this is where you can fail the script with error if you expect the graphics to load. And pipeline will terminate

The following is the updated code:

    else:
        exit(1)
        # this is where you can fail the script with error if you expect the graphics to load. And pipeline will terminate

Commit the change, and the pipeline should fail and stop the deployment.

Conclusion

Integrating Device Farm with CI/CD pipelines allows you to control deployment based on browser testing results. Failing the Selenium test on validation failures can stop and roll back the deployment, and a successful testing can continue the pipeline to deploy the solution to the final stage. Device Farm offers a one-stop solution for testing your native and web applications on desktop browsers and real mobile devices.

Mahesh Biradar is a Solutions Architect at AWS. He is a DevOps enthusiast and enjoys helping customers implement cost-effective architectures that scale.

Automate Amazon EC2 instance isolation by using tags

Post Syndicated from Jose Obando original https://aws.amazon.com/blogs/security/automate-amazon-ec2-instance-isolation-by-using-tags/

Containment is a crucial part of an overall incident response strategy, as this practice allows time for responders to perform forensics, eradication, and recovery during an incident. There are many different approaches to containment. In this post, we will be focusing on isolation—the ability to keep multiple targets separated so that each target only sees and affects itself—as a containment strategy.

I’ll show you how to automate isolation of an Amazon Elastic Compute Cloud (Amazon EC2) instance by using an AWS Lambda function that’s triggered by tag changes on the instance, as reported by Amazon CloudWatch Events.

CloudWatch Event Rules deliver a near real-time stream of system events that describe changes in Amazon Web Services (AWS) resources. See also Amazon EventBridge.

Preparing for an incident is important as outlined in the Security Pillar of the AWS Well-Architected Framework.

Out of the 7 Design Principles for Security in the Cloud, as per the Well-Architected Framework, this solution will cover the following:

  • Enable traceability: Monitor, alert, and audit actions and changes to your environment in real time. Integrate log and metric collection with systems to automatically investigate and take action.
  • Automate security best practices: Automated software-based security mechanisms can improve your ability to securely scale more rapidly and cost-effectively. Create secure architectures, including through the implementation of controls that can be defined and managed by AWS as code in version-controlled templates.
  • Prepare for security events: Prepare for an incident by implementing incident management and investigation policy and processes that align to your organizational requirements. Run incident response simulations and use tools with automation to help increase your speed for detection, investigation, and recovery.

After detecting an event in the Detection phase and analyzing in the Analysis phase, you can automate the process of logically isolating an instance from a Virtual Private Cloud (VPC) in Amazon Web Services (AWS).

In this blog post, I describe how to automate EC2 instance isolation by using the tagging feature that a responder can use to identify instances that need to be isolated. A Lambda function then uses AWS API calls to isolate the instances by performing the actions described in the following sections.

Use cases

Your organization can use automated EC2 instance isolation for scenarios like these:

  • A security analyst wants to automate EC2 instance isolation in order to respond to security events in a timely manner.
  • A security manager wants to provide their first responders with a way to quickly react to security incidents without providing too much access to higher security features.

High-level architecture and design

The example solution in this blog post uses a CloudWatch Events rule to trigger a Lambda function. The CloudWatch Events rule is triggered when a tag is applied to an EC2 instance. The Lambda code triggers further actions based on the contents of the event. Note that the CloudFormation template includes the appropriate permissions to run the function.

The event flow is shown in Figure 1 and works as follows:

  1. The EC2 instance is tagged.
  2. The CloudWatch Events rule filters the event.
  3. The Lambda function is invoked.
  4. If the criteria are met, the isolation workflow begins.

When the Lambda function is invoked and the criteria are met, these actions are performed:

  1. Checks for IAM instance profile associations.
  2. If the instance is associated to a role, the Lambda function disassociates that role.
  3. Applies the isolation role that you defined during CloudFormation stack creation.
  4. Checks the VPC where the EC2 instance resides.
    • If there is no isolation security group in the VPC (if the VPC is new, for example), the function creates one.
  5. Applies the isolation security group.

Note: If you had a security group with an open (0.0.0.0/0) outbound rule, and you apply this Isolation security group, your existing SSH connections to the instance are immediately dropped. On the other hand, if you have a narrower inbound rule that initially allows the SSH connection, the existing connection will not be broken by changing the group. This is known as Connection Tracking.


Figure 1: High-level diagram showing event flow

For the deployment method, we will be using an AWS CloudFormation Template. AWS CloudFormation gives you an easy way to model a collection of related AWS and third-party resources, provision them quickly and consistently, and manage them throughout their lifecycles, by treating infrastructure as code.

The AWS CloudFormation template that I provide here deploys the following resources:

  • An EC2 instance role for isolation – this is attached to the EC2 instance to prevent further communication with other AWS services, thus limiting the attack surface of your overall infrastructure.
  • An Amazon CloudWatch Events rule – this is used to detect changes to an AWS EC2 resource, in this case a “tag change event”. We will use this as a trigger to our Lambda function.
  • An AWS Identity and Access Management (IAM) role for Lambda functions – this is what the Lambda function will use to execute the workflow.
  • A Lambda function for automation – this function is where all the decision logic sits, once triggered it will follow a set of steps described in the following section.
  • Lambda function permissions – this is used by the Lambda function to execute.
  • An IAM instance profile – this is a container for an IAM role that you can use to pass role information to an EC2 instance.

Supporting functions within the Lambda function

Let’s dive deeper into each supporting function inside the Lambda code.

The following function identifies the virtual private cloud (VPC) ID for a given instance. This is needed to identify which security groups are present in the VPC.

import boto3

# Client setup shared by the excerpts below (assumed; not shown in the original snippets).
ec2Client = boto3.client('ec2')

def identifyInstanceVpcId(instanceId):
    # Describe the instance and return the ID of the VPC it resides in.
    instanceReservations = ec2Client.describe_instances(InstanceIds=[instanceId])['Reservations']
    for instanceReservation in instanceReservations:
        instancesDescription = instanceReservation['Instances']
        for instance in instancesDescription:
            return instance['VpcId']

The following function modifies the security group of an EC2 instance.

def modifyInstanceAttribute(instanceId,securityGroupId):
    response = ec2Client.modify_instance_attribute(
        Groups=[securityGroupId],
        InstanceId=instanceId)

The following function creates a security group on a VPC that blocks all egress access to the security group.

def createSecurityGroup(groupName, descriptionString, vpcId):
    resource = boto3.resource('ec2')
    securityGroupId = resource.create_security_group(GroupName=groupName, Description=descriptionString, VpcId=vpcId)
    securityGroupId.revoke_egress(IpPermissions= [{'IpProtocol': '-1','IpRanges': [{'CidrIp': '0.0.0.0/0'}],'Ipv6Ranges': [],'PrefixListIds': [],'UserIdGroupPairs': []}])
    return securityGroupId
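
The full Lambda function is deployed by the CloudFormation template; the following condensed sketch (not the template's actual code) only illustrates how a handler could tie these supporting functions to the workflow described earlier. The tag key/value and resource names are placeholders read from environment variables, and error handling is omitted:

import os

# Placeholder names -- the real values come from the CloudFormation parameters.
TAG_KEY = os.environ.get('TAG_KEY', 'SecurityStatus')
TAG_VALUE = os.environ.get('TAG_VALUE', 'Quarantine')
ISOLATION_SG_NAME = os.environ.get('ISOLATION_SG_NAME', 'IsolationSecurityGroup')
ISOLATION_PROFILE = os.environ.get('ISOLATION_PROFILE', 'RootInstanceProfile')

def lambda_handler(event, context):
    # "Tag Change on Resource" events carry the resource ARN and the current tags.
    instanceId = event['resources'][0].split('/')[-1]
    if event['detail']['tags'].get(TAG_KEY) != TAG_VALUE:
        return  # not the isolation tag; nothing to do

    # Steps 1-3: swap any associated instance profile for the isolation role.
    associations = ec2Client.describe_iam_instance_profile_associations(
        Filters=[{'Name': 'instance-id', 'Values': [instanceId]}])['IamInstanceProfileAssociations']
    for association in associations:
        ec2Client.disassociate_iam_instance_profile(AssociationId=association['AssociationId'])
    ec2Client.associate_iam_instance_profile(
        IamInstanceProfile={'Name': ISOLATION_PROFILE}, InstanceId=instanceId)

    # Step 4: find (or create) the isolation security group in the instance's VPC.
    vpcId = identifyInstanceVpcId(instanceId)
    groups = ec2Client.describe_security_groups(
        Filters=[{'Name': 'group-name', 'Values': [ISOLATION_SG_NAME]},
                 {'Name': 'vpc-id', 'Values': [vpcId]}])['SecurityGroups']
    if groups:
        securityGroupId = groups[0]['GroupId']
    else:
        securityGroupId = createSecurityGroup(ISOLATION_SG_NAME, 'Isolation security group', vpcId).id

    # Step 5: apply the isolation security group to the instance.
    modifyInstanceAttribute(instanceId, securityGroupId)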

Deploy the solution

To deploy the solution provided in this blog post, first download the CloudFormation template, and then set up a CloudFormation stack that specifies the tags that are used to trigger the automated process.

Download the CloudFormation template

To get started, download the CloudFormation template from Amazon S3. Alternatively, you can launch the CloudFormation template by selecting the following Launch Stack button:

Select the Launch Stack button to launch the template

Deploy the CloudFormation stack

Start by uploading the CloudFormation template to your AWS account.

To upload the template

  1. From the AWS Management Console, open the CloudFormation console.
  2. Choose Create Stack.
  3. Choose With new resources (standard).
  4. Choose Upload a template file.
  5. Select Choose File, and then select the YAML file that you just downloaded.

Figure 2: CloudFormation stack creation

Specify stack details

You can leave the default values for the stack as long as there aren’t any resources already provisioned with the same name, such as an IAM role. For example, if left at the default values, an IAM role named “SecurityIsolation-IAMRole” will be created. Otherwise, the naming convention is fully customizable from this screen, and you can enter your choice of name for the CloudFormation stack and modify the parameters as you see fit. Figure 3 shows the parameters that you can set.

The Evaluation Parameters section defines the tag key and value that will initiate the automated response. Keep in mind that these values are case-sensitive.


Figure 3: CloudFormation stack parameters

Choose Next until you reach the final screen, shown in Figure 4, where you acknowledge that an IAM role is created and you trust the source of this template. Select the check box next to the statement I acknowledge that AWS CloudFormation might create IAM resources with custom names, and then choose Create Stack.

Figure 4: CloudFormation IAM notification

Figure 4: CloudFormation IAM notification

After you complete these steps, the following resources will be provisioned, as shown in Figure 5:

  • EC2IsolationRole
  • EC2TagChangeEvent
  • IAMRoleForLambdaFunction
  • IsolationLambdaFunction
  • IsolationLambdaFunctionInvokePermissions
  • RootInstanceProfile
Figure 5: CloudFormation created resources

Figure 5: CloudFormation created resources

Testing

To start your automation, tag an EC2 instance using the tag defined during the CloudFormation setup. If you’re using the Amazon EC2 console, you can apply tags to resources by using the Tags tab on the relevant resource screen, or you can use the Tags screen, the AWS CLI, or an AWS SDK. A detailed walkthrough for each approach can be found in the Amazon EC2 documentation.
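For example, the following boto3 sketch applies the tag programmatically. The instance ID and the tag key and value are placeholders; use the case-sensitive key and value you set in the Evaluation Parameters during stack creation.

import boto3

ec2 = boto3.client('ec2')

# Placeholder tag key/value: replace with the values from your CloudFormation Evaluation Parameters.
ec2.create_tags(
    Resources=['i-0123456789abcdef0'],  # the instance you want to isolate
    Tags=[{'Key': 'SecurityStatus', 'Value': 'Quarantine'}],
)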

Reverting Changes

If you need to remove the restrictions applied by this workflow, complete the following steps.

  1. From the EC2 dashboard, in the Instances section, check the box next to the instance you want to modify.

    Figure 6: Select the instance to modify

    Figure 6: Select the instance to modify

  2. In the top right, select Actions, choose Instance settings, and then choose Modify IAM role.

    Figure 7: Choose Actions > Instance settings > Modify IAM role

    Figure 7: Choose Actions > Instance settings > Modify IAM role

  3. Under IAM role, choose the IAM role to attach to your instance, and then select Save.

    Figure 8: Choose the IAM role to attach

    Figure 8: Choose the IAM role to attach

  4. Select Actions, choose Networking, and then choose Change security groups.

    Figure 9: Choose Actions > Networking > Change security groups

    Figure 9: Choose Actions > Networking > Change security groups

  5. Under Associated security groups, select Remove and add a different security group with the access you want to grant to this instance.

Summary

Using the CloudFormation template provided in this blog post, a Security Operations Center analyst could have only tagging privileges and isolate an EC2 instance based on this tag. Or a security service such as Amazon GuardDuty could trigger a Lambda function to apply the tag as part of a workflow. This means the Security Operations Center analyst wouldn’t need administrative privileges over the EC2 service.

This solution creates an isolation security group for any new VPCs that don’t have one already. The security group would still follow the naming convention defined during the CloudFormation stack launch, but won’t be part of the provisioned resources. If you decide to delete the stack, manual cleanup would be required to remove these security groups.

This solution terminates established inbound Secure Shell (SSH) sessions that are associated to the instance, and isolates the instance from new connections either inbound or outbound. For outbound connections that are already established (for example, reverse shell), you either need to shut down the network interface card (NIC) at the operating system (OS) level, restart the instance network stack at the OS level, terminate the established connections, or apply a network access control list (network ACL).

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Jose Obando

Jose is a Security Consultant on the Global Financial Services team. He helps the world’s top financial institutions improve their security posture in the cloud. He has a background in network security and cloud architecture. In his free time, you can find him playing guitar or training in Muay Thai.

Growth Engineering at Netflix — Automated Imagery Generation

Post Syndicated from Netflix Technology Blog original https://netflixtechblog.com/growth-engineering-at-netflix-automated-imagery-generation-5a105fd51569

Growth Engineering at Netflix — Automated Imagery Generation

by Eric Eiswerth

Background

There’s a good chance you’ve visited the Netflix homepage. In the Growth Engineering team, we refer to this as the top of the signup funnel. For more background on the signup funnel and Growth Engineering’s role in it, please read our initial post on the topic: Growth Engineering at Netflix — Accelerating Innovation. The primary focus of this post will be the top of the signup funnel. In particular, the Netflix homepage:

As discussed in our previous post, Growth Engineering owns the business logic and protocols that allow our UI partners to build lightweight and flexible applications for almost any platform. In some cases, like the homepage, this even involves providing appropriate imagery (e.g., the background image shown above). In this post, we’ll take a deep dive into the journey of content-based imagery on the Netflix homepage.

Motivation

At Netflix we do one thing — entertainment — and we aim to do it really well. We live and breathe TV shows and films, and we want everyone to be able to enjoy them too. That’s why we aspire to have best-in-class stories across genres, and believe people should have access to new voices, cultures, and perspectives. The member-focused teams at Netflix are responsible for making sure the member experience is relevant and personalized, ensuring that this content is shown to the right people at the right time. But what about non-members, those who are simply interested in signing up for Netflix? How should we highlight our content and convey our value propositions to them?

The Solution

The main mechanism for highlighting our content in the signup flow is through content-based imagery. Before designing a solution it’s important to understand the main product requirements for such a feature:

  • The content needs to be new, relevant, and regional (not all countries have the same catalogue).
  • The artwork needs to appeal to a broader audience. The non-member homepage serves a very broad audience and is not personalized to the extent of the member experience.
  • The imagery needs to be localized.
  • We need to be able to easily determine what imagery is present for a given platform, region, and language.
  • The homepage needs to load in a reasonable amount of time, even in poor network conditions.

Unpacking Product Requirements

Given the scale we require and the product requirements listed above, there are a number of technical requirements:

  • A list of titles for the asset, in some order.
  • Ensure the titles are appropriate for a broad audience, which means all titles need to be tagged with metadata.
  • Localized images for each of the titles.
  • Different assets for different device types and screen sizes.
  • Server-generated assets, since client-side generation would require the retrieval of many individual images, which would increase latency and time-to-render.
  • To reduce latency, assets should be generated in an offline fashion and not in real time.
  • The assets need to be compressed, without reducing quality significantly.
  • The assets will need to be stored somewhere and we’ll need to generate URLs for each of them.
  • We’ll need to figure out how to provide the correct asset URL for a given request.
  • We’ll need to build a search index so that the assets can be searchable.

Given this set of requirements, we can effectively break this work down into 3 functional buckets, each of which maps to one of the microservices described in the next section.

The Design

For our design, we decided to build 3 separate microservices, mapping to the aforementioned functional buckets. Let’s take a look at each of these services in turn.

Asset Generation

The Asset Generation Service is responsible for generating groups of assets. We call these groups of assets asset groups. Each request will generate a single asset group that will contain one or more assets. To support the demands of our stakeholders, we designed a Domain Specific Language (DSL) that we call an asset generation recipe. An asset generation request contains a recipe. Below is an example of a simple recipe:

{
  "titleIds": [12345, 23456, 34567, …],
  "countries": ["US"],
  "type": "perspective", // this is the design of the asset
  "rows": 10, // the number of rows and columns can control density
  "cols": 15,
  "padding": 10, // padding between individual images
  "columnOffsets": [0, 0, 0, 0, …], // the y-offset for each column
  "rowOffsets": [0, -100, 0, -100, …], // the x-offset for each row
  "size": [1920, 1080] // size in pixels
}

This recipe can then be issued via an HTTP POST request to the Asset Generation Service. The recipe can then be translated into ImageMagick commands that can do the heavy lifting. At a high level, the following diagram captures the necessary steps required to build an asset.
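The recipe-to-ImageMagick translation itself isn’t shown here, but one way to picture it is as a mapping from the recipe’s grid parameters onto a montage invocation. The following Python sketch is purely illustrative and assumes the localized title images have already been downloaded locally; the real service also handles offsets, localization, and compression.

import subprocess

def build_montage_command(recipe, image_paths, output_path):
    # Translate a simplified recipe into an ImageMagick `montage` command.
    cols, rows = recipe['cols'], recipe['rows']
    padding = recipe['padding']
    return [
        'montage',
        *image_paths,                          # one localized image per title
        '-tile', f'{cols}x{rows}',             # grid density from the recipe
        '-geometry', f'+{padding}+{padding}',  # padding between individual images
        '-background', 'black',
        output_path,                           # recipe['size'] would drive a follow-up resize/crop
    ]

recipe = {'rows': 10, 'cols': 15, 'padding': 10, 'size': [1920, 1080]}
cmd = build_montage_command(recipe, ['title1.jpg', 'title2.jpg'], 'asset_us_en.jpg')
subprocess.run(cmd, check=True)  # requires ImageMagick to be installed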

Generating a single localized asset is a big achievement, but we still need to store the asset somewhere and have the ability to search for it. This requires an asset storage solution.

Asset Storage

We refer to asset storage and management simply as asset management. We felt it would be beneficial to create a separate microservice for asset management for 2 reasons. First, asset generation is CPU intensive and bursty. We can leverage high performance VMs in AWS to generate the assets. We can scale up when generation is occurring and scale down when there is no batch in the queue. However, it would be cost-inefficient to leverage this same hardware for lightweight and more consistent traffic patterns that an asset management service requires.

Let’s take a look at the internals of the Asset Management Service.

At this point we’ve laid out all the details in order to generate a content-based asset and have it stored as part of an asset group, which is persisted and indexed. The next thing to consider is, how do we retrieve an asset in real time and surface it on the Netflix homepage?

As you may recall from our previous blog post, Growth Engineering owns a service called the Orchestration Service. It is a mid-tier service that emits a custom JSON data structure containing fields that are consumed by the UI. The UI can then use these fields to control the presentation in the UI layer. There are two approaches for adding fields to the Orchestration Service’s response. First, the fields can be coded by hand. Second, fields can be added via configuration, through a service we call the Customization Service. Since assets will need to be periodically refreshed and we want this process to be entirely automated, it makes sense to pursue the configuration-based approach. To accomplish this, the Asset Management Service needs to translate an asset group into a rule definition for the Customization Service.

Customization Service

Let’s review the Orchestration Service and introduce the Customization Service. The Orchestration Service emits fields in response to upstream requests. For the homepage, the Orchestration Service typically provides only a small number of fields, which are supplied by application code. For example:

{
  "fields": {
    "email": {
      "type": "StringField",
      "value": ""
    },
    "nextAction": {
      "type": "Action",
      "withFields": ["email"]
    }
  }
}

The Orchestration Service also supports fields supplied by configuration. We call these adaptive fields. Adaptive fields are provided by the Customization Service. The Customization Service is a rules engine that emits the adaptive fields. For example, a rule to provide the background image for the homepage in the en-US locale would look as follows:

{
  "country": "US",
  "language": "en",
  "platform": "browser",
  "resolution": "high"
}

The corresponding payload for such a rule might look as follows:

{
  "backgroundImage": "https://cdn.netflix.com/bgimageurl.jpg"
}

Bringing this all together, the response from the Orchestration Service would now look as follows:

{
  "fields": {
    "email": {
      "type": "StringField",
      "value": ""
    },
    "nextAction": {
      "type": "Action",
      "withFields": ["email"]
    }
  },
  "adaptiveFields": {
    "backgroundImage": "https://cdn.netflix.com/bgimageurl.jpg"
  }
}

At this point, we are now able to generate an asset, persist it, search it, and generate customization rules for it. The generated rules then enable us to return a particular asset for a particular request. Let’s put it all together and review the system interaction diagram.

We now have all the pieces in place to automatically generate artwork and have that artwork appear on the Netflix homepage for a given request. At least one open question remains, how can we scale asset generation?

Scaling Asset Generation

Arguably, there are a number of approaches that could be used to scale asset generation. We decided to opt for an all-or-nothing approach. Meaning, all assets for a given recipe need to be generated as a single asset group. This enables smooth rollback in case of any errors. Additionally, asset generation is CPU intensive and each recipe can produce 1000s of assets as a result of the number of platform, region, and language permutations. Even with high performance VMs, generating 1000s of assets can take a long time. As a result, we needed to find a way to distribute asset generation across multiple VMs. Here’s what the final architecture looked like.

Briefly, let’s review the steps:

  1. The batch process is initiated by a cron job. The job executes a script that contains an asset generation recipe.
  2. The Asset Generation Service receives the request and creates asset generation tasks that can be distributed across any number of Asset Generation Worker nodes. One of the nodes is elected as the leader via Zookeeper. Its job is to coordinate asset generation across the other workers and ensure all assets get generated.
  3. Once the primary worker node has all the assets, it creates an asset group in the Asset Management Service. The Asset Management Service persists, indexes, and uploads the assets to the CDN.
  4. Finally, the Asset Management Service creates rules from the asset group and pushes the rules to the Customization Service. Once the data is published in the Customization Service, the Orchestration Service can supply the correct URLs in its JSON response by invoking the Customization Service with a request context that matches a given set of rules.

Conclusion

Automated asset generation has proven to be an extremely valuable investment. It is low-maintenance, high-leverage, and has allowed us to experiment with a variety of different types of assets on different platforms and on different parts of the product. This project was technically challenging and highly rewarding, both to the engineers involved in the project, and to the business. The Netflix homepage has come a long way over the last several years.

We’re hiring! Join Growth Engineering and help us build the future of Netflix.


Growth Engineering at Netflix — Automated Imagery Generation was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

How to deploy the AWS Solution for Security Hub Automated Response and Remediation

Post Syndicated from Ramesh Venkataraman original https://aws.amazon.com/blogs/security/how-to-deploy-the-aws-solution-for-security-hub-automated-response-and-remediation/

In this blog post I show you how to deploy the Amazon Web Services (AWS) Solution for Security Hub Automated Response and Remediation. The first installment of this series was about how to create playbooks using Amazon CloudWatch Events, AWS Lambda functions, and AWS Security Hub custom actions that you can run manually based on triggers from Security Hub in a specific account. That solution requires an analyst to directly trigger an action using Security Hub custom actions and doesn’t work for customers who want to set up fully automated remediation based on findings across one or more accounts from their Security Hub master account.

The solution described in this post automates the cross-account response and remediation lifecycle from executing the remediation action to resolving the findings in Security Hub and notifying users of the remediation via Amazon Simple Notification Service (Amazon SNS). You can also deploy these automated playbooks as custom actions in Security Hub, which allows analysts to run them on-demand against specific findings. You can deploy these remediations as custom actions or as fully automated remediations.

Currently, the solution includes 10 playbooks aligned to the controls in the Center for Internet Security (CIS) AWS Foundations Benchmark standard in Security Hub, but playbooks for other standards such as AWS Foundational Security Best Practices (FSBP) will be added in the future.

Solution overview

Figure 1 shows the flow of events in the solution described in the following text.

Figure 1: Flow of events

Figure 1: Flow of events

Detect

Security Hub gives you a comprehensive view of your security alerts and security posture across your AWS accounts and automatically detects deviations from defined security standards and best practices.

Security Hub also collects findings from various AWS services and supported third-party partner products to consolidate security detection data across your accounts.

Ingest

All of the findings from Security Hub are automatically sent to CloudWatch Events and Amazon EventBridge and you can set up CloudWatch Events and EventBridge rules to be invoked on specific findings. You can also send findings to CloudWatch Events and EventBridge on demand via Security Hub custom actions.
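As a rough sketch of what such a rule can look like, the following boto3 snippet creates an EventBridge rule that matches imported Security Hub findings and targets a remediation Lambda function. The rule name, the Lambda ARN, and the decision to match every imported finding are illustrative assumptions; the solution provisions its own, more specific rules for you.

import json
import boto3

events = boto3.client('events')

# Illustrative pattern for findings that Security Hub sends to EventBridge.
event_pattern = {
    "source": ["aws.securityhub"],
    "detail-type": ["Security Hub Findings - Imported"],
}

events.put_rule(
    Name='securityhub-finding-remediation',  # hypothetical rule name
    EventPattern=json.dumps(event_pattern),
    State='ENABLED',
)

events.put_targets(
    Rule='securityhub-finding-remediation',
    Targets=[{
        'Id': 'remediation-lambda',
        'Arn': 'arn:aws:lambda:us-east-1:111122223333:function:remediation',  # placeholder ARN
    }],
)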

Remediate

The CloudWatch Event and EventBridge rules can have AWS Lambda functions, AWS Systems Manager automation documents, or AWS Step Functions workflows as the targets of the rules. This solution uses automation documents and Lambda functions as response and remediation playbooks. Using cross-account AWS Identity and Access Management (IAM) roles, the playbook performs the tasks to remediate the findings using the AWS API when a rule is invoked.

Log

The playbook logs the results to the Amazon CloudWatch log group for the solution, sends a notification to an Amazon Simple Notification Service (Amazon SNS) topic, and updates the Security Hub finding. An audit trail of actions taken is maintained in the finding notes. The finding is updated as RESOLVED after the remediation is run. The security finding notes are updated to reflect the remediation performed.
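To make the remediate-and-log flow concrete, here is a heavily simplified Python sketch of what one playbook might do for the CIS 1.5 control used later in this post: assume a role in the member account, enforce the password policy, and resolve the finding. The role name and the finding handling are assumptions for illustration; the playbooks included in the solution are more complete.

import boto3

def remediate_cis_1_5(finding, member_account_id):
    # Assume a cross-account role in the member account (role name is illustrative)
    creds = boto3.client('sts').assume_role(
        RoleArn=f'arn:aws:iam::{member_account_id}:role/SecurityHubRemediationRole',
        RoleSessionName='cis-1-5-remediation',
    )['Credentials']

    iam = boto3.client(
        'iam',
        aws_access_key_id=creds['AccessKeyId'],
        aws_secret_access_key=creds['SecretAccessKey'],
        aws_session_token=creds['SessionToken'],
    )
    # Remediation: require at least one uppercase letter in IAM passwords
    iam.update_account_password_policy(RequireUppercaseCharacters=True)

    # Update the finding in the master account: add an audit note and mark it resolved
    boto3.client('securityhub').batch_update_findings(
        FindingIdentifiers=[{'Id': finding['Id'], 'ProductArn': finding['ProductArn']}],
        Note={'Text': 'Password policy updated by automated remediation', 'UpdatedBy': 'SecurityHubRemediation'},
        Workflow={'Status': 'RESOLVED'},
    )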

Here are the steps to deploy the solution from this GitHub project.

  • In the Security Hub master account, you deploy the AWS CloudFormation template, which creates an AWS Service Catalog product along with some other resources. You can find the full set of resources deployed by the stack in the Resources section of the deployed AWS CloudFormation stack. The solution uses AWS Service Catalog to make the remediations available as a product that can be deployed after you grant users the required permissions to launch the product.
  • Add an IAM role that has administrator access to the AWS Service Catalog portfolio.
  • Deploy the CIS playbook from the AWS Service Catalog product list using the IAM role you added in the previous step.
  • Deploy the AWS Security Hub Automated Response and Remediation template in the master account in addition to the member accounts. This template establishes AssumeRole permissions to allow the playbook Lambda functions to perform remediations. Use AWS CloudFormation StackSets in the master account to have a centralized deployment approach across the master account and multiple member accounts.

Deployment steps for automated response and remediation

This section reviews the steps to implement the solution, including screenshots of the solution launched from an AWS account.

Launch AWS CloudFormation stack on the master account

As part of this AWS CloudFormation stack deployment, you create custom actions to configure Security Hub to send findings to CloudWatch Events. Lambda functions are used to provide remediation in response to actions sent to CloudWatch Events.

Note: In this solution, you create custom actions for the CIS standards. There will be more custom actions added for other security standards in the future.

To launch the AWS CloudFormation stack

  1. Deploy the AWS CloudFormation template in the Security Hub master account. In the AWS Management Console, select CloudFormation, choose Create stack, and enter the S3 URL of the template.
  2. Select Next to move to the Specify stack details tab, and then enter a Stack name as shown in Figure 2. In this example, I named the stack SO0111-SHARR, but you can use any name you want.
     
    Figure 2: Creating a CloudFormation stack

    Figure 2: Creating a CloudFormation stack

  3. Creating the stack launches the deployment, which creates 21 new resources using AWS CloudFormation, as shown in Figure 3.
     
    Figure 3: Resources launched with AWS CloudFormation

    Figure 3: Resources launched with AWS CloudFormation

  4. An Amazon SNS topic is automatically created from the AWS CloudFormation stack.
  5. When you create a subscription, you’re prompted to enter an endpoint for receiving email notifications from Amazon SNS, as shown in Figure 4. To subscribe to the topic that CloudFormation created, you must confirm the subscription from the email address you used to receive notifications.
     
    Figure 4: Subscribing to Amazon SNS topic

    Figure 4: Subscribing to Amazon SNS topic

Enable Security Hub

You should already have enabled Security Hub and AWS Config services on your master account and the associated member accounts. If you haven’t, you can refer to the documentation for setting up Security Hub on your master and member accounts. Figure 5 shows an AWS account that doesn’t have Security Hub enabled.
 

Figure 5: Enabling Security Hub for first time

Figure 5: Enabling Security Hub for first time

AWS Service Catalog product deployment

In this section, you use the AWS Service Catalog to deploy Service Catalog products.

To use the AWS Service Catalog for product deployment

  1. In the same master account, add roles that have administrator access and can deploy AWS Service Catalog products. To do this, from Services in the AWS Management Console, choose AWS Service Catalog. In AWS Service Catalog, select Administration, and then navigate to Portfolio details and select Groups, roles, and users as shown in Figure 6.
     
    Figure 6: AWS Service Catalog product

    Figure 6: AWS Service Catalog product

  2. After adding the role, you can see the products available for that role. You can switch roles on the console to assume the role that you granted access to for the product you added from the AWS Service Catalog. Select the three dots near the product name, and then select Launch product to launch the product, as shown in Figure 7.
     
    Figure 7: Launch the product

    Figure 7: Launch the product

  3. While launching the product, you can choose from the parameters to either enable or disable the automated remediation. Even if you do not enable fully automated remediation, you can still invoke a remediation action in the Security Hub console using a custom action. By default, it’s disabled, as highlighted in Figure 8.
     
    Figure 8: Enable or disable automated remediation

    Figure 8: Enable or disable automated remediation

  4. After launching the product, it can take from 3 to 5 minutes to deploy. When the product is deployed, it creates a new CloudFormation stack with a status of CREATE_COMPLETE as part of the provisioned product in the AWS CloudFormation console.

AssumeRole Lambda functions

Deploy the template that establishes AssumeRole permissions to allow the playbook Lambda functions to perform remediations. You must deploy this template in the master account in addition to any member accounts. Choose CloudFormation and create a new stack. In Specify stack details, go to Parameters and specify the Master account number as shown in Figure 9.
 

Figure 9: Deploy AssumeRole Lambda function

Figure 9: Deploy AssumeRole Lambda function

Test the automated remediation

Now that you’ve completed the steps to deploy the solution, you can test it to be sure that it works as expected.

To test the automated remediation

  1. To test the solution, verify that there are 10 actions listed on the Custom actions tab in the Security Hub master account. From the Security Hub master account, open the Security Hub console, select Settings, and then select Custom actions. You should see 10 actions, as shown in Figure 10.
     
    Figure 10: Custom actions deployed

    Figure 10: Custom actions deployed

  2. Make sure you have member accounts available for testing the solution. If not, you can add member accounts to the master account as described in Adding and inviting member accounts.
  3. For testing purposes, you can use CIS control 1.5, which requires that the IAM password policy require at least one uppercase letter. Check the existing settings by navigating to IAM, and then to Account Settings. Under Password policy, you should see that there is no password policy set, as shown in Figure 11.
     
    Figure 11: Password policy not set

    Figure 11: Password policy not set

  4. To check the security settings, go to the Security Hub console and select Security standards. Choose CIS AWS Foundations Benchmark v1.2.0. Select CIS 1.5 from the list to see the Findings. You will see the Status as Failed. This means that the password policy to require at least one uppercase letter hasn’t been applied to either the master or the member account, as shown in Figure 12.
     
    Figure 12: CIS 1.5 finding

    Figure 12: CIS 1.5 finding

  5. From the Actions dropdown at the top right of the Findings section from the previous step, select CIS 1.5 – 1.11. You should see a notification with the heading Successfully sent findings to Amazon CloudWatch Events, as shown in Figure 13.
     
    Figure 13: Sending findings to CloudWatch Events

    Figure 13: Sending findings to CloudWatch Events

  6. Return to Findings by selecting Security standards and then choosing CIS AWS Foundations Benchmark v1.2.0. Select CIS 1.5 to review Findings and verify that the Workflow status of CIS 1.5 is RESOLVED, as shown in Figure 14.
     
    Figure 14: Resolved findings

    Figure 14: Resolved findings

  7. After the remediation runs, you can verify that the Password policy is set on the master and the member accounts. To verify that the password policy is set, navigate to IAM, and then to Account Settings. Under Password policy, you should see that the account uses a password policy, as shown in Figure 15.
     
    Figure 15: Password policy set

    Figure 15: Password policy set

  8. To check the CloudWatch logs for the Lambda function, go to Services in the console, select Lambda, choose the Lambda function, and then select View logs in CloudWatch. You can see the details of the function being run, including updating the password policy on both the master account and the member account, as shown in Figure 16.
     
    Figure 16: Lambda function log

    Figure 16: Lambda function log

Conclusion

In this post, you deployed the AWS Solution for Security Hub Automated Response and Remediation using Lambda and CloudWatch Events rules to remediate non-compliant CIS-related controls. With this solution, you can ensure that users in member accounts stay compliant with the CIS AWS Foundations Benchmark by automatically invoking guardrails whenever services move out of compliance. New or updated playbooks will be added to the existing AWS Service Catalog portfolio as they’re developed. You can choose when to take advantage of these new or updated playbooks.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS Security Hub forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Ramesh Venkataraman

Ramesh is a Solutions Architect who enjoys working with customers to solve their technical challenges using AWS services. Outside of work, Ramesh enjoys following stack overflow questions and answers them in any way he can.

Tracking the latest server images in Amazon EC2 Image Builder pipelines

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/tracking-the-latest-server-images-in-amazon-ec2-image-builder-pipelines/

This post courtesy of Anoop Rachamadugu, Cloud Architect at AWS

The Amazon EC2 Image Builder service helps users to build and maintain server images. The images created by EC2 Image Builder can be used with Amazon Elastic Compute Cloud (EC2) and on-premises. Image Builder reduces the effort of keeping images up-to-date and secure by providing a graphical interface, built-in automation, and AWS-provided security settings. Customers have told us that they manage multiple server images and are looking for ways to track the latest server images created by the pipelines.

In this blog post, I walk through a solution that uses AWS Lambda and AWS Systems Manager (SSM) Parameter Store. It tracks and updates the latest Amazon Machine Image (AMI) IDs every time an Image Builder pipeline is run. With Lambda, you pay only for what you use. You are charged based on the number of requests for your functions and the time it takes for your code to run. In this case, the Lambda function is invoked upon the completion of the image builder pipeline. Standard SSM parameters are available at no additional charge.

Users can reference the SSM parameters in automation scripts and AWS CloudFormation templates providing access to the latest AMI ID for your EC2 infrastructure. Consider the use case of updating Amazon Machine Image (AMI) IDs for the EC2 instances in your CloudFormation templates. Normally, you might map AMI IDs to specific instance types and Regions. Then to update these, you would manually change them in each of your templates. With the SSM parameter integration, your code remains untouched and a CloudFormation stack update operation automatically fetches the latest Parameter Store value.

Overview

This solution uses a Lambda function written in Python that subscribes to an Amazon Simple Notification Service (SNS) topic. The Lambda function and the SNS topic are deployed using AWS SAM CLI. Once deployed, the SNS topic must be configured in an existing Image Builder pipeline. This results in the Lambda function being invoked at the completion of the Image Builder pipeline.

When a Lambda function subscribes to an SNS topic, it is invoked with the payload of the published messages. The Lambda function receives the message payload as an input parameter. The Lambda function first checks the message payload to see if the image status is available. If the image state is available, it retrieves the AMI ID from the message payload and updates the SSM parameter.
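The full function is in the repository referenced below; the following is a condensed sketch of the same idea. The notification payload field names and the parameter path are assumptions to verify against your own pipeline’s messages.

import json
import boto3

ssm = boto3.client('ssm')
SSM_PARAMETER_NAME = '/ec2-imagebuilder/latest'

def lambda_handler(event, context):
    # The Image Builder pipeline publishes its result to SNS; Lambda receives it here.
    message = json.loads(event['Records'][0]['Sns']['Message'])

    # Only act on successfully built images (field names assumed from the notification payload).
    if message.get('state', {}).get('status') != 'AVAILABLE':
        return

    ami_id = message['outputResources']['amis'][0]['image']

    ssm.put_parameter(
        Name=SSM_PARAMETER_NAME,
        Value=ami_id,
        Type='String',
        Overwrite=True,
    )
    ssm.add_tags_to_resource(
        ResourceType='Parameter',
        ResourceId=SSM_PARAMETER_NAME,
        Tags=[{'Key': 'Source', 'Value': 'EC2 Image Builder'}],
    )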

EC2 Image builder architecture diagram

EC2 Image builder architecture diagram

Prerequisites

To get started with this solution, the following is required:

Deploying the solution

The solution consists of two files, which can be downloaded from the amazon-ec2-image-builder GitHub repository.

  1. The Python file image-builder-lambda-update-ssm.py contains the code for the Lambda function. It first checks the SNS message payload to determine if the image is available. If it’s available, it extracts the AMI ID from the SNS message payload and updates the SSM parameter specified. The ssm_parameter_name variable specifies the SSM parameter path where the AMI ID should be stored and updated. The Lambda function finishes by adding tags to the SSM parameter.
  2. The template.yaml file is an AWS SAM template. It deploys the Lambda function, SNS topic, and IAM role required for the Lambda function. I use Python 3.7 as the runtime and assign 256 MB of memory to the Lambda function. The IAM policy gives the Lambda function permissions to retrieve and update SSM parameters. Deploy this application using the AWS SAM CLI guided deploy:
    sam deploy --guided

After deploying the application, note the ARN of the created SNS topic. Next, update the infrastructure settings of an existing Image Builder pipeline with this newly created SNS topic. This results in the Lambda function being invoked upon the completion of the image builder pipeline.

Configuration details

Configuration details

Verifying the solution

After the completion of the image builder pipeline, use the AWS CLI or check the AWS Management Console to verify the updated SSM parameter. To verify via AWS CLI, run the following commands to retrieve and list the tags attached to the SSM parameter:

aws ssm get-parameter --name '/ec2-imagebuilder/latest'
aws ssm list-tags-for-resource --resource-type "Parameter" --resource-id '/ec2-imagebuilder/latest'

To verify via the AWS Management Console, navigate to the Parameter Store under AWS Systems Manager. Search for the parameter /ec2-imagebuilder/latest:

AWS Systems Manager: Parameter Store

AWS Systems Manager: Parameter Store

Select the Tags tab to view the tags attached to the SSM parameter:

Image builder tags list

Image builder tags list

Referencing the SSM Parameter in CloudFormation templates

Users can reference the SSM parameters in automation scripts and AWS CloudFormation templates providing access to the latest AMI ID for your EC2 infrastructure. This sample code shows how to reference the SSM parameter in a CloudFormation template.

Parameters :
  LatestAmiId :
    Type : 'AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>'
    Default: '/ec2-imagebuilder/latest'

Resources :
  Instance :
    Type : 'AWS::EC2::Instance'
    Properties :
      ImageId : !Ref LatestAmiId

Conclusion

In this blog post, I demonstrate a solution that allows users to track and update the latest AMI ID created by the Image Builder pipelines. The Lambda function retrieves the AMI ID of the image created by a pipeline and updates an AWS Systems Manager parameter. This Lambda function is triggered via an SNS topic configured in an Image Builder pipeline.

The solution is deployed using AWS SAM CLI. I also note how users can reference Systems Manager parameters in AWS CloudFormation templates providing access to the latest AMI ID for your EC2 infrastructure.

The amazon-ec2-image-builder-samples GitHub repository provides a number of examples for getting started with EC2 Image Builder. Image Builder can make it easier for you to build virtual machine (VM) images.

Getting started with DevOps automation

Post Syndicated from Jared Murrell original https://github.blog/2020-10-29-getting-started-with-devops-automation/

This is the second post in our series on DevOps fundamentals. For a guide to what DevOps is and answers to common DevOps myths, check out part one.

What role does automation play in DevOps?

First things first—automation is one of the key principles for accelerating with DevOps. As noted in my last blog post, it enables consistency, reliability, and efficiency within the organization, making it easier for teams to discover and troubleshoot problems. 

However, as we’ve worked with organizations, we’ve found not everyone knows where to get started, or which processes can and should be automated. In this post, we’ll discuss a few best practices and insights to get teams moving in the right direction.

A few helpful guidelines

The path to DevOps automation is continually evolving. Before we dive into best practices, there are a few common guidelines to keep in mind as you’re deciding what and how you automate. 

  • Choose open standards. Your contributors and team may change, but that doesn’t mean your tooling has to. By maintaining tooling that follows common, open standards, you can simplify onboarding and save time on specialized training. Community-driven standards for packaging, runtime, configuration, and even networking and storage—like those found in Kubernetes—also become even more important as DevOps and deployments move toward the cloud.
  • Use dynamic variables. Prioritizing reusable code will reduce the amount of rework and duplication you have, both now and in the future. Whether in scripts or specialized tools, securely using externally-defined variables is an easy way to apply your automation to different environments without needing to change the code itself (see the short sketch after this list).
  • Use flexible tooling you can take with you. It’s not always possible to find a tool that fits every situation, but using a DevOps tool that allows you to change technologies also helps reduce rework when companies change direction. By choosing a solution with a wide ecosystem of partner integrations that works with any cloud, you’ll be able to  define your unique set of best practices and reach your goals—without being restricted by your toolchain.
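To make the dynamic-variables guideline concrete, here is a small illustrative Python sketch that pulls environment-specific values from externally defined environment variables, so the same script runs unchanged against staging or production. The variable names are placeholders.

import os

# Externally defined variables (for example, set by your CI/CD system or a secrets store).
TARGET_ENV = os.environ.get("DEPLOY_ENV", "staging")  # placeholder names
API_ENDPOINT = os.environ["API_ENDPOINT"]             # required, no default
RELEASE_TAG = os.environ.get("RELEASE_TAG", "latest")

def deploy():
    # The deployment logic stays identical across environments;
    # only the externally supplied values change.
    print(f"Deploying {RELEASE_TAG} to {TARGET_ENV} via {API_ENDPOINT}")

if __name__ == "__main__":
    deploy()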

DevOps automation best practices

Now that our guidelines are in place, we can evaluate which sets of processes we need to automate. We’ve broken some best practices for DevOps automation into four categories to help you get started. 

1. Continuous integration, continuous delivery, and continuous deployment

We often think of the term “DevOps” as being synonymous with “CI/CD”. At GitHub we recognize that DevOps includes so much more, from enabling contributors to build and run code (or deploy configurations) to improving developer productivity. In turn, this shortens the time it takes to build and deliver applications, helping teams add value and learn faster. While CI/CD and DevOps aren’t precisely the same, CI/CD is still a core component of DevOps automation.

  • Continuous integration (CI) is a process that implements testing on every change, enabling users to see if their changes break anything in the environment. 
  • Continuous delivery (CD) is the practice of building software in a way that allows you to deploy any successful release candidate to production at any time.
  • Continuous deployment (CD) takes continuous delivery a step further. With continuous deployment, every successful change is automatically deployed to production. Since some industries and technologies can’t immediately release new changes to customers (think hardware and manufacturing), adopting continuous deployment depends on your organization and product.

Together, continuous integration and continuous delivery (commonly referred to as CI/CD) create a collaborative process for people to work on projects through shared ownership. At the same time, teams can maintain quality control through automation and bring new features to users with continuous deployment. 

2. Change management

Change management is often a critical part of business processes. Like the automation guidelines, there are some common principles and tooling that development and operations teams can use to create consistency.  

  • Version control: The practice of using version control has a long history rooted in helping people revert changes and learn from past decisions. From RCS to SVN, CVS to Perforce, ClearCase to Git, version control is a staple for enabling teams to collaborate by providing a common workflow and code base for individuals to work with. 
  • Change control: Along with maintaining your code’s version history, having a system in place to coordinate and facilitate changes helps to maintain product direction, reduces the probability of harmful changes to your code, and encourages a collaborative process.
  • Configuration management: Configuration management makes it easier for everyone to manage complex deployments through templates and manage changes at scale with proper controls and approvals.

3. ‘X’ as code

By now, you also may have heard of “infrastructure as code,” “configuration as code,” “policy as code,” or some of the other “as code” models. These models provide a declarative framework for managing different aspects of your operating environments through high-level abstractions. Stated another way, you provide variables to a tool and the output is always the same, allowing you to recreate your resources consistently. DevOps implements the “as code” principle with several goals: an auditable change trail for compliance, a collaborative change process via version control, a consistent, testable, and reliable way of deploying resources, and a lower learning curve for new team members. A minimal infrastructure-as-code sketch follows the list below.

  • Infrastructure as code (IaC) provides a declarative model for creating immutable infrastructure using the same versioning and workflow that developers use for source code. As changes are introduced to your infrastructure requirements, new infrastructure is defined, tested, and deployed with new configurations through automated declarative pipelines.
  • Platform as code (PaC) provides a declarative model for services similar to how infrastructure as code provides a framework for recreating the same infrastructure—allowing you to rapidly deploy services to existing infrastructure with high-level abstractions.
  • Configuration as code (CaC) brings the next level of declarative pipelining by defining the configuration of your applications as versioned resources.
  • Policy as code brings versioning and the DevOps workflow to security and policy management. 
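As a minimal illustration of the infrastructure-as-code model, the sketch below uses the AWS CDK (v1) in Python to declare a versioned, encrypted S3 bucket; running the same code always converges on the same resource, and changes flow through version control like any other code. The tool and resource here are just one example of the pattern.

from aws_cdk import core
from aws_cdk import aws_s3 as s3

class LogStorageStack(core.Stack):
    # Declares the desired state; running "cdk deploy" converges the account on it.
    def __init__(self, scope: core.Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        s3.Bucket(
            self,
            "AuditLogBucket",
            versioned=True,
            encryption=s3.BucketEncryption.S3_MANAGED,
        )

app = core.App()
LogStorageStack(app, "LogStorageStack")
app.synth()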

4. Continuous monitoring

Operational insights are an invaluable component of any production environment. In order to understand the behaviors of your software in production, you need to have information about how it operates. Continuous monitoring—the processes and technology that monitor performance and stability of applications and infrastructure throughout the software lifecycle—provides operations teams with data to help troubleshoot, and development teams the information needed to debug and patch. This also leads into an important aspect of security, where DevSecOps takes on these principles with a security focus. Choosing the right monitoring tools can be the difference between a slight service interruption and a major outage. When it comes to gaining operational insights, there are some important considerations: 

  • Logging gives you a continuous stream of data about your business’ critical components. Application logs, infrastructure logs, and audit logs all provide important data that helps teams learn and improve products.
  • Monitoring provides a level of intelligence and interpretation to the raw data provided in logs and metrics. With advanced tooling, monitoring can provide teams with correlated insights beyond what the raw data provides.
  • Alerting provides proactive notifications to respective teams to help them stay ahead of major issues. When effectively implemented, these alerts not only let you know when something has gone wrong, but can also provide teams with critical debugging information to help solve the problem quickly.
  • Tracing takes logging a step further, providing a deeper level of application performance and behavioral insights that can greatly impact the stability and scalability of applications in production environments.

Putting DevOps automation into action

At this point, we’ve talked much about automation in the DevOps space, so is DevOps all about automation? Put simply, no. Automation is an important means to accomplishing this work efficiently between teams. Whether you’re new to DevOps or migrating from another set of automation solutions, testing new tooling with a small project or process is a great place to start. It will lay the foundation for scaling and standardizing automation across your entire organization, including how to measure effectiveness and progression toward your goals. 

Regardless of which toolset you choose to automate your DevOps workflow, evaluating your teams’ current workflows and the information you need to do your work will help guide you to your tool and platform selection, and set the stage for success. Here are a few more resources to help you along the way:

Want to see what DevOps automation looks like in practice? See how engineers at Wiley build faster and more securely with GitHub Actions.

Building, bundling, and deploying applications with the AWS CDK

Post Syndicated from Cory Hall original https://aws.amazon.com/blogs/devops/building-apps-with-aws-cdk/

The AWS Cloud Development Kit (AWS CDK) is an open-source software development framework to model and provision your cloud application resources using familiar programming languages.

The post CDK Pipelines: Continuous delivery for AWS CDK applications showed how you can use CDK Pipelines to deploy a TypeScript-based AWS Lambda function. In that post, you learned how to add additional build commands to the pipeline to compile the TypeScript code to JavaScript, which is needed to create the Lambda deployment package.

In this post, we dive deeper into how you can perform these build commands as part of your AWS CDK build process by using the native AWS CDK bundling functionality.

If you’re working with Python, TypeScript, or JavaScript-based Lambda functions, you may already be familiar with the PythonFunction and NodejsFunction constructs, which use the bundling functionality. This post describes how to write your own bundling logic for instances where a higher-level construct either doesn’t already exist or doesn’t meet your needs. To illustrate this, I walk through two different examples: a Lambda function written in Golang and a static site created with Nuxt.js.

Concepts

A typical CI/CD pipeline contains steps to build and compile your source code, bundle it into a deployable artifact, push it to artifact stores, and deploy to an environment. In this post, we focus on the building, compiling, and bundling stages of the pipeline.

The AWS CDK has the concept of bundling source code into a deployable artifact. As of this writing, this works for two main types of assets: Docker images published to Amazon Elastic Container Registry (Amazon ECR) and files published to Amazon Simple Storage Service (Amazon S3). For files published to Amazon S3, this can be as simple as pointing to a local file or directory, which the AWS CDK uploads to Amazon S3 for you.

When you build an AWS CDK application (by running cdk synth), a cloud assembly is produced. The cloud assembly consists of a set of files and directories that define your deployable AWS CDK application. In the context of the AWS CDK, it might include the following:

  • AWS CloudFormation templates and instructions on where to deploy them
  • Dockerfiles, corresponding application source code, and information about where to build and push the images to
  • File assets and information about which S3 buckets to upload the files to

Use case

For this use case, our application consists of front-end and backend components. The example code is available in the GitHub repo. In the repository, I have split the example into two separate AWS CDK applications. The repo also contains the Golang Lambda example app and the Nuxt.js static site.

Golang Lambda function

To create a Golang-based Lambda function, you must first create a Lambda function deployment package. For Go, this consists of a .zip file containing a Go executable. Because we don’t commit the Go executable to our source repository, our CI/CD pipeline must perform the necessary steps to create it.

In the context of the AWS CDK, when we create a Lambda function, we have to tell the AWS CDK where to find the deployment package. See the following code:

new lambda.Function(this, 'MyGoFunction', {
  runtime: lambda.Runtime.GO_1_X,
  handler: 'main',
  code: lambda.Code.fromAsset(path.join(__dirname, 'folder-containing-go-executable')),
});

In the preceding code, the lambda.Code.fromAsset() method tells the AWS CDK where to find the Golang executable. When we run cdk synth, it stages this Go executable in the cloud assembly, which it zips and publishes to Amazon S3 as part of the PublishAssets stage.

If we’re running the AWS CDK as part of a CI/CD pipeline, this executable doesn’t exist yet, so how do we create it? One method is CDK bundling. The lambda.Code.fromAsset() method takes a second optional argument, AssetOptions, which contains the bundling parameter. With this bundling parameter, we can tell the AWS CDK to perform steps prior to staging the files in the cloud assembly.

Breaking down the BundlingOptions parameter further, we can perform the build inside a Docker container or locally.

Building inside a Docker container

For this to work, we need to make sure that we have Docker running on our build machine. In AWS CodeBuild, this means setting privileged: true. See the following code:

new lambda.Function(this, 'MyGoFunction', {
  code: lambda.Code.fromAsset(path.join(__dirname, 'folder-containing-source-code'), {
    bundling: {
      image: lambda.Runtime.GO_1_X.bundlingDockerImage,
      command: [
        'bash', '-c', [
          'go test -v',
          'GOOS=linux go build -o /asset-output/main',
        ].join(' && '),
      ],
    },
  }),
  ...
});

We specify two parameters:

  • image (required) – The Docker image to perform the build commands in
  • command (optional) – The command to run within the container

The AWS CDK mounts the folder specified as the first argument to fromAsset at /asset-input inside the container, and mounts the asset output directory (where the cloud assembly is staged) at /asset-output inside the container.

After we perform the build commands, we need to make sure we copy the Golang executable to the /asset-output location (or specify it as the build output location like in the preceding example).

This is the equivalent of running something like the following code:

docker run \
  --rm \
  -v folder-containing-source-code:/asset-input \
  -v cdk.out/asset.1234a4b5/:/asset-output \
  lambci/lambda:build-go1.x \
  bash -c 'GOOS=linux go build -o /asset-output/main'

Building locally

To build locally (not in a Docker container), we have to provide the local parameter. See the following code:

new lambda.Function(this, 'MyGoFunction', {
  code: lambda.Code.fromAsset(path.join(__dirname, 'folder-containing-source-code'), {
    bundling: {
      image: lambda.Runtime.GO_1_X.bundlingDockerImage,
      command: [],
      local: {
        tryBundle(outputDir: string) {
          try {
            spawnSync('go version')
          } catch {
            return false
          }

          spawnSync(`GOOS=linux go build -o ${path.join(outputDir, 'main')}`);
          return true
        },
      },
    },
  })
  ...
});

The local parameter must implement the ILocalBundling interface. The tryBundle method is passed the asset output directory, and expects you to return a boolean (true or false). If you return true, the AWS CDK doesn’t try to perform Docker bundling. If you return false, it falls back to Docker bundling. Just like with Docker bundling, you must make sure that you place the Go executable in the outputDir.

Typically, you should perform some validation steps to ensure that you have the required dependencies installed locally to perform the build. This could be checking to see if you have go installed, or checking a specific version of go. This can be useful if you don’t have control over what type of build environment this might run in (for example, if you’re building a construct to be consumed by others).

If we run cdk synth on this, we see a new message telling us that the AWS CDK is bundling the asset. If we include additional commands like go test, we also see the output of those commands. This is especially useful if you want to fail a build when tests fail. See the following code:

$ cdk synth
Bundling asset GolangLambdaStack/MyGoFunction/Code/Stage...
✓  . (9ms)
✓  clients (5ms)

DONE 8 tests in 11.476s
✓  clients (5ms) (coverage: 84.6% of statements)
✓  . (6ms) (coverage: 78.4% of statements)

DONE 8 tests in 2.464s

Cloud Assembly

Now examine the cloud assembly that was generated (located at cdk.out).

It contains our GolangLambdaStack CloudFormation template that defines our Lambda function, as well as our Golang executable, bundled at asset.01cf34ff646d380829dc4f2f6fc93995b13277bde7db81c24ac8500a83a06952/main.

Let’s look at how the AWS CDK uses this information. The GolangLambdaStack.assets.json file contains all the information necessary for the AWS CDK to know where and how to publish our assets (in this use case, our Golang Lambda executable). See the following code:

{
  "version": "5.0.0",
  "files": {
    "01cf34ff646d380829dc4f2f6fc93995b13277bde7db81c24ac8500a83a06952": {
      "source": {
        "path": "asset.01cf34ff646d380829dc4f2f6fc93995b13277bde7db81c24ac8500a83a06952",
        "packaging": "zip"
      },
      "destinations": {
        "current_account-current_region": {
          "bucketName": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}",
          "objectKey": "01cf34ff646d380829dc4f2f6fc93995b13277bde7db81c24ac8500a83a06952.zip",
          "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-file-publishing-role-${AWS::AccountId}-${AWS::Region}"
        }
      }
    }
  }
}

The file contains information about where to find the source files (source.path) and what type of packaging (source.packaging). It also tells the AWS CDK where to publish this .zip file (bucketName and objectKey) and what AWS Identity and Access Management (IAM) role to use (assumeRoleArn). In this use case, we only deploy to a single account and Region, but if you have multiple accounts or Regions, you see multiple destinations in this file.

The GolangLambdaStack.template.json file that defines our Lambda resource looks something like the following code:

{
  "Resources": {
    "MyGoFunction0AB33E85": {
      "Type": "AWS::Lambda::Function",
      "Properties": {
        "Code": {
          "S3Bucket": {
            "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}"
          },
          "S3Key": "01cf34ff646d380829dc4f2f6fc93995b13277bde7db81c24ac8500a83a06952.zip"
        },
        "Handler": "main",
        ...
      }
    },
    ...
  }
}

The S3Bucket and S3Key match the bucketName and objectKey from the assets.json file. By default, the S3Key is generated by calculating a hash of the contents of the folder that you pass to lambda.Code.fromAsset() (for this post, folder-containing-source-code). This means that any time we update our source code, this calculated hash changes and a new Lambda function deployment is triggered.
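The idea behind the asset hash can be illustrated with a few lines of Python: fingerprint every file in the source folder, and any change to the contents produces a different key. This is only a conceptual sketch; the AWS CDK's actual fingerprinting algorithm differs in its details.

import hashlib
from pathlib import Path

def asset_fingerprint(folder: str) -> str:
    # Hash the relative paths and contents of every file under the folder.
    digest = hashlib.sha256()
    root = Path(folder)
    for path in sorted(root.rglob('*')):
        if path.is_file():
            digest.update(str(path.relative_to(root)).encode())
            digest.update(path.read_bytes())
    return digest.hexdigest()

# Any edit to the Go source changes the fingerprint, which changes the S3 key
# and therefore triggers a new Lambda function deployment.
print(asset_fingerprint('folder-containing-source-code'))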

Nuxt.js static site

In this section, I walk through building a static site using the Nuxt.js framework. You can apply the same logic to any static site framework that requires you to run a build step prior to deploying.

To deploy this static site, we use the BucketDeployment construct. This is a construct that allows you to populate an S3 bucket with the contents of .zip files from other S3 buckets or from a local disk.

Typically, we simply tell the BucketDeployment construct where to find the files that it needs to deploy to the S3 bucket. See the following code:

new s3_deployment.BucketDeployment(this, 'DeployMySite', {
  sources: [
    s3_deployment.Source.asset(path.join(__dirname, 'path-to-directory')),
  ],
  destinationBucket: myBucket
});

To deploy a static site built with a framework like Nuxt.js, we need to first run a build step to compile the site into something that can be deployed. For Nuxt.js, we run the following two commands:

  • yarn install – Installs all our dependencies
  • yarn generate – Builds the application and generates every route as an HTML file (used for static hosting)

This creates a dist directory, which you can deploy to Amazon S3.

Just like with the Golang Lambda example, we can perform these steps as part of the AWS CDK through either local or Docker bundling.

Building inside a Docker container

To build inside a Docker container, use the following code:

new s3_deployment.BucketDeployment(this, 'DeployMySite', {
  sources: [
    s3_deployment.Source.asset(path.join(__dirname, 'path-to-nuxtjs-project'), {
      bundling: {
        image: cdk.BundlingDockerImage.fromRegistry('node:lts'),
        command: [
          'bash', '-c', [
            'yarn install',
            'yarn generate',
            'cp -r /asset-input/dist/* /asset-output/',
          ].join(' && '),
        ],
      },
    }),
  ],
  ...
});

For this post, we build inside the publicly available node:lts image hosted on DockerHub. Inside the container, we run our build commands yarn install && yarn generate, and copy the generated dist directory to our output directory (the cloud assembly).

The parameters are the same as described in the Golang example we walked through earlier.

Building locally

To build locally, use the following code:

new s3_deployment.BucketDeployment(this, 'DeployMySite', {
  sources: [
    s3_deployment.Source.asset(path.join(__dirname, 'path-to-nuxtjs-project'), {
      bundling: {
        local: {
          tryBundle(outputDir: string) {
            try {
              spawnSync('yarn --version');
            } catch {
              return false
            }

            spawnSync('yarn install && yarn generate');

            fs.copySync(path.join(__dirname, 'path-to-nuxtjs-project', 'dist'), outputDir);
            return true
          },
        },
        image: cdk.BundlingDockerImage.fromRegistry('node:lts'),
        command: [],
      },
    }),
  ],
  ...
});

Building locally works the same as the Golang example we walked through earlier, with one exception. We have one additional command to run that copies the generated dist folder to our output directory (cloud assembly).

Conclusion

This post showed how you can easily compile your backend and front-end applications using the AWS CDK. You can find the example code for this post in this GitHub repo. If you have any questions or comments, please comment on the GitHub repo. If you have any additional examples you want to add, we encourage you to create a Pull Request with your example!

Our code also contains examples of deploying the applications using CDK Pipelines, so if you’re interested in deploying the example yourself, check out the example repo.

 

About the author

Cory Hall

Cory is a Solutions Architect at Amazon Web Services with a passion for DevOps and is based in Charlotte, NC. Cory works with enterprise AWS customers to help them design, deploy, and scale applications to achieve their business goals.

Complete CI/CD with AWS CodeCommit, AWS CodeBuild, AWS CodeDeploy, and AWS CodePipeline

Post Syndicated from Nitin Verma original https://aws.amazon.com/blogs/devops/complete-ci-cd-with-aws-codecommit-aws-codebuild-aws-codedeploy-and-aws-codepipeline/

Many organizations have been shifting to DevOps practices, which is the combination of cultural philosophies, practices, and tools that increases your organization’s ability to deliver applications and services at high velocity; for example, evolving and improving products at a faster pace than organizations using traditional software development and infrastructure management processes.

DevOps-Feedback-Flow

An integral part of DevOps is adopting the culture of continuous integration and continuous delivery/deployment (CI/CD), where a commit or change to code passes through various automated stage gates, all the way from building and testing to deploying applications, from development to production environments.

This post uses the AWS suite of CI/CD services to compile, build, and install a version-controlled Java application onto a set of Amazon Elastic Compute Cloud (Amazon EC2) Linux instances via a fully automated and secure pipeline. The goal is to promote a code commit or change to pass through various automated stage gates all the way from development to production environments, across AWS accounts.

AWS services

This solution uses the following AWS services:

  • AWS CodeCommit – A fully-managed source control service that hosts secure Git-based repositories. CodeCommit makes it easy for teams to collaborate on code in a secure and highly scalable ecosystem. This solution uses CodeCommit to create a repository to store the application and deployment codes.
  • AWS CodeBuild – A fully managed continuous integration service that compiles source code, runs tests, and produces software packages that are ready to deploy, on a dynamically created build server. This solution uses CodeBuild to build and test the code, which we deploy later.
  • AWS CodeDeploy – A fully managed deployment service that automates software deployments to a variety of compute services such as Amazon EC2, AWS Fargate, AWS Lambda, and your on-premises servers. This solution uses CodeDeploy to deploy the code or application onto a set of EC2 instances running CodeDeploy agents.
  • AWS CodePipeline – A fully managed continuous delivery service that helps you automate your release pipelines for fast and reliable application and infrastructure updates. This solution uses CodePipeline to create an end-to-end pipeline that fetches the application code from CodeCommit, builds and tests using CodeBuild, and finally deploys using CodeDeploy.
  • Amazon CloudWatch Events – A CloudWatch Events rule is created to trigger the pipeline on a Git commit to the CodeCommit repository (a sketch of such a rule follows this list).
  • Amazon Simple Storage Service (Amazon S3) – An object storage service that offers industry-leading scalability, data availability, security, and performance. This solution uses an S3 bucket to store the build and deployment artifacts created during the pipeline run.
  • AWS Key Management Service (AWS KMS) – AWS KMS makes it easy for you to create and manage cryptographic keys and control their use across a wide range of AWS services and in your applications. This solution uses AWS KMS to make sure that the build and deployment artifacts stored on the S3 bucket are encrypted at rest.
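The CloudFormation template described later creates the CloudWatch Events rule for you. Purely as an illustration, a rule that starts the pipeline on a push to the master branch could be created with boto3 as follows; the rule name is hypothetical, and the ARNs reuse this post’s sample values.

import json
import boto3

events = boto3.client('events')

# Match CodeCommit pushes to the master branch of MyWebAppRepo
pattern = {
    "source": ["aws.codecommit"],
    "detail-type": ["CodeCommit Repository State Change"],
    "resources": ["arn:aws:codecommit:us-east-1:111111111111:MyWebAppRepo"],
    "detail": {
        "event": ["referenceCreated", "referenceUpdated"],
        "referenceType": ["branch"],
        "referenceName": ["master"]
    }
}

events.put_rule(Name='cicd-start-pipeline', EventPattern=json.dumps(pattern))

# Start the pipeline, using the trigger role described later in this post
events.put_targets(
    Rule='cicd-start-pipeline',
    Targets=[{
        'Id': 'codepipeline',
        'Arn': 'arn:aws:codepipeline:us-east-1:111111111111:MyWebAppPipeline',
        'RoleArn': 'arn:aws:iam::111111111111:role/cicd_codepipeline_trigger_cwe_role'
    }]
)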

Overview of solution

This solution uses two separate AWS accounts: a dev account (111111111111) and a prod account (222222222222) in Region us-east-1.

We use the dev account to deploy and set up the CI/CD pipeline, along with the source code repository. The dev account also builds and tests the code and performs a test deployment.

The prod account is any other account where the application is required to be deployed from the pipeline in the dev account.

In summary, the solution has the following workflow:

  • A change or commit to the code in the CodeCommit application repository triggers CodePipeline with the help of a CloudWatch event.
  • The pipeline downloads the code from the CodeCommit repository, initiates the Build and Test action using CodeBuild, and securely saves the built artifact on the S3 bucket.
  • If the preceding step is successful, the pipeline triggers the Deploy in Dev action using CodeDeploy and deploys the app in the dev account.
  • If successful, the pipeline triggers the Deploy in Prod action using CodeDeploy and deploys the app in the prod account.

The following diagram illustrates the workflow:

cicd-overall-flow

 

Failsafe deployments

This example of CodeDeploy uses the IN_PLACE type of deployment. However, to minimize downtime, CodeDeploy inherently supports multiple deployment strategies. This example makes use of the following features: rolling deployments and automatic rollback.

CodeDeploy provides the following three predefined deployment configurations, to minimize the impact during application upgrades:

  • CodeDeployDefault.OneAtATime – Deploys the application revision to only one instance at a time
  • CodeDeployDefault.HalfAtATime – Deploys to up to half of the instances at a time (with fractions rounded down)
  • CodeDeployDefault.AllAtOnce – Attempts to deploy an application revision to as many instances as possible at once

For OneAtATime and HalfAtATime, CodeDeploy monitors and evaluates instance health during the deployment and only proceeds to the next instance or next half if the previous deployment is healthy. For more information, see Working with deployment configurations in CodeDeploy.

You can also configure a deployment group or deployment to automatically roll back when a deployment fails or when a monitoring threshold you specify is met. In this case, the last known good version of an application revision is automatically redeployed after a failure with the new application version.
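The CloudFormation templates in this post create the deployment groups for you. As a rough boto3 sketch only, a deployment group that combines CodeDeployDefault.OneAtATime with automatic rollback on failure could look like the following; the names and ARNs reuse this post’s sample values.

import boto3

codedeploy = boto3.client('codedeploy')

codedeploy.create_deployment_group(
    applicationName='MyCDWebApp',
    deploymentGroupName='MyCICD-Deployment-Group-Dev',
    deploymentConfigName='CodeDeployDefault.OneAtATime',
    serviceRoleArn='arn:aws:iam::111111111111:role/cicd_codedeploy_service_role',
    # Target the EC2 fleet by tag, as described in the prerequisites
    ec2TagFilters=[{'Key': 'Application', 'Value': 'MyWebApp', 'Type': 'KEY_AND_VALUE'}],
    # Redeploy the last known good revision if the new deployment fails
    autoRollbackConfiguration={'enabled': True, 'events': ['DEPLOYMENT_FAILURE']}
)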

How CodePipeline in the dev account deploys apps in the prod account

In this post, the deployment pipeline using CodePipeline is set up in the dev account, but it has permissions to deploy the application in the prod account. We create a special cross-account role in the prod account, which has the following:

  • Permission to fetch artifacts (the app) from Amazon S3 and deploy them locally in the account using CodeDeploy
  • Trust with the dev account where the pipeline runs

CodePipeline in the dev account assumes this cross-account role in the prod account to deploy the app.
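You create this role as part of the prerequisites (see the Roles-1 table below). Purely as an illustration, its trust relationship could be set up with boto3 as follows; the account IDs are this post’s sample values, and the permissions policy from cicd_codepipeline_cross_ac_policy.json still needs to be attached separately.

import json
import boto3

iam = boto3.client('iam')  # run with credentials in the prod account (222222222222)

# Allow principals in the dev account (where the pipeline runs) to assume this role
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111111111111:root"},
        "Action": "sts:AssumeRole"
    }]
}

iam.create_role(
    RoleName='cicd_codepipeline_cross_ac_role',
    AssumeRolePolicyDocument=json.dumps(trust_policy)
)
# Attach the CodeDeploy and Amazon S3 permissions to this role separately.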

Do I need multiple accounts?
If you answer “yes” to any of the following questions you should consider creating more AWS accounts:

  • Does your business require administrative isolation between workloads? Administrative isolation by account is the most straightforward way to grant independent administrative groups different levels of administrative control over AWS resources based on workload, development lifecycle, business unit (BU), or data sensitivity.
  • Does your business require limited visibility and discoverability of workloads? Accounts provide a natural boundary for visibility and discoverability. Workloads cannot be accessed or viewed unless an administrator of the account enables access to users managed in another account.
  • Does your business require isolation to minimize blast radius? Separate accounts help define boundaries and provide natural blast-radius isolation to limit the impact of a critical event such as a security breach, an unavailable AWS Region or Availability Zone, account suspensions, and so on.
  • Does your business require a particular workload to operate within AWS service limits without impacting the limits of another workload? You can use AWS account service limits to impose restrictions on a business unit, development team, or project. For example, if you create an AWS account for a project group, you can limit the number of Amazon Elastic Compute Cloud (Amazon EC2) or high performance computing (HPC) instances that can be launched by the account.
  • Does your business require strong isolation of recovery or auditing data? If regulatory requirements require you to control access and visibility to auditing data, you can isolate the data in an account separate from the one where you run your workloads (for example, by writing AWS CloudTrail logs to a different account).

Prerequisites

For this walkthrough, you should complete the following prerequisites:

  1. Have access to at least two AWS accounts. For this post, the dev and prod accounts are in us-east-1. You can search and replace the Region and account IDs in all the steps and sample AWS Identity and Access Management (IAM) policies in this post.
  2. Ensure you have EC2 Linux instances with the CodeDeploy agent installed in all the accounts or VPCs where the sample Java application is to be installed (dev and prod accounts).
    • To manually create EC2 instances with CodeDeploy agent, refer Create an Amazon EC2 instance for CodeDeploy (AWS CLI or Amazon EC2 console). Keep in mind the following:
      • CodeDeploy uses EC2 instance tags to identify instances to use to deploy the application, so it’s important to set tags appropriately. For this post, we use the tag name Application with the value MyWebApp to identify instances where the sample app is installed.
      • Make sure to use an EC2 instance profile (AWS Service Role for EC2 instance) with permissions to read the S3 bucket containing artifacts built by CodeBuild. Refer to the IAM role cicd_ec2_instance_profile in the table Roles-1 below for the set of permissions required. You must update this role later with the actual KMS key and S3 bucket name created as part of the deployment process.
    • To create EC2 Linux instances via AWS Cloudformation, download and launch the AWS CloudFormation template from the GitHub repo: cicd-ec2-instance-with-codedeploy.json
      • This deploys an EC2 instance with AWS CodeDeploy agent.
      • Inputs required:
        • AMI: Enter the name of the Linux AMI in your Region. (This template has been tested with the latest Amazon Linux 2 AMI.)
        • Ec2SshKeyPairName: Name of an existing SSH KeyPair
        • Ec2IamInstanceProfile: Name of an existing EC2 instance profile. Note: Use the permissions in the template cicd_ec2_instance_profile_policy.json to create the policy for this EC2 Instance Profile role. You must update this role later with the actual KMS key and S3 bucket name created as part of the deployment process.
        • Update the EC2 instance Tags per your need.
  3. Ensure required IAM permissions. Have an IAM user with an IAM Group or Role that has the following access levels or permissions:

    AWS Service / Components  Access Level Accounts Comments
    AWS CodeCommit Full (admin) Dev Use AWS managed policy AWSCodeCommitFullAccess.
    AWS CodePipeline Full (admin) Dev Use AWS managed policy AWSCodePipelineFullAccess.
    AWS CodeBuild Full (admin) Dev Use AWS managed policy AWSCodeBuildAdminAccess.
    AWS CodeDeploy Full (admin) All Use AWS managed policy AWSCodeDeployFullAccess.

    Create S3 bucket and bucket policies Full (admin) Dev IAM policies can be restricted to specific bucket.
    Create KMS key and policies Full (admin) Dev IAM policies can be restricted to specific KMS key.
    AWS CloudFormation Full (admin) Dev Use AWS managed policy AWSCloudFormationFullAccess.

    Create and pass IAM roles Full (admin) All Ability to create IAM roles and policies can be restricted to specific IAM roles or actions. Also, an admin team with IAM privileges could create all the required roles. Refer to the IAM table Roles-1 below.
    AWS Management Console and AWS CLI As per IAM User permissions All To access suite of Code services.

     

  4. Create Git credentials for CodeCommit in the pipeline account (dev account). AWS allows you to either use Git credentials or associate SSH public keys with your IAM user. For this post, use Git credentials associated with your IAM user (created in the previous step). For instructions on creating a Git user, see Create Git credentials for HTTPS connections to CodeCommit. Download and save the Git credentials to use later for deploying the application.
  5. Create all AWS IAM roles as per the following tables (Roles-1). Make sure to update the following references in all the given IAM roles and policies:
    • Replace the sample dev account (111111111111) and prod account (222222222222) with actual account IDs
    • Replace the S3 bucket mywebapp-codepipeline-bucket-us-east-1-111111111111 with your preferred bucket name.
    • Replace the KMS key ID key/82215457-e360-47fc-87dc-a04681c91ce1 with your KMS key ID.

Table: Roles-1

Service IAM Role Type Account IAM Role Name (used for this post) IAM Role Policy (required for this post) IAM Role Permissions
AWS CodePipeline Service role Dev (111111111111)

cicd_codepipeline_service_role

Select Another AWS Account and use this account as the account ID to create the role.

Later update the trust as follows:
"Principal": {"Service": "codepipeline.amazonaws.com"},

Use the permissions in the template cicd_codepipeline_service_policy.json to create the policy for this role. This CodePipeline service role has appropriate permissions to the following services in a local account:

  • Manage CodeCommit repos
  • Initiate build via CodeBuild
  • Create deployments via CodeDeploy
  • Assume cross-account CodeDeploy role in prod account to deploy the application
AWS CodePipeline IAM role Dev (111111111111)

cicd_codepipeline_trigger_cwe_role

Select Another AWS Account and use this account as the account ID to create the role.

Later update the trust as follows:
"Principal": {"Service": "events.amazonaws.com"},

Use the permissions in the template cicd_codepipeline_trigger_cwe_policy.json to create the policy for this role. CodePipeline uses this role to set a CloudWatch event to trigger the pipeline when there is a change or commit made to the code repository.
AWS CodePipeline IAM role Prod (222222222222)

cicd_codepipeline_cross_ac_role

Choose Another AWS Account and use the dev account as the trusted account ID to create the role.

Use the permissions in the template cicd_codepipeline_cross_ac_policy.json to create the policy for this role. This role is created in the prod account and has permissions to use CodeDeploy and fetch from Amazon S3. The role is assumed by CodePipeline from the dev account to deploy the app in the prod account. Make sure to set up trust with the dev account for this IAM role on the Trust relationships tab.
AWS CodeBuild Service role Dev (111111111111)

cicd_codebuild_service_role

Choose CodeBuild as the use case to create the role.

Use the permissions in the template cicd_codebuild_service_policy.json to create the policy for this role. This CodeBuild service role has appropriate permissions to:

  • The S3 bucket to store artefacts
  • Stream logs to CloudWatch Logs
  • Pull code from CodeCommit
  • Get the SSM parameter for CodeBuild
  • Miscellaneous Amazon EC2 permissions
AWS CodeDeploy Service role Dev (111111111111) and Prod (222222222222)

cicd_codedeploy_service_role

Choose CodeDeploy as the use case to create the role.

Use the built-in AWS managed policy AWSCodeDeployRole for this role. This CodeDeploy service role has appropriate permissions to:

  • Miscellaneous Amazon EC2 Auto Scaling
  • Miscellaneous Amazon EC2
  • Publish Amazon SNS topic
  • AWS CloudWatch metrics
  • Elastic Load Balancing
EC2 Instance Service role for EC2 instance profile Dev (111111111111) and Prod (222222222222)

cicd_ec2_instance_profile

Choose EC2 as the use case to create the role.

Use the permissions in the template cicd_ec2_instance_profile_policy.json to create the policy for this role.

This is set as the EC2 instance profile for the EC2 instances where the app is deployed. It has appropriate permissions to fetch artefacts from Amazon S3 and decrypt contents using the KMS key.

 

You must update this role later with the actual KMS key and S3 bucket name created as part of the deployment process.

 

 

Setting up the prod account

To set up the prod account, complete the following steps:

  1. Download and launch the AWS CloudFormation template from the GitHub repo: cicd-codedeploy-prod.json
    • This deploys the CodeDeploy app and deployment group.
    • Make sure that you already have a set of EC2 Linux instances with the CodeDeploy agent installed in all the accounts where the sample Java application is to be installed (dev and prod accounts). If not, refer back to the Prerequisites section.
  2. Update the existing EC2 IAM instance profile (cicd_ec2_instance_profile):
    • Replace the S3 bucket name mywebapp-codepipeline-bucket-us-east-1-111111111111 with your S3 bucket name (the one used for the CodePipelineArtifactS3Bucket variable when you launched the CloudFormation template in the dev account).
    • Replace the KMS key ARN arn:aws:kms:us-east-1:111111111111:key/82215457-e360-47fc-87dc-a04681c91ce1 with your KMS key ARN (the one created as part of the CloudFormation template launch in the dev account).

Setting up the dev account

To set up your dev account, complete the following steps:

  1. Download and launch the CloudFormation template from the GitHub repo: cicd-aws-code-suite-dev.json
    The stack deploys the following services in the dev account:

    • CodeCommit repository
    • CodePipeline
    • CodeBuild environment
    • CodeDeploy app and deployment group
    • CloudWatch event rule
    • KMS key (used to encrypt the S3 bucket)
    • S3 bucket and bucket policy
  2. Use the following values as inputs to the CloudFormation template. You should have created all the existing resources and roles beforehand as part of the prerequisites.

    Key Example Value Comments
    CodeCommitWebAppRepo MyWebAppRepo Name of the new CodeCommit repository for your web app.
    CodeCommitMainBranchName master Main branch name on your CodeCommit repository. Default is master (which is pushed to the prod environment).
    CodeBuildProjectName MyCBWebAppProject Name of the new CodeBuild environment.
    CodeBuildServiceRole arn:aws:iam::111111111111:role/cicd_codebuild_service_role ARN of an existing IAM service role to be associated with CodeBuild to build web app code.
    CodeDeployApp MyCDWebApp Name of the new CodeDeploy app to be created for your web app. We assume that the CodeDeploy app name is the same in all accounts where deployment needs to occur (in this case, the prod account).
    CodeDeployGroupDev MyCICD-Deployment-Group-Dev Name of the new CodeDeploy deployment group to be created in the dev account.
    CodeDeployGroupProd MyCICD-Deployment-Group-Prod Name of the existing CodeDeploy deployment group in prod account. Created as part of the prod account setup.

    CodeDeployGroupTagKey Application Name of the tag key that CodeDeploy uses to identify the existing EC2 fleet for the deployment group to use.
    CodeDeployGroupTagValue MyWebApp Value of the tag that CodeDeploy uses to identify the existing EC2 fleet for the deployment group to use.
    CodeDeployConfigName CodeDeployDefault.OneAtATime Desired CodeDeploy config name. Valid options are CodeDeployDefault.OneAtATime, CodeDeployDefault.HalfAtATime, and CodeDeployDefault.AllAtOnce. For more information, see Deployment configurations on an EC2/on-premises compute platform.

    CodeDeployServiceRole arn:aws:iam::111111111111:role/cicd_codedeploy_service_role ARN of an existing IAM service role to be associated with CodeDeploy to deploy web app.

    CodePipelineName MyWebAppPipeline Name of the new CodePipeline to be created for your web app.
    CodePipelineArtifactS3Bucket mywebapp-codepipeline-bucket-us-east-1-111111111111 Name of the new S3 bucket to be created where artifacts for the pipeline are stored for this web app.
    CodePipelineServiceRole arn:aws:iam::111111111111:role/cicd_codepipeline_service_role ARN of an existing IAM service role to be associated with CodePipeline to deploy web app.
    CodePipelineCWEventTriggerRole arn:aws:iam::111111111111:role/cicd_codepipeline_trigger_cwe_role ARN of an existing IAM role used to trigger the pipeline you named earlier upon a code push to the CodeCommit repository.
    CodeDeployRoleXAProd arn:aws:iam::222222222222:role/cicd_codepipeline_cross_ac_role ARN of an existing IAM role in the cross-account for CodePipeline to assume to deploy the app.

    It should take 5–10 minutes for the CloudFormation stack to complete. When the stack is complete, you can see that CodePipeline has built the pipeline (MyWebAppPipeline) with the CodeCommit repository and CodeBuild environment, along with actions for CodeDeploy in local (dev) and cross-account (prod). CodePipeline should be in a failed state because your CodeCommit repository is empty initially.

  3. Update the existing Amazon EC2 IAM instance profile (cicd_ec2_instance_profile), as shown in the boto3 sketch after this list:
    • Replace the S3 bucket name mywebapp-codepipeline-bucket-us-east-1-111111111111 with your S3 bucket name (the one used for the CodePipelineArtifactS3Bucket parameter when launching the CloudFormation template in the dev account).
    • Replace the KMS key ARN arn:aws:kms:us-east-1:111111111111:key/82215457-e360-47fc-87dc-a04681c91ce1 with your KMS key ARN (the one created as part of the CloudFormation template launch in the dev account).
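If you prefer to script the instance profile update in the last step above (the same change applies in the prod account setup), the following is a rough boto3 sketch. It assumes the permissions live in an inline policy named cicd_ec2_instance_profile_policy attached to the role; adjust the policy name and statements to match how you actually created the role.

import json
import boto3

iam = boto3.client('iam')

# Grant read access to the pipeline artifact bucket and decrypt access to its KMS key.
# The bucket name and key ARN are this post's sample values; replace them with your own.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:GetObjectVersion", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::mywebapp-codepipeline-bucket-us-east-1-111111111111",
                "arn:aws:s3:::mywebapp-codepipeline-bucket-us-east-1-111111111111/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": ["kms:Decrypt", "kms:DescribeKey"],
            "Resource": "arn:aws:kms:us-east-1:111111111111:key/82215457-e360-47fc-87dc-a04681c91ce1"
        }
    ]
}

iam.put_role_policy(
    RoleName='cicd_ec2_instance_profile',
    PolicyName='cicd_ec2_instance_profile_policy',
    PolicyDocument=json.dumps(policy)
)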

Deploying the application

You’re now ready to deploy the application via your desktop or PC.

  1. Assuming you have the required HTTPS Git credentials for CodeCommit as part of the prerequisites, clone the CodeCommit repo that was created earlier as part of the dev account setup. Obtain the name of the CodeCommit repo to clone from the CodeCommit console. Enter the Git user name and password when prompted. For example:
    $ git clone https://git-codecommit.us-east-1.amazonaws.com/v1/repos/MyWebAppRepo my-web-app-repo
    Cloning into 'my-web-app-repo'...
    Username for 'https://git-codecommit.us-east-1.amazonaws.com/v1/repos/MyWebAppRepo': xxxx
    Password for 'https://xxxx@git-codecommit.us-east-1.amazonaws.com/v1/repos/MyWebAppRepo': xxxx

  2. Download the MyWebAppRepo.zip file containing a sample Java application, CodeBuild configuration to build the app, and CodeDeploy config file to deploy the app.
  3. Copy and unzip the file into the my-web-app-repo Git repository folder created earlier.
  4. Assuming this is the sample app to be deployed, commit these changes to the Git repo. For example:
    $ cd my-web-app-repo 
    $ git add -A 
    $ git commit -m "initial commit" 
    $ git push

For more information, see Tutorial: Create a simple pipeline (CodeCommit repository).

After you commit the code, CodePipeline is triggered, and all the stages run: your application is built, tested, and deployed all the way to the production environment!

The following screenshot shows the entire pipeline and its latest run:

 

Troubleshooting

To troubleshoot any service-related issues, see the troubleshooting sections of the CodeCommit, CodeBuild, CodeDeploy, and CodePipeline user guides.

Cleaning up

To avoid incurring future charges or to remove any unwanted resources, delete the following:

  • EC2 instance used to deploy the application
  • CloudFormation template to remove all AWS resources created through this post
  •  IAM users or roles

Conclusion

Using this solution, you can easily set up and manage an entire CI/CD pipeline in AWS accounts using the native AWS suite of CI/CD services, where a commit or change to code passes through various automated stage gates all the way from building and testing to deploying applications, from development to production environments.

FAQs

In this section, we answer some frequently asked questions:

  1. Can I expand this deployment to more than two accounts?
    • Yes. You can deploy a pipeline in a tooling account and use dev, non-prod, and prod accounts to deploy code on EC2 instances via CodeDeploy. Changes are required to the templates and policies accordingly.
  2. Can I ensure the application isn’t automatically deployed in the prod account via CodePipeline and needs manual approval?
    • Yes. CodePipeline supports manual approval actions, so you can add an approval action to the pipeline before the prod deployment action.
  3. Can I use a CodeDeploy group with an Auto Scaling group?
    • Yes. Minor changes are required to the CodeDeploy group creation process. Refer to the following Solution Variations section for more information.
  4. Can I use this pattern for EC2 Windows instances?
    • Yes. CodeDeploy also supports Windows Server instances; install the CodeDeploy agent for Windows on the target instances.

Solution variations

In this section, we provide a few variations to our solution:

Author bio

author-pic

 Nitin Verma

Nitin is currently a Sr. Cloud Architect in the AWS Managed Services(AMS). He has many years of experience with DevOps-related tools and technologies. Speak to your AWS Managed Services representative to deploy this solution in AMS!

 

Automated CloudFormation Testing Pipeline with TaskCat and CodePipeline

Post Syndicated from Raleigh Hansen original https://aws.amazon.com/blogs/devops/automated-cloudformation-testing-pipeline-with-taskcat-and-codepipeline/

Researchers at Academic Medical Centers (AMCs) use programs such as Observational Health Data Sciences and Informatics (OHDSI) and Research Electronic Data Capture (REDCap) to interact with healthcare data. Our internal team at AWS has provided solutions such as OHDSI-on-AWS and REDCap environments on AWS to help clinicians analyze healthcare data in the AWS Cloud. Occasionally, these solutions break due to a change in some portion of the solution (e.g. updated services). The Automated Solutions Testing Pipeline enables our team to take a proactive approach to discovering these breaks and their cause in order to expedite the repair process.

OHDSI-on-AWS provides these AMCs with the ability to store and analyze observational health data in the AWS cloud. REDCap is a web application for managing surveys and databases with HIPAA-compliant environments. Using our solutions, these programs can be spun up easily on the AWS infrastructure using AWS CloudFormation templates.

Updates to AWS services and other program libraries can cause the CloudFormation template to fail during deployment. Other times, the outputs may not be operating correctly, or the template may not work on every AWS region. This can create a negative customer experience. Some customers may discover this kind of break and decide to not move forward with using the solution. Other customers may not even realize the solution is broken, so they might be unknowingly working with an uncooperative environment. Furthermore, we cannot always provide fast support to the customers who contact us about broken solutions. To meet our team’s needs and the needs of our customers, we decided to focus our efforts on taking a CI/CD approach to maintain these solutions. We developed the Automated Testing Pipeline which regularly tests solution deployment and changes to source files.

This post shows the features of the Automated Testing Pipeline and provides resources to help you get started using it with your AWS account.

Overview of Automated Testing Pipeline Solution

The Automated Testing Pipeline solution as a whole is designed to automatically deploy CloudFormation templates, run tests against the deployed environments, send notifications if an issue is discovered, and allow for insightful testing data to be easily explored.

CloudFormation templates to be tested are stored in an Amazon S3 bucket. Custom test scripts and TaskCat deployment configuration are stored in an AWS CodeCommit repository.

The pipeline is triggered in one of three ways: an update to the CloudFormation Template in S3, an Amazon CloudWatch events rule, and an update to the testing source code repository. Once the pipeline has been triggered, AWS CodeBuild pulls the source code to deploy the CloudFormation template, test the deployed environment, and store the results in an S3 bucket. If any failures are discovered, subscribers to the failure topic are notified. The following diagram shows its overall architecture.

Diagram of Automated Testing Pipeline architecture

Diagram of Automated Testing Pipeline architecture

In order to create the Automated Testing Pipeline, two interns collaborated over the course of 5 weeks to produce the architecture and custom test scripts. We divided the work of constructing a serverless architecture and writing out test scripts for the output urls for OHDSI-on-AWS and REDCap environments on AWS.

The following tasks were completed to build out the Automated Testing Pipeline solution:

  • Set up AWS IAM roles for accessing AWS resources securely
  • Create CloudWatch events to trigger AWS CodePipeline
  • Set up CodePipeline and CodeBuild to run TaskCat and testing scripts
  • Configure TaskCat to deploy CloudFormation solutions in various AWS Regions
  • Write test scripts to interact with CloudFormation solutions’ deployed environments
  • Subscribe to receive emails detailing test results
  • Create a CloudFormation template for the Automated Testing Pipeline

The architecture can be extended to test any CloudFormation stack. For this particular use case, we wrote the test scripts specifically to test the urls output by the CloudFormation solutions. The Automated Testing Pipeline has the following features:

  • Deployed in a single AWS Region, with the exception of the tested CloudFormation solution
  • Has a serverless architecture operating at the AWS Region level
  • Deploys a pipeline which can deploy and test the CloudFormation solution
  • Creates CloudWatch events to activate the pipeline on a schedule or when the solution is updated
  • Creates an Amazon SNS topic for notifying subscribers when there are errors
  • Includes code for running TaskCat and scripts to test solution functionality
  • Built automatically in minutes
  • Low in cost with free tier benefits

The pipeline is triggered automatically when an event occurs. These events include a change to the CloudFormation solution template, a change to the code in the testing repository, and an alarm set off by a regular schedule. Additional events can be added in the CloudWatch console.

When the pipeline is triggered, the testing environment is set up by CodeBuild. CodeBuild uses a build specification file kept within our source repository to set up the environment and run the test scripts. We created a CodeCommit repository to host the test scripts alongside the build specification. The build specification includes commands to run TaskCat, an open-source tool for testing the deployment of CloudFormation templates. TaskCat provides the ability to test the deployment of the CloudFormation solution, but we needed custom test scripts to ensure that we can interact with the deployed environment as expected. If the template is successfully deployed, CodeBuild handles running the test scripts against the CloudFormation solution environment. In our case, the environment is accessed via urls output by the CloudFormation solution.

We used a Selenium WebDriver for interacting with the web pages given by the output urls. This allowed us to programmatically navigate a headless web browser in the serverless environment and gave us the ability to use text output by JavaScript functions to understand the state of the test. You can see this interaction occurring in the code snippet below.

# Imports required by this snippet
from selenium.common.exceptions import NoSuchElementException, TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec

def log_in(driver, user, passw, link, btn_path, title):
    """Enter username and password then submit to log in

        :param driver: webdriver for Chrome page
        :param user: username as String
        :param passw: password as String
        :param link: url for page being tested as String
        :param btn_path: xpath to submit button
        :param title: expected page title upon successful sign in
        :return: success String tuple if log in completed, failure description tuple String otherwise
    """
    try:
        # post username and password data
        driver.find_element_by_xpath("//input[ @name='username' ]").send_keys(user)
        driver.find_element_by_xpath("//input[ @name='password' ]").send_keys(passw)

        # click sign in button and wait for page update
        driver.find_element_by_xpath(btn_path).click()
    except NoSuchElementException:
        return 'FAILURE', 'Unable to access page elements'

    try:
        WebDriverWait(driver, 20).until(ec.url_changes(link))
        WebDriverWait(driver, 20).until(ec.title_is(title))
    except TimeoutException as e:
        print("Timeout occurred (" + e + ") while attempting to sign in to " + driver.current_url)
        if "Sign In" in driver.title or "invalid user" in driver.page_source.lower():
            return 'FAILURE', 'Incorrect username or password'
        else:
            return 'FAILURE', 'Sign in attempt timed out'

    return 'SUCCESS', 'Sign in complete'

We store the test results in JSON format for ease of parsing. TaskCat generates a dashboard which we customize to display these test results. We are able to insert our JSON results into the dashboard in order to make it easy to find errors and access log files. This dashboard is a static html file that can be hosted on an S3 bucket. In addition, messages are published to topics in SNS whenever an error occurs which provide a link to this dashboard.
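For instance, a results file might look like the output of the following sketch. The structure here is hypothetical and only illustrates the idea of collecting the (status, description) pairs returned by the test functions into JSON for the dashboard.

import json

# Hypothetical example: each custom test returns a (status, description) pair,
# which we collect into a JSON file that the dashboard can read.
test_results = [
    {"test": "log_in", "status": "SUCCESS", "detail": "Sign in complete"},
    {"test": "load_landing_page", "status": "FAILURE", "detail": "Sign in attempt timed out"},
]

with open("test_results.json", "w") as f:
    json.dump(test_results, f, indent=2)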

Dashboard containing descriptions of tests and their results

Customized TaskCat dashboard

In true CI/CD fashion, this end-to-end design automatically performs tasks that would otherwise be performed manually. We have shown how deploying solutions, testing solutions, notifying maintainers, and providing a results dashboard are all actions handled entirely by the Automated Testing Pipeline.

Getting Started with the Automated Testing Pipeline

Prerequisite tasks to complete before deploying the pipeline:

Once the prerequisite tasks are completed, the pipeline is ready to be deployed. Detailed information about deployment, altering the source code to fit your use case, and troubleshooting issues can be found at the GitHub page for the Automated Testing Pipeline.

For those looking to jump right into deployment, click the Launch Stack button below.

Button to click to deploy the Automated Testing Pipeline via CloudFormation

Tasks to complete after deployment:

  • Subscribe to SNS topic for error messages
  • Update the code to match the parameters and CloudFormation template that were chosen
  • Skip this step if you are testing OHDSI-on-AWS. Upload the desired CloudFormation template to the created source S3 Bucket
  • Push the source code to the created CodeCommit Repository

After the code is pushed to the CodeCommit repository and the CloudFormation template has been uploaded to S3, the pipeline will run automatically. You can visit the CodePipeline console to confirm that the pipeline is running with an “in progress” status.

You may desire to alter various aspects of the Automated Testing Pipeline to better fit your use case. Listed below are some actions you can take to modify the solution to fit your needs:

  • Go to CloudWatch Events and update the rules for automatically starting the pipeline.
  • Scale out testing by providing custom testing scripts or altering the existing ones.
  • Test a different CloudFormation template by uploading it to the source S3 bucket created and configuring the pipeline accordingly. Custom test scripts will likely be required for this use case.

Challenges Addressed by the Automated Testing Pipeline

The Automated Testing Pipeline directly addresses the challenges we faced with maintaining our OHDSI and REDCap solutions. Additionally, the pipeline can be used whenever there is a need to test CloudFormation templates that are being used on a regular basis or are distributed to other users. Listed below is the set of specific challenges we faced maintaining CloudFormation solutions and how the pipeline addresses them.

Table describing challenges faced with their direct solution offered by Testing Pipeline

The desire to better serve our customers guided our decision to create the Automated Testing Pipeline. For example, we know that source code used to build the OHDSI-on-AWS environment changes on occasion. Some of these changes have caused the environment to stop functioning correctly. This left us with cases where our customers had to either open an issue on GitHub or reach out to AWS directly for support. Our customers depend on OHDSI-on-AWS functioning properly, so fixing issues is of high priority to our team. The ability to run tests regularly allows us to take action without depending on notice from our customers. Now, we can be the first ones to know if something goes wrong and get to fixing it sooner.

“This automation will help us better monitor the CloudFormation-based projects our customers depend on to ensure they’re always in working order.” — James Wiggins, EDU HCLS SA Manager

Cleaning Up

If you decide to quit using the Automated Testing Pipeline, follow the steps below to get rid of the resources associated with it in your AWS account.

  • Delete CloudFormation solution root Stack
  • Delete pipeline CloudFormation Stack
  • Delete ATLAS S3 Bucket if OHDSI-on-AWS was chosen

Deleting the pipeline CloudFormation stack handles removing the resources associated with its architecture. Depending on the CloudFormation template chosen for testing, additional resources associated with it may need to be removed. Visit our GitHub page for more information on removing resources.

Conclusion

The ability to continuously test preexisting solutions on AWS has great benefits for our team and our customers. The automated nature of this testing frees up time for us and our customers, and the dashboard makes issues more visible and easier to resolve. We believe that sharing this story can benefit anyone facing challenges maintaining CloudFormation solutions in AWS. Check out the Getting Started with the Automated Testing Pipeline section of this post to deploy the solution.

Additional Resources

More information about the key services and open-source software used in our pipeline can be found at the following documentation pages:

About the Authors

Raleigh Hansen is a former Solutions Architect Intern on the Academic Medical Centers team at AWS. She is passionate about solving problems and improving upon existing systems. She also adores spending time with her two cats.

Dan Le is a former Solutions Architect Intern on the Academic Medical Centers team at AWS. He is passionate about technology and enjoys doing art and music.

Amazon Supplier Fraud

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2020/08/amazon_supplier.html

Interesting story of an Amazon supplier fraud:

According to the indictment, the brothers swapped ASINs for items Amazon ordered to send large quantities of different goods instead. In one instance, Amazon ordered 12 canisters of disinfectant spray costing $94.03. The defendants allegedly shipped 7,000 toothbrushes costing $94.03 each, using the code for the disinfectant spray, and later billed Amazon for over $650,000.

In another instance, Amazon ordered a single bottle of designer perfume for $289.78. In response, according to the indictment, the defendants sent 927 plastic beard trimmers costing $289.79 each, using the ASIN for the perfume. Prosecutors say the brothers frequently shipped and charged Amazon for more than 10,000 units of an item when it had requested fewer than 100. Once Amazon detected the fraud and shut down their accounts, the brothers allegedly tried to open new ones using fake names, different email addresses, and VPNs to obscure their identity.

It all worked because Amazon is so huge that everything is automated.

Building an automated knowledge repo with Amazon EventBridge and Zendesk

Post Syndicated from Benjamin Smith original https://aws.amazon.com/blogs/compute/building-an-automated-knowledge-repo-with-amazon-eventbridge-and-zendesk/

Zendesk Guide is a smart knowledge base that helps customers harness the power of institutional knowledge. It enables users to build a customizable help center and customer portal.

This post shows how to implement a bidirectional event orchestration pattern between AWS services and an Amazon EventBridge third-party integration partner. This example uses support ticket events to build a customer self-service knowledge repository. It uses the EventBridge partner integration with Zendesk to accelerate the growth of a customer help center.

The examples in this post are part of a serverless application called FreshTracks. This is built in Vue.js and demonstrates SaaS integrations with Amazon EventBridge. To test this example, ask a question on the Fresh Tracks application.

The backend components for this EventBridge integration with Zendesk have been extracted into a separate example application in this GitHub repo.

How the application works

Routing Zendesk events with Amazon EventBridge.

Routing Zendesk events with Amazon EventBridge.

  1. A user searches the knowledge repository via a widget embedded in the web application.
  2. If there is no answer, the user submits the question via the web widget.
  3. Zendesk receives the question as a support ticket.
  4. Zendesk emits events when the support ticket is resolved.
  5. These events are streamed into a custom SaaS event bus in EventBridge.
  6. Event rules match events and send them downstream to an AWS Step Functions Express Workflow.
  7. The Express Workflow orchestrates Lambda functions to retrieve additional information about the event with the Zendesk API.
  8. A Lambda function uses the Zendesk API to publish a new help article from the support ticket data.
  9. The new article is searchable on the website widget for other users to read.

Before deploying this application, you must generate an API key from within Zendesk.

Creating the Zendesk API resource

Use an API to execute events on your Zendesk account from AWS. Follow these steps to generate a Zendesk API token. This is used by the application to authenticate Zendesk API calls.

To generate an API token

  1. Log in to the Zendesk dashboard.
  2. Click the Admin icon in the sidebar, then select Channels > API.
  3. Click the Settings tab, and make sure that Token Access is enabled.
  4. Click the + button to the right of Active API Tokens.

    Creating a Zendesk API token.

    Creating a Zendesk API token.

  5. Copy the token, and store it securely. Once you close this window, the full token is not displayed again.
  6. Click Save to return to the API page, which shows a truncated version of the token.

    Zendesk API token.

    Zendesk API token.

Configuring Zendesk with Amazon EventBridge

Step 1. Configuring your Zendesk event source.

  1. Go to your Zendesk Admin Center and select Admin Center > Integrations.
  2. Choose Connect in events Connector for Amazon EventBridge to open the page to configure your Zendesk event source.

    Zendesk integrations

    Zendesk integrations

  3. Enter your AWS account ID in the Amazon Web Services account ID field, and select the Region to receive events.
  4. Choose Save.

    Zendesk Amazon EventBridge configuration.

Step 2. Associate the Zendesk event source with a new event bus: 

  1. Log into the AWS Management Console and navigate to services > Amazon EventBridge > Partner event sources
    New event source

    New event source

     

  2. Select the radio button next to the new event source and choose Associate with event bus.
    Associating event source with event bus.

    Associating event source with event bus.

     

  3. Choose Associate.

Deploying the backend application

After associating the Event source with a new partner event bus, you can deploy backend services to receive events.

To set up the example application, visit the GitHub repo and follow the instructions in the README.md file.

When deploying the application stack, make sure to provide the custom event bus name, and Zendesk API credentials with --parameter-overrides.

sam deploy --parameter-overrides ZendeskEventBusName=aws.partner/zendesk.com/123456789/default ZenDeskDomain=myZendeskDomain ZenDeskPassword=myAPIToken ZenDeskUsername=myZendeskAgentUsername

You can find the name of the new Zendesk custom event bus in the custom event bus section of the EventBridge console.

Routing events with rules

When a support ticket is updated in Zendesk, a number of individual events are streamed to EventBridge. These include an event for each of the following:

  • Agent Assignment Changed
  • Comment Created
  • Status Changed
  • Brand Changed
  • Subject Changed

An EventBridge rule is used to filter events. The AWS Serverless Application Model (SAM) template defines the rule with the `AWS::Events::Rule` resource type. This routes the event downstream to an AWS Step Functions Express Workflow. The EventPattern is shown below:

  ZendeskNewWebQueryClosed: 
    Type: AWS::Events::Rule
    Properties: 
      Description: "New Web Query"
      EventBusName: 
         Ref: ZendeskEventBusName
      EventPattern: 
        account:
        - !Sub '${AWS::AccountId}'
        detail-type: 
        - "Support Ticket: Comment Created"
        detail:
          ticket_event:
            ticket:
              status: 
              - solved
              tags:
              - web_widget
              tags: 
              - guide
      Targets: 
        - RoleArn: !GetAtt [ MyStatesExecutionRole, Arn ]
          Arn: !Ref FreshTracksZenDeskQueryMachine
          Id: NewQuery

The tickets must have two specific tags (web_widget and guide) for this pattern to match. These are defined as separate fields to create an AND  matching rule, instead of declaring within the same array field to create an OR rule. A new comment on a support ticket triggers the event.
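To make the matching behavior concrete, the following Python sketch shows the shape of an event that would satisfy this rule; the ticket ID and the extra tag are hypothetical values.

# Illustrative only: a Zendesk event that the rule above would match.
matching_event = {
    "detail-type": "Support Ticket: Comment Created",
    "detail": {
        "ticket_event": {
            "ticket": {
                "id": 35436,                                # hypothetical ticket ID
                "status": "solved",
                "tags": ["web_widget", "guide", "english"]  # must contain both web_widget and guide
            }
        }
    }
}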

The Step Functions Express Workflow

The application routes events to a Step Functions Express Workflow that is defined in the application’s SAM template:

FreshTracksZenDeskQueryMachine:
    Type: "AWS::StepFunctions::StateMachine"
    Properties:
      StateMachineType: EXPRESS
      DefinitionString: !Sub |
               {
                    "Comment": "Create a new article from a zendeskTicket",
                    "StartAt": "GetFullZendeskTicket",
                    "States": {
                      "GetFullZendeskTicket": {
                      "Comment": "Get Full Ticket Details",
                      "Type": "Task",
                      "ResultPath": "$.FullTicket",
                      "Resource": "${GetFullZendeskTicket.Arn}",
                      "Next": "GetFullZendeskUser"
                      },
                      "GetFullZendeskUser": {
                      "Comment": "Get Full User Details",
                      "Type": "Task",
                      "ResultPath": "$.FullUser",
                      "Resource": "${GetFullZendeskUser.Arn}",
                      "Next": "PublishArticle"
                      },
                      "PublishArticle": {
                      "Comment": "Publish as an article",
                      "Type": "Task",
                      "Resource": "${CreateZendeskArticle.Arn}",
                      "End": true
                      }
                    }
                }
      RoleArn: !GetAtt [ MyStatesExecutionRole, Arn ]

This application is suited for a Step Functions Express Workflow because it is orchestrating short-duration, high-volume, event-based workloads. Each workflow task is idempotent and stateless. The Express Workflow carries the workload’s state by passing the output of one task to the input of the next. The Amazon States Language ResultPath definition is used to control where each task’s output is appended to the workflow’s state before it is passed to the next task.
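The following Python sketch is purely illustrative of how ResultPath accumulates state across the three tasks; all of the values shown are hypothetical.

# The raw EventBridge event enters the workflow as its initial state.
state = {
    "detail": {"ticket_event": {"ticket": {"id": 35436, "status": "solved"}}}
}

# GetFullZendeskTicket runs with ResultPath "$.FullTicket", so its output is
# appended to the state instead of replacing it.
state["FullTicket"] = {"id": 35436, "subject": "How do I export my data?"}

# GetFullZendeskUser runs with ResultPath "$.FullUser".
state["FullUser"] = {"id": 9001, "name": "Jane Doe", "email": "jane@example.com"}

# PublishArticle receives the accumulated state with everything it needs.
print(state)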

 

AWS StepFunctions Express workflow

AWS StepFunctions Express workflow

Lambda functions

Each task in this Express Workflow invokes a Lambda function defined within the example application’s SAM template. The Lambda functions use the Node.js Axios package to make a request to Zendesk’s API. The Zendesk API credentials are stored in the Lambda function’s environment variables and accessible via `process.env`.

The first two Lambda functions in the workflow make a GET request to Zendesk. This retrieves additional data about the support ticket, the author, and the agent’s response.

The final Lambda function makes a POST request to Zendesk API. This creates and publishes a new article using this data.  The permission_group and section defined in this function must be set to your Zendesk account’s default permission group ID and FAQ section ID.
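The example application implements these calls in Node.js with Axios. As a rough Python sketch of the same idea, the first task’s GET request could look like the following; the environment variable names are assumed to mirror the deployment parameters, and Zendesk token-based authentication is assumed.

import os
import requests

def get_full_ticket(ticket_id):
    domain = os.environ["ZenDeskDomain"]       # assumed env var names, e.g. "mycompany"
    username = os.environ["ZenDeskUsername"]   # Zendesk agent email address
    api_token = os.environ["ZenDeskPassword"]  # Zendesk API token

    # Zendesk token authentication: username is "<email>/token", password is the API token
    response = requests.get(
        f"https://{domain}.zendesk.com/api/v2/tickets/{ticket_id}.json",
        auth=(f"{username}/token", api_token),
        timeout=10,
    )
    response.raise_for_status()
    return response.json()["ticket"]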

AWS Lambda function code

AWS Lambda function code

Integrating with your front-end application

Follow the instructions in the Fresh Tracks repo on GitHub to deploy the front-end application. This application includes Zendesk’s web widget script in the index.html page. The widget has been customized using Zendesk’s JavaScript API. This is implemented in the navigation component to insert custom forms into the widget and prefill the email address field for authenticated users. The backend application starts receiving Zendesk-emitted events immediately.

The video below demonstrates the implementation from end to end.

Conclusion

This post explains how to set up EventBridge’s third-party integration with Zendesk to capture events. The example backend application demonstrates how to filter these events, and send downstream to a Step Functions Express Workflow. The Express Workflow orchestrates a series of stateless Lambda functions to gather additional data about the event. Zendesk’s API is then used to publish a new help guide article from this data.

This pattern provides a framework for bidirectional event orchestration between AWS services, custom web applications and third party integration partners. This can be replicated and applied to any number of third party integration partners.

This is implemented with minimal code to provide near real-time streaming of events and without adding latency to your application.

The possibilities are vast. I am excited to see how builders use this bidirectional serverless pattern to add even more value to their third party services.

Start here to learn about other SaaS integrations with Amazon EventBridge.

Automatic Instacart Bots

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2020/04/automatic_insta.html

Instacart is taking legal action against bots that automatically place orders:

Before it closed, to use Cartdash users first selected what items they want from Instacart as normal. Once that was done, they had to provide Cartdash with their Instacart email address, password, mobile number, tip amount, and whether they prefer the first available delivery slot or are more flexible. The tool then checked that their login credentials were correct, logged in, and refreshed the checkout page over and over again until a new delivery window appeared. It then placed the order, Koch explained.

I think I am writing a new book about hacking in general, and want to discuss this. First, does this count as a hack? I feel like it is, since it’s a way to subvert the Instacart ordering system.

When asked if this tool may give people an unfair advantage over those who don’t use the tool, Koch said, “at this point, it’s a matter of awareness, not technical ability, since people who can use Instacart can use Cartdash.” When pushed on how, realistically, not every user of Instacart is going to know about Cartdash, even after it may receive more attention, and the people using Cartdash will still have an advantage over people who aren’t using automated tools, Koch again said, “it’s a matter of awareness, not technical ability.”

Second, should Instacart take action against this? On the one hand, it isn’t “fair” in that Cartdash users get an advantage in finding a delivery slot. But it’s not really any different than programs that “snipe” on eBay and other bidding platforms.

Third, does Instacart even stand a chance in the long run? As various AI technologies give us more agents and bots, this is going to increasingly become the new normal. I think we need to figure out a fair allocation mechanism that doesn’t rely on the precise timing of submissions.

Introducing Dispatch

Post Syndicated from Netflix Technology Blog original https://netflixtechblog.com/introducing-dispatch-da4b8a2a8072

By Kevin Glisson, Marc Vilanova, Forest Monsen

Netflix is pleased to announce the open-source release of our crisis management orchestration framework: Dispatch!

Okay, but what is Dispatch? Put simply, Dispatch is:

All of the ad-hoc things you’re doing to manage incidents today, done for you, and a bunch of other things you should’ve been doing, but have not had the time!

Dispatch helps us effectively manage security incidents by deeply integrating with existing tools used throughout an organization (Slack, GSuite, Jira, etc.). Dispatch leverages the existing familiarity of these tools to provide orchestration instead of introducing another tool.

This means you can let Dispatch focus on creating resources, assembling participants, sending out notifications, tracking tasks, and assisting with post-incident reviews; allowing you to focus on actually fixing the issue! Sounds interesting? Continue reading!

The Challenge of Crisis Management

Managing incidents is a stressful job. You are dealing with many questions all at once: What’s the scope? Who can help me? Who do I need to engage? How do I manage all of this?

In general, every incident is unique and extraordinary; if the same incidents are happening over and over, you’re firefighting.

There are four main components to Crisis Management that we are attempting to address:

  1. Resource Management — The management of not only data collected about the incident itself but all of the metadata about the response.
  2. Individual Engagement — Understanding the best way to engage individuals and teams, and doing so based on incident context.
  3. Life Cycle Management — Providing the Incident Commander (IC) tools to easily manage the life cycle of the incident.
  4. Incident Learning — Building on past incidents to speed up the resolution of future incidents.

We will use the following terminology throughout the rest of the discussion:

  • Incident Commanders are individuals who are responsible for driving the incident to resolution.
  • Incident Participants are Subject Matter Experts (SMEs) who have been engaged to help resolve the incident.
  • Resources are documents, screenshots, logs, or any other piece of digital information that is used during an incident.

The Checklist

For an average incident, there are quite a few steps to manage, and much of that work is typically handled on an ad-hoc basis by a human. Let’s enumerate them:

  1. Declare an Incident — There are many different entry points to a potential incident: automated alerts, an internal notification, or an external notification.
  2. Determine Incident Commander — Determining the sole individual responsible for driving a particular incident to resolution based on the incident source, type, and priority.
  3. Create Communication Channels — Communication during incidents is key. Establishing dedicated and standardized channels for communication prevents the creation of communication silos.
  4. Create Incident Document — The central document responsible for containing up-to-date incident information, including a description of the incident, links to resources, rough notes from in-person meetings, open questions, action items, and timeline information.
  5. Engage Individual Resources — An incident commander will not be able to resolve an incident by themselves, they must identify and engage additional resources within the organization to help them.
  6. Orient Individual Resources — Engaging additional resources is not enough, the Incident Commander needs to orient these resources to the situation at hand.
  7. Notify Key Stakeholders — For any given incident, key stakeholders not directly involved in resolving the incident need to be made aware of the incident.
  8. Drive Incident to Resolution — The actual resolution of the incident, creating tasks, asking questions, and tracking answers. Making note of key learnings to be addressed after resolution.
  9. Perform Post Incident Review (PIR) — Review how the incident process was performed, tracking actions to be performed after the incident, and driving learning through structuring informal knowledge.

Each of these steps has the incident commander and incident participants moving through various systems and interfaces. Each context switch adds to the cognitive load on the responder and distracts them from resolving the incident itself.

Toward Better Crisis Management

Crisis management is not a new challenge; tools like Jira, PagerDuty, and VictorOps all help organizations manage and respond to incidents. When setting out to automate our incident management process we had two main goals:

  1. Re-use existing tools users were already familiar with; reducing the learning curve to contributing to incidents.
  2. Catalog, store and analyze our incident data to speed up resolution.

Meet Dispatch!

Dispatch is a crisis management orchestration framework that manages incident metadata and resources. It uses tools already in use throughout an organization, providing incident participants a comprehensive crisis management toolset, allowing them to focus on resolving the incident.

Unlike many of our tools, Dispatch is not tightly bound to AWS; Dispatch does not use any AWS APIs at all! Instead, it leverages multiple APIs that are deeply embedded in the organization (e.g., Slack, GSuite, PagerDuty). In addition to all of the built-in integrations, Dispatch provides multiple integration points that allow it to fit into just about any existing environment.

Although developed as a tool to help Netflix manage security incidents, nothing about Dispatch is specific to a security use-case. At its core, Dispatch aims to manage the entire lifecycle of an incident, focusing on engaging individuals and providing them the context they need to drive the incident to resolution.

Workflow

Let’s take a look at what an incident commander’s new workflow would look like using Dispatch:

Some key benefits of the new workflow are:

  • The incident commander no longer needs to manage access to resources or multiple data streams.
  • Communications are standardized (both in style and interval) across incidents.
  • Incident participants are automatically engaged based on the type, priority, and description of the incident.
  • Incident tasks are tracked and owners are reminded if they’re not completed on time.
  • All incident data is centrally tracked.
  • A common API is provided for internal users and tools.

We want to make reporting incidents as frictionless as possible, giving users a straightforward path to engage the resources they need in a time of crisis.

Jumping between different tools and ensuring data is correct and in sync is a low-value exercise for an incident commander. Instead, we centralized on two common tools to manage the entire lifecycle: Slack for managing incident metadata (e.g., status, title, description, priority) and Google Docs and Google Drive for managing the data itself.

When teams need to look across many incidents, Dispatch provides an Admin UI. This interface is also where incident knowledge is managed: common terms and their definitions, individuals, teams, and services. The Admin UI is how we manage incident knowledge for use in future incidents.

Architecture

Dispatch makes use of the following components:

  • Python 3.8 with FastAPI (including helper packages)
  • VueJS UI
  • Postgres

We’re shipping Dispatch with built-in plugins that allow you to create and manage resources with GSuite (Docs, Drive, Sheets, Calendar, Groups), Jira, PagerDuty, and Slack. But the plugin architecture allows for integrations with whatever tools your organization is already using.

Getting Started

Dispatch is available now on the Netflix Open Source site. You can try out Dispatch using Docker. Detailed instructions on setup and configuration are available in our docs.

Interested in Contributing?

Feel free to reach out or submit pull requests if you have any suggestions. We’re looking forward to seeing what new plugins you create to make Dispatch work for you! We hope you’ll find Dispatch as useful as we do!

Oh, and we’re hiring — if you’d like to help us solve these sorts of problems, take a look at https://jobs.netflix.com/teams/security, and reach out!


Introducing Dispatch was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Automated Response and Remediation with AWS Security Hub

Post Syndicated from Jonathan Rau original https://aws.amazon.com/blogs/security/automated-response-and-remediation-with-aws-security-hub/

AWS Security Hub is a service that gives you aggregated visibility into your security and compliance status across multiple AWS accounts. In addition to consuming findings from Amazon services and integrated partners, Security Hub gives you the option to create custom actions, which allow a customer to manually invoke a specific response or remediation action on a specific finding. You can send custom actions to Amazon CloudWatch Events as a specific event pattern, allowing you to create a CloudWatch Events rule that listens for these actions and sends them to a target service, such as a Lambda function or Amazon SQS queue.

By creating custom actions mapped to specific finding types and by developing a corresponding Lambda function for each custom action, you can achieve targeted, automated remediation for these findings. This allows a customer to decide whether they want to invoke a remediation action on a specific finding. A customer can also use these Lambda functions as the target of fully automated remediation actions that do not require any human review.

In this blog post, I’ll show you how to build custom actions, CloudWatch Event rules, and Lambda functions for a dozen targeted actions that can help you remediate CIS AWS Foundations Benchmark-related compliance findings. I’ll also cover use cases for sending findings to an issue management system and for automating security patching. To promote rapid deployment and adoption of this solution, you’ll deploy a majority of the necessary components via AWS CloudFormation.

Note: The full repository for current and future response and remediation templates is hosted on GitHub and includes additional technical guidance for expanding the solution provided in this post.

Solution architecture

Figure 1 – Solution Architecture Overview

Figure 1 shows how a finding travels from an integrated service to a custom action:

  1. Integrated services send their findings to Security Hub.
  2. From the Security Hub console, you’ll choose a custom action for a finding. Each custom action is then emitted as a CloudWatch Event.
  3. The CloudWatch Event rule triggers a Lambda function. This function is mapped to a custom action based on the custom action’s ARN.
  4. Depending on the particular rule, the invoked Lambda function performs a remediation action on your behalf.

For the purpose of this blog post, I’ll refer to the end-to-end combination of a custom action, a CloudWatch Event rule, a Lambda function, plus any supporting services needed to perform a specific action as a “playbook.” To demonstrate how a remediation solution works end-to-end, I’ll show you how to build your first playbook manually. You’ll deploy the remainder of the playbooks via CloudFormation.

I’ll also show you how to modify four of the playbooks (three of which are marked with an asterisk in the list below), as they use AWS Lambda environment variables to perform their actions; we’ll walk through populating these variables later.

Based on feedback from Security Hub customers, the following controls from the CIS AWS Foundations Benchmark will be supported by this blog post:

  • 1.3 – “Ensure credentials unused for 90 days or greater are disabled”
  • 1.4 – “Ensure access keys are rotated every 90 days or less”
  • 1.5 – “Ensure IAM password policy requires at least one uppercase letter”
  • 1.6 – “Ensure IAM password policy requires at least one lowercase letter”
  • 1.7 – “Ensure IAM password policy requires at least one symbol”
  • 1.8 – “Ensure IAM password policy requires at least one number”
  • 1.9 – “Ensure IAM password policy requires a minimum length of 14 or greater”
  • 1.10 – “Ensure IAM password policy prevents password reuse”
  • 1.11 – “Ensure IAM password policy expires passwords within 90 days or less”
  • 2.2 – “Ensure CloudTrail log file validation is enabled”
  • 2.3 – “Ensure the S3 bucket CloudTrail logs to is not publicly accessible”
  • 2.4 – “Ensure CloudTrail trails are integrated with Amazon CloudWatch Logs”*
  • 2.6 – “Ensure S3 bucket access logging is enabled on the CloudTrail S3 bucket”*
  • 2.7 – “Ensure CloudTrail logs are encrypted at rest using AWS KMS CMKs”
  • 2.8 – “Ensure rotation for customer created CMKs is enabled”
  • 2.9 – “Ensure VPC flow logging is enabled in all VPCs”*
  • 4.1 – “Ensure no security groups allow ingress from 0.0.0.0/0 to port 22”
  • 4.2 – “Ensure no security groups allow ingress from 0.0.0.0/0 to port 3389”
  • 4.3 – “Ensure the default security group of every VPC restricts all traffic”

You’ll also deploy and modify an additional playbook, “Send findings to JIRA.” You can find the high-level description of each playbook in each custom action creation script, as well as in the CloudFormation resource descriptions.

Note: If you want to send Security Hub findings to a security information and event management tool (SIEM) such as Amazon ElasticSearch Service or a third-party solution, you must change the CloudWatch Events event pattern to match all findings and use different targets such as Amazon Kinesis Data Streams to Kinesis Data Firehose to load your SIEM. This process is out of scope for this post.

Prerequisites

Ensure you have Security Hub and AWS Config turned on in your Region. Also, note that the solution in this blog post is meant to support a single account and will not support cross-account remediation as deployed. Refer to the Knowledge Center article How can I configure a Lambda function to assume a role from another AWS account? for basic information on cross-account roles for Lambda.

For the playbook “Apply Security Patches,” your EC2 instances must be managed by Systems Manager. For more information on managed instances, see AWS Systems Manager Managed Instances in the AWS Systems Manager User Guide.

Manually create a remediation playbook

To demonstrate the end-to-end process of building a playbook, I’ll first show you how to create one manually, before you deploy the remaining playbooks via CloudFormation. You’ll build a playbook to remediate Control 2.7 of the AWS Foundations Benchmark, “ensure CloudTrail logs are encrypted at rest using AWS Key Management Service (KMS) Customer Managed Keys (CMK).” Configuring CloudTrail to use KMS encryption (called SSE-KMS) provides additional confidentiality controls on your log data. To access your CloudTrail logs, users must not only have S3 read permissions for the corresponding log bucket, they must also be granted decrypt permissions by the KMS key policy.

Important note: As this remediation (and all of the other remediation code) is written, you can only target one finding at a time via the Actions menu.

You’ll achieve automated remediation by using a Lambda function to create a new KMS CMK and alias which identifies the non-compliant CloudTrail trail. You’ll then attach a KMS key policy that only allows the AWS account that owns the trail to decrypt the logs by using the IAM condition for StringEquals: kms:CallerAccount. You only need to run this playbook once per non-compliant CloudTrail trail.

To get started, follow these steps:

  1. Navigate to the Security Hub console, select Settings from the navigation pane, then select the Custom Actions tab.
  2. Choose Create custom action and enter values for Action name, Description, and Custom action ID, then choose Create custom action again, as shown in Figure 2.

    For the purpose of this blog post, I’ll refer to my action name as “CIS 2.7 RR” where the “RR” stands for “Response and remediation.”
     

    Figure 2 – Create custom action

  3. Copy the Amazon resource number (ARN) down, as you’ll need it in step 11.
  4. Navigate to the Lambda console and select Create function.
  5. Enter a function name, choose Python 3.7 runtime, and under Permissions select Create a new role with basic Lambda permissions. Then choose Create function.
  6. Scroll down to Execution role and select the hyperlink under Existing role. This will open a new tab in the IAM console.
  7. From the IAM console, select Add inline policy, then select the JSON tab, paste in the below IAM policy JSON, and select Review Policy.
    
    {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "kmssid",
            "Action": [
              "kms:CreateAlias",
              "kms:CreateKey",
              "kms:PutKeyPolicy"
            ],
            "Effect": "Allow",
            "Resource": "*"
          },
          {
            "Sid": "cloudtrailsid",
            "Action": [
              "cloudtrail:UpdateTrail"
            ],
            "Effect": "Allow",
            "Resource": "*"
          }
        ]
      }
    

  8. Give the in-line policy a name and select Create policy.
  9. Back in the Lambda console, increase Timeout to 1 minute and Memory to 256MB. Scroll up to Function code, paste in the below code, and select Save.
    
     # Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
     # SPDX-License-Identifier: MIT-0
     #
     # Permission is hereby granted, free of charge, to any person obtaining a copy of this
     # software and associated documentation files (the "Software"), to deal in the Software
     # without restriction, including without limitation the rights to use, copy, modify,
     # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
     # permit persons to whom the Software is furnished to do so.
     #
     # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
     # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
     # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
     # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
     # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
     # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
    
    import boto3
    import json
    import time
    
    def lambda_handler(event, context):
        # parse non-compliant trail from Security Hub finding
        noncompliantTrail = str(event['detail']['findings'][0]['Resources'][0]['Details']['Other']['name'])
        # parse account ID from Security Hub finding, will be needed for Key Policy
        accountID = str(event['detail']['findings'][0]['AwsAccountId'])
        
        # import boto3 clients for KMS and CloudTrail
        kms = boto3.client('kms')
        cloudtrail = boto3.client('cloudtrail')
    
        # create a new KMS CMK to encrypt the non-compliant trail
        try:
            createKey = kms.create_key(
            Description='Generated by Security Hub to remediate CIS 2.7 Ensure CloudTrail logs are encrypted at rest using KMS CMKs',
            KeyUsage='ENCRYPT_DECRYPT',
            Origin='AWS_KMS'
            )
            # save key id as a variable
            cloudtrailKey = str(createKey['KeyMetadata']['KeyId'])
            print("Created Key" + " " + cloudtrailKey)
        except Exception as e:
            print(e)
            print("KMS CMK creation failed")
            raise
            
        # wait 2 seconds for key creation to propagate
        time.sleep(2)
    
        # attach an alias for easy identification to the key - must always begin with "alias/"
        try:
            createAlias = kms.create_alias(
            AliasName='alias/' + noncompliantTrail + '-CMK',
            TargetKeyId=cloudtrailKey
            )
            print(createAlias)
        except Exception as e:
            print(e)
            print("Failed to create KMS Alias")
            raise
        
        # wait 1 second
        time.sleep(1)
    
        # policy name for PutKeyPolicy is always "default"
        policyName = 'default'
        # set Key Policy as JSON object
        keyPolicy={
            "Version": "2012-10-17",
            "Id": "Key policy created by CloudTrail",
            "Statement": [
                {
                    "Sid": "Enable IAM User Permissions",
                    "Effect": "Allow",
                    "Principal": {
                        "AWS": [ "arn:aws:iam::" + accountID + ":root" ]
                    },
                    "Action": "kms:*",
                    "Resource": "*"
                },
                {
                    "Sid": "Allow CloudTrail to encrypt logs",
                    "Effect": "Allow",
                    "Principal": {
                        "Service": "cloudtrail.amazonaws.com"
                    },
                    "Action": "kms:GenerateDataKey*",
                    "Resource": "*",
                    "Condition": {
                        "StringLike": {
                            "kms:EncryptionContext:aws:cloudtrail:arn": "arn:aws:cloudtrail:*:" + accountID + ":trail/" + noncompliantTrail
                        }
                    }
                },
                {
                    "Sid": "Allow CloudTrail to describe key",
                    "Effect": "Allow",
                    "Principal": {
                        "Service": "cloudtrail.amazonaws.com"
                    },
                    "Action": "kms:DescribeKey",
                    "Resource": "*"
                },
                {
                    "Sid": "Allow principals in the account to decrypt log files",
                    "Effect": "Allow",
                    "Principal": {
                        "AWS": "*"
                    },
                    "Action": [
                        "kms:Decrypt",
                        "kms:ReEncryptFrom"
                    ],
                    "Resource": "*",
                    "Condition": {
                        "StringEquals": {
                            "kms:CallerAccount": accountID
                        },
                        "StringLike": {
                            "kms:EncryptionContext:aws:cloudtrail:arn": "arn:aws:cloudtrail:*:" + accountID + ":trail/" + noncompliantTrail
                        }
                    }
                },
                {
                    "Sid": "Allow alias creation during setup",
                    "Effect": "Allow",
                    "Principal": {
                        "AWS": "*"
                    },
                    "Action": "kms:CreateAlias",
                    "Resource": "*",
                    "Condition": {
                        "StringEquals": {
                            "kms:CallerAccount": accountID
                        }
                    }
                }
            ]
        }
        # attaches above key policy to key
        try:
            attachKeyPolicy = kms.put_key_policy(
                KeyId=cloudtrailKey,
                Policy=json.dumps(keyPolicy),
                PolicyName=policyName
            )
            print(attachKeyPolicy)
        except Exception as e:
            print(e)
            print("Failed to attach key policy to Key:" + " " + cloudtrailKey)
    
        # update CloudTrail with the new CMK
        try:
            encryptTrail = cloudtrail.update_trail(
            Name=noncompliantTrail,
            KmsKeyId=cloudtrailKey
            )
            print(encryptTrail)
            print("CloudTrail trail" + " " + noncompliantTrail + " " + "has been successfully encrypted!")
        except Exception as e:
            print(e)
            print("Failed to attach KMS CMK to CloudTrail")
    

  10. Navigate to the CloudWatch console, and choose Events, then Create rule.
  11. On the left, next to Event Pattern Preview, select Edit.
  12. Under Event Source, find Build custom event pattern and paste the JSON below into the text box. Replace the ARN under “resources” with the ARN of the custom action you created in step 3.
    
    {
      "source": [
        "aws.securityhub"
      ],
      "detail-type": [
        "Security Hub Findings - Custom Action"
      ],
      "resources": [
        "arn:aws:securityhub:us-west-2:123456789012:action/custom/test-action1"
      ]
    }
    

  13. On the same screen, under Targets on the right, select add Target, choose the Lambda function you created in step 4, then select Configure details.
  14. On the next screen, enter values for Name and Description, then select Create rule.
  15. Back in the Security Hub console, select Compliance standards from the menu on the left, then select View results to see the CIS AWS Benchmarks.
  16. Find rule 2.7 Ensure CloudTrail logs are encrypted at rest using KMS CMKs and select the hyperlink to see all findings related to that control.
  17. Select a finding that is FAILED, with a Resource type that reads AwsCloudTrailTrail.
  18. From the top right, select the Actions dropdown menu, then select CIS 2.7 RR (or whatever you named this action in step 2), as shown in Figure 3.

    Selecting this action will execute the rule you created in step 13, which will invoke the Lambda function you created in step 9.
     

    Figure 3 – Security Hub Custom Actions

  19. Navigate to the CloudTrail console to ensure your CloudTrail trail is updated with SSE-KMS.

This creation flow via the console is universal for all playbook development with Security Hub. You’ll use the Actions menu in the same way to trigger the playbooks you deploy via CloudFormation in the next section of this walkthrough.
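
If you prefer to script this flow rather than click through the console, the same pieces can be created programmatically. The following is a minimal boto3 sketch, not part of this post's CloudFormation template; the action name, ID, rule name, and Lambda ARN are placeholders.

import json
import boto3

securityhub = boto3.client('securityhub')
events = boto3.client('events')

# Create the custom action that appears in the Security Hub Actions menu
action = securityhub.create_action_target(
    Name='CIS 2.7 RR',
    Description='Response and remediation for CIS 2.7',
    Id='CIS27RR'
)

# Create a CloudWatch Events rule that matches findings sent to that action
events.put_rule(
    Name='cis-2-7-rr-rule',  # placeholder rule name
    EventPattern=json.dumps({
        'source': ['aws.securityhub'],
        'detail-type': ['Security Hub Findings - Custom Action'],
        'resources': [action['ActionTargetArn']]
    })
)

# Point the rule at the remediation Lambda function (ARN is a placeholder)
events.put_targets(
    Rule='cis-2-7-rr-rule',
    Targets=[{
        'Id': 'cis-2-7-rr-lambda',
        'Arn': 'arn:aws:lambda:us-west-2:123456789012:function:CIS_2-7_RR'
    }]
)

Note that you would still need to grant CloudWatch Events permission to invoke the function, which the console and the CloudFormation template handle for you.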

Note: To monitor the actions that are taken by the playbooks’ Lambda functions, refer to the functions’ logs. Both success and error messages will appear to help you diagnose as needed.

Deploy remediation playbooks via CloudFormation

Download the CloudFormation template from GitHub and create a CloudFormation stack. For more information about how to create a CloudFormation stack, see Getting Started with AWS CloudFormation in the AWS CloudFormation User Guide.

After your stack has finished deploying, navigate to the Resources tab and select the hyperlink for each resource to be taken to their respective consoles. The logical ID for each resource will be prepended by the action the resource corresponds to. For example, in figure 4, CIS13RRCWEPermissions denotes CloudWatch Event permissions for AWS CIS Benchmark Control 1.3. This logical ID structure is used throughout the template.
 

Figure 4 – CloudFormation Resources

Next, I’ll show you how to modify the Lambda functions associated with the following playbooks: “Send findings to JIRA,” CIS 2.4, CIS 2.6, and CIS 2.9.

Playbook modification: send findings to JIRA

Note: If you don’t currently have a JIRA Software Data Center deployment set up in your account, you can deploy one with a free evaluation period by following this Quick Start. If you are not interested in using JIRA or you use a different issue management tool, you can skip this section.

This playbook works by using the associated Lambda function to execute a Systems Manager automation document called AWS-CreateJiraIssue.

This Systems Manager document will in turn deploy a CloudFormation stack and create a custom Lambda function to map environmental variables (listed below) into JIRA via your playbook’s Lambda function.
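
As a rough sketch of how a playbook Lambda function might kick off that document, the call could look like the following. This is illustrative only: the parameter names passed to AWS-CreateJiraIssue are assumptions that you should verify against the document's schema, while the environment variable names match the ones you'll populate below.

import os
import boto3

ssm = boto3.client('ssm')

def lambda_handler(event, context):
    # Pull the finding that triggered the custom action from the event payload
    finding = event['detail']['findings'][0]

    ssm.start_automation_execution(
        DocumentName='AWS-CreateJiraIssue',
        Parameters={
            # Parameter names below are assumptions -- confirm them against the
            # AWS-CreateJiraIssue document before relying on this sketch
            'JiraUsername': [os.environ['JIRA_SECURITY_ISSUE_USER']],
            'SSMParameterName': [os.environ['JIRA_API_PARAMETER']],
            'JiraURL': [os.environ['JIRA_URL']],
            'ProjectKey': [os.environ['JIRA_PROJECT']],
            'IssueSummary': [finding['Title']],
            'IssueDescription': [finding.get('Description', 'Security Hub finding')]
        }
    )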

Once complete, the CloudFormation stack will self-delete and the automation will be complete. To see this flow, refer to figure 5, below:
 

Figure 5 – Send to JIRA Architecture Diagram

  1. A finding is selected and the custom action “Create JIRA Issue” is invoked, which triggers a CloudWatch Event.
  2. The CloudWatch Event rule will trigger a Lambda function.
  3. Lambda will invoke the AWS-CreateJiraIssue document via Systems Manager Automation.
  4. Systems Manager Automation will pass your Lambda environmental variables and Security Hub finding as document parameters.
  5. The document will create a CloudFormation Stack that contains another custom Lambda to invoke JIRA APIs for creating issues.
  6. The document’s Lambda function uses your parameters to create an issue in JIRA.
  7. JIRA will send back a response to note failure or success, and the CloudFormation stack will self-delete.

To get started, navigate to the Lambda console and find the function named SendToJIRA, then scroll down to Environmental variables. You should see four variables, each with the text “placeholder” as its value. The following steps walk you through how to populate them:

  1. Fill out the JIRA_API_PARAMETER field:
    1. Refer to Atlassian’s instructions to generate a JIRA API token
    2. Follow the Systems Manager Parameter Store walkthrough to create a parameter.
      1. In Step 6 of the linked instructions, choose Secure String and choose the default KMS key.
      2. In Step 7 of the linked instructions, paste in your newly generated JIRA API token.
    3. Copy the name of the parameter and paste it as the value of the JIRA_API_PARAMETER field.
  2. Fill out the JIRA_PROJECT field by pasting in your JIRA Software project’s Project Key.
  3. Fill out the JIRA_SECURITY_ISSUE_USER field:
    1. This value maps to a user in JIRA Software; refer to Atlassian’s instructions for how to create a user. Use the API key you generated in step 1.A as this user’s password.
    2. Paste the user name into the JIRA_SECURITY_ISSUE_USER field.
  4. Fill out the JIRA_URL field:
    1. From JIRA Software navigate to the System sub-menu of JIRA Administration and look for the Base URL field, underneath Settings – General Settings (see figure 6).
    2. Copy the Base URL value and paste it into the JIRA_URL field.
       
      Figure 6 – Base URL in JIRA

  5. When finished, select Save at the top of the Lambda console, then navigate to the Security Hub console and select Findings from the left-hand menu.
  6. Select any finding, and then, from the Actions dropdown in the top right, select Create JIRA Issue.
  7. Navigate to your JIRA instance and wait a few minutes for the new issue to appear, as shown in Figure 7.
     
    Figure 7 – Security Hub Finding in JIRA

    Note: If your issue hasn’t populated in JIRA after a few minutes, refer to the Systems Manager Automation console or CloudFormation console to find the stack that was created by Systems Manager and refer to the failure messages to troubleshoot further.

CIS 2.4 response & remediation playbook modification

CIS Control 2.4 is “Ensure CloudTrail trails are integrated with Amazon CloudWatch Logs.” The intent of this recommendation is to ensure that API activity recorded by CloudTrail is available to query in near real-time for the purpose of troubleshooting or security incident investigation with CloudWatch.

To send your CloudTrail logs to CloudWatch, the Lambda function for this playbook will create a brand new CloudWatch Logs group that has the name of the non-compliant CloudTrail trail in it for easy identification and then update your non-compliant CloudTrail trail to send its logs to the newly created log group.

To accomplish this, CloudTrail needs an IAM role and permissions to be allowed to publish logs to CloudWatch. To avoid creating multiple new IAM roles and policies via Lambda, you’ll populate the ARN of this IAM role in the Lambda environmental variables for this playbook.

Note: If you don’t currently have an IAM role for CloudTrail, follow these instructions from the CloudTrail user guide to create one.
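
For orientation, the core of this remediation could be sketched as follows. This is an illustrative outline rather than the deployed function's exact code; the log group naming scheme and the region/account lookups are assumptions.

import os
import boto3

logs = boto3.client('logs')
cloudtrail = boto3.client('cloudtrail')

def remediate_cis_2_4(trail_name, region, account_id):
    # Create a log group whose name includes the non-compliant trail
    log_group_name = 'CloudTrail/' + trail_name  # hypothetical naming scheme
    logs.create_log_group(logGroupName=log_group_name)

    # Point the trail at the new log group, using the role ARN supplied via
    # the CLOUDTRAIL_CW_LOGGING_ROLE_ARN environment variable described below
    cloudtrail.update_trail(
        Name=trail_name,
        CloudWatchLogsLogGroupArn=f'arn:aws:logs:{region}:{account_id}:log-group:{log_group_name}:*',
        CloudWatchLogsRoleArn=os.environ['CLOUDTRAIL_CW_LOGGING_ROLE_ARN']
    )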

To update this playbook:

  1. Navigate to the Lambda console and find the function named CIS_2-4_RR, then scroll down to Environmental variables.
  2. Find the variable CLOUDTRAIL_CW_LOGGING_ROLE_ARN with text that reads “placeholder” as the value.
  3. Paste the ARN of the IAM role into the field for this value, then select Save.

CIS 2.6 response & remediation playbook modification

CIS Control 2.6 is “Ensure S3 bucket access logging is enabled on the CloudTrail S3 bucket.” An access log record contains details about the request, such as the request type, the resources specified in the request, and the time and date the request was processed. AWS recommends that you enable bucket access logging on the CloudTrail S3 bucket. By enabling S3 bucket logging on target S3 buckets, you can capture all the events that might affect objects in the bucket. Configuring logs to be placed in a separate bucket enables centralized collection of access log information, which can be useful in security and incident response workflows.

Note: Security Hub supports CIS AWS Foundations controls only on resources in the same Region and owned by the same account as the one in which Security Hub is enabled and being used. For example, if you’re using Security Hub in the us-east-2 Region, and you’re storing CloudTrail logs in a bucket in the us-west-2 Region, Security Hub cannot find the bucket in the us-west-2 Region. The control returns a warning that the resource cannot be located. Similarly, if you’re aggregating logs from multiple accounts into a single bucket, the CIS control fails for all accounts except the account that owns the bucket.

To ensure the S3 bucket that contains your CloudTrail logs has access logging enabled, the Lambda function for this playbook invokes the Systems Manager document AWS-ConfigureS3BucketLogging. This document will enable access logging for that bucket. To avoid statically populating your S3 access logging bucket in the Lambda function’s code, you’ll pass that value in via an environmental variable.

Note: If you do not currently have an S3 bucket configured to receive access logs, follow the directions from the S3 user guide to create one.
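
Conceptually, the playbook's call into Systems Manager could be sketched like this. Treat the AWS-ConfigureS3BucketLogging parameter names and grantee values as assumptions to confirm against the document; ACCESS_LOGGING_BUCKET is the environment variable you'll populate below.

import os
import boto3

ssm = boto3.client('ssm')

def remediate_cis_2_6(cloudtrail_bucket):
    ssm.start_automation_execution(
        DocumentName='AWS-ConfigureS3BucketLogging',
        Parameters={
            # Parameter names and grantee values are assumptions -- verify them
            # against the AWS-ConfigureS3BucketLogging document schema
            'BucketName': [cloudtrail_bucket],
            'TargetBucket': [os.environ['ACCESS_LOGGING_BUCKET']],
            'TargetPrefix': [cloudtrail_bucket + '/'],
            'GrantedPermission': ['READ'],
            'GranteeType': ['Group'],
            'GranteeUri': ['http://acs.amazonaws.com/groups/s3/LogDelivery']
        }
    )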

To update this playbook:

  1. Navigate to the Lambda console and find the function named CIS_2-6_RR, then scroll down to Environmental variables.
  2. Find the variable ACCESS_LOGGING_BUCKET with text that reads ‘placeholder’ as the value.
  3. Paste the name of the S3 bucket that will receive the access logs into the field for this value and select Save.

CIS 2.9 response & remediation playbook modification

CIS Control 2.9 is “Ensure VPC flow logging is enabled in all VPCs.” The Amazon Virtual Private Cloud (VPC) flow logs feature enables you to capture information about the IP traffic going to and from network interfaces in your VPC. After you’ve created a flow log, you can view and retrieve its data in CloudWatch Logs. AWS recommends that you enable flow logging for packet rejects for VPCs. Flow logs provide visibility into network traffic that traverses the VPC and can detect anomalous traffic or provide insight into your security workflow.

To enable VPC flow logging for rejected packets, the Lambda function for this playbook will create a new CloudWatch Logs group. For easy identification, the name of the group will include the non-compliant VPC name. The Lambda function will programmatically update your VPC to enable flow logs to be sent to the newly created log group.

Similar to CloudTrail logging, VPC flow logs need an IAM role and permissions to be allowed to publish logs to CloudWatch. To avoid creating multiple new IAM roles and policies via Lambda, you’ll populate the ARN of this IAM role in the Lambda environmental variables for this playbook.

Note: If you don’t currently have an IAM role that VPC flow logs can use to deliver logs to CloudWatch, follow the directions from the VPC user guide to create one.
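
At a high level, the remediation could be sketched as shown below. This is not the deployed function's exact code; the log group naming scheme is an assumption, and flowLogRoleARN is the environment variable you'll populate below.

import os
import boto3

ec2 = boto3.client('ec2')
logs = boto3.client('logs')

def remediate_cis_2_9(vpc_id):
    # Create a log group whose name includes the non-compliant VPC
    log_group_name = 'VPCFlowLogs/' + vpc_id  # hypothetical naming scheme
    logs.create_log_group(logGroupName=log_group_name)

    # Enable flow logs for rejected packets, delivered to the new log group
    ec2.create_flow_logs(
        ResourceIds=[vpc_id],
        ResourceType='VPC',
        TrafficType='REJECT',
        LogDestinationType='cloud-watch-logs',
        LogGroupName=log_group_name,
        DeliverLogsPermissionArn=os.environ['flowLogRoleARN']
    )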

To update this playbook:

  1. Navigate to the Lambda console and find the function named CIS_2-9_RR, then scroll down to Environmental variables.
  2. Find the variable flowLogRoleARN with text that reads ‘placeholder’ as the value.
  3. Paste the ARN of the VPC flow logs IAM role into the field for this value, and select Save.

Conclusion

In this blog post, I showed you how to create, deploy, and execute response and remediation playbooks for Security Hub. By combining custom actions, CloudWatch Event rules, and Lambda functions, you can quickly remediate non-compliant resources. I also showed you how to take pre-defined actions such as sending findings to JIRA, and how to deploy an additional seven playbooks via CloudFormation.

You can create playbooks that take other actions, such as updating Network Access Control Lists to help block malicious traffic from a TOR exit node via the UnauthorizedAccess:EC2/TorIPCaller finding from GuardDuty. Using playbooks with Security Hub is one more way to build toward the security and compliance of your AWS resources.

To avoid incurring additional charges from AWS resources, delete the CloudFormation stack you deployed as well as the resources you manually created for the CIS 2.7 response and remediation playbook.

If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this blog post, start a new thread on the Security Hub forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Jonathan Rau

Jonathan is the Senior TPM for AWS Security Hub. He holds an AWS Certified Specialty-Security certification and is extremely passionate about cyber security, data privacy, and new emerging technologies, such as blockchain. He devotes personal time into research and advocacy about those same topics.

How FactSet automated exporting data from Amazon DynamoDB to Amazon S3 Parquet to build a data analytics platform

Post Syndicated from Arvind Godbole original https://aws.amazon.com/blogs/big-data/how-factset-automated-exporting-data-from-amazon-dynamodb-to-amazon-s3-parquet-to-build-a-data-analytics-platform/

This is a guest post by Arvind Godbole, Lead Software Engineer with FactSet and Tarik Makota, AWS Principal Solutions Architect. In their own words “FactSet creates flexible, open data and software solutions for tens of thousands of investment professionals around the world, which provides instant access to financial data and analytics that investors use to make crucial decisions. At FactSet, we are always working to improve the value that our products provide.”

One area that we’ve been looking into is the relevancy of search results for our clients. Given the wide variety of client use cases and the large number of searches per day, we needed a platform to store anonymized usage data and allow us to analyze that data to boost results using our custom scoring algorithm. Amazon EMR was the obvious choice to host the calculations, but the question arose on how to get our anonymized data into a form that Amazon EMR could use. We worked with AWS and chose to use Amazon DynamoDB to prepare the data for usage in Amazon EMR.

This post walks you through how FactSet takes data from a DynamoDB table and converts that data into Apache Parquet. We store the Parquet files in Amazon S3 to enable near real-time analysis with Amazon EMR. Along the way, we encountered challenges related to data type conversion, which we explain below along with how we overcame them.

Workflow overview

Our workflow contained the following steps:

  1. Anonymized log data is stored into DynamoDB tables. These entries have different fields, depending on how the logs were generated. Whenever we create items in the tables, we use DynamoDB Streams to write out a record. The stream records contain information from a single item in a DynamoDB table.
  2. An AWS Lambda function is hooked into the DynamoDB stream to capture the new items stored in a DynamoDB table. We built our Lambda function off of the lambda-streams-to-firehose project on GitHub to convert the DynamoDB stream image to JSON, which we stringify and push to Amazon Kinesis Data Firehose (a minimal sketch of this step follows this list).
  3. Kinesis Data Firehose transforms the JSON data into Parquet using data contained within an AWS Glue Data Catalog table.
  4. Kinesis Data Firehose stores the Parquet files in S3.
  5. An AWS Glue crawler discovers the schema of DynamoDB items and stores the associated metadata into the Data Catalog.
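
As a minimal sketch of step 2, a stream-triggered Lambda function could look like the following. This is a simplification of what lambda-streams-to-firehose does (the real project also unmarshalls the DynamoDB attribute-value format); the delivery stream name is a placeholder.

import json
import boto3

firehose = boto3.client('firehose')
DELIVERY_STREAM = 'ddb-to-parquet-stream'  # placeholder name

def lambda_handler(event, context):
    for record in event['Records']:
        # Only forward newly written items from the DynamoDB stream
        new_image = record.get('dynamodb', {}).get('NewImage')
        if not new_image:
            continue
        # The real project converts the DynamoDB-typed image into plain JSON;
        # here we simply stringify it and push it to Kinesis Data Firehose
        firehose.put_record(
            DeliveryStreamName=DELIVERY_STREAM,
            Record={'Data': (json.dumps(new_image) + '\n').encode('utf-8')}
        )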

The following diagram illustrates this workflow.

AWS Glue provides tools to help with data preparation and analysis. A crawler can run on a DynamoDB table to take inventory of the table data and store that information in a Data Catalog. Other services can use the Data Catalog as an index to the location, schema, and types of the table data. There are other ways to add metadata into a Data Catalog, but the key idea is that you can update and modify the metadata easily. For more information, see Populating the AWS Glue Data Catalog.

Problem: Data type disparities

Using a variety of technologies to build a solution often requires mapping and converting data types between these technologies. The cloud is no exception. In our case, log items stored in DynamoDB contained attributes of type String Set. String Set values caused data conversion exceptions when Kinesis tried to transform the data to Parquet. After investigating the problem, we found the following:

  • As the crawler indexes the DynamoDB table, Set data types (StringSet, NumberSet) are stored in the Glue metadata catalog as set<string> and set<bigint>.
  • Kinesis Data Firehose uses that same catalog when it performs the conversion to Apache Parquet. The conversion requires valid Hive data types.
  • set<string> and set<bigint> are not valid Hive data types, so the conversion fails, and an exception is generated. The exception looks similar to the following code:
    [{
       "lastErrorCode": "DataFormatConversion.InvalidSchema",
       "lastErrorMessage": "The schema is invalid. Error parsing the schema: Error: type expected at the position 38 of 'array,used:bigint>>' but 'set' is found."
    }]

Solution: Construct data mapping

While working with the AWS team, we confirmed that the Kinesis Data Firehose converter needs valid Hive data types in the Data Catalog to succeed. When it comes to complex data types, Hive doesn’t support set<data_type>, but it does support the following:

  • ARRAY<data_type>
  • MAP<primitive_type, data_type>
  • STRUCT<col_name : data_type [COMMENT col_comment], ...>
  • UNIONTYPE<data_type, data_type, ...>

In our case, this meant that we must convert set<string> and set<bigint> into array<string> and array<bigint>. Our first step was to manually change the types directly in the Data Catalog. After we updated the Data Catalog to change all occurrences of set<data_type> to array<data_type>, the Kinesis transformation to Parquet completed successfully.

Our business case calls for a data store that can store items with different attributes in the same table and the addition of new attributes on-the-fly. We took advantage of DynamoDB’s schema-less nature and ability to scale up and down on-demand so we could focus on our functionality and not the management of the underlying infrastructure. For more information, see Should Your DynamoDB Table Be Normalized or Denormalized?

If our data had a static schema, a manual change would be good enough. Given our business case, a manual solution wasn’t going to scale. Every time we introduced new attributes to the DynamoDB table, we needed to run the crawler, which re-created the metadata and overwrote the change.

Serverless event architecture

To automate the data type updates to the Data Catalog, we used Amazon EventBridge and Lambda to implement the modifications to the data type mapping. EventBridge is a serverless event bus that connects applications using events. An event is a signal that a system’s state has changed, such as the status of a Data Catalog table.

The following diagram shows the previous workflow with the new architecture.

  1. The crawler stays as-is and crawls the DynamoDB table to obtain the metadata.
  2. The metadata obtained by the crawler is stored in the Data Catalog. Previous metadata is updated or removed, and changes (manual or automated) are overwritten.
  3. The event GlueTableChanged in EventBridge listens to any changes to the Data Catalog tables. After we receive the event that there was a change to the table, we trigger the Lambda function.
  4. The Lambda function uses AWS SDK to update the Glue Catalog table using the glue.update_table() API to replace occurrences of set<data_type> with array<data_type>.

To set up EventBridge, we set Event pattern to “Pre-defined pattern by service”. For Service provider, we selected AWS, with Glue as the service, and for Event type we selected “Glue Data Catalog Table State Change”. The following screenshot shows the EventBridge configuration that sends events to the Lambda function that updates the Data Catalog.

The following is the baseline Lambda code:

# This is NOT production worthy code please modify and implement error handling routines as appropriate
import json
import logging
import boto3

glue = boto3.client('glue')

logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Define subsegments manually
def table_contains_set(databaseName, tableName):
    
    # returns Glue Catalog description for Table structure
    response = glue.get_table( DatabaseName=databaseName,Name=tableName)
    logger.info(response)  
    
    # loop thru all the Columns of the table 
    isModified = False
    for i in response['Table']['StorageDescriptor']['Columns']: 
        logger.info("## Column: " + str(i['Name']))
        # if Column datatype starts with set< then change it to array<
        if i['Type'].find("set<") != -1:
            i['Type'] = i['Type'].replace("set<", "array<")
            isModified = True
            logger.info(i['Type'])
    
    if isModified:
        # following 3 statements simply clean up the response JSON so that update_table API call works
        del response['Table']['DatabaseName']
        del response['Table']['CreateTime']
        del response['Table']['UpdateTime']
        glue.update_table(DatabaseName=databaseName,TableInput=response['Table'],SkipArchive=True)
        
    logger.info("============ ### =============") 
    logger.info(response)
    
    return True
    
def lambda_handler(event, context):
    logger.info('## EVENT')
    # logger.info(event)
    # This is Sample of the event payload that would be received
    # { 'version': '0', 
    #   'id': '2b402842-21f5-1d76-1a9a-c90076d1d7da', 
    #   'detail-type': 'Glue Data Catalog Table State Change', 
    #   'source': 'aws.glue', 
    #   'account': '1111111111', 
    #   'time': '2019-08-18T02:53:41Z', 
    #   'region': 'us-east-1', 
    #   'resources': ['arn:aws:glue:us-east-1:111111111:table/ddb-glue-fh/ddb_glu_fh_sample'], 
    #   'detail': {
    #           'databaseName': 'ddb-glue-fh', 
    #           'changedPartitions': [], 
    #           'typeOfChange': 'UpdateTable', 
    #           'tableName': 'ddb_glu_fh_sample'
    #    }
    # }
    
    # get the database and table name of the Glue table triggered the event
    databaseName = event['detail']['databaseName']
    tableName = event['detail']['tableName']
    logger.info("DB: " + databaseName + " | Table: " + tableName)
    
    table_contains_set(databaseName, tableName)
   
    # TODO implement and modify
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }

The Lambda function is straightforward; this post provides a basic skeleton. You can use this as a template to implement your own functionality for your specific data.

Conclusion

Simple things such as data type conversion and mapping can create unexpected outcomes and challenges when data crosses service boundaries. One of the advantages of AWS is the wide variety of tools with which you can create robust and scalable solutions tailored to your needs. Using event-driven architecture, we solved our data type conversion errors and automated the process to eliminate the issue as we move forward.

 


About the Authors

Arvind Godbole is a Lead Software Engineer at FactSet Research Systems. He has experience in building high-performance, high-availability client facing products and services, ranging from real-time financial applications to search infrastructure. He is currently building an analytics platform to gain insights into client workflows. He holds a B.S. in Computer Engineering from the University of California, San Diego

 

 

 

Tarik Makota is a Principal Solutions Architect with the Amazon Web Services. He provides technical guidance, design advice and thought leadership to AWS’ customers across US Northeast. He holds an M.S. in Software Development and Management from Rochester Institute of Technology.

 

 

Protect your veggies from hail with a Raspberry Pi Zero W

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/protect-your-veggies-from-hail-with-a-raspberry-pi-zero-w/

Tired of losing vegetable crops to frequent summertime hail storms, Nick Rogness decided to build something to protect them. And the result is brilliant!

Digital Garden with hail protection

Tired of getting your garden destroyed by hail storms? I was, so I did something about it…maker style!

“I live in a part of the country where hail and severe weather are commonplace during the summer months,” Nick explains in his Hackster tutorial. “I was getting frustrated every year when my wife’s garden was getting demolished by the nightly hail storms, losing our entire haul of vegetable goodies!”

Nick drew up plans for a solution to his hail problem, incorporating linear actuators bolted to a 12ft × 12ft frame that surrounds the vegetable patch. When a storm is on the horizon, the actuators pull a heavy-duty tarp over the garden.

Nick connected two motor controllers to a Raspberry Pi Zero W. The Raspberry Pi then controls the actuators to pull the tarp, either when a manual rocker switch is flipped or when it’s told to do so via weather-controlled software.

“Software control of the garden was accomplished by using a Raspberry Pi and MQTT to communicate via Adafruit IO to reach the mobile app on my phone,” Nick explains. The whole build is powered by a 12V Marine deep-cycle battery that’s charged using a solar panel.
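
The project's full code is on Hackster, but as a rough idea of the pattern, a Raspberry Pi listening to an Adafruit IO feed over MQTT and driving the actuators via GPIO could be sketched like this. The feed name, pin number, and credentials are placeholders, not details from Nick's build.

import RPi.GPIO as GPIO
from Adafruit_IO import MQTTClient

ADAFRUIT_IO_USERNAME = 'your-username'  # placeholder
ADAFRUIT_IO_KEY = 'your-aio-key'        # placeholder
TARP_FEED = 'garden-tarp'               # hypothetical feed name
ACTUATOR_PIN = 17                       # hypothetical GPIO pin

GPIO.setmode(GPIO.BCM)
GPIO.setup(ACTUATOR_PIN, GPIO.OUT)

def message(client, feed_id, payload):
    # Extend the actuators (cover the garden) or retract them, based on the
    # value published to the feed by the weather-aware mobile app
    GPIO.output(ACTUATOR_PIN, GPIO.HIGH if payload == 'cover' else GPIO.LOW)

client = MQTTClient(ADAFRUIT_IO_USERNAME, ADAFRUIT_IO_KEY)
client.on_message = message
client.connect()
client.subscribe(TARP_FEED)
client.loop_blocking()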

You can view the full tutorial on Hackster, including the code for the project.

The post Protect your veggies from hail with a Raspberry Pi Zero W appeared first on Raspberry Pi.

Orchestrating a security incident response with AWS Step Functions

Post Syndicated from Benjamin Smith original https://aws.amazon.com/blogs/compute/orchestrating-a-security-incident-response-with-aws-step-functions/

In this post I will show how to implement the callback pattern of an AWS Step Functions Standard Workflow. This is used to add a manual approval step into an automated security incident response framework. The framework could be extended to remediate automatically, according to the individual policy actions defined. For example, applying alternative actions, or restricting actions to specific ARNs.

The application uses Amazon EventBridge to trigger a Step Functions Standard Workflow on an IAM policy creation event. The workflow compares the policy action against a customizable list of restricted actions. It uses AWS Lambda and Step Functions to roll back the policy temporarily, then notify an administrator and wait for them to approve or deny.

Figure 1: High-level architecture diagram.

Important: the application uses various AWS services, and there are costs associated with these services after the Free Tier usage. Please see the AWS pricing page for details.

You can deploy this application from the AWS Serverless Application Repository. You then create a new IAM Policy to trigger the rule and run the application.

Deploy the application from the Serverless Application Repository

  1. Find the “Automated-IAM-policy-alerts-and-approvals” app in the Serverless Application Repository.
  2. Complete the required application settings
    • Application name: an identifiable name for the application.
    • EmailAddress: an administrator’s email address for receiving approval requests.
    • restrictedActions: the IAM Policy actions you want to restrict.

      Figure 2 Deployment Fields

  3. Choose Deploy.

Once the deployment process is completed, 21 new resources are created. This includes:

  • Five Lambda functions that contain the business logic.
  • An Amazon EventBridge rule.
  • An Amazon SNS topic and subscription.
  • An Amazon API Gateway REST API with two resources.
  • An AWS Step Functions state machine

To receive Amazon SNS notifications as the application administrator, you must confirm the subscription to the SNS topic. To do this, choose the Confirm subscription link in the verification email that was sent to you when deploying the application.

EventBridge receives new events in the default event bus. Here, the event is compared with associated rules. Each rule has an event pattern defined, which acts as a filter to match inbound events to their corresponding rules. In this application, a matching event rule triggers an AWS Step Functions execution, passing in the event payload from the policy creation event.
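
The deployed application creates this rule for you, but for illustration, a rule matching IAM CreatePolicy calls and targeting the state machine might be created like this. The event pattern is an assumption about how IAM API calls arrive via CloudTrail, and the names and ARNs are placeholders.

import json
import boto3

events = boto3.client('events')
RULE_NAME = 'iam-policy-creation-rule'  # placeholder name

# Match IAM CreatePolicy API calls recorded by CloudTrail (pattern assumed)
events.put_rule(
    Name=RULE_NAME,
    EventPattern=json.dumps({
        'source': ['aws.iam'],
        'detail-type': ['AWS API Call via CloudTrail'],
        'detail': {
            'eventSource': ['iam.amazonaws.com'],
            'eventName': ['CreatePolicy']
        }
    })
)

# Send matching events to the Step Functions state machine (ARNs are placeholders)
events.put_targets(
    Rule=RULE_NAME,
    Targets=[{
        'Id': 'policy-approval-state-machine',
        'Arn': 'arn:aws:states:us-east-1:111111111111:stateMachine:PolicyApproval',
        'RoleArn': 'arn:aws:iam::111111111111:role/EventBridgeInvokeStepFunctions'
    }]
)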

Running the application

Trigger the application by creating a policy either via the AWS Management Console or with the AWS Command Line Interface.

Using the AWS CLI

First install and configure the AWS CLI, then run the following command:

aws iam create-policy --policy-name my-bad-policy1234 --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:GetBucketObjectLockConfiguration",
                "s3:DeleteObjectVersion",
                "s3:DeleteBucket"
            ],
            "Resource": "*"
        }
    ]
}'

Using the AWS Management Console

  1. Go to Services > Identity Access Management (IAM) dashboard.
  2. Choose Create policy.
  3. Choose the JSON tab.
  4. Paste the following JSON:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": [
                    "s3:GetBucketObjectLockConfiguration",
                    "s3:DeleteObjectVersion",
                    "s3:DeleteBucket"
                ],
                "Resource": "*"
            }
        ]
    }
  5. Choose Review policy.
  6. In the Name field, enter my-bad-policy.
  7. Choose Create policy.

Either of these methods creates a policy with the permissions required to delete Amazon S3 buckets. Deleting an S3 bucket is one of the restricted actions set when the application is deployed:

Figure 3 default restricted actions

This sends the event to EventBridge, which then triggers the Step Functions state machine. The Step Functions state machine holds each state object in the workflow. Some of the state objects use the Lambda functions created during deployment to process data.

Others use Amazon States Language (ASL) enabling the application to conditionally branch, wait, and transition to the next state. Using a state machine decouples the business logic from the compute functionality.

After triggering the application, go to the Step Functions dashboard and choose the newly created state machine. Choose the currently running execution from the executions table.

Figure 4 State machine executions.

You see a visual representation of the current execution, with the workflow paused at the AskUser state.

Figure 5 Workflow Paused

These are the states in the workflow:

ModifyData
State Type: Pass
Re-structures the input data into an object that is passed throughout the workflow.

ValidatePolicy
State type: Task. Services: AWS Lambda
Invokes the ValidatePolicy Lambda function that checks the new policy document against the restricted actions.

ChooseAction
State type: Choice
Branches depending on input from ValidatePolicy step.

TempRemove
State type: Task. Service: AWS Lambda
Creates a new default version of the policy with only permissions for Amazon CloudWatch Logs and deletes the previously created policy version.

AskUser
State type: Task. Service: AWS Lambda
Sends an approval email to the user via SNS, with the task token that initiates the callback pattern.

UsersChoice
State type: Choice
Branches based on the user action to approve or deny.

Denied
State type: Pass
Ends the execution with no further action.

Approved
State type: Task. Service: AWS Lambda
Restores the initial policy document by creating as a new version.

AllowWithNotification
State type: Task. Services: AWS Lambda
With no restricted actions detected, the user is still notified of the change (via an email from SNS) before execution ends.
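
To make the ValidatePolicy step more concrete, a check of this kind could be sketched as follows. This is illustrative only and not the application's actual function; the restrictedActions variable name mirrors the deployment setting but is an assumption here.

import os

def contains_restricted_actions(policy_document, restricted_actions):
    """Return True if any Allow statement grants one of the restricted actions."""
    for statement in policy_document.get('Statement', []):
        if statement.get('Effect') != 'Allow':
            continue
        actions = statement.get('Action', [])
        if isinstance(actions, str):
            actions = [actions]
        if any(action in restricted_actions for action in actions):
            return True
    return False

# Example: the list configured at deployment, e.g. "s3:DeleteBucket,s3:DeleteObjectVersion"
restricted = os.environ.get('restrictedActions', 's3:DeleteBucket').split(',')
policy = {
    'Version': '2012-10-17',
    'Statement': [{'Effect': 'Allow', 'Action': ['s3:DeleteBucket'], 'Resource': '*'}]
}
print(contains_restricted_actions(policy, restricted))  # True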

The callback pattern

An important feature of this application is the ability for an administrator to approve or deny a new policy. The Step Functions callback pattern makes this possible.

The callback pattern allows a workflow to pause during a task and wait for an external process to return a task token. The task token is generated when the task starts. When the AskUser function is invoked, it is passed a task token. The task token is published to the SNS topic along with the API resources for approval and denial. These API resources are created when the application is first deployed.

When the administrator clicks on the approve or deny links, it passes the token with the API request to the receiveUser Lambda function. This Lambda function uses the incoming task token to resume the AskUser state.

The lifecycle of the task token as it transitions through each service is shown below:

Figure 6 Task token lifecycle

  1. To invoke this callback pattern, the askUser state definition is declared using the .waitForTaskToken identifier, with the task token passed into the Lambda function as a payload parameter:
    "AskUser":{
     "Type": "Task",
     "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
     "Parameters":{  
     "FunctionName": "${AskUser}",
     "Payload":{  
     "token.$":"$$.Task.Token"
      }
     },
      "ResultPath":"$.taskresult",
      "Next": "usersChoice"
      },
  2. The askUser Lambda function can then access this token within the event object:
    exports.handler = async (event,context) => {
    let approveLink = `${process.env.APIAllowEndpoint}?token=${JSON.stringify(event.token)}`
    let denyLink = `${process.env.APIDenyEndpoint}?token=${JSON.stringify(event.token)}`
    //code continues
  3. The task token is published to an SNS topic along with the message text parameter:
        let params = {
     TopicArn: process.env.Topic,
     Message: `A restricted Policy change has been detected Approve:${approveLink} Or Deny:${denyLink}` 
    }
     let res = await sns.publish(params).promise()
    //code continues
  4. The administrator receives an email with two links, one to approve and one to deny. The task token is appended to these links as a request query string parameter named token:

    Figure 7 Approve / deny email.

  5. Using the Amazon API Gateway proxy integration, the task token is passed directly to the receiveUser Lambda function from the API resource, and is accessible from within the function code as part of the event’s queryStringParameters object:
    exports.handler = async(event, context) => {
    //some code
        let taskToken = event.queryStringParameters.token
    //more code
    
  6. The token is then sent back to the askUser state via an API call from within the receiveUser Lambda function. This API call also defines the next course of action for the workflow to take.
    //some code 
    let params = {
            output: JSON.stringify({"action":NextAction}),
            taskToken: taskTokenClean
        }
    let res = await stepfunctions.sendTaskSuccess(params).promise()
    //code continues
    

Each Step Functions execution can last for up to a year, allowing for long wait periods for the administrator to take action. There is no extra cost for a longer wait time as you pay for the number of state transitions, and not for the idle wait time.

Conclusion

Using EventBridge to route IAM policy creation events directly to AWS Step Functions reduces the need for unnecessary communication layers. It helps promote good use of compute resources, ensuring Lambda is used to transform data, and not transport or orchestrate.

Using Step Functions to invoke services sequentially has two important benefits for this application. First, you can identify the use of restricted policies quickly and automatically. Also, these policies can be removed and held in a ‘pending’ state until approved.

Step Functions Standard Workflow’s callback pattern can create a robust orchestration layer that allows administrators to review each change before approving or denying.

For the full code base see the GitHub repository https://github.com/bls20AWS/AutomatedPolicyOrchestrator.

For more information on other Step Functions patterns, see our documentation on integration patterns.