Tag Archives: Expert (400)

Deploying Alexa Skills with the AWS CDK

2021-08-20 Jeff Gardner

Post Syndicated from Jeff Gardner original https://aws.amazon.com/blogs/devops/deploying-alexa-skills-with-aws-cdk/

So you’re expanding your reach by leveraging voice interfaces for your applications through the Alexa ecosystem. You’ve experimented with a new Alexa Skill via the Alexa Developer Console, and now you’re ready to productionalize it for your customers. How exciting!

You are also a proponent of Infrastructure as Code (IaC). You appreciate the speed, consistency, and change management capabilities enabled by IaC. Perhaps you have other applications that you provision and maintain via DevOps practices, and you want to deploy and maintain your Alexa Skill in the same way. Great idea!

That’s where AWS CloudFormation and the AWS Cloud Development Kit (AWS CDK) come in. AWS CloudFormation lets you treat infrastructure as code, so that you can easily model a collection of related AWS and third-party resources, provision them quickly and consistently, and manage them throughout their lifecycles. The AWS CDK is an open-source software development framework for modeling and provisioning your cloud application resources via familiar programming languages, like TypeScript, Python, Java, and .NET. AWS CDK utilizes AWS CloudFormation in the background in order to provision resources in a safe and repeatable manner.

In this post, we show you how to achieve Infrastructure as Code for your Alexa Skills by leveraging powerful AWS CDK features.

Concepts

Alexa Skills Kit (ASK)

In addition to the Alexa Developer Console, skill developers can utilize the Alexa Skills Kit (ASK) to build interactive voice interfaces for Alexa. ASK provides a suite of self-service APIs and tools for building and interacting with Alexa Skills, including the ASK CLI, the Skill Management API (SMAPI), and SDKs for Node.js, Java, and Python. These tools provide a programmatic interface for your Alexa Skills in order to update them with code rather than through a user interface.

AWS CloudFormation

AWS CloudFormation lets you create templates written in either YAML or JSON format to model your infrastructure in code form. CloudFormation templates are declarative and idempotent, allowing you to check them into a versioned code repository, deploy them automatically, and track changes over time.

The ASK CloudFormation resource allows you to incorporate Alexa Skills in your CloudFormation templates alongside your other infrastructure. However, this has limitations that we’ll discuss in further detail in the Problem section below.

AWS Cloud Development Kit (AWS CDK)

Think of the AWS CDK as a developer-centric toolkit that leverages the power of modern programming languages to define your AWS infrastructure as code. When AWS CDK applications are run, they compile down to fully formed CloudFormation JSON/YAML templates that are then submitted to the CloudFormation service for provisioning. Because the AWS CDK leverages CloudFormation, you still enjoy every benefit provided by CloudFormation, such as safe deployment, automatic rollback, and drift detection. AWS CDK currently supports TypeScript, JavaScript, Python, Java, C#, and Go (currently in Developer Preview).

Perhaps the most compelling part of AWS CDK is the concept of constructs—the basic building blocks of AWS CDK apps. The three levels of constructs reflect the level of abstraction from CloudFormation. A construct can represent a single resource, like an AWS Lambda Function, or it can represent a higher-level component consisting of multiple AWS resources.

The three different levels of constructs begin with low-level constructs, called L1 (short for “level 1”) or Cfn (short for CloudFormation) resources. These constructs directly represent all of the resources available in AWS CloudFormation. The next level of constructs, called L2, also represents AWS resources, but it has a higher-level and intent-based API. They provide not only similar functionality, but also the defaults, boilerplate, and glue logic you’d be writing yourself with a CFN Resource construct. Finally, the AWS Construct Library includes even higher-level constructs, called L3 constructs, or patterns. These are designed to help you complete common tasks in AWS, often involving multiple resource types. Learn more about constructs in the AWS CDK developer guide.

One L2 construct example is the Custom Resources module. This lets you execute custom logic via a Lambda Function as part of your deployment in order to cover scenarios that the AWS CDK doesn’t support yet. While the Custom Resources module leverages CloudFormation’s native Custom Resource functionality, it also greatly reduces the boilerplate code in your CDK project and simplifies the necessary code in the Lambda Function. The open-source construct library referenced in the Solution section of this post utilizes Custom Resources to avoid some limitations of what CloudFormation and CDK natively support for Alexa Skills.

Problem

The primary issue with utilizing the Alexa::ASK::Skill CloudFormation resource, and its corresponding CDK CfnSkill construct, arises when you define the Skill’s backend Lambda Function in the same CloudFormation template or CDK project. When the Skill’s endpoint is set to a Lambda Function, the ASK service validates that the Skill has the appropriate permissions to invoke that Lambda Function. The best practice is to enable Skill ID verification in your Lambda Function. This effectively restricts the Lambda Function to be invokable only by the configured Skill ID. The problem is that in order to configure Skill ID verification, the Lambda Permission must reference the Skill ID, so it cannot be added to the Lambda Function until the Alexa Skill has been created. If we try creating the Alexa Skill without the Lambda Permission in place, insufficient permissions will cause the validation to fail. The endpoint validation causes a circular dependency preventing us from defining our desired end state with just the native CloudFormation resource.

Unfortunately, the AWS CDK also does not yet support any L2 constructs for Alexa skills. While the ASK Skill Management API is another option, managing imperative API calls within a CI/CD pipeline would not be ideal.

Solution

Overview

AWS CDK is extensible in that if there isn’t a native construct that does what you want, you can simply create your own! You can also publish your custom constructs publicly or privately for others to leverage via package registries like npm, PyPI, NuGet, Maven, etc.

We could write our own code to solve the problem, but luckily this use case allows us to leverage an open-source construct library that addresses our needs. This library is currently available for TypeScript (npm) and Python (PyPI).

The complete solution can be found at the GitHub repository, here. The code is in TypeScript, but you can easily port it to another language if necessary. See the AWS CDK Developer Guide for more guidance on translating between languages.

Prerequisites

You will need the following in order to build and deploy the solution presented below. Please be mindful of any prerequisites for these tools.

Alexa Developer Account
AWS Account
Docker
- Used by CDK for bundling assets locally during synthesis and deployment.
- See Docker website for installation instructions based on your operating system.
AWS CLI
- Used by CDK to deploy resources to your AWS account.
- See AWS CLI user guide for installation instructions based on your operating system.
Node.js
- The CDK Toolset and backend runs on Node.js regardless of the project language. See the detailed requirements in the AWS CDK Getting Started Guide.
- See the Node.js website to download the specific installer for your operating system.

Clone Code Repository and Install Dependencies

The code for the solution in this post is located in this repository on GitHub. First, clone this repository and install its local dependencies by executing the following commands in your local Terminal:

# clone repository
git clone https://github.com/aws-samples/aws-devops-blog-alexa-cdk-walkthrough
# navigate to project directory
cd aws-devops-blog-alexa-cdk-walkthrough
# install dependencies
npm install

Note that CLI commands in the sections below (ask, cdk) use npx. This executes the command from local project binaries if they exist, or, if not, it installs the binaries required to run the command. In our case, the local binaries are installed as part of the npm install command above. Therefore, npx will utilize the local version of the binaries even if you already have those tools installed globally. We use this method to simplify setup and alleviate any issues arising from version discrepancies.

Get Alexa Developer Credentials

To create and manage Alexa Skills via CDK, we will need to provide Alexa Developer account credentials, which are separate from our AWS credentials. The following values must be supplied in order to authenticate:

Vendor ID: Represents the Alexa Developer account.
Client ID: Represents the developer, tool, or organization requiring permission to perform a list of operations on the skill. In this case, our AWS CDK project.
Client Secret: The secret value associated with the Client ID.
Refresh Token: A token for reauthentication. The ASK service uses access tokens for authentication that expire one hour after creation. Refresh tokens do not expire and can retrieve a new access token when needed.

Follow the steps below to retrieve each of these values.

Get Alexa Developer Vendor ID

Easily retrieve your Alexa Developer Vendor ID from the Alexa Developer Console.

Navigate to the Alexa Developer console and login with your Amazon account.
After logging in, on the main screen click on the “Settings” tab.

Screenshot of Alexa Developer console showing location of Settings tab

Your Vendor ID is listed in the “My IDs” section. Note this value.

Screenshot of Alexa Developer console showing location of Vendor ID

Create Login with Amazon (LWA) Security Profile

The Skill Management API utilizes Login with Amazon (LWA) for authentication, so first we must create a security profile for LWA under the same Amazon account that we will use to create the Alexa Skill.

Navigate to the LWA console and login with your Amazon account.
Click the “Create a New Security Profile” button.

Screenshot of Login with Amazon console showing location of Create a New Security Profile button

Fill out the form with a Name, Description, and Consent Privacy Notice URL, and then click “Save”.

Screenshot of Login with Amazon console showing Create a New Security Profile form

The new Security Profile should now be listed. Hover over the gear icon, located to the right of the new profile name, and click “Web Settings”.

Screenshot of Login with Amazon console showing location of Web Settings link

Click the “Edit” button and add the following under “Allowed Return URLs”:
- http://127.0.0.1:9090/cb
- https://s3.amazonaws.com/ask-cli/response_parser.html
Click the “Save” button to save your changes.
Click the “Show Secret” button to reveal your Client Secret. Note your Client ID and Client Secret.

Screenshot of Login with Amazon console showing location of Client ID and Client Secret values

Get Refresh Token from ASK CLI

Your Client ID and Client Secret let you generate a refresh token for authenticating with the ASK service.

Navigate to your local Terminal and enter the following command, replacing <your Client ID> and <your Client Secret> with your Client ID and Client Secret, respectively:

# ensure you are in the root directory of the repository
npx ask util generate-lwa-tokens --client-id "<your Client ID>" --client-confirmation "<your Client Secret>" --scopes "alexa::ask:skills:readwrite alexa::ask:models:readwrite"

A browser window should open with a login screen. Supply credentials for the same Amazon account with which you created the LWA Security Profile previously.
Click the “Allow” button to grant the refresh token appropriate access to your Amazon Developer account.
Return to your Terminal. The credentials, including your new refresh token, should be printed. Note the value in the refresh_token field.

NOTE: If your Terminal shows an error like CliFileNotFoundError: File ~/.ask/cli_config not exists., you need to first initialize the ASK CLI with the command npx ask configure. This command will open a browser with a login screen, and you should enter the credentials for the Amazon account with which you created the LWA Security Profile previously. After signing in, return to your Terminal and enter n to decline linking your AWS account. After completing this process, try the generate-lwa-tokens command above again.

NOTE: If your Terminal shows an error like CliError: invalid_client, make sure that you have included the quotation marks (") around the --client_id and --client-confirmation arguments.

Add Alexa Developer Credentials to AWS SSM Parameter Store / AWS Secrets Manager

Our AWS CDK project requires access to the Alexa Developer credentials we just generated (Client ID, Client Secret, Refresh Token) in order to create and manage our Skill. To avoid hard-coding these values into our code, we can store the values in AWS Systems Manager (SSM) Parameter Store and AWS Secrets Manager, and then retrieve them programmatically when deploying our CDK project. In our case, we are using SSM Parameter Store to store the non-sensitive values in plaintext, and Secrets Manager to store the secret values in encrypted form.

The repository contains a shell script at scripts/upload-credentials.sh that can create the appropriate parameters and secrets via AWS CLI. You’ll just need to supply the credential values from the previous steps. Alternatively, instructions for creating parameters and secrets via the AWS Console or AWS CLI can each be found in the AWS Systems Manager User Guide and AWS Secrets Manager User Guide.

You will need the following resources created in your AWS account before proceeding:

Name	Service	Type
/alexa-cdk-blog/alexa-developer-vendor-id	SSM Parameter Store	String
/alexa-cdk-blog/lwa-client-id	SSM Parameter Store	String
/alexa-cdk-blog/lwa-client-secret	Secrets Manager	Plaintext / secret-string
/alexa-cdk-blog/lwa-refresh-token	Secrets Manager	Plaintext / secret-string

Code Walkthrough

Skill Package

When you programmatically create an Alexa Skill, you supply a Skill Package, which is a zip file consisting of a set of files defining your Skill. A skill package includes a manifest JSON file, and optionally a set of interaction model files, in-skill product files, and/or image assets for your skill. See the Skill Management API documentation for details regarding skill packages.

The repository contains a skill package that defines a simple Time Teller Skill at src/skill-package. If you want to use an existing Skill instead, replace the contents of src/skill-package with your skill package.

If you want to export the skill package of an existing Skill, use the ASK CLI:

Navigate to the Alexa Developer console and log in with your Amazon account.
Find the Skill you want to export and click the link under the name “Copy Skill ID”. Either make sure this stays on your clipboard or note the Skill ID for the next step.
Navigate to your local Terminal and enter the following command, replacing <your Skill ID> with your Skill ID:

# ensure you are in the root directory of the repository
cd src
npx ask smapi export-package --stage development --skill-id <your Skill ID>

NOTE: To export the skill package for a live skill, replace --stage development with --stage live.

NOTE: The CDK code in this solution will dynamically populate the manifest.apis section in skill.json. If that section is populated in your skill package, either clear it out or know that it will be replaced when the project is deployed.

Skill Backend Lambda Function

The Lambda Function code for the Time Teller Alexa Skill’s backend also resides within the CDK project at src/lambda/skill-backend. If you want to use an existing Skill instead, replace the contents of src/lambda/skill-backend with your Lambda code. Also note the following if you want to use your own Lambda code:

The CDK code in the repository assumes that the Lambda Function runtime is Python. However, you can modify for another runtime if necessary by using either the aws-lambda or aws-lambda-nodejs CDK module instead of aws-lambda-python.
If you’re using your own Python Lambda Function code, please note the following to ensure the Lambda Function definition compatibility in the sample CDK project. If your Lambda Function varies from what is below, then you may need to modify the CDK code. See the Python Lambda code in the repository for an example.
- The skill-backend/ directory should contain all of the necessary resources for your Lambda Function. For Python functions, this should include at least a file named index.py that contains your Lambda entrypoint, and a requirements.txt file containing your pip dependencies.
- For Python functions, your Lambda handler function should be called handler(). This generally looks like handler = SkillBuilder().lambda_handler() when using the Python ASK SDK.

Open-Source Alexa Skill Construct Library

As mentioned above, this solution utilizes an open-source construct library to create and manage the Alexa Skill. This construct library utilizes the L1 CfnSkill construct along with other L1 and L2 constructs to create a complete Alexa Skill with a functioning backend Lambda Function. Utilizing this construct library means that we are no longer limited by the shortcomings of only using the Alexa::ASK::Skill CloudFormation resource or L1 CfnSkill construct.

Look into the construct library code if you’re curious. There’s only one construct—Skill—and you can follow the code to see how it dodges the Lambda Permission issue.

CDK Stack

The CDK stack code is located in lib/alexa-cdk-stack.ts. Let’s dive in to understand what’s happening. We’ll look at one section at a time:

...
const PARAM_PREFIX = '/alexa-cdk-blog/'

export class AlexaCdkStack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Get Alexa Developer credentials from SSM Parameter Store/Secrets Manager.
    // NOTE: Parameters and secrets must have been created in the appropriate account before running `cdk deploy` on this stack.
    //       See sample script at scripts/upload-credentials.sh for how to create appropriate resources via AWS CLI.
    const alexaVendorId = ssm.StringParameter.valueForStringParameter(this, `${PARAM_PREFIX}alexa-developer-vendor-id`);
    const lwaClientId = ssm.StringParameter.valueForStringParameter(this, `${PARAM_PREFIX}lwa-client-id`);
    const lwaClientSecret = cdk.SecretValue.secretsManager(`${PARAM_PREFIX}lwa-client-secret`);
    const lwaRefreshToken = cdk.SecretValue.secretsManager(`${PARAM_PREFIX}lwa-refresh-token`);
    ...
  }
}

First, within the stack’s constructor, after calling the constructor of the base class, we retrieve the credentials we uploaded earlier to SSM and Secrets Manager. This lets us to store our account credentials in a safe place—encrypted in the case of our lwaClientSecret and lwaRefreshToken secrets—and we avoid storing sensitive data in plaintext or source control.

...
export class AlexaCdkStack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    ...
    // Create the Lambda Function for the Skill Backend
    const skillBackend = new lambdaPython.PythonFunction(this, 'SkillBackend', {
      entry: 'src/lambda/skill-backend',
      timeout: cdk.Duration.seconds(7)
    });
    ...
  }
}

Next, we create the Lambda Function containing the skill’s backend logic. In this case, we are using the aws-lambda-python module. This transparently handles every aspect of the dependency installation and packaging for us. Rather than leave the default 3-second timeout, specify a 7-second timeout to correspond with the Alexa service timeout of 8 seconds.

...

export class AlexaCdkStack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    ...
    // Create the Alexa Skill
    const skill = new Skill(this, 'Skill', {
      endpointLambdaFunction: skillBackend,
      skillPackagePath: 'src/skill-package',
      alexaVendorId: alexaVendorId,
      lwaClientId: lwaClientId,
      lwaClientSecret: lwaClientSecret,
      lwaRefreshToken: lwaRefreshToken
    });
  }
}

Finally, we create our Skill! All we need to do is pass the Lambda Function with the Skill’s backend code into where the skill package is located, as well as the credentials for authenticating into our Alexa Developer account. All of the wiring for deploying the skill package and connecting the Lambda Function to the Skill is handled transparently within the construct code.

Deploy CDK project

Now that all of our code is in place, we can deploy our project and test it out!

Make sure that you have bootstrapped your AWS account for CDK. If not, you can bootstrap with the following command:

# ensure you are in the root directory of the repository
npx cdk bootstrap

Make sure that the Docker daemon is running locally. This is generally done by starting the Docker Desktop application.
- You can also use the following Terminal command to determine whether the Docker daemon is running. The command will return an error if the daemon is not running.

docker ps -q

- See more details regarding starting the Docker daemon based on your operating system via the Docker website.
Synthesize your CDK project in order to confirm that your project is building properly.

# ensure you are in the root directory of the repository
npx cdk synth

NOTE: In addition to generating the CloudFormation template for this project, this command also bundles the Lambda Function code via Docker, so it may take a few minutes to complete.

Deploy!

# ensure you are in the root directory of the repository
npx cdk deploy

- Feel free to review the IAM policies that will be created, and enter y to continue when prompted.
- If you would like to skip the security approval requirement and deploy in one step, use cdk deploy --require-approval never instead.

Check it out!

Once your project finishes deploying, take a look at your new Skill!

Navigate to the Alexa Developer console and log in with your Amazon account.
After logging in, on the main screen you should now see your new Skill listed. Click on the name to go to the “Build” screen.
Investigate the console to confirm that your Skill was created as expected.
On the left-hand navigation menu, click “Endpoint” and confirm that the ARN for your backend Lambda Function is showing in the “Default Region” field. This ARN was added dynamically by our CDK project.

Screenshot of Alexa Developer console showing location of Endpoint text box

Test the Skill to confirm that it functions properly.
1. Click on the “Test” tab and enable testing for the “Development” stage of your skill.
2. Type your Skill’s invocation name in the Alexa Simulator in order to launch the skill and invoke a response.
  - If you deployed the sample skill package and Lambda Function, the invocation name is “time teller”. If Alexa responds with the current time in UTC, it is working properly!

Bonus Points

Now that you can deploy your Alexa Skill via the AWS CDK, can you incorporate your new project into a CI/CD pipeline for automated deployments? Extra kudos if the pipeline is defined with the CDK 🙂 Follow these links for some inspiration:

Cleanup

After you are finished, delete the resources you created to avoid incurring future charges. This can be easily done by deleting the CloudFormation stack from the CloudFormation console, or by executing the following command in your Terminal, which has the same effect:

# ensure you are in the root directory of the repository
npx cdk destroy

Conclusion

You can, and should, strive for IaC and CI/CD in every project, and the powerful AWS CDK features make that easier with a set of simple yet flexible constructs. Leverage the simplicity of declarative infrastructure definitions with convenient default configurations and helper methods via the AWS CDK. This example also reveals that if there are any gaps in the built-in functionality, you can easily fill them with a custom resource construct, or one of the thousands of open-source construct libraries shared by fellow CDK developers around the world. Happy coding!

Jeff Gardner

Jeff Gardner is a Solutions Architect with Amazon Web Services (AWS). In his role, Jeff helps enterprise customers through their cloud journey, leveraging his experience with application architecture and DevOps practices. Outside of work, Jeff enjoys watching and playing sports and chasing around his three young children.

Building an end-to-end Kubernetes-based DevSecOps software factory on AWS

2021-06-26 Srinivas Manepalli

Post Syndicated from Srinivas Manepalli original https://aws.amazon.com/blogs/devops/building-an-end-to-end-kubernetes-based-devsecops-software-factory-on-aws/

DevSecOps software factory implementation can significantly vary depending on the application, infrastructure, architecture, and the services and tools used. In a previous post, I provided an end-to-end DevSecOps pipeline for a three-tier web application deployed with AWS Elastic Beanstalk. The pipeline used cloud-native services along with a few open-source security tools. This solution is similar, but instead uses a containers-based approach with additional security analysis stages. It defines a software factory using Kubernetes along with necessary AWS Cloud-native services and open-source third-party tools. Code is provided in the GitHub repo to build this DevSecOps software factory, including the integration code for third-party scanning tools.

DevOps is a combination of cultural philosophies, practices, and tools that combine software development with information technology operations. These combined practices enable companies to deliver new application features and improved services to customers at a higher velocity. DevSecOps takes this a step further by integrating and automating the enforcement of preventive, detective, and responsive security controls into the pipeline.

In a DevSecOps factory, security needs to be addressed from two aspects: security of the software factory, and security in the software factory. In this architecture, we use AWS services to address the security of the software factory, and use third-party tools along with AWS services to address the security in the software factory. This AWS DevSecOps reference architecture covers DevSecOps practices and security vulnerability scanning stages including secret analysis, SCA (Software Composite Analysis), SAST (Static Application Security Testing), DAST (Dynamic Application Security Testing), RASP (Runtime Application Self Protection), and aggregation of vulnerability findings into a single pane of glass.

The focus of this post is on application vulnerability scanning. Vulnerability scanning of underlying infrastructure such as the Amazon Elastic Kubernetes Service (Amazon EKS) cluster and network is outside the scope of this post. For information about infrastructure-level security planning, refer to Amazon Guard Duty, Amazon Inspector, and AWS Shield.

You can deploy this pipeline in either the AWS GovCloud (US) Region or standard AWS Regions. All listed AWS services are authorized for FedRamp High and DoD SRG IL4/IL5.

Security and compliance

Thoroughly implementing security and compliance in the public sector and other highly regulated workloads is very important for achieving an ATO (Authority to Operate) and continuously maintain an ATO (c-ATO). DevSecOps shifts security left in the process, integrating it at each stage of the software factory, which can make ATO a continuous and faster process. With DevSecOps, an organization can deliver secure and compliant application changes rapidly while running operations consistently with automation.

Security and compliance are shared responsibilities between AWS and the customer. Depending on the compliance requirements (such as FedRamp or DoD SRG), a DevSecOps software factory needs to implement certain security controls. AWS provides tools and services to implement most of these controls. For example, to address NIST 800-53 security controls families such as access control, you can use AWS Identity Access and Management (IAM) roles and Amazon Simple Storage Service (Amazon S3) bucket policies. To address auditing and accountability, you can use AWS CloudTrail and Amazon CloudWatch. To address configuration management, you can use AWS Config rules and AWS Systems Manager. Similarly, to address risk assessment, you can use vulnerability scanning tools from AWS.

The following table is the high-level mapping of the NIST 800-53 security control families and AWS services that are used in this DevSecOps reference architecture. This list only includes the services that are defined in the AWS CloudFormation template, which provides pipeline as code in this solution. You can use additional AWS services and tools or other environmental specific services and tools to address these and the remaining security control families on a more granular level.

#	NIST 800-53 Security Control Family – Rev 5	AWS Services Used (In this DevSecOps Pipeline)
1	AC – Access Control	AWS IAM, Amazon S3, and Amazon CloudWatch are used. AWS::IAM::ManagedPolicy AWS::IAM::Role AWS::S3::BucketPolicy AWS::CloudWatch::Alarm
2	AU – Audit and Accountability	AWS CloudTrail, Amazon S3, Amazon SNS, and Amazon CloudWatch are used. AWS::CloudTrail::Trail AWS::Events::Rule AWS::CloudWatch::LogGroup AWS::CloudWatch::Alarm AWS::SNS::Topic
3	CM – Configuration Management	AWS Systems Manager, Amazon S3, and AWS Config are used. AWS::SSM::Parameter AWS::S3::Bucket AWS::Config::ConfigRule
4	CP – Contingency Planning	AWS CodeCommit and Amazon S3 are used. AWS::CodeCommit::Repository AWS::S3::Bucket
5	IA – Identification and Authentication	AWS IAM is used. AWS:IAM:User AWS::IAM::Role
6	RA – Risk Assessment	AWS Config, AWS CloudTrail, AWS Security Hub, and third party scanning tools are used. AWS::Config::ConfigRule AWS::CloudTrail::Trail AWS::SecurityHub::Hub Vulnerability Scanning Tools (AWS/AWS Partner/3rd party)
7	CA – Assessment, Authorization, and Monitoring	AWS CloudTrail, Amazon CloudWatch, and AWS Config are used. AWS::CloudTrail::Trail AWS::CloudWatch::LogGroup AWS::CloudWatch::Alarm AWS::Config::ConfigRule
8	SC – System and Communications Protection	AWS KMS and AWS Systems Manager are used. AWS::KMS::Key AWS::SSM::Parameter SSL/TLS communication
9	SI – System and Information Integrity	AWS Security Hub, and third party scanning tools are used. AWS::SecurityHub::Hub Vulnerability Scanning Tools (AWS/AWS Partner/3rd party)
10	AT – Awareness and Training	N/A
11	SA – System and Services Acquisition	N/A
12	IR – Incident Response	Not implemented, but services like AWS Lambda, and Amazon CloudWatch Events can be used.
13	MA – Maintenance	N/A
14	MP – Media Protection	N/A
15	PS – Personnel Security	N/A
16	PE – Physical and Environmental Protection	N/A
17	PL – Planning	N/A
18	PM – Program Management	N/A
19	PT – PII Processing and Transparency	N/A
20	SR – SupplyChain Risk Management	N/A

Services and tools

In this section, we discuss the various AWS services and third-party tools used in this solution.

CI/CD services

For continuous integration and continuous delivery (CI/CD) in this reference architecture, we use the following AWS services:

AWS CodeBuild – A fully managed continuous integration service that compiles source code, runs tests, and produces software packages that are ready to deploy.
AWS CodeCommit – A fully managed source control service that hosts secure Git-based repositories.
AWS CodeDeploy – A fully managed deployment service that automates software deployments to a variety of compute services such as Amazon Elastic Compute Cloud (Amazon EC2), AWS Fargate, AWS Lambda, and your on-premises servers.
AWS CodePipeline – A fully managed continuous delivery service that helps you automate your release pipelines for fast and reliable application and infrastructure updates.
AWS Lambda – A service that lets you run code without provisioning or managing servers. You pay only for the compute time you consume.
Amazon Simple Notification Service – Amazon SNS is a fully managed messaging service for both application-to-application (A2A) and application-to-person (A2P) communication.
Amazon S3 – Amazon S3 is storage for the internet. You can use Amazon S3 to store and retrieve any amount of data at any time, from anywhere on the web.
AWS Systems Manager Parameter Store – Parameter Store provides secure, hierarchical storage for configuration data management and secrets management.

Continuous testing tools

The following are open-source scanning tools that are integrated in the pipeline for the purpose of this post, but you could integrate other tools that meet your specific requirements. You can use the static code review tool Amazon CodeGuru for static analysis, but at the time of this writing, it’s not yet available in AWS GovCloud and currently supports Java and Python.

Anchore (SCA and SAST) – Anchore Engine is an open-source software system that provides a centralized service for analyzing container images, scanning for security vulnerabilities, and enforcing deployment policies.
Amazon Elastic Container Registry image scanning – Amazon ECR image scanning helps in identifying software vulnerabilities in your container images. Amazon ECR uses the Common Vulnerabilities and Exposures (CVEs) database from the open-source Clair project and provides a list of scan findings.
Git-Secrets (Secrets Scanning) – Prevents you from committing sensitive information to Git repositories. It is an open-source tool from AWS Labs.
OWASP ZAP (DAST) – Helps you automatically find security vulnerabilities in your web applications while you’re developing and testing your applications.
Snyk (SCA and SAST) – Snyk is an open-source security platform designed to help software-driven businesses enhance developer security.
Sysdig Falco (RASP) – Falco is an open source cloud-native runtime security project that detects unexpected application behavior and alerts on threats at runtime. It is the first runtime security project to join CNCF as an incubation-level project.

You can integrate additional security stages like IAST (Interactive Application Security Testing) into the pipeline to get code insights while the application is running. You can use AWS partner tools like Contrast Security, Synopsys, and WhiteSource to integrate IAST scanning into the pipeline. Malware scanning tools, and image signing tools can also be integrated into the pipeline for additional security.

Continuous logging and monitoring services

The following are AWS services for continuous logging and monitoring used in this reference architecture:

Amazon CloudWatch Events – Delivers a near-real-time stream of system events that describe changes in AWS resources
Amazon CloudWatch Logs – Allows you to monitor, store, and access your log files from EC2 instances, CloudTrail, Amazon Route 53, and other sources

Auditing and governance services

The following are AWS auditing and governance services used in this reference architecture:

AWS CloudTrail – Enables governance, compliance, operational auditing, and risk auditing of your AWS account.
AWS Config – Allows you to assess, audit, and evaluate the configurations of your AWS resources.
AWS Identity and Access Management – Enables you to manage access to AWS services and resources securely. With IAM, you can create and manage AWS users and groups, and use permissions to allow and deny their access to AWS resources.

Operations services

The following are the AWS operations services used in this reference architecture:

AWS CloudFormation – Gives you an easy way to model a collection of related AWS and third-party resources, provision them quickly and consistently, and manage them throughout their lifecycles, by treating infrastructure as code.
Amazon ECR – A fully managed container registry that makes it easy to store, manage, share, and deploy your container images and artifacts anywhere.
Amazon EKS – A managed service that you can use to run Kubernetes on AWS without needing to install, operate, and maintain your own Kubernetes control plane or nodes. Amazon EKS runs up-to-date versions of the open-source Kubernetes software, so you can use all of the existing plugins and tooling from the Kubernetes community.
AWS Security Hub – Gives you a comprehensive view of your security alerts and security posture across your AWS accounts. This post uses Security Hub to aggregate all the vulnerability findings as a single pane of glass.
AWS Systems Manager Parameter Store – Provides secure, hierarchical storage for configuration data management and secrets management. You can store data such as passwords, database strings, Amazon Machine Image (AMI) IDs, and license codes as parameter values.

Pipeline architecture

The following diagram shows the architecture of the solution. We use AWS CloudFormation to describe the pipeline as code.

Kubernetes DevSecOps Pipeline Architecture

The main steps are as follows:

1. When a user commits the code to CodeCommit repository, a CloudWatch event is generated, which triggers CodePipeline to orchestrate the events.
2. CodeBuild packages the build and uploads the artifacts to an S3 bucket.
3. CodeBuild scans the code with git-secrets. If there is any sensitive information in the code such as AWS access keys or secrets keys, CodeBuild fails the build.
4. CodeBuild creates the container image and perform SCA and SAST by scanning the image with Snyk or Anchore. In the provided CloudFormation template, you can pick one of these tools during the deployment. Please note, CodeBuild is fully enabled for a “bring your own tool” approach.
  - (4a) If there are any vulnerabilities, CodeBuild invokes the Lambda function. The function parses the results into AWS Security Finding Format (ASFF) and posts them to Security Hub. Security Hub helps aggregate and view all the vulnerability findings in one place as a single pane of glass. The Lambda function also uploads the scanning results to an S3 bucket.
  - (4b) If there are no vulnerabilities, CodeBuild pushes the container image to Amazon ECR and triggers another scan using built-in Amazon ECR scanning.
5. CodeBuild retrieves the scanning results.
  - (5a) If there are any vulnerabilities, CodeBuild invokes the Lambda function again and posts the findings to Security Hub. The Lambda function also uploads the scan results to an S3 bucket.
  - (5b) If there are no vulnerabilities, CodeBuild deploys the container image to an Amazon EKS staging environment.
6. After the deployment succeeds, CodeBuild triggers the DAST scanning with the OWASP ZAP tool (again, this is fully enabled for a “bring your own tool” approach).
  - (6a) If there are any vulnerabilities, CodeBuild invokes the Lambda function, which parses the results into ASFF and posts it to Security Hub. The function also uploads the scan results to an S3 bucket (similar to step 4a).
7. If there are no vulnerabilities, the approval stage is triggered, and an email is sent to the approver for action via Amazon SNS.
8. After approval, CodeBuild deploys the code to the production Amazon EKS environment.
9. During the pipeline run, CloudWatch Events captures the build state changes and sends email notifications to subscribed users through Amazon SNS.
10. CloudTrail tracks the API calls and sends notifications on critical events on the pipeline and CodeBuild projects, such as UpdatePipeline, DeletePipeline, CreateProject, and DeleteProject, for auditing purposes.
11. AWS Config tracks all the configuration changes of AWS services. The following AWS Config rules are added in this pipeline as security best practices:
  1. CODEBUILD_PROJECT_ENVVAR_AWSCRED_CHECK – Checks whether the project contains environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. The rule is NON_COMPLIANT when the project environment variables contain plaintext credentials. This rule ensures that sensitive information isn’t stored in the CodeBuild project environment variables.
  2. CLOUD_TRAIL_LOG_FILE_VALIDATION_ENABLED – Checks whether CloudTrail creates a signed digest file with logs. AWS recommends that the file validation be enabled on all trails. The rule is noncompliant if the validation is not enabled. This rule ensures that pipeline resources such as the CodeBuild project aren’t altered to bypass critical vulnerability checks.

Security of the pipeline is implemented using IAM roles and S3 bucket policies to restrict access to pipeline resources. Pipeline data at rest and in transit is protected using encryption and SSL secure transport. We use Parameter Store to store sensitive information such as API tokens and passwords. To be fully compliant with frameworks such as FedRAMP, other things may be required, such as MFA.

Security in the pipeline is implemented by performing the Secret Analysis, SCA, SAST, DAST, and RASP security checks. Applicable AWS services provide encryption at rest and in transit by default. You can enable additional controls on top of these wherever required.

In the next section, I explain how to deploy and run the pipeline CloudFormation template used for this example. As a best practice, we recommend using linting tools like cfn-nag and cfn-guard to scan CloudFormation templates for security vulnerabilities. Refer to the provided service links to learn more about each of the services in the pipeline.

Prerequisites

Before getting started, make sure you have the following prerequisites:

An EKS cluster environment with your application deployed. In this post, we use PHP WordPress as a sample application, but you can use any other application.
Sysdig Falco installed on an EKS cluster. Sysdig Falco captures events on the EKS cluster and sends those events to CloudWatch using AWS FireLens. For implementation instructions, see Implementing Runtime security in Amazon EKS using CNCF Falco. This step is required only if you need to implement RASP in the software factory.
A CodeCommit repo with your application code and a Dockerfile. For more information, see Create an AWS CodeCommit repository.
An Amazon ECR repo to store container images and scan for vulnerabilities. Enable vulnerability scanning on image push in Amazon ECR. You can enable or disable the automatic scanning on image push via the Amazon ECR
The provided buildspec-*.yml files for git-secrets, Anchore, Snyk, Amazon ECR, OWASP ZAP, and your Kubernetes deployment .yml files uploaded to the root of the application code repository. Please update the Kubernetes (kubectl) commands in the buildspec files as needed.
A Snyk API key if you use Snyk as a SAST tool.
The Lambda function uploaded to an S3 bucket. We use this function to parse the scan reports and post the results to Security Hub.
An OWASP ZAP URL and generated API key for dynamic web scanning.
An application web URL to run the DAST testing.
An email address to receive approval notifications for deployment, pipeline change notifications, and CloudTrail events.
AWS Config and Security Hub services enabled. For instructions, see Managing the Configuration Recorder and Enabling Security Hub manually, respectively.

Deploying the pipeline

To deploy the pipeline, complete the following steps:

Download the CloudFormation template and pipeline code from the GitHub repo.
Sign in to your AWS account if you have not done so already.
On the CloudFormation console, choose Create Stack.
Choose the CloudFormation pipeline template.
Choose Next.
Under Code, provide the following information:
1. Code details, such as repository name and the branch to trigger the pipeline.
2. The Amazon ECR container image repository name.
Under SAST, provide the following information:
1. Choose the SAST tool (Anchore or Snyk) for code analysis.
2. If you select Snyk, provide an API key for Snyk.
Under DAST, choose the DAST tool (OWASP ZAP) for dynamic testing and enter the API token, DAST tool URL, and the application URL to run the scan.
Under Lambda functions, enter the Lambda function S3 bucket name, filename, and the handler name.
For STG EKS cluster, enter the staging EKS cluster name.
For PRD EKS cluster, enter the production EKS cluster name to which this pipeline deploys the container image.
Under General, enter the email addresses to receive notifications for approvals and pipeline status changes.
Choose Next.
Complete the stack.
After the pipeline is deployed, confirm the subscription by choosing the provided link in the email to receive notifications.

Pipeline CloudFormation Parameters

The provided CloudFormation template in this post is formatted for AWS GovCloud. If you’re setting this up in a standard Region, you have to adjust the partition name in the CloudFormation template. For example, change ARN values from arn:aws-us-gov to arn:aws.

Running the pipeline

To trigger the pipeline, commit changes to your application repository files. That generates a CloudWatch event and triggers the pipeline. CodeBuild scans the code and if there are any vulnerabilities, it invokes the Lambda function to parse and post the results to Security Hub.

When posting the vulnerability finding information to Security Hub, we need to provide a vulnerability severity level. Based on the provided severity value, Security Hub assigns the label as follows. Adjust the severity levels in your code based on your organization’s requirements.

0 – INFORMATIONAL
1–39 – LOW
40– 69 – MEDIUM
70–89 – HIGH
90–100 – CRITICAL

The following screenshot shows the progression of your pipeline.

DevSecOps Kubernetes CI/CD Pipeline

Secrets analysis scanning

In this architecture, after the pipeline is initiated, CodeBuild triggers the Secret Analysis stage using git-secrets and the buildspec-gitsecrets.yml file. Git-Secrets looks for any sensitive information such as AWS access keys and secret access keys. Git-Secrets allows you to add custom strings to look for in your analysis. CodeBuild uses the provided buildspec-gitsecrets.yml file during the build stage.

SCA and SAST scanning

In this architecture, CodeBuild triggers the SCA and SAST scanning using Anchore, Snyk, and Amazon ECR. In this solution, we use the open-source versions of Anchore and Snyk. Amazon ECR uses open-source Clair under the hood, which comes with Amazon ECR for no additional cost. As mentioned earlier, you can choose Anchore or Snyk to do the initial image scanning.

Scanning with Anchore

If you choose Anchore as a SAST tool during the deployment, the build stage uses the buildspec-anchore.yml file to scan the container image. If there are any vulnerabilities, it fails the build and triggers the Lambda function to post those findings to Security Hub. If there are no vulnerabilities, it proceeds to next stage.

Anchore Lambda Code Snippet

Scanning with Snyk

If you choose Snyk as a SAST tool during the deployment, the build stage uses the buildspec-snyk.yml file to scan the container image. If there are any vulnerabilities, it fails the build and triggers the Lambda function to post those findings to Security Hub. If there are no vulnerabilities, it proceeds to next stage.

Snyk Lambda Code Snippet

Scanning with Amazon ECR

If there are no vulnerabilities from Anchore or Snyk scanning, the image is pushed to Amazon ECR, and the Amazon ECR scan is triggered automatically. Amazon ECR lists the vulnerability findings on the Amazon ECR console. To provide a single pane of glass view of all the vulnerability findings and for easy administration, we retrieve those findings and post them to Security Hub. If there are no vulnerabilities, the image is deployed to the EKS staging cluster and next stage (DAST scanning) is triggered.

ECR Lambda Code Snippet

DAST scanning with OWASP ZAP

In this architecture, CodeBuild triggers DAST scanning using the DAST tool OWASP ZAP.

After deployment is successful, CodeBuild initiates the DAST scanning. When scanning is complete, if there are any vulnerabilities, it invokes the Lambda function, similar to SAST analysis. The function parses and posts the results to Security Hub. The following is the code snippet of the Lambda function.

Zap Lambda Code Snippet

The following screenshot shows the results in Security Hub. The highlighted section shows the vulnerability findings from various scanning stages.

Vulnerability Findings in Security Hub

We can drill down to individual resource IDs to get the list of vulnerability findings. For example, if we drill down to the resource ID of SASTBuildProject*, we can review all the findings from that resource ID.

SAST Vulnerabilities in Security Hub

If there are no vulnerabilities in the DAST scan, the pipeline proceeds to the manual approval stage and an email is sent to the approver. The approver can review and approve or reject the deployment. If approved, the pipeline moves to next stage and deploys the application to the production EKS cluster.

Aggregation of vulnerability findings in Security Hub provides opportunities to automate the remediation. For example, based on the vulnerability finding, you can trigger a Lambda function to take the needed remediation action. This also reduces the burden on operations and security teams because they can now address the vulnerabilities from a single pane of glass instead of logging into multiple tool dashboards.

Along with Security Hub, you can send vulnerability findings to your issue tracking systems such as JIRA, Systems Manager SysOps, or can automatically create an incident management ticket. This is outside the scope of this post, but is one of the possibilities you can consider when implementing DevSecOps software factories.

RASP scanning

Sysdig Falco is an open-source runtime security tool. Based on the configured rules, Falco can detect suspicious activity and alert on any behavior that involves making Linux system calls. You can use Falco rules to address security controls like NIST SP 800-53. Falco agents on each EKS node continuously scan the containers running in pods and send the events as STDOUT. These events can be then sent to CloudWatch or any third-party log aggregator to send alerts and respond. For more information, see Implementing Runtime security in Amazon EKS using CNCF Falco. You can also use Lambda to trigger and automatically remediate certain security events.

The following screenshot shows Falco events on the CloudWatch console. The highlighted text describes the Falco event that was triggered based on the default Falco rules on the EKS cluster. You can add additional custom rules to meet your security control requirements. You can also trigger responsive actions from these CloudWatch events using services like Lambda.

Falco alerts in CloudWatch

Cleanup

This section provides instructions to clean up the DevSecOps pipeline setup:

Conclusion

In this post, I presented an end-to-end Kubernetes-based DevSecOps software factory on AWS with continuous testing, continuous logging and monitoring, auditing and governance, and operations. I demonstrated how to integrate various open-source scanning tools, such as Git-Secrets, Anchore, Snyk, OWASP ZAP, and Sysdig Falco for Secret Analysis, SCA, SAST, DAST, and RASP analysis, respectively. To reduce operations overhead, I explained how to aggregate and manage vulnerability findings in Security Hub as a single pane of glass. This post also talked about how to implement security of the pipeline and in the pipeline using AWS Cloud-native services. Finally, I provided the DevSecOps software factory as code using AWS CloudFormation.

To get started with DevSecOps on AWS, see AWS DevOps and the DevOps blog.

Srinivas Manepalli is a DevSecOps Solutions Architect in the U.S. Fed SI SA team at Amazon Web Services (AWS). He is passionate about helping customers, building and architecting DevSecOps and highly available software systems. Outside of work, he enjoys spending time with family, nature and good food.

Create a portable root CA using AWS CloudHSM and ACM Private CA

2021-06-24 J.D. Bean

Post Syndicated from J.D. Bean original https://aws.amazon.com/blogs/security/create-a-portable-root-ca-using-aws-cloudhsm-and-acm-private-ca/

With AWS Certificate Manager Private Certificate Authority (ACM Private CA) you can create private certificate authority (CA) hierarchies, including root and subordinate CAs, without the investment and maintenance costs of operating an on-premises CA.

In this post, I will explain how you can use ACM Private CA with AWS CloudHSM to operate a hybrid public key infrastructure (PKI) in which the root CA is in CloudHSM, and the subordinate CAs are in ACM Private CA. In this configuration your root CA is portable, meaning that it can be securely moved outside of the AWS Region in which it was created.

Important: This post assumes that you are familiar with the ideas of CA trust and hierarchy. The example in this post uses an advanced hybrid configuration for operating PKI.

The Challenge

The root CA private key of your CA hierarchy represents the anchor of trust for all CAs and end entities that use certificates from that hierarchy. A root CA private key generated by ACM Private CA cannot be exported or transferred to another party. You may require the flexibility to move control of your root CA in the future. Situations where you may want to move control of a root CA include cases such as a divestiture of a corporate division or a major corporate reorganization. In this post, I will describe one solution for a hybrid PKI architecture that allows you to take advantage of the availability of ACM Private CA for certificate issuance, while maintaining the flexibility offered by having direct control and portability of your root CA key. The solution I detail in this post uses CloudHSM to create a root CA key that is predominantly kept inactive, along with a signed subordinate CA that is created and managed online in ACM Private CA that you can use for regular issuing of further subordinate or end-entity certificates. In the next section, I show you how you can achieve this.

The hybrid ACM Private CA and CloudHSM solution

With AWS CloudHSM, you can create and use your own encryption keys that use FIPS 140-2 Level 3 validated HSMs. CloudHSM offers you the flexibility to integrate with your applications by using standard APIs, such as PKCS#11. Most importantly for this solution is that CloudHSM offers a suite of standards-compliant SDKs for you to create, export, and import keys. This can make it easy for you to securely exchange your keys with other commercially-available HSMs, as long as your configurations allow it.”

By using AWS CloudHSM to store and perform cryptographic operations with root CA private key, and by using ACM Private CA to manage a first-level subordinate CA key, you maintain a fully cloud-based infrastructure while still retaining access to – and control over – your root CA key pairs. You can keep the key pair of the root in CloudHSM, where you have the ability to escrow the keys, and only generate and use subordinate CAs in ACM Private CA. Figure 1 shows the high-level architecture of this solution.

Figure 1: Architecture overview of portable root CA with AWS CloudHSM and ACM Private CA

Note: The solution in this post creates the root CA and Subordinate CA 0a but does not demonstrate the steps to use Subordinate CA 0a to issue the remainder of the key hierarchy that is depicted in Figure 1.

This architecture relies on a root CA that you create and manage with AWS CloudHSM. The root CA is generally required for use in the following circumstances:

When you create the PKI.
When you need to replace a root CA.
When you need to configure a certificate revocation list (CRL) or Online Certificate Status Protocol (OCSP).

A single direct subordinate intermediate CA is created and managed with AWS ACM Private CA, which I will refer to as the primary subordinate CA, (Subordinate CA 0a in Figure 1). A certificate signing request (CSR) for this primary subordinate CA is then provided to the CloudHSM root CA, and the signed certificate and certificate chain is then imported to ACM Private CA. The primary subordinate CA in ACM Private CA is issued with the same validity duration as the CloudHSM root CA and in day-to-day practice plays the role of a root CA, acting as the single issuer of additional subordinate CAs. These second-level subordinate CAs (Subordinate CA 0b, Subordinate CA 1b, and Subordinate CA 2b in Figure 1) must be issued with a shorter validity period than the root CA or the primary subordinate CA, and may be used as typical subordinate CAs issuing end-entity certificates or further subordinate CAs as appropriate.

The root CA private key that is stored in CloudHSM can be exported to other commercially-available HSMs through a secure key export process if required, or can be taken offline. The CloudHSM cluster can be shut down, and the root CA private key can be securely retained in a CloudHSM backup. In the event that the root CA must be used, a CloudHSM cluster can be provisioned on demand, and the backup restored temporarily.

Prerequisites

To follow this walkthrough, you need to have the following in place:

A deployed, initialized, and activated CloudHSM cluster with one HSM device.
An Amazon Linux 2 Amazon Elastic Compute Cloud (Amazon EC2) instance. For ease of setup and access AWS Cloud9 can be used to provide a cloud-based environment running on a managed Amazon EC2 instance.
The AWS CloudHSM client installed and configured.
The AWS Command Line Interface (CLI) installed and configured.
A crypto user (CU) on the HSM to perform the steps in this post.

Process

In this post, you will create an ACM Private CA subordinate CA that is chained to a root CA that is created and managed with AWS CloudHSM. The high-level steps are as follows:

Create a root CA with AWS CloudHSM
Create a subordinate CA in ACM Private CA
Sign your subordinate CA with your root CA
Import the signed subordinate CA certificate in ACM Private CA
Remove any unused CloudHSM resources to reduce cost

To create a root CA with AWS CloudHSM

To install the AWS CloudHSM dynamic engine for OpenSSL on Amazon Linux 2, open a terminal on your Amazon Linux 2 EC2 instance and enter the following commands:

wget https://s3.amazonaws.com/cloudhsmv2-software/CloudHsmClient/EL7/cloudhsm-client-dyn-latest.el7.x86_64.rpm

sudo yum install -y ./cloudhsm-client-dyn-latest.el7.x86_64.rpm

To set an environment variable that contains your CU credentials, enter the following command, replacing USER and PASSWORD with your own information:
```
Export n3fips_password=USER:PASSWORD
```
To generate a private key using the AWS CloudHSM dynamic engine for OpenSSL, enter the following command:
```
openssl genrsa -engine cloudhsm -out Root_CA_FAKE_PEM.key
```
Note: This process exports a fake PEM private key from the HSM and saves it to a file. This file contains a reference to the private key that is stored on the HSM; it doesn’t contain the actual private key. You can use this fake PEM private key file and the AWS CloudHSM engine for OpenSSL to perform CA operations using the referenced private key within the HSM.
To generate a CSR for your certificate using the AWS CloudHSM dynamic engine for OpenSSL, enter the following command:
```
openssl req -engine cloudhsm -new -key Root_CA_FAKE_PEM.key -out Root_CA.csr
```
When prompted, enter your values for Country Name, State or Province Name, Locality Name, Organization Name, Organizational Unit Name, and Common Name. For the purposes of this walkthrough, you can leave the other fields blank.
Figure 2 shows an example result of running the command.

Figure 2: An example certificate signing request for your private key using AWS CloudHSM dynamic engine for OpenSSL
To sign your root CA with its own private key using the AWS CloudHSM dynamic engine for OpenSSL, enter the following command:
```
openssl x509 -engine cloudhsm -req -days 3650 -in Root_CA.csr -signkey Root_CA_FAKE_PEM.key -out Root_CA.crt
```

To create a subordinate CA in ACM Private CA

To create a CA configuration file for your subordinate CA, open a terminal on your Amazon Linux 2 EC2 instance and enter the following command. Replace each user input placeholder with your own information.

cat ‘{
  "KeyAlgorithm":"RSA_2048",
  "SigningAlgorithm":"SHA256WITHRSA",
  "Subject":{
    "Country":"US"
    "Organization":"Example Corp",
    "OrganizastionalUnit":"Sales",
    "State":"WA",
    "Locality":"Seattle",
    "CommonName":"www.example.com"
  }
}’ > ca_config.txt

To create a sample subordinate CA, enter the following command:
```
aws acm-pca create-certificate-authority --certificate-authority-configuration file://ca_config.txt --certificate-authority-type "SUBORDINATE" --tags Key=Name,Value=MyPrivateSubordinateCA
```
Figure 3 shows a sample successful result of this command.

Figure 3: A sample response from the acm-pca create-certificate-authority command.

For more information about how to create a CA in ACM Private CA and additional configuration options, see Procedures for Creating a CA in the ACM Private CA User Guide, and the acm-pca create-certificate-authority command in the AWS CLI Command Reference.

To sign the subordinate CA with the root CA

To retrieve the certificate signing request (CSR) for your subordinate CA, open a terminal on your Amazon Linux 2 EC2 instance and enter the following command. Replace each user input placeholder with your own information.
```
aws acm-pca get-certificate-authority-csr --certificate-authority-arn arn:aws:acm-pca:region:account:certificate-authority/12345678-1234-1234-1234-123456789012 > IntermediateCA.csr
```
For demonstration purposes, you can create a sample CA config file by entering the following command:
```
cat > ext.conf << EOF
subjectKeyIdentifier = hash
authorityKeyIdentifier = keyid,issuer
basicConstraints = critical, CA:true
keyUsage = critical, digitalSignature, cRLSign, keyCertSign
EOF
```
When you are ready to implement the solution in this post, you will need to create a root CA configuration file for signing the CSR for your subordinate CA. Details of your X.509 infrastructure, and the CA hierarchy within it, are beyond the scope of this post.

To sign the CSR for your subordinate CA using the sample minimalist CA application OpenSSL-CA, enter the following command:

openssl x509 -engine cloudhsm -extfile ext.conf -req -in IntermediateCA.csr -CA Root_CA.crt -CAkey Root_CA_FAKE_PEM.key -CAcreateserial -days 3650 -sha256 -out IntermediateCA.crt

Importing your signed Subordinate CA Certificate

To import the private CA certificate into ACM Private CA, open a terminal on your Amazon Linux 2 EC2 instance and enter the following command. Replace each user input placeholder with your own information.

aws acm-pca import-certificate-authority-certificate --certificate-authority-arn arn:aws:acm-pca:region:account:certificate-authority/1234678-1234-1234-123456789012 --certificate file://IntermediateCA.crt --certificate-chain file://Root_CA.crt

Shutting down CloudHSM resources

After you import your subordinate CA, it is available for use in ACM Private CA. You can configure the subordinate CA with the same validity period as the root CA, so that you can automate CA certificate management using and renewals using ACM without requiring regular access to the root CA. Typically you will create one or more intermediate issuing CAs with a shorter lifetime that chain up to the subordinate CA.

If you have enabled OCSP or CRL for your CA, you will need to maintain your CloudHSM in an active state in order to access the root CA private key for these functions. However, if you have no immediate need to access the root CA you can safely remove the CloudHSM resources while preserving your AWS CloudHSM cluster’s users, policies, and keys in an CloudHSM cluster backup stored encrypted in Amazon Simple Storage Service (Amazon S3).

To remove the CloudHSM resources

(Optional) If you don’t know the ID of the cluster that contains the HSM that you are deleting, or your HSM IP address, open a terminal on your Amazon Linux 2 EC2 instance and enter the describe-clusters command to find them.
Enter the following command, replacing cluster ID with the ID of the cluster that contains the HSM that you are deleting, and replacing HSM IP address with your HSM IP address.
```
aws cloudhsmv2 delete-hsm --cluster-id cluster ID --eni-ip HSM IP address
```

To disable expiration of your automatically generated CloudHSM backup

(Optional) If you don’t know the value for your backup ID, open a terminal on your Amazon Linux 2 EC2 instance and enter the describe-backups command to find it.
Enter the following command, replacing backup ID with the ID of the backup for your cluster.
```
aws cloudhsmv2 modify-backup-attributes --backup-id backup ID --never-expires
```

Later, when you do need to access your root CA private key in a CloudHSM, create a new HSM in the same cluster, and this action will restore the backup you previously created with the delete HSM operation.

Depending on your particular needs, you may also want to securely export a copy of the root CA private key to an offsite HSM by using key wrapping. You may need to do this to meet your requirements for managing the CA using another HSM or you may want to copy a cluster backup to a different AWS Region for disaster recovery purposes.

Summary

In this post, I explained an approach to establishing a PKI infrastructure using Amazon Certificate Manager Private Certificate Authority (ACM Private CA) with portable root CA private keys created and managed with AWS CloudHSM. This approach allows you to meet specific requirements for root CA portability that cannot be met by ACM Private CA alone. Before adopting this approach in production, you should carefully consider whether a portable root CA is a requirement for your use case, and review the ACM Private CA guide for Planning a Private CA.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS Certificate Manager forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

How to implement SaaS tenant isolation with ABAC and AWS IAM

2021-06-09 Michael Pelts

Post Syndicated from Michael Pelts original https://aws.amazon.com/blogs/security/how-to-implement-saas-tenant-isolation-with-abac-and-aws-iam/

Multi-tenant applications must be architected so that the resources of each tenant are isolated and cannot be accessed by other tenants in the system. AWS Identity and Access Management (IAM) is often a key element in achieving this goal. One of the challenges with using IAM, however, is that the number and complexity of IAM policies you need to support your tenants can grow rapidly and impact the scale and manageability of your isolation model. The attribute-based access control (ABAC) mechanism of IAM provides developers with a way to address this challenge.

In this blog post, we describe and provide detailed examples of how you can use ABAC in IAM to implement tenant isolation in a multi-tenant environment.

Choose an IAM isolation method

IAM makes it possible to implement tenant isolation and scope down permissions in a way that is integrated with other AWS services. By relying on IAM, you can create strong isolation foundations in your system, and reduce the risk of developers unintentionally introducing code that leads to a violation of tenant boundaries. IAM provides an AWS native, non-invasive way for you to achieve isolation for those cases where IAM supports policies that align with your overall isolation model.

There are several methods in IAM that you can use for isolating tenants and restricting access to resources. Choosing the right method for your application depends on several parameters. The number of tenants and the number of role definitions are two important dimensions that you should take into account.

Most applications require multiple role definitions for different user functions. A role definition refers to a minimal set of privileges that users or programmatic components need in order to do their job. For example, business users and data analysts would typically have different set of permissions to allow minimum necessary access to resources that they use.

In software-as-a-service (SaaS) applications, in addition to functional boundaries, there are also boundaries between tenant resources. As a result, the entire set of role definitions exists for each individual tenant. In highly dynamic environments (e.g., collaboration scenarios with cross-tenant access), new role definitions can be added ad-hoc. In such a case, the number of role definitions and their complexity can grow significantly as the system evolves.

There are three main tenant isolation methods in IAM. Let’s briefly review them before focusing on the ABAC in the following sections.

Figure 1: IAM tenant isolation methods

RBAC – Each tenant has a dedicated IAM role or static set of IAM roles that it uses for access to tenant resources. The number of IAM roles in RBAC equals to the number of role definitions multiplied by the number of tenants. RBAC works well when you have a small number of tenants and relatively static policies. You may find it difficult to manage multiple IAM roles as the number of tenants and the complexity of the attached policies grows.

Dynamically generated IAM policies – This method dynamically generates an IAM policy for a tenant according to user identity. Choose this method in highly dynamic environments with changing or frequently added role definitions (e.g., tenant collaboration scenario). You may also choose dynamically generated policies if you have a preference for generating and managing IAM policies by using your code rather than relying on built-in IAM service features. You can find more details about this method in the blog post Isolating SaaS Tenants with Dynamically Generated IAM Policies.

ABAC – This method is suitable for a wide range of SaaS applications, unless your use case requires support for frequently changed or added role definitions, which are easier to manage with dynamically generated IAM policies. Unlike Dynamically generated IAM policies, where you manage and apply policies through a self-managed mechanism, ABAC lets you rely more directly on IAM.

ABAC for tenant isolation

ABAC is achieved by using parameters (attributes) to control tenant access to resources. Using ABAC for tenant isolation results in temporary access to resources, which is restricted according to the caller’s identity and attributes.

One of the key advantages of the ABAC model is that it scales to any number of tenants with a single role. This is achieved by using tags (such as the tenant ID) in IAM polices and a temporary session created specifically for accessing tenant data. The session encapsulates the attributes of the requesting entity (for example, a tenant user). At policy evaluation time, IAM replaces these tags with session attributes.

Another component of ABAC is the assignation of attributes to tenant resources by using special naming conventions or resource tags. The access to a resource is granted when session and resource attributes match (for example, a session with the TenantID: yellow attribute can access a resource that is tagged as TenantID: yellow).

For more information about ABAC in IAM, see What is ABAC for AWS?

ABAC in a typical SaaS architecture

To demonstrate how you can use ABAC in IAM for tenant isolation, we will walk you through an example of a typical microservices-based application. More specifically, we will focus on two microservices that implement a shipment tracking flow in a multi-tenant ecommerce application.

Our sample tenant, Yellow, which has many users in many roles, has exclusive access to shipment data that belongs to this particular tenant. To achieve this, all microservices in the call chain operate in a restricted context that prevents cross-tenant access.

Figure 2: Sample shipment tracking flow in a SaaS application

Let’s take a closer look at the sequence of events and discuss the implementation in detail.

A shipment tracking request is initiated by an authenticated Tenant Yellow user. The authentication process is left out of the scope of this discussion for the sake of brevity. The user identity expressed in the JSON Web Token (JWT) includes custom claims, one of which is a TenantID. In this example, TenantID equals yellow.

The JWT is delivered from the user’s browser in the HTTP header of the Get Shipment request to the shipment service. The shipment service then authenticates the request and collects the required parameters for getting the shipment estimated time of arrival (ETA). The shipment service makes a GetShippingETA request using the parameters to the tracking service along with the JWT.

The tracking service manages shipment tracking data in a data repository. The repository stores data for all of the tenants, but each shipment record there has an attached TenantID resource tag, for instance yellow, as in our example.

An IAM role attached to the tracking service, called TrackingServiceRole in this example, determines the AWS resources that the microservice can access and the actions it can perform.

Note that TrackingServiceRole itself doesn’t have permissions to access tracking data in the data store. To get access to tracking records, the tracking service temporarily assumes another role called TrackingAccessRole. This role remains valid for a limited period of time, until credentials representing this temporary session expire.

To understand how this works, we need to talk first about the trust relationship between TrackingAccessRole and TrackingServiceRole. The following trust policy lists TrackingServiceRole as a trusted entity.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<account-id>:role/TrackingServiceRole"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<account-id>:role/TrackingServiceRole"
      },
      "Action": "sts:TagSession",
      "Condition": {
        "StringLike": {
          "aws:RequestTag/TenantID": "*"
        }
      }
    }
  ]
}

This policy needs to be associated with TrackingAccessRole. You can do that on the Trust relationships tab of the Role Details page in the IAM console or via the AWS CLI update-assume-role-policy method. That association is what allows the tracking service with the attached TrackingServiceRole role to assume TrackingAccessRole. The policy also allows TrackingServiceRole to attach the TenantID session tag to the temporary sessions it creates.

Session tags are principal tags that you specify when you request a session. This is how you inject variables into the request context for API calls executed during the session. This is what allows IAM policies evaluated in subsequent API calls to reference TenantID with the aws:PrincipalTag context key.

Now let’s talk about TrackingAccessPolicy. It’s an identity policy attached to TrackingAccessRole. This policy makes use of the aws:PrincipalTag/TenantID key to dynamically scope access to a specific tenant.

Later in this post, you can see examples of such data access policies for three different data storage services.

Now the stage is set to see how the tracking service creates a temporary session and injects TenantID into the request context. The following Python function does that by using AWS SDK for Python (Boto3). The function gets the TenantID (extracted from the JWT) and the TrackingAccessRole Amazon Resource Name (ARN) as parameters and returns a scoped Boto3 session object.

import boto3

def create_temp_tenant_session(access_role_arn, session_name, tenant_id, duration_sec):
    """
    Create a temporary session
    :param access_role_arn: The ARN of the role that the caller is assuming
    :param session_name: An identifier for the assumed session
    :param tenant_id: The tenant identifier the session is created for
    :param duration_sec: The duration, in seconds, of the temporary session
    :return: The session object that allows you to create service clients and resources
    """
    sts = boto3.client('sts')
    assume_role_response = sts.assume_role(
        RoleArn=access_role_arn,
        DurationSeconds=duration_sec,
        RoleSessionName=session_name,
        Tags=[
            {
                'Key': 'TenantID',
                'Value': tenant_id
            }
        ]
    )
    session = boto3.Session(aws_access_key_id=assume_role_response['Credentials']['AccessKeyId'],
                    aws_secret_access_key=assume_role_response['Credentials']['SecretAccessKey'],
                    aws_session_token=assume_role_response['Credentials']['SessionToken'])
    return session

Use these parameters to create temporary sessions for a specific tenant with a duration that meets your needs.

access_role_arn – The assumed role with an attached templated policy. The IAM policy must include the aws:PrincipalTag/TenantID tag key.

session_name – The name of the session. Use the role session name to uniquely identify a session. The role session name is used in the ARN of the assumed role principal and included in the AWS CloudTrail logs.

tenant_id – The tenant identifier that describes which tenant the session is created for. For better compatibility with resource names in IAM policies, it’s recommended to generate non-guessable alphanumeric lowercase tenant identifiers.

duration_sec – The duration of your temporary session.

Note: The details of token management can be abstracted away from the application by extracting the token generation into a separate module, as described in the blog post Isolating SaaS Tenants with Dynamically Generated IAM Policies. In that post, the reusable application code for acquiring temporary session tokens is called a Token Vending Machine.

The returned session can be used to instantiate IAM-scoped objects such as a storage service. After the session is returned, any API call performed with the temporary session credentials contains the aws:PrincipalTag/TenantID key-value pair in the request context.

When the tracking service attempts to access tracking data, IAM completes several evaluation steps to determine whether to allow or deny the request. These include evaluation of the principal’s identity-based policy, which is, in this example, represented by TrackingAccessPolicy. It is at this stage that the aws:PrincipalTag/TenantID tag key is replaced with the actual value, policy conditions are resolved, and access is granted to the tenant data.

Common ABAC scenarios

Let’s take a look at some common scenarios with different data storage services. For each example, a diagram is included that illustrates the allowed access to tenant data and how the data is partitioned in the service.

These examples rely on the architecture described earlier and assume that the temporary session context contains a TenantID parameter. We will demonstrate different versions of TrackingAccessPolicy that are applicable to different services. The way aws:PrincipalTag/TenantID is used depends on service-specific IAM features, such as tagging support, policy conditions and ability to parameterize resource ARN with session tags. Examples below illustrate these techniques applied to different services.

Pooled storage isolation with DynamoDB

Many SaaS applications rely on a pooled data partitioning model where data from all tenants is combined into a single table. The tenant identifier is then introduced into each table to identify the items that are associated with each tenant. Figure 3 provides an example of this model.

Figure 3: DynamoDB index-based partitioning

In this example, we’ve used Amazon DynamoDB, storing each tenant identifier in the table’s partition key. Now, we can use ABAC and IAM fine-grained access control to implement tenant isolation for the items in this table.

The following TrackingAccessPolicy uses the dynamodb:LeadingKeys condition key to restrict permissions to only the items whose partition key matches the tenant’s identifier as passed in a session tag.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:GetItem",
        "dynamodb:BatchGetItem",
        "dynamodb:Query"
      ],
      "Resource": [
        "arn:aws:dynamodb:<region>:<account-id>:table/TrackingData"
      ],
      "Condition": {
        "ForAllValues:StringEquals": {
          "dynamodb:LeadingKeys": [
            "${aws:PrincipalTag/TenantID}"
          ]
        }
      }
    }
  ]
}

This example uses the dynamodb:LeadingKeys condition key in the policy to describe how you can control access to tenant resources. You’ll notice that we haven’t bound this policy to any specific tenant. Instead, the policy relies on the aws:PrincipalTag tag to resolve the TenantID parameter at runtime.

This approach means that you can add new tenants without needing to create any new IAM constructs. This reduces the maintenance burden and limits your chances that any IAM quotas will be exceeded.

Siloed storage isolation with Amazon Elasticsearch Service

Let’s look at another example that illustrates how you might implement tenant isolation of Amazon Elasticsearch Service resources. Figure 4 illustrates a silo data partitioning model, where each tenant of your system has a separate Elasticsearch index for each tenant.

Figure 4: Elasticsearch index-per-tenant strategy

You can isolate these tenant resources by creating a parameterized identity policy with the principal TenantID tag as a variable (similar to the one we created for DynamoDB). In the following example, the principal tag is a part of the index name in the policy resource element. At access time, the principal tag is replaced with the tenant identifier from the request context, yielding the Elasticsearch index ARN as a result.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "es:ESHttpGet",
        "es:ESHttpPut"
      ],
      "Resource": [
        "arn:aws:es:<region>:<account-id>:domain/test/${aws:PrincipalTag/TenantID}*/*"
      ]
    }
  ]
}

In the case where you have multiple indices that belong to the same tenant, you can allow access to them by using a wildcard. The preceding policy allows es:ESHttpGet and es:ESHttpPut actions to be taken on documents if the documents belong to an index with a name that matches the pattern.

Important: In order for this to work, the tenant identifier must follow the same naming restrictions as indices.

Although this approach scales the tenant isolation strategy, you need to keep in mind that this solution is constrained by the number of indices your Elasticsearch cluster can support.

Amazon S3 prefix-per-tenant strategy

Amazon Simple Storage Service (Amazon S3) buckets are commonly used as shared object stores with dedicated prefixes for different tenants. For enhanced security, you can optionally use a dedicated customer master key (CMK) per tenant. If you do so, attach a corresponding TenantID resource tag to a CMK.

By using ABAC and IAM, you can make sure that each tenant can only get and decrypt objects in a shared S3 bucket that have the prefixes that correspond to that tenant.

Figure 5: S3 prefix-per-tenant strategy

In the following policy, the first statement uses the TenantID principal tag in the resource element. The policy grants s3:GetObject permission, but only if the requested object key begins with the tenant’s prefix.

The second statement allows the kms:Decrypt operation on a KMS key that the requested object is encrypted with. The KMS key must have a TenantID resource tag attached to it with a corresponding tenant ID as a value.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::sample-bucket-12345678/${aws:PrincipalTag/TenantID}/*"
    },
    {
      "Effect": "Allow",
      "Action": "kms:Decrypt",
       "Resource": "arn:aws:kms:<region>:<account-id>:key/*",
       "Condition": {
           "StringEquals": {
           "aws:PrincipalTag/TenantID": "${aws:ResourceTag/TenantID}"
        }
      }
    }
  ]
}

Important: In order for this policy to work, the tenant identifier must follow the S3 object key name guidelines.

With the prefix-per-tenant approach, you can support any number of tenants. However, if you choose to use a dedicated customer managed KMS key per tenant, you will be bounded by the number of KMS keys per AWS Region.

Conclusion

The ABAC method combined with IAM provides teams who build SaaS platforms with a compelling model for implementing tenant isolation. By using this dynamic, attribute-driven model, you can scale your IAM isolation policies to any practical number of tenants. This approach also makes it possible for you to rely on IAM to manage, scale, and enforce isolation in a way that’s integrated into your overall tenant identity scheme. You can start experimenting with IAM ABAC by using either the examples in this blog post, or this resource: IAM Tutorial: Define permissions to access AWS resources based on tags.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS IAM forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

How to protect sensitive data for its entire lifecycle in AWS

2021-02-26 Raj Jain

Post Syndicated from Raj Jain original https://aws.amazon.com/blogs/security/how-to-protect-sensitive-data-for-its-entire-lifecycle-in-aws/

Many Amazon Web Services (AWS) customer workflows require ingesting sensitive and regulated data such as Payments Card Industry (PCI) data, personally identifiable information (PII), and protected health information (PHI). In this post, I’ll show you a method designed to protect sensitive data for its entire lifecycle in AWS. This method can help enhance your data security posture and be useful for fulfilling the data privacy regulatory requirements applicable to your organization for data protection at-rest, in-transit, and in-use.

An existing method for sensitive data protection in AWS is to use the field-level encryption feature offered by Amazon CloudFront. This CloudFront feature protects sensitive data fields in requests at the AWS network edge. The chosen fields are protected upon ingestion and remain protected throughout the entire application stack. The notion of protecting sensitive data early in its lifecycle in AWS is a highly desirable security architecture. However, CloudFront can protect a maximum of 10 fields and only within HTTP(S) POST requests that carry HTML form encoded payloads.

If your requirements exceed CloudFront’s native field-level encryption feature, such as a need to handle diverse application payload formats, different HTTP methods, and more than 10 sensitive fields, you can implement field-level encryption yourself using the Lambda@Edge feature in CloudFront. In terms of choosing an appropriate encryption scheme, this problem calls for an asymmetric cryptographic system that will allow public keys to be openly distributed to the CloudFront network edges while keeping the corresponding private keys stored securely within the network core. One such popular asymmetric cryptographic system is RSA. Accordingly, we’ll implement a Lambda@Edge function that uses asymmetric encryption using the RSA cryptosystem to protect an arbitrary number of fields in any HTTP(S) request. We will discuss the solution using an example JSON payload, although this approach can be applied to any payload format.

A complex part of any encryption solution is key management. To address that, I use AWS Key Management Service (AWS KMS). AWS KMS simplifies the solution and offers improved security posture and operational benefits, detailed later.

Solution overview

You can protect data in-transit over individual communications channels using transport layer security (TLS), and at-rest in individual storage silos using volume encryption, object encryption or database table encryption. However, if you have sensitive workloads, you might need additional protection that can follow the data as it moves through the application stack. Fine-grained data protection techniques such as field-level encryption allow for the protection of sensitive data fields in larger application payloads while leaving non-sensitive fields in plaintext. This approach lets an application perform business functions on non-sensitive fields without the overhead of encryption, and allows fine-grained control over what fields can be accessed by what parts of the application.

A best practice for protecting sensitive data is to reduce its exposure in the clear throughout its lifecycle. This means protecting data as early as possible on ingestion and ensuring that only authorized users and applications can access the data only when and as needed. CloudFront, when combined with the flexibility provided by Lambda@Edge, provides an appropriate environment at the edge of the AWS network to protect sensitive data upon ingestion in AWS.

Since the downstream systems don’t have access to sensitive data, data exposure is reduced, which helps to minimize your compliance footprint for auditing purposes.

The number of sensitive data elements that may need field-level encryption depends on your requirements. For example:

For healthcare applications, HIPAA regulates 18 personal data elements.
In California, the California Consumer Privacy Act (CCPA) regulates at least 11 categories of personal information—each with its own set of data elements.

The idea behind field-level encryption is to protect sensitive data fields individually, while retaining the structure of the application payload. The alternative is full payload encryption, where the entire application payload is encrypted as a binary blob, which makes it unusable until the entirety of it is decrypted. With field-level encryption, the non-sensitive data left in plaintext remains usable for ordinary business functions. When retrofitting data protection in existing applications, this approach can reduce the risk of application malfunction since the data format is maintained.

The following figure shows how PII data fields in a JSON construction that are deemed sensitive by an application can be transformed from plaintext to ciphertext with a field-level encryption mechanism.

Figure 1: Example of field-level encryption

You can change plaintext to ciphertext as depicted in Figure 1 by using a Lambda@Edge function to perform field-level encryption. I discuss the encryption and decryption processes separately in the following sections.

Field-level encryption process

Let’s discuss the individual steps involved in the encryption process as shown in Figure 2.

Figure 2: Field-level encryption process

Figure 2 shows CloudFront invoking a Lambda@Edge function while processing a client request. CloudFront offers multiple integration points for invoking Lambda@Edge functions. Since you are processing a client request and your encryption behavior is related to requests being forwarded to an origin server, you want your function to run upon the origin request event in CloudFront. The origin request event represents an internal state transition in CloudFront that happens immediately before CloudFront forwards a request to the downstream origin server.

You can associate your Lambda@Edge with CloudFront as described in Adding Triggers by Using the CloudFront Console. A screenshot of the CloudFront console is shown in Figure 3. The selected event type is Origin Request and the Include Body check box is selected so that the request body is conveyed to Lambda@Edge.

Figure 3: Configuration of Lambda@Edge in CloudFront

The Lambda@Edge function acts as a programmable hook in the CloudFront request processing flow. You can use the function to replace the incoming request body with a request body with the sensitive data fields encrypted.

The process includes the following steps:

Step 1 – RSA key generation and inclusion in Lambda@Edge

You can generate an RSA customer managed key (CMK) in AWS KMS as described in Creating asymmetric CMKs. This is done at system configuration time.

Note: You can use your existing RSA key pairs or generate new ones externally by using OpenSSL commands, especially if you need to perform RSA decryption and key management independently of AWS KMS. Your choice won’t affect the fundamental encryption design pattern presented here.

The RSA key creation in AWS KMS requires two inputs: key length and type of usage. In this example, I created a 2048-bit key and assigned its use for encryption and decryption. The cryptographic configuration of an RSA CMK created in AWS KMS is shown in Figure 4.

Figure 4: Cryptographic properties of an RSA key managed by AWS KMS

Of the two encryption algorithms shown in Figure 4— RSAES_OAEP_SHA_256 and RSAES_OAEP_SHA_1, this example uses RSAES_OAEP_SHA_256. The combination of a 2048-bit key and the RSAES_OAEP_SHA_256 algorithm lets you encrypt a maximum of 190 bytes of data, which is enough for most PII fields. You can choose a different key length and encryption algorithm depending on your security and performance requirements. How to choose your CMK configuration includes information about RSA key specs for encryption and decryption.

Using AWS KMS for RSA key management versus managing the keys yourself eliminates that complexity and can help you:

Enforce IAM and key policies that describe administrative and usage permissions for keys.
Manage cross-account access for keys.
Monitor and alarm on key operations through Amazon CloudWatch.
Audit AWS KMS API invocations through AWS CloudTrail.
Record configuration changes to keys and enforce key specification compliance through AWS Config.
Generate high-entropy keys in an AWS KMS hardware security module (HSM) as required by NIST.
Store RSA private keys securely, without the ability to export.
Perform RSA decryption within AWS KMS without exposing private keys to application code.
Categorize and report on keys with key tags for cost allocation.
Disable keys and schedule their deletion.

You need to extract the RSA public key from AWS KMS so you can include it in the AWS Lambda deployment package. You can do this from the AWS Management Console, through the AWS KMS SDK, or by using the get-public-key command in the AWS Command Line Interface (AWS CLI). Figure 5 shows Copy and Download options for a public key in the Public key tab of the AWS KMS console.

Figure 5: RSA public key available for copy or download in the console

Note: As we will see in the sample code in step 3, we embed the public key in the Lambda@Edge deployment package. This is a permissible practice because public keys in asymmetric cryptography systems aren’t a secret and can be freely distributed to entities that need to perform encryption. Alternatively, you can use Lambda@Edge to query AWS KMS for the public key at runtime. However, this introduces latency, increases the load against your KMS account quota, and increases your AWS costs. General patterns for using external data in Lambda@Edge are described in Leveraging external data in Lambda@Edge.

Step 2 – HTTP API request handling by CloudFront

CloudFront receives an HTTP(S) request from a client. CloudFront then invokes Lambda@Edge during origin-request processing and includes the HTTP request body in the invocation.

Step 3 – Lambda@Edge processing

The Lambda@Edge function processes the HTTP request body. The function extracts sensitive data fields and performs RSA encryption over their values.

The following code is sample source code for the Lambda@Edge function implemented in Python 3.7:

import Crypto
import base64
import json
from Crypto.Cipher import PKCS1_OAEP
from Crypto.PublicKey import RSA

# PEM-formatted RSA public key copied over from AWS KMS or your own public key.
RSA_PUBLIC_KEY = "-----BEGIN PUBLIC KEY-----<your key>-----END PUBLIC KEY-----"
RSA_PUBLIC_KEY_OBJ = RSA.importKey(RSA_PUBLIC_KEY)
RSA_CIPHER_OBJ = PKCS1_OAEP.new(RSA_PUBLIC_KEY_OBJ, Crypto.Hash.SHA256)

# Example sensitive data field names in a JSON object. 
PII_SENSITIVE_FIELD_NAMES = ["fname", "lname", "email", "ssn", "dob", "phone"]

CIPHERTEXT_PREFIX = "#01#"
CIPHERTEXT_SUFFIX = "#10#"

def lambda_handler(event, context):
    # Extract HTTP request and its body as per documentation:
    # https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/lambda-event-structure.html
    http_request = event['Records'][0]['cf']['request']
    body = http_request['body']
    org_body = base64.b64decode(body['data'])
    mod_body = protect_sensitive_fields_json(org_body)
    body['action'] = 'replace'
    body['encoding'] = 'text'
    body['data'] = mod_body
    return http_request


def protect_sensitive_fields_json(body):
    # Encrypts sensitive fields in sample JSON payload shown earlier in this post.
    # [{"fname": "Alejandro", "lname": "Rosalez", … }]
    person_list = json.loads(body.decode("utf-8"))
    for person_data in person_list:
        for field_name in PII_SENSITIVE_FIELD_NAMES:
            if field_name not in person_data:
                continue
            plaintext = person_data[field_name]
            ciphertext = RSA_CIPHER_OBJ.encrypt(bytes(plaintext, 'utf-8'))
            ciphertext_b64 = base64.b64encode(ciphertext).decode()
            # Optionally, add unique prefix/suffix patterns to ciphertext
            person_data[field_name] = CIPHERTEXT_PREFIX + ciphertext_b64 + CIPHERTEXT_SUFFIX 
    return json.dumps(person_list)

The event structure passed into the Lambda@Edge function is described in Lambda@Edge Event Structure. Following the event structure, you can extract the HTTP request body. In this example, the assumption is that the HTTP payload carries a JSON document based on a particular schema defined as part of the API contract. The input JSON document is parsed by the function, converting it into a Python dictionary. The Python native dictionary operators are then used to extract the sensitive field values.

Note: If you don’t know your API payload structure ahead of time or you’re dealing with unstructured payloads, you can use techniques such as regular expression pattern searches and checksums to look for patterns of sensitive data and target them accordingly. For example, credit card primary account numbers include a Luhn checksum that can be programmatically detected. Additionally, services such as Amazon Comprehend and Amazon Macie can be leveraged for detecting sensitive data such as PII in application payloads.

While iterating over the sensitive fields, individual field values are encrypted using the standard RSA encryption implementation available in the Python Cryptography Toolkit (PyCrypto). The PyCrypto module is included within the Lambda@Edge zip archive as described in Lambda@Edge deployment package.

The example uses the standard optimal asymmetric encryption padding (OAEP) and SHA-256 encryption algorithm properties. These properties are supported by AWS KMS and will allow RSA ciphertext produced here to be decrypted by AWS KMS later.

Note: You may have noticed in the code above that we’re bracketing the ciphertexts with predefined prefix and suffix strings:

person_data[field_name] = CIPHERTEXT_PREFIX + ciphertext_b64 + CIPHERTEXT_SUFFIX

This is an optional measure and is being implemented to simplify the decryption process.

The prefix and suffix strings help demarcate ciphertext embedded in unstructured data in downstream processing and also act as embedded metadata. Unique prefix and suffix strings allow you to extract ciphertext through string or regular expression (regex) searches during the decryption process without having to know the data body format or schema, or the field names that were encrypted.

Distinct strings can also serve as indirect identifiers of RSA key pair identifiers. This can enable key rotation and allow separate keys to be used for separate fields depending on the data security requirements for individual fields.

You can ensure that the prefix and suffix strings can’t collide with the ciphertext by bracketing them with characters that don’t appear in cyphertext. For example, a hash (#) character cannot be part of a base64 encoded ciphertext string.

Deploying a Lambda function as a Lambda@Edge function requires specific IAM permissions and an IAM execution role. Follow the Lambda@Edge deployment instructions in Setting IAM Permissions and Roles for Lambda@Edge.

Step 4 – Lambda@Edge response

The Lambda@Edge function returns the modified HTTP body back to CloudFront and instructs it to replace the original HTTP body with the modified one by setting the following flag:

http_request['body']['action'] = 'replace'

Step 5 – Forward the request to the origin server

CloudFront forwards the modified request body provided by Lambda@Edge to the origin server. In this example, the origin server writes the data body to persistent storage for later processing.

Field-level decryption process

An application that’s authorized to access sensitive data for a business function can decrypt that data. An example decryption process is shown in Figure 6. The figure shows a Lambda function as an example compute environment for invoking AWS KMS for decryption. This functionality isn’t dependent on Lambda and can be performed in any compute environment that has access to AWS KMS.

Figure 6: Field-level decryption process

The steps of the process shown in Figure 6 are described below.

Step 1 – Application retrieves the field-level encrypted data

The example application retrieves the field-level encrypted data from persistent storage that had been previously written during the data ingestion process.

Step 2 – Application invokes the decryption Lambda function

The application invokes a Lambda function responsible for performing field-level decryption, sending the retrieved data to Lambda.

Step 3 – Lambda calls the AWS KMS decryption API

The Lambda function uses AWS KMS for RSA decryption. The example calls the KMS decryption API that inputs ciphertext and returns plaintext. The actual decryption happens in KMS; the RSA private key is never exposed to the application, which is a highly desirable characteristic for building secure applications.

Note: If you choose to use an external key pair, then you can securely store the RSA private key in AWS services like AWS Systems Manager Parameter Store or AWS Secrets Manager and control access to the key through IAM and resource policies. You can fetch the key from relevant vault using the vault’s API, then decrypt using the standard RSA implementation available in your programming language. For example, the cryptography toolkit in Python or javax.crypto in Java.

The Lambda function Python code for decryption is shown below.

import base64
import boto3
import re

kms_client = boto3.client('kms')
CIPHERTEXT_PREFIX = "#01#"
CIPHERTEXT_SUFFIX = "#10#"

# This lambda function extracts event body, searches for and decrypts ciphertext 
# fields surrounded by provided prefix and suffix strings in arbitrary text bodies 
# and substitutes plaintext fields in-place.  
def lambda_handler(event, context):    
    org_data = event["body"]
    mod_data = unprotect_fields(org_data, CIPHERTEXT_PREFIX, CIPHERTEXT_SUFFIX)
    return mod_data

# Helper function that performs non-greedy regex search for ciphertext strings on
# input data and performs RSA decryption of them using AWS KMS 
def unprotect_fields(org_data, prefix, suffix):
    regex_pattern = prefix + "(.*?)" + suffix
    mod_data_parts = []
    cursor = 0

    # Search ciphertexts iteratively using python regular expression module
    for match in re.finditer(regex_pattern, org_data):
        mod_data_parts.append(org_data[cursor: match.start()])
        try:
            # Ciphertext was stored as Base64 encoded in our example. Decode it.
            ciphertext = base64.b64decode(match.group(1))

            # Decrypt ciphertext using AWS KMS  
            decrypt_rsp = kms_client.decrypt(
                EncryptionAlgorithm="RSAES_OAEP_SHA_256",
                KeyId="<Your-Key-ID>",
                CiphertextBlob=ciphertext)
            decrypted_val = decrypt_rsp["Plaintext"].decode("utf-8")
            mod_data_parts.append(decrypted_val)
        except Exception as e:
            print ("Exception: " + str(e))
            return None
        cursor = match.end()

    mod_data_parts.append(org_data[cursor:])
    return "".join(mod_data_parts)

The function performs a regular expression search in the input data body looking for ciphertext strings bracketed in predefined prefix and suffix strings that were added during encryption.

While iterating over ciphertext strings one-by-one, the function calls the AWS KMS decrypt() API. The example function uses the same RSA encryption algorithm properties—OAEP and SHA-256—and the Key ID of the public key that was used during encryption in Lambda@Edge.

Note that the Key ID itself is not a secret. Any application can be configured with it, but that doesn’t mean any application will be able to perform decryption. The security control here is that the AWS KMS key policy must allow the caller to use the Key ID to perform the decryption. An additional security control is provided by Lambda execution role that should allow calling the KMS decrypt() API.

Step 4 – AWS KMS decrypts ciphertext and returns plaintext

To ensure that only authorized users can perform decrypt operation, the KMS is configured as described in Using key policies in AWS KMS. In addition, the Lambda IAM execution role is configured as described in AWS Lambda execution role to allow it to access KMS. If both the key policy and IAM policy conditions are met, KMS returns the decrypted plaintext. Lambda substitutes the plaintext in place of ciphertext in the encapsulating data body.

Steps three and four are repeated for each ciphertext string.

Step 5 – Lambda returns decrypted data body

Once all the ciphertext has been converted to plaintext and substituted in the larger data body, the Lambda function returns the modified data body to the client application.

Conclusion

In this post, I demonstrated how you can implement field-level encryption integrated with AWS KMS to help protect sensitive data workloads for their entire lifecycle in AWS. Since your Lambda@Edge is designed to protect data at the network edge, data remains protected throughout the application execution stack. In addition to improving your data security posture, this protection can help you comply with data privacy regulations applicable to your organization.

Since you author your own Lambda@Edge function to perform standard RSA encryption, you have flexibility in terms of payload formats and the number of fields that you consider to be sensitive. The integration with AWS KMS for RSA key management and decryption provides significant simplicity, higher key security, and rich integration with other AWS security services enabling an overall strong security solution.

By using encrypted fields with identifiers as described in this post, you can create fine-grained controls for data accessibility to meet the security principle of least privilege. Instead of granting either complete access or no access to data fields, you can ensure least privileges where a given part of an application can only access the fields that it needs, when it needs to, all the way down to controlling access field by field. Field by field access can be enabled by using different keys for different fields and controlling their respective policies.

In addition to protecting sensitive data workloads to meet regulatory and security best practices, this solution can be used to build de-identified data lakes in AWS. Sensitive data fields remain protected throughout their lifecycle, while non-sensitive data fields remain in the clear. This approach can allow analytics or other business functions to operate on data without exposing sensitive data.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Mitigate data leakage through the use of AppStream 2.0 and end-to-end auditing

2021-02-10 Chaim Landau

Post Syndicated from Chaim Landau original https://aws.amazon.com/blogs/security/mitigate-data-leakage-through-the-use-of-appstream-2-0-and-end-to-end-auditing/

Customers want to use AWS services to operate on their most sensitive data, but they want to make sure that only the right people have access to that data. Even when the right people are accessing data, customers want to account for what actions those users took while accessing the data.

In this post, we show you how you can use Amazon AppStream 2.0 to grant isolated access to sensitive data and decrease your attack surface. In addition, we show you how to achieve end-to-end auditing, which is designed to provide full traceability of all activities around your data.

To demonstrate this idea, we built a sample solution that provides a data scientist with access to an Amazon SageMaker Studio notebook using AppStream 2.0. The solution deploys a new Amazon Virtual Private Cloud (Amazon VPC) with isolated subnets, where the SageMaker notebook and AppStream 2.0 instances are set up.

Why AppStream 2.0?

AppStream 2.0 is a fully-managed, non-persistent application and desktop streaming service that provides access to desktop applications from anywhere by using an HTML5-compatible desktop browser.

Each time you launch an AppStream 2.0 session, a freshly-built, pre-provisioned instance is provided, using a prebuilt image. As soon as you close your session and the disconnect timeout period is reached, the instance is terminated. This allows you to carefully control the user experience and helps to ensure a consistent, secure environment each time. AppStream 2.0 also lets you enforce restrictions on user sessions, such as disabling the clipboard, file transfers, or printing.

Furthermore, AppStream 2.0 uses AWS Identity and Access Management (IAM) roles to grant fine-grained access to other AWS services such as Amazon Simple Storage Service (Amazon S3), Amazon Redshift, Amazon SageMaker, and other AWS services. This gives you both control over the access as well as an accounting, via Amazon CloudTrail, of what actions were taken and when.

These features make AppStream 2.0 uniquely suitable for environments that require high security and isolation.

Why SageMaker?

Developers and data scientists use SageMaker to build, train, and deploy machine learning models quickly. SageMaker does most of the work of each step of the machine learning process to help users develop high-quality models. SageMaker access from within AppStream 2.0 provides your data scientists and analysts with a suite of common and familiar data-science packages to use against isolated data.

Solution architecture overview

This solution allows a data scientist to work with a data set while connected to an isolated environment that doesn’t have an outbound path to the internet.

First, you build an Amazon VPC with isolated subnets and with no internet gateways attached. This ensures that any instances stood up in the environment don’t have access to the internet. To provide the resources inside the isolated subnets with a path to commercial AWS services such as Amazon S3, SageMaker, AWS System Manager you build VPC endpoints and attach them to the VPC, as shown in Figure 1.

Figure 1: Network Diagram

You then build an AppStream 2.0 stack and fleet, and attach a security group and IAM role to the fleet. The purpose of the IAM role is to provide the AppStream 2.0 instances with access to downstream AWS services such as Amazon S3 and SageMaker. The IAM role design follows the least privilege model, to ensure that only the access required for each task is granted.

During the building of the stack, you will enable AppStream 2.0 Home Folders. This feature builds an S3 bucket where users can store files from inside their AppStream 2.0 session. The bucket is designed with a dedicated prefix for each user, where only they have access. We use this prefix to store the user’s pre-signed SagaMaker URLs, ensuring that no one user can access another users SageMaker Notebook.

You then deploy a SageMaker notebook for the data scientist to use to access and analyze the isolated data.

To confirm that the user ID on the AppStream 2.0 session hasn’t been spoofed, you create an AWS Lambda function that compares the user ID of the data scientist against the AppStream 2.0 session ID. If the user ID and session ID match, this indicates that the user ID hasn’t been impersonated.

Once the session has been validated, the Lambda function generates a pre-signed SageMaker URL that gives the data scientist access to the notebook.

Finally, you enable AppStream 2.0 usage reports to ensure that you have end-to-end auditing of your environment.

To help you easily deploy this solution into your environment, we’ve built an AWS Cloud Development Kit (AWS CDK) application and stacks, using Python. To deploy this solution, you can go to the Solution deployment section in this blog post.

Note: this solution was built with all resources being in a single AWS Region. The support of multi Region is possible but isn’t part of this blog post.

Solution requirements

Before you build a solution, you must know your security requirements. The solution in this post assumes a set of standard security requirements that you typically find in an enterprise environment:

User authentication is provided by a Security Assertion Markup Language (SAML) identity provider (IdP).
IAM roles are used to access AWS services such as Amazon S3 and SageMaker.
AWS IAM access keys and secret keys are prohibited.
IAM policies follow the least privilege model so that only the required access is granted.
Windows clipboard, file transfer, and printing to local devices is prohibited.
Auditing and traceability of all activities is required.

Note: before you will be able to integrate SAML with AppStream 2.0, you will need to follow the AppStream 2.0 Integration with SAML 2.0 guide. There are quite a few steps and it will take some time to set up. SAML authentication is optional, however. If you just want to prototype the solution and see how it works, you can do that without enabling SAML integration.

Solution components

This solution uses the following technologies:

Amazon VPC – provides an isolated network where the solution will be deployed.
VPC endpoints – provide access from the isolated network to commercial AWS services such as Amazon S3 and SageMaker.
AWS Systems Manager – stores parameters such as S3 bucket names.
AppStream 2.0 – provides hardened instances to run the solution on.
AppStream 2.0 home folders – store users’ session information.
Amazon S3 – stores application scripts and pre-signed SageMaker URLs.
SageMaker notebook – provides data scientists with tools to access the data.
AWS Lambda – runs scripts to validate the data scientist’s session, and generates pre-signed URLs for the SageMaker notebook.
AWS CDK – deploys the solution.
PowerShell – processes scripts on AppStream 2.0 Microsoft Windows instances.

Solution high-level design and process flow

The following figure is a high-level depiction of the solution and its process flow.

Figure 2: Solution process flow

The process flow—illustrated in Figure 2—is:

A data scientist clicks on an AppStream 2.0 federated or a streaming URL.
1. If it’s a federated URL, the data scientist authenticates using their corporate credentials, as well as MFA if required.
1. If it’s a streaming URL, no further authentication is required.
The data scientist is presented with a PowerShell application that’s been made available to them.
After starting the application, it starts the PowerShell script on an AppStream 2.0 instance.
The script then:
1. Downloads a second PowerShell script from an S3 bucket.
2. Collects local AppStream 2.0 environment variables:
  1. AppStream_UserName
  2. AppStream_Session_ID
  3. AppStream_Resource_Name
3. Stores the variables in the session.json file and copies the file to the home folder of the session on Amazon S3.
The PUT event of the JSON file into the Amazon S3 bucket triggers an AWS Lambda function that performs the following:
1. Reads the session.json file from the user’s home folder on Amazon S3.
2. Performs a describe action against the AppStream 2.0 API to ensure that the session ID and the user ID match. This helps to prevent the user from manipulating the local environment variable to pretend to be someone else (spoofing), and potentially gain access to unauthorized data.
3. If the session ID and user ID match, a pre-signed SageMaker URL is generated and stored in session_url.txt, and copied to the user’s home folder on Amazon S3.
4. If the session ID and user ID do not match, the Lambda function ends without generating a pre-signed URL.
When the PowerShell script detects the session_url.txt file, it opens the URL, giving the user access to their SageMaker notebook.

Code structure

To help you deploy this solution in your environment, we’ve built a set of code that you can use. The code is mostly written in Python and for the AWS CDK framework, and with an AWS CDK application and some PowerShell scripts.

Note: We have chosen the default settings on many of the AWS resources our code deploys. Before deploying the code, you should conduct a thorough code review to ensure the resources you are deploying meet your organization’s requirements.

AWS CDK application – ./app.py

To make this application modular and portable, we’ve structured it in separate AWS CDK nested stacks:

vpc-stack – deploys a VPC with two isolated subnets, along with three VPC endpoints.
s3-stack – deploys an S3 bucket, copies the AppStream 2.0 PowerShell scripts, and stores the bucket name in an SSM parameter.
appstream-service-roles-stack – deploys AppStream 2.0 service roles.
appstream-stack – deploys the AppStream 2.0 stack and fleet, along with the required IAM roles and security groups.
appstream-start-fleet-stack – builds a custom resource that starts the AppStream 2.0 fleet.
notebook-stack – deploys a SageMaker notebook, along with IAM roles, security groups, and an AWS Key Management Service (AWS KMS) encryption key.
saml-stack – deploys a SAML role as a placeholder for SAML authentication.

PowerShell scripts

The solution uses the following PowerShell scripts inside the AppStream 2.0 instances:

sagemaker-notebook-launcher.ps1 – This script is part of the AppStream 2.0 image and downloads the sagemaker-notebook.ps1 script.
sagemaker-notebook.ps1 – starts the process of validating the session and generating the SageMaker pre-signed URL.

Note: Having the second script reside on Amazon S3 provides flexibility. You can modify this script without having to create a new AppStream 2.0 image.

Deployment Prerequisites

To deploy this solution, your deployment environment must meet the following prerequisites:

A machine with Python, Git, and AWS Command Line interface (AWS CLI) installed, to be used for the deployment.
Download the solution code from Github.
(Optional) A SAML federation solution that’s tied to your corporate directory.

Note: We used AWS Cloud9 with Amazon Linux 2 to test this solution, as it comes preinstalled with most of the prerequisites for deploying this solution.

Deploy the solution

Now that you know the design and components, you’re ready to deploy the solution.

Note: In our demo solution, we deploy two stream.standard.small AppStream 2.0 instances, using Windows Server 2019. This gives you a reasonable example to work from. In your own environment you might need more instances, a different instance type, or a different version of Windows. Likewise, we deploy a single SageMaker notebook instance of type ml.t3.medium. To change the AppStream 2.0 and SageMaker instance types, you will need to modify the stacks/data_sandbox_appstream.py and stacks/data_sandbox_notebook.py respectively.

Step 1: AppStream 2.0 image

An AppStream 2.0 image contains applications that you can stream to your users. It’s what allows you to curate the user experience by preconfiguring the settings of the applications you stream to your users.

To build an AppStream 2.0 image:

Build an image following the Create a Custom AppStream 2.0 Image by Using the AppStream 2.0 Console tutorial.

Note: In Step 1: Install Applications on the Image Builder in this tutorial, you will be asked to choose an Instance family. For this example, we chose General Purpose. If you choose a different Instance family, you will need to make sure the appstream_instance_type specified under Step 2: Code modification is of the same family.

In Step 6: Finish Creating Your Image in this tutorial, you will be asked to provide a unique image name. Note down the image name as you will need it in Step 2 of this blog post.
Copy notebook-launcher.ps1 to a location on the image. We recommend that you copy it to C:\AppStream.
In Step 2—Create an AppStream 2.0 Application Catalog—of the tutorial, use C:\Windows\System32\Windowspowershell\v1.0\powershell.exe as the application, and the path to notebook-launcher.ps1 as the launch parameter.

Note: While testing your application during the image building process, the PowerShell script will fail because the underlying infrastructure is not present. You can ignore that failure during the image building process.

Step 2: Code modification

Next, you must modify some of the code to fit your environment.

Make the following changes in the cdk.json file:

vpc_cidr – Supply your preferred CIDR range to be used for the VPC.

Note: VPC CIDR ranges are your private IP space and thus can consist of any valid RFC 1918 range. However, if the VPC you are planning on using for AppStream 2.0 needs to connect to other parts of your private network (on premise or other VPCs), you need to choose a range that does not conflict or overlap with the rest of your infrastructure.
appstream_Image_name – Enter the image name you chose when you built the Appstream 2.0 image in Step 1.a.
appstream_environment_name – The environment name is strictly cosmetic and drives the naming of your AppStream 2.0 stack and fleet.
appstream_instance_type – Enter the AppStream 2.0 instance type. The instance type must be part of the same instance family you used in Step 1 of the To build an AppStream 2.0 image section. For a list of AppStream 2.0 instances, visit https://aws.amazon.com/appstream2/pricing/.
appstream_fleet_type – Enter the fleet type. Allowed values are ALWAYS_ON or ON_DEMAND.
Idp_name – If you have integrated SAML with this solution, you will need to enter the IdP name you chose when creating the SAML provider in the IAM Console.

Step 3: Deploy the AWS CDK application

The CDK application deploys the CDK stacks.

The stacks include:

VPC with isolated subnets
VPC Endpoints for S3, SageMaker, and Systems Manager
S3 bucket
AppStream 2.0 stack and fleet
Two AppStream 2.0 stream.standard.small instances
A single SageMaker ml.t2.medium notebook

Run the following commands to deploy the AWS CDK application:

Install the AWS CDK Toolkit.
```
npm install -g aws-cdk
```

Create and activate a virtual environment.

python -m venv .datasandbox-env

source .datasandbox-env/bin/activate

Change directory to the root folder of the code repository.
Install the required packages.
```
pip install -r requirements.txt
```
If you haven’t used AWS CDK in your account yet, run:
```
cdk bootstrap
```
Deploy the AWS CDK stack.
```
cdk deploy DataSandbox
```

Step 4: Test the solution

After the stack has successfully deployed, allow approximately 25 minutes for the AppStream 2.0 fleet to reach a running state. Testing will fail if the fleet isn’t running.

Without SAML

If you haven’t added SAML authentication, use the following steps to test the solution.

In the AWS Management Console, go to AppStream 2.0 and then to Stacks.
Select the stack, and then select Action.
Select Create streaming URL.
Enter any user name and select Get URL.
Enter the URL in another tab of your browser and test your application.

With SAML

If you are using SAML authentication, you will have a federated login URL that you need to visit.

If everything is working, your SageMaker notebook will be launched as shown in Figure 3.

Figure 3: SageMaker Notebook

Note: if you receive a web browser timeout, verify that the SageMaker notebook instance “Data-Sandbox-Notebook” is currently in InService status.

Auditing

Auditing for this solution is provided through AWS CloudTrail and AppStream 2.0 Usage Reports. Though CloudTrail is enabled by default, to collect and store the CloudTrail logs, you must create a trail for your AWS account.

The following logs will be available for you to use, to provide auditing.

Login – CloudTrail
- User ID
- IAM SAML role
- AppStream stack
S3 – CloudTrail
- IAM role
- Source IP
- S3 bucket
- Amazon S3 object
AppStream 2.0 instance IP address – AppStream 2.0 usage reports
- User ID
- Session ID
- Elastic network interface

Connecting the dots

To get an accurate idea of your users’ activity, you have to correlate some logs from different services. First, you collect the login information from CloudTrail. This gives you the user ID of the user who logged in. You then collect the Amazon S3 put from CloudTrail, which gives you the IP address of the AppStream 2.0 instance. And finally, you collect the AppStream 2.0 usage report which gives you the IP address of the AppStream 2.0 instance, plus the user ID. This allows you to connect the user ID to the activity on Amazon S3. For auditing & controlling exploration activities with SageMaker, please visit this GitHub repository.

Though the logs are automatically being collected, what we have shown you here is a manual way of sifting through those logs. For a more robust solution on querying and analyzing CloudTrail logs, visit Querying AWS CloudTrail Logs.

Costs of this Solution

The cost for running this solution will depend on a number of factors like the instance size, the amount of data you store, and how many hours you use the solution. AppStream 2.0 is charged per instance hour and there is one instance in this example solution. You can see details on the AppStream 2.0 pricing page. VPC endpoints are charged by the hour and by how much data passes through them. There are three VPC endpoints in this solution (S3, System Manager, and SageMaker). VPC endpoint pricing is described on the Privatelink pricing page. SageMaker Notebooks are charged based on the number of instance hours and the instance type. There is one SageMaker instance in this solution, which may be eligible for free tier pricing. See the SageMaker pricing page for more details. Amazon S3 storage pricing depends on how much data you store, what kind of storage you use, and how much data transfers in and out of S3. The use in this solution may be eligible for free tier pricing. You can see details on the S3 pricing page.

Before deploying this solution, make sure to calculate your cost using the AWS Pricing Calculator, and the AppStream 2.0 pricing calculator.

Conclusion

Congratulations! You have deployed a solution that provides your users with access to sensitive and isolated data in a secure manner using AppStream 2.0. You have also implemented a mechanism that is designed to prevent user impersonation, and enabled end-to-end auditing of all user activities.

To learn about how Amazon is using AppStream 2.0, visit the blog post How Amazon uses AppStream 2.0 to provide data scientists and analysts with access to sensitive data.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Building end-to-end AWS DevSecOps CI/CD pipeline with open source SCA, SAST and DAST tools

2021-01-22 Srinivas Manepalli

Post Syndicated from Srinivas Manepalli original https://aws.amazon.com/blogs/devops/building-end-to-end-aws-devsecops-ci-cd-pipeline-with-open-source-sca-sast-and-dast-tools/

DevOps is a combination of cultural philosophies, practices, and tools that combine software development with information technology operations. These combined practices enable companies to deliver new application features and improved services to customers at a higher velocity. DevSecOps takes this a step further, integrating security into DevOps. With DevSecOps, you can deliver secure and compliant application changes rapidly while running operations consistently with automation.

Having a complete DevSecOps pipeline is critical to building a successful software factory, which includes continuous integration (CI), continuous delivery and deployment (CD), continuous testing, continuous logging and monitoring, auditing and governance, and operations. Identifying the vulnerabilities during the initial stages of the software development process can significantly help reduce the overall cost of developing application changes, but doing it in an automated fashion can accelerate the delivery of these changes as well.

To identify security vulnerabilities at various stages, organizations can integrate various tools and services (cloud and third-party) into their DevSecOps pipelines. Integrating various tools and aggregating the vulnerability findings can be a challenge to do from scratch. AWS has the services and tools necessary to accelerate this objective and provides the flexibility to build DevSecOps pipelines with easy integrations of AWS cloud native and third-party tools. AWS also provides services to aggregate security findings.

In this post, we provide a DevSecOps pipeline reference architecture on AWS that covers the afore-mentioned practices, including SCA (Software Composite Analysis), SAST (Static Application Security Testing), DAST (Dynamic Application Security Testing), and aggregation of vulnerability findings into a single pane of glass. Additionally, this post addresses the concepts of security of the pipeline and security in the pipeline.

You can deploy this pipeline in either the AWS GovCloud Region (US) or standard AWS Regions. As of this writing, all listed AWS services are available in AWS GovCloud (US) and authorized for FedRAMP High workloads within the Region, with the exception of AWS CodePipeline and AWS Security Hub, which are in the Region and currently under the JAB Review to be authorized shortly for FedRAMP High as well.

Services and tools

In this section, we discuss the various AWS services and third-party tools used in this solution.

CI/CD services

For CI/CD, we use the following AWS services:

AWS CodeBuild – A fully managed continuous integration service that compiles source code, runs tests, and produces software packages that are ready to deploy.
AWS CodeCommit – A fully managed source control service that hosts secure Git-based repositories.
AWS CodeDeploy – A fully managed deployment service that automates software deployments to a variety of compute services such as Amazon Elastic Compute Cloud (Amazon EC2), AWS Fargate, AWS Lambda, and your on-premises servers.
AWS CodePipeline – A fully managed continuous delivery service that helps you automate your release pipelines for fast and reliable application and infrastructure updates.
AWS Lambda – A service that lets you run code without provisioning or managing servers. You pay only for the compute time you consume.
Amazon Simple Notification Service – Amazon SNS is a fully managed messaging service for both application-to-application (A2A) and application-to-person (A2P) communication.
Amazon Simple Storage Service – Amazon S3 is storage for the internet. You can use Amazon S3 to store and retrieve any amount of data at any time, from anywhere on the web.
AWS Systems Manager Parameter Store – Parameter Store gives you visibility and control of your infrastructure on AWS.

Continuous testing tools

The following are open-source scanning tools that are integrated in the pipeline for the purposes of this post, but you could integrate other tools that meet your specific requirements. You can use the static code review tool Amazon CodeGuru for static analysis, but at the time of this writing, it’s not yet available in GovCloud and currently supports Java and Python (available in preview).

OWASP Dependency-Check – A Software Composition Analysis (SCA) tool that attempts to detect publicly disclosed vulnerabilities contained within a project’s dependencies.
SonarQube (SAST) – Catches bugs and vulnerabilities in your app, with thousands of automated Static Code Analysis rules.
PHPStan (SAST) – Focuses on finding errors in your code without actually running it. It catches whole classes of bugs even before you write tests for the code.
OWASP Zap (DAST) – Helps you automatically find security vulnerabilities in your web applications while you’re developing and testing your applications.

Continuous logging and monitoring services

The following are AWS services for continuous logging and monitoring:

AWS CloudWatch Logs – Allows you to monitor, store, and access your log files from EC2 instances, AWS CloudTrail, Amazon Route 53, and other sources
AWS CloudWatch Events – Delivers a near-real-time stream of system events that describe changes in AWS resources

Auditing and governance services

The following are AWS auditing and governance services:

AWS CloudTrail – Enables governance, compliance, operational auditing, and risk auditing of your AWS account.
AWS Identity and Access Management – Enables you to manage access to AWS services and resources securely. With IAM, you can create and manage AWS users and groups, and use permissions to allow and deny their access to AWS resources.
AWS Config – Allows you to assess, audit, and evaluate the configurations of your AWS resources.

Operations services

The following are AWS operations services:

AWS Security Hub – Gives you a comprehensive view of your security alerts and security posture across your AWS accounts. This post uses Security Hub to aggregate all the vulnerability findings as a single pane of glass.
AWS CloudFormation – Gives you an easy way to model a collection of related AWS and third-party resources, provision them quickly and consistently, and manage them throughout their lifecycles, by treating infrastructure as code.
AWS Systems Manager Parameter Store – Provides secure, hierarchical storage for configuration data management and secrets management. You can store data such as passwords, database strings, Amazon Machine Image (AMI) IDs, and license codes as parameter values.
AWS Elastic Beanstalk – An easy-to-use service for deploying and scaling web applications and services developed with Java, .NET, PHP, Node.js, Python, Ruby, Go, and Docker on familiar servers such as Apache, Nginx, Passenger, and IIS. This post uses Elastic Beanstalk to deploy LAMP stack with WordPress and Amazon Aurora MySQL. Although we use Elastic Beanstalk for this post, you could configure the pipeline to deploy to various other environments on AWS or elsewhere as needed.

Pipeline architecture

The following diagram shows the architecture of the solution.

AWS DevSecOps CICD pipeline architecture

The main steps are as follows:

When a user commits the code to a CodeCommit repository, a CloudWatch event is generated which, triggers CodePipeline.
CodeBuild packages the build and uploads the artifacts to an S3 bucket. CodeBuild retrieves the authentication information (for example, scanning tool tokens) from Parameter Store to initiate the scanning. As a best practice, it is recommended to utilize Artifact repositories like AWS CodeArtifact to store the artifacts, instead of S3. For simplicity of the workshop, we will continue to use S3.
CodeBuild scans the code with an SCA tool (OWASP Dependency-Check) and SAST tool (SonarQube or PHPStan; in the provided CloudFormation template, you can pick one of these tools during the deployment, but CodeBuild is fully enabled for a bring your own tool approach).
If there are any vulnerabilities either from SCA analysis or SAST analysis, CodeBuild invokes the Lambda function. The function parses the results into AWS Security Finding Format (ASFF) and posts it to Security Hub. Security Hub helps aggregate and view all the vulnerability findings in one place as a single pane of glass. The Lambda function also uploads the scanning results to an S3 bucket.
If there are no vulnerabilities, CodeDeploy deploys the code to the staging Elastic Beanstalk environment.
After the deployment succeeds, CodeBuild triggers the DAST scanning with the OWASP ZAP tool (again, this is fully enabled for a bring your own tool approach).
If there are any vulnerabilities, CodeBuild invokes the Lambda function, which parses the results into ASFF and posts it to Security Hub. The function also uploads the scanning results to an S3 bucket (similar to step 4).
If there are no vulnerabilities, the approval stage is triggered, and an email is sent to the approver for action.
After approval, CodeDeploy deploys the code to the production Elastic Beanstalk environment.
During the pipeline run, CloudWatch Events captures the build state changes and sends email notifications to subscribed users through SNS notifications.
CloudTrail tracks the API calls and send notifications on critical events on the pipeline and CodeBuild projects, such as UpdatePipeline, DeletePipeline, CreateProject, and DeleteProject, for auditing purposes.
AWS Config tracks all the configuration changes of AWS services. The following AWS Config rules are added in this pipeline as security best practices:
CODEBUILD_PROJECT_ENVVAR_AWSCRED_CHECK – Checks whether the project contains environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. The rule is NON_COMPLIANT when the project environment variables contains plaintext credentials.
CLOUD_TRAIL_LOG_FILE_VALIDATION_ENABLED – Checks whether CloudTrail creates a signed digest file with logs. AWS recommends that the file validation be enabled on all trails. The rule is noncompliant if the validation is not enabled.

Security of the pipeline is implemented by using IAM roles and S3 bucket policies to restrict access to pipeline resources. Pipeline data at rest and in transit is protected using encryption and SSL secure transport. We use Parameter Store to store sensitive information such as API tokens and passwords. To be fully compliant with frameworks such as FedRAMP, other things may be required, such as MFA.

Security in the pipeline is implemented by performing the SCA, SAST and DAST security checks. Alternatively, the pipeline can utilize IAST (Interactive Application Security Testing) techniques that would combine SAST and DAST stages.

As a best practice, encryption should be enabled for the code and artifacts, whether at rest or transit.

In the next section, we explain how to deploy and run the pipeline CloudFormation template used for this example. Refer to the provided service links to learn more about each of the services in the pipeline. If utilizing CloudFormation templates to deploy infrastructure using pipelines, we recommend using linting tools like cfn-nag to scan CloudFormation templates for security vulnerabilities.

Prerequisites

Before getting started, make sure you have the following prerequisites:

Elastic Beanstalk environments with an application deployed. In this post, we use WordPress, but you can use any other application. For more information, see Deploying a high-availability WordPress website with an external Amazon RDS database to Elastic Beanstalk.
A CodeCommit repo with your application code. For more information, see Create an AWS CodeCommit repository.
The provided buildspec-*.yml files, sonar-project.properties file, json file, and phpstan.neon file uploaded to the root of the application code repository.
The Lambda function uploaded to a S3 bucket. We use this function to parse the scanning reports and post the results to Security Hub.
A SonarQube URL and generated API token for code scanning.
An OWASP ZAP URL and generated API key for dynamic web scanning.
An application web URL to run the DAST testing.
An email address to receive approval notifications for deployment, pipeline change notifications, and CloudTrail events.
AWS Config and Security Hub enabled. For instructions, see Managing the Configuration Recorder and Enabling Security Hub manually, respectively.

Deploying the pipeline

To deploy the pipeline, complete the following steps: Download the CloudFormation template and pipeline code from GitHub repo.

Log in to your AWS account if you have not done so already.
On the CloudFormation console, choose Create Stack.
Choose the CloudFormation pipeline template.
Choose Next.
Provide the stack parameters:
- Under Code, provide code details, such as repository name and the branch to trigger the pipeline.
- Under SAST, choose the SAST tool (SonarQube or PHPStan) for code analysis, enter the API token and the SAST tool URL. You can skip SonarQube details if using PHPStan as the SAST tool.
- Under DAST, choose the DAST tool (OWASP Zap) for dynamic testing and enter the API token, DAST tool URL, and the application URL to run the scan.
- Under Lambda functions, enter the Lambda function S3 bucket name, filename, and the handler name.
- Under STG Elastic Beanstalk Environment and PRD Elastic Beanstalk Environment, enter the Elastic Beanstalk environment and application details for staging and production to which this pipeline deploys the application code.
- Under General, enter the email addresses to receive notifications for approvals and pipeline status changes.

CF Deploymenet - Passing parameter values

CloudFormation deployment - Passing parameter values

CloudFormation template deployment

After the pipeline is deployed, confirm the subscription by choosing the provided link in the email to receive the notifications.

Running the pipeline

0 – INFORMATIONAL
1–39 – LOW
40– 69 – MEDIUM
70–89 – HIGH
90–100 – CRITICAL

The following screenshot shows the progression of your pipeline.

CodePipeline stages

SCA and SAST scanning

In our architecture, CodeBuild trigger the SCA and SAST scanning in parallel. In this section, we discuss scanning with OWASP Dependency-Check, SonarQube, and PHPStan.

Scanning with OWASP Dependency-Check (SCA)

The following is the code snippet from the Lambda function, where the SCA analysis results are parsed and posted to Security Hub. Based on the results, the equivalent Security Hub severity level (normalized_severity) is assigned.

Lambda code snippet for OWASP Dependency-check

You can see the results in Security Hub, as in the following screenshot.

SecurityHub report from OWASP Dependency-check scanning

Scanning with SonarQube (SAST)

The following is the code snippet from the Lambda function, where the SonarQube code analysis results are parsed and posted to Security Hub. Based on SonarQube results, the equivalent Security Hub severity level (normalized_severity) is assigned.

Lambda code snippet for SonarQube

The following screenshot shows the results in Security Hub.

SecurityHub report from SonarQube scanning

Scanning with PHPStan (SAST)

The following is the code snippet from the Lambda function, where the PHPStan code analysis results are parsed and posted to Security Hub.

Lambda code snippet for PHPStan

The following screenshot shows the results in Security Hub.

SecurityHub report from PHPStan scanning

DAST scanning

In our architecture, CodeBuild triggers DAST scanning and the DAST tool.

If there are no vulnerabilities in the SAST scan, the pipeline proceeds to the manual approval stage and an email is sent to the approver. The approver can review and approve or reject the deployment. If approved, the pipeline moves to next stage and deploys the application to the provided Elastic Beanstalk environment.

Scanning with OWASP Zap

After deployment is successful, CodeBuild initiates the DAST scanning. When scanning is complete, if there are any vulnerabilities, it invokes the Lambda function similar to SAST analysis. The function parses and posts the results to Security Hub. The following is the code snippet of the Lambda function.

Lambda code snippet for OWASP-Zap

The following screenshot shows the results in Security Hub.

SecurityHub report from OWASP-Zap scanning

Conclusion

In this post, I presented a DevSecOps pipeline that includes CI/CD, continuous testing, continuous logging and monitoring, auditing and governance, and operations. I demonstrated how to integrate various open-source scanning tools, such as SonarQube, PHPStan, and OWASP Zap for SAST and DAST analysis. I explained how to aggregate vulnerability findings in Security Hub as a single pane of glass. This post also talked about how to implement security of the pipeline and in the pipeline using AWS cloud native services. Finally, I provided the DevSecOps pipeline as code using AWS CloudFormation. For additional information on AWS DevOps services and to get started, see AWS DevOps and DevOps Blog.

Automatically update security groups for Amazon CloudFront IP ranges using AWS Lambda

2020-11-21 Yeshwanth Kottu

Post Syndicated from Yeshwanth Kottu original https://aws.amazon.com/blogs/security/automatically-update-security-groups-for-amazon-cloudfront-ip-ranges-using-aws-lambda/

Amazon CloudFront is a content delivery network that can help you increase the performance of your web applications and significantly lower the latency of delivering content to your customers. For CloudFront to access an origin (the source of the content behind CloudFront), the origin has to be publicly available and reachable. Anyone with the origin domain name or IP address could request content directly and bypass CloudFront. In this blog post, I describe an automated solution that uses security groups to permit only CloudFront to access the origin.

Amazon Simple Storage Service (Amazon S3) origins provide a feature called Origin Access Identity, which blocks public access to selected buckets, making them accessible only through CloudFront. When you use CloudFront to secure your web applications, it’s important to ensure that only CloudFront can access your origin (such as Amazon Elastic Cloud Compute (Amazon EC2) or Application Load Balancer (ALB)) and any direct access to origin is restricted. This blog post shows you how to create an AWS Lambda function to automatically update Amazon Virtual Private Cloud (Amazon VPC) security groups with CloudFront service IP ranges to permit only CloudFront to access the origin.

AWS publishes the IP ranges in JSON format for CloudFront and other AWS services. If your origin is an Elastic Load Balancer or an Amazon EC2 instance, you can use VPC security groups to allow only CloudFront IP ranges to access your applications. The IP ranges in the list are separated by service and Region, and you must specify only the IP ranges that correspond to CloudFront.

The IP ranges that AWS publishes change frequently and without an automated solution, you would need to retrieve this document frequently to understand the current IP ranges for CloudFront. Frequent polling is inefficient because there is no notice of when the IP ranges change, and if these IP ranges aren’t modified immediately, your client might see 504 errors when they access CloudFront. Additionally, there are numerous IP ranges for each service, performing the change manually isn’t an efficient way of updating these ranges. This means you need infrastructure to support the task. However, in that case you end up with another host to manage—complete with the typical patching, deployment, and monitoring. As you can see, a small task could quickly become more complicated than the problem you intended to solve.

An Amazon Simple Notification Service (Amazon SNS) message is sent to a topic whenever the AWS IP ranges change. Enabling you to build an event-driven, serverless solution that updates the IP ranges for your security groups, as needed by using a Lambda function that is triggered in response to the SNS notification.

Here are the steps we are going to take to implement the solution:

Create your resources
1. Create an IAM policy and execution role for the Lambda function
2. Create your Lambda function
Test your Lambda function
Configure your Lambda function’s trigger

Create your resources

The first thing you need to do is create a Lambda function execution role and policy. Lambda function uses execution role to access or create AWS resources. This Lambda function is triggered by an SNS notification whenever there’s a change in the IP ranges document. Based on the number of IP ranges present for CloudFront and also the number of ports (for example, 80,443) that you want to whitelist on the origin, this Lambda function creates the required security groups. These security groups will allow only traffic from CloudFront to your ELB load balancers or EC2 instances.

Create an IAM policy and execution role for the Lambda function

When you create a Lambda function, it’s important to understand and properly define the security context for the Lambda function. Using AWS Identity and Access Management (IAM), you can create the Lambda execution role that determines the AWS service calls that the function is authorized to complete. (Learn more about the Lambda permissions model.)

To create the IAM policy for your role

Log in to the IAM console with the user account that you will use to manage the Lambda function. This account must have administrator permissions.
In the navigation pane, choose Policies.
In the content pane, choose Create policy.

Choose the JSON tab and copy the text from the following JSON policy document. Paste this text into the JSON text box.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CloudWatchPermissions",
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    },
    {
      "Sid": "EC2Permissions",
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeSecurityGroups",
        "ec2:AuthorizeSecurityGroupIngress",
        "ec2:RevokeSecurityGroupIngress",
        "ec2:CreateSecurityGroup",
        "ec2:DescribeVpcs",
		"ec2:CreateTags",
        "ec2:ModifyNetworkInterfaceAttribute",
        "ec2:DescribeNetworkInterfaces"
        
      ],
      "Resource": "*"
    }
  ]
}

When you’re finished, choose Review policy.
On the Review page, enter a name for the policy name (e.g. LambdaExecRolePolicy-UpdateSecurityGroupsForCloudFront). Review the policy Summary to see the permissions granted by your policy, and then choose Create policy to save your work.

To understand what this policy allows, let’s look closely at both statements in the policy. The first statement allows the Lambda function to create and write to CloudWatch Logs, which is vital for debugging and monitoring our function. The second statement allows the function to get information about existing security groups, get existing VPC information, create security groups, and authorize and revoke ingress permissions. It’s an important best practice that your IAM policies be as granular as possible, to support the principal of least privilege.

Now that you’ve created your policy, you can create the Lambda execution role that will use the policy.

To create the Lambda execution role

In the navigation pane of the IAM console, choose Roles, and then choose Create role.
For Select type of trusted entity, choose AWS service.
Choose the service that you want to allow to assume this role. In this case, choose Lambda.
Choose Next: Permissions.
Search for the policy name that you created earlier and select the check box next to the policy.
Choose Next: Tags.
(Optional) Add metadata to the role by attaching tags as key-value pairs. For more information about using tags in IAM, see Tagging IAM Users and Roles.
Choose Next: Review.
For Role name (e.g. LambdaExecRole-UpdateSecurityGroupsForCloudFront), enter a name for your role.
(Optional) For Role description, enter a description for the new role.
Review the role, and then choose Create role.

Create your Lambda function

Now, create your Lambda function and configure the role that you created earlier as the execution role for this function.

To create the Lambda function

Go to the Lambda console in N. Virginia region and choose Create function. On the next page, choose Author from scratch. (I’ll be providing the code for your Lambda function, but for other functions, the Use a blueprint option can be a great way to get started.)
Give your Lambda function a name (e.g UpdateSecurityGroupsForCloudFront) and description, and select Python 3.8 from the Runtime menu.
Choose or create an execution role: Select the execution role you created earlier by selecting the option Use an Existing Role.
After confirming that your settings are correct, choose Create function.
Paste the Lambda function code from here.
Select Save.

Additionally, in the Basic Settings of the Lambda function, increase the timeout to 10 seconds.

To set the timeout value in the Lambda console

In the Lambda console, choose the function you just created.
Under Basic settings, choose Edit.
For Timeout, select 10s.
Choose Save.

By default, the Lambda function has these settings:

The Lambda function is configured to create security groups in the default VPC.
CloudFront IP ranges are updated as inbound rules on port 80.
The created security groups are tagged with the name prefix AUTOUPDATE.
Debug logging is turned off.
The service for which IP ranges are extracted is set to CloudFront.
The SDK client in the Lambda function set to us-east-1(N. Virginia).

If you want to customize these settings, set the following environment variables for the Lambda function. For more details, see Using AWS Lambda environment variables.

Action	Key-value data
To create security groups in a specific VPC	Key: VPC_ID Value: vpc-id
To create security groups rules for a different port or multiple ports The solution in this example supports a total of two ports. One can be used for HTTP and another for HTTPS.	Key: PORTS Value: portnumber or Key: PORTS Value: portnumber,portnumber
To customize the prefix name tag of your security groups	Key: PREFIX_NAME Value: custom-name
To enable debug logging to CloudWatch	Key: DEBUG Value: true
To extract IP ranges for a different service other than CloudFront	Key: SERVICE Value: servicename
To configure the Region for the SDK client used in the Lambda function If the CloudFront origin is present in a different Region than N. Virginia, the security groups must be created in that region.	Key: REGION Value: regionname

To set environment variables in the Lambda console

In the Lambda console, choose the function you created.
Under Environment variables, choose Edit.
Choose Add environment variable.
Enter a key and value.
Choose Save.

Test your Lambda function

Now that you’ve created your function, it’s time to test it and initialize your security group.

To create your test event for the Lambda function

In the Lambda console, on the Functions page, choose your function. In the drop-down menu next to Actions, choose Configure test events.
Enter an Event Name (e.g. TriggerSNS)

Replace the following as your sample event, which will represent an SNS notification and then select Create.

{
    "Records": [
        {
            "EventVersion": "1.0",
            "EventSubscriptionArn": "arn:aws:sns:EXAMPLE",
            "EventSource": "aws:sns",
            "Sns": {
                "SignatureVersion": "1",
                "Timestamp": "1970-01-01T00:00:00.000Z",
                "Signature": "EXAMPLE",
                "SigningCertUrl": "EXAMPLE",
                "MessageId": "95df01b4-ee98-5cb9-9903-4c221d41eb5e",
                "Message": "{\"create-time\": \"yyyy-mm-ddThh:mm:ss+00:00\", \"synctoken\": \"0123456789\", \"md5\": \"7fd59f5c7f5cf643036cbd4443ad3e4b\", \"url\": \"https://ip-ranges.amazonaws.com/ip-ranges.json\"}",
                "Type": "Notification",
                "UnsubscribeUrl": "EXAMPLE",
                "TopicArn": "arn:aws:sns:EXAMPLE",
                "Subject": "TestInvoke"
            }
  		}
    ]
}

After you’ve added the test event, select Save and then select Test. Your Lambda function is then invoked, and you should see log output at the bottom of the console in Execution Result section, similar to the following.

Updating from https://ip-ranges.amazonaws.com/ip-ranges.json
MD5 Mismatch: got 2e967e943cf98ae998efeec05d4f351c expected 7fd59f5c7f5cf643036cbd4443ad3e4b: Exception
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 29, in lambda_handler
    ip_ranges = json.loads(get_ip_groups_json(message['url'], message['md5']))
  File "/var/task/lambda_function.py", line 50, in get_ip_groups_json
    raise Exception('MD5 Missmatch: got ' + hash + ' expected ' + expected_hash)
Exception: MD5 Mismatch: got 2e967e943cf98ae998efeec05d4f351c expected 7fd59f5c7f5cf643036cbd4443ad3e4b

Edit the sample event again, and this time change the md5 value in the sample event to be the first MD5 hash provided in the log output. In this example, you would update the md5 value in the sample event configured earlier with the hash value seen in the error ‘2e967e943cf98ae998efeec05d4f351c’. Lambda code successfully executes only when the original hash of the IP ranges document and the hash received from the event trigger match. After you modify the hash value from the error message received earlier, the test event matches the hash of the IP ranges document.
Select Save and test. This invokes your Lambda function.

After the function is invoked the second time with updated md5 has Lambda function should execute without any errors. You should be able to see the new security groups created and the IP ranges of CloudFront updated in the rules in the EC2 console, as shown in Figure 1.

Figure 1: EC2 console showing the security groups created

In the initial successful run of this function, it created the total number of security groups required to update all the IP ranges of CloudFront for the ports mentioned. The function creates security groups based on the maximum number of rules that can be added to individual security groups. The new security groups can be identified from the EC2 console by the name AUTOUPDATE_random if you used the default configuration, or a custom name if you provided a PREFIX_NAME.

You can now attach these security groups to your Elastic LoadBalancer or EC2 instances. If your log output is different from what is described here, the output should help you identify the issue.

Configure your Lambda function’s trigger

After you’ve validated that your function is executing properly, it’s time to connect it to the SNS topic for IP changes. To do this, use the AWS Command Line Interface (CLI). Enter the following command, making sure to replace Lambda ARN with the Amazon Resource Name (ARN) of your Lambda function. You can find this ARN at the top right when viewing the configuration of your Lambda function.

aws sns subscribe - -topic-arn "arn:aws:sns:us-east-1:806199016981:AmazonIpSpaceChanged" - -region us-east-1 - -protocol lambda - -notification-endpoint "Lambda ARN"

You should receive the ARN of your Lambda function’s SNS subscription.

Now add a permission that allows the Lambda function to be invoked by the SNS topic. The following command also adds the Lambda trigger.

aws lambda add-permission - -function-name "Lambda ARN" - -statement-id lambda-sns-trigger - -region us-east-1 - -action lambda:InvokeFunction - -principal sns.amazonaws.com - -source-arn "arn:aws:sns:us-east-1:806199016981:AmazonIpSpaceChanged"

When AWS changes any of the IP ranges in the document, an SNS notification is sent and your Lambda function will be triggered. This Lambda function verifies the modified ranges in the document and efficiently updates the IP ranges on the existing security groups. Additionally, the function dynamically scales and creates additional security groups if the number of IP ranges for CloudFront is increased in future. Any newly created security groups are automatically attached to the network interface where the previous security groups are attached in order to avoid service interruption.

Summary

As you followed this blog post, you created a Lambda function to create a security groups and update the security group’s rules dynamically whenever AWS publishes new internal service IP ranges. This solution has several advantages:

The solution isn’t designed as a periodic poll, so it only runs when it needs to.
It’s automatic, so you don’t need to update security groups manually which lowers the operational cost.
It’s simple, because you have no extra infrastructure to maintain as the solution is completely serverless.
It’s cost effective, because the Lambda function runs only when triggered by the AmazonIpSpaceChanged SNS topic and only runs for a few seconds, this solution costs only pennies to operate.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the Amazon CloudFront forum. If you have any other use cases for using Lambda functions to dynamically update security groups, or even other networking configurations such as VPC route tables or ACLs, we’d love to hear about them!

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Building, bundling, and deploying applications with the AWS CDK

2020-10-28 Cory Hall

Post Syndicated from Cory Hall original https://aws.amazon.com/blogs/devops/building-apps-with-aws-cdk/

The AWS Cloud Development Kit (AWS CDK) is an open-source software development framework to model and provision your cloud application resources using familiar programming languages.

The post CDK Pipelines: Continuous delivery for AWS CDK applications showed how you can use CDK Pipelines to deploy a TypeScript-based AWS Lambda function. In that post, you learned how to add additional build commands to the pipeline to compile the TypeScript code to JavaScript, which is needed to create the Lambda deployment package.

In this post, we dive deeper into how you can perform these build commands as part of your AWS CDK build process by using the native AWS CDK bundling functionality.

If you’re working with Python, TypeScript, or JavaScript-based Lambda functions, you may already be familiar with the PythonFunction and NodejsFunction constructs, which use the bundling functionality. This post describes how to write your own bundling logic for instances where a higher-level construct either doesn’t already exist or doesn’t meet your needs. To illustrate this, I walk through two different examples: a Lambda function written in Golang and a static site created with Nuxt.js.

Concepts

A typical CI/CD pipeline contains steps to build and compile your source code, bundle it into a deployable artifact, push it to artifact stores, and deploy to an environment. In this post, we focus on the building, compiling, and bundling stages of the pipeline.

The AWS CDK has the concept of bundling source code into a deployable artifact. As of this writing, this works for two main types of assets: Docker images published to Amazon Elastic Container Registry (Amazon ECR) and files published to Amazon Simple Storage Service (Amazon S3). For files published to Amazon S3, this can be as simple as pointing to a local file or directory, which the AWS CDK uploads to Amazon S3 for you.

When you build an AWS CDK application (by running cdk synth), a cloud assembly is produced. The cloud assembly consists of a set of files and directories that define your deployable AWS CDK application. In the context of the AWS CDK, it might include the following:

AWS CloudFormation templates and instructions on where to deploy them
Dockerfiles, corresponding application source code, and information about where to build and push the images to
File assets and information about which S3 buckets to upload the files to

Use case

For this use case, our application consists of front-end and backend components. The example code is available in the GitHub repo. In the repository, I have split the example into two separate AWS CDK applications. The repo also contains the Golang Lambda example app and the Nuxt.js static site.

Golang Lambda function

To create a Golang-based Lambda function, you must first create a Lambda function deployment package. For Go, this consists of a .zip file containing a Go executable. Because we don’t commit the Go executable to our source repository, our CI/CD pipeline must perform the necessary steps to create it.

In the context of the AWS CDK, when we create a Lambda function, we have to tell the AWS CDK where to find the deployment package. See the following code:

new lambda.Function(this, 'MyGoFunction', {
  runtime: lambda.Runtime.GO_1_X,
  handler: 'main',
  code: lambda.Code.fromAsset(path.join(__dirname, 'folder-containing-go-executable')),
});

In the preceding code, the lambda.Code.fromAsset() method tells the AWS CDK where to find the Golang executable. When we run cdk synth, it stages this Go executable in the cloud assembly, which it zips and publishes to Amazon S3 as part of the PublishAssets stage.

If we’re running the AWS CDK as part of a CI/CD pipeline, this executable doesn’t exist yet, so how do we create it? One method is CDK bundling. The lambda.Code.fromAsset() method takes a second optional argument, AssetOptions, which contains the bundling parameter. With this bundling parameter, we can tell the AWS CDK to perform steps prior to staging the files in the cloud assembly.

Breaking down the BundlingOptions parameter further, we can perform the build inside a Docker container or locally.

Building inside a Docker container

For this to work, we need to make sure that we have Docker running on our build machine. In AWS CodeBuild, this means setting privileged: true. See the following code:

new lambda.Function(this, 'MyGoFunction', {
  code: lambda.Code.fromAsset(path.join(__dirname, 'folder-containing-source-code'), {
    bundling: {
      image: lambda.Runtime.GO_1_X.bundlingDockerImage,
      command: [
        'bash', '-c', [
          'go test -v',
          'GOOS=linux go build -o /asset-output/main',
      ].join(' && '),
    },
  })
  ...
});

We specify two parameters:

image (required) – The Docker image to perform the build commands in
command (optional) – The command to run within the container

The AWS CDK mounts the folder specified as the first argument to fromAsset at /asset-input inside the container, and mounts the asset output directory (where the cloud assembly is staged) at /asset-output inside the container.

After we perform the build commands, we need to make sure we copy the Golang executable to the /asset-output location (or specify it as the build output location like in the preceding example).

This is the equivalent of running something like the following code:

docker run \
  --rm \
  -v folder-containing-source-code:/asset-input \
  -v cdk.out/asset.1234a4b5/:/asset-output \
  lambci/lambda:build-go1.x \
  bash -c 'GOOS=linux go build -o /asset-output/main'

Building locally

To build locally (not in a Docker container), we have to provide the local parameter. See the following code:

new lambda.Function(this, 'MyGoFunction', {
  code: lambda.Code.fromAsset(path.join(__dirname, 'folder-containing-source-code'), {
    bundling: {
      image: lambda.Runtime.GO_1_X.bundlingDockerImage,
      command: [],
      local: {
        tryBundle(outputDir: string) {
          try {
            spawnSync('go version')
          } catch {
            return false
          }

          spawnSync(`GOOS=linux go build -o ${path.join(outputDir, 'main')}`);
          return true
        },
      },
    },
  })
  ...
});

The local parameter must implement the ILocalBundling interface. The tryBundle method is passed the asset output directory, and expects you to return a boolean (true or false). If you return true, the AWS CDK doesn’t try to perform Docker bundling. If you return false, it falls back to Docker bundling. Just like with Docker bundling, you must make sure that you place the Go executable in the outputDir.

Typically, you should perform some validation steps to ensure that you have the required dependencies installed locally to perform the build. This could be checking to see if you have go installed, or checking a specific version of go. This can be useful if you don’t have control over what type of build environment this might run in (for example, if you’re building a construct to be consumed by others).

If we run cdk synth on this, we see a new message telling us that the AWS CDK is bundling the asset. If we include additional commands like go test, we also see the output of those commands. This is especially useful if you wanted to fail a build if tests failed. See the following code:

$ cdk synth
Bundling asset GolangLambdaStack/MyGoFunction/Code/Stage...
✓  . (9ms)
✓  clients (5ms)

DONE 8 tests in 11.476s
✓  clients (5ms) (coverage: 84.6% of statements)
✓  . (6ms) (coverage: 78.4% of statements)

DONE 8 tests in 2.464s

Cloud Assembly

If we look at the cloud assembly that was generated (located at cdk.out), we see something like the following code:

$ cdk synth
Bundling asset GolangLambdaStack/MyGoFunction/Code/Stage...
✓  . (9ms)
✓  clients (5ms)

DONE 8 tests in 11.476s
✓  clients (5ms) (coverage: 84.6% of statements)
✓  . (6ms) (coverage: 78.4% of statements)

DONE 8 tests in 2.464s

It contains our GolangLambdaStack CloudFormation template that defines our Lambda function, as well as our Golang executable, bundled at asset.01cf34ff646d380829dc4f2f6fc93995b13277bde7db81c24ac8500a83a06952/main.

Let’s look at how the AWS CDK uses this information. The GolangLambdaStack.assets.json file contains all the information necessary for the AWS CDK to know where and how to publish our assets (in this use case, our Golang Lambda executable). See the following code:

{
  "version": "5.0.0",
  "files": {
    "01cf34ff646d380829dc4f2f6fc93995b13277bde7db81c24ac8500a83a06952": {
      "source": {
        "path": "asset.01cf34ff646d380829dc4f2f6fc93995b13277bde7db81c24ac8500a83a06952",
        "packaging": "zip"
      },
      "destinations": {
        "current_account-current_region": {
          "bucketName": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}",
          "objectKey": "01cf34ff646d380829dc4f2f6fc93995b13277bde7db81c24ac8500a83a06952.zip",
          "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-file-publishing-role-${AWS::AccountId}-${AWS::Region}"
        }
      }
    }
  }
}

The file contains information about where to find the source files (source.path) and what type of packaging (source.packaging). It also tells the AWS CDK where to publish this .zip file (bucketName and objectKey) and what AWS Identity and Access Management (IAM) role to use (assumeRoleArn). In this use case, we only deploy to a single account and Region, but if you have multiple accounts or Regions, you see multiple destinations in this file.

The GolangLambdaStack.template.json file that defines our Lambda resource looks something like the following code:

{
  "Resources": {
    "MyGoFunction0AB33E85": {
      "Type": "AWS::Lambda::Function",
      "Properties": {
        "Code": {
          "S3Bucket": {
            "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}"
          },
          "S3Key": "01cf34ff646d380829dc4f2f6fc93995b13277bde7db81c24ac8500a83a06952.zip"
        },
        "Handler": "main",
        ...
      }
    },
    ...
  }
}

The S3Bucket and S3Key match the bucketName and objectKey from the assets.json file. By default, the S3Key is generated by calculating a hash of the folder location that you pass to lambda.Code.fromAsset(), (for this post, folder-containing-source-code). This means that any time we update our source code, this calculated hash changes and a new Lambda function deployment is triggered.

Nuxt.js static site

In this section, I walk through building a static site using the Nuxt.js framework. You can apply the same logic to any static site framework that requires you to run a build step prior to deploying.

To deploy this static site, we use the BucketDeployment construct. This is a construct that allows you to populate an S3 bucket with the contents of .zip files from other S3 buckets or from a local disk.

Typically, we simply tell the BucketDeployment construct where to find the files that it needs to deploy to the S3 bucket. See the following code:

new s3_deployment.BucketDeployment(this, 'DeployMySite', {
  sources: [
    s3_deployment.Source.asset(path.join(__dirname, 'path-to-directory')),
  ],
  destinationBucket: myBucket
});

To deploy a static site built with a framework like Nuxt.js, we need to first run a build step to compile the site into something that can be deployed. For Nuxt.js, we run the following two commands:

yarn install – Installs all our dependencies
yarn generate – Builds the application and generates every route as an HTML file (used for static hosting)

This creates a dist directory, which you can deploy to Amazon S3.

Just like with the Golang Lambda example, we can perform these steps as part of the AWS CDK through either local or Docker bundling.

Building inside a Docker container

To build inside a Docker container, use the following code:

new s3_deployment.BucketDeployment(this, 'DeployMySite', {
  sources: [
    s3_deployment.Source.asset(path.join(__dirname, 'path-to-nuxtjs-project'), {
      bundling: {
        image: cdk.BundlingDockerImage.fromRegistry('node:lts'),
        command: [
          'bash', '-c', [
            'yarn install',
            'yarn generate',
            'cp -r /asset-input/dist/* /asset-output/',
          ].join(' && '),
        ],
      },
    }),
  ],
  ...
});

For this post, we build inside the publicly available node:lts image hosted on DockerHub. Inside the container, we run our build commands yarn install && yarn generate, and copy the generated dist directory to our output directory (the cloud assembly).

The parameters are the same as described in the Golang example we walked through earlier.

Building locally

To build locally, use the following code:

new s3_deployment.BucketDeployment(this, 'DeployMySite', {
  sources: [
    s3_deployment.Source.asset(path.join(__dirname, 'path-to-nuxtjs-project'), {
      bundling: {
        local: {
          tryBundle(outputDir: string) {
            try {
              spawnSync('yarn --version');
            } catch {
              return false
            }

            spawnSync('yarn install && yarn generate');

       fs.copySync(path.join(__dirname, ‘path-to-nuxtjs-project’, ‘dist’), outputDir);
            return true
          },
        },
        image: cdk.BundlingDockerImage.fromRegistry('node:lts'),
        command: [],
      },
    }),
  ],
  ...
});

Building locally works the same as the Golang example we walked through earlier, with one exception. We have one additional command to run that copies the generated dist folder to our output directory (cloud assembly).

Conclusion

This post showed how you can easily compile your backend and front-end applications using the AWS CDK. You can find the example code for this post in this GitHub repo. If you have any questions or comments, please comment on the GitHub repo. If you have any additional examples you want to add, we encourage you to create a Pull Request with your example!

Our code also contains examples of deploying the applications using CDK Pipelines, so if you’re interested in deploying the example yourself, check out the example repo.

About the author

Cory Hall

Cory is a Solutions Architect at Amazon Web Services with a passion for DevOps and is based in Charlotte, NC. Cory works with enterprise AWS customers to help them design, deploy, and scale applications to achieve their business goals.

Improving customer experience and reducing cost with CodeGuru Profiler

2020-10-24 Rajesh

Post Syndicated from Rajesh original https://aws.amazon.com/blogs/devops/improving-customer-experience-and-reducing-cost-with-codeguru-profiler/

Amazon CodeGuru is a set of developer tools powered by machine learning that provides intelligent recommendations for improving code quality and identifying an application’s most expensive lines of code. Amazon CodeGuru Profiler allows you to profile your applications in a low impact, always on manner. It helps you improve your application’s performance, reduce cost and diagnose application issues through rich data visualization and proactive recommendations. CodeGuru Profiler has been a very successful and widely used service within Amazon, before it was offered as a public service. This post discusses a few ways in which internal Amazon teams have used and benefited from continuous profiling of their production applications. These uses cases can provide you with better insights on how to reap similar benefits for your applications using CodeGuru Profiler.

Inside Amazon, over 100,000 applications currently use CodeGuru Profiler across various environments globally. Over the last few years, CodeGuru Profiler has served as an indispensable tool for resolving issues in the following three categories:

Performance bottlenecks, high latency and CPU utilization
Cost and Infrastructure utilization
Diagnosis of an application impacting event

API latency improvement for CodeGuru Profiler

What could be a better example than CodeGuru Profiler using itself to improve its own performance?
CodeGuru Profiler offers an API called BatchGetFrameMetricData, which allows you to fetch time series data for a set of frames or methods. We noticed that the 99th percentile latency (i.e. the slowest 1 percent of requests over a 5 minute period) metric for this API was approximately 5 seconds, higher than what we wanted for our customers.

Solution

CodeGuru Profiler is built on a micro service architecture, with the BatchGetFrameMetricData API implemented as set of AWS Lambda functions. It also leverages other AWS services such as Amazon DynamoDB to store data and Amazon CloudWatch to record performance metrics.

When investigating the latency issue, the team found that the 5-second latency spikes were happening during certain time intervals rather than continuously, which made it difficult to easily reproduce and determine the root cause of the issue in pre-production environment. The new Lambda profiling feature in CodeGuru came in handy, and so the team decided to enable profiling for all its Lambda functions. The low impact, continuous profiling capability of CodeGuru Profiler allowed the team to capture comprehensive profiles over a period of time, including when the latency spikes occurred, enabling the team to better understand the issue.
After capturing the profiles, the team went through the flame graphs of one of the Lambda functions (TimeSeriesMetricsGeneratorLambda) and learned that all of its CPU time was spent by the thread responsible to publish metrics to CloudWatch. The following screenshot shows a flame graph during one of these spikes.

TimeSeriesMetricsGeneratorLambda taking 100% CPU

As seen, there is a single call stack visible in the above flame graph, indicating all the CPU time was taken by the thread invoking above code. This helped the team immediately understand what was happening. Above code was related to the thread responsible for publishing the CloudWatch metrics. This thread was publishing these metrics in a synchronized block and as this thread took most of the CPU, it caused all other threads to wait and the latency to spike. To fix the issue, the team simply changed the TimeSeriesMetricsGeneratorLambda Lambda code, to publish CloudWatch metrics at the end of the function, which eliminated contention of this thread with all other threads.

Improvement

After the fix was deployed, the 5 second latency spikes were gone, as seen in the following graph.

Latency reduction for BatchGetFrameMetricData API

Cost, infrastructure and other improvements for CAGE

CAGE is an internal Amazon retail service that does royalty aggregation for digital products, such as Kindle eBooks, MP3 songs and albums and more. Like many other Amazon services, CAGE is also customer of CodeGuru Profiler.

CAGE was experiencing latency delays and growing infrastructure cost, and wanted to reduce them. Thanks to CodeGuru Profiler’s always-on profiling capabilities, rich visualization and recommendations, the team was able to successfully diagnose the issues, determine the root cause and fix them.

Solution

With the help of CodeGuru Profiler, the CAGE team identified several reasons for their degraded service performance and increased hardware utilization:

Excessive garbage collection activity – The team reviewed the service flame graphs (see the following screenshot) and identified that a lot of CPU time was spent getting garbage collection activities, 65.07% of the total service CPU.

Excessive garbage collection activities for CAGE

Metadata overhead – The team followed CodeGuru Profiler recommendation to identify that the service’s DynamoDB responses were consuming higher CPU, 2.86% of total CPU time. This was due to the response metadata caching in the AWS SDK v1.x HTTP client that was turned on by default. This was causing higher CPU overhead for high throughput applications such as CAGE. The following screenshot shows the relevant recommendation.

Response metadata recommendation for CAGE

Excessive logging – The team also identified excessive logging of its internal Amazon ION structures. The team initially added this logging for debugging purposes, but was unaware of its impact on the CPU cost, taking 2.28% of the overall service CPU. The following screenshot is part of the flame graph that helped identify the logging impact.

Excessive logging in CAGE service

The team used these flame graphs and CodeGuru Profiler provided recommendations to determine the root cause of the issues and systematically resolve them by doing the following:

Switching to a more efficient garbage collector
Removing excessive logging
Disabling metadata caching for Dynamo DB response

Improvements

After making these changes, the team was able to reduce their infrastructure cost by 25%, saving close to $2600 per month. Service latency also improved, with a reduction in service’s 99th percentile latency from approximately 2,500 milliseconds to 250 milliseconds in their North America (NA) region as shown below.

CAGE Latency Reduction

The team also realized a side benefit of having reduced log verbosity and saw a reduction in log size by 55%.

Event Analysis of increased checkout latency for Amazon.com

During one of the high traffic times, Amazon retail customers experienced higher than normal latency on their checkout page. The issue was due to one of the downstream service’s API experiencing high latency and CPU utilization. While the team quickly mitigated the issue by increasing the service’s servers, the always-on CodeGuru Profiler came to the rescue to help diagnose and fix the issue permanently.

Solution

The team analyzed the flame graphs from CodeGuru Profiler at the time of the event and noticed excessive CPU consumption (69.47%) when logging exceptions using Log4j2. See the following screenshot taken from an earlier version of CodeGuru Profiler user interface.

Excessive CPU consumption when logging exceptions using Log4j2

With CodeGuru Profiler flame graph and other metrics, the team quickly confirmed that the issue was due to excessive exception logging using Log4j2. This downstream service had recently upgraded to Log4j2 version 2.8, in which exception logging could be expensive, due to the way Log4j2 handles class-loading of certain stack frames. Log4j 2.x versions enabled class loading by default, which was disabled in 1.x versions, causing the increased latency and CPU utilization. The team was not able to detect this issue in pre-production environment, as the impact was observable only in high traffic situations.

Improvement

After they understood the issue, the team successfully rolled out the fix, removing the unnecessary exception trace logging to fix the issue. Such performance issues and many others are proactively offered as CodeGuru Profiler recommendations, to ensure you can proactively learn about such issues with your applications and quickly resolve them.

Conclusion

I hope this post provided a glimpse into various ways CodeGuru Profiler can benefit your business and applications. To get started using CodeGuru Profiler, see Setting up CodeGuru Profiler.
For more information about CodeGuru Profiler, see the following:

Investigating performance issues with Amazon CodeGuru Profiler

Optimizing application performance with Amazon CodeGuru Profiler

Find Your Application’s Most Expensive Lines of Code and Improve Code Quality with Amazon CodeGuru

How to automate incident response in the AWS Cloud for EC2 instances

2020-10-20 Ben Eichorst

Post Syndicated from Ben Eichorst original https://aws.amazon.com/blogs/security/how-to-automate-incident-response-in-aws-cloud-for-ec2-instances/

One of the security epics core to the AWS Cloud Adoption Framework (AWS CAF) is a focus on incident response and preparedness to address unauthorized activity. Multiple methods exist in Amazon Web Services (AWS) for automating classic incident response techniques, and the AWS Security Incident Response Guide outlines many of these methods. This post demonstrates one specific method for instantaneous response and acquisition of infrastructure data from Amazon Elastic Compute Cloud (Amazon EC2) instances.

Incident response starts with detection, progresses to investigation, and then follows with remediation. This process is no different in AWS. AWS services such as Amazon GuardDuty, Amazon Macie, and Amazon Inspector provide detection capabilities. Amazon Detective assists with investigation, including tracking and gathering information. Then, after your security organization decides to take action, pre-planned and pre-provisioned runbooks enable faster action towards a resolution. One principle outlined in the incident response whitepaper and the AWS Well-Architected Framework is the notion of pre-provisioning systems and policies to allow you to react quickly to an incident response event. The solution I present here provides a pre-provisioned architecture for an incident response system that you can use to respond to a suspect EC2 instance.

Infrastructure overview

The architecture that I outline in this blog post automates these standard actions on a suspect compute instance:

Capture all the persistent disks.
Capture the instance state at the time the incident response mechanism is started.
Isolate the instance and protect against accidental instance termination.
Perform operating system–level information gathering, such as memory captures and other parameters.
Notify the administrator of these actions.

The solution in this blog post accomplishes these tasks through the following logical flow of AWS services, illustrated in Figure 1.

Figure 1: Infrastructure deployed by the accompanying AWS CloudFormation template and associated task flow when invoking the main API

A user or application calls an API with an EC2 instance ID to start data collection.
Amazon API Gateway initiates the core logic of the process by instantiating an AWS Lambda function.
The Lambda function performs the following data gathering steps before making any changes to the infrastructure:
1. Save instance metadata to the SecResponse Amazon Simple Storage Service (Amazon S3) bucket.
2. Save a snapshot of the instance console to the SecResponse S3 bucket.
3. Initiate an Amazon Elastic Block Store (Amazon EBS) snapshot of all persistent block storage volumes.
The Lambda function then modifies the infrastructure to continue gathering information, by doing the following steps:
1. Set the Amazon EC2 termination protection flag on the instance.
2. Remove any existing EC2 instance profile from the instance.
3. If the instance is managed by AWS Systems Manager:
  1. Attach an EC2 instance profile with minimal privileges for operating system–level information gathering.
  2. Perform operating system–level information gathering actions through Systems Manager on the EC2 instance.
  3. Remove the instance profile after Systems Manager has completed its actions.
4. Create a quarantine security group that lacks both ingress and egress rules.
5. Move the instance into the created quarantine security group for isolation.
Send an administrative notification through the configured Amazon Simple Notification Service (Amazon SNS) topic.

Solution features

By using the mechanisms outlined in this post to codify your incident response runbooks, you can see the following benefits to your incident response plan.

Preparation for incident response before an incident occurs

Both the AWS CAF and Well-Architected Framework recommend that customers formulate known procedures for incident response, and test those runbooks before an incident. Testing these processes before an event occurs decreases the time it takes you to respond in a production environment. The sample infrastructure shown in this post demonstrates how you can standardize those procedures.

Consistent incident response artifact gathering

Codifying your processes into set code and infrastructure prepares you for the need to collect data, but also standardizes the collection process into a repeatable and auditable sequence of What information was collected when and how. This reduces the likelihood of missing data for future investigations.

Walkthrough: Deploying infrastructure and starting the process

To implement the solution outlined in this post, you first need to deploy the infrastructure, and then start the data collection process by issuing an API call.

The code example in this blog post requires that you provision an AWS CloudFormation stack, which creates an S3 bucket for storing your event artifacts and a serverless API that uses API Gateway and Lambda. You then execute a query against this API to take action on a target EC2 instance.

The infrastructure deployed by the AWS CloudFormation stack is a set of AWS components as depicted previously in Figure 1. The stack includes all the services and configurations to deploy the demo. It doesn’t include a target EC2 instance that you can use to test the mechanism used in this post.

Cost

The cost for this demo is minimal because the base infrastructure is completely serverless. With AWS, you only pay for the infrastructure that you use, so the single API call issued in this demo costs fractions of a cent. Artifact storage costs will incur S3 storage prices, and Amazon EC2 snapshots will be stored at their respective prices.

Deploy the AWS CloudFormation stack

In future posts and updates, we will show how to set up this security response mechanism inside a separate account designated for security, but for the purposes of this post, your demo stack must reside in the same AWS account as the target instance that you set up in the next section.

First, start by deploying the AWS CloudFormation template to provision the infrastructure.

To deploy this template in the us-east-1 region

Choose the Launch Stack button to open the AWS CloudFormation console pre-loaded with the template:
(Optional) In the AWS CloudFormation console, on the Specify Details page, customize the stack name.
For the LambdaS3BucketLocation and LambdaZipFileName fields, leave the default values for the purposes of this blog. Customizing this field allows you to customize this code example for your own purposes and store it in an S3 bucket of your choosing.
Customize the S3BucketName field. This needs to be a globally unique S3 bucket name. This bucket is where gathered artifacts are stored for the demo in this blog. You must customize it beyond the default value for the template to instantiate properly.
(Optional) Customize the SNSTopicName field. This name provides a meaningful label for the SNS topic that notifies the administrator of the actions that were performed.
Choose Next to configure the stack options and leave all default settings in place.
Choose Next to review and scroll to the bottom of the page. Select all three check boxes under the Capabilities and Transforms section, next to each of the three acknowledgements:
- I acknowledge that AWS CloudFormation might create IAM resources.
- I acknowledge that AWS CloudFormation might create IAM resources with custom names.
- I acknowledge that AWS CloudFormation might require the following capability: CAPABILITY_AUTO_EXPAND.
Choose Create Stack.

Set up a target EC2 instance

In order to demonstrate the functionality of this mechanism, you need a target host. Provision any EC2 instance in your account to act as a target for the security response mechanism to act upon for information collection and quarantine. To optimize affordability and demonstrate full functionality, I recommend choosing a small instance size (for example, t2.nano) and optionally joining the instance into Systems Manager for the ability to later execute Run Command API queries. For more details on configuring Systems Manager, refer to the AWS Systems Manager User Guide.

Retrieve required information for system initiation

The entire security response mechanism triggers through an API call. To successfully initiate this call, you first need to gather the API URI and key information.

To find the API URI and key information

Navigate to the AWS CloudFormation console and choose the stack that you’ve instantiated.
Choose the Outputs tab and save the value for the key APIBaseURI. This is the base URI for the API Gateway. It will resemble https://abcdefgh12.execute-api.us-east-1.amazonaws.com.
Next, navigate to the API Gateway console and choose the API with the name SecurityResponse.
Choose API Keys, and then choose the only key present.
Next to the API key field, choose Show to reveal the key, and then save this value to a notepad for later use.

(Optional) Configure administrative notification through the created SNS topic

One aspect of this mechanism is that it sends notifications through SNS topics. You can optionally subscribe your email or another notification pipeline mechanism to the created SNS topic in order to receive notifications on actions taken by the system.

Initiate the security response mechanism

Note that, in this demo code, you’re using a simple API key for limiting access to API Gateway. In production applications, you would use an authentication mechanism such as Amazon Cognito to control access to your API.

To kick off the security response mechanism, initiate a REST API query against the API that was created in the AWS CloudFormation template. You first create this API call by using a curl command to be run from a Linux system.

To create the API initiation curl command

Copy the following example curl command.

curl -v -X POST -i -H "x-api-key: 012345ABCDefGHIjkLMS20tGRJ7othuyag" https://abcdefghi.execute-api.us-east-1.amazonaws.com/DEMO/secresponse -d '{
  "instance_id":"i-123457890"
}'

Replace the placeholder API key specified in the x-api-key HTTP header with your API key.
Replace the example URI path with your API’s specific URI. To create the full URI, concatenate the base URI listed in the AWS CloudFormation output you gathered previously with the API call path, which is /DEMO/secresponse. This full URI for your specific API call should closely resemble this sample URI path: https://abcdefghi.execute-api.us-east-1.amazonaws.com/DEMO/secresponse
Replace the value associated with the key instance_id with the instance ID of the target EC2 instance you created.

Because this mechanism initiates through a simple API call, you can easily integrate it with existing workflow management systems. This allows for complex data collection and forensic procedures to be integrated with existing incident response workflows.

Review the gathered data

Note that the following items were uploaded as objects in the security response S3 bucket:

A console screenshot, as shown in Figure 2.
(If Systems Manager is configured) stdout information from the commands that were run on the host operating system.
Instance metadata in JSON form.

Figure 2: Example outputs from a successful completion of this blog post’s mechanism

Additionally, if you load the Amazon EC2 console and scroll down to Elastic Block Store, you can see that EBS snapshots are present for all persistent disks as shown in Figure 3.

Figure 3: Evidence of an EBS snapshot from a successful run

You can also verify that the previously outlined security controls are in place by viewing the instance in the Amazon EC2 console. You should see the removal of AWS Identity and Access Management (IAM) roles from the target EC2 instances and that the instance has been placed into network isolation through a newly created quarantine security group.

Note that for the purposes of this demo, all information that you gathered is stored in the same AWS account as the workload. As a best practice, many AWS customers choose instead to store this information in an AWS account that’s specifically designated for incident response and analysis. A dedicated account provides clear isolation of function and restriction of access. Using AWS Organizations service control policies (SCPs) and IAM permissions, your security team can limit access to adhere to security policy, legal guidance, and compliance regulations.

Clean up and delete artifacts

To clean up the artifacts from the solution in this post, first delete all information in your security response S3 bucket. Then delete the CloudFormation stack that was provisioned at the start of this process in order to clean up all remaining infrastructure.

Conclusion

Placing workloads in the AWS Cloud allows for pre-provisioned and explicitly defined incident response runbooks to be codified and quickly executed on suspect EC2 instances. This enables you to gather data in minutes that previously took hours or even days using manual processes.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the Amazon EC2 forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

New IAMCTL tool compares multiple IAM roles and policies

2020-10-06 Sudhir Reddy Maddulapally

Post Syndicated from Sudhir Reddy Maddulapally original https://aws.amazon.com/blogs/security/new-iamctl-tool-compares-multiple-iam-roles-and-policies/

If you have multiple Amazon Web Services (AWS) accounts, and you have AWS Identity and Access Management (IAM) roles among those multiple accounts that are supposed to be similar, those roles can deviate over time from your intended baseline due to manual actions performed directly out-of-band called drift. As part of regular compliance checks, you should confirm that these roles have no deviations. In this post, we present a tool called IAMCTL that you can use to extract the IAM roles and policies from two accounts, compare them, and report out the differences and statistics. We will explain how to use the tool, and will describe the key concepts.

Prerequisites

Before you install IAMCTL and start using it, here are a few prerequisites that need to be in place on the computer where you will run it:

Python 3.x
Git
AWS Command Line Interface (AWS CLI)
Set up AWS CLI profiles for two accounts you want to compare, as described in Named Profiles.

To follow along in your environment, clone the files from the GitHub repository, and run the steps in order. You won’t incur any charges to run this tool.

Install IAMCTL

This section describes how to install and run the IAMCTL tool.

To install and run IAMCTL

At the command line, enter the following command:
```
pip3 install git+ssh://[email protected]/aws-samples/[email protected]
```
You will see output similar to the following.

Figure 1: IAMCTL tool installation output
To confirm that your installation was successful, enter the following command.
```
iamctl –h
```
You will see results similar to those in figure 2.

Figure 2: IAMCTL help message

Now that you’ve successfully installed the IAMCTL tool, the next section will show you how to use the IAMCTL commands.

Example use scenario

Here is an example of how IAMCTL can be used to find differences in IAM roles between two AWS accounts.

A system administrator for a product team is trying to accelerate a product launch in the middle of testing cycles. Developers have found that the same version of their application behaves differently in the development environment as compared to the QA environment, and they suspect this behavior is due to differences in IAM roles and policies.

The application called “app1” primarily reads from an Amazon Simple Storage Service (Amazon S3) bucket, and runs on an Amazon Elastic Compute Cloud (Amazon EC2) instance. In the development (DEV) account, the application uses an IAM role called “app1_dev” to access the S3 bucket “app1-dev”. In the QA account, the application uses an IAM role called “app1_qa” to access the S3 bucket “app1-qa”. This is depicted in figure 3.

Figure 3: Showing the “app1” application in the development and QA accounts

Setting up the scenario

To simulate this setup for the purpose of this walkthrough, you don’t have to create the EC2 instance or the S3 bucket, but just focus on the IAM role, inline policy, and trust policy.

As noted in the prerequisites, you will switch between the two AWS accounts by using the AWS CLI named profiles “dev-profile” and “qa-profile”, which are configured to point to the DEV and QA accounts respectively.

Start by using this command:

mkdir -p iamctl_test iamctl_test/dev iamctl_test/qa

The command creates a directory structure that looks like this:
Iamctl_test
|– qa
|– dev

Now, switch to the dev folder to run all the following example commands against the DEV account, by using this command:

cd iamctl_test/dev

To create the required policies, first create a file named “app1_s3_access_policy.json” and add the following policy to it. You will use this file’s content as your role’s inline policy.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:Get*",
                "s3:List*"
            ],
            "Resource": [
         "arn:aws:s3:::app1-dev/shared/*"
            ]
        }
    ]
}

Second, create a file called “app1_trust_policy.json” and add the following policy to it. You will use this file’s content as your role’s trust policy.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Now use the two files to create an IAM role with the name “app1_dev” in the account by using these command(s), run in the same order as listed here:

#create role with trust policy

aws --profile dev-profile iam create-role --role-name app1_dev --assume-role-policy-document file://app1_trust_policy.json

#put inline policy to the role created above
 
aws --profile dev-profile iam put-role-policy --role-name app1_dev --policy-name s3_inline_policy --policy-document file://app1_s3_access_policy.json

In the QA account, the IAM role is named “app1_qa” and the S3 bucket is named “app1-qa”.

Repeat the steps from the prior example against the QA account by changing dev to qa where shown in bold in the following code samples. Change the directory to qa by using this command:

cd ../qa

To create the required policies, first create a file called “app1_s3_access_policy.json” and add the following policy to it.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:Get*",
                "s3:List*"
            ],
            "Resource": [
         "arn:aws:s3:::app1-qa/shared/*"
            ]
        }
    ]
}

Next, create a file, called “app1_trust_policy.json” and add the following policy.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Now, use the two files created so far to create an IAM role with the name “app1_qa” in your QA account by using these command(s), run in the same order as listed here:

#create role with trust policy
aws --profile qa-profile iam create-role --role-name app1_qa --assume-role-policy-document file://app1_trust_policy.json

#put inline policy to the role create above
aws --profile qa-profile iam put-role-policy --role-name app1_qa --policy-name s3_inline_policy --policy-document file://app1_s3_access_policy.json

So far, you have two accounts with an IAM role created in each of them for your application. In terms of permissions, there are no differences other than the name of the S3 bucket resource the permission is granted against.

You can expect IAMCTL to generate a minimal set of differences between the DEV and QA accounts, assuming all other IAM roles and policies are the same, but to be sure about the current state of both accounts, in IAMCTL you can run a process called baselining.

Through the process of baselining, you will generate an equivalency dictionary that represents all the known string patterns that reduce the noise in the generated deviations list, and then you will introduce a change into one of the IAM roles in your QA account, followed by a final IAMCTL diff to show the deviations.

Baselining

Baselining is the process of bringing two accounts to an “equivalence” state for IAM roles and policies by establishing a baseline, which future diff operations can leverage. The process is as simple as:

Run the iamctl diff command.
Capture all string substitutions into an equivalence dictionary to remove or reduce noise.
Save the generated detailed files as a snapshot.

Now you can go through these steps for your baseline.

Go ahead and run the iamctl diff command against these two accounts by using the following commands.

#change directory from qa to iamctl-test
cd ..

#run iamctl init
iamctl init

The results of running the init command are shown in figure 4.

Figure 4: Output of the iamctl init command

If you look at the iamctl_test directory now, shown in figure 5, you can see that the init command created two files in the iamctl_test directory.

Figure 5: The directory structure after running the init command

These two files are as follows:

iam.jsonA reference file that has all AWS services and actions listed, among other things. IAMCTL uses this to map the resource listed in an IAM policy to its corresponding AWS resource, based on Amazon Resource Name (ARN) regular expression.
equivalency_list.jsonThe default sample dictionary that IAMCTL uses to suppress false alarms when it compares two accounts. This is where the known string patterns that need to be substituted are added.

Note: A best practice is to make the directory where you store the equivalency dictionary and from which you run IAMCTL to be a Git repository. Doing this will let you capture any additions or modifications for the equivalency dictionary by using Git commits. This will not only give you an audit trail of your historical baselines but also gives context to any additions or modifications to the equivalency dictionary. However, doing this is not necessary for the regular functioning of IAMCTL.

Next, run the iamctl diff command:

#run iamctl diff
iamctl diff dev-profile dev qa-profile qa

Figure 6: Result of diff command

Figure 6 shows the results of running the diff command. You can see that IAMCTL considers the app1_qa and app1_dev roles as unique to the DEV and QA accounts, respectively. This is because IAMCTL uses role names to decide whether to compare the role or tag the role as unique.

You will add the strings “dev” and “qa” to the equivalency dictionary to instruct IAMCTL to substitute occurrences of these two strings with “accountname” by adding the follow JSON to the equivalency_list.json file. You will also clean up some defaults already present in there.

echo “{“accountname”:[“dev”,”qa”]}” > equivalency_list.json

Figure 7 shows the equivalency dictionary before you take these actions, and figure 8 shows the dictionary after these actions.

Figure 7: Equivalency dictionary before

Figure 8: Equivalency dictionary after

There’s another thing to notice here. In this example, one common role was flagged as having a difference. To know which role this is and what the difference is, go to the detail reports folder listed at the bottom of the summary report. The directory structure of this folder is shown in figure 9.

Notice that the reports are created under your home directory with a folder structure that mimics the time stamp down to the second. IAMCTL does this to maintain uniqueness for each run.

tree /Users/<username>/aws-idt/output/2020/08/24/08/38/49/

Figure 9: Files written to the output reports directory

You can see there is a file called common_roles_in_dev_with_differences.csv, and it lists a role called “AwsSecurity***Audit”.

You can see there is another file called dev_to_qa_common_role_difference_items.csv, and it lists the granular IAM items from the DEV account that belong to the “AwsSecurity***Audit” role as compared to QA, but which have differences. You can see that all entries in the file have the DEV account number in the resource ARN, whereas in the qa_to_dev_common_role_difference_items.csv file, all entries have the QA account number for the same role “AwsSecurity***Audit”.

Add both of the account numbers to the equivalency dictionary to substitute them with a placeholder number, because you don’t want this role to get flagged as having differences.

echo “{“accountname”:[“dev”,”qa”],”000000000000”:[“123456789012”,”987654321098”]}” > equivalency_list.json

Now, re-run the diff command.

#run iamctl diff
iamctl diff dev-profile dev qa-profile qa

As you can see in figure 10, you get back the result of the diff command that shows that the DEV account doesn’t have any differences in IAM roles as compared to the QA account.

Figure 10: Output showing no differences after completion of baselining

This concludes the baselining for your DEV and QA accounts. Now you will introduce a change.

Introducing drift

Drift occurs when there is a difference in actual vs expected values in the definition or configuration of a resource. There are several reasons why drift occurs, but for this scenario you will use “intentional need to respond to a time-sensitive operational event” as a reason to mimic and introduce drift into what you have built so far.

To simulate this change, add “s3:PutObject” to the qa app1_s3_access_policy.json file as shown in the following example.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:Get*",
                "s3:List*",
                "s3:PutObject"
            ],
            "Resource": [
         "arn:aws:s3:::app1-qa/shared/*"
            ]
        }
    ]
}

Put this new inline policy on your existing role “app1_qa” by using this command:

aws --profile qa-profile iam put-role-policy --role-name app1_qa --policy-name s3_inline_policy --policy-document file://app1_s3_access_policy.json

The following table represents the new drift in the accounts.

Action	Account-DEV Role name: app1_dev	Account-QA Role name: app1_qa
s3:Get*	Yes	Yes
s3:List*	Yes	Yes
s3: PutObject	No	Yes

Next, run the iamctl diff command to see how it picks up the drift from your previously baselined accounts.

#change directory from qa to iamctL-test
cd ..
iamctl diff dev-profile dev qa-profile qa

Figure 11: Output showing the one deviation that was introduced

You can see that IAMCTL now shows that the QA account has one difference as compared to DEV, which is what we expect based on the deviation you’ve introduced.

Open up the file qa_to_dev_common_role_difference_items.csv to look at the one difference. Again, adjust the following path example with the output from the iamctl diff command at the bottom of the summary report in Figure 11.

cat /Users/<username>/aws-idt/output/2020/09/18/07/38/15/qa_to_dev_common_role_difference_items.csv

As shown in figure 12, you can see that the file lists the specific S3 action “PutObject” with the role name and other relevant details.

Figure 12: Content of file qa_to_dev_common_role_difference_items.csv showing the one deviation that was introduced

You can use this information to remediate the deviation by performing corrective actions in either your DEV account or QA account. You can confirm the effectiveness of the corrective action by re-baselining to make sure that zero deviations appear.

Conclusion

In this post, you learned how to use the IAMCTL tool to compare IAM roles between two accounts, to arrive at a granular list of meaningful differences that can be used for compliance audits or for further remediation actions. If you’ve created your IAM roles by using an AWS CloudFormation stack, you can turn on drift detection and easily capture the drift because of changes done outside of AWS CloudFormation to those IAM resources. For more information about drift detection, see Detecting unmanaged configuration changes to stacks and resources. Lastly, see the GitHub repository where the tool is maintained with documentation describing each of the subcommand concepts. We welcome any pull requests for issues and enhancements.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS IAM forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Improved client-side encryption: Explicit KeyIds and key commitment

2020-09-25 Alex Tribble

Post Syndicated from Alex Tribble original https://aws.amazon.com/blogs/security/improved-client-side-encryption-explicit-keyids-and-key-commitment/

I’m excited to announce the launch of two new features in the AWS Encryption SDK (ESDK): local KeyId filtering and key commitment. These features each enhance security for our customers, acting as additional layers of protection for your most critical data. In this post I’ll tell you how they work. Let’s dig in.

The ESDK is a client-side encryption library designed to make it easy for you to implement client-side encryption in your application using industry standards and best practices. Since the security of your encryption is only as strong as the security of your key management, the ESDK integrates with the AWS Key Management Service (AWS KMS), though the ESDK doesn’t require you to use any particular source of keys. When using AWS KMS, the ESDK wraps data keys to one or more customer master keys (CMKs) stored in AWS KMS on encrypt, and calls AWS KMS again on decrypt to unwrap the keys.

It’s important to use only CMKs you trust. If you encrypt to an untrusted CMK, someone with access to the message and that CMK could decrypt your message. It’s equally important to only use trusted CMKs on decrypt! Decrypting with an untrusted CMK could expose you to ciphertext substitution, where you could decrypt a message that was valid, but written by an untrusted actor. There are several controls you can use to prevent this. I recommend a belt-and-suspenders approach. (Technically, this post’s approach is more like a belt, suspenders, and an extra pair of pants.)

The first two controls aren’t new, but they’re important to consider. First, you should configure your application with an AWS Identity and Access Management (IAM) policy that only allows it to use specific CMKs. An IAM policy allowing Decrypt on “Resource”:”*” might be appropriate for a development or testing account, but production accounts should list out CMKs explicitly. Take a look at our best practices for IAM policies for use with AWS KMS for more detailed guidance. Using IAM policy to control access to specific CMKs is a powerful control, because you can programmatically audit that the policy is being used across all of your accounts. To help with this, AWS Config has added new rules and AWS Security Hub added new controls to detect existing IAM policies that might allow broader use of CMKs than you intended. We recommend that you enable Security Hub’s Foundational Security Best Practices standard in all of your accounts and regions. This standard includes a set of vetted automated security checks that can help you assess your security posture across your AWS environment. To help you when writing new policies, the IAM policy visual editor in the AWS Management Console warns you if you are about to create a new policy that would add the “Resource”:”*” condition in any policy.

The second control to consider is to make sure you’re passing the KeyId parameter to AWS KMS on Decrypt and ReEncrypt requests. KeyId is optional for symmetric CMKs on these requests, since the ciphertext blob that the Encrypt request returns includes the KeyId as metadata embedded in the blob. That’s quite useful—it’s easier to use, and means you can’t (permanently) lose track of the KeyId without also losing the ciphertext. That’s an important concern for data that you need to access over long periods of time. Data stores that would otherwise include the ciphertext and KeyId as separate objects get re-architected over time and the mapping between the two objects might be lost. If you explicitly pass the KeyId in a decrypt operation, AWS KMS will only use that KeyId to decrypt, and you won’t be surprised by using an untrusted CMK. As a best practice, pass KeyId whenever you know it. ESDK messages always include the KeyId; as part of this release, the ESDK will now always pass KeyId when making AWS KMS Decrypt requests.

A third control to protect you from using an unexpected CMK is called local KeyId filtering. If you explicitly pass the KeyId of an untrusted CMK, you would still be open to ciphertext substitution—so you need to be sure you’re only passing KeyIds that you trust. The ESDK will now filter KeyIds locally by using a list of trusted CMKs or AWS account IDs you configure. This enforcement happens client-side, before calling AWS KMS. Let’s walk through a code sample. I’ll use Java here, but this feature is available in all of the supported languages of the ESDK.

Let’s say your app is decrypting ESDK messages read out of an Amazon Simple Queue Service (Amazon SQS) queue. Somewhere you’ll likely have a function like this:

public byte[] decryptMessage(final byte[] messageBytes,
                             final Map<String, String> encryptionContext) {
    // The Amazon Resource Name (ARN) of your CMK.
    final String keyArn = "arn:aws:kms:us-west-2:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab";

    // 1. Instantiate the SDK
    AwsCrypto crypto = AwsCrypto.builder().build();

Now, when you create a KmsMasterKeyProvider, you’ll configure it with one or more KeyIds you expect to use. I’m passing a single element here for simplicity.

	// 2. Instantiate a KMS master key provider in Strict Mode using buildStrict()
    final KmsMasterKeyProvider keyProvider = KmsMasterKeyProvider.builder().buildStrict(keyArn);

Decrypt the message as normal. The ESDK will check each encrypted data key against the list of KeyIds configured at creation: in the preceeding example, the single CMK in keyArn. The ESDK will only call AWS KMS for matching encrypted data keys; if none match, it will throw a CannotUnwrapDataKeyException.

	// 3. Decrypt the message.
    final CryptoResult<byte[], KmsMasterKey> decryptResult = crypto.decryptData(keyProvider, messageBytes);

    // 4. Validate the encryption context.
    //

(See our documentation for more information on how encryption context provides additional authentication features!)

	checkEncryptionContext(decryptResult, encryptionContext);

    // 5. Return the decrypted bytes.
    return decryptResult.getResult();
}

We recommend that everyone using the ESDK with AWS KMS adopt local KeyId filtering. How you do this varies by language—the ESDK Developer Guide provides detailed instructions and example code.

I’m especially excited to announce the second new feature of the ESDK, key commitment, which addresses a non-obvious property of modern symmetric ciphers used in the industry (including the Advanced Encryption Standard (AES)). These ciphers have the property that decrypting a single ciphertext with two different keys could give different plaintexts! Picking a pair of keys that decrypt to two specific messages involves trying random keys until you get the message you want, making it too expensive for most messages. However, if you’re encrypting messages of a few bytes, it might be feasible. Most authenticated encryption schemes, such as AES-GCM, don’t solve for this issue. Instead, they prevent someone who doesn’t control the keys from tampering with the ciphertext. But someone who controls both keys can craft a ciphertext that will properly authenticate under each key by using AES-GCM.

All of this means that if a sender can get two parties to use different keys, those two parties could decrypt the exact same ciphertext and get different results. That could be problematic if the message reads, for example, as “sell 1000 shares” to one party, and “buy 1000 shares” to another.

The ESDK solves this problem for you with key commitment. Key commitment means that only a single data key can decrypt a given message, and that trying to use any other data key will result in a failed authentication check and a failure to decrypt. This property allows for senders and recipients of encrypted messages to know that everyone will see the same plaintext message after decryption.

Key commitment is on by default in version 2.0 of the ESDK. This is a breaking change from earlier versions. Existing customers should follow the ESDK migration guide for their language to upgrade from 1.x versions of the ESDK currently in their environment. I recommend a thoughtful and careful migration.

AWS is always looking for feedback on ways to improve our services and tools. Security-related concerns can be reported to AWS Security at [email protected]. We’re deeply grateful for security research, and we’d like to thank Thai Duong from Google’s security team for reaching out to us. I’d also like to thank my colleagues on the AWS Crypto Tools team for their collaboration, dedication, and commitment (pun intended) to continuously improving our libraries.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS Crypto Tools forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.