Tag Archives: Expert (400)

Automatically update security groups for Amazon CloudFront IP ranges using AWS Lambda

Post Syndicated from Yeshwanth Kottu original https://aws.amazon.com/blogs/security/automatically-update-security-groups-for-amazon-cloudfront-ip-ranges-using-aws-lambda/

Amazon CloudFront is a content delivery network that can help you increase the performance of your web applications and significantly lower the latency of delivering content to your customers. For CloudFront to access an origin (the source of the content behind CloudFront), the origin has to be publicly available and reachable. Anyone with the origin domain name or IP address could request content directly and bypass CloudFront. In this blog post, I describe an automated solution that uses security groups to permit only CloudFront to access the origin.

Amazon Simple Storage Service (Amazon S3) origins provide a feature called Origin Access Identity, which blocks public access to selected buckets, making them accessible only through CloudFront. When you use CloudFront to secure your web applications, it’s important to ensure that only CloudFront can access your origin (such as Amazon Elastic Cloud Compute (Amazon EC2) or Application Load Balancer (ALB)) and any direct access to origin is restricted. This blog post shows you how to create an AWS Lambda function to automatically update Amazon Virtual Private Cloud (Amazon VPC) security groups with CloudFront service IP ranges to permit only CloudFront to access the origin.

AWS publishes the IP ranges in JSON format for CloudFront and other AWS services. If your origin is an Elastic Load Balancer or an Amazon EC2 instance, you can use VPC security groups to allow only CloudFront IP ranges to access your applications. The IP ranges in the list are separated by service and Region, and you must specify only the IP ranges that correspond to CloudFront.

The IP ranges that AWS publishes change frequently and without an automated solution, you would need to retrieve this document frequently to understand the current IP ranges for CloudFront. Frequent polling is inefficient because there is no notice of when the IP ranges change, and if these IP ranges aren’t modified immediately, your client might see 504 errors when they access CloudFront. Additionally, there are numerous IP ranges for each service, performing the change manually isn’t an efficient way of updating these ranges. This means you need infrastructure to support the task. However, in that case you end up with another host to manage—complete with the typical patching, deployment, and monitoring. As you can see, a small task could quickly become more complicated than the problem you intended to solve.

An Amazon Simple Notification Service (Amazon SNS) message is sent to a topic whenever the AWS IP ranges change. Enabling you to build an event-driven, serverless solution that updates the IP ranges for your security groups, as needed by using a Lambda function that is triggered in response to the SNS notification.

Here are the steps we are going to take to implement the solution:

  1. Create your resources
    1. Create an IAM policy and execution role for the Lambda function
    2. Create your Lambda function
  2. Test your Lambda function
  3. Configure your Lambda function’s trigger

Create your resources

The first thing you need to do is create a Lambda function execution role and policy. Lambda function uses execution role to access or create AWS resources. This Lambda function is triggered by an SNS notification whenever there’s a change in the IP ranges document. Based on the number of IP ranges present for CloudFront and also the number of ports (for example, 80,443) that you want to whitelist on the origin, this Lambda function creates the required security groups. These security groups will allow only traffic from CloudFront to your ELB load balancers or EC2 instances.

Create an IAM policy and execution role for the Lambda function

When you create a Lambda function, it’s important to understand and properly define the security context for the Lambda function. Using AWS Identity and Access Management (IAM), you can create the Lambda execution role that determines the AWS service calls that the function is authorized to complete. (Learn more about the Lambda permissions model.)

To create the IAM policy for your role

  1. Log in to the IAM console with the user account that you will use to manage the Lambda function. This account must have administrator permissions.
  2. In the navigation pane, choose Policies.
  3. In the content pane, choose Create policy.
  4. Choose the JSON tab and copy the text from the following JSON policy document. Paste this text into the JSON text box.
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "CloudWatchPermissions",
          "Effect": "Allow",
          "Action": [
            "logs:CreateLogGroup",
            "logs:CreateLogStream",
            "logs:PutLogEvents"
          ],
          "Resource": "arn:aws:logs:*:*:*"
        },
        {
          "Sid": "EC2Permissions",
          "Effect": "Allow",
          "Action": [
            "ec2:DescribeSecurityGroups",
            "ec2:AuthorizeSecurityGroupIngress",
            "ec2:RevokeSecurityGroupIngress",
            "ec2:CreateSecurityGroup",
            "ec2:DescribeVpcs",
    		"ec2:CreateTags",
            "ec2:ModifyNetworkInterfaceAttribute",
            "ec2:DescribeNetworkInterfaces"
            
          ],
          "Resource": "*"
        }
      ]
    }
    

  5. When you’re finished, choose Review policy.
  6. On the Review page, enter a name for the policy name (e.g. LambdaExecRolePolicy-UpdateSecurityGroupsForCloudFront). Review the policy Summary to see the permissions granted by your policy, and then choose Create policy to save your work.

To understand what this policy allows, let’s look closely at both statements in the policy. The first statement allows the Lambda function to create and write to CloudWatch Logs, which is vital for debugging and monitoring our function. The second statement allows the function to get information about existing security groups, get existing VPC information, create security groups, and authorize and revoke ingress permissions. It’s an important best practice that your IAM policies be as granular as possible, to support the principal of least privilege.

Now that you’ve created your policy, you can create the Lambda execution role that will use the policy.

To create the Lambda execution role

  1. In the navigation pane of the IAM console, choose Roles, and then choose Create role.
  2. For Select type of trusted entity, choose AWS service.
  3. Choose the service that you want to allow to assume this role. In this case, choose Lambda.
  4. Choose Next: Permissions.
  5. Search for the policy name that you created earlier and select the check box next to the policy.
  6. Choose Next: Tags.
  7. (Optional) Add metadata to the role by attaching tags as key-value pairs. For more information about using tags in IAM, see Tagging IAM Users and Roles.
  8. Choose Next: Review.
  9. For Role name (e.g. LambdaExecRole-UpdateSecurityGroupsForCloudFront), enter a name for your role.
  10. (Optional) For Role description, enter a description for the new role.
  11. Review the role, and then choose Create role.

Create your Lambda function

Now, create your Lambda function and configure the role that you created earlier as the execution role for this function.

To create the Lambda function

  1. Go to the Lambda console in N. Virginia region and choose Create function. On the next page, choose Author from scratch. (I’ll be providing the code for your Lambda function, but for other functions, the Use a blueprint option can be a great way to get started.)
  2. Give your Lambda function a name (e.g UpdateSecurityGroupsForCloudFront) and description, and select Python 3.8 from the Runtime menu.
  3. Choose or create an execution role: Select the execution role you created earlier by selecting the option Use an Existing Role.
  4. After confirming that your settings are correct, choose Create function.
  5. Paste the Lambda function code from here.
  6. Select Save.

Additionally, in the Basic Settings of the Lambda function, increase the timeout to 10 seconds.

To set the timeout value in the Lambda console

  1. In the Lambda console, choose the function you just created.
  2. Under Basic settings, choose Edit.
  3. For Timeout, select 10s.
  4. Choose Save.

By default, the Lambda function has these settings:

  • The Lambda function is configured to create security groups in the default VPC.
  • CloudFront IP ranges are updated as inbound rules on port 80.
  • The created security groups are tagged with the name prefix AUTOUPDATE.
  • Debug logging is turned off.
  • The service for which IP ranges are extracted is set to CloudFront.
  • The SDK client in the Lambda function set to us-east-1(N. Virginia).

If you want to customize these settings, set the following environment variables for the Lambda function. For more details, see Using AWS Lambda environment variables.

ActionKey-value data
To create security groups in a specific VPCKey: VPC_ID
Value: vpc-id
To create security groups rules for a different port or multiple ports
 
The solution in this example supports a total of two ports. One can be used for HTTP and another for HTTPS.

Key: PORTS
Value: portnumber
or
Key: PORTS
Value: portnumber,portnumber
To customize the prefix name tag of your security groupsKey: PREFIX_NAME
Value: custom-name
To enable debug logging to CloudWatchKey: DEBUG
Value: true
To extract IP ranges for a different service other than CloudFrontKey: SERVICE
Value: servicename
To configure the Region for the SDK client used in the Lambda function
 
If the CloudFront origin is present in a different Region than N. Virginia, the security groups must be created in that region.
Key: REGION
Value: regionname

To set environment variables in the Lambda console

  1. In the Lambda console, choose the function you created.
  2. Under Environment variables, choose Edit.
  3. Choose Add environment variable.
  4. Enter a key and value.
  5. Choose Save.

Test your Lambda function

Now that you’ve created your function, it’s time to test it and initialize your security group.

To create your test event for the Lambda function

  1. In the Lambda console, on the Functions page, choose your function. In the drop-down menu next to Actions, choose Configure test events.
  2. Enter an Event Name (e.g. TriggerSNS)
  3. Replace the following as your sample event, which will represent an SNS notification and then select Create.
    {
        "Records": [
            {
                "EventVersion": "1.0",
                "EventSubscriptionArn": "arn:aws:sns:EXAMPLE",
                "EventSource": "aws:sns",
                "Sns": {
                    "SignatureVersion": "1",
                    "Timestamp": "1970-01-01T00:00:00.000Z",
                    "Signature": "EXAMPLE",
                    "SigningCertUrl": "EXAMPLE",
                    "MessageId": "95df01b4-ee98-5cb9-9903-4c221d41eb5e",
                    "Message": "{\"create-time\": \"yyyy-mm-ddThh:mm:ss+00:00\", \"synctoken\": \"0123456789\", \"md5\": \"7fd59f5c7f5cf643036cbd4443ad3e4b\", \"url\": \"https://ip-ranges.amazonaws.com/ip-ranges.json\"}",
                    "Type": "Notification",
                    "UnsubscribeUrl": "EXAMPLE",
                    "TopicArn": "arn:aws:sns:EXAMPLE",
                    "Subject": "TestInvoke"
                }
      		}
        ]
    }
    

  4. After you’ve added the test event, select Save and then select Test. Your Lambda function is then invoked, and you should see log output at the bottom of the console in Execution Result section, similar to the following.
    Updating from https://ip-ranges.amazonaws.com/ip-ranges.json
    MD5 Mismatch: got 2e967e943cf98ae998efeec05d4f351c expected 7fd59f5c7f5cf643036cbd4443ad3e4b: Exception
    Traceback (most recent call last):
      File "/var/task/lambda_function.py", line 29, in lambda_handler
        ip_ranges = json.loads(get_ip_groups_json(message['url'], message['md5']))
      File "/var/task/lambda_function.py", line 50, in get_ip_groups_json
        raise Exception('MD5 Missmatch: got ' + hash + ' expected ' + expected_hash)
    Exception: MD5 Mismatch: got 2e967e943cf98ae998efeec05d4f351c expected 7fd59f5c7f5cf643036cbd4443ad3e4b
    

  5. Edit the sample event again, and this time change the md5 value in the sample event to be the first MD5 hash provided in the log output. In this example, you would update the md5 value in the sample event configured earlier with the hash value seen in the error ‘2e967e943cf98ae998efeec05d4f351c’. Lambda code successfully executes only when the original hash of the IP ranges document and the hash received from the event trigger match. After you modify the hash value from the error message received earlier, the test event matches the hash of the IP ranges document.
  6. Select Save and test. This invokes your Lambda function.

After the function is invoked the second time with updated md5 has Lambda function should execute without any errors. You should be able to see the new security groups created and the IP ranges of CloudFront updated in the rules in the EC2 console, as shown in Figure 1.
 

Figure 1: EC2 console showing the security groups created

Figure 1: EC2 console showing the security groups created

In the initial successful run of this function, it created the total number of security groups required to update all the IP ranges of CloudFront for the ports mentioned. The function creates security groups based on the maximum number of rules that can be added to individual security groups. The new security groups can be identified from the EC2 console by the name AUTOUPDATE_random if you used the default configuration, or a custom name if you provided a PREFIX_NAME.

You can now attach these security groups to your Elastic LoadBalancer or EC2 instances. If your log output is different from what is described here, the output should help you identify the issue.

Configure your Lambda function’s trigger

After you’ve validated that your function is executing properly, it’s time to connect it to the SNS topic for IP changes. To do this, use the AWS Command Line Interface (CLI). Enter the following command, making sure to replace Lambda ARN with the Amazon Resource Name (ARN) of your Lambda function. You can find this ARN at the top right when viewing the configuration of your Lambda function.

aws sns subscribe - -topic-arn "arn:aws:sns:us-east-1:806199016981:AmazonIpSpaceChanged" - -region us-east-1 - -protocol lambda - -notification-endpoint "Lambda ARN"

You should receive the ARN of your Lambda function’s SNS subscription.

Now add a permission that allows the Lambda function to be invoked by the SNS topic. The following command also adds the Lambda trigger.

aws lambda add-permission - -function-name "Lambda ARN" - -statement-id lambda-sns-trigger - -region us-east-1 - -action lambda:InvokeFunction - -principal sns.amazonaws.com - -source-arn "arn:aws:sns:us-east-1:806199016981:AmazonIpSpaceChanged"

When AWS changes any of the IP ranges in the document, an SNS notification is sent and your Lambda function will be triggered. This Lambda function verifies the modified ranges in the document and efficiently updates the IP ranges on the existing security groups. Additionally, the function dynamically scales and creates additional security groups if the number of IP ranges for CloudFront is increased in future. Any newly created security groups are automatically attached to the network interface where the previous security groups are attached in order to avoid service interruption.

Summary

As you followed this blog post, you created a Lambda function to create a security groups and update the security group’s rules dynamically whenever AWS publishes new internal service IP ranges. This solution has several advantages:

  • The solution isn’t designed as a periodic poll, so it only runs when it needs to.
  • It’s automatic, so you don’t need to update security groups manually which lowers the operational cost.
  • It’s simple, because you have no extra infrastructure to maintain as the solution is completely serverless.
  • It’s cost effective, because the Lambda function runs only when triggered by the AmazonIpSpaceChanged SNS topic and only runs for a few seconds, this solution costs only pennies to operate.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the Amazon CloudFront forum. If you have any other use cases for using Lambda functions to dynamically update security groups, or even other networking configurations such as VPC route tables or ACLs, we’d love to hear about them!

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Yeshwanth Kottu

Yeshwanth is a Systems Development Engineer at AWS in Cupertino, CA. With a focus on CloudFront and [email protected], he enjoys helping customers tackle challenges through cloud-scale architectures. Yeshwanth has an MS in Computer Systems Networking and Telecommunications from Northeastern University. Outside of work, he enjoys travelling, visiting national parks, and playing cricket.

Building, bundling, and deploying applications with the AWS CDK

Post Syndicated from Cory Hall original https://aws.amazon.com/blogs/devops/building-apps-with-aws-cdk/

The AWS Cloud Development Kit (AWS CDK) is an open-source software development framework to model and provision your cloud application resources using familiar programming languages.

The post CDK Pipelines: Continuous delivery for AWS CDK applications showed how you can use CDK Pipelines to deploy a TypeScript-based AWS Lambda function. In that post, you learned how to add additional build commands to the pipeline to compile the TypeScript code to JavaScript, which is needed to create the Lambda deployment package.

In this post, we dive deeper into how you can perform these build commands as part of your AWS CDK build process by using the native AWS CDK bundling functionality.

If you’re working with Python, TypeScript, or JavaScript-based Lambda functions, you may already be familiar with the PythonFunction and NodejsFunction constructs, which use the bundling functionality. This post describes how to write your own bundling logic for instances where a higher-level construct either doesn’t already exist or doesn’t meet your needs. To illustrate this, I walk through two different examples: a Lambda function written in Golang and a static site created with Nuxt.js.

Concepts

A typical CI/CD pipeline contains steps to build and compile your source code, bundle it into a deployable artifact, push it to artifact stores, and deploy to an environment. In this post, we focus on the building, compiling, and bundling stages of the pipeline.

The AWS CDK has the concept of bundling source code into a deployable artifact. As of this writing, this works for two main types of assets: Docker images published to Amazon Elastic Container Registry (Amazon ECR) and files published to Amazon Simple Storage Service (Amazon S3). For files published to Amazon S3, this can be as simple as pointing to a local file or directory, which the AWS CDK uploads to Amazon S3 for you.

When you build an AWS CDK application (by running cdk synth), a cloud assembly is produced. The cloud assembly consists of a set of files and directories that define your deployable AWS CDK application. In the context of the AWS CDK, it might include the following:

  • AWS CloudFormation templates and instructions on where to deploy them
  • Dockerfiles, corresponding application source code, and information about where to build and push the images to
  • File assets and information about which S3 buckets to upload the files to

Use case

For this use case, our application consists of front-end and backend components. The example code is available in the GitHub repo. In the repository, I have split the example into two separate AWS CDK applications. The repo also contains the Golang Lambda example app and the Nuxt.js static site.

Golang Lambda function

To create a Golang-based Lambda function, you must first create a Lambda function deployment package. For Go, this consists of a .zip file containing a Go executable. Because we don’t commit the Go executable to our source repository, our CI/CD pipeline must perform the necessary steps to create it.

In the context of the AWS CDK, when we create a Lambda function, we have to tell the AWS CDK where to find the deployment package. See the following code:

new lambda.Function(this, 'MyGoFunction', {
  runtime: lambda.Runtime.GO_1_X,
  handler: 'main',
  code: lambda.Code.fromAsset(path.join(__dirname, 'folder-containing-go-executable')),
});

In the preceding code, the lambda.Code.fromAsset() method tells the AWS CDK where to find the Golang executable. When we run cdk synth, it stages this Go executable in the cloud assembly, which it zips and publishes to Amazon S3 as part of the PublishAssets stage.

If we’re running the AWS CDK as part of a CI/CD pipeline, this executable doesn’t exist yet, so how do we create it? One method is CDK bundling. The lambda.Code.fromAsset() method takes a second optional argument, AssetOptions, which contains the bundling parameter. With this bundling parameter, we can tell the AWS CDK to perform steps prior to staging the files in the cloud assembly.

Breaking down the BundlingOptions parameter further, we can perform the build inside a Docker container or locally.

Building inside a Docker container

For this to work, we need to make sure that we have Docker running on our build machine. In AWS CodeBuild, this means setting privileged: true. See the following code:

new lambda.Function(this, 'MyGoFunction', {
  code: lambda.Code.fromAsset(path.join(__dirname, 'folder-containing-source-code'), {
    bundling: {
      image: lambda.Runtime.GO_1_X.bundlingDockerImage,
      command: [
        'bash', '-c', [
          'go test -v',
          'GOOS=linux go build -o /asset-output/main',
      ].join(' && '),
    },
  })
  ...
});

We specify two parameters:

  • image (required) – The Docker image to perform the build commands in
  • command (optional) – The command to run within the container

The AWS CDK mounts the folder specified as the first argument to fromAsset at /asset-input inside the container, and mounts the asset output directory (where the cloud assembly is staged) at /asset-output inside the container.

After we perform the build commands, we need to make sure we copy the Golang executable to the /asset-output location (or specify it as the build output location like in the preceding example).

This is the equivalent of running something like the following code:

docker run \
  --rm \
  -v folder-containing-source-code:/asset-input \
  -v cdk.out/asset.1234a4b5/:/asset-output \
  lambci/lambda:build-go1.x \
  bash -c 'GOOS=linux go build -o /asset-output/main'

Building locally

To build locally (not in a Docker container), we have to provide the local parameter. See the following code:

new lambda.Function(this, 'MyGoFunction', {
  code: lambda.Code.fromAsset(path.join(__dirname, 'folder-containing-source-code'), {
    bundling: {
      image: lambda.Runtime.GO_1_X.bundlingDockerImage,
      command: [],
      local: {
        tryBundle(outputDir: string) {
          try {
            spawnSync('go version')
          } catch {
            return false
          }

          spawnSync(`GOOS=linux go build -o ${path.join(outputDir, 'main')}`);
          return true
        },
      },
    },
  })
  ...
});

The local parameter must implement the ILocalBundling interface. The tryBundle method is passed the asset output directory, and expects you to return a boolean (true or false). If you return true, the AWS CDK doesn’t try to perform Docker bundling. If you return false, it falls back to Docker bundling. Just like with Docker bundling, you must make sure that you place the Go executable in the outputDir.

Typically, you should perform some validation steps to ensure that you have the required dependencies installed locally to perform the build. This could be checking to see if you have go installed, or checking a specific version of go. This can be useful if you don’t have control over what type of build environment this might run in (for example, if you’re building a construct to be consumed by others).

If we run cdk synth on this, we see a new message telling us that the AWS CDK is bundling the asset. If we include additional commands like go test, we also see the output of those commands. This is especially useful if you wanted to fail a build if tests failed. See the following code:

$ cdk synth
Bundling asset GolangLambdaStack/MyGoFunction/Code/Stage...
✓  . (9ms)
✓  clients (5ms)

DONE 8 tests in 11.476s
✓  clients (5ms) (coverage: 84.6% of statements)
✓  . (6ms) (coverage: 78.4% of statements)

DONE 8 tests in 2.464s

Cloud Assembly

If we look at the cloud assembly that was generated (located at cdk.out), we see something like the following code:

$ cdk synth
Bundling asset GolangLambdaStack/MyGoFunction/Code/Stage...
✓  . (9ms)
✓  clients (5ms)

DONE 8 tests in 11.476s
✓  clients (5ms) (coverage: 84.6% of statements)
✓  . (6ms) (coverage: 78.4% of statements)

DONE 8 tests in 2.464s

It contains our GolangLambdaStack CloudFormation template that defines our Lambda function, as well as our Golang executable, bundled at asset.01cf34ff646d380829dc4f2f6fc93995b13277bde7db81c24ac8500a83a06952/main.

Let’s look at how the AWS CDK uses this information. The GolangLambdaStack.assets.json file contains all the information necessary for the AWS CDK to know where and how to publish our assets (in this use case, our Golang Lambda executable). See the following code:

{
  "version": "5.0.0",
  "files": {
    "01cf34ff646d380829dc4f2f6fc93995b13277bde7db81c24ac8500a83a06952": {
      "source": {
        "path": "asset.01cf34ff646d380829dc4f2f6fc93995b13277bde7db81c24ac8500a83a06952",
        "packaging": "zip"
      },
      "destinations": {
        "current_account-current_region": {
          "bucketName": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}",
          "objectKey": "01cf34ff646d380829dc4f2f6fc93995b13277bde7db81c24ac8500a83a06952.zip",
          "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-file-publishing-role-${AWS::AccountId}-${AWS::Region}"
        }
      }
    }
  }
}

The file contains information about where to find the source files (source.path) and what type of packaging (source.packaging). It also tells the AWS CDK where to publish this .zip file (bucketName and objectKey) and what AWS Identity and Access Management (IAM) role to use (assumeRoleArn). In this use case, we only deploy to a single account and Region, but if you have multiple accounts or Regions, you see multiple destinations in this file.

The GolangLambdaStack.template.json file that defines our Lambda resource looks something like the following code:

{
  "Resources": {
    "MyGoFunction0AB33E85": {
      "Type": "AWS::Lambda::Function",
      "Properties": {
        "Code": {
          "S3Bucket": {
            "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}"
          },
          "S3Key": "01cf34ff646d380829dc4f2f6fc93995b13277bde7db81c24ac8500a83a06952.zip"
        },
        "Handler": "main",
        ...
      }
    },
    ...
  }
}

The S3Bucket and S3Key match the bucketName and objectKey from the assets.json file. By default, the S3Key is generated by calculating a hash of the folder location that you pass to lambda.Code.fromAsset(), (for this post, folder-containing-source-code). This means that any time we update our source code, this calculated hash changes and a new Lambda function deployment is triggered.

Nuxt.js static site

In this section, I walk through building a static site using the Nuxt.js framework. You can apply the same logic to any static site framework that requires you to run a build step prior to deploying.

To deploy this static site, we use the BucketDeployment construct. This is a construct that allows you to populate an S3 bucket with the contents of .zip files from other S3 buckets or from a local disk.

Typically, we simply tell the BucketDeployment construct where to find the files that it needs to deploy to the S3 bucket. See the following code:

new s3_deployment.BucketDeployment(this, 'DeployMySite', {
  sources: [
    s3_deployment.Source.asset(path.join(__dirname, 'path-to-directory')),
  ],
  destinationBucket: myBucket
});

To deploy a static site built with a framework like Nuxt.js, we need to first run a build step to compile the site into something that can be deployed. For Nuxt.js, we run the following two commands:

  • yarn install – Installs all our dependencies
  • yarn generate – Builds the application and generates every route as an HTML file (used for static hosting)

This creates a dist directory, which you can deploy to Amazon S3.

Just like with the Golang Lambda example, we can perform these steps as part of the AWS CDK through either local or Docker bundling.

Building inside a Docker container

To build inside a Docker container, use the following code:

new s3_deployment.BucketDeployment(this, 'DeployMySite', {
  sources: [
    s3_deployment.Source.asset(path.join(__dirname, 'path-to-nuxtjs-project'), {
      bundling: {
        image: cdk.BundlingDockerImage.fromRegistry('node:lts'),
        command: [
          'bash', '-c', [
            'yarn install',
            'yarn generate',
            'cp -r /asset-input/dist/* /asset-output/',
          ].join(' && '),
        ],
      },
    }),
  ],
  ...
});

For this post, we build inside the publicly available node:lts image hosted on DockerHub. Inside the container, we run our build commands yarn install && yarn generate, and copy the generated dist directory to our output directory (the cloud assembly).

The parameters are the same as described in the Golang example we walked through earlier.

Building locally

To build locally, use the following code:

new s3_deployment.BucketDeployment(this, 'DeployMySite', {
  sources: [
    s3_deployment.Source.asset(path.join(__dirname, 'path-to-nuxtjs-project'), {
      bundling: {
        local: {
          tryBundle(outputDir: string) {
            try {
              spawnSync('yarn --version');
            } catch {
              return false
            }

            spawnSync('yarn install && yarn generate');

       fs.copySync(path.join(__dirname, ‘path-to-nuxtjs-project’, ‘dist’), outputDir);
            return true
          },
        },
        image: cdk.BundlingDockerImage.fromRegistry('node:lts'),
        command: [],
      },
    }),
  ],
  ...
});

Building locally works the same as the Golang example we walked through earlier, with one exception. We have one additional command to run that copies the generated dist folder to our output directory (cloud assembly).

Conclusion

This post showed how you can easily compile your backend and front-end applications using the AWS CDK. You can find the example code for this post in this GitHub repo. If you have any questions or comments, please comment on the GitHub repo. If you have any additional examples you want to add, we encourage you to create a Pull Request with your example!

Our code also contains examples of deploying the applications using CDK Pipelines, so if you’re interested in deploying the example yourself, check out the example repo.

 

About the author

Cory Hall

Cory is a Solutions Architect at Amazon Web Services with a passion for DevOps and is based in Charlotte, NC. Cory works with enterprise AWS customers to help them design, deploy, and scale applications to achieve their business goals.

Improving customer experience and reducing cost with CodeGuru Profiler

Post Syndicated from Rajesh original https://aws.amazon.com/blogs/devops/improving-customer-experience-and-reducing-cost-with-codeguru-profiler/

Amazon CodeGuru is a set of developer tools powered by machine learning that provides intelligent recommendations for improving code quality and identifying an application’s most expensive lines of code. Amazon CodeGuru Profiler allows you to profile your applications in a low impact, always on manner. It helps you improve your application’s performance, reduce cost and diagnose application issues through rich data visualization and proactive recommendations. CodeGuru Profiler has been a very successful and widely used service within Amazon, before it was offered as a public service. This post discusses a few ways in which internal Amazon teams have used and benefited from continuous profiling of their production applications. These uses cases can provide you with better insights on how to reap similar benefits for your applications using CodeGuru Profiler.

Inside Amazon, over 100,000 applications currently use CodeGuru Profiler across various environments globally. Over the last few years, CodeGuru Profiler has served as an indispensable tool for resolving issues in the following three categories:

  1. Performance bottlenecks, high latency and CPU utilization
  2. Cost and Infrastructure utilization
  3. Diagnosis of an application impacting event

API latency improvement for CodeGuru Profiler

What could be a better example than CodeGuru Profiler using itself to improve its own performance?
CodeGuru Profiler offers an API called BatchGetFrameMetricData, which allows you to fetch time series data for a set of frames or methods. We noticed that the 99th percentile latency (i.e. the slowest 1 percent of requests over a 5 minute period) metric for this API was approximately 5 seconds, higher than what we wanted for our customers.

Solution

CodeGuru Profiler is built on a micro service architecture, with the BatchGetFrameMetricData API implemented as set of AWS Lambda functions. It also leverages other AWS services such as Amazon DynamoDB to store data and Amazon CloudWatch to record performance metrics.

When investigating the latency issue, the team found that the 5-second latency spikes were happening during certain time intervals rather than continuously, which made it difficult to easily reproduce and determine the root cause of the issue in pre-production environment. The new Lambda profiling feature in CodeGuru came in handy, and so the team decided to enable profiling for all its Lambda functions. The low impact, continuous profiling capability of CodeGuru Profiler allowed the team to capture comprehensive profiles over a period of time, including when the latency spikes occurred, enabling the team to better understand the issue.
After capturing the profiles, the team went through the flame graphs of one of the Lambda functions (TimeSeriesMetricsGeneratorLambda) and learned that all of its CPU time was spent by the thread responsible to publish metrics to CloudWatch. The following screenshot shows a flame graph during one of these spikes.

TimeSeriesMetricsGeneratorLambda taking 100% CPU

As seen, there is a single call stack visible in the above flame graph, indicating all the CPU time was taken by the thread invoking above code. This helped the team immediately understand what was happening. Above code was related to the thread responsible for publishing the CloudWatch metrics. This thread was publishing these metrics in a synchronized block and as this thread took most of the CPU, it caused all other threads to wait and the latency to spike. To fix the issue, the team simply changed the TimeSeriesMetricsGeneratorLambda Lambda code, to publish CloudWatch metrics at the end of the function, which eliminated contention of this thread with all other threads.

Improvement

After the fix was deployed, the 5 second latency spikes were gone, as seen in the following graph.

Latency reduction for BatchGetFrameMetricData API

Cost, infrastructure and other improvements for CAGE

CAGE is an internal Amazon retail service that does royalty aggregation for digital products, such as Kindle eBooks, MP3 songs and albums and more. Like many other Amazon services, CAGE is also customer of CodeGuru Profiler.

CAGE was experiencing latency delays and growing infrastructure cost, and wanted to reduce them. Thanks to CodeGuru Profiler’s always-on profiling capabilities, rich visualization and recommendations, the team was able to successfully diagnose the issues, determine the root cause and fix them.

Solution

With the help of CodeGuru Profiler, the CAGE team identified several reasons for their degraded service performance and increased hardware utilization:

  • Excessive garbage collection activity – The team reviewed the service flame graphs (see the following screenshot) and identified that a lot of CPU time was spent getting garbage collection activities, 65.07% of the total service CPU.

Excessive garbage collection activities for CAGE

  • Metadata overhead – The team followed CodeGuru Profiler recommendation to identify that the service’s DynamoDB responses were consuming higher CPU, 2.86% of total CPU time. This was due to the response metadata caching in the AWS SDK v1.x HTTP client that was turned on by default. This was causing higher CPU overhead for high throughput applications such as CAGE. The following screenshot shows the relevant recommendation.

Response metadata recommendation for CAGE

  • Excessive logging – The team also identified excessive logging of its internal Amazon ION structures. The team initially added this logging for debugging purposes, but was unaware of its impact on the CPU cost, taking 2.28% of the overall service CPU. The following screenshot is part of the flame graph that helped identify the logging impact.

Excessive logging in CAGE service

The team used these flame graphs and CodeGuru Profiler provided recommendations to determine the root cause of the issues and systematically resolve them by doing the following:

  • Switching to a more efficient garbage collector
  • Removing excessive logging
  • Disabling metadata caching for Dynamo DB response

Improvements

After making these changes, the team was able to reduce their infrastructure cost by 25%, saving close to $2600 per month. Service latency also improved, with a reduction in service’s 99th percentile latency from approximately 2,500 milliseconds to 250 milliseconds in their North America (NA) region as shown below.

CAGE Latency Reduction

The team also realized a side benefit of having reduced log verbosity and saw a reduction in log size by 55%.

Event Analysis of increased checkout latency for Amazon.com

During one of the high traffic times, Amazon retail customers experienced higher than normal latency on their checkout page. The issue was due to one of the downstream service’s API experiencing high latency and CPU utilization. While the team quickly mitigated the issue by increasing the service’s servers, the always-on CodeGuru Profiler came to the rescue to help diagnose and fix the issue permanently.

Solution

The team analyzed the flame graphs from CodeGuru Profiler at the time of the event and noticed excessive CPU consumption (69.47%) when logging exceptions using Log4j2. See the following screenshot taken from an earlier version of CodeGuru Profiler user interface.

Excessive CPU consumption when logging exceptions using Log4j2

With CodeGuru Profiler flame graph and other metrics, the team quickly confirmed that the issue was due to excessive exception logging using Log4j2. This downstream service had recently upgraded to Log4j2 version 2.8, in which exception logging could be expensive, due to the way Log4j2 handles class-loading of certain stack frames. Log4j 2.x versions enabled class loading by default, which was disabled in 1.x versions, causing the increased latency and CPU utilization. The team was not able to detect this issue in pre-production environment, as the impact was observable only in high traffic situations.

Improvement

After they understood the issue, the team successfully rolled out the fix, removing the unnecessary exception trace logging to fix the issue. Such performance issues and many others are proactively offered as CodeGuru Profiler recommendations, to ensure you can proactively learn about such issues with your applications and quickly resolve them.

Conclusion

I hope this post provided a glimpse into various ways CodeGuru Profiler can benefit your business and applications. To get started using CodeGuru Profiler, see Setting up CodeGuru Profiler.
For more information about CodeGuru Profiler, see the following:

Investigating performance issues with Amazon CodeGuru Profiler

Optimizing application performance with Amazon CodeGuru Profiler

Find Your Application’s Most Expensive Lines of Code and Improve Code Quality with Amazon CodeGuru

 

How to automate incident response in the AWS Cloud for EC2 instances

Post Syndicated from Ben Eichorst original https://aws.amazon.com/blogs/security/how-to-automate-incident-response-in-aws-cloud-for-ec2-instances/

One of the security epics core to the AWS Cloud Adoption Framework (AWS CAF) is a focus on incident response and preparedness to address unauthorized activity. Multiple methods exist in Amazon Web Services (AWS) for automating classic incident response techniques, and the AWS Security Incident Response Guide outlines many of these methods. This post demonstrates one specific method for instantaneous response and acquisition of infrastructure data from Amazon Elastic Compute Cloud (Amazon EC2) instances.

Incident response starts with detection, progresses to investigation, and then follows with remediation. This process is no different in AWS. AWS services such as Amazon GuardDuty, Amazon Macie, and Amazon Inspector provide detection capabilities. Amazon Detective assists with investigation, including tracking and gathering information. Then, after your security organization decides to take action, pre-planned and pre-provisioned runbooks enable faster action towards a resolution. One principle outlined in the incident response whitepaper and the AWS Well-Architected Framework is the notion of pre-provisioning systems and policies to allow you to react quickly to an incident response event. The solution I present here provides a pre-provisioned architecture for an incident response system that you can use to respond to a suspect EC2 instance.

Infrastructure overview

The architecture that I outline in this blog post automates these standard actions on a suspect compute instance:

  1. Capture all the persistent disks.
  2. Capture the instance state at the time the incident response mechanism is started.
  3. Isolate the instance and protect against accidental instance termination.
  4. Perform operating system–level information gathering, such as memory captures and other parameters.
  5. Notify the administrator of these actions.

The solution in this blog post accomplishes these tasks through the following logical flow of AWS services, illustrated in Figure 1.
 

Figure 1: Infrastructure deployed by the accompanying AWS CloudFormation template and associated task flow when invoking the main API

Figure 1: Infrastructure deployed by the accompanying AWS CloudFormation template and associated task flow when invoking the main API

  1. A user or application calls an API with an EC2 instance ID to start data collection.
  2. Amazon API Gateway initiates the core logic of the process by instantiating an AWS Lambda function.
  3. The Lambda function performs the following data gathering steps before making any changes to the infrastructure:
    1. Save instance metadata to the SecResponse Amazon Simple Storage Service (Amazon S3) bucket.
    2. Save a snapshot of the instance console to the SecResponse S3 bucket.
    3. Initiate an Amazon Elastic Block Store (Amazon EBS) snapshot of all persistent block storage volumes.
  4. The Lambda function then modifies the infrastructure to continue gathering information, by doing the following steps:
    1. Set the Amazon EC2 termination protection flag on the instance.
    2. Remove any existing EC2 instance profile from the instance.
    3. If the instance is managed by AWS Systems Manager:
      1. Attach an EC2 instance profile with minimal privileges for operating system–level information gathering.
      2. Perform operating system–level information gathering actions through Systems Manager on the EC2 instance.
      3. Remove the instance profile after Systems Manager has completed its actions.
    4. Create a quarantine security group that lacks both ingress and egress rules.
    5. Move the instance into the created quarantine security group for isolation.
  5. Send an administrative notification through the configured Amazon Simple Notification Service (Amazon SNS) topic.

Solution features

By using the mechanisms outlined in this post to codify your incident response runbooks, you can see the following benefits to your incident response plan.

Preparation for incident response before an incident occurs

Both the AWS CAF and Well-Architected Framework recommend that customers formulate known procedures for incident response, and test those runbooks before an incident. Testing these processes before an event occurs decreases the time it takes you to respond in a production environment. The sample infrastructure shown in this post demonstrates how you can standardize those procedures.

Consistent incident response artifact gathering

Codifying your processes into set code and infrastructure prepares you for the need to collect data, but also standardizes the collection process into a repeatable and auditable sequence of What information was collected when and how. This reduces the likelihood of missing data for future investigations.

Walkthrough: Deploying infrastructure and starting the process

To implement the solution outlined in this post, you first need to deploy the infrastructure, and then start the data collection process by issuing an API call.

The code example in this blog post requires that you provision an AWS CloudFormation stack, which creates an S3 bucket for storing your event artifacts and a serverless API that uses API Gateway and Lambda. You then execute a query against this API to take action on a target EC2 instance.

The infrastructure deployed by the AWS CloudFormation stack is a set of AWS components as depicted previously in Figure 1. The stack includes all the services and configurations to deploy the demo. It doesn’t include a target EC2 instance that you can use to test the mechanism used in this post.

Cost

The cost for this demo is minimal because the base infrastructure is completely serverless. With AWS, you only pay for the infrastructure that you use, so the single API call issued in this demo costs fractions of a cent. Artifact storage costs will incur S3 storage prices, and Amazon EC2 snapshots will be stored at their respective prices.

Deploy the AWS CloudFormation stack

In future posts and updates, we will show how to set up this security response mechanism inside a separate account designated for security, but for the purposes of this post, your demo stack must reside in the same AWS account as the target instance that you set up in the next section.

First, start by deploying the AWS CloudFormation template to provision the infrastructure.

To deploy this template in the us-east-1 region

  1. Choose the Launch Stack button to open the AWS CloudFormation console pre-loaded with the template:
     
    Select the Launch Stack button to launch the template
  2. (Optional) In the AWS CloudFormation console, on the Specify Details page, customize the stack name.
  3. For the LambdaS3BucketLocation and LambdaZipFileName fields, leave the default values for the purposes of this blog. Customizing this field allows you to customize this code example for your own purposes and store it in an S3 bucket of your choosing.
  4. Customize the S3BucketName field. This needs to be a globally unique S3 bucket name. This bucket is where gathered artifacts are stored for the demo in this blog. You must customize it beyond the default value for the template to instantiate properly.
  5. (Optional) Customize the SNSTopicName field. This name provides a meaningful label for the SNS topic that notifies the administrator of the actions that were performed.
  6. Choose Next to configure the stack options and leave all default settings in place.
  7. Choose Next to review and scroll to the bottom of the page. Select all three check boxes under the Capabilities and Transforms section, next to each of the three acknowledgements:
    • I acknowledge that AWS CloudFormation might create IAM resources.
    • I acknowledge that AWS CloudFormation might create IAM resources with custom names.
    • I acknowledge that AWS CloudFormation might require the following capability: CAPABILITY_AUTO_EXPAND.
  8. Choose Create Stack.

Set up a target EC2 instance

In order to demonstrate the functionality of this mechanism, you need a target host. Provision any EC2 instance in your account to act as a target for the security response mechanism to act upon for information collection and quarantine. To optimize affordability and demonstrate full functionality, I recommend choosing a small instance size (for example, t2.nano) and optionally joining the instance into Systems Manager for the ability to later execute Run Command API queries. For more details on configuring Systems Manager, refer to the AWS Systems Manager User Guide.

Retrieve required information for system initiation

The entire security response mechanism triggers through an API call. To successfully initiate this call, you first need to gather the API URI and key information.

To find the API URI and key information

  1. Navigate to the AWS CloudFormation console and choose the stack that you’ve instantiated.
  2. Choose the Outputs tab and save the value for the key APIBaseURI. This is the base URI for the API Gateway. It will resemble https://abcdefgh12.execute-api.us-east-1.amazonaws.com.
  3. Next, navigate to the API Gateway console and choose the API with the name SecurityResponse.
  4. Choose API Keys, and then choose the only key present.
  5. Next to the API key field, choose Show to reveal the key, and then save this value to a notepad for later use.

(Optional) Configure administrative notification through the created SNS topic

One aspect of this mechanism is that it sends notifications through SNS topics. You can optionally subscribe your email or another notification pipeline mechanism to the created SNS topic in order to receive notifications on actions taken by the system.

Initiate the security response mechanism

Note that, in this demo code, you’re using a simple API key for limiting access to API Gateway. In production applications, you would use an authentication mechanism such as Amazon Cognito to control access to your API.

To kick off the security response mechanism, initiate a REST API query against the API that was created in the AWS CloudFormation template. You first create this API call by using a curl command to be run from a Linux system.

To create the API initiation curl command

  1. Copy the following example curl command.
    curl -v -X POST -i -H "x-api-key: 012345ABCDefGHIjkLMS20tGRJ7othuyag" https://abcdefghi.execute-api.us-east-1.amazonaws.com/DEMO/secresponse -d '{
      "instance_id":"i-123457890"
    }'
    

  2. Replace the placeholder API key specified in the x-api-key HTTP header with your API key.
  3. Replace the example URI path with your API’s specific URI. To create the full URI, concatenate the base URI listed in the AWS CloudFormation output you gathered previously with the API call path, which is /DEMO/secresponse. This full URI for your specific API call should closely resemble this sample URI path: https://abcdefghi.execute-api.us-east-1.amazonaws.com/DEMO/secresponse
  4. Replace the value associated with the key instance_id with the instance ID of the target EC2 instance you created.

Because this mechanism initiates through a simple API call, you can easily integrate it with existing workflow management systems. This allows for complex data collection and forensic procedures to be integrated with existing incident response workflows.

Review the gathered data

Note that the following items were uploaded as objects in the security response S3 bucket:

  1. A console screenshot, as shown in Figure 2.
  2. (If Systems Manager is configured) stdout information from the commands that were run on the host operating system.
  3. Instance metadata in JSON form.

 

Figure 2: Example outputs from a successful completion of this blog post's mechanism

Figure 2: Example outputs from a successful completion of this blog post’s mechanism

Additionally, if you load the Amazon EC2 console and scroll down to Elastic Block Store, you can see that EBS snapshots are present for all persistent disks as shown in Figure 3.
 

Figure 3: Evidence of an EBS snapshot from a successful run

Figure 3: Evidence of an EBS snapshot from a successful run

You can also verify that the previously outlined security controls are in place by viewing the instance in the Amazon EC2 console. You should see the removal of AWS Identity and Access Management (IAM) roles from the target EC2 instances and that the instance has been placed into network isolation through a newly created quarantine security group.

Note that for the purposes of this demo, all information that you gathered is stored in the same AWS account as the workload. As a best practice, many AWS customers choose instead to store this information in an AWS account that’s specifically designated for incident response and analysis. A dedicated account provides clear isolation of function and restriction of access. Using AWS Organizations service control policies (SCPs) and IAM permissions, your security team can limit access to adhere to security policy, legal guidance, and compliance regulations.

Clean up and delete artifacts

To clean up the artifacts from the solution in this post, first delete all information in your security response S3 bucket. Then delete the CloudFormation stack that was provisioned at the start of this process in order to clean up all remaining infrastructure.

Conclusion

Placing workloads in the AWS Cloud allows for pre-provisioned and explicitly defined incident response runbooks to be codified and quickly executed on suspect EC2 instances. This enables you to gather data in minutes that previously took hours or even days using manual processes.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the Amazon EC2 forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Ben Eichorst

Ben is a Senior Solutions Architect, Security, Cryptography, and Identity Specialist for AWS. He works with AWS customers to efficiently implement globally scalable security programs while empowering development teams and reducing risk. He holds a BA from Northwestern University and an MBA from University of Colorado.

New IAMCTL tool compares multiple IAM roles and policies

Post Syndicated from Sudhir Reddy Maddulapally original https://aws.amazon.com/blogs/security/new-iamctl-tool-compares-multiple-iam-roles-and-policies/

If you have multiple Amazon Web Services (AWS) accounts, and you have AWS Identity and Access Management (IAM) roles among those multiple accounts that are supposed to be similar, those roles can deviate over time from your intended baseline due to manual actions performed directly out-of-band called drift. As part of regular compliance checks, you should confirm that these roles have no deviations. In this post, we present a tool called IAMCTL that you can use to extract the IAM roles and policies from two accounts, compare them, and report out the differences and statistics. We will explain how to use the tool, and will describe the key concepts.

Prerequisites

Before you install IAMCTL and start using it, here are a few prerequisites that need to be in place on the computer where you will run it:

To follow along in your environment, clone the files from the GitHub repository, and run the steps in order. You won’t incur any charges to run this tool.

Install IAMCTL

This section describes how to install and run the IAMCTL tool.

To install and run IAMCTL

  1. At the command line, enter the following command:
    pip3 install git+ssh://[email protected]/aws-samples/[email protected]
    

    You will see output similar to the following.

    Figure 1: IAMCTL tool installation output

    Figure 1: IAMCTL tool installation output

  2. To confirm that your installation was successful, enter the following command.
    iamctl –h
    

    You will see results similar to those in figure 2.

    Figure 2: IAMCTL help message

    Figure 2: IAMCTL help message

Now that you’ve successfully installed the IAMCTL tool, the next section will show you how to use the IAMCTL commands.

Example use scenario

Here is an example of how IAMCTL can be used to find differences in IAM roles between two AWS accounts.

A system administrator for a product team is trying to accelerate a product launch in the middle of testing cycles. Developers have found that the same version of their application behaves differently in the development environment as compared to the QA environment, and they suspect this behavior is due to differences in IAM roles and policies.

The application called “app1” primarily reads from an Amazon Simple Storage Service (Amazon S3) bucket, and runs on an Amazon Elastic Compute Cloud (Amazon EC2) instance. In the development (DEV) account, the application uses an IAM role called “app1_dev” to access the S3 bucket “app1-dev”. In the QA account, the application uses an IAM role called “app1_qa” to access the S3 bucket “app1-qa”. This is depicted in figure 3.

Figure 3: Showing the “app1” application in the development and QA accounts

Figure 3: Showing the “app1” application in the development and QA accounts

Setting up the scenario

To simulate this setup for the purpose of this walkthrough, you don’t have to create the EC2 instance or the S3 bucket, but just focus on the IAM role, inline policy, and trust policy.

As noted in the prerequisites, you will switch between the two AWS accounts by using the AWS CLI named profiles “dev-profile” and “qa-profile”, which are configured to point to the DEV and QA accounts respectively.

Start by using this command:

mkdir -p iamctl_test iamctl_test/dev iamctl_test/qa

The command creates a directory structure that looks like this:
Iamctl_test
|– qa
|– dev

Now, switch to the dev folder to run all the following example commands against the DEV account, by using this command:

cd iamctl_test/dev

To create the required policies, first create a file named “app1_s3_access_policy.json” and add the following policy to it. You will use this file’s content as your role’s inline policy.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:Get*",
                "s3:List*"
            ],
            "Resource": [
         "arn:aws:s3:::app1-dev/shared/*"
            ]
        }
    ]
}

Second, create a file called “app1_trust_policy.json” and add the following policy to it. You will use this file’s content as your role’s trust policy.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Now use the two files to create an IAM role with the name “app1_dev” in the account by using these command(s), run in the same order as listed here:

#create role with trust policy

aws --profile dev-profile iam create-role --role-name app1_dev --assume-role-policy-document file://app1_trust_policy.json

#put inline policy to the role created above
 
aws --profile dev-profile iam put-role-policy --role-name app1_dev --policy-name s3_inline_policy --policy-document file://app1_s3_access_policy.json

In the QA account, the IAM role is named “app1_qa” and the S3 bucket is named “app1-qa”.

Repeat the steps from the prior example against the QA account by changing dev to qa where shown in bold in the following code samples. Change the directory to qa by using this command:

cd ../qa

To create the required policies, first create a file called “app1_s3_access_policy.json” and add the following policy to it.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:Get*",
                "s3:List*"
            ],
            "Resource": [
         "arn:aws:s3:::app1-qa/shared/*"
            ]
        }
    ]
}

Next, create a file, called “app1_trust_policy.json” and add the following policy.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Now, use the two files created so far to create an IAM role with the name “app1_qa” in your QA account by using these command(s), run in the same order as listed here:

#create role with trust policy
aws --profile qa-profile iam create-role --role-name app1_qa --assume-role-policy-document file://app1_trust_policy.json

#put inline policy to the role create above
aws --profile qa-profile iam put-role-policy --role-name app1_qa --policy-name s3_inline_policy --policy-document file://app1_s3_access_policy.json

So far, you have two accounts with an IAM role created in each of them for your application. In terms of permissions, there are no differences other than the name of the S3 bucket resource the permission is granted against.

You can expect IAMCTL to generate a minimal set of differences between the DEV and QA accounts, assuming all other IAM roles and policies are the same, but to be sure about the current state of both accounts, in IAMCTL you can run a process called baselining.

Through the process of baselining, you will generate an equivalency dictionary that represents all the known string patterns that reduce the noise in the generated deviations list, and then you will introduce a change into one of the IAM roles in your QA account, followed by a final IAMCTL diff to show the deviations.

Baselining

Baselining is the process of bringing two accounts to an “equivalence” state for IAM roles and policies by establishing a baseline, which future diff operations can leverage. The process is as simple as:

  1. Run the iamctl diff command.
  2. Capture all string substitutions into an equivalence dictionary to remove or reduce noise.
  3. Save the generated detailed files as a snapshot.

Now you can go through these steps for your baseline.

Go ahead and run the iamctl diff command against these two accounts by using the following commands.

#change directory from qa to iamctl-test
cd ..

#run iamctl init
iamctl init

The results of running the init command are shown in figure 4.

Figure 4: Output of the iamctl init command

Figure 4: Output of the iamctl init command

If you look at the iamctl_test directory now, shown in figure 5, you can see that the init command created two files in the iamctl_test directory.

Figure 5: The directory structure after running the init command

Figure 5: The directory structure after running the init command

These two files are as follows:

  1. iam.jsonA reference file that has all AWS services and actions listed, among other things. IAMCTL uses this to map the resource listed in an IAM policy to its corresponding AWS resource, based on Amazon Resource Name (ARN) regular expression.
  2. equivalency_list.jsonThe default sample dictionary that IAMCTL uses to suppress false alarms when it compares two accounts. This is where the known string patterns that need to be substituted are added.

Note: A best practice is to make the directory where you store the equivalency dictionary and from which you run IAMCTL to be a Git repository. Doing this will let you capture any additions or modifications for the equivalency dictionary by using Git commits. This will not only give you an audit trail of your historical baselines but also gives context to any additions or modifications to the equivalency dictionary. However, doing this is not necessary for the regular functioning of IAMCTL.

Next, run the iamctl diff command:

#run iamctl diff
iamctl diff dev-profile dev qa-profile qa
Figure 6: Result of diff command

Figure 6: Result of diff command

Figure 6 shows the results of running the diff command. You can see that IAMCTL considers the app1_qa and app1_dev roles as unique to the DEV and QA accounts, respectively. This is because IAMCTL uses role names to decide whether to compare the role or tag the role as unique.

You will add the strings “dev” and “qa” to the equivalency dictionary to instruct IAMCTL to substitute occurrences of these two strings with “accountname” by adding the follow JSON to the equivalency_list.json file. You will also clean up some defaults already present in there.

echo “{“accountname”:[“dev”,”qa”]}” > equivalency_list.json

Figure 7 shows the equivalency dictionary before you take these actions, and figure 8 shows the dictionary after these actions.

Figure 7: Equivalency dictionary before

Figure 7: Equivalency dictionary before

Figure 8: Equivalency dictionary after

Figure 8: Equivalency dictionary after

There’s another thing to notice here. In this example, one common role was flagged as having a difference. To know which role this is and what the difference is, go to the detail reports folder listed at the bottom of the summary report. The directory structure of this folder is shown in figure 9.

Notice that the reports are created under your home directory with a folder structure that mimics the time stamp down to the second. IAMCTL does this to maintain uniqueness for each run.

tree /Users/<username>/aws-idt/output/2020/08/24/08/38/49/
Figure 9: Files written to the output reports directory

Figure 9: Files written to the output reports directory

You can see there is a file called common_roles_in_dev_with_differences.csv, and it lists a role called “AwsSecurity***Audit”.

You can see there is another file called dev_to_qa_common_role_difference_items.csv, and it lists the granular IAM items from the DEV account that belong to the “AwsSecurity***Audit” role as compared to QA, but which have differences. You can see that all entries in the file have the DEV account number in the resource ARN, whereas in the qa_to_dev_common_role_difference_items.csv file, all entries have the QA account number for the same role “AwsSecurity***Audit”.

Add both of the account numbers to the equivalency dictionary to substitute them with a placeholder number, because you don’t want this role to get flagged as having differences.

echo “{“accountname”:[“dev”,”qa”],”000000000000”:[“123456789012”,”987654321098”]}” > equivalency_list.json

Now, re-run the diff command.

#run iamctl diff
iamctl diff dev-profile dev qa-profile qa

As you can see in figure 10, you get back the result of the diff command that shows that the DEV account doesn’t have any differences in IAM roles as compared to the QA account.

Figure 10: Output showing no differences after completion of baselining

Figure 10: Output showing no differences after completion of baselining

This concludes the baselining for your DEV and QA accounts. Now you will introduce a change.

Introducing drift

Drift occurs when there is a difference in actual vs expected values in the definition or configuration of a resource. There are several reasons why drift occurs, but for this scenario you will use “intentional need to respond to a time-sensitive operational event” as a reason to mimic and introduce drift into what you have built so far.

To simulate this change, add “s3:PutObject” to the qa app1_s3_access_policy.json file as shown in the following example.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:Get*",
                "s3:List*",
                "s3:PutObject"
            ],
            "Resource": [
         "arn:aws:s3:::app1-qa/shared/*"
            ]
        }
    ]
}

Put this new inline policy on your existing role “app1_qa” by using this command:

aws --profile qa-profile iam put-role-policy --role-name app1_qa --policy-name s3_inline_policy --policy-document file://app1_s3_access_policy.json

The following table represents the new drift in the accounts.

ActionAccount-DEV
Role name: app1_dev
Account-QA
Role name: app1_qa
s3:Get*YesYes
s3:List*YesYes
s3: PutObjectNoYes

Next, run the iamctl diff command to see how it picks up the drift from your previously baselined accounts.

#change directory from qa to iamctL-test
cd ..
iamctl diff dev-profile dev qa-profile qa
Figure 11: Output showing the one deviation that was introduced

Figure 11: Output showing the one deviation that was introduced

You can see that IAMCTL now shows that the QA account has one difference as compared to DEV, which is what we expect based on the deviation you’ve introduced.

Open up the file qa_to_dev_common_role_difference_items.csv to look at the one difference. Again, adjust the following path example with the output from the iamctl diff command at the bottom of the summary report in Figure 11.

cat /Users/<username>/aws-idt/output/2020/09/18/07/38/15/qa_to_dev_common_role_difference_items.csv

As shown in figure 12, you can see that the file lists the specific S3 action “PutObject” with the role name and other relevant details.

Figure 12: Content of file qa_to_dev_common_role_difference_items.csv showing the one deviation that was introduced

Figure 12: Content of file qa_to_dev_common_role_difference_items.csv showing the one deviation that was introduced

You can use this information to remediate the deviation by performing corrective actions in either your DEV account or QA account. You can confirm the effectiveness of the corrective action by re-baselining to make sure that zero deviations appear.

Conclusion

In this post, you learned how to use the IAMCTL tool to compare IAM roles between two accounts, to arrive at a granular list of meaningful differences that can be used for compliance audits or for further remediation actions. If you’ve created your IAM roles by using an AWS CloudFormation stack, you can turn on drift detection and easily capture the drift because of changes done outside of AWS CloudFormation to those IAM resources. For more information about drift detection, see Detecting unmanaged configuration changes to stacks and resources. Lastly, see the GitHub repository where the tool is maintained with documentation describing each of the subcommand concepts. We welcome any pull requests for issues and enhancements.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS IAM forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Sudhir Reddy Maddulapally

Sudhir is a Senior Partner Solution Architect who builds SaaS solutions for partners by day and is a tech tinkerer by night. He enjoys trips to state and national parks, and Yosemite is his favorite thus far!

Author

Soumya Vanga

Soumya is a Cloud Application Architect with AWS Professional Services in New York, NY, helping customers design solutions and workloads and to adopt Cloud Native services.

Improved client-side encryption: Explicit KeyIds and key commitment

Post Syndicated from Alex Tribble original https://aws.amazon.com/blogs/security/improved-client-side-encryption-explicit-keyids-and-key-commitment/

I’m excited to announce the launch of two new features in the AWS Encryption SDK (ESDK): local KeyId filtering and key commitment. These features each enhance security for our customers, acting as additional layers of protection for your most critical data. In this post I’ll tell you how they work. Let’s dig in.

The ESDK is a client-side encryption library designed to make it easy for you to implement client-side encryption in your application using industry standards and best practices. Since the security of your encryption is only as strong as the security of your key management, the ESDK integrates with the AWS Key Management Service (AWS KMS), though the ESDK doesn’t require you to use any particular source of keys. When using AWS KMS, the ESDK wraps data keys to one or more customer master keys (CMKs) stored in AWS KMS on encrypt, and calls AWS KMS again on decrypt to unwrap the keys.

It’s important to use only CMKs you trust. If you encrypt to an untrusted CMK, someone with access to the message and that CMK could decrypt your message. It’s equally important to only use trusted CMKs on decrypt! Decrypting with an untrusted CMK could expose you to ciphertext substitution, where you could decrypt a message that was valid, but written by an untrusted actor. There are several controls you can use to prevent this. I recommend a belt-and-suspenders approach. (Technically, this post’s approach is more like a belt, suspenders, and an extra pair of pants.)

The first two controls aren’t new, but they’re important to consider. First, you should configure your application with an AWS Identity and Access Management (IAM) policy that only allows it to use specific CMKs. An IAM policy allowing Decrypt on “Resource”:”*” might be appropriate for a development or testing account, but production accounts should list out CMKs explicitly. Take a look at our best practices for IAM policies for use with AWS KMS for more detailed guidance. Using IAM policy to control access to specific CMKs is a powerful control, because you can programmatically audit that the policy is being used across all of your accounts. To help with this, AWS Config has added new rules and AWS Security Hub added new controls to detect existing IAM policies that might allow broader use of CMKs than you intended. We recommend that you enable Security Hub’s Foundational Security Best Practices standard in all of your accounts and regions. This standard includes a set of vetted automated security checks that can help you assess your security posture across your AWS environment. To help you when writing new policies, the IAM policy visual editor in the AWS Management Console warns you if you are about to create a new policy that would add the “Resource”:”*” condition in any policy.

The second control to consider is to make sure you’re passing the KeyId parameter to AWS KMS on Decrypt and ReEncrypt requests. KeyId is optional for symmetric CMKs on these requests, since the ciphertext blob that the Encrypt request returns includes the KeyId as metadata embedded in the blob. That’s quite useful—it’s easier to use, and means you can’t (permanently) lose track of the KeyId without also losing the ciphertext. That’s an important concern for data that you need to access over long periods of time. Data stores that would otherwise include the ciphertext and KeyId as separate objects get re-architected over time and the mapping between the two objects might be lost. If you explicitly pass the KeyId in a decrypt operation, AWS KMS will only use that KeyId to decrypt, and you won’t be surprised by using an untrusted CMK. As a best practice, pass KeyId whenever you know it. ESDK messages always include the KeyId; as part of this release, the ESDK will now always pass KeyId when making AWS KMS Decrypt requests.

A third control to protect you from using an unexpected CMK is called local KeyId filtering. If you explicitly pass the KeyId of an untrusted CMK, you would still be open to ciphertext substitution—so you need to be sure you’re only passing KeyIds that you trust. The ESDK will now filter KeyIds locally by using a list of trusted CMKs or AWS account IDs you configure. This enforcement happens client-side, before calling AWS KMS. Let’s walk through a code sample. I’ll use Java here, but this feature is available in all of the supported languages of the ESDK.

Let’s say your app is decrypting ESDK messages read out of an Amazon Simple Queue Service (Amazon SQS) queue. Somewhere you’ll likely have a function like this:

public byte[] decryptMessage(final byte[] messageBytes,
                             final Map<String, String> encryptionContext) {
    // The Amazon Resource Name (ARN) of your CMK.
    final String keyArn = "arn:aws:kms:us-west-2:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab";

    // 1. Instantiate the SDK
    AwsCrypto crypto = AwsCrypto.builder().build();

Now, when you create a KmsMasterKeyProvider, you’ll configure it with one or more KeyIds you expect to use. I’m passing a single element here for simplicity.

	// 2. Instantiate a KMS master key provider in Strict Mode using buildStrict()
    final KmsMasterKeyProvider keyProvider = KmsMasterKeyProvider.builder().buildStrict(keyArn); 

Decrypt the message as normal. The ESDK will check each encrypted data key against the list of KeyIds configured at creation: in the preceeding example, the single CMK in keyArn. The ESDK will only call AWS KMS for matching encrypted data keys; if none match, it will throw a CannotUnwrapDataKeyException.

	// 3. Decrypt the message.
    final CryptoResult<byte[], KmsMasterKey> decryptResult = crypto.decryptData(keyProvider, messageBytes);

    // 4. Validate the encryption context.
    //

(See our documentation for more information on how encryption context provides additional authentication features!)

	checkEncryptionContext(decryptResult, encryptionContext);

    // 5. Return the decrypted bytes.
    return decryptResult.getResult();
}

We recommend that everyone using the ESDK with AWS KMS adopt local KeyId filtering. How you do this varies by language—the ESDK Developer Guide provides detailed instructions and example code.

I’m especially excited to announce the second new feature of the ESDK, key commitment, which addresses a non-obvious property of modern symmetric ciphers used in the industry (including the Advanced Encryption Standard (AES)). These ciphers have the property that decrypting a single ciphertext with two different keys could give different plaintexts! Picking a pair of keys that decrypt to two specific messages involves trying random keys until you get the message you want, making it too expensive for most messages. However, if you’re encrypting messages of a few bytes, it might be feasible. Most authenticated encryption schemes, such as AES-GCM, don’t solve for this issue. Instead, they prevent someone who doesn’t control the keys from tampering with the ciphertext. But someone who controls both keys can craft a ciphertext that will properly authenticate under each key by using AES-GCM.

All of this means that if a sender can get two parties to use different keys, those two parties could decrypt the exact same ciphertext and get different results. That could be problematic if the message reads, for example, as “sell 1000 shares” to one party, and “buy 1000 shares” to another.

The ESDK solves this problem for you with key commitment. Key commitment means that only a single data key can decrypt a given message, and that trying to use any other data key will result in a failed authentication check and a failure to decrypt. This property allows for senders and recipients of encrypted messages to know that everyone will see the same plaintext message after decryption.

Key commitment is on by default in version 2.0 of the ESDK. This is a breaking change from earlier versions. Existing customers should follow the ESDK migration guide for their language to upgrade from 1.x versions of the ESDK currently in their environment. I recommend a thoughtful and careful migration.

AWS is always looking for feedback on ways to improve our services and tools. Security-related concerns can be reported to AWS Security at [email protected]. We’re deeply grateful for security research, and we’d like to thank Thai Duong from Google’s security team for reaching out to us. I’d also like to thank my colleagues on the AWS Crypto Tools team for their collaboration, dedication, and commitment (pun intended) to continuously improving our libraries.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS Crypto Tools forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Alex Tribble

Alex is a Principal Software Development Engineer in AWS Crypto Tools. She joined Amazon in 2008 and has spent her time building security platforms, protecting availability, and generally making things faster and cheaper. Outside of work, she, her wife, and children love to pack as much stuff into as few bikes as possible.

AWS Online Tech Talks for August 2020

Post Syndicated from Jimmy Cooper original https://aws.amazon.com/blogs/aws/aws-online-tech-talks-for-august-2020/

Join us for live, online presentations led by AWS solutions architects and engineers. AWS Online Tech Talks cover a range of topics and expertise levels, and feature technical deep dives, demonstrations, customer examples, and live Q&A with AWS experts.

Note – All sessions are free and in Pacific Time. Can’t join us live? Access webinar recordings and slides on our On-Demand Portal.

Tech talks this month are:

August 17, 2020 | 9:00 AM – 10:00 AM PT
Unlock the Power of Connected Vehicle Data with the AWS Connected Mobility Solution
Learn how to start building solutions for common connected mobility use cases in minutes.

August 17, 2020 | 11:00 AM – 12:00 PM PT
Build a Blockchain Track-and-Trace Application on AWS
Learn how to set up a blockchain network and build a track-and-trace application using Amazon Managed Blockchain.

August 18, 2020 | 9:00 AM – 10:00 AM PT
Cloud Financial Management: How Small Changes Can Accelerate Value Realization
Join 451 Research and AWS Cloud Financial Management (CFM) experts as they share findings from a recent study of 500 enterprise AWS customers conducted to understand how CFM helps organizations realize value beyond cost savings.

August 18, 2020 | 11:00 AM – 12:00 PM PT
Improve Data Science Team Productivity Using Amazon SageMaker Studio
Join us for a deep dive into Amazon SageMaker Studio, the first IDE for ML.

August 18, 2020 | 1:00 PM – 2:00 PM PT
What’s New With AWS Global Accelerator
Learn how to move user traffic onto the AWS network infrastructure, improving traffic by up to 60% in minutes with AWS Global Accelerator.

August 19, 2020 | 11:00 AM – 12:00 PM PT
AWS Lambda Data Storage: Choosing Between S3, EFS, and Local Storage
Learn when to choose Amazon S3, Amazon EFS, or Lambda Layers for your application with AWS Lambda Data Storage.

August 19, 2020 | 1:00 PM – 2:00 PM PT
Optimizing Cluster Utilization with EKS and Fargate
Learn how Amazon and EKS can work together and why you should consider deploying your Kubernetes applications on Fargate instead of traditional Kubernetes worker nodes.

August 20, 2020 | 9:00 AM – 10:00 AM PT
Streaming Data Pipelines for Real-Time Analytics – Are You Ready?
Learn best practices for building streaming data pipelines so you can focus on your data instead of managing infrastructure.

August 20, 2020 | 11:00 AM – 12:00 PM PT
Build Resilient, Easy to Manage Continuous Integration Workflows with AWS CodeBuild and AWS Step Functions
Learn how to use AWS CodeBuild and AWS Step Functions to easily set up branching workflows with manual approval steps for your continuous integration processes or data-processing applications.

August 20, 2020 | 1:00 PM – 2:00 PM PT
Improving Business Continuity with Amazon Aurora Global Database
Learn how to place your database where your customers are for low-latency reads and global disaster recovery.

August 21, 2020 | 9:00 AM – 10:00 AM PT
Intro to AWS Service Management Connector for ServiceNow
Learn how to simplify cloud resource management with the AWS Service Catalog Connector for ServiceNow.

August 21, 2020 | 11:00 AM – 12:00 PM PT
How to Use AWS Wavelength to Deliver Applications that Require Ultra-Low Latency to 5G Mobile Users
Learn how to use AWS Wavelength to develop and deploy applications that need low-latency access to end-user mobile devices.

August 21, 2020 | 1:00 PM – 2:00 PM PT
Deep Dive on Migrating and Modernizing Middleware Applications
Learn migration and modernization strategy, and patterns for middleware applications.

August 24, 2020 | 9:00 AM – 10:00 AM PT
AWS IoT and Amazon Kinesis Video Streams for Connected Home Applications
Learn how you can simplify the development and management of connected cameras and video streaming solutions for smart home use cases.

August 24, 2020 | 11:00 AM – 12:00 PM PT
Introduction to Quantum Computing on AWS
Learn the best practices to get started with quantum computing with Amazon Braket.

August 24, 2020 | 1:00 PM – 2:00 PM PT
Embedding Analytics in Your Applications
Learn about new QuickSight capabilities for embedding analytics and how customers are leveraging QuickSight’s serverless architecture to easily build and scale their embedded analytics applications.

August 25, 2020 | 9:00 AM – 10:00 AM PT
DNS Design Using Amazon Route 53
Explore DNS design, the capabilities of Amazon Route 53 DNS, and architectural considerations for hybrid networks.

August 25, 2020 | 11:00 AM – 12:00 PM PT
Amazon EMR Deep Dive and Best Practices
Learn design patterns and architectural best practices to get the most from your big data with Amazon EMR.

August 25, 2020 | 1:00 PM – 2:00 PM PT
Simply and Seamlessly Set Up Secure File Transfers to Amazon S3 Over SFTP and Other Protocols
Learn how to simply and seamlessly setup secure file transfers to Amazon S3 using AWS Transfer Family.

August 26, 2020 | 9:00 AM – 10:00 AM PT
Enhanced CI/CD with AWS CDK
Learn how to use the AWS Cloud Development Kit and AWS CodePipeline to create a construct library that makes it even easier to set up continuous delivery pipelines for your CDK applications.

August 26, 2020 | 11:00 AM – 12:00 PM PT
Forrester Research Analyzes the Total Economic Impact™ of Amazon Connect
Learn about the Forrester framework and to evaluate the financial impact of working with Amazon Connect, covering ROI benefits, costs, risk, and flexibility.

August 26, 2020 | 1:00 PM – 2:00 PM PT
Best Practices for Data Protection on Amazon S3
Learn how to protect your data by leveraging S3 features to ensure your data meets compliance requirements.

August 27, 2020 | 9:00 AM – 10:00 AM PT
How to Optimize for Cost When Using Amazon DocumentDB (with MongoDB Compatibility)
Learn how to optimize for cost when using Amazon DocumentDB (with MongoDB compatibility).

August 27, 2020 | 11:00 AM – 12:00 PM PT
Accelerating Embedded Linux Solutions for the AWS Cloud
Learn how to accelerate Embedded Linux solutions for the AWS Cloud.

August 27, 2020 | 1:00 PM – 2:00 PM PT
SQL Server Cost Optimization on AWS
Learn how to optimize the cost of SQL Server workloads on AWS across EC2 and RDS SQL Server to diversify and optimize your current licensing investments, think strategically about licensing in the cloud, and bring your own licenses to AWS.

August 28, 2020 | 9:00 AM – 10:00 AM PT
How to Centrally Audit and Remediate VPC Security Groups Using AWS Firewall Manager
Learn how to use AWS Firewall Manager to centrally audit VPC security groups across your accounts and resources using different pre-configured and custom audit polices.

August 28, 2020 | 11:00 AM – 12:00 PM PT
Increase Employee Productivity and Delight Customers with Cognitive Search Using Amazon Kendra
Discover how Amazon Kendra provides highly accurate and easy to use enterprise search powered by machine learning.

August 28, 2020 | 1:00 PM – 2:00 PM PT
How to Maximize the Value of Your Analytics Stack with an Amazon Redshift Data Warehouse
Learn how to maximize the value of your analytics stack with an Amazon Redshift data warehouse.

Serverless Architecture for a Web Scraping Solution

Post Syndicated from Dzidas Martinaitis original https://aws.amazon.com/blogs/architecture/serverless-architecture-for-a-web-scraping-solution/

If you are interested in serverless architecture, you may have read many contradictory articles and wonder if serverless architectures are cost effective or expensive. I would like to clear the air around the issue of effectiveness through an analysis of a web scraping solution. The use case is fairly simple: at certain times during the day, I want to run a Python script and scrape a website. The execution of the script takes less than 15 minutes. This is an important consideration, which we will come back to later. The project can be considered as a standard extract, transform, load process without a user interface and can be packed into a self-containing function or a library.

Subsequently, we need an environment to execute the script. We have at least two options to consider: on-premises (such as on your local machine, a Raspberry Pi server at home, a virtual machine in a data center, and so on) or you can deploy it to the cloud. At first glance, the former option may feel more appealing — you have the infrastructure available free of charge, why not to use it? The main concern of an on-premises hosted solution is the reliability — can you assure its availability in case of a power outage or a hardware or network failure? Additionally, does your local infrastructure support continuous integration and continuous deployment (CI/CD) tools to eliminate any manual intervention? With these two constraints in mind, I will continue the analysis of the solutions in the cloud rather than on-premises.

Let’s start with the pricing of three cloud-based scenarios and go into details below.

Pricing table of three cloud-based scenarios

*The AWS Lambda free usage tier includes 1M free requests per month and 400,000 GB-seconds of compute time per month. Review AWS Lambda pricing.

Option #1

The first option, an instance of a virtual machine in AWS (called Amazon Elastic Cloud Compute or EC2), is the most primitive one. However, it definitely does not resemble any serverless architecture, so let’s consider it as a reference point or a baseline. This option is similar to an on-premises solution giving you full control of the instance, but you would need to manually spin an instance, install your environment, set up a scheduler to execute your script at a specific time, and keep it on for 24×7. And don’t forget the security (setting up a VPC, route tables, etc.). Additionally, you will need to monitor the health of the instance and maybe run manual updates.

Option #2

The second option is to containerize the solution and deploy it on Amazon Elastic Container Service (ECS). The biggest advantage to this is platform independence. Having a Docker file (a text document that contains all the commands you could call on the command line to assemble an image) with a copy of your environment and the script enables you to reuse the solution locally—on the AWS platform, or somewhere else. A huge advantage to running it on AWS is that you can integrate with other services, such as AWS CodeCommit, AWS CodeBuild, AWS Batch, etc. You can also benefit from discounted compute resources such as Amazon EC2 Spot instances.

Architecture of CloudWatch, Batch, ECR

The architecture, seen in the diagram above, consists of Amazon CloudWatch, AWS Batch, and Amazon Elastic Container Registry (ECR). CloudWatch allows you to create a trigger (such as starting a job when a code update is committed to a code repository) or a scheduled event (such as executing a script every hour). We want the latter: executing a job based on a schedule. When triggered, AWS Batch will fetch a pre-built Docker image from Amazon ECR and execute it in a predefined environment. AWS Batch is a free-of-charge service and allows you to configure the environment and resources needed for a task execution. It relies on ECS, which manages resources at the execution time. You pay only for the compute resources consumed during the execution of a task.

You may wonder where the pre-built Docker image came from. It was pulled from Amazon ECR, and now you have two options to store your Docker image there:

  • You can build a Docker image locally and upload it to Amazon ECR.
  • You just commit few configuration files (such as Dockerfile, buildspec.yml, etc.) to AWS CodeCommit (a code repository) and build the Docker image on the AWS platform.This option, shown in the image below, allows you to build a full CI/CD pipeline. After updating a script file locally and committing the changes to a code repository on AWS CodeCommit, a CloudWatch event is triggered and AWS CodeBuild builds a new Docker image and commits it to Amazon ECR. When a scheduler starts a new task, it fetches the new image with your updated script file. If you feel like exploring further or you want actually implement this approach please take a look at the example of the project on GitHub.

CodeCommit. CodeBuild, ECR

Option #3

The third option is based on AWS Lambda, which allows you to build a very lean infrastructure on demand, scales continuously, and has generous monthly free tier. The major constraint of Lambda is that the execution time is capped at 15 minutes. If you have a task running longer than 15 minutes, you need to split it into subtasks and run them in parallel, or you can fall back to Option #2.

By default, Lambda gives you access to standard libraries (such as the Python Standard Library). In addition, you can build your own package to support the execution of your function or use Lambda Layers to gain access to external libraries or even external Linux based programs.

Lambda Layer

You can access AWS Lambda via the web console to create a new function, update your Lambda code, or execute it. However, if you go beyond the “Hello World” functionality, you may realize that online development is not sustainable. For example, if you want to access external libraries from your function, you need to archive them locally, upload to Amazon Simple Storage Service (Amazon S3), and link it to your Lambda function.

One way to automate Lambda function development is to use AWS Cloud Development Kit (AWS CDK), which is an open source software development framework to model and provision your cloud application resources using familiar programming languages. Initially, the setup and learning might feel strenuous; however the benefits are worth of it. To give you an example, please take a look at this Python class on GitHub, which creates a Lambda function, a CloudWatch event, IAM policies, and Lambda layers.

In a summary, the AWS CDK allows you to have infrastructure as code, and all changes will be stored in a code repository. For a deployment, AWS CDK builds an AWS CloudFormation template, which is a standard way to model infrastructure on AWS. Additionally, AWS Serverless Application Model (SAM) allows you to test and debug your serverless code locally, meaning that you can indeed create a continuous integration.

See an example of a Lambda-based web scraper on GitHub.

Conclusion

In this blog post, we reviewed two serverless architectures for a web scraper on AWS cloud. Additionally, we have explored the ways to implement a CI/CD pipeline in order to avoid any future manual interventions.

AWS Online Tech Talks for April 2020

Post Syndicated from Jimmy Cooper original https://aws.amazon.com/blogs/aws/aws-online-tech-talks-for-april-2020/

Join us for live, online presentations led by AWS solutions architects and engineers. AWS Online Tech Talks cover a range of topics and expertise levels, and feature technical deep dives, demonstrations, customer examples, and live Q&A with AWS experts.

Note – All sessions are free and in Pacific Time. Can’t join us live? Access webinar recordings and slides on our On-Demand Portal.

Tech talks this month are:

April 20, 2020 | 9:00 AM – 10:00 AM PT – Save Costs Running Kubernetes Clusters with EC2 Spot Instances – ​Learn how you can lower costs and improve application resiliency by running Kubernetes workloads on Amazon EKS with Spot Instances.​

April 20, 2020 | 11:00 AM – 12:00 PM PT – Hadoop 3.0 and Docker on Amazon EMR 6.0 – A deep dive into what’s new in EMR 6.0 including Apache Hadoop 3.0, Docker containers & Apache Hive performance improvements​.

​April 20, 2020 | 1:00 PM – 2:00 PM PT – Infrastructure as Code on AWS – ​Join this tech talk to learn how to use AWS CloudFormation and AWS CDK to provision and manage infrastructure, deploy code, and automate your software-release processes.

April 21, 2020 | 9:00 AM – 10:00 AM PT – How to Maximize Results with a Cloud Contact Center, Featuring Aberdeen Research – ​Learn how to maximize results with a cloud contact center, featuring Aberdeen Research and Amazon Connect​.

April 21, 2020 | 11:00 AM – 12:00 PM PT – Connecting Microcontrollers to the Cloud for IoT Applications – ​Learn how you can connect microcontrollers to the cloud for IoT applications​.

April 21, 2020 | 1:00 PM – 2:00 PM PT – Reducing Machine Learning Inference Cost for PyTorch Models – ​Join us for a tech talk to learn about deploying your PyTorch models for low latency at low cost.​

April 22, 2020 | 11:00 AM – 12:00 PM PT – Top 10 Security Items to Improve in Your AWS Account – Learn about the top 10 security items to improve in your AWS environment and how you can automate them.​

April 22, 2020 | 1:00 PM – 2:00 PM PT – Building Your First Application with AWS Lambda – ​Learn how to build your first serverless application with AWS Lambda, including basic design patterns and best practices.​

April 23, 2020 | 9:00 AM – 10:00 AM PT – Persistent Storage for Containers with Amazon EFS – ​Learn how to securely store your containers in the cloud with Amazon EFS​.

April 23, 2020 | 11:00 AM – 12:00 PM PT – Build Event Driven Graph Applications with AWS Purpose-Built Databases – ​Learn how to build event driven graph applications using AWS purpose-built database services including Amazon Neptune, Amazon DynamoDB, and Amazon ElastiCache.​

April 23, 2020 | 1:00 PM – 2:00 PM PT – Migrate with AWS – ​Introduction to best practice driven process for migrations to AWS, developed by the experience in helping thousands of enterprises migrate.

April 27, 2020 | 9:00 AM – 10:00 AM PT – Best Practices for Modernizing On-Premise Big Data Workloads Using Amazon EMR – ​Learn about best practices to migrate from on-premises big data (Apache Spark and Hadoop) to Amazon EMR.​

April 27, 2020 | 11:00 AM – 12:00 PM PT – Understanding Game Changes and Player Behavior with Graph Databases – ​Learn how to solve problems with highly connected data in game datasets with Amazon Neptune.

​​April 27, 2020 | 1:00 PM – 2:00 PM PT – Assess, Migrate, and Modernize from Legacy Databases to AWS: Oracle to Amazon Aurora PostgreSQL Migration – ​Punitive licensing and high cost of on-premises legacy databases could hold you back. Join this tech talk to learn how to assess, migrate, and modernize your Oracle workloads over to Amazon Aurora PostgreSQL, using Amazon Database Migration Service (DMS).​

April 28, 2020 | 9:00 AM – 10:00 AM PT – Implementing SAP in the Cloud with AWS Tools and Services – ​This tech talk will help architects and administrators to understand the automation capabilities available that can assist your SAP migration.​

April 28, 2020 | 11:00 AM – 12:00 PM PT – Choosing Events, Queues, Topics, and Streams in Your Serverless Application – ​Learn how to choose between common Lambda event sources like EventBridge, SNS, SQS, and Kinesis Data Streams.​

April 30, 2020 | 9:00 AM – 10:00 AM PT – Inside Amazon DocumentDB: The Makings of a Managed Non-relational Database – Join Rahul Pathak, GM of Emerging Databases and Blockchain at AWS, to learn about the inner workings of Amazon DocumentDB and how it provides better performance, scalability, and availability while reducing operational overhead for managing your own non-relational databases.

How to verify AWS KMS asymmetric key signatures locally with OpenSSL

Post Syndicated from J.D. Bean original https://aws.amazon.com/blogs/security/how-to-verify-aws-kms-asymmetric-key-signatures-locally-with-openssl/

In this post, I demonstrate a sample workflow for generating a digital signature within AWS Key Management Service (KMS) and then verifying that signature on a client machine using OpenSSL.

The support for asymmetric keys in AWS KMS has exciting use cases. The ability to create, manage, and use public and private key pairs with KMS enables you to perform digital signing operations using RSA and Elliptic Curve (ECC) keys. You can also perform public key encryption or decryption operations using RSA keys.

For example, you can use ECC or RSA private keys to generate digital signatures. Third parties can perform verification outside of AWS KMS using the corresponding public keys. Similarly, AWS customers and third parties can perform unauthenticated encryption outside of AWS KMS using an RSA public key and still enforce authenticated decryption within AWS KMS. This is done using the corresponding private key.

The commands found in this tutorial were tested using Amazon Linux 2. Other Linux, macOS, or Unix operating systems are likely to work with minimal modification but have not been tested.

Creating an asymmetric signing key pair

To start, create an asymmetric customer master key (CMK) using the AWS Command Line Interface (AWS CLI) example command below. This generates an RSA 4096 key for signature creation and verification using AWS KMS.


aws kms create-key --customer-master-key-spec RSA_4096 \
 --key-usage SIGN_VERIFY \
 --description "Sample Digital Signature Key Pair"

If successful, this command returns a KeyMetadata object. Take note of the KeyID value. As a best practice, I recommend assigning an alias for your key. The command below assigns an alias of sample-sign-verify-key to your newly created CMK (replace the target-key-id value of <1234abcd-12ab-34cd-56ef-1234567890ab> with your KeyID).


aws kms create-alias \
    --alias-name alias/sample-sign-verify-key \
    --target-key-id <1234abcd-12ab-34cd-56ef-1234567890ab> 

Creating signer and verifier roles

For the next phase of this tutorial, you must create two AWS principals. You’ll create two roles: a signer principal and a verifier principal. First, navigate to the AWS Identity and Access Management (IAM) “Create role” Console dialogue that allows entities in a specified account to assume the role. Enter your Account ID and select Next: Permissions, as shown in Figure 1 below.

Figure 1: Enter your Account ID to begin creating a role in AWS IAM

Figure 1: Enter your Account ID to begin creating a role in AWS IAM

Select Next through the next two screens. On the fourth and final screen, enter a Role name of SignerRole and Role description, as shown in Figure 2 below.

Figure 2: Enter a role name and description to finish creating the role

Figure 2: Enter a role name and description to finish creating the role

Select Create role to finish creating the signer role. To create the verifier role, you must perform this same process one more time. On the final screen, provide the name OfflineVerifierRole for the role instead.

Configuring key policy permissions

A best practice is to adhere to the principle of least privilege and provide each AWS principal with the minimal permissions necessary to perform its tasks. The signer and verifier roles that you created currently have no permissions in your account. The signer principal must have permission to be able to create digital signatures in KMS for files using the public portion of your CMK. The verifier principal must have permission to download the plaintext public key portion of your CMK.

To provide access control permissions for KMS actions to your AWS principals, attach a key policy to the CMK. The IAM role for the signer principal (SignerRole) is given kms:Sign permission in the CMK key policy. The IAM role for the verifier principal (OfflineVerifierRole) is given kms:GetPublicKey permission in the CMK key policy.

Navigate to the KMS page in the AWS Console and select customer-managed keys. Next, select your CMK, scroll down to the key policy section, and select edit.

To allow your signer principal to use the CMK for digital signing, append the following stanza to the key policy (replace the account ID value of <111122223333> with your own):


{
    "Sid": "Allow use of the key pair for digital signing",
    "Effect": "Allow",
    "Principal": {"AWS":"arn:aws:iam::<111122223333>:role/SignerRole"},
    "Action": "kms:Sign",
    "Resource": "*"
}

To allow your verifier principal to download the CMK public key, append the following stanza to the key policy (replace the account ID value of <111122223333> with your own):


{
    "Sid": "Allow plaintext download of the public portion of the key pair",
    "Effect": "Allow",
    "Principal": {"AWS":"arn:aws:iam::<111122223333>:role/OfflineVerifierRole"},
    "Action": "kms:GetPublicKey",
    "Resource": "*"
}

You can permit the verifier to perform digital signature verification using KMS by granting the kms:Verify action. However, the kms:GetPublicKey action enables the verifier principal to download the CMK public key in plaintext to verify the signature in a local environment without access to AWS KMS.

Although you configured the policy for a verifier principal within your own account, you can also configure the policy for a verifier principal in a separate account to validate signatures generated by your CMK.

Because AWS KMS enables the verifier principal to download the CMK public key in plaintext, you can also use the verifier principal you configure to download the public key and distribute it to third parties. This can be done whether or not they have AWS security credentials via, for example, an S3 presigned URL.

Signing a message

To demonstrate signature verification, you need KMS to sign a file with your CMK using the KMS Sign API. KMS signatures can be generated directly for messages of up to 4096 bytes. To sign a larger message, you can generate a hash digest of the message, and then provide the hash digest to KMS for signing.

For messages up to 4096 bytes, you first create a text file containing a short message of your choosing, which we refer to as SampleText.txt. To sign the file, you must assume your signer role. To do so, execute the following command, but replace the account ID value of <111122223333> with your own:


aws sts assume-role \
--role-arn arn:aws:iam::<111122223333>:role/SignerRole \
--role-session-name AWSCLI-Session

The return values provide an access key ID, secret key, and session token. Substitute these values into their respective fields in the following command and execute it:


export AWS_ACCESS_KEY_ID=<ExampleAccessKeyID1>
export AWS_SECRET_ACCESS_KEY=<ExampleSecretKey1>
export AWS_SESSION_TOKEN=<ExampleSessionToken1>

Then confirm that you have successfully assumed the signer role by issuing:


aws sts get-caller-identity

If the output of this command contains the text assumed-role/SignerRole then you have successfully assumed the signer role and you may sign your message file with:


aws kms sign \
    --key-id alias/sample-sign-verify-key \
    --message-type RAW \
    --signing-algorithm RSASSA_PKCS1_V1_5_SHA_512 \
    --message fileb://SampleText.txt \
    --output text \
    --query Signature | base64 --decode > SampleText.sig

To indicate that the file is a message and not a message digest, the command passes a MessageType parameter of RAW. The command then decodes the signature and writes it to a local disk as SampleText.sig. This file is important later when you want to verify the signature entirely client-side without calling AWS KMS.

Finally, to drop your assumed role, you may issue:


unset \
	AWS_ACCESS_KEY_ID \
	AWS_SECRET_ACCESS_KEY \
	AWS_SESSION_TOKEN

followed by:


aws sts get-caller-identity

Verifying a signature client-side

Assume your verifier role using the same process as before and issue the following command to fetch a copy of the public portion of your CMK from AWS KMS:


aws kms get-public-key \
 --key-id alias/sample-sign-verify-key \
 --output text \
 --query PublicKey | base64 --decode > SamplePublicKey.der

This command writes to disk the DER-encoded X.509 public key with a name of SamplePublicKey.der . You can convert this DER-encoded key to a PEM-encoded key by running the following command:


openssl rsa -pubin -inform DER \
    -outform PEM -in SamplePublicKey.der \
    -pubout -out SamplePublicKey.pem

You now have the following three files:

  1. A PEM file, SamplePublicKey.pem containing the CMK public key
  2. The original SampleText.txt file
  3. The SampleText.sig file that you generated in KMS using the CMK private key

With these three inputs, you can now verify the signature entirely client-side without calling AWS KMS. To verify the signature, run the following command:


openssl dgst -sha512 \
    -verify SamplePublicKey.pem \
    -signature SampleText.sig \
    SampleText.txt

If you performed all of the steps correctly, you see the following message on your console:


 Verified OK

This successful verification provides a high degree of confidence that the message was endorsed by a principal with permission to sign using your KMS CMK (authentication) — in this case, your sender role principal. It also verifies that the message has not been modified in transit (integrity).

To demonstrate this, update your SampleText.txt file by adding new characters to the file. If you rerun the command, you see the following message:


Verification Failure

Summary

In this tutorial, you verified the authenticity of a digital signature generated by a KMS asymmetric key pair on your local machine. You did this by using OpenSSL and a plaintext public key exported from KMS.

You created an asymmetric CMK in KMS and configured key policy permissions for your signer and verifier principals. You then digitally signed a message in KMS using the private portion of your asymmetric CMK. Then, you exported a copy of the public portion of your asymmetric key pair from KMS in plaintext. With this copy of your public key, you were able to perform signature verification using OpenSSL entirely in your local environment. A similar pattern can be used with an asymmetric CMK configured with a KeyUsage value of ENCRYPT_DECRYPT. This pattern can be used to perform encryption operations in a local environment and decryption operations in AWS KMS.

To learn more about the asymmetric keys feature of KMS, please read the KMS Developer Guide. If you’re considering implementing an architecture involving downloading public keys, be sure to refer to the KMS Developer Guide for Special Considerations for Downloading Public Keys. If you have questions about the asymmetric keys feature, please start a new thread on the AWS KMS Discussion Forum.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

J.D. Bean

J.D. Bean is a Solutions Architect at Amazon Web Services on the World Wide Public Sector Federal Financials team based out of New York City. He is passionate about his work enabling AWS customers’ successful cloud journeys, and his interests include security, privacy, and compliance. J.D. holds a Bachelor of Arts from The George Washington University and a Juris Doctor from New York University School of Law. In his spare time J.D. enjoys spending time with his family, yoga, and experimenting in his kitchen.

How to run AWS CloudHSM workloads on AWS Lambda

Post Syndicated from Mohamed AboElKheir original https://aws.amazon.com/blogs/security/how-to-run-aws-cloudhsm-workloads-on-aws-lambda/

AWS CloudHSM is a cloud-based hardware security module (HSM) that enables you to generate and use your own encryption keys on the AWS Cloud. With CloudHSM, you can manage your own encryption keys using FIPS 140-2 Level 3 validated HSMs. CloudHSM also automatically manages synchronization, high availability and failover within a cluster.

When the service first launched, many customers ran CloudHSM workloads on Amazon Elastic Compute Cloud (Amazon EC2), which required the CloudHSM client to be installed on the Amazon EC2 instance in order to communicate with the CloudHSM cluster. Today, we see customers who are interested in leveraging CloudHSM for serverless workloads using AWS Lambda, but when using Lambda there is no “instance” to install the CloudHSM client on. This blog post shows a workaround that can be used to satisfy the CloudHSM client installation requirement on Lambda functions to be able to run CloudHSM workloads within these Lambda functions.

The workaround is performed by first packaging the CloudHSM client and its requirements in a Lambda layer, and then running the CloudHSM client in a child process from within the Lambda function code to allow communication with the HSMs in your CloudHSM cluster. By leveraging this approach, you gain the benefits of serverless computing (such as increased scalability and decreased admin overhead), as well as the ability to integrate with other AWS services like Amazon CloudWatch Events, Amazon Simple Storage Service (Amazon S3) and AWS Config.

Why would I want to run CloudHSM workloads on Lambda?

Below are some specific use cases enabled by this solution:

  1. When a file is added to an Amazon S3 bucket, you can trigger a Lambda function to encrypt or decrypt the file using keys stored in CloudHSM.
  2. When a file is added to an Amazon S3 bucket, you can trigger a Lambda function to create a digital signature for the file using a private key stored in CloudHSM. This digital signature can then be used to ensure file integrity.
  3. You can create a custom AWS Config rule that checks to ensure files in a directory or a bucket have not been tampered with by verifying their digital signatures using keys stored in CloudHSM.

Solution overview

This solution shows you how to package the CloudHSM client binary and its dependencies (configuration files and libraries) as well as the CloudHSM Java JCE library to a Lambda layer which is attached to the Lambda function. This enables the function to run the CloudHSM client daemon in the background as a child process, allowing it to connect to the CloudHSM cluster and to perform cryptographic tasks such as encryption and decryption operations.

Using a Lambda layer decouples the code of the Lambda function from the CloudHSM client and the CloudHSM Java JCE library. This way, when a new version of the CloudHSM client and the CloudHSM Java JCE library is released, it can be included in a new Lambda layer version and attached to the Lambda function without needing to rebuild the Lambda function package.

The example solution below includes a complete Java sample for the Lambda function. It uses the CloudHSM Java JCE library to generate a symmetric key on the HSM, and it uses this key to encrypt and decrypt after starting the CloudHSM client. Maven (a build automation tool) will be used to build the Lambda function package.

The solution uses AWS Secrets Manager to store and retrieve the crypto user (CU) credentials that are needed to perform cryptographic operations. If the HSM IPs of the CloudHSM cluster are changed (for example, if the HSMs are deleted and re-created), the Lambda function will automatically update the configuration during runtime.

Note:

  1. The solution only works with version 2.0.4 or later of the CloudHSM client and CloudHSM Java JCE library.
  2. In this workaround, the client is started at the beginning of each Lambda invocation, and is stopped at the end of the invocation. Due to the way Lambda works, the client can’t persist through multiple invocations.
  3. Secrets Manager uses AWS Key Management Service to secure its data. If your workload requires that all data be secured using HSMs under your sole control, without reliance on IAM credentials, this solution may not be appropriate. You should work with your security or compliance officer to ensure you’re using a method of securing HSM login credentials that meets your application and security needs.

Prerequisites

Figure 1: Architectural diagram

Figure 1: Architectural diagram

Here are the resources you’ll need in order to follow along with the example in Figure 1:

  1. An Amazon Virtual Private Cloud (Amazon VPC) with the following components:
    1. Private subnets in multiple Availability Zones to be used for the HSM’s elastic network interfaces (ENIs).
    2. A public subnet that contains a network address translation (NAT) gateway.
    3. A private subnet with a route table that routes internet traffic (0.0.0.0/0) to the NAT gateway. You’ll use this subnet to run the Lambda function. The NAT gateway allows you to connect to the CloudHSM, CloudWatch Logs and Secrets Manager endpoints.

    Note: For high availability, you can add multiple instances of the public and private subnets mentioned in Prerequisites 1.b and 1.c. For more information about how to create an Amazon VPC with public and private subnets as well as a NAT gateway, refer to the Amazon VPC user guide.

  2. An active CloudHSM cluster with at least one active HSM. The HSMs should be created in the private subnets mentioned in Prerequisite 1.a. You can follow the Getting Started with AWS CloudHSM guide to create and initialize the CloudHSM cluster.
  3. An Amazon Linux 2 EC2 instance with the CloudHSM client installed and configured to connect to the CloudHSM cluster. The client instance should be launched in the public subnet mentioned in Prerequisite 1.b. You can again refer to Getting Started With AWS CloudHSM to configure and connect the client instance.

    Note: You only need the client instance to build the Lambda function package. You can terminate the instance after the package has been created.

  4. CU credentials. You can create a CU by following the steps in the user guide.
  5. A server/machine with AWS Command Line Interface (AWS CLI) installed and configured. You’ll need this to follow along, as the example uses AWS CLI to create and configure the necessary AWS resources. The IAM user/role should have at minimum the permissions in the below policy attached to it to follow this example. Make sure you replace the <REGION> and <ACCOUNT-ID> tags below with the actual Region and account ID you are using.
    
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": "secretsmanager:CreateSecret",
                "Resource": "*",
                "Condition": {
                    "StringEquals": {
                        "secretsmanager:Name": "CloudHSM_CU"
                    }
                }
            },
            {
                "Sid": "VisualEditor1",
                "Effect": "Allow",
                "Action": [
                    "ec2:AuthorizeSecurityGroupEgress",
                    "lambda:CreateFunction",
                    "lambda:InvokeFunction",
                    "lambda:GetLayerVersion",
                    "lambda:PublishLayerVersion",
                    "iam:GetRole",
                    "iam:CreateRole",
                    "iam:AttachRolePolicy",
                    "iam:PutRolePolicy",
                    "iam:PassRole",
                    "secretsmanager:DescribeSecret",
                    "secretsmanager:GetResourcePolicy",
                    "secretsmanager:GetSecretValue",
                    "secretsmanager:PutResourcePolicy",
                    "logs:FilterLogEvents"
                ],
                "Resource": [
                    "arn:aws:ec2:<REGION>:<ACCOUNT-ID>:security-group/outbound-443",
                    "arn:aws:lambda:<REGION>:<ACCOUNT-ID>:function:cloudhsm_lambda_example",
                    "arn:aws:lambda:<REGION>:<ACCOUNT-ID>:layer:cloudhsm-client-layer",
                    "arn:aws:lambda:<REGION>:<ACCOUNT-ID>:layer:cloudhsm-client-layer:*",
                    "arn:aws:iam::<ACCOUNT-ID>:role/cloudhsm_lambda_example_role",
                    "arn:aws:secretsmanager:<REGION>:<ACCOUNT-ID>:secret:CloudHSM_CU*",
                    "arn:aws:logs:<REGION>:<ACCOUNT-ID>:log-group:/aws/lambda/cloudhsm_lambda_example:log-stream:"
                ]
            },
            {
                "Sid": "VisualEditor3",
                "Effect": "Allow",
                "Action": [
                    "ec2:DescribeVpcs",
                    "ec2:CreateSecurityGroup",
                    "ec2:DescribeSubnets",
                    "cloudhsm:DescribeClusters",
                    "ec2:DescribeSecurityGroups",
                    "ec2:AuthorizeSecurityGroupEgress"
                ],
                "Resource": "*"
            }
        ]
    }
    	

Step 1: Build the Lambda function package

In this step, you’ll build the Lambda function package using Maven. For more information about using Maven to build an AWS Lambda Java package, refer to the AWS Lambda developer guide.

  1. On your CloudHSM client instance, install the CloudHSM Java JCE library by following the steps in the user guide.
  2. Install OpenJDK 8 and Maven:
    
    $ sudo yum install -y java maven
    	

  3. Download the sample code, unzip it and move to the created directory. The directory will have the name aws-cloudhsm-on-aws-lambda-sample-master and will include:
    • A file with the name pom.xml that contains the Maven project configuration.
    • A file with the name SymmetricKeys.java which is also available on the AWS CloudHSM Java JCE samples repo. This file contains the function that you’ll use to generate the advanced encryption standard (AES) key.
    • A file with the name AESGCMEncryptDecryptLambda.java, which will run when the Lambda function is invoked:
      
      $ wget https://github.com/aws-samples/aws-cloudhsm-on-aws-lambda-sample/archive/master.zip
      $ unzip master.zip
      $ cd aws-cloudhsm-on-aws-lambda-sample-master/
      	

  4. Create a Java Archive (JAR) package by running the below commands. This will create the JAR file under the target/ directory with the name cloudhsm_lambda_project-1.0-SNAPSHOT.jar.

    
    $ export CLOUDHSM_VER=$(ls /opt/cloudhsm/java/ | grep "cloudhsm-[0-9\.]\+.jar" | grep -o "[0-9\.]\+[0-9]")
    $ export LOG4JCORE_VER=$(ls /opt/cloudhsm/java/ | grep "log4j-core-[0-9\.]\+.jar" | grep -o "[0-9\.]\+[0-9]")
    $ export LOG4JAPI_VER=$(ls /opt/cloudhsm/java/ | grep "log4j-api-[0-9\.]\+.jar" | grep -o "[0-9\.]\+[0-9]")
    $ mvn validate && mvn clean package 
    	

Step 2: Create the Lambda layer

In this step, you’ll create the Lambda layer that contains the CloudHSM client and its dependencies and the CloudHSM Java library JARs.

  1. On your CloudHSM client instance, create a directory called “layer” and change directories to it:
    
    $ mkdir ~/layer && cd ~/layer
    	

  2. Create the following directories, which you’ll use in the next steps to hold the CloudHSM binary and its prerequisites such as configuration files and libraries, and the CloudHSM Java JCE JARs:
    
    $ mkdir -p lib cloudhsm/bin cloudhsm/etc java/lib
    	

  3. Copy the cloudhsm_client binary and the needed configuration files to the directories you created in the previous step.
    
    $ cp /opt/cloudhsm/bin/cloudhsm_client cloudhsm/bin
    $ cp -r /opt/cloudhsm/etc/{cloudhsm_client.cfg,customerCA.crt,client.crt,client.key,certs} cloudhsm/etc
    	

  4. Add the necessary libraries by running the commands below. These libraries are needed by the Lambda function to be able to run the cloudhsm_client binary.
    
    $ cp /opt/cloudhsm/lib/libcaviumjca.so lib/
    $ ldd /opt/cloudhsm/bin/cloudhsm_client | awk '{print $3}' | grep "^/" | xargs -I{} cp {} lib/
    	

  5. Add the CloudHSM Java JCE Jars by running the commands below. These JARs include the classes needed by the Lambda function code to run.
    
    $ cp /opt/cloudhsm/java/{cloudhsm-[0-9]*.jar,log4j-*-*.jar} java/lib/
    	

  6. Create the Lambda layer ZIP archive by running the command below. This will create the archive with the name layer.zip in the home directory.
    
    $ zip -r ~/layer.zip * 
    	

  7. Move the ZIP archive (layer.zip) to the server/machine with AWS CLI installed and configured, and run the below command to create the Lambda layer with the name cloudhsm-client-layer.
    
    $ aws lambda publish-layer-version --layer-name cloudhsm-client-layer --zip-file fileb://layer.zip --compatible-runtimes java8
    	

Step 3: Create a secret to store the CU credentials

In this step, you will use Secrets Manager to create a secret to store your CU credentials. You must perform this step on your server/machine that has AWS CLI installed and configured.

Run the following command to create a secret with the name CloudHSM_CU that contains your CU user name and password (Prerequisite 4). Make sure to replace the user name and password below with your actual CU user name and password.


$ export HSM_USER=<user>
$ export HSM_PASSWORD=<password>
$ aws secretsmanager create-secret --name CloudHSM_CU --secret-string "{ \"HSM_USER\": \"$HSM_USER\", \"HSM_PASSWORD\": \"$HSM_PASSWORD\"}"

Step 4: Create an IAM role for the Lambda function

In this step, you’ll create an IAM role that has the permissions necessary for it to be assumed by the Lambda function.

  1. On the server/machine with AWS CLI installed and configured, create a new file with the name trust.json.
    
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Service": "lambda.amazonaws.com"
          },
          "Action": "sts:AssumeRole"
        }
      ]
    }
    	

  2. Create a role named cloudhsm_lambda_example_role using the following AWS CLI command:

    
    $ aws iam create-role --role-name cloudhsm_lambda_example_role --assume-role-policy-document file://trust.json
    	

  3. Run the commands below to create a new file named policy.json. The policy in this file allows the IAM role to perform the following actions:
    • Writing to CloudWatch Logs. This permission allows the IAM role to write to the CloudWatch Logs of the Lambda function. You can then use the logs for troubleshooting. For more information about accessing CloudWatch Logs for Lambda, refer to this guide.
    • Retrieving the CU secret value from Secrets Manager. The CU credentials stored in the CU secret are needed by the Lambda function to be able to log-in to the CloudHSM cluster.
    • Describing CloudHSM clusters. This permission allows the Lambda function to check the current HSM IPs and update its configuration if the IPs have changed.
    
    $ export SECRET_ARN=$(aws secretsmanager describe-secret --secret-id "CloudHSM_CU" --query "ARN" --output text)
    
    $ cat <<EOF> policy.json
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "CWLogs",
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents"
                ],
                "Resource": "*"
            },
            {
                "Sid": "SecretsManager",
                "Effect": "Allow",
                "Action": "secretsmanager:GetSecretValue",
                "Resource": "$SECRET_ARN"
            },
            {
                "Sid": "CloudHSM",
                "Effect": "Allow",
                "Action": "cloudhsm:DescribeClusters",
                "Resource": "*"
            }
        ]
    }
    EOF
    	

  4. Attach the policy to the IAM role created in step 2 of this section by running the following command:
    
    $ aws iam put-role-policy --role-name cloudhsm_lambda_example_role --policy-name cloudhsm_lambda_example_policy --policy-document file://policy.json
    	

  5. Attach the AWS managed policy AWSLambdaVPCAccessExecutionRole to the created role by running the command below. This policy allows the IAM role to access the VPC, which is necessary in order to run the Lambda function in a VPC and a subnet.
    
    $ aws iam attach-role-policy --role-name cloudhsm_lambda_example_role --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole
    	

  6. To make sure the CU secret is only accessible to the Lambda function role, run the below commands to attach a resource-based policy to the secret:
    
    $ export ROLE_ARN=$(aws iam get-role --role-name cloudhsm_lambda_example_role --query Role.Arn --output text)
    $ export ASSUMED_ROLE_ARN=$(echo $ROLE_ARN | sed -e "s/:iam:/:sts:/" -e "s/:role/:assumed-role/" -e "s/$/\/cloudhsm_lambda_example/")
    $ export ROOT_ARN=$(echo $ROLE_ARN | sed "s/:role.*/:root/")
    $ cat <<EOF> sm_policy.json
    { "Version": "2012-10-17",
    	"Statement": [
    		{
    			"Effect": "Deny",
    			"Action": "secretsmanager:GetSecretValue",
    			"NotPrincipal": {"AWS": [
    				"$ASSUMED_ROLE_ARN",
    				"$ROLE_ARN",
    				"$ROOT_ARN"
    			]},
    				"Resource": "*"
    		}
    	]
    }
    EOF
    
    $ aws secretsmanager put-resource-policy --resource-policy file://sm_policy.json --secret-id CloudHSM_CU
    	

Step 5: Create the Lambda function

In this step, you will create a Lambda function with the necessary settings.

  1. On the server/machine with AWS CLI installed and configured, run the command below to create a security group with the name outbound-443. This security group will be attached to the Lambda function to allow it to connect to the CloudWatch Logs, Secrets Manager and CloudHSM endpoints. Make sure to replace the CLUSTER_ID below with the actual CloudHSM cluster ID of your environment.
    
    $ export CLUSTER_ID=<cluster-xxxxxxxxxx>
    $ export CLUSTER_VPC=$(aws cloudhsmv2 describe-clusters --filters clusterIds=$CLUSTER_ID --query Clusters[0].VpcId --output text)
    $ export OUTBOUND_SG=$(aws ec2 create-security-group --group-name outbound-443 --description "Allow outbound access to port 443" --vpc-id $CLUSTER_VPC --output text)
    $ aws ec2 authorize-security-group-egress --group-id $OUTBOUND_SG --protocol tcp --port 443 --cidr 0.0.0.0/0
    	

  2. Move the JAR package generated in step 4 of the Step 1 section to the current directory on the server/machine that has AWS CLI installed and configured (The file was generated on the CloudHSM client instance under ~/aws-cloudhsm-on-aws-lambda-sample-master/target/cloudhsm_lambda_project-1.0-SNAPSHOT.jar).
  3. Replace the cluster ID and subnet ID below with the CloudHSM cluster ID of your environment, and the ID of the private Lambda subnet in your environment (Prerequisite 1.c), then run the commands below. These commands set environment variables that you’ll need for the next command.
    
    $ export CLUSTER_ID=<cluster-xxxxxxxxxx>
    $ export SUBNET_ID=<subnet-xxxxxxxx>
    $ export CLUSTER_VPC=$(aws cloudhsmv2 describe-clusters --filters clusterIds=$CLUSTER_ID --query Clusters[0].VpcId --output text)
    $ export OUTBOUND_SG=$(aws ec2 describe-security-groups --filters Name=group-name,Values=outbound-443  --query SecurityGroups[0].GroupId --output text)
    $ export CLUSTER_SG=$(aws cloudhsmv2 describe-clusters --filters clusterIds=$CLUSTER_ID --query Clusters[0].SecurityGroup --output text)
    $ export ROLE_ARN=$(aws iam get-role --role-name cloudhsm_lambda_example_role --query Role.Arn --output text)
    $ export LAYER_ARN=$(aws lambda get-layer-version --layer-name cloudhsm-client-layer --version-number 1 --query LayerVersionArn --output text)
    	

  4. Create a Lambda function with the name cloudhsm_lambda_example by running the below command:
    
    $ aws lambda create-function --function-name "cloudhsm_lambda_example" \
    --runtime java8 \
    --role $ROLE_ARN \
    --handler "com.amazonaws.cloudhsm.examples.AESGCMEncryptDecryptLambda::myhandler" \
    --timeout 600 \
    --memory-size 512 \
    --vpc-config SubnetIds=$SUBNET_ID,SecurityGroupIds=$CLUSTER_SG,$OUTBOUND_SG \
    --environment "Variables={CLUSTER_ID=$CLUSTER_ID, SECRET_ID=CloudHSM_CU,liquidsecurity_daemon_id=1}" \
    --layers $LAYER_ARN \
    --zip-file fileb://cloudhsm_lambda_project-1.0-SNAPSHOT.jar
    	

The command will create a Lambda function with the following configuration:

  • Runtime: Java8
  • Execution Role: The role you created in the Step 4 section.
  • Handler: The name of the class and the function in the package created in the Step 1 section.
  • Timeout: 10 minutes.
  • Memory size: 512 MB.
  • Subnet: The private Lambda subnet in your environment (Prerequisite 1.c).
  • Security Groups: The CloudHSM cluster security group AND the security group created in step 1 of the Step 5 section for outbound access to port 443 (outbound-443).
  • Code/Package: The JAR package you created in step 4 of the Step 1 section.
  • Layer: The layer created in the Step 2 section.
  • Environmental Variables:
    • CLUSTER_ID = the CloudHSM cluster ID in your environment
    • SECRET_ID = the ID of the secret you created in the Step 3 section
    • liquidsecurity_daemon_id = 1 (this is needed by the cloudhsm_client binary)

Step 6: Run the Lambda function

In this step, you will invoke the Lambda function and check the logs to view the output.

  1. You can invoke the Lambda function using the following command. This will execute the code in the package you created in Step 1.
    
    $ aws lambda invoke --function-name cloudhsm_lambda_example out.txt
    	

  2. You can check the function’s CloudWatch Log group with a command like this one:
    
    $ aws logs filter-log-events --log-group-name "/aws/lambda/cloudhsm_lambda_example" --start-time "`date -d "now -5min" +%s`000" --query events[*].message --output text | sed "s/\t/\n/g" 
    	

    If the Lambda function was successful, the output of the function should look something like the example below:

    
    START RequestId: 39c627f2-3908-4424-97ef-038c28a72f9a Version: $LATEST
    
    * Running GetSecretValue to get the CU credentials ...
    SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
    
    SLF4J: Defaulting to no-operation (NOP) logger implementation
    
    SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
    
    * Running DescribeClusters to get the HSM IP ...
    DescribeClusters returned the HSM IP = 1.2.3.4
    * Getting the HSM IP inf the configuration file ...
    The configuration file has the HSM IP = 1.2.3.4
    * Starting the cloudhsm client ...
    * Waiting for the cloudhsm client to start ...
    * cloudhsm client started ...
    * Adding the Cavium provider ...
    ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console.
    
    * Using credentials to Login to the CloudHSM Cluster ...
    Login successful!
    * Generating AES Key ...
    * Generating Random data to encrypt ...
    Plain Text data = 3B0566E9A3FADA8FED7D6C88FE92ECBE8526922E84489AB48F1F3F3116235E69
    * Encrypting data ...
    Cipher Text data = CA6D80AD34BBADEF34275743F309E6730ABC66BA19C2EADC731899B0FB86564EDDB9F7FC103E1C9C2A6A1E64BF2D2C48
    * Decrypting ciphertext ...
    Decrypted Text data = 3B0566E9A3FADA8FED7D6C88FE92ECBE8526922E84489AB48F1F3F3116235E69
     * Successful decryption
    * Logging out the CloudHSM Cluster
    * Closing client ...
    END RequestId: 39c627f2-3908-4424-97ef-038c28a72f9a
    
    REPORT RequestId: 39c627f2-3908-4424-97ef-038c28a72f9a
    Duration: 11990.69 ms
    Billed Duration: 12000 ms
    Memory Size: 512 MB
    Max Memory Used: 103 MB
    	

Note: The StatusLogger No log4j2 configuration file found error above is normal and can be ignored. This is related to missing log4j configuration which is normally used to configure logging, but is not needed in this case as the log messages are being written to CloudWatch Logs by default.

Conclusion

This solution demonstrates how to run CloudHSM workloads on Lambda, which allows you to not only leverage the flexibility of serverless computing, but also helps you meet security and compliance requirements by performing cryptographic tasks such as encryption and decryption operations. This approach also allows you to integrate with other AWS services like Amazon CloudWatch Events, Amazon Simple Storage Service (Amazon S3), or AWS Config for a seamless experience across your environment.

If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this blog post, start a new thread on the AWS CloudHSM forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author photo

Mohamed AboElKheir

Mohamed AboElKheir is an Application Security Engineer who works with different teams to ensure AWS services, applications, and websites are designed and implemented to the highest security standards. He is a subject matter expert for CloudHSM and is always enthusiastic about assisting CloudHSM customers with advanced issues and use cases. Mohamed is passionate about InfoSec, specifically cryptography, penetration testing (he’s OSCP certified), application security, and cloud security (he’s AWS Security Specialty certified).

How to define least-privileged permissions for actions called by AWS services

Post Syndicated from Michael Switzer original https://aws.amazon.com/blogs/security/how-to-define-least-privileged-permissions-for-actions-called-by-aws-services/

When you perform certain actions in AWS, the service you called sometimes takes additional actions in other AWS services on your behalf. AWS Identity and Access Management (IAM) now includes condition keys to make it easier to grant only the minimum level of access necessary for IAM principals (users and roles) and AWS services to take those actions. Using the aws:CalledVia condition key, you can create distinct access rules for the actions performed by your IAM principals, and for the subsequent actions taken by AWS services on your behalf.

For example, if you use AWS CloudFormation to launch an Amazon Elastic Compute Cloud (EC2) instance, the CloudFormation service independently uses your credentials to launch the instance in EC2. Your principals need permissions from both services. Based on the principle of granting least privileged permissions, you might want to prevent your principals from taking each of those actions independently. Using the new condition, you can grant your IAM principals the ability to launch EC2 instances only by using CloudFormation, without granting them direct access to EC2.

You can also use the aws:CalledVia condition key to define rules for the initial call made to AWS by your principals, without impacting the additional calls the service makes. For example, you can require that all initial calls to AWS services come from inside your Virtual Private Cloud (VPC) or your private IP subnet, but you will not impose the same rule for downstream requests to other services, since those requests come from AWS rather than your private network.

In this post, I explain the aws:CalledVia condition key and outline the context it provides during authorization. Then, I walk through a detailed use case with examples that show you how to secure access to a database managed in Amazon Athena behind a VPC. In the examples, I’ll cover how to grant access to execute queries in Athena without granting direct access to dependent services such as Amazon Simple Storage Service (S3). I will also explain how you can use the aws:CalledVia condition key to prevent access to your databases from outside your private networks.

How to use the aws:CalledVia condition key

The aws:CalledVia condition key contains an ordered list of each service principal that triggers the AWS action in another service using your credentials. It is a global condition key, meaning you can use it in combination with any AWS service action. For example, say that you use CloudFormation to read and write from an Amazon DynamoDB table that uses encryption supplied by the AWS Key Management Service (AWS KMS). When you call CloudFormation, it calls Amazon DynamoDB to read from the table, then Amazon DynamoDB calls AWS KMS to decrypt the data. Each call gets its own authorization check, and the aws:CalledVia key keeps track of who called whom.

  • For the call to CloudFormation, aws:CalledVia will be empty.
  • For the call from CloudFormation to Amazon DynamoDB, aws:CalledVia will contain [ “cloudformation.amazonaws.com” ]
  • For the call from Amazon DynamoDB to AWS KMS, aws:CalledVia will contain [ “cloudformation.amazonaws.com” , “dynamodb.amazonaws.com” ]

    Important: The aws:CalledVia context is only available when AWS reuses your credentials after you make a request. For example, if you provided a service role to CloudFormation instead of triggering the action directly, the aws:CalledVia context would be empty for a call from CloudFormation to Amazon DynamoDB.

To identify the service principal that aws:CalledVia returns for each AWS service, see to the CalledVia Services table in the documentation. AWS will update this list as more services add support for this key.

You can use condition operators, such as StringLike and StringEquals, in your policies to check the contents of the key during authorization. Because aws:CalledVia is a multivalued condition key, you also need to use the ForAnyValue or ForAllValues set operators along with your comparison operators to check the content of the key. For more information, see Creating a Condition with Multiple Keys or Values in the IAM documentation.

Here’s an example policy that follows this use case.


{ 
   "Version":"2012-10-17",
   "Statement":[ 
      { 
         "Sid":"AllowCFNActions",
         "Effect":"Allow",
         "Action":[ 
            "cloudformation:CreateStack*"
         ],
         "Resource":"*"
      },
      { 
         "Sid":"AllowDDBActionsViaCFN",
         "Effect":"Allow",
         "Action":[ 
            "dynamodb:GetItem",
            "dynamodb:BatchGetItem",
            "dynamodb:PutItem",
            "dynamodb:UpdateItem",
            "dynamodb:DeleteItem"
         ],
         "Resource":"arn:aws:dynamodb:region:111122223333:table",
         "Condition":{ 
            "ForAnyValue:StringEquals":{ 
               "aws:CalledVia":[ 
                  "cloudformation.amazonaws.com"
               ]
            }
         }
      },
      { 
         "Sid":"AllowKMSActionsViaDDB",
         "Effect":"Allow",
         "Action":[ 
            "kms:Encrypt",
            "kms:Decrypt",
            "kms:ReEncrypt*",
            "kms:GenerateDataKey",
            "kms:DescribeKey"
         ],
         "Resource":[
            "arn:aws:kms:region:111122223333:key/example"
         ],
         "Condition":{ 
            "ForAnyValue:StringEquals":{ 
               "aws:CalledVia":[ 
                  "dynamodb.amazonaws.com"
               ]
            }
         }
      }
   ]
}

I’ll walk you through each statement in this policy. In the first statement, AllowCFNActions, I grant access to use CloudFormation to create stacks and stack sets. The second statement, AllowDDBActionsViaCFN, grants access to read and write from a specific DynamoDB table, but only if CloudFormation took the action. Finally, the third statement AllowKMSActionsViaDDB allows AWS KMS encrypt and decrypt operations that are triggered through Amazon DynamoDB, using the customer master key (CMK) specified in the Resource element. Each condition statement uses the ForAnyValue:StringEquals combination of operators to check for the existence of the CloudFormation or DynamoDB service principals. Putting it all together, the policy allows an IAM principal to do the following:

  1. Create stacks and stack sets by using CloudFormation.
  2. Read and write to Amazon DynamoDB tables via CloudFormation.
  3. Encrypt and decrypt by using AWS KMS actions via Amazon DynamoDB.

Controlling access based on the first and last requesting services

Along with aws:CalledVia, AWS has introduced two companion keys to make it easy to retrieve the first and last services in the chain of requests. The aws:CalledViaFirst condition key returns the first service principal in the chain, and aws:CalledViaLast returns the last service principal in the chain. For example, if aws:CalledVia contained [ “cloudformation.amazonaws.com” , “dynamodb.amazonaws.com” ], then aws:CalledViaFirst would contain “cloudformation.amazonaws.com” and aws:CalledViaLast would contain “dynamodb.amazonaws.com”.

Still following the previous example, here’s a policy that uses aws:CalledViaFirst to allow access to Amazon DynamoDB and AWS KMS, as long as the entry point is CloudFormation. You can use this policy to ensure that Amazon DynamoDB tables in your organization are accessed according to the best practices you’ve defined in your CloudFormation templates.


{ 
   "Version":"2012-10-17",
   "Statement":[ 
      { 
         "Sid":"AllowCFNActions",
         "Effect":"Allow",
         "Action":[ 
            "cloudformation:CreateStack*"
         ],
         "Resource":"*"
      },
      { 
         "Sid":"AllowDDBAndKMSActionsViaCFN",
         "Effect":"Allow",
         "Action":[ 
            "dynamodb:GetItem",
            "dynamodb:BatchGetItem",
            "dynamodb:PutItem",
            "dynamodb:UpdateItem",
            "dynamodb:DeleteItem",
            "kms:Encrypt",
            "kms:Decrypt",
            "kms:ReEncrypt*",
            "kms:GenerateDataKey",
            "kms:DescribeKey"
         ],
         "Resource":[
            "arn:aws:dynamodb:region:111122223333:table"
            "arn:aws:kms:region:111122223333:key/example"
         ],
         "Condition":{ 
            "StringEquals":{ 
               "aws:CalledViaFirst":[ 
                  "cloudformation.amazonaws.com"
               ]
            }
         }
      }
   ]
}

I’ll walk you through each statement in this policy. The first statement of the policy, AllowCFNActions, enables the principal to create stacks and stack sets through CloudFormation. The second statement, AllowDDBAndKMSActionsViaCFN, allows Amazon DynamoDB and AWS KMS actions for a specific CMK and table, but only under the condition that the first request made by the principal was to the CloudFormation service.

Now that I’ve described the conditions and their usage, I’ll share detailed examples in the use case that follows.

Use case: Permissions to use Athena inside your Virtual Private Cloud

Amazon Athena is a service you can use to analyze data in Amazon S3 by using standard SQL. Athena is serverless, which makes it cost-effective for operating a data lake. In this example, I define the IAM permissions for a role that manages and executes Athena queries from behind a secure network perimeter.

In this use case, my business metrics team needs an IAM role to execute queries against my data lake that is stored in Amazon S3. They’ll use the IAM role to manage the BizMetrics workgroup in Athena, which is responsible for managing exploratory queries and generating regular reports for key performance indicators within my company. Because my business metrics are internal to my organization, I want to ensure the data is only accessible inside my Amazon Virtual Private Cloud (VPC). VPCs let you create a logically-isolated section of the AWS Cloud and connect it to your private network. For more information about VPCs, see What is Amazon VPC? in the VPC documentation.

When an IAM role makes a call to Athena to execute a query inside a VPC, Athena makes subsequent calls to Amazon S3 and other services to complete the task. You can see the interaction on the following diagram.
 

Figure 1: IAM role makes a call to Athena to execute a query inside a VPC

Figure 1: IAM role makes a call to Athena to execute a query inside a VPC

The calls from the IAM role to Athena, and from Athena to Amazon S3, use the same role credentials. This means that the principal needs permissions for both Athena and Amazon S3 actions to accomplish the query. You can also see that the IAM role calls Athena through the VPC endpoint, rather than the public AWS endpoint. VPC endpoints allow you to communicate with AWS from your private network without a connection to the public internet. For more information about VPC endpoints, see Interface VPC Endpoints (AWS PrivateLink) in the VPC documentation.

The subsequent calls between Athena and Amazon S3 don’t use the VPC endpoint. If I want to require that each call to Athena must use the VPC endpoint, I cannot apply the same restriction to Athena’s calls to Amazon S3. I will need to use aws:CalledVia to define distinct permissions for the initial call to Athena, and the call to Amazon S3 from Athena. I’ll apply these permissions to the IAM role I create, as well as the Amazon S3 bucket that contains the data.

Step 1: Define permissions for the IAM Role

For the purposes of this use case, my organization’s data lake has one database, revenuedata, in the bucket called examplecorp-business-data. When you follow along with these examples, you should replace the bucket and database names in the policies with your own resources.

  1. First, I create the IAM role BizMetricsQuery. I can create the role in the IAM console without any permission policies attached. If you haven’t created a role before, see Creating IAM Roles in the IAM documentation.
  2. Next, I create a managed policy that defines permissions for the BizMetricsQuery role. For more information about creating a managed policy or attaching it to the BizMetricsQuery role, see Managing IAM Policies in the IAM documentation.

    The following policy grants the permissions I need:

    
    	{
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowAthenaReadActions",
                "Effect": "Allow",
                "Action": [
                    "athena:ListWorkGroups",
                    "athena:GetExecutionEngine",
                    "athena:GetExecutionEngines",
                    "athena:GetNamespace",
                    "athena:GetCatalogs",
                    "athena:GetNamespaces",
                    "athena:GetTables",
                    "athena:GetTable"
                ],
                "Resource": "*",
                "Condition":{ 
                   "StringEquals":{ 
                      "aws:SourceVpce":[ 
                         "vpce-0e880cb0a9EXAMPLE"
                      ]
                   }
                }
            },
            {
                "Sid": "AllowAthenaWorkgroupActions",
                "Effect": "Allow",
                "Action": [
                    "athena:StartQueryExecution",
                    "athena:GetQueryResults",
                    "athena:DeleteNamedQuery",
                    "athena:GetNamedQuery",
                    "athena:ListQueryExecutions",
                    "athena:StopQueryExecution",
                    "athena:GetQueryResultsStream",
                    "athena:ListNamedQueries",
                    "athena:CreateNamedQuery",
                    "athena:GetQueryExecution",
                    "athena:BatchGetNamedQuery",
                    "athena:BatchGetQueryExecution",
                    "athena:GetWorkGroup"
                ],
                "Resource": [
                    "arn:aws:athena:us-east-1:111122223333:workgroup/BizMetrics"
                ],
                "Condition":{ 
                   "StringEquals":{ 
                      "aws:SourceVpce":[ 
                         "vpce-0e880cb0a9EXAMPLE"
                      ]
                   }
                }
            },
            {
                "Sid": "AllowGlueActionsViaVPCE",
                "Effect": "Allow",
                "Action": [
                    "glue:GetDatabase",
                    "glue:GetDatabases",
                    "glue:CreateDatabase",
                    "glue:GetTables",
                    "glue:GetTable"
                ],
                "Resource": [
                    "arn:aws:glue:us-east-1:111122223333:catalog",
                    "arn:aws:glue:us-east-1:111122223333:database/default",
                    "arn:aws:glue:us-east-1:111122223333:database/revenuedata",
                    "arn:aws:glue:us-east-1:111122223333:table/revenuedata/*"
                ],
                "Condition":{ 
                   "StringEquals":{ 
                      "aws:SourceVpce":[ 
                         "vpce-0e880cb0a9EXAMPLE"
                      ]
                   }
                }
            },
            {
                "Sid": "AllowGlueActionsViaAthena",
                "Effect": "Allow",
                "Action": [
                    "glue:GetDatabase",
                    "glue:GetDatabases",
                    "glue:CreateDatabase",
                    "glue:GetTables",
                    "glue:GetTable"
                ],
                "Resource": [
                    "arn:aws:glue:us-east-1:111122223333:catalog",
                    "arn:aws:glue:us-east-1:111122223333:database/default",
                    "arn:aws:glue:us-east-1:111122223333:database/revenuedata",
                    "arn:aws:glue:us-east-1:111122223333:table/revenuedata/*"
                ],
                "Condition":{ 
                   "ForAnyValue:StringEquals":{ 
                      "aws:CalledVia":[ 
                         "athena.amazonaws.com"
                      ]
                   }
                }
            },
    
            {
                "Sid": "AllowS3ActionsViaAthena",
                "Effect": "Allow",
                "Action": [
                    "s3:GetBucketLocation",
                    "s3:GetObject",
                    "s3:ListBucket",
                    "s3:ListBucketMultipartUploads",
                    "s3:ListMultipartUploadParts",
                    "s3:AbortMultipartUpload",
                    "s3:CreateBucket",
                    "s3:PutObject"
                ],
                "Resource": [
                    "arn:aws:s3:::examplecorp-business-data/*",
                    "arn:aws:s3:::athena-examples-*"
                ],
                "Condition":{ 
                   "ForAnyValue:StringEquals":{ 
                      "aws:CalledVia":[ 
                         "athena.amazonaws.com"
                      ]
                   }
                }
            }
        ]
    }
    	

    I’ll walk you through each statement in this policy. First, the AllowAthenaReadActions and AllowAthenaWorkgroupActions statements allow the role to use Athena actions within the BizMetrics workgroup. The role calls Athena directly from inside the VPC, so there is a condition on these statements that requires that the calls must come from my VPC endpoint.

    Next, the AllowGlueActionsViaVPCE and AllowGlueActionsViaAthena statements allow the role to access the AWS Glue Data Catalog and view the tables within my data lake. AWS Glue makes it easy to catalog your data and make it searchable, queryable, and available for ETL operations. To view the database and tables in the Athena console, I need access to these AWS Glue actions. The role calls AWS Glue directly, and allows Athena to call AWS Glue, so the policy has two statements that allow both paths of communication respectively. For more information about required permissions for Athena and AWS Glue, see Fine-Grained Access to Databases and Tables in the AWS Glue Data Catalog in the Amazon Athena documentation.

    Finally, the AllowS3ActionsViaAthena statement enables Athena to call Amazon S3 on my behalf. I use the aws:CalledVia condition to require that these S3 actions are only available through Athena. In the resource section of the policy, I specify that the access is limited to the examplecorp-business-data bucket, as well as the Athena examples repository, which is required for using Athena with the AWS console.

  3. Next, I attach the policy to the BizMetricsQuery role. With this policy attached, I ensure the role can use Athena to access data in the Amazon S3 bucket without granting permissions to access the bucket directly. I also ensure that the role can only access the data through the VPC endpoint located within my private network.

Step 2: Define permissions on the S3 bucket

I need to make sure the BizMetricsQuery role is the only IAM identity that has access to the data in the examplecorp-business-data S3 bucket. To do this, I need to update the bucket policy that is attached to the bucket.

The following bucket policy grants the BizMetricsQuery role access to the data through the VPC endpoint.


{
	"Version": "2012-10-17",
	"Statement": [
    {
      "Sid": "AllowS3ActionsThroughAthena",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111122223333:role/BizMetricsQuery"
      },
      "Action": [
        "s3:GetBucketLocation",
        "s3:GetObject",
        "s3:ListBucketMultipartUploads",
        "s3:ListMultipartUploadParts",
        "s3:AbortMultipartUpload",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::examplecorp-business-data/*",
        "arn:aws:s3:::examplecorp-business-data"
      ],
      "Condition": {
        "ForAnyValue:StringEquals": {
          "aws:CalledVia": [
            "athena.amazonaws.com"
          ]
        }
      }
    }
  ]
}

This policy is very similar to the statement on the role policy that grants permissions to Amazon S3. In this policy, I allow the role to take Amazon S3 operations, but only if the aws:CalledVia context includes the Athena service. In the Principal element of the policy, I include the BizMetricsQuery role’s ARN. This is an important step because without this requirement, anyone could interact with the data in the bucket as long as they make the call using Athena. I want to require that the only role that can reach the data is BizMetricsQuery, which is also subject to the VPC rules.

With this policy attached to the examplecorp-business-data bucket, I ensure that access to the databases is only available through the BizMetricsQuery role. Because the BizMetricsQuery role can only access the data through Athena, and all calls to Athena must use my VPC endpoint, this also ensures the Amazon S3 data is only accessible from within my VPC.

Step 3: Run the Athena queries

I’ll try a simple query in Athena from inside the VPC to make sure everything works as expected. I use the BizMetricsQuery role to view the service_revenue table in the revenuedata database. If you are following along with this example, you should replace the database and table in the query with your own resource names.

  1. In the Athena console, on the Query Editor page, enter the following query in a new tab:
     
    SELECT * FROM “revenuedata”.”service_revenue” limit 10;
     
    Then select Run query.
     
    Figure 2: Run an Athena test query

    Figure 2: Run an Athena test query

    When I run this test query, my role performs the StartQueryExecution action. For this request, the aws:CalledVia context is empty because the role takes the action directly. The role is allowed to use Athena actions, so it passes the first authorization check.

    To perform the query, Athena subsequently calls the Amazon S3 service. These requests have the aws:CalledVia context athena.amazonaws.com. The role has access to the Amazon S3 actions when aws:CalledVia is equal to this value, so it passes the second and final authorization check.

  2. When I try to access the data directly by using Amazon S3, I get an Access Denied error message.
     
    Figure 3: Error message attempting to access directly from Amazon S3

    Figure 3: Error message attempting to access directly from Amazon S3

    In this case, the aws:CalledVia context is empty because I made the call directly to the Amazon S3 service. The policy requires me to call Amazon S3 only through Athena, so this request is denied.

  3. When I call Athena by using the same query from outside my VPC, I get the following error message.
     
    Figure 4: Error message attempting to call Athena from outside my VPC

    Figure 4: Error message attempting to call Athena from outside my VPC

In the Athena case, the SourceVPCE context is not present because I didn’t make the call using the VPC endpoint. In Amazon S3, the policy requires that the calls can only happen from inside my private network, so I get an access denied message in that case, as well. With this policy and example, I’ve demonstrated that I can only call Athena from within the VPC, and shown that calling the dependent services directly is not possible.

Next steps

By using the aws:CalledVia condition key, you can now define specific permissions for when AWS services make calls to other services on your behalf. This makes it easier to ensure that your company’s guidelines are followed consistently. For more information about aws:CalledVia and other IAM conditions, see AWS Global Condition Context Keys in the IAM documentation. If you have any questions, comments, or concerns, please reach out to AWS Support or our Identity and Access Management Forums.

If you have feedback about this blog post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Michael Switzer

Mike is the product manager for the Identity and Access Management service at AWS. He enjoys working directly with customers to identify solutions to their challenges, and using data-driven decision making to drive his work. Outside of work, Mike is an avid cyclist and outdoorsperson. He holds a master’s degree in computational mathematics from the University of Washington.

How to use KMS and IAM to enable independent security controls for encrypted data in S3

Post Syndicated from Paco Hope original https://aws.amazon.com/blogs/security/how-to-use-kms-and-iam-to-enable-independent-security-controls-for-encrypted-data-in-s3/

Typically, when you protect data in Amazon Simple Storage Service (Amazon S3), you use a combination of Identity and Access Management (IAM) policies and S3 bucket policies to control access, and you use the AWS Key Management Service (AWS KMS) to encrypt the data. This approach is well-understood, documented, and widely implemented. However, many customers want to extend the value of encryption beyond basic protection against unauthorized access to the storage layer where the data resides. They want to enforce a separation of duties between which team manages access to the storage layer and which team manages access to the encryption keys. This model ensures that configuration errors made by only one of these teams won’t compromise the data in ways that grant unauthorized access to plaintext data. For example, if the team that owns permissions to the S3 bucket mistakenly grants access to unauthorized users, when those users attempt to access objects in S3 they will fail. Why? Because the separate team who manages access to the keys didn’t grant those users access to use the keys for decryption.

You can create this kind of independent access control by combining KMS encryption with IAM policies and S3 bucket policies. When data is encrypted with a customer-managed KMS customer master key (CMK), the key’s policy acts as an independent access control. Users can be prevented from accessing the data, even though the IAM permissions and the S3 bucket policy would permit the access. Figure 1 shows a Venn diagram of the access that is required. The bucket policy, the IAM policy, and the KMS key policy all play a role. Users have permission for the data only when they are granted permissions in all three policies.
 

Figure 1: Venn diagram showing the required permissions for access

Figure 1: Venn diagram showing the required permissions for access

This exercise builds the resources shown in Figure 2:

  • Three AWS IAM roles
    1. A role (1) with permission to create and manage permissions on an S3 bucket (secure-bucket-admin)
    2. A role (2) with permission to create and manage permissions on a KMS master key (secure-key-admin)
    3. A role (3) with permissions to access (but not manage) a specific S3 bucket and to use (but not manage) a specific AWS KMS customer master key (authorized-users).
  • An S3 bucket (4) with a custom bucket policy (5) that only allows data to be stored if that data is encrypted with a specific KMS key. The ability to write to or read from this bucket will be restricted to the IAM role authorized-users.
  • A KMS key (6) with a specific key policy (7) that can only be used by the IAM role authorized-users and only managed by the IAM user secure-key-admin.

 

Figure 2: Architecture diagram

Figure 2: Architecture diagram

When you have completed this exercise, you will have:

  • Created an S3 bucket protected by IAM policies, and a bucket policy that enforces encryption.
  • Attached the IAM role authorized-users to an EC2 instance so your applications in that instance can assume that role and access encrypted data in the S3 bucket.
  • Uploaded and downloaded data from the bucket that is protected by the KMS key.
  • Demonstrated that when the KMS key policy is modified, removing access for the IAM role authorized-users, the applications on the EC2 instance no longer have access to the data in the S3 bucket.

Set things up

For simplicity, I create the S3 bucket, KMS keys, and EC2 instances all in the same region and in the same AWS account. It’s possible to use KMS keys that are owned by a different AWS account, to assume roles across accounts, and to have instances in different regions from the buckets and the keys. I discuss those variations at the end.

I assume you have at least one administrator identity available to you already: one that has broad rights for creating users, creating roles, managing KMS keys, and launching EC2 instances. I will refer to this as your “Admin identity” throughout these instructions. This can be a federated identity (for example, from your corporate identity provider or from a social identity), or it can be an AWS IAM user.

Assuming Roles

Throughout this exercise I will use IAM roles to acquire and release privileges. If you’re working from the AWS command line, you’ll need to configure your command line environment to use profiles. If you’re working from the AWS Management Console, then you’ll follow these instructions to switch role. If you haven’t worked with roles before, take a minute to follow those instructions and become familiar with it before continuing.

Step 1: Create IAM policies

First, I will create 3 policies that grant very specific sets of rights. Then, I will attach those policies to roles: two roles for administrators, and one for software running on EC2 instances. You’re going to create an S3 bucket in Step 3. That bucket, like all S3 buckets, needs a globally unique name. You will reference that bucket’s name in these policies, even though you will create the bucket later. Decide the name of your bucket now. When you reach steps that require you to type or paste a JSON policy document for your bucket policy, remember to use the name of your bucket where I have written secure-demo-bucket.

Step 1a: Create the S3 bucket management policy

While logged in to the console as your Admin user, create an IAM policy in the web console using the JSON tab. Name the policy secure-bucket-admin. When you reach the step to type or paste a JSON policy document, paste the JSON from Listing 1 below. This policy allows broad S3 administration rights (creating, deleting, and modifying policies), so it is a high privilege policy. In an effort to be concise, it grants all permissions to S3 and then takes a few away by explicitly denying them. The intention is to permit managing all aspects of the bucket’s operation, while denying all access to the contents of the bucket. The explicit deny mechanism is important because, due to IAM’s policy evaluation logic, an explicit deny cannot be overridden by subsequent “allow” statements or by attaching additional policies. As the S3 service evolves over time and new features are added, the policy will permit using those new features, without any change to this policy. If you prefer to enable features explicitly, you’ll need to rewrite this policy to explicitly allow only the features you want, and then come back and revise the policy every so often, as S3 features are added that your role needs to use.


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowAllActions",
      "Action": "s3:*",
      "Effect": "Allow",
      "Resource": "*"
    },
    {
      "Sid": "DenyObjectAccess",
      "Action": [
        "s3:DeleteObject",
        "s3:DeleteObjectVersion",
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:PutObject",
        "s3:PutObjectAcl",
        "s3:PutObjectVersionAcl"
      ],
      "Effect": "Deny",
      "Resource": "arn:aws:s3:::secure-demo-bucket"
    }
  ]
}

Listing 1: secure-bucket-admin IAM policy
 
Your policy will have an ARN (it will look something like arn:aws:iam::111122223333:policy/secure-bucket-admin). Make a note of this ARN. You will use it later to attach to the secure-bucket-admin role you’ll create in step 2.

Step 1b: Create the KMS administrator policy

While logged in to the console as your Admin user, create an IAM policy in the web console using the JSON tab. Name the policy secure-key-admin. When you reach the step to type or paste a JSON policy document, paste the JSON from Listing 2 below. Be sure to add your own 12-digit AWS account number where I have written 111122223333. This policy allows broad KMS administration rights (creating keys, granting access to keys, and modifying key policies), so it is a high privilege policy. In an effort to be concise, this policy grants all permissions to the KMS service and then denies certain rights through an explicit deny statement. The intention is to permit managing all aspects of KMS keys, while denying all access to perform encryption and decryption using KMS keys. As the KMS service evolves over time and new features are added, the policy will permit using those new features, without any change to this policy. If you prefer to enable features explicitly, you’ll need to rewrite this policy to explicitly allow only the features you want, and then come back and revise the policy every so often, as KMS features are added that your role needs to use.


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowAllKMS",
      "Action": "kms:*",
      "Effect": "Allow",
      "Resource": " arn:aws:kms:*:111122223333:key/*"
    },
    {
      "Sid": "DenyKMSKeyUsage",
      "Action": [
        "kms:Decrypt",
        "kms:Encrypt",
        "kms:GenerateDataKey",
        "kms:ReEncryptFrom",
        "kms:ReEncryptTo"
      ],
      "Effect": "Deny",
      "Resource": " arn:aws:kms:*:111122223333:key/*"
    }
  ]
}

Listing 2: secure-key-admin IAM policy
 
Your policy will have an ARN (it will look something like arn:aws:iam::111122223333:policy/secure-key-admin). Make a note of this ARN. You will use it later to attach to the secure-key-admin role you’ll create in step 2.

Step 1c: Create the S3 bucket usage policy

This final policy grants access to read and write encrypted data in the target S3 bucket. This is a narrowly-scoped policy that only grants rights to a single bucket. While logged in to the console as your Admin user, create an IAM policy in the web console using the JSON tab. Name the policy secure-bucket-access.

When you reach the step to type or paste a JSON policy document for your bucket policy, paste the JSON from Listing 3 below, substituting the name of your bucket on the two lines where I have secure-demo-bucket.


{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "BasicList",
            "Effect": "Allow",
            "Action": [ "s3:ListAllMyBuckets", "s3:HeadBucket" ],
            "Resource": "*"
        },
        {
            "Sid": "AllowSecureBucket",
            "Effect": "Allow",
            "Action": [ "s3:PutObject", "s3:GetObjectAcl",
                "s3:GetObject", "s3:DeleteObjectVersion",
                "s3:DeleteObject", "s3:GetBucketLocation",
                "s3:GetObjectVersion" ],
            "Resource": [
                "arn:aws:s3:::secure-demo-bucket/*",
                "arn:aws:s3:::secure-demo-bucket"
            ]
        }
    ]
}

Listing 3: secure-bucket-access IAM policy

Note: In an effort to grant a minimal, but realistic, set of permissions, this IAM policy only grants access to basic get, put, and delete operations. You might have a use for other features, like tagging objects. If so, you will need to change the policy to enable the features you want to use.

Your policy will have an ARN (it will look something like arn:aws:iam::111122223333:policy/secure-bucket-access). Make a note of this ARN. You will use it later to attach to the authorized-users role you’ll create in step 2.

You might ask why this policy designed to control access to encrypted objects has no KMS permissions in it. Wouldn’t that prevent the users that assume this IAM role from using the encryption keys? It would normally prevent them, except you have the ability to list the authorized-users IAM role within the resource policy attached to the KMS key you’re about to create. By placing the authorized-users role in the KMS key resource policy, it further enforces the separation of duties so administrators in the account with an ability to modify IAM policies don’t inadvertently escalate privilege to other IAM users/roles and give them permissions to use KMS keys for decryption.

Step 2: Create IAM roles

An AWS IAM role is an identity that you can create in an AWS account that has specific permissions. An IAM role is similar to an IAM user, because it has permission policies that determine what the identity can and cannot do in AWS. It’s different from an IAM user because it’s not associated with a single person. A role can be used by users, by EC2 instances, by AWS services, or by other entities like AWS Lambda functions that you allow to use it. The IAM policies we created in step 1 do not grant permissions until we assign them to roles and assign the roles to users or entities.

Step 2a: Create the S3 bucket management role

This role will be used by administrators who need to manage the properties of the bucket.

  1. Follow the online instructions for creating an IAM role.
  2. Choose Another AWS account under the section labeled Select type of trusted entity.
  3. For the authorized AWS account ID, enter the 12-digit account number for the account that you’re working in. If you intend to authorize AWS IAM users that are defined in a different AWS IAM account to access the S3 bucket and decrypt objects, then you would include that AWS account’s ID number, instead.
  4. Name the IAM role secure-bucket-admin and import the customer managed policy named secure-bucket-admin that you created in step 1a to the role that you have created.

    Your AWS IAM role will have an ARN (it will look something like arn:aws:iam::111122223333:role/secure-bucket-admin). Make a note of this ARN. You will use it in the step 3 when you create your S3 bucket.

Step 2b: Create the KMS key management role

This role will be used by administrators who need to manage the KMS customer master keys that protect the data. The actions you take to manage the keys will be authorized by this role. Importantly, this role has no ability to modify the bucket, grant access to the bucket, or access any of the data in the bucket.

  1. Follow the online instructions for creating an IAM role.
  2. In the Select type of trusted entity section, select Another AWS account.
  3. For the authorized AWS account ID, enter the 12-digit account number for the account that you’re working in. If you intend to authorize AWS IAM users that are defined in a different AWS IAM account, then you would include that AWS account’s ID number, instead.
  4. Name the IAM role secure-key-admin and import the customer-managed policy named secure-key-admin that you created in step 1b to the role that you have created.

    Your AWS IAM role will have an ARN (it will look something like arn:aws:iam::111122223333:role/secure-key-admin). Make a note of this ARN. You will use it in step 4 when you create your KMS key.

Step 2c. Create the bucket usage role

This role will grant permissions to EC2 instances. An EC2 instance running with this role will be able to create and read encrypted data in the protected S3 bucket.

  1. Follow the online instructions for creating an IAM role.
  2. In the Select type of trusted entity section, select AWS service.
  3. Choose EC2 as the service that you will authorize. This authorizes all applications running on that EC2 instance to use credentials with permissions attached to the role.
  4. Name the IAM role authorized-users and import the customer-managed secure-bucket-access policy that you created in step 1c to the role that you have created.

This role is not for users trying to access the S3 bucket from any arbitrary application that happens to have the role’s credentials. It will only be used by users operating within applications running in AWS EC2 instances.

Step 3: Create an S3 bucket for the encrypted data

Log in to the console using your secure-bucket-admin role. (Either log in with the correct federated identity, or with the AWS IAM user you created in step 1d). Follow the instructions to create a bucket that will hold the encrypted data. In my example, I call my bucket secure-demo-bucket. You chose your own unique bucket name back in step 1. Type that bucket name throughout these steps where I use secure-demo-bucket. You will set a bucket policy and properties on that bucket later.

Step 4: Create a KMS key to encrypt and decrypt the data in the S3 bucket

Log out of the console and log back in using your secure-key-admin role. Create a customer-managed customer master key (CMK) to encrypt and decrypt the data in the S3 bucket you just created. If you already have a customer-managed CMK created that you want to use for this purpose, you can do that. To use your own CMK, skip steps 1-5 below about creating a key and, instead, select your existing key in the KMS console and then follow steps 6-8 to change the key policy to allow the authorized-users role permissions to use the key.

  1. In the AWS console, go to Key Management Service.
  2. Select the Create Key button.
  3. On the Step 1 screen, set a display name (called an “Alias”) for the key and a description. I recommend a meaningful description that tells others what the key is for.
  4. On the Step 2 screen, set tags if you need them to track usage of keys for billing purposes. Tags won’t have a functional impact in this exercise so you can skip this step if you want by selecting Next.
  5. On the Step 3 screen, select key administrators. Pick only the secure-key-admin IAM role. You must not pick the secure-bucket-admin role or the authorized-users role as key administrators to ensure separation of duties. For example, if you were to pick the authorized-users IAM role, then any user that assumed that role could escalate their own (or others’) privileges to use this key to decrypt any other data encrypted under this key in your account. If you were to pick the secure-bucket-admin user, then that user could modify permissions both on the S3 bucket and the KMS key in ways that allowed unauthorized users access to decrypt data.
  6. On the Step 4 screen, select key users. Pick only the authorized-users IAM role you created in step 2c.
  7. On the Step 5 screen, select Finish.

    After you have created the key, make note of the key’s ARN. It will look something like this:

    arn:aws:kms::11112222333:key/1234abcd-12ab-34cd-56ef-1234567890ab

    You will need it for the next step where you enforce all objects uploaded into the S3 bucket to be encrypted under this key.

Step 5: Modify the bucket policy

Log out of the console and log back with the secure-bucket-admin role. You’re going to attach a bucket policy to the bucket that does two things: it requires objects to be encrypted and it requires them to be encrypted with a specific KMS key. You will accomplish this by explicitly denying any attempt to call PutObject unless the correct conditions are true. This helps you increase your confidence that you will not store unencrypted data in this bucket.

Find the secure-demo-bucket bucket in the S3 web console, and then modify its bucket policy. Use the code from Listing 4 below as the entire bucket policy. Be sure to change secure-demo-bucket to the actual name of the bucket that you’re using in both places where it appears in the policy. You recorded the key’s ARN in step 4, make sure you insert that ARN for your KMS key where I use an example key ARN below.


{
    "Version": "2012-10-17",
    "Id": "PutObjPolicy",
    "Statement": [
      {
        "Sid": "DenyUnencryptedObjectUploads",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:PutObject",
        "Resource": "arn:aws:s3:::secure-demo-bucket/*",
        "Condition": {
          "StringNotEquals": {
            "s3:x-amz-server-side-encryption": "aws:kms"
          }
        }
      },
      {
        "Sid": "DenyWrongKMSKey",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:PutObject",
        "Resource": "arn:aws:s3:::secure-demo-bucket/*",
        "Condition": {
          "StringNotEquals": {
            "s3:x-amz-server-side-encryption-aws-kms-key-id": "arn:aws:kms::11112222333:key/1234abcd-12ab-34cd-56ef-1234567890ab"
          }
        }
      }
    ]
  }

Listing 4: Bucket policy requiring encryption

Note: This bucket policy is not retroactive: If you apply this policy to a bucket that already exists and already has unencrypted objects, nothing happens to the objects that are already in the bucket. They remain unencrypted. They can be fetched or deleted. Once the policy is applied, however, new objects cannot be put in the bucket unless they are correctly encrypted.

Instead of applying a bucket policy, you could consider turning on S3 default encryption. This feature forces all new objects uploaded to an S3 bucket to be encrypted using the KMS key you created in step 4 unless the user specifies a different key. This feature doesn’t prohibit callers from encrypting objects under other KMS keys, but it ensures that the data is protected even if the user does not specify KMS encryption when putting the object. The bucket policy in Listing 4 is a bit stricter than S3 default encryption because it ensures that no object is ever encrypted by any key other than the CMK created in step 4. That strictness means the attempt to put an object fails, unless the caller explicitly names the KMS keyId in every S3 PUT request. With S3 default encryption, attempts to put an object without specifying encryption will succeed, and the data will be protected by the named KMS CMK.

Step 7: Launch an EC2 instance to demonstrate the solution

The final step to showing how this solution works is to launch an EC2 instance and show that applications running in that instance can write and read data in the S3 bucket you created. If you launch an EC2 instance that has your authorized-users role attached and log in on that instance, you will be able to upload and download objects from the bucket, encrypting and decrypting transparently as you do it. No other identity (for example, other IAM users, other IAM roles, other EC2 instances, and Lambda functions) will be able to upload and download data to this S3 bucket because these other identities don’t have the permissions to use the KMS key that protects the data.

Start by logging out of the console and log back in as your Admin user. Following instructions to launch an EC2 instance:

  1. Choose an Amazon Linux AMI.
  2. Choose an instance type. Any instance type will work. If you launch an Amazon Linux t2.micro instance, it might qualify for free tier pricing.
  3. For IAM Role, select the authorized-users role from the drop-down menu.
  4. Make sure you specify an SSH key that you have access to, and make sure that you have a way to reach the EC2 instance over the network.

Satisfy yourself that it works as expected

At this point, the solution is complete and is running. I want to demonstrate that the KMS key is providing the independent access controls the way I said it would. I will modify the key policy to remove the instance’s rights to use the KMS key. Then, I will confirm that the commands that had succeeded before now fail after the key policy change. This shows how the KMS key and its policy are completely independent of the S3 bucket policies and the IAM policies.

Test 1: Uploading encrypted objects

Using SSH, log in on the EC2 instance you launched that has the authorized-users role attached.

You will need to download a file onto the EC2 instance that you can then upload, encrypted, to the S3 bucket. If you don’t have a file that you want to use, you can use the AWS Cryptographic Details whitepaper as a reasonable test file.

On the instance, run the following command to download a local copy of the AWS Cryptographic Details whitepaper that you can use as test data:


curl -O 'https://d1.awsstatic.com/whitepapers/KMS-Cryptographic-Details.pdf'

Side note: You should also read this whitepaper. It’s very informative on how AWS KMS is built and operated to secure your encryption keys.

On the EC2 instance, use the AWS command line to upload the file to the S3 bucket. Note all the options that tell S3 to use KMS encryption and to use the correct key ID. Remember to insert the bucket name for the bucket that you’re using and the ARN of your KMS key from step 4 above.


aws s3 cp KMS-Cryptographic-Details.pdf s3://secure-demo-bucket/
--sse aws:kms --sse-kms-key-id arn:aws:kms::11112222333:key/1234abcd-12ab-34cd-56ef-1234567890ab

If all went well, you should see a message like the following, showing that the object was uploaded successfully:


upload: ./KMS-Cryptographic-Details.pdf to s3://secure-demo-bucket/KMS-Cryptographic-Details.pdf

Test 2: Upload an Unencrypted Object

You can now prove the fact that a user on this instance attempting to upload unencrypted objects will fail. Run this command to upload a second copy of the PDF file to be called test2.pdf. Be sure to substitute your bucket’s name into the command.


aws s3 cp KMS-Cryptographic-Details.pdf s3://secure-demo-bucket/test2.pdf

You’ll notice this command doesn’t include the options instructing S3 to use KMS to encrypt the file. You should see this error message:


An error occurred (AccessDenied) when calling the PutObject operation: Access Denied

If you see no error, then double-check that your bucket policy in Step 5 above is correct.

Test 3: Downloading Encrypted Objects

You’ve now proven that the EC2 instance can upload encrypted objects and that unencrypted objects are refused. Now, you can prove that the EC2 instance has access to cause S3 to decrypt the encrypted object in the bucket using the KMS keys. Here’s how: While still on your EC2 instance, run this command, substituting your bucket name, to download a copy of the PDF file:


aws s3 cp s3://secure-demo-bucket/KMS-Cryptographic-Details.pdf test3.pdf

If this command succeeds, then you will have a file in your current directory on your EC2 instance named test3.pdf. That shows that you have successfully decrypted and downloaded the PDF file.

Test 4: Demonstrate that the key policy regulates access

Now, I will demonstrate the independence of access control provided by the KMS key policy. Leaving the bucket policy and IAM role/policy as they are, you will disable the EC2 instance’s access to the objects using the KMS key policy. The IAM policy for S3 and the bucket policy on the bucket would still normally permit the EC2 instance to access the data. But, because the KMS key policy will prevent use of the key by the authorized-users IAM role, S3 will fail to encrypt or decrypt the object. This means that any commands that execute on the EC2 instance will no longer be able to upload or download data from the S3 bucket.

First, modify the key policy.

  1. Log out of the console and log back in under the secure-key-admin user. Go to the Key Management Service console.
  2. In the left-hand navigation, select Customer managed keys and look for the key with the alias or Key ID that you’re using. The Key ID is the last 32 characters of the full key ARN.
  3. Select the Key ID for the key that you’re using to get to the screen where you can edit the key policy.
  4. In the list of Key users, you will see your authorized-users role listed. Select that role, and then select the Remove button to remove its access to use the KMS key.

At this point, the EC2 instance no longer has the permissions to use the KMS key because its role no longer grants it permission to use the key.

Repeat the command that you did in Test 1 that uploaded a PDF file to the bucket. In this case, try to make a second copy of the PDF file into an object named test4.pdf. Run this command, substituting your bucket name and your KMS key ID as required:


aws s3 cp KMS-Cryptographic-Details.pdf s3://secure-demo-bucket/test4.pdf --sse aws:kms --sse-kms-key-id abcdefab-1234-1234-1234-abcdef01234567890

You should see an error like this:


An error occurred (AccessDenied) when calling the PutObject operation: Access Denied

Now, try to download the copy of the KMS-Cryptographic-Details.pdf file from the bucket, again using the command that worked before, substituting the bucket name as required:


aws s3 cp s3://secure-demo-bucket/KMS-Cryptographic-Details.pdf test4.pdf

You should see an error message like this:


An error occurred (AccessDenied) when calling the GetObject operation: Access Denied

These two commands are denied because when S3 tried to invoke KMS to encrypt or decrypt data, the EC2 instance role did not have permission to use the KMS key and thus the request failed. Note that there is no situation where the API call returns the KMS-encrypted data from S3. Either the API call succeeds, and you receive the decrypted data, or the API call fails, and you receive an error. All AWS services that use KMS to encrypt data behave this way—you either get the decrypted data, or you get an error message.

Restoring access to the key

To restore the EC2 instance’s access to the data, you authorize its role again in the KMS key policy:

  1. Go to Key Management Service in the AWS Console.
  2. Select Customer managed keys.
  3. Find the key that you’re using and select it.
  4. Find your authorized-users role in the list of roles, or type “authorized-users” in the search box to find it.
  5. Select the checkbox next to the authorized-users role, and then select Add to add that role as a key user.

The role will now have permission to use the key as it did before.

Useful variations on this solution

Variation 1: Using KMS keys in different AWS accounts

You can use a KMS key that is in a different AWS account for encrypting and decrypting. This allows administrators in a central AWS account to manage KMS keys, while the data itself resides in other AWS accounts. This can offer further separation of roles from the example above because even a highly privileged user (for example, root) in the account in which the authorized-users role exists won’t be able to modify the key policy. The account ID in which authorized-users role exists must be listed in the key policy. For more information, follow the instructions on sharing KMS keys across accounts.

Note that the KMS key and the S3 bucket must always be in the same region. The EC2 instance does not need to be in the same region as the S3 bucket. You will experience higher latency when your EC2 instance is not in the same region as the S3 bucket.

Variation 2: Granting KMS key usage permissions to other AWS services

EC2 is not the only service that can be granted a role this way. Lambda functions can be granted AWS IAM roles that allow them to use KMS keys. That would permit the Lambda functions with the correct roles to manipulate the S3 data, while other entities (users, EC2 instances) could not. Likewise, AWS services such as Amazon Athena might require access to a KMS key if you want to use it to search data stored in S3 that has been encrypted using KMS. If Athena is given permission to assume a role with permissions to use the KMS key, then Athena can successfully execute its search queries because S3 will be allowed to decrypt objects on behalf of Athena, which is acting on your behalf when assuming the authorized-users role.

Variation 3: Creating isolated authorization to encrypt vs decrypt

You can use the KMS key policy to isolate authorization to encrypt versus decrypt data between two identities. For example, if a role has the kms:Encrypt or kms:GenerateDataKey permissions for a key, that means that role can write encrypted data directly or ask an AWS service to do it on their behalf (for example, during an upload to an S3 bucket). If the role does not also have kms:Decrypt permission, it can’t read encrypted data. This write-only permission might be appropriate for data acquisition, security log delivery, or other functions that should not be allowed to read the data they have written. Likewise, if a role has the kms:Decrypt permission, then the role has the ability to read data. But if it lacks the kms:Encrypt permission, it cannot write or modify encrypted data. This kind of isolation authorization is suitable for audit functions and log aggregation functions that need to read data but typically are prohibited from modifying the data/logs that they read. The complete set of permissions for KMS key policies can be found in the KMS developers guide.

Cost of this solution

Three services with charges are used in this solution: EC2, S3, and KMS. The EC2 instance hours are charged according to standard EC2 pricing. Likewise, storing data in S3 will incur costs according to standard S3 pricing. There is no difference in S3 pricing for storing encrypted versus unencrypted data. Finally, KMS has a fixed price per month for each customer-managed CMK you create, which is described in the KMS pricing page. Each encryption and decryption of an object is a KMS API call and a certain number of KMS API calls are free each month. The number of free KMS API calls, and the price for API calls beyond the free tier, are described on the KMS pricing page.

Summary

The combination of IAM policies, S3 bucket policies, and KMS key policies gives you a powerful way to apply independent access control mechanisms on data. This mechanism means that one set of users can be granted rights to do maintenance operations on the buckets themselves, while not having rights to access or manipulate the data itself. Even a user or function with full privileges in S3 would be denied access to this encrypted data unless it also had the rights to use the KMS keys. It gives you an approach to access control that allows key policies to serve as an additional control when IAM policies or S3 bucket policies alone are not sufficient.

If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this blog post, start a new thread on the AWS Key Management Service forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author bio

Paco Hope

Paco Hope is a Principal Security Consultant with AWS Professional Services working to help enterprise customers secure their workloads in the cloud. He has helped secure migration landing zones, design customer security architectures, and has mentored a number of AWS partners in the UK on AWS Security. He frequently speaks at information security conferences and security meetups.

re:Invent 2019: Introducing the Amazon Builders’ Library (Part II)

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/reinvent-2019-introducing-the-amazon-builders-library-part-ii/

In last week’s post, I told you about a new site we introduced at re:Invent at the beginning of this month, the Amazon Builders’ Library, a site that’s chock-full articles by senior technical leaders that help you understand the underpinnings of both Amazon.com and AWS.

Below are four more architecture-based articles that describe how Amazon develops, architects, releases, and operates technology.

Caching Challenges and Strategies

Caching challenges

Over years of building services at Amazon we’ve experienced various versions of the following scenario: We build a new service, and this service needs to make some network calls to fulfill its requests. Perhaps these calls are to a relational database, or an AWS service like Amazon DynamoDB, or to another internal service. In simple tests or at low request rates the service works great, but we notice a problem on the horizon. The problem might be that calls to this other service are slow or that the database is expensive to scale out as call volume increases. We also notice that many requests are using the same downstream resource or the same query results, so we think that caching this data could be the answer to our problems. We add a cache and our service appears much improved. We observe that request latency is down, costs are reduced, and small downstream availability drops are smoothed over. After a while, no one can remember life before the cache. Dependencies reduce their fleet sizes accordingly, and the database is scaled down. Just when everything appears to be going well, the service could be poised for disaster. There could be changes in traffic patterns, failure of the cache fleet, or other unexpected circumstances that could lead to a cold or otherwise unavailable cache. This in turn could cause a surge of traffic to downstream services that can lead to outages both in our dependencies and in our service.

Read the full article by Matt Brinkley, Principal Engineer, and Jas Chhabra, Principal Engineer

Avoiding Fallback in Distributed Systems

Avoiding fallback

Critical failures prevent a service from producing useful results. For example, in an ecommerce website, if a database query for product information fails, the website cannot display the product page successfully. Amazon services must handle the majority of critical failures in order to be reliable.

This article covers fallback strategies and why we almost never use them at Amazon. You might find this surprising. After all, engineers often use the real world as a starting point for their designs. And in the real world, fallback strategies must be planned in advance and used when necessary. Let’s say an airport’s display boards go out. A contingency plan (such as humans writing flight information on whiteboards) must be in place to handle this situation, because passengers still need to find their gates. But consider how awful the contingency plan is: the difficulty of reading the whiteboards, the difficulty of keeping them up-to-date, and the risk that humans will add incorrect information. The whiteboard fallback strategy is necessary but it’s riddled with problems.

Read the full article by Jacob Gabrielson, Senior Principal Engineer

Leader Elections in Distributed Systems

Leader elections

Leader election is the simple idea of giving one thing (a process, host, thread, object, or human) in a distributed system some special powers. Those special powers could include the ability to assign work, the ability to modify a piece of data, or even the responsibility of handling all requests in the system.

Leader election is a powerful tool for improving efficiency, reducing coordination, simplifying architectures, and reducing operations. On the other hand, leader election can introduce new failure modes and scaling bottlenecks. In addition, leader election may make it more difficult for you to evaluate the correctness of a system.

Because of these complications, we carefully consider other options before implementing leader election. For data processing and workflows, workflow services like AWS Step Functions can achieve many of the same benefits as leader election and avoid many of its risks. For other systems, we often implement idempotent APIs, optimistic locking, and other patterns that make a single leader unnecessary.

In this article, I discuss some of the pros and cons of leader election in general and how Amazon approaches leader election in our distributed systems, including insights into leader failure.

Read the full article by Mark Brooker, Senior Principal Engineer

Workload Isolation Using Shuffle-Sharding

Shuffle-sharding

Not long after AWS began offering services, AWS customers made clear that they wanted to be able to use our Amazon Simple Storage Service (S3), Amazon CloudFront, and Elastic Load Balancing services at the “root” of their domain, that is, for names like “amazon.com” and not just for names like “www.amazon.com”.

That may seem very simple. However, due to a design decision in the DNS protocol, made back in the 1980s, it’s harder than it seems. DNS has a feature called CNAME that allows the owner of a domain to offload a part of their domain to another provider to host, but it doesn’t work at the root or top level of a domain. To serve our customers’ needs, we’d have to actually host our customers’ domains. When we host a customer’s domain, we can return whatever the current set of IP addresses are for Amazon S3, Amazon CloudFront, or Elastic Load Balancing. These services are constantly expanding and adding IP addresses, so it’s not something that customers could easily hard-code in their domain configurations either.

It’s no small task to host DNS. If DNS is having problems, an entire business can be offline. However, after we identified the need, we set out to solve it in the way that’s typical at Amazon—urgently. We carved out a small team of engineers, and we got to work

Read the full article by Colm MacCárthaigh, Senior Principal Engineer

Want to learn more about the Amazon Builders’ Library? Visit our FAQ.

Next week we’ll wrap up 2019 with a top ten list of the most-visited Architecture blog posts of 2019.

re:Invent 2019: Introducing the Amazon Builders’ Library (Part I)

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/reinvent-2019-introducing-the-amazon-builders-library-part-i/

Today, I’m going to tell you about a new site we launched at re:Invent, the Amazon Builders’ Library, a collection of living articles covering topics across architecture, software delivery, and operations. You get to peek under the hood of how Amazon architects, releases, and operates the software underpinning Amazon.com and AWS.

Want to know how Amazon.com does what it does? This is for you. In this two-part series (the next one coming December 23), I’ll highlight some of the best architecture articles written by Amazon’s senior technical leaders and engineers.

Avoiding insurmountable queue backlogs

Avoiding insurmountable queue backlogs

In queueing theory, the behavior of queues when they are short is relatively uninteresting. After all, when a queue is short, everyone is happy. It’s only when the queue is backlogged, when the line to an event goes out the door and around the corner, that people start thinking about throughput and prioritization.

In this article, I discuss strategies we use at Amazon to deal with queue backlog scenarios – design approaches we take to drain queues quickly and to prioritize workloads. Most importantly, I describe how to prevent queue backlogs from building up in the first place. In the first half, I describe scenarios that lead to backlogs, and in the second half, I describe many approaches used at Amazon to avoid backlogs or deal with them gracefully.

Read the full article by David Yanacek – Principal Engineer

Timeouts, retries, and backoff with jitter

Timeouts, retries and backoff with jitter

Whenever one service or system calls another, failures can happen. These failures can come from a variety of factors. They include servers, networks, load balancers, software, operating systems, or even mistakes from system operators. We design our systems to reduce the probability of failure, but impossible to build systems that never fail. So in Amazon, we design our systems to tolerate and reduce the probability of failure, and avoid magnifying a small percentage of failures into a complete outage. To build resilient systems, we employ three essential tools: timeouts, retries, and backoff.

Read the full article by Marc Brooker, Senior Principal Engineer

Challenges with distributed systems

Challenges with distributed systems

The moment we added our second server, distributed systems became the way of life at Amazon. When I started at Amazon in 1999, we had so few servers that we could give some of them recognizable names like “fishy” or “online-01”. However, even in 1999, distributed computing was not easy. Then as now, challenges with distributed systems involved latency, scaling, understanding networking APIs, marshalling and unmarshalling data, and the complexity of algorithms such as Paxos. As the systems quickly grew larger and more distributed, what had been theoretical edge cases turned into regular occurrences.

Developing distributed utility computing services, such as reliable long-distance telephone networks, or Amazon Web Services (AWS) services, is hard. Distributed computing is also weirder and less intuitive than other forms of computing because of two interrelated problems. Independent failures and nondeterminism cause the most impactful issues in distributed systems. In addition to the typical computing failures most engineers are used to, failures in distributed systems can occur in many other ways. What’s worse, it’s impossible always to know whether something failed.

Read the full article by Jacob Gabrielson, Senior Principal Engineer

Static stability using Availability Zones

Static stability using availability zones

At Amazon, the services we build must meet extremely high availability targets. This means that we need to think carefully about the dependencies that our systems take. We design our systems to stay resilient even when those dependencies are impaired. In this article, we’ll define a pattern that we use called static stability to achieve this level of resilience. We’ll show you how we apply this concept to Availability Zones, a key infrastructure building block in AWS and therefore a bedrock dependency on which all of our services are built.

Read the full article by Becky Weiss, Senior Principal Engineer, and Mike Furr, Principal Engineer

Check back in two weeks to read about some other architecture-based expert articles that let you in on how Amazon does what it does.

Integrating CodePipeline with on-premises Bitbucket Server

Post Syndicated from Alex Rosa original https://aws.amazon.com/blogs/devops/integrating-codepipeline-with-on-premises-bitbucket-server/

This blog post demonstrates how to integrate AWS CodePipeline with on-premises Bitbucket Server. If you want to integrate with Bitbucket Cloud, see Integrating Git with AWS CodePipeline. The AWS Lambda function provided can get the source code from a Bitbucket Server repository whenever the user sends a new code push and store it in a designed Amazon Simple Storage Service (Amazon S3) bucket.

The Bitbucket Server integration uses webhooks configured in the Bitbucket repository. Webhooks are ideal for this case and avoid the need for performing frequent polling to check for changes in the repository.

Some security protections are available with this solution:

  • The Amazon S3 bucket has encryption enabled using SSE-AES, and every object created is encrypted by default.
  • The Lambda function accepts only events signed by the Bitbucket Server.
  • All environment variables used by the Lambda function are encrypted in rest using AWS KMS.

Overview

During the creation of the CloudFormation stack, you can select either Amazon API Gateway or Elastic Load Balancing to communicate with the Lambda function. The following diagram shows how the integration works.

 

This diagram explains the solution flow, from an user code push to Bitbucket server to the CodePipeline trigger and what happen in the between.

 

  1. The user pushes code to the Bitbucket repository.
  2. Based on that user action, the Bitbucket server generates a new webhook event and sends it to Elastic Load Balancing or API Gateway based on which endpoint type you selected during AWS CloudFormation stack creation.
  3. API Gateway or Elastic Load Balancing forwards the request to the Lambda function, which checks the message signature using the secret configured in the webhook. If the signature is valid, then the Lambda function moves to the next step.
  4. The Lambda function calls the Bitbucket server API and requests that it generate a ZIP package with the content of the branch modified by the user in Step 1.
  5. The Lambda function sends the ZIP package to the Amazon S3 bucket.
  6. CodePipeline is triggered when it detects a new or updated file in the Amazon S3 bucket path.

Requirements

Before starting the solution setup, make sure you have:

  • An Amazon S3 bucket available to store the Lambda function setup files
  •  NPM or Yarn to install the package dependencies
  • AWS CLI

Setup

Follow these steps for setup.

Creating a personal token on the Bitbucket server

Create a personal token on the Bitbucket server that the Lambda function uses to access the repository.

  1. Log in to the Bitbucket server.
  2. Choose your user avatar, then choose Manage Account.
  3. On the Account screen, choose Personal access tokens.
  4. Choose Create a token.
  5. Fill out the form with the token name. In the Permissions section, leave Read for Projects and Repositories as is.
  6. Choose Create to finish.

Launch a CloudFormation stack

Using the steps below you will upload the Lambda function and Lambda layer ZIP files to an Amazon S3 bucket and launch the AWS CloudFormation stack to create the resources on your AWS account.

1. Clone the Git repository containing the solution source code:

git clone https://github.com/aws-samples/aws-codepipeline-bitbucket-integration.git

2. Install the NodeJS packages with npm:

cd code
npm install
cd ..

3. Prepare the packages for deployment.

aws cloudformation package --template-file ./infra/infra.yaml --s3-bucket your_bucket_name --output-template-file package.yaml

4. Edit the AWS CloudFormation parameters file.

Open the file located at infra/parameters.json in your favorite text editor and replace the parameters as follows:

BitbucketSecret – Bitbucket webhook secret used to sign webhook events. You should define the secret and use the same value here and in the Bitbucket server webhook.
BitbucketServerUrl – URL of your Bitbucket Server, such as https://server:port.
BitbucketToken – Bitbucket server personal token used by the Lambda function to access the Bitbucket API.
EndpointType – Select the type of endpoint to integrate with the Lambda function. It can be the Application Load Balancer or the API Gateway.
LambdaSubnets – Subnets where the Lambda function runs.
LBCIDR – CIDR allowed to communicate with the Load Balancer. It should allow the Bitbucket server IP address. Leave it blank if you are using the API Gateway endpoint type.
LBSubnets – Subnets where the Application Load Balancer runs. Leave it blank if you are using the API Gateway endpoint type.
LBSSLCertificateArn – SSL Certificate to associate with the Application Load Balancer. Leave it blank if you are using the API Gateway endpoint type.
S3BucketCodePipelineName – Amazon S3 bucket name that this stack creates to store the Bitbucket repository content.
VPCID – VPC ID where the Application Load Balancer and the Lambda function run.
WebProxyHost – Hostname of your proxy server used by the Lambda function to access the Bitbucket server, such as myproxy.mydomain.com. If you don’t need a web proxy, leave it blank.
WebProxyPort – Port of your proxy server used by the Lambda function to access the Bitbucket server, such as 8080. If you don’t need a web proxy leave it blank.

5. Create the CloudFormation stack:

aws cloudformation create-stack --stack-name CodePipeline-Bitbucket-Integration --template-body file:///package.yaml --parameters file:///infra/parameters.json --capabilities CAPABILITY_NAMED_IAM

Creating a webhook on the Bitbucket Server

Next, create the webhook on Bitbucket server to notify the Lambda function of push events to the repository:

  1. Log into the Bitbucket server and navigate to the repository page.
  2. Choose Repository settings.
  3. Select Webhook.
  4. Choose Create webhook.
  5. Fill out the form with the name of the webhook, such as CodePipeline.
  6. Fill out the URL field with the API Gateway or Load Balancer URL. To obtain this URL, choose the Outputs tab of the AWS CloudFormation stack.
  7. Fill out the Secret field with the same value used in the AWS CloudFormation stack.
  8. In the Events section, ensure Push is selected.
  9. Choose Create to finish.
  10. Repeat these steps for each repository in which you want to enable the integration.

Configuring your pipeline

Finally, change your pipeline on CodePipeline to use the Amazon S3 bucket created by the AWS CloudFormation stack as the source of your pipeline.

The Lambda function uploads the files to the Amazon S3 bucket using the following path structure:

Project Name/Repository Name/Branch Name.zip

Now, every time someone pushes code to the Bitbucket repository, your pipeline starts automatically.

Cleaning up

If you want to remove the integration and clean up the resources created at AWS, you need to delete the CloudFormation stack. Run the command below to delete the stack and associated resources.

aws cloudformation delete-stack --stack-name CodePipeline-Bitbucket-Integration 

Conclusion

This post demonstrated how to integrate your on-premises Bitbucket Server with CodePipeline.

Leveraging Elastic Fabric Adapter to run HPC and ML Workloads on AWS Batch

Post Syndicated from Bala Thekkedath original https://aws.amazon.com/blogs/compute/leveraging-efa-to-run-hpc-and-ml-workloads-on-aws-batch/

Leveraging Elastic Fabric Adapter to run HPC and ML Workloads on AWS Batch

 This post is contributed by  Sean Smith, Software Development Engineer II, AWS ParallelCluster & Arya Hezarkhani, Software Development Engineer II, AWS Batch and HPC

 

On August 2, 2019, AWS Batch announced support for Elastic Fabric Adapter (EFA). This enables you to run highly performant, distributed high performance computing (HPC) and machine learning (ML) workloads by using AWS Batch’s managed resource provisioning and job scheduling.

EFA is a network interface for Amazon EC2 instances that enables you to run applications requiring high levels of inter-node communications at scale on AWS. Its custom-built operating system bypasses the hardware interface and enhances the performance of inter-instance communications, which is critical to scaling these applications. With EFA, HPC applications using the Message Passing Interface (MPI) and ML applications using NVIDIA Collective Communications Library (NCCL) can scale to thousands of cores or GPUs. As a result, you get the application performance of on-premises HPC clusters with the on-demand elasticity and flexibility of the AWS Cloud.

AWS Batch is a cloud-native batch scheduler that manages instance provisioning and job scheduling. AWS Batch automatically provisions instances according to job specifications, with the appropriate placement group, networking configurations, and any user-specified file system. It also automatically sets up the EFA interconnect to the instances it launches, which you specify through a single launch template parameter.

In this post, we walk through the setup of EFA on AWS Batch and run the NAS Parallel Benchmark (NPB), a benchmark suite that evaluates the performance of parallel supercomputers, using the open source implementation of MPI, OpenMPI.

 

Prerequisites

This walk-through assumes:

 

Configuring your compute environment

First, configure your compute environment to launch instances with the EFA device.

Creating an EC2 placement group

The first step is to create a cluster placement group. This is a logical grouping of instances within a single Availability Zone. The chief benefit of a cluster placement group is non-blocking, non-oversubscribed, fully bi-sectional network connectivity. Use a Region that supports EFA—currently, that is us-east-1, us-east-2, us-west-2, and eu-west-1. Run the following command:

$ aws ec2 create-placement-group –group-name “efa” –strategy “cluster” –region [your-region]

Creating an EC2 launch template

Next, create a launch template that contains a user-data script to install EFA libraries onto the instance. Launch templates enable you to store launch parameters so that you do not have to specify them every time you launch an instance. This will be the launch template used by AWS Batch to scale the necessary compute resources in your AWS Batch Compute Environment.

First, encode the user data into base64-encoding. This example uses the CLI utility base64 to do so.

 

$ echo “MIME-Version: 1.0

Content-Type: multipart/mixed; boundary=”==MYBOUNDARY==”

 

–==MYBOUNDARY==

Content-Type: text/cloud-boothook; charset=”us-ascii” cloud-init-per once yum_wget yum install -y wget

cloud-init-per once wget_efa wget -q –timeout=20 https://s3-us-west-2.amazonaws.com/

cloud-init-per once tar_efa tar -xf /tmp/aws-efa-installer-latest.tar.gz -C /tmp pushd /tmp/aws-efa-installer

cloud-init-per once install_efa ./efa_installer.sh -y pop /tmp/aws-efa-installer

cloud-init-per once efa_info /opt/amazon/efa/bin/fi_info -p efa

–==MYBOUNDARY==–” | base64

 

Save the base64-encoded output, because you need it to create the launch template.

 

Next, make sure that your default security group is configured correctly. On the EC2 console, select the default security group associated with your default VPC and edit the inbound rules to allow SSH and All traffic to itself. This must be set explicitly to the security group ID for EFA to work, as seen in the following screenshot.

SecurityGroupInboundRules

SecurityGroupInboundRules

 

Then edit the outbound rules and add a rule that allows all inbound traffic from the security group itself, as seen in the following screenshot. This is a requirement for EFA to work.

SecurityGroupOutboundRules

SecurityGroupOutboundRules

 

Now, create an ecsInstanceRole, the Amazon ECS instance profile that will be applied to Amazon EC2 instances in a Compute Environment. To create a role, follow these steps.

  1. Choose Roles, then Create Role.
  2. Select AWS Service, then EC2.
  3. Choose Permissions.
  4. Attach the managed policy AmazonEC2ContainerServiceforEC2Role.
  5. Name the role ecsInstanceRole.

 

You will create the launch template using the ID of the security group, the ID of a subnet in your default VPC, and the ecsInstanceRole that you created.

Next, choose an instance type that supports EFA, that’s denoted by the n in the instance name. This example uses c5n.18xlarge instances.

You also need an Amazon Machine Image (AMI) ID. This example uses the latest ECS-optimized AMI based on Amazon Linux 2. Grab the AMI ID that corresponds to the Region that you are using.

This example uses UserData to install EFA. This adds 1.5 minutes of bootstrap time to the instance launch. In production workloads, bake the EFA installation into the AMI to avoid this additional bootstrap delay.

Now create a file called launch_template.json with the following content, making sure to substitute the account ID, security group, subnet ID, AMI ID, and key name.

{

“LaunchTemplateName”: “EFA-Batch-LaunchTemplate”, “LaunchTemplateData”: {

“InstanceType”: “c5n.18xlarge”,

“IamInstanceProfile”: {

“Arn”: “arn:aws:iam::<Account Id>:instance-profile/ecsInstanceRole”

},

“NetworkInterfaces”: [

{

“DeviceIndex”: 0,

“Groups”: [

“<Security Group>”

],

“SubnetId”: “<Subnet Id>”,

“InterfaceType”: “efa”,

“Description”: “NetworkInterfaces Configuration For EFA and Batch”

}

],

“Placement”: {

“GroupName”: “efa”

},

“TagSpecifications”: [

{

“ResourceType”: “instance”,

“Tags”: [

{

“Key”: “from-lt”,

“Value”: “networkInterfacesConfig-EFA-Batch”

}

]

}

],

“UserData”: “TUlNRS1WZXJzaW9uOiAxLjAKQ29udGVudC1UeXBlOiBtdWx0aXBhcnQvbWl4ZWQ7IGJvdW5kYXJ5PSI9PU1ZQk9VTkRBUlk9PSIKCi0tPT1NWUJPVU5EQVJZPT0KQ29udGVudC1UeXBlOiB0ZXh0L2Nsb3VkLWJvb3Rob29rOyBjaGFyc2V0PSJ1cy1hc2NpaSIKCmNsb3VkLWluaXQtcGVyIG9uY2UgeXVtX3dnZXQgeXVtIGluc3RhbGwgLXkgd2dldAoKY2xvdWQtaW5pdC1wZXIgb25jZSB3Z2V0X2VmYSB3Z2V0IC1xIC0tdGltZW91dD0yMCBodHRwczovL3MzLXVzLXdlc3QtMi5hbWF6b25hd3MuY29tL2F3cy1lZmEtaW5zdGFsbGVyL2F3cy1lZmEtaW5zdGFsbGVyLWxhdGVzdC50YXIuZ3ogLU8gL3RtcC9hd3MtZWZhLWluc3RhbGxlci1sYXRlc3QudGFyLmd6CgpjbG91ZC1pbml0LXBlciBvbmNlIHRhcl9lZmEgdGFyIC14ZiAvdG1wL2F3cy1lZmEtaW5zdGFsbGVyLWxhdGVzdC50YXIuZ3ogLUMgL3RtcAoKcHVzaGQgL3RtcC9hd3MtZWZhLWluc3RhbGxlcgpjbG91ZC1pbml0LXBlciBvbmNlIGluc3RhbGxfZWZhIC4vZWZhX2luc3RhbGxlci5zaCAteQpwb3AgL3RtcC9hd3MtZWZhLWluc3RhbGxlcgoKY2xvdWQtaW5pdC1wZXIgb25jZSBlZmFfaW5mbyAvb3B0L2FtYXpvbi9lZmEvYmluL2ZpX2luZm8gLXAgZWZhCgotLT09TVlCT1VOREFSWT09LS0K”

}

}

Create a launch template from that file:

 

$ aws ec2 create-launch-template –cli-input-json file://launch_template.json

{

“LaunchTemplate”: {

“LatestVersionNumber”: 1,

“LaunchTemplateId”: “lt-*****************”, “LaunchTemplateName”: “EFA-Batch-LaunchTemplate”, “DefaultVersionNumber”: 1,

“CreatedBy”: “arn:aws:iam::************:user/desktop-user”, “CreateTime”: “2019-09-23T13:00:21.000Z”

}

}

 

Creating a compute environment

Next, create an AWS Batch Compute Environment. This uses the information from the launch template

EFA-Batch-Launch-Template created earlier.

 

{

“computeEnvironmentName”: “EFA-Batch-ComputeEnvironment”,

“type”: “MANAGED”,

“state”: “ENABLED”,

“computeResources”: {

“type”: “EC2”,

“minvCpus”: 0,

“maxvCpus”: 2088,

“desiredvCpus”: 0,

“instanceTypes”: [

“c5n.18xlarge”

],

“subnets”: [

“<same-subnet-as-in-LaunchTemplate>”

],

“instanceRole”: “arn:aws:iam::<account-id>:instance-profile/ecsInstanceRole”,

“launchTemplate”: {

“launchTemplateName”: “EFA-Batch-LaunchTemplate”,

“version”: “$Latest”

}

},

“serviceRole”: “arn:aws:iam::<account-id>:role/service-role/AWSBatchServiceRole”

}

 

Now, create the compute environment:

$ aws batch create-compute-environment –cli-input-json file://compute_environment.json

{

“computeEnvironmentName”: “EFA-Batch-ComputeEnvironment”, “computeEnvironmentArn”: “arn:aws:batch:us-east-1:<Account Id>:compute-environment”

}

 

Building the container image

To build the container, clone the repository that contains the Dockerfile used in this example.

First, install git:

$ git clone https://github.com/aws-samples/aws-batch-efa.git

 

In that repository, there are several files, one of which is the following Dockerfile.

 

FROM amazonlinux:1 ENV USER efauser

 

RUN yum update -y

RUN yum install -y which util-linux make tar.x86_64 iproute2 gcc-gfortran openssh-serv RUN pip-2.7 install supervisor

 

RUN useradd -ms /bin/bash $USER ENV HOME /home/$USER

 

##################################################### ## SSH SETUP

ENV SSHDIR $HOME/.ssh

RUN mkdir -p ${SSHDIR} \

&& touch ${SSHDIR}/sshd_config \

&& ssh-keygen -t rsa -f ${SSHDIR}/ssh_host_rsa_key -N ” \

&& cp ${SSHDIR}/ssh_host_rsa_key.pub ${SSHDIR}/authorized_keys \ && cp ${SSHDIR}/ssh_host_rsa_key ${SSHDIR}/id_rsa \

&& echo ”  IdentityFile ${SSHDIR}/id_rsa” >> ${SSHDIR}/config \ && echo ”  StrictHostKeyChecking no” >> ${SSHDIR}/config \

&& echo ”  UserKnownHostsFile /dev/null” >> ${SSHDIR}/config \ && echo ”  Port 2022″ >> ${SSHDIR}/config \

&& echo ‘Port 2022’ >> ${SSHDIR}/sshd_config \

&& echo ‘UsePrivilegeSeparation no’ >> ${SSHDIR}/sshd_config \

&& echo “HostKey ${SSHDIR}/ssh_host_rsa_key” >> ${SSHDIR}/sshd_config \ && echo “PidFile ${SSHDIR}/sshd.pid” >> ${SSHDIR}/sshd_config \

&& chmod -R 600 ${SSHDIR}/* \

&& chown -R ${USER}:${USER} ${SSHDIR}/

 

# check if ssh agent is running or not, if not, run RUN eval `ssh-agent -s` && ssh-add ${SSHDIR}/id_rsa

 

################################################# ## EFA and MPI SETUP

RUN curl -O https://s3-us-west-2.amazonaws.com/aws-efa-installer/aws-efa-installer-1. && tar -xf aws-efa-installer-1.5.0.tar.gz \

&& cd aws-efa-installer \

&& ./efa_installer.sh -y –skip-kmod –skip-limit-conf –no-verify

 

RUN wget https://www.nas.nasa.gov/assets/npb/NPB3.3.1.tar.gz \ && tar xzf NPB3.3.1.tar.gz

COPY make.def_efa /NPB3.3.1/NPB3.3-MPI/config/make.def COPY suite.def      /NPB3.3.1/NPB3.3-MPI/config/suite.def

 

RUN cd /NPB3.3.1/NPB3.3-MPI \

&& make suite \

&& chmod -R 755 /NPB3.3.1/NPB3.3-MPI/

 

###################################################

## supervisor container startup

 

ADD conf/supervisord/supervisord.conf /etc/supervisor/supervisord.conf ADD supervised-scripts/mpi-run.sh supervised-scripts/mpi-run.sh

RUN chmod 755 supervised-scripts/mpi-run.sh

 

EXPOSE 2022

ADD batch-runtime-scripts/entry-point.sh batch-runtime-scripts/entry-point.sh RUN chmod 755 batch-runtime-scripts/entry-point.sh

 

CMD /batch-runtime-scripts/entry-point.sh

 

To build this Dockerfile, run the included Makerfile with:

make

Now, push the created container image to Amazon Elastic Container Registry (ECR), so you can use it in your AWS Batch JobDefinition:

From the AWS CLI, create an ECR repository, we’ll call it aws-batch-efa:

$ aws ecr create-repository –repository-name aws-batch-efa

{

“repository”: {

“registryId”: “<Account-Id>”,
“repositoryName”: “aws-batch-efa”,
“repositoryArn”: “arn:aws:ecr:us-east-2:<Account-Id>:repository/aws-batch-efa”,
“createdAt”: 1568154893.0,
“repositoryUri”: “<Account-Id>.dkr.ecr.us-east-2.amazonaws.com/aws-batch-efa”

}
}

Edit the top of the makefile and add your AWS account number and AWS Region.

AWS_REGION=<REGION>
ACCOUNT_ID=<ACCOUNT-ID>

To push the image to the ECR repository, run:

make tag
make push

 

Run the application

To run the application using AWS Batch multi-node parallel jobs, follow these steps.

 

Setting up the AWS Batch multi-node job definition

Set up the AWS Batch multi-node job definition and expose the EFA device to the container by following these steps.

 

First, create a file called job_definition.json with the following contents. This file holds the configurations for the AWS Batch JobDefinition. Specifically, this JobDefinition uses the newly supported field LinuxParameters.Devices to expose a particular device—in this case, the EFA device path /dev/infiniband/uverbs0—to the container. Be sure to substitute the image URI with the one you pushed to ECR in the previous step. This is used to start the container.

 

{

“jobDefinitionName”: “EFA-MPI-JobDefinition”,

“type”: “multinode”,

“nodeProperties”: {

“numNodes”: 8,

“mainNode”: 0,

“nodeRangeProperties”: [

{

“targetNodes”: “0:”,

“container”: {

“user”: “efauser”,

“image”: “<Docker Image From Previous Section>”,

“vcpus”: 72,

“memory”: 184320,

“linuxParameters”: {

“devices”: [

{

“hostPath”: “/dev/infiniband/uverbs0”

}

]

},

“ulimits”: [

{

“hardLimit”: -1,

“name”: “memlock”,

“softLimit”: -1

}

]

}

}

]

}

}

 

$ aws batch register-job-definition –cli-input-json file://job_definition.json

{

“jobDefinitionArn”: “arn:aws:batch:us-east-1:<account-id>:job-definition/EFA-MPI-JobDefinition”,

“jobDefinitionName”: “EFA-MPI-JobDefinition”,

“revision”: 1

}

 

Run the job

Next, create a job queue. This job queue points at the compute environment created before. When jobs are submitted to it, they queue until instances are available to run them.

 

{

“jobQueueName”: “EFA-Batch-JobQueue”,

“state”: “ENABLED”,

“priority”: 10,

“computeEnvironmentOrder”: [

{

“order”: 1,

“computeEnvironment”: “EFA-Batch-ComputeEnvironment”

}

]

}

 

aws   batch create-job-queue –cli-input-json  file://job_queue.json

Now that you’ve created all the resources, submit the job. The numNodes=8 parameter tells the job definition to use eight nodes.

aws   batch submit-job –job-name      example-mpi-job –job-queue EFA-Batch-JobQueue –job-definition EFA-MPI-JobDefinition –node-overrides numNodes=8

 

NPB overview

NPB is a small set of benchmarks derived from computational fluid dynamics (CFD) applications. They consist of five kernels and three pseudo-applications. This example runs the 3D Fast Fourier Transform (FFT) benchmark, as it tests all-to-all communication. For this run, use c5n.18xlarge, as configured in the AWS compute environment earlier. This is an excellent choice for this workload as it has an Intel Skylake processor (72 hyperthreaded cores) and 100 GB Enhanced Networking (ENA), which you can take advantage of with EFA.

 

This test runs the FT “C” Benchmark with eight nodes * 72 vcpus = 576 vcpus.

 

NAS Parallel Benchmarks 3.3 — FT Benchmark

No input file inputft.data. Using compiled defaults Size : 512x 512x 512

Iterations : 20

Number of processes : 512 Processor array : 1x 512 Layout type : 1D

Initialization time = 1.3533580760000063

T = 1 Checksum = 5.195078707457D+02 5.149019699238D+02

T = 2 Checksum = 5.155422171134D+02 5.127578201997D+02
T = 3 Checksum = 5.144678022222D+02 5.122251847514D+02
T = 4 Checksum = 5.140150594328D+02 5.121090289018D+02
T = 5 Checksum = 5.137550426810D+02 5.121143685824D+02
T = 6 Checksum = 5.135811056728D+02 5.121496764568D+02
T = 7 Checksum = 5.134569343165D+02 5.121870921893D+02
T = 8 Checksum = 5.133651975661D+02 5.122193250322D+02
T = 9 Checksum = 5.132955192805D+02 5.122454735794D+02
T = 10 Checksum = 5.132410471738D+02 5.122663649603D+02
T = 11 Checksum = 5.131971141679D+02 5.122830879827D+02
T = 12 Checksum = 5.131605205716D+02 5.122965869718D+02
T = 13 Checksum = 5.131290734194D+02 5.123075927445D+02
T = 14 Checksum = 5.131012720314D+02 5.123166486553D+02
T = 15 Checksum = 5.130760908195D+02 5.123241541685D+02
T = 16 Checksum = 5.130528295923D+02 5.123304037599D+02
T = 17 Checksum = 5.130310107773D+02 5.123356167976D+02
T = 18 Checksum = 5.130103090133D+02 5.123399592211D+02
T = 19 Checksum = 5.129905029333D+02 5.123435588985D+02
T = 20 Checksum = 5.129714421109D+02 5.123465164008D+02

 

Result verification successful class = C

FT Benchmark Completed. Class = C

Size = 512x 512x 512 Iterations = 20

Time in seconds = 1.92 Total processes = 512 Compiled procs = 512 Mop/s total = 206949.17 Mop/s/process = 404.20

Operation type = floating point Verification = SUCCESSFUL

 

Summary

In this post, we covered how to run MPI Batch jobs with an EFA-enabled elastic network interface using AWS Batch multi-node parallel jobs and an EC2 launch template. We used a launch template to configure the AWS Batch compute environment to launch an instance with the EFA device installed. We showed you how to expose the EFA device to the container. You also learned how to package an MPI benchmarking application, the NPB, as a Docker container, and how to run the application as an AWS Batch multi-node parallel job.

We hope you found the information in this post helpful and encouraging as to all the possibilities for HPC on AWS.

How to BYOK (bring your own key) to AWS KMS for less than $15.00 a year using AWS CloudHSM

Post Syndicated from Tracy Pierce original https://aws.amazon.com/blogs/security/how-to-byok-bring-your-own-key-to-aws-kms-for-less-than-15-00-a-year-using-aws-cloudhsm/

Note: BYOK is helpful for certain use cases, but I recommend that you familiarize yourself with KMS best practices before you adopt this approach. You can review best practices in the AWS Key Management Services Best Practices (.pdf) whitepaper.

May 14, 2019: We’ve updated a sentence to clarify that this solution does not include instructions for creating an AWS CloudHSM cluster. For information about how to do this, please refer to our documentation.


Back in 2016, AWS Key Management Service (AWS KMS) announced the ability to bring your own keys (BYOK) for use with KMS-integrated AWS services and custom applications. This feature allows you more control over the creation, lifecycle, and durability of your keys. You decide the hardware or software used to generate the customer-managed customer master keys (CMKs), you determine when or if the keys will expire, and you get to upload your keys only when you need them and delete them when you’re done.

The documentation that walks you through how to import a key into AWS KMS provides an example that uses OpenSSL to create your master key. Some customers, however, don’t want to use OpenSSL. While it’s a valid method of creating the key material associated with a KMS CMK, security best practice is to perform key creation on a hardware security module (HSM) or a hardened key management system.

However, using an on-premises HSM to create and back up your imported keys can become expensive. You have to plan for factors like the cost of the device itself, its storage in a datacenter, electricity, maintenance of the device, and network costs, all of which can add up. An on-premises HSM device could run upwards of $10K annually even if used sparingly, in addition to the cost of purchasing the device in the first place. Even if you’re only using the HSM for key creation and backup and don’t need it on an ongoing basis, you might still need to keep it running to avoid complex re-initialization processes. This is where AWS CloudHSM comes in. CloudHSM offers HSMs that are under your control, in your virtual private cloud (VPC). You can spin up an HSM device, create your key material, export it, import it into AWS KMS for use, and then terminate the HSM (since CloudHSM saves your HSM state using secure backups). Because you’re only billed for the time your HSM instance is active, you can perform all of these steps for less than $15.00 a year!

Solution pricing

In this post, I’ll show you how to use an AWS CloudHSM cluster (which is a group that CloudHSM uses to keep all HSMs inside in sync with one another) with one HSM to create your key material. Only one HSM is needed, as you’ll use this HSM just for key generation and storage. Ongoing crypto operations that use the key will all be performed by KMS. AWS CloudHSM comes with an hourly fee that changes based on your region of choice; be sure to check the pricing page for updates prior to use. AWS KMS has a $1.00/month charge for all customer-managed CMKs, including those that you import yourself. In the following chart, I’ve calculated annual costs for some regions. These assume that you want to rotate your imported key annually and that you perform this operation once per year by running a single CloudHSM for one hour. I’ve included 12 monthly installments of $1.00/mo for your CMK. Depending on the speed of your application, additional costs may apply for KMS usage, which can be found on the AWS KMS pricing page.

REGIONCLOUDHSM PRICING STRUCTUREKMS PRICING STRUCTUREANNUAL TOTAL COST
US-EAST-1$1.60/HR$1/MO$13.60/YR
US-WEST-2$1.45/HR$1/MO$13.45/YR
EU-WEST-1$1.47/HR$1/MO$13.47/YR
US-GOV-EAST-1$2.08/HR$1/MO$14.08/YR
AP-SOUTHEAST-1$1.86/HR$1/MO$13.86/YR
EU-CENTRAL-1$1.92/HR$1/MO$13.92/YR
EU-NORTH-1$1.40/HR$1/MO$13.40/YR
AP-SOUTHEAST-2$1.99/HR$1/MO$13.99/YR
AP-NORTHEAST-1$1.81/HR$1/MO$13.81/YR

Solution overview

I’ll walk you through the process of creating a CMK in AWS KMS with no key material associated and then downloading the public key and import token that you’ll need in order to import the key material later on. I’ll also show you how to create, securely wrap, and export your symmetric key from AWS CloudHSM. Finally, I’ll show you how to import your key material into AWS KMS and then terminate your HSM to save on cost. The following diagram illustrates the steps covered in this post.
 

Figure 1: The process of creating a CMK in AWS KMS

Figure 1: The process of creating a CMK in AWS KMS

  1. Create a customer master key (CMK) in AWS KMS that has no key material associated.
  2. Download the import wrapping key and import token from KMS.
  3. Import the wrapping key provided by KMS into the HSM.
  4. Create a 256 bit symmetric key on AWS CloudHSM.
  5. Use the imported wrapping key to wrap the symmetric key.
  6. Import the symmetric key into AWS KMS using the import token from step 2.
  7. Terminate your HSM, which triggers a backup. Delete or leave your cluster, depending on your needs.

Prerequisites

In this walkthrough, I assume that you already have an AWS CloudHSM cluster set up and initialized with at least one HSM device, and an Amazon Elastic Compute Cloud (EC2) instance running Amazon Linux OS with the AWS CloudHSM client installed. You must have a crypto user (CU) on the HSM to perform the key creation and export functions. You’ll also need an IAM user or role with permissions to both AWS KMS and AWS CloudHSM, and with credentials configured in your AWS Command Line Interface (AWS CLI). Make sure to store your CU information and IAM credentials (if a user is created for this activity) in a safe location. You can use AWS Secrets Manager for secure storage.

Deploying the solution

For my demonstration, I’ll be using the AWS CLI. However, if you prefer working with a graphical user interface, step #1 and the important portion of step #3 can be done in the AWS KMS Console. All other steps are via AWS CLI only.

Step 1: Create the CMK with no key material associated

Begin by creating a customer master key (CMK) in AWS KMS that has no key material associated. The CLI command to create the CMK is as follows:

$ aws kms create-key –origin EXTERNAL –region us-east-1

If successful, you’ll see an output on the CLI similar to below. The KeyState will be PendingImport and the Origin will be EXTERNAL.


{
    "KeyMetadata": {
        "AWSAccountId": "111122223333",
        "KeyId": "1234abcd-12ab-34cd-56ef-1234567890ab",
        "Arn": "arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab",
        "CreationDate": 1551119868.343,
        "Enabled": false,
        "Description": "",
        "KeyUsage": "ENCRYPT_DECRYPT",
        "KeyState": "PendingImport",
        "Origin": "EXTERNAL",
        "KeyManager": "CUSTOMER"
    }
}

Step 2: Download the public key and import token

With the CMK ID created, you now need to download the import wrapping key and the import token. Wrapping is a method of encrypting the key so that it doesn’t pass in plaintext over the network. You need both the wrapping key and the import token in order to import a key into AWS KMS. You’ll use the public key to encrypt your key material, which ensures that your key is not exposed as it’s imported. When AWS KMS receives the encrypted key material, it will use the corresponding private component of the import wrapping key to decrypt it. The public and private key pair used for this action will always be a 2048-bit RSA key that is unique to each import operation. The import token contains metadata to ensure the key material is imported to the correct CMK ID. Both the import token and import wrapping key are on a time limit, so if you don’t import your key material within 24 hours of downloading them, the import wrapping key and import token will expire and you’ll need to download a new set from KMS.

  1. (OPTIONAL) In my example command, I’ll use the wrapping algorithm RSAES_OAEP_SHA_256 to encrypt and securely import my key into AWS KMS. If you’d like to choose a different algorithm, you may use any of the following: RSAES_OAEP_SHA_256, RSAES_OAEP_SHA_1, or RSAES_PKCS1_V1_5.
  2. In the command below, replace the example key ID (shown in red) with the key ID you received in step 1. If you’d like, you can also modify the wrapping algorithm. Then run the command.

    $ aws kms get-parameters-for-import –key-id <1234abcd-12ab-34cd-56ef-1234567890ab> –wrapping-algorithm <RSAES_OAEP_SHA_256> –wrapping-key-spec RSA_2048 –region us-east-1

    If the command is successful, you’ll see an output similar to what’s below. Your output will contain the import token and import wrapping key. Copy this output to a safe location, as you’ll need it in the next step. Keep in mind, these items will expire after 24 hours.

    
        {
            "KeyId": "arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab ",
            "ImportToken": "AQECAHjS7Vj0eN0qmSLqs8TfNtCGdw7OfGBfw+S8plTtCvzJBwAABrMwggavBgkqhkiG9w0BBwagggagMIIGnAIBADCCBpUGCSqGSIb3DQEHATAeBglghkgBZQMEAS4wEQQMpK6U0uSCNsavw0OJAgEQgIIGZqZCSrJCf68dvMeP7Ah+AfPEgtKielFq9zG4aXAIm6Fx+AkH47Q8EDZgf16L16S6+BKc1xUOJZOd79g5UdETnWF6WPwAw4woDAqq1mYKSRtZpzkz9daiXsOXkqTlka5owac/r2iA2giqBFuIXfr2RPDB6lReyrhAQBYTwefTHnVqosKVGsv7/xxmuorn2iBwMfyqoj2gmykmoxrmWmCdK+jRdWKBHtriwplBTEHpUTdHmQnQzGVThfr611XFCQDZ+uk/wpVFY4jZ/Z/…"
            "PublicKey": "MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA4KCiW36zofngmyVTOpMuFfHAXLWTfl2BAvaeq5puewoMt1zPwBP23qKdG10L7ChTVt4H9PCs1vzAWm2jGeWk5fO…",
            "ParametersValidTo": 1551207484.208
        }
        

  3. Because the output of the command is base64 encoded, you must base64 decode both components into binary data before use. To do this, use your favorite text editor to save the output into two separate files. Name one ImportToken.b64—it will have the output of the “ImportToken” section from above. Name the other PublicKey.b64—it will have the output of the “PublicKey” section from above. You will find example commands to save both below.
    
        echo 'AQECAHjS7Vj0eN0qmSLqs8TfNtCGdw7OfGBfw+S8plTtCvzJBwAABrMwggavBgkqhkiG9w0BBwagggagMIIGnAIBADCCBpUGCSqGSIb3DQEHATAeBglghkgBZQMEAS4wEQQMpK6U0uSCNsavw0OJAgEQgIIGZqZCSrJCf68dvMeP7Ah+AfPEgtKielFq9zG4aXAIm6Fx+AkH47Q8EDZgf16L16S6+BKc1xUOJZOd79g5UdETnWF6WPwAw4woDAqq1mYKSRtZpzkz9daiXsOXkqTlka5owac/r2iA2giqBFuIXfr2RPDB6lReyrhAQBYTwefTHnVqosKVGsv7/xxmuorn2iBwMfyqoj2gmykmoxrmWmCdK+jRdWKBHtriwplBTEHpUTdHmQnQzGVThfr611XFCQDZ+uk/wpVFY4jZ/Z' > /ImportToken.b64
    
        echo 'MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA4KCiW36zofngmyVTOpMuFfHAXLWTfl2BAvaeq5puewoMt1zPwBP23qKdG10L7ChTVt4H9PCs1vzAWm2jGeWk5fO' > /PublicKey.b64 
        

  4. On both saved files, you must then run the following commands, which will base64 decode them and save them in their binary form:
    
        $ openssl enc -d -base64 -A -in PublicKey.b64 -out PublicKey.bin
    
        $ openssl enc -d -base64 -A -in ImportToken.b64 -out ImportToken.bin
        

Step 3: Import the import wrapping key provided by AWS KMS into your HSM

Now that you’ve created the CMK ID and prepared for the key material import, you’re going to move to AWS CloudHSM to create and export your symmetric encryption key. You’re going to import the import wrapping key provided by AWS KMS into your HSM. Before completing this step, be sure you’ve set up the prerequisites I listed at the start of this post.

  1. Log into your EC2 instance with the AWS CloudHSM client package installed, and launch the Key Management Utility. You will launch the utility using the command below:

    $ /opt/cloudhsm/bin/key_mgmt_util

  2. After launching the key_mgmt_util, you’ll need to log in as the CU user. Do so with the command below, being sure to replace <ExampleUser> and <Password123> for your own CU user and password.

    Command: loginHSM -u CU -p <Password123> -s <ExampleUser>

    You should see an output similar to the example below, letting you know that you’ve logged in correctly:

        
        Cfm3LoginHSM returned: 0x00 : HSM Return: SUCCESS
        Cluster Error Status
        Node id 1 and err state 0x00000000 : HSM Return: SUCCESS
        

    Next, you’ll import the import wrapping key provided by KMS into the HSM, so that you can use it to wrap the symmetric key you’re going to create.

  3. Open the file PublicKey.b64 in your favorite text editor and add the line —–BEGIN PUBLIC KEY—– at the very top of the file and the line —–END PUBLIC KEY—— at the very end. (An example of how this should look is below.) Save this new file as PublicKey.pem.
       
        -----BEGIN PUBLIC KEY-----
        MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA0UnLXHbP/jC/QykQQUroaB0xQYYVaZge//NrmIdYtYvaw32fcEUYjTHovWkYkmifUFVYkNaWKNRGuICo1oAY5cgowgxYcsBXZ4Pk2h3v43tIsxG63ZDLKDFju/dtGaa8+CwR8mR…
        -----END PUBLIC KEY-----
        

  4. Use the importPubKey command from the key_mgmt_util to import the key file. An example of the command is below. For the -l flag (label), I’ve named my example <wrapping-key> so that I’ll know what it’s for. You can name yours whatever you choose. Also, remember to replace <PublicKey.pem> with the actual filename of your public key.

    Command: importPubKey -l <wrapping-key> -f <PublicKey.pem>

    If successful, your output should look similar to the following example. Make note of the key handle, as this is the identifying ID for your wrapping key:

          
        Cfm3CreatePublicKey returned: 0x00 : HSM Return: SUCCESS
    
        Public Key Handle: 30 
        
        Cluster Error Status
        Node id 2 and err state 0x00000000 : HSM Return: SUCCESS
        

Step 4: Create a symmetric key on AWS CloudHSM

Next, you’ll create a symmetric key that you want to export from CloudHSM, so that you can import it into AWS KMS.

The first parameter you’ll use for this action is -t with the value 31. This is the key-type flag and 31 equals an AES key. KMS only accepts the AES key type. The second is -s with a value of 32. This is the key-size flag, and 32 equals a 32-byte key (256 bits). The last flag is the -l flag with the value BYOK_Key. This flag is simply a label to name your key, so you may alter this value however you wish.

Here’s what my example looks like:

Command: genSymKey -t 31 -s 32 -l BYOK-KMS

If successful, your output should look similar to what’s below. Make note of the key handle, as this will be the identifying ID of the key you wish to import into AWS KMS.

  
Cfm3LoginHSM returned: 0x00 : HSM Return: SUCCESS

Symmetric Key Created. Key Handle: 20

Cluster Error Status
Node id 1 and err state 0x00000000 : HSM Return: SUCCESS

Step 5: Use the imported import wrapping key to wrap the symmetric key

Now that you’ve imported the import wrapping key into your HSM in the PublicKey.pem file, I’ll show you how to use the PublicKey.pem file and the key_mgmt_util to wrap your symmetric key out of the HSM.

An example of the command is below. Here’s how to customize the parameters:

  • -k refers to the key handle of your symmetric key. Replace the variable that follows -k with your own key handle ID from step 4.
  • -w refers to the key handle of your wrapping key. Replace the variable with your own ID from step 3.4
  • -out refers to the file name of your exported key. Replace the variable with a file name of your choice.
  • -m refers to the wrapping-mechanism. In my example, I’ve used RSA-OAEP, which has a value of 8.
  • -t refers to the hash-type. In my example, I’ve used SHA256, which has a value of 3.
  • -noheader refers to the -noheader flag. This omits the header that specifies CloudHSM-specific key attributes.

Command: wrapKey -k <20> -w <30> -out <KMS-BYOK-March2019.bin> -m 8 -t 3 -noheader

If successful, you should see an output similar to what’s below:

 
Cfm2WrapKey5 returned: 0x00 : HSM Return: SUCCESS

	Key Wrapped.

	Wrapped Key written to file "KMS-BYOK-March2019.bin length 256

	Cfm2WrapKey returned: 0x00 : HSM Return: SUCCESS

With that, you’ve completed the AWS CloudHSM steps. You now have a symmetric key, securely wrapped for transport into AWS KMS. You can log out of the Key Management Utility by entering the word exit.

Step 6: Import the wrapped symmetric key into AWS KMS using the key import function

You’re ready to import the wrapped symmetric key into AWS KMS using the key import function. Make the following updates to the sample command that I provide below:

  • Replace the key-id value and file names with your own.
  • Leave the expiration model parameter blank, or choose from
    KEY_MATERIAL_EXPIRES or KEY_MATERIAL_DOES_NOT_EXPIRE. If you leave the parameter blank, it will default to KEY_MATERIAL_EXPIRES. This expiration model requires you to set the –valid-to parameter, which can be any time of your choosing so long as it follows the format in the example. If you choose KEY_MATERIAL_DOES_NOT_EXPIRE, you may leave the –valid-to option out of the command. To enforce an expiration and rotation of your key material, best practice is to use the KEY_MATERIAL_EXPIRES option and a date of 1-2 years.

Here’s my sample command:

$ aws kms import-key-material –key-id <1234abcd-12ab-34cd-56ef-1234567890ab> –encrypted-key-material fileb://<KMS-BYOK-March2019.bin> –import-token fileb://<ImportToken.bin> –expiration-model KEY_MATERIAL_EXPIRES –valid-to 2020-03-01T12:00:00-08:00 –region us-east-1

You can test that the import was successful by using the key ID to encrypt a small (under 4KB) file. An example command is below:

$ aws kms encrypt –key-id 1234abcd-12ab-34cd-56ef-1234567890ab –plaintext file://test.txt –region us-east-1

A successful call will output something similar to below:

 
{
    "KeyId": "arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab ", 
    "CiphertextBlob": "AQICAHi4Pj/Jhk6aHMZNegOpYFnnuiEKAtjRWwD33TdDnKNY5gHbUVdrwptz…"
}

Step 7: Terminate your HSM (which triggers a backup)

The last step in the process is to Terminate your HSM (which triggers a backup). Since you’ve imported the key material into AWS KMS, you no longer need to run the HSM. Terminating the HSM will automatically trigger a backup, which will take an exact copy of the key material and the users on your HSM and store it, encrypted, in an Amazon Simple Storage Service (Amazon S3) bucket. If you need to re-import your key into KMS or if your company’s security policy requires an annual rotation of your KMS master key(s), when the time comes, you can spin up an HSM from your CloudHSM backup and be back to where you started. You can re-import your existing key material or create key materials for your new CMK, either way, you’ll need to request a fresh import wrapping key and import token from KMS.

Deleting an HSM can be done with the command below. Replace <cluster-example> with your actual cluster ID and <hsm-example> with your HSM ID:

$ aws cloudhsmv2 delete-hsm –cluster-id <cluster-example> –hsm-id <hsm-example> –region us-east-1

Following these steps, you will have successfully defined a KMS CMK, created your own key material on a CloudHSM device, and then imported it into AWS KMS for use. Dependent upon region, all of this can be done for less than $15.00 a year.

Summary

In this post, I walked you through creating a CMK ID without encryption key material associated, creating and then exporting a symmetric key from AWS CloudHSM, and then importing it into AWS KMS for use with KMS-integrated AWS services. This process allows you to maintain ownership and management over your master keys, create them on FIPS 140-2 Level 3 validated hardware, and maintain a copy for disaster recovery scenarios. This saves not only the time and personnel required to maintain your own HSMs on-premises, but the cost of hardware, electricity, and housing of the device.

If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this blog post, start a new thread on the AWS CloudHSM or AWS KMS forums.

Want more AWS Security news? Follow us on Twitter.

Author

Tracy Pierce

Tracy Pierce is a Senior Cloud Support Engineer at AWS. She enjoys the peculiar culture of Amazon and uses that to ensure every day is exciting for her fellow engineers and customers alike. Customer Obsession is her highest priority and she shows this by improving processes, documentation, and building tutorials. She has her AS in Computer Security & Forensics from SCTD, SSCP certification, AWS Developer Associate certification, and AWS Security Specialist certification. Outside of work, she enjoys time with friends, her Great Dane, and three cats. She keeps work interesting by drawing cartoon characters on the walls at request.