Tag Archives: Advanced (300)

Use the Snyk CLI to scan Python packages using AWS CodeCommit, AWS CodePipeline, and AWS CodeBuild

Post Syndicated from BK Das original https://aws.amazon.com/blogs/devops/snyk-cli-scan-python-codecommit-codepipeline-codebuild/

One of the primary advantages of working in the cloud is achieving agility in product development. You can adopt practices like continuous integration and continuous delivery (CI/CD) and GitOps to increase your ability to release code at quicker iterations. Development models like these demand agility from security teams as well. This means your security team has to provide the tooling and visibility to developers for them to fix security vulnerabilities as quickly as possible.

Vulnerabilities in cloud-native applications can be roughly classified into infrastructure misconfigurations and application vulnerabilities. In this post, we focus on enabling developers to scan vulnerable data around Python open-source packages using the Snyk Command Line Interface (CLI).

The world of package dependencies

Traditionally, code scanning is performed by the security team; they either ship the code to the scanning instance, or in some cases ship it to the vendor for vulnerability scanning. After the vendor finishes the scan, the results are provided to the security team and forwarded to the developer. The end-to-end process of organizing the repositories, sending the code to security team for scanning, getting results back, and remediating them is counterproductive to the agility of working in the cloud.

Let’s take an example of package A, which uses package B and C. To scan package A, you scan package B and C as well. Similar to package A having dependencies on B and C, packages B and C can have their individual dependencies too. So the dependencies for each package get complex and cumbersome to scan over time. The ideal method is to scan all the dependencies in one go, without having manual intervention to understand the dependencies between packages.

Building on the foundation of GitOps and Gitflow

GitOps was introduced in 2017 by Weaveworks as a DevOps model to implement continuous deployment for cloud-native applications. It focuses on the developer ability to ship code faster. Because security is a non-negotiable piece of any application, this solution includes security as part of the deployment process. We define the Snyk scanner as declarative and immutable AWS Cloud Development Kit (AWS CDK) code, which instructs new Python code committed to the repository to be scanned.

Another continuous delivery practice that we base this solution on is Gitflow. Gitflow is a strict branching model that enables project release by enforcing a framework for managing Git projects. As a brief introduction on Gitflow, typically you have a main branch, which is the code sent to production, and you have a development branch where new code is committed. After the code in development branch passes all tests, it’s merged to the main branch, thereby becoming the code in production. In this solution, we aim to provide this scanning capability in all your branches, providing security observability through your entire Gitflow.

AWS services used in this solution

We use the following AWS services as part of this solution:

  • AWS CDK – The AWS CDK is an open-source software development framework to define your cloud application resources using familiar programming languages. In this solution, we use Python to write our AWS CDK code.
  • AWS CodeBuild – CodeBuild is a fully managed build service in the cloud. CodeBuild compiles your source code, runs unit tests, and produces artifacts that are ready to deploy. CodeBuild eliminates the need to provision, manage, and scale your own build servers.
  • AWS CodeCommit – CodeCommit is a fully managed source control service that hosts secure Git-based repositories. It makes it easy for teams to collaborate on code in a secure and highly scalable ecosystem. CodeCommit eliminates the need to operate your own source control system or worry about scaling its infrastructure. You can use CodeCommit to securely store anything from source code to binaries, and it works seamlessly with your existing Git tools.
  • AWS CodePipeline – CodePipeline is a continuous delivery service you can use to model, visualize, and automate the steps required to release your software. You can quickly model and configure the different stages of a software release process. CodePipeline automates the steps required to release your software changes continuously.
  • Amazon EventBridge – EventBridge rules deliver a near-real-time stream of system events that describe changes in AWS resources. With simple rules that you can quickly set up, you can match events and route them to one or more target functions or streams.
  • AWS Systems Manager Parameter Store – Parameter Store, a capability of AWS Systems Manager, provides secure, hierarchical storage for configuration data management and secrets management. You can store data such as passwords, database strings, Amazon Machine Image (AMI) IDs, and license codes as parameter values.


Before you get started, make sure you have the following prerequisites:

  • An AWS account (use a Region that supports CodeCommit, CodeBuild, Parameter Store, and CodePipeline)
  • A Snyk account
  • An existing CodeCommit repository you want to test on

Architecture overview

After you complete the steps in this post, you will have a working pipeline that scans your Python code for open-source vulnerabilities.

We use the Snyk CLI, which is available to customers on all plans, including the Free Tier, and provides the ability to programmatically scan repositories for vulnerabilities in open-source dependencies as well as base image recommendations for container images. The following reference architecture represents a general workflow of how Snyk performs the scan in an automated manner. The design uses DevSecOps principles of automation, event-driven triggers, and keeping humans out of the loop for its run.

As developers keep working on their code, they continue to commit their code to the CodeCommit repository. Upon each commit, a CodeCommit API call is generated, which is then captured using the EventBridge rule. You can customize this event rule for a specific event or feature branch you want to trigger the pipeline for.

When the developer commits code to the specified branch, that EventBridge event rule triggers a CodePipeline pipeline. This pipeline has a build stage using CodeBuild. This stage interacts with the Snyk CLI, and uses the token stored in Parameter Store. The Snyk CLI uses this token as authentication and starts scanning the latest code committed to the repository. When the scan is complete, you can review the results on the Snyk console.

This code is built for Python pip packages. You can edit the buildspec.yml to incorporate for any other language that Snyk supports.

The following diagram illustrates our architecture.

snyk architecture codepipeline

Code overview

The code in this post is written using the AWS CDK in Python. If you’re not familiar with the AWS CDK, we recommend reading Getting started with AWS CDK before you customize and deploy the code.

Repository URL: https://github.com/aws-samples/aws-cdk-codecommit-snyk

This AWS CDK construct uses the Snyk CLI within the CodeBuild job in the pipeline to scan the Python packages for open-source package vulnerabilities. The construct uses CodePipeline to create a two-stage pipeline: one source, and one build (the Snyk scan stage). The construct takes the input of the CodeCommit repository you want to scan, the Snyk organization ID, and Snyk auth token.

Resources deployed

This solution deploys the following resources:

For the deployment, we use the AWS CDK construct in the codebase cdk_snyk_construct/cdk_snyk_construct_stack.py in the AWS CDK stack cdk-snyk-stack. The construct requires the following parameters:

  • ARN of the CodeCommit repo you want to scan
  • Name of the repository branch you want to be monitored
  • Parameter Store name of the Snyk organization ID
  • Parameter Store name for the Snyk auth token

Set up the organization ID and auth token before deploying the stack. Because these are confidential and sensitive data, you should deploy them as a separate stack or manual process. In this solution, the parameters have been stored as a SecureString parameter type and encrypted using the AWS-managed KMS key.

You create the organization ID and auth token on the Snyk console. On the Settings page, choose General in the navigation page to add these parameters.

snyk settings console


You can retrieve the names of the parameters on the Systems Manager console by navigating to Parameter Store and finding the name on the Overview tab.

SSM Parameter Store

Create a requirements.txt file in the CodeCommit repository

We now create a repository in CodeCommit to store the code. For simplicity, we primarily store the requirements.txt file in our repository. In Python, a requirements file stores the packages that are used. Having clearly defined packages and versions makes it easier for development, especially in virtual environments.

For more information on the requirements file in Python, see Requirement Specifiers.

To create a CodeCommit repository, run the following AWS Command Line Interface (AWS CLI) command in your AWS accounts:

aws codecommit create-repository --repository-name snyk-repo \
--repository-description "Repository for Snyk to scan Python packages"

Now let’s create a branch called main in the repository using the following command:

aws codecommit create-branch --repository-name snyk-repo \
--branch-name main

After you create the repository, commit a file named requirements.txt with the following content. The following packages are pinned to a particular version that they have a vulnerability with. This file is our hypothetical vulnerable set of packages that have been committed into your development code.



For instructions on committing files in CodeCommit, see Connect to an AWS CodeCommit repository.

When you store the Snyk auth token and organization ID in Parameter Store, note the parameter names—you need to pass them as parameters during the deployment step.

Now clone the CDK code from the GitHub repository with the command below:

git clone https://github.com/aws-samples/aws-cdk-codecommit-snyk.git

After the cloning is complete you should see a directory named aws-cdk-codecommit-snyk on your machine.

When you’re ready to deploy, enter the aws-cdk-codecommit-snyk directory, and run the following command with the appropriate values:

cdk deploy cdk-snyk-stack \
--parameters RepoName=<name-of-codecommit-repo> \
--parameters RepoBranch=<branch-to-be-scanned>  \
--parameters SnykOrgId=<value> \
--parameters SnykAuthToken=<value>

After the stack deployment is complete, you can see a new pipeline in your AWS account, which is configured to be triggered every time a commit occurs on the main branch.

You can view the results of the scan on the Snyk console. After the pipeline runs, log in to snyk.io and you should see a project named as per your repository (see the following screenshot).

snyk dashboard


Choose the repo name to get a detailed view of the vulnerabilities found. Depending on what packages you put in your requirements.txt, your report will differ from the following screenshot.



To fix the vulnerability identified, you can change the version of these packages in the requirements.txt file. The edited requirements file should look like the following:


After you update the requirements.txt file in your repository, push your changes back to the CodeCommit repository you created earlier on the main branch. The push starts the pipeline again.

After the commit is performed to the targeted branch, you don’t see the vulnerability reported on the Snyk dashboard because the pinned version 5.4 doesn’t contain that vulnerability.

Clean up

To avoid accruing further cost for the resources deployed in this solution, run cdk destroy to remove all the AWS resources you deployed through CDK.

As the CodeCommit repository was created using AWS CLI, the following command deletes the CodeCommit repository:

aws codecommit delete-repository --repository-name snyk-repo


In this post, we provided a solution so developers can self- remediate vulnerabilities in their code by monitoring it through Snyk. This solution provides observability, agility, and security for your Python application by following DevOps principles.

A similar architecture has been used at NFL to shift-left the security of their code. According to the shift-left design principle, security should be moved closer to the developers to identify and remediate security issues earlier in the development cycle. NFL has implemented a similar architecture which made the total process, from committing code on the branch to remediating 15 times faster than their previous code scanning setup.

Here’s what NFL has to say about their experience:

“NFL used Snyk to scan Python packages for a service launch. Traditionally it would have taken 10days to scan the packages through our existing process but with Snyk we were able to follow DevSecOps principles and get the scans completed, and reviewed within matter of days. This simplified our time to market while maintaining visibility into our security posture.” – Joe Steinke (Director, Data Solution Architect)

The three most important AWS WAF rate-based rules

Post Syndicated from Artem Lovan original https://aws.amazon.com/blogs/security/three-most-important-aws-waf-rate-based-rules/

In this post, we explain what the three most important AWS WAF rate-based rules are for proactively protecting your web applications against common HTTP flood events, and how to implement these rules. We share what the Shield Response Team (SRT) has learned from helping customers respond to HTTP floods and show how all AWS WAF customers can benefit from these learnings.

When you have business-critical applications that are internet-facing, you need to protect them from risks such as distributed denial of service (DDoS) attacks. AWS Shield Advanced is a managed DDoS protection service that safeguards applications that are running behind Amazon Web Services (AWS) internet-facing resources. The backend origin of your application can exist anywhere, including on premises, and Shield Advanced can protect it. Shield Advanced provides DDoS protection for Layers 3–7. It also includes 24/7 access to the SRT to help you quickly respond to sophisticated unauthorized activity scenarios that might be unique to your application. To learn more about what resource types are supported to associate AWS WAF, see AWS WAF.

Increasingly, the SRT has been assisting customers in protecting against Layer 7 HTTP flood occurrences that negatively impact application availability or performance by overloading the application with an unusually high number of HTTP requests. In many cases, these malicious events can be automatically mitigated by using AWS WAF. In addition, AWS WAF has an easy-to-configure native rate-based rule capability, which detects source IP addresses that make large numbers of HTTP requests within a 5-minute time span, and automatically blocks requests from the offending source IP until the rate of requests falls below a set threshold. In this post, we show how you can pull insights from the AWS WAF logs to determine what your rate-based rule threshold should be.

The top three most important AWS WAF rate-based rules are:

  • A blanket rate-based rule to protect your application from large HTTP floods.
  • A rate-based rule to protect specific URIs at more restrictive rates than the blanket rate-based rule.
  • A rate-based rule to protect your application against known malicious source IPs.

Solution overview

AWS WAF is a web application firewall that helps protect your web applications against common web exploits that might affect availability, compromise security, or consume excessive resources. AWS WAF gives you control over which web traffic reaches your applications. If you already know the request rates for your application, you have all the necessary information to start creating your AWS WAF rate-based rules. To learn more about how to create rules, see Creating a rule and adding conditions. However, if you don’t have this data and want to learn how to get started, this solution helps you determine appropriate rates for your applications, and how to create AWS WAF rate-based rules.

Figure 1 shows how incoming request information is captured so that the operations team can use it to determine rate-based rules.

Figure 1: The workflow to collect and query logs and apply rate-based rules

Figure 1: The workflow to collect and query logs and apply rate-based rules

Let’s go through the flow to better understand what’s happening at each step:

  1. An application user makes requests to the application.
  2. AWS WAF captures information about the incoming requests and sends this to Amazon Kinesis Data Firehose.
  3. Kinesis Data Firehose delivers the logs to an Amazon Simple Storage Service (Amazon S3) bucket, where they will be stored.
  4. The operations team uses Amazon Athena to analyze the logs with SQL queries.
  5. Athena queries the logs in the S3 bucket and shows the query results.
  6. The operations team uses the query results to determine the appropriate AWS WAF rate-based rule.

The three rate-based rules in detail

Each of the rules helps to protect web applications from unauthorized activity. Each of the rules focuses on a specific aspect of protection. The rules complement each other, and so when they’re combined, they can offer greater help in protecting your web application. We’ll look at each of the rules to understand what they do.

Blanket rate-based rule

A blanket rate-based rule is designed to prevent any single source IP address from negatively impacting the availability of a website. For example, if the threshold for the rate-based rule is set to 2,000, the rule will block all IPs that are making more than 2,000 requests in a rolling 5-minute period. This is the most basic rate-based rule, and one of the most valuable for AWS WAF customers to implement. The SRT often helps customers who are actively under a DDoS attack to quickly implement this rule. In past experiences with HTTP flood cases, if this rule were proactively in place, the customer would have been protected and wouldn’t have needed to reach out to the SRT for assistance. The blanket rate-based rule would have automatically blocked the attempt without any human intervention.

URI-specific rate-based rule

Some application URI endpoints typically receive a high request volume, but for others it would be unusual and suspicious to see a high request count. For example, multiple requests in a 5-minute period to an application’s login page is suspicious and indicates a potential brute force or credential-stuffing attack against the application. A URI-specific rule can prevent a single source IP address from connecting to the login page as few as 100 times per 5-minute period, while still allowing a much higher request volume to the rest of the application. Some applications naturally have computationally expensive URIs that, when called, require considerably more resources to process the request. An example of this could be a database query or search function. If a bad actor targets these computationally expensive URIs, this can quickly lead to application performance or availability issues. If you assign a URI-specific rate-based rule to these portions of your site, you can configure a much lower threshold than the blanket rate-based rule. It’s beyond the scope of this blog post, but some customers use Application Load Balancer access logs and the target_processing_time information to determine precisely which portions of the site are the slowest to respond and might represent a computationally expensive call. These customers then put additional rate-based rule protections on calls that are made to these URIs.

IP reputation rate-based rule

Many of the DDoS events the SRT assists customers with include HTTP floods that originate from known malicious source IPs. The AWS WAF Security Automations solution provides AWS WAF customers with a subscription to four open-source threat intelligence lists. Rate-based rules with low thresholds can be applied to requests coming from these suspect sources. Some customers feel comfortable completely blocking web requests from these IPs, but at the very least, requests from these IPs should be rate-limited to protect the application from these well-known malicious sources.

It’s also common to see HTTP floods originate from IP addresses within certain countries. You can use AWS WAF geographical matching rules to assign lower rate-based rule thresholds to requests that originate from certain countries, or countries that don’t contain your web application’s primary user base. For example, suppose your application primarily serves users in the United States. In that case, it could be beneficial to create a rate-based rule with a low threshold for requests that come from any country other than the United States. HTTP floods are also commonly seen originating from IP addresses classified as cloud hosting provider IPs. You can use AWS WAF’s “HostingProviderIPList” Managed Rule to label these requests and then assign a lower rate-based rule threshold to them as well.


Before you implement the solution, verify that:

  • AWS WAF is deployed in your AWS account and is associated with an Amazon CloudFront distribution or an Application Load Balancer.
  • Your AWS WAF default action is set to Block. When you create and configure a web ACL, you set the web ACL default action, which determines how AWS WAF handles web requests that don’t match any rules in the web ACL. To learn more about default action for a web ACL, see Deciding on the default action for a web ACL.
  • AWS WAF logging is configured and logs are being stored in an S3 bucket.

    Note: You can follow these instructions to configure delivery of AWS WAF logs to your S3 bucket, and you can also use AWS Firewall Manager to configure centralized AWS WAF logging in a multi-account environment.

Set up Athena to analyze AWS WAF logs

Amazon Athena is an interactive query service that you can use to analyze data in Amazon S3 by using standard SQL. For this solution, you’ll use Athena to connect to the S3 bucket where AWS WAF logs are stored and query the AWS WAF logs. The first step is to open the Athena console and create a database.

Note: The Athena database and table creation is a once-off configuration process. You can then come back and run the queries and see the query results based on your latest AWS WAF log data.

To create an Athena database, you’ll use a data definition language (DDL) statement. Paste the following query in the Athena query editor, replacing values as described here:

  • Replace <your-bucket-name> with the S3 bucket name that holds your AWS WAF logs.
  • For <bucket-prefix-if-exist>, if AWS WAF logs are stored in an S3 bucket prefix, replace with your prefix name. Otherwise, remove this part from the query, including the slash “/” at the end.
  LOCATION 's3://<your-bucket-name>/<bucket-prefix-if-exist>/';

Choose Run query to run the query and create the database. Successful completion will be indicated by the query result, as shown below.

Query successful. 

Next, you’ll create a table inside the database. Paste the following query in the Athena query editor, replacing values as described here:

  • Replace <your-bucket-name> with the S3 bucket name that holds your AWS WAF logs.
  • For <bucket-prefix-if-exist>, if AWS WAF logs are stored in an S3 bucket prefix, replace with your prefix name. Otherwise, you can remove this part from the query, including the slash “/” at the end.
  • For has_encrypted_data, if your AWS WAF log data is encrypted at rest, change the value to true, otherwise false is the correct value.
  `terminatingRuleId` string,
  `httpSourceName` string,
  `action` string,
  `httpSourceId` string,
  `terminatingRuleType` string,
  `webaclId` string,
  `timestamp` float,
  `formatVersion` int,
  `ruleGroupList` array<string>,
  `httpRequest` struct<`headers`:array<struct<name:string,value:string>>,clientIp:string,args:string,requestId:string,httpVersion:string,httpMethod:string,country:string,uri:string>,
  `rateBasedRuleList` string,
  `nonTerminatingMatchingRules` string,
  `terminatingRuleMatchDetails` string 
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
  'serialization.format' = '1'
) LOCATION 's3://<your-bucket-name>/<bucket-prefix-if-exist>/'
TBLPROPERTIES ('has_encrypted_data'='false');

Run the query in the Athena console. After the query completes, Athena registers the waftable table, which makes the data in it available for queries.

Run SQL queries to identify rate-based rule thresholds

Now that you have a table in Athena, know where the data is located, and have the correct schema, you can run SQL queries for each of the rate-based rules and see the query results.

Blanket rate-based rule for all application endpoints

You’ll start with a SQL query that identifies the blanket rule. The critical factor in determining the blanket rule is to run the query against AWS WAF logs data that represents a healthy high request volume. The following query defines a time window of 6 hours in the evening, expressed as 2020-12-01 16:00:00 and 2020-12-01 22:00:00. Time windows can span a few hours or several days; however, this time window must be a good representation of your traffic volume, which you will use as the basis to identify the threshold. For example, if your application is busier during certain periods, you should evaluate the log data for that time. In the example shown here, we limit the query results to the top 100 IPs in our SQL queries. You can adjust the limit to your needs by updating the LIMIT value.

  COUNT(*) AS "count"
FROM wafrulesdb.waftable
WHERE from_unixtime(timestamp/1000) BETWEEN TIMESTAMP '2020-12-01 16:00:00' AND TIMESTAMP '2020-12-01 22:00:00'
GROUP BY httprequest.clientip, FLOOR("timestamp"/(1000*60*5))
LIMIT 100; 

Update the time window to your needs and run the query in the Athena console. The results will show the top requesting IPs in any 5-minute period between two dates, as illustrated in Figure 2.

Figure 2: The top requesting IP in any 5-minute period between dates

Figure 2: The top requesting IP in any 5-minute period between dates

You can visualize the results data to see a holistic view of the request count per IP. The chart in Figure 3 illustrates the SQL query results.

Figure 3: Chart: Top requesting IP in any 5-minute period between dates

Figure 3: Chart: Top requesting IP in any 5-minute period between dates

The results are sorted by showing the IPs with the highest request volume for every 5-minute period. This means that the same IP could appear multiple times, if most of the requests were made within that 5-minute interval. In our example, looking at the result, an excellent first blanket rule would limit the request volume to about 7,000 requests within a 5-minute time period. You can either create the AWS WAF rule by using the following JSON and the JSON rule editor, or by using the AWS WAF visual rule editor and following these instructions. If you’re using the following JSON, make sure to replace the Limit value with the value that you identified by running the SQL query earlier.

  "Name": "BlanketRule",
  "Priority": 2,
  "Action": {
    "Block": {}
  "VisibilityConfig": {
    "SampledRequestsEnabled": true,
    "CloudWatchMetricsEnabled": true,
    "MetricName": "BlanketRule"
  "Statement": {
    "RateBasedStatement": {
      "Limit": 7000,
      "AggregateKeyType": "IP"

Sometimes a client connects to an application through an HTTP proxy or a content delivery network (CDN), which obscures the client origin IP. It’s important to identify the client IP instead of the one from the proxy or CDN, because blocking source IPs can cause a wider unwanted impact. You can use many tools to help you identify whether the source IP might be a CDN. In this case, you would need to query and filter on the X-Forwarded-For, True-Client-IP, or other custom headers. CDN providers typically publish which headers they add to the requests, but X-Forwarded-For and True-Client-IP are common. The following query shows how you can reference these headers, illustrating with the X-Forwarded-For header, to write rate-based rules. You can replace X-Forwarded-For with the header you expect to hold the client IP.

  COUNT(*) AS "count"
FROM wafrulesdb.waftable, UNNEST(httprequest.headers) as t(header)
    from_unixtime(timestamp/1000) BETWEEN TIMESTAMP '2020-12-01 16:00:00' AND TIMESTAMP '2020-12-01 22:00:00'
    header.name = 'X-Forwarded-For'
GROUP BY header.value, FLOOR("timestamp"/(1000*60*5))
LIMIT 100;

URI-based rule for specific application endpoints

Suppose that you want to further limit requests to the login page on your website. To do this, you could add the following string match condition to a rate-based rule:

  • The part of the request to filter on is URI
  • The Match Type is Starts with
  • A Value to match is /login (this needs to be whatever identifies the login page in the URI portion of the web request)

Next you have to identify what is a typical request volume to the /login URI for the application. The following SQL query does exactly that.

  COUNT(*) AS "count"
FROM wafrulesdb.waftable
  from_unixtime(timestamp/1000) BETWEEN TIMESTAMP '2020-12-01 16:00:00' AND TIMESTAMP '2020-12-01 22:00:00'
  httprequest.uri = '/login'
GROUP BY httprequest.clientip, httprequest.uri, FLOOR("timestamp"/(1000*60*5))
LIMIT 100;

Replace the time window 2020-12-01 16:00:00 and 2020-12-01 22:00:00 and the httprequest.uri value, if applicable, and run the query in the Athena console. The results show the highest requesting IP and /login URI for every 5-minute period between dates, as illustrated in Figure 4.

Figure 4: The highest requesting IP and /login URI for every 5-minute period between dates

Figure 4: The highest requesting IP and /login URI for every 5-minute period between dates

Figure 5 illustrates a chart based on the query results for the highest requesting IP and /login URI for every 5-minute period between dates.

Figure 5: Chart: The highest requesting IP and /login URI for every 5-minute period between dates

Figure 5: Chart: The highest requesting IP and /login URI for every 5-minute period between dates

Based on the SQL query results, you would specify a rate limit of 150 requests per 5 minutes. Adding this rate-based rule to a web ACL will limit requests to your login page per IP address without affecting the rest of your site. Once again, you can either create the AWS WAF rule by using the following JSON and the JSON rule editor, or by using the AWS WAF visual rule editor and following these instructions. If you’re using the following JSON, make sure to replace the Limit value with the value that you identified by running the SQL query earlier.

  "Name": "UriBasedRule",
  "Priority": 1,
  "Action": {
    "Block": {}
  "VisibilityConfig": {
    "SampledRequestsEnabled": true,
    "CloudWatchMetricsEnabled": true,
    "MetricName": "UriBasedRule"
  "Statement": {
    "RateBasedStatement": {
      "Limit": 150,
      "AggregateKeyType": "IP",
      "ScopeDownStatement": {
        "ByteMatchStatement": {
          "FieldToMatch": {
            "UriPath": {}
          "PositionalConstraint": "STARTS_WITH",
          "SearchString": "/login",
          "TextTransformations": [
              "Type": "NONE",
              "Priority": 0

AWS WAF rules with a lower value for Priority are evaluated before rules with a higher value. For the AWS WAF rules to work as expected (first evaluating the more specific rule—the URI-based rule, and only after that, the more general blanket rule) you have to set the AWS WAF rule priority. You can do that by updating the JSON and setting the Priority value to 1 for the blanket rule and 0 for the URI-based rule, or by using the AWS WAF visual rule editor. The expected AWS WAF rule priority should be as illustrated in Figure 6.

Figure 6: AWS WAF rules with priority for UriBasedRule

Figure 6: AWS WAF rules with priority for UriBasedRule

If you want to know the request volume across all application URIs, the following SQL will accomplish that.

  COUNT(*) AS "count"
FROM wafrulesdb.waftable
WHERE from_unixtime(timestamp/1000) BETWEEN TIMESTAMP '2020-12-01 16:00:00' AND TIMESTAMP '2020-12-01 22:00:00'
GROUP BY httprequest.clientip, httprequest.uri, FLOOR("timestamp"/(1000*60*5))
LIMIT 100;

Figure 7 shows a chart of what the SQL query results might look like.

Figure 7: The highest requesting IP and URI for every 5-minute period between dates

Figure 7: The highest requesting IP and URI for every 5-minute period between dates

IP reputation rule groups to block bots or other threats

You can use IP reputation rules to block requests based on their source. AWS WAF offers a wide selection of managed rule groups, and Amazon IP reputation list is the one that will help to reduce your exposure to bot traffic or exploitation attempts.

To add the Amazon IP reputation list rule to your web ACL

  1. Open the AWS WAF console and navigate to the managed rule groups view.

    Figure 8: The managed rule group view in AWS WAF

    Figure 8: The managed rule group view in AWS WAF

  2. Expand AWS managed rule groups, and for Amazon IP reputation list, choose Add to web ACL.

    Figure 9: Add the Amazon IP reputation list to the web ACL

    Figure 9: Add the Amazon IP reputation list to the web ACL

  3. Scroll to the bottom of the page and choose Add rule.
  4. At this point, you should see the Set rule priority view. Move up the Amazon managed rule so that it has the highest priority. If a request originates from a bot, you want to deny the request as early as possible, and you achieve exactly that by assigning the highest priority to the Amazon IP reputation list rule. Your final AWS WAF rules order should be as shown in Figure 10.

    Figure 10: Final AWS WAF rules ordered by priority

    Figure 10: Final AWS WAF rules ordered by priority

Considerations for rate-based rules

It’s important to note that the more specific AWS WAF rules should have a higher priority, because you want these rules to limit the request volume first. In our example, the rules strategy is first based on a specific URI, and then on a blanket rule that limits requests across the whole application.

The rate-based rules that we discussed here provide a solid foundation to help you protect your internet-facing applications from common basic HTTP request floods. However, the solution in this blog post shouldn’t be seen as a one-time setup but rather as an iterative activity.

You should determine a healthy time frame to rerun Amazon Athena queries to identify a new rate-based rule that aligns with the application’s growth and increasing request volume. Reviewing the rate-based rules on an iterative basis and incorporating it into your existing processes, such as software development life cycle, is a great way to schedule in the review process. Each AWS WAF rule can publish Amazon CloudWatch metrics, which can be used to trigger alerts before thresholds are crossed. You can use alerts to create tickets to operations teams based on thresholds you set. This alerts your operations teams to review the situation to see if it’s a DDoS attack being thwarted versus legitimate traffic being dropped.

After you define your request, add a buffer to allow for growth. Rate-based rules should have a reasonable buffer to account for near-future application growth. For instance, when an Athena query result shows a request volume of 500 requests, a rate-based rule with a limit of 1,000 requests gives a buffer for an additional 500 requests to account for application growth.


In this post, we introduced you to the top three most important AWS WAF rate-based rules to protect your web applications from common HTTP flood events. We also covered how to implement these rate-based rules and determined an appropriate request threshold for your application by using AWS WAF logs and Amazon Athena queries. To learn more about best practices that help you protect your websites and web applications against various attack vectors by using AWS WAF, see our whitepaper, Guidelines for Implementing AWS WAF.

You can learn more about AWS WAF in other AWS WAF–related Security Blog posts.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS WAF forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.


Artem Lovan

Artem is a Senior Solutions Architect based in New York. He helps customers architect and optimize applications on AWS. He has been involved in IT at many levels, including infrastructure, networking, security, DevOps, and software development.


Jesse Lepich

Jesse is a Senior Security Solutions Architect at AWS based in Lake St. Louis, Missouri, focused on helping customers implement native AWS security services. Outside of cloud security, his interests include relaxing with family, barefoot waterskiing, snowboarding/snow skiing, surfing, boating/sailing, and mountain climbing.

How to restrict IAM roles to access AWS resources from specific geolocations using AWS Client VPN

Post Syndicated from Artem Lovan original https://aws.amazon.com/blogs/security/how-to-restrict-iam-roles-to-access-aws-resources-from-specific-geolocations-using-aws-client-vpn/

You can improve your organization’s security posture by enforcing access to Amazon Web Services (AWS) resources based on IP address and geolocation. For example, users in your organization might bring their own devices, which might require additional security authorization checks and posture assessment in order to comply with corporate security requirements. Enforcing access to AWS resources based on geolocation can help you to automate compliance with corporate security requirements by auditing the connection establishment requests. In this blog post, we walk you through the steps to allow AWS Identity and Access Management (IAM) roles to access AWS resources only from specific geographic locations.

Solution overview

AWS Client VPN is a managed client-based VPN service that enables you to securely access your AWS resources and your on-premises network resources. With Client VPN, you can access your resources from any location using an OpenVPN-based VPN client. A client VPN session terminates at the Client VPN endpoint, which is provisioned in your Amazon Virtual Private Cloud (Amazon VPC) and therefore enables a secure connection to resources running inside your VPC network.

This solution uses Client VPN to implement geolocation authentication rules. When a client VPN connection is established, authentication is implemented at the first point of entry into the AWS Cloud. It’s used to determine if clients are allowed to connect to the Client VPN endpoint. You configure an AWS Lambda function as the client connect handler for your Client VPN endpoint. You can use the handler to run custom logic that authorizes a new connection. When a user initiates a new client VPN connection, the custom logic is the point at which you can determine the geolocation of this user. In order to enforce geolocation authorization rules, you need:

  • AWS WAF to determine the user’s geolocation based on their IP address.
  • A Network address translation (NAT) gateway to be used as the public origin IP address for all requests to your AWS resources.
  • An IAM policy that is attached to the IAM role and validated by AWS when the request origin IP address matches the IP address of the NAT gateway.

One of the key features of AWS WAF is the ability to allow or block web requests based on country of origin. When the client connection handler Lambda function is invoked by your Client VPN endpoint, the Client VPN service invokes the Lambda function on your behalf. The Lambda function receives the device, user, and connection attributes. The user’s public IP address is one of the device attributes that are used to identify the user’s geolocation by using the AWS WAF geolocation feature. Only connections that are authorized by the Lambda function are allowed to connect to the Client VPN endpoint.

Note: The accuracy of the IP address to country lookup database varies by region. Based on recent tests, the overall accuracy for the IP address to country mapping is 99.8 percent. We recommend that you work with regulatory compliance experts to decide if your solution meets your compliance needs.

A NAT gateway allows resources in a private subnet to connect to the internet or other AWS services, but prevents a host on the internet from connecting to those resources. You must also specify an Elastic IP address to associate with the NAT gateway when you create it. Since an Elastic IP address is static, any request originating from a private subnet will be seen with a public IP address that you can trust because it will be the elastic IP address of your NAT gateway.

AWS Identity and Access Management (IAM) is a web service for securely controlling access to AWS services. You manage access in AWS by creating policies and attaching them to IAM identities (users, groups of users, or roles) or AWS resources. A policy is an object in AWS that, when associated with an identity or resource, defines their permissions. In an IAM policy, you can define the global condition key aws:SourceIp to restrict API calls to your AWS resources from specific IP addresses.

Note: Throughout this post, the user is authenticating with a SAML identity provider (IdP) and assumes an IAM role.

Figure 1 illustrates the authentication process when a user tries to establish a new Client VPN connection session.

Figure 1: Enforce connection to Client VPN from specific geolocations

Figure 1: Enforce connection to Client VPN from specific geolocations

Let’s look at how the process illustrated in Figure 1 works.

  1. The user device initiates a new client VPN connection session.
  2. The Client VPN service redirects the user to authenticate against an IdP.
  3. After user authentication succeeds, the client connects to the Client VPN endpoint.
  4. The Client VPN endpoint invokes the Lambda function synchronously. The function is invoked after device and user authentication, and before the authorization rules are evaluated.
  5. The Lambda function extracts the public-ip device attribute from the input and makes an HTTPS request to the Amazon API Gateway endpoint, passing the user’s public IP address in the X-Forwarded-For header.Because you’re using AWS WAF to protect API Gateway, and have geographic match conditions configured, a response with the status code 200 is returned only if the user’s public IP address originates from an allowed country of origin. Additionally, AWS WAF has another rule configured that blocks all requests to API Gateway if the request doesn’t originate from one of the NAT gateway IP addresses. Because Lambda is deployed in a VPC, it has a NAT gateway IP address, and therefore the request isn’t blocked by AWS WAF. To learn more about running a Lambda function in a VPC, see Configuring a Lambda function to access resources in a VPC.The following code example showcases Lambda code that performs the described step.

    Note: Optionally, you can implement additional controls by creating specific authorization rules. Authorization rules act as firewall rules that grant access to networks. You should have an authorization rule for each network for which you want to grant access. To learn more, see Authorization rules.

  6. The Lambda function returns the authorization request response to Client VPN.
  7. When the Lambda function—shown following—returns an allow response, Client VPN establishes the VPN session.
import os
import http.client

cloud_front_url = os.getenv("ENDPOINT_DNS")
endpoint = os.getenv("ENDPOINT")
success_status_codes = [200]

def build_response(allow, status):
    return {
        "allow": allow,
        "error-msg-on-failed-posture-compliance": "Error establishing connection. Please contact your administrator.",
        "posture-compliance-statuses": [status],
        "schema-version": "v1"

def handler(event, context):
    ip = event['public-ip']

    conn = http.client.HTTPSConnection(cloud_front_url)
    conn.request("GET", f'/{endpoint}', headers={'X-Forwarded-For': ip})
    r1 = conn.getresponse()

    status_code = r1.status

    if status_code in success_status_codes:
        print("User's IP is based from an allowed country. Allowing the connection to VPN.")
        return build_response(True, 'compliant')

    print("User's IP is NOT based from an allowed country. Blocking the connection to VPN.")
    return build_response(False, 'quarantined')

After the client VPN session is established successfully, the request from the user device flows through the NAT gateway. The originating source IP address is recognized, because it is the Elastic IP address associated with the NAT gateway. An IAM policy is defined that denies any request to your AWS resources that doesn’t originate from the NAT gateway Elastic IP address. By attaching this IAM policy to users, you can control which AWS resources they can access.

Figure 2 illustrates the process of a user trying to access an Amazon Simple Storage Service (Amazon S3) bucket.

Figure 2: Enforce access to AWS resources from specific IPs

Figure 2: Enforce access to AWS resources from specific IPs

Let’s look at how the process illustrated in Figure 2 works.

  1. A user signs in to the AWS Management Console by authenticating against the IdP and assumes an IAM role.
  2. Using the IAM role, the user makes a request to list Amazon S3 buckets. The IAM policy of the user is evaluated to form an allow or deny decision.
  3. If the request is allowed, an API request is made to Amazon S3.

The aws:SourceIp condition key is used in a policy to deny requests from principals if the origin IP address isn’t the NAT gateway IP address. However, this policy also denies access if an AWS service makes calls on a principal’s behalf. For example, when you use AWS CloudFormation to provision a stack, it provisions resources by using its own IP address, not the IP address of the originating request. In this case, you use aws:SourceIp with the aws:ViaAWSService key to ensure that the source IP address restriction applies only to requests made directly by a principal.

IAM deny policy

The IAM policy doesn’t allow any actions. What the policy does is deny any action on any resource if the source IP address doesn’t match any of the IP addresses in the condition. Use this policy in combination with other policies that allow specific actions.


Make sure that you have the following in place before you deploy the solution:

Implementation and deployment details

In this section, you create a CloudFormation stack that creates AWS resources for this solution. To start the deployment process, select the following Launch Stack button.

Select the Launch Stack button to launch the template

You also can download the CloudFormation template if you want to modify the code before the deployment.

The template in Figure 3 takes several parameters. Let’s go over the key parameters.

Figure 3: CloudFormation stack parameters

Figure 3: CloudFormation stack parameters

The key parameters are:

  • AuthenticationOption: Information about the authentication method to be used to authenticate clients. You can choose either AWS Managed Microsoft AD or IAM SAML identity provider for authentication.
  • AuthenticationOptionResourceIdentifier: The ID of the AWS Managed Microsoft AD directory to use for Active Directory authentication, or the Amazon Resource Number (ARN) of the SAML provider for federated authentication.
  • ServerCertificateArn: The ARN of the server certificate. The server certificate must be provisioned in ACM.
  • CountryCodes: A string of comma-separated country codes. For example: US,GB,DE. The country codes must be alpha-2 country ISO codes of the ISO 3166 international standard.
  • LambdaProvisionedConcurrency: Provisioned concurrency for the client connection handler. We recommend that you configure provisioned concurrency for the Lambda function to enable it to scale without fluctuations in latency.

All other input fields have default values that you can either accept or override. Once you provide the parameter input values and reach the final screen, choose Create stack to deploy the CloudFormation stack.

This template creates several resources in your AWS account, as follows:

  • A VPC and associated resources, such as InternetGateway, Subnets, ElasticIP, NatGateway, RouteTables, and SecurityGroup.
  • A Client VPN endpoint, which provides connectivity to your VPC.
  • A Lambda function, which is invoked by the Client VPN endpoint to determine the country origin of the user’s IP address.
  • An API Gateway for the Lambda function to make an HTTPS request.
  • AWS WAF in front of API Gateway, which only allows requests to go through to API Gateway if the user’s IP address is based in one of the allowed countries.
  • A deny policy with a NAT gateway IP addresses condition. Attaching this policy to a role or user enforces that the user can’t access your AWS resources unless they are connected to your client VPN.

Note: CloudFormation stack deployment can take up to 20 minutes to provision all AWS resources.

After creating the stack, there are two outputs in the Outputs section, as shown in Figure 4.

Figure 4: CloudFormation stack outputs

Figure 4: CloudFormation stack outputs

  • ClientVPNConsoleURL: The URL where you can download the client VPN configuration file.
  • IAMRoleClientVpnDenyIfNotNatIP: The IAM policy to be attached to an IAM role or IAM user to enforce access control.

Attach the IAMRoleClientVpnDenyIfNotNatIP policy to a role

This policy is used to enforce access to your AWS resources based on geolocation. Attach this policy to the role that you are using for testing the solution. You can use the steps in Adding IAM identity permissions to do so.

Configure the AWS client VPN desktop application

When you open the URL that you see in ClientVPNConsoleURL, you see the newly provisioned Client VPN endpoint. Select Download Client Configuration to download the configuration file.

Figure 5: Client VPN endpoint

Figure 5: Client VPN endpoint

Confirm the download request by selecting Download.

Figure 6: Client VPN Endpoint - Download Client Configuration

Figure 6: Client VPN Endpoint – Download Client Configuration

To connect to the Client VPN endpoint, follow the steps in Connect to the VPN. After a successful connection is established, you should see the message Connected. in your AWS Client VPN desktop application.

Figure 7: AWS Client VPN desktop application - established VPN connection

Figure 7: AWS Client VPN desktop application – established VPN connection


If you can’t establish a Client VPN connection, here are some things to try:

  • Confirm that the Client VPN connection has successfully established. It should be in the Connected state. To troubleshoot connection issues, you can follow this guide.
  • If the connection isn’t establishing, make sure that your machine has TCP port 35001 available. This is the port used for receiving the SAML assertion.
  • Validate that the user you’re using for testing is a member of the correct SAML group on your IdP.
  • Confirm that the IdP is sending the right details in the SAML assertion. You can use browser plugins, such as SAML-tracer, to inspect the information received in the SAML assertion.

Test the solution

Now that you’re connected to Client VPN, open the console, sign in to your AWS account, and navigate to the Amazon S3 page. Since you’re connected to the VPN, your origin IP address is one of the NAT gateway IPs, and the request is allowed. You can see your S3 bucket, if any exist.

Figure 8: Amazon S3 service console view - user connected to AWS Client VPN

Figure 8: Amazon S3 service console view – user connected to AWS Client VPN

Now that you’ve verified that you can access your AWS resources, go back to the Client VPN desktop application and disconnect your VPN connection. Once the VPN connection is disconnected, go back to the Amazon S3 page and reload it. This time you should see an error message that you don’t have permission to list buckets, as shown in Figure 9.

Figure 9: Amazon S3 service console view - user is disconnected from AWS Client VPN

Figure 9: Amazon S3 service console view – user is disconnected from AWS Client VPN

Access has been denied because your origin public IP address is no longer one of the NAT gateway IP addresses. As mentioned earlier, since the policy denies any action on any resource without an established VPN connection to the Client VPN endpoint, access to all your AWS resources is denied.

Scale the solution in AWS Organizations

With AWS Organizations, you can centrally manage and govern your environment as you grow and scale your AWS resources. You can use Organizations to apply policies that give your teams the freedom to build with the resources they need, while staying within the boundaries you set. By organizing accounts into organizational units (OUs), which are groups of accounts that serve an application or service, you can apply service control policies (SCPs) to create targeted governance boundaries for your OUs. To learn more about Organizations, see AWS Organizations terminology and concepts.

SCPs help you to ensure that your accounts stay within your organization’s access control guidelines across all your accounts within OUs. In particular, these are the key benefits of using SCPs in your AWS Organizations:

  • You don’t have to create an IAM policy with each new account, but instead create one SCP and apply it to one or more OUs as needed.
  • You don’t have to apply the IAM policy to every IAM user or role, existing or new.
  • This solution can be deployed in a separate account, such as a shared infrastructure account. This helps to decouple infrastructure tooling from business application accounts.

The following figure, Figure 10, illustrates the solution in an Organizations environment.

Figure 10: Use SCPs to enforce policy across many AWS accounts

Figure 10: Use SCPs to enforce policy across many AWS accounts

The Client VPN account is the account the solution is deployed into. This account can also be used for other networking related services. The SCP is created in the Organizations root account and attached to one or more OUs. This allows you to centrally control access to your AWS resources.

Let’s review the new condition that’s added to the IAM policy:

"ArnNotLikeIfExists": {
    "aws:PrincipalARN": [

The aws:PrincipalARN condition key allows your AWS services to communicate to other AWS services even though those won’t have a NAT IP address as the source IP address. For instance, when a Lambda function needs to read a file from your S3 bucket.

Note: Appending policies to existing resources might cause an unintended disruption to your application. Consider testing your policies in a test environment or to non-critical resources before applying them to production resources. You can do that by attaching the SCP to a specific OU or to an individual AWS account.


After you’ve tested the solution, you can clean up all the created AWS resources by deleting the CloudFormation stack.


In this post, we showed you how you can restrict IAM users to access AWS resources from specific geographic locations. You used Client VPN to allow users to establish a client VPN connection from a desktop. You used an AWS client connection handler (as a Lambda function), and API Gateway with AWS WAF to identify the user’s geolocation. NAT gateway IPs served as trusted source IPs, and an IAM policy protects access to your AWS resources. Lastly, you learned how to scale this solution to many AWS accounts with Organizations.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.


Artem Lovan

Artem is a Senior Solutions Architect based in New York. He helps customers architect and optimize applications on AWS. He has been involved in IT at many levels, including infrastructure, networking, security, DevOps, and software development.


Faiyaz Desai

Faiyaz leads a solutions architecture team supporting cloud-native customers in New York. His team guides customers in their modernization journeys through business and technology strategies, architectural best practices, and customer innovation. Faiyaz’s focus areas include unified communication, customer experience, network design, and mobile endpoint security.

Protect public clients for Amazon Cognito by using an Amazon CloudFront proxy

Post Syndicated from Mahmoud Matouk original https://aws.amazon.com/blogs/security/protect-public-clients-for-amazon-cognito-by-using-an-amazon-cloudfront-proxy/

In Amazon Cognito user pools, an app client is an entity that has permission to call unauthenticated API operations (that is, operations that don’t have an authenticated user), such as operations to sign up, sign in, and handle forgotten passwords. In this post, I show you a solution designed to protect these API operations from unwanted bots and distributed denial of service (DDoS) attacks.

To protect Amazon Cognito services and customers, Amazon Cognito applies request rate quotas on all API categories, and throttles rapid calls that exceed the assigned quota. For that reason, you must ensure your applications control who can call unauthenticated API operations and at what rate, so that user calls aren’t throttled because of unwanted or misconfigured clients that call these API operations at high rates.

App clients fall into one of two categories: public clients (used from web or mobile applications) and private or confidential clients (used from a secured backend). Public clients shouldn’t have secrets, because it isn’t possible to protect secrets in these types of clients. Confidential clients, on the other hand, use a secret to authorize calls to unauthenticated operations. In these clients, the secret can be protected in the backend.

The benefit of using a confidential app client with a secret in Amazon Cognito is that unauthenticated API operations will accept only the calls that include the secret hash for this client, and will drop calls with an invalid or missing secret. In this way, you control who calls these API operations. Public applications can use a confidential app client by implementing a lightweight proxy layer in front of the Amazon Cognito endpoint, and then using this proxy to add a secret hash in relevant requests before passing the requests to Amazon Cognito.

There are multiple options that you can use to implement this proxy. One option is to use Amazon CloudFront and [email protected] to add the secret hash to the incoming requests. When you use a CloudFront proxy, you can also use AWS WAF, which gives you tools to detect and block unwanted clients. From [email protected], you can also integrate with other services (like Amazon Fraud Detector or third-party bot detection services) to help you detect possible fraudulent requests and block them. The CloudFront proxy, with the right set of security tools, helps protect your Amazon Cognito user pool from unwanted clients.

Solution overview

To implement this lightweight proxy pattern, you need to create an application client with a secret. Unauthenticated API calls to this client must include the secret hash which is added to the request from the proxy layer. Client applications use an SDK like AWS Amplify, the Amazon Cognito Identity SDK, or a mobile SDK to communicate with Amazon Cognito. By default, the SDK sends requests to the Regional Amazon Cognito endpoint. Your application must override the default endpoint by manually adding an “Endpoint” property in the app configuration. See the Integrate the client application with the proxy section later in this post for more details.

Figure 1 shows how this works, step by step.

Figure 1: A proxy solution to the Amazon Cognito Regional endpoint

Figure 1: A proxy solution to the Amazon Cognito Regional endpoint

The workflow is as follows:

  1. You configure the client application (mobile or web client) to use a CloudFront endpoint as a proxy to an Amazon Cognito Regional endpoint. You also create an application client in Amazon Cognito with a secret. This means that any unauthenticated API call must have the secret hash.
  2. Clients that send unauthenticated API calls to the Amazon Cognito endpoint directly are blocked and dropped because of the missing secret.
  3. You use [email protected] to add a secret hash to the relevant incoming requests before passing them on to the Amazon Cognito endpoint.
  4. From [email protected], you must have the app client secret to be able to calculate the secret hash and add it to the request. It’s recommended that you keep the secret in AWS Secrets Manager and cache it for the lifetime of the function.
  5. You use AWS WAF with CloudFront distribution to enforce rate limiting, allow and deny lists, and other rule groups according to your security requirements.

When to use this pattern

It’s a best practice to use this proxy pattern with clients that use SDKs to integrate with Amazon Cognito user pools. Examples include mobile applications that use the iOS or Android SDK, or web applications that use client-side libraries like Amplify or the Amazon Cognito Identity SDK to integrate with Amazon Cognito.

You don’t need to use a proxy pattern with server-side applications that use an AWS SDK to integrate with Amazon Cognito user pools from a protected backend, because server-side applications can natively use confidential clients and protect the secret in the backend.

You can’t use this solution with applications that use Hosted UI and OAuth 2.0 endpoints to integrate with Amazon Cognito user pools. This includes federation scenarios where users sign in with an external identity provider (IdP).

Implementation and deployment details

Before you deploy this solution, you need a user pool and an application client that has the client secret. When you have these in place, choose the following Launch Stack button to launch a CloudFormation stack in your account and deploy the proxy solution.

Select the Launch Stack button to launch the template

Note: The CloudFormation stack must be created in the us-east-1 AWS Region, but the user pool itself can exist in any supported Region.

The template takes the parameters shown in Figure 2 below.

Figure 2: CloudFormation stack creation with initial parameters

Figure 2: CloudFormation stack creation with initial parameters

The parameters in Figure 2 include:

  • AdvancedSecurityEnabled is a flag that indicates whether advanced security is enabled in the user pool or not. This flag determines which version of the Lambda function is deployed. Notice that if you change this flag as part of a stack update, it overrides the function code, so if you have any manual changes, make sure to back up your changes.
  • AppClientSecret is the secret for your application client. This secret is stored in Secrets Manager and accessed from [email protected] as needed.
  • LambdaS3BucketName is the bucket that hosts the Lambda code package. You don’t need to change this parameter unless you have a requirement to modify or extend the solution with your own Lambda function.
  • RateLimit is the maximum number of calls from a single IP address that are allowed within a 5 minute period. Values between 100 requests and 20 million requests are valid for RateLimit.
  • Important: provide a value suitable for your application and security requirements.

  • UserPoolId is the ID of your user pool. This value is used by [email protected] when needed (for example, to call admin APIs, which require the user pool ID).
  • UserPoolRegion is the AWS Region where you created your user pool. This value is used to determine which Amazon Cognito Regional endpoint to proxy the calls to.

This template creates several resources in your AWS account, as follows:

  1. A CloudFront distribution that serves as a proxy to an Amazon Cognito Regional endpoint.
  2. An AWS WAF web access control list (ACL) with rules for the allow list, deny list, and rate limit.
  3. A Lambda function to be deployed at the edge and assigned to the origin request event.
  4. A secret in Secrets Manager, to hold the values of the application client secret and user pool ID.

After you create the stack, the CloudFront distribution domain name is available on the Outputs tab in the CloudFront console, as shown in Figure 3. This is the value that’s used as the Endpoint property in your client-side application. You can optionally add an alternative domain name to the CloudFront distribution if you prefer to use your own custom domain.

Figure 3: The output of the CloudFormation stack creation, displaying the CloudFront domain name

Figure 3: The output of the CloudFormation stack creation, displaying the CloudFront domain name

Use [email protected] to add a secret hash to the request

As explained earlier, the purpose of having this proxy is to be able to inject the secret hash in unauthenticated API calls before passing them to the Amazon Cognito endpoint. This injection is achieved by a Lambda function that intercepts incoming requests at the edge (the CloudFront distribution) before passing them to the origin (the Amazon Cognito Regional endpoint).

The Lambda function that is deployed to the edge has two versions. One is a simple pass-through proxy that only adds the secret hash, and this version is used if Amazon Cognito advanced security isn’t enabled. The other version is a proxy that uses the AdminInitiateAuth and AdminRespondToAuthChallenge API operations instead of unauthenticated API operations for the user authentication and challenge response. This allows the proxy layer to propagate the client IP address to the Amazon Cognito endpoint, which guides the adaptive authentication features of advanced security. The version that is deployed by the stack is determined by the AdvancedSecurityEnabled flag when you create or update the CloudFormation stack.

You can extend this solution by manually modifying the Lambda function with your own processing logic. For example, you can integrate with fraud detection or bot detection services to evaluate the request and decide to proceed or reject the call. Note that after making any change to the Lambda function code, you must deploy a new version to the edge location. To do that from the Lambda console, navigate to Actions, choose Deploy to [email protected], and then choose Use existing CloudFront trigger on this function.

Important: If you update the stack from CloudFormation and change the value of the AdvancedSecurityEnabled flag, the new value overrides the Lambda code with the default version for the choice. In that case, all manual changes are lost.

Allow or block requests

The template that is provided in this blog post creates a web ACL with three rules: AllowList, DenyList, and RateLimit. These rules are evaluated in order and determine which requests are allowed or blocked. The template also creates four IP sets, as shown in Figure 4, to hold the values of allowed or blocked IPs for both IPv4 and IPv6 address types.

Figure 4: The CloudFormation template creates IP sets in the AWS WAF console for allow and deny lists

Figure 4: The CloudFormation template creates IP sets in the AWS WAF console for allow and deny lists

If you want to always allow requests from certain clients, for example, trusted enterprise clients or server-side clients in cases where a large volume of requests is coming from the same IP address like a VPN gateway, add these IP addresses to the corresponding AllowList IP set. Similarly, if you want to always block traffic from certain IPs, add those IPs to the corresponding DenyList IP set.

Requests from sources that aren’t on the allow list or deny list are evaluated based on the volume of calls within 5 minutes, and sources that exceed the defined rate limit within 5 minutes are automatically blocked. If you want to change the defined rate limit, you can do so by updating the CloudFormation stack and providing a different value for the RateLimit parameter. Or you can modify this value directly in the AWS WAF console by editing the RateLimit rule.

Note: You can also use AWS Managed Rules for AWS WAF to add additional protection according to your security needs.

Integrate the client application with the proxy

You can integrate the client application with the proxy by changing the Endpoint in your client application to use the CloudFront distribution domain name. The domain name is located in the Outputs section of the CloudFormation stack.

You then need to edit your client-side code to forward calls to Amazon Cognito through the proxy endpoint. For example, if you’re using the Identity SDK, you should change this property as follows.

var poolData = {
  UserPoolId: '<USER-POOL-ID>',
  ClientId: '<APP-CLIENT-ID>',
  endpoint: 'https://<CF-DISTRIBUTION-DOMAIN>'

If you’re using AWS Amplify, you can change the endpoint in the aws-exports.js file by overriding the property aws_cognito_endpoint. Or, if you configure Amplify Auth in your code, you can provide the endpoint as follows.

  userPoolId: '<USER-POOL-ID>',
  userPoolWebClientId: '<APP-CLIENT-ID>',
  endpoint: 'https://<CF-DISTRIBUTION-DOMAIN>'

If you have a mobile application that uses the Amplify mobile SDK, you can override the endpoint in your configuration as follows (don’t include AppClientSecret parameter in your configuration). Note that the Endpoint value contains the domain name only, not the full URL. This feature is available in the latest releases of the iOS and Android SDKs.

"CognitoUserPool": {
  "Default": {
    "AppClientId": "<APP-CLIENT-ID>",
    "Endpoint": "<CF-DISTRIBUTION-DOMAIN>",
    "PoolId": "<USER-POOL-ID>",
    "Region": "<REGION>"

Warning: The Amplify CLI overwrites customizations to the awsconfiguration.json and amplifyconfiguration.json files if you do an amplify push or amplify pull operation. You must manually re-apply the Endpoint customization and remove the AppClientSecret if you use the CLI to modify your cloud backend.

Solution limitations

This solution has these limitations:

  • If advanced security features are enabled for the user pool, Amazon Cognito calculates risk for user events. If you use this proxy pattern, the IP address that is propagated in user events is the proxy IP address, which causes risk calculation for SignUp, ForgotPassword, and ResendCode events to be inaccurate. On the other hand, Sign-In events still have the client IP address propagated correctly, and risk calculation and adaptive authentication for Sign-In events aren’t affected by the use of this proxy.
  • This solution is not applicable to Hosted UI, OAuth 2.0 endpoints, and federation flows.
  • Authenticated and admin API operations (which require developer credentials or an access token) aren’t covered in this solution. These API operations don’t require a secret hash, and they use other authentication mechanisms.
  • Using this proxy solution with mobile apps requires an update to the application. The update might take time to be available in the relevant app store, and you must depend on end users to update their app. Plan ahead of time to use the solution with mobile apps.

How to detect unusual behavior

In this section, I share with you the steps to detect, quickly analyze and respond to unwanted clients. It’s a best practice to configure monitoring and alarms that help you to detect unexpected spikes in activity. Additionally, I show you how to be ready to quickly identify clients that are calling your resources at a higher-than-usual rate.

Monitor utilization compared to quotas

Amazon Cognito integrates with Service Quotas, which monitor service utilization compared to quotas. These metrics help you detect unexpected spikes and be alerted if you’re approaching your quota for a certain API category. Approaching your quota indicates that there is a risk that calls from legitimate users will be throttled.

To view utilization versus quota metrics

  1. In the Service Quotas console, choose Service Quotas, choose AWS Services, and then choose Amazon Cognito User Pools.
  2. Under Service quotas, enter the search term rate of. This shows you the list of API categories and the assigned quotas for each category.
    Figure 5: The Service Quotas console showing Amazon Cognito API category rate quotas

    Figure 5: The Service Quotas console showing Amazon Cognito API category rate quotas

  3. Choose any of the API categories to see utilization versus quota metrics.
    Figure 6: The Service Quotas console showing utilization vs quota metrics for Amazon Cognito UserCreation APIs

    Figure 6: The Service Quotas console showing utilization vs quota metrics for Amazon Cognito UserCreation APIs

  4. You can also create alarms from this page to alert you if utilization is above a pre-defined threshold. You can create alarms starting at 50 percent utilization. It’s recommended that you create multiple alarms, for example at the 50 percent, 70 percent, and 90 percent thresholds, and configure CloudWatch alarms as appropriate.
    Figure 7: Creating an alarm for the utilization of the UserCreation API category

    Figure 7: Creating an alarm for the utilization of the UserCreation API category

Analyze CloudTrail logs with Athena

If you detect an unexpected spike in traffic to a certain API category, the next step is to identify the sources of this spike. You can do that by using CloudTrail logs or, after you deploy and use this proxy solution, CloudFront logs as sources of information. You can then analyze these logs by using Amazon Athena queries.

The first step is to create Athena tables from CloudTrail and CloudFront logs. You can do that by following these steps for CloudTrail and similar steps for CloudFront. After you have these tables created, you can create a set of queries that help you identify unwanted clients. Here are a couple of examples:

  • Use the following query to identify clients with the highest call rate to the InitiateAuth API operation within the timeframe you noticed the spike (change the eventtime value to reflect the attack window).
    SELECT sourceipaddress, count(*)
    FROM "default"."cloudtrail_logs"
    WHERE eventname='InitiateAuth'
    AND eventtime >= '2021-03-01T00:00:00Z'and eventtime < '2021-03-31T00:00:00Z'
    GROUP BY sourceipaddress
    LIMIT 10

  • Use the following query to identify clients that come through CloudFront with the highest error rate.
    SELECT count(*) as count, request_ip
    FROM "default"."cloudfront_logs"
    WHERE status>500
    GROUP BY request_ip

After you identify sources that are calling your service with a higher-than-usual rate, you can block these clients by adding them to the DenyList IP set that was created in AWS WAF.

Analyze CloudTrail events with CloudWatch Logs Insights

It’s a best practice to configure your trail to send events to CloudWatch Logs. After you do this, you can interactively search and analyze your Amazon Cognito CloudTrail events with CloudWatch Logs Insights to identify errors, unusual activity, or unusual user behavior in your account.


In this post, I showed you how to implement a lightweight proxy to an Amazon Cognito endpoint, which can be used with an application client secret to control access to unauthenticated API operations. This approach, together with security tools such as AWS WAF, helps provide protection for these API operations from unwanted clients. I also showed you strategies to help detect an ongoing attack and quickly analyze, identify, and block unwanted clients.

For more strategies for DDoS mitigation, see the AWS Best Practices for DDoS Resiliency.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the Amazon Cognito forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.


Mahmoud Matouk

Mahmoud is a Senior Solutions Architect with the Amazon Cognito team. He helps AWS customers build secure and innovative solutions for various identity and access management scenarios.

Enforcing AWS CloudFormation scanning in CI/CD Pipelines at scale using Trend Micro Cloud One Conformity

Post Syndicated from Chris Dorrington original https://aws.amazon.com/blogs/devops/cloudformation-scanning-cicd-pipeline-cloud-conformity/

Integrating AWS CloudFormation template scanning into CI/CD pipelines is a great way to catch security infringements before application deployment. However, implementing and enforcing this in a multi team, multi account environment can present some challenges, especially when the scanning tools used require external API access.

This blog will discuss those challenges and offer a solution using Trend Micro Cloud One Conformity (formerly Cloud Conformity) as the worked example. Accompanying this blog is the end to end sample solution and detailed install steps which can be found on GitHub here.

We will explore explore the following topics in detail:

  • When to detect security vulnerabilities
    • Where can template scanning be enforced?
  • Managing API Keys for accessing third party APIs
    • How can keys be obtained and distributed between teams?
    • How easy is it to rotate keys with multiple teams relying upon them?
  • Viewing the results easily
    • How do teams easily view the results of any scan performed?
  • Solution maintainability
    • How can a fix or update be rolled out?
    • How easy is it to change scanner provider? (i.e. from Cloud Conformity to in house tool)
  • Enforcing the template validation
    • How to prevent teams from circumventing the checks?
  • Managing exceptions to the rules
    • How can the teams proceed with deployment if there is a valid reason for a check to fail?


When to detect security vulnerabilities

During the DevOps life-cycle, there are multiple opportunities to test cloud applications for best practice violations when it comes to security. The Shift-left approach is to move testing to as far left in the life-cycle, so as to catch bugs as early as possible. It is much easier and less costly to fix on a local developer machine than it is to patch in production.

Diagram showing Shift-left approach

Figure 1 – depicting the stages that an app will pass through before being deployed into an AWS account

At the very left of the cycle is where developers perform the traditional software testing responsibilities (such as unit tests), With cloud applications, there is also a responsibility at this stage to ensure there are no AWS security, configuration, or compliance vulnerabilities. Developers and subsequent peer reviewers looking at the code can do this by eye, but in this way it is hard to catch every piece of bad code or misconfigured resource.

For example, you might define an AWS Lambda function that contains an access policy making it accessible from the world, but this can be hard to spot when coding or peer review. Once deployed, potential security risks are now live. Without proper monitoring, these misconfigurations can go undetected, with potentially dire consequences if exploited by a bad actor.

There are a number of tools and SaaS offerings on the market which can scan AWS CloudFormation templates and detect infringements against security best practices, such as Stelligent’s cfn_nag, AWS CloudFormation Guard, and Trend Micro Cloud One Conformity. These can all be run from the command line on a developer’s machine, inside the IDE or during a git commit hook. These options are discussed in detail in Using Shift-Left to Find Vulnerabilities Before Deployment with Trend Micro Template Scanner.

Whilst this is the most left the testing can be moved, it is hard to enforce it this early on in the development process. Mandating that scan commands be integrated into git commit hooks or IDE tools can significantly increase the commit time and quickly become frustrating for the developer. Because they are responsible for creating these hooks or installing IDE extensions, you cannot guarantee that a template scan is performed before deployment, because the developer could easily turn off the scans or not install the tools in the first place.

Another consideration for very-left testing of templates is that when applications are written using AWS CDK or AWS Serverless Application Model (SAM), the actual AWS CloudFormation template that is submitted to AWS isn’t available in source control; it’s created during the build or package stage. Therefore, moving template scanning as far to the left is just not possible in these situations. Developers have to run a command such as cdk synth or sam package to obtain the final AWS CloudFormation templates.

If we now look at the far right of Figure 1, when an application has been deployed, real time monitoring of the account can pick up security issues very quickly. Conformity performs excellently in this area by providing central visibility and real-time monitoring of your cloud infrastructure with a single dashboard. Accounts are checked against over 400 best practices, which allows you to find and remediate non-compliant resources. This real time alerting is fast – you can be assured of an email stating non-compliance in no time at all! However, remediation does takes time. Following the correct process, a fix to code will need to go through the CI/CD pipeline again before a patch is deployed. Relying on account scanning only at the far right is sub-optimal.

The best place to scan templates is at the most left of the enforceable part of the process – inside the CI/CD pipeline. Conformity provides their Template Scanner API for this exact purpose. Templates can be submitted to the API, and the same Conformity checks that are being performed in real time on the account are run against the submitted AWS CloudFormation template. When integrated programmatically into a build, failing checks can prevent a deployment from occurring.

Whilst it may seem a simple task to incorporate the Template Scanner API call into a CI/CD pipeline, there are many considerations for doing this successfully in an enterprise environment. The remainder of this blog will address each consideration in detail, and the accompanying GitHub repo provides a working sample solution to use as a base in your own organization.


View failing checks as AWS CodeBuild test reports

Treating failing Conformity checks the same as unit test failures within the build will make the process feel natural to the developers. A failing unit test will break the build, and so will a failing Conformity check.

AWS CodeBuild provides test reporting for common unit test frameworks, such as NUnit, JUnit, and Cucumber. This allows developers to easily and very visually see what failing tests have occurred within their builds, allowing for quicker remediation than having to trawl through test log files. This same principle can be applied to failing Conformity checks—this allows developers to quickly see what checks have failed, rather than looking into AWS CodeBuild logs. However, the AWS CodeBuild test reporting feature doesn’t natively support the JSON schema that the Conformity Template Scanner API returns. Instead, you need custom code to turn the Conformity response into a usable format. Later in this blog we will explore how the conversion occurs.

Cloud conformity failed checks displayed as CodeBuild Reports

Figure 2 – Cloud Conformity failed checks appearing as failed test cases in AWS CodeBuild reports

Enterprise speed bumps

Teams wishing to use template scanning as part of their AWS CodePipeline currently need to create an AWS CodeBuild project that calls the external API, and then performs the custom translation code. If placed inside a buildspec file, it can easily become bloated with many lines of code, leading to maintainability issues arising as copies of the same buildspec file are distributed across teams and accounts. Additionally, third-party APIs such as Conformity are often authorized by an API key. In some enterprises, not all teams have access to the Conformity console, further compounding the problem for API key management.

Below are some factors to consider when implementing template scanning in the enterprise:

  • How can keys be obtained and distributed between teams?
  • How easy is it to rotate keys when multiple teams rely upon them?
  • How can a fix or update be rolled out?
  • How easy is it to change scanner provider? (i.e. From Cloud Conformity to in house tool)

Overcome scaling issues, use a centralized Validation API

An approach to overcoming these issues is to create a single AWS Lambda function fronted by Amazon API Gateway within your organization that runs the call to the Template Scanner API, and performs the transform of results into a format usable by AWS CodeBuild reports. A good place to host this API is within the Cloud Ops team account or similar shared services account. This way, you only need to issue one API key (stored in AWS Secrets Manager) and it’s not available for viewing by any developers. Maintainability for the code performing the Template Scanner API calls is also very easy, because it resides in one location only. Key rotation is now simple (due to only one key in one location requiring an update) and can be automated through AWS Secrets Manager

The following diagram illustrates a typical setup of a multi-account, multi-dev team scenario in which a team’s AWS CodePipeline uses a centralized Validation API to call Conformity’s Template Scanner.

architecture diagram central api for cloud conformity template scanning

Figure 3 – Example of an AWS CodePipeline utilizing a centralized Validation API to call Conformity’s Template Scanner


Providing a wrapper API around the Conformity Template Scanner API encapsulates the code required to create the CodeBuild reports. Enabling template scanning within teams’ CI/CD pipelines now requires only a small piece of code within their CodeBuild buildspec file. It performs the following three actions:

  1. Post the AWS CloudFormation templates to the centralized Validation API
  2. Write the results to file (which are already in a format readable by CodeBuild test reports)
  3. Stop the build if it detects failed checks within the results

The centralized Validation API in the shared services account can be hosted with a private API in Amazon API Gateway, fronted by a VPC endpoint. Using a private API denies any public access but does allow access from any internal address allowed by the VPC endpoint security group and endpoint policy. The developer teams can run their AWS CodeBuild validation phase within a VPC, thereby giving it access to the VPC endpoint.

A working example of the code required, along with an AWS CodeBuild buildspec file, is provided in the GitHub repository


Converting 3rd party tool results to CodeBuild Report format

With a centralized API, there is now only one place where the conversion code needs to reside (as opposed to copies embedded in each teams’ CodePipeline). AWS CodeBuild Reports are primarily designed for test framework outputs and displaying test case results. In our case, we want to display Conformity checks – which are not unit test case results. The accompanying GitHub repository to convert from Conformity Template Scanner API results, but we will discuss mappings between the formats so that bespoke conversions for other 3rd party tools, such as cfn_nag can be created if required.

AWS CodeBuild provides out of the box compatibility for common unit test frameworks, such as NUnit, JUnit and Cucumber. Out of the supported formats, Cucumber JSON is the most readable format to read and manipulate due to native support in languages such as Python (all other formats being in XML).

Figure 4 depicts where the Cucumber JSON fields will appear in the AWS CodeBuild reports page and Figure 5 below shows a valid Cucumber snippet, with relevant fields highlighted in yellow.

CodeBuild Reports page with fields highlighted that correspond to cucumber JSON fields

Figure 4 – AWS CodeBuild report test case field mappings utilized by Cucumber JSON



Cucumber JSON snippet showing CodeBuild Report field mappings

Figure 5 – Cucumber JSON with mappings to AWS CodeBuild report table


Note that in Figure 5, there are additional fields (eg. id, description etc) that are required to make the file valid Cucumber JSON – even though this data is not displayed in CodeBuild Reports page. However, raw reports are still available as AWS CodeBuild artifacts, and therefore it is useful to still populate these fields with data that could be useful to aid deeper troubleshooting.

Conversion code for Conformity results is provided in the accompanying GitHub repo, within file app.py, line 376 onwards


Making the validation phase mandatory in AWS CodePipeline

The Shift-Left philosophy states that we should shift testing as much as possible to the left. The furthest left would be before any CI/CD pipeline is triggered. Developers could and should have the ability to perform template validation from their own machines. However, as discussed earlier this is rarely enforceable – a scan during a pipeline deployment is the only true way to know that templates have been validated. But how can we mandate this and truly secure the validation phase against circumvention?

Preventing updates to deployed CI/CD pipelines

Using a centralized API approach to make the call to the validation API means that this code is now only accessible by the Cloud Ops team, and not the developer teams. However, the code that calls this API has to reside within the developer teams’ CI/CD pipelines, so that it can stop the build if failures are found. With CI/CD pipelines defined as AWS CloudFormation, and without any preventative measures in place, a team could move to disable the phase and deploy code without any checks performed.

Fortunately, there are a number of approaches to prevent this from happening, and to enforce the validation phase. We shall now look at one of them from the AWS CloudFormation Best Practices.

IAM to control access

Use AWS IAM to control access to the stacks that define the pipeline, and then also to the AWS CodePipeline/AWS CodeBuild resources within them.

IAM policies can generically restrict a team from updating a CI/CD pipeline provided to them if a naming convention is used in the stacks that create them. By using a naming convention, coupled with the wildcard “*”, these policies can be applied to a role even before any pipelines have been deployed..

For example, lets assume the pipeline depicted in Figure 6 is defined and deployed in AWS CloudFormation as follows:

  • Stack name is “cicd-pipeline-team-X”
  • AWS CodePipeline resource within the stack has logical name with prefix “CodePipelineCICD”
  • AWS CodeBuild Project for validation phase is prefixed with “CodeBuildValidateProject”

Creating an IAM policy with the statements below and attaching to the developer teams’ IAM role will prevent them from modifying the resources mentioned above. The AWS CloudFormation stack and resource names will match the wildcards in the statements and Deny the user to any update actions.

Example IAM policy highlighting how to deny updates to stacks and pipeline resources

Figure 6 – Example of how an IAM policy can restrict updates to AWS CloudFormation stacks and deployed resources


Preventing valid failing checks from being a bottleneck

When centralizing anything, and forcing developers to use tooling or features such as template scanners, it is imperative that it (or the team owning it) does not become a bottleneck and slow the developers down. This is just as true for our centralized API solution.

It is sometimes the case that a developer team has a valid reason for a template to yield a failing check. For instance, Conformity will report a HIGH severity alert if a load balancer does not have an HTTPS listener. If a team is migrating an older application which will only work on port 80 and not 443, the team may be able to obtain an exception from their cyber security team. It would not desirable to turn off the rule completely in the real time scanning of the account, because for other deployments this HIGH severity alert could be perfectly valid. The team faces an issue now because the validation phase of their pipeline will fail, preventing them from deploying their application – even though they have cyber approval to fail this one check.

It is imperative that when enforcing template scanning on a team that it must not become a bottleneck. Functionality and workflows must accompany such a pipeline feature to allow for quick resolution.

Screenshot of Trend Micro Cloud One Conformity rule from their website

Figure 7 – Screenshot of a Conformity rule from their website

Therefore the centralized validation API must provide a way to allow for exceptions on a case by case basis. Any exception should be tied to a unique combination of AWS account number + filename + rule ID, which ensures that exceptions are only valid for the specific instance of violation, and not for any other. This can be achieved by extending the centralized API with a set of endpoints to allow for exception request and approvals. These can then be integrated into existing or new tooling and workflows to be able to provide a self service method for teams to be able to request exceptions. Cyber security teams should be able to quickly approve/deny the requests.

The exception request/approve functionality can be implemented by extending the centralized private API to provide an /exceptions endpoint, and using DynamoDB as a data store. During a build and template validation, failed checks returned from Conformity are then looked up in the Dynamo table to see if an approved exception is available – if it is, then the check is not returned as a actual failing check, but rather an exempted check. The build can then continue and deploy to the AWS account.

Figure 8 and figure 9 depict the /exceptions endpoints that are provided as part of the sample solution in the accompanying GitHub repository.

screenshot of API gateway for centralized template scanner api

Figure 8 – Screenshot of API Gateway depicting the endpoints available as part of the accompanying solution


The /exceptions endpoint methods provides the following functionality:

Table containing HTTP verbs for exceptions endpoint

Figure 9 – HTTP verbs implementing exception functionality

Important note regarding endpoint authorization: Whilst the “validate” private endpoint may be left with no auth so that any call from within a VPC is accepted, the same is not true for the “exception” approval endpoint. It would be prudent to use AWS IAM authentication available in API Gateway to restrict approvals to this endpoint for certain users only (i.e. the cyber and cloud ops team only)

With the ability to raise and approve exception requests, the mandatory scanning phase of the developer teams’ pipelines is no longer a bottleneck.



Enforcing template validation into multi developer team, multi account environments can present challenges with using 3rd party APIs, such as Conformity Template Scanner, at scale. We have talked through each hurdle that can be presented, and described how creating a centralized Validation API and exception approval process can overcome those obstacles and keep the teams deploying without unwarranted speed bumps.

By shifting left and integrating scanning as part of the pipeline process, this can leave the cyber team and developers sure that no offending code is deployed into an account – whether they were written in AWS CDK, AWS SAM or AWS CloudFormation.

Additionally, we talked in depth on how to use CodeBuild reports to display the vulnerabilities found, aiding developers to quickly identify where attention is required to remediate.

Getting started

The blog has described real life challenges and the theory in detail. A complete sample for the described centralized validation API is available in the accompanying GitHub repo, along with a sample CodePipeline for easy testing. Step by step instructions are provided for you to deploy, and enhance for use in your own organization. Figure 10 depicts the sample solution available in GitHub.


NOTE: Remember to tear down any stacks after experimenting with the provided solution, to ensure ongoing costs are not charged to your AWS account. Notes on how to do this are included inside the repo Readme.


example codepipeline architecture provided by the accompanying github solution

Figure 10 depicts the solution available for use in the accompanying GitHub repository


Find out more

Other blog posts are available that cover aspects when dealing with template scanning in AWS:

For more information on Trend Micro Cloud One Conformity, use the links below.

Trend Micro AWS Partner Network joint image

Avatar for Chris Dorrington

Chris Dorrington

Chris Dorrington is a Senior Cloud Architect with AWS Professional Services in Perth, Western Australia. Chris loves working closely with AWS customers to help them achieve amazing outcomes. He has over 25 years software development experience and has a passion for Serverless technologies and all things DevOps


Automate resolution for IAM Access Analyzer cross-account access findings on IAM roles

Post Syndicated from Ramesh Balajepalli original https://aws.amazon.com/blogs/security/automate-resolution-for-iam-access-analyzer-cross-account-access-findings-on-iam-roles/

In this blog post, we show you how to automatically resolve AWS Identity and Access Management (IAM) Access Analyzer findings generated in response to unintended cross-account access for IAM roles. The solution automates the resolution by responding to the Amazon EventBridge event generated by IAM Access Analyzer for each active finding.

You can use identity-based policies and resource-based policies to granularly control access to a specific resource and how you use it across the entire AWS Cloud environment. It is important to ensure that policies you create adhere to your organization’s requirements on data/resource access and security best practices. IAM Access Analyzer is a feature that you can enable to continuously monitor policies for changes, and generate detailed findings related to access from external entities to your AWS resources.

When you enable Access Analyzer, you create an analyzer for your entire organization or your account. The organization or account you choose is known as the zone of trust for the analyzer. The zone of trust determines what type of access is considered trusted by Access Analyzer. Access Analyzer continuously monitors all supported resources to identify policies that grant public or cross-account access from outside the zone of trust, and generates findings. In this post, we will focus on an IAM Access Analyzer finding that is generated when an IAM role is granted access to an external AWS principal that is outside your zone of trust. To resolve the finding, we will show you how to automatically block such unintended access by adding explicit deny statement to the IAM role trust policy.


To ensure that the solution only prevents unintended cross account access for IAM roles, we highly recommend you to do the following within your AWS environment before deploying the solution described in the blog post:

Note: This solution adds an explicit deny in the IAM role trust policy to block the unintended access, which overrides any existing allow actions. We recommend that you carefully evaluate that this is the resolution action you want to apply.

Solution overview

To demonstrate this solution, we will take a scenario where you are asked to grant access to an external AWS account. In order to grant access, you create an IAM role named Audit_CrossAccountRole on your AWS account 123456789012. You grant permission to assume the role Audit_CrossAccountRole to an AWS principal named Alice in AWS account 999988887777, which is out-side of your AWS Organizations. The following is an example of the trust policy for the IAM role Audit_CrossAccountRole:

  "Version": "2012-10-17",
  "Statement": [
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::999988887777:user/Alice"
      "Action": "sts:AssumeRole",
      "Condition": {}

Assuming the principal arn:aws:iam::999988887777:user/Alice was not archived previously in Access Analyzer, you see an active finding in Access Analyzer as shown in Figure 1.

Figure 1: Sample IAM Access Analyzer finding in AWS Console

Figure 1: Sample IAM Access Analyzer finding in AWS Console

Typically, you will review this finding and determine whether this access is intended or not. If the access is unintended, you can block access to the principal 999988887777/Alice by adding an explicit deny to the IAM role trust policy, and then follow up with IAM role owner to find out if there is a reason to allow this cross-account access. If the access is intended, then you can create an archive rule that will archive the finding and suppress such findings in future.

We will walk through the solution to automate this resolution process in the remainder of this blog post.

Solution walkthrough

Access Analyzer sends an event to Amazon EventBridge for every active finding. This solution configures an event rule in EventBridge to match an active finding, and triggers a resolution AWS Lambda function. The Lambda function checks that the resource type in the finding is an IAM role, and then adds a deny statement to the associated IAM role trust policy as a resolution. The Lambda function also sends an email through Amazon Simple Notification Service (Amazon SNS) to the email address configured in the solution. The individual or group who receives the email can then review the automatic resolution and the IAM role. They can then decide either to remove the role for unintended access, or to delete the deny statement from the IAM trust policy and create an archive rule in Access Analyzer to suppress such findings in future.

Figure 2: Automated resolution followed by human review

Figure 2: Automated resolution followed by human review

Figure 2 shows the following steps of the resolution solution.

  1. Access Analyzer scans resources and generates findings based on the zone of trust and the archive rules configuration. The following is an example of an Access Analyzer active finding event sent to Amazon EventBridge:
        "version": "0",
        "id": "22222222-dcba-4444-dcba-333333333333",
        "detail-type": "Access Analyzer Finding",
        "source": "aws.access-analyzer",
        "account": "123456789012",
        "time": "2020-05-13T03:14:33Z",
        "region": "us-east-1",
        "resources": [
            "arn:aws:access-analyzer:us-east-1: 123456789012:analyzer/AccessAnalyzer"
        "detail": {
            "version": "1.0",
            "id": "a5018210-97c4-46c4-9456-0295898377b6",
            "status": "ACTIVE",
            "resourceType": "AWS::IAM::Role",
            "resource": "arn:aws:iam::123456789012:role/ Audit_CrossAccountRole",
            "createdAt": "2020-05-13T03:14:32Z",
            "analyzedAt": "2020-05-13T03:14:32Z",
            "updatedAt": "2020-05-13T03:14:32Z",
            "accountId": "123456789012",
            "region": "us-east-1",
            "principal": {
                "AWS": "aws:arn:iam::999988887777:user/Alice"
            "action": [
            "condition": {},
            "isDeleted": false,
            "isPublic": false

  2. EventBridge receives an event for the Access Analyzer finding, and triggers the AWS Lambda function based on the event rule configuration. The following is an example of the EventBridge event pattern to match active Access Analyzer findings:
      "source": [
      "detail-type": [
        "Access Analyzer Finding"
      "detail": { 
         "status": [ "ACTIVE" ],
    	"resourceType": [ "AWS::IAM:Role" ] 

  3. The Lambda function processes the event when ResourceType is equal to AWS::IAM::Role, as shown in the following example Python code:
    ResourceType = event['detail']['resourceType']
    ResourceType = "".join(ResourceType.split())
    if ResourceType == 'AWS::IAM::Role' :

    Then, the Lambda function adds an explicit deny statement in the trust policy of the IAM role where the Sid of the new statement references the Access Analyzer finding ID.

    def disable_iam_access(<resource_name>, <ext_arn>, <finding_id>):
            ext_arn = ext_arn.strip()
            policy = {
                "Sid": <finding_id>,
                "Effect": "Deny",
                "Principal": {
                    "AWS": <ext_arn>},
                "Action": "sts:AssumeRole"
            response = iam.get_role(RoleName=<resource_name>)
            current_policy = response['Role']['AssumeRolePolicyDocument']
            current_policy = current_policy['Statement'].append(policy)
            new_policy = json.dumps(response['Role']['AssumeRolePolicyDocument'])
            response = iam.update_assume_role_policy(
        except Exception as e:
            logger.error('Unable to update IAM Policy')

    As result, the IAM role trust policy looks like the following example:

            "Version": "2012-10-17",
            "Statement": [
                    "Effect": "Allow",
                    "Principal": {
            "AWS": "arn:aws:iam::999988887777:user/Alice"
                    "Action": "sts:AssumeRole",
                    "Sid": "22222222-dcba-4444-dcba-333333333333",
                    "Effect": "Deny",
                    "Principal": {
            "AWS": "arn:aws:iam::999988887777:user/Alice"
                    "Action": "sts:AssumeRole"

    Note: After Access Analyzer adds the deny statement, on the next scan, Access Analyzer finds the resource is no longer shared outside of your zone of trust. Access Analyzer changes the status of the finding to Resolved and the finding appears in the Resolved findings table.

  4. The Lambda function sends a notification to an SNS topic that sends an email to the configured email address (which should be the business owner or security team) subscribed to the SNS topic. The email notifies them that a specific IAM role has been blocked from the cross-account access. The following is an example of the SNS code for the notification.
    def send_notifications(sns_topic,principal, resource_arn, finding_id):
        sns_client = boto3.client("sns")
        message = "The IAM Role resource {} allows access to the principal {}. Trust policy for the role has been updated to deny the external access. Please review the IAM Role and its trust policy. If this access is intended, update the IAM Role trust policy to remove a statement with SID matching with the finding id {} and mark the finding as archived or create an archive rule. If this access is not intended then delete the IAM Role.". format(
            resource_arn, principal)
        subject = "Access Analyzer finding {} was automatically resolved ".format(finding_id)
        snsResponse = sns_client.publish(

    Figure 3 shows an example of the email notification.

    Figure 3: Sample resolution email generated by the solution

    Figure 3: Sample resolution email generated by the solution

  5. The security team or business owner who receives the email reviews the role and does one of the following steps:
    • If you find that the IAM role with cross-account access is intended then:Remove the deny statement added in the trust policy through AWS CLI or AWS Management Console. As mentioned above, the solution adds the Access Analyzer finding ID as Sid for the deny statement. The following command shows removing the deny statement for role_name through AWS CLI using the finding id available in the email notification.
      POLICY_DOCUMENT=`aws iam get-role --role-name '<role_name>' --query "Role.AssumeRolePolicyDocument.{Version: Version, Statement: Statement[?Sid!='<finding_id>']}"`
      aws iam update-assume-role-policy --role-name '<role_name>' --policy-document "$POLICY_DOCUMENT"

      Further, you can create an archive rule with criteria such as AWS Account ID, resource type, and principal, to automatically archive new findings that match the criteria.

    • If you find that the IAM role provides unintentional cross-account access then you may delete the IAM role. Also, you should investigate who created the IAM role by checking relevant AWS CloudTrail events like iam:createRole, so that you can plan for preventive actions.

Solution deployment

You can deploy the solution by using either the AWS Management Console or the AWS Cloud Development Kit (AWS CDK).

To deploy the solution by using the AWS Management Console

  1. In your AWS account, launch the template by choosing the Launch Stack button, which creates the stack the in us-east-1 Region.
    Select the Launch Stack button to launch the template
  2. On the Quick create stack page, for Stack name, enter a unique stack name for this account; for example, iam-accessanalyzer-findings-resolution, as shown in Figure 4.

    Figure 4: Deploy the solution using CloudFormation template

    Figure 4: Deploy the solution using CloudFormation template

  3. For NotificationEmail, enter the email address to receive notifications for any resolution actions taken by the solution.
  4. Choose Create stack.

Additionally, you can find the latest code on the aws-iam-permissions-guardrails GitHub repository, where you can also contribute to the sample code. The following procedure shows how to deploy the solution by using the AWS Cloud Development Kit (AWS CDK).

To deploy the solution by using the AWS CDK

  1. Install the AWS CDK.
  2. Deploy the solution to your account using the following commands:
    git clone [email protected]:aws-samples/aws-iam-permissions-guardrails.git
    cd aws-iam-permissions-guardrails/access-analyzer/iam-role-findings-resolution/ 
    cdk bootstrap
    cdk deploy --parameters NotificationEmail=<YOUR_EMAIL_ADDRESS_HERE>

After deployment, you must confirm the AWS Amazon SNS email subscription to get the notifications from the solution.

To confirm the email address for notifications

  1. Check your email inbox and choose Confirm subscription in the email from Amazon SNS.
  2. Amazon SNS opens your web browser and displays a subscription confirmation with your subscription ID.

To test the solution

Create an IAM role with a trust policy with another AWS account as principal that is neither part of archive rule nor within your zone of trust. Also, for this test, do not attach any permission policies to the IAM role. You will receive an email notification after a few minutes, similar to the one shown previously in Figure 3.

As a next step, review the resolution action as described in step 5 in the solution walkthrough section above.

Clean up

If you launched the solution in the AWS Management Console by using the Launch Stack button, you can delete the stack by navigating to CloudFormation console, selecting the specific stack by its name, and then clicking the Delete button.

If you deployed the solution using AWS CDK, you can perform the cleanup using the following CDK command from the local directory where the solution was cloned from GitHub.

cdk destroy

Cost estimate

Deploying the solution alone will not incur any costs, but there is a cost associated with the AWS Lambda execution and Amazon SNS notifications through email, when the findings generated by IAM Access Analyzer match the EventBridge event rule and the notifications are sent. AWS Lambda and Amazon SNS have perpetual free tier and you will be charged only when the usage goes beyond the free tier usage each month.


In this blog post, we showed you how to automate the resolution of unintended cross-account IAM roles using IAM Access Analyzer. As a resolution, this solution added a deny statement into the IAM role’s trust policy.

You can expand the solution to resolve Access Analyzer findings for Amazon S3 and KMS, by modifying the associated resource policies. You can also include capabilities like automating the rollback of the resolution if the role is intended, or introducing an approval workflow to resolve the finding to suit to your organization’s process requirements. Also, IAM Access Analyzer now enables you to preview and validate public and cross-account access before deploying permissions changes.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS IAM forum.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.


Ramesh Balajepalli

Ramesh is a Senior Solutions Architect at AWS. He enjoys working with customers to solve their technical challenges using AWS services. In his spare time, you can find him spending time with family and cooking.


Siva Rajamani

Siva is a Boston-based Enterprise Solutions Architect for AWS. Siva enjoys working closely with customers to accelerate their AWS cloud adoption and improve their overall security posture.


Sujatha Kuppuraju

Sujatha is a Senior Solutions Architect at AWS. She works with ISV customers to help design secured, scalable, and well-architected solutions on the AWS Cloud. She is passionate about solving complex business problems with the ever-growing capabilities of technology.

Automatically update AWS WAF IP sets with AWS IP ranges

Post Syndicated from Fola Bolodeoku original https://aws.amazon.com/blogs/security/automatically-update-aws-waf-ip-sets-with-aws-ip-ranges/

Note: This blog post describes how to automatically update AWS WAF IP sets with the most recent AWS IP ranges for AWS services. This related blog post describes how to perform a similar update for Amazon CloudFront IP ranges that are used in VPC Security Groups.

You can use AWS Managed Rules for AWS WAF to quickly create baseline protections for your web applications, including setting up lists of IP addresses to be blocked. In some cases, you might need to create an IP set in AWS WAF with the IP address ranges of Amazon Web Services (AWS) services that you use, so that traffic from these services is allowed. In this blog post, we provide a solution that automatically updates an AWS WAF IP set with the IP address ranges of the AWS services Amazon CloudFront, Amazon Route 53 health checks, and Amazon EC2 (and also the services that share the same IP address ranges, such as AWS Lambda, Amazon CloudWatch, and so on). These services are present in the AWS Managed Rules Anonymous IP list, and blocking them may cause inadvertent service impairment for applications that expect traffic from the services.

As an application owner, you can improve your security posture by using the Anonymous IP list in your AWS WAF web access control lists (web ACLs) to block source IP addresses from specific hosting providers and anonymization services, such as VPNs, proxies, and Tor nodes. Due to the generic nature of these rules, when you use the Anonymous IP list, you might want to exclude certain IPs from the list of IPs to be blocked, in order to allow web traffic from those sources. For example, you can allow traffic that originates from the AWS network.

Alternatively, you might want to permit only IP addresses from certain AWS services in a web ACL. This is a common requirement when you protect an Application Load Balancer by restricting all incoming traffic to CloudFront IP ranges. Creating your own custom list to allow expected traffic from the AWS network requires some effort, because you need to periodically update the list by using the IP ranges that we provide. With the solution we present here, you don’t have to manually manage the exclusion list. When the new AWS IP ranges are published, this solution will automatically fetch and update the list.

Note: This solution only works with AWS WAF, and will not work with AWS WAF Classic.

Solution overview

Figure 1 shows the solution architecture.

Figure 1: Automatic update process for service IPs

Figure 1: Automatic update process for service IPs

AWS sends Amazon Simple Notification Service (Amazon SNS) notifications to subscribers of the AmazonIPSpaceChanged SNS topic when updates are made to the public IP addresses for AWS services. This solution uses an AWS CloudFormation template to deploy an AWS Lambda function that is triggered by these SNS notifications. The function creates AWS WAF IP sets for IPv4 and IPv6 address ranges in your web ACL.

The solution workflow is as follows:

  1. In the CloudFormation template, you select the services that you want the AWS WAF IP set to be updated with.
  2. The template deploys the required AWS resources with the configuration that specifies what services to fetch from an AWS public IP address update.
  3. AWS Lambda function is manually invoked one first time to populate AWS WAF IP sets with selected IPs from AWS IP range.
  4. Once AWS IP range is updated, an Amazon SNS notification is sent to subscribers of the SNS topic.
  5. SNS notification triggers the AWS Lambda function.
  6. The Lambda function fetches the selected IP ranges and updates IP sets for IPv4 addresses and IPv6 addresses.
  7. The application owner adds a custom AWS WAF web ACL rule that uses the IP sets to allow traffic from the AWS services that you’ve selected. This way, the web ACL makes reference to always updated AWS WAF IP sets with no further action required from your side.

Solution prerequisites

The solution is automatically created when you deploy the AWS CloudFormation template that is available on the solution’s GitHub page. There are three resources that you must have in place before you deploy the template:

  • The Python code that will be used as the Lambda function.
    • Download the update_aws_waf_ipset.py Python code from the project’s AWS Lambda directory in GitHub. This function is responsible for constantly checking AWS IPs and making sure that your AWS WAF IP sets are always updated with the most recent set of IPs in use by the AWS service of choice.
  • An Amazon Simple Storage Service (Amazon S3) bucket that you will use to store the compressed Python code.
    • Compress the file to a .zip file and upload it to an Amazon Simple Storage Service (Amazon S3) bucket in the same AWS Region where you will deploy the template. For instructions on how to create an S3 bucket, see Creating a bucket.
  • An AWS WAF web ACL to filter requests that come in from trusted sources. The web ACL uses the IP sets that the solution creates and updates with the necessary IP addresses.

Deploy the AWS CloudFormation template

The CloudFormation template deploys the required resources for this solution in your account. The following resources are deployed:

  • Two AWS WAF IP sets, IPv4Set and IPv6Set that are used to store IPv4 and IPv6 IP addresses from the services you’re interested in allowing. Those IP sets are visible in the AWS WAF console under the same Region where the template is deployed.
    • Note: The IP address that appears in the template is a placeholder for the IP addresses that will be populated by the solution, and it is used for documentation purposes only.
  • The update_aws_waf_ipset.py Python code is used in an AWS Lambda function called UpdateWAFIPSet. This is the function that will read which services the solution should collects IPs from, and which IP sets should be populated. If you don’t change those parameters, the function will use default IP set suffixes. By default, the solution will select ROUTE53_HEALTHCHECKS and CLOUDFRONT as the services for which to download IPs. You can update the list of IP addresses as needed, by referring to the AWS IP JSON document for a list of service names and IP ranges.
  • A Lambda execution role with permissions restricted to least privilege required.
  • The Lambda function is automatically subscribed to the AmazonIPSpaceChanged SNS topic, which is responsible for monitoring changes in the list of AWS IPs.
  • A Lambda permission resource to allow the previously created SNS topic to invoke the template’s Lambda function.

Solution deployment through the console

You can download the AWS CloudFormation template, called template.yml, from the solution’s GitHub page.

After you’ve downloaded the template, access the CloudFormation console to create the stack. See the CloudFormation User Guide for instructions on selecting a downloaded template in the CloudFormation console to deploy a stack.

Note: The Region that you use when you deploy the template is where resources will be created.

On the Specify stack details page, you can enter the stack name, which will be the name used as a reference for resources created by the template, as well as six other stack parameters, shown in Figure 2.

Figure 2: Template parameters

Figure 2: Template parameters

The parameters are as follows:

  • EC2REGIONS – This is the Region that the solution will use as a reference when it updates its list of IPs. Select all for all Regions, but you can also specify a Region of interest.
  • IPV4SetNameSuffix – The solution will create an AWS WAF IPv4 IP set with the stack name as its name, but you can also add a suffix of your choice to the name.
  • IPV6SetNameSuffix – Like the AWS WAF IPv4 IP set, the IPv6 IP set can also have a suffix of your choice.
  • LambdaCodeS3Bucket – As mentioned in the Prerequisites section, you need to have previously uploaded the Lambda function Python code to an Amazon S3 bucket in the same Region where you’re deploying the stack. Enter the bucket name here, for example, mybucket.
  • LambdaCodeS3Object – Enter the name of the .zip file of the compressed Lambda function in the S3 bucket, for example, myfunction.zip.
  • SERVICES – Enter the list of AWS services for which you want the IP addresses populated in the AWS WAF IP sets. By default, this solution uses ROUTE53_HEALTHCHECKS and CLOUDFRONT, but you can change this parameter and add any service name, according to the list in the AWS IP ranges JSON.

After you deploy the template, its status will change to CREATE_COMPLETE.

Solution deployment through the AWS CLI

You can also deploy the solution template through the AWS Command Line Interface (AWS CLI). On the solution’s GitHub page, in the Setup section, follow the instructions for deploying the solution by using AWS CLI commands.

Note: To use the AWS CLI, you must have set it up in your environment. To set up the AWS CLI, follow the instructions in the AWS CLI installation documentation.

Invoke the Lambda function for the first time

After you successfully deploy the CloudFormation stack, it’s required that you run an initial Lambda invocation so that the AWS WAF IP sets are updated with AWS services IPs. This Lambda invocation is only required once, and after this initial call, the solution will handle future updates on your behalf.

To invoke this Lambda call through the AWS Management Console, open the Lambda console, select the Lambda function that was created by the template, and use the following event to create a test event. See Invoke the Lambda function in the AWS Lambda Developer Guide for step-by-step guidance on how to run a test event.

  "Records": [
      "EventVersion": "1.0",
      "EventSubscriptionArn": "arn:aws:sns:EXAMPLE",
      "EventSource": "aws:sns",
      "Sns": {
        "SignatureVersion": "1",
        "Timestamp": "1970-01-01T00:00:00.000Z",
        "Signature": "EXAMPLE",
        "SigningCertUrl": "EXAMPLE",
        "MessageId": "12345678-1234-1234-1234-123456789012",
        "Message": "{\"create-time\": \"yyyy-mm-ddThh:mm:ss+00:00\", \"synctoken\": \"0123456789\", \"md5\": \"test-hash\", \"url\": \"https://ip-ranges.amazonaws.com/ip-ranges.json\"}",
        "Type": "Notification",
        "UnsubscribeUrl": "EXAMPLE",
        "TopicArn": "arn:aws:sns:EXAMPLE",
        "Subject": "TestInvoke"

The success of the event will mean that the newly created AWS WAF IP sets now have the updated list of IPs from the services you’re working with.

You can also achieve Lambda function invocation through the AWS CLI by using the following command, where test_event.json is the test event I mentioned earlier.

aws lambda invoke \
  --function-name $CFN_STACK_NAME-UpdateWAFIPSets \
  --region $REGION \
  --payload file://lambda/test_event.json lambda_return.json

You can use the documentation for invoking a Lambda function in the AWS CLI to explore this command and its parameters.

After successful invocation, status code 200 is returned on the AWS CLI to illustrate that invocation happened as expected. At this point, the AWS WAF IP sets are updated.

Use the solution IP sets in your AWS WAF web ACL

Now the AWS WAF IPv4 and IPv6 IP sets are populated, and you can obtain the IP lists either by using the AWS WAF console, or by calling the GetIPSet API through the AWS CLI command get-ip-set.

To use AWS WAF IP sets in your web ACL, see Creating and managing an IP set in the AWS WAF Developer Guide. You can use these IP sets in the same web ACL or rule group that contains the AWS Managed Rules Anonymous IP list and is associated to the AWS resource that AWS WAF is protecting. AWS WAF evaluation order and solution positioning within WebACL will be discussed in later section.

To associate your web ACL with an AWS resource, see Associating or disassociating a web ACL with an AWS resource.

Validate the solution

To validate the solution, let’s consider a scenario where you would like to allow requests from CloudFront to come through, while blocking any other anonymous and hosting provider sources. In this scenario, consider the following requests that are filtered by AWS WAF.

In the first one, a customer has the AWSManagedRulesAnonymousIpList rule group, and a request coming from an Amazon EC2 instance IP is blocked.

    "timestamp": 1619175030566,
    "formatVersion": 1,
    "webaclId": "arn:aws:wafv2:eu-west-1:111122223333:regional/webacl/managedRuleValidation/11fd1e32-ae25-45f8-811f-3c1485f76ceb",
    "terminatingRuleId": "AWS-AWSManagedRulesAnonymousIpList",
    "terminatingRuleType": "MANAGED_RULE_GROUP",
    "action": "BLOCK",
    "ruleGroupList": [
            "ruleGroupId": "AWS#AWSManagedRulesAnonymousIpList",
            "terminatingRule": {
                "ruleId": "HostingProviderIPList",
                "action": "BLOCK",
                "ruleMatchDetails": null
    "httpRequest": {
        "clientIp": "",

In the second request, this time coming in from CloudFront, you can see that AWS WAF didn’t block the request.

    "timestamp": 1619175149405,
    "formatVersion": 1,
    "webaclId": "arn:aws:wafv2:eu-west-1:111122223333:regional/webacl/managedRuleValidation/11fd1e32-ae25-45f8-811f-3c1485f76ceb",
    "terminatingRuleId": "Default_Action",
    "terminatingRuleType": "REGULAR",
    "action": "ALLOW",
    "terminatingRuleMatchDetails": [],
    "httpRequest": {
       "clientIp": "",

To achieve this result, you need to edit AWSManagedRulesAnonymousIpList and add a scope-down statement so that the rule set only blocks requests that aren’t sent from sources within this solution’s IPv4 and IPv6 IP sets.

To create a scope-down statement for AWSManagedRulesAnonymousIpList

  1. In the AWS WAF console, access your web ACL.
  2. Open the Rules tab.
  3. Select AWSManagedRulesAnonymousIpList rule set, and then choose Edit.
  4. Choose the arrow next to Scope-down statement – optional. You will see two options, Rule visual editor and Rule JSON editor.
  5. Choose Rule JSON editor and enter the following JSON. Replacing <IPv4-IPSET-ARN> and <IPv6-IPSET-ARN> with respective IP sets’ Amazon Resource Numbers (ARNs).

Note: You can use the AWS WAF ListIPSets action or the list-ip-sets CLI command to obtain the IP set Amazon Resource Numbers (ARNs) and enter that information in the provided JSON.

  "NotStatement": {
    "Statement": {
      "OrStatement": {
        "Statements": [
            "IPSetReferenceStatement": {
              "ARN": "<IPv4-IPSET-ARN>"
            "IPSetReferenceStatement": {
              "ARN": "<IPv6-IPSET-ARN>"

After making this change, your rule editing page will look like the following.

Figure 3: AWSManagedRulesAnonymousIpList scope-down statement

Figure 3: AWSManagedRulesAnonymousIpList scope-down statement

When you set the rule priority, consider using the AWSManagedRulesAnonymousIpList rule group with a lower priority than other rules within the web ACL. This causes that rule group to be evaluated prior to rules that are configured with terminating actions (that is, Allow and Block actions). The scope-down statement will match the request and allow traffic from the IP addresses within the IP set, and pass every other IP on to the next rule for further evaluation. Figure 4 shows an example of the suggested priority.

Figure 4: Example with suggested use of AWS WAF web ACL priority

Figure 4: Example with suggested use of AWS WAF web ACL priority


This blog post provides you with a solution that is capable of automatically updating AWS WAF IP sets with the list of current IP ranges for one or more AWS services. You can use this solution in various ways, such as to allow requests from Amazon CloudFront when you’re using the AWS Managed Rules Anonymous IP List.

For best practices on AWS WAF implementation, see Guidelines for Implementing AWS WAF. For further reading on AWS WAF, see the AWS WAF Developer Guide.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.


Fola Bolodeoku

Fola is a Security Engineer on the AWS Shield Team, where he focuses on helping customers improve their application security posture against DDoS, and other application threats. When he is not working, he enjoys spending time on road trips in the Western Cape, and beyond.


Mario Pinho

Mário is a Security Engineer at AWS. He has a background in network engineering and consulting, and feels at his best when breaking apart complex topics and processes into their simpler components. In his free time, he pretends to be an artist by playing piano and doing landscape photography.


Davidson Junior

Davidson is a Cloud Support Engineer at AWS in Cape Town, South Africa. He is a subject matter expert in AWS WAF, and is focused on helping customers troubleshoot and protect their network in the cloud. Outside of work, he enjoys listening to music, outdoor photography, and hiking the Western Cape.

Build an end-to-end attribute-based access control strategy with AWS SSO and Okta

Post Syndicated from Louay Shaat original https://aws.amazon.com/blogs/security/build-an-end-to-end-attribute-based-access-control-strategy-with-aws-sso-and-okta/

This blog post discusses the benefits of using an attribute-based access control (ABAC) strategy and also describes how to use ABAC with AWS Single Sign-On (AWS SSO) when you’re using Okta as an identity provider (IdP).

Over the past two years, Amazon Web Services (AWS) has invested heavily in making ABAC available across the majority of our services. With ABAC, you can simplify your access control strategy by granting access to groups of resources, which are specified by tags, instead of managing long lists of individual resources. Each tag is a label that consists of a user-defined key and value, and you can use these to assign metadata to your AWS resources. Tags can help you manage, identify, organize, search for, and filter resources. You can create tags to categorize resources by purpose, owner, environment, or other criteria. To learn more about tags and AWS best practices for tagging, see Tagging AWS resources.

The ability to include tags in sessions—combined with the ability to tag AWS Identity and Access Management (IAM) users and roles—means that you can now incorporate user attributes from your identity provider as part of your tagging and authorization strategy. Additionally, user attributes help organizations to make permissions more intuitive, because the attributes are easier to relate to teams and functions. A tag that represents a team or a job function is easier to audit and understand.

For more information on ABAC in AWS, see our ABAC documentation.

Why use ABAC?

ABAC is a strategy that that can help organizations to innovate faster. Implementing a purely role-based access control (RBAC) strategy requires identity and security teams to define a large number of RBAC policies, which can lead to complexity and time delays. With ABAC, you can make use of attributes to build more dynamic policies that provide access based on matching the attribute conditions. AWS supports both RBAC and ABAC as co-existing strategies, so you can use ABAC alongside your existing RBAC strategy.

A good example that uses ABAC is the scenario where you have two teams that require similar access to their secrets in AWS Secrets Manager. By using ABAC, you can build a single role or policy with a condition based on the Department attribute from your IdP. When the user is authenticated, you can pass the Department attribute value and use a condition to provide access to resources that have the identical tag, as shown in the following code snippet. In this post, I show how to use ABAC for this example scenario.

"Condition": {
                "StringEquals": {
                    "secretsmanager:ResourceTag/Department": "${aws:PrincipalTag/Department}"

ABAC provides organizations with a more dynamic way of working with permissions. There are four main benefits for organizations that use ABAC:

  • Scale your permissions as you innovate: As developers create new project resources, administrators can require specific attributes to be applied when resources are created. This can include applying tags with attributes that give developers immediate access to the new resources they create, without requiring an update to their own permissions.
  • Help your teams to change and grow quickly: Because permissions are based on user attributes from a corporate identity source such as an IdP, changing user attributes in the IdP that you use for access control in AWS automatically updates your permissions in AWS.
  • Create fewer AWS SSO permission sets and IAM roles: With ABAC, multiple users who are using the same AWS SSO permission set and IAM role can still get unique permissions, because permissions are now based on user attributes. Administrators can author IAM policies that grant users access only to AWS resources that have matching attributes. This helps to reduce the number of IAM roles you need to create for various use cases in a single AWS account.
  • Efficiently audit who performed an action: By using attributes that are logged in AWS CloudTrail next to every action that is performed in AWS by using an IAM role, you can make it easier for security administrators to determine the identity that takes actions in a role session.


In this section, I describe some higher-level prerequisites for using ABAC effectively. ABAC in AWS relies on the use of tags for access-control decisions, so it’s important to have in place a tagging strategy for your resources. To help you develop an effective strategy, see the AWS Tagging Strategies whitepaper.

Organizations that implement ABAC can enhance the use of tags across their resources for the purpose of identity access. Making sure that tagging is enforced and secure is essential to an enterprise-wide strategy. For more information about enforcing a tagging policy, see the blog post Enforce Centralized Tag Compliance Using AWS Service Catalog, DynamoDB, Lambda, and CloudWatch Events.

You can use the service AWS Resource Groups to identify untagged resources and to find resources to tag. You can also use Resource Groups to remediate untagged resources.

Use AWS SSO with Okta as an IdP

AWS SSO gives you an efficient way to centrally manage access to multiple AWS accounts and business applications, and to provide users with single sign-on access to all their assigned accounts and applications from one place. With AWS SSO, you can manage access and user permissions to all of your accounts in AWS Organizations centrally. AWS SSO configures and maintains all the necessary permissions for your accounts automatically, without requiring any additional setup in the individual accounts.

AWS SSO supports access control attributes from any IdP. This blog post focuses on how you can use ABAC attributes with AWS SSO when you’re using Okta as an external IdP.

Use other single sign-on services with ABAC

This post describes how to turn on ABAC in AWS SSO. To turn on ABAC with other federation services, see these links:

Implement the solution

Follow these steps to set up Okta as an IdP in AWS SSO and turn on ABAC.

To set up Okta and turn on ABAC

  1. Set up Okta as an IdP for AWS SSO. To do so, follow the instructions in the blog post Single Sign-On Between Okta Universal Directory and AWS. For more information on the supported actions in AWS SSO with Okta, see our documentation.
  2. Enable attributes for access control (in other words, turn on ABAC) in AWS SSO by using these steps:
    1. In the AWS Management Console, navigate to AWS SSO in the AWS Region you selected for your implementation.
    2. On the Dashboard tab, select Choose your identity source.
    3. Next to Attributes for access control, choose Enable.

      Figure 1: Turn on ABAC in AWS SSO

      Figure 1: Turn on ABAC in AWS SSO

    You should see the message “Attributes for access control has been successfully enabled.”

  3. Enable updates for user attributes in Okta provisioning. Now that you’ve turned on ABAC in AWS SSO, you need to verify that automatic provisioning for Okta has attribute updates enabled.Log in to Okta as an administrator and locate the application you created for AWS SSO. Navigate to the Provisioning tab, choose Edit, and verify that Update User Attributes is enabled.

    Figure 2: Enable automatic provisioning for ABAC updates

    Figure 2: Enable automatic provisioning for ABAC updates

  4. Configure user attributes in Okta for use in AWS SSO by following these steps:
    1. From the same application that you created earlier, navigate to the Sign On tab.
    2. Choose Edit, and then expand the Attributes (optional) section.
    3. In the Attribute Statements (optional) section, for each attribute that you will use for access control in AWS SSO, do the following:
      1. For Name, enter https://aws.amazon.com/SAML/Attributes/AccessControl:<AttributeName>. Replace <AttributeName> with the name of the attribute you’re expecting in AWS SSO, for example https://aws.amazon.com/SAML/Attributes/AccessControl:Department.
      2. For Name Format, choose URI reference.
      3. For Value, enter user.<AttributeName>. Replace <AttributeName> with the Okta default user profile variable name, for example user.department. To view the Okta default user profile, see these instructions.


    Figure 3: Configure two attributes for users in Okta

    Figure 3: Configure two attributes for users in Okta

    In the example shown here, I added two attributes, Department and Division. The result should be similar to the configuration shown in Figure 3.

  5. Add attributes to your users by using these steps:
    1. In your Okta portal, log in as administrator. Navigate to Directory, and then choose People.
    2. Locate a user, navigate to the Profile tab, and then choose Edit.
    3. Add values to the attributes you selected.
    Figure 4: Addition of user attributes in Okta

    Figure 4: Addition of user attributes in Okta

  6. Confirm that attributes are mapped. Because you’ve enabled automatic provisioning updates from Okta, you should be able to see the attributes for your user immediately in AWS SSO. To confirm this:
    1. In the console, navigate to AWS SSO in the Region you selected for your implementation.
    2. On the Users tab, select a user that has attributes from Okta, and select the user. You should be able to see the attributes that you mapped from Okta.
    Figure 5: User attributes in Okta

    Figure 5: User attributes in Okta

Now that you have ABAC attributes for your users in AWS SSO, you can now create permission sets based on those attributes.

Note: Step 4 ensures that users will not be successfully authenticated unless the attributes configured are present. If you don’t want this enforcement, do not perform step 4.

Build an ABAC permission set in AWS SSO

For demonstration purposes, I’ll show how you can build a permission set that is based on ABAC attributes for AWS Secrets Manager. The permission set will match resource tags to user tags, in order to control which resources can be managed by Secrets Manager administrators. You can apply this single permission set to multiple teams.

To build the ABAC permission set

  1. In the console, navigate to AWS SSO, and choose AWS Accounts.
  2. Choose the Permission sets tab.
  3. Choose Create permission set, and then choose Create a custom permission set.
  4. Fill in the fields as follows.
    1. For Name, enter a name for your permission set that will be visible to your users, for example, SecretsManager-Profile.
    2. For Description, enter ABAC SecretsManager Profile.
    3. Select the appropriate session duration.
    4. For Relay State, for my example I will enter the URL for Secrets Manager: https://console.aws.amazon.com/secretsmanager/home. This will give a better user experience when the user signs in to AWS SSO, with an automatic redirect to the Secrets Manager console.
    5. For the field What policies do you want to include in your permission set?, choose Create a custom permissions policy.
    6. Under Create a custom permissions policy, paste the following policy.
          "Version": "2012-10-17",
          "Statement": [
                  "Sid": "SecretsManagerABAC",
                  "Effect": "Allow",
                  "Action": [
                  "Resource": "*",
                  "Condition": {
                      "StringEquals": {
                          "secretsmanager:ResourceTag/Department": "${aws:PrincipalTag/Department}"
                  "Sid": "NeededPermissions",
                  "Effect": "Allow",
                  "Action": [
                  "Resource": "*"

    This policy grants users the ability to create and list secrets that belong to their department. The policy is configured to allow Secrets Manager users to manage only the resources that belong to their department. You can modify this policy to perform matching on more attributes, in order to have more granular permissions.

    Note: The RDS permissions in the policy enable users to select an RDS instance for the secret and the Lambda Permissions are to enable custom key rotation.

    If you look closely at the condition

    “secretsmanager:ResourceTag/Department”: “${aws:PrincipalTag/Department}”

    …the condition states that the user can only access Secrets Manager resources that have a Department tag, where the value of that tag is identical to the value of the Department tag from the user.

  5. Choose Next: Tags.
  6. Tag your permission set. For my example, I’ll add Key: Service and Value: SecretsManager.
  7. Choose Next: Review and create.
  8. Assign the permission set to a user or group and to the appropriate accounts that you have in AWS Organizations.

Test an ABAC permission set

Now you can test the ABAC permission set that you just created for Secrets Manager.

To test the ABAC permission set

  1. In the AWS SSO console, on the Dashboard page, navigate to the User Portal URL.
  2. Sign in as a user who has the attributes that you configured earlier in AWS SSO. You will assume the permission set that you just created.
  3. Choose Management console. This will take you to the console that you specified in the Relay State setting for the permission set, which in my example is the Secrets Manager console.

    Figure 6: AWS SSO ABAC profile access

    Figure 6: AWS SSO ABAC profile access

  4. Try to create a secret with no tags:
    1. Choose Store a new secret.
    2. Choose Other type of secrets.
    3. You can add any values you like for the other options, and then choose Next.
    4. Give your secret a name, but don’t add any tags. Choose Next.
    5. On the Configure automatic rotation page, choose Next, and then choose Store.

    You should receive an error stating that the user failed to create the secret, because the user is not authorized to perform the secretsmanager:CreateSecret action.

    Figure 7: Failure to create a secret (no attributes)

    Figure 7: Failure to create a secret (no attributes)

  5. Choose Previous twice, and then add the appropriate tag. For my example, I’ll add a tag with the key Department and the value Serverless.

    Figure 8: Adding tags for a secret

    Figure 8: Adding tags for a secret

  6. Choose Next twice, and then choose Store. You should see a message that your secret creation was successful.

    Figure 9: Successful secret creation

    Figure 9: Successful secret creation

Now administrators who assume this permission set can view, create, and manage only the secrets that belong to their team or department, based on the tags that you defined. You can reuse this permission set across a large number of teams, which can reduce the number of permission sets you need to create and manage.


In this post, I’ve talked about the benefits organizations can gain from embracing an ABAC strategy, and walked through how to turn on ABAC attributes in Okta and AWS SSO. I’ve also shown how you can create ABAC-driven permission sets to simplify your permission set management. For more information on AWS services that support ABAC—in other words, authorization based on tags—see our updated AWS services that work with IAM page.

If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS Single Sign-On forum.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.


Louay Shaat

Louay is a Security Solutions Architect with AWS. He spends his days working with customers, from startups to the largest of enterprises helping them build cool new capabilities and accelerating their cloud journey. He has a strong focus on security and automation helping customers improve their security, risk, and compliance in the cloud.

CloudHSM best practices to maximize performance and avoid common configuration pitfalls

Post Syndicated from Esteban Hernández original https://aws.amazon.com/blogs/security/cloudhsm-best-practices-to-maximize-performance-and-avoid-common-configuration-pitfalls/

AWS CloudHSM provides fully-managed hardware security modules (HSMs) in the AWS Cloud. CloudHSM automates day-to-day HSM management tasks including backups, high availability, provisioning, and maintenance. You’re still responsible for all user management and application integration.

In this post, you will learn best practices to help you maximize the performance of your workload and avoid common configuration pitfalls in the following areas:

Administration of CloudHSM

The administration of CloudHSM includes those tasks necessary to correctly set up your CloudHSM cluster, and to manage your users and keys in a secure and efficient manner.

Initialize your cluster with a customer key pair

To initialize a new CloudHSM cluster, you will first create a new RSA key pair, which we will call the customer key pair. First, generate a self-signed certificate using the customer key pair. Then, you sign the cluster’s certificate by using the customer public key as described in Initialize the Cluster section in the AWS CloudHSM User Guide. The resulting signed cluster certificate, as shown in Figure 1, identifies your CloudHSM cluster as yours.

Figure 1: CloudHSM key hierarchy and customer generated keys

Figure 1: CloudHSM key hierarchy and customer generated keys

It’s important to use best practices when you generate and store the customer private key. The private key is a binding secret between you and your cluster, and cannot be rotated. We therefore recommend that you create the customer private key in an offline HSM and store the HSM securely. Any entity (organization, person, system) that demonstrates possession of the customer private key will be considered an owner of the cluster and the data it contains. In this procedure, you are using the customer private key to claim a new cluster, but in the future you could also use it to demonstrate ownership of the cluster in scenarios such as cloning and migration.

Manage your keys with crypto user (CU) accounts

The HSMs provided by CloudHSM support different types of HSM users, each with specific entitlements. Crypto users (CUs) generate, manage, and use keys. If you’ve worked with HSMs in the past, you can think of CUs as similar to partitions. However, CU accounts are more flexible. The CU that creates a key owns the key, and can share it with other CUs. The shared key can be used for operations in accordance with the key’s attributes, but the CU that the key was shared with cannot manage it – that is, they cannot delete, wrap, or re-share the key.

From a security standpoint, it is a best practice for you to have multiple CUs with different scopes. For example, you can have different CUs for different classes of keys. As another example, you can have one CU account to create keys, and then share these keys with one or more CU accounts that your application leverages to utilize keys. You can also have multiple shared CU accounts, to simplify rotation of credentials in production applications.

Warning: You should be careful when deleting CU accounts. If the owner CU account for a key is deleted, the key can no longer be used. You can use the cloudhsm_mgmt_util tool command findAllKeys to identify which keys are owned by a specified CU. You should rotate these keys before deleting a CU. As part of your key generation and rotation scheme, consider using labels to identify current and legacy keys.

Manage your cluster by using crypto officer (CO) accounts

Crypto officers (COs) can perform user management operations including change password, create user, and delete user. COs can also set and modify cluster policies.

Important: When you add or remove a user, or change a password, it’s important to ensure that you connect to all the HSMs in a cluster, to keep them synchronized and avoid inconsistencies that can result in errors. It is a best practice to use the Configure tool with the –m option to refresh the cluster configuration file before making mutating changes to the cluster. This helps to ensure that all active HSMs in the cluster are properly updated, and prevents the cluster from becoming desynchronized. You can learn more about safe management of your cluster in the blog post Understanding AWS CloudHSM Cluster Synchronization. You can verify that all HSMs in the cluster have been added by checking the /opt/cloudhsm/etc/cloudhsm_mgmt_util.cfg file.

After a password has been set up or updated, we strongly recommend that you keep a record in a secure location. This will help you avoid lockouts due to erroneous passwords, because clients will fail to log in to HSM instances that do not have consistent credentials. Depending on your security policy, you can use AWS Secrets Manager, specifying a customer master key created in AWS Key Management Service (KMS), to encrypt and distribute your secrets – secrets in this case being the CU credentials used by your CloudHSM clients.

Use quorum authentication

To prevent a single CO from modifying critical cluster settings, a best practice is to use quorum authentication. Quorum authentication is a mechanism that requires any operation to be authorized by a minimum number (M) of a group of N users and is therefore also known as M of N access control.

To prevent lock-outs, it’s important that you have at least two more COs than the M value you define for the quorum minimum value. This ensures that if one CO gets locked out, the others can safely reset their password. Also be careful when deleting users, because if you fall under the threshold of M, you will be unable to create new users or authorize any other operations and will lose the ability to administer your cluster.

If you do fall below the minimum quorum required (M), or if all of your COs end up in a locked-out state, you can revert to a previously known good state by restoring from a backup to a new cluster. CloudHSM automatically creates at least one backup every 24 hours. Backups are event-driven. Adding or removing HSMs will trigger additional backups.


CloudHSM is a fully managed service, but it is deployed within the context of an Amazon Virtual Private Cloud (Amazon VPC). This means there are aspects of the CloudHSM service configuration that are under your control, and your choices can positively impact the resilience of your solutions built using CloudHSM. The following sections describe the best practices that can make a difference when things don’t go as expected.

Use multiple HSMs and Availability Zones to optimize resilience

When you’re optimizing a cluster for high availability, one of the aspects you have control of is the number of HSMs in the cluster and the Availability Zones (AZs) where the HSMs get deployed. An AZ is one or more discrete data centers with redundant power, networking, and connectivity in an AWS Region, which can be formed of multiple physical buildings, and have different risk profiles between them. Most of the AWS Regions have three Availability Zones, and some have as many as six.

AWS recommends placing at least two HSMs in the cluster, deployed in different AZs, to optimize data loss resilience and improve the uptime in case an individual HSM fails. As your workloads grow, you may want to add extra capacity. In that case, it is a best practice to spread your new HSMs across different AZs to keep improving your resistance to failure. Figure 2 shows an example CloudHSM architecture using multiple AZs.

Figure 2: CloudHSM architecture using multiple AZs

Figure 2: CloudHSM architecture using multiple AZs

When you create a cluster in a Region, it’s a best practice to include subnets from every available AZ of that Region. This is important, because after the cluster is created, you cannot add additional subnets to it. In some Regions, such as Northern Virginia (us-east-1), CloudHSM is not yet available in all AZs at the time of writing. However, you should still include subnets from every AZ, even if CloudHSM is currently not available in that AZ, to allow your cluster to use those additional AZs if they become available.

Increase your resiliency with cross-Region backups

If your threat model involves a failure of the Region itself, there are steps you can take to prepare. First, periodically create copies of the cluster backup in the target Region. You can see the blog post How to clone an AWS CloudHSM cluster across regions to find an extensive description of how to create copies and deploy a clone of an active CloudHSM cluster.

As part of your change management process, you should keep copies of important files, such as the files stored in /opt/cloudhsm/etc/. If you customize the certificates that you use to establish communication with your HSM, you should back up those certificates as well. Additionally, you can use configuration scripts with the AWS Systems Manager Run Command to set up two or more client instances that use exactly the same configuration in different Regions.

The managed backup retention feature in CloudHSM automatically deletes out-of-date backups for an active cluster. However, because backups that you copy across Regions are not associated with an active cluster, they are not in scope of managed backup retention and you must delete out-of-date backups yourself. Backups are secure and contain all users, policies, passwords, certificates and keys for your HSM, so it’s important to delete older backups when you rotate passwords, delete a user, or retire keys. This ensures that you cannot accidentally bring older data back to life by creating a new cluster that uses outdated backups.

The following script shows you how to delete all backups older than a certain point in time. You can also download the script from S3.

#!/usr/bin/env python

# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0
# Reference Links:
# https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior
# https://docs.python.org/3/library/re.html
# https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/cloudhsmv2.html#CloudHSMV2.Client.describe_backups
# https://docs.python.org/3/library/datetime.html#datetime-objects
# https://pypi.org/project/typedate/
# https://pypi.org/project/pytz/

import boto3, time, datetime, re, argparse, typedate, json

def main():
    bkparser = argparse.ArgumentParser(prog='backdel',
                                    usage='%(prog)s [-h] --region --clusterID [--timestamp] [--timezone] [--deleteall] [--dryrun]',
                                    description='Deletes CloudHSMv2 backups from a given point in time\n')
                    help='region where the backups are stored',
                    help='CloudHSMv2 cluster_id for which you want to delete backups',
                    help="Enter the timestamp to filter the backups that should be deleted:\n   Backups older than the timestamp will be deleted.\n  Timestamp ('MM/DD/YY', 'MM/DD/YYYY' or 'MM/DD/YYYY HH:mm')",
                    help="Enter the timezone to adjust the timestamp.\n Example arguments:\n --timezone '-0200' , --timezone '05:00' , --timezone GMT #If the pytz module has been installed  ",
                    help="Set this flag to simulate the deletion",
                    help="Set this flag to delete all the back ups for the specified cluster",
    args = bkparser.parse_args()
    client = boto3.client('cloudhsmv2', args.region)
    cluster_id = args.clusterID 
    timestamp_str = args.timestamp 
    timezone = args.timezone
    dry_true = args.dryrun
    delall_true = args.deleteall
    delete_all_backups_before(client, cluster_id, timestamp_str, timezone, dry_true, delall_true)

def delete_all_backups_before(client, cluster_id, timestamp_str, timezone, dry_true, delall_true, max_results=25):
    timestamp_datetime = None
    if delall_true == True and not timestamp_str:
        print("\nAll backups will be deleted...\n")
    elif delall_true == True and timestamp_str:
        print("\nUse of incompatible instructions: --timestamp  and --deleteall cannot be used in the same invocation\n")
    elif not timestamp_str :
        print("\nParameter missing: --timestamp must be defined\n")
    else :
        # Valid formats: 'MM/DD/YY', 'MM/DD/YYYY' or 'MM/DD/YYYY HH:mm'
        if re.match(r'^\d\d/\d\d/\d\d\d\d \d\d:\d\d$', timestamp_str):
                timestamp_datetime = datetime.datetime.strptime(timestamp_str, "%m/%d/%Y %H:%M")
            except Exception as e:
                print("Exception: %s" % str(e))
        elif re.match(r'^\d\d/\d\d/\d\d\d\d$', timestamp_str):
                timestamp_datetime = datetime.datetime.strptime(timestamp_str, "%m/%d/%Y")
            except Exception as e:
                print("Exception: %s" % str(e))
        elif re.match(r'^\d\d/\d\d/\d\d$', timestamp_str):
                timestamp_datetime = datetime.datetime.strptime(timestamp_str, "%m/%d/%y")
            except Exception as e:
                print("Exception: %s" % str(e))
            print("The format of the specified timestamp is not supported by this script. Aborting...")

        print("Backups older than %s will be deleted...\n" % timestamp_str)

        response = client.describe_backups(MaxResults=max_results, Filters={"clusterIds": [cluster_id]}, SortAscending=True)
    except Exception as e:
        print("DescribeBackups failed due to exception: %s" % str(e))

    failed_deletions = []
    while True:
        if 'Backups' in response.keys() and len(response['Backups']) > 0:
            for backup in response['Backups']:
                if timestamp_str and not delall_true:
                    if timezone != None:
                        timestamp_datetime = timestamp_datetime.replace(tzinfo=timezone)
                        timestamp_datetime = timestamp_datetime.replace(tzinfo=backup['CreateTimestamp'].tzinfo)

                    if backup['CreateTimestamp'] > timestamp_datetime:

                print("Deleting backup %s whose creation timestamp is %s:" % (backup['BackupId'], backup['CreateTimestamp']))
                    if not dry_true :
                        delete_backup_response = client.delete_backup(BackupId=backup['BackupId'])
                except Exception as e:
                    print("DeleteBackup failed due to exception: %s" % str(e))
                print("Sleeping for 1 second to avoid throttling. \n")

        if 'NextToken' in response.keys():
                response = client.describe_backups(MaxResults=max_results, Filters={"clusterIds": [cluster_id]}, SortAscending=True, NextToken=response['NextToken'])
            except Exception as e:
                print("DescribeBackups failed due to exception: %s" % str(e))

    if len(failed_deletions) > 0:
        print("FAILED backup deletions: " + failed_deletions)

if __name__== "__main__":

Use Amazon VPC security features to control access to your cluster

Because each cluster is deployed inside an Amazon VPC, you should use the familiar controls of Amazon VPC security groups and network access control lists (network ACLs) to limit what instances are allowed to communicate with your cluster. Even though the CloudHSM cluster itself is protected in depth by your login credentials, Amazon VPC offers a useful first line of defense. Because it’s unlikely that you need your communications ports to be reachable from the public internet, it’s a best practice to take advantage of the Amazon VPC security features.

Managing PKI root keys

A common use case for CloudHSM is setting up public key infrastructure (PKI). The root key for PKI is a long-lived key which forms the basis for certificate hierarchies and worker keys. The worker keys are the private portion of the end-entity certificates and are meant for routine rotation, while root PKI keys are generally fixed. As a characteristic, these keys are infrequently used, with very long validity periods that are often measured in decades. Because of this, it is a best practice to not rely solely on CloudHSM to generate and store your root private key. Instead, you should generate and store the root key in an offline HSM (this is frequently referred to as an offline root) and periodically generate intermediate signing key pairs on CloudHSM.

If you decide to store and use the root key pair with CloudHSM, you should take precautions. You can either create the key in an offline HSM and import it into CloudHSM for use, or generate the key in CloudHSM and wrap it out to an offline HSM. Either way, you should always have a copy of the key, usable independently of CloudHSM, in an offline vault. This helps to protect your trust infrastructure against forgotten CloudHSM credentials, lost application code, changing technology, and other such scenarios.

Optimize performance by managing your cluster size

It is important to size your cluster correctly, so that you can maintain its performance at the desired level. You should measure throughput rather than latency, and keep in mind that parallelizing transactions is the key to getting the most performance out of your HSM. You can maximize how efficiently you use your HSM by following these best practices:

  1. Use threading at 50-100 threads per application. The impact of network round-trip delays is magnified if you serialize each operation. The exception to this rule is generating persistent keys – these are serialized on the HSM to ensure consistent state, and so parallelizing these will yield limited benefit.
  2. Use sufficient resources for your CloudHSM client. The CloudHSM client handles all load balancing, failover, and high availability tasks as your application transacts with your HSM cluster. You should ensure that the CloudHSM client has enough computational resources so that the client itself doesn’t become your performance bottleneck. Specifically, do not use resource-limited instances such as t.nano or t.micro instances to run the client. To learn more, see the Amazon Elastic Compute Cloud (EC2) instance types online documentation.
  3. Use cryptographically accelerated commands. There are two types of HSM commands: management commands (such as looking up a key based on its attributes) and cryptographically accelerated commands (such as operating on a key with a known key handle). You should rely on cryptographically accelerated commands as much as possible for latency-sensitive operations. As one example, you can cache the key handles for frequently used keys or do it per application run, rather than looking up a key handle each time. As another example, you can leave frequently used keys on the HSM, rather than unwrapping or importing them prior to each use.
  4. Authenticate once per session. Pay close attention to session logins. Your individual CloudHSM client should create just one session per execution, which is authenticated using the credentials of one cryptographic user. There’s no need to reauthenticate the session for every cryptographic operation.
  5. Use the PKCS #11 library. If performance is critical for your application and you can choose from the multiple software libraries to integrate with your CloudHSM cluster, give preference to PKCS #11, as it tends to give an edge on speed.
  6. Use token keys. For workloads with a limited number of keys, and for which high throughput is required, use token keys. When you create or import a key as a token key, it is available in all the HSMs in the cluster. However, when it is created as a session key with the “-sess” option, it only exists in the context of a single HSM.

After you maximize throughput by using these best practices, you can add HSMs to your cluster for additional throughput. Other reasons to add HSMs to your cluster include if you hit audit log buffering limits while rapidly generating or importing and then deleting keys, or if you run out of capacity to create more session keys.

Error handling

Occasionally, an HSM may fail or lose connectivity during a cryptographic operation. The CloudHSM client does not automatically retry failed operations because it’s not state-aware. It’s a best practice for you to retry as needed by handling retries in your application code. Before retrying, you may want to ensure that your CloudHSM client is still running, that your instance has connectivity, and that your session is still logged in (if you are using explicit login). For an overview of the considerations for retries, see the Amazon Builders’ Library article Timeouts, retries, and backoff with jitter.


In this post, we’ve outlined a set of best practices for the use of CloudHSM, whether you want to improve the performance and durability of the solution, or implement robust access control.

To get started building and applying these best practices, a great way is to look at the AWS samples we have published on GitHub for the Java Cryptography Extension (JCE) and for the Public-Key Cryptography Standards number 11 (PKCS11).

If you have feedback about this blog post, submit comments in the Comments session below. You can also start a new thread on the AWS CloudHSM forum to get answers from the community.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.


Esteban Hernández

Esteban is a Specialist Solutions Architect for Security & Compliance at AWS where he works with customers to create secure and robust architectures that help to solve business problems. He is interested in topics like Identity and Cryptography. Outside of work, he enjoys science fiction and taking new challenges like learning to sail.


Avni Rambhia

Avni is the product manager for AWS CloudHSM. As part of AWS Cryptography, she drives technologies and defines best practices that help customers build secure, reliable workloads in the AWS Cloud. Outside of work, she enjoys hiking, travel and philosophical debates with her children.

Encrypt global data client-side with AWS KMS multi-Region keys

Post Syndicated from Jeremy Stieglitz original https://aws.amazon.com/blogs/security/encrypt-global-data-client-side-with-aws-kms-multi-region-keys/

Today, AWS Key Management Service (AWS KMS) is introducing multi-Region keys, a new capability that lets you replicate keys from one Amazon Web Services (AWS) Region into another. Multi-Region keys are designed to simplify management of client-side encryption when your encrypted data has to be copied into other Regions for disaster recovery or is replicated in Amazon DynamoDB global tables.

In this blog post, we give an overview of how we got here and how to get started using multi-Region keys. We include a code example for multi-Region encryption of data in DynamoDB global tables.

How we got here

From its inception, AWS KMS has been strictly isolated to a single AWS Region for each implementation, with no sharing of keys, policies, or audit information across Regions. Region isolation can help you comply with security standards and data residency requirements. However, not sharing keys across Regions creates challenges when you need to move data that depends on those keys across Regions. AWS services that use your KMS keys for server-side encryption address this challenge by transparently re-encrypting data on your behalf using the KMS keys you designate in the destination Region. If you use client-side encryption, this work adds extra complexity and latency of re-encrypting between regionally isolated KMS keys.

Multi-Region keys are a new feature from AWS KMS for client-side applications that makes KMS-encrypted ciphertext portable across Regions. Multi-Region keys are a set of interoperable KMS keys that have the same key ID and key material, and that you can replicate to different Regions within the same partition. With symmetric multi-Region keys, you can encrypt data in one Region and decrypt it in a different Region. With asymmetric multi-Region keys, you encrypt, decrypt, sign, and verify messages in multiple Regions.

Multi-Region keys are supported in the AWS KMS console, the AWS KMS API, the AWS Encryption SDK, Amazon DynamoDB Encryption Client, and Amazon S3 Encryption Client. AWS services also let you configure multi-Region keys for server-side encryption in case you want the same key to protect data that needs both server-side and client-side encryption.

Getting started with multi-Region keys

To use multi-Region keys, you create a primary multi-Region key with a new key ID and key material. Then, you use the primary key to create a related multi-Region replica key in a different Region of the same AWS partition. Replica keys are KMS keys that can be used independently; they aren’t a pointer to the primary key. The primary and replica keys share only certain properties, including their key ID, key rotation, and key origin. In all other aspects, every multi-Region key, whether primary or replica, is a fully functional, independent KMS key resource with its own key policy, aliases, grants, key description, lifecycle, and other attributes. The key Amazon Resource Names (ARN) of related multi-Region keys differ only in the Region portion, as shown in the following figure (Figure 1).

Figure 1: Multi-Region keys have unique ARNs but identical key IDs

Figure 1: Multi-Region keys have unique ARNs but identical key IDs

You cannot convert an existing single-Region key to a multi-Region key. This design ensures that all data protected with existing single-Region keys maintain the same data residency and data sovereignty properties.

When to use multi-Region keys

You can use multi-Region keys in any client-side application. Since multi-Region keys avoid cross-Region calls, they’re especially useful for scenarios where you don’t want to depend on another Region or incur the latency of a cross-Region call. For example, disaster recovery, global data management, distributed signing applications, and active-active applications that span multiple Regions can all benefit from using multi-Region keys. You can also create and use multi-Region keys in a single Region and choose to replicate those keys at some later date when you need to move protected data to additional Regions.

Note: If your application will run in only one Region, you should continue to use single-Region keys to benefit from their data isolation properties.

One significant benefit of multi-Region keys is using them with DynamoDB global tables. Let’s explore that interaction in detail.

Using multi-Region keys with DynamoDB global tables

AWS KMS multi-Region keys (MRKs) can be used with the DynamoDB Encryption Client to protect data in DynamoDB global tables. You can configure the DynamoDB Encryption Client to call AWS KMS for decryption in a different Region than the one in which the data was encrypted, as shown in the following figure (Figure 2). This is useful for disaster recovery, or simply to improve performance when using DynamoDB in a globally distributed application.

Figure 2: Using multi-Region keys with DynamoDB global tables

Figure 2: Using multi-Region keys with DynamoDB global tables

The steps described in Figure 2 are:

  1. Encrypt record with primary MRK
  2. Put encrypted record
  3. Global table replication
  4. Get encrypted record
  5. Decrypt record with replica MRK

Create a multi-Region primary key

Begin by creating a multi-Region primary key and replicating it into your backup Regions. We’ll assume that you’ve created a DynamoDB global table that’s replicated to the same Regions.

Configure the DynamoDB Encryption Client to encrypt records

To use AWS KMS multi-Region keys, you need to configure the DynamoDB Encryption Client with the Region you want to call, which is typically the Region where the application is running. Then, you need to configure the ARN of the KMS key you want to use in that Region.

This example encrypts records in us-east-1 (US East (N. Virginia)) and decrypts records in us-west-2 (US West (Oregon)). If you use the following example configuration code, be sure to replace the example key ARNs with valid key ARNs for your multi-Region keys.

// Specify the multi-Region key in the us-east-1 Region
String encryptRegion = "us-east-1";
String cmkArnEncrypt = "arn:aws:kms:us-east-1:<111122223333>:key/<mrk-1234abcd12ab34cd56ef12345678990ab>";

// Set up SDK clients for KMS and DDB in us-east-1
AWSKMS kmsEncrypt = AWSKMSClientBuilder.standard().withRegion(encryptRegion).build();
AmazonDynamoDB ddbEncrypt = AmazonDynamoDBClientBuilder.standard().withRegion(encryptRegion).build();

// Configure the example global table
String tableName = "global-table-example";
String employeeIdAttribute = "employeeId";
String nameAttribute = "name";

// Configure attribute actions for the Dynamo DB Encryption Client
//   Sign the employee ID field
//   Encrypt and sign the name field
Map<String, Set<EncryptionFlags>> actions = new HashMap<>();
actions.put(employeeIdAttribute, EnumSet.of(EncryptionFlags.SIGN));
actions.put(nameAttribute, EnumSet.of(EncryptionFlags.ENCRYPT, EncryptionFlags.SIGN));

// Set an encryption context. This is an optional best practice.
final EncryptionContext encryptionContext = new EncryptionContext.Builder()

// Use the Direct KMS materials provider and the DynamoDB encryptor
// Specify the key ARN of the multi-Region key in us-east-1
DirectKmsMaterialProvider cmpEncrypt = new DirectKmsMaterialProvider(kmsEncrypt, cmkArnEncrypt);
DynamoDBEncryptor encryptor = DynamoDBEncryptor.getInstance(cmpEncrypt);

// Create a record, encrypt it, 
// and put it in the DynamoDB global table
Map<String, AttributeValue> rec = new HashMap<>();
rec.put(nameAttribute, new AttributeValue().withS("Andy"));
rec.put(employeeIdAttribute, new AttributeValue().withS("1234"));

final Map<String, AttributeValue> encryptedRecord = encryptor.encryptRecord(rec, actions, encryptionContext);
ddbEncrypt.putItem(tableName, encryptedRecord);

When you save the newly encrypted record, DynamoDB global tables automatically replicates this encrypted record to the replica tables in the us-west-2 Region.

Configure the DynamoDB Encryption Client to decrypt data

Now you’re ready to configure a DynamoDB client to decrypt the record in us-west-2 where both the replica table and the replica multi-Region key exist.

// Specify the Region and key ARN to use when decrypting          
String decryptRegion = "us-west-2";
String cmkArnDecrypt = "arn:aws:kms:us-west-2:<111122223333>:key/<mrk-1234abcd12ab34cd56ef12345678990ab>";

// Set up SDK clients for KMS and DDB in us-west-2
AWSKMS kmsDecrypt = AWSKMSClientBuilder.standard()

AmazonDynamoDB ddbDecrypt = AmazonDynamoDBClientBuilder.standard()

// Configure the DynamoDB Encryption Client
// Use the Direct KMS materials provider and the DynamoDB encryptor
// Specify the key ARN of the multi-Region key in us-west-2
final DirectKmsMaterialProvider cmpDecrypt = new DirectKmsMaterialProvider(kmsDecrypt, cmkArnDecrypt);
final DynamoDBEncryptor decryptor = DynamoDBEncryptor.getInstance(cmpDecrypt);

// Set up your query
Map<String, AttributeValue> query = new HashMap<>();
query.put(employeeIdAttribute, new AttributeValue().withS("1234"));

// Get a record from DDB and decrypt it
final Map<String, AttributeValue> retrievedRecord = ddbDecrypt.getItem(tableName, query).getItem();
final Map<String, AttributeValue> decryptedRecord = decryptor.decryptRecord(retrievedRecord, actions, encryptionContext);

Note: This example encrypts with the primary multi-Region key and then decrypts with a replica multi-Region key. The process could also be reversed—every multi-Region key can be used in the encryption or decryption of data.


In this blog post, we showed you how to use AWS KMS multi-Region keys with client-side encryption to help secure data in global applications without sacrificing high availability or low latency. We also showed you how you can start working with a global application with a brief example of using multi-Region keys with the DynamoDB Encryption Client and DynamoDB global tables.

This blog post is a brief introduction to the ways you can use multi-Region keys. We encourage you to read through the Using multi-Region keys topic to learn more about their functionality and design. You’ll learn about:

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS KMS forum.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.


Jeremy Stieglitz

Jeremy is the Principal Product Manager for AWS Key Management Service (KMS) where he drives global product strategy and roadmap for AWS KMS. Jeremy has more than 20 years of experience defining new products and platforms, launching and scaling cryptography solutions, and driving end-to-end product strategies. Jeremy is the author or co-author of 23 patents in network security, user authentication and network automation and control.


Peter Zieske

Peter is a Senior Software Developer on the AWS Key Management Service team, where he works on developing features on the service-side front-end. Outside of work, he enjoys building with LEGO, gaming, and spending time with family.


Ben Farley

Ben is a Senior Software Developer on the AWS Crypto Tools team, where he works on client-side encryption libraries that help customers protect their data. Before that, he spent time focusing on the scalability and availability of services like AWS Identity and Access Management and AWS Key Management Service. Outside of work, he likes to explore the mountains with his fiancée and dog.

Building an ARM64 Rust development environment using AWS Graviton2 and AWS CDK

Post Syndicated from Alistair McLean original https://aws.amazon.com/blogs/devops/building-an-arm64-rust-development-environment-using-aws-graviton2-and-aws-cdk/

2020 was the year that ARM chips made the headlines by moving from largely mobile form factors into the cloud thanks to AWS Graviton2, allowing you to have up to 40% better price performance over comparable current generation x86 Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Relational Database Service (Amazon RDS) instances.

We speak to customers daily about Graviton2. One recurring question we hear is “Graviton2 is great, but how can my team develop for ARM natively without the complexity of cross-compilation or having to buy custom hardware on premises?” This post seeks to answer that question by setting up the Visual Studio Code-based Code Server IDE, running on a Graviton2 EC2 instance that enables native development in a cost-effective and secure manner accessed via your browser.

The Rust programming language has gained a huge amount of popularity recently. This post aims to show that you can use this environment for Rust development as well as hundreds of other supported languages. AWS has committed to supporting the Rust community and using the language to deliver fast and robust services to customers at scale, and we want to enable our customers to do the same.

We also include instructions for building and installing the rust-analyzer and CodeLLDB debugger plugins to add additional language features.

Solution overview

The following diagram illustrates our solution architecture.

Architecture of the solution showing components and their linkages

The solution consists of an EC2 Graviton2 instance located in a private VPC subnet routed through an AWS Global Accelerator accelerator to provide routing optimization and keep packet loss, jitter, and latency lower by up to 60%. An internal facing Application Load Balancer containing the AWS Certificate Manager certificate decrypts and forwards traffic to this instance.

Code Server queries AWS Secrets Manager to initially set the login password on startup and allow for continued password-based authentication and easy password rotation. The EC2 instance has access to the internet through a NAT gateway and has no public IP address or key pair associated, and is accessible only through AWS Systems Manager Session Manager.


For this walkthrough, the following are prerequisites:

AWS CDK stack

In order to deploy our architecture, I use the AWS CDK. As a developer, it’s more intuitive to me to define my infrastructure using a language and tooling with which I am familiar. I can also do things like environment variable injection and scripting as part of the stack creation to add stack parameters and customization points.

The AWS CDK application is comprised of five stacks. Each stack defines a separate part of the architecture:

  • Networking – Defines a VPC across two Availability Zones with the CIDR range of your choice. The routing and public/private subnet creation is done for us as part of the default configuration.
  • Certificate – This is the reason for the domain prerequisite. It’s a best practice to encrypt web applications using TLS, and for that we need a certificate and therefore a domain. This stack creates a certificate for the subdomain you specify as part of the stack creation and DNS validation in Route 53.
  • Amazon EC2 configuration – This defines both our AMI and the instance type and configuration. In this case, we’re using Amazon Linux 2 ARM64 edition. Here we also set the instance-managed roles that allow Session Manager connectivity and Secrets Manager access.
  • ALB configuration – Here we define the internal load balancer and specify the listener, certificate, and target configuration. I have injected the Amazon EC2 configuration as part of the class constructor so that I can reference it directly as a target.
  • Global accelerator configuration – Finally, the accelerator is defined here with two ports open, the ALB we defined in the ALB stack as a target, and most importantly adds in a CNAME DNS entry pointing to the DNS name of the accelerator.

Walkthrough overview

This walkthrough uses the AWS CDK command line tools to deploy the stack. Session Manager is enabled to allow access to the EC2 instance and configure the Code Server application and associated plugins.

The walkthrough specifically covers the following steps:

  1. Deploy the AWS CDK stacks via CloudShell to build out the application infrastructure and associated IAM roles.
  2. Launch Code Server via the official Docker container with the commands to get and set the password stored in Secrets Manager.
  3. Log in and build the rust-analyzer and CodeLLDB plugins from a terminal to allow for debugging within a “Hello World” application.

Start CloudShell and install the appropriate tooling

In this section, I use dummy values for the domain, the VPC CIDR, AWS Region, and the secret password. You need to submit real values as appropriate.

Sign in to CloudShell and enter the following commands:

sudo yum groupinstall -y "Development Tools"
sudo npm install aws-cdk -g
git clone https://github.com/aws-samples/cdk-graviton2-alb-aga-route53.git
cd cdk-graviton2-alb-aga-route53
python3 -m venv .
source bin/activate
python -m pip install -r requirements.txt
export VPC_CIDR=”” #Substitute your CIDR here.
export CDK_DEPLOY_ACCOUNT=`aws sts get-caller-identity | jq -r '.Account'`
export R53_DOMAIN=”code-server.example.com” #Substitute your domain here.
cdk deploy --all

The deploy step takes around 10-15 mins to run and prompts a couple of times to add resources like security groups and IAM roles.

Log in to the new instance using Session Manager

Install the latest version of the Session Manager plugin for the AWS CLI:

cd ~
curl "https://s3.amazonaws.com/session-manager-downloads/plugin/latest/linux_64bit/session-manager-plugin.rpm" -o "session-manager-plugin.rpm"
sudo yum install -y session-manager-plugin.rpm

Now start a session, logging into the newly created EC2 instance and log in as ec2-user:

aws ssm start-session --target i-1234xyz7890abc #Substitute the instance id we just created here
#Once session is active:
sudo su - ec2-user

Add the password as a secret and start the container

Enter the following code to add the password as a secret in Secrets Manager and start the container:

aws secretsmanager create-secret --name CodeServerProd --secret-string Password123abc # Substitute the appropriate password here.
sudo docker run -d --name=code-server -e PUID=1000 -e PGID=1000 -e PASSWORD=`aws secretsmanager get-secret-value --secret-id CodeServerProd | jq -r '.SecretString'` -p 8080:8080 -v /home/ec2-user/.config:/config --restart unless-stopped codercom/code-server

Access and configure the web application for Rust development

So far, we have accomplished the following:

  • Created the infrastructure in the diagram via AWS CDK deployment
  • Configured the EC2 instance to run Docker and added this to the systemctl startup scripts
  • Created a secret in Secrets Manager to use as the application login password
  • Instantiated a Docker container running Code Server

Next, we access the running container via the web interface and install the required development tools.

Log in to the Code Server web application

To log in to the Code Server web application, complete the following steps:

  1. Browse to https://code-server.example.com, where example.com is the name of the domain you supplied in the AWS CDK step.
  2. Log in using the password you created in Secrets Manager.
  3. Create a new terminal by choosing the hamburger icon and, under Terminal, choosing New Terminal.
  4. Issue the following commands into the terminal to install the Rust programming language:
sudo apt update && sudo apt upgrade -y
sudo apt install -y build-essential npm clang lldb
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env

Install the rust-analyzer plugin

Open the extensions panel and enter Rust Analyzer in the search bar. Then install the plugin.

Install the debugger

Go back to the extensions panel in the Code Server application and enter CodeLLDB into the search bar. Then install this extension.

Create a sample application and open it in the Code Server window

To create and use our sample application, complete the following steps:

  • In the existing Code Server terminal, enter the following:
mkdir -p ~/src/
cd ~/src
cargo new helloworld --bin
  • Open the newly created folder in Code Server verifying that the helloworld directory was successfully created.

Open File or Folder dialog in Code Server

  • Rust-analyzer runs when you open up src/main.rs and index the file.
  • You can run the program by choosing Run in the editor.

Main Code Server editor window showing helloworld Rust program code.

  • Similarly, to launch the debugger, choose Debug in the editor.

Code Server Debugger view


If the CloudShell session times out, you need to reset your environment variables in order to re-deploy, modify, and delete the stack deployment.

Clean up

This stack incurs an estimated monthly cost of $143.00.

To delete the stack, log in to CloudShell and enter the following commands:

cd cdk-graviton2-alb-aga-route53
source bin/activate

# Re-set the environment variables again if required
export VPC_CIDR=”” #Substitute your CIDR here.
export CDK_DEPLOY_ACCOUNT=`aws sts get-caller-identity | jq -r '.Account'`
export R53_DOMAIN=”code-server.example.com” #Substitute your domain here.
cdk destroy --all

This destroys all the resources created in the first step. You can verify this by browsing to the AWS CloudFormation console and noting the deletion of all the stacks.


AWS is a place where builders can reinvent the future. The future of development means supporting different chipsets depending on different business requirements. This post is designed to enable development targeting the ARM64 microarchitecture by utilizing AWS Graviton2. Happy building!

Author bio

Author portrait

Alistair is a Principal Solutions Architect at AWS focused on EdTech customers. Originally from the west coast of Scotland, Alistair now lives in Fairfield, Connecticut, with his wife and two daughters and enjoys spending time with his family, skiing, golfing, cycling, and using his pellet smoker.

How to implement a hybrid PKI solution on AWS

Post Syndicated from Max Farnga original https://aws.amazon.com/blogs/security/how-to-implement-a-hybrid-pki-solution-on-aws/

As customers migrate workloads into Amazon Web Services (AWS) they may be running a combination of on-premises and cloud infrastructure. When certificates are issued to this infrastructure, having a common root of trust to the certificate hierarchy allows for consistency and interoperability of the Public Key Infrastructure (PKI) solution.

In this blog post, I am going to show how you can plan and deploy a PKI that enables certificates to be issued across a hybrid (cloud & on-premises) environment with a common root. This solution will use Windows Server Certificate Authority (Windows CA), also known as Active Directory Certificate Services (ADCS) to distribute and manage x.509 certificates for Active Directory users, domain controllers, routers, workstations, web servers, mobile and other devices. And an AWS Certificate Manager Private Certificate Authority (ACM PCA) to manage certificates for AWS services, including API Gateway, CloudFront, Elastic Load Balancers, and other workloads.

The Windows CA also integrates with AWS Cloud HSM to securely store the private keys that sign the certificates issued by your CAs, and use the HSM to perform the cryptographic signing operations. In Figure 1, the diagram below shows how ACM PCA and Windows CA can be used together to issue certificates across a hybrid environment.

Figure 1: Hybrid PKI hierarchy

Figure 1: Hybrid PKI hierarchy

PKI is a framework that enables a safe and trustworthy digital environment through the use of a public and private key encryption mechanism. PKI maintains secure electronic transactions on the internet and in private networks. It also governs the verification, issuance, revocation, and validation of individual systems in a network.

There are two types of PKI:

This blog post focuses on the implementation of a private PKI, to issue and manage private certificates.

When implementing a PKI, there can be challenges from security, infrastructure, and operations standpoints, especially when dealing with workloads across multiple platforms. These challenges include managing isolated PKIs for individual networks across on-premises and AWS cloud, managing PKI with no Hardware Security Module (HSM) or on-premises HSM, and lack of automation to rapidly scale the PKI servers to meet demand.

Figure 2 shows how an internal PKI can be limited to a single network. In the following example, the root CA, issuing CAs, and certificate revocation list (CRL) distribution point are all in the same network, and issue cryptographic certificates only to users and devices in the same private network.

Figure 2: On-premises PKI hierarchy in a single network

Figure 2: On-premises PKI hierarchy in a single network

Planning for your PKI system deployment

It’s important to carefully consider your business requirements, encryption use cases, corporate network architecture, and the capabilities of your internal teams. You must also plan for how to manage the confidentiality, integrity, and availability of the cryptographic keys. These considerations should guide the design and implementation of your new PKI system.

In the below section, we outline the key services and components used to design and implement this hybrid PKI solution.

Key services and components for this hybrid PKI solution

Solution overview

This hybrid PKI can be used if you need a new private PKI, or want to upgrade from an existing legacy PKI with a cryptographic service provider (CSP) to a secure PKI with Windows Cryptography Next Generation (CNG). The hybrid PKI design allows you to seamlessly manage cryptographic keys throughout the IT infrastructure of your organization, from on-premises to multiple AWS networks.

Figure 3: Hybrid PKI solution architecture

Figure 3: Hybrid PKI solution architecture

The solution architecture is depicted in the preceding figure—Figure 3. The solution uses an offline root CA that can be operated on-premises or in an Amazon VPC, while the subordinate Windows CAs run on EC2 instances and are integrated with CloudHSM for key management and storage. To insulate the PKI from external access, the CloudHSM cluster are deployed in protected subnets, the EC2 instances are deployed in private subnets, and the host VPC has site-to-site network connectivity to the on-premises network. The Amazon EC2 volumes are encrypted with AWS KMS customer managed keys. Users and devices connect and enroll to the PKI interface through a Network Load Balancer.

This solution also includes a subordinate ACM private CA to issue certificates that will be installed on AWS services that are integrated with ACM. For example, ELB, CloudFront, and API Gateway. This is so that the certificates users see are always presented from your organization’s internal PKI.

Prerequisites for deploying this hybrid internal PKI in AWS

  • Experience with AWS Cloud, Windows Server, and AD CS is necessary to deploy and configure this solution.
  • An AWS account to deploy the cloud resources.
  • An offline root CA, running on Windows 2016 or newer, to sign the CloudHSM and the issuing CAs, including the private CA and Windows CAs. Here is an AWS Quick-Start article to deploy your Root CA in a VPC. We recommend installing the Windows Root CA in its own AWS account.
  • A VPC with at least four subnets. Two or more public subnets and two or more private subnets, across two or more AZs, with secure firewall rules, such as HTTPS to communicate with your PKI web servers through a load balancer, along with DNS, RDP and other port to communicate within your organization network. You can use this CloudFormation sample VPC template to help you get started with your PKI VPC provisioning.
  • Site-to-site AWS Direct Connect or VPN connection from your VPC to the on-premises network and other VPCs to securely manage multiple networks.
  • Windows 2016 EC2 instances for the subordinate CAs.
  • An Active Directory environment that has access to the VPC that hosts the PKI servers. This is required for a Windows Enterprise CA implementation.

Deploy the solution

The below CloudFormation Code and instructions will help you deploy and configure all the AWS components shown in the above architecture diagram. To implement the solution, you’ll deploy a series of CloudFormation templates through the AWS Management Console.

If you’re not familiar with CloudFormation, you can learn about it from Getting started with AWS CloudFormation. The templates for this solution can be deployed with the CloudFormation console, AWS Service Catalog, or a code pipeline.

Download and review the template bundle

To make it easier to deploy the components of this internal PKI solution, you download and deploy a template bundle. The bundle includes a set of CloudFormation templates, and a PowerShell script to complete the integration between CloudHSM and the Windows CA servers.

To download the template bundle

  1. Download or clone the solution source code repository from AWS GitHub.
  2. Review the descriptions in each template for more instructions.

Deploy the CloudFormation templates

Now that you have the templates downloaded, use the CouldFormation console to deploy them.

To deploy the VPC modification template

Deploy this template into an existing VPC to create the protected subnets to deploy a CloudHSM cluster.

  1. Navigate to the CloudFormation console.
  2. Select the appropriate AWS Region, and then choose Create Stack.
  3. Choose Upload a template file.
  4. Select 01_PKI_Automated-VPC_Modifications.yaml as the CloudFormation stack file, and then choose Next.
  5. On the Specify stack details page, enter a stack name and the parameters. Some parameters have a dropdown list that you can use to select existing values.

    Figure 4: Example of a <strong>Specify stack details</strong> page

    Figure 4: Example of a Specify stack details page

  6. Choose Next, Next, and Create Stack.

To deploy the PKI CDP S3 bucket template

This template creates an S3 bucket for the CRL and AIA distribution point, with initial bucket policies that allow access from the PKI VPC, and PKI users and devices from your on-premises network, based on your input. To grant access to additional AWS accounts, VPCs, and on-premises networks, please refer to the instructions in the template.

  1. Navigate to the CloudFormation console.
  2. Choose Upload a template file.
  3. Select 02_PKI_Automated-Central-PKI_CDP-S3bucket.yaml as the CloudFormation stack file, and then choose Next.
  4. On the Specify stack details page, enter a stack name and the parameters.
  5. Choose Next, Next, and Create Stack

To deploy the ACM Private CA subordinate template

This step provisions the ACM private CA, which is signed by an existing Windows root CA. Provisioning your private CA with CloudFormation makes it possible to sign the CA with a Windows root CA.

  1. Navigate to the CloudFormation console.
  2. Choose Upload a template file.
  3. Select 03_PKI_Automated-ACMPrivateCA-Provisioning.yaml as the CloudFormation stack file, and then choose Next.
  4. On the Specify stack details page, enter a stack name and the parameters. Some parameters have a dropdown list that you can use to select existing values.
  5. Choose Next, Next, and Create Stack.

Assign and configure certificates

After deploying the preceding templates, use the console to assign certificate renewal permissions to ACM and configure your certificates.

To assign renewal permissions

  1. In the ACM Private CA console, choose Private CAs.
  2. Select your private CA from the list.
  3. Choose the Permissions tab.
  4. Select Authorize ACM to use this CA for renewals.
  5. Choose Save.

To sign private CA certificates with an external CA (console)

  1. In the ACM Private CA console, select your private CA from the list.
  2. From the Actions menu, choose Import CA certificate. The ACM Private CA console returns the certificate signing request (CSR).
  3. Choose Export CSR to a file and save it locally.
  4. Choose Next.
    1. Use your existing Windows root CA.
    2. Copy the CSR to the root CA and sign it.
    3. Export the signed CSR in base64 format.
    4. Export the <RootCA>.crt certificate in base64 format.
  5. On the Upload the certificates page, upload the signed CSR and the RootCA certificates.
  6. Choose Confirm and Import to import the private CA certificate.

To request a private certificate using the ACM console

Note: Make a note of IDs of the certificate you configure in this section to use when you deploy the HTTPS listener CloudFormation templates.

  1. Sign in to the console and open the ACM console.
  2. Choose Request a certificate.
  3. On the Request a certificate page, choose Request a private certificate and Request a certificate to continue.
  4. On the Select a certificate authority (CA) page, choose Select a CA to view the list of available private CAs.
  5. Choose Next.
  6. On the Add domain names page, enter your domain name. You can use a fully qualified domain name, such as www.example.com, or a bare—also called apex—domain name such as example.com. You can also use an asterisk (*) as a wild card in the leftmost position to include all subdomains in the same root domain. For example, you can use *.example.com to include all subdomains of the root domain example.com.
  7. To add another domain name, choose Add another name to this certificate and enter the name in the text box.
  8. (Optional) On the Add tags page, tag your certificate.
  9. When you finish adding tags, choose Review and request.
  10. If the Review and request page contains the correct information about your request, choose Confirm and request.

Note: You can learn more at Requesting a Private Certificate.

To share the private CA with other accounts or with your organization

You can use ACM Private CA to share a single private CA with multiple AWS accounts. To share your private CA with multiple accounts, follow the instructions in How to use AWS RAM to share your ACM Private CA cross-account.

Continue deploying the CloudFormation templates

With the certificates assigned and configured, you can complete the deployment of the CloudFormation templates for this solution.

To deploy the Network Load Balancer template

In this step, you provision a Network Load Balancer.

  1. Navigate to the CloudFormation console.
  2. Choose Upload a template file.
  3. Select 05_PKI_Automated-LoadBalancer-Provisioning.yaml as the CloudFormation stack file, and then choose Next.
  4. On the Specify stack details page, enter a stack name and the parameters. Some parameters are filled in automatically or have a dropdown list that you can use to select existing values.
  5. Choose Next, Next, and Create Stack.

To deploy the HTTPS listener configuration template

The following steps create the HTTPS listener with an initial configuration for the load balancer.

  1. Navigate to the CloudFormation console:
  2. Choose Upload a template file.
  3. Select 06_PKI_Automated-HTTPS-Listener.yaml as the CloudFormation stack file, and then choose Next.
  4. On the Specify stack details page, enter the stack name and the parameters. Some parameters are filled in automatically or have a dropdown list that you can use to select existing values.
  5. Choose Next, Next, and Create Stack.

To deploy the AWS KMS CMK template

In this step, you create an AWS KMS CMK to encrypt EC2 EBS volumes and other resources. This is required for the EC2 instances in this solution.

  1. Open the CloudFormation console.
  2. Choose Upload a template file.
  3. Select 04_PKI_Automated-KMS_CMK-Creation.yaml as the CloudFormation stack file, and then choose Next.
  4. On the Specify stack details page, enter a stack name and the parameters.
  5. Choose Next, Next, and Create Stack.

To deploy the Windows EC2 instances provisioning template

This template provisions a purpose-built Windows EC2 instance within an existing VPC. It will provision an EC2 instance for the Windows CA, with KMS to encrypt the EBS volume, an IAM instance profile and automatically installs SSM agent on your instance.

It also has optional features and flexibilities. For example, the template can automatically create new target group, or add instance to existing target group. It can also configure listener rules, create Route 53 records and automatically join an Active Directory domain.

Note: The AWS KMS CMK and the IAM role are required to provision the EC2, while the target group, listener rules, and domain join features are optional.

  1. Navigate to the CloudFormation console.
  2. Choose Upload a template file.
  3. Select 07_PKI_Automated-EC2-Servers-Provisioning.yaml as the CloudFormation stack file, and then choose Next.
  4. On the Specify stack details page, enter the stack name and the parameters. Some parameters are filled in automatically or have a dropdown list that you can use to select existing values.

    Note: The Optional properties section at the end of the parameters list isn’t required if you’re not joining the EC2 instance to an Active Directory domain.

  5. Choose Next, Next, and Create Stack.

Create and initialize a CloudHSM cluster

In this section, you create and configure CloudHSM within the VPC subnets provisioned in previous steps. After the CloudHSM cluster is completed and signed by the Windows root CA, it will be integrated with the EC2 Windows servers provisioned in previous sections.

To create a CloudHSM cluster

  1. Log in to the AWS account, open the console, and navigate to the CloudHSM.
  2. Choose Create cluster.
  3. In the Cluster configuration section:
    1. Select the VPC you created.
    2. Select the three private subnets you created across the Availability Zones in previous steps.
  4. Choose Next: Review.
  5. Review your cluster configuration, and then choose Create cluster.

To create an HSM

  1. Open the console and go to the CloudHSM cluster you created in the preceding step.
  2. Choose Initialize.
  3. Select an AZ for the HSM that you’re creating, and then choose Create.

To download and sign a CSR

Before you can initialize the cluster, you must download and sign a CSR generated by the first HSM of the cluster.

  1. Open the CloudHSM console.
  2. Choose Initialize next to the cluster that you created previously.
  3. When the CSR is ready, select Cluster CSR to download it.

    Figure 5: Download CSR

    Figure 5: Download CSR

To initialize the cluster

  1. Open the CloudHSM console.
  2. Choose Initialize next to the cluster that you created previously.
  3. On the Download certificate signing request page, choose Next. If Next is not available, choose one of the CSR or certificate links, and then choose Next.
  4. On the Sign certificate signing request (CSR) page, choose Next.
  5. Use your existing Windows root CA.
    1. Copy the CSR to the root CA and sign it.
    2. Export the signed CSR in base64 format.
    3. Also export the <RootCA>.crt certificate in base64 format.
  6. On the Upload the certificates page, upload the signed CSR and the root CA certificates.
  7. Choose Upload and initialize.

Integrate CloudHSM cluster to Windows Server AD CS

In this section you use a script that provides step-by-step instructions to help you successfully integrate your Windows Server CA with AWS CloudHSM.

To integrate CloudHSM cluster to Windows Server AD CS

Open the script 09_PKI_AWS_CloudHSM-Windows_CA-Integration-Playbook.txt and follow the instructions to complete the CloudHSM integration with the Windows servers.

Install and configure Windows CA with CloudHSM

When the CloudHSM integration is complete, install and configure your Windows Server CA with the CloudHSM key storage provider and select RSA#Cavium Key Storage Provider as your cryptographic provider.


By deploying the hybrid solution in this post, you’ve implemented a PKI to manage security across all workloads in your AWS accounts and in your on-premises network.

With this solution, you can use a private CA to issue Transport Layer Security (TLS) certificates to your Application Load Balancers, Network Load Balancers, CloudFront, and other AWS workloads across multiple accounts and VPCs. The Windows CA lets you enhance your internal security by binding your internal users, digital devices, and applications to appropriate private keys. You can use this solution with TLS, Internet Protocol Security (IPsec), digital signatures, VPNs, wireless network authentication, and more.

Additional resources

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS Certificate Manager forum or CloudHSM forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.


Max Farnga

Max is a Security Transformation Consultant with AWS Professional Services – Security, Risk and Compliance team. He has a diverse technical background in infrastructure, security, and cloud computing. He helps AWS customers implement secure and innovative solutions on the AWS cloud.

How to import AWS IoT Device Defender audit findings into Security Hub

Post Syndicated from Joaquin Manuel Rinaudo original https://aws.amazon.com/blogs/security/how-to-import-aws-iot-device-defender-audit-findings-into-security-hub/

AWS Security Hub provides a comprehensive view of the security alerts and security posture in your accounts. In this blog post, we show how you can import AWS IoT Device Defender audit findings into Security Hub. You can then view and organize Internet of Things (IoT) security findings in Security Hub together with findings from other integrated AWS services, such as Amazon GuardDuty, Amazon Inspector, Amazon Macie, AWS Identity and Access Management (IAM) Access Analyzer, AWS Systems Manager, and more. You will gain a centralized security view across both enterprise and IoT types of workloads, and have an aggregated view of AWS IoT Device Defender audit findings. This solution can support AWS Accounts managed by AWS Organizations.

In this post, you’ll learn how the integration of IoT security findings into Security Hub works, and you can download AWS CloudFormation templates to implement the solution. After you deploy the solution, every failed audit check will be recorded as a Security Hub finding. The findings within Security Hub provides an AWS IoT Device Defender finding severity level and direct link to the AWS IoT Device Defender console so that you can take possible remediation actions. If you address the underlying findings or suppress the findings by using the AWS IoT Device Defender console, the solution function will automatically archive any related findings in Security Hub when a new audit occurs.

Solution scope

For this solution, we assume that you are familiar with how to set up an IoT environment and set up AWS IoT Device Defender. To learn more how to set up your environment, see the AWS tutorials, such as Getting started with AWS IoT Greengrass and Setting up AWS IoT Device Defender

The solution is intended for AWS accounts with fewer than 10,000 findings per scan. If AWS IoT Device Defender has more than 10,000 findings, the limit of 15 minutes for the duration of the serverless AWS Lambda function might be exceeded, depending on the network delay, and the function will fail.

The solution is designed for AWS Regions where AWS IoT Device Defender, serverless Lambda functionality and Security Hub are available; for more information, see AWS Regional Services. The China (Beijing) and China (Ningxia) Regions and the AWS GovCloud (US) Regions are excluded from the solution scope.

Solution overview

The templates that we provide here will provision an Amazon Simple Notification Service (Amazon SNS) topic notifying you when the AWS IoT Device Defender report is ready, and a Lambda function that imports the findings from the report into Security Hub. Figure 1 shows the solution architecture.

Figure 1: Solution architecture

Figure 1: Solution architecture

The solution workflow is as follows:

  1. AWS IoT Device Defender performs an audit of your environment. You should set up a regular audit as described in Audit guide: Enable audit checks.
  2. AWS IoT Device Defender sends an SNS notification with a summary of the audit report.
  3. A Lambda function named import-iot-defender-findings-to-security-hub is triggered by the SNS topic.
  4. The Lambda function gets the details of the findings from AWS IoT Device Defender.
  5. The Lambda function imports the new findings to Security Hub and archives the previous report findings. An example of findings in Security Hub is shown in Figure 2.
    Figure 2: Security Hub findings example

    Figure 2: Security Hub findings example


  • You must have Security Hub turned on in the Region where you’re deploying the solution.
  • You must also have your IoT environment set, see step by step tutorial at Getting started with AWS IoT Greengrass
  • You must also have AWS IoT Device Defender audit checks turned on. Learn how to configure recurring audit checks across all your IoT devices by using this tutorial.

Deploy the solution

You will need to deploy the solution once in each AWS Region where you want to integrate IoT security findings into Security Hub.

To deploy the solution

  1. Choose Launch Stack to launch the AWS CloudFormation console with the prepopulated CloudFormation demo template.

    Select the Launch Stack button to launch the template

    Additionally, you can download the latest solution code from GitHub.

  2. (Optional) In the CloudFormation console, you are presented with the template parameters before you deploy the stack. You can customize these parameters or keep the defaults:
    • S3 bucket with sources: This bucket contains all the solution sources, such as the Lambda function and templates. You can keep the default text if you’re not customizing the sources.
    • Prefix for S3 bucket with sources: The prefix for all the solution sources. You can keep the default if you’re not customizing the sources.
  3. Go to the AWS IoT Core console and set up an SNS alert notification parameter for the audit report. To do this, in the left navigation pane of the console, under Defend, choose Settings, and then choose Edit to edit the SNS alert. The SNS topic is created by the solution stack and named iot-defender-report-notification.
    Figure 3: SNS alert settings for AWS IoT Device Defender

    Figure 3: SNS alert settings for AWS IoT Device Defender

Test the solution

To test the solution, you can simulate an “AWS IoT policies are overly permissive” finding by creating an insecure policy.

To create an insecure policy

  1. Go to the AWS IoT Core console. In the left navigation pane, under Secure, choose Policies.
  2. Choose Create. For Name, enter InsecureIoTPolicy.
  3. For Action, select iot:*. For Resources, enter *. Choose Allow statement, and then choose Create.

Next, run a new IoT security audit by choosing IoT Core > Defend > Audit > Results > Create and selecting the option Run audit now (Once).

After the audit is finished, you’ll see audit reports in the AWS IoT Core console, similar to the ones shown in Figure 4. One of the reports shows that the IoT policies are overly permissive. The same findings are also imported into Security Hub as shown in Figure 2.

Figure 4: AWS IoT Device Defender report

Figure 4: AWS IoT Device Defender report


To troubleshoot the solution, use the Amazon CloudWatch Logs of the Lambda function import-iot-defender-findings-to-security-hub. The solution can fail if:

  • Security Hub isn’t turned on in your Region
  • Service control policies (SCPs) are preventing access to AWS IoT Device Defender audit reports
  • The wrong SNS topic is configured in the AWS IoT Device Defender settings
  • The Lambda function times out because there are more than 10,000 findings

To find these issues, go to the CloudWatch console, choose Log Group, and then choose /aws/lambda/import-iot-defender-findings-to-security-hub.


In this post, you’ve learned how to integrate AWS IoT Device Defender audit findings with Security Hub to gain a centralized view of security findings across both your enterprise and IoT workloads. If you have more questions about IoT, you can reach out to the AWS IoT forum, and if you have questions about Security Hub, visit the AWS Security Hub forum. If you need AWS experts to help you plan, build, or optimize your infrastructure, contact AWS Professional Services.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.


Joaquin Manuel Rinaudo

Joaquin is a Senior Security Architect with AWS Professional Services. He is passionate about building solutions that help developers improve their software quality. Prior to AWS, he worked across multiple domains in the security industry, from mobile security to cloud and compliance related topics. In his free time, Joaquin enjoys spending time with family and reading science-fiction novels.


Vesselin Tzvetkov

Vesselin is a Senior Security Architect at AWS Professional Services and is passionate about security architecture and engineering innovative solutions. Outside of technology, he likes classical music, philosophy, and sports. He holds a Ph.D. in security from TU-Darmstadt and a M.S. in electrical engineering from Bochum University in Germany.

Building fine-grained authorization using Amazon Cognito, API Gateway, and IAM

Post Syndicated from Artem Lovan original https://aws.amazon.com/blogs/security/building-fine-grained-authorization-using-amazon-cognito-api-gateway-and-iam/

June 5, 2021: We’ve updated Figure 1: User request flow.

Authorizing functionality of an application based on group membership is a best practice. If you’re building APIs with Amazon API Gateway and you need fine-grained access control for your users, you can use Amazon Cognito. Amazon Cognito allows you to use groups to create a collection of users, which is often done to set the permissions for those users. In this post, I show you how to build fine-grained authorization to protect your APIs using Amazon Cognito, API Gateway, and AWS Identity and Access Management (IAM).

As a developer, you’re building a customer-facing application where your users are going to log into your web or mobile application, and as such you will be exposing your APIs through API Gateway with upstream services. The APIs could be deployed on Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (Amazon EKS), AWS Lambda, or Elastic Load Balancing where each of these options will forward the request to your Amazon Elastic Compute Cloud (Amazon EC2) instances. Additionally, you can use on-premises services that are connected to your Amazon Web Services (AWS) environment over an AWS VPN or AWS Direct Connect. It’s important to have fine-grained controls for each API endpoint and HTTP method. For instance, the user should be allowed to make a GET request to an endpoint, but should not be allowed to make a POST request to the same endpoint. As a best practice, you should assign users to groups and use group membership to allow or deny access to your API services.

Solution overview

In this blog post, you learn how to use an Amazon Cognito user pool as a user directory and let users authenticate and acquire the JSON Web Token (JWT) to pass to the API Gateway. The JWT is used to identify what group the user belongs to, as mapping a group to an IAM policy will display the access rights the group is granted.

Note: The solution works similarly if Amazon Cognito would be federating users with an external identity provider (IdP)—such as Ping, Active Directory, or Okta—instead of being an IdP itself. To learn more, see Adding User Pool Sign-in Through a Third Party. Additionally, if you want to use groups from an external IdP to grant access, Role-based access control using Amazon Cognito and an external identity provider outlines how to do so.

The following figure shows the basic architecture and information flow for user requests.

Figure 1: User request flow

Figure 1: User request flow

Let’s go through the request flow to understand what happens at each step, as shown in Figure 1:

  1. A user logs in and acquires an Amazon Cognito JWT ID token, access token, and refresh token. To learn more about each token, see using tokens with user pools.
  2. A RestAPI request is made and a bearer token—in this solution, an access token—is passed in the headers.
  3. API Gateway forwards the request to a Lambda authorizer—also known as a custom authorizer.
  4. The Lambda authorizer verifies the Amazon Cognito JWT using the Amazon Cognito public key. On initial Lambda invocation, the public key is downloaded from Amazon Cognito and cached. Subsequent invocations will use the public key from the cache.
  5. The Lambda authorizer looks up the Amazon Cognito group that the user belongs to in the JWT and does a lookup in Amazon DynamoDB to get the policy that’s mapped to the group.
  6. Lambda returns the policy and—optionally—context to API Gateway. The context is a map containing key-value pairs that you can pass to the upstream service. It can be additional information about the user, the service, or anything that provides additional information to the upstream service.
  7. The API Gateway policy engine evaluates the policy.

    Note: Lambda isn’t responsible for understanding and evaluating the policy. That responsibility falls on the native capabilities of API Gateway.

  8. The request is forwarded to the service.

Note: To further optimize Lambda authorizer, the authorization policy can be cached or disabled, depending on your needs. By enabling cache, you could improve the performance as the authorization policy will be returned from the cache whenever there is a cache key match. To learn more, see Configure a Lambda authorizer using the API Gateway console.

Let’s have a closer look at the following example policy that is stored as part of an item in DynamoDB.


Based on this example policy, the user is allowed to make calls to the petstore API. For version v1, the user can make requests to any verb and any path, which is expressed by an asterisk (*). For v2, the user is only allowed to make a GET request for path /status. To learn more about how the policies work, see Output from an Amazon API Gateway Lambda authorizer.

Getting started

For this solution, you need the following prerequisites:

  • The AWS Command Line Interface (CLI) installed and configured for use.
  • Python 3.6 or later, to package Python code for Lambda

    Note: We recommend that you use a virtual environment or virtualenvwrapper to isolate the solution from the rest of your Python environment.

  • An IAM role or user with enough permissions to create Amazon Cognito User Pool, IAM Role, Lambda, IAM Policy, API Gateway and DynamoDB table.
  • The GitHub repository for the solution. You can download it, or you can use the following Git command to download it from your terminal.

    Note: This sample code should be used to test out the solution and is not intended to be used in production account.

     $ git clone https://github.com/aws-samples/amazon-cognito-api-gateway.git
     $ cd amazon-cognito-api-gateway

    Use the following command to package the Python code for deployment to Lambda.

     $ bash ./helper.sh package-lambda-functions
     Successfully completed packaging files.

To implement this reference architecture, you will be utilizing the following services:

Note: This solution was tested in the us-east-1, us-east-2, us-west-2, ap-southeast-1, and ap-southeast-2 Regions. Before selecting a Region, verify that the necessary services—Amazon Cognito, API Gateway, and Lambda—are available in those Regions.

Let’s review each service, and how those will be used, before creating the resources for this solution.

Amazon Cognito user pool

A user pool is a user directory in Amazon Cognito. With a user pool, your users can log in to your web or mobile app through Amazon Cognito. You use the Amazon Cognito user directory directly, as this sample solution creates an Amazon Cognito user. However, your users can also log in through social IdPs, OpenID Connect (OIDC), and SAML IdPs.

Lambda as backing API service

Initially, you create a Lambda function that serves your APIs. API Gateway forwards all requests to the Lambda function to serve up the requests.

An API Gateway instance and integration with Lambda

Next, you create an API Gateway instance and integrate it with the Lambda function you created. This API Gateway instance serves as an entry point for the upstream service. The following bash command below creates an Amazon Cognito user pool, a Lambda function, and an API Gateway instance. The command then configures proxy integration with Lambda and deploys an API Gateway stage.

Deploy the sample solution

From within the directory where you downloaded the sample code from GitHub, run the following command to generate a random Amazon Cognito user password and create the resources described in the previous section.

 $ bash ./helper.sh cf-create-stack-gen-password
 Successfully created CloudFormation stack.

When the command is complete, it returns a message confirming successful stack creation.

Validate Amazon Cognito user creation

To validate that an Amazon Cognito user has been created successfully, run the following command to open the Amazon Cognito UI in your browser and then log in with your credentials.

Note: When you run this command, it returns the user name and password that you should use to log in.

 $ bash ./helper.sh open-cognito-ui
  Opening Cognito UI. Please use following credentials to login:
  Username: cognitouser
  Password: xxxxxxxx

Alternatively, you can open the CloudFormation stack and get the Amazon Cognito hosted UI URL from the stack outputs. The URL is the value assigned to the CognitoHostedUiUrl variable.

Figure 2: CloudFormation Outputs - CognitoHostedUiUrl

Figure 2: CloudFormation Outputs – CognitoHostedUiUrl

Validate Amazon Cognito JWT upon login

Since we haven’t installed a web application that would respond to the redirect request, Amazon Cognito will redirect to localhost, which might look like an error. The key aspect is that after a successful log in, there is a URL similar to the following in the navigation bar of your browser:


Test the API configuration

Before you protect the API with Amazon Cognito so that only authorized users can access it, let’s verify that the configuration is correct and the API is served by API Gateway. The following command makes a curl request to API Gateway to retrieve data from the API service.

 $ bash ./helper.sh curl-api

The expected result is that the response will be a list of pets. In this case, the setup is correct: API Gateway is serving the API.

Protect the API

To protect your API, the following is required:

  1. DynamoDB to store the policy that will be evaluated by the API Gateway to make an authorization decision.
  2. A Lambda function to verify the user’s access token and look up the policy in DynamoDB.

Let’s review all the services before creating the resources.

Lambda authorizer

A Lambda authorizer is an API Gateway feature that uses a Lambda function to control access to an API. You use a Lambda authorizer to implement a custom authorization scheme that uses a bearer token authentication strategy. When a client makes a request to one of the API operations, the API Gateway calls the Lambda authorizer. The Lambda authorizer takes the identity of the caller as input and returns an IAM policy as the output. The output is the policy that is returned in DynamoDB and evaluated by the API Gateway. If there is no policy mapped to the caller identity, Lambda will generate a deny policy and request will be denied.

DynamoDB table

DynamoDB is a key-value and document database that delivers single-digit millisecond performance at any scale. This is ideal for this use case to ensure that the Lambda authorizer can quickly process the bearer token, look up the policy, and return it to API Gateway. To learn more, see Control access for invoking an API.

The final step is to create the DynamoDB table for the Lambda authorizer to look up the policy, which is mapped to an Amazon Cognito group.

Figure 3 illustrates an item in DynamoDB. Key attributes are:

  • Group, which is used to look up the policy.
  • Policy, which is returned to API Gateway to evaluate the policy.


Figure 3: DynamoDB item

Figure 3: DynamoDB item

Based on this policy, the user that is part of the Amazon Cognito group pet-veterinarian is allowed to make API requests to endpoints https://<domain>/<api-gateway-stage>/petstore/v1/* and https://<domain>/<api-gateway-stage>/petstore/v2/status for GET requests only.

Update and create resources

Run the following command to update existing resources and create a Lambda authorizer and DynamoDB table.

 $ bash ./helper.sh cf-update-stack
Successfully updated CloudFormation stack.

Test the custom authorizer setup

Begin your testing with the following request, which doesn’t include an access token.

$ bash ./helper.sh curl-api

The request is denied with the message Unauthorized. At this point, the Amazon API Gateway expects a header named Authorization (case sensitive) in the request. If there’s no authorization header, the request is denied before it reaches the lambda authorizer. This is a way to filter out requests that don’t include required information.

Use the following command for the next test. In this test, you pass the required header but the token is invalid because it wasn’t issued by Amazon Cognito but is a simple JWT-format token stored in ./helper.sh. To learn more about how to decode and validate a JWT, see decode and verify an Amazon Cognito JSON token.

$ bash ./helper.sh curl-api-invalid-token
{"Message":"User is not authorized to access this resource"}

This time the message is different. The Lambda authorizer received the request and identified the token as invalid and responded with the message User is not authorized to access this resource.

To make a successful request to the protected API, your code will need to perform the following steps:

  1. Use a user name and password to authenticate against your Amazon Cognito user pool.
  2. Acquire the tokens (id token, access token, and refresh token).
  3. Make an HTTPS (TLS) request to API Gateway and pass the access token in the headers.

Before the request is forwarded to the API service, API Gateway receives the request and passes it to the Lambda authorizer. The authorizer performs the following steps. If any of the steps fail, the request is denied.

  1. Retrieve the public keys from Amazon Cognito.
  2. Cache the public keys so the Lambda authorizer doesn’t have to make additional calls to Amazon Cognito as long as the Lambda execution environment isn’t shut down.
  3. Use public keys to verify the access token.
  4. Look up the policy in DynamoDB.
  5. Return the policy to API Gateway.

The access token has claims such as Amazon Cognito assigned groups, user name, token use, and others, as shown in the following example (some fields removed).

    "sub": "00000000-0000-0000-0000-0000000000000000",
    "cognito:groups": [
    "token_use": "access",
    "scope": "openid email",
    "username": "cognitouser"

Finally, let’s programmatically log in to Amazon Cognito UI, acquire a valid access token, and make a request to API Gateway. Run the following command to call the protected API.

$ bash ./helper.sh curl-protected-api

This time, you receive a response with data from the API service. Let’s examine the steps that the example code performed:

  1. Lambda authorizer validates the access token.
  2. Lambda authorizer looks up the policy in DynamoDB based on the group name that was retrieved from the access token.
  3. Lambda authorizer passes the IAM policy back to API Gateway.
  4. API Gateway evaluates the IAM policy and the final effect is an allow.
  5. API Gateway forwards the request to Lambda.
  6. Lambda returns the response.

Let’s continue to test our policy from Figure 3. In the policy document, arn:aws:execute-api:*:*:*/*/GET/petstore/v2/status is the only endpoint for version V2, which means requests to endpoint /GET/petstore/v2/pets should be denied. Run the following command to test this.

 $ bash ./helper.sh curl-protected-api-not-allowed-endpoint
{"Message":"User is not authorized to access this resource"}

Note: Now that you understand fine grained access control using Cognito user pool, API Gateway and lambda function, and you have finished testing it out, you can run the following command to clean up all the resources associated with this solution:

 $ bash ./helper.sh cf-delete-stack

Advanced IAM policies to further control your API

With IAM, you can create advanced policies to further refine access to your APIs. You can learn more about condition keys that can be used in API Gateway, their use in an IAM policy with conditions, and how policy evaluation logic determines whether to allow or deny a request.


In this post, you learned how IAM and Amazon Cognito can be used to provide fine-grained access control for your API behind API Gateway. You can use this approach to transparently apply fine-grained control to your API, without having to modify the code in your API, and create advanced policies by using IAM condition keys.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the Amazon Cognito forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.


Artem Lovan

Artem is a Senior Solutions Architect based in New York. He helps customers architect and optimize applications on AWS. He has been involved in IT at many levels, including infrastructure, networking, security, DevOps, and software development.

Integrate GitHub monorepo with AWS CodePipeline to run project-specific CI/CD pipelines

Post Syndicated from Vivek Kumar original https://aws.amazon.com/blogs/devops/integrate-github-monorepo-with-aws-codepipeline-to-run-project-specific-ci-cd-pipelines/

AWS CodePipeline is a continuous delivery service that enables you to model, visualize, and automate the steps required to release your software. With CodePipeline, you model the full release process for building your code, deploying to pre-production environments, testing your application, and releasing it to production. CodePipeline then builds, tests, and deploys your application according to the defined workflow either in manual mode or automatically every time a code change occurs. A lot of organizations use GitHub as their source code repository. Some organizations choose to embed multiple applications or services in a single GitHub repository separated by folders. This method of organizing your source code in a repository is called a monorepo.

This post demonstrates how to customize GitHub events that invoke a monorepo service-specific pipeline by reading the GitHub event payload using AWS Lambda.


Solution overview

With the default setup in CodePipeline, a release pipeline is invoked whenever a change in the source code repository is detected. When using GitHub as the source for a pipeline, CodePipeline uses a webhook to detect changes in a remote branch and starts the pipeline. When using a monorepo style project with GitHub, it doesn’t matter which folder in the repository you change the code, CodePipeline gets an event at the repository level. If you have a continuous integration and continuous deployment (CI/CD) pipeline for each of the applications and services in a repository, all pipelines detect the change in any of the folders every time. The following diagram illustrates this scenario.


GitHub monorepo folder structure


This post demonstrates how to customize GitHub events that invoke a monorepo service-specific pipeline by reading the GitHub event payload using Lambda. This solution has the following benefits:

  • Add customizations to start pipelines based on external factors – You can use custom code to evaluate whether a pipeline should be triggered. This allows for further customization beyond polling a source repository or relying on a push event. For example, you can create custom logic to automatically reschedule deployments on holidays to the next available workday.
  • Have multiple pipelines with a single source – You can trigger selected pipelines when multiple pipelines are listening to a single GitHub repository. This lets you group small and highly related but independently shipped artifacts such as small microservices without creating thousands of GitHub repos.
  • Avoid reacting to unimportant files – You can avoid triggering a pipeline when changing files that don’t affect the application functionality (such as documentation, readme, PDF, and .gitignore files).

In this post, we’re not debating the advantages or disadvantages of a monorepo versus a single repo, or when to create monorepos or single repos for each application or project.


Sample architecture

This post focuses on controlling running pipelines in CodePipeline. CodePipeline can have multiple stages like test, approval, and deploy. Our sample architecture considers a simple pipeline with two stages: source and build.


Github monorepo - CodePipeline Sample Architecture

This solution is made up of following parts:

  • An Amazon API Gateway endpoint (3) is backed by a Lambda function (5) to receive and authenticate GitHub webhook push events (2)
  • The same function evaluates incoming GitHub push events and starts the pipeline on a match
  • An Amazon Simple Storage Service (Amazon S3) bucket (4) stores the CodePipeline-specific configuration files
  • The pipeline contains a build stage with AWS CodeBuild


Normally, after you create a CI/CD pipeline, it automatically triggers a pipeline to release the latest version of your source code. From then on, every time you make a change in your source code, the pipeline is triggered. You can also manually run the last revision through a pipeline by choosing Release change on the CodePipeline console. This architecture uses the manual mode to run the pipeline. GitHub push events and branch changes are evaluated by the Lambda function to avoid commits that change unimportant files from starting the pipeline.


Creating an API Gateway endpoint

We need a single API Gateway endpoint backed by a Lambda function with the responsibility of authenticating and validating incoming requests from GitHub. You can authenticate requests using HMAC security or GitHub Apps. API Gateway only needs one POST method to consume GitHub push events, as shown in the following screenshot.


Creating an API Gateway endpoint


Creating the Lambda function

This Lambda function is responsible for authenticating and evaluating the GitHub events. As part of the evaluation process, the function can parse through the GitHub events payload, determine which files are changed, added, or deleted, and perform the appropriate action:

  • Start a single pipeline, depending on which folder is changed in GitHub
  • Start multiple pipelines
  • Ignore the changes if non-relevant files are changed

You can store the project configuration details in Amazon S3. Lambda can read this configuration to decide what needs to be done when a particular folder is matched from a GitHub event. The following code is an example configuration:


    "GitHubRepo": "SampleRepo",

    "GitHubBranch": "main",

    "ChangeMatchExpressions": "ProjectA/.*",

    "IgnoreFiles": "*.pdf;*.md",

    "CodePipelineName": "ProjectA - CodePipeline"


For more complex use cases, you can store the configuration file in Amazon DynamoDB.

The following is the sample Lambda function code in Python 3.7 using Boto3:

def lambda_handler(event, context):

    import json
    modifiedFiles = event["commits"][0]["modified"]
    #full path
    for filePath in modifiedFiles:
        # Extract folder name
        folderName = (filePath[:filePath.find("/")])

    #start the pipeline
    if len(folderName)>0:
        # Codepipeline name is foldername-job. 
        # We can read the configuration from S3 as well. 
        returnCode = start_code_pipeline(folderName + '-job')

    return {
        'statusCode': 200,
        'body': json.dumps('Modified project in repo:' + folderName)

def start_code_pipeline(pipelineName):
    client = codepipeline_client()
    response = client.start_pipeline_execution(name=pipelineName)
    return True

cpclient = None
def codepipeline_client():
    import boto3
    global cpclient
    if not cpclient:
        cpclient = boto3.client('codepipeline')
    return cpclient

Creating a GitHub webhook

GitHub provides webhooks to allow external services to be notified on certain events. For this use case, we create a webhook for a push event. This generates a POST request to the URL (API Gateway URL) specified for any files committed and pushed to the repository. The following screenshot shows our webhook configuration.

Creating a GitHub webhook2


In our sample architecture, two pipelines monitor the same GitHub source code repository. A Lambda function decides which pipeline to run based on the GitHub events. The same function can have logic to ignore unimportant files, for example any readme or PDF files.

Using API Gateway, Lambda, and Amazon S3 in combination serves as a general example to introduce custom logic to invoke pipelines. You can expand this solution for increasingly complex processing logic.


About the Author

Vivek Kumar

Vivek is a Solutions Architect at AWS based out of New York. He works with customers providing technical assistance and architectural guidance on various AWS services. He brings more than 23 years of experience in software engineering and architecture roles for various large-scale enterprises.




Gaurav is a Solutions Architect at AWS. He works with digital native business customers providing architectural guidance on AWS services.





Nitin is a Solutions Architect at AWS. He works with digital native business customers providing architectural guidance on AWS services.





How to delegate management of identity in AWS Single Sign-On

Post Syndicated from Louay Shaat original https://aws.amazon.com/blogs/security/how-to-delegate-management-of-identity-in-aws-single-sign-on/

In this blog post, I show how you can use AWS Single Sign-On (AWS SSO) to delegate administration of user identities. Delegation is the process of providing your teams permissions to manage accounts and identities associated with their teams. You can achieve this by using the existing integration that AWS SSO has with AWS Organizations, and by using tags and conditions in AWS Identity and Access Management (IAM).

AWS SSO makes it easy to centrally manage access to multiple Amazon Web Services (AWS) accounts and business applications, and to provide users with single sign-on access to all their assigned accounts and applications from one place.

AWS SSO uses permission sets—a collection of administrator-defined policies—to determine a user’s effective permissions to access a given AWS account. Permission sets can contain either AWS managed policies or custom policies that are stored in AWS SSO. Policies are documents that act as containers for one or more permission statements. These statements represent individual access controls (allow or deny) for various tasks, which determine what tasks users can or cannot perform within the AWS account. Permission sets are provisioned as IAM roles in your organizational accounts, and are managed centrally using AWS SSO.

AWS SSO is tightly integrated with AWS Organizations, and runs in your AWS Organizations management account. This integration enables AWS SSO to retrieve and manage permission sets across your AWS Organizations configuration.

As you continue to build more of your workloads on AWS, managing access to AWS accounts and services becomes more time consuming for team members that manage identities. With a centralized identity approach that uses AWS SSO, there’s an increased need to delegate control of permission sets and accounts to domain and application owners. Although this is a valid use case, access to the management account in Organizations should be tightly guarded as a security best practice. As an administrator in the management account of an organization, you can control how teams and users access your AWS accounts and applications.

This post shows how you can build comprehensive delegation models in AWS SSO to securely and effectively delegate control of identities to various teams.

Solution overview

Suppose you’ve implemented AWS SSO in Organizations to manage identity across your entire AWS environment. Your organization is growing and the number of accounts and teams that need access to your AWS environment is also growing. You have a small Identity team that is constantly adding, updating, or deleting users or groups and permission sets to enable your teams to gain access to their required services and accounts.

Note: You can learn how to enable AWS SSO from the Introducing AWS Single Sign-On blog post.

As the number of teams grows, you want to start using a delegation model to enable account and application owners to manage access to their resources, in order to reduce the heavy lifting that is done by teams that manage identities.

Figure 1 shows a simple organizational structure that your organization implemented.

Figure 1: AWS SSO with AWS Organizations

Figure 1: AWS SSO with AWS Organizations

In this scenario, you’ve already built a collection of organizational-approved permission sets that are used across your organization. You have a tagging strategy for permission sets, and you’ve implemented two tags across all your permission sets:

  • Environment: The values for this tag are Production or Development. You only apply Production permission sets to Production accounts.
  • OU: This tag identifies the organizational unit (OU) that the permission set belongs to.

A value of All can be assigned to either tag to identify organization-wide use of the permission set.

You identified three models of delegation that you want to enable based on the setup just described, and your Identity team has identified three use cases that they want to implement:

  • A simple delegation model for a team to manage all permission sets for a set of accounts.
  • A delegation model for support teams to apply read-only permission sets to all accounts.
  • A delegation model based on AWS Organizations, where a team can manage only the permission sets intended for a specific OU.

The AWS SSO delegation model enables three key conditions for restricting user access:

  • Permission sets.
  • Accounts
  • Tags that use the condition aws:ResourceTag, to ensure that tags are present on your permission sets as part of your delegation model.

In the rest of this blog post, I show you how AWS SSO administrators can use these conditions to implement the use cases highlighted here to build a delegation model.

See Delegating permission set administration and Actions, resources, and condition keys for AWS SSO for more information.

Important: The use cases that follow are examples that can be adopted by your organization. The permission sets in these use cases show only what is needed to delegate the components discussed. You need to add additional policies to give users and groups access to AWS SSO.

Some examples:

Identify your permission set and AWS SSO instance IDs

You can use either the AWS Command Line Interface (AWS CLI) v2 or the AWS Management Console to identify your permission set and AWS SSO instance IDs.

Use the AWS CLI

To use the AWS CLI to identify the Amazon resource names (ARNs) of the AWS SSO instance and permission set, make sure you have AWS CLI v2 installed.

To list the AWS SSO instance ID ARN

Run the following command:

aws sso-admin list-instances

To list the permission set ARN

Run the following command:

aws sso-admin list-permission-sets --instance-arn <instance arn from above>

Use the console

You can also use the console to identify your permission sets and AWS SSO instance IDs.

To list the AWS SSO Instance ID ARN

  1. Navigate to the AWS SSO in your Region. Choose the Dashboard and then choose Choose your identity source.
  2. Copy the AWS SSO ARN ID.
Figure 2: AWS SSO ID ARN

Figure 2: AWS SSO ID ARN

To list the permission set ARN

  1. Navigate to the AWS SSO Service in your Region. Choose AWS Accounts and then Permission Sets.
  2. Select the permission set you want to use.
  3. Copy the ARN of the permission set.
Figure 3: Permission set ARN

Figure 3: Permission set ARN

Use case 1: Accounts-based delegation model

In this use case, you create a single policy to allow administrators to assign any permission set to a specific set of accounts.

First, you need to create a custom permission set to use with the following example policy.

The example policy is as follows.

            "Sid": "DelegatedAdminsAccounts",
            "Effect": "Allow",
            "Action": [
            "Resource": [

This policy specifies that delegated admins are allowed to provision any permission set to the three accounts listed in the policy.

Note: To apply this permission set to your environment, replace the account numbers following Resource with your account numbers.

Use case 2: Permission-based delegation model

In this use case, you create a single policy to allow administrators to assign a specific permission set to any account. The policy is as follows.

                    "Sid": "DelegatedPermissionsAdmin",
                    "Effect": "Allow",
                    "Action": [
                    "Resource": [



This policy specifies that delegated admins are allowed to provision only the specific permission set listed in the policy to any account.


Use case 3: OU-based delegation model

In this use case, the Identity team wants to delegate the management of the Development permission sets (identified by the tag key Environment) to the Test OU (identified by the tag key OU). You use the Environment and OU tags on permission sets to restrict access to only the permission sets that contain both tags.

To build this permission set for delegation, you need to create two policies in the same permission set:

  • A policy that filters the permission sets based on both tags—Environment and OU.
  • A policy that filters the accounts belonging to the Development OU.

The policies are as follows.

                    "Sid": "DelegatedOUAdmin",
                    "Effect": "Allow",
                    "Action": [
                    "Resource": "arn:aws:sso:::permissionSet/*/*",
                    "Condition": {
                        "StringEquals": {
                            "aws:ResourceTag/Environment": "Development",
                            "aws:ResourceTag/OU": "Test"
            "Sid": "Instance",
            "Effect": "Allow",
            "Action": [
            "Resource": [


In the delegated policy, the user or group is only allowed to provision permission sets that have both tags, OU and Environment, set to “Development” and only to accounts in the Development OU.

Note: In the example above arn:aws:sso:::instance/ssoins-11112222233333 is the ARN for the AWS SSO Instance ID. To get your AWS SSO Instance ID, refer to Identify your permission set and AWS SSO Instance IDs.

Create a delegated admin profile in AWS SSO

Now that you know what’s required to delegate permissions, you can create a delegated profile and deploy that to your users and groups.

To create a delegated AWS SSO profile

  1. In the AWS SSO console, sign in to your management account and browse to the Region where AWS SSO is provisioned.
  2. Navigate to AWS Accounts and choose Permission sets, and then choose Create permission set.
    Figure 4: AWS SSO permission sets menu

    Figure 4: AWS SSO permission sets menu

  3. Choose Create a custom permission set.
    Figure 5: Create a new permission set

    Figure 5: Create a new permission set

  4. Give a name to your permission set based on your naming standards and select a session duration from your organizational policies.
  5. For Relay state, enter the following URL:

    where <region> is the AWS Region in which you deployed AWS SSO.

    The relay state will automatically redirect the user to the Accounts section in the AWS SSO console, for simplicity.

    Figure 6: Custom permission set

    Figure 6: Custom permission set

  6. Choose Create new permission set. Here is where you can decide the level of delegation required for your application or domain administrators.
    Figure 7: Assign users

    Figure 7: Assign users

    See some of the examples in the earlier sections of this post for the permission set.

  7. If you’re using AWS SSO with AWS Directory Service for Microsoft Active Directory, you’ll need to provide access to AWS Directory Service in order for your administrator to assign permission sets to users and groups.

    To provide this access, navigate to the AWS Accounts screen in the AWS SSO console, and select your management account. Assign the required users or groups, and select the permission set that you created earlier. Then choose Finish.

  8. To test this delegation, sign in to AWS SSO. You’ll see the newly created permission set.
    Figure 8: AWS SSO sign-on page

    Figure 8: AWS SSO sign-on page

  9. Next to developer-delegated-admin, choose Management console. This should automatically redirect you to AWS SSO in the AWS Accounts submenu.

If you try to provision access by assigning or creating new permission sets to accounts or permission sets you are not explicitly allowed, according to the policies you specified earlier, you will receive the following error.

Figure 9: Error based on lack of permissions

Figure 9: Error based on lack of permissions

Otherwise, the provisioning will be successful.


You’ve seen that by using conditions and tags on permission sets, application and account owners can use delegation models to manage the deployment of permission sets across the accounts they manage, providing them and their teams with secure access to AWS accounts and services.

Additionally, because AWS SSO supports attribute-based access control (ABAC), you can create a more dynamic delegation model based on attributes from your identity provider, to match the tags on the permission set.

If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS Single Sign-On forum.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.


Louay Shaat

Louay is a Security Solutions Architect with AWS. He spends his days working with customers, from startups to the largest of enterprises, helping them build cool new capabilities and accelerating their cloud journey. He has a strong focus on security and automation to help customers improve their security, risk, and compliance in the cloud.

Automate Amazon EC2 instance isolation by using tags

Post Syndicated from Jose Obando original https://aws.amazon.com/blogs/security/automate-amazon-ec2-instance-isolation-by-using-tags/

Containment is a crucial part of an overall Incident Response Strategy, as this practice allows time for responders to perform forensics, eradication and recovery during an Incident. There are many different approaches to containment. In this post, we will be focusing on isolation—the ability to keep multiple targets separated so that each target only sees and affects itself—as a containment strategy.

I’ll show you how to automate isolation of an Amazon Elastic Compute Cloud (Amazon EC2) instance by using an AWS Lambda function that’s triggered by tag changes on the instance, as reported by Amazon CloudWatch Events.

CloudWatch Event Rules deliver a near real-time stream of system events that describe changes in Amazon Web Services (AWS) resources. See also Amazon EventBridge.

Preparing for an incident is important as outlined in the Security Pillar of the AWS Well-Architected Framework.

Out of the 7 Design Principles for Security in the Cloud, as per the Well-Architected Framework, this solution will cover the following:

  • Enable traceability: Monitor, alert, and audit actions and changes to your environment in real time. Integrate log and metric collection with systems to automatically investigate and take action.
  • Automate security best practices: Automated software-based security mechanisms can improve your ability to securely scale more rapidly and cost-effectively. Create secure architectures, including through the implementation of controls that can be defined and managed by AWS as code in version-controlled templates.
  • Prepare for security events: Prepare for an incident by implementing incident management and investigation policy and processes that align to your organizational requirements. Run incident response simulations and use tools with automation to help increase your speed for detection, investigation, and recovery.

After detecting an event in the Detection phase and analyzing in the Analysis phase, you can automate the process of logically isolating an instance from a Virtual Private Cloud (VPC) in Amazon Web Services (AWS).

In this blog post, I describe how to automate EC2 instance isolation by using the tagging feature that a responder can use to identify instances that need to be isolated. A Lambda function then uses AWS API calls to isolate the instances by performing the actions described in the following sections.

Use cases

Your organization can use automated EC2 instance isolation for scenarios like these:

  • A security analyst wants to automate EC2 instance isolation in order to respond to security events in a timely manner.
  • A security manager wants to provide their first responders with a way to quickly react to security incidents without providing too much access to higher security features.

High-level architecture and design

The example solution in this blog post uses a CloudWatch Events rule to trigger a Lambda function. The CloudWatch Events rule is triggered when a tag is applied to an EC2 instance. The Lambda code triggers further actions based on the contents of the event. Note that the CloudFormation template includes the appropriate permissions to run the function.

The event flow is shown in Figure 1 and works as follows:

  1. The EC2 instance is tagged.
  2. The CloudWatch Events rule filters the event.
  3. The Lambda function is invoked.
  4. If the criteria are met, the isolation workflow begins.

When the Lambda function is invoked and the criteria are met, these actions are performed:

  1. Checks for IAM instance profile associations.
  2. If the instance is associated to a role, the Lambda function disassociates that role.
  3. Applies the isolation role that you defined during CloudFormation stack creation.
  4. Checks the VPC where the EC2 instance resides.
    • If there is no isolation security group in the VPC (if the VPC is new, for example), the function creates one.
  5. Applies the isolation security group.

Note: If you had a security group with an open ( outbound rule, and you apply this Isolation security group, your existing SSH connections to the instance are immediately dropped. On the other hand, if you have a narrower inbound rule that initially allows the SSH connection, the existing connection will not be broken by changing the group. This is known as Connection Tracking.

Figure 1: High-level diagram showing event flow

Figure 1: High-level diagram showing event flow

For the deployment method, we will be using an AWS CloudFormation Template. AWS CloudFormation gives you an easy way to model a collection of related AWS and third-party resources, provision them quickly and consistently, and manage them throughout their lifecycles, by treating infrastructure as code.

The AWS CloudFormation template that I provide here deploys the following resources:

  • An EC2 instance role for isolation – this is attached to the EC2 Instance to prevent further communication with other AWS Services thus limiting the attack surface to your overall infrastructure.
  • An Amazon CloudWatch Events rule – this is used to detect changes to an AWS EC2 resource, in this case a “tag change event”. We will use this as a trigger to our Lambda function.
  • An AWS Identity and Access Management (IAM) role for Lambda functions – this is what the Lambda function will use to execute the workflow.
  • A Lambda function for automation – this function is where all the decision logic sits, once triggered it will follow a set of steps described in the following section.
  • Lambda function permissions – this is used by the Lambda function to execute.
  • An IAM instance profile – this is a container for an IAM role that you can use to pass role information to an EC2 instance.

Supporting functions within the Lambda function

Let’s dive deeper into each supporting function inside the Lambda code.

The following function identifies the virtual private cloud (VPC) ID for a given instance. This is needed to identify which security groups are present in the VPC.

def identifyInstanceVpcId(instanceId):
    instanceReservations = ec2Client.describe_instances(InstanceIds=[instanceId])['Reservations']
    for instanceReservation in instanceReservations:
        instancesDescription = instanceReservation['Instances']
        for instance in instancesDescription:
            return instance['VpcId']

The following function modifies the security group of an EC2 instance.

def modifyInstanceAttribute(instanceId,securityGroupId):
    response = ec2Client.modify_instance_attribute(

The following function creates a security group on a VPC that blocks all egress access to the security group.

def createSecurityGroup(groupName, descriptionString, vpcId):
    resource = boto3.resource('ec2')
    securityGroupId = resource.create_security_group(GroupName=groupName, Description=descriptionString, VpcId=vpcId)
    securityGroupId.revoke_egress(IpPermissions= [{'IpProtocol': '-1','IpRanges': [{'CidrIp': ''}],'Ipv6Ranges': [],'PrefixListIds': [],'UserIdGroupPairs': []}])
    return securityGroupId.

Deploy the solution

To deploy the solution provided in this blog post, first download the CloudFormation template, and then set up a CloudFormation stack that specifies the tags that are used to trigger the automated process.

Download the CloudFormation template

To get started, download the CloudFormation template from Amazon S3. Alternatively, you can launch the CloudFormation template by selecting the following Launch Stack button:

Select the Launch Stack button to launch the template

Deploy the CloudFormation stack

Start by uploading the CloudFormation template to your AWS account.

To upload the template

  1. From the AWS Management Console, open the CloudFormation console.
  2. Choose Create Stack.
  3. Choose With new resources (standard).
  4. Choose Upload a template file.
  5. Select Choose File, and then select the YAML file that you just downloaded.
Figure 2: CloudFormation stack creation

Figure 2: CloudFormation stack creation

Specify stack details

You can leave the default values for the stack as long as there aren’t any resources provisioned already with the same name, such as an IAM role. For example, if left with default values an IAM role named “SecurityIsolation-IAMRole” will be created. Otherwise, the naming convention is fully customizable from this screen and you can enter your choice of name for the CloudFormation stack, and modify the parameters as you see fit. Figure 3 shows the parameters that you can set.

The Evaluation Parameters section defines the tag key and value that will initiate the automated response. Keep in mind that these values are case-sensitive.

Figure 3: CloudFormation stack parameters

Figure 3: CloudFormation stack parameters

Choose Next until you reach the final screen, shown in Figure 4, where you acknowledge that an IAM role is created and you trust the source of this template. Select the check box next to the statement I acknowledge that AWS CloudFormation might create IAM resources with custom names, and then choose Create Stack.

Figure 4: CloudFormation IAM notification

Figure 4: CloudFormation IAM notification

After you complete these steps, the following resources will be provisioned, as shown in Figure 5:

  • EC2IsolationRole
  • EC2TagChangeEvent
  • IAMRoleForLambdaFunction
  • IsolationLambdaFunction
  • IsolationLambdaFunctionInvokePermissions
  • RootInstanceProfile
Figure 5: CloudFormation created resources

Figure 5: CloudFormation created resources


To start your automation, tag an EC2 instance using the tag defined during the CloudFormation setup. If you’re using the Amazon EC2 console, you can apply tags to resources by using the Tags tab on the relevant resource screen, or you can use the Tags screen, the AWS CLI or an AWS SDK. A detailed walkthrough for each approach can be found in the Amazon EC2 Documentation page.

Reverting Changes

If you need to remove the restrictions applied by this workflow, complete the following steps.

  1. From the EC2 dashboard, in the Instances section, check the box next to the instance you want to modify.

    Figure 6: Select the instance to modify

    Figure 6: Select the instance to modify

  2. In the top right, select Actions, choose Instance settings, and then choose Modify IAM role.

    Figure 7: Choose Actions > Instance settings > Modify IAM role

    Figure 7: Choose Actions > Instance settings > Modify IAM role

  3. Under IAM role, choose the IAM role to attach to your instance, and then select Save.

    Figure 8: Choose the IAM role to attach

    Figure 8: Choose the IAM role to attach

  4. Select Actions, choose Networking, and then choose Change security groups.

    Figure 9: Choose Actions > Networking > Change security groups

    Figure 9: Choose Actions > Networking > Change security groups

  5. Under Associated security groups, select Remove and add a different security group with the access you want to grant to this instance.


Using the CloudFormation template provided in this blog post, a Security Operations Center analyst could have only tagging privileges and isolate an EC2 instance based on this tag. Or a security service such as Amazon GuardDuty could trigger a lambda to apply the tag as part of a workflow. This means the Security Operations Center analyst wouldn’t need administrative privileges over the EC2 service.

This solution creates an isolation security group for any new VPCs that don’t have one already. The security group would still follow the naming convention defined during the CloudFormation stack launch, but won’t be part of the provisioned resources. If you decide to delete the stack, manual cleanup would be required to remove these security groups.

This solution terminates established inbound Secure Shell (SSH) sessions that are associated to the instance, and isolates the instance from new connections either inbound or outbound. For outbound connections that are already established (for example, reverse shell), you either need to shut down the network interface card (NIC) at the operating system (OS) level, restart the instance network stack at the OS level, terminate the established connections, or apply a network access control list (network ACL).

For more information, see the following documentation:

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.


Jose Obando

Jose is a Security Consultant on the Global Financial Services team. He helps the world’s top financial institutions improve their security posture in the cloud. He has a background in network security and cloud architecture. In his free time, you can find him playing guitar or training in Muay Thai.

Analyze and understand IAM role usage with Amazon Detective

Post Syndicated from Sheldon Sides original https://aws.amazon.com/blogs/security/analyze-and-understand-iam-role-usage-with-amazon-detective/

In this blog post, we’ll demonstrate how you can use Amazon Detective’s new role session analysis feature to investigate security findings that are tied to the usage of an AWS Identity and Access Management (IAM) role. You’ll learn about how you can use this new role session analysis feature to determine which Amazon Web Services (AWS) resource assumed the role that triggered a finding, and to understand the context of the activities that the resource performed when the finding was triggered. As a result of this walkthrough, you’ll gain an understanding of how to quickly ascertain anomalous identity and access behaviors. While this demonstration utilizes an Amazon GuardDuty finding as a starting point, the techniques demonstrated within this post highlight how Detective can be utilized to investigate any access behaviors that are tied to using IAM roles.

IAM roles provide a valuable mechanism that you can use to delegate access to users and services for managing and accessing your AWS resources, but using IAM roles can make it more complex to determine who performed an action. AWS CloudTrail logs do track all usage tied to IAM roles, but attributing activity to a specific resource that assumed a role requires storage of CloudTrail logs and analysis of this log telemetry. Understanding role usage through log analysis gets even more complex if cross-account role assumptions are involved, since that requires you to collate and analyze logs from multiple accounts. In some cases, permissions may allow a resource to sequentially assume a series of different roles (role chaining), further complicating the attribution of activity to a specific resource.

With its built-in, multi-account log analysis, Detective’s new role session analysis feature provides visibility into role usage, cross-account role assumptions and into any role chaining activities that may have been performed across the accounts. With this feature, you can quickly determine who or what assumed a role, regardless of whether this was a federated, IAM user or other resource. The feature shows you when roles were assumed and for how long, and helps you determine the activities that were performed during the assumption. Detective visualizes these results based upon its automatic analysis of CloudTrail logs and VPC flow log traffic that it continuously processes for enabled accounts, regardless of whether these log sources are enabled on each account.

To demonstrate this feature, we’ll investigate a “CloudTrail logging disabled” finding that is triggered by Amazon GuardDuty as a result of activity performed by a resource that has assumed an IAM role. Amazon GuardDuty is an AWS service that continuously monitors for malicious or unauthorized behavior to help protect your AWS resources, including your AWS accounts, access keys, and EC2 instances. GuardDuty identifies unusual or unauthorized activity, like crypto-currency mining, access to data stored in S3 from unusual locations, or infrastructure deployments in a region that has never been used.

Start the investigation in GuardDuty

GuardDuty issues a CloudTrailLoggingDisabled finding to alert you that CloudTrail logging has been disabled in one of your accounts. This is an important finding, because it could indicate that an attacker is attempting to hide their tracks. Since Detective receives a copy of CloudTrail traffic directly from the AWS infrastructure, Detective will continue to receive API calls that are made after CloudTrail logging is disabled.

In order to properly investigate this type of finding and determine if this is an issue that you need to be concerned about, you’ll need to answer a few specific questions:

  1. You’ll need to determine which user or resource disabled CloudTrail.
  2. You’ll need to see what other actions they performed after disabling logging.
  3. You’ll want to understand if their access pattern and behavior is consistent with their previous access patterns and behaviors.

Let’s take a look at a CloudTrailLoggingDisabled finding in GuardDuty as we start trying to answer these questions. When you access the GuardDuty console, a list of your recent findings is displayed. In Figure 1, a filter has been applied to display the CloudTrailLoggingDisabled finding.

Figure 1: A GuardDuty finding showing that CloudTrail was disabled

Figure 1: A GuardDuty finding showing that CloudTrail was disabled

After you select the GuardDuty finding, you can see the finding details, including some of the user information related to the finding. Figure 2 shows the Resources affected section of the finding.

Figure 2: Viewing user data related to the GuardDuty finding

Figure 2: Viewing user data related to the GuardDuty finding

The Affected resources field indicates that the demo-trail-2 trail was where logging was disabled. You can also see that User type is set to AssumedRole and that User name contains the role AWSReservedSSO_AdministratorAccess_598c5f73f8b2b4e5. This was the role that was assumed and using which CloudTrail logging was disabled. This information can help you understand the resources this role delegates access to and the permissions it provides. You still need to identify who specifically assumed the role to disable CloudTrail logging and the activities they performed afterwards. You can use Amazon Detective to answer these questions.

Investigate the finding in Detective

In order to investigate this GuardDuty finding in Detective, you select the finding and then select Investigate in the Actions menu, as shown in Figure 3.

Figure 3: Choose 'Investigate with Detective' and select the GuardDuty finding ID on the pop-up to investigate the finding

Figure 3: Choose ‘Investigate with Detective’ and select the GuardDuty finding ID on the pop-up to investigate the finding

View the finding profile page

Choosing the Investigate action for this CloudTrailLoggingDisabled finding in GuardDuty opens the finding’s profile page in the Detective console, as shown in Figure 4. Detective has the concept of a profile page, which displays summaries and analytics gleaned from CloudTrail management logs, VPC flow traffic and GuardDuty findings for AWS resources, IP addresses, and user agents. Each profile page can display up to 12 months of information for the selected resource and is intended to help an investigator review and understand the behavior of a resource, or quickly triage and delve into potential issues. Detective doesn’t require a customer to enable CloudTrail or VPC Flog logging in order to retrieve this data and provides these 12 months of visibility regardless of the customers log retention or archiving policies.

Figure 4: Viewing a GuardDuty finding in Detective

Figure 4: Viewing a GuardDuty finding in Detective

Scope time

To help focus your investigation, Detective defaults the time range and thus the displayed information in a finding profile to cover the period of time from when the finding was created through when it was last updated. In the case of this finding, the scope time covers a 1-hour period of time. You can change the scope time by choosing the calendar icon at the top right of the page, if you want to examine additional information before or after the finding was created. The defaulted scope time is sufficient for this investigation, so we can leave it as-is.

Role session overview

Detective uses tabs to group information on profile pages, and for this finding it shows the role session overview tab by default. The role session represents the activities and behavior of the resource that assumed the role tied to our finding. In this case, the role was assumed by someone with the user name sara, as shown in the Assumed by field. (We’ll assume that the user’s first name is Sara.) By analyzing the role session information in the CloudTrail logs, Detective was able to immediately identify that sara was the user who disabled CloudTrail logging and caused the finding to be triggered. You now have an answer to the question of who did this action.

Before we move to answer our other questions about what Sara did after disabling logging and whether her behavior changed, let’s discuss role sessions in more detail. Every role session has a role session name, sara in this case, and a unique role session identifier. The role session identifier is the role ID of the role assumed and the role session name, concatenated together. Best practices dictate that for a specific role that’s assumed by a specific resource, the role session name represents the user name of the IAM or federated user, or includes other useful information about the resource that assumed the role (for more information, see the Naming of individual IAM role sessions blog post). In this case, because the best practices are being followed, Detective is able to track Sara’s activities and behavior each time she assumes the AWSReservedSSO_AdministratorAccess_598c5f73f8b2b4e5 role.

Detective tracks statistics such as when a role session was first observed (October in Sara’s case, for this role), as well as the actions performed and behavioral insights such as the geolocations where Sara initiated her role assumptions. Knowing that Sara has assumed this role before is useful, because you can now assess whether her usage of the role changed during the 1-hour window of the scope time that you’re looking at now, compared to all of her previous assumptions of this role.

Review changes in Sara’s access patterns and operations

Detective tracks changes in geographical access and operations on the New behavior tab. Let’s choose the New behavior tab for the role session to see this information, as displayed in Figure 5.

Figure 5: Viewing new role session behavior

Figure 5: Viewing new role session behavior

During a security investigation, determining that access patterns have changed can be helpful in highlighting malicious activity. Since Detective tracks Sara’s assumptions of the AWSReservedSSO_AdministratorAccess_598c5f73f8b2b4e5 role, it can show the location where Sara assumed the role and whether the current assumption took place from the same location as her previous ones.

In Figure 5, you can see that Sara has a history of assuming the AWSReservedSSO_AdministratorAccess_598c5f73f8b2b4e5 role from Bellevue, WA and Ashburn, VA, since those geographies are shown in blue. If she had assumed this role from a new location, you would see the new location indicated on the map in orange. Since the API calls being made by this user are from a previously observed location, it’s very unlikely that the user’s credentials were compromised. Making this determination through a manual analysis of CloudTrail logs would have been much more time consuming.

Other information that you can gather from the New behavior role session tab includes newly observed API calls, API calls with increased volume, newly observed autonomous system organizations, and newly observed user agents. It’s useful to be able to validate that the operations Sara performed during the current scope time are relatively consistent with the operations she has performed in the past. This helps us be more certain that it was indeed Sara who was conducting this activity.

Investigate Sara’s API activity

Now that we’ve determined that Sara’s access pattern and activities are consistent with previous behavior, let’s use Detective to look further into Sara’s activity to determine if she accidentally disabled CloudTrail logging or if there was possible malicious intent behind her action.

To investigate the user’s actions

  1. On the finding profile page, in the dropdown list at the top of the screen, select Overview: Role Session to go back to the Overview tab for the role session.

    Figure 6: Navigating to the 'Overview: Role Session' page

    Figure 6: Navigating to the ‘Overview: Role Session’ page

  2. Once you’re on the Overview tab, navigate to the Overall API call volume panel.
    Figure 7: Navigating to the Overall API call volume panel

    Figure 7: Navigating to the ‘Overall API call volume’ panel

    This panel displays a chart of the successful and failed API calls that Sara has made while she assumes the AWSReservedSSO_AdministratorAccess_598c5f73f8b2b4e5 role. The chart shows a black rectangle around activities that were performed during the CloudTrail findings scope time. It also displays historical activities and shows a baseline across the chart so that you can understand how actively she uses the permissions granted to her by assuming this role.

  3. Choose the display details for scope time button to retrieve the details of the API calls that were invoked by Sara during the scope time, so that you can determine her actions after she disabled CloudTrail logging.
    Figure 8: Displaying details based on scope time

    Figure 8: Displaying details based on scope time

    You will now see the Overall API call volume panel expand to show you all the IP addresses, API calls, and access keys used by Sara during the scope time window of this finding.

  4. Choose the API method tab to see a list of all the API calls that were made.
    Figure 9: Viewing the API methods called

    Figure 9: Viewing the API methods called

    She invoked just two API calls during this scope time: the StopLogging and AssumeRole API calls. You were already aware that Sara disabled CloudTrail logging, but you weren’t aware that she assumed another role. When a user assumes a role while they have another one assumed, this is called role chaining. Although role chaining can be used because a user needs additional permissions, it can also be used to hide activities. Because we don’t know what other actions Sara performed after assuming this second role, let’s dig further. That may shed light on why she chose to disable CloudTrail logging.

Examine chained role assumptions

To find out more about Sara’s use of role chaining, let’s look at the other role that she assumed during this role session.

To view the user’s other role

  1. Navigate back to the top of the finding profile page. In the Role session details panel, choose AWSReservedSSO_AdministratorAccess_598c5f73f8b2b4e5.
    Figure 10: Locating the 'Assumed role' name

    Figure 10: Locating the ‘Assumed role’ name

    Detective displays the AWS Role profile page for this role, and you can now see the activity that has occurred across all resources that have assumed this role. In order to highlight information that’s relevant to the time frame of your investigation, Detective maintains your scope time as you move from the CloudTrailLoggingDisabled finding profile page to this role profile page.

  2. The goal for coming to this page is to determine which other role Sara assumed after assuming the AWSReservedSSO_AdministratorAccess_598c5f73f8b2b4e5 role, so choose the Resource interaction tab. On this tab, you will see the following three panels: Resources that assumed this role, Assumed roles, and Sessions involved.In Figure 11, you can see the Resources that assumed this role panel, which lists all the AWS resources that have assumed this role, their type (EC2 instance, federated or IAM user, IAM role), their account, and when they assumed the role for the first and last time. Sara is on this list, but Detective does not show an AWS account next to her because federated users aren’t tied to a specific account. The account field is populated for other resource types that are displayed on this panel and can be useful to understand cross-account role assumptions.

    Figure 11: Viewing resources that have assumed a role

    Figure 11: Viewing resources that have assumed a role

  3. On the same Resource Interaction tab, as you scroll down you will see the Assumed Roles panel, Figure 12, which helps you understand role chaining by listing the other roles that have been assumed by the AWSReservedSSO_AdministratorAccess_598c5f73f8b2b4e5 role. In this case, the role has assumed several other roles, including DemoRole1 during the same window of time when the CloudTrailLoggingDisabled finding occurred.

    Figure 12: Viewing the roles that have been assumed

    Figure 12: Viewing the roles that have been assumed

  4. In Figure 13, you can see the Sessions involved panel, which shows the role sessions for all the resources that have assumed this role, and role sessions where this role has assumed other roles within the current scope time. You see two role sessions with the session name sara, one where Sara assumed the AWSReservedSSO_AdministratorAccess_598c5f73f8b2b4e5 role and another where AWSReservedSSO_AdministratorAccess_598c5f73f8b2b4e5 assumed DemoRole1.

    Figure 13: Viewing the role sessions this role was involved with

    Figure 13: Viewing the role sessions this role was involved with

Now that you know that Sara also used the role DemoRole1 during her role session, let’s take a closer look at what actions she performed.

View API operations that were called within the chained role

In this step, we’ll view Sara’s activity within the DemoRole1 role, focusing on the API calls that were made.

To view the user’s activity in another role

  1. In the Sessions involved panel, in the Session name column, find the row where DemoRole1 is the Assumed Role value. Choose the session name in this row, sara, to go to the role session profile page.
  2. You will be most interested in the API methods that were called during this role session, and you can view those in the Overall API call volume panel. As shown in Figure 14, you can see that Sara has accessed DemoRole1 before, because there are calls graphed prior to the calls in our scope time.
  3. Choose the display details for scope time button on the Overall API call volume panel, and then choose the API method tab.

    Figure 14: Viewing the role session API method calls

    Figure 14: Viewing the role session API method calls

In Figure 14, you can see that calls were made to the DescribeInstances and RunInstances API methods. So you now know that Sara determined the type of Amazon Elastic Compute Cloud (Amazon EC2) instances that were running in your account and then successfully created an EC2 instance by calling the RunInstances API method. You can also see that successful and failed calls were made to the AttachRolePolicy API method as a part of the session. This could possibly be an attempt to elevate permissions in the account and would justify further investigation into the user’s actions.

As an investigator, you’ve determined that Sara was the user who disabled CloudTrail logging and that her access pattern was consistent with her past accesses. You’ve also determined the other actions she performed after she disabled logging and assumed a second role, but you can continue to investigate further by answering additional questions, such as:

  • What did Sara do with DemoRole1 when she assumed this role in the past? Are her current activities consistent with those past activities?
  • What activities are being performed across this account? Are those consistent with Sara’s activities?

By using Detective’s features that have been demonstrated in this post, you will be able to answer the questions like the ones listed above.


After you read this post, we hope you have a better understanding of the ways in which Amazon Detective collects, organizes, and presents log data to simplify your security investigations. All Detective service subscriptions include the new role session analysis capabilities. With these capabilities, you can quickly attribute activity performed under a role to a specific resource in your environment, understand cross-account role assumptions, determine role chaining behavior, and quickly see called APIs.

All customers receive a 30-day free trial when they enable Amazon Detective. See the AWS Regional Services page for all the Regions where Detective is available. To learn more, visit the Amazon Detective product page or see the additional resources at the end of this post to further expand your knowledge of Detective capabilities and features.

Additional resources

Amazon Detective features

Amazon Detective overview and demo

Amazon Detective FAQs

Amazon Detective Regions, endpoints, and quotas

Naming of individual IAM role sessions

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.


Sheldon Sides

Sheldon is a Senior Solutions Architect, focused on helping customers implement native AWS security services. He enjoys using his experience as a consultant and running a cloud security startup to help customers build secure AWS Cloud solutions. His interests include working out, software development, and learning about the latest technologies.


Gagan Prakash

Gagan is a Product Manager on the Amazon Detective team and is passionate about ensuring that Detective’s new features and capabilities address the needs of our customers. He leverages his past experiences in cybersecurity startups and software companies to help drive product improvements. His interests include spending time with family, reading, and learning more about the world at large.