How to set up federated single sign-on to AWS using Google Workspace

Post Syndicated from Wei Chen original https://aws.amazon.com/blogs/security/how-to-set-up-federated-single-sign-on-to-aws-using-google-workspace/

Organizations that want to federate their external identity provider (IdP) to AWS typically do so through AWS Single Sign-On (AWS SSO), AWS Identity and Access Management (IAM), or both. With AWS SSO, you configure federation once and manage access to all of your AWS accounts centrally. With IAM, you configure federation in each AWS account and manage access individually for each account. AWS SSO supports identity synchronization through the System for Cross-domain Identity Management (SCIM) v2.0 for several identity providers. For IdPs that are not currently supported, you can provision users manually. Alternatively, you can federate to AWS from Google Workspace through IAM federation, which this post covers.

Google Workspace offers a single sign-on service based on the Security Assertion Markup Language (SAML) 2.0. Users can use this service to access your AWS resources with their existing Google credentials. Users to whom you grant access will see an additional SAML app in their Google Workspace console. When your users choose this SAML app, they are redirected to the AWS Management Console.

Solution Overview

In this solution, you will create a SAML identity provider in IAM. This establishes a trusted communication channel with your Google IdP, across which user authentication information can be securely passed, so that your Google Workspace users can access the AWS Management Console. You, as the AWS administrator, delegate responsibility for user authentication to a trusted IdP, in this case Google Workspace. Google Workspace uses SAML 2.0 messages to communicate user authentication information between Google and your AWS account. The information contained within the SAML 2.0 messages allows an IAM role to grant the federated user permissions to sign in to the AWS Management Console and access your AWS resources. The IAM policy attached to the role that the user selects determines which permissions the federated user has in the console.

Figure 1: Login process for IAM federation

Figure 1 illustrates the login process for IAM federation. From the federated user’s perspective, this process happens transparently: the user starts at the Google Workspace portal and ends up at the AWS Management Console, without having to supply yet another user name and password.

  1. The user begins by browsing to your organization’s portal and selects the option to go to the AWS Management Console. In your organization, the portal is typically a function of your IdP that handles the exchange of trust between your organization and AWS. In Google Workspace, you navigate to https://myaccount.google.com/ and select the nine-dot icon in the top-right corner. This shows a list of apps, one of which will log you in to AWS. This blog post shows you how to configure this custom app.
    Figure 2: Google Account page

  2. The portal verifies the user’s identity in your organization.
  3. The portal generates a SAML authentication response that includes assertions that identify the user and include attributes about the user. The portal sends this response to the client browser. Although not discussed here, you can also configure your IdP to include a SAML assertion attribute called SessionDuration that specifies how long the console session is valid. You can also configure the IdP to pass attributes as session tags.
  4. The client browser is redirected to the AWS single sign-on endpoint and posts the SAML assertion.
  5. The endpoint requests temporary security credentials on behalf of the user, and creates a console sign-in URL that uses those credentials.
  6. AWS sends the sign-in URL back to the client as a redirect.
  7. The client browser is redirected to the AWS Management Console. If the SAML authentication response includes attributes that map to multiple IAM roles, the user is first prompted to select the role for accessing the console.

The list below is a high-level view of the specific step-by-step procedures needed to set up federated single sign-on access via Google Workspace.

The setup

Follow these top-level steps to set up federated single sign-on to your AWS resources by using Google Workspace:

  1. Download the Google identity provider (IdP) information.
  2. Create the IAM SAML identity provider in your AWS account.
  3. Create roles for your third-party identity provider.
  4. Assign the user’s role in Google Workspace.
  5. Set up Google Workspace as a SAML identity provider (IdP) for AWS.
  6. Test the integration between Google Workspace and AWS IAM.
  7. Roll out to a wider user base.

Detailed procedures for each of these steps compose the remainder of this blog post.

Step 1. Download the Google identity provider (IdP) information

First, let’s get the SAML metadata that contains essential information to enable your AWS account to authenticate the IdP and locate the necessary communication endpoint locations:

  1. Log in to the Google Workspace Admin console.
  2. From the Admin console Home page, select Security > Settings > Set up single sign-on (SSO) with Google as SAML Identity Provider (IdP).
    Figure 3: Accessing the “single sign-on for SAML applications” setting

  3. Choose Download Metadata under IdP metadata.
    Figure 4: The “SSO with Google as SAML IdP” page
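
If you want to confirm what the downloaded file contains before you upload it to AWS, a short script can help. The following is a minimal sketch and not part of the original walkthrough: SAMPLE_METADATA is a hypothetical, heavily trimmed stand-in for the real file, and summarize_idp_metadata is an illustrative helper name. It reads the IdP entity ID and the single sign-on endpoints by using only the Python standard library.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the SAML 2.0 metadata specification.
MD_NS = {"md": "urn:oasis:names:tc:SAML:2.0:metadata"}

# Hypothetical, trimmed stand-in for the metadata file downloaded in Step 1.
SAMPLE_METADATA = """
<md:EntityDescriptor xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata"
    entityID="https://idp.example.com/saml?idpid=EXAMPLE">
  <md:IDPSSODescriptor
      protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol">
    <md:SingleSignOnService
        Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect"
        Location="https://idp.example.com/saml/sso?idpid=EXAMPLE"/>
  </md:IDPSSODescriptor>
</md:EntityDescriptor>
""".strip()


def summarize_idp_metadata(xml_text):
    """Return the entity ID and the (binding, location) pairs for SSO."""
    root = ET.fromstring(xml_text)
    endpoints = [
        (sso.get("Binding"), sso.get("Location"))
        for sso in root.findall(
            "md:IDPSSODescriptor/md:SingleSignOnService", MD_NS
        )
    ]
    return {"entity_id": root.get("entityID"), "sso_endpoints": endpoints}


print(summarize_idp_metadata(SAMPLE_METADATA))
```

With the real download, you would pass the file contents instead of the sample string.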

Step 2. Create the IAM SAML identity provider in your account

Now, create an IAM IdP for Google Workspace in order to establish the trust relationship between Google Workspace and your AWS account. The IAM IdP you create is an entity within your AWS account that describes the external IdP service whose users you will configure to assume IAM roles.

  1. Sign in to the AWS Management Console and open the IAM console at https://console.aws.amazon.com/iam/.
  2. In the navigation pane, choose Identity providers and then choose Add provider.
  3. For Configure provider, choose SAML.
  4. Type a name for the identity provider (such as GoogleWorkspace).
  5. For Metadata document, select Choose file, then specify the SAML metadata document that you downloaded in Step 1.
  6. Verify the information that you have provided. When you are done, choose Add provider.
    Figure 5: Adding an Identity provider

  7. Record the Amazon Resource Name (ARN) by viewing the identity provider you just created. The ARN should look similar to this:

    arn:aws:iam::123456789012:saml-provider/GoogleWorkspace
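
The console steps above can also be scripted. The sketch below assumes that you use the AWS SDK for Python (boto3), whose IAM client provides a create_saml_provider operation. The helper name register_saml_provider is illustrative, and the client is passed in as a parameter so that the function itself has no hard dependency on boto3.

```python
def register_saml_provider(iam_client, metadata_xml, name="GoogleWorkspace"):
    """Create the IAM SAML identity provider and return its ARN.

    iam_client is expected to behave like boto3.client("iam").
    metadata_xml is the text of the metadata file downloaded in Step 1.
    """
    response = iam_client.create_saml_provider(
        SAMLMetadataDocument=metadata_xml,
        Name=name,
    )
    # The response carries the ARN that you record for use in Step 4.
    return response["SAMLProviderArn"]
```

A call might look like register_saml_provider(boto3.client("iam"), metadata_text), where metadata_text holds the contents of the file you downloaded. Treat this as a starting point and check the current boto3 documentation before you rely on it.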

Step 3. Create roles for your third-party Identity Provider

For users accessing the AWS Management Console, the IAM role that the user assumes allows access to resources within your AWS account. The role is where you define what you allow a federated user to do after they sign in.

  1. To create an IAM role, go to the AWS IAM console. Select Roles > Create role.
  2. Choose the SAML 2.0 federation role type.
  3. For SAML Provider, select the provider which you created in Step 2.
  4. Choose Allow programmatic and AWS Management Console access to create a role that can be assumed programmatically and from the AWS Management Console.
  5. Review your SAML 2.0 trust information and then choose Next: Permissions.
    Figure 6: Reviewing your SAML 2.0 trust information

GoogleSAMLPowerUserRole:

  1. For this walkthrough, you are going to create two roles that can be assumed by SAML 2.0 federation. For GoogleSAMLPowerUserRole, you will attach the PowerUserAccess AWS managed policy. This policy provides full access to AWS services and resources, but does not allow management of users and groups. Choose Filter policies, then select AWS managed – job function from the dropdown. This will show a list of AWS managed policies designed around specific job functions.
    Figure 7: Selecting the AWS managed job function

  2. To attach the policy, select PowerUserAccess. Then choose Next: Tags, then Next: Review.
    Figure 8: Attaching the PowerUserAccess policy to your role

  3. Finally, choose Create role to finalize creation of your role.
    Figure 9: Creating your role

GoogleSAMLViewOnlyRole

Repeat the preceding role-creation steps for the GoogleSAMLViewOnlyRole, attaching the ViewOnlyAccess AWS managed policy.

Figure 10: Creating the GoogleSAMLViewOnlyRole

Figure 11: Attaching the ViewOnlyAccess permissions policy

  1. Record the ARN of both roles. The ARNs should look similar to the following:

    arn:aws:iam::123456789012:role/GoogleSAMLPowerUserRole
    arn:aws:iam::123456789012:role/GoogleSAMLViewOnlyRole
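
The trust information that you review when you create each role is an IAM trust policy. As a rough illustration, a SAML 2.0 federation trust policy for console access generally has the shape built below. The helper name saml_trust_policy is hypothetical, and you should compare the output with the policy the console actually shows for your role instead of treating this sketch as authoritative.

```python
import json


def saml_trust_policy(provider_arn):
    """Build a trust policy that lets users federated through the given
    SAML provider assume the role via the AWS sign-in endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": provider_arn},
                "Action": "sts:AssumeRoleWithSAML",
                "Condition": {
                    # Restricts the role to assertions addressed to the
                    # AWS sign-in endpoint (the ACS URL used in Step 5).
                    "StringEquals": {
                        "SAML:aud": "https://signin.aws.amazon.com/saml"
                    }
                },
            }
        ],
    }


policy = saml_trust_policy(
    "arn:aws:iam::123456789012:saml-provider/GoogleWorkspace"
)
print(json.dumps(policy, indent=2))
```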

Step 4. Assign the user’s role in Google Workspace

Here you will specify the role or roles that this user can assume in AWS.

  1. Log in to the Google Admin console.
  2. From the Admin console Home page, go to Directory > Users and select Manage custom attributes from the More dropdown, and choose Add Custom Attribute.
  3. Configure the custom attribute as follows:

    Category: AWS
    Description: Amazon Web Services Role Mapping

    For Custom fields, enter the following values:

    Name: AssumeRoleWithSaml
    Info type: Text
    Visibility: Visible to user and admin
    No. of values: Multi-value
  4. Choose Add. The new category should appear in the Manage user attributes page.
    Figure 12: Adding the custom attribute

  5. Navigate to Users, and find the user you want to allow to federate into AWS. Select the user’s name to open their account page, then choose User Information.
  6. Select the custom attribute you recently created, named AWS. Add two rows, each of which will include the values you recorded earlier, using the format below for each AssumeRoleWithSaml row.

    Row 1:
    arn:aws:iam::123456789012:role/GoogleSAMLPowerUserRole,arn:aws:iam::123456789012:saml-provider/GoogleWorkspace

    Row 2:
    arn:aws:iam::123456789012:role/GoogleSAMLViewOnlyRole,arn:aws:iam::123456789012:saml-provider/GoogleWorkspace

    Each AssumeRoleWithSaml value is the role ARN (from Step 3), followed by a comma, followed by the identity provider ARN (from Step 2), with no spaces. This value is passed as the SAML attribute value for the attribute named https://aws.amazon.com/SAML/Attributes/Role. The final result will look similar to the following:

    Figure 13: Adding the roles that the user can assume
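
Because a stray space or a swapped ARN in these rows is an easy mistake to make, it can help to generate the values instead of typing them. The following sketch is an illustration only: role_attribute_value is a hypothetical helper, and the patterns assume the standard aws partition and a 12-digit account ID.

```python
import re

# Assumed shapes for the two ARNs (standard "aws" partition only).
ROLE_ARN = re.compile(r"arn:aws:iam::\d{12}:role/[A-Za-z0-9_+=,.@/-]+")
PROVIDER_ARN = re.compile(r"arn:aws:iam::\d{12}:saml-provider/[A-Za-z0-9._-]+")


def role_attribute_value(role_arn, provider_arn):
    """Return "<role ARN>,<provider ARN>" after basic format checks."""
    role_arn = role_arn.strip()
    provider_arn = provider_arn.strip()
    if not ROLE_ARN.fullmatch(role_arn):
        raise ValueError(f"Not a role ARN: {role_arn!r}")
    if not PROVIDER_ARN.fullmatch(provider_arn):
        raise ValueError(f"Not a SAML provider ARN: {provider_arn!r}")
    # Role first, then provider, separated by a single comma.
    return f"{role_arn},{provider_arn}"


print(
    role_attribute_value(
        "arn:aws:iam::123456789012:role/GoogleSAMLPowerUserRole",
        "arn:aws:iam::123456789012:saml-provider/GoogleWorkspace",
    )
)
```

The helper rejects values with embedded whitespace, which are easy to introduce when copying ARNs from a formatted page.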

Step 5. Set up Google Workspace as a SAML identity provider (IdP) for AWS

Now you’ll set up the SAML app in your Google Workspace account. This includes adding the SAML attributes that the AWS Management Console expects in order to allow a SAML-based authentication to take place.

Log in to the Google Admin console.

  1. From the Admin console Home page, go to Apps > Web and mobile apps.
  2. Choose Add custom SAML app from the Add App dropdown.
  3. Enter AWS Single-Account Access for App name and upload an optional App icon to identify your SAML application, and select Continue.
    Figure 14: Naming the custom SAML app and setting the icon

  4. Fill in the following values:

    ACS URL: https://signin.aws.amazon.com/saml
    Entity ID: urn:amazon:webservices
    Name ID format: EMAIL
    Name ID: Basic Information > Primary email

    Note: Your primary email will become your role’s AWS session name.

  5. Choose CONTINUE.
    Figure 15: Adding the custom SAML app

  6. AWS requires the IdP to issue a SAML assertion with some mandatory attributes (known as claims). The AWS documentation explains how to configure the SAML assertion. In short, you need to create an assertion with the following:
    • An attribute of name https://aws.amazon.com/SAML/Attributes/Role. This element contains one or more AttributeValue elements that list the IAM identity provider and role to which the user is mapped by your IdP. The IAM role and IAM identity provider are specified as a comma-delimited pair of ARNs in the same format as the RoleArn and PrincipalArn parameters that are passed to AssumeRoleWithSAML.
    • An attribute of name https://aws.amazon.com/SAML/Attributes/RoleSessionName (this is just a definition of type, not an actual URL) with a string value. This is the federated user’s role session name in AWS.
    • A name identifier (NameId) that is used to identify the subject of a SAML assertion.

      Google Directory attributes        | App attributes
      AWS > AssumeRoleWithSaml           | https://aws.amazon.com/SAML/Attributes/Role
      Basic Information > Primary email  | https://aws.amazon.com/SAML/Attributes/RoleSessionName

      Figure 16: Mapping between Google Directory attributes and SAML attributes

  7. Choose FINISH and save the mapping.
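
When troubleshooting the mapping, it can be useful to look at the attributes that actually arrive in a SAML response. The sketch below decodes a base64-encoded response and pulls out the two AWS attributes described above. SAMPLE_ASSERTION is a hypothetical, minimal document and aws_attributes is an illustrative name. The code does not verify signatures, so it is suitable for inspection only.

```python
import base64
import xml.etree.ElementTree as ET

SAML_NS = {"saml": "urn:oasis:names:tc:SAML:2.0:assertion"}
ROLE_ATTR = "https://aws.amazon.com/SAML/Attributes/Role"
SESSION_ATTR = "https://aws.amazon.com/SAML/Attributes/RoleSessionName"

# Hypothetical, minimal assertion carrying only the attributes of interest.
SAMPLE_ASSERTION = (
    '<saml:Assertion xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion">'
    "<saml:AttributeStatement>"
    f'<saml:Attribute Name="{ROLE_ATTR}">'
    "<saml:AttributeValue>"
    "arn:aws:iam::123456789012:role/GoogleSAMLPowerUserRole,"
    "arn:aws:iam::123456789012:saml-provider/GoogleWorkspace"
    "</saml:AttributeValue>"
    "</saml:Attribute>"
    f'<saml:Attribute Name="{SESSION_ATTR}">'
    "<saml:AttributeValue>user@example.com</saml:AttributeValue>"
    "</saml:Attribute>"
    "</saml:AttributeStatement>"
    "</saml:Assertion>"
)


def aws_attributes(saml_response_b64):
    """Return (role values, session name values) from an encoded response."""
    root = ET.fromstring(base64.b64decode(saml_response_b64))
    found = {}
    for attr in root.iter("{urn:oasis:names:tc:SAML:2.0:assertion}Attribute"):
        values = [v.text for v in attr.findall("saml:AttributeValue", SAML_NS)]
        found[attr.get("Name")] = values
    return found.get(ROLE_ATTR, []), found.get(SESSION_ATTR, [])


encoded = base64.b64encode(SAMPLE_ASSERTION.encode("utf-8"))
roles, session_names = aws_attributes(encoded)
print(roles, session_names)
```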

Step 6. Test the integration between Google Workspace and AWS IAM

  1. Log in to the Google Admin portal.
  2. From the Admin console Home page, go to Apps > Web and mobile apps.
  3. Select the application you created in Step 5.
  4. At the top left, select TEST SAML LOGIN, then choose ALLOW ACCESS within the popup box.
    Figure 18: Testing the SAML login

  5. Select ON for everyone in the Service status section, and choose SAVE. This will allow every user in Google Workspace to see the new SAML custom app.
    Figure 19: Saving the custom app settings

  6. Now navigate to Web and mobile apps and choose TEST SAML LOGIN again. Amazon Web Services should open in a separate tab and display two roles for users to choose from:
    Figure 20: Testing SAML login again

    Figure 21: Selecting the IAM role you wish to assume for console access

  7. Select the desired role and select Sign in.
  8. You should now be redirected to the AWS Management Console home page.
  9. Google Workspace users should now be able to access the AWS application from their workspace:
    Figure 22: Viewing the AWS custom app

Conclusion

By following the steps in this blog post, you’ve configured your Google Workspace directory and AWS accounts to allow SAML-based federated sign-on for selected Google Workspace users. Using federation instead of IAM users helps centralize identity management, making it easier to adopt a multi-account strategy.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.


Wei Chen

Wei Chen is a Sr. Solutions Architect at Amazon Web Services, based in Austin, TX. He has more than 20 years of experience assisting customers with the building of solutions to significantly complex challenges. At AWS, Wei helps customers achieve their strategic business objectives by rearchitecting their applications to take full advantage of the cloud. He specializes in mastering the compliance frameworks, technical compliance programs, physical security, security processes, and AWS Security services.

Roy Tokeshi

Roy is a Solutions Architect for Amazon End User Computing. He enjoys making in AWS, CNC, laser engravers, and IoT. He likes to help customers build mechanisms to create business value.

Michael Chan

Michael is a Solutions Architect for AWS Identity. He enjoys understanding customer problems with AWS IAM and working backwards to provide practical solutions.

Detecting security issues in logging with Amazon CodeGuru Reviewer

Post Syndicated from Brian Farnhill original https://aws.amazon.com/blogs/devops/detecting-security-issues-in-logging-with-amazon-codeguru-reviewer/

Amazon CodeGuru is a developer tool that provides intelligent recommendations for identifying security risks in code and improving code quality. To help you find potential issues related to logging of inputs that haven’t been sanitized, Amazon CodeGuru Reviewer now includes additional checks for both Python and Java. In this post, we discuss these updates and show examples of code that relate to these new detectors.

In December 2021, an issue was discovered relating to Apache’s popular Log4j Java-based logging utility (CVE-2021-44228). There are several resources available to help mitigate this issue (some of which are highlighted in a post on the AWS Public Sector blog). This issue has drawn attention to the importance of logging inputs in a way that is safe. To help developers understand where unsanitized values are being logged, CodeGuru Reviewer can now generate findings that highlight these values and make it easier to remediate them.

The new detectors and recommendations in CodeGuru Reviewer can detect findings in Java where Log4j is used, and in Python where the standard logging module is used. The following examples demonstrate how this works and what the recommendations look like.

Findings in Java

Consider the following Java sample that responds to a web request.

@RequestMapping("/example.htm")
public ModelAndView handleRequest(HttpServletRequest request, HttpServletResponse response) {
    ModelAndView result = new ModelAndView("success");
    String userId = request.getParameter("userId");
    result.addObject("userId", userId);

    // More logic to populate `result`.
    log.info("Successfully processed {} with user ID: {}.", request.getRequestURL(), userId);
    return result;
}

This simple example generates a result to the initial request, and it extracts the userId field from the initial request to do this. Before returning the result, the userId field is passed to the log.info statement. This presents a potential security issue, because the value of userId is not sanitized or changed in any way before it is logged. CodeGuru Reviewer is able to identify that the variable userId points to a value that needs to be sanitized before it is logged, as it comes from an HTTP request. All user inputs in a request (including query parameters, headers, body and cookie values) should be checked before logging to ensure a malicious user hasn’t passed values that could compromise your logging mechanism.

CodeGuru Reviewer recommends sanitizing user-provided inputs before logging them to ensure log integrity. Let’s take a look at CodeGuru Reviewer’s findings for this issue.

A screenshot of the AWS Console that describes the log injection risk found by CodeGuru Reviewer

One option to remediate this risk is to add a sanitize() method that checks and modifies the value to remove known risks. The specific process of doing this will vary based on the values you expect and what is safe for your application and its processes. By logging the now sanitized value, you have mitigated those risks that could impact your logging framework. The modified code sample below shows one example of how this could be addressed.

@RequestMapping("/example.htm")
public ModelAndView handleRequestSafely(HttpServletRequest request, HttpServletResponse response) {
    ModelAndView result = new ModelAndView("success");
    String userId = request.getParameter("userId");
    String sanitizedUserId = sanitize(userId);
    result.addObject("userId", sanitizedUserId);

    // More logic to populate `result`.
    log.info("Successfully processed {} with user ID: {}.", request.getRequestURL(), sanitizedUserId);
    return result;
}

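private static String sanitize(String userId) {
    // getParameter() returns null when the parameter is missing; treat that as empty.
    return userId == null ? "" : userId.replaceAll("\\D", "");
}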

The example now uses the sanitize() method, which uses a replaceAll() call that uses a regular expression to remove all non-digit characters. This example assumes the userId value should only be digit characters, ensuring that any other characters that could be used to expose a vulnerability in the logging framework are removed first.

Findings in Python

Now consider the following Python code from a sample Flask project that handles a web request.

from flask import Flask, current_app, request

app = Flask(__name__)

@app.route('/log')
def getUserInput():
    input = request.args.get('input')
    # The raw query string value is logged without any validation.
    current_app.logger.info("User input: %s", input)

    # More logic to process user input.

In this example, the input variable is assigned the input query string value from a web request. Then, the Flask logger records its value as an info level message. This has the same challenge as the Java example above. However, this time, rather than changing the value, we can inspect it and choose to log it only when it is in a format we expect. A simple example of this could be where we expect only alphanumeric characters in the input variable. The isalnum() function can act as a simple test in this case. Here is an example of what this style of validation could look like.

from flask import Flask, current_app, request

app = Flask(__name__)

@app.route('/log')
def safe_getUserInput():
    input = request.args.get('input')
    # Log the value only when it is present and strictly alphanumeric.
    if input is not None and input.isalnum():
        current_app.logger.info("User input: %s", input)
    else:
        current_app.logger.warning("Unexpected input detected")
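
If rejecting unexpected input is too strict for a given field, another possibility is to neutralize the characters that make log forging possible and log the rest. The following is a sketch under that assumption and is not taken from CodeGuru Reviewer’s recommendations. The function name neutralize_for_log, the "<missing>" placeholder, and the 200-character cap are illustrative choices.

```python
import re

# ASCII control characters, including carriage return and line feed,
# which could otherwise be used to forge additional log lines.
_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")


def neutralize_for_log(value, max_length=200):
    """Return a copy of value that is safer to write on a single log line."""
    if value is None:
        return "<missing>"
    cleaned = _CONTROL_CHARS.sub("_", value)
    # Truncate so that one request cannot flood the log with a huge value.
    return cleaned[:max_length]


print(neutralize_for_log("abc\r\nINFO forged entry"))
```

Which approach is appropriate depends on the values you expect and on what is safe for your application.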

Getting started

While log sanitization implementation is a long journey for many, it is a guardrail for maintaining your application’s log integrity. With CodeGuru Reviewer detecting log inputs that are neither sanitized nor validated, developers can use these recommendations as a guide to reduce risks related to log injection attacks. Additionally, you can provide feedback on recommendations in the CodeGuru Reviewer console or by commenting on the code in a pull request. This feedback helps improve the precision of CodeGuru Reviewer, so the recommendations you see get better over time.

To get started with CodeGuru Reviewer, you can leverage AWS Free Tier without any cost. For 90 days, you can review up to 100K lines of code in onboarded repositories per AWS account. For more information, please review the pricing page.

About the authors

Brian Farnhill

Brian Farnhill is a Software Development Engineer in the Australian Public Sector team. His background is in building solutions and helping customers improve DevOps tools and processes. When he isn’t working, you’ll find him either coding for fun or playing online games.

Jia Qin

Jia Qin is part of the Solutions Architect team in Malaysia. She loves developing on AWS, trying out new technology, and sharing her knowledge with customers. Outside of work, she enjoys taking walks and petting cats.

Fine-tune and optimize AWS WAF Bot Control mitigation capability

Post Syndicated from Dmitriy Novikov original https://aws.amazon.com/blogs/security/fine-tune-and-optimize-aws-waf-bot-control-mitigation-capability/

Introduction

A few years ago at Sydney Summit, I had an excellent question from one of our attendees. She asked me to help her design a cost-effective, reliable, and not overcomplicated solution for protection against simple bots for her web-facing resources on Amazon Web Services (AWS). I remember the occasion because with the release of AWS WAF Bot Control, I can now address the question with an elegant solution. The Bot Control feature now makes this a matter of switching it on to start filtering out common and pervasive bots that generate over 50 percent of the traffic against typical web applications.

The blog post Reduce Unwanted Traffic on Your Website with New AWS WAF Bot Control introduced AWS WAF Bot Control and some of its capabilities. That post covers where to start and which elements Bot Control uses for configuration and protection. This post unpacks closely related functionality and shares key considerations, best practices, and ways to customize Bot Control for common use cases. Use cases covered include:

  • Limiting the crawling rate of a bot leveraging labels and AWS WAF response headers
  • Enabling Bot Control only for certain parts of your application with scope down statements
  • Prioritizing verified bots or allowing only specific ones using labels
  • Inserting custom headers into requests from certain bots based on their labels

Key elements of AWS WAF Bot Control fine-tuning

Before moving on to precise configuration of the bot mitigation capability, it is important to understand the components that go into the process.

Labels

Although labels aren’t unique to Bot Control, the feature takes advantage of them, and many configurations use labels as the main input. A label is a string value that is applied to a request based on matching a rule statement. One way of thinking about labels is as tags that belong to the specific request. The request acquires them after being processed by a rule statement, and they can be used to identify similar requests in all subsequent rules within the same web ACL. Labels enable you to act on a group of requests that meets specific criteria, because the subsequent rules in the same web ACL have access to the generated labels and can match against them.

Labels go beyond just a mechanism for matching a rule. Labels are independent of a rule’s action, as they can be generated for Block, Allow, and Count. That opens up opportunities to filter or construct queries against records in AWS WAF logs based on labels, and so implement sophisticated analytics.

A label is a string made up of a prefix, optional namespace, and a name delimited by a colon. For example: prefix:[namespace:]name. The prefix is automatically added by AWS WAF.

AWS WAF Bot Control includes various labels and namespaces:

  • bot:category: Type of bot. For example, search_engine, content_fetcher
  • bot:name: Name of a specific bot (if available). For example, scrapy, mauibot, crawler4j
  • bot:verified: Verified bots are generally safe for web applications. For example, googlebot and linkedin. Bot Control performs validation to confirm that such bots come from the source that they claim, using the bot confirmation detection logic described later in this section.

    By default, verified bots are not blocked by Bot Control, but you can use a label to block them with a custom rule.

  • signal: Attributes of the request that indicate bot activity. For example, non_browser_user_agent, automated_browser

These labels are added through managed bot detection logic, and Bot Control uses them to perform the following:

Known bot categorization: Comparing the request user-agent to known bots to categorize and allow customers to block by category. Bots are categorized by their function, such as scrapers, search engines, social media.

Bot confirmation: Most respectable bots provide a way to validate beyond the user-agent, typically by doing a reverse DNS lookup of the IP address to confirm the validity of domain and host names. These automatic checks will help you to ensure that only legitimate bots are allowed, and provide a signal to flag requests to downstream systems for bot detection.

Header validation: Request headers validation is performed against a series of checks to look for missing headers, malformed headers, or invalid headers.

Browser signature matching: TLS handshake data and request headers can be deconstructed and partially recombined to create a browser signature that identifies browser and OS combinations. This signature can be validated against the user-agent to confirm they match, and checked against lists of known-good and known-bad browser signatures.

Below are a few examples of labels that Bot Control has. You can obtain the full list by calling the DescribeManagedRuleGroup API.

awswaf:managed:aws:bot-control:bot:category:search_engine
awswaf:managed:aws:bot-control:bot:name:scrapy
awswaf:managed:aws:bot-control:bot:verified
awswaf:managed:aws:bot-control:signal:non_browser_user_agent
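
When you process these labels in your own tooling, it is often convenient to strip the common prefix and separate the namespace from the name. The sketch below does that for the four example labels. The helper split_bot_control_label is hypothetical, and treating everything before the last colon as the namespace is one interpretation of the prefix:[namespace:]name format, not an official parsing rule.

```python
BOT_CONTROL_PREFIX = "awswaf:managed:aws:bot-control:"


def split_bot_control_label(label):
    """Return (namespace, name) for a Bot Control label, else None."""
    if not label.startswith(BOT_CONTROL_PREFIX):
        return None
    remainder = label[len(BOT_CONTROL_PREFIX):]
    # Everything before the last colon is treated as the namespace.
    namespace, _, name = remainder.rpartition(":")
    return namespace, name


examples = [
    "awswaf:managed:aws:bot-control:bot:category:search_engine",
    "awswaf:managed:aws:bot-control:bot:name:scrapy",
    "awswaf:managed:aws:bot-control:bot:verified",
    "awswaf:managed:aws:bot-control:signal:non_browser_user_agent",
]
for label in examples:
    print(split_bot_control_label(label))
```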

Best practice to start with Bot Control

Although Bot Control can be enabled and start protecting your web resources with the default Block action, you can switch all rules in the rule group into a Count action at the beginning. This accomplishes the following:

  • Avoids false positives with requests that might match one of the rules in Bot Control but still be a valid bot for your resource.
  • Allows you to accumulate enough data points in the form of labels and actions on requests with them, if some of the requests matched rules in Bot Control. That enables you to make informed decisions on constructing rules for each desired bot or category and when switching them into a default action is appropriate.
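
The Count-first approach described above can be sketched as a rule definition in the shape used by the AWS WAFV2 API, where a rule that references a managed rule group carries an OverrideAction. The rule name, metric name, and the helper bot_control_rule are illustrative assumptions; verify the structure against the current API reference before you use it.

```python
def bot_control_rule(priority, count_only=True):
    """Build a web ACL rule entry that references the Bot Control rule group.

    With count_only=True, the rule group's actions are overridden to Count,
    so matching requests are labeled and counted but not blocked.
    """
    return {
        "Name": "AWS-AWSManagedRulesBotControlRuleSet",  # illustrative name
        "Priority": priority,
        "Statement": {
            "ManagedRuleGroupStatement": {
                "VendorName": "AWS",
                "Name": "AWSManagedRulesBotControlRuleSet",
            }
        },
        "OverrideAction": {"Count": {}} if count_only else {"None": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": "BotControl",  # illustrative metric name
        },
    }


observe_first = bot_control_rule(priority=1)
print(observe_first["OverrideAction"])
```

After you have reviewed the collected labels, calling bot_control_rule(priority=1, count_only=False) would return the rule group to its default actions.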

Labels can be looked up in Amazon CloudWatch metrics and AWS WAF logs, and as soon as you have them, you can start planning whether exceptions or any custom rules are needed to cater for a specific scenario. This blog post explores examples of such use cases in the Common use cases sections below.

Additionally, as AWS WAF processes rules in sequential order, you should consider where the Bot Control rule group is located in your web ACL. To filter out requests that you confidently consider unwanted, you can place AWS Managed Rules rule groups—such as the Amazon IP reputation list—before the Bot Control rule group in the evaluation order. This decreases the number of requests processed by Bot Control, and makes it more cost effective. Simultaneously, Bot Control should be early enough in the rules to:

  • Enable label generation for downstream rules. That also provides higher visibility as a side benefit.
  • Decrease false positives by not blocking desired bots before they reach Bot Control.

AWS WAF Bot Control fine-tuning wouldn’t be complete and configurable without a set of recently released features and capabilities of AWS WAF. Let’s unpack them.

How to work with labels in CloudWatch metrics and AWS WAF logs

Labels generate CloudWatch metrics and are written to AWS WAF logs. This lets you see which bots and categories hit your website, along with the associated labels that you can use for fine-tuning.

CloudWatch metrics are generated with the following dimensions and metrics.

  • Region dimension is available for all Regions except Amazon CloudFront. When a web ACL is associated with CloudFront, metrics are in the Northern Virginia Region.
  • WebACL dimension is the name of the web ACL.
  • Namespace is the fully qualified namespace, including the prefix
  • LabelValue is the label name
  • Action is the terminating action (for example, Allow, Block, Count)

AWS WAF includes a shortcut to associated CloudWatch metrics at the top of the Overview page, as shown in Figure 1.

Figure 1: Title and description of the chart in AWS WAF with a shortcut to CloudWatch

Alternatively, you can find them in the WAFV2 service category of the CloudWatch Metrics section.

CloudWatch displays generated labels and the volume across dates and times, so you can evaluate and make informed decisions to structure the rules or address false positives. Figure 2 illustrates what labels were generated for requests from bots that hit my website. This example configured only a couple of explicit Allow actions, so most of them were blocked. The top section of Figure 2 shows the load from two selected labels.

Figure 2: WAFV2 CloudWatch metrics for generated Label Namespaces

In AWS WAF logs, generated labels are included in an array under the field labels. Figure 3 shows an example request with the labels array at the bottom.

Figure 3: An example of an AWS WAF log record

This example shows three labels generated for the same request. Uptimerobot follows the monitoring category label, and combining these two labels is useful to provide flexibility for configurations based on them. You can use the whole category, or be laser-focused using the label of the specific bot. You will see how and why that matters later in this blog post. The third label, non_browser_user_agent, is a signal of forwarded requests that have extra headers. For protection from bots in conjunction with labels, you can construct extra scanning in your application for certain requests.
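
To see which labels show up most often, you can tally the labels array across log records. The following is a minimal sketch, assuming newline-delimited JSON records in which each entry of labels is an object with a name field; the sample records are hypothetical and trimmed to the fields used here.

```python
import json
from collections import Counter

# Hypothetical, trimmed log records; real AWS WAF records carry many more fields.
SAMPLE_LOG_LINES = [
    json.dumps({
        "action": "BLOCK",
        "labels": [
            {"name": "awswaf:managed:aws:bot-control:bot:category:monitoring"},
            {"name": "awswaf:managed:aws:bot-control:bot:name:uptimerobot"},
            {"name": "awswaf:managed:aws:bot-control:signal:non_browser_user_agent"},
        ],
    }),
    json.dumps({"action": "ALLOW"}),  # a request that produced no labels
]


def count_labels(log_lines):
    """Count how often each label name appears across JSON log lines."""
    counts = Counter()
    for line in log_lines:
        record = json.loads(line)
        for label in record.get("labels", []):
            counts[label["name"]] += 1
    return counts


label_counts = count_labels(SAMPLE_LOG_LINES)
for name, count in label_counts.most_common():
    print(count, name)
```

The resulting counts give you a starting list of bot:name: and bot:category: labels to consider when you tune rules.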

Scope-down statements

Given that Bot Control is a premium feature and a paid AWS Managed Rules rule group, the ability to keep your costs under control is crucial. The scope-down statement allows you to optimize for cost by filtering out any traffic that doesn’t require inspection by Bot Control.

To address this goal, you can use scope-down statements, which can be applied to two broad scenarios.

You can exclude certain parts of your resource from scanning by Bot Control. Think of parts of your website that you don’t mind being accessed by bots; typically that would be static content, such as images and CSS files. This leaves protection on everything else, such as APIs and login pages. You can also exclude IP ranges that can be considered safe from bot management. For example, traffic that’s known to come from your organization or viewers that belong to your partners or customers.

Alternatively, you can look at this from a different angle, and only apply bot management to a small section of your resources. For example, you can use Bot Control to protect a login page, or certain sensitive APIs, leaving everything else outside of your bot management.

With all of these tools in our toolkit let’s put them into perspective and dive deep into use cases and scenarios.

Common use cases for AWS WAF Bot Control fine-tuning

There are several methods for fine tuning Bot Control to better meet your needs. In this section, you’ll see some of the methods you can use.

Limit the crawling rate

In some cases, it is necessary to allow bots access to your websites. A good example is search engine bots, which crawl the web and create an index. If optimization for search engines is important for your business, but you notice excessive load from too many requests hitting your web resource, you might face a dilemma of how to slow crawlers down without unnecessarily blocking them. You can solve this with a combination of Bot Control detection logic and a rate-based rule with a response status code and header to communicate your intention back to crawlers. Most crawlers that are deemed useful have a built-in mechanism to decrease their crawl rate when you detect and respond to increased load.

To customize bot mitigation and set the crawl rate below limits that might negatively affect your web resource

  1. In the AWS WAF console, select Web ACLs from the left menu. Open your web ACL or follow the steps to create a web ACL.
  2. Choose the Rules tab and select Add rules. Select Add managed rule groups and proceed with the following settings:
    1. In the AWS managed rule groups section, select the switch Add to web ACL to enable Bot Control in the web ACL. This also gives you labels that you can use in other rules later in the evaluation process inside the web ACL.
    2. Select Add rules and choose Save
  3. In the same web ACL, select Add rules menu and select Add my own rules and rule groups.
  4. Using the provided Rule builder, configure the following settings:
    1. Enter a preferred name for the rule and select Rate-based rule.
    2. Enter a preferred rate limit for the rule. For example, 500.

      Note: The rate limit is the maximum number of requests allowed from a single IP address in a five-minute period.

    3. Select Only consider requests that match the criteria in a rule statement to enable the scope-down statement to narrow the scope of the requests that the rule evaluates.
    4. Under the Inspect menu, select Has a label to focus only on certain types of bots.
    5. In the Match key field, enter one of the following labels to match based on broad categories, such as verified bots or all bots identified as scraping, as illustrated in Figure 4:

      awswaf:managed:aws:bot-control:bot:verified
      awswaf:managed:aws:bot-control:bot:category:scraping_framework

    6. Alternatively, you can narrow down to a specific bot using its label:

      awswaf:managed:aws:bot-control:bot:name:Googlebot

      Figure 4: Label match rule statement in a rule builder with a specific match key

  5. In the Action section, configure the following settings:
    1. Select Custom response to enable it.
    2. Enter 429 as the Response code to indicate and communicate back to the bot that it has sent too many requests in a given amount of time.
    3. Select Add new custom header and enter Retry-After in the Key field and a value in seconds for the Value field. The value indicates how many seconds a bot must wait before making a new request.
  6. Select Add rule.
  7. It’s important to place the rule after the Bot Control rule group inside your web ACL, so that the label is available in this custom rule.
    1. In the Set rule priority section, check that the new rate-based rule is under the existing Bot Control rule set and if not, choose the newly created rule and select Move up or Move down until the rule is located after it.
    2. Select Save.
Figure 5: AWS WAF rule action with a custom response code

With the preceding configuration, Bot Control sets required labels, which you then use in the scope-down statement in a rate-based rule to not only establish a ceiling of how many requests you will allow from specific bots, but also communicate to bots when their crawling rate is too high. If they don’t respect the response and lower their rate, the rule will temporarily block them, protecting your web resource from being overwhelmed.

Note: If you use a category label, such as scraping_framework, all bots that have that label will be counted by your rate-based rule. To avoid unintentional blocking of bots that use the same label, you can either narrow down to a specific bot with a precise bot:name: label, or select a higher rate limit to allow a greater margin for the aggregate.
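
The console steps above correspond to a rule definition in the web ACL's JSON representation. The following is a minimal sketch, assuming the AWS WAFV2 rule JSON shape; the rule name, priority, limit, and Retry-After value are illustrative placeholders, and the helper function is hypothetical rather than part of any AWS SDK.

```python
import json

VERIFIED_BOT_LABEL = "awswaf:managed:aws:bot-control:bot:verified"


def build_crawl_rate_rule(name, priority, limit, label_key, retry_after_seconds):
    """Return a rate-based rule that answers over-limit bots with HTTP 429."""
    return {
        "Name": name,
        "Priority": priority,
        "Statement": {
            "RateBasedStatement": {
                "Limit": limit,             # requests per five-minute window
                "AggregateKeyType": "IP",   # counted per source IP address
                "ScopeDownStatement": {
                    # Only count requests that Bot Control has already labeled.
                    "LabelMatchStatement": {"Scope": "LABEL", "Key": label_key}
                },
            }
        },
        "Action": {
            "Block": {
                "CustomResponse": {
                    "ResponseCode": 429,
                    "ResponseHeaders": [
                        {"Name": "Retry-After", "Value": str(retry_after_seconds)}
                    ],
                }
            }
        },
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": name,
        },
    }


rule = build_crawl_rate_rule("limit-verified-bots", 1, 500, VERIFIED_BOT_LABEL, 900)
print(json.dumps(rule, indent=2))
```

The priority of 1 assumes that the Bot Control rule group sits at priority 0, so that the label exists by the time this rule is evaluated.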

Enable Bot Control only for certain parts of your application

As mentioned earlier, excluding parts of your web resource from Bot Control protection is a mechanism to reduce the cost of running the feature by focusing only on a subset of the requests reaching a resource. There are a few common scenarios that take advantage of this approach.

To run Bot Control only on dynamic parts of your traffic

  1. In the AWS WAF console, select Web ACLs from the left menu. Open a web ACL that you have, or follow the steps to create a web ACL.
  2. Choose the Rules tab and select Add rules. Then select Add managed rule groups to proceed with the following settings:
    1. In the AWS managed rule groups section, select Add to web ACL to enable Bot Control in the web ACL.
    2. Select Edit.
  3. Select Scope-down statement – optional and select Enable Scope-down statement.
  4. In If a request, select doesn’t match the statement (NOT).
  5. In the Statement section, configure the following settings:
    1. Choose URI path in the Inspect field.
    2. For the Match type, choose Starts with string.
    3. Depending on the structure of your resource, you can enter a whole URI string—such as images/—in the String to match field. The string will be excluded from Bot Control evaluation.
    Figure 6: A scope-down statement to match based on a string that a URI path starts with

  6. Select Save rule.
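
Expressed as rule JSON, the procedure above wraps a URI path match in a NOT statement inside the Bot Control rule group. The following is a minimal sketch, assuming the AWS WAFV2 rule JSON shape; the helper function and metric name are hypothetical, and the prefix /images/ is an assumption (URI paths normally begin with a slash, so adjust the string to how your paths actually look).

```python
import json


def build_bot_control_rule(priority, excluded_prefix):
    """Bot Control rule group that skips requests whose path starts with a prefix."""
    return {
        "Name": "AWS-AWSManagedRulesBotControlRuleSet",
        "Priority": priority,
        "Statement": {
            "ManagedRuleGroupStatement": {
                "VendorName": "AWS",
                "Name": "AWSManagedRulesBotControlRuleSet",
                "ScopeDownStatement": {
                    # NOT (path starts with prefix): everything else is inspected.
                    "NotStatement": {
                        "Statement": {
                            "ByteMatchStatement": {
                                "SearchString": excluded_prefix,
                                "FieldToMatch": {"UriPath": {}},
                                "TextTransformations": [
                                    {"Priority": 0, "Type": "NONE"}
                                ],
                                "PositionalConstraint": "STARTS_WITH",
                            }
                        }
                    }
                },
            }
        },
        "OverrideAction": {"None": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": "bot-control",  # placeholder metric name
        },
    }


bot_control_rule = build_bot_control_rule(0, "/images/")
print(json.dumps(bot_control_rule, indent=2))
```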

An alternative to using string matching

As an alternative to a string match type, you can use a regex pattern set. If you don’t have a regex pattern set, create one using the following guide.

Note: This pattern matches most common file extensions associated with static files for typical web resources. You can customize the pattern set if you have different file types.

  1. Follow steps 1-4 of the previous procedure.
  2. In the Statement section, configure the following settings:
    1. Choose URI path in the Inspect field.
    2. For the Match type, choose Matches pattern from regex pattern set and select your created set in the Regex pattern set field, as illustrated in Figure 7.
    3. In Regex pattern set, enter the pattern
      (?i)\.(jpe?g|gif|png|svg|ico|css|js|woff2?)$

      Figure 7: A scope-down statement to match based on a regex pattern set as part of a URI path
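
Before saving the pattern, it can help to check it against a few representative paths. The following is a minimal sketch that uses Python's re module as a stand-in; the regex engine in AWS WAF is not identical to Python's, so treat this as a sanity check only. The sample paths are hypothetical.

```python
import re

# The pattern from the step above, compiled with Python's regex engine.
STATIC_FILE_PATTERN = re.compile(r"(?i)\.(jpe?g|gif|png|svg|ico|css|js|woff2?)$")


def is_static_asset(uri_path):
    """Return True when the path ends in one of the static file extensions."""
    return STATIC_FILE_PATTERN.search(uri_path) is not None


for path in ["/images/logo.PNG", "/static/app.js", "/fonts/a.woff2",
             "/api/login", "/report.json"]:
    print(path, is_static_asset(path))
```

Note that /report.json doesn't match, because the pattern requires the extension to end immediately after js.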

To run Bot Control only on the most sensitive parts of your application

Another option is to exclude almost everything by enabling Bot Control only on the most sensitive part of your application, for example, a login page.

Note: The actual URI path depends on the structure of your application.

  1. Inside the Scope-down statement, in the If a request menu, select matches the statement.
  2. In the Statement section:
    1. In the Inspect field, select URI path.
    2. For the Match type, select Contains string.
    3. In the String to match field, enter the string you want to match. For example, login as shown in Figure 8.
  3. Choose Save rule.
    Figure 8: A scope-down statement to match based on a string within a URI path

To run Bot Control on more than one part of your application

If you have more than one part to protect, you can use an OR logical statement to list each part in a scope-down statement.

  1. Inside the Scope-down statement, in the If a request menu, select matches at least one of the statements (OR).
  2. In the Statement 1 section, configure the following settings:
    1. Choose URI path in the Inspect field.
    2. For the Match type choose Contains string.
    3. In the String to match field enter a preferred value. For example, login.
  3. In the Statement 2 section, configure the following settings:
    1. Choose URI path in the Inspect field.
    2. For the Match type choose Starts with string.
    3. In the String to match field enter a preferred URI value. For example, payment/.
  4. Select Save rule.

Figure 9 builds on the previous example of an exact string match by adding an OR statement to protect an API named payment.

Figure 9: A scope-down statement with OR logic for more sophisticated matching

Note: The visual editor on the console supports up to five statements. To add more, edit the JSON representation of the rule on the console or use the APIs.
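
When you go beyond what the visual editor supports, it's often easier to generate the statement than to write it by hand. The following is a minimal sketch, assuming the AWS WAFV2 statement JSON shape; the helpers uri_match and any_of are hypothetical, and the strings login and payment/ come from the example above. With this scope-down statement, Bot Control evaluates only requests that match at least one of the listed parts.

```python
import json


def uri_match(search_string, positional_constraint):
    """Build a URI path string match statement."""
    return {
        "ByteMatchStatement": {
            "SearchString": search_string,
            "FieldToMatch": {"UriPath": {}},
            "TextTransformations": [{"Priority": 0, "Type": "NONE"}],
            "PositionalConstraint": positional_constraint,
        }
    }


def any_of(*statements):
    """Combine statements so that a request matching any one of them matches."""
    return {"OrStatement": {"Statements": list(statements)}}


scope_down = any_of(
    uri_match("login", "CONTAINS"),
    uri_match("payment/", "STARTS_WITH"),
    # Add further uri_match(...) entries here for additional parts.
)
print(json.dumps(scope_down, indent=2))
```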

Prioritize verified bots that you don’t want to block

Since verified bots aren’t blocked by default, in most cases there is no need to apply extra logic to allow them through. However, there are scenarios where other AWS WAF rules might match some aspects of requests from verified bots and block them. That can hurt some metrics for SEO, or prevent links from your website from properly propagating and displaying in social media resources. If this is important for your business, then you might want to ensure you protect verified bots by explicitly allowing them in AWS WAF.

To prioritize the verified bots category

  1. In the AWS WAF menu, select Web ACLs from the left menu. Open a web ACL that you have, or follow the steps to create a web ACL. The next steps assume you already have a Bot Control rule group enabled inside the web ACL.
  2. In the web ACL, select Add rules, and then select Add my own rules and rule groups.
  3. Using the provided Rule builder, configure the following settings:
    1. Enter a name for the rule in the Name field.
    2. Under the Inspect menu, select Has a label.
    3. In the Match key field, enter the following label to match based on the label that each verified bot has:

      awswaf:managed:aws:bot-control:bot:verified

    4. In the Action section, select Allow to confirm the action on a request match
  4. Select Add rule. It’s important to place the rule after the Bot Control rule group inside your web ACL, so that the bot:verified label is available in this custom rule. To complete this, configure the following steps:
    1. In the Set rule priority section, check that the rule you just created is listed immediately after the existing Bot Control rule set. If it’s not, choose the newly created rule and select Move up or Move down until the rule is located immediately after the existing Bot Control rule set.
    2. Select Save.
Figure 10: Label match rule statement in a Rule builder with a specific match key
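
Because rule order determines whether the label exists when your custom rule runs, it can be worth checking priorities programmatically. The following is a minimal sketch, assuming the AWS WAFV2 rule JSON shape, in which lower priority numbers are evaluated first; the function names and the sample rules are hypothetical.

```python
BOT_CONTROL_NAME = "AWSManagedRulesBotControlRuleSet"
VERIFIED_BOT_LABEL = "awswaf:managed:aws:bot-control:bot:verified"


def uses_label_match(statement):
    """Recursively look for a LabelMatchStatement anywhere in a statement."""
    if isinstance(statement, dict):
        if "LabelMatchStatement" in statement:
            return True
        return any(uses_label_match(value) for value in statement.values())
    if isinstance(statement, list):
        return any(uses_label_match(item) for item in statement)
    return False


def rules_evaluated_too_early(rules):
    """Names of label-matching rules that run before the Bot Control rule group."""
    bot_control_priority = None
    for rule in rules:
        managed = rule["Statement"].get("ManagedRuleGroupStatement")
        if managed and managed.get("Name") == BOT_CONTROL_NAME:
            bot_control_priority = rule["Priority"]
    if bot_control_priority is None:
        return []
    return [
        rule["Name"]
        for rule in rules
        if uses_label_match(rule["Statement"])
        and rule["Priority"] < bot_control_priority
    ]


sample_rules = [
    {"Name": "allow-verified-bots", "Priority": 0,
     "Statement": {"LabelMatchStatement": {"Scope": "LABEL",
                                           "Key": VERIFIED_BOT_LABEL}}},
    {"Name": "bot-control", "Priority": 1,
     "Statement": {"ManagedRuleGroupStatement": {"VendorName": "AWS",
                                                 "Name": BOT_CONTROL_NAME}}},
]
misplaced = rules_evaluated_too_early(sample_rules)
print(misplaced)
```

In this sample, the allow rule is reported because it has a lower priority number than the Bot Control rule group, which is the misordering that the Set rule priority step corrects.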

Allow a specific bot

Labels also enable you to single out a bot you don’t want to block from a category that is blocked. A common example is third-party bots that monitor your web resources.

Let’s take a look at a scenario where UptimeRobot is the specific bot you want to allow. The bot falls into a category that’s blocked by default (bot:category:monitoring). You can either exclude the whole category, which can have a wider impact on your resource than you want, or allow only UptimeRobot.

To explicitly allow a specific bot

  1. Analyze CloudWatch metrics or AWS WAF logs to find the bot that is being blocked and its associated labels. Unless you want to allow the whole category, the label you would be looking for is bot:name:. The example that follows is based on the label awswaf:managed:aws:bot-control:bot:name:uptimerobot.

    From the logs, you can also verify which category the bot belongs to, which is useful for configuring Scope-down statements.

  2. In the AWS WAF console, select Web ACLs from the left menu. Open a web ACL that you have, or follow the steps to create a web ACL. For the next steps, it’s assumed that you already have a Bot Control rule group enabled inside the web ACL.
  3. Open the Bot Control rule set in the list inside your web ACL and choose Edit.
  4. From the list of Rules find CategoryMonitoring and set to Count. This will prevent the default block action of the category.
  5. Select Scope-down statement – optional and select Scope-down statement. Then configure the following settings:
    1. Inside the Scope-down statement, in the If a request menu, choose matches all the statements (AND). This will allow you to construct the complex logic necessary to block the category but allow a specified bot.
    2. In the Statement 1 section under the Inspect menu select Has a label.
    3. In the Match key field, enter the label of the broad category that you set to count in step number 4. In this example, it is monitoring. This configuration will keep other bots from the category blocked:

      awswaf:managed:aws:bot-control:bot:category:monitoring

    4. In the Statement 2 section, select Negate statement results to allow you to exclude a specific bot.
    5. Under the Inspect menu, select Has a label.
    6. In the Match key field, enter the label that will uniquely identify the bot you want to explicitly allow. In this example, it’s uptimerobot with the following label:

      awswaf:managed:aws:bot-control:bot:name:uptimerobot

  6. Choose Save rule.
Figure 11: Label match rule statement with AND logic to single out a specific bot name from a category

Note: This approach is the best practice for analyzing and, if necessary, addressing false positives situations. You can apply exclusion to any bot, or multiple bots, based on the unique bot:name: label.
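
The AND logic above can be written as a statement tree: the request has the category label and does not have the specific bot's label. The following is a minimal sketch, assuming the AWS WAFV2 statement JSON shape; the helper functions are hypothetical, and the small matches evaluator is a toy that illustrates the logic, not how AWS WAF is implemented.

```python
MONITORING = "awswaf:managed:aws:bot-control:bot:category:monitoring"
UPTIMEROBOT = "awswaf:managed:aws:bot-control:bot:name:uptimerobot"


def has_label(key):
    return {"LabelMatchStatement": {"Scope": "LABEL", "Key": key}}


def negate(statement):
    return {"NotStatement": {"Statement": statement}}


def all_of(*statements):
    return {"AndStatement": {"Statements": list(statements)}}


# Matches monitoring bots, except the one bot that should be let through.
statement = all_of(has_label(MONITORING), negate(has_label(UPTIMEROBOT)))


def matches(stmt, labels):
    """Evaluate a label-only statement tree against a set of label names."""
    if "LabelMatchStatement" in stmt:
        return stmt["LabelMatchStatement"]["Key"] in labels
    if "NotStatement" in stmt:
        return not matches(stmt["NotStatement"]["Statement"], labels)
    if "AndStatement" in stmt:
        return all(matches(s, labels) for s in stmt["AndStatement"]["Statements"])
    raise ValueError("unsupported statement type")


print(matches(statement, {MONITORING}))               # another monitoring bot
print(matches(statement, {MONITORING, UPTIMEROBOT}))  # the bot to allow
```

To carve out several bots, add one negated has_label entry per bot:name: label to the AND statement.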

Insert custom headers into requests from certain bots

There are situations when you want to further process or analyze certain requests, or implement logic that is provided by downstream systems. In such cases, you can use AWS WAF Bot Control to categorize the requests. Applications later in the process can then apply the intended logic on either a broad group of requests, such as all bots within a category, or as narrow as a certain bot.

To insert a custom header

  1. In the AWS WAF console, select Web ACLs from the left menu. Open a web ACL that you have, or follow the steps to create a web ACL. The next steps assume that you already have a Bot Control rule group enabled inside the web ACL.
  2. Open the Bot Control rule set in the list inside your web ACL and choose Edit.
  3. From the list of Rules set the targeted category to Count.
  4. Choose Save rule.
  5. In the same web ACL, choose the Add rules menu and select Add my own rules and rule groups.
  6. Using the provided Rule builder, configure the following settings:
    1. Enter a name for the rule in the Name field.
    2. Under the Inspect menu, select Has a label.
    3. In the Match key field, enter the label to match either a targeted category or a bot. This example uses the security category label:
      awswaf:managed:aws:bot-control:bot:category:security
    4. In the Action section, select Count
    5. Open Custom request – optional and select Add new custom header
    6. Enter values in the Key and Value fields that correspond to the inserted custom header key-value pair that you want to use in downstream systems. The example in Figure 12 shows this configuration.
    7. Choose Add rule.

    AWS WAF prefixes your custom header names with x-amzn-waf- when it inserts them, so when you add abc-category, your downstream system sees it as x-amzn-waf-abc-category.

Figure 12: AWS WAF rule action with a custom header inserted by the service

The custom rule located after Bot Control now inserts the header into any request that it labeled as coming from bots within the security category. Then the security appliance that is after AWS WAF acts on the requests based on the header, and processes them accordingly.
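
In rule JSON, the header insertion is part of the Count action. The following is a minimal sketch, assuming the AWS WAFV2 rule JSON shape; the rule name, priority, header name abc-category, and header value are illustrative, and both helper functions are hypothetical.

```python
import json

SECURITY_LABEL = "awswaf:managed:aws:bot-control:bot:category:security"
HEADER_PREFIX = "x-amzn-waf-"  # prefix that AWS WAF adds to inserted header names


def build_header_rule(name, priority, label_key, header_name, header_value):
    """Count matching requests and insert a custom header for downstream systems."""
    return {
        "Name": name,
        "Priority": priority,
        "Statement": {
            "LabelMatchStatement": {"Scope": "LABEL", "Key": label_key}
        },
        "Action": {
            "Count": {
                "CustomRequestHandling": {
                    "InsertHeaders": [
                        {"Name": header_name, "Value": header_value}
                    ]
                }
            }
        },
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": name,
        },
    }


def downstream_header_name(header_name):
    """Header name as the downstream system sees it."""
    return HEADER_PREFIX + header_name


header_rule = build_header_rule(
    "tag-security-bots", 2, SECURITY_LABEL, "abc-category", "security"
)
print(json.dumps(header_rule, indent=2))
print(downstream_header_name("abc-category"))
```

Downstream code should look for the prefixed name, not the name you entered in the rule.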

This implementation can serve other scenarios. For example, you can use your custom headers to tell your Origin to append headers that explicitly prevent caching of certain content, so that bots always get it from the Origin. Inserted headers are also accessible within AWS Lambda@Edge functions and CloudFront Functions, which opens up advanced processing scenarios.

Conclusion

This post describes the primary building blocks for using Bot Control, and how you can combine and customize them to address different scenarios. It’s not an exhaustive list of the use cases that Bot Control can be fine-tuned for, but hopefully the examples provided here inspire and provide you with ideas for other implementations.

If you already have AWS WAF associated with any of your web-facing resources, you can view current bot traffic estimates for your applications based on a sample of requests currently processed by the service. Visit the AWS WAF console to view the bot overview dashboard. That’s a good starting point to consider implementing learnings from this blog to improve your bot protection.

It is early days for the feature, and it will keep gaining more capabilities. Stay tuned!

 
If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this blog post, start a new thread on AWS WAF re:Post or contact AWS Support.

Dmitriy Novikov

In his role as Senior Solutions Architect at Amazon Web Services, Dmitriy supports AWS customers to utilize emerging technologies for business value generation. He’s a technology enthusiast who gets a charge out of finding innovative solutions to complex security challenges. He enjoys sharing his learnings on architecture and best practices in blogs, whitepapers and public speaking events. Outside work, Dmitriy has a passion for reading and triathlon.

How to build a multi-Region AWS Security Hub analytic pipeline and visualize Security Hub data

Post Syndicated from David Hessler original https://aws.amazon.com/blogs/security/how-to-build-a-multi-region-aws-security-hub-analytic-pipeline/

AWS Security Hub is a service that gives you aggregated visibility into your security and compliance posture across multiple Amazon Web Services (AWS) accounts. By joining Security Hub with Amazon QuickSight, a scalable, serverless, embeddable, machine learning-powered business intelligence (BI) service built for the cloud, your senior leaders and decision-makers can use dashboards to empower data-driven decisions and facilitate a secure configuration of AWS resources.

In organizations that operate at cloud scale, being able to summarize and perform trend analysis is key to identifying and remediating problems early, which leads to the overall success of the organization. Additionally, QuickSight dashboards can be embedded in dashboard and reporting platforms that leaders are already familiar with, making the dashboards even more user friendly.

With the solution in this blog post, you can provide leaders with cross-AWS Region views of data to enable decision-makers to assess the health and status of an organization’s IT infrastructure at a glance. You also can enrich the dashboard with data sources not available to Security Hub. Finally, this solution allows you the flexibility to have multiple administrator accounts across several AWS organizations and combine them into a single view.

In this blog post, you will learn how to build an analytics pipeline of your Security Hub findings, summarize the data with Amazon Athena, and visualize the data via QuickSight using the following steps:

  • Deploy an AWS Cloud Development Kit (AWS CDK) stack that builds the infrastructure you need to get started.
  • Create an Athena view that summarizes the raw findings.
  • Visualize the summary of findings in QuickSight.
  • Secure QuickSight using best practices.

For a high-level discussion without code examples, see Visualize AWS Security Hub Findings using Analytics and Business Intelligence Tools.

Prerequisites

This blog post assumes that you:

  • Have a basic understanding of how to authenticate and access your AWS account.
  • Are able to run commands via a command line prompt on your local machine.
  • Have a basic understanding of Structured Query Language (SQL).

Solution overview

Figure 1 shows the flow of events and a high-level architecture diagram of the solution.

Figure 1. High level architecture diagram

The steps shown in Figure 1 include:

  • Detect
  • Collect
  • Aggregate
  • Transform
  • Analyze
  • Visualize

Detect

AWS offers a number of tools to help detect security findings continuously. These tools fall into three types:

In this blog, you will use two built-in security standards of Security Hub (CIS AWS Foundations Benchmark controls and AWS Foundational Security Best Practices Standard) and a serverless Prowler scanner that acts as a third-party partner product. In cases where AWS Organizations is used, member accounts send these findings to the member account’s Security Hub.

Collect

Within a Region, security findings are centralized into a single administrator account using Security Hub.

Aggregate

Using the cross-Region aggregation feature within Security Hub, findings within each administrator account can be aggregated and continuously synchronized across multiple Regions.

Ingest

Security Hub not only provides a comprehensive view of security alerts and security posture across your AWS accounts, it also acts as a data sink for your security tools. Any tool that can expose data via AWS Security Finding Format (ASFF) can use the BatchImportFindings API action to push data to Security Hub. For more details, see Using custom product integration to send findings to AWS Security Hub and Available AWS service integrations in the Security Hub User Guide.

Transform

Data coming out of Security Hub is exposed via Amazon EventBridge. Unfortunately, it’s not quite in a form that Athena can consume. EventBridge streams data through Amazon Kinesis Data Firehose directly to Amazon Simple Storage Service (Amazon S3). From Amazon S3, you can create an AWS Lambda function that flattens and fixes some of the column names, such as by removing special characters that Athena cannot recognize. The Lambda function then saves the results back to S3. Finally, an AWS Glue crawler dynamically discovers the schema of the data and creates or updates an Athena table.
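
The flattening step can be pictured with a short sketch. This is not the code from the solution's repository; it assumes a nested finding dictionary and a simple naming rule (lowercase, with any character outside a-z, 0-9, and underscore replaced by an underscore). The function names and the sample finding are hypothetical.

```python
import re


def sanitize(name):
    """Lowercase a key and replace characters outside [a-z0-9_] with underscores."""
    return re.sub(r"[^a-z0-9_]", "_", name.lower())


def flatten(obj, parent=""):
    """Flatten nested dictionaries into a single level of column names."""
    flat = {}
    for key, value in obj.items():
        column = f"{parent}_{sanitize(key)}" if parent else sanitize(key)
        if isinstance(value, dict):
            flat.update(flatten(value, column))
        else:
            flat[column] = value  # lists and scalars are kept as they are
    return flat


# Hypothetical, heavily trimmed finding.
finding = {
    "Title": "Example control failed",
    "UpdatedAt": "2021-06-01T12:34:56Z",
    "Severity": {"Label": "HIGH"},
    "Compliance": {"Status": "FAILED"},
}
print(flatten(finding))
```

With this naming rule, nested fields such as Severity.Label become columns such as severity_label, which is the style of column name that the Athena view later in this post relies on.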

Analyze

You will aggregate the raw findings data and create metrics along various grains or pivots by creating a simple yet meaningful Athena view. With Athena, you also can use views to join the data with other data sources, such as your organization’s configuration management database (CMDB) or IT service management (ITSM) system.

Visualize

Using QuickSight, you will register the data sources and build visualizations that can be used to identify areas where security can be improved or risk can be reduced. This post shares steps detailing how to do this in the Build QuickSight visualizations section below.

Use AWS CDK to deploy the infrastructure

In order to analyze and visualize security related findings, you will need to deploy the infrastructure required to detect, ingest, and transform those findings. You will use an AWS CDK stack to deploy the infrastructure to your account. To begin, review the prerequisites to make sure you have everything you need to deploy the CDK stack. Once the CDK stack is deployed, you can deploy the actual infrastructure. After the infrastructure has been deployed, you will build an Athena view and a QuickSight visualization.

Install the software to deploy the solution

For the solution in this blog post, you must have the following tools installed:

  • The solution in this blog post is written in Python, so you must install Python in addition to CDK. Instructions on how to install Python version 3.X can be found on the Python downloads page.
  • AWS CDK requires node.js. Directions on how to install node.js can be found on the node.js downloads page.
  • This CDK application uses Docker for local bundling. Directions for using Docker can be found at Get Docker.
  • AWS CDK, a software-development framework for defining cloud infrastructure in code and provisioning it through AWS CloudFormation. To install CDK, visit the AWS CDK Toolkit page.

To confirm you have everything you need

  1. Confirm you are running version 1.108.0 or later of CDK.

    $ cdk --version

  2. Download the code from GitHub by cloning the repository, and then cd into the clone directory.

    $ git clone git@github.com:aws-samples/aws-security-hub-analytic-pipeline.git

    $ cd aws-security-hub-analytic-pipeline

  3. Manually create a virtualenv.

    $ python3 -m venv .venv

  4. After the initialization process completes and the virtualenv is created, you can use the following step to activate your virtualenv.

    $ source .venv/bin/activate

  5. If you’re using a Windows platform, use the following command to activate virtualenv:

    % .venv\Scripts\activate.bat

  6. Once the virtualenv is activated, you can install the required dependencies.

    $ pip install -r requirements.txt

Use AWS CDK to deploy the infrastructure into your account

The following steps use AWS CDK to deploy the infrastructure. This infrastructure includes the various scanners, Security Hub, EventBridge, and Kinesis Firehose streams. When complete, the raw Security Hub data will already be stored in an S3 bucket.

To deploy the infrastructure using AWS CDK

  1. If you’ve never used AWS CDK in the account you’re using or if you’ve never used CDK in the us-east-1, us-east-2, or us-west-1 Regions, you must bootstrap the Regions via the command prompt.

    $ cdk bootstrap

  2. At this point, you can deploy the stack to your default AWS account via the command prompt.

    $ cdk deploy --all

  3. While cdk deploy is running, you will see the output in Figure 2. This is a prompt to ensure you’re aware that you’re making a security-relevant change and creating AWS Identity and Access Management (IAM) roles. Enter y when prompted to continue the deployment process:

    Figure 2. CDK approval prompt to create IAM roles

  4. Confirm cdk deploy is finished. When the deployment is finished, you should see three stack ARNs. It will look similar to Figure 3.

    Figure 3. Final output of CDK deploy

As a result of the deployed CDK code, Security Hub and the Prowler scanner will automatically scan your account, process the data, and send it to S3. While it takes less than an hour for some data to be processed and searchable in Athena, we recommend waiting 24 hours before proceeding to the next steps, to ensure enough data is processed to generate useful visualizations. This is because the remaining steps roll-up findings by the hour. Also, it takes several minutes to get initial results from the Security Hub standards and up to an hour to get initial results from Prowler.

Build an Athena view

Now that you’ve deployed the infrastructure to detect, ingest, and transform security related findings, it’s time to use an Athena view to accomplish the analyze portion of the solution. The following view aggregates the number of findings for each hour. Athena views can be used to summarize data or enrich it with data from other sources. Use the following steps to build a simple example view. For more information on creating Athena views, see Working with Views.

To build an Athena view

  1. Open the AWS Management Console and ensure that the Region is set to us-east-1 (Northern Virginia).
  2. Navigate to the Athena service. If you’ve never used this service, choose Get Started to navigate to the Query Editor screen. Otherwise, the Query Editor screen is the default view.
  3. If you’re new to Athena, you also need to set up a query result location.
    1. Choose Settings in the top right of the Query Editor screen to open the settings panel.
    2. Choose Select to select a query result location.

      Figure 4. Athena settings

    3. Locate an S3 bucket in the list that starts with analyticsink-queryresults and choose the right-arrow icon.
    4. Choose Select to select a query results bucket.

      Figure 5. Select S3 location confirmation

  4. Select AwsDataCatalog as the Data source and security_hub_database as the Database. The Query Editor screen should look like Figure 6.

    Figure 6. Empty query editor

  5. Copy and paste the following SQL in the query window:

    CREATE OR REPLACE VIEW "security-hub-rolled-up-finding" AS
    SELECT
    "date_format"("from_iso8601_timestamp"(updatedat), '%Y-%m-%d %H:00') year_month_day
    , region
    , compliance_status
    , workflowstate
    , severity_label
    , COUNT(DISTINCT title) as cnt
    FROM
    security_hub_database."security-hub-crawled-findings"
    GROUP BY "date_format"("from_iso8601_timestamp"(updatedat), '%Y-%m-%d %H:00'), compliance_status, workflowstate, severity_label, region

  6. Choose the Run query button.

If everything is correct, you should see Query successful in the Results, as shown in Figure 7.

Figure 7. Creating an Athena view
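
To make the aggregation concrete, here is a minimal Python sketch of what the view computes, run on a few made-up findings. The sample rows and the function name rollup_findings are hypothetical and exist only for illustration; the column names mirror those used in the view's SQL.

```python
from collections import defaultdict
from datetime import datetime


def rollup_findings(findings):
    """Count distinct finding titles per hour, Region, status, and severity."""
    titles_per_group = defaultdict(set)
    for finding in findings:
        # Truncate the ISO 8601 timestamp to the hour, mirroring
        # date_format(from_iso8601_timestamp(updatedat), '%Y-%m-%d %H:00').
        updated = datetime.fromisoformat(finding["updatedat"].replace("Z", "+00:00"))
        hour_bucket = updated.strftime("%Y-%m-%d %H:00")
        key = (
            hour_bucket,
            finding["region"],
            finding["compliance_status"],
            finding["workflowstate"],
            finding["severity_label"],
        )
        titles_per_group[key].add(finding["title"])
    # COUNT(DISTINCT title) is the size of each group's set of titles.
    return {key: len(titles) for key, titles in titles_per_group.items()}


def make_finding(updatedat, title):
    """Build a hypothetical sample row; only the timestamp and title vary."""
    return {
        "updatedat": updatedat,
        "region": "us-east-1",
        "compliance_status": "FAILED",
        "workflowstate": "NEW",
        "severity_label": "HIGH",
        "title": title,
    }


sample_findings = [
    make_finding("2021-05-01T10:15:30Z", "Finding A"),
    make_finding("2021-05-01T10:45:00Z", "Finding A"),  # same title, same hour
    make_finding("2021-05-01T10:50:00Z", "Finding B"),
    make_finding("2021-05-01T11:05:00Z", "Finding A"),  # next hour
]

rollup = rollup_findings(sample_findings)
for key, cnt in sorted(rollup.items()):
    print(key, cnt)
```

Because the count is over distinct titles, the two updates to Finding A within the same hour contribute only once to that hour's total.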

Build QuickSight visualizations

Now that you’ve deployed the infrastructure to detect, ingest, and transform security-related findings, and have created an Athena view to analyze those findings, it’s time to use QuickSight to visualize the findings. First, you grant QuickSight permissions to access S3 and Athena. Next, you create a QuickSight data source. Then you create a QuickSight analysis. Optionally, when the analysis is complete, you can publish it as a dashboard.

You will build a simple visualization that shows counts of findings over time separated by severity, though it’s also possible to use QuickSight to tell rich and compelling visual stories.

In order to use QuickSight, you need to sign up for a QuickSight subscription. Steps to do so can be found in Signing Up for an Amazon QuickSight Subscription.

The first thing you need to do once logged in to QuickSight is create the data source. If this is your first time logging in to the service, you will be greeted with an initial QuickSight page as shown in Figure 8.

Figure 8. Initial QuickSight page

Grant QuickSight access to S3 and Athena

While creating the Athena data source will enable QuickSight to query data from Athena, you also need to enable QuickSight to read from S3.

To grant QuickSight access to S3 and Athena

  1. Inside QuickSight, select your profile name (upper right). Choose Manage QuickSight, and then choose Security & permissions.
  2. Choose Add or remove.
  3. Ensure the checkbox next to Athena is selected.
  4. Ensure the checkbox next to Amazon S3 is selected.
  5. Choose Details and then choose Select S3 Buckets.
  6. Locate an S3 bucket in the list that starts with analyticsink-bucket and ensure the checkbox is selected.
    Figure 9. Example permissions

  7. Choose Finish to save changes.

Create a QuickSight dataset

Once you’ve given QuickSight the necessary permissions, you can create a new dataset.

To create a QuickSight dataset

  1. Choose Datasets from the navigation pane at left. Then choose New Dataset.

    Figure 10. Dataset page

  2. To create a new Athena connection profile, use the following steps:
    1. In the FROM NEW DATA SOURCES section, choose the Athena data source card.
    2. For Data source name, enter a descriptive name. For example: security-hub-rolled-up-finding.
    3. For Athena workgroup choose [ primary ].
    4. Choose Validate connection to test the connection. This also confirms encryption at rest.
    5. Choose Create data source.
  3. On the Choose your table screen, select:
    Catalog: AwsDataCatalog
    Database: security_hub_database
    Table: security-hub-rolled-up-finding
  4. Finally, select the Import to SPICE for quicker analytics option and choose Visualize.

Once you’re finished, the page to create your first analysis will automatically open. Figure 11 shows an example of the page.

Figure 11. Create an analysis page

Create a QuickSight analysis

A QuickSight analysis is more than just a visualization—it helps you uncover hidden insights and trends in your data, identify key drivers, and forecast business metrics. You can create rich analytic experiences with QuickSight. For more information, visit Working with Visuals in the QuickSight User Guide.

For simplicity, you’ll build a visualization that summarizes finding counts by severity, aggregated by hour.

To create a QuickSight analysis

  1. Choose Line Chart from the Visual Types.

    Figure 12. Visual types

  2. Select Fields. Figure 13 shows what your field wells should look like at the end of this step.
    1. Locate the year_month_day field (the hourly timestamp defined in the Athena view) in the field list and drag it over to the X axis field well.
    2. Locate the cnt field in the field list and drag it over to the Value field well.
    3. Locate the severity_label field in the field list and drag it over to the Color field well.

      Figure 13. Field wells

  3. Add Filters.
    1. Select Filter in the left navigation panel.

      Figure 14. Filters panel

    2. Choose Create one… and select the compliance_status field.
    3. Expand the filter and clear NOT_AVAILABLE and PASSED (Note: depending on your data, you might not have all of these statuses).
    4. Choose Apply to apply the filter.

      Figure 15. Filtering out findings that are not failing

You should now see a visualization that looks like Figure 16, which shows a summary count of events and their severity.

Figure 16. Example visualization (note: this visualization has five days’ worth of data.)

Publish a QuickSight analysis dashboard (optional)

Publishing a dashboard is a great way to share reports with leaders. This two-step process allows you to share visualizations as a dashboard.

To publish a QuickSight analysis

  1. Choose Share on the application bar, then choose Publish dashboard.
  2. Select Publish new dashboard as, and then enter a dashboard name, such as Security Hub Findings by Severity.

You can also embed dashboards into web applications. This requires using the AWS SDK or the AWS Command Line Interface (AWS CLI). For more information, see Embedding QuickSight Data Dashboards for Everyone.

Encouraged security posture in QuickSight

QuickSight has a number of security features. For details on the features and standards that apply to this scenario, see AWS security in Amazon QuickSight in the QuickSight User Guide.

Clean up (optional)

When done, you can clean up by removing the QuickSight dataset and analysis, the Athena view, and the CDK stack. Follow the detailed steps below to clean up everything.

To clean up QuickSight

  1. Open the console and choose Datasets in the left navigation pane.
  2. Select security-hub-rolled-up-finding then choose Delete dataset.
  3. Confirm dataset deletion by choosing Delete.
  4. Choose Analyses from the left navigation pane.
  5. Choose the menu in the lower right corner of the security-hub-rolled-up-finding card.

    Figure 17. Example analysis card

  6. Select Delete and confirm Delete.

To remove the Athena view

  1. Paste the following SQL in the query window:

    DROP VIEW "security-hub-rolled-up-finding"

  2. Choose the Run query button.

To remove the CDK stack

  1. Run the following command in your terminal:

    cdk destroy

    Note: If you experience errors, you might need to reactivate your Python virtual environment by completing steps 3–5 of Use AWS CDK to deploy the infrastructure.

Conclusion

In this blog post, you used Security Hub and QuickSight to deploy a scalable analytic pipeline for your security tools. Security Hub allowed you to join and collect security findings from multiple sources. With QuickSight, you summarized data for your senior leaders and decision-makers to give them the right data in real time.

You ensured that your sensitive data remained protected by explicitly granting QuickSight the ability to read from a specific S3 bucket. By authorizing access only to the data sources needed to visualize your data, you ensure least privilege access. QuickSight supports many other AWS data sources, including Amazon RDS, Amazon Redshift, Lake Formation, and Amazon OpenSearch Service (successor to Amazon Elasticsearch Service). Because the data doesn’t live inside an Amazon Virtual Private Cloud (Amazon VPC), you didn’t need to grant access to any specific VPCs. Limiting access to VPCs is another great way to improve the security of your environment.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the Security Hub forum. To start your 30-day free trial of Security Hub, visit AWS Security Hub.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

David Hessler

David is a senior cloud consultant with AWS Professional Services. He has over a decade of technical experience helping customers tackle their most challenging technical problems and providing tailor-made solutions using AWS services. He is passionate about DevOps, security automation, and how the two work together to allow customers to focus on what matters: their mission.

How to automate AWS account creation with SSO user assignment

Post Syndicated from Rafael Koike original https://aws.amazon.com/blogs/security/how-to-automate-aws-account-creation-with-sso-user-assignment/

Background

AWS Control Tower offers a straightforward way to set up and govern an Amazon Web Services (AWS) multi-account environment, following prescriptive best practices. AWS Control Tower orchestrates the capabilities of several other AWS services, including AWS Organizations, AWS Service Catalog, and AWS Single Sign-On (AWS SSO), to build a landing zone very quickly. AWS SSO is a cloud-based service that simplifies how you manage SSO access to AWS accounts and business applications using Security Assertion Markup Language (SAML) 2.0. You can use AWS Control Tower to create and provision new AWS accounts and use AWS SSO to assign user access to those newly-created accounts.

Some customers need to provision tens, if not hundreds, of new AWS accounts at one time and assign access to many users. If you are using AWS Control Tower, doing this requires that you provision an AWS account in AWS Control Tower, and then assign the user access to the AWS account in AWS SSO before moving to the next AWS account. This process adds complexity and time for administrators who manage the AWS environment while delaying users’ access to their AWS accounts.

In this blog post, we’ll show you how to automate creating multiple AWS accounts in AWS Control Tower, and how to automate assigning user access to the AWS accounts in AWS SSO, with the ability to repeat the process easily for subsequent batches of accounts. This solution simplifies the provisioning and assignment processes, while enabling automation for your AWS environment, and allows your builders to start using and experimenting on AWS more quickly.

Services used

This solution uses the following AWS services:

High level solution overview

Figure 1 shows the architecture and workflow of the batch AWS account creation and SSO assignment processes.

Figure 1: Batch AWS account creation and SSO assignment automation architecture and workflow

Before starting

This solution is configured to be deployed in the US East (N. Virginia) Region (us-east-1), but you can change the CloudFormation template to run in any Region that supports all the services required by the solution.

AWS Control Tower Account Factory can take up to 25 minutes to create and provision a new account. During this time, you will be unable to use AWS Control Tower to perform actions such as creating an organizational unit (OU) or enabling a guardrail on an OU. We recommend running this solution during a period when you do not anticipate using AWS Control Tower’s features.

Collect needed information

Note: You must have already configured AWS Control Tower, AWS Organizations, and AWS SSO to use this solution.

Before deploying the solution, you need to first collect some information for AWS CloudFormation.

The required information you’ll need to gather in these steps is:

  • AWS SSO instance ARN
  • AWS SSO Identity Store ID
  • Admin email address
  • Amazon S3 bucket
  • AWS SSO user group ARN
  • AWS SSO permission set ARN

Prerequisite information: AWS SSO instance ARN

From the web console

You can find this information under Settings in the AWS SSO web console as shown in Figure 2.

Figure 2: AWS SSO instance ARN

From the CLI

You can also get this information by running the following CLI command using AWS Command Line Interface (AWS CLI):

aws sso-admin list-instances

The output is similar to the following:

{
    "Instances": [
        {
        "InstanceArn": "arn:aws:sso:::instance/ssoins-abc1234567",
        "IdentityStoreId": "d-123456abcd"
        }
    ]
}

Make a note of the InstanceArn value from the output; you will use it as the AWS SSO instance ARN.

Prerequisite information: AWS SSO Identity Store ID

This is available from either the web console or the CLI.

From the web console

You can find this information in the same screen as the AWS SSO Instance ARN, as shown in Figure 3.

Figure 3: AWS SSO identity store ID

From the CLI

To find this with the AWS CLI, run the command aws sso-admin list-instances and use the IdentityStoreId value from the output.
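
If you want to capture both values in a script rather than copying them by hand, the following minimal Python sketch parses the JSON printed by aws sso-admin list-instances. The function name extract_sso_identifiers is hypothetical, and the sample text is the example output shown earlier, not real identifiers.

```python
import json


def extract_sso_identifiers(list_instances_output):
    """Return (InstanceArn, IdentityStoreId) from list-instances JSON text."""
    instances = json.loads(list_instances_output)["Instances"]
    if not instances:
        raise ValueError("No AWS SSO instance found in the command output")
    first = instances[0]
    return first["InstanceArn"], first["IdentityStoreId"]


# Example output copied from the CLI section above (placeholder values).
sample_output = """
{
    "Instances": [
        {
            "InstanceArn": "arn:aws:sso:::instance/ssoins-abc1234567",
            "IdentityStoreId": "d-123456abcd"
        }
    ]
}
"""

instance_arn, identity_store_id = extract_sso_identifiers(sample_output)
print(instance_arn)
print(identity_store_id)
```

In practice you could save the CLI output to a file and pass its contents to the function, or read the same two keys from the response of the list_instances call on a boto3 sso-admin client.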

Prerequisite information: Admin email address

The email address that receives a notification when a new AWS account is created.

Prerequisite information: S3 bucket

The name of the Amazon S3 bucket where the AWS account list CSV files will be uploaded to automate AWS account creation.

This globally unique bucket name will be used to create a new Amazon S3 Bucket, and the automation script will receive events from new objects uploaded to this bucket.

Prerequisite information: AWS SSO user group ARN

Go to AWS SSO > Groups and select the user group whose permission set you would like to assign to the new AWS account. Copy the Group ID from the selected user group. This can be a local AWS SSO user group, or a third-party identity provider-synced user group.

Note: For the AWS SSO user group, there is no AWS CLI equivalent; you need to use the AWS web console to collect this information.

Figure 4: AWS SSO user group ARN

Prerequisite information: AWS SSO permission set

The ARN of the AWS SSO permission set to be assigned to the user group.

From the web console

To view existing permission sets using the AWS SSO web console, go to AWS accounts > Permission sets. From there, you can see a list of permission sets and their respective ARNs.

Figure 5: AWS SSO permission sets list

You can also select the permission set name and from the detailed permission set window, copy the ARN of the chosen permission set. Alternatively, create your own unique permission set to be assigned to the intended user group.

Figure 6: AWS SSO permission set ARN

From the CLI

To get permission set information from the CLI, run the following AWS CLI command:

aws sso-admin list-permission-sets --instance-arn <SSO Instance ARN>

This command will return an output similar to this:

{
    "PermissionSets": [
    "arn:aws:sso:::permissionSet/ssoins-abc1234567/ps-1234567890abcdef",
    "arn:aws:sso:::permissionSet/ssoins-abc1234567/ps-abcdef1234567890"
    ]
}

If you can’t determine the details for your permission set from the output of the CLI shown above, you can get the details of each permission set by running the following AWS CLI command:

aws sso-admin describe-permission-set --instance-arn <SSO Instance ARN> --permission-set-arn <PermissionSet ARN>

The output will be similar to this:

{
    "PermissionSet": {
    "Name": "AWSPowerUserAccess",
    "PermissionSetArn": "arn:aws:sso:::permissionSet/ssoins-abc1234567/ps-abc123def4567890",
    "Description": "Provides full access to AWS services and resources, but does not allow management of Users and groups",
    "CreatedDate": "2020-08-28T11:20:34.242000-04:00",
    "SessionDuration": "PT1H"
    }
}

The output above shows the name and description of the specified permission set, which can help you identify which permission set ARN you will use.
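
When an instance has many permission sets, it can help to index the describe-permission-set results by name. The sketch below assumes you have already collected one parsed output per ARN; the function name index_permission_sets and the second sample entry (AWSReadOnlyAccess and its ARN) are made up for illustration.

```python
def index_permission_sets(describe_outputs):
    """Map each permission set name to its ARN."""
    index = {}
    for output in describe_outputs:
        permission_set = output["PermissionSet"]
        index[permission_set["Name"]] = permission_set["PermissionSetArn"]
    return index


# Parsed describe-permission-set outputs; values are placeholders.
describe_outputs = [
    {
        "PermissionSet": {
            "Name": "AWSPowerUserAccess",
            "PermissionSetArn": "arn:aws:sso:::permissionSet/ssoins-abc1234567/ps-abc123def4567890",
            "SessionDuration": "PT1H",
        }
    },
    {
        "PermissionSet": {
            "Name": "AWSReadOnlyAccess",  # hypothetical second permission set
            "PermissionSetArn": "arn:aws:sso:::permissionSet/ssoins-abc1234567/ps-1234567890abcdef",
            "SessionDuration": "PT1H",
        }
    },
]

arn_by_name = index_permission_sets(describe_outputs)
print(arn_by_name["AWSPowerUserAccess"])
```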

Solution initiation

The solution steps are in two parts: the initiation, and the batch account creation and SSO assignment processes.

To initiate the solution

  1. Log in to the management account as the AWS Control Tower administrator, and deploy the provided AWS CloudFormation stack with the required parameters filled out.

    Note: To fill out the required parameters of the solution, refer to steps 1 to 6 of the To launch the AWS CloudFormation stack procedure below.

  2. When the stack is successfully deployed, it performs the following actions to set up the batch process. It creates:
    • The S3 bucket where you will upload the AWS account list CSV file.
    • A DynamoDB table. This table tracks the AWS account creation status.
    • A Lambda function, NewAccountHandler.
    • A Lambda function, CreateManagedAccount. This function is triggered by the entries in the Amazon DynamoDB table and initiates the batch account creation process.
    • An Amazon CloudWatch Events rule to detect the AWS Control Tower CreateManagedAccount lifecycle event.
    • Another Lambda function, CreateAccountAssignment. This function is triggered by AWS Control Tower lifecycle events via Amazon CloudWatch Events to assign the AWS SSO permission set to the specified user group and AWS account.

To create the AWS Account list CSV file

After you deploy the solution stack, you need to create a CSV file based on this sample.csv and upload it to the Amazon S3 bucket created in this solution. This CSV file will be used to automate the new account creation process.

CSV file format

The CSV file must follow the following format:

AccountName,SSOUserEmail,AccountEmail,SSOUserFirstName,SSOUserLastName,OrgUnit,Status,AccountId,ErrorMsg
Test-account-1,[email protected],[email protected],Fname-1,Lname-1,Test-OU-1,,,
Test-account-2,[email protected],[email protected],Fname-2,Lname-2,Test-OU-2,,,
Test-account-3,[email protected],[email protected],Fname-3,Lname-3,Test-OU-1,,,

The first line contains the column names. Each subsequent line describes one new AWS account that you want to create; the SSO user group is then automatically assigned to that account with the permission set.

CSV fields

AccountName: String between 1 and 50 characters [a-zA-Z0-9_-]
SSOUserEmail: String with more than seven characters; must be a valid email address for the primary AWS Administrator of the new AWS account
AccountEmail: String with more than seven characters; must be a valid email address not used by other AWS accounts
SSOUserFirstName: String with the first name of the primary AWS Administrator of the new AWS account
SSOUserLastName: String with the last name of the primary AWS Administrator of the new AWS account
OrgUnit: String and must be an existing AWS Organizations OrgUnit
Status: String, for future use
AccountId: String, for future use
ErrorMsg: String, for future use
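
Checking the file locally before you upload it can save a failed run. The following is a minimal sketch of such a check based only on the field rules listed above; it is not the solution's own validation code, and the function names, the loose email pattern, and the example.com addresses are assumptions. It cannot verify that an OU exists or that an email address is unused by other AWS accounts.

```python
import csv
import io
import re

REQUIRED_COLUMNS = [
    "AccountName", "SSOUserEmail", "AccountEmail", "SSOUserFirstName",
    "SSOUserLastName", "OrgUnit", "Status", "AccountId", "ErrorMsg",
]
ACCOUNT_NAME_RE = re.compile(r"[a-zA-Z0-9_-]{1,50}")
# Deliberately loose email check: text, an @ sign, more text, no whitespace.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+")


def validate_row(row):
    """Return a list of problems found in one CSV row (empty if valid)."""
    problems = []
    if not ACCOUNT_NAME_RE.fullmatch(row.get("AccountName") or ""):
        problems.append("AccountName must be 1-50 characters from [a-zA-Z0-9_-]")
    for field in ("SSOUserEmail", "AccountEmail"):
        value = row.get(field) or ""
        if len(value) <= 7 or not EMAIL_RE.fullmatch(value):
            problems.append(field + " must be a valid email address longer than seven characters")
    for field in ("SSOUserFirstName", "SSOUserLastName", "OrgUnit"):
        if not (row.get(field) or "").strip():
            problems.append(field + " must not be empty")
    return problems


def validate_csv(text):
    """Validate the header and every row; return {line_number: [problems]}."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != REQUIRED_COLUMNS:
        return {1: ["header does not match the required columns"]}
    errors = {}
    for line_number, row in enumerate(reader, start=2):
        problems = validate_row(row)
        if problems:
            errors[line_number] = problems
    return errors


sample_csv = (
    "AccountName,SSOUserEmail,AccountEmail,SSOUserFirstName,SSOUserLastName,OrgUnit,Status,AccountId,ErrorMsg\n"
    "Test-account-1,admin1@example.com,acct1@example.com,Fname-1,Lname-1,Test-OU-1,,,\n"
    "Bad account!,a@b.c,acct2@example.com,Fname-2,Lname-2,,,,\n"
)

errors = validate_csv(sample_csv)
for line_number, problems in errors.items():
    print(line_number, problems)
```

In this sample, the first data row passes, and the second is reported for its account name, its too-short SSO user email, and its empty OrgUnit.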

Figure 7 shows the details that are included in our example for the two new AWS accounts that will be created.

Figure 7: Sample AWS account list CSV

  1. The NewAccountHandler function is triggered from an object upload into the Amazon S3 bucket, validates the input file entries, and uploads the validated input file entries to the Amazon DynamoDB table.
  2. The CreateManagedAccount function queries the DynamoDB table to get the details of the next account to be created. If there is another account to be created, the batch account creation process moves on to the next step; otherwise, it completes.
  3. The CreateManagedAccount function launches the AWS Control Tower Account Factory product in AWS Service Catalog to create and provision a new account.
  4. After Account Factory has completed the account creation workflow, it generates the CreateManagedAccount lifecycle event, and the event log states if the workflow SUCCEEDED or FAILED.
  5. The CloudWatch Events rule detects the CreateManagedAccount AWS Control Tower lifecycle event, triggers the CreateManagedAccount and CreateAccountAssignment functions, and sends an email notification to the administrator through Amazon SNS.
  6. The CreateManagedAccount function updates the Amazon DynamoDB table with the results of the AWS account creation workflow. If the account was successfully created, it updates the input file entry in the Amazon DynamoDB table with the account ID; otherwise, it updates the entry in the table with the appropriate failure or error reason.
  7. The CreateAccountAssignment function assigns the AWS SSO Permission Set with the appropriate AWS IAM policies to the User Group specified in the Parameters when launching the AWS CloudFormation stack.
  8. When the Amazon DynamoDB table is updated, the Amazon DynamoDB stream triggers the CreateManagedAccount function for subsequent AWS accounts. When new AWS account list CSV files are uploaded, the preceding steps are repeated.
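
The steps above amount to processing accounts one at a time and recording each outcome before moving on. The following Python sketch simulates that control flow in memory. It is only an illustration: the real solution is event-driven (DynamoDB streams and lifecycle events) rather than a loop, and the function names, status strings, and fake_create_account stand-in are all hypothetical.

```python
def next_pending_account(items):
    """Return the first item that has not been processed yet, or None."""
    for item in items:
        if not item.get("Status"):
            return item
    return None


def record_result(item, succeeded, account_id="", error_msg=""):
    """Store the outcome of one account creation on its table item."""
    item["Status"] = "SUCCEEDED" if succeeded else "FAILED"
    item["AccountId"] = account_id if succeeded else ""
    item["ErrorMsg"] = "" if succeeded else error_msg


def run_batch(items, create_account):
    """Create accounts one at a time until no pending items remain."""
    item = next_pending_account(items)
    while item is not None:
        succeeded, account_id, error_msg = create_account(item)
        record_result(item, succeeded, account_id, error_msg)
        item = next_pending_account(items)
    return items


def fake_create_account(item):
    """Stand-in for Account Factory; fails for one hypothetical account."""
    if item["AccountName"] == "Test-account-2":
        return False, "", "Email address already in use"
    return True, "111111111111", ""


table = [
    {"AccountName": "Test-account-1", "Status": ""},
    {"AccountName": "Test-account-2", "Status": ""},
    {"AccountName": "Test-account-3", "Status": ""},
]

run_batch(table, fake_create_account)
for item in table:
    print(item["AccountName"], item["Status"], item["ErrorMsg"])
```

Recording a failure on the item and continuing, rather than stopping the batch, matches the behavior described in step 6, where the table entry is updated with either the account ID or the error reason.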

Upload the CSV file

Once the AWS account list CSV file has been created, upload it into the Amazon S3 bucket created by the stack.
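
The upload is what starts the automation: Amazon S3 sends an event notification that invokes NewAccountHandler. As a rough sketch of how a handler might find the uploaded file, the following Python function pulls the bucket name and object key out of an S3 event. The function name, the bucket name, and the file name are hypothetical, and this is not the solution's actual handler code.

```python
from urllib.parse import unquote_plus


def uploaded_csv_objects(event):
    """Return (bucket, key) pairs for the CSV objects named in an S3 event."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.lower().endswith(".csv"):
            objects.append((bucket, key))
    return objects


# Trimmed-down S3 event with placeholder names.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-account-list-bucket"},
                "object": {"key": "new+accounts.csv"},
            },
        },
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-account-list-bucket"},
                "object": {"key": "notes.txt"},
            },
        },
    ]
}

print(uploaded_csv_objects(sample_event))
```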

Deploying the solution

To launch the AWS CloudFormation stack

Now that all the requirements and the specifications to run the solution are ready, you can launch the AWS CloudFormation stack:

  1. Open the AWS CloudFormation launch wizard in the console.
  2. In the Create stack page, choose Next.

    Figure 8: Create stack in CloudFormation

  3. On the Specify stack details page, update the default parameters to use the information you captured in the prerequisites as shown in Figure 9, and choose Next.

    Figure 9: Input parameters into AWS CloudFormation

  4. On the Configure stack option page, choose Next.
  5. On the Review page, check the box “I acknowledge that AWS CloudFormation might create IAM resources.” and choose Create Stack.
  6. Once the AWS CloudFormation stack has completed, go to the Amazon S3 web console and select the Amazon S3 bucket that you defined in the AWS CloudFormation stack.
  7. Upload the AWS account list CSV file with the information to create new AWS accounts. See To create the AWS Account list CSV file above for details on creating the CSV file.

Workflow and solution details

When a new file is uploaded to the Amazon S3 bucket, the following actions occur:

  1. When you upload the AWS account list CSV file to the Amazon S3 bucket, the Amazon S3 service triggers an event for newly uploaded objects that invokes the Lambda function NewAccountHandler.
  2. This Lambda function executes the following steps:
    • Checks whether the Lambda function was invoked by an Amazon S3 event, or the CloudFormation CREATE event.
    • If the event is a new object uploaded from Amazon S3, read the object.
    • Validate the content of the CSV file for the required columns and values.
    • If the data has a valid format, insert a new item with the data into the Amazon DynamoDB table, as shown in Figure 10 below.

      Figure 10: DynamoDB table items with AWS accounts details

    • Amazon DynamoDB is configured to invoke the Lambda function CreateManagedAccount when items are inserted, updated, or deleted.
    • The Lambda function CreateManagedAccount checks for the update event type. When an item is updated in the table, the Lambda function checks the item, and if the AWS account has not been created, it invokes AWS Control Tower Account Factory from AWS Service Catalog to create a new AWS account with the details stored in the Amazon DynamoDB item.
    • AWS Control Tower Account Factory starts the AWS account creation process. When the account creation process completes, the status of Account Factory will show as Available in Provisioned products, as shown in Figure 11.

      Figure 11: AWS Service Catalog provisioned products for AWS account creation

    • Based on the Control Tower lifecycle events, the CreateAccountAssignment Lambda function is invoked when the CreateManagedAccount event is sent to CloudWatch Events. An Amazon SNS topic is also used to send an email notification to the administrator email address, as shown in Figure 12 below.

      Figure 12: AWS email notification when account creation completes

    • When invoked, the Lambda function CreateAccountAssignment assigns the AWS SSO user group to the new AWS account with the permission set defined in the AWS CloudFormation stack.

      Figure 13: New AWS account showing user groups with permission sets assigned

Figure 13 above shows the new AWS account with the user groups and the assigned permission sets. This completes the automation process. The AWS SSO users that are part of the user group will automatically be allowed to access the new AWS account with the defined permission set.
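
To illustrate the final step, the sketch below turns a CreateManagedAccount lifecycle event into the parameters for an AWS SSO account assignment. It is an approximation, not the solution's CreateAccountAssignment code: the function name, the sample event values, and the group ID are hypothetical, and the event shape follows the documented AWS Control Tower lifecycle event format as understood here, so verify it against a real event before relying on it.

```python
def build_assignment_request(event, instance_arn, permission_set_arn, group_id):
    """Return assignment parameters, or None if account creation failed."""
    status = event["detail"]["serviceEventDetails"]["createManagedAccountStatus"]
    if status["state"] != "SUCCEEDED":
        return None
    # Keys match the parameters of the sso-admin CreateAccountAssignment API.
    return {
        "InstanceArn": instance_arn,
        "TargetId": status["account"]["accountId"],
        "TargetType": "AWS_ACCOUNT",
        "PermissionSetArn": permission_set_arn,
        "PrincipalType": "GROUP",
        "PrincipalId": group_id,
    }


# Trimmed-down lifecycle event with placeholder values.
sample_event = {
    "source": "aws.controltower",
    "detail": {
        "eventName": "CreateManagedAccount",
        "serviceEventDetails": {
            "createManagedAccountStatus": {
                "account": {"accountName": "Test-account-1", "accountId": "111111111111"},
                "state": "SUCCEEDED",
            }
        },
    },
}

request = build_assignment_request(
    sample_event,
    instance_arn="arn:aws:sso:::instance/ssoins-abc1234567",
    permission_set_arn="arn:aws:sso:::permissionSet/ssoins-abc1234567/ps-abc123def4567890",
    group_id="example-group-id",  # hypothetical Group ID copied from AWS SSO
)
print(request)
```

With boto3 installed and credentials configured, the returned dictionary could be passed as keyword arguments to the create_account_assignment call on an sso-admin client. Keeping the request-building logic separate from the API call makes it easy to test without touching a live account.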

Handling common sources of error

This solution connects multiple components to facilitate the new AWS account creation and AWS SSO permission set assignment. The correctness of the parameters in the AWS CloudFormation stack is important to make sure that when AWS Control Tower creates a new AWS account, it is accessible.

To verify that this solution works, make sure that the email address is a valid email address, you have access to that email, and it is not being used for any existing AWS account. After a new account is created, it is not possible to change its root account email address, so if you input an invalid or inaccessible email, you will need to create a new AWS account and remove the invalid account.

You can view common errors by going to the AWS Service Catalog web console. Under Provisioned products, you can see all of your AWS Control Tower Account Factory-launched AWS accounts.

Figure 14: AWS Service Catalog provisioned product with error

Selecting Error under the Status column shows you the source of the error. Figure 15 below is an example of the source of the error:

Figure 15: AWS account creation error explanation

Conclusion

In this post, we’ve shown you how to automate batch creation of AWS accounts in AWS Control Tower and batch assignment of user access to AWS accounts in AWS SSO. When the batch AWS account creation and AWS SSO user access assignment processes are complete, the administrator is notified by email from Amazon SNS. We’ve also explained how to handle some common sources of errors and how to avoid them.

As you automate the batch AWS account creation and user access assignment, you can reduce the time you spend on the undifferentiated heavy lifting work, and onboard your users in your organization much more quickly, so they can start using and experimenting on AWS right away.

To learn more about the best practices of setting up an AWS multi-account environment, check out this documentation for more information.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security news? Follow us on Twitter.

Rafael Koike

Rafael is a Principal Solutions Architect supporting Enterprise customers in the Southeast and is part of the Storage TFC. Rafael has a passion for building, and his expertise in security, storage, networking, and application development has been instrumental in helping customers move to the cloud securely and quickly. When he is not building, he likes to do CrossFit and target shooting.

Eugene Toh

Eugene Toh is a Solutions Architect supporting Enterprise customers in the Georgia and Alabama areas. He is passionate about helping customers transform their businesses and take them to the next level. His areas of expertise are cloud migrations and disaster recovery, and he enjoys giving public talks on the latest cloud technologies. Outside of work, he loves trying great food and traveling all over the world.

How to enrich AWS Security Hub findings with account metadata

Post Syndicated from Siva Rajamani original https://aws.amazon.com/blogs/security/how-to-enrich-aws-security-hub-findings-with-account-metadata/

In this blog post, we’ll walk you through how to deploy a solution to enrich AWS Security Hub findings with additional account-related metadata, such as the account name, the Organization Unit (OU) associated with the account, security contact information, and account tags. Account metadata can help you search findings, create insights, and better respond to and remediate findings.

AWS Security Hub ingests findings from multiple AWS services, including Amazon GuardDuty, Amazon Inspector, Amazon Macie, AWS Firewall Manager, AWS Identity and Access Management (IAM) Access Analyzer, and AWS Systems Manager Patch Manager. Findings from each service are normalized into the AWS Security Finding Format (ASFF), so you can review findings in a standardized format and take action quickly. You can use AWS Security Hub to provide a single view of all security-related findings, and to set up alerts, automate remediation, and export specific findings to third‑party incident management systems.

The Security or DevOps teams responsible for investigating, responding to, and remediating Security Hub findings may need additional account metadata beyond the account ID, to determine what to do about the finding or where to route it. For example, determining whether the finding originated from a development or production account can be key to determining the priority of the finding and the type of remediation action needed. Having this metadata information in the finding allows customers to create custom insights in Security Hub to track which OUs or applications (based on account tags) have the most open security issues. This blog post demonstrates a solution to enrich your findings with account metadata to help your Security and DevOps teams better understand and improve their security posture.

Solution Overview

In this solution, you will use a combination of AWS Security Hub, Amazon EventBridge and AWS Lambda to ingest the findings and automatically enrich them with account related metadata by querying AWS Organizations and Account management service APIs. The solution architecture is shown in Figure 1 below:

Figure 1: Solution Architecture and workflow for metadata enrichment

The solution workflow includes the following steps:

  1. New findings and updates to existing Security Hub findings from all the member accounts flow into the Security Hub administrator account. Security Hub generates Amazon EventBridge events for the findings.
  2. An EventBridge rule created as part of the solution in the Security Hub administrator account will trigger a Lambda function configured as a target every time an EventBridge notification for a new or updated finding imported into Security Hub matches the EventBridge rule shown below:
    {
      "detail-type": ["Security Hub Findings - Imported"],
      "source": ["aws.securityhub"],
      "detail": {
        "findings": {
          "RecordState": ["ACTIVE"],
          "UserDefinedFields": {
            "findingEnriched": [{
              "exists": false
            }]
          }
        }
      }
    }

  3. The Lambda function uses the account ID from the event payload to retrieve both the account information and the alternate contact information from the AWS Organizations and Account management service API. The following code within the helper.py constructs the account_details object representing the account information to enrich the finding:
    def get_account_details(account_id, role_name):
        account_details = {}
        organizations_client = AwsHelper().get_client('organizations')
        response = organizations_client.describe_account(AccountId=account_id)
        account_details["Name"] = response["Account"]["Name"]
        response = organizations_client.list_parents(ChildId=account_id)
        ou_id = response["Parents"][0]["Id"]
        if ou_id and response["Parents"][0]["Type"] == "ORGANIZATIONAL_UNIT":
            response = organizations_client.describe_organizational_unit(OrganizationalUnitId=ou_id)
            account_details["OUName"] = response["OrganizationalUnit"]["Name"]
        elif ou_id:
            account_details["OUName"] = "ROOT"
        if role_name:
            account_client = AwsHelper().get_session_for_role(role_name).client("account")
        else:
            account_client = AwsHelper().get_client('account')
        try:
            response = account_client.get_alternate_contact(
                AccountId=account_id,
                AlternateContactType='SECURITY'
            )
            if response['AlternateContact']:
                print("contact :{}".format(str(response["AlternateContact"])))
                account_details["AlternateContact"] = response["AlternateContact"]
        except account_client.exceptions.AccessDeniedException as error:
            #Potentially due to calling alternate contact on Org Management account
            print(error.response['Error']['Message'])
        
        response = organizations_client.list_tags_for_resource(ResourceId=account_id)
        results = response["Tags"]
        while "NextToken" in response:
            response = organizations_client.list_tags_for_resource(ResourceId=account_id, NextToken=response["NextToken"])
            results.extend(response["Tags"])
        
        account_details["tags"] = results
        AccountHelper.logger.info("account_details: %s" , str(account_details))
        return account_details

  4. The Lambda function updates the finding using the Security Hub BatchUpdateFindings API to add the account related data into the Note and UserDefinedFields attributes of the Security Hub finding:
    #lookup and build the finding note and user defined fields  based on account Id
    enrichment_text, tags_dict = enrich_finding(account_id, assume_role_name)
    logger.debug("Text to post: %s" , enrichment_text)
    logger.debug("User defined Fields %s" , json.dumps(tags_dict))
    #add the Note to the finding and add a userDefinedField to use in the event bridge rule and prevent repeat lookups
    response = secHubClient.batch_update_findings(
        FindingIdentifiers=[
            {
                'Id': enrichment_finding_id,
                'ProductArn': enrichment_finding_arn
            }
        ],
        Note={
            'Text': enrichment_text,
            'UpdatedBy': enrichment_author
        },
        UserDefinedFields=tags_dict
    )

    Note: All state change events published by AWS services through Amazon EventBridge are free of cost. The AWS Lambda free tier includes 1M free requests per month, and 400,000 GB-seconds of compute time per month at the time of publication of this post. If you process 2M requests per month, the estimated cost for this solution would be approximately $7.20 USD per month.
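
As a rough illustration of steps 3 and 4, the sketch below shows how a handler might pull the finding identifiers out of the EventBridge event and flatten the account details into the string-to-string map that UserDefinedFields expects. The function names extract_finding_refs and to_user_defined_fields are hypothetical and are not taken from the solution's repository. The event field names (detail.findings, Id, ProductArn, AwsAccountId) follow the AWS Security Finding Format, and the AccountName, OU, and findingEnriched keys are the ones this post mentions.

```python
from typing import Any, Dict, List


def extract_finding_refs(event: Dict[str, Any]) -> List[Dict[str, str]]:
    """Return the identifiers BatchUpdateFindings needs for each finding in the event."""
    refs = []
    for finding in event.get("detail", {}).get("findings", []):
        refs.append({
            "Id": finding["Id"],
            "ProductArn": finding["ProductArn"],
            "AwsAccountId": finding["AwsAccountId"],
        })
    return refs


def to_user_defined_fields(account_details: Dict[str, Any]) -> Dict[str, str]:
    """Flatten account details into string keys and values (hypothetical helper)."""
    # Marker field that the EventBridge rule checks to avoid repeat lookups.
    fields = {"findingEnriched": "true"}
    if "Name" in account_details:
        fields["AccountName"] = str(account_details["Name"])
    if "OUName" in account_details:
        fields["OU"] = str(account_details["OUName"])
    # AWS Organizations returns tags as a list of {"Key": ..., "Value": ...} pairs.
    for tag in account_details.get("tags", []):
        fields[tag["Key"]] = str(tag["Value"])
    return fields
```

The resulting dictionary could be passed as the UserDefinedFields argument of batch_update_findings, alongside the Id and ProductArn values returned by extract_finding_refs.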

Prerequisites

    1. Your AWS organization must have all features enabled.
    2. This solution requires that you have AWS Security Hub enabled in an AWS multi-account environment which is integrated with AWS Organizations. The AWS Organizations management account must designate a Security Hub administrator account, which can view data from and manage configuration for its member accounts. Follow these steps to designate a Security Hub administrator account for your AWS organization.
    3. All the member accounts are tagged per your organization’s tagging strategy and their security alternate contact is filled in. If the tags or alternate contacts are not available, the enrichment will be limited to the Account Name and the Organizational Unit name.
    4. Trusted access must be enabled with AWS Organizations for AWS Account Management service. This will enable the AWS Organizations management account to call the AWS Account Management API operations (such as GetAlternateContact) for other member accounts in the organization. Trusted access can be enabled by using the AWS Management Console, the AWS CLI, or the AWS SDKs.

      The following AWS CLI example enables trusted access for AWS Account Management in the calling account’s organization.

      aws organizations enable-aws-service-access --service-principal account.amazonaws.com

    5. An IAM role with read-only access to look up the GetAlternateContact details must be created in the Organizations management account, with a trust policy that allows the Security Hub administrator account to assume the role.
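
The trusted access prerequisite can also be checked and enabled from the SDK. The following is a minimal sketch, assuming a boto3 Organizations client created with management account credentials is passed in; the function name ensure_trusted_access is hypothetical. It uses the list_aws_service_access_for_organization and enable_aws_service_access operations.

```python
def ensure_trusted_access(org_client, service_principal="account.amazonaws.com"):
    """Enable trusted access for a service principal if it is not already enabled.

    org_client is expected to be a boto3 Organizations client, for example
    boto3.client("organizations"), created in the Organizations management account.
    Returns True if trusted access was enabled by this call, False if it already was.
    """
    enabled = set()
    kwargs = {}
    while True:
        page = org_client.list_aws_service_access_for_organization(**kwargs)
        enabled.update(
            entry["ServicePrincipal"]
            for entry in page.get("EnabledServicePrincipals", [])
        )
        if "NextToken" not in page:
            break
        kwargs["NextToken"] = page["NextToken"]

    if service_principal in enabled:
        return False
    org_client.enable_aws_service_access(ServicePrincipal=service_principal)
    return True
```

Passing the client in as an argument keeps the function easy to exercise with a stub before running it against a real organization.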

    Solution Deployment

    This solution consists of two parts:

    1. Create an IAM role in your Organizations management account, giving it necessary permissions as described in the Create the IAM role procedure below.
    2. Deploy the Lambda function and the other associated resources to your Security Hub administrator account

    Create the IAM role

    Using console, AWS CLI or AWS API

    Follow the Creating a role to delegate permissions to an IAM user instructions to create an IAM role using the console, AWS CLI or AWS API in the AWS Organizations management account with the role name account-contact-readonly, based on the trust and permission policy templates provided below. You will need the account ID of your Security Hub administrator account.

    The IAM trust policy allows the Security Hub administrator account to assume the role in your Organizations management account.

    IAM Role trust policy

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::<SH administrator Account ID>:root"
          },
          "Action": "sts:AssumeRole",
          "Condition": {}
        }
      ]
    }

    Note: Replace the <SH administrator Account ID> with the account ID of your Security Hub administrator account. Once the solution is deployed, you should update the principal in the trust policy shown above to use the new IAM role created for the solution.
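
The two forms of the trust policy principal (the account root before deployment, and the solution's IAM role afterwards) can be generated with a small helper. This is a sketch only: build_trust_policy is a hypothetical name, the account ID and role name in the usage line are placeholders, and the helper assumes the role has no IAM path. The empty Condition element from the template is omitted because it has no effect.

```python
import json
from typing import Optional


def build_trust_policy(sh_admin_account_id: str, role_name: Optional[str] = None) -> str:
    """Return a trust policy document that lets the Security Hub administrator
    account (or one specific role in it) assume the role."""
    if not (sh_admin_account_id.isdigit() and len(sh_admin_account_id) == 12):
        raise ValueError("AWS account IDs are 12 digits")
    # Trust the whole account until the solution's role exists, then narrow it.
    suffix = f"role/{role_name}" if role_name else "root"
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{sh_admin_account_id}:{suffix}"},
                "Action": "sts:AssumeRole",
            }
        ],
    }
    return json.dumps(policy, indent=2)


# Placeholder account ID; replace with your Security Hub administrator account.
print(build_trust_policy("111122223333"))
```

Calling build_trust_policy("111122223333", "<Role Name>") with the role name noted from the deployed stack produces the narrowed principal used in the later update step.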

    IAM Permission Policy

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "account:GetAlternateContact"
                ],
                "Resource": "arn:aws:account::<Org. Management Account id>:account/o-*/*"
            }
        ]
    }

    The IAM permission policy allows the Security Hub administrator account to look up the alternate contact information for the member accounts.

    Make a note of the role ARN for the IAM role, which has a format similar to the following:

    arn:aws:iam::<Org. Management Account id>:role/account-contact-readonly

    You will need this while deploying the solution in the next procedure.

    Using AWS CloudFormation

    Alternatively, you can use the provided CloudFormation template to create the role in the management account. The IAM role ARN is available in the Outputs section of the created CloudFormation stack.

    Deploy the Solution to your Security Hub administrator account

    You can deploy the solution using either the AWS Management Console, or from the GitHub repository using the AWS SAM CLI.

    Note: If you have designated an aggregation Region within the Security Hub administrator account, you only need to deploy this solution in the aggregation Region. Otherwise, you need to deploy this solution separately in each Region of the Security Hub administrator account where Security Hub is enabled.

    To deploy the solution using the AWS Management Console

    1. In your Security Hub administrator account, launch the template by choosing the Launch Stack button below, which creates the stack in the us-east-1 Region.

      Note: If your Security Hub aggregation Region is different from us-east-1, or you want to deploy the solution in a different AWS Region, you can deploy the solution from the GitHub repository as described in the next section.

      Select this image to open a link that starts building the CloudFormation stack

    2. On the Quick create stack page, for Stack name, enter a unique stack name for this account; for example, aws-security-hub-findings-enrichment-stack, as shown in Figure 2 below.
      Figure 2: Quick Create CloudFormation stack for the Solution

    3. For ManagementAccount, enter the AWS Organizations management account ID.
    4. For OrgManagementAccountContactRole, enter the role ARN of the role you created previously in the Create the IAM role procedure.
    5. Choose Create stack.
    6. Once the stack is created, go to the Resources tab and take note of the name of the IAM Role which was created.
    7. Update the principal element of the IAM role trust policy which you previously created in the Organization management account in the Create the IAM role procedure above, replacing it with the role name you noted down, as shown below.
      Figure 3: Update Management Account Role’s Trust

    To deploy the solution from the GitHub Repository and AWS SAM CLI

    1. Install the AWS SAM CLI
    2. Download or clone the GitHub repository using the following commands:
      $ git clone https://github.com/aws-samples/aws-security-hub-findings-account-data-enrichment.git
      $ cd aws-security-hub-findings-account-data-enrichment

    3. Update the content of the profile.txt file with the profile name you want to use for the deployment
    4. To create a new bucket for deployment artifacts, run create-bucket.sh, specifying the Region as an argument, as shown below.
      $ ./create-bucket.sh us-east-1

    5. Deploy the solution to the account by running the deploy.sh script, specifying the Region as an argument:
      $ ./deploy.sh us-east-1

    6. Once the stack is created, go to the Resources tab and take note of the name of the IAM Role which was created.
    7. Update the principal element of the IAM role trust policy which you previously created in the Organizations management account in the Create the IAM role procedure above, replacing it with the role name you noted down, as shown below.
      "AWS": "arn:aws:iam::<SH administrator Account ID>:role/<Role Name>"

    Using the enriched attributes

    To test that the solution is working as expected, you can create a standalone security group with an ingress rule that allows traffic from the internet. This will trigger a finding in Security Hub, which will be populated with the enriched attributes. You can then use these enriched attributes to filter and create custom insights, or take specific response or remediation actions.

    To generate a sample Security Hub finding using AWS CLI

    1. Create a security group using the following AWS CLI command:
      aws ec2 create-security-group --group-name TestSecHubEnrichmentSG --description "Test Security Hub enrichment function"

    2. Make a note of the security group ID from the output, and use it in Step 3 below.
    3. Add an ingress rule to the security group which allows unrestricted traffic on port 100:
      aws ec2 authorize-security-group-ingress --group-id <Replace Security group ID> --protocol tcp --port 100 --cidr 0.0.0.0/0

    Within a few minutes, a new finding will be generated in Security Hub, warning about the unrestricted ingress rule in the TestSecHubEnrichmentSG security group. For any new or updated findings which do not have the UserDefinedFields attribute findingEnriched set to true, the solution will enrich the finding with account related fields in both the Note and UserDefinedFields sections in the Security Hub finding.

    To see and filter the enriched finding

    1. Go to Security Hub and click on Findings on the left-hand navigation.
    2. Click in the filter field at the top to add additional filters. Choose a filter field of AWS Account ID, a filter match type of is, and a value of the AWS Account ID where you created the TestSecHubEnrichmentSG security group.
    3. Add one more filter. Choose a filter field of Resource type, a filter match type of is, and the value of AwsEc2SecurityGroup.
    4. Identify the finding for security group TestSecHubEnrichmentSG with updates to Note and UserDefinedFields, as shown in Figures 4 and 5 below:
      Figure 4: Account metadata enrichment in Security Hub finding’s Note field

      Figure 5: Account metadata enrichment in Security Hub finding’s UserDefinedFields field

      Note: The actual attributes you will see as part of the UserDefinedFields may be different from the above screenshot. Attributes shown will depend on your tagging configuration and the alternate contact configuration. At a minimum, you will see the AccountName and OU fields.

    5. Once you confirm that the solution is working as expected, delete the stand-alone security group TestSecHubEnrichmentSG, which was created for testing purposes.
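
The same console filters can be expressed for the Security Hub GetFindings API. The sketch below only builds the Filters argument; build_enriched_finding_filters is a hypothetical helper, the account ID in the usage comment is a placeholder, and matching findingEnriched against the string "true" is an assumption about how the solution stores that value.

```python
def build_enriched_finding_filters(account_id, resource_type="AwsEc2SecurityGroup"):
    """Build a GetFindings Filters argument for enriched, active findings
    in one account and for one resource type."""
    return {
        "AwsAccountId": [{"Value": account_id, "Comparison": "EQUALS"}],
        "ResourceType": [{"Value": resource_type, "Comparison": "EQUALS"}],
        "RecordState": [{"Value": "ACTIVE", "Comparison": "EQUALS"}],
        # Assumption: the solution writes the marker as the string "true".
        "UserDefinedFields": [
            {"Key": "findingEnriched", "Value": "true", "Comparison": "EQUALS"}
        ],
    }


# Example usage (requires boto3 and Security Hub read permissions):
#   import boto3
#   securityhub = boto3.client("securityhub")
#   response = securityhub.get_findings(
#       Filters=build_enriched_finding_filters("111122223333"))
#   for finding in response["Findings"]:
#       print(finding["Title"], finding.get("UserDefinedFields"))
```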

    Create custom insights using the enriched attributes

    You can use the attributes available in the UserDefinedFields in the Security Hub finding to filter the findings. This lets you generate custom Security Hub insights and reports tailored to suit your organization’s needs. The example shown in Figure 6 below creates a custom Security Hub insight for findings grouped by severity for a specific owner, using the Owner attribute within the UserDefinedFields object of the Security Hub finding.

    Figure 6: Custom Insight with Account metadata filters
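
An insight like the one in Figure 6 can also be created through the Security Hub CreateInsight API. The following sketch assembles the request parameters; build_owner_insight, the default insight name, and the owner value in the usage comment are hypothetical, and the Owner key only exists if your accounts carry an Owner tag.

```python
def build_owner_insight(owner, name="Findings by severity for owner"):
    """Build CreateInsight parameters that group one owner's active
    findings by severity label."""
    return {
        "Name": name,
        "Filters": {
            "RecordState": [{"Value": "ACTIVE", "Comparison": "EQUALS"}],
            # Filter on the enriched Owner attribute in UserDefinedFields.
            "UserDefinedFields": [
                {"Key": "Owner", "Value": owner, "Comparison": "EQUALS"}
            ],
        },
        "GroupByAttribute": "SeverityLabel",
    }


# Example usage (requires boto3 and permission to create insights):
#   import boto3
#   securityhub = boto3.client("securityhub")
#   securityhub.create_insight(**build_owner_insight("team-a"))
```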

    EventBridge rule for response or remediation action using enriched attributes

    You can also use the attributes in the UserDefinedFields object of the Security Hub finding within the EventBridge rule to take specific response or remediation actions based on values in the attributes. In the example below, you can see how the Environment attribute can be used within the EventBridge rule configuration to trigger specific actions only when the value matches PROD. Note that EventBridge event patterns require the values to match to be written as arrays.

    {
      "detail-type": ["Security Hub Findings - Imported"],
      "source": ["aws.securityhub"],
      "detail": {
        "findings": {
          "RecordState": ["ACTIVE"],
          "UserDefinedFields": {
            "Environment": ["PROD"]
          }
        }
      }
    }
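
To reason about which events a pattern like this selects, the sketch below implements a deliberately simplified matcher. It is an illustration only, not EventBridge's actual implementation: it supports only lists of literal values and the {"exists": true/false} condition, and the function name pattern_matches and the sample events are hypothetical.

```python
def _allows_missing(sub):
    """True if a pattern fragment is satisfied when its field is absent."""
    if isinstance(sub, dict):
        return all(_allows_missing(value) for value in sub.values())
    return any(isinstance(c, dict) and c.get("exists") is False for c in sub)


def pattern_matches(pattern, event):
    """Simplified EventBridge-style matching (literals and 'exists' only)."""
    if isinstance(event, list):
        # An array in the event matches if any of its elements matches.
        return any(pattern_matches(pattern, item) for item in event)
    if isinstance(pattern, dict):
        if not isinstance(event, dict):
            return False
        for key, sub in pattern.items():
            if key in event:
                if not pattern_matches(sub, event[key]):
                    return False
            elif not _allows_missing(sub):
                return False
        return True
    # Leaf: the pattern is a list of allowed literals or 'exists' conditions.
    for candidate in pattern:
        if isinstance(candidate, dict) and "exists" in candidate:
            if candidate["exists"]:  # the field is present at this point
                return True
        elif candidate == event:
            return True
    return False


prod_rule = {
    "source": ["aws.securityhub"],
    "detail": {"findings": {"RecordState": ["ACTIVE"],
                            "UserDefinedFields": {"Environment": ["PROD"]}}},
}
sample = {
    "source": "aws.securityhub",
    "detail": {"findings": [{"RecordState": "ACTIVE",
                             "UserDefinedFields": {"Environment": "PROD"}}]},
}
print(pattern_matches(prod_rule, sample))
```

For authoritative results, test patterns against real events in the EventBridge console or with the TestEventPattern API rather than relying on a local approximation.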

    Conclusion

    This blog post walks you through a solution to enrich AWS Security Hub findings with AWS account related metadata using Amazon EventBridge notifications and AWS Lambda. By enriching the Security Hub findings with account related information, your security teams have better visibility, additional insights, and an improved ability to create targeted reports for specific accounts or business teams, helping them prioritize and improve overall security response. To learn more, see:

 
If you have feedback about this post, submit comments in the Comments section below. If you have any questions about this post, start a thread on the AWS Security Hub forum.

Siva Rajamani

Siva Rajamani is a Boston-based Enterprise Solutions Architect at AWS. Siva enjoys working closely with customers to accelerate their AWS cloud adoption and improve their overall security posture.

Prashob Krishnan

Prashob Krishnan is a Denver-based Technical Account Manager at AWS. Prashob is passionate about security. He enjoys working with customers to solve their technical challenges and help build a secure scalable architecture on the AWS Cloud.

Configure AWS SSO ABAC for EC2 instances and Systems Manager Session Manager

Post Syndicated from Rodrigo Ferroni original https://aws.amazon.com/blogs/security/configure-aws-sso-abac-for-ec2-instances-and-systems-manager-session-manager/

In this blog post, I show you how to configure AWS Single Sign-On to define attribute-based access control (ABAC) permissions to manage Amazon Elastic Compute Cloud (Amazon EC2) instances and AWS Systems Manager Session Manager for federated users. This combination allows you to control access to specific Amazon EC2 instances based on users’ attributes. I show you how AWS SSO identity source attributes like login and department can be used, and how custom attributes like SSMSessionRunAs can be passed into Amazon Web Services (AWS) from an external identity provider (IdP) by using a SAML 2.0 assertion.

AWS SSO added support for ABAC to enable you to create fine-grained permissions for your workforce in AWS using user attributes. Using user attributes as tags in AWS helps you simplify the process of creating fine-grained permissions in AWS and enables you to ensure that your workforce has access only to the AWS resources with matching tags.

The new feature works with any supported AWS SSO identity source. This post walks you through the steps to enable attributes for access control, create permission sets and manage assignments when using a supported external IdP as your identity source.

Solution overview

The following architecture diagram—Figure 1—presents an overview of the solution.

Figure 1: Solution architecture diagram

In the example in Figure 1, Alice and Bob are users who each have the attributes login, department, and SSMSessionRunAs. These attributes are created and updated in the external directory (Okta in this example) under those users’ profiles. The first two attributes are automatically synchronized between Okta and AWS SSO by using the System for Cross-domain Identity Management (SCIM) protocol, and are configured within AWS SSO settings. The third, custom attribute is passed directly from Okta into the AWS accounts as a new attribute in the SAML assertion.

Both users are using the same AWS SSO custom permission set that allows them to launch a new Amazon EC2 instance with proper tags enforcement. Based on those tags, they can start, stop, and restart the EC2 instance if they are in the same department, and terminate it if they are the owner. Also, they can connect using Session Manager if they’re in the same department. Users can sign in to those instances using the Linux OS user defined in the attribute SSMSessionRunAs.

Prerequisites

To perform the steps to use AWS SSO attributes for ABAC, you must already have deployed AWS SSO for your organization in AWS Organizations and have connected it with an external identity source using the SAML and SCIM protocols. For more information, see Checklist: Configuring ABAC in AWS using AWS SSO.

You need two test users for implementing and testing the solution. You can use two existing users, or create new users named Alice and Bob to match the solution and testing described in the following sections.

Implement the solution

The basic steps to implement the solution are:

  1. Confirm in AWS SSO settings that you have defined an external IdP, authentication via SAML 2.0, and provisioning via SCIM protocol.
  2. Enable attributes for access control and define the two supported attributes: login and department.
  3. Create a new user attribute in the Okta Directory.
  4. Edit and confirm the users’ attributes defined in the Okta Directory profile.
  5. Configure the SAML attribute statement in the Okta AWS SSO application.
  6. Create a new permission set using an ABAC policy.
  7. Create an AWS account assignment to the users using the permission set created in the previous step.

Confirm AWS SSO configuration

In this first step, you confirm that AWS SSO has been properly configured. Go to AWS SSO console SSO settings to check that the configuration of your identity source, authentication, and provisioning is as follows:

Identity source: External Identity Provider
Authentication: SAML 2.0
Provisioning: SCIM

  1. Confirm authentication is working as expected, by going to your user portal URL in a new browser instance (to ensure your user authentication doesn’t overwrite your existing authentication). The user portal offers a single place to access all the assigned AWS accounts, roles, and applications. For example, it should look like https://exampledomain.awsapps.com/start. Once you access it, the process automatically redirects the request to your external provider for authentication, and then returns the user to the AWS SSO user portal.
  2. To confirm provisioning, go to the AWS SSO console and choose Users from the right panel. You should see your Okta users assigned to the AWS SSO application being synchronized by SCIM protocol. Select any user to see the Created by SCIM and Updated by SCIM information for that user.

Enable AWS SSO attributes for access control

In this step, you enable ABAC and then configure AWS SSO attributes. This solution uses the Attributes for access control page in the AWS Management Console to enter the key and value pairs.

To enable attributes for access control

  1. Open the AWS SSO console.
  2. Choose Settings.
  3. On the Settings page, under Identity source, next to Attributes for access control, select Enable, as shown in Figure 2.
Figure 2: Attributes for access control settings (enable ABAC)

Once ABAC is enabled, you can select the attributes to be synchronized. For this use case, select login and department.

To select your attributes using the AWS SSO console

  1. Open the AWS SSO console.
  2. Choose Settings.
  3. On the Settings page, under Identity source, next to Attributes for access control, choose View details.
  4. On the Attributes for access control page, notice the Key and Value columns. This is where you will be mapping the attribute from your identity source to an attribute that AWS SSO passes as a session tag. Set the first key and value pair by entering login as the key and ${path:userName} as the value. Set the second key and value pair to department and ${path:enterprise.department}. The settings are shown in Figure 3 below.

    Figure 3: Map attributes using the Attributes for access control page

  5. Choose Save changes.

Create a new attribute in Okta Directory

In this third step, you create the new custom attribute SSMSessionRunAs.

To create a new user attribute

  1. Open the Okta console.
  2. Under Directory, choose Profile Editor.
  3. Choose Edit Profile for Okta User (default).
  4. Under Attributes, choose Add Attribute as follows:
    Data type: Select String
    Display Name: Enter SSMSessionRunAs
    Variable Name: Enter SSMSessionRunAs
    Attribute Length: Select Less than and enter 10 (max).
  5. Choose Save.

Edit and confirm users’ attributes defined in Okta Directory profile

Now that you have the new attribute SSMSessionRunAs created, go to the users’ profiles to enter the Department and SSMSessionRunAs values for both users.

To edit and confirm users’ attributes

  1. Open the Okta console.
  2. Under Directory, choose People.
  3. Select user Bob.
  4. On the Profile tab, choose Edit and set the following:

    For the key Department, enter blue as the value.

    For the key SSMSessionRunAs, enter bob as the value.

  5. Choose Save.
  6. Repeat steps 1 through 5 for Alice. For the key Department, enter amber as the value and for SSMSessionRunAs, enter alice as the value.
  7. Confirm that the attributes of both users are defined in the external directory as follows:

    Username (login): [email protected]
    First name (firstName): Bob
    Last name (lastName): Rodriguez
    Display name (displayName): Bob
    Department (department): blue
    SSMSessionRunAs (SSMSessionRunAs): bob

    Username (login): [email protected]
    First name (firstName): Alice
    Last name (lastName): Rosalez
    Display name (displayName): Alice
    Department (department): amber
    SSMSessionRunAs (SSMSessionRunAs): alice

Configure SAML attribute statement in Okta AWS SSO application

The attribute SSMSessionRunAs isn’t available as an attribute within AWS SSO. However, you can include it by defining SAML attribute statements, which are inserted into the SAML assertions.

To create a new SAML attribute

  1. Open the Okta Application console.
  2. Choose AWS Single Sign-on application.
  3. On the Sign On tab, choose Edit Settings.
  4. Under SAML 2.0 Attributes Statements enter the following:
    • For Name, enter https://aws.amazon.com/SAML/Attributes/AccessControl:SSMSessionRunAs
    • For Name format, select URI Reference
    • For Value, enter user.SSMSessionRunAs
  5. Choose Save.
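
For orientation, an attribute statement configured this way typically appears in the SAML assertion as an Attribute element whose Name carries the AccessControl prefix. The sketch below builds such an element with Python's standard library, using the standard SAML 2.0 assertion namespace and URI name format. The helper name access_control_attribute is hypothetical, and the exact XML that Okta emits (prefixes, additional type attributes) is an assumption.

```python
import xml.etree.ElementTree as ET

SAML_NS = "urn:oasis:names:tc:SAML:2.0:assertion"
ACCESS_CONTROL_PREFIX = "https://aws.amazon.com/SAML/Attributes/AccessControl:"
URI_NAME_FORMAT = "urn:oasis:names:tc:SAML:2.0:attrname-format:uri"


def access_control_attribute(key, value):
    """Build a SAML 2.0 Attribute element that carries one ABAC session tag."""
    attribute = ET.Element(
        f"{{{SAML_NS}}}Attribute",
        {"Name": ACCESS_CONTROL_PREFIX + key, "NameFormat": URI_NAME_FORMAT},
    )
    attribute_value = ET.SubElement(attribute, f"{{{SAML_NS}}}AttributeValue")
    attribute_value.text = value
    return attribute


# Serialize with a conventional "saml2" prefix for readability.
ET.register_namespace("saml2", SAML_NS)
element = access_control_attribute("SSMSessionRunAs", "bob")
print(ET.tostring(element, encoding="unicode"))
```

The part of the Name after the AccessControl: prefix (SSMSessionRunAs here) becomes the session tag key, and the attribute value becomes the tag value.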

Create a new permission set using an ABAC policy

In this step, you create a permissions policy that determines who can access your AWS resources based on the configured attribute value. When you enable ABAC and specify attributes, AWS SSO passes the attribute value of the authenticated user into AWS Identity and Access Management (IAM) for use in policy evaluation.

To create a permission set

  1. Open the AWS SSO console.
  2. Choose AWS accounts.
  3. Select the Permission sets tab.
  4. Choose Create permission set.
  5. On the Create new permission set page, choose Create a custom permission set.
    1. Choose Next: Details.
    2. Under Create a custom permission set, enter a name that will identify this permission set in AWS SSO. This name will also appear as an IAM role in the user portal for any users who have access to it. For this solution, name it myCustomPermissionSetEC2SSM.
    3. Choose Create a custom permissions policy and paste in the following ABAC policy document:
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "AllowDescribeList",
            "Action": [
              "ec2:Describe*",
              "ssm:Describe*",
              "ssm:Get*",
              "ssm:List*",
              "iam:ListInstanceProfiles",
              "cloudwatch:DescribeAlarms"
            ],
            "Effect": "Allow",
            "Resource": "*"
          },
          {
            "Sid": "AllowRunInstancesResources",
            "Effect": "Allow",
            "Action": "ec2:RunInstances",
            "Resource": [
              "arn:aws:ec2:*::image/*",
              "arn:aws:ec2:*::snapshot/*",
              "arn:aws:ec2:*:*:subnet/*",
              "arn:aws:ec2:*:*:key-pair/*",
              "arn:aws:ec2:*:*:security-group/*",
              "arn:aws:ec2:*:*:network-interface/*"
            ]
          },
          {
            "Sid": "AllowRunInstancesConditions",
            "Effect": "Allow",
            "Action": "ec2:RunInstances",
            "Resource": [
              "arn:aws:ec2:*:*:instance/*",
              "arn:aws:ec2:*:*:volume/*",
              "arn:aws:ec2:*:*:network-interface/*"
            ],
            "Condition": {
              "StringLike": {
                "aws:RequestTag/Name": "*"
              },
              "StringEquals": {
                "aws:RequestTag/Owner": "${aws:PrincipalTag/login}",
                "aws:RequestTag/Department": "${aws:PrincipalTag/department}"
              },
              "ForAllValues:StringEquals": {
                "aws:TagKeys": [
                  "Name",
                  "Owner",
                  "Department"
                ]
              }
            }
          },
          {
            "Sid": "AllowCreateTagsOnRunInstance",
            "Effect": "Allow",
            "Action": "ec2:CreateTags",
            "Resource": [
              "arn:aws:ec2:*:*:volume/*",
              "arn:aws:ec2:*:*:instance/*",
              "arn:aws:ec2:*:*:network-interface/*"
            ],
            "Condition": {
              "StringEquals": {
                "ec2:CreateAction": "RunInstances"
              }
            }
          },
          {
            "Sid": "AllowPassRoleSpecificRole",
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::*:role/EC2UbuntuSSMRole"
          },
          {
            "Sid": "AllowEC2ActionsConditions",
            "Effect": "Allow",
            "Action": [
              "ec2:StartInstances",
              "ec2:StopInstances",
              "ec2:RebootInstances"
            ],
            "Resource": "*",
            "Condition": {
              "StringEquals": {
                "ec2:ResourceTag/Department": "${aws:PrincipalTag/department}"
              }
            }
          },
          {
            "Sid": "AllowTerminateConditions",
            "Effect": "Allow",
            "Action": [
              "ec2:TerminateInstances"
            ],
            "Resource": "*",
            "Condition": {
              "StringEquals": {
                "ec2:ResourceTag/Owner": "${aws:PrincipalTag/login}"
              }
            }
          },
          {
            "Sid": "AllowStartSessionConditions",
            "Effect": "Allow",
            "Action": [
              "ssm:StartSession"
            ],
            "Resource": "*",
            "Condition": {
              "StringEquals": {
                "ssm:resourceTag/Department": "${aws:PrincipalTag/department}"
              }
            }
          },
          {
            "Sid": "AllowTerminateSessionConditions",
            "Effect": "Allow",
            "Action": [
              "ssm:TerminateSession"
            ],
            "Resource": [
              "arn:aws:ssm:*:*:session/${aws:PrincipalTag/login}-*"
            ]
          }
        ]
      }
      

    4. Choose Next: Tags.
    5. Review the selections you made, and then choose Create.

The policy described above uses SAML session tags for ABAC to define permissions based on attributes. These attributes are the tags passed in the AssumeRoleWithSAML operation when the SAML-based federation occurs.

A combination of global (aws:TagKeys, aws:PrincipalTag, aws:RequestTag) and service (ec2:ResourceTag, ec2:CreateAction, ssm:resourceTag) condition keys is used to assign the permissions.

To learn more about AWS global and service condition keys, see AWS global condition context keys and The condition keys table for AWS services.
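
To make the effect of the tag conditions concrete, the sketch below models only the resource-tag checks from the policy above for Alice and Bob. It is a simplified illustration, not the IAM policy evaluation engine: it ignores resources, explicit denies, and every statement other than the five tag-conditioned actions. The names resolve and is_allowed, the example.com logins, and the sample instance are hypothetical.

```python
import re

_POLICY_VARIABLE = re.compile(r"\$\{aws:PrincipalTag/([^}]+)\}")

# Action -> (resource tag key, expected value template), taken from the policy.
_TAG_RULES = {
    "ec2:StartInstances": ("Department", "${aws:PrincipalTag/department}"),
    "ec2:StopInstances": ("Department", "${aws:PrincipalTag/department}"),
    "ec2:RebootInstances": ("Department", "${aws:PrincipalTag/department}"),
    "ec2:TerminateInstances": ("Owner", "${aws:PrincipalTag/login}"),
    "ssm:StartSession": ("Department", "${aws:PrincipalTag/department}"),
}


def resolve(template, principal_tags):
    """Substitute PrincipalTag variables; return None if a tag is missing."""
    missing = []

    def _substitute(match):
        key = match.group(1)
        if key not in principal_tags:
            missing.append(key)
            return ""
        return principal_tags[key]

    value = _POLICY_VARIABLE.sub(_substitute, template)
    return None if missing else value


def is_allowed(action, principal_tags, resource_tags):
    """Return True if the tag condition for the action is satisfied."""
    if action not in _TAG_RULES:
        return False
    tag_key, template = _TAG_RULES[action]
    expected = resolve(template, principal_tags)
    # StringEquals is an exact, case-sensitive comparison.
    return expected is not None and resource_tags.get(tag_key) == expected


bob = {"login": "bob@example.com", "department": "blue"}
alice = {"login": "alice@example.com", "department": "amber"}
instance = {"Name": "demo", "Owner": "alice@example.com", "Department": "amber"}

print(is_allowed("ec2:StopInstances", alice, instance))
print(is_allowed("ec2:StopInstances", bob, instance))
```

In this model, Alice can stop and terminate the instance because both its Department and Owner tags match her session tags, while Bob can do neither because he is in a different department and is not the owner.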

Assign users to an AWS account

In this step, you use the permission set created in the previous step to assign access to the users for a specified AWS account.

To assign access to users

  1. Open the AWS SSO console.
  2. Choose AWS accounts.
  3. Under the AWS organization tab, in the list of AWS accounts, select one or more accounts to which you want to assign access.
  4. Choose Assign users.
  5. On the Select users or groups page, select both test users from the list of users as shown in Figure 4.

    Note: You can use the search box to look for specific users.

    Figure 4: Select users to assign to AWS accounts

  6. Choose Next: Permission sets.
  7. On the Select permission sets page, select the permission set that you created in the previous procedure (myCustomPermissionSetEC2SSM) to apply to the users from the table as shown in Figure 5.

    Figure 5: Select permissions sets

  8. Choose Finish to start the configuration of your AWS account. When configuration is complete, a message is displayed stating that you have successfully configured your AWS account as shown in Figure 6.

    Figure 6: Confirmation that configuration is complete

Test the solution

Now that you have everything in place, let’s test the solution. To test the solution, you’ll log in to AWS SSO, access the AWS account and check the event logs, and test the Amazon EC2 operations.

Log in to AWS SSO as Bob through your external IdP

Enter the user portal URL in a browser window and log in to AWS SSO as Bob. AWS SSO redirects to the external provider for the login process. After successful authentication, the external provider redirects to the AWS SSO portal, which shows you a list of the AWS accounts that you have access to. In this case, Bob has access to one AWS account as shown in Figure 7.

Figure 7: AWS SSO showing AWS accounts that the user has access to

Access the AWS account using the permission set and confirm the event logs

Select the Management console link for the AWS account that has the myCustomPermissionSetEC2SSM permission set that you created earlier. This action federates into the AWS account and is logged in AWS CloudTrail as the API call AssumeRoleWithSAML. To confirm that the SAML session tags are being passed in the session, look at the API event log in the CloudTrail Event history console. In the following example, you can check the principalTags keys and their values under requestParameters.

{
    "eventVersion": "1.08",
    "userIdentity": {
        "type": "SAMLUser",
        "principalId": "d/UbWH0ijLBmlakaboZwi5CA/30=:[email protected]",
        "userName": "[email protected]",
        "identityProvider": "d/UbWH0ijLBmlakaboZwi5CA/30="
    },
    "eventTime": "2021-05-13T16:08:48Z",
    "eventSource": "sts.amazonaws.com",
    "eventName": "AssumeRoleWithSAML",
    ...
    "requestParameters": {
        "sAMLAssertionID": "_5072d119-64f5-4341-aeed-30d9b7c24b5b",
        "roleSessionName": "[email protected]",
        "principalTags": {
            "SSMSessionRunAs": "bob",
            "department": "blue",
            "login": "[email protected]"
        },
        "durationSeconds": 3600,
        "roleArn": "arn:aws:iam::555555555555:role/aws-reserved/sso.amazonaws.com/AWSReservedSSO_myCustomPermissionSetEC2SSM_9e80ec498218bbea",
        "principalArn": "arn:aws:iam::555555555555:saml-provider/AWSSSO_5f872b6782a0507a_DO_NOT_DELETE"
    },
    "responseElements": {
    ...
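
If you export events from CloudTrail rather than reading them in the console, a short script can pull the session tags out of each AssumeRoleWithSAML record. The following is a minimal sketch that works on the JSON text of a single event. The function name principal_tags_from_event is hypothetical, and the sample event is a trimmed, made-up record that follows the same shape as the example above, with bob@example.com as a placeholder address.

```python
import json


def principal_tags_from_event(raw_event):
    """Return the session tags of an AssumeRoleWithSAML event, else {}."""
    event = json.loads(raw_event)
    if event.get("eventName") != "AssumeRoleWithSAML":
        return {}
    request_parameters = event.get("requestParameters") or {}
    return request_parameters.get("principalTags", {})


# Trimmed, made-up event with placeholder values for illustration
sample_event = json.dumps(
    {
        "eventSource": "sts.amazonaws.com",
        "eventName": "AssumeRoleWithSAML",
        "requestParameters": {
            "principalTags": {
                "SSMSessionRunAs": "bob",
                "department": "blue",
                "login": "bob@example.com",
            }
        },
    }
)

for key, value in sorted(principal_tags_from_event(sample_event).items()):
    print(f"{key} = {value}")
```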

Test EC2 operations

  1. Open the Amazon EC2 console:
    For this example, three EC2 instances are already running in the Amazon EC2 console. They were created with the tags explained in the following step so that you can test the ABAC policy. From the top menu, you can also confirm the federated login AWSReservedSSO_myCustomPermissionSetEC2SSM_9e80ec498218bbea/[email protected], which represents the AWS SSO managed role and the user, as shown in Figure 8.

    Figure 8: EC2 instances and user information

  2. Launch a new EC2 instance:
    Start testing the ABAC policy by launching a new EC2 instance. This action is authorized only when you fill in the three required tags: Name, Owner, and Department.

    1. From the Amazon EC2 console, choose Launch Instances.
    2. Set the AMI. For this example, select an Ubuntu-based OS.
    3. Set the instance type. A t2.micro will work.
    4. Configure the EC2 instance. Choose an IAM role to allow Systems Manager to manage the new EC2 instance. In this case, the IAM role EC2UbuntuSSMRole, with the AWS managed policy AmazonEC2RoleforSSM attached, has to be created in advance by someone with the proper IAM permissions, because the user Bob is not allowed to do so. Then, use the user data to create the Ubuntu OS user bob that you need to log in to the EC2 instance by using Session Manager. You can copy and paste the following to create the user:
      #!/bin/bash
      sudo useradd -m bob
    5. Add storage using the default settings.
    6. Add tags. From the ABAC policy previously created, you can confirm that the value of the tag key Name can be anything, because the condition StringLike is indicated with a wildcard (*). The values of the tag keys Owner and Department have to match the principal session tags passed through federation. In this case, enter [email protected] as the value for the key Owner, and enter blue as the value for Department, as shown in Figure 9.

      Figure 9: EC2 tags describing key value pairs

    7. Configure security groups. You can choose an existing security group that doesn’t allow any inbound traffic to the SSH port. With Session Manager, you connect to the EC2 instance through an API, so the instance only needs an outbound connection. This way, you can safely leave the security group inbound rules closed.
    8. Review and launch. It will ask you about selecting or creating a key pair. You don’t need one, because you’re using Session Manager. Proceed without selecting or creating a new SSH key pair. When launching the EC2 instance with the correct tag keys and values, you get the success message shown in Figure 10.
      Figure 10: EC2 success message launching an instance with the correct tags

      If there are any missing tag keys or the values aren’t correct, the action will be denied as shown in Figure 11. For more information, you can decode the authorization error message using the API DecodeAuthorizationMessage.

      Figure 11: EC2 failed message launching an instance with incorrect tags

  3. Stop, reboot, and terminate EC2 instances.
    The next tests are to stop, reboot, and terminate the EC2 instances. The ABAC policy defines that only users who have the same department value as the resource can perform the first two actions. You can terminate an EC2 instance only if you are its owner. To stop, reboot, and terminate instances, open the EC2 console, choose Instances, and select the instance you want to affect. Choose Instance state and choose the action you want to test: Stop instance, Reboot instance, or Terminate instance.

    Trying to stop the EC2 instance amber-instance where Department is amber is shown in Figure 12.

    Figure 12: EC2 console showing how to stop an instance

    The action should fail as shown in Figure 13.

    Figure 13: EC2 instance failure message stopping an instance with wrong tags

    Only when the department value of the EC2 instance is blue is it possible to stop or reboot the instance as shown in Figure 14.

    Figure 14: EC2 success message stopping an instance with correct tags

    Only when the owner who launched the EC2 instance matches with the federated login is it possible to terminate the instance. Trying to terminate an EC2 instance that was launched by anyone other than the owner will lead to a failed action as shown in Figure 15.

    Figure 15: EC2 failed message terminating an instance with incorrect tags

  4. Try to modify tags. Because ABAC policies rely on tags, the policy prevents you from modifying tags after the resources have been created. This is set in the ABAC policy statement AllowCreateTagsOnRunInstance in Create a new permission set using an ABAC policy. If you try to modify any tag keys or values on existing resources, the changes will be denied. For example, if you try to modify the Owner tag on an existing EC2 instance, you get the “Failed to update tags” error message as shown in Figure 16.

    Figure 16: Failed message when attempting to modify tags

  5. Connect to the EC2 instance using Session Manager.
    1. Test logging in to the EC2 instance by choosing the new instance and choosing Connect as shown in Figure 17.

      Figure 17: EC2 console selecting an instance to connect

    2.  Then choose the Session Manager tab and choose Connect as shown in Figure 18.
      Figure 18: EC2 console selecting Session Manager to connect

      This will open a new tab in the browser redirecting to a Systems Manager session where you can confirm that the Ubuntu OS user is Bob as shown in Figure 19.

      Figure 19: Systems Manager session started confirming Ubuntu OS user

      Note: By default, sessions are launched using the credentials of a system-generated account named ssm-user that is created on a managed instance. However, you can instead launch sessions using any OS user by enabling the run as feature in SSM. To learn more about this, see Enable run as support for Linux and macOS instances in the Systems Manager Session Manager user guide.

    3. Performing the same action in an EC2 instance with a different Department tag will lead to a denied action as shown in Figure 20. This is because the ABAC policy allows the StartSession action only when the Department key matches the Department value in the EC2 instance.

      Figure 20: Systems Manager StartSession failed message
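
When one of the EC2 actions above is denied with an encoded authorization message, you can decode it with the STS DecodeAuthorizationMessage API mentioned in step 2. The following is a minimal sketch that assumes a boto3 STS client is passed in by the caller, for example boto3.client("sts"), and that the caller has the sts:DecodeAuthorizationMessage permission. The function name explain_denial is hypothetical, and the fields read from the decoded document are looked up defensively because only part of its structure is used here.

```python
import json


def explain_denial(sts_client, encoded_message):
    """Decode an EC2 authorization failure and summarize the result.

    sts_client is expected to behave like boto3.client("sts").
    """
    response = sts_client.decode_authorization_message(
        EncodedMessage=encoded_message
    )
    # DecodedMessage is a JSON document returned as a string
    decoded = json.loads(response["DecodedMessage"])
    context = decoded.get("context") or {}
    return {
        "allowed": decoded.get("allowed"),
        "explicit_deny": decoded.get("explicitDeny"),
        "action": context.get("action"),
    }
```

A summary in which allowed and explicit_deny are both false suggests that no statement granted the action, which is what you would expect when a required tag is missing or does not match the session tags.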

Conclusion

In this blog post, you learned how to use AWS SSO with the two methods of passing attributes to an AWS account using session tags for ABAC. You also learned how to build policies with tags as conditions to simplify and reuse custom permission sets. You have seen working examples with services like Amazon EC2 and Systems Manager Session Manager. To learn more about ABAC policies, SAML session tags, and how to pass session tags in federation, see IAM tutorial: Use SAML session tags for ABAC and Passing session tags using AssumeRoleWithSAML.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security news? Follow us on Twitter.

Author

Rodrigo Ferroni

Rodrigo Ferroni is a senior Security Specialist at AWS Enterprise Support. He is certified in CISSP, AWS Security Specialist, and AWS Solutions Architect Associate. He enjoys helping customers to continue adopting AWS security services to improve their security posture in the cloud. Outside of work, he loves to travel as much as he can. Every winter, he enjoys snowboarding with his friends.

Use AWS Step Functions to Monitor Services Choreography

Post Syndicated from Vito De Giosa original https://aws.amazon.com/blogs/architecture/use-aws-step-functions-to-monitor-services-choreography/

Organizations frequently need access to quick visual insight on the status of complex workflows. This involves collaboration across different systems. If your customer requires assistance on an order, you need an overview of the fulfillment process, including payment, inventory, dispatching, packaging, and delivery. If your products are expensive assets such as cars, you must track each item’s journey instantly.

Modern applications use event-driven architectures to manage the complexity of system integration at scale. These often use choreography for service collaboration. Instead of directly invoking systems to perform tasks, services interact by exchanging events through a centralized broker. Complex workflows are the result of actions each service initiates in response to events produced by other services. Services do not directly depend on each other. This increases flexibility, development speed, and resilience.

However, choreography can introduce two main challenges for the visibility of your workflow.

  1. It obfuscates the workflow definition. The sequence of events emitted by individual services implicitly defines the workflow. There is no formal statement that describes steps, permitted transitions, and possible failures.
  2. It might be harder to understand the status of workflow executions. Services act independently, based on events. You can implement distributed tracing to collect information related to a single execution across services. However, getting visual insights from traces may require custom applications. This increases time to market (TTM) and cost.

To address these challenges, we will show you how to use AWS Step Functions to model choreographies as state machines. The solution enables stakeholders to gain visual insights on workflow executions, identify failures, and troubleshoot directly from the AWS Management Console.

This GitHub repository provides a Quick Start and examples on how to model choreographies.

Modeling choreographies with Step Functions

Monitoring a choreography requires a formal representation of the distributed system behavior, such as state machines. State machines are mathematical models representing the behavior of systems through states and transitions. States model situations in which the system can operate. Transitions define which input causes a change from the current state to the next. They occur when a new event happens. Figure 1 shows a state machine modeling an order workflow.

Figure 1. Order workflow

The solution in this post uses Amazon States Language to describe a choreography as a Step Functions state machine. The state machine pauses, using Task states combined with a callback integration pattern. It then waits for the next event to be published on the broker. Choice states control transitions to the next state by inspecting event payloads. Figure 2 shows how the workflow in Figure 1 translates to a Step Functions state machine.

Figure 2. Order workflow translated into Step Functions state machine
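
To make the pattern concrete, the following sketch builds a small state machine definition as a Python dictionary that can be serialized to Amazon States Language JSON. It is an illustration under stated assumptions, not the definition used by the solution: the table name ChoreographyTokens, the attribute names, the state names, and the event_type field are all hypothetical. The Task state stores the task token through an AWS SDK integration with the callback suffix, and the Choice state routes on the payload that arrives when the task is resumed.

```python
import json

# Hypothetical fragment of an order choreography
order_state_machine = {
    "Comment": "Illustrative fragment of an order choreography",
    "StartAt": "WaitForNextEvent",
    "States": {
        "WaitForNextEvent": {
            "Type": "Task",
            # AWS SDK integration with the callback (wait for task token) pattern
            "Resource": "arn:aws:states:::aws-sdk:dynamodb:putItem.waitForTaskToken",
            "Parameters": {
                "TableName": "ChoreographyTokens",
                "Item": {
                    "ExecutionName": {"S.$": "$$.Execution.Name"},
                    "TaskToken": {"S.$": "$$.Task.Token"},
                },
            },
            "Next": "RouteEvent",
        },
        "RouteEvent": {
            "Type": "Choice",
            "Choices": [
                {
                    "Variable": "$.event_type",
                    "StringEquals": "Payment Received",
                    "Next": "PaymentReceived",
                }
            ],
            # Any event the choreography does not expect ends in a Fail state
            "Default": "UnexpectedEvent",
        },
        "PaymentReceived": {"Type": "Succeed"},
        "UnexpectedEvent": {"Type": "Fail", "Error": "UnexpectedEvent"},
    },
}


def undefined_targets(definition):
    """Return transition targets that are not defined as states."""
    states = definition["States"]
    targets = {definition["StartAt"]}
    for state in states.values():
        for key in ("Next", "Default"):
            if key in state:
                targets.add(state[key])
        for choice in state.get("Choices", []):
            targets.add(choice["Next"])
    return sorted(targets - set(states))


print(json.dumps(order_state_machine, indent=2))
```

The undefined_targets helper is only a local sanity check for dangling transitions. It does not replace the validation that Step Functions performs when you create the state machine.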

Figure 3 shows the architecture for monitoring choreographies with Step Functions.

Figure 3. Choreography monitoring with AWS Step Functions

  1. Services involved in the choreography publish events to Amazon EventBridge. There are two configured rules. The first rule matches the first event of the choreography sequence, Order Placed in the example. The second rule matches any other event of the sequence. Event payloads contain a correlation id (order_id) to group them by workflow instance.
  2. The first rule invokes an AWS Lambda function, which starts a new execution of the choreography state machine. The correlation id is passed in the name parameter, so you can quickly identify an execution in the AWS Management Console.
  3. The state machine uses Task states with AWS SDK service integrations, to directly call Amazon DynamoDB. Tasks are configured with a callback pattern. They issue a token, which is stored in DynamoDB with the execution name. Then, the workflow pauses.
  4. A service publishes another event on the event bus.
  5. The second rule invokes another Lambda function with the event payload.
  6. The function uses the correlation id to retrieve the task token from DynamoDB.
  7. The function invokes the Step Functions SendTaskSuccess API, with the token and the event payload as parameters.
  8. The state machine resumes the execution and uses Choice states to transition to the next state. If the choreography definition expects the received event payload, it selects the next state and the process restarts from step 3. The state machine transitions to a Fail state when it receives an unexpected event.
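
The two Lambda functions in steps 2 and 5 to 7 can be sketched as follows. This is a minimal illustration rather than the code in the GitHub repository: the function names, the table name ChoreographyTokens, and the attribute names ExecutionName and TaskToken are hypothetical, and the clients are passed in as parameters so that the logic can be exercised without AWS access. In a Lambda function you would typically create them with boto3.client("stepfunctions") and boto3.client("dynamodb").

```python
import json

TOKEN_TABLE = "ChoreographyTokens"  # hypothetical DynamoDB table name


def start_choreography(event, sfn_client, state_machine_arn):
    """Target of the first rule: start one execution per workflow instance."""
    order_id = str(event["detail"]["order_id"])
    return sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        name=order_id,  # the correlation id doubles as the execution name
        input=json.dumps(event["detail"]),
    )


def resume_choreography(event, sfn_client, dynamodb_client):
    """Target of the second rule: fetch the stored token and resume."""
    order_id = str(event["detail"]["order_id"])
    item = dynamodb_client.get_item(
        TableName=TOKEN_TABLE,
        Key={"ExecutionName": {"S": order_id}},
    )["Item"]
    return sfn_client.send_task_success(
        taskToken=item["TaskToken"]["S"],
        # The event payload becomes the task output that Choice states inspect
        output=json.dumps(event["detail"]),
    )
```

Using the correlation id as the execution name also means that a second Order Placed event with the same order_id does not start a duplicate workflow, because execution names have to be unique for a state machine.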

Increased visibility with Step Functions console

Modeling service choreographies as Step Functions Standard Workflows increases visibility with out-of-the-box features.

1. You can centrally track events produced by distributed components. Step Functions records full execution history for 90 days after the execution completes. You’ll be able to capture detailed information about the input and output of each state, including event payloads. Additionally, state machines integrate with Amazon CloudWatch to publish execution logs and metrics.

2. You can monitor choreographies visually. The Step Functions console displays a list of executions with information such as execution id, status, and start date (see Figure 4).

Figure 4. Step Functions workflow dashboard

After you’ve selected an execution, a graph inspector is displayed (see Figure 5). It shows states and transitions, and marks individual states with colors. This identifies, at a glance, successful tasks, failures, and tasks that are still in progress.

Figure 5. Step Functions graph inspector

3. You can implement event-driven automation. Step Functions emits execution status change events directly to EventBridge (see Figure 6). Additionally, you can set CloudWatch alarms on the metrics that Step Functions publishes to CloudWatch. You can respond to these events and alarms by initiating corrective actions, sending notifications, or integrating with third-party solutions, such as issue tracking systems.

Figure 6. Automation with Step Functions, EventBridge, and CloudWatch alarms
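
As an illustration of the EventBridge side of this automation, the following sketch defines an event pattern for executions that end unsuccessfully, together with a small local matcher for trying the pattern against sample events. The matcher is a deliberate simplification that only handles exact-value lists and nested objects. It is not the EventBridge matching engine, and the sample event is made up.

```python
# Event pattern for Step Functions executions that did not succeed
failed_execution_pattern = {
    "source": ["aws.states"],
    "detail-type": ["Step Functions Execution Status Change"],
    "detail": {"status": ["FAILED", "TIMED_OUT", "ABORTED"]},
}


def matches(pattern, event):
    """Simplified check: every pattern key must match the event."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            # Nested pattern: recurse into the nested event object
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


# Made-up sample event for illustration
sample = {
    "source": "aws.states",
    "detail-type": "Step Functions Execution Status Change",
    "detail": {"status": "FAILED", "name": "order-1234"},
}
print(matches(failed_execution_pattern, sample))
```

A rule with this pattern could target a notification topic or a Lambda function that opens a ticket in an issue tracking system.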

Enabling access to AWS Step Functions console

Stakeholders need secure access to the Step Functions console. This requires mechanisms to authenticate users and authorize read-only access to specific Step Functions workflows.

AWS Single Sign-On authenticates users by directly managing identities or through federation. SSO supports federation with Active Directory and SAML 2.0 compliant external identity providers (IdP). You grant users access to Step Functions state machines by assigning a permission set, which is a collection of AWS Identity and Access Management (IAM) policies. Additionally, with permission sets, you can configure a relay state, which is a URL to redirect the user to after successful authentication. You can authenticate the user through the selected identity provider and immediately show the AWS Step Functions console with the workflow state machine already displayed. Figure 7 shows this process.

Figure 7. Access to Step Functions state machine with AWS SSO

  1. The user logs in through the selected identity provider.
  2. The SSO user portal uses the SSO endpoint to send the response from the previous step. SSO uses AWS Security Token Service (STS) to get temporary security credentials on behalf of the user. It then creates a console sign-in URL using those credentials and the relay state. Finally, it sends the URL back as a redirect.
  3. The browser redirects the user to the Step Functions console.

When the identity provider does not support SAML 2.0, SSO is not a viable solution. In this case, you can create a URL with a sign-in token for users to securely access the AWS Management Console. This approach uses STS AssumeRole to get temporary security credentials. Then, it uses those credentials to obtain a sign-in token from the AWS federation endpoint. Finally, it constructs a URL for the AWS Management Console that includes the token, and distributes the URL to users to grant access. This is similar to the SSO process. However, it requires custom development.
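
The URL construction in this approach can be sketched as two small helper functions. This is a minimal illustration that only builds the URLs and makes no network calls: you would still need to call STS AssumeRole to obtain the temporary credentials, send an HTTPS GET request to the first URL, and read the SigninToken value from the JSON response. The function names and the issuer value are hypothetical.

```python
import json
from urllib.parse import urlencode

FEDERATION_ENDPOINT = "https://signin.aws.amazon.com/federation"


def signin_token_request_url(access_key_id, secret_access_key, session_token):
    """URL that exchanges temporary credentials for a sign-in token."""
    session = json.dumps(
        {
            "sessionId": access_key_id,
            "sessionKey": secret_access_key,
            "sessionToken": session_token,
        }
    )
    query = urlencode({"Action": "getSigninToken", "Session": session})
    return f"{FEDERATION_ENDPOINT}?{query}"


def console_login_url(signin_token, destination, issuer="https://example.com"):
    """URL that signs the user in and redirects to the destination page."""
    query = urlencode(
        {
            "Action": "login",
            "Issuer": issuer,  # hypothetical issuer, typically your own portal
            "Destination": destination,
            "SigninToken": signin_token,
        }
    )
    return f"{FEDERATION_ENDPOINT}?{query}"
```

The destination plays the same role as the relay state in the SSO flow: pointing it at the Step Functions console page of the state machine takes the user straight to the workflow after sign-in.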

Conclusion

This post shows how you can increase visibility on choreographed business processes using AWS Step Functions. The solution provides detailed visual insights directly from the AWS Management Console, without requiring custom UI development. This reduces TTM and cost.

To learn more:

Continuous runtime security monitoring with AWS Security Hub and Falco

Post Syndicated from Rajarshi Das original https://aws.amazon.com/blogs/security/continuous-runtime-security-monitoring-with-aws-security-hub-and-falco/

Customers want a single and comprehensive view of the security posture of their workloads. Runtime security event monitoring is important to building secure, operationally excellent, and reliable workloads, especially in environments that run containers and container orchestration platforms. In this blog post, we show you how to use services such as AWS Security Hub and Falco, a Cloud Native Computing Foundation project, to build a continuous runtime security monitoring solution.

With the solution in place, you can collect runtime security findings from multiple AWS accounts running one or more workloads on AWS container orchestration platforms, such as Amazon Elastic Kubernetes Service (Amazon EKS) or Amazon Elastic Container Service (Amazon ECS). The solution collates the findings across those accounts into a designated account where you can view the security posture across accounts and workloads.

 

Solution overview

Security Hub collects security findings from other AWS services using a standardized AWS Security Findings Format (ASFF). Falco provides the ability to detect security events at runtime for containers. Partner integrations like Falco are also available on Security Hub and use ASFF. Security Hub provides a custom integrations feature using ASFF to enable collection and aggregation of findings that are generated by custom security products.

The solution in this blog post uses AWS FireLens, Amazon CloudWatch Logs, and AWS Lambda to enrich logs from Falco and populate Security Hub.

Figure 1: Architecture diagram of continuous runtime security monitoring

Here’s how the solution works, as shown in Figure 1:

  1. An AWS account is running a workload on Amazon EKS.
    1. Runtime security events detected by Falco for that workload are sent to CloudWatch Logs using AWS FireLens.
    2. CloudWatch Logs acts as the source for FireLens and a trigger for the Lambda function in the next step.
    3. The Lambda function transforms the logs into the ASFF. These findings can now be imported into Security Hub.
    4. The Security Hub instance that is running in the same account as the workload running on Amazon EKS stores and processes the findings provided by Lambda and provides the security posture to users of the account. This instance also acts as a member account for Security Hub.
  2. Another AWS account is running a workload on Amazon ECS.
    1. Runtime security events detected by Falco for that workload are sent to CloudWatch Logs using AWS FireLens.
    2. CloudWatch Logs acts as the source for FireLens and a trigger for the Lambda function in the next step.
    3. The Lambda function transforms the logs into the ASFF. These findings can now be imported into Security Hub.
    4. The Security Hub instance that is running in the same account as the workload running on Amazon ECS stores and processes the findings provided by Lambda and provides the security posture to users of the account. This instance also acts as another member account for Security Hub.
  3. The designated Security Hub administrator account combines the findings generated by the two member accounts, and then provides a comprehensive view of security alerts and security posture across AWS accounts. If your workloads span multiple Regions, Security Hub supports aggregating findings across Regions.
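
The transformation step performed by the Lambda function can be sketched as follows. This is an illustration under stated assumptions, not the code in the solution's repository: it assumes that each CloudWatch Logs message is a Falco alert in JSON form with rule, priority, output, and time fields, the mapping from Falco priority to ASFF severity label is an arbitrary choice, and the function names, finding type, and resource type are illustrative.

```python
import base64
import gzip
import json
from datetime import datetime, timezone

# Assumed mapping from Falco priority to ASFF severity label
SEVERITY_BY_PRIORITY = {
    "Emergency": "CRITICAL",
    "Alert": "CRITICAL",
    "Critical": "CRITICAL",
    "Error": "HIGH",
    "Warning": "MEDIUM",
    "Notice": "LOW",
}


def decode_log_events(event):
    """CloudWatch Logs delivers a base64-encoded, gzip-compressed payload."""
    payload = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(payload))["logEvents"]


def to_asff(alert, account_id, region, instance_id):
    """Build a minimal ASFF finding from one Falco alert."""
    now = datetime.now(timezone.utc).isoformat()
    label = SEVERITY_BY_PRIORITY.get(alert["priority"], "INFORMATIONAL")
    return {
        "SchemaVersion": "2018-10-08",
        "Id": f"{instance_id}/{alert['rule']}/{alert['time']}",
        # Product ARN format for a custom integration in your own account
        "ProductArn": f"arn:aws:securityhub:{region}:{account_id}:product/{account_id}/default",
        "GeneratorId": "falco",
        "AwsAccountId": account_id,
        "Types": ["Unusual Behaviors/Container"],
        "CreatedAt": now,
        "UpdatedAt": now,
        "Severity": {"Label": label},
        "Title": alert["rule"][:256],
        "Description": alert["output"][:1024],
        "Resources": [{"Type": "Other", "Id": instance_id}],
    }


def findings_from_event(event, account_id, region, instance_id):
    """Convert every log event in a CloudWatch Logs trigger payload."""
    return [
        to_asff(json.loads(log_event["message"]), account_id, region, instance_id)
        for log_event in decode_log_events(event)
    ]
```

The resulting list could then be passed to the Security Hub BatchImportFindings API, for example through the batch_import_findings method of a boto3 securityhub client.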

 

Prerequisites

For this walkthrough, you should have the following in place:

  1. Three AWS accounts.

    Note: We recommend three accounts so you can experience Security Hub’s support for a multi-account setup. However, you can use a single AWS account instead to host the Amazon ECS and Amazon EKS workloads, and send findings to Security Hub in the same account. If you are using a single account, skip the following account-specific guidance. If you are integrated with AWS Organizations, the designated Security Hub administrator account will automatically have access to the member accounts.

  2. Security Hub set up with an administrator account on one account.
  3. Security Hub set up with member accounts on two accounts: one account to host the Amazon EKS workload, and one account to host the Amazon ECS workload.
  4. Falco set up on the Amazon EKS and Amazon ECS clusters, with logs routed to CloudWatch Logs using FireLens. For instructions on how to do this, see:

    Important: Take note of the names of the CloudWatch Logs groups, as you will need them in the next section.

  5. AWS Cloud Development Kit (CDK) installed on the member accounts to deploy the solution that provides the custom integration between Falco and Security Hub.

 

Deploying the solution

In this section, you will learn how to deploy the solution and enable the CloudWatch Logs group. Enabling the CloudWatch Logs group is the trigger for running the Lambda function in both member accounts.

To deploy this solution in your own account

  1. Clone the aws-securityhub-falco-ecs-eks-integration GitHub repository by running the following command.
    git clone https://github.com/aws-samples/aws-securityhub-falco-ecs-eks-integration
  2. Follow the instructions in the README file provided on GitHub to build and deploy the solution. Make sure that you deploy the solution to the accounts hosting the Amazon EKS and Amazon ECS clusters.
  3. Navigate to the AWS Lambda console and confirm that you see the newly created Lambda function. You will use this function in the next section.

Figure 2: Lambda function for Falco integration with Security Hub

To enable the CloudWatch Logs group

  1. In the AWS Management Console, select the Lambda function shown in Figure 2—AwsSecurityhubFalcoEcsEksln-lambdafunction—and then, on the Function overview screen, select + Add trigger.
  2. On the Add trigger screen, provide the following information and then select Add, as shown in Figure 3.
    • Trigger configuration – From the drop-down, select CloudWatch logs.
    • Log group – Choose the Log group you noted in Step 4 of the Prerequisites. In our setup, the log group for the Amazon ECS and Amazon EKS clusters, deployed in separate AWS accounts, was set with the same value (falco).
    • Filter name – Provide a name for the filter. In our example, we used the name falco.
    • Filter pattern – optional – Leave this field blank.
    Figure 3: Lambda function trigger - CloudWatch Log group

  3. Repeat these steps (as applicable) to set up the trigger for the Lambda function deployed in other accounts.
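
If you prefer to script this step instead of using the console, the trigger corresponds to a CloudWatch Logs subscription filter whose destination is the Lambda function. The following is a minimal sketch that assumes a boto3 CloudWatch Logs client is passed in, for example boto3.client("logs"). The function name add_log_trigger is hypothetical. When you add the trigger in the console, the permission that lets CloudWatch Logs invoke the function is set up for you, whereas with this approach you would need to grant that permission on the Lambda function separately.

```python
def add_log_trigger(logs_client, log_group, function_arn, filter_name="falco"):
    """Subscribe a Lambda function to every event in a log group.

    logs_client is expected to behave like boto3.client("logs").
    """
    return logs_client.put_subscription_filter(
        logGroupName=log_group,
        filterName=filter_name,
        filterPattern="",  # an empty pattern forwards every log event
        destinationArn=function_arn,
    )
```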

 

Testing the deployment

Now that you’ve deployed the solution, you will verify that it’s working.

With the default rules, Falco generates alerts for activities such as:

  • An attempt to write to a file below the /etc folder. The /etc folder contains important system configuration files.
  • An attempt to open a sensitive file (such as /etc/shadow) for reading.

To test your deployment, you will attempt to perform these activities to generate Falco alerts that are reported as Security Hub findings in the same account. Then you will review the findings.

To test the deployment in member account 1

  1. Run the following commands to trigger an alert in member account 1, which is running an Amazon EKS cluster. Replace <container_name> with your own value.
    kubectl exec -it <container_name> -- /bin/bash
    touch /etc/5
    cat /etc/shadow > /dev/null
  2. To see the list of findings, log in to your Security Hub admin account and navigate to Security Hub > Findings. As shown in Figure 4, you will see the alerts generated by Falco, including the Falco-generated title, and the instance where the alert was triggered.

    Figure 4: Findings in Security Hub

  3. To see more detail about a finding, check the box next to the finding. Figure 5 shows some of the details for the finding Read sensitive file untrusted.
    Figure 5: Sensitive file read finding - detail view

    Figure 6 shows the Resources section of this finding, which includes the instance ID of the Amazon EKS cluster node. In our example, this is the Amazon Elastic Compute Cloud (Amazon EC2) instance.

    Figure 6: Resource Detail in Security Hub finding

To test the deployment in member account 2

  1. Run the following commands to trigger a Falco alert in member account 2, which is running an Amazon ECS cluster. Replace <container_id> with your own value.
    docker exec -it <container_id> bash
    touch /etc/5
    cat /etc/shadow > /dev/null
  2. As in the preceding example with member account 1, to view the findings related to this alert, navigate to your Security Hub admin account and select Findings.

To view the collated findings from both member accounts in Security Hub

  1. In the designated Security Hub administrator account, navigate to Security Hub > Findings. The findings from both member accounts are collated in the designated Security Hub administrator account. You can use this centralized account to view the security posture across accounts and workloads. Figure 7 shows two findings, one from each member account, viewable in the Single Pane of Glass administrator account.

    Figure 7: Write below /etc findings in a single view

  2. To see more information and a link to the corresponding member account where the finding was generated, check the box next to the finding. Figure 8 shows the account detail associated with a specific finding in member account 1.
    Figure 8: Write under /etc detail view in Security Hub admin account

    By centralizing and enriching the findings from Falco, you can take action more quickly or perform automated remediation on the impacted resources.

 

Cleaning up

To clean up this demo:

  1. Delete the CloudWatch Logs trigger from the Lambda functions that were created in the section To enable the CloudWatch Logs group.
  2. Delete the Lambda functions by deleting the CloudFormation stack, created in the section To deploy this solution in your own account.
  3. Delete the Amazon EKS and Amazon ECS clusters created as part of the Prerequisites.

 

Conclusion

In this post, you learned how to achieve multi-account continuous runtime security monitoring for container-based workloads running on Amazon EKS and Amazon ECS. This is achieved by creating a custom integration between Falco and Security Hub.

You can extend this solution in a number of ways. For example:

  • You can forward findings across accounts using a single source to security information and event management (SIEM) tools such as Splunk.
  • You can perform automated remediation activities based on the findings generated, using Lambda.

To learn more about managing a centralized Security Hub administrator account, see Managing administrator and member accounts. To learn more about working with ASFF, see AWS Security Finding Format (ASFF) in the documentation. To learn more about the Falco engine and rule structure, see the Falco documentation.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security news? Follow us on Twitter.

Rajarshi Das

Rajarshi is a Solutions Architect at Amazon Web Services. He focuses on helping Public Sector customers accelerate their security and compliance certifications and authorizations by architecting secure and scalable solutions. Rajarshi holds 4 AWS certifications including AWS Certified Solutions Architect – Professional and AWS Certified Security – Specialist.

Author

Adam Cerini

Adam is a Senior Solutions Architect with Amazon Web Services. He focuses on helping Public Sector customers architect scalable, secure, and cost effective systems. Adam holds 5 AWS certifications including AWS Certified Solutions Architect – Professional and AWS Certified Security – Specialist.

How to set up Amazon Quicksight dashboard for Amazon Pinpoint and Amazon SES engagement events

Post Syndicated from satyaso original https://aws.amazon.com/blogs/messaging-and-targeting/how-to-set-up-amazon-quicksight-dashboard-for-amazon-pinpoint-and-amazon-ses-events/

In this post, we walk through using Amazon Pinpoint and Amazon QuickSight to create customizable messaging campaign reports. Amazon Pinpoint is a flexible and scalable outbound and inbound marketing communications service that allows customers to connect with users over channels like email, SMS, push, or voice. Amazon QuickSight is a scalable, serverless, embeddable, machine learning-powered business intelligence (BI) service built for the cloud. This solution allows event and user data from Amazon Pinpoint to flow into Amazon QuickSight. Once the data is in QuickSight, customers can build their own reports that show campaign performance at a more granular level.

Engagement Event Dashboard

Customers want to view the results of their messaging campaigns in ever-increasing levels of granularity and ensure that their users see value in the email, SMS, or push notifications they receive. Customers also want to analyze how different user segments respond to different messages, and how to optimize subsequent user communication. Previously, customers could only view this data in Amazon Pinpoint analytics, which offers robust reporting on events, funnels, and campaigns. However, Pinpoint analytics does not allow analysis across these different parameters or the building of custom reports, for example, showing campaign revenue across different user segments, or showing which events were generated after a user viewed a campaign in a funnel analysis. Customers would need to extract this data themselves and do the analysis in Excel.

Prerequisites

  • The Digital user engagement events database solution must be set up first.
  • Customers should be prepared to purchase Amazon QuickSight, because it has its own costs, which are not covered by the Amazon Pinpoint cost.

Solution Overview

This solution uses the Athena tables created by the Digital user engagement events database solution. The AWS CloudFormation template given in this post automatically sets up the architecture components needed to capture detailed notifications about Amazon Pinpoint engagement events and expose them in Amazon Athena in the form of Athena views. You still need to manually configure Amazon QuickSight dashboards to link to these newly generated Athena views. Follow the steps below in order.

Use case(s)

The event dashboard solution has the following use cases:

  • Deep dive into engagement insights (for example, SMS events, email events, campaign events, and journey events).
  • View engagement events at the individual user level.
  • Data and process mining to turn raw event data into useful marketing insights.
  • User engagement benchmarking and end-user event funneling.
  • Compute campaign conversions (post-campaign user analysis to show campaign effectiveness).
  • Build funnels that show user progression.

Getting started with solution deployment

Prerequisite tasks to be completed before deploying the logging solution

Step 1 – Create an AWS account and a Pinpoint project, and implement the Event database solution.
As part of this step, customers need to implement the DUE event database solution, because the current solution (the DUE event dashboard) is an extension of it. The basic assumption here is that the customer has already configured an Amazon Pinpoint project or Amazon SES within the required AWS Region before implementing this step.

The steps required to implement an event dashboard solution are as follows.

a/ Follow the steps mentioned in the Event database solution to implement the complete stack. Before installing the complete stack, copy and save the Athena events database name as shown in the diagram. In my case it is due_eventdb. The database name is required as an input parameter for the current Event Dashboard solution.

b/ Once the solution is deployed, navigate to the Outputs page of the CloudFormation stack, then copy and save the following information, which will be required as input parameters in Step 2 of the current Event Dashboard solution.

Step 2 – Deploy the CloudFormation template for the Event dashboard solution
This step generates a number of new Amazon Athena views that serve as a data source for Amazon QuickSight. Continue with the following actions.

  • Download the CloudFormation template (“Event-dashboard.yaml”) from AWS samples.
  • Navigate to the CloudFormation page in the AWS console, choose “Create stack” in the upper-right corner, and select the option “With new resources (standard)”.
  • Leave “Prerequisite – Prepare template” set to “Template is ready”, and for the “Specify template” option, select “Upload a template file”. On the same page, click “Choose file”, browse to the “Event-dashboard.yaml” file, and select it. Once the file is uploaded, click “Next” and deploy the stack.

  • Enter the following information under the section “Specify stack details”:
    • EventAthenaDatabaseName – As mentioned in Step 1-a.
    • S3DataLogBucket – As mentioned in Step 1-b.
    • This solution creates five additional Athena views:
      • All_email_events
      • All_SMS_events
      • All_custom_events (Custom events can be Mobile app/WebApp/Push Events)
      • All_campaign_events
      • All_journey_events

Step 3 – Create the Amazon QuickSight engagement dashboard
This step walks you through the process of creating an Amazon QuickSight dashboard for Amazon Pinpoint engagement events using the Athena views you created in Step 2.

  1. To set up Amazon QuickSight for the first time, follow this link (skip this if you have already set up Amazon QuickSight). Make sure you are an Amazon QuickSight administrator.
  2. Search for Amazon QuickSight in the AWS console and open it.
  3. Create a new analysis, and then select “New dataset”.
  4. Select Athena as the data source.
  5. Next, select which analyses you need for the respective events. This solution provides the option to create five different sets of analyses, as mentioned in Step 2: a/ all email events, b/ all SMS events, c/ all custom events (mobile app, web app, web push, and so on), d/ all campaign events, and e/ all journey events. Dashboards can be created from QuickSight analyses and shared among the organization’s stakeholders. The following steps create analyses and dashboards for the different types of events.
  6. Email events
    • For all email events, name the analysis “All-emails-events” (any naming convention you prefer will do), select primary as the Athena workgroup, and then create a data source.
    • Once you create the data source, QuickSight lists all the views and tables available under the specified database (in our case, due_eventdb). Select the email_all_events view as the data source.
    • Select the event data location for analysis. There are two main options: a/ Import to SPICE for quicker analytics, and b/ Directly query your data. Select your preferred option, and then click “visualize the data”.
    • Import to SPICE for quicker analytics – SPICE is the Amazon QuickSight Super-fast, Parallel, In-memory Calculation Engine. It’s engineered to rapidly perform advanced calculations and serve data. In Enterprise edition, data stored in SPICE is encrypted at rest. (1 GB of storage is available for free; customers pay for additional storage. See the cost section in this post.)
    • Directly query your data – This option lets QuickSight query Athena or the source database directly (in the current case, Athena), and QuickSight does not store any data.
    • Now that you have selected a data source, you are taken to a blank QuickSight canvas (a blank analysis page), as shown in the following image. Drag and drop the visualization type you need onto the AutoGraph pane. Because Amazon QuickSight is a business intelligence platform, customers are free to choose whichever visualization types they want in order to observe the individual engagement events.
    • This post shows how to create some simple analysis graphs to visualize the engagement events.
    • As an initial step, select the table visualization, as shown in the image.
    • Select all the event dimensions that you want to include as columns of the table. An Amazon QuickSight table can be extended to show as many columns as needed; how much data marketers want to visualize depends on the business requirement.
    • Further filtering on the table can be done using QuickSight filters; you can apply a filter on specific granular values. For example, to filter on the destination email ID: 1/ select Filter from the left-hand menu, 2/ add the destination field as the filtering criterion, 3/ tick the destination value you want to filter on, or search for the destination email ID, and 4/ all the results in the table are then filtered according to the filter criterion.
    • Next, add another visual from the top-left corner (“Add -> Add Visual”), and then select the donut chart from the Visual types pane. Donut charts are commonly used for displaying aggregations.
    • Then select “event_type” as the Group to visualize the aggregated events. This helps marketers and business users see how many email events occurred, and what the aggregated success, click, complaint, and bounce ratios are for the emails or campaigns sent to end users.
    • To create a QuickSight dashboard from the QuickSight analysis, click the Share menu option at the top-right corner, and then select “Publish dashboard”. Provide the required dashboard name while publishing the dashboard. The same dashboard can be shared with multiple audiences in the organization.
    • Following is the final version of the dashboard. As mentioned above, QuickSight dashboards can be shared with other stakeholders, and the complete dashboard can also be exported as an Excel sheet.
  7. SMS events
    • As shown above, SMS events can be analyzed using QuickSight, and dashboards can be created from the analysis. Repeat all of the sub-steps listed in step 6. Following is a sample SMS dashboard.
  8. Custom events
    • After you integrate your application (app) with Amazon Pinpoint, Amazon Pinpoint can stream event data about user activity, different types of custom events, and message deliveries for the app, for example, _session.start, Product_page_view, and _session.stop. Repeat all of the sub-steps listed in step 6 to create a custom event dashboard.
  9. Campaign events
    • As shown before, campaign events can also be included in the same dashboard, or you can create a new dashboard only for campaign events.
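
The donut-chart aggregation on event_type described in step 6 can be checked outside QuickSight with a few lines of Python. This is a minimal sketch, not part of the solution: the event type names in the sample (such as _email.send and _email.delivered) and the choice of sends as the denominator are assumptions made for illustration, and in practice the values would come from the event_type column of the email events view.

```python
from collections import Counter


def event_ratios(event_types, base_event="_email.send"):
    """Return per-type counts and each type's share of the base event count."""
    counts = Counter(event_types)
    base = counts.get(base_event, 0)
    ratios = {
        event_type: (count / base if base else None)  # None when there is no base event
        for event_type, count in counts.items()
        if event_type != base_event
    }
    return dict(counts), ratios


# Hypothetical event_type values, one per row of the email events view.
sample_rows = [
    "_email.send", "_email.send", "_email.send", "_email.send",
    "_email.delivered", "_email.delivered", "_email.delivered",
    "_email.hardbounce",
    "_email.open", "_email.open",
    "_email.click",
]

counts, ratios = event_ratios(sample_rows)
for event_type, ratio in sorted(ratios.items()):
    print(f"{event_type}: {counts[event_type]} events, {ratio:.0%} of sends")
```

The same grouping is what the donut chart shows visually; the ratios are simply each slice divided by the number of sends.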

Cost for Event dashboard solution
You are responsible for the cost of the AWS services used while running this solution. As of the date of publication, the cost for running this solution with default settings in the US West (Oregon) Region is approximately $65 a month. The cost estimate includes the cost of AWS Lambda, Amazon Athena, and Amazon QuickSight. The estimate assumes querying 1 TB of data in a month, two authors managing Amazon QuickSight every month, four Amazon QuickSight readers viewing the events dashboard an unlimited number of times in a month, and a QuickSight SPICE capacity of 50 GB per month. Prices are subject to change. For full details, see the pricing webpage for each AWS service you will be using in this solution.

Clean up

When you’re done with this exercise, complete the following steps to delete your resources and stop incurring costs:

  1. On the CloudFormation console, select your stack and choose Delete. This cleans up all the resources created by the stack.
  2. Delete the Amazon QuickSight dashboards and datasets that you have created.

Conclusion

In this blog post, I have demonstrated how marketers, business users, and business analysts can utilize Amazon Quicksight dashboards to evaluate and exploit user engagement data from Amazon SES and Pinpoint event streams. Customers can also utilize this solution to understand how Amazon Pinpoint campaigns lead to business conversions, in addition to analyzing multi-channel communication metrics at the individual user level.

Next steps

The personas for this blog are both the tech team and the marketing analyst team, as it involves a code deployment to create very simple Athena views, as well as the steps to create an Amazon Quicksight dashboard to analyse Amazon SES and Amazon Pinpoint engagement events at the individual user level. Customers may then create their own Amazon Quicksight dashboards to illustrate the conversion ratio and propensity trends in real time by integrating campaign events with app-level events such as purchase conversions, order placement, and so on.

Extending the solution

You can download the AWS CloudFormation templates and code for this solution from our public GitHub repository and modify them to fit your needs.


About the Author


Satyasovan Tripathy works at Amazon Web Services as a Senior Specialist Solution Architect. He is based in Bengaluru, India, and specialises in the AWS Digital User Engagement product portfolio. He likes reading and travelling outside of work.

How to set up Amazon Cognito for federated authentication using Azure AD

Post Syndicated from Ratan Kumar original https://aws.amazon.com/blogs/security/how-to-set-up-amazon-cognito-for-federated-authentication-using-azure-ad/

In this blog post, I’ll walk you through the steps to integrate Azure AD as a federated identity provider in an Amazon Cognito user pool. A user pool is a user directory in Amazon Cognito that provides sign-up and sign-in options for your app users.

Identity management and authentication flow can be challenging when you need to support requirements such as OAuth, social authentication, and login using a Security Assertion Markup Language (SAML) 2.0 based identity provider (IdP) to meet your enterprise identity management requirements. Amazon Cognito provides you a managed, scalable user directory, user sign-up and sign-in, and federation through third-party identity providers. An added benefit for developers is that it provides you a standardized set of tokens (Identity, Access and Refresh Token). So, in situations when you have to support authentication with multiple identity providers (e.g. Social authentication, SAML IdP, etc.), you don’t have to write code for handling different tokens issued by different identity providers. Instead, you can just work with a consistent set of tokens issued by Amazon Cognito user pool.
 

Figure 1: High-level architecture for federated authentication in a web or mobile app

As shown in Figure 1, the high-level application architecture of a serverless app with federated authentication typically involves the following steps:

  1. The user selects their preferred IdP to authenticate.
  2. The user is redirected to the federated IdP for login. On successful authentication, the IdP posts back a SAML assertion or token containing the user’s identity details to an Amazon Cognito user pool.
  3. The Amazon Cognito user pool issues a set of tokens to the application.
  4. The application can use the token issued by the Amazon Cognito user pool for authorized access to APIs protected by Amazon API Gateway.

To learn more about the authentication flow with SAML federation, see the blog post Building ADFS Federation for your Web App using Amazon Cognito User Pools.

Step-by-step instructions for enabling Azure AD as federated identity provider in an Amazon Cognito user pool

This post will walk you through the following steps:

  1. Create an Amazon Cognito user pool
  2. Add Amazon Cognito as an enterprise application in Azure AD
  3. Add Azure AD as SAML identity provider (IDP) in Amazon Cognito
  4. Create an app client and use the newly created SAML IDP for Azure AD

Prerequisites

You’ll need to have administrative access to Azure AD, an AWS account and the AWS Command Line Interface (AWS CLI) installed on your machine. Follow the instructions for installing, updating, and uninstalling the AWS CLI version 2; and then to configure your installation, follow the instructions for configuring the AWS CLI. If you don’t want to install AWS CLI, you can also run these commands from AWS CloudShell which provides a browser-based shell to securely manage, explore, and interact with your AWS resources.

Step 1: Create an Amazon Cognito user pool

The procedures in this post use the AWS CLI, but you can also follow the instructions to use the AWS Management Console to create a new user pool.

To create a user pool in the AWS CLI

  1. Use the following command to create a user pool with default settings. Be sure to replace <yourUserPoolName> with the name you want to use for your user pool.
    aws cognito-idp create-user-pool \
    --pool-name <yourUserPoolName>
    

    You should see output containing a number of details about the newly created user pool.

  2. Copy the value of the user pool ID, in this example, ap-southeast-2_xx0xXxXXX. You will need this value for the next steps.
    "UserPool": {
        "Id": "ap-southeast-2_xx0xXxXXX",
        "Name": "example-corp-prd-userpool",
        "Policies": { …
    

Add a domain name to user pool

One of the many useful features of Amazon Cognito is the hosted UI, which provides a configurable web interface for user sign-in. The hosted UI is accessible from a domain name that needs to be added to the user pool. There are two options for adding a domain name to a user pool: you can use either an Amazon Cognito domain or a domain name that you own. This solution uses an Amazon Cognito domain, which will look like the following:

https://<yourDomainPrefix>.auth.<aws-region>.amazoncognito.com

To add a domain name to user pool

  1. Use the following CLI command to add an Amazon Cognito domain to the user pool. Replace <yourDomainPrefix> with a unique domain name prefix (for example, example-corp-prd). Note that you cannot use the keywords aws, amazon, or cognito in the domain prefix.
    aws cognito-idp create-user-pool-domain \
    --domain <yourDomainPrefix> \
    --user-pool-id <yourUserPoolID>
    

Prepare information for Azure AD setup

Next, you prepare Identifier (Entity ID) and Reply URL, which are required to add Amazon Cognito as an enterprise application in Azure AD (done in Step 2 below). Azure AD expects these values in a very specific format. In a text editor, note down your values for Identifier (Entity ID) and Reply URL according to the following formats:

  • For Identifier (Entity ID) the format is:
    urn:amazon:cognito:sp:<yourUserPoolID>
    

    For example:

    urn:amazon:cognito:sp:ap-southeast-2_nYYYyyYyYy
    

  • For Reply URL the format is:
    https://<yourDomainPrefix>.auth.<aws-region>.amazoncognito.com/saml2/idpresponse
    

    For example:

    https://example-corp-prd.auth.ap-southeast-2.amazoncognito.com/saml2/idpresponse
    

    Note: The Reply URL is the endpoint where Azure AD will send the SAML assertion to Amazon Cognito during the process of user authentication.

Update the placeholders above with your values (without < >), and then note the values of Identifier (Entity ID) and Reply URL in a text editor for future reference.
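
To avoid typing mistakes when filling in the two formats above, the values can be assembled programmatically. The following is a minimal sketch: the function names are hypothetical, the prefix pattern (lowercase letters, digits, and inner hyphens) is an assumption about what Amazon Cognito accepts, and deriving the Region from the user pool ID relies on the ID starting with the Region name, as it does in the examples in this post.

```python
import re

RESERVED_WORDS = ("aws", "amazon", "cognito")  # keywords the post says are not allowed
# Assumed shape of a valid prefix: lowercase letters, digits, and inner hyphens.
PREFIX_PATTERN = re.compile(r"^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$")


def check_domain_prefix(prefix):
    """Return a list of problems with a Cognito domain prefix (empty if none)."""
    problems = []
    if not PREFIX_PATTERN.match(prefix):
        problems.append("prefix must be lowercase letters, digits, and inner hyphens")
    problems += [
        f"prefix contains reserved word '{word}'"
        for word in RESERVED_WORDS
        if word in prefix.lower()
    ]
    return problems


def azure_ad_saml_values(user_pool_id, domain_prefix, region=None):
    """Build the Identifier (Entity ID) and Reply URL in the formats shown above."""
    problems = check_domain_prefix(domain_prefix)
    if problems:
        raise ValueError("; ".join(problems))
    # User pool IDs start with the Region, for example ap-southeast-2_xxxx.
    region = region or user_pool_id.split("_", 1)[0]
    return {
        "identifier": f"urn:amazon:cognito:sp:{user_pool_id}",
        "reply_url": f"https://{domain_prefix}.auth.{region}.amazoncognito.com/saml2/idpresponse",
    }


values = azure_ad_saml_values("ap-southeast-2_nYYYyyYyYy", "example-corp-prd")
print(values["identifier"])
print(values["reply_url"])
```

Run with the example user pool ID and prefix from this post, the sketch prints the same Identifier and Reply URL as the examples above.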

For more information, see Adding SAML Identity Providers to a User Pool in the Amazon Cognito Developer Guide.

Step 2: Add Amazon Cognito as an enterprise application in Azure AD

In this step, you add an Amazon Cognito user pool as an application in Azure AD, to establish a trust relationship between them.

To add new application in Azure AD

  1. Log in to the Azure Portal.
  2. In the Azure Services section, choose Azure Active Directory.
  3. In the left sidebar, choose Enterprise applications.
  4. Choose New application.
  5. On the Browse Azure AD Gallery page, choose Create your own application.
  6. Under What’s the name of your app?, enter a name for your application and select Integrate any other application you don’t find in the gallery (Non-gallery), as shown in Figure 2. Choose Create.
     
    Figure 2: Add an enterprise app in Azure AD

It will take a few seconds for the application to be created in Azure AD, then you should be redirected to the Overview page for the newly added application.

Note: Occasionally, this step can result in a Not Found error, even though Azure AD has successfully created a new application. If that happens, in Azure AD navigate back to Enterprise applications and search for your application by name.

To set up Single Sign-on using SAML

  1. On the Getting started page, in the Set up single sign on tile, choose Get started, as shown in Figure 3.
     
    Figure 3: Application configuration page in Azure AD

  2. On the next screen, select SAML.
  3. In the middle pane under Set up Single Sign-On with SAML, in the Basic SAML Configuration section, choose the edit icon.
  4. In the right pane under Basic SAML Configuration, replace the default Identifier ID (Entity ID) with the Identifier (Entity ID) you copied previously. In the Reply URL (Assertion Consumer Service URL) field, enter the Reply URL you copied previously, as shown in Figure 4. Choose Save.
     
    Figure 4: Azure AD SAML-based Sign-on setup

  5. In the middle pane under Set up Single Sign-On with SAML, in the User Attributes & Claims section, choose Edit.
  6. Choose Add a group claim.
  7. On the User Attributes & Claims page, in the right pane under Group Claims, select Groups assigned to the application, leave Source attribute as Group ID, as shown in Figure 5. Choose Save.
     
    Figure 5: Option to select group claims to release to Amazon Cognito

    This adds the group claim so that Amazon Cognito can receive the group membership detail of the authenticated user as part of the SAML assertion.

  8. In a text editor, note down the Claim names under Additional claims, as shown in Figure 5. You’ll need these when creating attribute mapping in Amazon Cognito.
  9. Close the User Attributes & Claims screen by choosing the X in the top right corner. You’ll be redirected to the Set up Single Sign-on with SAML page.
  10. Scroll down to the SAML Signing Certificate section, and copy the App Federation Metadata Url by choosing the copy into clipboard icon (highlighted with red arrow in Figure 6). Keep this URL in a text editor, as you’ll need it in the next step.
     
    Figure 6: Copy SAML metadata URL from Azure AD

Step 3: Add Azure AD as SAML IDP in Amazon Cognito

Next, you create an attribute in the Amazon Cognito user pool to receive group membership details from Azure AD, and then add Azure AD as an identity provider.

To add a custom attribute to the user pool and add Azure AD as an identity provider

  1. Use the following CLI command to add a custom attribute to the user pool. Replace <yourUserPoolID> and <customAttributeName> with your own values.
    aws cognito-idp add-custom-attributes \
    --user-pool-id <yourUserPoolID> \
    --custom-attributes Name=<customAttributeName>,AttributeDataType="String"
    

    If the command succeeds, you won’t see any output.

  2. Use the following CLI command to add Azure AD as an identity provider. Be sure to replace the following with your own values:
    • Replace <yourUserPoolID> with the Amazon Cognito user pool ID copied previously.
    • Replace <IDProviderName> with a name for your identity provider (for example, Example-Corp-IDP).
    • Replace <MetadataURLCopiedFromAzureAD> with the metadata URL copied from Azure AD.
    • Replace <customAttributeName> with the custom attribute name created previously.
    aws cognito-idp create-identity-provider \
    --user-pool-id <yourUserPoolID> \
    --provider-name=<IDProviderName> \
    --provider-type SAML \
    --provider-details MetadataURL=<MetadataURLCopiedFromAzureAD> \
    --attribute-mapping email=http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress,<customAttributeName>=http://schemas.microsoft.com/ws/2008/06/identity/claims/groups
    

    Successful running of this command adds Azure AD as a SAML IDP to your Amazon Cognito user pool.
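
The --attribute-mapping value in the command above is a comma-separated list of userPoolAttribute=SAMLClaim pairs, which is easy to mistype. The sketch below assembles it, along with the full argument list, from the same flags used in this post. The function names, the placeholder metadata URL, and the attribute name groups are hypothetical. The text does not say whether the mapping key needs the custom: prefix that Amazon Cognito uses for custom attributes, so the sketch passes the name through unchanged.

```python
import shlex

# SAML claim URIs taken from the command shown in this post.
EMAIL_CLAIM = "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress"
GROUPS_CLAIM = "http://schemas.microsoft.com/ws/2008/06/identity/claims/groups"


def attribute_mapping(custom_attribute_name):
    """Shorthand value for --attribute-mapping: user pool attribute=SAML claim."""
    mapping = {"email": EMAIL_CLAIM, custom_attribute_name: GROUPS_CLAIM}
    return ",".join(f"{attr}={claim}" for attr, claim in mapping.items())


def create_idp_command(user_pool_id, provider_name, metadata_url, custom_attribute_name):
    """Argument list mirroring the create-identity-provider call in this post."""
    return [
        "aws", "cognito-idp", "create-identity-provider",
        "--user-pool-id", user_pool_id,
        "--provider-name", provider_name,
        "--provider-type", "SAML",
        "--provider-details", f"MetadataURL={metadata_url}",
        "--attribute-mapping", attribute_mapping(custom_attribute_name),
    ]


command = create_idp_command(
    "ap-southeast-2_xx0xXxXXX",
    "Example-Corp-IDP",
    "https://example.com/federationmetadata.xml",  # placeholder, not a real Azure AD URL
    "groups",  # hypothetical custom attribute name
)
print(shlex.join(command))
```

Printing the command before running it gives you a chance to review the mapping against the claim names you noted in Step 2.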

Step 4: Create an app client and use the newly created SAML IDP for Azure AD

Before you can use Amazon Cognito in your web application, you need to register your app with Amazon Cognito as an app client. An app client is an entity within an Amazon Cognito user pool that has permission to call unauthenticated API operations (operations that do not require an authenticated user), for example to register, sign in, and handle forgotten passwords.

To create an app client

  1. Use the following command to create an app client. Be sure to replace the following with your own values:
    • Replace <yourUserPoolID> with the Amazon Cognito user pool ID created previously.
    • Replace <yourAppClientName> with a name for your app client.
    • Replace <callbackURL> with the URL of your web application that will receive the authorization code. It must be an HTTPS endpoint, except in a local development environment, where you can use http://localhost:PORT_NUMBER.
    • Use the --allowed-o-auth-flows parameter to specify the OAuth flows that you want to enable. In this example, we use code for the Authorization code grant.
    • Use the --allowed-o-auth-scopes parameter to specify which OAuth scopes (such as phone, email, openid) Amazon Cognito will include in the tokens. In this example, we use openid and email.
    • Replace <IDProviderName> with the same name you used for the identity provider previously.
    aws cognito-idp create-user-pool-client \
    --user-pool-id <yourUserPoolID> \
    --client-name <yourAppClientName> \
    --no-generate-secret \
    --callback-urls <callbackURL> \
    --allowed-o-auth-flows code \
    --allowed-o-auth-scopes openid email \
    --supported-identity-providers <IDProviderName> \
    --allowed-o-auth-flows-user-pool-client
    

Successful running of this command provides output in the following format. In a text editor, note down the ClientId for reference in the web application. In the following example, the ClientId is 7xyxyxyxyxyxyxyxyxyxy.

{
    "UserPoolClient": {
        "UserPoolId": "ap-southeast-2_xYYYYYYY",
        "ClientName": "my-client-name",
        "ClientId": "7xyxyxyxyxyxyxyxyxyxy",
        "LastModifiedDate": "2021-05-04T17:33:32.936000+12:00",
        "CreationDate": "2021-05-04T17:33:32.936000+12:00",
        "RefreshTokenValidity": 30,
        "SupportedIdentityProviders": [
            "Azure-AD"
        ],
        "CallbackURLs": [
            "http://localhost:3030"
        ],
        "AllowedOAuthFlows": [
            "code"
        ],
        "AllowedOAuthScopes": [
            "openid", "email"
        ],
        "AllowedOAuthFlowsUserPoolClient": true
    }
}
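
Before moving on, it can be worth confirming that the output reflects the settings you intended. The following sketch reads the JSON shown above and reports anything that differs from the configuration used in this post (authorization code flow, openid and email scopes, and the expected identity provider). The function name and the wording of the checks are illustrative only.

```python
import json


def check_app_client(output_json, expected_provider, expected_scopes=("openid", "email")):
    """Return (client_id, problems) for create-user-pool-client JSON output."""
    client = json.loads(output_json)["UserPoolClient"]
    problems = []
    if "code" not in client.get("AllowedOAuthFlows", []):
        problems.append("authorization code flow is not enabled")
    missing = set(expected_scopes) - set(client.get("AllowedOAuthScopes", []))
    if missing:
        problems.append(f"missing scopes: {sorted(missing)}")
    if expected_provider not in client.get("SupportedIdentityProviders", []):
        problems.append(f"identity provider {expected_provider!r} is not enabled")
    if not client.get("AllowedOAuthFlowsUserPoolClient", False):
        problems.append("OAuth flows are not enabled for this client")
    return client["ClientId"], problems


# Abbreviated copy of the sample output shown above.
sample_output = """
{
    "UserPoolClient": {
        "ClientId": "7xyxyxyxyxyxyxyxyxyxy",
        "SupportedIdentityProviders": ["Azure-AD"],
        "CallbackURLs": ["http://localhost:3030"],
        "AllowedOAuthFlows": ["code"],
        "AllowedOAuthScopes": ["openid", "email"],
        "AllowedOAuthFlowsUserPoolClient": true
    }
}
"""

client_id, problems = check_app_client(sample_output, "Azure-AD")
print(client_id, problems or "configuration matches")
```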

Test the setup

Next, do a quick test to check if everything is configured properly.

  1. Open the Amazon Cognito console.
  2. Choose Manage User Pools, then choose the user pool you created in Step 1: Create an Amazon Cognito user pool.
  3. In the left sidebar, choose App client settings, then look for the app client you created in Step 4: Create an app client and use the newly created SAML IDP for Azure AD. Scroll to the Hosted UI section and choose Launch Hosted UI, as shown in Figure 7.
     
    Figure 7: App client settings showing link to access Hosted UI

  4. On the sign-in page as shown in Figure 8, you should see all the IdPs that you enabled on the app client. Choose the Azure-AD button, which redirects you to the sign-in page hosted on https://login.microsoftonline.com/.
     
    Figure 8: Amazon Cognito hosted UI

  5. Sign in using your corporate ID. If everything is working properly, you should be redirected back to the callback URL after successful authentication.
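
The same test can be started without the console by opening the user pool's authorization endpoint directly in a browser. The sketch below builds that URL from the values collected in the earlier steps. It assumes the /oauth2/authorize path on the Amazon Cognito domain and its optional identity_provider query parameter, which is intended to skip the provider selection page; the function name is hypothetical.

```python
from urllib.parse import urlencode


def authorize_url(domain_prefix, region, client_id, callback_url,
                  scopes=("openid", "email"), identity_provider=None):
    """Build a sign-in URL for the authorization code grant."""
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": callback_url,  # must match a callback URL on the app client
        "scope": " ".join(scopes),
    }
    if identity_provider:
        # Assumed optional parameter: send the user straight to this IdP.
        params["identity_provider"] = identity_provider
    base = f"https://{domain_prefix}.auth.{region}.amazoncognito.com"
    return f"{base}/oauth2/authorize?{urlencode(params)}"


print(authorize_url(
    "example-corp-prd",
    "ap-southeast-2",
    "7xyxyxyxyxyxyxyxyxyxy",
    "http://localhost:3030",
    identity_provider="Azure-AD",
))
```

After a successful sign-in, the browser should land on the callback URL with a code query parameter appended, which matches the behavior described in step 5.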

(Optional) Add authentication to a single page application

One way to add secure authentication using Amazon Cognito to a single page application (SPA) is to use the Auth.federatedSignIn() method of the Auth class from AWS Amplify. AWS Amplify provides SDKs to integrate your web or mobile app with a growing list of AWS services, including integration with Amazon Cognito user pools. The federatedSignIn() method renders the hosted UI, which gives users the option to sign in with the identity providers that you enabled on the app client (in Step 4), as shown in Figure 8. One advantage of the hosted UI is that you don’t have to write any code to render it. Additionally, it transparently implements the Authorization code grant with PKCE and securely provides your client-side application with the tokens (ID, access, and refresh) that are required to access the backend APIs.
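
For readers curious about what PKCE adds to the Authorization code grant, the sketch below generates the two values the client exchanges, following the S256 method of RFC 7636. It is an illustration of the mechanism only, not AWS Amplify's implementation, and the function names are hypothetical.

```python
import base64
import hashlib
import secrets


def make_code_verifier(num_bytes=32):
    """Random URL-safe string the client keeps secret until the token request."""
    raw = secrets.token_bytes(num_bytes)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def code_challenge_s256(verifier):
    """Base64url-encoded SHA-256 of the verifier, sent with the authorization request."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


verifier = make_code_verifier()
challenge = code_challenge_s256(verifier)
print("code_verifier: ", verifier)
print("code_challenge:", challenge)
```

The client sends the challenge when it starts sign-in and the verifier when it redeems the authorization code, so an intercepted code alone cannot be exchanged for tokens.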

For a sample web application and instructions to connect it with Amazon Cognito authentication, see the aws-amplify-oidc-federation GitHub repository.

Conclusion

In this blog post, you learned how to integrate an Amazon Cognito user pool with Azure AD as an external SAML identity provider, to allow your users to use their corporate ID to sign in to web or mobile applications.

For more information about this solution, see our video Integrating Amazon Cognito with Azure Active Directory (from timestamp 25:26) on the official AWS Twitch channel. In the video, you’ll find an end-to-end demo of how to integrate Amazon Cognito with Azure AD, and then how to use the AWS Amplify SDK to add authentication to a simple React app (using the example of a pet store). The video also shows how you can access group membership details from Azure AD for authorization and fine-grained access control.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the Amazon Cognito forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Ratan Kumar

Ratan is a solutions architect based out of Auckland, New Zealand. He works with large enterprise customers, helping them design and build secure, cost-effective, and reliable internet-scale applications using the AWS cloud. He is passionate about technology and likes sharing knowledge through blog posts and Twitch sessions.

Author

Vishwanatha Nayak

Vish is a solutions architect at AWS. He engages with customers to create innovative solutions that are secure, reliable, and cost optimised to address business problems and accelerate the adoption of AWS services. He has over 15 years of experience in various software development, consulting, and architecture roles.

Scale Up Language Detection with Amazon Comprehend and S3 Batch Operations

Post Syndicated from Ameer Hakme original https://aws.amazon.com/blogs/architecture/scale-up-language-detection-with-amazon-comprehend-and-s3-batch-operations/

Organizations have been collecting text data for years. Text data can help you intelligently address a range of challenges, from customer experience to analytics. These mixed language, unstructured datasets can contain a wealth of information within business documents, emails, and webpages. If you’re able to process and interpret it, this information can provide insight that can help guide your business decisions.

Amazon Comprehend is a natural language processing (NLP) service that extracts insights from text datasets. Amazon Comprehend asynchronous batch operations provide organizations with the ability to detect the dominant languages in text documents stored in Amazon Simple Storage Service (S3) buckets. The asynchronous operations support a maximum document size of 1 MB for language detection. They can process up to one million documents per batch, for a total size of 5 GB.
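
To get a feel for what these limits mean at scale, the following sketch estimates the minimum number of asynchronous jobs a corpus would need. It uses only the two per-batch limits quoted above, treats 5 GB as 5 × 1024³ bytes (an assumption), and ignores how documents would actually be packed into batches, so the result is a lower bound rather than a plan.

```python
import math

MAX_DOCS_PER_JOB = 1_000_000        # documents per batch, as stated above
MAX_BYTES_PER_JOB = 5 * 1024 ** 3   # 5 GB per batch, assuming 1 GB = 1024**3 bytes


def min_async_jobs(doc_count, total_bytes):
    """Lower bound on the number of async jobs needed for a corpus."""
    by_count = math.ceil(doc_count / MAX_DOCS_PER_JOB)
    by_size = math.ceil(total_bytes / MAX_BYTES_PER_JOB)
    return max(by_count, by_size)


# Hypothetical corpus: 50 million documents of 10 KB (10,000 bytes) each.
docs = 50_000_000
print(min_async_jobs(docs, docs * 10_000))
```

For this hypothetical corpus the size limit, not the document count, is the binding constraint.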

But what if your organization has millions, or even billions of documents stored in an S3 bucket waiting for language detection processing? What if your language detection process requires customization to let you organize your documents based on language? What if you need to create a search index that can help you quickly audit your text document repositories?

In this blog post, we walk through a solution using Amazon S3 Batch Operations to initiate language detection jobs with AWS Lambda and Amazon Comprehend.

Real world language detection solution architecture

In our example, we have tens of millions of text objects stored in a single S3 bucket. These need to be processed to detect the dominant language. To create a language detection job, we must supply S3 Batch Operations with a manifest file that lists all text objects. We can use an Amazon S3 Inventory report as an input to the manifest file to create S3 bucket object lists.

One of the supported S3 Batch Operations is invoking an AWS Lambda function. The S3 Batch Operations job uses LambdaInvoke to run a Lambda function on every object listed in a manifest. Lambda jobs are subject to overall Lambda concurrency limits for the account and each Lambda invocation will have a defined runtime. Organizations can request a service quota increase if necessary. Lambda functions in a single AWS account and in one Region share the concurrency limit. You can set reserved capacity for Lambda functions to ensure that they can be invoked even when overall capacity has been exhausted.

The Lambda function can be customized to take further actions based on the output received from Amazon Comprehend. The following diagram shows an architecture for language detection with S3 Batch Operations and Amazon Comprehend.

Figure 1. Language detection with S3 Batch Operations and Amazon Comprehend

Here is the architecture flow, as shown in Figure 1:

  1. S3 Batch Operations will pull the manifest file from the source S3 bucket.
  2. The S3 Batch Operations job will invoke the language detection Lambda function for each object listed in the manifest file. The Lambda function code will perform a preliminary scan to check the file size, file extension, or any other requirements before calling the Amazon Comprehend API. The Lambda function will then read the text object from S3 and call the Amazon Comprehend API to detect the dominant language.
  3. The Language Detection API automatically identifies text written in over 100 languages. The API response contains the dominant language with a confidence score supporting the interpretation. An example API response would be: {'LanguageCode': 'fr', 'Score': 0.9888556003570557}. Once the Lambda function receives the API response, Lambda will return a message back to S3 Batch Operations with a result code.
  4. The Lambda function will then publish a message to an Amazon Simple Notification Service (SNS) topic.
  5. An Amazon Simple Queue Service (SQS) queue subscribed to the SNS topic will receive the message with all required information related to each processed text object.
  6. The SQS queue will invoke a Lambda function to process the message.
  7. The Lambda function will move the targeted S3 object to a destination S3 bucket.
  8. S3 Batch Operations will generate a completion report and will store it in an S3 bucket. The completion report will contain additional information for each task, including the object key name and version, status, error codes, and error descriptions.
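
As a rough illustration of steps 2 and 3, here is a minimal Python sketch of the Lambda side of the exchange with S3 Batch Operations. It follows the version 1.0 invocation schema, in which each task carries a URL-encoded s3Key and the function returns one result code per task. The detect_language callable, the ALLOWED_EXTENSIONS check, and the decision to treat every exception as a temporary failure are assumptions made for this sketch, not details from the solution itself. In a real function, detect_language would read the object from S3 and call Amazon Comprehend.

```python
from urllib.parse import unquote_plus

# Hypothetical pre-check for this sketch; the post only says the function
# checks file size, extension, or other requirements before calling Comprehend.
ALLOWED_EXTENSIONS = (".txt",)


def handle_batch_event(event, detect_language):
    """Build the response S3 Batch Operations expects for one invocation.

    detect_language(bucket, key) is a hypothetical callable that returns a
    dict such as {"LanguageCode": "fr", "Score": 0.98}.
    """
    results = []
    for task in event["tasks"]:
        key = unquote_plus(task["s3Key"])  # keys arrive URL-encoded
        bucket = task["s3BucketArn"].split(":::")[-1]
        if not key.lower().endswith(ALLOWED_EXTENSIONS):
            code, message = "PermanentFailure", "unsupported file extension"
        else:
            try:
                language = detect_language(bucket, key)
                code, message = "Succeeded", language["LanguageCode"]
            except Exception as exc:  # simplification: retry on any error
                code, message = "TemporaryFailure", str(exc)
        results.append(
            {"taskId": task["taskId"], "resultCode": code, "resultString": message}
        )
    return {
        "invocationSchemaVersion": event["invocationSchemaVersion"],
        "treatMissingKeysAs": "PermanentFailure",
        "invocationId": event["invocationId"],
        "results": results,
    }
```

Passing the detector in as a parameter keeps the result-code logic easy to test without AWS credentials. The resultString values end up in the completion report described in step 8.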

Leverage SNS fanout pattern for more complex use cases

This blog post describes the basic building blocks for the solution, but it can be extended for more complex use cases, as illustrated in Figure 2. Using an SNS fanout application integration pattern would enable many SQS queues to subscribe to the same SNS topic. These SQS queues would receive identical notifications for the processed text objects, and you could implement downstream services for additional evaluation. For example, you can store text object metadata in an Amazon DynamoDB table. You can further analyze the number of processed text objects, dominant languages, object size, word count, and more.

Your source S3 bucket may have objects being uploaded in real time in addition to the existing batch processes. In this case, you could process these objects in a new batch job, or process them individually during upload by using S3 event triggers and Lambda.

Figure 2. Extending the solution

Conclusion

You can implement a language detection job in a number of ways. All the Amazon Comprehend single document and synchronous API batch operations can be used for real-time analysis. Asynchronous batch operations can analyze large documents and large collections of documents. However, by using S3 Batch Operations, you can scale language detection batch operations to billions of text objects stored in S3. This solution has the flexibility to add customized functionality. This may be useful for more complex jobs, or when you want to capture different data points from your S3 objects.

Managing temporary elevated access to your AWS environment

Post Syndicated from James Greenwood original https://aws.amazon.com/blogs/security/managing-temporary-elevated-access-to-your-aws-environment/

In this post you’ll learn about temporary elevated access and how it can mitigate risks relating to human access to your AWS environment. You’ll also be able to download a minimal reference implementation and use it as a starting point to build a temporary elevated access solution tailored for your organization.

Introduction

While many modern cloud architectures aim to eliminate the need for human access, there often remain at least some cases where it is required. For example, unexpected issues might require human intervention to diagnose or fix, or you might deploy legacy technologies into your AWS environment that someone needs to configure manually.

AWS provides a rich set of tools and capabilities for managing access. Users can authenticate with multi-factor authentication (MFA), federate using an external identity provider, and obtain temporary credentials with limited permissions. AWS Identity and Access Management (IAM) provides fine-grained access control, and AWS Single Sign-On (AWS SSO) makes it easy to manage access across your entire organization using AWS Organizations.

For higher-risk human access scenarios, your organization can supplement your baseline access controls by implementing temporary elevated access.

What is temporary elevated access?

The goal of temporary elevated access is to ensure that each time a user invokes access, there is an appropriate business reason for doing so. For example, an appropriate business reason might be to fix a specific issue or deploy a planned change.

Traditional access control systems require users to be authenticated and authorized before they can access a protected resource. Becoming authorized is typically a one-time event, and a user’s authorization status is reviewed periodically—for example as part of an access recertification process.

With persistent access, also known as standing access, a user who is authenticated and authorized can invoke access at any time just by navigating to a protected resource. The process of invoking access does not consider the reason why they are invoking it on each occurrence. Today, persistent access is the model that AWS Single Sign-On supports, and is the most common model used for IAM users and federated users.

With temporary elevated access, also known as just-in-time access, users must be authenticated and authorized as before—but furthermore, each time a user invokes access an additional process takes place, whose purpose is to identify and record the business reason for invoking access on this specific occasion. The process might involve additional human actors or it might use automation. When the process completes, the user is only granted access if the business reason is appropriate, and the scope and duration of their access is aligned to the business reason.

Why use temporary elevated access?

You can use temporary elevated access to mitigate risks related to human access scenarios that your organization considers high risk. Access generally incurs risk when two elements come together: high levels of privilege, such as ability to change configuration, modify permissions, read data, or update data; and high-value resources, such as production environments, critical services, or sensitive data. You can use these factors to define a risk threshold, above which you enforce temporary elevated access, and below which you continue to allow persistent access.

Your motivation for implementing temporary elevated access might be internal, based on your organization’s risk appetite; or external, such as regulatory requirements applicable to your industry. If your organization has regulatory requirements, you are responsible for interpreting those requirements and determining whether a temporary elevated access solution is required, and how it should operate.

Regardless of the source of requirement, the overall goal is to reduce risk.

Important: While temporary elevated access can reduce risk, the preferred approach is always to automate your way out of needing human access in the first place. Aim to use temporary elevated access only for infrequent activities that cannot yet be automated. From a risk perspective, the best kind of human access is the kind that doesn’t happen at all.

The AWS Well-Architected Framework provides guidance on using automation to reduce the need for human user access.

How can temporary elevated access help reduce risk?

In scenarios that require human intervention, temporary elevated access can help manage the risks involved. It’s important to understand that temporary elevated access does not replace your standard access control and other security processes, such as access governance, strong authentication, session logging and monitoring, and anomaly detection and response. Temporary elevated access supplements the controls you already have in place.

The following are some of the ways that using temporary elevated access can help reduce risk:

1. Ensuring users only invoke elevated access when there is a valid business reason. Users are discouraged from invoking elevated access habitually, and service owners can avoid potentially disruptive operations during critical time periods.

2. Visibility of access to other people. With persistent access, user activity is logged—but no one is routinely informed when a user invokes access, unless their activity causes an incident or security alert. With temporary elevated access, every access invocation is typically visible to at least one other person. This can arise from their participation in approvals, notifications, or change and incident management processes which are multi-party by nature. With greater visibility to more people, inappropriate access by users is more likely to be noticed and acted upon.

3. A reminder to be vigilant. Temporary elevated access provides an overt reminder for users to be vigilant when they invoke high-risk access. This is analogous to the kind of security measures you see in a physical security setting. Imagine entering a secure facility. You see barriers, fences, barbed wire, CCTV, lighting, guards, and signs saying “You are entering a restricted area.” Temporary elevated access has a similar effect. It reminds users there is a heightened level of control, their activity is being monitored, and they will be held accountable for any actions they perform.

4. Reporting, analytics, and continuous improvement. A temporary elevated access process records the reasons why users invoke access. This provides a rich source of data to analyze and derive insights. Management can see why users are invoking access, which systems need the most human access, and what kind of tasks they are performing. Your organization can use this data to decide where to invest in automation. You can measure the amount of human access and set targets to reduce it. The presence of temporary elevated access might also incentivize users to automate common tasks, or ask their engineering teams to do so.

Implementing temporary elevated access

Before you examine the reference implementation, first take a look at a logical architecture for temporary elevated access, so you can understand the process flow at a high level.

A typical temporary elevated access solution involves placing an additional component between your identity provider and the AWS environment that your users need to access. This is referred to as a temporary elevated access broker, shown in Figure 1.
 

Figure 1: A logical architecture for temporary elevated access

When a user needs to perform a task requiring temporary elevated access to your AWS environment, they will use the broker to invoke access. The broker performs the following steps:

1. Authenticate the user and determine eligibility. The broker integrates with your organization’s existing identity provider to authenticate the user with multi-factor authentication (MFA), and determine whether they are eligible for temporary elevated access.

Note: Eligibility is a key concept in temporary elevated access. You can think of it as pre-authorization to invoke access that is contingent upon additional conditions being met, described in step 3. A user typically becomes eligible by becoming a trusted member of a team of admins or operators, and the scope of their eligibility is based on the tasks they’re expected to perform as part of their job function. Granting and revoking eligibility is generally based on your organization’s standard access governance processes. Eligibility can be expressed as group memberships (if using role-based access control, or RBAC) or user attributes (if using attribute-based access control, or ABAC). Unlike regular authorization, eligibility is not sufficient to grant access on its own.

2. Initiate the process for temporary elevated access. The broker provides a way to start the process for gaining temporary elevated access. In most cases a user will submit a request on their own behalf—but some broker designs allow access to be initiated in other ways, such as an operations user inviting an engineer to assist them. The scope of a user’s requested access must be a subset of their eligibility. The broker might capture additional information about the context of the request in order to perform the next step.

3. Establish a business reason for invoking access. The broker tries to establish whether there is a valid business reason for invoking access with a given scope on this specific occasion. Why does this user need this access right now? The process of establishing a valid business reason varies widely between organizations. It might be a simple approval workflow, a quorum-based authorization, or a fully automated process. It might integrate with existing change and incident management systems to infer the business reason for access. A broker will often provide a way to expedite access in a time-critical emergency, which is a form of break-glass access. A typical broker implementation allows you to customize this step.

4. Grant time-bound access. If the business reason is valid, the broker grants time-bound access to the AWS target environment. The scope of access that is granted to the user must be a subset of their eligibility. Further, the scope and duration of access granted should be necessary and sufficient to fulfill the business reason identified in the previous step, based on the principle of least privilege.

A minimal reference implementation for temporary elevated access

To get started with temporary elevated access, you can deploy a minimal reference implementation accompanying this blog post. Information about deploying, running and extending the reference implementation is available in the Git repo README page.

Note: You can use this reference implementation to complement the persistent access that you manage for IAM users, federated users, or manage through AWS Single Sign-On. For example, you can use the multi-account access model of AWS SSO for persistent access management, and create separate roles for temporary elevated access using this reference implementation.

To establish a valid business reason for invoking access, the reference implementation uses a single-step approval workflow. You can adapt the reference implementation and replace this with a workflow or business logic of your choice.

To grant time-bound access, the reference implementation uses the identity broker pattern. In this pattern, the broker itself acts as an intermediate identity provider which conditionally federates the user into the AWS target environment granting a time-bound session with limited scope.

Figure 2 shows the architecture of the reference implementation.
 

Figure 2: Architecture of the reference implementation

To illustrate how the reference implementation works, the following steps walk you through a user’s experience end-to-end, using the numbers highlighted in the architecture diagram.

Starting the process

Consider a scenario where a user needs to perform a task that requires privileged access to a critical service running in your AWS environment, for which your security team has configured temporary elevated access.

Loading the application

The user first needs to access the temporary elevated access broker so that they can request the AWS access they need to perform their task.

  1. The user navigates to the temporary elevated access broker in their browser.
  2. The user’s browser loads a web application using web static content from an Amazon CloudFront distribution whose target is an Amazon S3 bucket.

The broker uses a web application that runs in the browser, known as a Single Page Application (SPA).

Note: CloudFront and S3 are only used for serving web static content. If you prefer, you can modify the solution to serve static content from a web server in your private network.

Authenticating users

  1. The user is redirected to your organization’s identity provider to authenticate. The reference implementation uses the OpenID Connect Authorization Code flow with Proof Key for Code Exchange (PKCE).
  2. The user returns to the application as an authenticated user with an access token and ID token signed by the identity provider.

The access token grants delegated authority to the browser-based application to call server-side APIs on the user’s behalf. The ID token contains the user’s attributes and group memberships, and is used for authorization.
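
To make the PKCE part of this flow more concrete, the following sketch shows how a client can derive the code challenge it sends in the authorization request from the code verifier it later sends in the token request, using the S256 method from RFC 7636. The reference implementation runs in the browser, so this Python version is only an illustration, and the function names are hypothetical.

```python
import base64
import hashlib
import secrets


def make_code_verifier() -> str:
    # 32 random bytes encode to a 43-character URL-safe string,
    # the minimum length RFC 7636 allows for a verifier.
    raw = secrets.token_bytes(32)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def make_code_challenge(verifier: str) -> str:
    # S256 method: BASE64URL(SHA256(ASCII(verifier))) without padding.
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

The identity provider stores the challenge and only issues tokens if the verifier presented later hashes to the same value, which is what protects the authorization code if it is intercepted.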

Calling protected APIs

  1. The application calls APIs hosted by Amazon API Gateway and passes the access token and ID token with each request.
  2. For each incoming request, API Gateway invokes a Lambda authorizer using AWS Lambda.

The Lambda authorizer checks whether the user’s access token and ID token are valid. It then uses the ID token to determine the user’s identity and their authorization based on their group memberships.
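
The sketch below shows the general shape of what a Lambda authorizer returns to API Gateway once the tokens have been validated: a principal identifier and an IAM policy that allows or denies execute-api:Invoke on the requested method. It assumes token validation has already happened and that the decoded ID token exposes sub and groups claims. Those claim names, the permitted_groups parameter, and the function name are assumptions for illustration and may differ from the reference implementation.

```python
def build_authorizer_response(claims, method_arn, permitted_groups):
    """Return an API Gateway Lambda authorizer response.

    claims is the already-validated, decoded ID token payload (assumed to
    contain "sub" and a list under "groups").
    """
    user_groups = set(claims.get("groups", []))
    # Allow the call only if the user belongs to at least one permitted group.
    effect = "Allow" if user_groups & set(permitted_groups) else "Deny"
    return {
        "principalId": claims["sub"],
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "execute-api:Invoke",
                    "Effect": effect,
                    "Resource": method_arn,
                }
            ],
        },
        # Context values must be strings, numbers, or booleans.
        "context": {"groups": ",".join(sorted(user_groups))},
    }
```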

Displaying information

  1. The application calls one of the /get… API endpoints to fetch data about previous temporary elevated access requests.
  2. The /get… API endpoints invoke Lambda functions which fetch data from a table in Amazon DynamoDB.

The application displays information about previously-submitted temporary elevated access requests in a request dashboard, as shown in Figure 3.
 

Figure 3: The request dashboard

Submitting requests

A user who is eligible for temporary elevated access can submit a new request in the request dashboard by choosing Create request. As shown in Figure 4, the application then displays a form with input fields for the IAM role name and AWS account ID the user wants to access, a justification for invoking access, and the duration of access required.
 

Figure 4: Submitting requests

The user can only request an IAM role and AWS account combination for which they are eligible, based on their group memberships.

Note: The duration specified here determines a time window during which the user can invoke sessions to access the AWS target environment if their request is approved. It does not affect the duration of each session. Session duration can be configured independently.

  1. When a user submits a new request for temporary elevated access, the application calls the /create… API endpoint, which writes information about the new request to the DynamoDB table.

The user can submit multiple concurrent requests for different role and account combinations, as long as they are eligible.

Generating notifications

The broker generates notifications when temporary elevated access requests are created, approved, or rejected.

  1. When a request is created, approved, or rejected, a DynamoDB stream record is created for notifications.
  2. The stream record then invokes a Lambda function to handle notifications.
  3. The Lambda function reads data from the stream record, and generates a notification using Amazon Simple Notification Service (Amazon SNS).
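
The notification steps above can be sketched as a function that turns DynamoDB stream records into messages. The record layout (Records, eventName, and dynamodb.NewImage in DynamoDB's typed JSON) is the standard stream event format, and NewImage is only present when the stream view type includes new images. The attribute names (status, requester, role_name, account_id) and the status values are hypothetical. Publishing the result to Amazon SNS is left out so the sketch runs without AWS access.

```python
def build_notifications(stream_event):
    """Convert DynamoDB stream records into notification payloads."""
    notifications = []
    for record in stream_event.get("Records", []):
        if record.get("eventName") not in ("INSERT", "MODIFY"):
            continue  # deletions do not generate notifications in this sketch
        image = record["dynamodb"]["NewImage"]
        status = image["status"]["S"]
        requester = image["requester"]["S"]
        target = f'{image["role_name"]["S"]} in account {image["account_id"]["S"]}'
        # New requests go to reviewers; decisions go back to the requester.
        audience = "reviewers" if status == "requested" else "requester"
        notifications.append(
            {
                "audience": audience,
                "subject": f"Temporary elevated access request {status}",
                "message": f"Request by {requester} for {target} is {status}.",
            }
        )
    return notifications
```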

By default, when a user submits a new request for temporary elevated access, an email notification is sent to all authorized reviewers. When a reviewer approves or rejects a request, an email notification is sent to the original requester.

Reviewing requests

A user who is authorized to review requests can approve or reject requests submitted by other users in a review dashboard, as shown in Figure 5. For each request awaiting their review, the application displays information about the request, including the business justification provided by the requester.
 

Figure 5: The review dashboard

The reviewer can select a request, determine whether the request is appropriate, and choose either Approve or Reject.

  1. When a reviewer approves or rejects a request, the application calls the /approve… or /reject… API endpoint, which updates the status of the request in the DynamoDB table and initiates a notification.

Invoking sessions

After a requester is notified that their request has been approved, they can log back into the application and see their approved requests, as shown in Figure 6. For each approved request, they can invoke sessions. There are two ways they can invoke a session, by choosing either Access console or CLI.

Figure 6: Invoking sessions

Both options grant the user a session in which they assume the IAM role in the AWS account specified in their request.

When a user invokes a session, the broker performs the following steps.

  1. When the user chooses Access console or CLI, the application calls one of the /federate… API endpoints.
  2. The /federate… API endpoint invokes a Lambda function, which performs the following three checks before proceeding:
    1. Is the user authenticated? The Lambda function checks that the access and ID tokens are valid and uses the ID token to determine their identity.
    2. Is the user eligible? The Lambda function inspects the user’s group memberships in their ID token to confirm they are eligible for the AWS role and account combination they are seeking to invoke.
    3. Is the user elevated? The Lambda function confirms the user is in an elevated state by querying the DynamoDB table, and verifying whether there is an approved request for this user whose duration has not yet ended for the role and account combination they are seeking to invoke.
  3. If all three checks succeed, the Lambda function calls sts:AssumeRole to fetch temporary credentials on behalf of the user for the IAM role and AWS account specified in the request.
  4. The application returns the temporary credentials to the user.
  5. The user obtains a session with temporary credentials for the IAM role in the AWS account specified in their request, either in the AWS Management Console or AWS CLI.
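
The three checks in step 2 can be sketched as plain Python. This is an illustration under stated assumptions, not the reference implementation's code: the group naming convention ("<account ID>-<role name>"), the request record fields, and all function names are hypothetical, and token validation is reduced to a boolean input. The elevation window starts at approval time and lasts for the requested duration.

```python
from datetime import datetime, timedelta, timezone


def is_eligible(groups, account_id, role_name):
    # Hypothetical convention: one group per account and role combination.
    return f"{account_id}-{role_name}" in groups


def is_elevated(requests, user, account_id, role_name, now):
    # Look for an approved request whose time window has not yet ended.
    for request in requests:
        if (
            request["requester"] == user
            and request["status"] == "approved"
            and request["account_id"] == account_id
            and request["role_name"] == role_name
        ):
            window_end = request["approved_at"] + timedelta(
                minutes=request["duration_minutes"]
            )
            if request["approved_at"] <= now < window_end:
                return True
    return False


def may_federate(token_is_valid, user, groups, requests, account_id, role_name, now):
    # All three checks must pass before the broker assumes the role.
    return (
        token_is_valid
        and is_eligible(groups, account_id, role_name)
        and is_elevated(requests, user, account_id, role_name, now)
    )
```

Only when may_federate returns True would the broker go on to call sts:AssumeRole, as described in step 3.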

Once the user obtains a session, they can complete the task they need to perform in the AWS target environment using either the AWS Management Console or AWS CLI.

The IAM roles that users assume when they invoke temporary elevated access should be dedicated for this purpose. They must have a trust policy that allows the broker to assume them. The trusted principal is the Lambda execution role used by the broker’s /federate… API endpoints. This ensures that the only way to assume those roles is through the broker.
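
For illustration, a trust policy of this kind could be generated as follows. The broker role ARN in the usage note is a placeholder, not the name used by the reference implementation, so check the README for the actual role to trust.

```python
import json


def build_trust_policy(broker_lambda_role_arn):
    """Trust policy that lets only the broker's Lambda execution role assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": broker_lambda_role_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def trust_policy_json(broker_lambda_role_arn):
    # Serialized form, suitable for an IAM role's assume-role policy document.
    return json.dumps(build_trust_policy(broker_lambda_role_arn), indent=2)
```

For example, trust_policy_json("arn:aws:iam::444455556666:role/ExampleBrokerFederateRole") produces a document that names a single hypothetical principal in a separate broker account.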

In this way, when the necessary conditions are met, the broker assumes the requested role in your AWS target environment on behalf of the user, and passes the resulting temporary credentials back to them. By default, the temporary credentials last for one hour. For the duration of a user’s elevated access they can invoke multiple sessions through the broker, if required.

Session expiry

When a user’s session expires in the AWS Management Console or AWS CLI, they can return to the broker and invoke new sessions, as long as their elevated status is still active.

Ending elevated access

A user’s elevated access ends when the requested duration elapses following the time when the request was approved.
 

Figure 7: Ending elevated access

Once elevated access has ended for a particular request, the user can no longer invoke sessions for that request, as shown in Figure 7. If they need further access, they need to submit a new request.

Viewing historical activity

An audit dashboard, as shown in Figure 8, provides a read-only view of historical activity to authorized users.
 

Figure 8: The audit dashboard

Logging session activity

When a user invokes temporary elevated access, their session activity in the AWS control plane is logged to AWS CloudTrail. Each time they perform actions in the AWS control plane, the corresponding CloudTrail events contain the unique identifier of the user, which provides traceability back to the identity of the human user who performed the actions.

The following example shows the userIdentity element of a CloudTrail event for an action performed by user [email protected] using temporary elevated access.

"userIdentity": {
    "type": "AssumedRole",
    "principalId": "AROACKCEVSQ6C2EXAMPLE:[email protected]-TempAccessRoleS3Admin",
    "arn": "arn:aws:sts::111122223333:assumed-role/TempAccessRoleS3Admin/[email protected]-TempAccessRoleS3Admin",
    "accountId": "111122223333",
    "sessionContext": {
        "sessionIssuer": {
            "type": "Role",
            "principalId": "AROACKCEVSQ6C2EXAMPLE",
            "arn": "arn:aws:iam::111122223333:role/TempAccessRoleS3Admin",
            "accountId": "111122223333",
            "userName": "TempAccessRoleS3Admin"
        },
        "webIdFederationData": {},
        "attributes": {
            "mfaAuthenticated": "true",
            "creationDate": "2021-07-02T13:24:06Z"
        }
    }
}
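
When analyzing these events, you can recover the role session name, and with it the user identifier, from the principalId field, which has the form "<role ID>:<role session name>" for assumed-role sessions. The following is a small sketch. The function name is hypothetical, and the exact session name format depends on how the broker names sessions.

```python
def session_name_from_user_identity(user_identity):
    """Return the role session name from a CloudTrail userIdentity element.

    Returns None if the event was not made with an assumed role.
    """
    if user_identity.get("type") != "AssumedRole":
        return None
    # Split once on the first colon: "<role-id>:<role-session-name>".
    _, _, session_name = user_identity.get("principalId", "").partition(":")
    return session_name or None
```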

Security considerations

The temporary elevated access broker controls access to your AWS environment, and must be treated with extreme care in order to prevent unauthorized access. It is also an inline dependency for accessing your AWS environment and must operate with sufficient resiliency.

The broker should be deployed in a dedicated AWS account with a minimum of dependencies on the AWS target environment for which you’ll manage access. It should use its own access control configuration following the principle of least privilege. Ideally the broker should be managed by a specialized team and use its own deployment pipeline, with a two-person rule for making changes—for example by requiring different users to check in code and approve deployments. Special care should be taken to protect the integrity of the broker’s code and configuration and the confidentiality of the temporary credentials it handles.

See the reference implementation README for further security considerations.

Extending the solution

You can extend the reference implementation to fit the requirements of your organization. Here are some ways you can extend the solution:

  • Customize the UI, for example to use your organization’s branding.
  • Keep network traffic within your private network, for example to comply with network security policies.
  • Change the process for initiating and evaluating temporary elevated access, for example to integrate with a change or incident management system.
  • Change the authorization model, for example to use groups with different scope, granularity, or meaning.
  • Use SAML 2.0, for example if your identity provider does not support OpenID Connect.

See the reference implementation README for further details on extending the solution.

Conclusion

In this blog post you learned about temporary elevated access and how it can help reduce risk relating to human user access. You learned that you should aim to eliminate the need to use high-risk human access through the use of automation, and only use temporary elevated access for infrequent activities that cannot yet be automated. Finally, you studied a minimal reference implementation for temporary elevated access which you can download and customize to fit your organization’s needs.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS IAM forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

James Greenwood

James is a principal security solutions architect who helps AWS Financial Services customers meet their security and compliance objectives in the AWS cloud. James has a background in identity and access management, authentication, credential management, and data protection with more than 20 years of experience in the financial services industry.

Author

Bikash Behera

Bikash is a principal solutions architect who provides transformation guidance to AWS Financial Services customers and develops solutions for high priority customer objectives. Bikash has been delivering transformation guidance and technology solutions to the financial services industry for the last 25 years.

Author

Kevin Higgins

Kevin is a principal cloud architect with AWS Professional Services. He helps customers with the architecture, design, and development of cloud-optimized infrastructure solutions. As a member of the Microsoft Global Specialty Practice, he collaborates with AWS field sales, training, support, and consultants to help drive AWS product feature roadmap and go-to-market strategies.

Running a Cost-effective NLP Pipeline on Serverless Infrastructure at Scale

Post Syndicated from Eitan Sela original https://aws.amazon.com/blogs/architecture/running-a-cost-effective-nlp-pipeline-on-serverless-infrastructure-at-scale/

Amenity Analytics develops enterprise natural language processing (NLP) platforms for the finance, insurance, and media industries that extract critical insights from mountains of documents. We provide a scalable way for businesses to get a human-level understanding of information from text.

In this blog post, we will show how Amenity Analytics improved the continuous integration (CI) pipeline speed by 15x. We hope that this example can help other customers achieve high scalability using AWS Step Functions Express.

Amenity Analytics’ models are developed using both a test-driven development (TDD) and a behavior-driven development (BDD) approach. We verify the model accuracy throughout the model lifecycle—from creation to production, and on to maintenance.

One of the actions in the Amenity Analytics model development cycle is backtesting. It is an important part of our CI process. This process consists of two steps running in parallel:

  • Unit tests (TDD): checks that the code performs as expected
  • Backtesting tests (BDD): validates that the precision and recall of our models are similar to or better than those of the previous version

The backtesting process utilizes hundreds of thousands of annotated examples in each “code build.” To accomplish this, we initially used the AWS Step Functions default workflow. AWS Step Functions is a low-code visual workflow service used to orchestrate AWS services, automate business processes, and build serverless applications. Workflows manage failures, retries, parallelization, service integrations, and observability.

Challenge with the existing Step Functions solution

We found that Step Functions standard workflows have a token bucket of 5,000 state transitions with a refill rate of 1,500 per second. Each annotated example has ~10 state transitions. This creates millions of state transitions per code build. Since state transitions are limited and couldn’t be increased to our desired amount, we often faced delays and timeouts. Developers had to coordinate their work with each other, which inevitably slowed down the entire development cycle.
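
To make the bottleneck concrete, here is a rough back-of-the-envelope sketch. It assumes the refill rate is per second and that the bucket starts full. The figure of 300,000 annotated examples is an illustrative assumption, not a number from our builds.

```python
def min_seconds_for_transitions(total_transitions, bucket_size=5_000, refill_per_second=1_500):
    """Lower bound on the time needed to spend `total_transitions` tokens
    from a token bucket that starts full and refills at a fixed rate."""
    remaining = max(0, total_transitions - bucket_size)
    return remaining / refill_per_second


examples = 300_000            # assumed number of annotated examples in one build
transitions = examples * 10   # ~10 state transitions per annotated example
seconds = min_seconds_for_transitions(transitions)
print(f"{transitions:,} transitions need at least {seconds / 60:.1f} minutes")
```

Under these assumptions a single build is throttled for roughly half an hour before any compute time is counted, and concurrent builds share the same bucket.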

To resolve these challenges, we migrated from Step Functions standard workflows to Step Functions Express workflows, which have no limits on state transitions. In addition, we changed the way each step in the pipeline is initiated, from an async call to a sync API call.

Step Functions Express workflow solution

When a model developer merges their new changes, the CI process starts the backtesting for all existing models.

  • Each model is checked to see if the annotated examples were already uploaded and saved in the Amazon Simple Storage Service (S3) cache. The check is made by a unique key representing the list of items. Once a model is reviewed, the review items will rarely be changed.
  • If the review items haven’t been uploaded yet, it uploads them and initiates an unarchive process. This way the review items can be used in the next phase.
  • When the items are uploaded, an API call is invoked using Amazon API Gateway with the review item keys (see Figure 1).
  • The request is forwarded to an AWS Lambda function. It is responsible for validating the request and sending a job message to an Amazon Simple Queue Service (SQS) queue.
  • The SQS messages are consumed by concurrent Lambda functions, which synchronously invoke a Step Function. The number of concurrent Lambda functions is limited to ensure that they don’t exceed their limit in the production environment.
  • When an item is finished in the Step Function, it creates an SQS notification message. This message is inserted into a queue and consumed as a message batch by a Lambda function. The function then sends an AWS IoT message containing all relevant messages for each individual user.

Figure 1. Step Functions Express workflow solution
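
The SQS-consuming Lambda function described above could look roughly like the following sketch. It is not our production code. The function name `handle_sqs_batch` and the way the client and ARN are passed in are assumptions made for illustration. The only AWS call used is the Step Functions `start_sync_execution` API, which runs an Express workflow and waits for its result.

```python
import json


def handle_sqs_batch(event, sfn_client, state_machine_arn):
    """Run one synchronous Express execution per SQS record and collect the outcomes."""
    results = []
    for record in event["Records"]:
        job = json.loads(record["body"])
        response = sfn_client.start_sync_execution(
            stateMachineArn=state_machine_arn,
            input=json.dumps(job),
        )
        results.append(
            {"job": job, "status": response["status"], "output": response.get("output")}
        )
    return results


# In a real Lambda function the client would be created once, outside the handler:
#   import boto3
#   sfn_client = boto3.client("stepfunctions")
```

Passing the client in as a parameter keeps the handler easy to exercise locally with a stub before deploying it.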

Main Step Function Express workflow pipeline

Step Functions Express supports only sync calls. Therefore, we replaced the previous async calls through Amazon Simple Notification Service (SNS) and Amazon SQS with sync calls to API Gateway.

Figure 2 shows the workflow for a single document in Step Function Express:

  1. Generate a document ID for the current iteration
  2. Perform base NLP analysis by calling another Step Function Express wrapped by an API Gateway
  3. Reformat the response to be the same as all other “logic” steps results
  4. Verify the result by the “Choice” state – if failed go to end, otherwise, continue
  5. Perform the Amenity core NLP analysis in three model invocations: Group, Patterns, and Business Logic (BL)
  6. For each of the model runtime steps:
    • Check if the result is correct
    • If failed, go to end, otherwise continue
  7. Return a formatted result at the end

Figure 2. Workflow for a single document
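
The numbered steps above map naturally onto an Amazon States Language definition. The sketch below builds a simplified one as a Python dictionary. All state names, the `$.status` field, and the Lambda ARNs are placeholders invented for illustration. The real workflow calls the base NLP step through API Gateway rather than a Lambda task.

```python
import json

MODEL_STEPS = ["Group", "Patterns", "BusinessLogic"]


def task(name, next_state):
    # Placeholder Lambda ARN; the real resources are not given in this post.
    return {
        "Type": "Task",
        "Resource": f"arn:aws:lambda:us-east-1:123456789012:function:{name}",
        "Next": next_state,
    }


def check(next_state):
    # Choice state: go to the terminal "Failed" state on failure, otherwise continue.
    return {
        "Type": "Choice",
        "Choices": [{"Variable": "$.status", "StringEquals": "FAILED", "Next": "Failed"}],
        "Default": next_state,
    }


def build_definition():
    states = {
        "GenerateDocumentId": {"Type": "Pass", "Next": "BaseNlp"},
        "BaseNlp": task("BaseNlp", "ReformatBaseNlp"),
        "ReformatBaseNlp": {"Type": "Pass", "Next": "CheckBaseNlp"},
        "CheckBaseNlp": check(MODEL_STEPS[0]),
        "FormatResult": {"Type": "Pass", "End": True},
        "Failed": {"Type": "Fail", "Error": "StepFailed"},
    }
    for i, step in enumerate(MODEL_STEPS):
        following = MODEL_STEPS[i + 1] if i + 1 < len(MODEL_STEPS) else "FormatResult"
        states[step] = task(step, f"Check{step}")
        states[f"Check{step}"] = check(following)
    return {"StartAt": "GenerateDocumentId", "States": states}


def dangling_targets(definition):
    """Return transition targets that do not name a defined state."""
    states = definition["States"]
    targets = {definition["StartAt"]}
    for state in states.values():
        targets.update(t for t in (state.get("Next"), state.get("Default")) if t)
        targets.update(choice["Next"] for choice in state.get("Choices", []))
    return targets - set(states)


definition = build_definition()
print(json.dumps(definition, indent=2))
```

Generating the Task and Choice pairs in a loop keeps the three model invocations consistent, and `dangling_targets` is a cheap sanity check to run before deploying a definition.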

Base NLP analysis Step Function Express

For our base NLP analysis, we use spaCy. Figure 3 shows how we used it in Step Functions Express:

  1. Confirm if text exists in cache (this means it has been previously analyzed)
  2. If it exists, return the cached result
  3. If it doesn’t exist, split the text to a manageable size (spaCy has text size limitations)
  4. All the text parts are analyzed in parallel by spaCy
  5. Merge the results into a single, analyzed document and save it to the cache
  6. If there was an exception during the process, it is handled in the “HandleStepFunctionExceptionState”
  7. Send a reference to the analyzed document if successful
  8. Send an error message if there was an exception

Figure 3. Base NLP analysis for a single document
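
The cache, split, parallel analysis, and merge steps can be sketched in plain Python as follows. This is only an illustration of the control flow. `analyze_part` is a stand-in that splits on whitespace instead of calling the real NLP library, the in-memory dictionary stands in for the real cache, and the chunk size is an arbitrary assumption.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

MAX_CHARS = 1_000   # assumed chunk size, chosen only for illustration
_cache = {}         # stand-in for the real cache


def split_text(text, max_chars=MAX_CHARS):
    # Naive fixed-width split; a real splitter would cut on sentence boundaries.
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def analyze_part(part):
    # Stand-in for the real NLP analysis of one text part.
    return part.split()


def analyze_document(text):
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key in _cache:                       # steps 1-2: return the cached result
        return _cache[key]
    parts = split_text(text)                # step 3: split to a manageable size
    with ThreadPoolExecutor() as pool:      # step 4: analyze the parts in parallel
        results = list(pool.map(analyze_part, parts))   # map preserves input order
    merged = [token for part in results for token in part]  # step 5: merge
    _cache[key] = merged                    # step 5: save to the cache
    return merged
```

Exception handling (steps 6 to 8) is left out here; in the workflow it is a separate state rather than part of the analysis code.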

Results

Our backtesting migration was deployed on August 10, and unit testing migration on September 14. After the first migration, the CI was limited by the unit tests, which took ~25 minutes. When the second migration was deployed, the process time was reduced to ~6 minutes (P95).

Figure 4. Process time reduced from 50 minutes to 6 minutes

Conclusion

By migrating from standard Step Functions to Step Functions Express, Amenity Analytics increased processing speed 15x. A complete pipeline that used to take ~45 minutes with standard Step Functions, now takes ~3 minutes using Step Functions Express. This migration removed the need for users to coordinate workflow processes to create a build. Unit testing (TDD) went from ~25 mins to ~30 seconds. Backtesting (BDD) went from taking more than 1 hour to ~6 minutes.

Switching to Step Functions Express allows us to focus on delivering business value faster. We will continue to explore how AWS services can help us drive more value to our users.

Managing permissions with grants in AWS Key Management Service

Post Syndicated from Rick Yin original https://aws.amazon.com/blogs/security/managing-permissions-with-grants-in-aws-key-management-service/

AWS Key Management Service (AWS KMS) helps customers to use encryption to secure their data. When creating a new encrypted Amazon Web Services (AWS) resource, such as an Amazon Relational Database Service (Amazon RDS) database or an Amazon Simple Storage Service (Amazon S3) bucket, all you have to do is provide an AWS KMS key ID that you control, and the data will be encrypted. This reduces the complexity of protecting encryption keys and making them highly available.

If you’re considering delegating encryption to an AWS service to use a key under your control when it encrypts your data in that service, you might wonder how to ensure the AWS service can only use your key when you want it to and not have full access to decrypt any of your resources at any time. The answer is to use scoped-down dynamic permissions in AWS KMS. Specifically, a combination of permissions that you define in the KMS key policy document along with additional permissions that are created dynamically using KMS grants define the conditions under which one or more AWS services can use your KMS keys to encrypt and decrypt your data.

In this blog post, I discuss:

  • An example of how an AWS service uses your KMS key policy and grants to securely manage access to your encryption keys. The example uses Amazon RDS and demonstrates how the block storage volume behind your database instance is encrypted.
  • Best practices for using grants from AWS KMS in your own workloads.
  • Recent performance improvements when using grants in AWS KMS.

Case study: How RDS uses grants from AWS KMS to encrypt your database volume

Many Amazon RDS instance types are hosted on an Amazon Elastic Compute Cloud (Amazon EC2) instance where the underlying storage layer is an Amazon Elastic Block Store (Amazon EBS) volume. The blocks of the EBS volume that stores the database content are encrypted under a randomly generated 256-bit symmetric data key that is itself encrypted under a KMS key that you configure RDS to use when you create your database instance. Let’s look at how RDS interacts with EBS, EC2, and AWS KMS to securely create an RDS instance using a KMS key.

When you send a request to RDS to create your database, there are several asynchronous requests being made among the RDS, EC2, EBS, and KMS services to:

  1. Create the underlying storage volume with a unique encryption key.
  2. Create the compute instance in EC2.
  3. Load the database engine into the EC2 instance.
  4. Give the EC2 instance permissions to use the encryption key to read and write data to the database storage volume.

The initial authenticated request that you make to RDS to create a new database is made by an AWS Identity and Access Management (IAM) principal in your account (e.g. a user or role). Once the request is received, a series of things has to happen:

  1. RDS needs to request EBS to create an encrypted volume to store your future data.
  2. EBS needs to request that AWS KMS generate a unique 256-bit data key for the volume and encrypt it under the KMS key you told RDS to use.
  3. RDS then needs to request that EC2 launch an instance, attach that encrypted volume, and make the data key available to EC2 for use in reads and writes to the volume.

From your perspective, the IAM principal used to create the database also must have permissions in the KMS key policy for the GenerateDataKeyWithoutPlaintext and Decrypt actions. This enables the unique 256-bit data key to be created and encrypted under the desired KMS key as well as allowing the user or role to have the data key decrypted and provisioned to the Nitro card managing your EC2 instance so that reads/writes can happen from/to the database. Given the asynchronous nature of the process of creating the database vs. launching the database volume in the future, how do the RDS, EBS, and EC2 services all get the necessary least privileged permissions to create and provision the data key for use with your database? The answer starts with your IAM principal having permission for the AWS KMS CreateGrant action in the key policy.

RDS uses the identity from your IAM principal to create a grant in AWS KMS that allows it to create other grants for EC2 and EBS with very limited permissions that are further scoped down compared to the original permissions your IAM principal has on the AWS KMS key. A total of three grants are created:

  • The initial RDS grant.
  • A subsequent EBS grant that allows EBS to call AWS KMS and generate a 256-bit data key that is encrypted under the KMS key you defined when creating your database.
  • The attachment grant, which allows the specific EC2 instance hosting your database volume to decrypt the encrypted data key and provision it for use during I/O between the instance and the EBS volume.

RDS grant

In this example, let’s say you’ve created an RDS instance with an ID of db-1234 and specified a KMS key for encryption. The following grant is created on the KMS key, allowing RDS to create more grants for EC2 and EBS to use in the asynchronous processes required to launch your database instance. The RDS grant is as follows:

{Grantee Principal: '<Regional RDS Service Account>', Encryption Context: '"aws:rds:db-id": "db-1234"', Operations: ['CreateGrant', 'Decrypt', 'GenerateDataKeyWithoutPlaintext']}

In plain English, this grant gives RDS permissions to use the KMS key for three specific operations (API actions) only when the call specifies the RDS instance ID db-1234 in the Encryption Context parameter. The grant provides access for the grantee principal, which in this case is the value shown for the <Regional RDS Service Account>. This grant is created in AWS KMS and associated with your KMS key. Because the EC2 instance hasn’t yet been created and launched, the grantee principal cannot include the EC2 instance ID and must instead be the regional RDS service account.

EBS grant

With the RDS instance and initial AWS KMS grant created, RDS requests EC2 to launch an instance for the RDS database. EC2 creates an instance with a unique ID (e.g. i-1234567890abcdefg) using EC2 permissions you gave to the original IAM principal. In addition to the EC2 instance being created, RDS requests that Amazon EBS create an encrypted volume dedicated to the database. As a part of volume creation, EBS needs permission to call AWS KMS to generate a unique 256-bit data key for the volume and encrypt that data key under the KMS key you defined.

The EC2 instance ID is used as the name of the identity for future calls to AWS KMS, so RDS inserts it as the grantee principal in the EBS grant it creates. The EBS grant is as follows:

{Grantee Principal: '<RDS-Host-Role>:i-1234567890abcdefg', Encryption Context: '"aws:rds:db-id": "db-1234"', Operations: ['CreateGrant', 'Decrypt', 'GenerateDataKeyWithoutPlaintext']}

You’ll notice that this grant uses the same encryption context as the initial RDS grant. However, now that we have the EC2 instance ID associated with the database ID, the permissions that EBS gets to use your key as the grantee principal can be scoped down to require both values. Once this grant is created, EBS can create the EBS volume (e.g. vol-0987654321gfedcba) and call AWS KMS to generate and encrypt a 256-bit data key that can only be used for that volume. This encrypted data key is stored by EBS in preparation for the volume attachment process.

Attachment grant

The final step in creating the RDS instance is to attach the EBS volume to the EC2 instance hosting your database. EC2 now uses the previously created EBS grant to create the attachment grant with the i-1234567890abcdefg instance identity. This grant allows EC2 to decrypt the encrypted data key, provision it to the Nitro card that manages the instance, and begin encrypting I/O to the EBS volume of the RDS database. The attachment grant in this example will be as follows:

{Grantee Principal: 'EC2 Instance Role:i-1234567890abcdefg', Encryption Context: '"aws:rds:db-id": "db-1234", "aws:ebs:id":"vol-0987654321gfedcba"', Operations: ['Decrypt']}

The attachment grant is the most restrictive of the three grants. It requires the caller to know the IDs of all the AWS entities involved: EC2 instance ID, EBS volume ID, and RDS database ID. This design ensures that your KMS key can only be used for decryption by these AWS services in order to launch the specific RDS database you want.
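
To see why the attachment grant is so restrictive, it can help to model the check in a few lines of Python. This is a simplified mental model, not how AWS KMS is implemented. The function `grant_allows` and the dictionary layout are made up for illustration, and the IDs are the example values used above.

```python
def grant_allows(grant, principal, operation, encryption_context):
    """Return True if a request matches the grant's principal, operation, and context."""
    required = grant["encryption_context"]
    context_ok = all(encryption_context.get(k) == v for k, v in required.items())
    return (
        principal == grant["grantee_principal"]
        and operation in grant["operations"]
        and context_ok
    )


attachment_grant = {
    "grantee_principal": "EC2 Instance Role:i-1234567890abcdefg",
    "encryption_context": {
        "aws:rds:db-id": "db-1234",
        "aws:ebs:id": "vol-0987654321gfedcba",
    },
    "operations": {"Decrypt"},
}

caller = "EC2 Instance Role:i-1234567890abcdefg"
full_context = {"aws:rds:db-id": "db-1234", "aws:ebs:id": "vol-0987654321gfedcba"}

print(grant_allows(attachment_grant, caller, "Decrypt", full_context))                   # True
print(grant_allows(attachment_grant, caller, "Decrypt", {"aws:rds:db-id": "db-1234"}))   # False: volume ID missing
print(grant_allows(attachment_grant, caller, "GenerateDataKeyWithoutPlaintext", full_context))  # False: operation not granted
```

A request succeeds only when the instance identity, the operation, and both resource IDs all line up.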

The encrypted EBS volume is now active and attached to the EC2 instance. Should you terminate the RDS instance, the services retire all the relevant KMS grants so they no longer have any permission to use your KMS key to decrypt the 256-bit data key required to decrypt data in your database. If you need to launch your encrypted database again, a similar set of three grants will be dynamically created with the RDS database, EC2 instance, and EBS volume IDs used to scope down permissions on the AWS KMS key.

The process described in the previous paragraphs is graphically shown in Figure 1:
 
Figure 1: How Amazon RDS uses Amazon EC2, Amazon EBS, and AWS KMS to create an encrypted RDS instance

Considering all the AWS KMS key permissions that are added and removed as a part of launching a database, you might ask why not just use the key policy document to make these changes? A KMS key allows only one key policy with a maximum document size of 32 KB. Because one key could be used to encrypt any number of AWS resources, trying to dynamically add and remove scoped-down permissions related to each resource to the key policy document creates two risks. First, the maximum allowable size of the key policy document (32 KB) might be exceeded. Second, depending on how many resources are being accessed concurrently, you may exceed the request rate quota for the PutKeyPolicy API action in AWS KMS.

In contrast, there can be any number of grants on a given AWS KMS key, each grant specifying a scoped-down permission for the use of a KMS key with any AWS service that is integrated with AWS KMS. Grant creation and deletion is also designed for much higher-volume request rates than modifications to the key policy document. Finally, permission to call PutKeyPolicy is a highly privileged permission, as it lets the caller make unrestricted changes to the permissions on the key, including changes to administrative permissions to disable or schedule the key for deletion. Grants on a key can only allow permissions to use the key, not administer the key. Also, grants that allow the creation of other grants by other IAM principals prohibit the escalation of privilege. In the RDS example above, the permissions RDS receives from the IAM principal in your account during the first CreateGrant request cannot be more permissive than what you defined for the IAM principal in the KMS key policy. The permissions RDS gives to EC2 and EBS during the database creation process cannot be more permissive than the original permission RDS has from the initial grant. This design ensures that AWS services cannot escalate their privileges and use your KMS key for purposes different than what you intend.
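
The no-escalation rule can be expressed as a small check: a child grant is acceptable only if the parent grant allows CreateGrant, the child's operations are a subset of the parent's, and the child keeps every encryption context pair the parent requires. The sketch below is a simplified model of that idea with made-up function and field names. It is not the actual AWS KMS validation logic.

```python
def is_valid_child_grant(parent, child):
    """Check that `child` is no more permissive than `parent` (simplified model)."""
    can_delegate = "CreateGrant" in parent["operations"]
    ops_ok = set(child["operations"]) <= set(parent["operations"])
    context_ok = all(
        child["encryption_context"].get(k) == v
        for k, v in parent["encryption_context"].items()
    )
    return can_delegate and ops_ok and context_ok


rds_grant = {
    "operations": {"CreateGrant", "Decrypt", "GenerateDataKeyWithoutPlaintext"},
    "encryption_context": {"aws:rds:db-id": "db-1234"},
}
attachment_grant = {
    "operations": {"Decrypt"},
    "encryption_context": {
        "aws:rds:db-id": "db-1234",
        "aws:ebs:id": "vol-0987654321gfedcba",
    },
}
too_broad = {"operations": {"Decrypt", "Encrypt"}, "encryption_context": {}}

print(is_valid_child_grant(rds_grant, attachment_grant))  # True: fewer operations, stricter context
print(is_valid_child_grant(rds_grant, too_broad))         # False: adds Encrypt and drops the context
```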

Best practices for using AWS KMS grants

AWS KMS grants are a powerful tool to dynamically define permissions to use keys. They are automatically created when you use server-side encryption features in various AWS services. You can also use grants to control permission in your own applications that perform client-side encryption. Here are some best practices to consider:

  • Design the permissions to be as scoped down as possible. Use a specific grantee principal, such as an IAM role, and give the principal access only to the AWS KMS API actions that are needed. You can further limit the scope of grants with the Encryption Context parameter by using any element you want to ensure callers are using the AWS KMS key only for the intended purpose. Below is a specific example that grants AWS account 123456789012 permission to call the GenerateDataKey or Decrypt APIs, but only if the supplied encryption context for customerID is 5678.
    {Actions: 'GenerateDataKey, Decrypt', Grantee Principal: '123456789012', Encryption Context: '"customerID": "5678"'}
    

    This grant could prevent your application from decrypting data belonging to customer “5678” without explicitly passing the expected customerID in the request to AWS KMS. This may be a useful defense-in-depth mechanism to prevent unauthorized access to your customers’ data if your application’s AWS credentials were compromised and used from a different caller who doesn’t know that encryption context is a required parameter for all reads and writes in order to encrypt and decrypt data.

    For more information on how you can use encryption context in AWS KMS permissions, requests, and AWS CloudTrail logs, see How to Protect the Integrity of Your Encrypted Data by Using AWS Key Management Service and EncryptionContext.

  • Remember that grants don’t automatically expire. Your code needs to retire or revoke them once you know the permission is no longer needed on the KMS key. Grants that aren’t retired are leftover permissions that might create a security risk for encrypted resources. See retiring and revoking grants in the AWS KMS developer guide for more detail.
  • Avoid creating duplicate grants. A duplicate grant is a grant that shares the same AWS KMS key ID, API actions, grantee principal, encryption context, and name. If you retire the original grant after use and not the duplicates, then the leftover duplicate grants can lead to unintended access to encrypt or decrypt data.
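
As a rough sketch of these practices with boto3, the helper below creates the customerID grant from the first bullet and a second helper retires it. The function names, the key ID, and the grant name are hypothetical. The calls themselves (`create_grant` with `Constraints` and `Name`, and `retire_grant` with `KeyId` and `GrantId`) are part of the AWS KMS API. Reusing the same `Name` with identical parameters on a retry returns the existing grant instead of creating a duplicate.

```python
def create_scoped_grant(kms_client, key_id, grantee_arn, customer_id, name):
    """Create a grant limited to two operations and one encryption context pair."""
    response = kms_client.create_grant(
        KeyId=key_id,
        GranteePrincipal=grantee_arn,
        Operations=["GenerateDataKey", "Decrypt"],
        Constraints={"EncryptionContextSubset": {"customerID": customer_id}},
        Name=name,  # a stable name makes retries idempotent
    )
    return response["GrantId"]


def retire_scoped_grant(kms_client, key_id, grant_id):
    """Retire the grant once the permission is no longer needed."""
    kms_client.retire_grant(KeyId=key_id, GrantId=grant_id)


# Example usage (requires AWS credentials and a real key; values are placeholders):
#   import boto3
#   kms = boto3.client("kms")
#   grant_id = create_scoped_grant(
#       kms, "<your-key-id>", "arn:aws:iam::123456789012:root", "5678", "customer-5678")
#   ...
#   retire_scoped_grant(kms, "<your-key-id>", grant_id)
```

If the caller is a key administrator rather than the retiring principal, `revoke_grant` takes the same `KeyId` and `GrantId` arguments.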

Recent performance improvements to AWS KMS grants: Removing a resource quota

For customers who use AWS KMS to encrypt resources in AWS services that use grants, there used to be cases where AWS KMS had to enforce a quota on the number of concurrently active resources that could be encrypted under the same KMS key. For example, customers of Amazon RDS, Amazon WorkSpaces, or Amazon EBS would run into this quota at very large scale. This was the Grants for a given principal per key quota and was previously set to 500. You might have seen the error message “Keys only support 500 grants per grantee principal in this region” when trying to create a resource in one of these services.

We recently made a change to AWS KMS to remove this quota entirely and this error message no longer exists. With this quota removed, you can now attach unlimited grants to any KMS key when using any AWS service.

Summary

In this blog post, you’ve seen how services such as Amazon RDS use AWS KMS grants to pass scoped-down permissions to Amazon EC2 and Amazon EBS. You also saw some best practices for using AWS KMS grants in your own applications. Finally, you learned about how AWS KMS has improved grants by removing one of the resource quotas.

Author

Rick Yin

Rick is a software development engineer on the AWS KMS team. His current focus is helping to scale AWS KMS to meet increasing customer demand by making sure we can serve our requests at ultra-low latency and ultra-high availability. In his free time, Rick enjoys learning about history and trying to stay in shape. He has recently taken up rowing.

Deep learning image vector embeddings at scale using AWS Batch and CDK

Post Syndicated from Filip Saina original https://aws.amazon.com/blogs/devops/deep-learning-image-vector-embeddings-at-scale-using-aws-batch-and-cdk/

Applying various transformations to images at scale is an easily parallelized and scaled task. As a Computer Vision research team at Amazon, we occasionally find that the amount of image data we are dealing with can’t be effectively computed on a single machine, but also isn’t large enough to justify running a large and potentially costly Amazon Elastic MapReduce (Amazon EMR) job. This is when we can utilize AWS Batch as our main computing environment, as well as Cloud Development Kit (CDK) to provision the necessary infrastructure in order to solve our task.

In Computer Vision, we often need to represent images in a more concise and uniform way. Working with standard image files would be challenging, as they can vary in resolution or are otherwise too large in terms of dimensionality to be provided directly to our models. For that reason, the common practice for deep learning approaches is to translate high-dimensional information representations, such as images, into vectors that encode most (if not all) information present in them — in other words, to create vector embeddings.

This post will demonstrate how we utilize the AWS Batch platform to solve a common task in many Computer Vision projects — calculating vector embeddings from a set of images so as to allow for scaling.

 Architecture Overview

Figure 1: High-level architectural diagram explaining the major solution components.

As seen in Figure 1, AWS Batch will pull the docker image containing our code onto provisioned hosts and start the docker containers. Our sample code, referenced in this post, will then read the resources from S3, conduct the vectorization, and write the results as entries in the DynamoDB Table.

In order to run our image vectorization task, we will utilize the following AWS cloud components:

  • Amazon ECR — Elastic Container Registry is a Docker image repository from which our batch instances will pull the job images;
  • S3 — Amazon Simple Storage Service will act as our image source from which our batch jobs will read the image;
  • Amazon DynamoDB — NoSQL database in which we will write the resulting vectors and other metadata;
  • AWS Lambda — Serverless compute environment which will conduct some pre-processing and, ultimately, trigger the batch job execution; and
  • AWS Batch — Scalable computing environment powering our models as embarrassingly parallel tasks running as AWS Batch jobs.

To translate an image to a vector, we can utilize a pre-trained model architecture, such as AlexNet, ResNet, VGG, or more recent ones, like ResNeXt and Vision Transformers. These model architectures are available in most of the popular deep learning frameworks, and they can be further modified and extended depending on our project requirements. For this post, we will utilize a pre-trained ResNet18 model from MXNet. We will output an intermediate layer of the model, which will result in a 512-dimensional representation, or, in other words, a 512-dimensional vector embedding.

Deployment using Cloud Development Kit (CDK)

In recent years, the idea of provisioning cloud infrastructure components using popular programming languages was popularized under the term of infrastructure as code (IaC). Instead of writing a file in the YAML/JSON/XML format, which would define every cloud component we want to provision, we might want to define those components through a popular programming language.

As part of this post, we will demonstrate how easy it is to provision infrastructure on AWS cloud by using Cloud Development Kit (CDK). The CDK code included in the exercise is written in Python and defines all of the relevant exercise components.

Hands-on exercise

1. Deploying the infrastructure with AWS CDK

For this exercise, we have provided a sample batch job project that is available on GitHub (link). By using that code, you should have every component required to do this exercise, so make sure that you have the source on your machine. The root of your sample project local copy should contain the following files:

batch_job_cdk - CDK stack code of this batch job project
src_batch_job - source code for performing the image vectorization
src_lambda - source code for the lambda function which will trigger the batch job execution
app.py - entry point for the CDK tool
cdk.json - config file specifying the entry point for CDK
requirements.txt - list of python dependencies for CDK 
README.md  
  1. Make sure you have installed and correctly configured the AWS CLI and AWS CDK in your environment. Refer to the CDK documentation for more information, as well as the CDK getting started guide.
  2. Set the CDK_DEPLOY_ACCOUNT and CDK_DEPLOY_REGION environment variables, as described in the project README.md.
  3. Go to the sample project root and install the CDK python dependencies by running pip install -r requirements.txt.
  4. Install and configure Docker in your environment.
  5. If you have multiple AWS CLI profiles, utilize the --profile option to specify which profile to use for deployment. Otherwise, simply run cdk deploy and deploy the infrastructure to your AWS account set in step 1.

NOTE: Before deploying, make sure that you are familiar with the restrictions and limitations of the AWS services we are using in this post. For example, if you choose to set an S3 bucket name in the CDK Bucket construct, you must avoid naming conflicts that might cause deployment errors.

The CDK tool will now trigger our docker image build, provision the necessary AWS infrastructure (i.e., S3 Bucket, DynamoDB table, roles and permissions), and, upon completion, upload the docker image to a newly created repository on Amazon Elastic Container Registry (ECR).

2. Upload data to S3

Figure 2: S3 console window with uploaded images to the `images` directory.

After CDK has successfully finished deploying, head to the S3 console screen and upload images you want to process to a path in the S3 bucket. For this exercise, we’ve added every image to the `images` directory, as seen in Figure 2.

For larger datasets, utilize the AWS CLI tool to sync your local directory with the S3 bucket. In that case, consider enabling the ‘Transfer acceleration’ option of your S3 bucket for faster data transfers. However, this will incur an additional fee.

3. Trigger batch job execution

Once CDK has completed provisioning our infrastructure and we’ve uploaded the image data we want to process, open the newly created AWS Lambda in the AWS console screen in order to trigger the batch job execution.

To do this, create a test event with the following JSON body:

{
"Paths": [
    "images"
   ]
}

The JSON body that we provide as input to the AWS Lambda function defines a list of paths to directories in the S3 bucket containing images. Being able to provide these paths dynamically lets us combine multiple data sources into a single AWS Batch job execution. Furthermore, if we decide in the future to put an API Gateway in front of the Lambda function, we could pass every parameter of the batch job with a simple HTTP method call.

In this example, we specified just one path to the `images` directory in the S3 bucket, which we populated with images in the previous step.
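
If you prefer to trigger the function from a script instead of the console, a sketch along the following lines should work. The helper name `trigger_batch_jobs` is made up for illustration, and you need to substitute the name of the Lambda function that CDK created in your account. The `invoke` call and its `FunctionName`, `InvocationType`, and `Payload` parameters are part of the boto3 Lambda client.

```python
import json


def trigger_batch_jobs(lambda_client, function_name, paths):
    """Invoke the job-submitting Lambda function with the same JSON body as the test event."""
    payload = json.dumps({"Paths": list(paths)}).encode("utf-8")
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the function to finish
        Payload=payload,
    )
    return response["StatusCode"]


# Example usage (requires AWS credentials; the function name is a placeholder):
#   import boto3
#   status = trigger_batch_jobs(boto3.client("lambda"), "<your-function-name>", ["images"])
```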

Figure 3: AWS Lambda console screen of the function that triggers batch job execution. Modify the batch size by modifying the `image_batch_limit` variable. The value of this variable will depend on your particular use-case, computation type, image sizes, as well as processing time requirements.

The Python code will list every path under the images S3 path, group the paths into batches of the desired size, and finally save each batch of paths as a txt file under the tmp S3 path. The S3 path of each txt file will be passed as an input to a batch job.
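
The batching step itself is simple list slicing. The sketch below shows the idea without any AWS calls. The function names and the `tmp/batch_00000.txt` key pattern are assumptions for illustration and may differ from the sample project.

```python
def batch_paths(paths, image_batch_limit):
    """Split a list of S3 paths into consecutive batches of at most `image_batch_limit`."""
    if image_batch_limit < 1:
        raise ValueError("image_batch_limit must be at least 1")
    return [paths[i:i + image_batch_limit] for i in range(0, len(paths), image_batch_limit)]


def batch_files(paths, image_batch_limit, prefix="tmp"):
    """Map each txt object key to its contents: one image path per line."""
    return {
        f"{prefix}/batch_{n:05d}.txt": "\n".join(batch)
        for n, batch in enumerate(batch_paths(paths, image_batch_limit))
    }


paths = [f"images/img_{i}.jpg" for i in range(5)]
for key, body in batch_files(paths, image_batch_limit=2).items():
    print(key, body.split("\n"))
```

With five images and a limit of two, this produces three txt files, and therefore three batch jobs, the last one holding a single image.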

Select the newly created event, and then trigger the Lambda function execution. The AWS Lambda function will submit the AWS Batch jobs to the provisioned AWS Batch compute environment.

Figure 4: Screenshot of a running AWS Batch job that creates feature vectors from images and stores them to DynamoDB.

Once the AWS Lambda function finishes its execution, we can monitor the AWS Batch jobs being processed on the AWS console screen, as seen in Figure 4. Wait until every job has finished successfully.

4. View results in DynamoDB

Figure 5: Image vectorization results stored for each image as an entry in the DynamoDB table.

Once every batch job is successfully finished, go to the DynamoDB AWS cloud console and see the feature vectors stored as strings obtained from the numpy tostring method, as well as other data we stored in the table.
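
To illustrate the storage format, the following sketch round-trips a vector through raw bytes the way the table entries do. It uses `tobytes`, since `tostring` is a deprecated alias for it in newer NumPy releases. The attribute names `Vector`, `DataType`, and `Dimension` mirror the reading snippet in this post, and the example vector is made up.

```python
import numpy as np

# A made-up 512-dimensional embedding standing in for a real model output.
vector = np.arange(512, dtype=np.float32)

# What gets stored: raw bytes plus the metadata needed to decode them.
item = {
    "Vector": vector.tobytes(),
    "DataType": str(vector.dtype),   # "float32"
    "Dimension": vector.size,
}

# Decoding needs the dtype; without it the bytes are ambiguous.
restored = np.frombuffer(item["Vector"], dtype=item["DataType"])
assert restored.shape == (item["Dimension"],)
assert np.array_equal(restored, vector)
```

Note that `np.frombuffer` returns a read-only view of the bytes, so call `.copy()` on the result if you need to modify the vector.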

When you are ready to access the vectors in one of your projects, utilize the code snippet provided here:

#!/usr/bin/env python3

import numpy as np
import boto3

def vector_from(item):
    '''
    Parameters
    ----------
    item : DynamoDB response item object
    '''
    vector = np.frombuffer(item['Vector'].value, dtype=item['DataType'])
    assert len(vector) == item['Dimension']
    return vector

def vectors_from_dydb(dynamodb, table_name, image_ids):
    '''
    Parameters
    ----------
    dynamodb : DynamoDB service resource (boto3.resource('dynamodb'))
    table_name : Name of the DynamoDB table
    image_ids : List of id's to query the DynamoDB table for
    '''

    response = dynamodb.batch_get_item(
        RequestItems={table_name: {'Keys': [{'ImageId': val} for val in image_ids]}},
        ReturnConsumedCapacity='TOTAL'
    )

    query_vectors =  [vector_from(item) for item in response['Responses'][table_name]]
    query_image_ids =  [item['ImageId'] for item in response['Responses'][table_name]]

    return zip(query_vectors, query_image_ids)
    
def process_entry(vector, image_id):
    '''
    NOTE - Add your code here.
    '''
    pass

def main():
    '''
    Reads vectors from the batch job DynamoDB table containing the vectorization results.
    '''
    dynamodb = boto3.resource('dynamodb', region_name='eu-central-1')
    table_name = 'aws-blog-batch-job-image-transform-dynamodb-table'

    image_ids = ['B000KT6OK6', 'B000KTC6X0', 'B000KTC6XK', 'B001B4THHG']

    for vector, image_id in vectors_from_dydb(dynamodb, table_name, image_ids):
        process_entry(vector, image_id)

if __name__ == "__main__":
    main()

This code snippet uses the boto3 DynamoDB resource to access the results stored in the DynamoDB table. Make sure to update the variables in the code, and adapt the implementation to fit your use case.
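
One practical detail when adapting the snippet: a single BatchGetItem request accepts at most 100 keys, and DynamoDB may return only part of a request and list the rest under UnprocessedKeys. The following is a minimal sketch, not part of the original project, of how the lookup could be batched and retried. The helper names `chunked` and `batch_get_all` are hypothetical, and the `ImageId` key is taken from the snippet above. Production code would normally add exponential backoff between retries.

```python
from typing import Any, Dict, Iterator, List

MAX_KEYS_PER_BATCH = 100  # BatchGetItem accepts at most 100 keys per request


def chunked(items: List[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_get_all(dynamodb: Any, table_name: str, image_ids: List[str]) -> List[Dict[str, Any]]:
    """Fetch the items for all `image_ids`, retrying keys left unprocessed.

    `dynamodb` is assumed to be a boto3 DynamoDB service resource, as in the
    snippet above; `batch_get_all` itself is a hypothetical helper.
    """
    items: List[Dict[str, Any]] = []
    for chunk in chunked(image_ids, MAX_KEYS_PER_BATCH):
        request = {table_name: {'Keys': [{'ImageId': val} for val in chunk]}}
        while request:
            response = dynamodb.batch_get_item(RequestItems=request)
            items.extend(response.get('Responses', {}).get(table_name, []))
            # UnprocessedKeys has the same shape as RequestItems and is
            # empty once every key in the request has been served.
            request = response.get('UnprocessedKeys') or {}
    return items
```

The returned items can be passed to `vector_from` exactly as in the original loop.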

5. Tear down the infrastructure using CDK

To finish off the exercise, we will tear down the infrastructure that we have provisioned. Since we are using CDK, this is very simple — go to the project root directory and run:

cdk destroy

After a confirmation prompt, the infrastructure tear-down should be underway. If you want to follow the process in more detail, then go to the CloudFormation console view and monitor the process from there.

NOTE: The S3 Bucket, ECR image, and DynamoDB table resource will not be deleted, since the current CDK code defaults to RETAIN behavior in order to prevent the deletion of data we stored there. Once you are sure that you don’t need them, remove those remaining resources manually or modify the CDK code for desired behavior.

Conclusion

In this post, we solved an embarrassingly parallel job of creating vector embeddings from images using AWS Batch. We provisioned the infrastructure using Python CDK, uploaded sample images, submitted an AWS Batch job for execution, read the results from the DynamoDB table, and, finally, destroyed the AWS cloud resources we provisioned at the beginning.

AWS Batch serves as a good compute environment for various jobs. For this one in particular, we can scale the processing to more compute resources with minimal or no modifications to our deep learning models and supporting code. On the other hand, it lets us potentially reduce costs by utilizing smaller compute resources and longer execution times.

The code serves as a good starting point for experimenting further with AWS Batch in a Deep Learning/Machine Learning setup. You could extend it to utilize EC2 instances with GPUs instead of CPUs, utilize Spot instances instead of on-demand ones, utilize AWS Step Functions to automate process orchestration, utilize Amazon SQS as a mechanism to distribute the workload, move the Lambda job submission to another compute resource, or tailor your project for anything else you might need AWS Batch to do.

And that brings us to the conclusion of this post. Thanks for reading, and feel free to leave a comment below if you have any questions. Also, if you enjoyed reading this post, make sure to share it with your friends and colleagues!

About the author

Filip Saina

Filip is a Software Development Engineer at Amazon working in a Computer Vision team. He works with researchers and engineers across Amazon to develop and deploy Computer Vision algorithms and ML models into production systems. Besides day-to-day coding, his responsibilities also include architecting and implementing distributed systems in AWS cloud for scalable ML applications.

Generating DevOps Guru Proactive Insights for Amazon ECS

Post Syndicated from Trishanka Saikia original https://aws.amazon.com/blogs/devops/generate-devops-guru-proactive-insights-in-ecs-using-container-insights/

Monitoring is fundamental to operating an application in production, since we can only operate what we can measure and alert on. As an application evolves, or the environment grows more complex, it becomes increasingly challenging to maintain monitoring thresholds for each component, and to validate that they’re still set to an effective value. We not only want monitoring alarms to trigger when needed, but also want to minimize false positives.

Amazon DevOps Guru is an AWS service that helps you effectively monitor your application by ingesting vended metrics from Amazon CloudWatch. It learns your application’s behavior over time and then detects anomalies. Based on these anomalies, it generates insights by first combining the detected anomalies with suspected related events from AWS CloudTrail, and then providing the information to you in a simple, ready-to-use dashboard when you start investigating potential issues. Amazon DevOps Guru makes use of CloudWatch Container Insights to detect issues around resource exhaustion for Amazon ECS or Amazon EKS applications. This helps in proactively detecting issues like memory leaks in your applications before they impact your users, and also provides guidance as to what the probable root causes and resolutions might be.

This post will demonstrate how to simulate a memory leak in a container running in Amazon ECS, and have it generate a proactive insight in Amazon DevOps Guru.

Solution Overview

The following diagram shows the environment we’ll use for our scenario. The container “brickwall-maker” is preconfigured as to how quickly to allocate memory, and we have built this container image and published it to our public Amazon ECR repository. Optionally, you can build and host the Docker image in your own private repository as described in steps 2 and 3.

After creating the container image, we’ll utilize an AWS CloudFormation template to create an ECS Cluster and an ECS Service called “Test” with a desired count of two. This will create two tasks using our “brickwall-maker” container image. The stack will also enable Container Insights for the ECS Cluster. Then, we will enable resource coverage for this CloudFormation stack in Amazon DevOps Guru in order to start our resource analysis.

Architecture diagram showing the service “Test” using the container “brickwall-maker” with a desired count of two. The vended metrics of the two ECS tasks are then processed by CloudWatch Container Insights. Both CloudWatch Container Insights and CloudTrail are ingested by Amazon DevOps Guru, which then makes detected insights available to the user.

Source provided on GitHub:

  • DevOpsGuru.yaml
  • EnableDevOpsGuruForCfnStack.yaml
  • Docker container source

Steps:

1. Create your IDE environment

In the AWS Cloud9 console, click Create environment, give your environment a Name, and click Next step. On the Environment settings page, change the instance type to t3.small, and click Next step. On the Review page, make sure that the Name and Instance type are set as intended, and click Create environment. The environment creation will take a few minutes. After that, the AWS Cloud9 IDE will open, and you can continue working in the terminal tab displayed in the bottom pane of the IDE.

Install the following prerequisite package, and ensure that Docker is installed and running:

sudo yum install -y docker
sudo service docker start
docker --version

Clone the git repository in order to download the required CloudFormation templates and code:

git clone https://github.com/aws-samples/amazon-devopsguru-brickwall-maker

Change to the directory that contains the cloned repository:

cd amazon-devopsguru-brickwall-maker

2. Optional: Create ECR private repository

If you want to build your own container image and host it in your own private ECR repository, create a new repository with the following command and then follow the steps to prepare your own image:

aws ecr create-repository --repository-name brickwall-maker

3. Optional: Prepare Docker Image

Authenticate to Amazon Elastic Container Registry (ECR) in the target region:

aws ecr get-login-password --region ap-northeast-1 | \
    docker login --username AWS --password-stdin \
    123456789012.dkr.ecr.ap-northeast-1.amazonaws.com

In the above command, as well as in the commands that follow, make sure that you replace 123456789012 with your own account ID.

Build brickwall-maker Docker container:

docker build -t brickwall-maker .

Tag the Docker container to prepare it to be pushed to ECR:

docker tag brickwall-maker:latest 123456789012.dkr.ecr.ap-northeast-1.amazonaws.com/brickwall-maker:latest

Push the built Docker container to ECR:

docker push 123456789012.dkr.ecr.ap-northeast-1.amazonaws.com/brickwall-maker:latest
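
Because the account ID and Region appear in several of these commands, it can help to generate them from one place. The following is a small sketch under that assumption; `ecr_registry` and `ecr_push_commands` are hypothetical helper names, and the commands they print are the same ones shown in this step.

```python
import re
from typing import List


def ecr_registry(account_id: str, region: str) -> str:
    """Return the private ECR registry host for an account and Region."""
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("AWS account IDs are 12 digits")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def ecr_push_commands(account_id: str, region: str, repo: str,
                      tag: str = "latest") -> List[str]:
    """Build the login, build, tag, and push commands used in this step."""
    registry = ecr_registry(account_id, region)
    image = f"{registry}/{repo}:{tag}"
    return [
        f"aws ecr get-login-password --region {region} | "
        f"docker login --username AWS --password-stdin {registry}",
        f"docker build -t {repo}:{tag} .",
        f"docker tag {repo}:{tag} {image}",
        f"docker push {image}",
    ]


if __name__ == "__main__":
    # Replace the placeholder account ID and Region with your own values.
    for command in ecr_push_commands("123456789012", "ap-northeast-1", "brickwall-maker"):
        print(command)
```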

4. Launch the CloudFormation template to deploy your ECS infrastructure

To deploy your ECS infrastructure, run the following command to launch the CloudFormation template. In the ParameterValue, either substitute your own private ECR URL or use our public URL:

aws cloudformation create-stack --stack-name myECS-Stack \
--template-body file://DevOpsGuru.yaml \
--capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM \
--parameters ParameterKey=ImageUrl,ParameterValue=public.ecr.aws/p8v8e7e5/myartifacts:brickwallv1
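
If you would rather script this step in Python, the same request can be expressed with boto3. This is a sketch rather than part of the published sample: `create_stack_args` and `deploy` are hypothetical helper names, and the waiter call simply blocks until CloudFormation reports that the stack has been created.

```python
from pathlib import Path
from typing import Any, Dict


def create_stack_args(stack_name: str, template_body: str, image_url: str) -> Dict[str, Any]:
    """Build keyword arguments mirroring the `aws cloudformation create-stack` call."""
    return {
        "StackName": stack_name,
        "TemplateBody": template_body,
        "Capabilities": ["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM"],
        "Parameters": [{"ParameterKey": "ImageUrl", "ParameterValue": image_url}],
    }


def deploy(stack_name: str, template_path: str, image_url: str) -> None:
    """Create the stack and wait until CloudFormation reports completion."""
    import boto3  # imported lazily so the helper above has no AWS dependency

    cloudformation = boto3.client("cloudformation")
    args = create_stack_args(stack_name, Path(template_path).read_text(), image_url)
    cloudformation.create_stack(**args)
    cloudformation.get_waiter("stack_create_complete").wait(StackName=stack_name)
```

Calling `deploy("myECS-Stack", "DevOpsGuru.yaml", "public.ecr.aws/p8v8e7e5/myartifacts:brickwallv1")` from the repository directory would correspond to the CLI command above.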

5. Enable DevOps Guru to monitor the ECS Application

Run the following command to enable DevOps Guru for monitoring your ECS application:

aws cloudformation create-stack \
--stack-name EnableDevOpsGuruForCfnStack \
--template-body file://EnableDevOpsGuruForCfnStack.yaml \
--parameters ParameterKey=CfnStackNames,ParameterValue=myECS-Stack

6. Wait for baselining of resources

This step lets DevOps Guru complete the baselining of the resources and benchmark the normal behavior. For this particular scenario, we recommend allowing two days before expecting any insights to be triggered.

Unlike other monitoring tools, the DevOps Guru dashboard does not present any counters or graphs. In the meantime, you can utilize CloudWatch Container Insights to monitor the cluster-level, task-level, and service-level metrics in ECS.

7. View Container Insights metrics

  • Open the CloudWatch console.
  • In the navigation pane, choose Container Insights.
  • Use the drop-down boxes near the top to select ECS Services as the resource type to view, then select DevOps Guru as the resource to monitor.
  • The performance monitoring view will show you graphs for several metrics, including “Memory Utilization”, which you can watch increasing from here. In addition, it will show the list of tasks in the lower “Task performance” pane showing the “Avg CPU” and “Avg memory” metrics for the individual tasks.
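
The same memory trend can also be read programmatically. The sketch below assumes the Container Insights metrics MemoryUtilized and MemoryReserved in the ECS/ContainerInsights namespace, keyed by the ClusterName and ServiceName dimensions; the helper names are hypothetical, and the cluster name must be replaced with the one created by your stack.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List

NAMESPACE = "ECS/ContainerInsights"


def memory_utilization_queries(cluster: str, service: str,
                               period: int = 300) -> List[Dict[str, Any]]:
    """Build GetMetricData queries for a service's memory utilization in percent."""
    dimensions = [
        {"Name": "ClusterName", "Value": cluster},
        {"Name": "ServiceName", "Value": service},
    ]

    def stat(query_id: str, metric_name: str) -> Dict[str, Any]:
        return {
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": NAMESPACE,
                    "MetricName": metric_name,
                    "Dimensions": dimensions,
                },
                "Period": period,
                "Stat": "Average",
            },
            "ReturnData": False,  # only the derived percentage is returned
        }

    return [
        stat("mem_used", "MemoryUtilized"),
        stat("mem_reserved", "MemoryReserved"),
        {
            "Id": "mem_pct",
            "Expression": "100 * mem_used / mem_reserved",
            "Label": "MemoryUtilization",
        },
    ]


def fetch_memory_utilization(cluster: str, service: str, hours: int = 24) -> List[float]:
    """Return the utilization values reported for the last `hours` hours."""
    import boto3  # imported lazily so the query builder has no AWS dependency

    end = datetime.now(timezone.utc)
    response = boto3.client("cloudwatch").get_metric_data(
        MetricDataQueries=memory_utilization_queries(cluster, service),
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
    )
    return response["MetricDataResults"][0]["Values"]
```

For the service in this post, the call would look like `fetch_memory_utilization("<your-cluster-name>", "Test")`.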

8. Review DevOps Guru insights

When DevOps Guru detects an anomaly, it generates a proactive insight with the relevant information needed to investigate the anomaly, and it will list it in the DevOps Guru Dashboard.

You can view the insights by clicking on the number of insights displayed in the dashboard. In our case, we expect insights to be shown in the “proactive insights” category on the dashboard.

Once you have opened the insight, you will see that the insight view is divided into the following sections:

  • Insight Overview with a basic description of the anomaly. In this case, it states that Memory Utilization is approaching a limit, along with details of the stack affected by the anomaly.
  • Anomalous metrics consisting of related graphs and a timeline of the predicted impact time in the future.
  • Relevant events with contextual information, such as changes or updates made to the CloudFormation stack’s resources in the region.
  • Recommendations to mitigate the issue. As seen in the following screenshot, it recommends troubleshooting High CPU or Memory Utilization in ECS along with a link to the necessary documentation.

The following screenshot illustrates an example insight detail page from DevOps Guru:

An example of an ECS Service’s Memory Utilization approaching a limit of 100%. The metric graph shows the anomaly starting two days ago at about 22:00 with memory utilization increasing steadily until the anomaly was reported today at 18:08. The graph also shows a forecast of the memory utilization with a predicted impact of reaching 100% the next day at about 22:00.

Potentially related events on a timeline and below them a list of recommendations. Two deployment events are shown without further details on a timeline. The recommendations table links to one document on how to troubleshoot high CPU or memory utilization in Amazon ECS.

Conclusion

This post describes how DevOps Guru continuously monitors resources in a particular region in your AWS account and proactively helps identify problems around resource exhaustion, such as running out of memory, before they occur. This helps IT operators take preventative action before a problem presents itself, thereby preventing downtime.

Cleaning up

After walking through this post, you should clean up and un-provision the resources in order to avoid incurring any further charges.

  1. To un-provision the CloudFormation stacks, on the AWS CloudFormation console, choose Stacks. Select the stack name, and choose Delete.
  2. Delete the AWS Cloud9 environment.
  3. Delete the ECR repository.

About the authors

Trishanka Saikia

Trishanka Saikia is a Technical Account Manager for AWS. She is also a DevOps enthusiast and works with AWS customers to design, deploy, and manage their AWS workloads/architectures.

Gerhard Poul

Gerhard Poul is a Senior Solutions Architect at Amazon Web Services based in Vienna, Austria. Gerhard works with customers in Austria to enable them with best practices in their cloud journey. He is passionate about infrastructure as code and how cloud technologies can improve IT operations.

Forensic investigation environment strategies in the AWS Cloud

Post Syndicated from Sol Kavanagh original https://aws.amazon.com/blogs/security/forensic-investigation-environment-strategies-in-the-aws-cloud/

When a deviation from your secure baseline occurs, it’s crucial to respond and resolve the issue quickly and follow up with a forensic investigation and root cause analysis. Having a preconfigured infrastructure and a practiced plan for using it when there’s a deviation from your baseline will help you to extract and analyze the information needed to determine the impact, scope, and root cause of an incident and return to operations confidently.

Time is of the essence in understanding the what, how, who, where, and when of a security incident. You often hear of automated incident response, which has repeatable and auditable processes to standardize the resolution of incidents and accelerate evidence artifact gathering.

Similarly, having a standard, pristine, pre-configured, and repeatable forensic clean-room environment that can be automatically deployed through a template allows your organization to minimize human interaction, keep the larger organization separate from contamination, hasten evidence gathering and root cause analysis, and protect forensic data integrity. The forensic analysis process assists in data preservation, acquisition, and analysis to identify the root cause of an incident. This approach can also facilitate the presentation or transfer of evidence to outside legal entities or auditors. AWS CloudFormation templates—or other infrastructure as code (IaC) provisioning tools—help you to achieve these goals, providing your business with consistent, well-structured, and auditable results that allow for a better overall security posture. Having these environments as a permanent part of your infrastructure allows them to be well documented and tested, and gives you opportunities to train your teams in their use.

This post provides strategies that you can use to prepare your organization to respond to secure baseline deviations. These strategies take the form of best practices around Amazon Web Services (AWS) account structure, AWS Organizations organizational units (OUs) and service control policies (SCPs), forensic Amazon Virtual Private Cloud (Amazon VPC) and network infrastructure, evidence artifacts to be collected, AWS services to be used, forensic analysis tool infrastructure, and user access and authorization to the above. The specific focus is to provide an environment where Amazon Elastic Compute Cloud (Amazon EC2) instances with forensic tooling can be used to examine evidence artifacts.

This post presumes that you already have an evidence artifact collection procedure or that you are implementing one and that the evidence can be transferred to the accounts described here. If you are looking for advice on how to automate artifact collection, see How to automate forensic disk collection for guidance.

Infrastructure overview

A well-architected multi-account AWS environment is based on the structure provided by Organizations. As companies grow and need to scale their infrastructure with multiple accounts, often in multiple AWS Regions, Organizations offers programmatic creation of new AWS accounts combined with central management and governance that helps them to do so in a controlled and standardized manner. This programmatic, centralized approach should be used to create the forensic investigation environments described in the strategy in this blog post.

The example in this blog post uses a simplified structure with separate dedicated OUs and accounts for security and forensics, shown in Figure 1. Your organization’s architecture might differ, but the strategy remains the same.

Note: There might be reasons for forensic analysis to be performed live within the compromised account itself, such as to avoid shutting down or accessing the compromised instance or resource; however, that approach isn’t covered here.

Figure 1: AWS Organizations forensics OU example

The most important components in Figure 1 are:

  • A security OU, which is used for hosting security-related access and services. The security OU and the associated AWS accounts should be owned and managed by your security organization.
  • A forensics OU, which should be a separate entity, although it can have some similarities and crossover responsibilities with the security OU. There are several reasons for having it within a separate OU and account. Some of the more important reasons are that the forensics team might be a different team than the security team (or a subset of it), certain investigations might be under legal hold with additional access restrictions, or a member of the security team could be the focus of an investigation.

When speaking about Organizations, accounts, and the permissions required for various actions, you must first look at SCPs, a core functionality of Organizations. SCPs offer control over the maximum available permissions for all accounts in your organization. In the example in this blog post, you can use SCPs to provide similar or identical permission policies to all the accounts under the forensics OU, which is being used as a resource container. This policy overrides all other policies, and is a crucial mechanism to ensure that you can explicitly deny or allow any API calls desired. Some use cases of SCPs are to restrict the ability to disable AWS CloudTrail, restrict root user access, and ensure that all actions taken in the forensic investigation account are logged. This provides a centralized way to avoid changing individual policies for users, groups, or roles. Accessing the forensic environment should be done using a least-privilege model, with nobody capable of modifying or compromising the initially collected evidence. For an investigation environment, denying all actions except those you want to list as exceptions is the most straightforward approach. Start with the default of denying all, and work your way towards the least authorizations needed to perform the forensic processes established by your organization. AWS Config can be a valuable tool to track the changes made to the account and provide evidence of these changes.

Keep in mind that once the restrictive SCP is applied, even the root account or those with administrator access won’t have access beyond those permissions; therefore, frequent, proactive testing as your environment changes is a best practice. Also, be sure to validate which principals can remove the protective policy, if required, to transfer the account to an outside entity. Finally, create the environment before the restrictive permissions are applied, and then move the account under the forensic OU.
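
To make the deny-by-default approach concrete, the following sketch builds an SCP document that denies every action not on an explicit exception list, using a Deny statement with NotAction. The helper name `deny_all_except` and the example actions are assumptions for illustration only; your own list should come from the forensic processes your organization has established. Note that an SCP only limits permissions, so principals in the account still need IAM policies that grant the listed actions.

```python
import json
from typing import Any, Dict, List


def deny_all_except(allowed_actions: List[str]) -> Dict[str, Any]:
    """Build an SCP document that denies every action not explicitly listed."""
    if not allowed_actions:
        raise ValueError("list at least one allowed action")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAllExceptForensicActions",
                "Effect": "Deny",
                "NotAction": sorted(allowed_actions),
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical exception list; actions such as cloudtrail:StopLogging are
    # not on it and are therefore denied for every principal in the account.
    policy = deny_all_except([
        "ec2:Describe*",
        "ec2:CreateVolume",
        "ec2:AttachVolume",
        "s3:GetObject",
        "s3:ListBucket",
    ])
    print(json.dumps(policy, indent=2))
```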

Having a separate AWS account dedicated to forensic investigations is the best way to keep your larger organization separate from the possible threat of contamination from the incident itself, ensure the isolation and protection of the integrity of the artifacts being analyzed, and keep the investigation confidential. Separate accounts also avoid situations where the threat actors might have used all the resources immediately available to your compromised AWS account by hitting service quotas, preventing you from instantiating an Amazon EC2 instance to perform investigations.

Having a forensic investigation account per Region is also a good practice, as it keeps the investigative capabilities close to the data being analyzed, reduces latency, and avoids issues of the data changing regulatory jurisdictions. For example, data residing in the EU might need to be examined by an investigative team in North America, but the data itself cannot be moved because its North American architecture doesn’t align with GDPR compliance. For global customers, forensics teams might be situated in different locations worldwide and have different processes. It’s better to have a forensic account in the Region where an incident arose. The account as a whole could also then be provided to local legal institutions or third-party auditors if required. That said, if your AWS infrastructure is contained within Regions only in one jurisdiction or country, then a single re-creatable account in one Region with evidence artifacts shared from and kept in their respective Regions could be an easier architecture to manage over time.

An account created in an automated fashion using a CloudFormation template—or other IaC methods—allows you to minimize human interaction before use by recreating an entirely new and untouched forensic analysis instance for each separate investigation, ensuring its integrity. Individual people will only be given access as part of a security incident response plan, and even then, permissions to change the environment should be minimal or none at all. The post-investigation environment would then be either preserved in a locked state or removed, and a fresh, blank one created in its place for the subsequent investigation with no trace of the previous artifacts. Templating your environment also facilitates testing to ensure your investigative strategy, permissions, and tooling will function as intended.

Accessing your forensics infrastructure

Once you’ve defined where your investigative environment should reside, you must think about who will be accessing it, how they will do so, and what permissions they will need.

The forensic investigation team can be a separate team from the security incident response team, the same team, or a subset. You should provide precise access rights to the group of individuals performing the investigation as part of maintaining least privilege.

You should create specific roles for the various needs of the forensic procedures, each with only the permissions required. As with SCPs and other situations described here, start with no permissions and add authorizations only as required while establishing and testing your templated environments. As an example, you might create the following roles within the forensic account:

  • Responder – acquire evidence
  • Investigator – analyze evidence
  • Data custodian – manage (copy, move, delete, and expire) evidence
  • Analyst – access forensics reports for analytics, trends, and forecasting (threat intelligence)

You should establish an access procedure for each role, and include it in the response plan playbook. This will help you ensure least privilege access as well as environment integrity. For example, establish a process for an owner of the Security Incident Response Plan to verify and approve the request for access to the environment. Another alternative is the two-person rule. Alert on log-in is an additional security measure that you can add to help increase confidence in the environment’s integrity, and to monitor for unauthorized access.

You want the investigative role to have read-only access to the original evidence artifacts collected, generally consisting of Amazon Elastic Block Store (Amazon EBS) snapshots, memory dumps, logs, or other artifacts in an Amazon Simple Storage Service (Amazon S3) bucket. The original sources of evidence should be protected; MFA delete and S3 versioning are two methods for doing so. Work should be performed on copies of copies if rendering the original immutable isn’t possible, especially if any modification of the artifact will happen. This is discussed in further detail below.

Evidence should only be accessible from the roles that absolutely require access—that is, investigator and data custodian. To help prevent potential insider threat actors from being aware of the investigation, you should deny even read access from any roles not intended to access and analyze evidence.
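
As an illustration of read-only access for the investigator role, the sketch below builds an IAM identity policy limited to listing and reading objects in an evidence bucket. The helper name and the bucket name are hypothetical, and a real policy would be derived from your own evidence layout and tested against your templated environment.

```python
from typing import Any, Dict


def investigator_read_only_policy(evidence_bucket: str) -> Dict[str, Any]:
    """Build an IAM policy that allows listing and reading evidence, nothing else."""
    bucket_arn = f"arn:aws:s3:::{evidence_bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListEvidenceBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:ListBucketVersions"],
                "Resource": bucket_arn,  # bucket-level actions use the bucket ARN
            },
            {
                "Sid": "ReadEvidenceObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:GetObjectVersion"],
                "Resource": f"{bucket_arn}/*",  # object-level actions use object ARNs
            },
        ],
    }
```

Because the policy grants no write or delete actions, anything the investigator produces has to go to a separate working location rather than back into the evidence bucket.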

Protecting the integrity of your forensic infrastructures

Once you’ve built the organization, account structure, and roles, you must decide on the best strategy inside the account itself. Analysis of the collected artifacts can be done through forensic analysis tools hosted on an EC2 instance, ideally residing within a dedicated Amazon VPC in the forensics account. This Amazon VPC should be configured with the same restrictive approach you’ve taken so far, being fully isolated and auditable, with the only resources being dedicated to the forensic tasks at hand.

This might mean that the Amazon VPC’s subnets will have no internet gateways, and therefore all S3 access must be done through an S3 VPC endpoint. VPC flow logging should be enabled at the Amazon VPC level so that there are records of all network traffic. Security groups should be highly restrictive, and deny all ports that aren’t related to the requirements of the forensic tools. SSH and RDP access should be restricted and governed by auditable mechanisms such as a bastion host configured to log all connections and activity, AWS Systems Manager Session Manager, or similar.

If a graphical interface is required when using Systems Manager Session Manager, RDP or other methods can still be used. Commands and responses performed using Session Manager can be logged to Amazon CloudWatch and an S3 bucket, which allows auditing of all commands executed on the forensic tooling Amazon EC2 instances. Administrative privileges can also be restricted if required. You can also arrange to receive an Amazon Simple Notification Service (Amazon SNS) notification when a new session is started.

Given that the Amazon EC2 forensic tooling instances might not have direct access to the internet, you might need to create a process to preconfigure and deploy standardized Amazon Machine Images (AMIs) with the appropriate installed and updated set of tooling for analysis. Several best practices apply around this process. The OS of the AMI should be hardened to reduce its vulnerable surface. We do this by starting with an approved OS image, such as an AWS-provided AMI or one you have created and managed yourself. Then proceed to remove unwanted programs, packages, libraries, and other components. Ensure that all updates and patches—security and otherwise—have been applied. Configuring a host-based firewall is also a good precaution, as well as host-based intrusion detection tools. In addition, always ensure the attached disks are encrypted.

If your operating system is supported, we recommend creating golden images using EC2 Image Builder. Your golden image should be rebuilt and updated at least monthly, as you want to ensure it’s kept up to date with security patches and functionality.

EC2 Image Builder—combined with other tools—facilitates the hardening process; for example, allowing the creation of automated pipelines that produce Center for Internet Security (CIS) benchmark hardened AMIs. If you don’t want to maintain your own hardened images, you can find CIS benchmark hardened AMIs on the AWS Marketplace.

Keep in mind the infrastructure requirements for your forensic tools—such as minimum CPU, memory, storage, and networking requirements—before choosing an appropriate EC2 instance type. Though a variety of instance types are available, you’ll want to ensure that you’re keeping the right balance between cost and performance based on your minimum requirements and expected workloads.

The goal of this environment is to provide an efficient means to collect evidence, perform a comprehensive investigation, and effectively return to safe operations. Evidence is best acquired through the automated strategies discussed in How to automate incident response in the AWS Cloud for EC2 instances. Hashing evidence artifacts immediately upon acquisition is highly recommended in your evidence collection process. Hashes, and in turn the evidence itself, can then be validated after subsequent transfers and accesses, ensuring the integrity of the evidence is maintained. Preserving the original evidence is crucial if legal action is taken.
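
The hashing step can be as simple as the following sketch, which computes a SHA-256 digest by streaming a file in chunks (so large disk images are never loaded fully into memory) and later checks a copy against the recorded value. SHA-256 and the helper names are assumptions for illustration; use whichever algorithm your evidence-handling procedure specifies.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Check a file against the digest recorded at acquisition time."""
    return hmac.compare_digest(sha256_of(path), expected_hex.lower())
```

Recording the digest alongside the artifact at acquisition, and calling `verify` after each transfer, gives a simple record that the evidence has not changed.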

Evidence and artifacts can consist of, but aren’t limited to:

The control plane logs mentioned above, such as the CloudTrail logs, can be accessed in one of two ways. Ideally, the logs should reside in a central location with read-only access for investigations as needed. However, if not centralized, read access can be given to the original logs within the source account as needed. Read access to certain service logs found within the security account, such as AWS Config, Amazon GuardDuty, Security Hub, and Amazon Detective, might be necessary to correlate indicators of compromise with evidence discovered during the analysis.

As previously mentioned, it’s imperative to have immutable versions of all evidence. This can be achieved in many ways, including but not limited to the following examples:

  • Amazon EBS snapshots, including hibernation generated memory dumps:
    • Original Amazon EBS disks are snapshotted, shared to the forensics account, used to create a volume, and then mounted as read-only for offline analysis.
  • Amazon EBS volumes manually captured:
    • Linux tools such as dc3dd can be used to stream a volume to an S3 bucket and provide a hash; the resulting object can then be made immutable using one of the S3 methods in the next bullet point.
  • Artifacts stored in an S3 bucket, such as memory dumps and other artifacts:
    • S3 Object Lock prevents objects from being deleted or overwritten for a fixed amount of time or indefinitely.
    • Using MFA delete requires the requestor to use multi-factor authentication to permanently delete an object.
    • Amazon S3 Glacier provides a Vault Lock function if you want to retain immutable evidence long term.
  • Disk volumes:
    • Linux: Mount in read-only mode.
    • Windows: Use one of the many commercial or open-source write-blocker applications available, some of which are specifically made for forensic use.
  • CloudTrail:
  • AWS Systems Manager inventory:
  • AWS Config data:
    • By default, AWS Config stores data in an S3 bucket, and can be protected using the above methods.
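
As one example of the S3 methods listed above, the sketch below builds the arguments for setting a compliance-mode retention period on a specific object version with S3 Object Lock. It assumes the bucket was created with Object Lock enabled; `retention_args` is a hypothetical helper, and the retention length is a placeholder to be set by your legal and retention requirements.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional


def retention_args(bucket: str, key: str, version_id: str, retain_days: int,
                   now: Optional[datetime] = None) -> Dict[str, Any]:
    """Build arguments for the S3 `put_object_retention` call in compliance mode."""
    if retain_days <= 0:
        raise ValueError("retain_days must be positive")
    start = now or datetime.now(timezone.utc)
    return {
        "Bucket": bucket,
        "Key": key,
        "VersionId": version_id,
        "Retention": {
            # In compliance mode, the retention period cannot be shortened or
            # removed by any user until the retain-until date has passed.
            "Mode": "COMPLIANCE",
            "RetainUntilDate": start + timedelta(days=retain_days),
        },
    }
```

With boto3, the result would be passed as `boto3.client("s3").put_object_retention(**retention_args(...))`.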

Note: AWS services such as KMS can help enable encryption. KMS is integrated with AWS services to simplify using your keys to encrypt data across your AWS workloads.

An example use case of Amazon EBS disks being shared as evidence to the forensics account, the following figure—Figure 2—is a simplified S3 bucket folder structure you could use to store and work with evidence.

Figure 2 shows an S3 bucket structure for a forensic account. An S3 bucket and folder are created to hold incoming data (for example, from Amazon EBS disks), which is streamed to Incoming Data > Evidence Artifacts using dc3dd. The data is then copied from there to a folder in another bucket (Active Investigation > Root Directory > Extracted Artifacts) to be analyzed by the tooling installed on your forensic Amazon EC2 instance. There are also folders under Active Investigation for any investigation notes you make during analysis, as well as the final reports, which are discussed at the end of this blog post. Finally, there is a bucket with folders for legal holds, where an object lock will be placed to hold evidence artifacts at a specific version.

Figure 2: Forensic account S3 bucket structure
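
To make the legal hold idea more concrete, the following sketch builds the request parameters for the S3 PutObjectLegalHold and PutObjectRetention operations against one specific object version. The helper names, the default retention mode, and the example bucket and key are assumptions for illustration. The sketch only constructs dictionaries and does not call AWS.

```python
from datetime import datetime, timedelta, timezone


def legal_hold_params(bucket, key, version_id):
    """Parameters for S3 PutObjectLegalHold on one object version."""
    return {
        "Bucket": bucket,
        "Key": key,
        "VersionId": version_id,
        "LegalHold": {"Status": "ON"},
    }


def retention_params(bucket, key, version_id, days, mode="GOVERNANCE", now=None):
    """Parameters for S3 PutObjectRetention on one object version.

    `mode` must be "GOVERNANCE" or "COMPLIANCE". The default here is an
    assumption; choose the mode your retention policy requires.
    """
    if mode not in ("GOVERNANCE", "COMPLIANCE"):
        raise ValueError("mode must be GOVERNANCE or COMPLIANCE")
    now = now or datetime.now(timezone.utc)
    return {
        "Bucket": bucket,
        "Key": key,
        "VersionId": version_id,
        "Retention": {
            "Mode": mode,
            "RetainUntilDate": now + timedelta(days=days),
        },
    }
```

Assuming boto3 is installed and the bucket was created with Object Lock enabled, these dictionaries could be passed as keyword arguments, for example `s3.put_object_legal_hold(**legal_hold_params("legal-hold-bucket", "case-001/vol.img", version_id))`, where the bucket and key names are hypothetical.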

Considerations

Finally, depending on the severity of the incident, your on-premises network and infrastructure might also be compromised. Having an alternative environment for your security responders to use in case of such an event reduces the chance of not being able to respond in an emergency. Amazon services such as Amazon WorkSpaces, a fully managed persistent desktop virtualization service, can be used to provide your responders a ready-to-use, independent environment that they can use to access the digital forensics and incident response tools needed to perform incident-related tasks.

Aside from the investigative tools, communications services are among the most critical for coordination of response. You can use Amazon WorkMail and Amazon Chime to provide that capability independent of normal channels.

Conclusion

The goal of a forensic investigation is to provide a final report that’s supported by the evidence. This includes what was accessed, who might have accessed it, how it was accessed, whether any data was exfiltrated, and so on. This report might be necessary for legal circumstances, such as criminal or civil investigations or situations requiring breach notifications. What output each circumstance requires should be determined in advance in order to develop an appropriate response and reporting process for each. A root cause analysis is vital in providing the information required to prepare your resources and environment to help prevent a similar incident in the future. Reports should not only include a root cause analysis, but also provide the methods, steps, and tools used to arrive at the conclusions.

This article has shown you how you can get started creating and maintaining forensic environments, as well as enable your teams to perform advanced incident resolution investigations using AWS services. Implementing the groundwork for your forensics environment, as described above, allows you to use automated disk collection to begin iterating on your forensic data collection capabilities and be better prepared when security events occur.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on one of the AWS Security, Identity, and Compliance forums or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Sol Kavanagh

Sol is a senior solutions architect and CISSP with 20+ years of experience in the enterprise space. He is passionate about security and helping customers build solutions in the AWS Cloud. In his spare time he enjoys distance cycling, adventure travelling, Buddhist philosophy, and Muay Thai.

Migrate and secure your Windows PKI to AWS with AWS CloudHSM

Post Syndicated from Govindarajan Varadan original https://aws.amazon.com/blogs/security/migrate-and-secure-your-windows-pki-to-aws-with-aws-cloudhsm/

AWS CloudHSM provides a cloud-based hardware security module (HSM) that enables you to easily generate and use your own encryption keys in AWS. Using CloudHSM as part of a Microsoft Active Directory Certificate Services (AD CS) public key infrastructure (PKI) fortifies the security of your certificate authority (CA) private key and ensures the security of the trust hierarchy. In this blog post, we walk you through how to migrate your existing Microsoft AD CS CA private key to the HSM in a CloudHSM cluster.

The challenge

Organizations implement public key infrastructure (PKI) as an application to provide integrity and confidentiality between internal and customer-facing applications. A PKI provides encryption/decryption, message hashing, digital certificates, and digital signatures to ensure these security objectives are met. Microsoft AD CS is a popular choice for creating and managing a CA for enterprise applications such as Active Directory, Exchange, and System Center Configuration Manager. Moving your Microsoft AD CS to AWS as part of your overall migration plan allows you to continue to use your existing investment in Windows certificate auto enrollment for users and devices without disrupting existing workflows or requiring new certificates to be issued. However, when you migrate an on-premises infrastructure to the cloud, your security team may determine that storing private keys on the AD CS server’s disk is insufficient for protecting the private key that signs the certificates issued by the CA. Moving from storing private keys on the AD CS server’s disk to a hardware security module (HSM) can provide the added security required to maintain trust of the private keys.

This walkthrough shows you how to migrate your existing AD CS CA private key to the HSM in your CloudHSM cluster. The resulting configuration avoids the security concerns of using keys stored on your AD CS server, and uses the HSM to perform the cryptographic signing operations.

Prerequisites

For this walkthrough, you should have the following in place:

Migrating a domain

In this section, you will walk through migrating your AD CS environment to AWS by using your existing CA certificate and private key that will be secured in CloudHSM. In order to securely migrate the private key into the HSM, you will install the CloudHSM client and import the keys directly from the existing CA server.

This walkthrough includes the following steps:

  1. Create a crypto user (CU) account
  2. Import the CA private key into CloudHSM
  3. Export the CA certificate and database
  4. Configure and import the certificate into the new Windows CA server
  5. Install AD CS on the new server

The operations you perform on the HSM require the credentials of an HSM user. Each HSM user has a type that determines the operations you can perform when authenticated as that user. Next, you will create a crypto user (CU) account to use with your CA servers, to manage keys and to perform cryptographic operations.

To create the CU account

  1. From the on-premises CA server, use the following command to log in with the crypto officer (CO) account that you created when you activated the cluster. Be sure to replace <co_password> with your CO password.
    loginHSM CO admin <co_password>
    

  2. Use the following command to create the CU account. Replace <cu_user> and <cu_password> with the username and password you want to use for the CU.
    createUser CU <cu_user> <cu_password>
    

  3. Use the following command to set the login credentials for the HSM on your system and enable the AWS CloudHSM client for Windows to use key storage providers (KSPs) and Cryptography API: Next Generation (CNG) providers. Replace <cu_user> and <cu_password> with the username and password of the CU.
    set_cloudhsm_credentials.exe --username <cu_user> --password <cu_password>
    

Now that you have the CloudHSM client installed and configured on the on-premises CA server, you can import the CA private key from the local server into your CloudHSM cluster.

To import the CA private key into CloudHSM

  1. Open an administrative command prompt and navigate to C:\Program Files\Amazon\CloudHSM.
  2. To identify the unique container name for your CA’s private key, enter certutil -store my to list all certificates stored in the local machine store. The CA certificate will be shown as follows:
    ================ Certificate 0 ================
    Serial Number: <certificate_serial_number>
    Issuer: CN=example-CA, DC=example, DC=com
     NotBefore: 6/25/2021 5:04 PM
     NotAfter: 6/25/2022 5:14 PM
    Subject: CN=example-CA-test3, DC=example, DC=com
    Certificate Template Name (Certificate Type): CA
    CA Version: V0.0
    Signature matches Public Key
    Root Certificate: Subject matches Issuer
    Template: CA, Root Certification Authority
    Cert Hash(sha1): cb7c09cd6c76d69d9682a31fbdbbe01c29cebd82
      Key Container = example-CA-test3
      Unique container name: <unique_container_name>
      Provider = Microsoft Software Key Storage Provider
    Signature test passed
    

  3. Verify that the key is backed by the Microsoft Software Key Storage Provider and make note of the <unique_container_name> from the output, to use it in the following steps.
  4. Use the following command to set the environment variable n3fips_password. Replace <cu_user> and <cu_password> with the username and password for the CU you created earlier for the CloudHSM cluster. This variable will be used by the import_key command in the next step.
    set n3fips_password=<cu_user>:<cu_password>
    

  5. Use the following import_key command to import the private key into the HSM. Replace <unique_container_name> with the value you noted earlier.
    import_key.exe -RSA "<unique_container_name>"

The import_key command will report that the import was successful. At this point, your private key has been imported into the HSM, but the on-premises CA server will continue to run using the key stored locally.
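
If you have several certificates in the store, reading the `certutil -store my` output by eye can be error prone. The following is a minimal sketch of a parser that extracts the fields used in steps 2 and 3 above. It assumes the output has the same layout as the sample shown in step 2 (a `Certificate N` banner followed by `Name: value` or `Name = value` lines). The function names are hypothetical and are not part of certutil or the CloudHSM client.

```python
import re

SOFTWARE_KSP = "Microsoft Software Key Storage Provider"


def parse_certutil_store(output):
    """Parse `certutil -store my` text into one dict per certificate."""
    certs = []
    current = None
    for line in output.splitlines():
        stripped = line.strip()
        # A banner such as "===== Certificate 0 =====" starts a new entry.
        if re.match(r"^=+ Certificate \d+ =+$", stripped):
            current = {}
            certs.append(current)
            continue
        if current is None:
            continue
        # Fields appear as either "Name: value" or "Name = value".
        match = re.match(r"^([^:=]+?)\s*[:=]\s*(.+)$", stripped)
        if match:
            current[match.group(1)] = match.group(2)
    return certs


def software_backed_containers(output):
    """Return unique container names of keys held by the software KSP."""
    return [
        cert["Unique container name"]
        for cert in parse_certutil_store(output)
        if cert.get("Provider") == SOFTWARE_KSP
        and "Unique container name" in cert
    ]
```

You could capture the command output to a text file and pass its contents to `software_backed_containers` to list the container names that are candidates for import.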

The Active Directory Certificate Services Migration Guide for Windows Server 2012 R2 uses the Certification Authority snap-in to migrate the CA database, as well as the certificate and private key. Because you have already imported your private key into the HSM, next you will need to make a slight modification to this process and export the certificate manually, without its private key.

To export the CA certificate and database

  1. To open the Microsoft Management Console (MMC), open the Start menu, enter MMC in the search field, and press Enter.
  2. From the File menu, select Add/Remove Snap-in.
  3. Select Certificates and choose Add.
  4. You will be prompted to select which certificate store to manage. Select Computer account and choose Next.
  5. Select Local Computer, choose Finish, then choose OK.
  6. In the left pane, choose Personal, then choose Certificates. In the center pane, locate your CA certificate, as shown in Figure 1.
     
    The MMC Certificates snap-in displays the Certificates directories for the local computer. The Personal Certificates location is open displaying the example-CA-test3 certificate.

    Figure 1: Microsoft Management Console Certificates snap-in

  7. Open the context (right-click) menu for the certificate, choose All Tasks, then choose Export.
  8. In the Certificate Export Wizard, choose Next, then choose No, do not export the private key.
  9. Under Select the format you want to use, select Cryptographic Message Syntax Standard – PKCS #7 format file (.p7b) and select Include all certificates in the certification path if possible, as shown in Figure 2.
     
    The Certificate Export Wizard window is displayed. The window prompts for the selection of an export format. The option for Cryptographic Message Syntax Standard – PKCS #7 Certificates (.P7B) is selected, and the check box Include all certificates in the certification path if possible is selected.

    Figure 2: Certificate Export Wizard

  10. Save the file in a location where you can find it later, so that you can copy it to the new CA server.
  11. From the Start menu, browse to Administrative Tools, then choose Certification Authority.
  12. Open the context (right-click) menu for your CA and choose All Tasks, then choose Back up CA.
  13. In the Certificate Authority Backup Wizard, choose Next. For items to back up, select only Certificate database and certificate database log. Leave all other options unselected.
  14. Under Back up to this location, choose Browse and select a new empty folder to hold the backup files, which you will move to the new CA later.
  15. After the backup is complete, in the MMC, open the context (right-click) menu for your CA, choose All Tasks, then choose Stop service.

At this point, until you complete the migration, your CA will no longer be issuing new certificates.

To configure and import the certificate into the new Windows CA server

  1. Open a Remote Desktop session to the EC2 instance that you created in the prerequisite steps, which will serve as your new AD CS certificate authority.
  2. Copy the certificate (.p7b file) backup from the on-premises CA server to the EC2 instance.
  3. On your EC2 instance, locate the certificate you just copied, as shown in Figure 3. Open the certificate to start the import process.
     
    The Certificate Manager tool window shows the Certificates directory for the p7b file that was opened. The main window for this location is displaying the example-CA-test3 certificate.

    Figure 3: Certificate Manager tool

  4. Select Install Certificate. For Store Location, select Local Machine.
  5. Select Place all certificates in the following store. Allowing Windows to place the certificate automatically will install it as a trusted root certificate, rather than a server certificate.
  6. Select Browse, select the Personal store, and then choose OK.
  7. Choose Next, then choose Finish to complete the certificate installation.

At this point, you’ve installed the public key and certificate from the on-premises CA server to your EC2-based Windows CA server. Next, you need to link this installed certificate with the private key, which is now stored on the CloudHSM cluster, in order to make it functional for signing issued certificates and CRLs.

To link the certificate with the private key

  1. Open an administrative command prompt and navigate to C:\Program Files\Amazon\CloudHSM.
  2. Use the following command to set the environment variable n3fips_password. Replace <cu_user> and <cu_password> with the username and password for the CU that you created earlier for the CloudHSM cluster. This variable will be used by the import_key command in the next step.
    set n3fips_password=<cu_user>:<cu_password>
    

  3. Use the following import_key command to represent all keys stored on the HSM in a new key container in the key storage provider. This step is necessary to allow the cryptography tools to see the CA private key that is stored on the HSM.
    import_key -from HSM -all
    

  4. Use the following Windows certutil command to find your certificate’s unique serial number.
    certutil -store my
    

    Take note of the CA certificate’s serial number.

  5. Use the following Windows certutil command to link the installed certificate with the private key stored on the HSM. Replace <certificate_serial_number> with the value noted in the previous step.
    certutil -repairstore my <certificate_serial_number>
    

  6. Enter the command certutil -store my. The CA certificate will be shown as follows. Verify that the certificate is now linked with the HSM-backed private key. Note that the private key is using the Cavium Key Storage Provider. Also note the message Encryption test passed, which means that the private key is usable for encryption.
    ================ Certificate 0 ================
    Serial Number: <certificate_serial_number>
    Issuer: CN=example-CA, DC=example, DC=com
     NotBefore: 6/25/2021 5:04 PM
     NotAfter: 6/25/2022 5:14 PM
    Subject: CN=example-CA, DC=example, DC=com
    Certificate Template Name (Certificate Type): CA
    CA Version: V0.0
    Signature matches Public Key
    Root Certificate: Subject matches Issuer
    Template: CA, Root Certification Authority
    Cert Hash(sha1): cb7c09cd6c76d69d9682a31fbdbbe01c29cebd82
      Key Container = PRV_KEY_IMPORT-6-9-7e5cde
      Provider = Cavium Key Storage Provider
    Private key is NOT exportable
    Encryption test passed
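
The verification in step 6 can also be scripted. The following minimal sketch checks the text of a single certificate entry from `certutil -store my` for the two indicators described above: the Cavium Key Storage Provider and the Encryption test passed message. The function name is hypothetical, and the check assumes the output layout shown in the sample above.

```python
def is_hsm_linked(cert_block):
    """Return True if one certificate entry shows a usable HSM-backed key.

    `cert_block` is the certutil text for a single certificate. Both the
    provider line and the encryption test line must be present.
    """
    lines = [line.strip() for line in cert_block.splitlines()]
    has_hsm_provider = "Provider = Cavium Key Storage Provider" in lines
    encryption_ok = "Encryption test passed" in lines
    return has_hsm_provider and encryption_ok
```

A result of False suggests that the certificate is still linked to a software key container or that the repair step did not complete, in which case the earlier steps are worth reviewing.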
    

Now that your CA certificate and key materials are in place, you are ready to set up your EC2 instance as a CA server.

To install AD CS on the new server

  1. In Microsoft’s documentation to Install the Certificate Authority role on your new EC2 instance, follow steps 1-8. Do not complete the remaining steps, because you will be configuring the CA to use the existing HSM-backed certificate and private key instead of generating a new key.
  2. In Confirm installation selections, select Install.
  3. After your installation is complete, Server Manager will show a notification banner prompting you to configure AD CS. Select Configure Active Directory Certificate Services from this prompt.
  4. Select either Standalone or Enterprise CA installation, based upon the configuration of your on-premises CA.
  5. Select Use Existing Certificate and Private Key and browse to select the CA certificate imported from your on-premises CA server.
  6. Select Next and verify your location for the certificate database files.
  7. Select Finish to complete the wizard.
  8. To restore the CA database backup, from the Start menu, browse to Administrative Tools, then choose Certification Authority.
  9. Open the context (right-click) menu for the certificate authority and choose All Tasks, then choose Restore CA. Browse to and select the database backup that you copied from the on-premises CA server.

Review the Active Directory Certificate Services Migration Guide for Windows Server 2012 R2 to complete migration of your remaining Microsoft Public Key Infrastructure (PKI) components. Depending on your existing CA environment, these steps may include establishing new CRL and AIA endpoints, configuring Windows Routing and Remote Access to use the new CA, or configuring certificate auto enrollment for Windows clients.

Conclusion

In this post, we walked you through migrating an on-premises Microsoft AD CS environment to an AWS environment that uses AWS CloudHSM to secure the CA private key. By migrating your existing Windows PKI backed by AWS CloudHSM, you can continue to use your Windows certificate auto enrollment for users and devices with your private key secured in a dedicated HSM.

For more information about setting up and managing CloudHSM, see Getting Started with AWS CloudHSM and the AWS Security Blog post CloudHSM best practices to maximize performance and avoid common configuration pitfalls.

If you have feedback about this blog post, submit comments in the Comments section below. You can also start a new thread on the AWS CloudHSM forum to get answers from the community.

Author

Govindarajan Varadan

Govindarajan is a senior solutions architect at AWS based out of Silicon Valley in California. He works with AWS customers to help them achieve their business objectives by innovating at scale, modernizing their applications, and adopting game-changing technologies like AI/ML.

Author

Brian Benscoter

Brian is a senior solutions architect at AWS with a passion for governance at scale and is based in Charlotte, NC. Brian works with enterprise AWS customers to help them design, deploy, and scale applications to achieve their business goals.

Author

Axel Larsson

Axel is an enterprise solutions architect at AWS. He has helped several companies migrate to AWS and modernize their architecture. Axel is passionate about helping organizations establish a solid foundation in the cloud, enabled by security best practices.