
Analyze and understand IAM role usage with Amazon Detective

2021-02-23 Sheldon Sides

Post Syndicated from Sheldon Sides original https://aws.amazon.com/blogs/security/analyze-and-understand-iam-role-usage-with-amazon-detective/

In this blog post, we’ll demonstrate how you can use Amazon Detective’s new role session analysis feature to investigate security findings that are tied to the usage of an AWS Identity and Access Management (IAM) role. You’ll learn about how you can use this new role session analysis feature to determine which Amazon Web Services (AWS) resource assumed the role that triggered a finding, and to understand the context of the activities that the resource performed when the finding was triggered. As a result of this walkthrough, you’ll gain an understanding of how to quickly ascertain anomalous identity and access behaviors. While this demonstration utilizes an Amazon GuardDuty finding as a starting point, the techniques demonstrated within this post highlight how Detective can be utilized to investigate any access behaviors that are tied to using IAM roles.

IAM roles provide a valuable mechanism that you can use to delegate access to users and services for managing and accessing your AWS resources, but using IAM roles can make it more complex to determine who performed an action. AWS CloudTrail logs do track all usage tied to IAM roles, but attributing activity to a specific resource that assumed a role requires storage of CloudTrail logs and analysis of this log telemetry. Understanding role usage through log analysis gets even more complex if cross-account role assumptions are involved, since that requires you to collate and analyze logs from multiple accounts. In some cases, permissions may allow a resource to sequentially assume a series of different roles (role chaining), further complicating the attribution of activity to a specific resource.

With its built-in, multi-account log analysis, Detective’s new role session analysis feature provides visibility into role usage, cross-account role assumptions, and any role chaining activities that may have been performed across the accounts. With this feature, you can quickly determine who or what assumed a role, regardless of whether this was a federated user, an IAM user, or another resource. The feature shows you when roles were assumed and for how long, and helps you determine the activities that were performed during the assumption. Detective visualizes these results based upon its automatic analysis of CloudTrail logs and VPC flow log traffic that it continuously processes for enabled accounts, regardless of whether these log sources are enabled on each account.

To demonstrate this feature, we’ll investigate a “CloudTrail logging disabled” finding that is triggered by Amazon GuardDuty as a result of activity performed by a resource that has assumed an IAM role. Amazon GuardDuty is an AWS service that continuously monitors for malicious or unauthorized behavior to help protect your AWS resources, including your AWS accounts, access keys, and EC2 instances. GuardDuty identifies unusual or unauthorized activity, like crypto-currency mining, access to data stored in S3 from unusual locations, or infrastructure deployments in a region that has never been used.

Start the investigation in GuardDuty

GuardDuty issues a CloudTrailLoggingDisabled finding to alert you that CloudTrail logging has been disabled in one of your accounts. This is an important finding, because it could indicate that an attacker is attempting to hide their tracks. Since Detective receives a copy of CloudTrail traffic directly from the AWS infrastructure, Detective will continue to receive API calls that are made after CloudTrail logging is disabled.
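If you prefer to pull these findings programmatically rather than through the console, the following is a minimal sketch of how that might look. It assumes you pass in a boto3 GuardDuty client and your detector ID; the function names are hypothetical, and the finding type string is the one GuardDuty documents for this finding, so verify it against your own findings before relying on it.

```python
# Finding type GuardDuty reports for the CloudTrailLoggingDisabled finding.
CLOUDTRAIL_DISABLED_TYPE = "Stealth:IAMUser/CloudTrailLoggingDisabled"


def build_finding_criteria(finding_type):
    """Build a GuardDuty FindingCriteria filter that matches one finding type."""
    return {"Criterion": {"type": {"Equals": [finding_type]}}}


def list_cloudtrail_disabled_findings(guardduty_client, detector_id):
    """Return the IDs of CloudTrailLoggingDisabled findings for one detector.

    guardduty_client is assumed to be boto3.client("guardduty").
    """
    finding_ids = []
    kwargs = {
        "DetectorId": detector_id,
        "FindingCriteria": build_finding_criteria(CLOUDTRAIL_DISABLED_TYPE),
    }
    while True:
        response = guardduty_client.list_findings(**kwargs)
        finding_ids.extend(response.get("FindingIds", []))
        next_token = response.get("NextToken")
        if not next_token:
            return finding_ids
        # Request the next page of results.
        kwargs["NextToken"] = next_token
```

Passing the client in as an argument keeps the sketch easy to exercise with a stub before you point it at a real account.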

In order to properly investigate this type of finding and determine if this is an issue that you need to be concerned about, you’ll need to answer a few specific questions:

You’ll need to determine which user or resource disabled CloudTrail.
You’ll need to see what other actions they performed after disabling logging.
You’ll want to understand if their access pattern and behavior is consistent with their previous access patterns and behaviors.

Let’s take a look at a CloudTrailLoggingDisabled finding in GuardDuty as we start trying to answer these questions. When you access the GuardDuty console, a list of your recent findings is displayed. In Figure 1, a filter has been applied to display the CloudTrailLoggingDisabled finding.

Figure 1: A GuardDuty finding showing that CloudTrail was disabled

After you select the GuardDuty finding, you can see the finding details, including some of the user information related to the finding. Figure 2 shows the Resources affected section of the finding.

Figure 2: Viewing user data related to the GuardDuty finding

The Affected resources field indicates that the demo-trail-2 trail was where logging was disabled. You can also see that User type is set to AssumedRole and that User name contains the role AWSReservedSSO_AdministratorAccess_598c5f73f8b2b4e5. This is the role that was assumed and then used to disable CloudTrail logging. This information can help you understand the resources this role delegates access to and the permissions it provides. You still need to identify who specifically assumed the role to disable CloudTrail logging and the activities they performed afterwards. You can use Amazon Detective to answer these questions.

Investigate the finding in Detective

In order to investigate this GuardDuty finding in Detective, you select the finding and then select Investigate in the Actions menu, as shown in Figure 3.

Figure 3: Choose ‘Investigate with Detective’ and select the GuardDuty finding ID on the pop-up to investigate the finding

View the finding profile page

Choosing the Investigate action for this CloudTrailLoggingDisabled finding in GuardDuty opens the finding’s profile page in the Detective console, as shown in Figure 4. Detective has the concept of a profile page, which displays summaries and analytics gleaned from CloudTrail management logs, VPC flow traffic, and GuardDuty findings for AWS resources, IP addresses, and user agents. Each profile page can display up to 12 months of information for the selected resource and is intended to help an investigator review and understand the behavior of a resource, or quickly triage and delve into potential issues. Detective doesn’t require a customer to enable CloudTrail or VPC Flow logging in order to retrieve this data, and it provides these 12 months of visibility regardless of the customer’s log retention or archiving policies.

Figure 4: Viewing a GuardDuty finding in Detective

Scope time

To help focus your investigation, Detective defaults the time range and thus the displayed information in a finding profile to cover the period of time from when the finding was created through when it was last updated. In the case of this finding, the scope time covers a 1-hour period of time. You can change the scope time by choosing the calendar icon at the top right of the page, if you want to examine additional information before or after the finding was created. The defaulted scope time is sufficient for this investigation, so we can leave it as-is.
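To make the idea of a scope time concrete, here is a small sketch of the same kind of filtering applied to a list of log records you might hold yourself. It assumes simplified, CloudTrail-style records that carry an eventTime field in UTC; the function names are hypothetical and this is not how Detective itself is implemented.

```python
from datetime import datetime, timezone

# CloudTrail records carry UTC timestamps such as 2021-02-23T10:15:00Z.
CLOUDTRAIL_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_event_time(value):
    """Parse a CloudTrail-style UTC timestamp into an aware datetime."""
    parsed = datetime.strptime(value, CLOUDTRAIL_TIME_FORMAT)
    return parsed.replace(tzinfo=timezone.utc)


def events_in_scope(events, scope_start, scope_end):
    """Keep the events whose eventTime falls inside the scope time, inclusive."""
    start = parse_event_time(scope_start)
    end = parse_event_time(scope_end)
    return [
        event
        for event in events
        if start <= parse_event_time(event["eventTime"]) <= end
    ]
```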

Role session overview

Detective uses tabs to group information on profile pages, and for this finding it shows the role session overview tab by default. The role session represents the activities and behavior of the resource that assumed the role tied to our finding. In this case, the role was assumed by someone with the user name sara, as shown in the Assumed by field. (We’ll assume that the user’s first name is Sara.) By analyzing the role session information in the CloudTrail logs, Detective was able to immediately identify that sara was the user who disabled CloudTrail logging and caused the finding to be triggered. You now have an answer to the question of who did this action.

Before we move to answer our other questions about what Sara did after disabling logging and whether her behavior changed, let’s discuss role sessions in more detail. Every role session has a role session name, sara in this case, and a unique role session identifier. The role session identifier is the role ID of the role assumed and the role session name, concatenated together. Best practices dictate that for a specific role that’s assumed by a specific resource, the role session name represents the user name of the IAM or federated user, or includes other useful information about the resource that assumed the role (for more information, see the Naming of individual IAM role sessions blog post). In this case, because the best practices are being followed, Detective is able to track Sara’s activities and behavior each time she assumes the AWSReservedSSO_AdministratorAccess_598c5f73f8b2b4e5 role.
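As an illustration of that concatenation, the sketch below joins and splits the two parts the way they typically appear in CloudTrail records, where the role ID and the session name are separated by a colon. The role ID shown in the usage note is a made-up placeholder, and the helper names are hypothetical.

```python
def build_role_session_id(role_id, session_name):
    """Join a role ID and a role session name into a role session identifier."""
    return f"{role_id}:{session_name}"


def split_role_session_id(role_session_id):
    """Split a role session identifier back into (role_id, session_name)."""
    role_id, _, session_name = role_session_id.partition(":")
    return role_id, session_name
```

For example, build_role_session_id("AROAEXAMPLEROLEID", "sara") produces AROAEXAMPLEROLEID:sara, and split_role_session_id reverses it. A descriptive session name is what lets you map the second half back to a person or workload.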

Detective tracks statistics such as when a role session was first observed (October in Sara’s case, for this role), as well as the actions performed and behavioral insights such as the geolocations where Sara initiated her role assumptions. Knowing that Sara has assumed this role before is useful, because you can now assess whether her usage of the role changed during the 1-hour window of the scope time that you’re looking at now, compared to all of her previous assumptions of this role.

Review changes in Sara’s access patterns and operations

Detective tracks changes in geographical access and operations on the New behavior tab. Let’s choose the New behavior tab for the role session to see this information, as displayed in Figure 5.

Figure 5: Viewing new role session behavior

During a security investigation, determining that access patterns have changed can be helpful in highlighting malicious activity. Since Detective tracks Sara’s assumptions of the AWSReservedSSO_AdministratorAccess_598c5f73f8b2b4e5 role, it can show the location where Sara assumed the role and whether the current assumption took place from the same location as her previous ones.

In Figure 5, you can see that Sara has a history of assuming the AWSReservedSSO_AdministratorAccess_598c5f73f8b2b4e5 role from Bellevue, WA and Ashburn, VA, since those geographies are shown in blue. If she had assumed this role from a new location, you would see the new location indicated on the map in orange. Since the API calls being made by this user are from a previously observed location, it’s less likely that the user’s credentials were compromised. Making this determination through a manual analysis of CloudTrail logs would have been much more time consuming.

Other information that you can gather from the New behavior role session tab includes newly observed API calls, API calls with increased volume, newly observed autonomous system organizations, and newly observed user agents. It’s useful to be able to validate that the operations Sara performed during the current scope time are relatively consistent with the operations she has performed in the past. This helps us be more certain that it was indeed Sara who was conducting this activity.
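The comparison behind a view like this can be approximated with a simple baseline check. The sketch below is only an illustration of the idea, not Detective’s actual analysis: it takes the API call names seen historically and the names seen during the scope time, and reports which are newly observed and which increased in volume. The function name and the increase_factor threshold are assumptions.

```python
from collections import Counter


def summarize_new_behavior(baseline_calls, scope_calls, increase_factor=2.0):
    """Compare API call names in the scope time against a historical baseline.

    Returns the calls never seen in the baseline, and the calls whose count in
    the scope time is at least increase_factor times the baseline count.
    """
    baseline = Counter(baseline_calls)
    scope = Counter(scope_calls)
    newly_observed = sorted(set(scope) - set(baseline))
    increased = sorted(
        name
        for name in scope
        if name in baseline and scope[name] >= increase_factor * baseline[name]
    )
    return {"newly_observed": newly_observed, "increased_volume": increased}
```

Comparing raw counts across windows of different lengths is crude; a real baseline would normalize by time, which is one reason a managed view is more convenient than ad hoc log analysis.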

Investigate Sara’s API activity

Now that we’ve determined that Sara’s access pattern and activities are consistent with previous behavior, let’s use Detective to look further into Sara’s activity to determine if she accidentally disabled CloudTrail logging or if there was possible malicious intent behind her action.

To investigate the user’s actions

On the finding profile page, in the dropdown list at the top of the screen, select Overview: Role Session to go back to the Overview tab for the role session.

Figure 6: Navigating to the ‘Overview: Role Session’ page
Once you’re on the Overview tab, navigate to the Overall API call volume panel.

Figure 7: Navigating to the ‘Overall API call volume’ panel

This panel displays a chart of the successful and failed API calls that Sara has made while using the AWSReservedSSO_AdministratorAccess_598c5f73f8b2b4e5 role. The chart shows a black rectangle around activities that were performed during the scope time of the CloudTrailLoggingDisabled finding. It also displays historical activities and shows a baseline across the chart so that you can understand how actively she uses the permissions granted to her by assuming this role.
Choose the display details for scope time button to retrieve the details of the API calls that were invoked by Sara during the scope time, so that you can determine her actions after she disabled CloudTrail logging.

Figure 8: Displaying details based on scope time

You will now see the Overall API call volume panel expand to show you all the IP addresses, API calls, and access keys used by Sara during the scope time window of this finding.
Choose the API method tab to see a list of all the API calls that were made.

Figure 9: Viewing the API methods called

She invoked just two API calls during this scope time: the StopLogging and AssumeRole API calls. You were already aware that Sara disabled CloudTrail logging, but you weren’t aware that she assumed another role. When a user assumes a role while they have another one assumed, this is called role chaining. Although role chaining can be used because a user needs additional permissions, it can also be used to hide activities. Because we don’t know what other actions Sara performed after assuming this second role, let’s dig further. That may shed light on why she chose to disable CloudTrail logging.
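To see why role chaining complicates manual attribution, consider what it takes to follow a chain through raw AssumeRole records. The sketch below assumes simplified CloudTrail-style records that keep only the caller’s ARN (userIdentity.arn) and the ARN of the resulting session (responseElements.assumedRoleUser.arn). The function name is hypothetical, and for brevity it follows only the first role assumed by each caller.

```python
def follow_role_chain(events, starting_arn):
    """Return the starting ARN followed by each role session assumed in turn."""
    next_session = {}
    for event in events:
        if event.get("eventName") != "AssumeRole":
            continue
        # Failed calls have no response elements, so skip them.
        response = event.get("responseElements") or {}
        assumed = (response.get("assumedRoleUser") or {}).get("arn")
        if not assumed:
            continue
        caller = event["userIdentity"]["arn"]
        # Keep only the first role each caller assumed.
        next_session.setdefault(caller, assumed)

    chain = [starting_arn]
    seen = {starting_arn}
    while chain[-1] in next_session:
        following = next_session[chain[-1]]
        if following in seen:
            break  # Guard against loops in the chain.
        chain.append(following)
        seen.add(following)
    return chain
```

In a multi-account setup, the records for each hop may live in a different account’s trail, so you would first have to collect them in one place before a function like this could run.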

Examine chained role assumptions

To find out more about Sara’s use of role chaining, let’s look at the other role that she assumed during this role session.

To view the user’s other role

Navigate back to the top of the finding profile page. In the Role session details panel, choose AWSReservedSSO_AdministratorAccess_598c5f73f8b2b4e5.

Figure 10: Locating the ‘Assumed role’ name

Detective displays the AWS Role profile page for this role, and you can now see the activity that has occurred across all resources that have assumed this role. In order to highlight information that’s relevant to the time frame of your investigation, Detective maintains your scope time as you move from the CloudTrailLoggingDisabled finding profile page to this role profile page.
The goal for coming to this page is to determine which other role Sara assumed after assuming the AWSReservedSSO_AdministratorAccess_598c5f73f8b2b4e5 role, so choose the Resource interaction tab. On this tab, you will see the following three panels: Resources that assumed this role, Assumed roles, and Sessions involved.

In Figure 11, you can see the Resources that assumed this role panel, which lists all the AWS resources that have assumed this role, their type (EC2 instance, federated or IAM user, IAM role), their account, and when they assumed the role for the first and last time. Sara is on this list, but Detective does not show an AWS account next to her because federated users aren’t tied to a specific account. The account field is populated for other resource types that are displayed on this panel and can be useful to understand cross-account role assumptions.

Figure 11: Viewing resources that have assumed a role
On the same Resource Interaction tab, as you scroll down you will see the Assumed Roles panel, Figure 12, which helps you understand role chaining by listing the other roles that have been assumed by the AWSReservedSSO_AdministratorAccess_598c5f73f8b2b4e5 role. In this case, the role has assumed several other roles, including DemoRole1 during the same window of time when the CloudTrailLoggingDisabled finding occurred.

Figure 12: Viewing the roles that have been assumed
In Figure 13, you can see the Sessions involved panel, which shows the role sessions for all the resources that have assumed this role, and role sessions where this role has assumed other roles within the current scope time. You see two role sessions with the session name sara, one where Sara assumed the AWSReservedSSO_AdministratorAccess_598c5f73f8b2b4e5 role and another where AWSReservedSSO_AdministratorAccess_598c5f73f8b2b4e5 assumed DemoRole1.

Figure 13: Viewing the role sessions this role was involved with

Now that you know that Sara also used the role DemoRole1 during her role session, let’s take a closer look at what actions she performed.

View API operations that were called within the chained role

In this step, we’ll view Sara’s activity within the DemoRole1 role, focusing on the API calls that were made.

To view the user’s activity in another role

In the Sessions involved panel, in the Session name column, find the row where DemoRole1 is the Assumed Role value. Choose the session name in this row, sara, to go to the role session profile page.
You will be most interested in the API methods that were called during this role session, and you can view those in the Overall API call volume panel. As shown in Figure 14, you can see that Sara has accessed DemoRole1 before, because there are calls graphed prior to the calls in our scope time.
Choose the display details for scope time button on the Overall API call volume panel, and then choose the API method tab.

Figure 14: Viewing the role session API method calls

In Figure 14, you can see that calls were made to the DescribeInstances and RunInstances API methods. So you now know that Sara determined the type of Amazon Elastic Compute Cloud (Amazon EC2) instances that were running in your account and then successfully created an EC2 instance by calling the RunInstances API method. You can also see that successful and failed calls were made to the AttachRolePolicy API method as a part of the session. This could possibly be an attempt to elevate permissions in the account and would justify further investigation into the user’s actions.
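If you were producing a similar summary yourself from CloudTrail records, a tally of successful and failed calls per API method might look like the following sketch. It relies on the fact that CloudTrail records include an errorCode field when a call fails; the function name is hypothetical and the records in the usage note are illustrative only.

```python
from collections import defaultdict


def tally_api_calls(events):
    """Count succeeded and failed calls for each API method name."""
    tally = defaultdict(lambda: {"succeeded": 0, "failed": 0})
    for event in events:
        # CloudTrail sets errorCode only when the call did not succeed.
        outcome = "failed" if event.get("errorCode") else "succeeded"
        tally[event["eventName"]][outcome] += 1
    return dict(tally)
```

For instance, one successful and one denied AttachRolePolicy record would be reported as {"succeeded": 1, "failed": 1} for that method, which is the kind of mixed result that warrants a closer look.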

As an investigator, you’ve determined that Sara was the user who disabled CloudTrail logging and that her access pattern was consistent with her past accesses. You’ve also determined the other actions she performed after she disabled logging and assumed a second role, but you can continue to investigate further by answering additional questions, such as:

What did Sara do with DemoRole1 when she assumed this role in the past? Are her current activities consistent with those past activities?
What activities are being performed across this account? Are those consistent with Sara’s activities?

By using the Detective features that have been demonstrated in this post, you will be able to answer questions like the ones listed above.

Summary

After you read this post, we hope you have a better understanding of the ways in which Amazon Detective collects, organizes, and presents log data to simplify your security investigations. All Detective service subscriptions include the new role session analysis capabilities. With these capabilities, you can quickly attribute activity performed under a role to a specific resource in your environment, understand cross-account role assumptions, determine role chaining behavior, and quickly see called APIs.

All customers receive a 30-day free trial when they enable Amazon Detective. See the AWS Regional Services page for all the Regions where Detective is available. To learn more, visit the Amazon Detective product page or see the additional resources at the end of this post to further expand your knowledge of Detective capabilities and features.

Additional resources

Amazon Detective features

Amazon Detective overview and demo

Amazon Detective FAQs

Amazon Detective Regions, endpoints, and quotas

Naming of individual IAM role sessions


Detecting sensitive data in DynamoDB with Macie

2020-12-11 Sheldon Sides

Post Syndicated from Sheldon Sides original https://aws.amazon.com/blogs/security/detecting-sensitive-data-in-dynamodb-with-macie/

Amazon Macie is a fully managed data security and data privacy service that uses machine learning and pattern matching to discover and protect your sensitive data in Amazon Web Services (AWS). It gives you the ability to automatically scan for sensitive data and get an inventory of your Amazon Simple Storage Service (Amazon S3) buckets. Macie also gives you the added ability to detect which buckets are public, unencrypted, and accessible from other AWS accounts.

In this post, we’ll walk through how to use Macie to detect sensitive data in Amazon DynamoDB tables by exporting the data to Amazon S3 so that Macie can scan the data. An example of why you would deploy a solution like this is if you have potentially sensitive data stored in DynamoDB tables. When we’re finished, you’ll have a solution that can set up on-demand or scheduled Macie discovery jobs to detect sensitive data exported from DynamoDB to S3.

Architecture

In figure 1, you can see an architectural diagram explaining the flow of the solution that you’ll be deploying.

Figure 1: Solution architecture

Here’s a brief overview of the steps that you’ll take to deploy the solution. Some steps you will do manually, while others will be handled by the provided AWS CloudFormation template. The following outline describes the steps taken to extract the data from DynamoDB and store it in S3, which allows Macie to run a discovery job against the data.

Enable Amazon Macie, if it isn’t already enabled.
Deploy a test DynamoDB dataset.
Create an S3 bucket to export DynamoDB data to.
Configure an AWS Identity and Access Management (IAM) policy and role. (These are used by the Lambda function to access the S3 bucket and the DynamoDB tables.)
Deploy an AWS Lambda function to export DynamoDB data to S3.
Set up an Amazon EventBridge rule to schedule export of the DynamoDB data.
Create a Macie discovery job to discover sensitive data from the DynamoDB data export.
View the results of the Macie discovery job.

The goal is that when you finish, you have a solution that you can use to set up either on-demand or scheduled Macie discovery jobs to detect sensitive data that was exported from DynamoDB to S3.

Prerequisite: Enable Macie

If Macie hasn’t been enabled in your account, complete Step 1 in Getting started with Amazon Macie to enable Macie. Once you’ve enabled Macie, you can proceed with the deployment of the CloudFormation template.

Deploy the CloudFormation template

In this section, you start by deploying the CloudFormation template that will deploy all the resources needed for the solution. You can then review the output of the resources that have been deployed.

To deploy the CloudFormation template

Download the CloudFormation template: https://github.com/aws-samples/macie-dynamodb-blog/blob/main/src/cft.yaml
Sign in to the AWS Management Console and navigate to the CloudFormation console.
Choose Upload a template file, and then select the CloudFormation template that you downloaded in the previous step. Choose Next.

Figure 2: Uploading the CloudFormation template to be deployed
For Stack Name, name your stack macie-blog, and then choose Next.

Figure 3: Naming your CloudFormation stack
For Configure stack options, keep the default values and choose Next.
At the bottom of the Review screen, select the I acknowledge that AWS CloudFormation might create IAM resources check box, and then choose Create stack.

Figure 4: Acknowledging that this CloudFormation template will create IAM roles

You should then see the following screen. It may take several minutes for the CloudFormation template to finish deploying.

Figure 5: CloudFormation stack creation in progress

View CloudFormation output

Once the CloudFormation template has been completely deployed, choose the Outputs tab, and you will see the following screen. Here you’ll find the names and URLs for all the AWS resources that are needed to complete the remainder of the solution.

Figure 6: Completed CloudFormation stack output

For easier reference, open a new browser tab to your AWS Management Console and leave this tab open. This will make it easier to quickly copy and paste the resource URLs as you navigate to different resources during this walkthrough.

Import DynamoDB data

In this section, we walk through importing the test dataset to DynamoDB. You start by downloading the test CSV datasets, then upload those datasets to S3 and run the Lambda function that imports the data to DynamoDB. Finally, you review the data that was imported into DynamoDB.

Test datasets

Download the following test datasets:

Accounts Info test dataset (accounts.csv): https://github.com/aws-samples/macie-dynamodb-blog/blob/main/datasets/accounts.csv
People test dataset (people.csv): https://github.com/aws-samples/macie-dynamodb-blog/blob/main/datasets/people.csv

Upload data to the S3 import bucket

Now that you’ve downloaded the test datasets, you’ll need to navigate to the data import S3 bucket and upload the data.

To upload the datasets to the S3 import bucket

Navigate to the CloudFormation Outputs tab, where you’ll find the bucket information.

Figure 7: S3 bucket output values for the CloudFormation stack
Copy the ImportS3BucketURL link and navigate to the URL.
Upload the two test CSV datasets, people.csv and accounts.csv, to your S3 bucket.
After the upload is complete, you should see the two CSV files in the S3 bucket. You’ll use these files as your test DynamoDB data.

Figure 8: Test S3 datasets in the S3 bucket

View the data import Lambda function

Now that you have your test data staged for loading, you’ll import it into DynamoDB by using a Lambda function that was deployed with the CloudFormation template. To start, navigate to the CloudFormation console and get the URL to the Lambda function that will handle the data import to DynamoDB, as shown in figure 9.

Figure 9: CloudFormation output information for the People DynamoDB table

To run the data import Lambda function

Copy the LambdaImportS3DataToDynamoURL link and navigate to the URL. You will see the Import-Data-To-DynamoDB Lambda function, as shown in figure 10.

Figure 10: The Lambda function that imports data to DynamoDB
Choose the Test button in the upper right-hand corner. In the dialog screen, for Event name, enter Test, and replace the default event JSON with {}.
Your screen should now look as shown in figure 11. Choose Create.

Figure 11: Configuring a test event to manually run the Lambda function
Choose the Test button again in the upper right-hand corner. You should now see the Lambda function running, as shown in figure 12.

Figure 12: View of the Lambda function running
Once the Lambda function is finished running, you can expand the Details section. You should see a screen similar to the one in figure 13. When you see this screen, the test datasets have successfully been imported into the DynamoDB tables.

Figure 13: View of the data import Lambda function after it runs successfully
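The import function’s code isn’t reproduced in this post, but the general shape of a CSV-to-DynamoDB import can be sketched as follows. This is an assumption-laden illustration rather than the deployed function: the helper names are hypothetical, the column names in the usage note are made up, and the table argument is assumed to behave like a boto3 DynamoDB Table resource, which exposes put_item(Item=...).

```python
import csv
import io


def csv_to_items(csv_text):
    """Turn CSV text with a header row into a list of item dictionaries."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


def import_items(table, items):
    """Write each item to the table and return how many were written.

    table is assumed to be a boto3 DynamoDB Table resource.
    """
    for item in items:
        table.put_item(Item=item)
    return len(items)
```

With CSV text such as "id,name" followed by "1,Ana", csv_to_items yields one dictionary per row keyed by the header names, and every value is kept as a string.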

View the DynamoDB test dataset

Now that you have the datasets imported, you can look at the data in the console.

To view the test dataset

Navigate to the two DynamoDB tables. You can do this by getting the URL values from the CloudFormation Outputs tab. Figure 14 shows the URL for the accounts tables.

Figure 14: Output values for CloudFormation stack DynamoDB account tables

Figure 15 shows the URL for the people tables.

Figure 15: Output values for CloudFormation stack DynamoDB people tables
Copy the AccountsDynamoDBTableURL link value and navigate to it in the browser. Then choose the Items tab.

Figure 16: View of DynamoDB account-info-macie table data

You should now see a screen showing data similar to the screen in figure 16. This DynamoDB table stores the test account data. After the data has been exported to S3, you will run a Macie discovery job against it.
Navigate to the PeopleDynamoDBTableURL link that is in the CloudFormation output. Then choose the Items tab.

Figure 17: View of DynamoDB people table data

You should now see a screen showing data similar to the screen in figure 17. This DynamoDB table stores the test people data. After the data has been exported to S3, you will run a Macie discovery job against it.

Export DynamoDB data to S3

In the previous section, you set everything up and staged the data to DynamoDB. In this section, you will export data from DynamoDB to S3.

View the EventBridge rule

The EventBridge rule that was deployed earlier allows you to automatically schedule the export of DynamoDB data to S3. You can schedule the export at an interval of minutes, hours, or days. The purpose of the EventBridge rule is to allow you to set up an automated data pipeline from DynamoDB to S3. For demonstration purposes, you’ll manually run the Lambda function that the EventBridge rule uses, so that you can see the data being exported to S3 without having to wait.

To view the EventBridge rule

Navigate to the CloudFormation Outputs tab for the CloudFormation stack you deployed earlier.

Figure 18: CloudFormation output information for the EventBridge rule
Navigate to the EventBridgeRule link. You should see the following screen.

Figure 19: EventBridge rule configuration details page

On this screen, you can see that we’ve set the event schedule to run every hour. We have set it to 1 hour for demonstration purposes only, and you can change the interval to fit your business needs. To change the interval, choose the Edit button, make your changes, and then save the rule.
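EventBridge expresses fixed-interval schedules as rate expressions such as rate(1 hour), where the unit is singular for a value of 1 and plural otherwise. If you manage the rule outside the console, a small helper like the following sketch can build that string; the function name is hypothetical.

```python
def rate_expression(value, unit):
    """Build an EventBridge rate expression such as rate(1 hour).

    unit must be "minute", "hour", or "day"; it is pluralized when value > 1.
    """
    if unit not in ("minute", "hour", "day"):
        raise ValueError(f"unsupported unit: {unit!r}")
    if not isinstance(value, int) or value < 1:
        raise ValueError("value must be a positive integer")
    suffix = "" if value == 1 else "s"
    return f"rate({value} {unit}{suffix})"
```

For example, rate_expression(1, "hour") matches the hourly schedule used in this walkthrough, and rate_expression(15, "minute") would produce rate(15 minutes).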

In the Target(s) section, we’ve configured a Lambda function named Export-DynamoDB-Data-To-S3 to handle the process of exporting data to the S3 bucket the Macie discovery job will run against. We will cover the Lambda function that handles the export of the data from DynamoDB next.

View the data export Lambda function

In this section, you’ll take a look at the Lambda function that handles the exporting of DynamoDB data to the S3 bucket that Macie will run its discovery job against.

To view the Lambda function

Navigate to the CloudFormation Outputs tab for the CloudFormation stack you deployed earlier.

Figure 20: CloudFormation output information for the Lambda function that exports DynamoDB data to S3
Copy the link value for LambdaExportDynamoDBDataToS3URL and navigate to the URL in your browser. You should see the Python code that will handle the exporting of data to S3. The code has been commented so that you can easily follow it and refactor it for your needs.
Scroll to the Environment variables section.

Figure 21: Environment variables used by the Lambda function

You will see two environment variables:

bucket_to_export_to – This environment variable is used by the function as the S3 bucket location to save the DynamoDB data to. This is the bucket that the Macie discovery job will run against.
dynamo_db_tables – This environment variable is a comma-delimited list of DynamoDB tables that will be read and have data exported to S3. If there was another table that you wanted to export data from, you would simply add it to the comma-delimited list and it would be part of the export.
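A function that consumes these two variables might read them along the lines of the following sketch. The variable names come from the list above; the helper name and the example values in the usage note are hypothetical, and the deployed function may parse them differently.

```python
import os


def load_export_config(environ=os.environ):
    """Read the export bucket and the list of DynamoDB tables to export."""
    bucket = environ["bucket_to_export_to"]
    # Split the comma-delimited list and ignore stray whitespace or empty entries.
    tables = [
        name.strip()
        for name in environ["dynamo_db_tables"].split(",")
        if name.strip()
    ]
    return bucket, tables
```

Given a value such as "table-one, table-two" for dynamo_db_tables, the helper returns the two table names with the surrounding whitespace removed, so adding a table is just a matter of appending it to the list.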

Export DynamoDB data

In this section, you will manually run the Lambda function to export the DynamoDB tables data to S3. As stated previously, you would normally allow the EventBridge rule to handle the automated export of the data to S3. In order to see the export in action, you’re going to manually run the function.

To run the export Lambda function

In the console, scroll back to the top of the screen and choose the Test button.
Name the test dynamoDBExportTest, and for the test data create an empty JSON object “{}” as shown in figure 22.

Figure 22: Configuring a test event to manually test the data export Lambda function
Choose Create.
Choose the Test button again to run the Lambda function to export the DynamoDB data to S3.

Figure 23: View of the screen where you run the Lambda function to export data
It could take about one minute to export the data from DynamoDB to S3. Once the Lambda function exports the data, you should see a screen similar to the following one.

Figure 24: The result after you successfully run the data export Lambda function
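If you prefer to trigger the export from a script rather than the console, the same empty test event can be sent with the AWS SDK for Python (boto3). The following is a sketch under that assumption; run_export is a hypothetical helper, and the function name must be replaced with the one deployed in your account.

```python
import json


def run_export(lambda_client, function_name):
    """Invoke the function synchronously with an empty JSON event, like the console test."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the function to finish
        Payload=json.dumps({}).encode("utf-8"),
    )
    # The FunctionError key is only present when the function raised an error.
    if "FunctionError" in response:
        raise RuntimeError(response["Payload"].read().decode("utf-8"))
    return response["StatusCode"]


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   run_export(boto3.client("lambda"), "<your-export-function-name>")
```

The client is passed in as an argument so that the helper can be exercised with a stub before you point it at a real account.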

View the exported DynamoDB data

Now that the DynamoDB data has been exported for Macie to run discovery jobs against, you can navigate to S3 to verify that the files were exported to the bucket.

To view the data, navigate to the CloudFormation stack Outputs tab. Find the ExportS3BucketURL, shown in figure 25, and navigate to the link.

Figure 25: CloudFormation output information for the S3 buckets that the DynamoDB data was exported to
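The stack outputs referenced throughout this walkthrough can also be read with boto3 instead of the console. This sketch assumes the stack is named macie-blog, as in the cleanup section; stack_outputs is a hypothetical helper name.

```python
def stack_outputs(cfn_client, stack_name="macie-blog"):
    """Return a stack's outputs as an {OutputKey: OutputValue} dictionary."""
    stack = cfn_client.describe_stacks(StackName=stack_name)["Stacks"][0]
    return {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   outputs = stack_outputs(boto3.client("cloudformation"))
#   print(outputs["ExportS3BucketURL"])
```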

You should then see two different JSON files for the two DynamoDB tables that data was exported from, as shown in figure 26.

Figure 26: View of S3 objects that were exported to S3

This is the file naming convention that’s used for the files:

<Service-name>-<DynamoDB-Table-Name>-<AWS-Region>-<DateAndTime>.json
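To make the convention concrete, the following sketch builds an object key with the same four parts. The timestamp layout and the example values are assumptions for illustration only; the deployed Lambda function may format the date and time differently.

```python
from datetime import datetime, timezone


def export_object_key(service, table, region, when):
    """Build a key with the service, table, Region, and timestamp parts joined by hyphens."""
    stamp = when.strftime("%Y-%m-%dT%H-%M-%S")  # assumed timestamp layout
    return f"{service}-{table}-{region}-{stamp}.json"


# Hypothetical values for illustration.
print(export_object_key("dynamodb", "example-table", "us-east-1",
                        datetime.now(timezone.utc)))
```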

Next, you’ll create a Macie discovery job to run against the files in this S3 bucket to discover sensitive data.

Create the Macie discovery job

In this section, you’ll create a Macie discovery job and view the results after the job has finished running.

To create the discovery job

In the AWS Management Console, navigate to Macie. In the left-hand menu, choose Jobs.

Figure 27: Navigation menu to Macie discovery jobs
Choose the Create job button.

Figure 28: Macie discovery job list screen
Using the Bucket Name filter, search for the S3 bucket that the DynamoDB data was exported to. This can be found in the CloudFormation stack output, as shown in figure 29.

Figure 29: CloudFormation stack output
Select the value you see for ExportS3BucketName, as shown in figure 30.

Note: The value you see for your bucket name will be slightly different, based on the random characters added to the end of the bucket name generated by CloudFormation.

Figure 30: Selecting the S3 bucket to include in the Macie discovery job
Once you’ve found the S3 bucket, select the check box next to it, and then choose Next.
On the Review S3 Buckets screen, if you’re satisfied with the selected buckets, choose Next.

Following are some important options when setting up Macie data discovery jobs.

Scheduling
You have the following scheduling options for the data discovery job:

Daily
Weekly
Monthly

Data Sampling
This allows you to randomly sample a percentage of the data that the Macie discovery job will run against.

Object criteria
This enables you to target objects based on certain metadata values. The values are:

Tags – Target objects with certain tags.
Last modified – Target objects based on when they were last modified.
File extensions – Target objects based on file extensions.
Object size – Target objects based on the file size.

You can include or exclude objects based on these object criteria filters.

Set the discovery job scope

For demonstration purposes, this will be a one-time discovery job.

To set the discovery job scope

On the Scope page that appears after you create the job, set the following options for the job scope:
1. Select the One-time job option.
2. Leave Sampling depth set to 100%, and choose Next.
  
  Figure 31: Selecting the objects that should be in scope for this discovery job
On the Custom data identifiers screen, select account_number, and then choose Next. With the custom identifier, you can create custom business logic to look for certain patterns in files stored in S3. In this example, the job generates a finding for any file that contains data with the following format:
Account Number Format: Starts with “XYZ-” followed by 11 digits

The logic to create a custom data identifier can be found in the CloudFormation template.

Figure 32: Custom data identifiers
Give your discovery job the name dynamodb-macie-discovery-job. For Description, enter Discovery job to detect sensitive data exported from DynamoDB, and choose Next.

Figure 33: Giving the Macie discovery job a name and description

You will then see the Review and create screen, as shown in figure 34.

Figure 34: The Macie discovery job review screen
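The account_number custom data identifier selected earlier matches values that start with “XYZ-” followed by 11 digits. The exact expression lives in the CloudFormation template and is not reproduced here; the pattern below is one plausible way to express that format, shown so that you can try it against sample text locally.

```python
import re

# Assumed pattern: the literal prefix "XYZ-" followed by exactly 11 ASCII digits.
ACCOUNT_NUMBER = re.compile(r"\bXYZ-[0-9]{11}\b")


def find_account_numbers(text):
    """Return every substring that matches the assumed account number format."""
    return ACCOUNT_NUMBER.findall(text)


# Hypothetical sample record for illustration.
print(find_account_numbers('{"account": "XYZ-12345678901", "note": "XYZ-123"}'))
```

The word boundaries keep the pattern from matching a longer run of digits, so a value with 12 digits after the prefix is not reported.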

Note: Macie must have proper permissions to decrypt objects that are part of the Macie discovery job. The CloudFormation template that you deployed during the initial setup has already deployed an AWS Key Management Service (AWS KMS) key with the proper permissions.

For this proof of concept you won’t store the results, so you can select the check box next to Override this requirement. If you wanted to store detailed results of the discovery job long term, you would configure a repository for data discovery results. To view detailed steps for setting this up, see Storing and retaining discovery results with Amazon Macie.
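The console choices made in this section correspond to a single CreateClassificationJob request. The sketch below builds that request for boto3's macie2 client; one_time_job_request is a hypothetical helper, and the account ID, bucket name, and custom data identifier ID are placeholders that you would replace with your own values.

```python
import uuid


def one_time_job_request(account_id, bucket_name, custom_identifier_ids):
    """Build keyword arguments for macie2.create_classification_job."""
    return {
        "clientToken": str(uuid.uuid4()),  # idempotency token
        "jobType": "ONE_TIME",
        "name": "dynamodb-macie-discovery-job",
        "description": "Discovery job to detect sensitive data exported from DynamoDB",
        "samplingPercentage": 100,  # scan every eligible object
        "customDataIdentifierIds": list(custom_identifier_ids),
        "s3JobDefinition": {
            "bucketDefinitions": [
                {"accountId": account_id, "buckets": [bucket_name]}
            ]
        },
    }


# Usage (requires boto3 and AWS credentials; all three values are placeholders):
#   import boto3
#   request = one_time_job_request("111122223333", "<export-bucket-name>", ["<identifier-id>"])
#   job_id = boto3.client("macie2").create_classification_job(**request)["jobId"]
```

Note that the API expects the ID of each custom data identifier rather than its display name.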

Submit the discovery job

Next, you can submit the discovery job. On the Review and create screen, choose the Submit button to start the discovery job. You should see a screen similar to the following.

Figure 35: A Macie discovery job run that is in progress

The amount of data that is being scanned dictates how long the job will take to run. You can choose the Refresh button at the top of the screen to see the updated status of the job. This job, based on the size of the test dataset, will take about seven minutes to complete.
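Instead of choosing Refresh repeatedly, you could poll the job status with boto3. This is a sketch, assuming that you have the job ID returned when the job was created; wait_for_job and its default intervals are illustrative choices rather than recommended values.

```python
import time

# Statuses after which a one-time job will not make further progress.
TERMINAL_STATES = {"COMPLETE", "CANCELLED"}


def wait_for_job(macie_client, job_id, poll_seconds=30, max_polls=40, sleep=time.sleep):
    """Poll describe_classification_job until the job completes or is cancelled."""
    for _ in range(max_polls):
        status = macie_client.describe_classification_job(jobId=job_id)["jobStatus"]
        if status in TERMINAL_STATES:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"Job {job_id} did not finish after {max_polls} polls")


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   print(wait_for_job(boto3.client("macie2"), "<job-id>"))
```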

Review the job results

Now that the Macie discovery job has run, you can review the results to see what sensitive data was discovered in the data exported from DynamoDB.

You should see the following screen once the job has successfully run.

Figure 36: View of the completed Macie discovery job

On the right, you should see another pane with more information related to the discovery job. The pane should look like the following screen.

Figure 37: Summary showing which S3 bucket the discovery job ran against and start and complete time

Note: If you don’t see this pane, choose the discovery job to have this information displayed.

To review the job results

On the page for the discovery job, in the Show Results list, select Show findings.

Figure 38: Option to view discovery job findings
The Findings screen appears, as follows.

Figure 39: Viewing the list of findings generated by the Macie discovery job

The discovery job that you ran has two different “High Severity” finding types:

SensitiveData:S3Object/Personal – The object contains personal information, such as full names or identification numbers.

SensitiveData:S3Object/Multiple – The object contains more than one type of sensitive data.

Learn more about Macie findings types.
Choose the SensitiveData:S3Object/Personal finding type, and you will see an information pane appear to the right, as shown in figure 40. Some of the key information that you can find here:
Severity – What the severity of the finding is: Low, Medium, or High.
Resource – The S3 bucket where the S3 object exists that caused the finding to be generated.
Region – The Region where the S3 bucket exists.

Figure 40: Viewing the severity of the discovery job finding

Since the finding is based on the detection of personal information in the S3 object, you get the number of times and type of personal data that was discovered, as shown in figure 41.

Figure 41: Viewing the number of social security numbers that were discovered in the finding

Here you can see that 10 names were detected in the data that you exported from the DynamoDB table. Occurrences of name equals 10 line ranges, which tells you that the names were found on 10 different lines in the file. If you choose the 10 line ranges link, you are given the starting line and column in the document where the name was discovered.
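The counts and line ranges shown in the console are also present in the finding's JSON. The following sketch reads them from a finding dictionary. The sample_finding value is a hypothetical, heavily trimmed record that only mirrors the general shape of a Macie sensitive data finding; it is not output from this walkthrough.

```python
def detection_summary(finding):
    """Map each detected data type to (count, [(line, column), ...])."""
    summary = {}
    result = finding["classificationDetails"]["result"]
    for category in result.get("sensitiveData", []):
        for detection in category.get("detections", []):
            ranges = detection.get("occurrences", {}).get("lineRanges", [])
            positions = [(r["start"], r["startColumn"]) for r in ranges]
            summary[detection["type"]] = (detection["count"], positions)
    return summary


# Hypothetical, trimmed finding used only to show the structure.
sample_finding = {
    "classificationDetails": {"result": {"sensitiveData": [{
        "category": "PERSONAL_INFORMATION",
        "detections": [{
            "type": "NAME",
            "count": 2,
            "occurrences": {"lineRanges": [
                {"start": 3, "end": 3, "startColumn": 14},
                {"start": 7, "end": 7, "startColumn": 14},
            ]},
        }],
    }]}}
}
print(detection_summary(sample_finding))
```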

The S3 object that triggered the finding is displayed in the Resource affected section, as shown in figure 42.

Figure 42: The S3 object that generated the Macie finding

Now that you know which S3 object contains the sensitive data, you can investigate further to take appropriate action to protect the data.
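The findings reviewed in this section can also be retrieved with boto3's macie2 client by filtering on the job ID and finding type. This is a sketch: findings_for_job is a hypothetical helper, and it reads only the first page of results, which is enough for the two findings in this walkthrough.

```python
def findings_for_job(macie_client, job_id, finding_type):
    """Return full finding records of one type that were produced by one discovery job."""
    criteria = {
        "criterion": {
            "classificationDetails.jobId": {"eq": [job_id]},
            "type": {"eq": [finding_type]},
        }
    }
    finding_ids = macie_client.list_findings(findingCriteria=criteria)["findingIds"]
    if not finding_ids:
        return []
    # get_findings accepts a limited number of IDs per call, so cap the batch at 50.
    return macie_client.get_findings(findingIds=finding_ids[:50])["findings"]


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   found = findings_for_job(boto3.client("macie2"), "<job-id>",
#                            "SensitiveData:S3Object/Personal")
```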

View the Macie finding details

In this section, you will walk through how to read and download the objects related to the Macie discovery job.

To download and view the S3 object that contains the finding

In the Overview section of the finding details, select the value for the Resource link. You will then be taken to the object in the S3 bucket.

Figure 43: Viewing the S3 bucket where the object is located that generated the Macie finding
You can then download the S3 object from the S3 bucket to view the file content and further investigate it for sensitive data. Select the check box next to the S3 object, and choose the Download button at the top of the screen.

Next, we will look at the SensitiveData:S3Object/Multiple finding type that was generated. This finding type lets us know that there are multiple types of potentially sensitive data related to an object stored in S3.
In the left navigation menu, navigate back to the Jobs menu.
Choose the job that you created in the previous steps. In the Show Results list, select Show Findings.
Select the SensitiveData:S3Object/Multiple finding type. An information pane appears to the right. As with the previous finding, you will see the severity, Region, S3 bucket location, and other relevant information about the finding. For this finding, we will focus on the Custom data identifiers and Personal info sections.

Figure 44: Details about the sensitive data that was discovered by the Macie discovery job

Here you can see that the discovery job found 10 names on 10 different lines in the file. Also, you can see that 10 account numbers were discovered on 10 different lines in the file, based on the custom identifier that was included as part of the discovery job.

This finding demonstrates how you can use the built-in Macie identifiers, such as names, and also include custom business logic based on your organization’s needs by using Macie custom data identifiers.

To view the data and investigate further, follow the same steps as in the previous finding you investigated.
Navigate to the top of the screen and in the Overview section, locate the Resource.

Figure 45: Viewing the S3 bucket where the object is located that generated the Macie finding
Choose Resource, which will take you to the S3 object to download. You can now view the contents of the file and investigate further.

You’ve now created a Macie discovery job to scan for sensitive data stored in an S3 bucket that originated in DynamoDB. You can also automate this solution further by using EventBridge rules to detect Macie findings to take actions against those objects with sensitive data.
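As a starting point for that automation, the sketch below builds an EventBridge event pattern that matches Macie findings and, optionally, filters on severity. The helper name and the rule name in the usage comment are hypothetical; you would still need to attach a target, such as a Lambda function, that takes action on the affected objects.

```python
import json


def macie_finding_pattern(severities=("High",)):
    """Build an EventBridge event pattern that matches Macie findings by severity."""
    return {
        "source": ["aws.macie"],
        "detail-type": ["Macie Finding"],
        "detail": {"severity": {"description": list(severities)}},
    }


print(json.dumps(macie_finding_pattern(), indent=2))

# Usage (requires boto3 and AWS credentials; the rule name is a placeholder):
#   import boto3
#   boto3.client("events").put_rule(
#       Name="macie-high-severity-findings",
#       EventPattern=json.dumps(macie_finding_pattern()),
#   )
```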

Solution cleanup

To clean up the solution that you just deployed, complete the following steps. You need to complete these steps to stop data from being exported from DynamoDB to S3 every hour.

To perform cleanup

Navigate to the S3 buckets used to import and export data. You can find the bucket names in the CloudFormation Outputs tab in the console, as shown in figure 7 and figure 25.
After you’ve navigated to each of the buckets, delete all objects from the bucket.
Navigate to the CloudFormation console, and then delete the CloudFormation stack named macie-blog. After the stack is deleted, the solution will no longer be deployed in your AWS account.
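The cleanup steps above can be sketched with boto3 as follows. The clean_up helper is hypothetical, the bucket names are placeholders taken from your stack outputs, and the sketch assumes that the buckets are not versioned (a versioned bucket would also need its object versions deleted).

```python
def clean_up(s3_resource, cfn_client, bucket_names, stack_name="macie-blog"):
    """Empty the import and export buckets, then delete the CloudFormation stack."""
    for name in bucket_names:
        # Delete every current object so that CloudFormation can remove the bucket.
        s3_resource.Bucket(name).objects.all().delete()
    cfn_client.delete_stack(StackName=stack_name)


# Usage (requires boto3 and AWS credentials; bucket names are placeholders):
#   import boto3
#   clean_up(boto3.resource("s3"), boto3.client("cloudformation"),
#            ["<import-bucket-name>", "<export-bucket-name>"])
```

This permanently deletes the objects in both buckets, so confirm the bucket names before you run it.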

Summary

After deploying the solution, we hope you have a better understanding of how you can use Macie to detect sensitive data from other data sources, such as DynamoDB, as outlined in this post. The following are links to resources that you can use to further expand your knowledge of Amazon Macie capabilities and features.

Additional resources

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.
