Tag Archives: Sensitive Data Discovery

Detect Stripe keys in S3 buckets with Amazon Macie

Post Syndicated from Koulick Ghosh original https://aws.amazon.com/blogs/security/detect-stripe-keys-in-s3-buckets-with-amazon-macie/

Many customers building applications on Amazon Web Services (AWS) use Stripe global payment services to help get their product out faster and grow revenue, especially in the internet economy. It’s critical for customers to securely and properly handle the credentials used to authenticate with Stripe services. Much like your AWS API keys, which enable access to your AWS resources, Stripe API keys grant access to the Stripe account, which allows for the movement of real money. Therefore, you must keep Stripe’s API keys secret and well-controlled. And, much like AWS keys, it’s important to invalidate and re-issue Stripe API keys that have been inadvertently committed to GitHub, emitted in logs, or uploaded to Amazon Simple Storage Service (Amazon S3).

Customers have asked us for ways to reduce the risk of unintentionally exposing Stripe API keys, especially when code files and repositories are stored in Amazon S3. To help meet this need, we collaborated with Stripe to develop a new managed data identifier that you can use to help discover and protect Stripe API keys.

“I’m really glad we could collaborate with AWS to introduce a new managed data identifier in Amazon Macie. Mutual customers of AWS and Stripe can now scan S3 buckets to detect exposed Stripe API keys.”
Martin Pool, Staff Engineer in Cloud Security at Stripe

In this post, we will show you how to use the new managed data identifier in Amazon Macie to discover and protect copies of your Stripe API keys.

About Stripe API keys

Stripe provides payment processing software and services for businesses. Using Stripe’s technology, businesses can accept online payments from customers around the globe.

Stripe authenticates API requests by using API keys, which are included in the request. Stripe takes various measures to help customers keep their secret keys safe and secure. Stripe users can generate test-mode keys, which can only access simulated test data, and which doesn’t move real money. Stripe encourages its customers to use only test API keys for testing and development purposes to reduce the risk of inadvertent disclosure of live keys or of accidentally generating real charges.

Stripe also supports publishable keys, which you can make publicly accessible in your web or mobile app’s client-side code to collect payment information.

In this blog post, we focus on live-mode keys, which are the primary security concern because they can access your real data and cause money movement. These keys should be closely held within the production services that need to use them. Stripe allows keys to be restricted to read or write specific API resources, or used only from certain IP ranges, but even with these restrictions, you should still handle live mode keys with caution.

Stripe keys have distinctive prefixes to help you detect them such as sk_live_ for secret keys, and rk_live_ for restricted keys (which are also secret).

Amazon Macie

Amazon Macie is a fully managed service that uses machine learning (ML) and pattern matching to discover and help protect your sensitive data, such as personally identifiable information. Macie can also provide detailed visibility into your data and help you align with compliance requirements by identifying data that needs to be protected under various regulations, such as the General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA).

Macie supports a suite of managed data identifiers to make it simpler for you to configure and adopt. Managed data identifiers are prebuilt, customizable patterns that help automatically identify sensitive data, such as credit card numbers, social security numbers, and email addresses.

Now, Macie has a new managed data identifier STRIPE_CREDENTIALS that you can use to identify Stripe API secret keys.

Configure Amazon Macie to detect Stripe credentials

In this section, we show you how to use the managed data identifier STRIPE_CREDENTIALS to detect Stripe API secret keys. We recommend that you carry out these tutorial steps in an AWS account dedicated to experimentation and exploration before you move forward with detection in a production environment.

Prerequisites

To follow along with this walkthrough, complete the following prerequisites.

Create example data

The first step is to create some example objects in an S3 bucket in the AWS account. The objects contain strings that resemble Stripe secret keys. You will use the example data later to demonstrate how Macie can detect Stripe secret keys.

To create the example data

  1. Open the S3 console and create an S3 bucket.
  2. Create four files locally, paste the following mock sensitive data into those files, and upload them to the bucket.
    file1
     stripe publishable key sk_live_cpegcLxKILlrXYNIuqYhGXoy
    
    file2
     sk_live_cpegcLxKILlrXYNIuqYhGXoy
     sk_live_abcdcLxKILlrXYNIuqYhGXoy
     sk_live_efghcLxKILlrXYNIuqYhGXoy
     stripe payment sk_live_ijklcLxKILlrXYNIuqYhGXoy
    
     file3
     sk_live_cpegcLxKILlrXYNIuqYhGXoy
     stripe api key sk_live_abcdcLxKILlrXYNIuqYhGXoy
    
     file4
     stripe secret key sk_live_cpegcLxKILlrXYNIuqYhGXoy

Note: The keys mentioned in the preceding files are mock data and aren’t related to actual live Stripe keys.

Create a Macie job with the STRIPE_CREDENTIALS managed data identifier

Using Macie, you can scan your S3 buckets for sensitive data and security risks. In this step, you run a one-time Macie job to scan an S3 bucket and review the findings.

To create a Macie job with STRIPE_CREDENTIALS

  1. Open the Amazon Macie console, and in the left navigation pane, choose Jobs. On the top right, choose Create job.
    Figure 1: Create Macie Job

    Figure 1: Create Macie Job

  2. Select the bucket that you want Macie to scan or specify bucket criteria, and then choose Next.
    Figure 2: Select S3 bucket

    Figure 2: Select S3 bucket

  3. Review the details of the S3 bucket, such as estimated cost, and then choose Next.
    Figure 3: Review S3 bucket

    Figure 3: Review S3 bucket

  4. On the Refine the scope page, choose One-time job, and then choose Next.

    Note: After you successfully test, you can schedule the job to scan S3 buckets at the frequency that you choose.

    Figure 4: Select one-time job

    Figure 4: Select one-time job

  5. For Managed data identifier options, select Custom and then select Use specific managed data identifiers. For Select managed data identifiers, search for STRIPE_CREDENTIALS and then select it. Choose Next.
    Figure 5: Select managed data identifier

    Figure 5: Select managed data identifier

  6. Enter a name and an optional description for the job, and then choose Next.
    Figure 6: Enter job name

    Figure 6: Enter job name

  7. Review the job details and choose Submit. Macie will create and start the job immediately, and the job will run one time.
  8. When the Status of the job shows Complete, select the job, and from the Show results dropdown, select Show findings.
    Figure 7: Select the job and then select Show findings

    Figure 7: Select the job and then select Show findings

  9. You can now review the findings for sensitive data in your S3 bucket. As shown in Figure 8, Macie detected Stripe keys in each of the four files, and categorized the findings as High severity. You can review and manage the findings in the Macie console, retrieve them through the Macie API for further analysis, send them to Amazon EventBridge for automated processing, or publish them to AWS Security Hub for a comprehensive view of your security state.
    Figure 8: Review the findings

    Figure 8: Review the findings

Respond to unintended disclosure of Stripe API keys

If you discover Stripe live-mode keys (or other sensitive data) in an S3 bucket, then through the Stripe dashboard, you can roll your API keys to revoke access to the compromised key and generate a new one. This helps ensure that the key can’t be used to make malicious API requests. Make sure that you install the replacement key into the production services that need it. In the longer term, you can take steps to understand the path by which the key was disclosed and help prevent a recurrence.

Conclusion

In this post, you learned about the importance of safeguarding Stripe API keys on AWS. By using Amazon Macie with managed data identifiers, setting up regular reviews and restricted access to S3 buckets, training developers in security best practices, and monitoring logs and repositories, you can help mitigate the risk of key exposure and potential security breaches. By adhering to these practices, you can help ensure a robust security posture for your sensitive data on AWS.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on Amazon Macie re:Post.

Koulick Ghosh

Koulick Ghosh

Koulick is a Senior Product Manager in AWS Security based in Seattle, WA. He loves speaking with customers about how AWS Security services can help improve their security. In his free time, he enjoys playing the guitar, reading, and exploring the Pacific Northwest.

Sagar Gandha

Sagar Gandha

Sagar is an experienced Senior Technical Account Manager at AWS adept at assisting large customers in enterprise support. He offers expert guidance on best practices, facilitates access to subject matter experts, and delivers actionable insights on optimizing AWS spend, workloads, and events. Outside of work, Sagar loves spending time with his kids.

Mohan Musti

Mohan Musti

Mohan is a Senior Technical Account Manager at AWS based in Dallas. Mohan helps customers architect and optimize applications on AWS. In his spare time, he enjoys spending time with his family and camping.

How to use Amazon Macie to reduce the cost of discovering sensitive data

Post Syndicated from Nicholas Doropoulos original https://aws.amazon.com/blogs/security/how-to-use-amazon-macie-to-reduce-the-cost-of-discovering-sensitive-data/

Amazon Macie is a fully managed data security service that uses machine learning and pattern matching to discover and help protect your sensitive data, such as personally identifiable information (PII), payment card data, and Amazon Web Services (AWS) credentials. Analyzing large volumes of data for the presence of sensitive information can be expensive, due to the nature of compute-intensive operations involved in the process.

Macie offers several capabilities to help customers reduce the cost of discovering sensitive data, including automated data discovery, which can reduce your spend with new data sampling techniques that are custom-built for Amazon Simple Storage Service (Amazon S3). In this post, we will walk through such Macie capabilities and best practices so that you can cost-efficiently discover sensitive data with Macie.

Overview of the Macie pricing plan

Let’s do a quick recap of how customers pay for the Macie service. With Macie, you are charged based on three dimensions: the number of S3 buckets evaluated for bucket inventory and monitoring, the number of S3 objects monitored for automated data discovery, and the quantity of data inspected for sensitive data discovery. You can read more about these dimensions on the Macie pricing page.

The majority of the cost incurred by customers is driven by the quantity of data inspected for sensitive data discovery. For this reason, we will limit the scope of this post to techniques that you can use to optimize the quantity of data that you scan with Macie.

Not all security use cases require the same quantity of data for scanning

Broadly speaking, you can choose to scan your data in two ways—run full scans on your data or sample a portion of it. When to use which method depends on your use cases and business objectives. Full scans are useful when customers have identified what they want to scan. A few examples of such use cases are: scanning buckets that are open to the internet, monitoring a bucket for unintentionally added sensitive data by scanning every new object, or performing analysis on a bucket after a security incident.

The other option is to use sampling techniques for sensitive data discovery. This method is useful when security teams want to reduce data security risks. For example, just knowing that an S3 bucket contains credit card numbers is enough information to prioritize it for remediation activities.

Macie offers both options, and you can discover sensitive data either by creating and running sensitive data discovery jobs that perform full scans on targeted locations, or by configuring Macie to perform automated sensitive data discovery for your account or organization. You can also use both options simultaneously in Macie.

Use automated data discovery as a best practice

Automated data discovery in Macie minimizes the quantity of data scanning that is needed to a fraction of your S3 estate.

When you enable Macie for the first time, automated data discovery is enabled by default. When you already use Macie in your organization, you can enable automatic data discovery in the management console of the Amazon Macie administrator account. This Macie capability automatically starts discovering sensitive data in your S3 buckets and builds a sensitive data profile for each bucket. The profiles are organized in a visual, interactive data map, and you can use the data map to identify data security risks that need immediate attention.

Automated data discovery in Macie starts to evaluate the level of sensitivity of each of your buckets by using intelligent and fully managed data sampling techniques to minimize the quantity of data scanning needed. During evaluation, objects are organized with similar S3 metadata, such as bucket names, object-key prefixes, file-type extensions, and storage class, into groups that are likely to have similar content. Macie then selects small, but representative, samples from each identified group of objects and scans them to detect the presence of sensitive data. Macie has a feedback loop that uses the results of previously scanned samples to prioritize the next set of samples to inspect.

The automated sensitive data discovery feature is designed to detect sensitive data at scale across hundreds of buckets and accounts, which makes it easier to identify the S3 buckets that need to be prioritized for more focused scanning. Because the amount of data that needs to be scanned is reduced, this task can be done at fraction of the cost of running a full data inspection across all your S3 buckets. The Macie console displays the scanning results as a heat map (Figure 1), which shows the consolidated information grouped by account, and whether a bucket is sensitive, not sensitive, or not analyzed yet.

Figure 1: A heat map showing the results of automated sensitive data discovery

Figure 1: A heat map showing the results of automated sensitive data discovery

There is a 30-day free trial period when you enable automatic data discovery on your AWS account. During the trial period, in the Macie console, you can see the estimated cost of running automated sensitive data discovery after the trial period ends. After the evaluation period, we charge based on the total quantity of S3 objects in your account, as well as the bytes that are scanned for sensitive content. Charges are prorated per day. You can disable this capability at any time.

Tune your monthly spend on automated sensitive data discovery

To further reduce your monthly spend on automated sensitive data, Macie allows you to exclude buckets from automated discovery. For example, you might consider excluding buckets that are used for storing operational logs, if you’re sure they don’t contain any sensitive information. Your monthly spend is reduced roughly by the percentage of data in those excluded buckets compared to your total S3 estate.

Figure 2 shows the setting in the heatmap area of the Macie console that you can use to exclude a bucket from automated discovery.

Figure 2: Excluding buckets from automated sensitive data discovery from the heatmap

Figure 2: Excluding buckets from automated sensitive data discovery from the heatmap

You can also use the automated data discovery settings page to specify multiple buckets to be excluded, as shown in Figure 3.

Figure 3: Excluding buckets from the automated sensitive data discovery settings page

Figure 3: Excluding buckets from the automated sensitive data discovery settings page

How to run targeted, cost-efficient sensitive data discovery jobs

Making your sensitive data discovery jobs more targeted make them more cost-efficient, because it reduces the quantity of data scanned. Consider using the following strategies:

  1. Make your sensitive data discovery jobs as targeted and specific as possible in their scope by using the Object criteria settings on the Refine the scope page, shown in Figure 4.
    Figure 4: Adjusting the scope of a sensitive data discovery job

    Figure 4: Adjusting the scope of a sensitive data discovery job

    Options to make discovery jobs more targeted include:

    • Include objects by using the “last modified” criterion — If you are aware of the frequency at which your classifiable S3-hosted objects get modified, and you want to scan the resources that changed at a particular point in time, include in your scope the objects that were modified at a certain date or time by using the “last modified” criterion.
    • Don’t scan CloudTrail logs — Identify the S3 bucket prefixes that contain AWS CloudTrail logs and exclude them from scanning.
    • Consider using random object sampling — With this option, you specify the percentage of eligible S3 objects that you want Macie to analyze when a sensitive data discovery job runs. If this value is less than 100%, Macie selects eligible objects to analyze at random, up to the specified percentage, and analyzes the data in those objects. If your data is highly consistent and you want to determine whether a specific S3 bucket, rather than each object, contains sensitive information, adjust the sampling depth accordingly.
    • Include objects with specific extensions, tags, or storage size — To fine tune the scope of a sensitive data discovery job, you can also define custom criteria that determine which S3 objects Macie includes or excludes from a job’s analysis. These criteria consist of one or more conditions that derive from properties of S3 objects. You can exclude objects with specific file name extensions, exclude objects by using tags as the criterion, and exclude objects on the basis of their storage size. For example, you can use a criteria-based job to scan the buckets associated with specific tag key/value pairs such as Environment: Production.
  2. Specify S3 bucket criteria in your job — Use a criteria-based job to scan only buckets that have public read/write access. For example, if you have 100 buckets with 10 TB of data, but only two of those buckets containing 100 GB are public, you could reduce your overall Macie cost by 99% by using a criteria-based job to classify only the public buckets.
  3. Consider scheduling jobs based on how long objects live in your S3 buckets. Running jobs at a higher frequency than needed can result in unnecessary costs in cases where objects are added and deleted frequently. For example, if you’ve determined that the S3 objects involved contain high velocity data that is expected to reside in your S3 bucket for a few days, and you’re concerned that sensitive data might remain, scheduling your jobs to run at a lower frequency will help in driving down costs. In addition, you can deselect the Include existing objects checkbox to scan only new objects.
    Figure 5: Specifying the frequency of a sensitive data discovery job

    Figure 5: Specifying the frequency of a sensitive data discovery job

  4. As a best practice, review your scheduled jobs periodically to verify that they are still meaningful to your organization. If you aren’t sure whether one of your periodic jobs continues to be fit for purpose, you can pause it so that you can investigate whether it is still needed, without incurring potentially unnecessary costs in the meantime. If you determine that a periodic job is no longer required, you can cancel it completely.
    Figure 6: Pausing a scheduled sensitive data discovery job

    Figure 6: Pausing a scheduled sensitive data discovery job

  5. If you don’t know where to start to make your jobs more targeted, use the results of Macie automated data discovery to plan your scanning strategy. Start with small buckets and the ones that have policy findings associated with them.
  6. In multi-account environments, you can monitor Macie’s usage across your organization in AWS Organizations through the usage page of the delegated administrator account. This will enable you to identify member accounts that are incurring higher costs than expected, and you can then investigate and take appropriate actions to keep expenditure low.
  7. Take advantage of the Macie pricing calculator so that you get an estimate of your Macie fees in advance.

Conclusion

In this post, we highlighted the best practices to keep in mind and configuration options to use when you discover sensitive data with Amazon Macie. We hope that you will walk away with a better understanding of when to use the automated data discovery capability and when to run targeted sensitive data discovery jobs. You can use the pointers in this post to tune the quantity of data you want to scan with Macie, so that you can continuously optimize your Macie spend.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on Amazon Macie re:Post.

Want more AWS Security news? Follow us on Twitter.

Nicholas Doropoulos

Nicholas Doropoulos

Nicholas is an AWS Cloud Security Engineer, a Bestselling Udemy Instructor, and a subject matter expert in AWS Shield, GuardDuty and Certificate Manager. Outside work, he enjoys spending his time with his wife and their beautiful baby son.

Koulick Ghosh

Koulick Ghosh

Koulick is a Senior Product Manager in AWS Security based in Seattle, WA. He loves speaking with customers on how AWS Security services can help make them more secure. In his free-time, he enjoys playing the guitar, reading, and exploring the Pacific Northwest.

How to query and visualize Macie sensitive data discovery results with Athena and QuickSight

Post Syndicated from Keith Rozario original https://aws.amazon.com/blogs/security/how-to-query-and-visualize-macie-sensitive-data-discovery-results-with-athena-and-quicksight/

Amazon Macie is a fully managed data security service that uses machine learning and pattern matching to help you discover and protect sensitive data in Amazon Simple Storage Service (Amazon S3). With Macie, you can analyze objects in your S3 buckets to detect occurrences of sensitive data, such as personally identifiable information (PII), financial information, personal health information, and access credentials.

In this post, we walk you through a solution to gain comprehensive and organization-wide visibility into which types of sensitive data are present in your S3 storage, where the data is located, and how much is present. Once enabled, Macie automatically starts discovering sensitive data in your S3 storage and builds a sensitive data profile for each bucket. The profiles are organized in a visual, interactive data map, and you can use the data map to run targeted sensitive data discovery jobs. Both automated data discovery and targeted jobs produce rich, detailed sensitive data discovery results. This solution uses Amazon Athena and Amazon QuickSight to deep-dive on the Macie results, and to help you analyze, visualize, and report on sensitive data discovered by Macie, even when the data is distributed across millions of objects, thousands of S3 buckets, and thousands of AWS accounts. Athena is an interactive query service that makes it simpler to analyze data directly in Amazon S3 using standard SQL. QuickSight is a cloud-scale business intelligence tool that connects to multiple data sources, including Athena databases and tables.

This solution is relevant to data security, data governance, and security operations engineering teams.

The challenge: how to summarize sensitive data discovered in your growing S3 storage

Macie issues findings when an object is found to contain sensitive data. In addition to findings, Macie keeps a record of each S3 object analyzed in a bucket of your choice for long-term storage. These records are known as sensitive data discovery results, and they include additional context about your data in Amazon S3. Due to the large size of the results file, Macie exports the sensitive data discovery results to an S3 bucket, so you need to take additional steps to query and visualize the results. We discuss the differences between findings and results in more detail later in this post.

With the increasing number of data privacy guidelines and compliance mandates, customers need to scale their monitoring to encompass thousands of S3 buckets across their organization. The growing volume of data to assess, and the growing list of findings from discovery jobs, can make it difficult to review and remediate issues in a timely manner. In addition to viewing individual findings for specific objects, customers need a way to comprehensively view, summarize, and monitor sensitive data discovered across their S3 buckets.

To illustrate this point, we ran a Macie sensitive data discovery job on a dataset created by AWS. The dataset contains about 7,500 files that have sensitive information, and Macie generated a finding for each sensitive file analyzed, as shown in Figure 1.

Figure 1: Macie findings from the dataset

Figure 1: Macie findings from the dataset

Your security team could spend days, if not months, analyzing these individual findings manually. Instead, we outline how you can use Athena and QuickSight to query and visualize the Macie sensitive data discovery results to understand your data security posture.

The additional information in the sensitive data discovery results will help you gain comprehensive visibility into your data security posture. With this visibility, you can answer questions such as the following:

  • What are the top 5 most commonly occurring sensitive data types?
  • Which AWS accounts have the most findings?
  • How many S3 buckets are affected by each of the sensitive data types?

Your security team can write their own customized queries to answer questions such as the following:

  • Is there sensitive data in AWS accounts that are used for development purposes?
  • Is sensitive data present in S3 buckets that previously did not contain sensitive information?
  • Was there a change in configuration for S3 buckets containing the greatest amount of sensitive data?

How are findings different from results?

As a Macie job progresses, it produces two key types of output: sensitive data findings (or findings for short), and sensitive data discovery results (or results).

Findings provide a report of potential policy violations with an S3 bucket, or the presence of sensitive data in a specific S3 object. Each finding provides a severity rating, information about the affected resource, and additional details, such as when Macie found the issue. Findings are published to the Macie console, AWS Security Hub, and Amazon EventBridge.

In contrast, results are a collection of records for each S3 object that a Macie job analyzed. These records contain information about objects that do and do not contain sensitive data, including up to 1,000 occurrences of each sensitive data type that Macie found in a given object, and whether Macie was unable to analyze an object because of issues such as permissions settings or use of an unsupported format. If an object contains sensitive data, the results record includes detailed information that isn’t available in the finding for the object.

One of the key benefits of querying results is to uncover gaps in your data protection initiatives—these gaps can occur when data in certain buckets can’t be analyzed because Macie was denied access to those buckets, or was unable to decrypt specific objects. The following table maps some of the key differences between findings and results.

Findings Results
Enabled by default Yes No
Location of published results Macie console, Security Hub, and EventBridge S3 bucket
Details of S3 objects that couldn’t be scanned No Yes
Details of S3 objects in which no sensitive data was found No Yes
Identification of files inside compressed archives that contain sensitive data No Yes
Number of occurrences reported per object Up to 15 Up to 1,000
Retention period 90 days in Macie console Defined by customer

Architecture

As shown in Figure 2, you can build out the solution in three steps:

  1. Enable the results and publish them to an S3 bucket
  2. Build out the Athena table to query the results by using SQL
  3. Visualize the results with QuickSight
Figure 2: Architecture diagram showing the flow of the solution

Figure 2: Architecture diagram showing the flow of the solution

Prerequisites

To implement the solution in this blog post, you must first complete the following prerequisites:

Figure 3: Sample data loaded into three different AWS accounts

Figure 3: Sample data loaded into three different AWS accounts

Note: All data in this blog post has been artificially created by AWS for demonstration purposes and has not been collected from any individual person. Similarly, such data does not, nor is it intended, to relate back to any individual person.

Step 1: Enable the results and publish them to an S3 bucket

Publication of the discovery results to Amazon S3 is not enabled by default. The setup requires that you specify an S3 bucket to store the results (we also refer to this as the discovery results bucket), and use an AWS Key Management Service (AWS KMS) key to encrypt the bucket.

If you are analyzing data across multiple accounts in your organization, then you need to enable the results in your delegated Macie administrator account. You do not need to enable results in individual member accounts. However, if you’re running Macie jobs in a standalone account, then you should enable the Macie results directly in that account.

To enable the results

  1. Open the Macie console.
  2. Select the AWS Region from the upper right of the page.
  3. From the left navigation pane, select Discovery results.
  4. Select Configure now.
  5. Select Create Bucket, and enter a unique bucket name. This will be the discovery results bucket name. Make note of this name because you will use it when you configure the Athena tables later in this post.
  6. Under Encryption settings, select Create new key. This takes you to the AWS KMS console in a new browser tab.
  7. In the AWS KMS console, do the following:
    1. For Key type, choose symmetric, and for Key usage, choose Encrypt and Decrypt.
    2. Enter a meaningful key alias (for example, macie-results-key) and description.
    3. (Optional) For simplicity, set your current user or role as the Key Administrator.
    4. Set your current user/role as a user of this key in the key usage permissions step. This will give you the right permissions to run the Athena queries later.
    5. Review the settings and choose Finish.
  8. Navigate to the browser tab with the Macie console.
  9. From the AWS KMS Key dropdown, select the new key.
  10. To view KMS key policy statements that were automatically generated for your specific key, account, and Region, select View Policy. Copy these statements in their entirety to your clipboard.
  11. Navigate back to the browser tab with the AWS KMS console and then do the following:
    1. Select Customer managed keys.
    2. Choose the KMS key that you created, choose Switch to policy view, and under Key policy, select Edit.
    3. In the key policy, paste the statements that you copied. When you add the statements, do not delete any existing statements and make sure that the syntax is valid. Policies are in JSON format.
  12. Navigate back to the Macie console browser tab.
  13. Review the inputs in the Settings page for Discovery results and then choose Save. Macie will perform a check to make sure that it has the right access to the KMS key, and then it will create a new S3 bucket with the required permissions.
  14. If you haven’t run a Macie discovery job in the last 90 days, you will need to run a new discovery job to publish the results to the bucket.

In this step, you created a new S3 bucket and KMS key that you are using only for Macie. For instructions on how to enable and configure the results using existing resources, see Storing and retaining sensitive data discovery results with Amazon Macie. Make sure to review Macie pricing details before creating and running a sensitive data discovery job.

Step 2: Build out the Athena table to query the results using SQL

Now that you have enabled the discovery results, Macie will begin publishing them into your discovery results bucket in the form of jsonl.gz files. Depending on the amount of data, there could be thousands of individual files, with each file containing multiple records. To identify the top five most commonly occurring sensitive data types in your organization, you would need to query all of these files together.

In this step, you will configure Athena so that it can query the results using SQL syntax. Before you can run an Athena query, you must specify a query result bucket location in Amazon S3. This is different from the Macie discovery results bucket that you created in the previous step.

If you haven’t set up Athena previously, we recommend that you create a separate S3 bucket, and specify a query result location using the Athena console. After you’ve set up the query result location, you can configure Athena.

To create a new Athena database and table for the Macie results

  1. Open the Athena console, and in the query editor, enter the following data definition language (DDL) statement. In the context of SQL, a DDL statement is a syntax for creating and modifying database objects, such as tables. For this example, we named our database macie_results.
    CREATE DATABASE macie_results;
    

    After running this step, you’ll see a new database in the Database dropdown. Make sure that the new macie_results database is selected for the next queries.

    Figure 4: Create database in the Athena console

    Figure 4: Create database in the Athena console

  2. Create a table in the database by using the following DDL statement. Make sure to replace <RESULTS-BUCKET-NAME> with the name of the discovery results bucket that you created previously.
    CREATE EXTERNAL TABLE maciedetail_all_jobs(
    	accountid string,
    	category string,
    	classificationdetails struct<jobArn:string,result:struct<status:struct<code:string,reason:string>,sizeClassified:string,mimeType:string,sensitiveData:array<struct<category:string,totalCount:string,detections:array<struct<type:string,count:string,occurrences:struct<lineRanges:array<struct<start:string,`end`:string,`startColumn`:string>>,pages:array<struct<pageNumber:string>>,records:array<struct<recordIndex:string,jsonPath:string>>,cells:array<struct<row:string,`column`:string,`columnName`:string,cellReference:string>>>>>>>,customDataIdentifiers:struct<totalCount:string,detections:array<struct<arn:string,name:string,count:string,occurrences:struct<lineRanges:array<struct<start:string,`end`:string,`startColumn`:string>>,pages:array<string>,records:array<string>,cells:array<string>>>>>>,detailedResultsLocation:string,jobId:string>,
    	createdat string,
    	description string,
    	id string,
    	partition string,
    	region string,
    	resourcesaffected struct<s3Bucket:struct<arn:string,name:string,createdAt:string,owner:struct<displayName:string,id:string>,tags:array<string>,defaultServerSideEncryption:struct<encryptionType:string,kmsMasterKeyId:string>,publicAccess:struct<permissionConfiguration:struct<bucketLevelPermissions:struct<accessControlList:struct<allowsPublicReadAccess:boolean,allowsPublicWriteAccess:boolean>,bucketPolicy:struct<allowsPublicReadAccess:boolean,allowsPublicWriteAccess:boolean>,blockPublicAccess:struct<ignorePublicAcls:boolean,restrictPublicBuckets:boolean,blockPublicAcls:boolean,blockPublicPolicy:boolean>>,accountLevelPermissions:struct<blockPublicAccess:struct<ignorePublicAcls:boolean,restrictPublicBuckets:boolean,blockPublicAcls:boolean,blockPublicPolicy:boolean>>>,effectivePermission:string>>,s3Object:struct<bucketArn:string,key:string,path:string,extension:string,lastModified:string,eTag:string,serverSideEncryption:struct<encryptionType:string,kmsMasterKeyId:string>,size:string,storageClass:string,tags:array<string>,embeddedFileDetails:struct<filePath:string,fileExtension:string,fileSize:string,fileLastModified:string>,publicAccess:boolean>>,
    	schemaversion string,
    	severity struct<description:string,score:int>,
    	title string,
    	type string,
    	updatedat string)
    ROW FORMAT SERDE
    	'org.openx.data.jsonserde.JsonSerDe'
    WITH SERDEPROPERTIES (
    	'paths'='accountId,category,classificationDetails,createdAt,description,id,partition,region,resourcesAffected,schemaVersion,severity,title,type,updatedAt')
    STORED AS INPUTFORMAT
    	'org.apache.hadoop.mapred.TextInputFormat'
    OUTPUTFORMAT
    	'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION
    	's3://<RESULTS-BUCKET-NAME>/AWSLogs/'
    

    After you complete this step, you will see a new table named maciedetail_all_jobs in the Tables section of the query editor.

  3. Query the results to start gaining insights. For example, to identify the top five most common sensitive data types, run the following query:
    select sensitive_data.category,
    	detections_data.type,
    	sum(cast(detections_data.count as INT)) total_detections
    from maciedetail_all_jobs,
    	unnest(classificationdetails.result.sensitiveData) as t(sensitive_data),
    	unnest(sensitive_data.detections) as t(detections_data)
    where classificationdetails.result.sensitiveData is not null
    and resourcesaffected.s3object.embeddedfiledetails is null
    group by sensitive_data.category, detections_data.type
    order by total_detections desc
    LIMIT 5
    

    Running this query on the sample dataset gives the following output.

    Results of a query showing the five most common sensitive data types in the dataset

    Figure 5: Results of a query showing the five most common sensitive data types in the dataset

  4. (Optional) The previous query ran on all of the results available for Macie. You can further query which accounts have the greatest amount of sensitive data detected.
    select accountid,
    	sum(cast(detections_data.count as INT)) total_detections
    from maciedetail_all_jobs,
    	unnest(classificationdetails.result.sensitiveData) as t(sensitive_data),
    	unnest(sensitive_data.detections) as t(detections_data)
    where classificationdetails.result.sensitiveData is not null
    and resourcesaffected.s3object.embeddedfiledetails is null
    group by accountid
    order by total_detections desc
    

    To test this query, we distributed the synthetic dataset across three member accounts in our organization, ran the query, and received the following output. If you enable Macie in just a single account, then you will only receive results for that one account.

    Figure 6: Query results for total number of sensitive data detections across all accounts in an organization

    Figure 6: Query results for total number of sensitive data detections across all accounts in an organization

For a list of more example queries, see the amazon-macie-results-analytics GitHub repository.

Step 3: Visualize the results with QuickSight

In the previous step, you used Athena to query your Macie discovery results. Although the queries were powerful, they only produced tabular data as their output. In this step, you will use QuickSight to visualize the results of your Macie jobs.

Before creating the visualizations, you first need to grant QuickSight the right permissions to access Athena, the results bucket, and the KMS key that you used to encrypt the results.

To allow QuickSight access to the KMS key

  1. Open the AWS Identity and Access Management (IAM) console, and then do the following:
    1. In the navigation pane, choose Roles.
    2. In the search pane for roles, search for aws-quicksight-s3-consumers-role-v0. If this role does not exist, search for aws-quicksight-service-role-v0.
    3. Select the role and copy the role ARN. You will need this role ARN to modify the KMS key policy to grant permissions for this role.
  2. Open the AWS KMS console and then do the following:
    1. Select Customer managed keys.
    2. Choose the KMS key that you created.
    3. Paste the following statement in the key policy. When you add the statement, do not delete any existing statements, and make sure that the syntax is valid. Replace <QUICKSIGHT_SERVICE_ROLE_ARN> and <KMS_KEY_ARN> with your own information. Policies are in JSON format.
	{ "Sid": "Allow Quicksight Service Role to use the key",
		"Effect": "Allow",
		"Principal": {
			"AWS": <QUICKSIGHT_SERVICE_ROLE_ARN>
		},
		"Action": "kms:Decrypt",
		"Resource": <KMS_KEY_ARN>
	}

To allow QuickSight access to Athena and the discovery results S3 bucket

  1. In QuickSight, in the upper right, choose your user icon to open the profile menu, and choose US East (N.Virginia). You can only modify permissions in this Region.
  2. In the upper right, open the profile menu again, and select Manage QuickSight.
  3. Select Security & permissions.
  4. Under QuickSight access to AWS services, choose Manage.
  5. Make sure that the S3 checkbox is selected, click on Select S3 buckets, and then do the following:
    1. Choose the discovery results bucket.
    2. You do not need to check the box under Write permissions for Athena workgroup. The write permissions are not required for this post.
    3. Select Finish.
  6. Make sure that the Amazon Athena checkbox is selected.
  7. Review the selections and be careful that you don’t inadvertently disable AWS services and resources that other users might be using.
  8. Select Save.
  9. In QuickSight, in the upper right, open the profile menu, and choose the Region where your results bucket is located.

Now that you’ve granted QuickSight the right permissions, you can begin creating visualizations.

To create a new dataset referencing the Athena table

  1. On the QuickSight start page, choose Datasets.
  2. On the Datasets page, choose New dataset.
  3. From the list of data sources, select Athena.
  4. Enter a meaningful name for the data source (for example, macie_datasource) and choose Create data source.
  5. Select the database that you created in Athena (for example, macie_results).
  6. Select the table that you created in Athena (for example, maciedetail_all_jobs), and choose Select.
  7. You can either import the data into SPICE or query the data directly. We recommend that you use SPICE for improved performance, but the visualizations will still work if you query the data directly.
  8. To create an analysis using the data as-is, choose Visualize.

You can then visualize the Macie results in the QuickSight console. The following example shows a delegated Macie administrator account that is running a visualization, with account IDs on the y axis and the count of affected resources on the x axis.

Figure 7: Visualize query results to identify total number of sensitive data detections across accounts in an organization

Figure 7: Visualize query results to identify total number of sensitive data detections across accounts in an organization

You can also visualize the aggregated data in QuickSight. For example, you can view the number of findings for each sensitive data category in each S3 bucket. The Athena table doesn’t provide aggregated data necessary for visualization. Instead, you need to query the table and then visualize the output of the query.

To query the table and visualize the output in QuickSight

  1. On the Amazon QuickSight start page, choose Datasets.
  2. On the Datasets page, choose New dataset.
  3. Select the data source that you created in Athena (for example, macie_datasource) and then choose Create Dataset.
  4. Select the database that you created in Athena (for example, macie_results).
  5. Choose Use Custom SQL, enter the following query below, and choose Confirm Query.
    	select resourcesaffected.s3bucket.name as bucket_name,
    		sensitive_data.category,
    		detections_data.type,
    		sum(cast(detections_data.count as INT)) total_detections
    	from macie_results.maciedetail_all_jobs,
    		unnest(classificationdetails.result.sensitiveData) as t(sensitive_data),unnest(sensitive_data.detections) as t(detections_data)
    where classificationdetails.result.sensitiveData is not null
    and resourcesaffected.s3object.embeddedfiledetails is null
    group by resourcesaffected.s3bucket.name, sensitive_data.category, detections_data.type
    order by total_detections desc
    	

  6. You can either import the data into SPICE or query the data directly.
  7. To create an analysis using the data as-is, choose Visualize.

Now you can visualize the output of the query that aggregates data across your S3 buckets. For example, we used the name of the S3 bucket to group the results, and then we created a donut chart of the output, as shown in Figure 6.

Figure 8: Visualize query results for total number of sensitive data detections across each S3 bucket in an organization

Figure 8: Visualize query results for total number of sensitive data detections across each S3 bucket in an organization

From the visualizations, we can identify which buckets or accounts in our organizations contain the most sensitive data, for further action. Visualizations can also act as a dashboard to track remediation.

If you encounter permissions issues, see Insufficient permissions when using Athena with Amazon QuickSight and Troubleshooting key access for troubleshooting steps.

You can replicate the preceding steps by using the sample queries from the amazon-macie-results-analytics GitHub repo to view data that is aggregated across S3 buckets, AWS accounts, or individual Macie jobs. Using these queries with the results of your Macie results will help you get started with tracking the security posture of your data in Amazon S3.

Conclusion

In this post, you learned how to enable sensitive data discovery results for Macie, query those results with Athena, and visualize the results in QuickSight.

Because Macie sensitive data discovery results provide more granular data than the findings, you can pursue a more comprehensive incident response when sensitive data is discovered. The sample queries in this post provide answers to some generic questions that you might have. After you become familiar with the structure, you can run other interesting queries on the data.

We hope that you can use this solution to write your own queries to gain further insights into sensitive data discovered in S3 buckets, according to the business needs and regulatory requirements of your organization. You can consider using this solution to better understand and identify data security risks that need immediate attention. For example, you can use this solution to answer questions such as the following:

  • Is financial information present in an AWS account where it shouldn’t be?
  • Are S3 buckets that contain PII properly hardened with access controls and encryption?

You can also use this solution to understand gaps in your data security initiatives by tracking files that Macie couldn’t analyze due to encryption or permission issues. To further expand your knowledge of Macie capabilities and features, see the following resources:

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on Amazon Macie re:Post.

Want more AWS Security news? Follow us on Twitter.

Author

Keith Rozario

Keith is a Sr. Solution Architect at Amazon Web Services based in Singapore, where he helps customers develop solutions for their most complex business problems. He loves road cycling, reading comics from DC, and enjoying the sweet sound of music from AC/DC.

Author

Scott Ward

Scott is a Principal Solutions Architect with AWS External Security Services (ESS) and has been with Amazon for over 20 years. Scott provides technical guidance to the ESS services, such as GuardDuty, Security Hub, Macie, Inspector and Detective, and helps customers make their applications secure. Scott has a deep background in supporting, enhancing, and building global financial solutions to meet the needs of large companies, including many years of supporting the global financial systems for Amazon.com.

Author

Koulick Ghosh

Koulick is a Senior Product Manager in AWS Security based in Seattle, WA. He loves speaking with customers on how AWS Security services can help make them more secure. In his free-time, he enjoys playing the guitar, reading, and exploring the Pacific Northwest.

How to use Amazon Macie to preview sensitive data in S3 buckets

Post Syndicated from Koulick Ghosh original https://aws.amazon.com/blogs/security/how-to-use-amazon-macie-to-preview-sensitive-data-in-s3-buckets/

Security teams use Amazon Macie to discover and protect sensitive data, such as names, payment card data, and AWS credentials, in Amazon Simple Storage Service (Amazon S3). When Macie discovers sensitive data, these teams will want to see examples of the actual sensitive data found. Reviewing a sampling of the discovered data helps them quickly confirm that the object is truly sensitive according to their data protection and privacy policies.

In this post, we walk you through how your data security teams are able to use a new capability in Amazon Macie to retrieve up to 10 examples of sensitive data found in your S3 objects, so that you are able to confirm the nature of the data at a glance. Additionally, we will discuss how you are able to control who is able to use this capability, so that only authorized personnel have permissions to view these examples.

The challenge customers face

After a Macie sensitive data discovery job is run, security teams start their work. The security team will review the Macie findings to investigate the discovered sensitive data and decide what actions to take to protect such data. The findings provide details that include the severity of the finding, information on the affected S3 object, and a summary of the type, location, and amount of sensitive data found. However, Macie findings only contain pointers to data that Macie found in the object. In order to complete their investigation, customers in the past had to do additional work to extract the contents of a sensitive object, such as navigating to a different AWS account where the object is located, downloading and manually searching for keywords in a file editor, or writing and refining SQL queries by using Amazon S3 Select. The investigations are further slowed down when the object type is one that is not easily readable without additional tooling, such as big-data file types like Avro and Parquet. By using the Macie capability to retrieve sensitive data samples, you are able to review the discovered data and make decisions concerning the finding remediation.

Prerequisites

To implement the ability to retrieve and reveal samples of sensitive data, you’ll need the following prerequisites:

  • Enable Amazon Macie in your AWS account. For instructions, see Getting started with Amazon Macie.
  • Set your account as the delegated Macie administrator account and enable Macie in at least one member account by using AWS Organizations. In this post, we will refer to the delegated administrator account as Account A and the member account as Account B.
  • Configure Macie detailed classification results in Account A.

    Note: The detailed classification results contain a record for each Amazon S3 object that you configure the job to analyze, and include the location of up to 1,000 occurrences of each type of sensitive data that Macie found in an object. Macie uses the location information in the detailed classification results to retrieve the examples of sensitive data. The detailed classification results are stored in an S3 bucket of your choice. In this post, we will refer to this bucket as DOC-EXAMPLE-BUCKET1.

  • Create an S3 bucket that contains sensitive data in Account B. In this post, we will refer to this bucket as DOC-EXAMPLE-BUCKET2.

    Note: You should enable server-side encryption on this bucket by using customer managed AWS Key Management Service (AWS KMS) keys (a type of encryption known as SSE-KMS).

  • (Optional) Add sensitive data to DOC-EXAMPLE-BUCKET2. This post uses a sample dataset that contains fake sensitive data. You are able to download this sample dataset, unarchive the .zip folder, and follow these steps to upload the objects to S3. This is a synthetic dataset generated by AWS that we will use for the examples in this post. All data in this blog post has been artificially created by AWS for demonstration purposes and has not been collected from any individual person. Similarly, such data does not relate back to any individual person, nor is it intended to.
  • Create and run a sensitive data discovery job from Account A to analyze the contents of DOC-EXAMPLE-BUCKET2.
  • (Optional) Set up the AWS Command Line Interface (AWS CLI).

Configure Macie to retrieve and reveal examples of sensitive data

In this section, we’ll describe how to configure Macie so that you are able to retrieve and view examples of sensitive data from Macie findings.

To configure Macie (console)

  • In the AWS Management Console, in the Macie delegated administrator account (Account A), follow these steps from the Amazon Macie User Guide.

To configure Macie (AWS CLI)

  1. Confirm that you have Macie enabled.
    	$ aws macie2 get-macie-session --query 'status'
    	// The expected response is "ENABLED"

  2. Confirm that you have configured the detailed classification results bucket.
    	$ aws macie2 get-classification-export-configuration
    
    	// The expected response is:
    	{
       	 "configuration": {
       		 	    "s3Destination": {
            		    "bucketName": " DOC-EXAMPLE-BUCKET1 ",
               			"kmsKeyArn": "arn:aws:kms:<YOUR-REGION>:<YOUR-ACCOUNT-ID>:key/<KEY-USED-TO-ENCRYPT-DOC-EXAMPLE-BUCKET1>"
         		  	 }
    		}	
    	} 

  3. Create a new KMS key to encrypt the retrieved examples of sensitive data. Make sure that the key is created in the same AWS Region where you are operating Macie.
    $ aws kms create-key
    {
        "KeyMetadata": {
            "Origin": "AWS_KMS",
            "KeyId": "<YOUR-KEY-ID>",
            "Description": "",
            "KeyManager": "CUSTOMER",
            "Enabled": true,
            "KeySpec": "SYMMETRIC_DEFAULT",
            "CustomerMasterKeySpec": "SYMMETRIC_DEFAULT",
            "KeyUsage": "ENCRYPT_DECRYPT",
            "KeyState": "Enabled",
            "CreationDate": 1502910355.475,
            "Arn": "arn:aws:kms: <YOUR-AWS-REGION>:<AWS-ACCOUNT-A>:key/<YOUR-KEY-ID>",
            "AWSAccountId": "<AWS-ACCOUNT-A>",
            "MultiRegion": false
            "EncryptionAlgorithms": [
                "SYMMETRIC_DEFAULT"
            ],
        }
    }

  4. Give this key the alias REVEAL-KMS-KEY.
    $ aws kms CreateAlias
    {
       "AliasName": " <REVEAL-KMS-KEY> ",
       "TargetKeyId": "<YOUR-KEY-ID>"
    }

  5. Enable the feature in Macie and configure it to encrypt the data by using REVEAL-KMS-KEY. You do not specify a key policy for your new KMS key in this step. The key policy will be discussed later in the post.
    $ aws macie2 update-reveal-configuration --configuration '{"status":"ENABLED","kmsKeyId":"alias/ <REVEAL-KMS-KEY> "}'
    
    // The expected response is:
    {
        "configuration": {
            "kmsKeyId": "arn:aws:kms:<YOUR-REGION>: <YOUR ACCOUNT ID>:key/<REVEAL-KMS-KEY>.",
            "status": "ENABLED"
        }
    }

Control access to read sensitive data and protect data displayed in Macie

This new Macie capability uses the AWS Identity and Access Management (IAM) policies, S3 bucket policies, and AWS KMS key policies that you have defined in your accounts. This means that in order to see examples through the Macie console or by invoking the Macie API, the IAM principal needs to have read access to the S3 object and to decrypt the object if it is server-side encrypted. It’s important to note that Macie uses the IAM permissions of the AWS principal to locate, retrieve, and reveal the samples and does not use the Macie service-linked role to perform these tasks.

Using the setup discussed in the previous section, you will walk through how to control access to the ability to retrieve and reveal sensitive data examples. To recap, you created and ran a discovery job from the Amazon Macie delegated administrator account (Account A) to analyze the contents of DOC-EXAMPLE-BUCKET2 in a member account (Account B). You configured Macie to retrieve examples and to encrypt the examples of sensitive data with the REVEAL-KMS-KEY.

The next step is to create and use an IAM role that will be assumed by other users in Account A to retrieve and reveal examples of sensitive data discovered by Macie. In this post, we’ll refer to this role as MACIE-REVEAL-ROLE.

To apply the principle of least privilege and allow only authorized personnel to view the sensitive data samples, grant the following permissions so that Macie users who assume MACIE-REVEAL-ROLE will be able to successfully retrieve and reveal examples of sensitive data:

  • Step 1 – Update the IAM policy for MACIE-REVEAL-ROLE.
  • Step 2 – Update the KMS key policy for REVEAL-KMS-KEY.
  • Step 3 – Update the S3 bucket policy for DOC-EXAMPLE-BUCKET2 and the KMS key policy used for its server-side encryption in Account B.

After you grant these permissions, MACIE-REVEAL-ROLE is succcesfully able to retrieve and reveal examples of sensitive data in DOC-EXAMPLE-BUCKET2, as shown in Figure 1.

Figure 1: Macie runs the discovery job from the delegated administrator account in a member account, and MACIE-REVEAL-ROLE retrieves examples of sensitive data

Figure 1: Macie runs the discovery job from the delegated administrator account in a member account, and MACIE-REVEAL-ROLE retrieves examples of sensitive data

Step 1: Update the IAM policy

Provide the following required permissions to MACIE-REVEAL-ROLE:

  1. Allow GetObject from DOC-EXAMPLE-BUCKET2 in Account B.
  2. Allow decryption of DOC-EXAMPLE-BUCKET2 if it is server-side encrypted with a customer managed key (SSE-KMS).
  3. Allow GetObject from DOC-EXAMPLE-BUCKET1.
  4. Allow decryption of the Macie discovery results.
  5. Allow the necessary Macie actions to retrieve and reveal sensitive data examples.

To set up the required permissions

  • Use the following commands to provide the permissions. Make sure to replace the placeholders with your own data.
    {
        "Version": "2012-10-17",
        "Statement": [
    	{
                "Sid": "AllowGetFromCompanyDataBucket",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": "arn:aws:s3:::<DOC-EXAMPLE-BUCKET2>/*"
            },
            {
                "Sid": "AllowKMSDecryptForCompanyDataBucket",
                "Effect": "Allow",
                "Action": [
                    "kms:Decrypt"
                ],
                "Resource": "arn:aws:kms:<AWS-Region>:<AWS-Account-B>:key/<KEY-USED-TO-ENCRYPT-DOC-EXAMPLE-BUCKET2>"
            },
            {
                "Sid": "AllowGetObjectfromMacieResultsBucket",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": "arn:aws:s3:::<DOC-EXAMPLE-BUCKET1>/*"
            },
    	{
                "Sid": "AllowKMSDecryptForMacieRoleDiscoveryBucket",
                "Effect": "Allow",
                "Action": [
                    "kms:Decrypt"
                ],
                "Resource": "arn:aws:kms:<AWS-REGION>:<AWS-ACCOUNT-A>:key/<KEY-USED-TO-ENCRYPT-DOC-EXAMPLE-BUCKET1>"
            },
    	{
                "Sid": "AllowActionsRetrieveAndReveal",
                "Effect": "Allow",
                "Action": [
                    "macie2:GetMacieSession",
                    "macie2:GetFindings",
                    "macie2:GetSensitiveDataOccurrencesAvailability",
                    "macie2:GetSensitiveDataOccurrences",
                    "macie2:ListFindingsFilters",
                    "macie2:GetBucketStatistics",
                    "macie2:ListMembers",
                    "macie2:ListFindings",
                    "macie2:GetFindingStatistics",
                    "macie2:GetAdministratorAccount",
                    "macie2:GetClassificationExportConfiguration",
                    "macie2:GetRevealConfiguration",
                    "macie2:DescribeBuckets"
                ],
                "Resource": "*” 
            }
        ]
    }

Step 2: Update the KMS key policy

Next, update the KMS key policy that is used to encrypt sensitive data samples that you retrieve and reveal in your delegated administrator account.

To update the key policy

  • Allow the MACIE-REVEAL-ROLE access to the KMS key that you created for protecting the retrieved sensitive data, using the following commands. Make sure to replace the placeholders with your own data.
    	{
                "Sid": "AllowMacieRoleDecrypt",
                "Effect": "Allow",
                "Principal": {
                    "AWS": "arn:aws:iam:<AWS-REGION>:<AWS-ACCOUNT-A>:role/<MACIE-REVEAL-ROLE>"
                },
                "Action": [
                    "kms:Decrypt",
                    "kms:DescribeKey",
                    "kms:GenerateDataKey"
                ],
                "Resource": "arn:aws:kms:<AWS-REGION>:<AWS-ACCOUNT-A>:key/<REVEAL-KMS-KEY>"
            }

Step 3: Update the bucket policy of the S3 bucket

Finally, update the bucket policy of the S3 bucket in member accounts, and update the key policy of the key used for SSE-KMS.

To update the S3 bucket policy and KMS key policy

  1. Use the following commands to update key policy for the KMS key used for server-side encryption of the DOC-EXAMPLE-BUCKET2 bucket in Account B.
    	{
                "Sid": "AllowMacieRoleDecrypt”
                "Effect": "Allow",
                "Principal": {
                    "AWS": "arn:aws:iam:<AWS-REGION>:<AWS-ACCOUNT-A>:role/<MACIE-REVEAL-ROLE>"
                },
                "Action": "kms:Decrypt",
                "Resource": "arn:aws:kms:<AWS-REGION>:<AWS-ACCOUNT-B>:key/<KEY-USED-TO-ENCRYPT-DOC-EXAMPLE-BUCKET2>"
      }

  2. Use the following commands to update the bucket policy of DOC-EXAMPLE-BUCKET2 to allow cross-account access for MACIE-REVEAL-ROLE to get objects from this bucket.
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowMacieRoleGet",
                "Effect": "Allow",
                "Principal": {
                    "AWS": "arn:aws:iam::<AWS-ACCOUNT-A>:role/<MACIE-REVEAL-ROLE>"
                },
                "Action": "s3:GetObject",
                "Resource": "arn:aws:s3:::<DOC-EXAMPLE-BUCKET2>/*"
            }
        ]
    }

Retrieve and reveal sensitive data samples

Now that you’ve put in place the necessary permissions, users who assume MACIE-REVEAL-ROLE will be able to conveniently retrieve and reveal sensitive data samples.

To retrieve and reveal sensitive data samples

  1. In the Macie console, in the left navigation pane, choose Findings, and select a specific finding. Under Sensitive Data, choose Review.
    Figure 2: The finding details panel

    Figure 2: The finding details panel

  2. On the Reveal sensitive data page, choose Reveal samples.
    Figure 3: The Reveal sensitive data page

    Figure 3: The Reveal sensitive data page

  3. Under Sensitive data, you will be able to view up to 10 examples of the sensitive data found by Amazon Macie.
    Figure 4: Examples of sensitive data revealed in the Amazon Macie console

    Figure 4: Examples of sensitive data revealed in the Amazon Macie console

You are able to find additional information on setting up the Macie Reveal function in the Amazon Macie User Guide.

Conclusion

In this post, we showed how you are to retrieve and review examples of sensitive data that were found in Amazon S3 using Amazon Macie. This capability will make it easier for your data protection teams to review the sensitive contents found in S3 buckets across the accounts in your AWS environment. With this information, security teams are able to quickly take remediation actions, such as updating the configuration of sensitive buckets, quarantining files with sensitive information, or sending a notification to the owner of the account where the sensitive data resides. In certain cases, you are able to add the examples to an allow list in Macie if you don’t want Macie to report those as sensitive data (for example, corporate addresses or sample data that is used for testing).

The following are links to additional resources that you will be able to use to expand your knowledge of Amazon Macie capabilities and features:

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on Amazon Macie re:Post.

Want more AWS Security news? Follow us on Twitter.

Koulick Ghosh

Koulick Ghosh

Koulick is a Senior Product Manager in AWS Security based in Seattle, WA. He loves speaking with customers on how AWS Security services can help make them more secure. In his free-time, he enjoys playing the guitar, reading, and exploring the Pacific Northwest.

Author

Michael Ingoldby

Michael is a Senior Security Solutions Architect at AWS based in Frisco, Texas. He provides guidance and helps customers to implement AWS native security services. Michael has been working in the security domain since 2006. When he is not working, he enjoys spending time outdoors.

Robert Wu

Robert Wu

Robert is the Software Development Engineer for AWS Macie, working on enabling customers with more sensitive data discovery capabilities. In his free time, he enjoys exploring and contributing to various open-source projects to widen his domain knowledge.

Use Security Hub custom actions to remediate S3 resources based on Macie discovery results

Post Syndicated from Jonathan Nguyen original https://aws.amazon.com/blogs/security/use-security-hub-custom-actions-to-remediate-s3-resources-based-on-macie-discovery-results/

The amount of data available to be collected, stored and processed within an organization’s AWS environment can grow rapidly and exponentially. This increases the operational complexity and the need to identify and protect sensitive data. If your security teams need to review and remediate security risks manually, it would either take a large team or the actions might not be timely. There is also a chance that with manual operation, a step could be missed or the incorrect action could be taken. As a result, your security teams will need an automated and scalable way to support these operations efficiently.

Amazon Macie is a fully managed data security and data privacy service that uses machine learning and pattern matching to discover and protect your sensitive data in AWS. Macie generates findings for sensitive data in an S3 object or a potential issue with the security or privacy of an S3 bucket. AWS Security Hub allows you to gain a centralized view into the security posture across your AWS environment by aggregating security findings from various AWS services and partner products, including Amazon Macie. Security Hub also includes the custom actions feature, which you can use to create actions for response and remediation to selected findings within the Security Hub console in an efficient and consistent manner.

It is important for your security teams to create effective and standardized mechanisms for taking action against Macie findings to ensure that data remains secure. By using Security Hub custom actions, you can have predefined actions for the security team to take against Macie findings without having to manually find and remediate the resources.

This blog post provides you with an example solution for responding to Macie sensitive data findings and policy findings in Security Hub by using custom actions. I will walk through the components of the solution, as well as opportunities where resources can be customized for your specific use case.

Prerequisites

You must have AWS Security Hub and Amazon Macie enabled in the AWS account where you are deploying this solution.

Solution overview

In this solution, you’ll use a combination of Security Hub custom actions, Amazon EventBridge, and AWS Lambda to take action on Macie findings in Security Hub. You will be working with the findings within the same AWS account where you deployed the solution.

Macie generates two categories of findings relating to different resources, which will require different remediation actions.

  1. Policy finding is a detailed report of a potential policy violation or issue with the security or privacy of an Amazon Simple Storage Service (Amazon S3) bucket.
  2. Sensitive data finding is a detailed report of sensitive data in an S3 object.

A full list of Macie finding types can be found in the Macie User guide.

For the two Macie finding categories, there is an associated Security Hub custom action:

  1. Custom action for sensitive data finding (S3 object) – When the security team selects this custom action, the action invokes a Lambda function that will take the following steps on the S3 object in the Macie finding:
    1. Tag the object with the Security Hub finding ID
    2. Encrypt the S3 object with a different customer-managed KMS key
    3. Update the Security Hub finding workflow status to RESOLVED
  2. Custom action for policy finding (S3 bucket). When you select this this custom action, it invokes a Lambda function that will take the following steps on the S3 bucket in the Macie finding:
    1. Tag the object with the Security Hub finding ID
    2. Update the S3 bucket configuration to:
      • Enable default encryption
      • Enable public access block
    3. Update the Security Hub finding workflow status to RESOLVED

The solution is configured to take action within the AWS account where the finding and corresponding resource is generated. In order to enable cross-account remediation, you will need to deploy an additional IAM role for the automation to assume and provision a KMS key to use for encryption.

Note: The custom actions in this solution are meant to be examples of actions to take against Macie policy and sensitive data findings. These actions will be different depending on your use-case and environment. You will also need to review and update the associated Lambda function execution role IAM policies accordingly.

Solution architecture

Figure 1: Resources deployed in the Security AWS account taking action on resources identified in the Workload AWS account

Figure 1: Resources deployed in the Security AWS account taking action on resources identified in the Workload AWS account

Figure 1 shows the architecture for the solution. The workflow is as follows:

  1. A Macie job runs and creates findings, which are sent to Security Hub in the same AWS account as the Macie finding.
  2. The delegated administrator Security Hub account combines findings across all member Security Hub accounts, including Macie findings.
  3. The security team reviews the Macie findings in the Security Hub delegated administrator account and determines to take remediation actions for a finding by selecting the finding and then selecting the appropriate Security Hub custom action.
  4. The Security Hub custom action sends the finding to the EventBridge rule, which is linked to the Lambda function.
  5. The EventBridge rule invokes the Lambda function to take action against the resources from the Macie finding.
  6. The Lambda function will:
    1. Take action for the S3 resource
    2. Mark the Macie finding as resolved in the delegated administrator Security Hub account

The solution is currently intended to work in a single Region. In order to enable this solution across Regions, you will need to change the Remediation Lambda function code for any regional resources used for remediation actions (i.e. AWS Key Management Service).

Deploy the solution

You can deploy the solution through either the AWS Management Console or the AWS Cloud Development Kit (AWS CDK).

To deploy the solution by using the AWS Management Console

  • In your security tooling account, launch the AWS CloudFormation template by choosing the following Launch Stack button. It will take approximately 10 minutes for the CloudFormation stack to complete.
    Select this image to open a link that starts building the CloudFormation stack

    Note: The stack will launch in the N. Virginia (us-east-1) Region. To deploy this solution into other AWS Regions, download the solution’s CloudFormation template, modify it, and deploy it to the selected Region.

  • (OPTIONAL) If you want to enable cross-account remediation, launch the following AWS CloudFormation template in the AWS account where you want to be able to take remediation actions. You can also use AWS CloudFormation StackSets if deploying to multiple AWS accounts.
    Select this image to open a link that starts building the CloudFormation stack

To deploy the solution by using AWS CDK

You can find the latest code in our GitHub repository, where you can also contribute to the sample code. The following commands show how to deploy the solution by using the AWS CDK. First, the CDK initializes your environment and uploads the AWS Lambda assets to Amazon S3. Then, you can deploy the solution to your account. Make sure to replace <AWS_ACCOUNT> with the account number, and replace <REGION> with the AWS Region that you want the solution deployed to.

  1. Run the following commands in your terminal while authenticated in the security tooling AWS account:

    cdk bootstrap aws://<Security_Tooling_AWS_ACCOUNT>/<REGION>

    cdk deploy MacieRemediationStack

  2. (OPTIONAL) If you want to enable cross-account remediation, Run the following commands in your terminal while authenticated to member AWS account:

    cdk bootstrap aws://<Member_AWS_ACCOUNT>/<REGION>

    cdk deploy MacieRemediationIAMStack –parameters solutionaccount=<Security_Tooling_AWS_ACCOUNT>

Solution walkthrough and validation

Now that you’ve successfully deployed the solution, you can see things in action. You have two options for testing the workflow on your own:

  1. Use a sample event, generated by a Macie finding in Security Hub, and invoke the Lambda function that is tied to the Security Hub custom action.

    Note: If using sample events, you can replace the values for the resources with real resources. Otherwise, you will not be able to see the Lambda function successfully take action because the resource in your sample event may not exist.

  2. Generate demo Macie findings in Security Hub by using this sample data for Amazon Macie.

I have existing findings for Macie generated in my AWS account, and in the procedures in this section, I’ll walk through taking action against these.

Note: If you set up Macie and Security Hub in a delegated administrator and member model that ingests findings from other AWS accounts, the IAM remediation roles for the S3 bucket and S3 objects must be deployed in the member accounts.

Review deployed resources in the AWS console

Before taking action on your sample findings, review the deployed resources that you’ll use.

To review deployed resources

  1. In the AWS account console where the automation was deployed, go to Security Hub, choose Settings, and then choose Custom actions. You should see two custom actions:
    • Macie Policy Finding
      • arn:aws:securityhub:<region>:<account-id>:action/custom/MacieS3BucketPolicy
    • Macie Data Finding
      • arn:aws:securityhub:<region>:<account-id>:action/custom/MacieSensitiveData
        Figure 2: Custom actions in Security Hub

        Figure 2: Custom actions in Security Hub

  2. Navigate to the EventBridge console and then choose Rules. You should see four rules:
    • Disabled – These are disabled by default during deployment
      • Autoremediate_Macie_Policy_Finding
      • Autoremediate_Macie_Sensitive_Data_Finding
        Figure 3: Disabled EventBridge rules for autoremediation of Macie findings in Security Hub

        Figure 3: Disabled EventBridge rules for autoremediation of Macie findings in Security Hub

    • Enabled – These are enabled by default during deployment:
      • Custom_Action_Macie_Policy_Finding
      • Custom_Action_Macie_Sensitive_Data_Finding
        Figure 4: Enabled EventBridge rules tied to the Security Hub custom actions

        Figure 4: Enabled EventBridge rules tied to the Security Hub custom actions

    In the enabled EventBridge rules, you should see the corresponding Security Hub custom action Amazon Resource Names (ARNs) in the rule event pattern.

    Figure 5: Enabled EventBridge rule event pattern for the Security Hub custom action

    Figure 5: Enabled EventBridge rule event pattern for the Security Hub custom action

Take action on an Amazon Macie object or policy finding

Each Security Hub custom action invokes a corresponding Lambda function that is configured as a target in the EventBridge rule. The Lambda function parses the information in the Macie finding from Security Hub to take action.

Each Security Hub custom action is specific to either an S3 object or an S3 bucket. If you attempt a custom action meant for an S3 object against a Macie policy finding, this will successfully initiate the custom action, but the Lambda function that is invoked will be unsuccessful.

If the Macie finding is specific to an S3 object, the title will display “The S3 object …,” whereas if the Macie finding is for a policy finding, the title will display information for an S3 bucket.

To take action on findings

  1. In the AWS account console where the automation was deployed, navigate to AWS Security Hub, and then choose Findings.
  2. Filter the findings by setting Product Name to Macie.
    Figure 6: Filter for Macie findings in Security Hub

    Figure 6: Filter for Macie findings in Security Hub

  3. Select the checkbox for either a Macie policy finding or a sensitive data finding; this will select a custom action. After you select the action, there is no confirmation step, and the action will invoke the Lambda function.
    Figure 7: Validate Custom Action has sent the finding to Amazon CloudWatch Events (EventBridge rule)

    Figure 7: Validate Custom Action has sent the finding to Amazon CloudWatch Events (EventBridge rule)

Review and validate the Security Hub custom action on target resources

In order to validate or troubleshoot the solution, you need to review whether the Lambda function was able to take action against the resources in the Security Hub finding for Macie.

To validate or troubleshoot the custom action

  1. For validation of sensitive data finding remediation, review S3 object configuration:
    1. Navigate to the Amazon S3 console.
    2. Choose the S3 object in the Macie finding.
    3. Choose the Properties tab and review the following fields:
      • Tags should be set to SH_Finding_ID.
      • AWS KMS key ARN should be set to the KMS key with the alias `macie_key`
        1. Click on the KMS key ARN and validate the key’s alias is the key deployed in the solution
  2. For validation of policy finding remediation, review the S3 bucket configuration:
    1. Navigate to the Amazon S3 console.
    2. Choose the S3 bucket in the Macie finding.
    3. Choose the Properties tab and review the following fields:
      • Tags should be set to SH_Finding_ID.
      • Default Encryption should be set to Enabled.
    4. Choose the Permissions tab and review the following fields:
      • Block public access should be set to On.
  3. For troubleshooting, you can review the CloudWatch logs for the Lambda function:
    1. Navigate to the CloudWatch console.
    2. Choose /aws/lambda/Remediate_Macie_S3_Bucket.
    3. Choose the most recent log stream and review the logs to see what actions were taken on the resources.

Next steps and customization

The solution in this post has a custom action for an S3 object and an S3 bucket, and is meant to serve as a template. You could modify the Lambda functions associated with the custom actions to take different or additional actions that are specific to your environment and data classification.

Additionally, I walked through specific Security Hub custom actions for Macie policy (bucket) or sensitive data (objects) findings. If you have defined actions to take for both, you could consolidate the custom actions and invoke a Lambda function that parses information from the Security Hub Macie finding to determine if it is a policy or sensitive data finding.

The two disabled EventBridge rules deployed as part of the solution are examples that can be leveraged for auto-remediation. After you use Security Hub’s custom actions to remediate findings, your security team could start to see a trend where you always want to take specific actions and enable the EventBridge rules to take action without requiring your security team to select a custom action in Security Hub in the AWS console.

  • Autoremediate_Macie_Policy_Finding
  • Autoremediate_Macie_Sensitive_Data_Finding

Conclusion

In this post, you deployed a solution to allow your security team to take automated actions against a Macie sensitive data and policy finding from Security Hub by using custom actions in the AWS console. We walked through what the solution does and how the solution can be customized to your use case.

If you have feedback about this post, submit comments in the Comments section below. If you have any questions about this post, start a thread on the AWS Security Hub forum or Amazon Macie forum.

Want more AWS Security news? Follow us on Twitter.

Jonathan Nguyen

Jonathan Nguyen

Jonathan is a Shared Delivery Team Senior Security Consultant at AWS. His background is in AWS Security with a focus on threat detection and incident response. Today, he helps enterprise customers develop a comprehensive security strategy and deploy security solutions at scale, and he trains customers on AWS Security best practices.