Tag Archives: Amazon Macie

Detect Stripe keys in S3 buckets with Amazon Macie

Post Syndicated from Koulick Ghosh original https://aws.amazon.com/blogs/security/detect-stripe-keys-in-s3-buckets-with-amazon-macie/

Many customers building applications on Amazon Web Services (AWS) use Stripe global payment services to help get their product out faster and grow revenue, especially in the internet economy. It’s critical for customers to securely and properly handle the credentials used to authenticate with Stripe services. Much like your AWS API keys, which enable access to your AWS resources, Stripe API keys grant access to the Stripe account, which allows for the movement of real money. Therefore, you must keep Stripe’s API keys secret and well-controlled. And, much like AWS keys, it’s important to invalidate and re-issue Stripe API keys that have been inadvertently committed to GitHub, emitted in logs, or uploaded to Amazon Simple Storage Service (Amazon S3).

Customers have asked us for ways to reduce the risk of unintentionally exposing Stripe API keys, especially when code files and repositories are stored in Amazon S3. To help meet this need, we collaborated with Stripe to develop a new managed data identifier that you can use to help discover and protect Stripe API keys.

“I’m really glad we could collaborate with AWS to introduce a new managed data identifier in Amazon Macie. Mutual customers of AWS and Stripe can now scan S3 buckets to detect exposed Stripe API keys.”
Martin Pool, Staff Engineer in Cloud Security at Stripe

In this post, we will show you how to use the new managed data identifier in Amazon Macie to discover and protect copies of your Stripe API keys.

About Stripe API keys

Stripe provides payment processing software and services for businesses. Using Stripe’s technology, businesses can accept online payments from customers around the globe.

Stripe authenticates API requests by using API keys, which are included in the request. Stripe takes various measures to help customers keep their secret keys safe and secure. Stripe users can generate test-mode keys, which can only access simulated test data and don’t move real money. Stripe encourages its customers to use only test API keys for testing and development purposes to reduce the risk of inadvertent disclosure of live keys or of accidentally generating real charges.

Stripe also supports publishable keys, which you can make publicly accessible in your web or mobile app’s client-side code to collect payment information.

In this blog post, we focus on live-mode keys, which are the primary security concern because they can access your real data and cause money movement. These keys should be closely held within the production services that need to use them. Stripe allows keys to be restricted to read or write specific API resources, or used only from certain IP ranges, but even with these restrictions, you should still handle live-mode keys with caution.

Stripe keys have distinctive prefixes to help you detect them, such as sk_live_ for secret keys and rk_live_ for restricted keys (which are also secret).
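
Such prefixes make simple pattern matching possible in your own tooling. As a rough illustration (and not the detection logic that Macie uses), the following Python sketch flags strings that resemble live-mode keys; the pattern and minimum key length are assumptions made for this example.

import re

# Rough illustration only: match strings that look like Stripe live-mode
# secret (sk_live_) or restricted (rk_live_) keys. The minimum length of 16
# characters after the prefix is an assumption for this sketch.
STRIPE_LIVE_KEY_PATTERN = re.compile(r"\b(?:sk|rk)_live_[0-9a-zA-Z]{16,}\b")

def find_candidate_stripe_keys(text):
    # Return substrings that resemble Stripe live-mode keys.
    return STRIPE_LIVE_KEY_PATTERN.findall(text)

print(find_candidate_stripe_keys("example: sk_live_cpegcLxKILlrXYNIuqYhGXoy"))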

Amazon Macie

Amazon Macie is a fully managed service that uses machine learning (ML) and pattern matching to discover and help protect your sensitive data, such as personally identifiable information. Macie can also provide detailed visibility into your data and help you align with compliance requirements by identifying data that needs to be protected under various regulations, such as the General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA).

Macie provides a suite of managed data identifiers to make it simpler for you to configure and adopt the service. Managed data identifiers are prebuilt, customizable patterns that help automatically identify sensitive data, such as credit card numbers, social security numbers, and email addresses.

Now, Macie has a new managed data identifier STRIPE_CREDENTIALS that you can use to identify Stripe API secret keys.

Configure Amazon Macie to detect Stripe credentials

In this section, we show you how to use the managed data identifier STRIPE_CREDENTIALS to detect Stripe API secret keys. We recommend that you carry out these tutorial steps in an AWS account dedicated to experimentation and exploration before you move forward with detection in a production environment.

Prerequisites

To follow along with this walkthrough, complete the following prerequisites.

Create example data

The first step is to create some example objects in an S3 bucket in the AWS account. The objects contain strings that resemble Stripe secret keys. You will use the example data later to demonstrate how Macie can detect Stripe secret keys.

To create the example data

  1. Open the S3 console and create an S3 bucket.
  2. Create four files locally, paste the following mock sensitive data into those files, and upload them to the bucket (a scripted alternative is shown after the note that follows).
    file1
     stripe publishable key sk_live_cpegcLxKILlrXYNIuqYhGXoy

    file2
     sk_live_cpegcLxKILlrXYNIuqYhGXoy
     sk_live_abcdcLxKILlrXYNIuqYhGXoy
     sk_live_efghcLxKILlrXYNIuqYhGXoy
     stripe payment sk_live_ijklcLxKILlrXYNIuqYhGXoy

    file3
     sk_live_cpegcLxKILlrXYNIuqYhGXoy
     stripe api key sk_live_abcdcLxKILlrXYNIuqYhGXoy

    file4
     stripe secret key sk_live_cpegcLxKILlrXYNIuqYhGXoy

Note: The keys mentioned in the preceding files are mock data and aren’t related to actual live Stripe keys.
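
If you prefer to script this step rather than use the console, the following boto3 sketch uploads the same mock files; the bucket name is a placeholder that you would replace with the bucket you created.

import boto3

# Placeholder: replace with the name of the bucket you created for this walkthrough.
BUCKET_NAME = "amzn-s3-demo-bucket"

# Mock data only; these strings are not real Stripe keys.
files = {
    "file1": "stripe publishable key sk_live_cpegcLxKILlrXYNIuqYhGXoy\n",
    "file2": (
        "sk_live_cpegcLxKILlrXYNIuqYhGXoy\n"
        "sk_live_abcdcLxKILlrXYNIuqYhGXoy\n"
        "sk_live_efghcLxKILlrXYNIuqYhGXoy\n"
        "stripe payment sk_live_ijklcLxKILlrXYNIuqYhGXoy\n"
    ),
    "file3": (
        "sk_live_cpegcLxKILlrXYNIuqYhGXoy\n"
        "stripe api key sk_live_abcdcLxKILlrXYNIuqYhGXoy\n"
    ),
    "file4": "stripe secret key sk_live_cpegcLxKILlrXYNIuqYhGXoy\n",
}

s3 = boto3.client("s3")
for key, body in files.items():
    # Upload each mock file as an object in the demo bucket.
    s3.put_object(Bucket=BUCKET_NAME, Key=key, Body=body.encode("utf-8"))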

Create a Macie job with the STRIPE_CREDENTIALS managed data identifier

Using Macie, you can scan your S3 buckets for sensitive data and security risks. In this step, you run a one-time Macie job to scan an S3 bucket and review the findings.

To create a Macie job with STRIPE_CREDENTIALS

  1. Open the Amazon Macie console, and in the left navigation pane, choose Jobs. On the top right, choose Create job.
    Figure 1: Create Macie Job

  2. Select the bucket that you want Macie to scan or specify bucket criteria, and then choose Next.
    Figure 2: Select S3 bucket

  3. Review the details of the S3 bucket, such as estimated cost, and then choose Next.
    Figure 3: Review S3 bucket

  4. On the Refine the scope page, choose One-time job, and then choose Next.

    Note: After you successfully test, you can schedule the job to scan S3 buckets at the frequency that you choose.

    Figure 4: Select one-time job

  5. For Managed data identifier options, select Custom and then select Use specific managed data identifiers. For Select managed data identifiers, search for STRIPE_CREDENTIALS and then select it. Choose Next.
    Figure 5: Select managed data identifier

  6. Enter a name and an optional description for the job, and then choose Next.
    Figure 6: Enter job name

  7. Review the job details and choose Submit. Macie will create and start the job immediately, and the job will run one time.
  8. When the Status of the job shows Complete, select the job, and from the Show results dropdown, select Show findings.
    Figure 7: Select the job and then select Show findings

  9. You can now review the findings for sensitive data in your S3 bucket. As shown in Figure 8, Macie detected Stripe keys in each of the four files, and categorized the findings as High severity. You can review and manage the findings in the Macie console, retrieve them through the Macie API for further analysis, send them to Amazon EventBridge for automated processing, or publish them to AWS Security Hub for a comprehensive view of your security state.
    Figure 8: Review the findings
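
You can also create a comparable job programmatically. The following boto3 sketch creates a one-time job that uses only the STRIPE_CREDENTIALS managed data identifier; the account ID and bucket name are placeholders for this example.

import boto3

macie = boto3.client("macie2")

# Placeholders: replace with your account ID and the bucket you created earlier.
ACCOUNT_ID = "111122223333"
BUCKET_NAME = "amzn-s3-demo-bucket"

response = macie.create_classification_job(
    name="stripe-credentials-one-time-scan",
    jobType="ONE_TIME",
    # Scan with only the STRIPE_CREDENTIALS managed data identifier.
    managedDataIdentifierSelector="INCLUDE",
    managedDataIdentifierIds=["STRIPE_CREDENTIALS"],
    s3JobDefinition={
        "bucketDefinitions": [
            {"accountId": ACCOUNT_ID, "buckets": [BUCKET_NAME]}
        ]
    },
)
print(response["jobId"])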

Respond to unintended disclosure of Stripe API keys

If you discover Stripe live-mode keys (or other sensitive data) in an S3 bucket, then through the Stripe dashboard, you can roll your API keys to revoke access to the compromised key and generate a new one. This helps ensure that the key can’t be used to make malicious API requests. Make sure that you install the replacement key into the production services that need it. In the longer term, you can take steps to understand the path by which the key was disclosed and help prevent a recurrence.

Conclusion

In this post, you learned about the importance of safeguarding Stripe API keys on AWS. By using Amazon Macie with managed data identifiers, setting up regular reviews and restricted access to S3 buckets, training developers in security best practices, and monitoring logs and repositories, you can help mitigate the risk of key exposure and potential security breaches. By adhering to these practices, you can help ensure a robust security posture for your sensitive data on AWS.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on Amazon Macie re:Post.

Koulick Ghosh

Koulick is a Senior Product Manager in AWS Security based in Seattle, WA. He loves speaking with customers about how AWS Security services can help improve their security. In his free time, he enjoys playing the guitar, reading, and exploring the Pacific Northwest.

Sagar Gandha

Sagar is an experienced Senior Technical Account Manager at AWS adept at assisting large customers in enterprise support. He offers expert guidance on best practices, facilitates access to subject matter experts, and delivers actionable insights on optimizing AWS spend, workloads, and events. Outside of work, Sagar loves spending time with his kids.

Mohan Musti

Mohan is a Senior Technical Account Manager at AWS based in Dallas. Mohan helps customers architect and optimize applications on AWS. In his spare time, he enjoys spending time with his family and camping.

Data masking and granular access control using Amazon Macie and AWS Lake Formation

Post Syndicated from Iris Ferreira original https://aws.amazon.com/blogs/security/data-masking-and-granular-access-control-using-amazon-macie-and-aws-lake-formation/

Companies have been collecting user data to offer new products, recommend options more relevant to the user’s profile, or, in the case of financial institutions, to facilitate access to higher credit lines or lower interest rates. However, personal data is sensitive: its use enables identification of the person using a specific system or application, and in the wrong hands, this data might be used in unauthorized ways. Governments and organizations have created laws and regulations, such as the General Data Protection Regulation (GDPR) in the EU and the General Data Protection Law (LGPD) in Brazil, and technical guidance such as the Cloud Computing Implementation Guide published by the Association of Banks in Singapore (ABS), that specify what constitutes sensitive data and how companies should manage it. A common requirement is to ensure that consent is obtained for the collection and use of personal data and that any data collected is anonymized to protect consumers from data breach risks.

In this blog post, we walk you through a proposed architecture that implements data anonymization by using granular access controls according to well-defined rules. It covers a scenario where a user might not have read access to data, but an application does. A common use case for this scenario is a data scientist working with sensitive data to train machine learning models. The training algorithm would have access to the data, but the data scientist would not. This approach helps reduce the risk of data leakage while enabling innovation using data.

Prerequisites

To implement the proposed solution, you must have an active AWS account and AWS Identity and Access Management (IAM) permissions to use the following services:

Note: If there’s a pre-existing Lake Formation configuration, there might be permission issues when testing this solution. We suggest that you test this solution on a development account that doesn’t yet have Lake Formation active. If you don’t have access to a development account, see more details about the permissions required on your role in the Lake Formation documentation.

You must give permission for AWS DMS to create the necessary resources, such as the EC2 instance where you will run DMS tasks. If you have used DMS before, this permission probably already exists. Otherwise, you can use CloudFormation to create the necessary roles when you deploy the solution. To check whether the permission already exists, open the AWS Management Console, go to IAM, select Roles, and check for a role named dms-vpc-role. If it doesn’t exist, you must create the role during deployment.
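
If you prefer to check for the role programmatically, the following boto3 sketch looks it up; the only assumption is the standard dms-vpc-role name.

import boto3
from botocore.exceptions import ClientError

iam = boto3.client("iam")

try:
    # DMS expects this service role to exist before it can create resources.
    iam.get_role(RoleName="dms-vpc-role")
    print("dms-vpc-role already exists.")
except ClientError as error:
    if error.response["Error"]["Code"] == "NoSuchEntity":
        print("dms-vpc-role not found; create it during deployment.")
    else:
        raise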

We use the Faker library to create dummy data consisting of the following tables:

  • Customer
  • Bank
  • Card

Solution overview

This architecture allows multiple data sources to send information to the data lake environment on AWS, where Amazon S3 is the central data store. After the data is stored in an S3 bucket, Macie analyzes the objects and identifies sensitive data using machine learning (ML) and pattern matching. AWS Glue then uses the information to run a workflow to anonymize the data.

Figure 1: Solution architecture for data ingestion and identification of PII

We will describe two techniques used in the process: data masking and data encryption. After the workflow runs, the data is stored in a separate S3 bucket. This hierarchy of buckets is used to segregate access to data for different user personas.

Figure 1 depicts the solution architecture:

  1. The data source in the solution is an Amazon RDS database. Data can be stored in a database on an EC2 instance, in an on-premises server, or even deployed in a different cloud provider.
  2. AWS DMS uses full load to migrate data from the source (an Amazon RDS database) into the target S3 bucket (dcp-macie) as a one-time migration. New objects uploaded to the S3 bucket are automatically encrypted using server-side encryption (SSE-S3).
  3. A personally identifiable information (PII) detection pipeline is invoked after the new Amazon S3 objects are uploaded. Macie analyzes the objects and identifies values that are sensitive. Users can manually identify which fields and values within the files should be classified as sensitive or use the Macie automated sensitive data discovery capabilities.
  4. The sensitive values identified by Macie are sent to EventBridge, invoking Kinesis Data Firehose to store them in the dcp-glue S3 bucket. AWS Glue uses this data to know which fields to mask or encrypt using an encryption key stored in AWS KMS.
    1. Using EventBridge enables an event-based architecture. EventBridge is used as a bridge between Macie and Kinesis Data Firehose, integrating these services.
    2. Kinesis Data Firehose buffers the data, mitigating the risk of losing information sent by Macie while reducing the overall cost of storing it in Amazon S3. It can also deliver the data to other destinations, such as Amazon Redshift or Splunk, making it available for analysis by other products.
  5. At the end of this step, Amazon S3 invokes a Lambda function that starts the AWS Glue workflow, which masks and encrypts the identified data.
    1. AWS Glue starts a crawler on the S3 bucket dcp-macie (a) and the bucket dcp-glue (b) to populate two tables, respectively, created as part of the AWS Glue service.
    2. After that, a Python script is run (c), querying the data in these tables. It uses this information to mask and encrypt the data and then store it in the prefixes dcp-masked (d) and dcp-encrypted (e) in the bucket dcp-athena.
    3. The last step in the workflow is to run a crawler for each of these prefixes (f and g), creating their respective tables in the AWS Glue Data Catalog.
  6. To enable fine-grained access to data, Lake Formation maps permissions to the tags you have configured. The implementation of this part is described later in this post.
  7. Athena can be used to query the data. Other tools, such as Amazon Redshift or Amazon QuickSight, can also be used, as well as third-party tools.

If a user lacks permission to view sensitive data but needs to access it for machine learning model training purposes, AWS KMS can be used. The AWS KMS service manages the encryption keys that are used for data masking and to give access to the training algorithms. Users can see the masked data, but the algorithms can use the data in its original form to train the machine learning models.

This solution uses three personas:

  • secure-lf-admin: Data lake administrator. Responsible for configuring the data lake and assigning permissions to data administrators.
  • secure-lf-business-analyst: Business analyst. No access to certain confidential information.
  • secure-lf-data-scientist: Data scientist. No access to certain confidential information.

Solution implementation

To facilitate implementation, we created a CloudFormation template. The template and other artifacts produced can be found in this GitHub repository. You can use the CloudFormation dashboard to review the output of all the deployed features.

Choose the following Launch Stack button to deploy the CloudFormation template.

Launch Stack

Deploy the CloudFormation template

To deploy the CloudFormation template and create the resources in your AWS account, follow the steps below.

  1. After signing in to the AWS account, deploy the CloudFormation template. On the Create stack window, choose Next.
    Figure 2: CloudFormation create stack screen

  2. In the following section, enter a name for the stack. In the TestUserPassword field, enter a password that the Lake Formation personas will use to sign in to the console. When finished filling in the fields, choose Next.
  3. On the next screen, review the selected options and choose Next.
  4. In the last section, review the information and select I acknowledge that AWS CloudFormation might create IAM resources with custom names. Choose Create Stack.
    Figure 3: List of parameters and values in the CloudFormation stack

  5. Wait until the stack status changes to CREATE_COMPLETE.

The deployment process should take approximately 15 minutes to finish.

Run an AWS DMS task

To extract the data from the Amazon RDS instance, you must run an AWS DMS task. This makes the data available to Macie in an S3 bucket in Parquet format.

  1. Open the AWS DMS console.
  2. On the navigation bar, for the Migrate data option, select Database migration tasks.
  3. Select the task with the name rdstos3task.
  4. Choose Actions.
  5. Choose Restart/Resume. The loading process should take around 1 minute.

When the status changes to Load Complete, you will be able to see the migrated data in the target bucket (dcp-macie-<AWS_REGION>-<ACCOUNT_ID>) in the dataset folder. Within each prefix there will be a Parquet file that follows the naming pattern LOAD00000001.parquet. After this step, use Macie to scan the data for sensitive information in the files.
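
If you want to confirm the DMS output programmatically, the following boto3 sketch lists the migrated objects; the Region and account ID in the bucket name are placeholders.

import boto3

# Placeholder values: replace with your Region and account ID.
bucket = "dcp-macie-us-east-1-111122223333"

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

# Each table prefix under dataset/ should contain a LOAD00000001.parquet file.
for page in paginator.paginate(Bucket=bucket, Prefix="dataset/"):
    for obj in page.get("Contents", []):
        print(obj["Key"], obj["Size"])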

Run a classification job with Macie 

You must create a data classification job before you can evaluate the contents of the bucket. The job you create will run and evaluate the full contents of your S3 bucket to determine whether the files stored in the bucket contain PII. This job uses the managed identifiers available in Macie and a custom identifier.

  1. Open the Macie console, and on the navigation bar, select Jobs.
  2. Choose Create job.
  3. Select the S3 bucket dcp-macie-<AWS_REGION>-<ACCOUNT_ID> containing the output of the AWS DMS task. Choose Next to continue.
  4. On the Review Bucket page, verify the selected bucket is dcp-macie-<AWS_REGION>-<ACCOUNT_ID>, and then choose Next.
  5. In Refine the scope, create a new job with the following scope:
    1. Sensitive data discovery options: One-time job. (For demonstration purposes, this is a single discovery job. For production environments, we recommend selecting the Scheduled job option so that Macie can analyze objects on a schedule.)
    2. Sampling Depth: 100 percent.
    3. Leave the other settings at their default values.
  6. On Managed data identifiers options, select All so Macie can use all managed data identifiers. This enables a set of built-in criteria to detect all identified types of sensitive data. Choose Next.
  7. On the Custom data identifiers option, select account_number, and then choose Next. With a custom identifier, you can create custom business logic to look for certain patterns in files stored in Amazon S3. In this example, the job looks for data in the format XYZ- followed by numbers, which is the format of the fictitious account_number values generated in the dataset. The logic used for creating this custom data identifier is included in the CloudFormation template file.
  8. On the Select allow lists page, choose Next to continue.
  9. Enter a name and description for the job.
  10. Choose Next to continue.
  11. On the Review and create step, check the details of the job you created and choose Submit.
    Figure 4: List of Macie findings detected by the solution

The amount of data being scanned directly influences how long the job takes to run. You can choose the Update button at the top of the screen, as shown in Figure 4, to see the updated status of the job. This job, based on the size of the test dataset, will take about 10 minutes to complete.

Run the AWS Glue data transformation pipeline

After the Macie job is finished, the discovery results are ingested into the bucket dcp-glue-<AWS_REGION>-<ACCOUNT_ID>, invoking the AWS Glue step of the workflow (dcp-Workflow), which should take approximately 11 minutes to complete.

To check the workflow progress:

  1. Open the AWS Glue console, and on the navigation bar, select Workflows (orchestration).
  2. Next, choose dcp-workflow.
  3. Next, select History to see the past runs of the dcp-workflow.

The AWS Glue job, which is launched as part of the workflow (dcp-workflow), reads the Macie findings to learn the exact location of the sensitive data. For example, in the customer table, the sensitive fields are name and birthdate; in the bank table, account_number, iban, and bban; and in the card table, card_number, card_expiration, and card_security_code. After this data is found, the job masks and encrypts the information.

Text encryption is done using an AWS KMS key. Here is the code snippet that provides this functionality:

import base64
import sys

import boto3

# The following values are defined elsewhere in the AWS Glue job script:
# columns_to_be_masked_and_encrypted, table_columns, region_name, and key_id.

def encrypt_rows(r):
    # Encrypt each sensitive column in the row and drop the plaintext value.
    encrypted_entities = columns_to_be_masked_and_encrypted
    try:
        for entity in encrypted_entities:
            if entity in table_columns:
                encrypted_entity = get_kms_encryption(r[entity])
                r[entity + '_encrypted'] = encrypted_entity.decode("utf-8")
                del r[entity]
    except Exception:
        print("DEBUG:", sys.exc_info())
    return r

def get_kms_encryption(row):
    # Create a KMS client
    session = boto3.session.Session()
    client = session.client(service_name='kms', region_name=region_name)

    try:
        # Encrypt the value and return it as base64-encoded ciphertext
        encryption_result = client.encrypt(KeyId=key_id, Plaintext=row)
        blob = encryption_result['CiphertextBlob']
        encrypted_row = base64.b64encode(blob)
        return encrypted_row

    except Exception:
        return 'Error on get_kms_encryption function'

If your application requires access to the unencrypted text and has access to the AWS KMS encryption key, you can use the following example to decrypt the information:

decrypted = client.decrypt(CiphertextBlob=base64.b64decode(data_encrypted))
print(decrypted['Plaintext'])

After you perform all of the preceding steps, the datasets are fully anonymized, with tables created in the Data Catalog and data stored in the respective S3 buckets. These are the buckets where fine-grained access controls are applied through Lake Formation:

  • Masked data — s3://dcp-athena-<AWS_REGION>-<ACCOUNT_ID>/masked/
  • Encrypted data — s3://dcp-athena-<AWS_REGION>-<ACCOUNT_ID>/encrypted/

Now that the tables are defined, you refine the permissions using Lake Formation.

Enable Lake Formation fine-grained access

After the data is processed and stored, you use Lake Formation to define and enforce fine-grained access permissions and provide secure access to data analysts and data scientists.

To enable fine-grained access, you first add a user (secure-lf-admin) to Lake Formation:

  1. In the Lake Formation console, clear Add myself and select Add other AWS users or roles.
  2. From the drop-down menu, select secure-lf-admin.
  3. Choose Get started.
    Figure 5: Lake Formation deployment process

Grant access to different personas

Before you grant permissions to different user personas, you must register the Amazon S3 locations in Lake Formation so that the personas can access the data. All buckets have been created with the pattern <prefix>-<bucket_name>-<aws_region>-<account_id>, where <prefix> matches the prefix you selected when you deployed the CloudFormation template, <aws_region> corresponds to the selected AWS Region (for example, ap-southeast-1), and <account_id> is the 12-digit number that matches your AWS account (for example, 123456789012). For ease of reading, we left only the initial part of the bucket name in the following instructions.

  1. In the Lake Formation console, on the navigation bar, under the Register and ingest option, select Data Lake locations.
  2. Choose Register location.
  3. Select the dcp-glue bucket and choose Register Location.
  4. Repeat for the dcp-macie/dataset, dcp-athena/masked, and dcp-athena/encrypted prefixes.
    Figure 6: Amazon S3 locations registered in the solution

You’re now ready to grant access to different users.

Granting per-user granular access

After successfully deploying the AWS services described in the CloudFormation template, you must configure access to resources that are part of the proposed solution.

Grant read-only access to all tables for secure-lf-admin

Before proceeding, you must sign in as the secure-lf-admin user. To do this, sign out of the AWS console and sign in again as secure-lf-admin, using the password that you set in the CloudFormation template.

Now that you’re signed in as the user who administers the data lake, you can grant read-only access to all tables in the dataset database to the secure-lf-admin user.

  1. In the Permissions section, select Data Lake permissions, and then choose Grant.
  2. Select IAM users and roles.
  3. Select the secure-lf-admin user.
  4. Under LF-Tags or catalog resources, select Named data catalog resources.
  5. Select the database dataset.
  6. For Tables, select All tables.
  7. In the Table permissions section, select Alter and Super.
  8. Under Grantable permissions, select Alter and Super.
  9. Choose Grant.

You can confirm your user permissions on the Data Lake permissions page.

Create tags to grant access

Return to the Lake Formation console to define tag-based access control for users. You can assign policy tags to Data Catalog resources (databases, tables, and columns) to control access to these resources. Only users who are granted the corresponding Lake Formation tag (and those who are granted access with the named resources method) can access the resources.

  1. Open the Lake Formation console, then on the navigation bar, under Permissions, select LF-tags.
  2. Choose Add LF-Tag. In the Add LF-tag dialog box, for Key, enter data, and for Values, enter mask. Choose Add, and then choose Add LF-Tag.
  3. Follow the same steps to add a second tag. For Key, enter segment, and for Values, enter campaign.

Assign tags to users and databases

Now grant read-only access to the masked data to the secure-lf-data-scientist user.

  1. In the Lake Formation console, on the navigation bar, under Permissions, select Data Lake permissions
  2. Choose Grant.
  3. Under IAM users and roles, select secure-lf-data-scientist as the user.
  4. In the LF-Tags or catalog resources section, select Resources matched by LF-Tags and choose Add LF-Tag. For Key, enter data, and for Values, enter mask.
    Figure 7: Creating resource tags for Lake Formation

  5. In the Database permissions section, select Describe under both Database permissions and Grantable permissions.
  6. In the Table permissions section, select Select under both Table permissions and Grantable permissions.
  7. Choose Grant.
    Figure 8: Database and table permissions granted

To complete the process and give the secure-lf-data-scientist user access to the dataset_masked database, you must assign the tag you created to the database.

  1. On the navigation bar, under Data Catalog, select Databases.
  2. Select dataset_masked and select Actions. From the drop-down menu, select Edit LF-Tags.
  3. In the section Edit LF-Tags: dataset_masked, choose Assign new LF-Tag. For Key, enter data, and for Values, enter mask. Choose Save.
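
For reference, the grant and tag assignment that you just performed in the console correspond roughly to the following boto3 calls (the database-level Describe grant is omitted for brevity). The account ID and user ARN are placeholders, and this is a sketch rather than part of the deployed solution.

import boto3

lf = boto3.client("lakeformation")

# Placeholders: replace with your account ID and the ARN of the
# secure-lf-data-scientist user created by the CloudFormation template.
ACCOUNT_ID = "111122223333"
USER_ARN = f"arn:aws:iam::{ACCOUNT_ID}:user/secure-lf-data-scientist"

# Grant SELECT on tables whose LF-Tag expression matches data = mask.
lf.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": USER_ARN},
    Resource={
        "LFTagPolicy": {
            "CatalogId": ACCOUNT_ID,
            "ResourceType": "TABLE",
            "Expression": [{"TagKey": "data", "TagValues": ["mask"]}],
        }
    },
    Permissions=["SELECT"],
    PermissionsWithGrantOption=["SELECT"],
)

# Assign the data = mask tag to the dataset_masked database.
lf.add_lf_tags_to_resource(
    Resource={"Database": {"CatalogId": ACCOUNT_ID, "Name": "dataset_masked"}},
    LFTags=[{"CatalogId": ACCOUNT_ID, "TagKey": "data", "TagValues": ["mask"]}],
)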

Grant read-only access to secure-lf-business-analyst

Now grant the secure-lf-business-analyst user read-only access to certain encrypted columns using column-based permissions.

  1. In the Lake Formation console, under Data Catalog, select Databases.
  2. Select the database dataset_encrypted and then select Actions. From the drop-down menu, choose Grant.
  3. Select IAM users and roles.
  4. Choose secure-lf-business-analyst.
  5. In the LF-Tags or catalog resources section, select Named data catalog resources.
  6. In the Database permissions section, select Describe and Alter under both Database permissions and Grantable permissions.
  7. Choose Grant.

Now give the secure-lf-business-analyst user access to the Customer table, except for the username column.

  1. In the Lake Formation console, under Data Catalog, select Databases.
  2. Select the database dataset_encrypted and then choose View tables.
  3. From the Actions option in the drop-down menu, select Grant.
  4. Select IAM users and roles.
  5. Select secure-lf-business-analyst.
  6. In the LF-Tags or catalog resources section, select Named data catalog resources.
  7. In the Database section, leave dataset_encrypted selected.
  8. In the Tables section, select the customer table.
  9. In the Table permissions section, select Select under both Table permissions and Grantable permissions.
  10. In the Data Permissions section, select Column-based access.
  11. Select Include columns, and then select the id, username, mail, and gender columns, which are the columns without sensitive data that the secure-lf-business-analyst user needs access to.
  12. Choose Grant.
    Figure 9: Granting access to secure-lf-business-analyst user in the Customer table

Now give the secure-lf-business-analyst user access to the Card table, only for columns that don’t contain PII.

  1. In the Lake Formation console, under Data Catalog, choose Databases.
  2. Select the database dataset_encrypted and choose View tables.
  3. Select the table Card.
  4. In the Schema section, choose Edit schema.
  5. Select the cred_card_provider column, which is the column that has no PII data.
  6. Choose Edit tags.
  7. Choose Assign new LF-Tag.
  8. For Assigned keys, enter segment, and for Values, enter campaign.
    Figure 10: Editing tags in Lake Formation tables

  9. Choose Save, and then choose Save as new version.

In this step, you added the segment tag to the cred_card_provider column in the card table. For the secure-lf-business-analyst user to have access, you need to grant this tag to the user.

  1. In the Lake Formation console, under Permissions, select Data Lake permissions.
  2. Choose Grant.
  3. Under IAM users and roles, select secure-lf-business-analyst as the user.
  4. In the LF-Tags or catalog resources section, select Resources matched by LF-Tags, and then choose Add LF-Tag. For Key, enter segment, and for Values, enter campaign.
    Figure 11: Configure tag-based access for user secure-lf-business-analyst

  5. In the Database permissions section, select Describe under both Database permissions and Grantable permissions.
  6. In the Table permissions section, select Select under both Table permissions and Grantable permissions.
  7. Choose Grant.

The next step is to revoke Super access from the IAMAllowedPrincipals group.

The IAMAllowedPrincipals group includes all IAM users and roles that are allowed access to Data Catalog resources through IAM policies. The Super permission allows a principal to perform all operations supported by Lake Formation on the database or table on which it is granted. These settings provide access to Data Catalog resources and Amazon S3 locations controlled exclusively by IAM policies, so the individual permissions configured by Lake Formation are not considered. Therefore, you will remove the grants already configured for the IAMAllowedPrincipals group, leaving only the Lake Formation settings.

  1. In the Databases menu, select the database dataset, and then select Actions. From the drop-down menu, select Revoke.
  2. In the Principals section, select IAM users and roles, and then select the IAMAllowedPrincipals group as the user.
  3. Under LF-Tags or catalog resources, select Named data catalog resources.
  4. In the Database section, leave the dataset option selected.
  5. Under Tables, select the following tables: bank, card, and customer.
  6. In the Table permissions section, select Super.
  7. Choose Revoke.

Repeat the same steps for the dataset_encrypted and dataset_masked databases.

Figure 12: Revoke SUPER access to the IAMAllowedPrincipals group
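
If you prefer to script the revocation, the following boto3 sketch is one way to do it. In the Lake Formation API, the console's Super permission corresponds to ALL, and the group is addressed with the IAM_ALLOWED_PRINCIPALS identifier; the database and table names below are assumptions based on this walkthrough and should be adjusted to match your Data Catalog.

import boto3

lf = boto3.client("lakeformation")

databases = ["dataset", "dataset_masked", "dataset_encrypted"]
tables = ["bank", "card", "customer"]

for database in databases:
    for table in tables:
        # Remove the implicit Super (ALL) grant from IAMAllowedPrincipals so
        # that only the Lake Formation permissions apply to these tables.
        lf.revoke_permissions(
            Principal={"DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"},
            Resource={"Table": {"DatabaseName": database, "Name": table}},
            Permissions=["ALL"],
        )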

You can confirm all user permissions on the Data Permissions page.

Querying the data lake using Athena with different personas

To validate the permissions of different personas, you use Athena to query the Amazon S3 data lake.

Ensure the query result location has been created as part of the CloudFormation stack (secure-athena-query-<ACCOUNT_ID>-<AWS_REGION>).

  1. Sign in to the Athena console with secure-lf-admin (use the password value for TestUserPassword from the CloudFormation stack) and verify that you are in the AWS Region used in the query result location.
  2. On the navigation bar, choose Query editor.
  3. Choose Settings to set up a query result location in Amazon S3, and then choose Browse S3 and select the bucket secure-athena-query-<ACCOUNT_ID>-<AWS_REGION>.
  4. Run a SELECT query on the dataset.
    SELECT * FROM "dataset"."bank" limit 10;

The secure-lf-admin user should see all tables in the dataset and dcp databases. The user should not have access to the tables in the dataset_encrypted and dataset_masked databases.

Figure 13: Athena console with query results in clear text

Finally, validate the secure-lf-data-scientist permissions.

  1. Sign in to the Athena console with secure-lf-data-scientist (use the password value for TestUserPassword from the CloudFormation stack) and verify that you are in the correct Region.
  2. Run the following query:
    SELECT * FROM "dataset_masked"."bank" limit 10;

The secure-lf-data-scientist user will be able to view all the columns, but only in the dataset_masked database.

Figure 14: Athena query results with masked data

Now, validate the secure-lf-business-analyst user permissions.

  1. Sign in to the Athena console as secure-lf-business-analyst (use the password value for TestUserPassword from the CloudFormation stack) and verify that you are in the correct Region.
  2. Run a SELECT query on the dataset.
    SELECT * FROM "dataset_encrypted"."card" limit 10;

    Figure 15: Validating secure-lf-business-analyst user permissions to query data

The secure-lf-business-analyst user should only be able to view the card and customer tables of the dataset_encrypted database. In the card table, the user will only have access to the cred_card_provider column, and in the Customer table, only to the username, mail, and sex columns, as previously configured in Lake Formation.

Cleaning up the environment

After testing the solution, remove the resources you created to avoid unnecessary expenses.

  1. Open the Amazon S3 console.
  2. Navigate to each of the following buckets and delete all the objects within:
    1. dcp-assets-<AWS_REGION>-<ACCOUNT_ID>
    2. dcp-athena-<AWS_REGION>-<ACCOUNT_ID>
    3. dcp-glue-<AWS_REGION>-<ACCOUNT_ID>
    4. dcp-macie-<AWS_REGION>-<ACCOUNT_ID>
  3. Open the CloudFormation console.
  4. Select the Stacks option from the navigation bar.
  5. Select the stack that you created in Deploy the CloudFormation Template.
  6. Choose Delete, and then choose Delete Stack in the pop-up window.
  7. If you also want to delete the bucket that was created, go to Amazon S3 and delete it from the console or by using the AWS CLI.
  8. To remove the settings made in Lake Formation, go to the Lake Formation dashboard, and remove the data lake locations and the Lake Formation administrator.

Conclusion 

Now that the solution is implemented, you have an automated anonymization dataflow. This solution demonstrates how you can build with AWS serverless services, where you only pay for what you use, without worrying about infrastructure provisioning. In addition, this solution is customizable to meet other data protection requirements such as the General Data Protection Law (LGPD) in Brazil, the General Data Protection Regulation (GDPR) in Europe, and the Association of Banks in Singapore (ABS) Cloud Computing Implementation Guide.

We used Macie to identify the sensitive data stored in Amazon S3 and AWS Glue to generate Macie reports to anonymize the sensitive data found. Finally, we used Lake Formation to implement fine-grained data access control to specific information and demonstrated how you can programmatically grant access to applications that need to work with unmasked data.

Related links

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Iris Ferreira

Iris is a solutions architect at AWS, supporting clients in their innovation and digital transformation journeys in the cloud. In her free time, she enjoys going to the beach, traveling, hiking and always being in contact with nature.

Paulo Aragão

Paulo is a Principal Solutions Architect who supports clients in the financial sector as they navigate the new world of DeFi, Web 3.0, blockchain, dApps, and smart contracts. In addition, he has extensive experience in high performance computing (HPC) and machine learning. Passionate about music and diving, he devours books, plays World of Warcraft and New World, and cooks for friends.

Leo da Silva

Leo is a Principal Security Solutions Architect at AWS and uses his knowledge to help customers better use cloud services and technologies securely. Over the years, he had the opportunity to work in large, complex environments, designing, architecting, and implementing highly scalable and secure solutions to global companies. He is passionate about football, BBQ, and Jiu Jitsu — the Brazilian version of them all.

Building sensitive data remediation workflows in multi-account AWS environments

Post Syndicated from Darren Roback original https://aws.amazon.com/blogs/security/building-sensitive-data-remediation-workflows-in-multi-account-aws-environments/

The rapid growth of data has empowered organizations to develop better products and more personalized services, and to deliver transformational outcomes for their customers. As organizations use Amazon Web Services (AWS) to modernize their data capabilities, they can sometimes find themselves with data spread across several AWS accounts, each aligned to distinct use cases and business units. This can present a challenge for security professionals, who need not only a mechanism to identify sensitive data types—such as protected health information (PHI), payment card industry (PCI) data, and personally identifiable information (PII), or organizational intellectual property—stored on AWS, but also the ability to automatically act upon these findings through custom logic that supports organizational policies and regulatory requirements.

In this blog post, we present a solution that provides you with visibility into sensitive data residing across a fleet of AWS accounts through a ChatOps-style notification mechanism using Microsoft Teams, which also provides contextual information needed to conduct security investigations. This solution also incorporates a decision logic mechanism to automatically quarantine sensitive data assets while they’re pending review, which can be tailored to meet unique organizational, industry, or regulatory environment requirements.

Prerequisites

Before you proceed, ensure that you have the following within your environment:

Assumptions

Things to know about the solution in this blog post:

Solution overview

The solution architecture and overall workflow are detailed in Figure 1 that follows.

Figure 1: Solution overview

Upon discovering sensitive data in member accounts, this solution selectively quarantines objects based on their finding severity and public status. This logic can be customized to evaluate additional details such as the finding type or number of sensitive data occurrences detected. The ability to adjust this workflow logic can provide a custom-tailored solution to meet a variety of industry use cases, helping you adhere to industry-specific security regulations and frameworks.

Figure 1 provides an overview of the components used in this solution, and for illustrative purposes we step through them here.

Automated scanning of buckets

Macie supports various scope options for sensitive data discovery jobs, including the use of bucket tags to determine in-scope buckets for scanning. Setting up automated scanning includes the use of an AWS Organizations tag policy to verify that the S3 buckets deployed in your chosen OUs conform to tagging requirements, an AWS Config rule to automatically check that tags have been applied correctly, an Amazon EventBridge rule and bus to receive compliance events from a fleet of member accounts, and an AWS Lambda function to notify administrators of compliance change events.

  1. An AWS Organizations tag policy verifies that the S3 buckets created in the in-scope AWS accounts have a tag structure that facilitates automated scanning and identification of the bucket owner. Specific tags enforced with this tag policy include RequireMacieScan : True|False and BucketOwner : <variable>. The tag policy is enforced at the OU level.
  2. A custom AWS Config rule evaluates whether these tags have been applied to all in-scope S3 buckets, and marks resources as compliant or not compliant.
  3. After every evaluation of S3 bucket tag compliance, AWS Config will send compliance events to an EventBridge event bus in the same AWS member account.
  4. An EventBridge rule is used to send compliance messages from member accounts to a centralized security account, which is used for operational administration of the solution.
  5. An EventBridge rule in the centralized security account is used to send all tag compliance events to a Lambda serverless function, which is used for notification.
  6. The tag compliance notification function receives the tag compliance events from EventBridge, parses the information, and posts it to a Microsoft Teams webhook. This provides you with details such as the affected bucket, compliance status, and bucket owner, along with a link to investigate the compliance event in AWS Config.
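
To illustrate the notification step, the following Python sketch posts a simple message to a Microsoft Teams incoming webhook. The webhook URL and message fields are placeholders and not the exact payload that the deployed function sends.

import json
import urllib.request

# Placeholder: the incoming webhook URL configured for your Teams channel.
WEBHOOK_URL = "https://example.webhook.office.com/webhookb2/placeholder"

def post_to_teams(title, text):
    # Post a simple message card to a Microsoft Teams incoming webhook.
    payload = {"title": title, "text": text}
    request = urllib.request.Request(
        WEBHOOK_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        response.read()

post_to_teams(
    "S3 tag compliance change",
    "Bucket example-bucket is NON_COMPLIANT with the RequireMacieScan tag policy.",
)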

Detecting sensitive data

Macie is a key component of this solution and facilitates sensitive data discovery across a fleet of AWS accounts based on the RequireMacieScan tag value configured at the time of bucket creation. Setting up sensitive data detection includes using Macie for sensitive data discovery, an EventBridge rule and bus to receive these finding events from the Macie delegated administrator account, an AWS Step Functions state machine to process these findings, and a Lambda function to notify administrators of new findings.

  1. Macie is used to detect sensitive data in member account S3 buckets with job definitions based on bucket tags, and central reporting of findings through a Macie delegated administrator account (that is, the central security account).
  2. When new findings are detected, Macie will send finding events to an EventBridge event bus in the security account.
  3. An EventBridge rule is used to send all finding events to AWS Step Functions, which runs custom business logic to evaluate each finding and determine the next action to take on the affected object.
  4. In the default configuration, for findings that are of low or medium severity and in objects that are not publicly accessible, Step Functions sends finding event information to a Lambda serverless function, which is used for notification. You can alter this event processing logic in Step Functions to change which finding types initiate an automated quarantine of the object.
  5. The Macie finding notification function receives the finding events from Step Functions, parses this information, and posts it to a Microsoft Teams webhook. This presents you with details such as the affected bucket and AWS account, finding ID and severity, public status, and bucket owner, along with a link to investigate the finding in Macie.

Automated quarantine

As outlined in the previous section, this solution uses event processing decision logic in Step Functions to determine whether to quarantine a sensitive object in place. Setting up automated quarantine includes a Step Functions state machine for Macie finding processing logic, an Amazon DynamoDB table to track items moved to quarantine, a Lambda function to notify administrators of quarantine, and an Amazon API Gateway and second Step Functions state machine to facilitate remediation or removal of objects from quarantine.

  1. In the default configuration, for findings that are of high severity or in objects that are publicly accessible, Step Functions adds the affected object’s metadata to an Amazon DynamoDB table, which is used to track quarantine and remediation status at scale.
  2. Step Functions then quarantines the affected object, moving it to an automatically configured and secured prefix in the same S3 bucket while the finding is being investigated. Only select personnel (that is, cybersecurity) have access to the object.
  3. Step Functions then sends finding event information to a Lambda serverless function, which is used for notification.
  4. The Macie quarantine notification function receives the finding events from Step Functions, parses this information, and posts it to a Microsoft Teams webhook. This presents you with similar details to the Macie finding notification function, but also notes the object has been moved to quarantine, and provides a one-click option to release the object from quarantine.
  5. In the Microsoft Teams channel, which should only be accessible to qualified security professionals, DLP administrators select the option to release the object from quarantine. This invokes a REST API deployed on API Gateway.
  6. API Gateway invokes a release object workflow in Step Functions, which begins to process releasing the affected object from quarantine.
  7. Step Functions inspects the affected object ID received through API Gateway, and first retrieves details about this object from DynamoDB. Upon receiving these details, the object is removed from the quarantine tracking database.
  8. Step Functions then moves the affected object to its original location in Amazon S3, thereby making the object accessible again under the original bucket policy and IAM permissions.
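
To illustrate the quarantine step, the following boto3 sketch records an object in DynamoDB and moves it under a quarantine prefix in the same bucket. The table name, bucket, key, and prefix are assumptions for this example rather than the exact names used by the deployed state machine.

import boto3

s3 = boto3.client("s3")
dynamodb = boto3.client("dynamodb")

# Assumed example values; the deployed workflow supplies these from the
# Macie finding event and the CloudFormation-created table name.
TABLE_NAME = "quarantine-tracking"
BUCKET = "example-member-bucket"
KEY = "reports/customer-data.csv"
QUARANTINE_PREFIX = "quarantine/"

# Track the quarantined object so that it can be released later.
dynamodb.put_item(
    TableName=TABLE_NAME,
    Item={
        "ObjectKey": {"S": KEY},
        "Bucket": {"S": BUCKET},
        "OriginalLocation": {"S": f"s3://{BUCKET}/{KEY}"},
    },
)

# Move the object to the quarantine prefix in the same bucket.
s3.copy_object(
    Bucket=BUCKET,
    CopySource={"Bucket": BUCKET, "Key": KEY},
    Key=QUARANTINE_PREFIX + KEY,
)
s3.delete_object(Bucket=BUCKET, Key=KEY)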

Organization structure

As mentioned in the prerequisites, the solution uses a set of AWS accounts that have been joined to an organization, which is shown in Figure 2. While the logical structure of your AWS Organizations deployment can differ from what’s shown, for illustration purposes, we’re looking for sensitive data in the Development and Production AWS accounts, and the examples shown throughout the remainder of this blog post reflect that.

Figure 2: Organization structure

Deploy the solution

The overall deployment process of this solution has been decomposed into three AWS CloudFormation templates to be deployed to your management, security, and member accounts as CloudFormation stacks and StackSets. Performing the deployment in this manner not only ensures that the solution is extended to other member accounts created after the initial deployment, but also serves as an illustrative aid for the components deployed in each portion of the solution. An overview of the deployment process is as follows:

  1. Set up of Microsoft Teams webhooks to receive information from this solution.
  2. Deployment of a CloudFormation stack to the management account to configure the tag policy for this solution.
  3. Deployment of a CloudFormation stack set to member accounts to enable monitoring of S3 bucket tags and forwarding of tag compliance events to the security account.
  4. Deployment of a CloudFormation stack to the security account to configure the remainder of the solution that will facilitate sensitive data discovery, finding event processing, administrator notification, and release from quarantine functionality.
  5. Remaining manual configuration of quarantine remediation API authorization, enabling Macie, and specifying a Macie results location.

Set up Microsoft Teams

Before deploying the solution, you must create two incoming webhooks in a Microsoft Teams channel of your choice. Due to the sensitivity of the event information provided and the ability to release objects from quarantine, we recommend that this channel only be accessible to information security professionals. In addition, we recommend creating two distinct webhooks to distinguish tag compliance events from finding events; in the examples, the webhooks are named S3-Tag-Compliance-Notification and Macie-Finding-Notification, respectively. A complete walkthrough of creating an incoming webhook is out of scope for this blog post, but you can see Microsoft’s documentation on creating incoming webhooks for an overview. After the webhooks have been created, save the URLs to use in the solution deployment process.

Configure the management account

The first step of the deployment process is to deploy a CloudFormation stack in your management account that creates an AWS Organizations tag policy and applies it to the OUs of your choice. Before performing this step, note the two OU IDs you will apply the policy to, as these will be captured as input parameters for the CloudFormation stack.

  1. Choose the following Launch Stack button to open the CloudFormation console pre-loaded with the template for this step:

    Launch Stack

    Note: The stack will launch in the N. Virginia (us-east-1) Region. To deploy this solution into other AWS Regions, change your regional selection in the CloudFormation console, and deploy it to the selected Region.

  2. For Stack name, enter ConfigureManagementAccount.
  3. For First AWS Organizations Unit ID, enter your first OU ID.
  4. For Second AWS Organizations Unit ID, enter your second OU ID.
  5. Choose Next.
    Figure 3: Management account stack details

After you’ve entered all details, launch the stack and wait until the stack has reached CREATE_COMPLETE status before proceeding. The deployment process will take 1–2 minutes.

Configure the member accounts

The next step of the deployment process is to deploy a CloudFormation stack, which will initiate a StackSet deployment from your management account that’s scoped to the OUs of your choice. This stack set will enable AWS Config along with an AWS Config rule to evaluate Amazon S3 tag compliance and will deploy an EventBridge rule to send compliance events from AWS Config in your member accounts to a centralized event bus in your security account. If AWS Config has previously been enabled, select True for the AWS Config Status parameter to help prevent an overwrite of your existing settings. Before performing this setup, note the two AWS Organizations OU IDs you will deploy the stack set to. You will also be prompted to enter the AWS account ID and Region of your security account.

  1. Choose the following Launch Stack button to open the CloudFormation console pre-loaded with the template for this step:

    Launch Stack

    Note: The stack will launch in the N. Virginia (us-east-1) Region. To deploy this solution into other AWS Regions, change your regional selection in the CloudFormation console, and deploy it to the selected Region.

  2. For Stack name, enter DeployToMemberAccounts.
  3. For First AWS Organizations Unit ID, enter your first OU ID.
  4. For Second AWS Organizations Unit ID, enter your second OU ID.
  5. For Deployment Region, enter the Region you want to deploy the Stack set to.
  6. For AWS Config Status, accept the default value of false if you have not previously enabled AWS Config in your accounts.
  7. For Support all resource types, accept the default value of false.
  8. For Include global resource types, accept the default value of false.
  9. For List of resource types if not all supported, accept the default value of AWS::S3::Bucket.
  10. For Configuration delivery channel name, accept the default value of <Generated>.
  11. For Snapshot delivery frequency, accept the default value of 1 hour.
  12. For Security account ID, enter your security account ID.
  13. For Security account region, select the Region of your security account.
  14. Choose Next.
    Figure 4: Member account Stack details

After you’ve entered all details, launch the stack and wait until the stack has reached CREATE_COMPLETE status before proceeding. The deployment process will take 3–5 minutes.

Configure the security account

The next step of the deployment process involves deploying a CloudFormation stack in your security account that creates all the resources needed for at-scale sensitive data detection, automated quarantine, and security professional notification and response. This stack configures the following:

  • An S3 bucket and AWS Key Management Service (AWS KMS) keys for storage and encryption of discovery results in Macie.
  • Two rules in EventBridge for routing tag compliance and Macie finding events.
  • Two Step Functions state machines whose logic will be used for automated object quarantine and release.
  • Three Lambda functions for tag compliance, Macie findings, and quarantine notification.
  • A DynamoDB table for quarantine status tracking.
  • An API Gateway REST endpoint to facilitate the release of objects from quarantine.

Before performing this setup, note your AWS Organizations ID and two Microsoft Teams webhook URLs previously configured.

  1. Choose the following Launch Stack button to open the CloudFormation console pre-loaded with the template for this step:

    Launch Stack

    Note: The stack will launch in the N. Virginia (us-east-1) Region. To deploy this solution into other AWS Regions, change your regional selection in the CloudFormation console, and deploy it to the selected Region.

  2. For Stack name, enter ConfigureSecurityAccount.
  3. For AWS Org ID, enter your AWS Organizations ID.
  4. For Webhook URI for S3 tag compliance notifications, enter the first webhook URL you created in Microsoft Teams.
  5. For Webhook URI for Macie finding and quarantine notifications, enter the second webhook URL you created in Microsoft Teams.
  6. Choose Next.
    Figure 5: Security account stack details

After you’ve entered all details, launch the stack and wait until the stack has reached CREATE_COMPLETE status before proceeding. The deployment process will take 3–5 minutes.

Remaining configuration

While most of the solution is deployed automatically using CloudFormation, there are a few items that you must configure manually.

Configure Lambda API key environment variable

When the CloudFormation stack was deployed to the security account, CloudFormation created a new REST API that security professionals use to release objects from quarantine. This API is configured with an API key for authorization, and you must retrieve the value of that key and set it as an environment variable on the MacieQuarantineNotification function, which is also deployed in the security account. To retrieve the value of the API key, navigate to the REST API created in the security account, select API Keys, and copy the value of APIKey1. Next, navigate to the MacieQuarantineNotification function in the Lambda console, and set the ReleaseObjectApiKey environment variable to the value of your API key.
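If you prefer to script this step, the following sketch shows one way to do it with the AWS SDK for Python (Boto3). It assumes the API key is named APIKey1 and the function is named MacieQuarantineNotification; in your deployment the resources might carry stack-generated prefixes, so adjust the names accordingly.

```python
import boto3

apigw = boto3.client("apigateway")
lam = boto3.client("lambda")

# Look up the API key created by the ConfigureSecurityAccount stack and read its value.
keys = apigw.get_api_keys(nameQuery="APIKey1", includeValues=True)["items"]
api_key_value = keys[0]["value"]

# Merge the key into the function's environment variables without dropping existing ones.
fn_config = lam.get_function_configuration(FunctionName="MacieQuarantineNotification")
env_vars = fn_config.get("Environment", {}).get("Variables", {})
env_vars["ReleaseObjectApiKey"] = api_key_value

lam.update_function_configuration(
    FunctionName="MacieQuarantineNotification",
    Environment={"Variables": env_vars},
)
```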

Enable Macie

Next, enable Macie to facilitate sensitive data discovery in selected accounts in your organization. This process begins with selecting a delegated administrator account (that is, the security account), followed by onboarding the member accounts you want to test with. See Integrating and configuring an organization in Amazon Macie for detailed instructions on enabling Macie in your organization.
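The console workflow in that guide is the recommended path; if you want to automate parts of it, a minimal Boto3 sketch might look like the following. The account IDs and email address are placeholders.

```python
import boto3

# Run from the AWS Organizations management account: designate the security
# account as the Macie delegated administrator.
boto3.client("macie2").enable_organization_admin_account(
    adminAccountId="111122223333"  # placeholder security account ID
)

# Run from the security (delegated administrator) account: onboard a member account.
boto3.client("macie2").create_member(
    account={"accountId": "444455556666", "email": "member@example.com"}
)
```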

Configure the Macie results bucket and KMS key

Macie creates an analysis record for each Amazon S3 object that it analyzes. This includes objects where Macie has detected sensitive data as well as objects where sensitive data was not detected or that Macie could not analyze. The CloudFormation stack deployed in the security account created an S3 bucket and KMS key for this, and they are noted as MacieResultsS3BucketName and MacieResultsKmsKeyAlias in the CloudFormation stack output. Use these resources to configure the Macie results bucket and KMS key in the security account according to Storing and retaining sensitive data discovery results with Amazon Macie. Customization of the S3 bucket policy or KMS key policy has already been done for you as part of the ConfigureSecurityAccount CloudFormation template deployed earlier.
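If you want to script this configuration rather than use the console, the following sketch applies the stack outputs with Boto3. The bucket name, prefix, and key ARN shown are placeholders; substitute the MacieResultsS3BucketName and MacieResultsKmsKeyAlias values from your stack.

```python
import boto3

macie = boto3.client("macie2")  # run in the security (delegated administrator) account

macie.put_classification_export_configuration(
    configuration={
        "s3Destination": {
            "bucketName": "macie-results-bucket-example",  # MacieResultsS3BucketName output
            "keyPrefix": "discovery-results/",             # any prefix you prefer
            "kmsKeyArn": "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID",
        }
    }
)
```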

Validate the solution

With the solution fully deployed, you now need to deploy an S3 bucket with sample data to test the solution and review the findings.

Create a member account S3 bucket

In any of the member accounts onboarded into this solution as part of the Configure the member accounts step, deploy a new S3 bucket and the KMS key used to encrypt it by using the CloudFormation template that follows. Before performing this step, note the InvestigateMacieFindingsRole, StepFunctionsProcessFindingsRole, and StepFunctionsReleaseObjectRole outputs from the CloudFormation template deployed to the security account, because these are required as input parameters for this CloudFormation stack.

  1. Choose the following Launch Stack button to open the CloudFormation console pre-loaded with the template for this step:

    Launch Stack

    Note: The stack will launch in the N. Virginia (us-east-1) Region. To deploy this solution into other AWS Regions, change your regional selection in the CloudFormation console, and deploy it to the selected Region.

  2. For Stack name, enter DeployS3BucketKmsKey.
  3. For IAM Role used by Step Functions to process Macie findings, enter the ARN that was provided to you as the StepFunctionsProcessFindingsRole output from the Configure security account step.
  4. For IAM Role used by Step Functions to release objects from quarantine, enter the ARN that was provided to you as the StepFunctionsReleaseObjectRole output from the Configure security account step.
  5. For IAM Role used by security professionals to investigate Macie findings, enter the ARN that was provided to you as the InvestigateMacieFindingsRole output from the Configure security account step.
  6. For Department name of the bucket owner, enter any department or team name you want to designate as having ownership responsibility over the S3 bucket.
  7. Choose Next.
    Figure 6: Deploy S3 bucket and KMS key stack details

After you’ve entered all details, launch the stack and wait until the stack has reached CREATE_COMPLETE status before proceeding. The deployment process will take 3–5 minutes.

Monitor S3 bucket tag compliance

Shortly after the deployment of the new S3 bucket, you should see a message in your Microsoft Teams channel notifying you of the tag compliance status of the new bucket. This AWS Config rule is evaluated automatically any time an S3 resource event takes place, and the tag compliance event is sent to the centralized security account for notification purposes. While the notification shown in Figure 7 depicts a compliant S3 bucket, a bucket deployed without the required tags will be marked as NON_COMPLIANT, and security professionals can check the AWS Config compliance status directly in the AWS console for the member account.
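If you would rather check compliance programmatically than in the console, the following Boto3 sketch lists noncompliant S3 buckets for the tag rule. The rule name shown is a placeholder; use the name of the tag-compliance rule that the DeployToMemberAccounts stack set created in your environment.

```python
import boto3

config = boto3.client("config")  # run in the member account and Region being evaluated

response = config.get_compliance_details_by_config_rule(
    ConfigRuleName="s3-bucket-required-tags",  # placeholder rule name
    ComplianceTypes=["NON_COMPLIANT"],
)
for result in response["EvaluationResults"]:
    qualifier = result["EvaluationResultIdentifier"]["EvaluationResultQualifier"]
    print(qualifier["ResourceId"], result["ComplianceType"])
```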

Figure 7: S3 tag compliance notification

Upload sample data

Download this .zip file of sample data and upload the expanded files into the newly created S3 bucket. The sample files include fictitious PII, such as credit card information and Social Security numbers, and will therefore generate various Macie findings.

Note: All data in this blog post has been artificially created by AWS for demonstration purposes and has not been collected from any individual person. Similarly, such data does not, nor is it intended to, relate back to any individual person.

Configure a Macie discovery job

Configure a sensitive data discovery job in the Amazon Macie delegated administrator account (that is, the security account) according to Creating a sensitive data discovery job. When creating the job, specify tag-based bucket criteria instructing Macie to scan any bucket with a tag key of RequireMacieScan and a tag value of True. This instructs Macie to scan buckets matching this criterion across the accounts that have been onboarded into Macie.

Figure 8: Macie bucket criteria

On the discovery options page, specify a one-time job with a sampling depth of 100 percent. Further refine the job scope by adding the quarantine prefix to the exclusion criteria of the sensitive data discovery job.

Figure 9: Macie scan scope

Select the AWS_CREDENTIALS, CREDIT_CARD_NUMBER, CREDIT_CARD_SECURITY_CODE, and DATE_OF_BIRTH managed data identifiers and proceed to the review screen. On the review screen, ensure that the bucket name you created is listed under bucket matches, and launch the discovery job.
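The console steps above are all you need; if you script job creation instead, a minimal Boto3 sketch of an equivalent job follows. The job name and quarantine prefix are placeholders; use the prefix configured in your deployment.

```python
import boto3

macie = boto3.client("macie2")  # run in the Macie delegated administrator account

macie.create_classification_job(
    name="require-macie-scan-job",  # placeholder name
    jobType="ONE_TIME",
    samplingPercentage=100,
    managedDataIdentifierSelector="INCLUDE",
    managedDataIdentifierIds=[
        "AWS_CREDENTIALS",
        "CREDIT_CARD_NUMBER",
        "CREDIT_CARD_SECURITY_CODE",
        "DATE_OF_BIRTH",
    ],
    s3JobDefinition={
        # Scan any bucket tagged RequireMacieScan=True in the onboarded accounts.
        "bucketCriteria": {
            "includes": {
                "and": [
                    {
                        "tagCriterion": {
                            "comparator": "EQ",
                            "tagValues": [{"key": "RequireMacieScan", "value": "True"}],
                        }
                    }
                ]
            }
        },
        # Exclude objects that already sit under the quarantine prefix.
        "scoping": {
            "excludes": {
                "and": [
                    {
                        "simpleScopeTerm": {
                            "comparator": "STARTS_WITH",
                            "key": "OBJECT_KEY",
                            "values": ["quarantine/"],  # placeholder prefix
                        }
                    }
                ]
            }
        },
    },
)
```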

Note: This solution also works with the Automated Sensitive Data Discovery feature in Amazon Macie. I recommend you investigate this feature further for broad visibility into where sensitive data might reside within your Amazon S3 data estate. Regardless of the method you choose, you will be able to integrate the appropriate notification and quarantine solution.

Review and investigate findings

After a few minutes, the discovery job will complete, and soon you should see four messages in your Microsoft Teams channel notifying you of the finding events created by Macie. One of these findings will be marked as medium severity, while the other three will be marked as high.

Review the medium severity finding, and recall the flow of event information from the solution overview section. Macie scanned a bucket in the member account and presented this finding in the Macie delegated administrator account. Macie then sent this finding event to EventBridge, which initiated a workflow run in Step Functions. Step Functions invoked customer-specified logic to evaluate the finding and determined that because the object isn’t publicly accessible, and because the finding isn’t high severity, it should only notify security professionals rather than quarantine the object in question. Several key pieces of information necessary for investigation are presented to the security team, along with a link to directly investigate the finding in the AWS console of the central security account.

Figure 10: Macie medium severity finding notification

Now review the high severity finding. The flow of event information in this scenario is identical to the medium severity finding, but in this case, Step Functions quarantined the object because the severity is high. The security team is again presented with an option to use the console to investigate further. The process to investigate this finding is a bit different due to the object being moved to a quarantine prefix. If security professionals want to view the original object in its entirety, they would assume the InvestigateMacieFindingsRole in the security account, which has cross-account access to the S3 bucket quarantine prefix in the in-scope member accounts. S3 buckets deployed in member accounts using the CloudFormation template listed above will have a special bucket policy that denies access to the quarantine prefix for any role other than the InvestigateMacieFindingsRole, StepFunctionsProcessFindingsRole, and StepFunctionsReleaseObjectRole. This makes sure that objects are truly quarantined and inaccessible while being investigated by security professionals.

Figure 11: Macie high severity finding notification

Unlike the previous example, the security team is also notified that an affected object was moved to quarantine, and is presented with a separate option to release the object from quarantine. Choosing Release Object from Quarantine transparently sends an HTTP POST request to the REST API in the background, and the API responds with a SUCCEEDED or FAILED message indicating the result of the operation.
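Behind the button, the call is an ordinary authenticated POST. The sketch below shows the general shape of such a request; the invoke URL, resource path, and request body fields are placeholders, because the exact schema is defined by the API deployed in your security account.

```python
import requests

release_endpoint = "https://abc123.execute-api.us-east-1.amazonaws.com/prod/release"  # placeholder
api_key = "YOUR_API_KEY_VALUE"  # the APIKey1 value retrieved earlier

resp = requests.post(
    release_endpoint,
    headers={"x-api-key": api_key},  # API Gateway usage-plan keys are sent in this header
    json={  # placeholder body fields
        "bucket": "example-member-bucket",
        "key": "quarantine/example-object.txt",
    },
    timeout=10,
)
print(resp.status_code, resp.text)  # expect a SUCCEEDED or FAILED result
```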

Figure 12: Release object from quarantine

The state machine uses decision logic based on the affected object's public status and the severity of the finding. Customers deploying this solution can alter this logic or add further customization by modifying the Step Functions state machine definition, either directly in the CloudFormation template or through the Step Functions Workflow Studio low-code interface available in the Step Functions console. For reference, the full event schema used by Macie can be found in the EventBridge event schema for Macie findings.

The logic of the Step Functions state machine used to process Macie finding events follows and is shown in Figure 13; a simplified code sketch of the decision logic appears after the list.

  1. An EventBridge rule invokes the Step Functions state machine as Macie findings are received.
  2. Step Functions parses Macie event data to determine the finding severity.
  3. If the finding is high severity or determined to be public, the affected object is then added to the quarantine tracking database in DynamoDB.
  4. After adding the object to quarantine tracking, the object is copied into the quarantine prefix within its S3 bucket.
  5. After being copied, the object is deleted from its original S3 location.
  6. After the object is deleted, the MacieQuarantineNotification function is invoked to alert you of the finding and quarantine status.
  7. If the finding is not high severity and not determined to be public, the MacieFindingNotification function is invoked to alert you of the finding.
    Figure 13: Process Macie findings workflow
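To make the decision flow concrete, here is a simplified Python sketch of equivalent logic. It reads the severity description and effective public access permission from the Macie finding JSON; the deployed solution expresses this logic as Step Functions states rather than code, so treat this purely as an illustration.

```python
def decide_action(finding: dict) -> str:
    """Return QUARANTINE or NOTIFY_ONLY for a Macie finding event detail."""
    severity = finding.get("severity", {}).get("description", "")
    effective_permission = (
        finding.get("resourcesAffected", {})
        .get("s3Bucket", {})
        .get("publicAccess", {})
        .get("effectivePermission", "")
    )

    if severity == "High" or effective_permission == "PUBLIC":
        # Track in DynamoDB, copy to the quarantine prefix, delete the original, notify.
        return "QUARANTINE"
    # Send finding details to the security team without moving the object.
    return "NOTIFY_ONLY"
```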

Solution cleanup

To remove the solution and avoid incurring additional charges for the AWS resources used in this solution, perform the following steps.

Note: If you want to only suspend Macie, which will preserve your settings, resources, and findings, see Suspending or disabling Amazon Macie.

  1. Open the Macie console in your security account. Under Settings, choose Accounts. Select the checkboxes next to the member accounts onboarded previously, select the Actions dropdown, and select Disassociate Account. When that has completed, select the same accounts again, and choose Delete.
  2. Open the Macie console in your management account. Choose Get started, and remove your security account as the Macie delegated administrator.
  3. Open the Macie console in your security account, choose Settings, then choose Disable Macie.
  4. Open the S3 console in your security account. Remove all objects from the Macie results S3 bucket.
  5. Open the CloudFormation console in your security account. Select the ConfigureSecurityAccount stack and choose Delete.
  6. Open the Macie console in your member accounts. Under Settings, choose Disable Macie.
  7. Open the Amazon S3 console in your member accounts. Remove all sample data objects from the S3 bucket.
  8. Open the CloudFormation console in your member accounts. Select the DeployS3BucketKmsKey stack and choose Delete.
  9. Open the CloudFormation console in your management account. Select the DeployToMemberAccounts stack and choose Delete.
  10. Still in the CloudFormation console in your management account, select the ConfigureManagementAccount stack and choose Delete.

Summary

In this blog post, we demonstrated a solution to process—and act upon—sensitive data discovery findings surfaced by Macie through the incorporation of decision logic implemented in Step Functions. This logic provides granularity on quarantine decisions and extensibility you can use to customize automated finding response to suit your business and regulatory needs.

In addition, we demonstrated a mechanism to notify you of finding event information as it’s discovered, while providing you with the contextual details necessary for investigative purposes. Additional customization of the information presented in Microsoft Teams can be accomplished through parsing of the event payload. You can also customize the Lambda functions to interface with Slack, as demonstrated in this sample solution for Macie.

Finally, we demonstrated a solution that can operate at scale and will automatically be deployed to new member accounts in your organization. By using an in-place quarantine strategy instead of a centralized one, you can more easily manage this solution across potentially hundreds of AWS accounts and thousands of S3 buckets. Because the solution incorporates a global tracking database in DynamoDB, it can also be extended with a visual dashboard that depicts quarantine insights.

If you have feedback about this post, let us know in the comments section. If you have questions about this post, start a new thread on AWS Security, Identity, & Compliance re:Post or contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Darren Roback

Darren is a Senior Solutions Architect with Amazon Web Services based in St. Louis, Missouri. He has a background in Security and Compliance, Serverless Event-Driven Architecture, and Enterprise Architecture. At AWS, Darren partners with customers to help them solve business challenges with AWS technology. Outside of work, Darren enjoys spending time in his shop working on woodworking projects.

Seth Malone

Seth is a Solutions Architect with Amazon Web Services based in Missouri. He has experience maintaining secure workloads across healthcare, financial, and engineering firms. Seth enjoys the outdoors, and can be found on the water after building solutions with his customers.

Chuck Enstall

Chuck is a Principal Solutions Architect with AWS based in St. Louis, MO, focusing on enterprise customers. He has decades of experience in security across multiple clouds, and holds numerous security certifications such as CISSP, CCSP, and others. As a former Navy veteran, Chuck enjoys mentoring other veterans as they explore possible careers within the security field.

How to use Amazon Macie to reduce the cost of discovering sensitive data

Post Syndicated from Nicholas Doropoulos original https://aws.amazon.com/blogs/security/how-to-use-amazon-macie-to-reduce-the-cost-of-discovering-sensitive-data/

Amazon Macie is a fully managed data security service that uses machine learning and pattern matching to discover and help protect your sensitive data, such as personally identifiable information (PII), payment card data, and Amazon Web Services (AWS) credentials. Analyzing large volumes of data for the presence of sensitive information can be expensive, due to the nature of compute-intensive operations involved in the process.

Macie offers several capabilities to help customers reduce the cost of discovering sensitive data, including automated data discovery, which can reduce your spend with new data sampling techniques that are custom-built for Amazon Simple Storage Service (Amazon S3). In this post, we will walk through such Macie capabilities and best practices so that you can cost-efficiently discover sensitive data with Macie.

Overview of the Macie pricing plan

Let’s do a quick recap of how customers pay for the Macie service. With Macie, you are charged based on three dimensions: the number of S3 buckets evaluated for bucket inventory and monitoring, the number of S3 objects monitored for automated data discovery, and the quantity of data inspected for sensitive data discovery. You can read more about these dimensions on the Macie pricing page.

The majority of the cost incurred by customers is driven by the quantity of data inspected for sensitive data discovery. For this reason, we will limit the scope of this post to techniques that you can use to optimize the quantity of data that you scan with Macie.

Not all security use cases require the same quantity of data for scanning

Broadly speaking, you can choose to scan your data in two ways—run full scans on your data or sample a portion of it. When to use which method depends on your use cases and business objectives. Full scans are useful when customers have identified what they want to scan. A few examples of such use cases are: scanning buckets that are open to the internet, monitoring a bucket for unintentionally added sensitive data by scanning every new object, or performing analysis on a bucket after a security incident.

The other option is to use sampling techniques for sensitive data discovery. This method is useful when security teams want to reduce data security risks. For example, just knowing that an S3 bucket contains credit card numbers is enough information to prioritize it for remediation activities.

Macie offers both options, and you can discover sensitive data either by creating and running sensitive data discovery jobs that perform full scans on targeted locations, or by configuring Macie to perform automated sensitive data discovery for your account or organization. You can also use both options simultaneously in Macie.

Use automated data discovery as a best practice

Automated data discovery in Macie reduces the quantity of data that needs to be scanned to a fraction of your S3 estate.

When you enable Macie for the first time, automated data discovery is enabled by default. If you already use Macie in your organization, you can enable automated data discovery in the management console of the Amazon Macie administrator account. This Macie capability automatically starts discovering sensitive data in your S3 buckets and builds a sensitive data profile for each bucket. The profiles are organized in a visual, interactive data map, and you can use the data map to identify data security risks that need immediate attention.

Automated data discovery in Macie evaluates the sensitivity level of each of your buckets by using intelligent, fully managed data sampling techniques that minimize the quantity of data scanning needed. During evaluation, objects with similar S3 metadata, such as bucket names, object-key prefixes, file-type extensions, and storage class, are organized into groups that are likely to have similar content. Macie then selects small, but representative, samples from each identified group of objects and scans them to detect the presence of sensitive data. Macie has a feedback loop that uses the results of previously scanned samples to prioritize the next set of samples to inspect.

The automated sensitive data discovery feature is designed to detect sensitive data at scale across hundreds of buckets and accounts, which makes it easier to identify the S3 buckets that need to be prioritized for more focused scanning. Because the amount of data that needs to be scanned is reduced, this task can be done at a fraction of the cost of running a full data inspection across all your S3 buckets. The Macie console displays the scanning results as a heat map (Figure 1), which shows the consolidated information grouped by account, and whether a bucket is sensitive, not sensitive, or not analyzed yet.

Figure 1: A heat map showing the results of automated sensitive data discovery

There is a 30-day free trial period when you enable automatic data discovery on your AWS account. During the trial period, in the Macie console, you can see the estimated cost of running automated sensitive data discovery after the trial period ends. After the evaluation period, we charge based on the total quantity of S3 objects in your account, as well as the bytes that are scanned for sensitive content. Charges are prorated per day. You can disable this capability at any time.

Tune your monthly spend on automated sensitive data discovery

To further reduce your monthly spend on automated sensitive data discovery, Macie allows you to exclude buckets from automated discovery. For example, you might consider excluding buckets that are used for storing operational logs, if you're sure they don't contain any sensitive information. Your monthly spend is reduced roughly by the percentage of data in those excluded buckets relative to your total S3 estate.

Figure 2 shows the setting in the heatmap area of the Macie console that you can use to exclude a bucket from automated discovery.

Figure 2: Excluding buckets from automated sensitive data discovery from the heatmap

You can also use the automated data discovery settings page to specify multiple buckets to be excluded, as shown in Figure 3.

Figure 3: Excluding buckets from the automated sensitive data discovery settings page

How to run targeted, cost-efficient sensitive data discovery jobs

Making your sensitive data discovery jobs more targeted makes them more cost-efficient, because it reduces the quantity of data scanned. Consider using the following strategies:

  1. Make your sensitive data discovery jobs as targeted and specific as possible in their scope by using the Object criteria settings on the Refine the scope page, shown in Figure 4.
    Figure 4: Adjusting the scope of a sensitive data discovery job

    Options to make discovery jobs more targeted include:

    • Include objects by using the "last modified" criterion — If you know how frequently your classifiable S3-hosted objects are modified, and you want to scan the resources that changed at a particular point in time, use the "last modified" criterion to include in your scope only the objects that were modified at a certain date or time.
    • Don’t scan CloudTrail logs — Identify the S3 bucket prefixes that contain AWS CloudTrail logs and exclude them from scanning.
    • Consider using random object sampling — With this option, you specify the percentage of eligible S3 objects that you want Macie to analyze when a sensitive data discovery job runs. If this value is less than 100%, Macie selects eligible objects to analyze at random, up to the specified percentage, and analyzes the data in those objects. If your data is highly consistent and you want to determine whether a specific S3 bucket, rather than each object, contains sensitive information, adjust the sampling depth accordingly.
    • Include objects with specific extensions, tags, or storage size — To fine tune the scope of a sensitive data discovery job, you can also define custom criteria that determine which S3 objects Macie includes or excludes from a job’s analysis. These criteria consist of one or more conditions that derive from properties of S3 objects. You can exclude objects with specific file name extensions, exclude objects by using tags as the criterion, and exclude objects on the basis of their storage size. For example, you can use a criteria-based job to scan the buckets associated with specific tag key/value pairs such as Environment: Production.
  2. Specify S3 bucket criteria in your job — Use a criteria-based job to scan only buckets that have public read/write access. For example, if you have 100 buckets with 10 TB of data, but only two of those buckets containing 100 GB are public, you could reduce your overall Macie cost by 99% by using a criteria-based job to classify only the public buckets (a minimal sketch of such a criteria-based job follows this list).
  3. Consider scheduling jobs based on how long objects live in your S3 buckets. Running jobs at a higher frequency than needed can result in unnecessary costs in cases where objects are added and deleted frequently. For example, if you’ve determined that the S3 objects involved contain high velocity data that is expected to reside in your S3 bucket for a few days, and you’re concerned that sensitive data might remain, scheduling your jobs to run at a lower frequency will help in driving down costs. In addition, you can deselect the Include existing objects checkbox to scan only new objects.
    Figure 5: Specifying the frequency of a sensitive data discovery job

  4. As a best practice, review your scheduled jobs periodically to verify that they are still meaningful to your organization. If you aren’t sure whether one of your periodic jobs continues to be fit for purpose, you can pause it so that you can investigate whether it is still needed, without incurring potentially unnecessary costs in the meantime. If you determine that a periodic job is no longer required, you can cancel it completely.
    Figure 6: Pausing a scheduled sensitive data discovery job

  5. If you don’t know where to start to make your jobs more targeted, use the results of Macie automated data discovery to plan your scanning strategy. Start with small buckets and the ones that have policy findings associated with them.
  6. In multi-account environments, you can monitor Macie’s usage across your organization in AWS Organizations through the usage page of the delegated administrator account. This will enable you to identify member accounts that are incurring higher costs than expected, and you can then investigate and take appropriate actions to keep expenditure low.
  7. Take advantage of the Macie pricing calculator so that you get an estimate of your Macie fees in advance.
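As referenced in strategy 2 above, here is a minimal Boto3 sketch of a cost-focused, criteria-based job that scans only publicly accessible buckets, skips a CloudTrail log prefix, and samples 25 percent of eligible objects. The job name, prefix, and sampling depth are illustrative choices, not recommendations.

```python
import boto3

macie = boto3.client("macie2")

macie.create_classification_job(
    name="public-buckets-sampled-scan",  # illustrative name
    jobType="ONE_TIME",
    samplingPercentage=25,  # random object sampling to reduce the data scanned
    s3JobDefinition={
        # Only buckets whose effective permission is public.
        "bucketCriteria": {
            "includes": {
                "and": [
                    {
                        "simpleCriterion": {
                            "key": "S3_BUCKET_EFFECTIVE_PERMISSION",
                            "comparator": "EQ",
                            "values": ["PUBLIC"],
                        }
                    }
                ]
            }
        },
        # Skip objects under a typical CloudTrail log prefix.
        "scoping": {
            "excludes": {
                "and": [
                    {
                        "simpleScopeTerm": {
                            "comparator": "STARTS_WITH",
                            "key": "OBJECT_KEY",
                            "values": ["AWSLogs/"],
                        }
                    }
                ]
            }
        },
    },
)
```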

Conclusion

In this post, we highlighted the best practices to keep in mind and configuration options to use when you discover sensitive data with Amazon Macie. We hope that you will walk away with a better understanding of when to use the automated data discovery capability and when to run targeted sensitive data discovery jobs. You can use the pointers in this post to tune the quantity of data you want to scan with Macie, so that you can continuously optimize your Macie spend.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on Amazon Macie re:Post.

Want more AWS Security news? Follow us on Twitter.

Nicholas Doropoulos

Nicholas is an AWS Cloud Security Engineer, a Bestselling Udemy Instructor, and a subject matter expert in AWS Shield, GuardDuty and Certificate Manager. Outside work, he enjoys spending his time with his wife and their beautiful baby son.

Koulick Ghosh

Koulick is a Senior Product Manager in AWS Security based in Seattle, WA. He loves speaking with customers on how AWS Security services can help make them more secure. In his free-time, he enjoys playing the guitar, reading, and exploring the Pacific Northwest.

Top 2022 AWS data protection service and cryptography tool launches

Post Syndicated from Marta Taggart original https://aws.amazon.com/blogs/security/top-2022-aws-data-protection-service-and-cryptography-tool-launches/

Given the pace of Amazon Web Services (AWS) innovation, it can be challenging to stay up to date on the latest AWS service and feature launches. AWS provides services and tools to help you protect your data, accounts, and workloads from unauthorized access. AWS data protection services provide encryption capabilities, key management, and sensitive data discovery. Last year, we saw growth and evolution in AWS data protection services as we continue to give customers features and controls to help meet their needs. Protecting data in the AWS Cloud is a top priority because we know you trust us to help protect your most critical and sensitive asset: your data. This post will highlight some of the key AWS data protection launches in the last year that security professionals should be aware of.

AWS Key Management Service
Create and control keys to encrypt or digitally sign your data

In April, AWS Key Management Service (AWS KMS) launched hash-based message authentication code (HMAC) APIs. This feature introduced the ability to create AWS KMS keys that can be used to generate and verify HMACs. HMACs are a powerful cryptographic building block that incorporate symmetric key material within a hash function to create a unique keyed message authentication code. HMACs provide a fast way to tokenize or sign data such as web API requests, credit card numbers, bank routing information, or personally identifiable information (PII). This technology is used to verify the integrity and authenticity of data and communications. HMACs are often a higher performing alternative to asymmetric cryptographic methods like RSA or elliptic curve cryptography (ECC) and should be used when both message senders and recipients can use AWS KMS.
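As a brief illustration of the new APIs, the following Boto3 sketch creates an HMAC key and then generates and verifies a MAC over a sample message; the message content is a placeholder.

```python
import boto3

kms = boto3.client("kms")

# Create a symmetric HMAC key dedicated to generating and verifying MACs.
key_id = kms.create_key(
    KeySpec="HMAC_256",
    KeyUsage="GENERATE_VERIFY_MAC",
)["KeyMetadata"]["KeyId"]

message = b"example-api-request-payload"

mac = kms.generate_mac(
    KeyId=key_id, Message=message, MacAlgorithm="HMAC_SHA_256"
)["Mac"]

is_valid = kms.verify_mac(
    KeyId=key_id, Message=message, Mac=mac, MacAlgorithm="HMAC_SHA_256"
)["MacValid"]
print(is_valid)  # True when the message and MAC match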

At AWS re:Invent in November, AWS KMS introduced the External Key Store (XKS), a new feature for customers who want to protect their data with encryption keys that are stored in an external key management system under their control. This capability brings new flexibility for customers to encrypt or decrypt data with cryptographic keys, independent authorization, and audit in an external key management system outside of AWS. XKS can help you address your compliance needs where encryption keys for regulated workloads must be outside AWS and solely under your control. To provide customers with a broad range of external key manager options, AWS KMS developed the XKS specification with feedback from leading key management and hardware security module (HSM) manufacturers as well as service providers that can help customers deploy and integrate XKS into their AWS projects.

AWS Nitro System
A combination of dedicated hardware and a lightweight hypervisor enabling faster innovation and enhanced security

In November, we published The Security Design of the AWS Nitro System whitepaper. The AWS Nitro System is a combination of purpose-built server designs, data processors, system management components, and specialized firmware that serves as the underlying virtualization technology that powers all Amazon Elastic Compute Cloud (Amazon EC2) instances launched since early 2018. This new whitepaper provides you with a detailed design document that covers the inner workings of the AWS Nitro System and how it is used to help secure your most critical workloads. The whitepaper discusses the security properties of the Nitro System, provides a deeper look into how it is designed to eliminate the possibility of AWS operator access to a customer’s EC2 instances, and describes its passive communications design and its change management process. Finally, the paper surveys important aspects of the overall system design of Amazon EC2 that provide mitigations against potential side-channel vulnerabilities that can exist in generic compute environments.

AWS Secrets Manager
Centrally manage the lifecycle of secrets

In February, AWS Secrets Manager added the ability to schedule secret rotations within specific time windows. Previously, Secrets Manager supported automated rotation of secrets within the last 24 hours of a specified rotation interval. This new feature added the ability to limit a given secret rotation to specific hours on specific days of a rotation interval. This helps you avoid having to choose between the convenience of managed rotations and the operational safety of application maintenance windows. In November, Secrets Manager also added the capability to rotate secrets as often as every four hours, while providing the same managed rotation experience.
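As a rough sketch of the scheduling feature, the following Boto3 call configures rotation within a two-hour window starting at 08:00 UTC every Saturday. The secret name, rotation function ARN, and schedule expression are placeholder assumptions; adjust them to your environment.

```python
import boto3

secretsmanager = boto3.client("secretsmanager")

secretsmanager.rotate_secret(
    SecretId="prod/app/db-credentials",  # placeholder secret name
    RotationLambdaARN="arn:aws:lambda:us-east-1:111122223333:function:SecretsRotationFn",
    RotationRules={
        "ScheduleExpression": "cron(0 8 ? * SAT *)",  # Saturdays at 08:00 UTC (example)
        "Duration": "2h",                             # length of the rotation window
    },
)
```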

In May, Secrets Manager started publishing secrets usage metrics to Amazon CloudWatch. With this feature, you have a streamlined way to view how many secrets you are using in Secrets Manager over time. You can also set alarms for an unexpected increase or decrease in number of secrets.

At the end of December, Secrets Manager added support for managed credential rotation for service-linked secrets. This feature helps eliminate the need for you to manage rotation Lambda functions and enables you to set up rotation without additional configuration. Amazon Relational Database Service (Amazon RDS) has integrated with this feature to streamline how you manage your master user password for your RDS database instances. Using this feature can improve your database’s security by preventing the RDS master user password from being visible during the database creation workflow. Amazon RDS fully manages the master user password’s lifecycle and stores it in Secrets Manager whenever your RDS database instances are created, modified, or restored. To learn more about how to use this feature, see Improve security of Amazon RDS master database credentials using AWS Secrets Manager.

AWS Private Certificate Authority
Create private certificates to identify resources and protect data

In September, AWS Private Certificate Authority (AWS Private CA) launched as a standalone service. AWS Private CA was previously a feature of AWS Certificate Manager (ACM). One goal of this launch was to help customers differentiate between ACM and AWS Private CA. ACM and AWS Private CA have distinct roles in the process of creating and managing the digital certificates used to identify resources and secure network communications over the internet, in the cloud, and on private networks. This launch coincided with the launch of an updated console for AWS Private CA, which includes accessibility improvements to enhance screen reader support and additional tab key navigation for people with motor impairment.

In October, AWS Private CA introduced a short-lived certificate mode, a lower-cost mode of AWS Private CA that is designed for issuing short-lived certificates. With this new mode, public key infrastructure (PKI) administrators, builders, and developers can save money when issuing certificates where a validity period of 7 days or fewer is desired. To learn more about how to use this feature, see How to use AWS Private Certificate Authority short-lived certificate mode.

Additionally, AWS Private CA supported the launches of certificate-based authentication with Amazon AppStream 2.0 and Amazon WorkSpaces to remove the logon prompt for the Active Directory domain password. AppStream 2.0 and WorkSpaces certificate-based authentication integrates with AWS Private CA to automatically issue short-lived certificates when users sign in to their sessions. When you configure your private CA as a third-party root CA in Active Directory or as a subordinate to your Active Directory Certificate Services enterprise CA, AppStream 2.0 or WorkSpaces with AWS Private CA can enable rapid deployment of end-user certificates to seamlessly authenticate users. To learn more about how to use this feature, see How to use AWS Private Certificate Authority short-lived certificate mode.

AWS Certificate Manager
Provision and manage SSL/TLS certificates with AWS services and connected resources

In early November, ACM launched the ability to request and use Elliptic Curve Digital Signature Algorithm (ECDSA) P-256 and P-384 TLS certificates to help secure your network traffic. You can use ACM to request ECDSA certificates and associate the certificates with AWS services like Application Load Balancer or Amazon CloudFront. Previously, you could only request certificates with an RSA 2048 key algorithm from ACM. Now, AWS customers who need to use TLS certificates with at least 120-bit security strength can use these ECDSA certificates to help meet their compliance needs. The ECDSA certificates have a higher security strength—128 bits for P-256 certificates and 192 bits for P-384 certificates—when compared to 112-bit RSA 2048 certificates that you can also issue from ACM. The smaller file footprint of ECDSA certificates makes them ideal for use cases with limited processing capacity, such as small Internet of Things (IoT) devices.
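For example, a request for a DNS-validated ECDSA P-384 certificate might look like the following Boto3 sketch; the domain name is a placeholder.

```python
import boto3

acm = boto3.client("acm")

response = acm.request_certificate(
    DomainName="app.example.com",     # placeholder domain
    ValidationMethod="DNS",
    KeyAlgorithm="EC_secp384r1",      # P-384; use "EC_prime256v1" for P-256
)
print(response["CertificateArn"])
```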

Amazon Macie
Discover and protect your sensitive data at scale

Amazon Macie introduced two major features at AWS re:Invent. The first is a new capability that allows for one-click, temporary retrieval of up to 10 samples of sensitive data found in Amazon Simple Storage Service (Amazon S3). With this new capability, you can more readily view and understand which contents of an S3 object were identified as sensitive, so you can review, validate, and quickly take action as needed without having to review every object that a Macie job returned. Sensitive data samples captured with this new capability are encrypted by using customer-managed AWS KMS keys and are temporarily viewable within the Amazon Macie console after retrieval.

Additionally, Amazon Macie introduced automated sensitive data discovery, a new feature that provides continual, cost-efficient, organization-wide visibility into where sensitive data resides across your Amazon S3 estate. With this capability, Macie automatically samples and analyzes objects across your S3 buckets, inspecting them for sensitive data such as personally identifiable information (PII) and financial data; builds an interactive data map of where your sensitive data in S3 resides across accounts; and provides a sensitivity score for each bucket. Macie uses multiple automated techniques, including resource clustering by attributes such as bucket name, file types, and prefixes, to minimize the data scanning needed to uncover sensitive data in your S3 buckets. This helps you continuously identify and remediate data security risks without manual configuration and lowers the cost to monitor for and respond to data security risks.
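A short Boto3 sketch of these two capabilities follows. It assumes the sample retrieval (reveal) configuration has already been enabled with a customer-managed KMS key, and the finding ID shown is a placeholder.

```python
import boto3

macie = boto3.client("macie2")  # run in the Macie administrator account

# Turn on automated sensitive data discovery.
macie.update_automated_discovery_configuration(status="ENABLED")

# Retrieve up to 10 sensitive data samples for a specific finding.
finding_id = "EXAMPLE-FINDING-ID"  # placeholder
availability = macie.get_sensitive_data_occurrences_availability(findingId=finding_id)
if availability["code"] == "AVAILABLE":
    samples = macie.get_sensitive_data_occurrences(findingId=finding_id)
    print(samples["sensitiveDataOccurrences"])
```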

Support for new open source encryption libraries

In February, we announced the availability of s2n-quic, an open source Rust implementation of the QUIC protocol, in our AWS encryption open source libraries. QUIC is a transport layer network protocol used by many web services to provide lower latencies than classic TCP. AWS has long supported open source encryption libraries for network protocols; in 2015 we introduced s2n-tls as a library for implementing TLS over HTTP. The name s2n is short for signal to noise and is a nod to the act of encryption—disguising meaningful signals, like your critical data, as seemingly random noise. Similar to s2n-tls, s2n-quic is designed to be small and fast, with simplicity as a priority. It is written in Rust, so it has some of the benefits of that programming language, such as performance, threads, and memory safety.

Cryptographic computing for AWS Clean Rooms (preview)

At re:Invent, we also announced AWS Clean Rooms, currently in preview, which includes a cryptographic computing feature that allows you to run a subset of queries on encrypted data. Clean rooms help customers and their partners to match, analyze, and collaborate on their combined datasets—without sharing or revealing underlying data. If you have data handling policies that require encryption of sensitive data, you can pre-encrypt your data by using a common collaboration-specific encryption key so that data is encrypted even when queries are run. With cryptographic computing, data that is used in collaborative computations remains encrypted at rest, in transit, and in use (while being processed).

If you’re looking for more opportunities to learn about AWS security services, read our AWS re:Invent 2022 Security recap post or watch the Security, Identity, and Compliance playlist.

Looking ahead in 2023

With AWS, you control your data by using powerful AWS services and tools to determine where your data is stored, how it is secured, and who has access to it. In 2023, we will further the AWS Digital Sovereignty Pledge, our commitment to offering AWS customers the most advanced set of sovereignty controls and features available in the cloud.

You can join us at our security learning conference, AWS re:Inforce 2023, in Anaheim, CA, June 13–14, for the latest advancements in AWS security, compliance, identity, and privacy solutions.

Stay updated on launches by subscribing to the AWS What’s New RSS feed and reading the AWS Security Blog.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Marta Taggart

Marta is a Seattle-native and Senior Product Marketing Manager in AWS Security Product Marketing, where she focuses on data protection services. Outside of work you’ll find her trying to convince Jack, her rescue dog, not to chase squirrels and crows (with limited success).

Three key security themes from AWS re:Invent 2022

Post Syndicated from Anne Grahn original https://aws.amazon.com/blogs/security/three-key-security-themes-from-aws-reinvent-2022/

AWS re:Invent returned to Las Vegas, Nevada, November 28 to December 2, 2022. After a virtual event in 2020 and a hybrid 2021 edition, spirits were high as over 51,000 in-person attendees returned to network and learn about the latest AWS innovations.

Now in its 11th year, the conference featured 5 keynotes, 22 leadership sessions, and more than 2,200 breakout sessions and hands-on labs at 6 venues over 5 days.

With well over 100 service and feature announcements—and innumerable best practices shared by AWS executives, customers, and partners—distilling highlights is a challenge. From a security perspective, three key themes emerged.

Turn data into actionable insights

Security teams are always looking for ways to increase visibility into their security posture and uncover patterns to make more informed decisions. However, as AWS Vice President of Data and Machine Learning, Swami Sivasubramanian, pointed out during his keynote, data often exists in silos; it isn’t always easy to analyze or visualize, which can make it hard to identify correlations that spark new ideas.

“Data is the genesis for modern invention.” – Swami Sivasubramanian, AWS VP of Data and Machine Learning

At AWS re:Invent, we launched new features and services that make it simpler for security teams to store and act on data. One such service is Amazon Security Lake, which brings together security data from cloud, on-premises, and custom sources in a purpose-built data lake stored in your account. The service, which is now in preview, automates the sourcing, aggregation, normalization, enrichment, and management of security-related data across an entire organization for more efficient storage and query performance. It empowers you to use the security analytics solutions of your choice, while retaining control and ownership of your security data.

Amazon Security Lake has adopted the Open Cybersecurity Schema Framework (OCSF), which AWS cofounded with a number of organizations in the cybersecurity industry. The OCSF helps standardize and combine security data from a wide range of security products and services, so that it can be shared and ingested by analytics tools. More than 37 AWS security partners have announced integrations with Amazon Security Lake, enhancing its ability to transform security data into a powerful engine that helps drive business decisions and reduce risk. With Amazon Security Lake, analysts and engineers can gain actionable insights from a broad range of security data and improve threat detection, investigation, and incident response processes.

Strengthen security programs

According to Gartner, by 2026, at least 50% of C-Level executives will have performance requirements related to cybersecurity risk built into their employment contracts. Security is top of mind for organizations across the globe, and as AWS CISO CJ Moses emphasized during his leadership session, we are continuously building new capabilities to help our customers meet security, risk, and compliance goals.

In addition to Amazon Security Lake, several new AWS services announced during the conference are designed to make it simpler for builders and security teams to improve their security posture in multiple areas.

Identity and networking

Authorization is a key component of applications. Amazon Verified Permissions is a scalable, fine-grained permissions management and authorization service for custom applications that simplifies policy-based access for developers and centralizes access governance. The new service gives developers a simple-to-use policy and schema management system to define and manage authorization models. The policy-based authorization system that Amazon Verified Permissions offers can shorten development cycles by months, provide a consistent user experience across applications, and facilitate integrated auditing to support stringent compliance and regulatory requirements.

Additional services that make it simpler to define authorization and service communication include Amazon VPC Lattice, an application-layer service that consistently connects, monitors, and secures communications between your services, and AWS Verified Access, which provides secure access to corporate applications without a virtual private network (VPN).

Threat detection and monitoring

Monitoring for malicious activity and anomalous behavior just got simpler. Amazon GuardDuty RDS Protection expands the threat detection capabilities of GuardDuty by using tailored machine learning (ML) models to detect suspicious logins to Amazon Aurora databases. You can enable the feature with a single click in the GuardDuty console, with no agents to manually deploy, no data sources to enable, and no permissions to configure. When RDS Protection detects a potentially suspicious or anomalous login attempt that indicates a threat to your database instance, GuardDuty generates a new finding with details about the potentially compromised database instance. You can view GuardDuty findings in AWS Security Hub, Amazon Detective (if enabled), and Amazon EventBridge, allowing for integration with existing security event management or workflow systems.

To bolster vulnerability management processes, Amazon Inspector now supports AWS Lambda functions, adding automated vulnerability assessments for serverless compute workloads. With this expanded capability, Amazon Inspector automatically discovers eligible Lambda functions and identifies software vulnerabilities in application package dependencies used in the Lambda function code. Actionable security findings are aggregated in the Amazon Inspector console, and pushed to Security Hub and EventBridge to automate workflows.

Data protection and privacy

The first step to protecting data is to find it. Amazon Macie now automatically discovers sensitive data, providing continual, cost-effective, organization-wide visibility into where sensitive data resides across your Amazon Simple Storage Service (Amazon S3) estate. With this new capability, Macie automatically and intelligently samples and analyzes objects across your S3 buckets, inspecting them for sensitive data such as personally identifiable information (PII), financial data, and AWS credentials. Macie then builds and maintains an interactive data map of your sensitive data in S3 across your accounts and Regions, and provides a sensitivity score for each bucket. This helps you identify and remediate data security risks without manual configuration and reduce monitoring and remediation costs.

Encryption is a critical tool for protecting data and building customer trust. The launch of the end-to-end encrypted enterprise communication service AWS Wickr offers advanced security and administrative controls that can help you protect sensitive messages and files from unauthorized access, while working to meet data retention requirements.

Management and governance

Maintaining compliance with regulatory, security, and operational best practices as you provision cloud resources is key. AWS Config rules, which evaluate the configuration of your resources, have now been extended to support proactive mode, so that they can be incorporated into infrastructure-as-code continuous integration and continuous delivery (CI/CD) pipelines to help identify noncompliant resources prior to provisioning. This can significantly reduce time spent on remediation.

Managing the controls needed to meet your security objectives and comply with frameworks and standards can be challenging. To make it simpler, we launched comprehensive controls management with AWS Control Tower. You can use it to apply managed preventative, detective, and proactive controls to accounts and organizational units (OUs) by service, control objective, or compliance framework. You can also use AWS Control Tower to turn on Security Hub detective controls across accounts in an OU. This new set of features reduces the time that it takes to define and manage the controls required to meet specific objectives, such as supporting the principle of least privilege, restricting network access, and enforcing data encryption.

Do more with less

As we work through macroeconomic conditions, security leaders are facing increased budgetary pressures. In his opening keynote, AWS CEO Adam Selipsky emphasized the effects of the pandemic, inflation, supply chain disruption, energy prices, and geopolitical events that continue to impact organizations.

Now more than ever, it is important to maintain your security posture despite resource constraints. Citing specific customer examples, Selipsky underscored how the AWS Cloud can help organizations move faster and more securely. By moving to the cloud, agricultural machinery manufacturer Agco reduced costs by 78% while increasing data retrieval speed, and multinational HVAC provider Carrier Global experienced a 40% reduction in the cost of running mission-critical ERP systems.

“If you’re looking to tighten your belt, the cloud is the place to do it.” – Adam Selipsky, AWS CEO

Security teams can do more with less by maximizing the value of existing controls, and bolstering security monitoring and analytics capabilities. Services and features announced during AWS re:Invent—including Amazon Security Lake, sensitive data discovery with Amazon Macie, support for Lambda functions in Amazon Inspector, Amazon GuardDuty RDS Protection, and more—can help you get more out of the cloud and address evolving challenges, no matter the economic climate.

Security is our top priority

AWS re:Invent featured many more highlights on a variety of topics, such as Amazon EventBridge Pipes and the pre-announcement of GuardDuty EKS Runtime protection, as well as Amazon CTO Dr. Werner Vogels’ keynote, and the security partnerships showcased on the Expo floor. It was a whirlwind week, but one thing is clear: AWS is working harder than ever to make our services better and to collaborate on solutions that ease the path to proactive security, so that you can focus on what matters most—your business.

For more security-related announcements and on-demand sessions, see A recap for security, identity, and compliance sessions at AWS re:Invent 2022 and the AWS re:Invent Security, Identity, and Compliance playlist on YouTube.

If you have feedback about this post, submit comments in the Comments section below.

Anne Grahn

Anne is a Senior Worldwide Security GTM Specialist at AWS based in Chicago. She has more than a decade of experience in the security industry, and has a strong focus on privacy risk management. She maintains a Certified Information Systems Security Professional (CISSP) certification.

Paul Hawkins

Paul helps customers of all sizes understand how to think about cloud security so they can build the technology and culture where security is a business enabler. He takes an optimistic approach to security and believes that getting the foundations right is the key to improving your security posture.

How to query and visualize Macie sensitive data discovery results with Athena and QuickSight

Post Syndicated from Keith Rozario original https://aws.amazon.com/blogs/security/how-to-query-and-visualize-macie-sensitive-data-discovery-results-with-athena-and-quicksight/

Amazon Macie is a fully managed data security service that uses machine learning and pattern matching to help you discover and protect sensitive data in Amazon Simple Storage Service (Amazon S3). With Macie, you can analyze objects in your S3 buckets to detect occurrences of sensitive data, such as personally identifiable information (PII), financial information, personal health information, and access credentials.

In this post, we walk you through a solution to gain comprehensive and organization-wide visibility into which types of sensitive data are present in your S3 storage, where the data is located, and how much is present. Once enabled, Macie automatically starts discovering sensitive data in your S3 storage and builds a sensitive data profile for each bucket. The profiles are organized in a visual, interactive data map, and you can use the data map to run targeted sensitive data discovery jobs. Both automated data discovery and targeted jobs produce rich, detailed sensitive data discovery results. This solution uses Amazon Athena and Amazon QuickSight to deep-dive on the Macie results, and to help you analyze, visualize, and report on sensitive data discovered by Macie, even when the data is distributed across millions of objects, thousands of S3 buckets, and thousands of AWS accounts. Athena is an interactive query service that makes it simpler to analyze data directly in Amazon S3 using standard SQL. QuickSight is a cloud-scale business intelligence tool that connects to multiple data sources, including Athena databases and tables.

This solution is relevant to data security, data governance, and security operations engineering teams.

The challenge: how to summarize sensitive data discovered in your growing S3 storage

Macie issues findings when an object is found to contain sensitive data. In addition to findings, Macie keeps a record of each S3 object analyzed in a bucket of your choice for long-term storage. These records are known as sensitive data discovery results, and they include additional context about your data in Amazon S3. Due to the large size of the results file, Macie exports the sensitive data discovery results to an S3 bucket, so you need to take additional steps to query and visualize the results. We discuss the differences between findings and results in more detail later in this post.

With the increasing number of data privacy guidelines and compliance mandates, customers need to scale their monitoring to encompass thousands of S3 buckets across their organization. The growing volume of data to assess, and the growing list of findings from discovery jobs, can make it difficult to review and remediate issues in a timely manner. In addition to viewing individual findings for specific objects, customers need a way to comprehensively view, summarize, and monitor sensitive data discovered across their S3 buckets.

To illustrate this point, we ran a Macie sensitive data discovery job on a dataset created by AWS. The dataset contains about 7,500 files that have sensitive information, and Macie generated a finding for each sensitive file analyzed, as shown in Figure 1.

Figure 1: Macie findings from the dataset

Your security team could spend days, if not months, analyzing these individual findings manually. Instead, we outline how you can use Athena and QuickSight to query and visualize the Macie sensitive data discovery results to understand your data security posture.

The additional information in the sensitive data discovery results will help you gain comprehensive visibility into your data security posture. With this visibility, you can answer questions such as the following:

  • What are the top 5 most commonly occurring sensitive data types?
  • Which AWS accounts have the most findings?
  • How many S3 buckets are affected by each of the sensitive data types?

Your security team can write their own customized queries to answer questions such as the following:

  • Is there sensitive data in AWS accounts that are used for development purposes?
  • Is sensitive data present in S3 buckets that previously did not contain sensitive information?
  • Was there a change in configuration for S3 buckets containing the greatest amount of sensitive data?

How are findings different from results?

As a Macie job progresses, it produces two key types of output: sensitive data findings (or findings for short), and sensitive data discovery results (or results).

Findings provide a report of potential policy violations with an S3 bucket, or the presence of sensitive data in a specific S3 object. Each finding provides a severity rating, information about the affected resource, and additional details, such as when Macie found the issue. Findings are published to the Macie console, AWS Security Hub, and Amazon EventBridge.

In contrast, results are a collection of records for each S3 object that a Macie job analyzed. These records contain information about objects that do and do not contain sensitive data, including up to 1,000 occurrences of each sensitive data type that Macie found in a given object, and whether Macie was unable to analyze an object because of issues such as permissions settings or use of an unsupported format. If an object contains sensitive data, the results record includes detailed information that isn’t available in the finding for the object.

One of the key benefits of querying results is to uncover gaps in your data protection initiatives—these gaps can occur when data in certain buckets can’t be analyzed because Macie was denied access to those buckets, or was unable to decrypt specific objects. The following table maps some of the key differences between findings and results.

| | Findings | Results |
| --- | --- | --- |
| Enabled by default | Yes | No |
| Location of published results | Macie console, Security Hub, and EventBridge | S3 bucket |
| Details of S3 objects that couldn't be scanned | No | Yes |
| Details of S3 objects in which no sensitive data was found | No | Yes |
| Identification of files inside compressed archives that contain sensitive data | No | Yes |
| Number of occurrences reported per object | Up to 15 | Up to 1,000 |
| Retention period | 90 days in Macie console | Defined by customer |

Architecture

As shown in Figure 2, you can build out the solution in three steps:

  1. Enable the results and publish them to an S3 bucket
  2. Build out the Athena table to query the results by using SQL
  3. Visualize the results with QuickSight
Figure 2: Architecture diagram showing the flow of the solution

Prerequisites

To implement the solution in this blog post, you must first complete the following prerequisites:

Figure 3: Sample data loaded into three different AWS accounts

Note: All data in this blog post has been artificially created by AWS for demonstration purposes and has not been collected from any individual person. Similarly, such data does not, and is not intended to, relate back to any individual person.

Step 1: Enable the results and publish them to an S3 bucket

Publication of the discovery results to Amazon S3 is not enabled by default. The setup requires that you specify an S3 bucket to store the results (we also refer to this as the discovery results bucket), and use an AWS Key Management Service (AWS KMS) key to encrypt the bucket.

If you are analyzing data across multiple accounts in your organization, then you need to enable the results in your delegated Macie administrator account. You do not need to enable results in individual member accounts. However, if you’re running Macie jobs in a standalone account, then you should enable the Macie results directly in that account.

To enable the results

  1. Open the Macie console.
  2. Select the AWS Region from the upper right of the page.
  3. From the left navigation pane, select Discovery results.
  4. Select Configure now.
  5. Select Create Bucket, and enter a unique bucket name. This will be the discovery results bucket name. Make note of this name because you will use it when you configure the Athena tables later in this post.
  6. Under Encryption settings, select Create new key. This takes you to the AWS KMS console in a new browser tab.
  7. In the AWS KMS console, do the following:
    1. For Key type, choose symmetric, and for Key usage, choose Encrypt and Decrypt.
    2. Enter a meaningful key alias (for example, macie-results-key) and description.
    3. (Optional) For simplicity, set your current user or role as the Key Administrator.
    4. Set your current user/role as a user of this key in the key usage permissions step. This will give you the right permissions to run the Athena queries later.
    5. Review the settings and choose Finish.
  8. Navigate to the browser tab with the Macie console.
  9. From the AWS KMS Key dropdown, select the new key.
  10. To view KMS key policy statements that were automatically generated for your specific key, account, and Region, select View Policy. Copy these statements in their entirety to your clipboard.
  11. Navigate back to the browser tab with the AWS KMS console and then do the following:
    1. Select Customer managed keys.
    2. Choose the KMS key that you created, choose Switch to policy view, and under Key policy, select Edit.
    3. In the key policy, paste the statements that you copied. When you add the statements, do not delete any existing statements and make sure that the syntax is valid. Policies are in JSON format.
  12. Navigate back to the Macie console browser tab.
  13. Review the inputs in the Settings page for Discovery results and then choose Save. Macie will perform a check to make sure that it has the right access to the KMS key, and then it will create a new S3 bucket with the required permissions.
  14. If you haven’t run a Macie discovery job in the last 90 days, you will need to run a new discovery job to publish the results to the bucket.

In this step, you created a new S3 bucket and KMS key that you are using only for Macie. For instructions on how to enable and configure the results using existing resources, see Storing and retaining sensitive data discovery results with Amazon Macie. Make sure to review Macie pricing details before creating and running a sensitive data discovery job.
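
If you prefer to script this configuration rather than use the console, you can set the discovery results location with the AWS CLI. The following is a minimal sketch that assumes you have already created the results bucket and KMS key; the bucket name, Region, account ID, and key ID are placeholders. Run it in the delegated Macie administrator account (or in the standalone account).

    aws macie2 put-classification-export-configuration \
        --configuration '{
            "s3Destination": {
                "bucketName": "<RESULTS-BUCKET-NAME>",
                "kmsKeyArn": "arn:aws:kms:<YOUR-REGION>:<YOUR-ACCOUNT-ID>:key/<YOUR-KEY-ID>"
            }
        }'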

Step 2: Build out the Athena table to query the results using SQL

Now that you have enabled the discovery results, Macie will begin publishing them into your discovery results bucket in the form of jsonl.gz files. Depending on the amount of data, there could be thousands of individual files, with each file containing multiple records. To identify the top five most commonly occurring sensitive data types in your organization, you would need to query all of these files together.

In this step, you will configure Athena so that it can query the results using SQL syntax. Before you can run an Athena query, you must specify a query result bucket location in Amazon S3. This is different from the Macie discovery results bucket that you created in the previous step.

If you haven’t set up Athena previously, we recommend that you create a separate S3 bucket, and specify a query result location using the Athena console. After you’ve set up the query result location, you can configure Athena.

To create a new Athena database and table for the Macie results

  1. Open the Athena console, and in the query editor, enter the following data definition language (DDL) statement. In the context of SQL, a DDL statement is a syntax for creating and modifying database objects, such as tables. For this example, we named our database macie_results.
    CREATE DATABASE macie_results;
    

    After running this step, you’ll see a new database in the Database dropdown. Make sure that the new macie_results database is selected for the next queries.

    Figure 4: Create database in the Athena console

  2. Create a table in the database by using the following DDL statement. Make sure to replace <RESULTS-BUCKET-NAME> with the name of the discovery results bucket that you created previously.
    CREATE EXTERNAL TABLE maciedetail_all_jobs(
    	accountid string,
    	category string,
    	classificationdetails struct<jobArn:string,result:struct<status:struct<code:string,reason:string>,sizeClassified:string,mimeType:string,sensitiveData:array<struct<category:string,totalCount:string,detections:array<struct<type:string,count:string,occurrences:struct<lineRanges:array<struct<start:string,`end`:string,`startColumn`:string>>,pages:array<struct<pageNumber:string>>,records:array<struct<recordIndex:string,jsonPath:string>>,cells:array<struct<row:string,`column`:string,`columnName`:string,cellReference:string>>>>>>>,customDataIdentifiers:struct<totalCount:string,detections:array<struct<arn:string,name:string,count:string,occurrences:struct<lineRanges:array<struct<start:string,`end`:string,`startColumn`:string>>,pages:array<string>,records:array<string>,cells:array<string>>>>>>,detailedResultsLocation:string,jobId:string>,
    	createdat string,
    	description string,
    	id string,
    	partition string,
    	region string,
    	resourcesaffected struct<s3Bucket:struct<arn:string,name:string,createdAt:string,owner:struct<displayName:string,id:string>,tags:array<string>,defaultServerSideEncryption:struct<encryptionType:string,kmsMasterKeyId:string>,publicAccess:struct<permissionConfiguration:struct<bucketLevelPermissions:struct<accessControlList:struct<allowsPublicReadAccess:boolean,allowsPublicWriteAccess:boolean>,bucketPolicy:struct<allowsPublicReadAccess:boolean,allowsPublicWriteAccess:boolean>,blockPublicAccess:struct<ignorePublicAcls:boolean,restrictPublicBuckets:boolean,blockPublicAcls:boolean,blockPublicPolicy:boolean>>,accountLevelPermissions:struct<blockPublicAccess:struct<ignorePublicAcls:boolean,restrictPublicBuckets:boolean,blockPublicAcls:boolean,blockPublicPolicy:boolean>>>,effectivePermission:string>>,s3Object:struct<bucketArn:string,key:string,path:string,extension:string,lastModified:string,eTag:string,serverSideEncryption:struct<encryptionType:string,kmsMasterKeyId:string>,size:string,storageClass:string,tags:array<string>,embeddedFileDetails:struct<filePath:string,fileExtension:string,fileSize:string,fileLastModified:string>,publicAccess:boolean>>,
    	schemaversion string,
    	severity struct<description:string,score:int>,
    	title string,
    	type string,
    	updatedat string)
    ROW FORMAT SERDE
    	'org.openx.data.jsonserde.JsonSerDe'
    WITH SERDEPROPERTIES (
    	'paths'='accountId,category,classificationDetails,createdAt,description,id,partition,region,resourcesAffected,schemaVersion,severity,title,type,updatedAt')
    STORED AS INPUTFORMAT
    	'org.apache.hadoop.mapred.TextInputFormat'
    OUTPUTFORMAT
    	'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION
    	's3://<RESULTS-BUCKET-NAME>/AWSLogs/'
    

    After you complete this step, you will see a new table named maciedetail_all_jobs in the Tables section of the query editor.

  3. Query the results to start gaining insights. For example, to identify the top five most common sensitive data types, run the following query:
    select sensitive_data.category,
    	detections_data.type,
    	sum(cast(detections_data.count as INT)) total_detections
    from maciedetail_all_jobs,
    	unnest(classificationdetails.result.sensitiveData) as t(sensitive_data),
    	unnest(sensitive_data.detections) as t(detections_data)
    where classificationdetails.result.sensitiveData is not null
    and resourcesaffected.s3object.embeddedfiledetails is null
    group by sensitive_data.category, detections_data.type
    order by total_detections desc
    LIMIT 5
    

    Running this query on the sample dataset gives the following output.

    Figure 5: Results of a query showing the five most common sensitive data types in the dataset

  4. (Optional) The previous query ran on all of the results available for Macie. You can further query which accounts have the greatest amount of sensitive data detected.
    select accountid,
    	sum(cast(detections_data.count as INT)) total_detections
    from maciedetail_all_jobs,
    	unnest(classificationdetails.result.sensitiveData) as t(sensitive_data),
    	unnest(sensitive_data.detections) as t(detections_data)
    where classificationdetails.result.sensitiveData is not null
    and resourcesaffected.s3object.embeddedfiledetails is null
    group by accountid
    order by total_detections desc
    

    To test this query, we distributed the synthetic dataset across three member accounts in our organization, ran the query, and received the following output. If you enable Macie in just a single account, then you will only receive results for that one account.

    Figure 6: Query results for total number of sensitive data detections across all accounts in an organization

For a list of more example queries, see the amazon-macie-results-analytics GitHub repository.
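
You can also run queries against the table outside the console. The following AWS CLI sketch submits a simple row count against the table created above and then fetches the output; it assumes the macie_results database and the default primary workgroup used in this post.

    # Submit the query and capture its execution ID
    QUERY_ID=$(aws athena start-query-execution \
        --query-string "SELECT count(*) FROM maciedetail_all_jobs" \
        --query-execution-context Database=macie_results \
        --work-group primary \
        --query 'QueryExecutionId' --output text)
    # Wait for the query to finish, then retrieve the results
    aws athena get-query-results --query-execution-id "$QUERY_ID"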

Step 3: Visualize the results with QuickSight

In the previous step, you used Athena to query your Macie discovery results. Although the queries were powerful, they only produced tabular data as their output. In this step, you will use QuickSight to visualize the results of your Macie jobs.

Before creating the visualizations, you first need to grant QuickSight the right permissions to access Athena, the results bucket, and the KMS key that you used to encrypt the results.

To allow QuickSight access to the KMS key

  1. Open the AWS Identity and Access Management (IAM) console, and then do the following:
    1. In the navigation pane, choose Roles.
    2. In the search pane for roles, search for aws-quicksight-s3-consumers-role-v0. If this role does not exist, search for aws-quicksight-service-role-v0.
    3. Select the role and copy the role ARN. You will need this role ARN to modify the KMS key policy to grant permissions for this role.
  2. Open the AWS KMS console and then do the following:
    1. Select Customer managed keys.
    2. Choose the KMS key that you created.
    3. Paste the following statement in the key policy. When you add the statement, do not delete any existing statements, and make sure that the syntax is valid. Replace <QUICKSIGHT_SERVICE_ROLE_ARN> and <KMS_KEY_ARN> with your own information. Policies are in JSON format.
	{ "Sid": "Allow Quicksight Service Role to use the key",
		"Effect": "Allow",
		"Principal": {
			"AWS": <QUICKSIGHT_SERVICE_ROLE_ARN>
		},
		"Action": "kms:Decrypt",
		"Resource": <KMS_KEY_ARN>
	}

To allow QuickSight access to Athena and the discovery results S3 bucket

  1. In QuickSight, in the upper right, choose your user icon to open the profile menu, and choose US East (N. Virginia). You can only modify permissions in this Region.
  2. In the upper right, open the profile menu again, and select Manage QuickSight.
  3. Select Security & permissions.
  4. Under QuickSight access to AWS services, choose Manage.
  5. Make sure that the S3 checkbox is selected, click on Select S3 buckets, and then do the following:
    1. Choose the discovery results bucket.
    2. You do not need to check the box under Write permissions for Athena workgroup. The write permissions are not required for this post.
    3. Select Finish.
  6. Make sure that the Amazon Athena checkbox is selected.
  7. Review the selections and be careful that you don’t inadvertently disable AWS services and resources that other users might be using.
  8. Select Save.
  9. In QuickSight, in the upper right, open the profile menu, and choose the Region where your results bucket is located.

Now that you’ve granted QuickSight the right permissions, you can begin creating visualizations.

To create a new dataset referencing the Athena table

  1. On the QuickSight start page, choose Datasets.
  2. On the Datasets page, choose New dataset.
  3. From the list of data sources, select Athena.
  4. Enter a meaningful name for the data source (for example, macie_datasource) and choose Create data source.
  5. Select the database that you created in Athena (for example, macie_results).
  6. Select the table that you created in Athena (for example, maciedetail_all_jobs), and choose Select.
  7. You can either import the data into SPICE or query the data directly. We recommend that you use SPICE for improved performance, but the visualizations will still work if you query the data directly.
  8. To create an analysis using the data as-is, choose Visualize.

You can then visualize the Macie results in the QuickSight console. The following example shows a delegated Macie administrator account that is running a visualization, with account IDs on the y axis and the count of affected resources on the x axis.

Figure 7: Visualize query results to identify total number of sensitive data detections across accounts in an organization

You can also visualize the aggregated data in QuickSight. For example, you can view the number of findings for each sensitive data category in each S3 bucket. The Athena table doesn’t provide aggregated data necessary for visualization. Instead, you need to query the table and then visualize the output of the query.

To query the table and visualize the output in QuickSight

  1. On the Amazon QuickSight start page, choose Datasets.
  2. On the Datasets page, choose New dataset.
  3. Select the data source that you created in Athena (for example, macie_datasource) and then choose Create Dataset.
  4. Select the database that you created in Athena (for example, macie_results).
  5. Choose Use Custom SQL, enter the following query, and choose Confirm Query.
    	select resourcesaffected.s3bucket.name as bucket_name,
    		sensitive_data.category,
    		detections_data.type,
    		sum(cast(detections_data.count as INT)) total_detections
    	from macie_results.maciedetail_all_jobs,
    		unnest(classificationdetails.result.sensitiveData) as t(sensitive_data),unnest(sensitive_data.detections) as t(detections_data)
    where classificationdetails.result.sensitiveData is not null
    and resourcesaffected.s3object.embeddedfiledetails is null
    group by resourcesaffected.s3bucket.name, sensitive_data.category, detections_data.type
    order by total_detections desc
    	

  6. You can either import the data into SPICE or query the data directly.
  7. To create an analysis using the data as-is, choose Visualize.

Now you can visualize the output of the query that aggregates data across your S3 buckets. For example, we used the name of the S3 bucket to group the results, and then we created a donut chart of the output, as shown in Figure 8.

Figure 8: Visualize query results for total number of sensitive data detections across each S3 bucket in an organization

From the visualizations, we can identify which buckets or accounts in our organizations contain the most sensitive data, for further action. Visualizations can also act as a dashboard to track remediation.

If you encounter permissions issues, see Insufficient permissions when using Athena with Amazon QuickSight and Troubleshooting key access for troubleshooting steps.

You can replicate the preceding steps by using the sample queries from the amazon-macie-results-analytics GitHub repo to view data that is aggregated across S3 buckets, AWS accounts, or individual Macie jobs. Using these queries with the results of your Macie results will help you get started with tracking the security posture of your data in Amazon S3.

Conclusion

In this post, you learned how to enable sensitive data discovery results for Macie, query those results with Athena, and visualize the results in QuickSight.

Because Macie sensitive data discovery results provide more granular data than the findings, you can pursue a more comprehensive incident response when sensitive data is discovered. The sample queries in this post provide answers to some generic questions that you might have. After you become familiar with the structure, you can run other interesting queries on the data.

We hope that you can use this solution to write your own queries to gain further insights into sensitive data discovered in S3 buckets, according to the business needs and regulatory requirements of your organization. You can consider using this solution to better understand and identify data security risks that need immediate attention. For example, you can use this solution to answer questions such as the following:

  • Is financial information present in an AWS account where it shouldn’t be?
  • Are S3 buckets that contain PII properly hardened with access controls and encryption?

You can also use this solution to understand gaps in your data security initiatives by tracking files that Macie couldn’t analyze due to encryption or permission issues. To further expand your knowledge of Macie capabilities and features, see the following resources:

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on Amazon Macie re:Post.

Want more AWS Security news? Follow us on Twitter.

Author

Keith Rozario

Keith is a Sr. Solution Architect at Amazon Web Services based in Singapore, where he helps customers develop solutions for their most complex business problems. He loves road cycling, reading comics from DC, and enjoying the sweet sound of music from AC/DC.

Author

Scott Ward

Scott is a Principal Solutions Architect with AWS External Security Services (ESS) and has been with Amazon for over 20 years. Scott provides technical guidance to the ESS services, such as GuardDuty, Security Hub, Macie, Inspector and Detective, and helps customers make their applications secure. Scott has a deep background in supporting, enhancing, and building global financial solutions to meet the needs of large companies, including many years of supporting the global financial systems for Amazon.com.

Author

Koulick Ghosh

Koulick is a Senior Product Manager in AWS Security based in Seattle, WA. He loves speaking with customers on how AWS Security services can help make them more secure. In his free-time, he enjoys playing the guitar, reading, and exploring the Pacific Northwest.

How to use Amazon Macie to preview sensitive data in S3 buckets

Post Syndicated from Koulick Ghosh original https://aws.amazon.com/blogs/security/how-to-use-amazon-macie-to-preview-sensitive-data-in-s3-buckets/

Security teams use Amazon Macie to discover and protect sensitive data, such as names, payment card data, and AWS credentials, in Amazon Simple Storage Service (Amazon S3). When Macie discovers sensitive data, these teams will want to see examples of the actual sensitive data found. Reviewing a sampling of the discovered data helps them quickly confirm that the object is truly sensitive according to their data protection and privacy policies.

In this post, we walk you through how your data security teams can use a new capability in Amazon Macie to retrieve up to 10 examples of sensitive data found in your S3 objects, so that you can confirm the nature of the data at a glance. Additionally, we will discuss how you can control who is able to use this capability, so that only authorized personnel have permissions to view these examples.

The challenge customers face

After a Macie sensitive data discovery job is run, security teams start their work. The security team will review the Macie findings to investigate the discovered sensitive data and decide what actions to take to protect such data. The findings provide details that include the severity of the finding, information on the affected S3 object, and a summary of the type, location, and amount of sensitive data found. However, Macie findings only contain pointers to data that Macie found in the object. In order to complete their investigation, customers in the past had to do additional work to extract the contents of a sensitive object, such as navigating to a different AWS account where the object is located, downloading and manually searching for keywords in a file editor, or writing and refining SQL queries by using Amazon S3 Select. The investigations are further slowed down when the object type is one that is not easily readable without additional tooling, such as big-data file types like Avro and Parquet. By using the Macie capability to retrieve sensitive data samples, you are able to review the discovered data and make decisions concerning the finding remediation.

Prerequisites

To implement the ability to retrieve and reveal samples of sensitive data, you’ll need the following prerequisites:

  • Enable Amazon Macie in your AWS account. For instructions, see Getting started with Amazon Macie.
  • Set your account as the delegated Macie administrator account and enable Macie in at least one member account by using AWS Organizations. In this post, we will refer to the delegated administrator account as Account A and the member account as Account B.
  • Configure Macie detailed classification results in Account A.

    Note: The detailed classification results contain a record for each Amazon S3 object that you configure the job to analyze, and include the location of up to 1,000 occurrences of each type of sensitive data that Macie found in an object. Macie uses the location information in the detailed classification results to retrieve the examples of sensitive data. The detailed classification results are stored in an S3 bucket of your choice. In this post, we will refer to this bucket as DOC-EXAMPLE-BUCKET1.

  • Create an S3 bucket that contains sensitive data in Account B. In this post, we will refer to this bucket as DOC-EXAMPLE-BUCKET2.

    Note: You should enable server-side encryption on this bucket by using customer managed AWS Key Management Service (AWS KMS) keys (a type of encryption known as SSE-KMS).

  • (Optional) Add sensitive data to DOC-EXAMPLE-BUCKET2. This post uses a sample dataset that contains fake sensitive data. You are able to download this sample dataset, unarchive the .zip folder, and follow these steps to upload the objects to S3. This is a synthetic dataset generated by AWS that we will use for the examples in this post. All data in this blog post has been artificially created by AWS for demonstration purposes and has not been collected from any individual person. Similarly, such data does not relate back to any individual person, nor is it intended to.
  • Create and run a sensitive data discovery job from Account A to analyze the contents of DOC-EXAMPLE-BUCKET2.
  • (Optional) Set up the AWS Command Line Interface (AWS CLI).

Configure Macie to retrieve and reveal examples of sensitive data

In this section, we’ll describe how to configure Macie so that you are able to retrieve and view examples of sensitive data from Macie findings.

To configure Macie (console)

  • In the AWS Management Console, in the Macie delegated administrator account (Account A), follow these steps from the Amazon Macie User Guide.

To configure Macie (AWS CLI)

  1. Confirm that you have Macie enabled.
    	$ aws macie2 get-macie-session --query 'status'
    	// The expected response is "ENABLED"

  2. Confirm that you have configured the detailed classification results bucket.
    $ aws macie2 get-classification-export-configuration

    // The expected response is:
    {
        "configuration": {
            "s3Destination": {
                "bucketName": "DOC-EXAMPLE-BUCKET1",
                "kmsKeyArn": "arn:aws:kms:<YOUR-REGION>:<YOUR-ACCOUNT-ID>:key/<KEY-USED-TO-ENCRYPT-DOC-EXAMPLE-BUCKET1>"
            }
        }
    }

  3. Create a new KMS key to encrypt the retrieved examples of sensitive data. Make sure that the key is created in the same AWS Region where you are operating Macie.
    $ aws kms create-key
    {
        "KeyMetadata": {
            "Origin": "AWS_KMS",
            "KeyId": "<YOUR-KEY-ID>",
            "Description": "",
            "KeyManager": "CUSTOMER",
            "Enabled": true,
            "KeySpec": "SYMMETRIC_DEFAULT",
            "CustomerMasterKeySpec": "SYMMETRIC_DEFAULT",
            "KeyUsage": "ENCRYPT_DECRYPT",
            "KeyState": "Enabled",
            "CreationDate": 1502910355.475,
            "Arn": "arn:aws:kms:<YOUR-AWS-REGION>:<AWS-ACCOUNT-A>:key/<YOUR-KEY-ID>",
            "AWSAccountId": "<AWS-ACCOUNT-A>",
            "MultiRegion": false,
            "EncryptionAlgorithms": [
                "SYMMETRIC_DEFAULT"
            ]
        }
    }

  4. Give this key the alias REVEAL-KMS-KEY.
    $ aws kms create-alias \
        --alias-name alias/REVEAL-KMS-KEY \
        --target-key-id <YOUR-KEY-ID>

  5. Enable the feature in Macie and configure it to encrypt the data by using REVEAL-KMS-KEY. You do not specify a key policy for your new KMS key in this step. The key policy will be discussed later in the post.
    $ aws macie2 update-reveal-configuration --configuration '{"status":"ENABLED","kmsKeyId":"alias/REVEAL-KMS-KEY"}'

    // The expected response is:
    {
        "configuration": {
            "kmsKeyId": "arn:aws:kms:<YOUR-REGION>:<YOUR-ACCOUNT-ID>:key/<REVEAL-KMS-KEY>",
            "status": "ENABLED"
        }
    }

Control access to read sensitive data and protect data displayed in Macie

This new Macie capability uses the AWS Identity and Access Management (IAM) policies, S3 bucket policies, and AWS KMS key policies that you have defined in your accounts. This means that in order to see examples through the Macie console or by invoking the Macie API, the IAM principal needs to have read access to the S3 object and to decrypt the object if it is server-side encrypted. It’s important to note that Macie uses the IAM permissions of the AWS principal to locate, retrieve, and reveal the samples and does not use the Macie service-linked role to perform these tasks.

Using the setup discussed in the previous section, you will walk through how to control access to the ability to retrieve and reveal sensitive data examples. To recap, you created and ran a discovery job from the Amazon Macie delegated administrator account (Account A) to analyze the contents of DOC-EXAMPLE-BUCKET2 in a member account (Account B). You configured Macie to retrieve examples and to encrypt the examples of sensitive data with the REVEAL-KMS-KEY.

The next step is to create and use an IAM role that will be assumed by other users in Account A to retrieve and reveal examples of sensitive data discovered by Macie. In this post, we’ll refer to this role as MACIE-REVEAL-ROLE.

To apply the principle of least privilege and allow only authorized personnel to view the sensitive data samples, grant the following permissions so that Macie users who assume MACIE-REVEAL-ROLE will be able to successfully retrieve and reveal examples of sensitive data:

  • Step 1 – Update the IAM policy for MACIE-REVEAL-ROLE.
  • Step 2 – Update the KMS key policy for REVEAL-KMS-KEY.
  • Step 3 – Update the S3 bucket policy for DOC-EXAMPLE-BUCKET2 and the KMS key policy used for its server-side encryption in Account B.

After you grant these permissions, MACIE-REVEAL-ROLE can successfully retrieve and reveal examples of sensitive data in DOC-EXAMPLE-BUCKET2, as shown in Figure 1.

Figure 1: Macie runs the discovery job from the delegated administrator account in a member account, and MACIE-REVEAL-ROLE retrieves examples of sensitive data

Step 1: Update the IAM policy

Provide the following required permissions to MACIE-REVEAL-ROLE:

  1. Allow GetObject from DOC-EXAMPLE-BUCKET2 in Account B.
  2. Allow decryption of DOC-EXAMPLE-BUCKET2 if it is server-side encrypted with a customer managed key (SSE-KMS).
  3. Allow GetObject from DOC-EXAMPLE-BUCKET1.
  4. Allow decryption of the Macie discovery results.
  5. Allow the necessary Macie actions to retrieve and reveal sensitive data examples.

To set up the required permissions

  • Use the following commands to provide the permissions. Make sure to replace the placeholders with your own data.
    {
        "Version": "2012-10-17",
        "Statement": [
    	{
                "Sid": "AllowGetFromCompanyDataBucket",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": "arn:aws:s3:::<DOC-EXAMPLE-BUCKET2>/*"
            },
            {
                "Sid": "AllowKMSDecryptForCompanyDataBucket",
                "Effect": "Allow",
                "Action": [
                    "kms:Decrypt"
                ],
                "Resource": "arn:aws:kms:<AWS-Region>:<AWS-Account-B>:key/<KEY-USED-TO-ENCRYPT-DOC-EXAMPLE-BUCKET2>"
            },
            {
                "Sid": "AllowGetObjectfromMacieResultsBucket",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": "arn:aws:s3:::<DOC-EXAMPLE-BUCKET1>/*"
            },
    	{
                "Sid": "AllowKMSDecryptForMacieRoleDiscoveryBucket",
                "Effect": "Allow",
                "Action": [
                    "kms:Decrypt"
                ],
                "Resource": "arn:aws:kms:<AWS-REGION>:<AWS-ACCOUNT-A>:key/<KEY-USED-TO-ENCRYPT-DOC-EXAMPLE-BUCKET1>"
            },
    	{
                "Sid": "AllowActionsRetrieveAndReveal",
                "Effect": "Allow",
                "Action": [
                    "macie2:GetMacieSession",
                    "macie2:GetFindings",
                    "macie2:GetSensitiveDataOccurrencesAvailability",
                    "macie2:GetSensitiveDataOccurrences",
                    "macie2:ListFindingsFilters",
                    "macie2:GetBucketStatistics",
                    "macie2:ListMembers",
                    "macie2:ListFindings",
                    "macie2:GetFindingStatistics",
                    "macie2:GetAdministratorAccount",
                    "macie2:GetClassificationExportConfiguration",
                    "macie2:GetRevealConfiguration",
                    "macie2:DescribeBuckets"
                ],
                "Resource": "*” 
            }
        ]
    }

Step 2: Update the KMS key policy

Next, update the KMS key policy that is used to encrypt sensitive data samples that you retrieve and reveal in your delegated administrator account.

To update the key policy

  • Allow the MACIE-REVEAL-ROLE access to the KMS key that you created for protecting the retrieved sensitive data, using the following commands. Make sure to replace the placeholders with your own data.
    	{
                "Sid": "AllowMacieRoleDecrypt",
                "Effect": "Allow",
                "Principal": {
                    "AWS": "arn:aws:iam:<AWS-REGION>:<AWS-ACCOUNT-A>:role/<MACIE-REVEAL-ROLE>"
                },
                "Action": [
                    "kms:Decrypt",
                    "kms:DescribeKey",
                    "kms:GenerateDataKey"
                ],
                "Resource": "arn:aws:kms:<AWS-REGION>:<AWS-ACCOUNT-A>:key/<REVEAL-KMS-KEY>"
            }

Step 3: Update the bucket policy of the S3 bucket

Finally, update the bucket policy of the S3 bucket in member accounts, and update the key policy of the key used for SSE-KMS.

To update the S3 bucket policy and KMS key policy

  1. Use the following commands to update key policy for the KMS key used for server-side encryption of the DOC-EXAMPLE-BUCKET2 bucket in Account B.
    	{
                "Sid": "AllowMacieRoleDecrypt”
                "Effect": "Allow",
                "Principal": {
                    "AWS": "arn:aws:iam:<AWS-REGION>:<AWS-ACCOUNT-A>:role/<MACIE-REVEAL-ROLE>"
                },
                "Action": "kms:Decrypt",
                "Resource": "arn:aws:kms:<AWS-REGION>:<AWS-ACCOUNT-B>:key/<KEY-USED-TO-ENCRYPT-DOC-EXAMPLE-BUCKET2>"
      }

  2. Use the following commands to update the bucket policy of DOC-EXAMPLE-BUCKET2 to allow cross-account access for MACIE-REVEAL-ROLE to get objects from this bucket.
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowMacieRoleGet",
                "Effect": "Allow",
                "Principal": {
                    "AWS": "arn:aws:iam::<AWS-ACCOUNT-A>:role/<MACIE-REVEAL-ROLE>"
                },
                "Action": "s3:GetObject",
                "Resource": "arn:aws:s3:::<DOC-EXAMPLE-BUCKET2>/*"
            }
        ]
    }

Retrieve and reveal sensitive data samples

Now that you’ve put in place the necessary permissions, users who assume MACIE-REVEAL-ROLE will be able to conveniently retrieve and reveal sensitive data samples.

To retrieve and reveal sensitive data samples

  1. In the Macie console, in the left navigation pane, choose Findings, and select a specific finding. Under Sensitive Data, choose Review.
    Figure 2: The finding details panel

  2. On the Reveal sensitive data page, choose Reveal samples.
    Figure 3: The Reveal sensitive data page

  3. Under Sensitive data, you will be able to view up to 10 examples of the sensitive data found by Amazon Macie.
    Figure 4: Examples of sensitive data revealed in the Amazon Macie console

You can find additional information on setting up the Macie Reveal function in the Amazon Macie User Guide.
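
If you prefer to work outside the console, the same retrieval is exposed through the Macie API. The following AWS CLI sketch, run with credentials for MACIE-REVEAL-ROLE, first checks whether samples are available for a finding and then retrieves them; the finding ID is a placeholder.

    # Check whether sensitive data samples can be retrieved for the finding
    aws macie2 get-sensitive-data-occurrences-availability --finding-id <FINDING-ID>
    # If the response indicates AVAILABLE, retrieve up to 10 examples
    aws macie2 get-sensitive-data-occurrences --finding-id <FINDING-ID>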

Conclusion

In this post, we showed how you can retrieve and review examples of sensitive data that were found in Amazon S3 using Amazon Macie. This capability makes it easier for your data protection teams to review the sensitive contents found in S3 buckets across the accounts in your AWS environment. With this information, security teams can quickly take remediation actions, such as updating the configuration of sensitive buckets, quarantining files with sensitive information, or sending a notification to the owner of the account where the sensitive data resides. In certain cases, you can add the examples to an allow list in Macie if you don't want Macie to report those as sensitive data (for example, corporate addresses or sample data that is used for testing).

The following are links to additional resources that you will be able to use to expand your knowledge of Amazon Macie capabilities and features:

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on Amazon Macie re:Post.

Want more AWS Security news? Follow us on Twitter.

Koulick Ghosh

Koulick is a Senior Product Manager in AWS Security based in Seattle, WA. He loves speaking with customers on how AWS Security services can help make them more secure. In his free-time, he enjoys playing the guitar, reading, and exploring the Pacific Northwest.

Author

Michael Ingoldby

Michael is a Senior Security Solutions Architect at AWS based in Frisco, Texas. He provides guidance and helps customers to implement AWS native security services. Michael has been working in the security domain since 2006. When he is not working, he enjoys spending time outdoors.

Robert Wu

Robert is the Software Development Engineer for AWS Macie, working on enabling customers with more sensitive data discovery capabilities. In his free time, he enjoys exploring and contributing to various open-source projects to widen his domain knowledge.

Automated Data Discovery for Amazon Macie

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/automated-data-discovery-for-amazon-macie/

Today, we announce automated data discovery for Amazon Macie. This new capability allows you to gain visibility into where your sensitive data resides on Amazon Simple Storage Service (Amazon S3) at a fraction of the cost of running a full data inspection across all your S3 buckets.

At AWS, security is our first priority. The security of the infrastructure itself, but also the security of your data. We give you access to services to manage identities and access, to protect the network and your applications, to detect suspicious activities, to protect your data, and to report on and monitor your compliance status.

Amazon Macie is a data security service that discovers sensitive data using machine learning and pattern matching and enables visibility and automated protection against data security risks. You use Amazon Macie to protect your data in S3 by scanning for the presence of sensitive data, such as names, addresses, and credit card numbers, and continually monitoring for properly configured preventative controls, such as encryption and access policies. Amazon Macie generates alerts when it detects publicly accessible buckets, unencrypted buckets, or buckets shared with an AWS account outside of your organization. You can also configure Amazon Macie to run full sensitive data discovery scans on your S3 buckets to provide visibility into where sensitive data resides.

But customers operating at scale told us it is difficult to know where to start. When employees and applications add new buckets and generate petabytes of data on a daily basis, what should be scanned first?

Automated data discovery automates the continual discovery of sensitive data and potential data security risks across your entire set of buckets aggregated at AWS Organizations level.

When you enable automated discovery in the console, Macie starts to evaluate the level of sensitivity of each of your buckets and highlights any data security risks. Automated data discovery introduces intelligent and fully managed data sampling to provide an optimized sample rate that meaningfully reduces the amount of data that needs to be analyzed. This reduces the cost of discovering S3 buckets containing sensitive data compared to the cost of full data inspection.

You can tune automated data discovery to only identify the types of sensitive data that are relevant for your use case by choosing from over 100 managed sensitive data types, such as personally identifiable information (PII) and financial records with specific formats for multiple countries. For example, you can enable detection of Spanish or Swedish driving license numbers and choose to ignore US Social Security numbers, depending on your use cases. When the specific type of data you manage is not on our list, you can create custom data types that may be unique to your business, such as employee or patient identification numbers.

Let’s See It in Action
Automated data discovery is on by default for all new Amazon Macie customers, and existing Macie customers can enable it with one click in the AWS Management Console of the Amazon Macie administrator account. There is a 30-day free trial, and you can always opt out at the administrator level.
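
If I want to script this instead of using the console, the Macie CLI exposes the same setting in the administrator account. This is a sketch, assuming your installed macie2 CLI version includes the automated discovery commands.

    # Turn on automated sensitive data discovery for the organization
    aws macie2 update-automated-discovery-configuration --status ENABLED
    # Confirm the current status
    aws macie2 get-automated-discovery-configuration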

I can enable or disable the capability from the Automated discovery entry–under Settings–on the left side navigation menu. The Status section reveals the current status.

Automated data discovery for Amazon Macie - Enable

On the same page, I can configure the list of managed data identifiers. I can turn on or off individual types of data among more than one hundred managed data identifier types. I can also configure new ones. I select Edit on the Managed data identifiers section to include or exclude additional data identifiers.

Automated data discovery for Amazon Macie - include or exclude data identifiers

If I have some buckets with lots of objects and others with a few, Macie won’t spend all its time inspecting one really large bucket at the expense of other smaller ones. Macie also prioritizes buckets that it knows the least about. For example, if it looked at the majority of objects in a small bucket, that bucket will be deprioritized compared to larger buckets where it has seen proportionally fewer objects.

Automated data discovery can provide an interactive data map of sensitive data distribution in S3 buckets within days of the feature being enabled. This data map refreshes daily as it intelligently picks and scans S3 objects in buckets and spreads the scan effort across the entire S3 estate in a given month.

Here is the Summary section of the Amazon Macie page. It looks like my set of buckets is secured. I have no bucket with public access, and 31 of my buckets might contain sensitive data.

Automated data discovery for Amazon Macie - Summary section

When I select the S3 buckets section of the navigation menu on the left side, I can see a data map of my buckets. The redder a square is, the more sensitive data Macie has detected in that bucket. The squares in blue represent buckets with no sensitive data detected so far. From there, I can drill down at bucket level to investigate the details.

Automated data discovery for Amazon Macie - Heat map

Pricing and Availability
When you are new to Amazon Macie, automated data discovery is enabled by default. When you already use Amazon Macie in your organization, you can enable automatic data discovery with one click in the Management Console of the Amazon Macie administrator account.

There is a 30-day free trial period when you enable automatic data discovery on your AWS account. After the evaluation period, we charge based on the total quantity of S3 objects in your account as well as the bytes scanned for sensitive content. Charges are prorated per day. You can disable this capability at any time. The pricing page has all the details.

This new capability is now available in all 21 commercial AWS Regions where Macie is available.

Go and enable Amazon Macie automated data discovery today!

— seb

Building AWS Lambda governance and guardrails

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/building-aws-lambda-governance-and-guardrails/

When building serverless applications using AWS Lambda, there are a number of considerations regarding security, governance, and compliance. This post highlights how Lambda, as a serverless service, simplifies cloud security and compliance so you can concentrate on your business logic. It covers controls that you can implement for your Lambda workloads to ensure that your applications conform to your organizational requirements.

The Shared Responsibility Model

The AWS Shared Responsibility Model distinguishes between what AWS is responsible for and what customers are responsible for with cloud workloads. AWS is responsible for “Security of the Cloud” where AWS protects the infrastructure that runs all the services offered in the AWS Cloud. Customers are responsible for “Security in the Cloud”, managing and securing their workloads. When building traditional applications, you take on responsibility for many infrastructure services, including operating systems and network configuration.

Traditional application shared responsibility

One major benefit when building serverless applications is shifting more responsibility to AWS so you can concentrate on your business applications. AWS handles managing and patching the underlying servers, operating systems, and networking as part of running the services.

Serverless application shared responsibility

For Lambda, AWS manages the application platform where your code runs, which includes patching and updating the managed language runtimes. This reduces the attack surface while making cloud security simpler. You are responsible for the security of your code and AWS Identity and Access Management (IAM) to the Lambda service and within your function.

Lambda is SOC, HIPAA, PCI, and ISO compliant. For more information, see Compliance validation for AWS Lambda and the latest Lambda certification and compliance readiness services in scope.

Lambda isolation

Lambda functions run in separate isolated AWS accounts that are dedicated to the Lambda service. Lambda invokes your code in a secure and isolated runtime environment within the Lambda service account. A runtime environment is a collection of resources running in a dedicated hardware-virtualized micro virtual machine (MVM) on a Lambda worker node.

Lambda workers are bare metal EC2 Nitro instances, which are managed and patched by the Lambda service team. They have a maximum lease lifetime of 14 hours to keep the underlying infrastructure secure and fresh. MVMs are created by Firecracker, an open source virtual machine monitor (VMM) that uses Linux's Kernel-based Virtual Machine (KVM) to create and manage MVMs securely at scale.

MVMs maintain a strong separation between runtime environments at the virtual machine hardware level, which increases security. Runtime environments are never reused across functions, function versions, or AWS accounts.

Isolation model for AWS Lambda workers

Network security

Lambda functions always run inside secure Amazon Virtual Private Cloud (Amazon VPCs) owned by the Lambda service. This gives the Lambda function access to AWS services and the public internet. There is no direct network inbound access to Lambda workers, runtime environments, or Lambda functions. All inbound access to a Lambda function only comes via the Lambda Invoke API, which sends the event object to the function handler.

You can configure a Lambda function to connect to private subnets in a VPC in your account if necessary, which you can control with IAM condition keys. The Lambda function still runs inside the Lambda service VPC but sends all network traffic through your VPC. Function outbound traffic comes from your own network address space.

AWS Lambda service VPC with VPC-to-VPC NAT to customer VPC

To give your VPC-connected function access to the internet, route outbound traffic to a NAT gateway in a public subnet. Connecting a function to a public subnet doesn’t give it internet access or a public IP address, as the function is still running in the Lambda service VPC and then routing network traffic into your VPC.

All internal AWS traffic uses the AWS Global Backbone rather than traversing the internet. You do not need to connect your functions to a VPC to avoid connectivity to AWS services over the internet. VPC connected functions allow you to control and audit outbound network access.

You can use security groups to control outbound traffic for VPC-connected functions and network ACLs to block access to CIDR IP ranges or ports. VPC endpoints allow you to enable private communications with supported AWS services without internet access.

You can use VPC Flow Logs to audit traffic going to and from network interfaces in your VPC.
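
As a sketch, the following AWS CLI commands create a gateway VPC endpoint for Amazon S3 and enable VPC Flow Logs for a VPC that hosts VPC-connected functions; the Region, VPC, route table, log group, and role identifiers are placeholders.

    # Private access to Amazon S3 without traversing the internet
    aws ec2 create-vpc-endpoint \
        --vpc-id <VPC-ID> \
        --service-name com.amazonaws.<REGION>.s3 \
        --vpc-endpoint-type Gateway \
        --route-table-ids <ROUTE-TABLE-ID>
    # Audit traffic to and from network interfaces in the VPC
    aws ec2 create-flow-logs \
        --resource-type VPC \
        --resource-ids <VPC-ID> \
        --traffic-type ALL \
        --log-group-name <FLOW-LOG-GROUP> \
        --deliver-logs-permission-arn arn:aws:iam::<ACCOUNT-ID>:role/<FLOW-LOGS-ROLE>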

Runtime environment re-use

Each runtime environment processes a single request at a time. After Lambda finishes processing the request, the runtime environment is ready to process an additional request for the same function version. For more information on how Lambda manages runtime environments, see Understanding AWS Lambda scaling and throughput.

Data can persist in the local temporary filesystem path, in globally scoped variables, and in environment variables across subsequent invocations of the same function version. Ensure that you only handle sensitive information within individual invocations of the function by processing it in the function handler, or using local variables. Do not re-use files in the local temporary filesystem to process unencrypted sensitive data. Do not put sensitive or confidential information into Lambda environment variables, tags, or other freeform fields such as Name fields.

For more Lambda security information, see the Lambda security whitepaper.

Multiple accounts

AWS recommends using multiple accounts to isolate your resources because they provide natural boundaries for security, access, and billing. Use AWS Organizations to manage and govern individual member accounts centrally. You can use AWS Control Tower to automate many of the account build steps and apply managed guardrails to govern your environment. These include preventative guardrails to limit actions and detective guardrails to detect and alert on non-compliance resources for remediation.

Lambda access controls

Lambda permissions define what a Lambda function can do, and who or what can invoke the function. Consider the following areas when applying access controls to your Lambda functions to ensure least privilege:

Execution role

Lambda functions have permission to access other AWS resources using execution roles. This is an AWS principal that the Lambda service assumes which grants permissions using identity policy statements assigned to the role. The Lambda service uses this role to fetch and cache temporary security credentials, which are then available as environment variables during a function’s invocation. It may re-use them across different runtime environments that use the same execution role.

Ensure that each function has its own unique role with the minimum set of permissions.
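
For example, the following sketch creates a dedicated execution role for a single function and attaches only the AWS managed policy for writing logs; the role name is a placeholder, and you would add further least-privilege statements only as the function needs them.

    # Create an execution role that only the Lambda service can assume
    aws iam create-role \
        --role-name <FUNCTION-NAME>-role \
        --assume-role-policy-document '{
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Principal": { "Service": "lambda.amazonaws.com" },
                "Action": "sts:AssumeRole"
            }]
        }'
    # Grant only basic CloudWatch Logs permissions
    aws iam attach-role-policy \
        --role-name <FUNCTION-NAME>-role \
        --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole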

Identity/user policies

IAM identity policies are attached to IAM users, groups, or roles. These policies allow users or callers to perform operations on Lambda functions. You can restrict who can create functions, or control what functions particular users can manage.

Resource policies

Resource policies define what identities have fine-grained inbound access to managed services. For example, you can restrict which Lambda function versions can add events to a specific Amazon EventBridge event bus. You can use resource-based policies on Lambda resources to control what AWS IAM identities and event sources can invoke a specific version or alias of your function. You also use a resource-based policy to allow an AWS service to invoke your function on your behalf. To see which services support resource-based policies, see “AWS services that work with IAM”.
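
For example, the following sketch adds a resource-based policy statement that allows a single EventBridge rule to invoke one function; the function name and rule ARN are placeholders.

    aws lambda add-permission \
        --function-name <FUNCTION-NAME> \
        --statement-id AllowEventBridgeRuleInvoke \
        --action lambda:InvokeFunction \
        --principal events.amazonaws.com \
        --source-arn arn:aws:events:<REGION>:<ACCOUNT-ID>:rule/<RULE-NAME>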

Attribute-based access control (ABAC)

With attribute-based access control (ABAC), you can use tags to control access to your Lambda functions. With ABAC, you can scale an access control strategy by setting granular permissions with tags without requiring permissions updates for every new user or resource as your organization scales. You can also use tag policies with AWS Organizations to standardize tags across resources.

Permissions boundaries

Permissions boundaries are a way to delegate permission management safely. The boundary places a limit on the maximum permissions that a policy can grant. For example, you can use boundary permissions to limit the scope of the execution role to allow only read access to databases. A builder with permission to manage a function or with write access to the application’s code repository cannot escalate the permissions beyond the boundary to allow write access.
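
As a sketch (the boundary policy is assumed to already exist; the names and ARNs are placeholders), attaching a permissions boundary when creating the execution role caps whatever identity policies are later attached to it:

import json
import boto3

iam = boto3.client("iam")

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# Even if broader identity policies are attached later, the role's
# effective permissions cannot exceed the boundary policy.
iam.create_role(
    RoleName="reporting-fn-role",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
    PermissionsBoundary="arn:aws:iam::111122223333:policy/db-read-only-boundary",
)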

Service control policies

When using AWS Organizations, you can use Service control policies (SCPs) to manage permissions in your organization. These provide guardrails for what actions IAM users and roles within the organization root or OUs can do. For more information, see the AWS Organizations documentation, which includes example service control policies.
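
A hedged sketch of one such guardrail (the policy content and pipeline role name are purely illustrative, not a recommended baseline) that denies deleting Lambda functions unless the caller is a deployment pipeline role:

import json
import boto3

organizations = boto3.client("organizations")

scp = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Deny",
        "Action": ["lambda:DeleteFunction"],
        "Resource": "*",
        "Condition": {
            "ArnNotLike": {"aws:PrincipalArn": "arn:aws:iam::*:role/deploy-pipeline"}
        },
    }],
}

organizations.create_policy(
    Name="protect-lambda-functions",
    Description="Only the deployment pipeline may delete Lambda functions",
    Type="SERVICE_CONTROL_POLICY",
    Content=json.dumps(scp),
)
# Attach the returned policy to an OU or account with organizations.attach_policy.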

Code signing

As you are responsible for the code that runs in your Lambda functions, you can ensure that only trusted code runs by using code signing with the AWS Signer service. AWS Signer digitally signs your code packages and Lambda validates the code package before accepting the deployment, which can be part of your automated software deployment process.
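
A sketch of enabling this for a function (the signing profile version ARN and function name are placeholders) creates a code signing configuration that rejects unsigned packages and attaches it to the function:

import boto3

lambda_client = boto3.client("lambda")

config = lambda_client.create_code_signing_config(
    Description="Only accept packages signed by the release profile",
    AllowedPublishers={
        "SigningProfileVersionArns": [
            "arn:aws:signer:us-east-1:111122223333:/signing-profiles/release-profile/abcd1234"
        ]
    },
    # Enforce rejects deployments whose signature fails validation.
    CodeSigningPolicies={"UntrustedArtifactOnDeployment": "Enforce"},
)

lambda_client.put_function_code_signing_config(
    CodeSigningConfigArn=config["CodeSigningConfig"]["CodeSigningConfigArn"],
    FunctionName="process-orders",
)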

Auditing Lambda configuration, permissions and access

You should audit access and permissions regularly to ensure that your workloads are secure. Use the IAM console to view when an IAM role was last used.

IAM last used

IAM access advisor

Use IAM access advisor on the Access Advisor tab in the IAM console to review the last time an AWS service was accessed by a specific IAM user or role. You can use this information to remove unused IAM policies and access from your IAM roles.
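
The same information is available programmatically. A sketch like the following (the role ARN is a placeholder) generates and reads a last-accessed report for a role:

import time
import boto3

iam = boto3.client("iam")

job = iam.generate_service_last_accessed_details(
    Arn="arn:aws:iam::111122223333:role/process-orders-role"
)

# The report is generated asynchronously; poll until it completes.
while True:
    report = iam.get_service_last_accessed_details(JobId=job["JobId"])
    if report["JobStatus"] != "IN_PROGRESS":
        break
    time.sleep(2)

for service in report["ServicesLastAccessed"]:
    print(service["ServiceName"], service.get("LastAuthenticated", "never used"))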

IAM access advisor

AWS CloudTrail

AWS CloudTrail helps you monitor, log, and retain account activity to provide a complete event history of actions across your AWS infrastructure. You can monitor Lambda API actions to ensure that only appropriate actions are made against your Lambda functions. These include CreateFunction, DeleteFunction, CreateEventSourceMapping, AddPermission, UpdateEventSourceMapping,  UpdateFunctionConfiguration, and UpdateFunctionCode.
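
For example, a quick check for recent code changes (a sketch; UpdateFunctionCode is one of the actions listed above) can be run against the CloudTrail event history:

import boto3

cloudtrail = boto3.client("cloudtrail")

response = cloudtrail.lookup_events(
    LookupAttributes=[
        {"AttributeKey": "EventName", "AttributeValue": "UpdateFunctionCode"}
    ],
    MaxResults=50,
)

for event in response["Events"]:
    print(event["EventTime"], event.get("Username", "unknown"), event["EventName"])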

AWS CloudTrail

IAM Access Analyzer

You can validate policies using IAM Access Analyzer, which provides over 100 policy checks with security warnings for overly permissive policies. To learn more about policy checks provided by IAM Access Analyzer, see “IAM Access Analyzer policy validation”.

You can also generate IAM policies based on access activity from CloudTrail logs, which contain the permissions that the role used in your specified date range.
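
A sketch of validating a policy document before attaching it (the example policy is illustrative and intentionally overly broad, so it should produce at least one warning):

import json
import boto3

accessanalyzer = boto3.client("accessanalyzer")

# Example identity policy to validate (illustrative only).
policy_document = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "lambda:*", "Resource": "*"}],
})

result = accessanalyzer.validate_policy(
    policyDocument=policy_document,
    policyType="IDENTITY_POLICY",
)

for finding in result["findings"]:
    # findingType is ERROR, SECURITY_WARNING, SUGGESTION, or WARNING.
    print(finding["findingType"], finding["issueCode"], finding["findingDetails"])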

IAM Access Analyzer

AWS Config

AWS Config provides you with a record of the configuration history of your AWS resources. AWS Config monitors resource configurations and includes rules that alert when resources fall into a non-compliant state.

For Lambda, you can track and alert on changes to your function configuration, along with the IAM execution role. This allows you to gather Lambda function lifecycle data for potential audit and compliance requirements. For more information, see the Lambda Operators Guide.

AWS Config includes Lambda managed config rules such as lambda-concurrency-check, lambda-dlq-check, lambda-function-public-access-prohibited, lambda-function-settings-check, and lambda-inside-vpc. You can also write your own rules.
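
Enabling one of these managed rules can be scripted. A sketch (assuming the managed rule identifier LAMBDA_FUNCTION_PUBLIC_ACCESS_PROHIBITED and that an AWS Config recorder is already running in the account):

import boto3

config = boto3.client("config")

config.put_config_rule(
    ConfigRule={
        "ConfigRuleName": "lambda-function-public-access-prohibited",
        "Description": "Checks that Lambda function policies do not allow public access",
        "Source": {
            "Owner": "AWS",
            "SourceIdentifier": "LAMBDA_FUNCTION_PUBLIC_ACCESS_PROHIBITED",
        },
    }
)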

There are a number of other AWS services to help with security compliance.

  1. AWS Audit Manager: Collects evidence to help you audit your use of cloud services.
  2. Amazon GuardDuty: Detects unexpected and potentially unauthorized activity in your AWS environment.
  3. Amazon Macie: Evaluates your content to identify business-critical or potentially confidential data.
  4. AWS Trusted Advisor: Identifies opportunities to improve stability, save money, or help close security gaps.
  5. AWS Security Hub: Provides security checks and recommendations across your organization.

Conclusion

Lambda makes cloud security simpler by taking on more responsibility using the AWS Shared Responsibility Model. Lambda implements strict workload security at scale to isolate your code and prevent network intrusion to your functions. This post provides guidance on assessing and implementing best practices and tools for Lambda to improve your security, governance, and compliance controls. These include permissions, access controls, multiple accounts, and code security. Learn how to audit your function permissions, configuration, and access to ensure that your applications conform to your organizational requirements.

For more serverless learning resources, visit Serverless Land.

AWS Week in Review – August 1, 2022

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-week-in-review-august-1-2022/

AWS re:Inforce returned to Boston last week, kicking off with a keynote from Amazon Chief Security Officer Steve Schmidt and AWS Chief Information Security Officer C.J. Moses:

Be sure to take some time to watch this video and the other leadership sessions, and to use what you learn to take some proactive steps to improve your security posture.

Last Week’s Launches
Here are some launches that caught my eye last week:

AWS Wickr uses 256-bit end-to-end encryption to deliver secure messaging, voice, and video calling, including file sharing and screen sharing, across desktop and mobile devices. Each call, message, and file is encrypted with a new random key and can be decrypted only by the intended recipient. AWS Wickr supports logging to a secure, customer-controlled data store for compliance and auditing, and offers full administrative control over data: permissions, ephemeral messaging options, and security groups. You can now sign up for the preview.

AWS Marketplace Vendor Insights helps AWS Marketplace sellers to make security and compliance data available through AWS Marketplace in the form of a unified, web-based dashboard. Designed to support governance, risk, and compliance teams, the dashboard also provides evidence that is backed by AWS Config and AWS Audit Manager assessments, external audit reports, and self-assessments from software vendors. To learn more, read the What’s New post.

GuardDuty Malware Protection protects Amazon Elastic Block Store (EBS) volumes from malware. As Danilo describes in his blog post, a malware scan is initiated when Amazon GuardDuty detects that a workload running on an EC2 instance or in a container appears to be doing something suspicious. The new malware protection feature creates snapshots of the attached EBS volumes, restores them within a service account, and performs an in-depth scan for malware. The scanner supports many types of file systems and file formats and generates actionable security findings when malware is detected.

Amazon Neptune Global Database lets you build graph applications that run across multiple AWS Regions using a single graph database. You can deploy a primary Neptune cluster in one region and replicate its data to up to five secondary read-only database clusters, with up to 16 read replicas each. Clusters can recover in minutes in the event of an (unlikely) regional outage, with a Recovery Point Objective (RPO) of 1 second and a Recovery Time Objective (RTO) of 1 minute. To learn a lot more and see this new feature in action, read Introducing Amazon Neptune Global Database.

Amazon Detective now Supports Kubernetes Workloads, with the ability to scale to thousands of container deployments and millions of configuration changes per second. It ingests EKS audit logs to capture API activity from users, applications, and the EKS control plane, and correlates user activity with information gleaned from Amazon VPC flow logs. As Channy notes in his blog post, you can enable Amazon Detective and take advantage of a free 30 day trial of the EKS capabilities.

AWS SSO is Now AWS IAM Identity Center in order to better represent the full set of workforce and account management capabilities that are part of IAM. You can create user identities directly in IAM Identity Center, or you can connect your existing Active Directory or standards-based identity provider. To learn more, read this post from the AWS Security Blog.

AWS Config Conformance Packs now provide you with percentage-based scores that will help you track resource compliance within the scope of the resources addressed by the pack. Scores are computed based on the product of the number of resources and the number of rules, and are reported to Amazon CloudWatch so that you can track compliance trends over time. To learn more about how scores are computed, read the What’s New post.

Amazon Macie now lets you perform one-click temporary retrieval of sensitive data that Macie has discovered in an S3 bucket. You can retrieve up to ten examples at a time, and use these findings to accelerate your security investigations. All of the data that is retrieved and displayed in the Macie console is encrypted using customer-managed AWS Key Management Service (AWS KMS) keys. To learn more, read the What’s New post.

AWS Control Tower was updated multiple times last week. CloudTrail Organization Logging creates an org-wide trail in your management account to automatically log the actions of all member accounts in your organization. Control Tower now reduces redundant AWS Config items by limiting recording of global resources to home regions. To take advantage of this change you need to update to the latest landing zone version and then re-register each Organizational Unit, as detailed in the What’s New post. Lastly, Control Tower’s region deny guardrail now includes AWS API endpoints for AWS Chatbot, Amazon S3 Storage Lens, and Amazon S3 Multi Region Access Points. This allows you to limit access to AWS services and operations for accounts enrolled in your AWS Control Tower environment.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
Here are some other news items and customer stories that you may find interesting:

AWS Open Source News and Updates – My colleague Ricardo Sueiras writes a weekly open source newsletter and highlights new open source projects, tools, and demos from the AWS community. Read installment #122 here.

Growy Case Study – This Netherlands-based company is building fully-automated robot-based vertical farms that grow plants to order. Read the case study to learn how they use AWS IoT and other services to monitor and control light, temperature, CO2, and humidity to maximize yield and quality.

Journey of a Snap on Snapchat – This video shows you how a snap flows end-to-end from your camera to AWS, to your friends. With over 300 million daily active users, Snap takes advantage of Amazon Elastic Kubernetes Service (EKS), Amazon DynamoDB, Amazon Simple Storage Service (Amazon S3), Amazon CloudFront, and many other AWS services, storing over 400 terabytes of data in DynamoDB and managing over 900 EKS clusters.

Cutting Cardboard Waste – Bin packing is almost certainly a part of every computer science curriculum! In the linked article from the Amazon Science site, you can learn how an Amazon Principal Research Scientist developed PackOpt to figure out the optimal set of boxes to use for shipments from Amazon’s global network of fulfillment centers. This is an NP-hard problem and the article describes how they build a parallelized solution that explores a multitude of alternative solutions, all running on AWS.

Upcoming Events
Check your calendar and sign up for these online and in-person AWS events:

AWS Global Summits – AWS Global Summits are free events that bring the cloud computing community together to connect, collaborate, and learn about AWS. Registrations are open for the following AWS Summits in August:

IMAGINE 2022 – The IMAGINE 2022 conference will take place on August 3 at the Seattle Convention Center, Washington, USA. It’s a no-cost event that brings together education, state, and local leaders to learn about the latest innovations and best practices in the cloud. You can register here.

That’s all for this week. Check back next Monday for another Week in Review!

Jeff;

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Correlate IAM Access Analyzer findings with Amazon Macie

Post Syndicated from Nihar Das original https://aws.amazon.com/blogs/security/correlate-iam-access-analyzer-findings-with-amazon-macie/

In this blog post, you’ll learn how to detect when unintended access has been granted to sensitive data in Amazon Simple Storage Service (Amazon S3) buckets in your Amazon Web Services (AWS) accounts.

It’s critical for your enterprise to understand where sensitive data is stored in your organization and how and why it is shared. The ability to efficiently find data that is shared with entities outside your account and the contents of that data is paramount. You need a process to quickly detect and report which accounts have access to sensitive data. Amazon Macie is an AWS service that can detect many sensitive data types. Macie is a fully managed data security and data privacy service that uses machine learning and pattern matching to discover and help protect your sensitive data in AWS.

AWS Identity and Access Management (IAM) Access Analyzer helps to identify resources in your organization and accounts, such as S3 buckets or IAM roles, that are shared with an external entity. When you enable IAM Access Analyzer, you create an analyzer for your entire organization or your account. The organization or account you choose is known as the zone of trust for the analyzer. The analyzer monitors the supported resources within your zone of trust. This analyzer enables IAM Access Analyzer to detect each instance of a resource shared outside the zone of trust and generates a finding about the resource and the external principals that have access to it.

Currently, you can use IAM Access Analyzer and Macie to detect external access and discover sensitive data as separate processes. You can join the findings from both to best evaluate the risk. The solution in this post integrates IAM Access Analyzer, Macie, and AWS Security Hub to automate the process of correlating findings between the services and presenting them in Security Hub.

How does the solution work?

First, IAM Access Analyzer discovers S3 buckets that are shared outside the zone of trust. Next, the solution schedules a Macie sensitive data discovery job for each of these buckets to determine if the bucket contains sensitive data. Upon discovery of shared sensitive data in S3, a custom high severity finding is created in Security Hub for review and incident response.

Solution architecture

This solution is based on a serverless architecture, and uses the following services:

Figure 1: Architecture diagram

Figure 1 depicts the following process flow:

  1. IAM Access Analyzer detects shared S3 buckets outside of the zone of trust—the organization or account you choose is known as a zone of trust for the analyzer—and creates the event Access Analyzer Finding in EventBridge.
  2. EventBridge triggers the Lambda function sda-aa-save-findings.
  3. The sda-aa-save-findings function records each finding in DynamoDB.
  4. An EventBridge scheduled event periodically starts a new cycle of the Step Function state machine, which immediately runs the Lambda function sda-macie-submit-scan. The template sets a 15-minute interval, but this is configurable.
  5. The sda-macie-submit-scan function reads the IAM Access Analyzer findings that were created by sda-aa-save-findings from DynamoDB.
  6. sda-macie-submit-scan launches a Macie classification job for each distinct S3 bucket that is related to one or more recent IAM Access Analyzer findings.
  7. Macie performs a sensitive data discovery scan on each requested S3 bucket.
  8. The sda-macie-submit-scan function initiates the Lambda function sda-macie-check-status.
  9. sda-macie-check-status periodically checks the status of each Macie classification job, waiting for all the Macie jobs initiated by this solution to complete.
  10. Upon completion of the sda-macie-check-status function, the step function runs the Lambda function sda-sh-create-findings.
  11. sda-sh-create-findings joins the resulting IAM Access Analyzer and Macie datasets for each S3 bucket.
  12. sda-sh-create-findings publishes a finding to Security Hub for each bucket that has both external access and sensitive data.

    Note: The Macie scan is skipped if the S3 bucket is tagged to be excluded or if it was recently scanned by Macie. See the Cost considerations section for more information on custom configurations.

  13. Information security can review and act on the findings shown in Security Hub.

Sample Security Hub output

Figure 2 shows the sample findings that Security Hub will present. Each finding includes:

  • Severity
  • Workflow status
  • Record state
  • Company
  • Product
  • Title
  • Resource
Figure 2: Sample Security Hub findings

The output to Security Hub will display a severity of HIGH with workflow NEW, because this is the first time the event has been observed. The record state is ACTIVE because the workflow state is NEW. The title explains the reason for the event.

For example, if potentially sensitive data is discovered in a bucket that is shared outside a zone of trust, selecting an event will display the resources involved in the finding so you can investigate. For more information, see the Security Hub User Guide.
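
As a rough sketch of the kind of call the sda-sh-create-findings function could make (the field values, ARNs, and bucket name are illustrative; this is not the solution’s actual code), a custom finding in the AWS Security Finding Format can be published like this:

import boto3
from datetime import datetime, timezone

securityhub = boto3.client("securityhub")
now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

securityhub.batch_import_findings(
    Findings=[{
        "SchemaVersion": "2018-10-08",
        "Id": "sensitive-data-shared/amzn-example-bucket",
        "ProductArn": "arn:aws:securityhub:us-east-1:111122223333:product/111122223333/default",
        "GeneratorId": "sda-sh-create-findings",
        "AwsAccountId": "111122223333",
        "Types": ["Sensitive Data Identifications"],
        "CreatedAt": now,
        "UpdatedAt": now,
        "Severity": {"Label": "HIGH"},
        "Title": "Sensitive data found in an S3 bucket shared outside the zone of trust",
        "Description": "Macie found sensitive data in a bucket that IAM Access Analyzer reports as externally shared.",
        "Resources": [{"Type": "AwsS3Bucket", "Id": "arn:aws:s3:::amzn-example-bucket"}],
        "Workflow": {"Status": "NEW"},
        "RecordState": "ACTIVE",
    }]
)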

Notes:

  • Detection of public S3 buckets by IAM Access Analyzer will still occur through Security Hub and will be marked as critical severity. This solution does not add to or augment this finding in Security Hub.
  • If a finding in IAM Access Analyzer is archived, the solution does not update the related finding in Security Hub.

Prerequisites

To use this solution, you need the following:

  • Permission to run AWS CloudFormation
  • Permission to create Lambda functions
  • Permission to create DynamoDB tables
  • Permission to create Step Function state machines
  • Permission to create EventBridge event rules
  • Permission to enable IAM Access Analyzer on the account where sensitive discovery is required
  • Permission to enable Macie on the account
  • Permission to enable Security Hub on the account

Deploy the solution

The solution is deployed through AWS CloudFormation, and you can review the template for options to best suit your specific needs.

  1. Sign in to your AWS account located at https://aws.amazon.com/console/.
  2. In the AWS Management Console, navigate to the AWS CloudFormation service, and then choose Create stack.
  3. Under Prerequisite – Prepare template, choose Template is ready.
  4. Under Specify template, choose Amazon S3 URL and provide the following URL:
    https://awsiammedia.s3.amazonaws.com/public/sample/936-correlating-aa-findings-macie/sda-cfn.yml
  5. Choose Next.
  6. Enter the stack name.
  7. The Application code location, S3 Bucket and S3 Key fields will be pre-filled.
  8. Under Service Activations, modify the activations based on the services you presently have running in your account.
  9. Modify the Logging and Monitoring settings if required.
  10. (Optional) Set an alert email address for errors.
  11. Choose Next, then choose Next again.
  12. Under Capabilities, select the check box.
  13. Choose Create Stack. The solution will begin deploying; watch for the CREATE_COMPLETE message.
Figure 3: Sample CloudFormation deployment status

The solution is now deployed and will start monitoring for sensitive data that is being shared. It will send the findings to Security Hub for your teams to investigate.

Cost considerations

When you scan large S3 buckets with sensitive data, remember that Macie cost is based on the amount of data scanned. For more information on Macie costs, see Amazon Macie pricing.

This solution allows the following options, which you can use to help manage costs:

  • Use environment variables in Lambda to skip specific tagged buckets
  • Skip recently scanned S3 buckets and reuse prior findings
Figure 4: Screen shot of configurable environment variable

Conclusion

In this post, we discussed how the solution uses Lambda, Step Functions and EventBridge to integrate IAM Access Analyzer with Macie discovery jobs. We reviewed the components of the application, deployed it by using CloudFormation, and reviewed the output a security team would use to take the appropriate actions. We also provided two ways that you can manage the costs associated with the solution.

After you deploy this project, you can modify it to meet your organization’s needs. For example, you can modify the tags to skip specific S3 buckets your organization has already classified to hold sensitive data. Customers who use multiple AWS accounts can designate a centralized Security Hub administrator account to receive the solution alerts from each member account. For more information on this option, see Designating a Security Hub administrator account.

If you have feedback about this post, please submit it in the Comments section below. If you have questions about this post, please start a new thread on the AWS Identity and Access Management forum.

Other resources

For more information on correlating security findings with AWS Security Hub and Amazon EventBridge, refer to this blog post.

Want more AWS Security news? Follow us on Twitter.

Nihar Das

Nihar has over 20 years of experience in various business domains including financial services. As an AWS Senior Solutions Architect, he is passionate about solving challenges in the cloud and helps financial services customers to migrate to AWS and support the continued innovation.

Joe Dunn

Joe is an AWS Senior Solutions Architect in Financial Services with over 20 years of experience in infrastructure architecture and migration of business-critical loads to AWS. He helps financial services customers to innovate on the AWS Cloud by providing solutions using AWS products and services.

Armand Aquino

Armand is a solutions architect helping financial services organizations design their critical workloads on AWS. In his spare time, he enjoys exploring outdoors and learning Korean.

Journey to Adopt Cloud-Native Architecture Series #5 – Enhancing Threat Detection, Data Protection, and Incident Response

Post Syndicated from Anuj Gupta original https://aws.amazon.com/blogs/architecture/journey-to-adopt-cloud-native-architecture-series-5-enhancing-threat-detection-data-protection-and-incident-response/

In Part 4 of this series, Governing Security at Scale and IAM Baselining, we discussed building a multi-account strategy and improving access management and least privilege to prevent unwanted access and to enforce security controls.

As a refresher from previous posts in this series, our example e-commerce company’s “Shoppers” application runs in the cloud. The company experienced hypergrowth, which posed a number of platform and technology challenges, including enforcing security and governance controls to mitigate security risks.

With the pace of new infrastructure and software deployments, we had to ensure we maintain strong security. This post, Part 5, shows how we detect security misconfigurations, indicators of compromise, and other anomalous activity. We also show how we developed and iterated on our incident response processes.

Threat detection and data protection

With our newly acquired customer base from hypergrowth, we had to make sure we maintained customer trust. We also needed to detect and respond to security events quickly to reduce the scope and impact of any unauthorized activity. We were concerned about vulnerabilities on our public-facing web servers, accidental sensitive data exposure, and other security misconfigurations.

Prior to hypergrowth, application teams scanned for vulnerabilities and maintained the security of their applications. After hypergrowth, we established a dedicated security team and identified tools to simplify the management of our cloud security posture. This allowed us to easily identify and prioritize security risks.

Use AWS security services to detect threats and misconfigurations

We use the following AWS security services to simplify the management of cloud security risks and reduce the burden of third-party integrations. This also minimizes the amount of engineering work required by our security team.

Detect threats with Amazon GuardDuty

We use Amazon GuardDuty to keep up with the newest threat actor tactics, techniques, and procedures (TTPs) and indicators of compromise (IOCs).

GuardDuty saves us time and reduces complexity, because we don’t have to continuously engineer detections for new TTPs and IOCs for static events and machine-learning-based detections. This allows our security analysts to focus on building runbooks and quickly responding to security findings.

Discover sensitive data with Amazon Macie for Amazon S3

To host our external website, we use a few public Amazon Simple Storage Service (Amazon S3) buckets with static assets. We don’t want developers to accidentally put sensitive data in these buckets, and we wanted to understand which S3 buckets contain sensitive information, such as financial or personally identifiable information (PII).

We explored building a custom scanner to search for sensitive data, but maintaining the search patterns was too complex. It was also costly to continuously re-scan files each month. Therefore, we use Amazon Macie to continuously scan our S3 buckets for sensitive data. After Macie makes its initial scan, it will only scan new or updated objects in those S3 buckets, which reduces our costs significantly. We added filter rules to exclude larger files and specific S3 prefixes so that only the required objects are scanned, and we set a sampling rate to further optimize the cost of scanning large S3 buckets (in our case, S3 buckets greater than 1 TB).
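
A rough sketch of such a job definition (the bucket name, account ID, size threshold, sampling rate, and schedule are illustrative, not our production configuration) using the Macie CreateClassificationJob API:

import boto3

macie = boto3.client("macie2")

macie.create_classification_job(
    name="public-site-buckets-monthly-scan",
    jobType="SCHEDULED",
    scheduleFrequency={"monthlySchedule": {"dayOfMonth": 1}},
    samplingPercentage=60,  # sample a share of eligible objects to control cost
    s3JobDefinition={
        "bucketDefinitions": [
            {"accountId": "111122223333", "buckets": ["amzn-example-public-site-assets"]}
        ],
        "scoping": {
            "excludes": {
                "and": [{
                    # Skip very large objects (value is in bytes).
                    "simpleScopeTerm": {
                        "comparator": "GT",
                        "key": "OBJECT_SIZE",
                        "values": ["104857600"],
                    }
                }]
            }
        },
    },
)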

Scan for vulnerabilities with Amazon Inspector

Because we use a wide variety of operating systems and software, we must scan our Amazon Elastic Compute Cloud (Amazon EC2) instances for known software vulnerabilities, such as Log4J.

We use Amazon Inspector to run continuous vulnerability scans on our EC2 instances and Amazon Elastic Container Registry (Amazon ECR) container images. With Amazon Inspector, we can continuously detect if our developers are deploying and releasing vulnerable software on our EC2 instances and ECR images without setting up a third-party vulnerability scanner and installing additional endpoint agents.

Aggregate security findings with AWS Security Hub

We don’t want our security analysts to arbitrarily act on one security finding over another. This is time-consuming and does not properly prioritize the highest risks to address. We also need to track ownership, view progress of various findings, and build consistent responses for common security findings.

With AWS Security Hub, our analysts can seamlessly prioritize findings from GuardDuty, Macie, Amazon Inspector, and many other AWS services. Our analysts also use Security Hub’s built-in security checks and insights to identify AWS resources and accounts that have a high number of findings and act on them.

Setting up the threat detection services

This is how we set up these services:

Our security analysts use Security Hub-generated Jira tickets to view, prioritize, and respond to all security findings and misconfigurations across our AWS environment.

Through this configuration, our analysts no longer need to pivot between various AWS accounts, security tool consoles, and Regions, which makes the day-to-day management and operations much easier. Figure 1 depicts the data flow to Security Hub.

Figure 1. Aggregation of security services in security tooling account

Figure 2. Delegated administrator setup

Incident response

Before hypergrowth, there was no formal way to respond to security incidents. To prevent future security issues, we built incident response plans and processes to quickly address potential security incidents and minimize the impact and exposure. Following the AWS Security Incident Response Guide and NIST framework, we adopted the following best practices.

Playbooks and runbooks for repeatability

We developed incident response playbooks and runbooks for repeatable responses for security events that include:

  • Playbooks for more strategic scenarios and responses based on some of the sample playbooks found here.
  • Runbooks that provide step-by-step guidance for our security analysts to follow in case an event occurs. We used Amazon SageMaker notebooks and AWS Systems Manager Incident Manager runbooks to develop repeatable responses for pre-identified incidents, such as suspected command and control activity on an EC2 instance.

Automation for quicker response time

After developing our repeatable processes, we identified areas where we could accelerate responses to security threats by automating the response. We used the AWS Security Hub Automated Response and Remediation solution as a starting point.

By using this solution, we didn’t need to build our own automated response and remediation workflow. The code is also easy to read, repeat, and centrally deploy through AWS CloudFormation StackSets. We used some of the built-in remediations like disabling active keys that have not been rotated for more than 90 days, making all Amazon Elastic Block Store (Amazon EBS) snapshots private, and many more. With automatic remediation, our analysts can respond quicker and in a more holistic and repeatable way.

Simulations to improve incident response capabilities

We implemented quarterly incident response simulations. These simulations test how well prepared our people, processes, and technologies are for an incident. We included some cloud-specific simulations like an S3 bucket exposure and an externally shared Amazon Relational Database Service (Amazon RDS) snapshot to ensure our security staff are prepared for an incident in the cloud. We use the results of the simulations to iterate on our incident response processes.

Conclusion

In this blog post, we discussed how to prepare for, detect, and respond to security events in an AWS environment. We identified security services to detect security events, vulnerabilities, and misconfigurations. We then discussed how to develop incident response processes through building playbooks and runbooks, performing simulations, and automation. With these new capabilities, we can detect and respond to a security incident throughout hypergrowth.

Looking for more architecture content? AWS Architecture Center provides reference architecture diagrams, vetted architecture solutions, Well-Architected best practices, patterns, icons, and more!

Other blog posts in this series

Top 2021 AWS Security service launches security professionals should review – Part 1

Post Syndicated from Ryan Holland original https://aws.amazon.com/blogs/security/top-2021-aws-security-service-launches-part-1/

Given the speed of Amazon Web Services (AWS) innovation, it can sometimes be challenging to keep up with AWS Security service and feature launches. To help you stay current, here’s an overview of some of the most important 2021 AWS Security launches that security professionals should be aware of. This is the first of two related posts; Part 2 will highlight some of the important 2021 launches that security professionals should be aware of across all AWS services.

Amazon GuardDuty

In 2021, the threat detection service Amazon GuardDuty expanded the internal AWS security intelligence it consumes to use more of the intel that AWS internal threat detection teams collect, including additional nation-state threat intelligence. Sharing more of the important intel that internal AWS teams collect lets you quickly improve your protection. GuardDuty also launched domain reputation modeling. These machine learning models take all the domain requests from across all of AWS, and feed them into a model that allows AWS to categorize previously unseen domains as highly likely to be malicious or benign based on their behavioral characteristics. In practice, AWS is seeing that these models often deliver high-fidelity threat detections, identifying malicious domains 7–14 days before they are identified and available on commercial threat feeds.

AWS also launched second generation anomaly detection for GuardDuty. Shortly after the original GuardDuty launch in 2017, AWS added additional anomaly detection for user behavior analytics and monitoring for unusual activity of AWS Identity and Access Management (IAM) users. After receiving customer feedback that the original feature was a little too noisy, and that it was difficult to understand why some findings were generated, the GuardDuty analytics team rebuilt this functionality on an entirely new machine learning model, considerably reducing the number of detections and generating a more accurate positive-detection rate. The new model also added additional context that security professionals (such as analysts) can use to understand why the model shows findings as suspicious or unusual.

Since its introduction, GuardDuty has detected when AWS EC2 Role credentials are used to call AWS APIs from IP addresses outside of AWS. Beginning in early 2022, GuardDuty now supports detection when credentials are used from other AWS accounts, inside the AWS network. This is a complex problem for customers to solve on their own, which is why the GuardDuty team added this enhancement. The solution considers that there are legitimate reasons why a source IP address that is communicating with AWS services APIs might be different than the Amazon Elastic Compute Cloud (Amazon EC2) instance IP address, or a NAT gateway associated with the instance’s VPC. The enhancement also considers complex network topologies that route traffic to one or multiple VPCs—for example, AWS Transit Gateway or AWS Direct Connect.

Our customers are increasingly running container workloads in production; helping to raise the security posture of these workloads became an AWS development priority in 2021. GuardDuty for EKS Protection is one recent feature that has resulted from this investment. This new GuardDuty feature monitors Amazon Elastic Kubernetes Service (Amazon EKS) cluster control plane activity by analyzing Kubernetes audit logs. GuardDuty is integrated with Amazon EKS, giving it direct access to the Kubernetes audit logs without requiring you to turn on or store these logs. Once a threat is detected, GuardDuty generates a security finding that includes container details such as pod ID, container image ID, and associated tags. See below for details on how the new Amazon Inspector is also helping to protect containers.

Amazon Inspector

At AWS re:Invent 2021, we launched the new Amazon Inspector, a vulnerability management service that continually scans AWS workloads for software vulnerabilities and unintended network exposure. The original Amazon Inspector was completely re-architected in this release to automate vulnerability management and to deliver near real-time findings to minimize the time needed to discover new vulnerabilities. This new Amazon Inspector has simple one-click enablement and multi-account support using AWS Organizations, similar to our other AWS Security services. This launch also introduces a more accurate vulnerability risk score, called the Inspector score. The Inspector score is a highly contextualized risk score that is generated for each finding by correlating Common Vulnerability and Exposures (CVE) metadata with environmental factors for resources such as network accessibility. This makes it easier for you to identify and prioritize your most critical vulnerabilities for immediate remediation. One of the most important new capabilities is that Amazon Inspector automatically discovers running EC2 instances and container images residing in Amazon Elastic Container Registry (Amazon ECR), at any scale, and immediately starts assessing them for known vulnerabilities. Now you can consolidate your vulnerability management solutions for both Amazon EC2 and Amazon ECR into one fully managed service.

AWS Security Hub

In addition to a significant number of smaller enhancements throughout 2021, in October AWS Security Hub, an AWS cloud security posture management service, addressed a top customer enhancement request by adding support for cross-Region finding aggregation. You can now view all your findings from all accounts and all selected Regions in a single console view, and act on them from an Amazon EventBridge feed in a single account and Region. Looking back at 2021, Security Hub added 72 additional best practice checks, four new AWS service integrations, and 13 new external partner integrations. A few of these integrations are Atlassian Jira Service Management, Forcepoint Cloud Security Gateway (CSG), and Amazon Macie. Security Hub also achieved FedRAMP High authorization to enable security posture management for high-impact workloads.

Amazon Macie

Based on customer feedback, data discovery tool Amazon Macie launched a number of enhancements in 2021. One new feature, which made it easier to manage Amazon Simple Storage Service (Amazon S3) buckets for sensitive data, was criteria-based bucket selection. This Macie feature allows you to define runtime criteria to determine which S3 buckets should be included in a sensitive data-discovery job. When a job runs, Macie identifies the S3 buckets that match your criteria, and automatically adds or removes them from the job’s scope. Before this feature, once a job was configured, it was immutable. Now, for example, you can create a policy where if a bucket becomes public in the future, it’s automatically added to the scan, and similarly, if a bucket is no longer public, it will no longer be included in the daily scan.

Originally Macie included all managed data identifiers available for all scans. However, customers wanted more surgical search criteria. For example, they didn’t want to be informed if there were exposed data types in a particular environment. In September 2021, Macie launched the ability to enable/disable managed data identifiers. This allows you to customize the data types you deem sensitive and would like Macie to alert on, in accordance with your organization’s data governance and privacy needs.

Amazon Detective

Amazon Detective is a service to analyze and visualize security findings and related data to rapidly get to the root cause of potential security issues. In January 2021, Amazon Detective added a convenient, time-saving integration that allows you to start security incident investigation workflows directly from the GuardDuty console. This new hyperlink pivot in the GuardDuty console takes findings directly from the GuardDuty console into the Detective console. Another time-saving capability added was the IP address drill down functionality. This new capability can be useful to security forensic teams performing incident investigations, because it helps quickly determine the communications that took place from an EC2 instance under investigation before, during, and after an event.

In December 2021, Detective added support for AWS Organizations to simplify management for security operations and investigations across all existing and future accounts in an organization. This launch allows new and existing Detective customers to onboard and centrally manage the Detective graph database for up to 1,200 AWS accounts.

AWS Key Management Service

In June 2021, AWS Key Management Service (AWS KMS) introduced multi-Region keys, a capability that lets you replicate keys from one AWS Region into another. With multi-Region keys, you can more easily move encrypted data between Regions without having to decrypt and re-encrypt with different keys for each Region. Multi-Region keys are supported for client-side encryption using direct AWS KMS API calls, or in a simplified manner with the AWS Encryption SDK and Amazon DynamoDB Encryption Client.
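
A minimal sketch of creating a multi-Region primary key and replicating it into a second Region (the Regions and description are placeholders):

import boto3

kms = boto3.client("kms", region_name="us-east-1")

key = kms.create_key(
    Description="Multi-Region key for cross-Region encrypted data",
    MultiRegion=True,
)

# Replicate the primary key into another Region; the replica shares the same
# key ID and key material, so ciphertext can be decrypted in either Region.
kms.replicate_key(
    KeyId=key["KeyMetadata"]["KeyId"],
    ReplicaRegion="us-west-2",
)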

AWS Secrets Manager

Last year was a busy year for AWS Secrets Manager, with four feature launches to make it easier to manage secrets at scale, not just for client applications, but also for platforms. In March 2021, Secrets Manager launched multi-Region secrets to automatically replicate secrets for multi-Region workloads. Also in March, Secrets Manager added three new rules to AWS Config, to help administrators verify that secrets in Secrets Manager are configured according to organizational requirements. Then in April 2021, Secrets Manager added a CSI driver plug-in, to make it easy to consume secrets from Amazon EKS by using Kubernetes’s standard Secrets Store interface. In November, Secrets Manager introduced a higher secret limit of 500,000 per account to simplify secrets management for independent software vendors (ISVs) that rely on unique secrets for a large number of end customers. Although launched in January 2022, it’s also worth mentioning Secrets Manager’s release of rotation windows to align automatic rotation of secrets with application maintenance windows.

Amazon CodeGuru and Secrets Manager

In November 2021, AWS announced a new secrets detector feature in Amazon CodeGuru that searches your codebase for hardcoded secrets. Amazon CodeGuru is a developer tool powered by machine learning that provides intelligent recommendations to detect security vulnerabilities, improve code quality, and identify an application’s most expensive lines of code.

This new feature can pinpoint locations in your code with usernames and passwords; database connection strings, tokens, and API keys from AWS; and other service providers. When a secret is found in your code, CodeGuru Reviewer provides an actionable recommendation that links to AWS Secrets Manager, where developers can secure the secret with a point-and-click experience.

Looking ahead for 2022

AWS will continue to deliver experiences in 2022 that meet administrators where they govern, developers where they code, and applications where they run. A lot of customers are moving to container and serverless workloads; you can expect to see more work on this in 2022. You can also expect to see more work around integrations, like CodeGuru Secrets Detector identifying plaintext secrets in code (as noted previously).

To stay up-to-date in the year ahead on the latest product and feature launches and security use cases, be sure to read the Security service launch announcements. Additionally, stay tuned to the AWS Security Blog for Part 2 of this blog series, which will provide an overview of some of the important 2021 launches that security professionals should be aware of across all AWS services.

If you’re looking for more opportunities to learn about AWS security services, check out AWS re:Inforce, the AWS conference focused on cloud security, identity, privacy, and compliance, which will take place June 28-29 in Houston, Texas.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Author

Ryan Holland

Ryan is a Senior Manager with GuardDuty Security Response. His team is responsible for ensuring GuardDuty provides the best security value to customers, including threat intelligence, behavioral analytics, and finding quality.

Author

Marta Taggart

Marta is a Seattle-native and Senior Product Marketing Manager in AWS Security Product Marketing, where she focuses on data protection services. Outside of work you’ll find her trying to convince Jack, her rescue dog, not to chase squirrels and crows (with limited success).

Enabling data classification for Amazon RDS database with Macie

Post Syndicated from Bruno Silveira original https://aws.amazon.com/blogs/security/enabling-data-classification-for-amazon-rds-database-with-amazon-macie/

Customers have been asking us about ways to use Amazon Macie data discovery on their Amazon Relational Database Service (Amazon RDS) instances. This post presents how to do so using AWS Database Migration Service (AWS DMS) to extract data from Amazon RDS, store it on Amazon Simple Storage Service (Amazon S3), and then classify the data using Macie. Macie’s resulting findings will also be made available to be queried with Amazon Athena by appropriate teams.

The challenge

Let’s suppose you need to find sensitive data in an RDS-hosted database using Macie, which currently only supports S3 as a data source. Therefore, you will need to extract and store the data from RDS in S3. In addition, you will need an interface for audit teams to audit these findings.

Solution overview

Figure 1: Solution architecture workflow

The architecture of the solution in Figure 1 can be described as:

  1. A MySQL engine running on RDS is populated with the Sakila sample database.
  2. A DMS task connects to the Sakila database, transforms the data into a set of Parquet compressed files, and loads them into the dcp-macie bucket.
  3. A Macie classification job analyzes the objects in the dcp-macie bucket using a combination of techniques such as machine learning and pattern matching to determine whether the objects contain sensitive data and to generate detailed reports on the findings.
  4. Amazon EventBridge routes the Macie findings reports events to Amazon Kinesis Data Firehose.
  5. Kinesis Data Firehose stores these reports in the dcp-glue bucket.
  6. S3 event notification triggers an AWS Lambda function whenever an object is created in the dcp-glue bucket.
  7. The Lambda function named Start Glue Workflow starts a Glue Workflow.
  8. Glue Workflow transforms the data from JSONL to the Apache Parquet file format and places it in the dcp-athena bucket. This provides better query performance and optimized storage usage, because Parquet is a binary, column-oriented storage format.
  9. Athena is used to query and visualize data generated by Macie.

Note: For better readability, the S3 bucket nomenclature omits the suffix containing the AWS Region and AWS account ID used to meet the global uniqueness naming requirement (for example, dcp-athena-us-east-1-123456789012).

The Sakila database schema consists of the following tables:

  • actor
  • address
  • category
  • city
  • country
  • customer

Building the solution

Prerequisites

Before configuring the solution, the AWS Identity and Access Management (IAM) user must have appropriate access granted for the following services:

You can find an IAM policy with the required permissions here.

Step 1 – Deploying the CloudFormation template

You’ll use CloudFormation to quickly and consistently provision the AWS resources illustrated in Figure 1. Through a pre-built template file, it creates the infrastructure using an Infrastructure-as-Code (IaC) approach.

  1. Download the CloudFormation template.
  2. Go to the CloudFormation console.
  3. Select the Stacks option in the left menu.
  4. Select Create stack and choose With new resources (standard).
  5. On Step 1 – Specify template, choose Upload a template file, select Choose file, and select the file template.yaml downloaded previously.
  6. On Step 2 – Specify stack details, enter a name of your preference for Stack name. You might also adjust the parameters as needed, like the parameter CreateRDSServiceRole to create a service role for RDS if it does not exist in the current account.
  7. On Step 3 – Configure stack options, select Next.
  8. On Step 4 – Review, check the box for I acknowledge that AWS CloudFormation might create IAM resources with custom names, and then select Create Stack.
  9. Wait for the stack to show status CREATE_COMPLETE.

Note: It is expected that provisioning will take around 10 minutes to complete.

Step 2 – Running an AWS DMS task

To extract the data from the Amazon RDS instance, you need to run an AWS DMS task. This makes the data available for Amazon Macie in an S3 bucket in Parquet format.

  1. Go to the AWS DMS console.
  2. In the left menu, select Database migration tasks.
  3. Select the task Identifier named rdstos3task.
  4. Select Actions.
  5. Select the option Restart/Resume.

When the Status changes to Load Complete, the task has finished and you will be able to see the migrated data in your target bucket (dcp-macie).

Inside each folder you can see parquet file(s) with names similar to LOAD00000001.parquet. Now you can use Macie to discover if you have sensitive data in your database contents as exported to S3.

Step 3 – Running a classification job with Amazon Macie

Now you need to create a data classification job so you can assess the contents of your S3 bucket. The job you create will run once and evaluate the complete contents of your S3 bucket to determine whether it can identify PII among the data. As mentioned earlier, this job only uses the managed identifiers available with Macie – you could also add your own custom identifiers.

  1. Go to the Macie console.
  2. Select the S3 buckets option in the left menu.
  3. Choose the S3 bucket dcp-macie containing the output data from the DMS task. You may need to wait a minute and select the Refresh icon if the bucket names do not display.

  4. Select Create job.
  5. Select Next to continue.
  6. Create a job with the following scope.
    1. Sensitive data Discovery options: One-time job
    2. Sampling Depth: 100%
    3. Leave all other settings with their default values
  7. Select Next to continue.
  8. Select Next again to skip past the Custom data identifiers section.
  9. Give the job a name and description.
  10. Select Next to continue.
  11. Verify the details of the job you have created and select Submit to continue.

You will see a green banner stating that The Job was successfully created. The job can take up to 15 minutes to conclude and the Status will change from Active to Complete. To open the findings from the job, select the job’s check box, choose Show results, and select Show findings.
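
If you would rather wait for the job programmatically than watch the console, a sketch like the following (assuming job_id holds the ID returned when the job was created) polls the Macie API until the job finishes:

import time
import boto3

macie = boto3.client("macie2")
job_id = "REPLACE_WITH_YOUR_JOB_ID"  # returned when the classification job was created

while True:
    job = macie.describe_classification_job(jobId=job_id)
    if job["jobStatus"] != "RUNNING":
        break
    time.sleep(60)

print("Job finished with status:", job["jobStatus"])  # COMPLETE when successful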
 

Figure 2: Macie Findings screen

Note: You can navigate in the findings and select each checkbox to see the details.

Step 4 – Enabling querying on classification job results with Amazon Athena

  1. Go to the Athena console and open the Query editor.
  2. If it’s your first time using Athena, you will see the message Before you run your first query, you need to set up a query result location in Amazon S3. Learn more. Select the link presented with this message.
  3. In the Settings window, choose Select and then choose the bucket dcp-assets to store the Athena query results.
  4. (Optional) To store the query results encrypted, check the box for Encrypt query results and select your preferred encryption type. To learn more about Amazon S3 encryption types, see Protecting data using encryption.
  5. Select Save.

Step 5 – Query Amazon Macie results with Amazon Athena.

It might take a few minutes for the data to complete the flow between Amazon Macie and AWS Glue. After it’s finished, you’ll be able to see in Athena’s console the table dcp_athena within the database dcp.

Then, select the three dots next to the table dcp_athena and select the Preview table option to see a data preview, or run your own custom queries.
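
You can also run the same query programmatically. A sketch (the output location reuses the dcp-assets bucket chosen in Step 4, and the Region/account suffix is a placeholder):

import time
import boto3

athena = boto3.client("athena")

query = athena.start_query_execution(
    QueryString="SELECT * FROM dcp_athena LIMIT 10",
    QueryExecutionContext={"Database": "dcp"},
    ResultConfiguration={
        "OutputLocation": "s3://dcp-assets-us-east-1-123456789012/athena-results/"
    },
)

execution_id = query["QueryExecutionId"]

# Wait for the query to finish before fetching results.
while True:
    state = athena.get_query_execution(QueryExecutionId=execution_id)["QueryExecution"]["Status"]["State"]
    if state not in ("QUEUED", "RUNNING"):
        break
    time.sleep(2)

if state == "SUCCEEDED":
    results = athena.get_query_results(QueryExecutionId=execution_id)
    for row in results["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])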
 

Figure 3: Athena table preview

As your environment grows, this blog post on Top 10 Performance Tuning Tips for Amazon Athena can help you apply partitioning of data and consolidate your data into larger files if needed.

Clean up

After you finish, to clean up the solution and avoid unnecessary expenses, complete the following steps:

  1. Go to the Amazon S3 console.
  2. Navigate to each of the buckets listed below and delete all of their objects:
    • dcp-assets
    • dcp-athena
    • dcp-glue
    • dcp-macie
  3. Go to the CloudFormation console.
  4. Select the Stacks option in the left menu.
  5. Choose the stack you created in Step 1 – Deploying the CloudFormation template.
  6. Select Delete and then select Delete Stack in the pop-up window.

Conclusion

In this blog post, we show how you can find personally identifiable information (PII), and other data that Macie’s managed data identifiers define as sensitive, in an RDS-hosted MySQL database. You can use this solution with other relational databases like PostgreSQL, SQL Server, or Oracle, whether hosted on RDS or EC2. If you’re using Amazon DynamoDB, you may also find the blog post Detecting sensitive data in DynamoDB with Macie useful.

By classifying your data, you can inform your management of appropriate data protection and handling controls for the use of that data.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Bruno Silveira

Bruno is a Solutions Architect Manager in the Public Sector team with a focus on educational institutions in Brazil. His previous career was in government, financial services, utilities, and nonprofit institutions. Bruno is an enthusiast of cloud security and an appreciator of good rock’n roll with a good beer.

Author

Thiago Pádua

Thiago is a Solutions Architect in the AWS Worldwide Public Sector team working on the development and support of partners. He is experienced in software development and systems integration, mainly in the telecommunications industry. He has a special interest in microservices, serverless, and containers.

Strengthen the security of sensitive data stored in Amazon S3 by using additional AWS services

Post Syndicated from Jerry Mullis original https://aws.amazon.com/blogs/security/strengthen-the-security-of-sensitive-data-stored-in-amazon-s3-by-using-additional-aws-services/

In this post, we describe the AWS services that you can use to both detect and protect your data stored in Amazon Simple Storage Service (Amazon S3). When you analyze security in depth for your Amazon S3 storage, consider doing the following:

Using these additional AWS services along with Amazon S3 can improve your security posture across your accounts.

Audit and restrict Amazon S3 access with IAM Access Analyzer

IAM Access Analyzer allows you to identify unintended access to your resources and data. Users and developers need access to Amazon S3, but it’s important for you to keep users and privileges accurate and up to date.

Amazon S3 can often house sensitive and confidential information. To help secure your data within Amazon S3, you should be using AWS Key Management Service (AWS KMS) with server-side encryption at rest for Amazon S3. It is also important that you secure the S3 buckets so that you only allow access to the developers and users who require that access. Bucket policies and access control lists (ACLs) are the foundation of Amazon S3 security. Your configuration of these policies and lists determines the accessibility of objects within Amazon S3, and it is important to audit them regularly to properly secure and maintain the security of your Amazon S3 bucket.
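
For example, a sketch of enforcing default SSE-KMS encryption on a bucket (the bucket name and KMS key ARN are placeholders):

import boto3

s3 = boto3.client("s3")

s3.put_bucket_encryption(
    Bucket="amzn-example-sensitive-data-bucket",
    ServerSideEncryptionConfiguration={
        "Rules": [{
            "ApplyServerSideEncryptionByDefault": {
                "SSEAlgorithm": "aws:kms",
                "KMSMasterKeyID": "arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab",
            },
            # S3 Bucket Keys reduce KMS request costs for SSE-KMS.
            "BucketKeyEnabled": True,
        }]
    },
)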

IAM Access Analyzer can scan all the supported resources within a zone of trust. Access Analyzer then provides you with insight when a bucket policy or ACL allows access to any external entities that are not within your organization or your AWS account’s zone of trust.

To set up and use IAM Access Analyzer, follow the instructions for Enabling Access Analyzer in the AWS IAM User Guide.

The example in Figure 1 shows creating an analyzer with the zone of trust as the current account, but you can also create an analyzer with the organization as the zone of trust.

Figure 1: Creating IAM Access Analyzer and zone of trust

After you create your analyzer, IAM Access Analyzer automatically scans the resources in your zone of trust and returns the findings from your Amazon S3 storage environment. The initial scan shown in Figure 2 shows the findings of an unsecured S3 bucket.

Figure 2: Example of unsecured S3 bucket findings

For each finding, you can decide which action you would like to take. As shown in Figure 3, you are given the option to archive (if the finding indicates intended access) or take action to modify bucket permissions (if the finding indicates unintended access).

Figure 3: Displays choice of actions to take

After you address the initial findings, Access Analyzer monitors your bucket policies for changes, and notifies you of access issues it finds. Access Analyzer is regional and must be enabled in each AWS Region independently.

Classify and secure sensitive data with Macie

Organizational compliance standards often require the identification and securing of sensitive data. Your organization’s sensitive data might contain personally identifiable information (PII), which includes things such as credit card numbers, birthdates, and addresses.

Macie is a data security and privacy service offered by AWS that uses machine learning and pattern matching to discover the sensitive data stored within Amazon S3. You can also define custom data identifiers for sensitive data types that are unique to your business or use case. Macie automatically provides an inventory of your S3 buckets and alerts you to unprotected sensitive data.
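As a sketch of how a custom data identifier might be created programmatically, the following Boto3 example defines an illustrative identifier; the name and regular expression are placeholders, not values used in this post:

    import boto3

    macie = boto3.client("macie2")

    # Example only: detect a fictional internal employee ID format such as EMP-123456.
    response = macie.create_custom_data_identifier(
        name="example-employee-id",
        description="Detects internal employee IDs (illustrative pattern)",
        regex=r"EMP-\d{6}",
    )
    print("Custom data identifier ID:", response["customDataIdentifierId"])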

Figure 4 shows a sample result from a Macie scan in which you can see important information regarding Amazon S3 public access, encryption settings, and sharing.

Figure 4: Sample results from a Macie scan

In addition to finding potential sensitive data, Macie also gives you a severity score based on the privacy risk, as shown in the example data in Figure 5.

Figure 5: Example Macie severity scores

When you use Macie in conjunction with AWS Step Functions, you can also automatically remediate any issues found. You can use this combination to help meet regulations such as General Data Protection Regulation (GDPR) and Health Insurance Portability and Accountability Act (HIPAA). Macie allows you to have constant visibility of sensitive data within your Amazon S3 storage environment.

When you deploy Macie in a multi-account configuration, your usage is rolled up to the master account to provide the total usage for all accounts and a breakdown across the entire organization.

Detect malicious access patterns with GuardDuty

Your customers and users can perform thousands of actions each day on S3 buckets. Discerning access patterns manually becomes extremely time consuming as the volume of data increases. GuardDuty uses machine learning, anomaly detection, and integrated threat intelligence to analyze billions of events across multiple accounts, drawing on data collected in AWS CloudTrail logs for S3 data events as well as S3 access logs, VPC Flow Logs, and DNS logs. You can configure GuardDuty to analyze these logs and notify you of suspicious activity, such as unusual data access patterns, unusual discovery API calls, and more. After you receive a list of findings on these activities, you can make informed decisions to secure your S3 buckets.
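GuardDuty's analysis of S3 data events is enabled on the detector. The following Boto3 sketch, which assumes GuardDuty is already enabled in the Region, turns on S3 protection for the detector:

    import boto3

    guardduty = boto3.client("guardduty")

    # Look up the detector in the current Region (assumes GuardDuty is enabled).
    detector_id = guardduty.list_detectors()["DetectorIds"][0]

    # Enable monitoring of S3 data events for this detector.
    guardduty.update_detector(
        DetectorId=detector_id,
        DataSources={"S3Logs": {"Enable": True}},
    )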

Figure 6 shows a sample list of findings returned by GuardDuty, which shows the finding type, the resource affected, and the count of occurrences.

Figure 6: Example GuardDuty list of findings

You can select one of the results in Figure 6 to see the IP address and other details associated with this potentially malicious caller, as shown in Figure 7.

Figure 7: GuardDuty Malicious IP Caller detailed findings

Monitor and remediate configuration changes with AWS Config

Configuration management is important when securing Amazon S3, to prevent unauthorized users from gaining access. It is important that you monitor the configuration changes of your S3 buckets, whether the changes are intentional or unintentional. AWS Config can track all configuration changes that are made to an S3 bucket. For example, if an S3 bucket had its permissions and configurations unexpectedly changed, using AWS Config allows you to see the changes made, as well as who made them.

With AWS Config, you can set up AWS Config managed rules that serve as a baseline for your S3 bucket. When any bucket has configurations that deviate from this baseline, you can be alerted by Amazon Simple Notification Service (Amazon SNS) of the bucket being noncompliant.

You can use AWS Config in conjunction with AWS Lambda. If an S3 bucket is noncompliant, AWS Config can trigger a preprogrammed Lambda function that resolves those issues. This combination can reduce the operational overhead of maintaining compliance within your S3 buckets.
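As an illustration, the following Boto3 sketch deploys the AWS managed rule s3-bucket-public-read-prohibited, scoped to S3 buckets; it assumes an AWS Config configuration recorder is already running in the account:

    import boto3

    config = boto3.client("config")

    # Deploy an AWS managed rule that flags S3 buckets allowing public read access.
    # Assumes a configuration recorder is already set up in this account and Region.
    config.put_config_rule(
        ConfigRule={
            "ConfigRuleName": "s3-bucket-public-read-prohibited",
            "Source": {
                "Owner": "AWS",
                "SourceIdentifier": "S3_BUCKET_PUBLIC_READ_PROHIBITED",
            },
            "Scope": {"ComplianceResourceTypes": ["AWS::S3::Bucket"]},
        }
    )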

Figure 8 shows a sample of AWS Config managed rules selected for configuration monitoring and gives a brief description of what each rule does.

Figure 8: Sample selections of AWS Managed Rules

Figure 9 shows a sample result of a non-compliant configuration and resource inventory listing the type of resource affected and the number of occurrences.

Figure 9: Example of AWS Config non-compliant resources

Conclusion

AWS has many offerings to help you audit and secure your storage environment. In this post, we discussed the particular combination of AWS services that together will help reduce the amount of time and focus your business devotes to security practices. This combination of services will also enable you to automate your responses to any unwanted permission and configuration changes, saving you valuable time and resources to dedicate elsewhere in your organization.

For more information about pricing of the services mentioned in this post, see AWS Free Tier and AWS Pricing. For more information about Amazon S3 security, see Amazon S3 Preventative Security Best Practices in the Amazon S3 User Guide.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Jerry Mullis

Jerry is an Associate Solutions Architect at AWS. His interests are in data migration, machine learning, and device automation. Jerry has previous experience in machine learning research and healthcare management. His certifications include AWS Solutions Architect Pro, AWS Developer Associate, AWS Sysops Admin Associate and AWS Certified Cloud Practitioner. In his free time, Jerry enjoys hiking, playing basketball, and spending time with his wife.

Author

Dave Geyer

Dave is an Associate Solutions Architect at AWS. He has a background in data management and organizational design, and is interested in data analytics and infrastructure security. Dave has advised and worked for customers in the commercial and public sectors, providing them with architectural best practices and recommendations. Dave is interested in the aerospace and financial services industries. Outside of work, he is an adrenaline junkie, and is passionate about mountaineering and high altitudes.

Author

Andrew Chen

Andrew is an Associate Solutions Architect with an interest in data analytics, machine learning, and virtualization of infrastructure. Andrew has previous experience in management consulting in which he worked as a technical lead for various cloud migration projects. In his free time, Andrew enjoys fishing, hiking, kayaking, and keeping up with financial markets.

Using Amazon Macie to Validate S3 Bucket Data Classification

Post Syndicated from Bill Magee original https://aws.amazon.com/blogs/architecture/using-amazon-macie-to-validate-s3-bucket-data-classification/

Securing sensitive information is a high priority for organizations for many reasons. At the same time, organizations are looking for ways to empower development teams to stay agile and innovative. Centralized security teams strive to create systems that align to the needs of the development teams, rather than mandating how those teams must operate.

Security teams who create automation for the discovery of sensitive data have some issues to consider. If development teams are able to self-provision data storage, how does the security team protect that data? If teams have a business need to store sensitive data, they must consider how, where, and with what safeguards that data is stored.

Let’s look at how we can set up Amazon Macie to validate data classifications provided by decentralized software development teams. Macie is a fully managed service that uses machine learning (ML) to discover sensitive data in AWS. If you are not familiar with Macie, read New – Enhanced Amazon Macie Now Available with Substantially Reduced Pricing.

Data classification is part of the security pillar of a Well-Architected application. Following the guidelines provided in the AWS Well-Architected Framework, we can develop a resource-tagging scheme that fits our needs.

Overview of decentralized data validation system

In our example, we have multiple levels of data classification that represent different levels of risk associated with each classification. When a software development team creates a new Amazon Simple Storage Service (S3) bucket, they are responsible for labeling that bucket with a tag. This tag represents the classification of data stored in that bucket. The security team must maintain a system to validate that the data in those buckets meets the classification specified by the development teams.

This separation of roles and responsibilities for development and security teams who work independently requires a validation system that’s decoupled from S3 bucket creation. It should automatically detect new buckets or data in the existing buckets, and validate the data against the assigned classification tags. It should also notify the appropriate development teams of misclassified or unclassified buckets in a timely manner. These notifications can be through standard notification channels, such as email or Slack channel notifications.

Validation and alerts with AWS services

Figure 1. Validation system for data classification

We assume that teams are permitted to create S3 buckets and we will use AWS Config to enforce the following required tags: DataClassification and SupportSNSTopic. The DataClassification tag indicates what type of data is allowed in the bucket. The SupportSNSTopic tag indicates an Amazon Simple Notification Service (SNS) topic. If there are issues found with the data in the bucket, a message is published to the topic, and Amazon SNS will deliver an alert. For example, if there is personally identifiable information (PII) data in a bucket that is classified as non-sensitive, the system will alert the owners of the bucket.
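One way to enforce these two tags is with the AWS managed Config rule required-tags, scoped to S3 buckets. The following Boto3 sketch is illustrative; the rule name is an example, and it assumes a configuration recorder is already enabled:

    import json
    import boto3

    config = boto3.client("config")

    # Require the DataClassification and SupportSNSTopic tags on every S3 bucket.
    config.put_config_rule(
        ConfigRule={
            "ConfigRuleName": "s3-bucket-required-classification-tags",
            "Source": {"Owner": "AWS", "SourceIdentifier": "REQUIRED_TAGS"},
            "Scope": {"ComplianceResourceTypes": ["AWS::S3::Bucket"]},
            "InputParameters": json.dumps(
                {"tag1Key": "DataClassification", "tag2Key": "SupportSNSTopic"}
            ),
        }
    )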

Macie is configured to scan all S3 buckets on a scheduled basis. This configuration ensures that any new bucket and data placed in the buckets is analyzed the next time the Macie job runs.

Macie provides several managed data identifiers for discovering and classifying the data. These include bank account numbers, credit card information, authentication credentials, PII, and more. You can also create custom identifiers (or rules) to gather information not covered by the managed identifiers.

Macie integrates with Amazon EventBridge to allow us to capture data classification events and route them to one or more destinations for reporting and alerting needs. In our configuration, the event triggers an AWS Lambda function. The Lambda function validates the data classification inferred by Macie against the classification specified in the DataClassification tag by using custom business logic. If a data classification violation is found, the Lambda function then sends a message to the Amazon SNS topic specified in the SupportSNSTopic tag.

The Lambda function also creates custom metrics and sends those to Amazon CloudWatch. The metrics are organized by engineering team and severity. This allows the security team to create a dashboard of metrics based on the Macie findings. The findings can also be filtered per engineering team and severity to determine which teams need to be contacted to ensure remediation.
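The following is a minimal sketch of such a Lambda handler. It assumes the Macie finding event shape delivered through EventBridge, a simple example policy in which only buckets tagged Sensitive may contain sensitive data, and an illustrative CloudWatch namespace; a real implementation would carry your own business logic and error handling:

    import boto3

    s3 = boto3.client("s3")
    sns = boto3.client("sns")
    cloudwatch = boto3.client("cloudwatch")

    # Example policy: only buckets classified as Sensitive may hold sensitive data.
    CLASSIFICATIONS_ALLOWING_SENSITIVE_DATA = {"Sensitive"}


    def handler(event, context):
        finding = event["detail"]
        bucket = finding["resourcesAffected"]["s3Bucket"]["name"]
        severity = finding["severity"]["description"]  # Low, Medium, or High

        # Read the classification and alert topic from the bucket's tags.
        # Assumes the bucket is tagged (the Config rule above enforces this).
        tags = {t["Key"]: t["Value"]
                for t in s3.get_bucket_tagging(Bucket=bucket)["TagSet"]}
        classification = tags.get("DataClassification", "Unclassified")
        topic_arn = tags.get("SupportSNSTopic")

        if classification in CLASSIFICATIONS_ALLOWING_SENSITIVE_DATA:
            return  # Sensitive data is allowed here; nothing to do.

        # Alert the owning team that the bucket holds misclassified data.
        if topic_arn:
            sns.publish(
                TopicArn=topic_arn,
                Subject=f"Data classification violation in {bucket}",
                Message=(f"Macie found sensitive data (severity {severity}) in "
                         f"bucket {bucket}, which is tagged {classification}."),
            )

        # Record a custom metric so the security team can build dashboards.
        cloudwatch.put_metric_data(
            Namespace="DataClassificationValidation",  # example namespace
            MetricData=[{
                "MetricName": "ClassificationViolations",
                "Dimensions": [
                    {"Name": "Bucket", "Value": bucket},
                    {"Name": "Severity", "Value": severity},
                ],
                "Value": 1,
            }],
        )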

Conclusion

This solution provides a centralized security team with the tools it needs. The team can validate the data classification of an Amazon S3 bucket that is self-provisioned by a development team. New Amazon S3 buckets are automatically included in the Macie jobs and alerts. These are only sent out if the data in the bucket does not conform to the classification specified by the development team. The data auditing process is loosely coupled with the Amazon S3 Bucket creation process, enabling self-service capabilities for development teams, while ensuring proper data classification. Your teams can stay agile and innovative, while maintaining a strong security posture.

Learn more about Amazon Macie and Data Classification.

Automate the archival and deletion of sensitive data using Amazon Macie

Post Syndicated from Subhro Bose original https://aws.amazon.com/blogs/big-data/automate-the-archival-and-deletion-of-sensitive-data-using-amazon-macie/

Customers are looking for ways to securely and cost-efficiently manage the archival and deletion of large volumes of sensitive data in their data lakes, in accordance with regulations and data protection and privacy laws such as GDPR, POPIA, and LGPD. This post describes a way to automatically identify sensitive data stored in your data lake within AWS, tag the data according to its sensitivity level, and apply appropriate lifecycle policies in a secure and cost-effective way.

Amazon Macie is a managed data security and data privacy service that uses machine learning (ML) and pattern matching to discover and protect your sensitive data stored in Amazon Simple Storage Service (Amazon S3). In this post, we show you how to develop a solution using Macie, Amazon Kinesis Data Firehose, Amazon S3, Amazon EventBridge, and AWS Lambda to identify sensitive data across a large number of S3 buckets, tag the data, and apply lifecycle policies for transition and deletion.

Solution overview

The following diagram illustrates the architecture of our solution.

The flow of the solution is as follows:

  1. An Amazon S3 bucket contains multiple objects with different sensitivities of data.
  2. A Macie job analyzes the S3 bucket to identify the different sensitivities.
    1. An EventBridge rule is triggered for each finding that Macie generates from the job.
    2. The rule copies the results created by the Macie job to a Kinesis Data Firehose delivery stream.
    3. The delivery stream copies the results to an S3 bucket as a JSON file for future audit requirements.
  3. The arrival of the results triggers a Lambda function that parses the sensitivity metadata from the JSON file.
  4. The function tags the objects in the bucket from Step 1 based on the sensitivity level of each object, and creates or overwrites a corresponding S3 Lifecycle policy for each object (see the sketch after this list).
  5. The S3 Lifecycle policy moves data to different classes and deletes data based on the configured rules. For example, we implement the following rules:
    1. Archive objects with high sensitivity, tagged as High, after 700 days.
    2. Delete objects tagged as High after 3,000 days.
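The core of step 4 can be sketched as follows. The function name, the Sensitivity tag value, and the day counts mirror the sample rules above; a production version would parse the bucket, key, and sensitivity level from the Macie result JSON delivered in step 2, and would merge the new rule with any lifecycle rules that already exist on the bucket:

    import boto3

    s3 = boto3.client("s3")


    def apply_sensitivity_controls(bucket, key, sensitivity="High"):
        """Tag one object with its sensitivity and write a matching lifecycle rule."""
        # Tag the object so the lifecycle rule below can target it.
        s3.put_object_tagging(
            Bucket=bucket,
            Key=key,
            Tagging={"TagSet": [{"Key": "Sensitivity", "Value": sensitivity}]},
        )

        # Archive high-sensitivity objects after 700 days, delete after 3,000 days.
        # Note: this call replaces any existing lifecycle configuration, so a real
        # implementation would first read and merge the bucket's current rules.
        s3.put_bucket_lifecycle_configuration(
            Bucket=bucket,
            LifecycleConfiguration={
                "Rules": [{
                    "ID": f"sensitivity-{sensitivity.lower()}",
                    "Filter": {"Tag": {"Key": "Sensitivity", "Value": sensitivity}},
                    "Status": "Enabled",
                    "Transitions": [{"Days": 700, "StorageClass": "GLACIER"}],
                    "Expiration": {"Days": 3000},
                }]
            },
        )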

Create resources with AWS CloudFormation

We provide an AWS CloudFormation template to create the following resources:

  • An S3 bucket named archival-blog-<account_number>-<region_name> as a sample subject bucket as described above.
  • An S3 bucket named archival-blog-results-<account_number>-<region_name> to store the results generated by the Macie job.
  • A Firehose delivery stream to send the results of the Macie job to the S3 bucket.
  • An EventBridge rule that matches the incoming event of a result generated by the Macie job and routes the result to the Firehose delivery stream.
  • A Lambda function to apply tags and S3 Lifecycle policies on the data objects of the subject S3 bucket based on the result generated by the Macie job.
  • AWS Identity and Access Management (IAM) roles and policies with appropriate permissions.

Launch the following stack, providing your stack name:

After the CloudFormation stack is deployed, copy sample_pii.xlsx and recipe.xlsx into archival-blog-<account_number>-<region_name> as sample data for Macie to detect as sensitive data.

Next, we scan the subject S3 bucket for sensitive data to tag and attach the appropriate lifecycle policies.

Configure a Macie job

Macie uses ML and pattern matching to discover and protect your sensitive data in AWS. To configure a Macie job, complete the following steps:

  1. On the Macie console, create a new job by choosing Create Job.
  2. Select the bucket that you want to analyze and choose Next.
  3. Select Scheduled job if you want to run the job on a schedule, or One-time job if you want to run the job one time. For this post, we select One-time job. You can also choose Scheduled job for periodic jobs in production, but that is out of scope for this post.
  4. Choose Next.
  5. Enter a name for the job and choose Next.
  6. Review the job details and confirm they’re correct before choosing Submit.

The job immediately starts after you submit it.

Review the results

Whenever the Macie job runs, it generates findings for the sensitive objects that it discovers.

Next, the Lambda function tags each sensitive object with Sensitivity: High.

Finally, the function creates a corresponding S3 Lifecycle policy.

Clean up

When you’re done with this solution, delete the sample data from the subject S3 bucket, delete all the data objects that are stored as Macie results in the S3 bucket for this post, and then delete the CloudFormation stack to remove all the service resources used in the solution.

Conclusion

In this post, we highlighted how you can use Macie to scan all your data stored on Amazon S3 and how to store your security findings in Amazon S3 using EventBridge and Kinesis Data Firehose. We also explored how you can use Lambda to tag the relevant objects in your subject S3 bucket using the Macie job’s security findings. An S3 Lifecycle transition policy moves tagged objects across different storage classes to help save cost, and an S3 Lifecycle expiration policy deletes objects that have reached the end of their lifecycle. Under privacy laws like GDPR and POPIA, personal data should be retained only as long as it needs to be kept for legal purposes or processed. This post provides a mechanism for archiving sensitive data that you might not want to delete immediately, reducing the related storage costs, and deleting the data after a set period when it is no longer required. The archival and deletion periods used above are sample numbers that you can customize in the Lambda function based on your requirements. Additionally, explore Macie’s different finding types; you can use them to build capabilities such as sending notifications or creating searches in CloudTrail S3 object logs to see who is accessing specific objects.

If you have any questions, comments, or concerns, please reach out to AWS Support. If you have feedback about this post, submit it in the comments section.


About the Authors

Subhro Bose is a Data Architect in Emergent Technologies and Intelligence Platform in Amazon. He loves working on ways for emergent technologies such as AI/ML, big data, quantum, and more to help businesses across different industry verticals succeed within their innovation journey.


Akshay Chandiramani is Data Analytics Consultant for AWS Professional Services. He is passionate about solving complex big data and MLOps problems for customers to create tangible business outcomes. In his spare time, he enjoys playing snooker and table tennis, and capturing trends on the stock market.


Ikenna Uzoh is a Senior Manager, Business Intelligence (BI) and Analytics at AWS. Before his current role, he was a Senior Big Data and Analytics Architect with AWS Professional Services. He is passionate about building data platforms (data lakes, BI, Machine learning) that help organizations achieve their desired outcomes.

Creating a notification workflow from sensitive data discovery with Amazon Macie, Amazon EventBridge, AWS Lambda, and Slack

Post Syndicated from Bruno Silviera original https://aws.amazon.com/blogs/security/creating-a-notification-workflow-from-sensitive-data-discover-with-amazon-macie-amazon-eventbridge-aws-lambda-and-slack/

Following the example of the EU in implementing the General Data Protection Regulation (GDPR), many countries are implementing similar data protection laws. In response, many companies are forming teams that are responsible for data protection. Considering the volume of information that companies maintain, it’s essential that these teams are alerted when sensitive data is at risk.

This post shows how to deploy a solution that uses Amazon Macie to discover sensitive data. The solution uses Amazon EventBridge and AWS Lambda to automatically notify your company’s designated data protection team through a Slack channel when Macie discovers sensitive data that needs to be protected.

The challenge

Let’s imagine that you’re part of a team that’s responsible for classifying your organization’s data, but the data structure isn’t documented. Amazon Macie provides you the ability to run a scheduled classification job that examines your data, and you want to notify the data protection team when there’s new sensitive data to classify. Let’s build a solution to automatically notify the data protection team.

Solution overview

To be scalable and cost-effective, this solution uses serverless technologies and managed AWS services, including:

  • Macie – A fully managed data security and data privacy service that uses machine learning and pattern matching to discover and protect your sensitive data in Amazon Web Services (AWS).
  • EventBridge – A serverless event bus that connects application data from your apps, SaaS, and AWS services. EventBridge can respond to specific events or run according to a schedule. The solution presented in this post uses EventBridge to initiate a custom Lambda function in response to a specific event.
  • Lambda – Runs code in response to events such as changes in data, changes in application state, or user actions. In this solution, a Lambda function is initiated by EventBridge.

Solution architecture

The architecture workflow is shown in Figure 1 and includes the following steps:

  1. Macie runs a classification job and publishes its findings to EventBridge as a JSON object.
  2. The EventBridge rule captures the findings and invokes a Lambda function as a target.
  3. The Lambda function parses the JSON object. The function then sends a custom message to a Slack channel with the sensitive data finding for the data protection team to evaluate and respond to (a minimal sketch of this function follows the list).
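A minimal sketch of this Lambda function might look like the following. It assumes the Macie finding event shape delivered by EventBridge and reads the webhook URL from an environment variable; the variable name and message format are illustrative:

    import json
    import os
    import urllib.request

    # Example environment variable holding the Slack incoming webhook URL.
    SLACK_WEBHOOK_URL = os.environ["SLACK_WEBHOOK_URL"]


    def handler(event, context):
        finding = event["detail"]
        bucket = finding["resourcesAffected"]["s3Bucket"]["name"]
        severity = finding["severity"]["description"]
        finding_type = finding["type"]

        message = {
            "text": (f":warning: Macie finding *{finding_type}* "
                     f"(severity {severity}) in bucket *{bucket}*. "
                     "Please review it in the Macie console.")
        }

        # Post the custom message to the configured Slack channel.
        request = urllib.request.Request(
            SLACK_WEBHOOK_URL,
            data=json.dumps(message).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            return response.status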

 

Figure 1: Solution architecture workflow

Set up Slack

For this solution, you need a Slack workspace and an incoming webhook. The workspace must be in place before you create the webhook.

Create a Slack workspace

If you already have a Slack workspace in your environment, you can skip forward to creating the webhook.

If you don’t have a Slack workspace, follow the steps in Create a Slack Workspace to create one.

Create an incoming webhook in Slack API

  1. Go to your Slack API.
  2. Choose Start Building to create an app.
  3. Enter the following details for your app:
    • App Name – macie-to-slack.
    • Development Slack Workspace – Choose the Slack workspace—either an existing workspace or one you created for this solution—to receive the Macie findings.
  4. Choose the Create App button.
  5. In the left menu, choose Incoming Webhooks.
  6. At the Activate Incoming Webhooks screen, move the slider from OFF to ON.
  7. Scroll down and choose Add New Webhook to Workspace.
  8. In the screen asking where your app should post, enter the name of the Slack channel from your workspace that you want to send notifications to, and choose Authorize.
  9. On the next screen, scroll down to the Webhook URL section. Make a note of the URL to use later.

Deploy the CloudFormation template with the solution

The deployment of the CloudFormation template automatically creates the following resources:

  • A Lambda function whose name begins with macie-to-slack-lambdafindingsToSlack-.
  • An EventBridge rule named MacieFindingsToSlack.
  • An IAM role named MacieFindingsToSlackkRole.
  • A permission to invoke the Lambda function named LambdaInvokePermission.

Note: Before you proceed, make sure that you’re deploying the template to the same Region in which your production Macie is running.

To deploy the CloudFormation template

  1. Download the YAML template to your computer.

    Note: To save the template, you can right click the Raw button at the top of the code and then select Save link as if you’re using Chrome, or the equivalent in your browser. This file is used in Step 4.

  2. Open CloudFormation in the AWS Management Console.
  3. On the Welcome page, choose Create stack and then choose With new resources.
  4. On Step 1 — Specify template, choose Upload a template file, select Choose file and then select the file template.yaml (the file extension might be .YML), then choose Next.
  5. On Step 2 — Specify stack details:
    1. Enter macie-to-slack as the Stack name.
    2. At the Slack Incoming Web Hook URL, paste the webhook URL you copied earlier.
    3. At Slack channel, enter the name of the channel in your workspace that will receive the alerts and choose Next.
    Figure 2: Defining stack details

  6. On Step 3 – Configure Stack options, you can leave the default settings, or change them for your environment. Choose Next to continue.
  7. At the bottom of Step 4 – Review, select I acknowledge that AWS CloudFormation might create IAM resources, and choose Create stack.

    Figure 3: Confirmation before stack creation

  8. Wait for the stack to reach status CREATE_COMPLETE.

Running the solution

At this point, you’ve deployed the solution and your resources are created.

To test the solution, you can schedule a Macie job targeting a bucket that contains a file with sensitive information that Macie can detect.

Note: You can check the Amazon Macie documentation to see the list of supported managed data identifiers.
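If you prefer to start the job programmatically rather than from the console, a one-time job can be created with a Boto3 sketch like this (the job name and bucket name are placeholders):

    import boto3

    macie = boto3.client("macie2")
    account_id = boto3.client("sts").get_caller_identity()["Account"]

    # Run a one-time sensitive data discovery job against an example bucket.
    macie.create_classification_job(
        name="slack-notification-test-job",  # example name
        jobType="ONE_TIME",
        s3JobDefinition={
            "bucketDefinitions": [{
                "accountId": account_id,
                "buckets": ["example-bucket-with-test-data"],  # example bucket
            }]
        },
    )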

When the Macie job is complete, any findings are sent to the Slack channel.

Figure 4: Macie finding delivered to Slack channel

Select the link in the message sent to the Slack channel to open that finding in the Macie console, as shown in Figure 5.

Figure 5: Finding details

And you’re done!

Now your Macie finding results are delivered to your Slack channel where they can be easily monitored, reducing response time and risk exposure.

If you deployed this for testing purposes, or want to clean this up and move to your production account, you can delete the CloudFormation stack:

  1. Open the CloudFormation console.
  2. Select the stack and choose Delete.

Conclusion

In this blog post, we walked through the steps to configure a notification workflow using Macie, Lambda, and EventBridge to send sensitive data findings to your data protection team via a Slack channel.

Your data protection team will appreciate the timely notifications of sensitive data findings, giving you the ability to focus on creating controls to improve data security and compliance with regulations related to protection and treatment of personal data.

For more information about data privacy on AWS, see Data Privacy FAQ.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Bruno Silveira

Bruno is a Solutions Architect Manager in the Public Sector team with a focus on educational institutions in Brazil. His previous career was in government, financial services, utilities, and nonprofit institutions. Bruno is an enthusiast of cloud security and an appreciator of good rock’n roll with a good beer.

Author

Julio Carvalho

Julio is a Principal Security Solutions Architect at AWS for the Latin American financial market. As a security specialist, he helps customers solve protection and compliance challenges on their cloud journey.

Deploy an automated ChatOps solution for remediating Amazon Macie findings

Post Syndicated from Nick Cuneo original https://aws.amazon.com/blogs/security/deploy-an-automated-chatops-solution-for-remediating-amazon-macie-findings/

The amount of data being collected, stored, and processed by Amazon Web Services (AWS) customers is growing at an exponential rate. In order to keep pace with this growth, customers are turning to scalable cloud storage services like Amazon Simple Storage Service (Amazon S3) to build data lakes at the petabyte scale. Customers are looking for new, automated, and scalable ways to address their data security and compliance requirements, including the need to identify and protect their sensitive data. Amazon Macie helps customers address this need by offering a managed data security and data privacy service that uses machine learning and pattern matching to discover and protect your sensitive data that is stored in Amazon S3.

In this blog post, I show you how to deploy a solution that establishes an automated event-driven workflow for notification and remediation of sensitive data findings from Macie. Administrators can review and approve remediation of findings through a ChatOps-style integration with Slack. Slack is a business communication tool that provides messaging functionality, including persistent chat rooms known as channels. With this solution, you can streamline the notification, investigation, and remediation of sensitive data findings in your AWS environment.

Prerequisites

Before you deploy the solution, make sure that your environment is set up with the following prerequisites:

Important: This solution uses various AWS services, and there are costs associated with these resources after the Free Tier usage. See the AWS pricing page for details.

Solution overview

The solution architecture and workflow are detailed in Figure 1.

Figure 1: Solution overview

Figure 1: Solution overview

This solution allows for the configuration of auto-remediation behavior based on finding type and finding severity. For each finding type, you can define whether you want the offending S3 object to be automatically quarantined, or whether you want the finding details to be reviewed and approved by a human in Slack prior to being quarantined. In a similar manner, you can define the minimum severity level (Low, Medium, High) that a finding must have before the solution will take action. By adjusting these parameters, you can manage false positives and tune the volume and type of findings about which you want to be notified and take action. This configurability is important because customers have different security, risk, and regulatory requirements.

Figure 1 details the services used in the solution and the integration points between them. Let’s walk through the full sequence from the detection of sensitive data to the remediation (quarantine) of the offending object.

  1. Macie is configured with sensitive data discovery jobs (scheduled or one-time), which you create and run to detect sensitive data within S3 buckets. When Macie runs a job, it uses a combination of criteria and techniques to analyze objects in S3 buckets that you specify. For a full list of the categories of sensitive data Macie can detect, see the Amazon Macie User Guide.
  2. For each sensitive data finding, an event is sent to Amazon EventBridge that contains the finding details. An EventBridge rule triggers a Lambda function for processing.
  3. The Finding Handler Lambda function parses the event and examines the type of the finding. Based on the auto-remediation configuration, the function either invokes the Finding Remediator function for immediate remediation, or sends the finding details for manual review and remediation approval through Slack.
  4. Delegated security and compliance administrators monitor the configured Slack channel for notifications. Notifications provide high-level finding information, remediation status, and a link to the Macie console for the finding in question. For findings configured for manual review, administrators can choose to approve the remediation in Slack by using an action button on the notification.
  5. After an administrator chooses the Remediate button, Slack issues an API call to an Amazon API Gateway endpoint, supplying both the unique identifier of the finding to be remediated and that of the Slack user. API Gateway proxies the request to a Remediation Handler Lambda function.
  6. The Remediation Handler Lambda function validates the request and request signature, extracts the offending object’s location from the finding, and makes an asynchronous call to the Finding Remediator Lambda function (see the sketch after this list).
  7. The Finding Remediator Lambda function moves the offending object from the source bucket to a designated S3 quarantine bucket with restricted access.
  8. Finally, the Finding Remediator Lambda function uses a callback URL to update the original finding notification in Slack, indicating that the offending object has now been quarantined.
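As an illustration of steps 6 and 7, the following sketch shows Slack’s documented signature check (an HMAC-SHA256 digest of v0:timestamp:body computed with the signing secret) and a simple quarantine move. The environment variable names and helper functions are placeholders, not the names used in the solution’s source code:

    import hashlib
    import hmac
    import os

    import boto3

    s3 = boto3.client("s3")

    SLACK_SIGNING_SECRET = os.environ["SLACK_SIGNING_SECRET"]  # example variable
    QUARANTINE_BUCKET = os.environ["QUARANTINE_BUCKET"]        # example variable


    def is_valid_slack_request(timestamp, body, slack_signature):
        """Recompute the Slack signature and compare it to the request header."""
        base_string = f"v0:{timestamp}:{body}".encode("utf-8")
        expected = "v0=" + hmac.new(
            SLACK_SIGNING_SECRET.encode("utf-8"), base_string, hashlib.sha256
        ).hexdigest()
        return hmac.compare_digest(expected, slack_signature)


    def quarantine_object(source_bucket, key):
        """Move the offending object into the restricted quarantine bucket."""
        s3.copy_object(
            Bucket=QUARANTINE_BUCKET,
            Key=key,
            CopySource={"Bucket": source_bucket, "Key": key},
        )
        s3.delete_object(Bucket=source_bucket, Key=key)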

Deploy the solution

Now we’ll walk through the steps for configuring Slack and deploying the solution into your AWS environment by using the AWS CDK. The AWS CDK is a software development framework that you can use to define cloud infrastructure in code and provision through AWS CloudFormation.

The deployment steps can be summarized as follows:

  1. Configure a Slack channel and app
  2. Check the project out from GitHub
  3. Set the configuration parameters
  4. Build and deploy the solution
  5. Configure Slack with an API Gateway endpoint

To configure a Slack channel and app

  1. In your browser, make sure you’re logged into the Slack workspace where you want to integrate the solution.
  2. Create a new channel where you will send the notifications, as follows:
    1. Choose the + icon next to the Channels menu, and select Create a channel.
    2. Give your channel a name, for example macie-findings, and make sure you turn on the Make private setting.

      Important: By providing Slack users with access to this configured channel, you’re providing implicit access to review Macie finding details and approve remediations. To avoid unwanted user access, it’s strongly recommended that you make this channel private and by invite only.

  3. On your Apps page, create a new app by selecting Create New App, and then enter the following information:
    1. For App Name, enter a name of your choosing, for example MacieRemediator.
    2. Select your chosen development Slack workspace that you logged into in step 1.
    3. Choose Create App.
    Figure 2: Create a Slack app

  4. You will then see the Basic Information page for your app. Scroll down to the App Credentials section, and note down the Signing Secret. This secret will be used by the Lambda function that handles all remediation requests from Slack. The function uses the secret with Hash-based Message Authentication Code (HMAC) authentication to validate that requests to the solution are legitimate and originated from your trusted Slack channel.

    Figure 3: Signing secret

  5. Scroll back to the top of the Basic Information page, and under Add features and functionality, select the Incoming Webhooks tile. Turn on the Activate Incoming Webhooks setting.
  6. At the bottom of the page, choose Add New Webhook to Workspace.
    1. Select the macie-findings channel you created in step 2, and choose Allow.
    2. You should now see webhook URL details under Webhook URLs for Your Workspace. Use the Copy button to note down the URL, which you will need later.

      Figure 4: Webhook URL

To check the project out from GitHub

The solution source is available on GitHub in AWS Samples. Clone the project to your local machine or download and extract the available zip file.

To set the configuration parameters

In the root directory of the project you’ve just cloned, there’s a file named cdk.json. This file contains configuration parameters to allow integration with the macie-findings channel you created earlier, and also to allow you to control the auto-remediation behavior of the solution. Open this file and make sure that you review and update the following parameters:

  • autoRemediateConfig – This nested attribute allows you to specify for each sensitive data finding type whether you want to automatically remediate and quarantine the offending object, or first send the finding to Slack for human review and authorization. Note that you will still be notified through Slack that auto-remediation has taken place if this attribute is set to AUTO. Valid values are either AUTO or REVIEW. You can use the default values.
  • minSeverityLevel – Macie assigns all findings a Severity level. With this parameter, you can define a minimum severity level that must be met before the solution will trigger action. For example, if the parameter is set to MEDIUM, the solution won’t take any action or send any notifications when a finding has a LOW severity, but will take action when a finding is classified as MEDIUM or HIGH. Valid values are: LOW, MEDIUM, and HIGH. The default value is set to LOW.
  • slackChannel – The name of the Slack channel you created earlier (macie-findings).
  • slackWebHookUrl – For this parameter, enter the webhook URL that you noted down during Slack app setup in the “Configure a Slack channel and app” step.
  • slackSigningSecret – For this parameter, enter the signing secret that you noted down during Slack app setup.

Save your changes to the configuration file.

To build and deploy the solution

  1. From the command line, make sure that your current working directory is the root directory of the project that you cloned earlier. Run the following commands:
    • npm install – Installs all Node.js dependencies.
    • npm run build – Compiles the CDK TypeScript source.
    • cdk bootstrap – Initializes the CDK environment in your AWS account and Region, as shown in Figure 5.

      Figure 5: CDK bootstrap output

    • cdk deploy – Generates a CloudFormation template and deploys the solution resources.

    The resources created can be reviewed in the CloudFormation console and can be summarized as follows:

    • Lambda functions – Finding Handler, Remediation Handler, and Remediator
    • IAM execution roles and associated policy – The roles and policy associated with each Lambda function and the API Gateway
    • S3 bucket – The quarantine S3 bucket
    • EventBridge rule – The rule that triggers the Lambda function for Macie sensitive data findings
    • API Gateway – A single remediation API with proxy integration to the Lambda handler
  2. After you run the deploy command, you’ll be prompted to review the IAM resources deployed as part of the solution. Press y to continue.
  3. Once the deployment is complete, you’ll be presented with an output parameter, shown in Figure 6, which is the endpoint for the API Gateway that was deployed as part of the solution. Copy this URL.

    Figure 6: CDK deploy output

To configure Slack with the API Gateway endpoint

  1. Open Slack and return to the Basic Information page for the Slack app you created earlier.
  2. Under Add features and functionality, select the Interactive Components tile.
  3. Turn on the Interactivity setting.
  4. In the Request URL box, enter the API Gateway endpoint URL you copied earlier.
  5. Choose Save Changes.

    Figure 7: Slack app interactivity

Now that you have the solution components deployed and Slack configured, it’s time to test things out.

Test the solution

The testing steps can be summarized as follows:

  1. Upload dummy files to S3
  2. Run the Macie sensitive data discovery job
  3. Review and act upon Slack notifications
  4. Confirm that S3 objects are quarantined

To upload dummy files to S3

Two sample text files containing dummy financial and personal data are available in the project you cloned from GitHub. If you haven’t changed the default auto-remediation configurations, these two files will exercise both the auto-remediation and manual remediation review flows.

Find the files under sensitive-data-samples/dummy-financial-data.txt and sensitive-data-samples/dummy-personal-data.txt. Take these two files and upload them to S3 by using either the console, as shown in Figure 8, or AWS CLI. You can choose to use any new or existing bucket, but make sure that the bucket is in the same AWS account and Region that was used to deploy the solution.

Figure 8: Dummy files uploaded to S3

To run a Macie sensitive data discovery job

  1. Navigate to the Amazon Macie console, and make sure that your selected Region is the same as the one that was used to deploy the solution.
    1. If this is your first time using Macie, choose the Get Started button, and then choose Enable Macie.
  2. On the Macie Summary dashboard, you will see a Create Job button at the top right. Choose this button to launch the Job creation wizard. Configure each step as follows:
    1. Select S3 buckets: Select the bucket where you uploaded the dummy sensitive data file. Choose Next.
    2. Review S3 buckets: No changes are required, choose Next.
    3. Scope: For Job type, choose One-time job. Make sure Sampling depth is set to 100%. Choose Next.
    4. Custom data identifiers: No changes are required, choose Next.
    5. Name and description: For Job name, enter any name you like, such as Dummy job, and then choose Next.
    6. Review and create: Review your settings; they should look like the following sample. Choose Submit.
Figure 9: Configure the Macie sensitive data discovery job

Macie will launch the sensitive data discovery job. You can track its status from the Jobs page within the Macie console.

To review and take action on Slack notifications

Within five minutes of submitting the data discovery job, you should expect to see two notifications appear in your configured Slack channel. One notification, similar to the one in Figure 10, is informational only and is related to an auto-remediation action that has taken place.

Figure 10: Slack notification of auto-remediation for the file containing dummy financial data

The other notification, similar to the one in Figure 11, requires end user action and is for a finding that requires administrator review. All notifications will display key information such as the offending S3 object, a description of the finding, the finding severity, and other relevant metadata.

Figure 11: Slack notification for human review of the file containing dummy personal data

(Optional) You can review the finding details by choosing the View Macie Finding in Console link in the notification.

In the Slack notification, choose the Remediate button to quarantine the object. The notification will be updated with confirmation of the quarantine action, as shown in Figure 12.

Figure 12: Slack notification of authorized remediation

To confirm that S3 objects are quarantined

Finally, navigate to the S3 console and validate that the objects have been removed from their original bucket and placed into the quarantine bucket listed in the notification details, as shown in Figure 13. Note that you may need to refresh your S3 object listing in the browser.

Figure 13: Objects moved to the quarantine bucket

Congratulations! You now have a fully operational solution to detect and respond to Macie sensitive data findings through a Slack ChatOps workflow.

Solution cleanup

To remove the solution and avoid incurring additional charges from the AWS resources that you deployed, complete the following steps.

To remove the solution and associated resources

  1. Navigate to the Macie console. Under Settings, choose Suspend Macie.
  2. Navigate to the S3 console and delete all objects in the quarantine bucket.
  3. Run the command cdk destroy from the command line within the root directory of the project. You will be prompted to confirm that you want to remove the solution. Press y.

Summary

In this blog post, I showed you how to integrate Amazon Macie sensitive data findings with an auto-remediation and Slack ChatOps workflow. We reviewed the AWS services used, how they are integrated, and the steps to configure, deploy, and test the solution. With Macie and the solution in this blog post, you can substantially reduce the heavy lifting associated with detecting and responding to sensitive data in your AWS environment.

I encourage you to take this solution and customize it to your needs. Further enhancements could include supporting policy findings, adding additional remediation actions, or integrating with additional findings from AWS Security Hub.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the Amazon Macie forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Nick Cuneo

Nick is an Enterprise Solutions Architect at AWS who works closely with Australia’s largest financial services organisations. His previous roles span operations, software engineering, and design. Nick is passionate about application and network security, automation, microservices, and event-driven architectures. Outside of work, he enjoys motorsport and can be found most weekends in his garage wrenching on cars.