Tag Archives: Amazon Macie

Building AWS Lambda governance and guardrails

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/building-aws-lambda-governance-and-guardrails/

When building serverless applications using AWS Lambda, there are a number of considerations regarding security, governance, and compliance. This post highlights how Lambda, as a serverless service, simplifies cloud security and compliance so you can concentrate on your business logic. It covers controls that you can implement for your Lambda workloads to ensure that your applications conform to your organizational requirements.

The Shared Responsibility Model

The AWS Shared Responsibility Model distinguishes between what AWS is responsible for and what customers are responsible for with cloud workloads. AWS is responsible for “Security of the Cloud” where AWS protects the infrastructure that runs all the services offered in the AWS Cloud. Customers are responsible for “Security in the Cloud”, managing and securing their workloads. When building traditional applications, you take on responsibility for many infrastructure services, including operating systems and network configuration.

Traditional application shared responsibility

One major benefit when building serverless applications is shifting more responsibility to AWS so you can concentrate on your business applications. AWS handles managing and patching the underlying servers, operating systems, and networking as part of running the services.

Serverless application shared responsibility

For Lambda, AWS manages the application platform where your code runs, which includes patching and updating the managed language runtimes. This reduces the attack surface while making cloud security simpler. You are responsible for the security of your code and AWS Identity and Access Management (IAM) to the Lambda service and within your function.

Lambda is SOC, HIPAA, PCI, and ISO compliant. For more information, see Compliance validation for AWS Lambda and the latest Lambda certification and compliance readiness services in scope.

Lambda isolation

Lambda functions run in separate, isolated AWS accounts that are dedicated to the Lambda service. Lambda invokes your code in a secure and isolated runtime environment within the Lambda service account. A runtime environment is a collection of resources running in a dedicated hardware-virtualized Micro Virtual Machine (MVM) on a Lambda worker node.

Lambda workers are bare metal EC2 Nitro instances, which are managed and patched by the Lambda service team. They have a maximum lease lifetime of 14 hours to keep the underlying infrastructure secure and fresh. MVMs are created by Firecracker, an open source virtual machine monitor (VMM) that uses Linux’s Kernel-based Virtual Machine (KVM) to create and manage MVMs securely at scale.

MVMs maintain a strong separation between runtime environments at the virtual machine hardware level, which increases security. Runtime environments are never reused across functions, function versions, or AWS accounts.

Isolation model for AWS Lambda workers

Network security

Lambda functions always run inside secure Amazon Virtual Private Clouds (VPCs) owned by the Lambda service. This gives the Lambda function access to AWS services and the public internet. There is no direct inbound network access to Lambda workers, runtime environments, or Lambda functions. All inbound access to a Lambda function comes via the Lambda Invoke API, which sends the event object to the function handler.

You can configure a Lambda function to connect to private subnets in a VPC in your account if necessary, which you can control with IAM condition keys. The Lambda function still runs inside the Lambda service VPC but sends all network traffic through your VPC. Function outbound traffic comes from your own network address space.
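
As a rough illustration of controlling VPC settings with condition keys, the following Python (boto3) sketch attaches a policy that denies creating or reconfiguring functions unless they connect to an approved VPC. The lambda:VpcIds condition key is documented for Lambda, but the role name, policy name, and VPC ID here are placeholders, and you should confirm the exact policy shape against the Lambda documentation.

    import json
    import boto3

    iam = boto3.client("iam")

    # Deny creating or reconfiguring functions unless they attach to an approved VPC.
    # The VPC ID, role name, and policy name are placeholders for this sketch.
    enforce_vpc_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "EnforceApprovedVpc",
            "Effect": "Deny",
            "Action": ["lambda:CreateFunction", "lambda:UpdateFunctionConfiguration"],
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {"lambda:VpcIds": ["vpc-0a1b2c3d"]}
            },
        }],
    }

    iam.put_role_policy(
        RoleName="builder-role",              # role your builders use to deploy functions
        PolicyName="require-lambda-vpc",
        PolicyDocument=json.dumps(enforce_vpc_policy),
    )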

AWS Lambda service VPC with VPC-to-VPC NAT to customer VPC

To give your VPC-connected function access to the internet, route outbound traffic to a NAT gateway in a public subnet. Connecting a function to a public subnet doesn’t give it internet access or a public IP address, as the function is still running in the Lambda service VPC and then routing network traffic into your VPC.

All internal AWS traffic uses the AWS Global Backbone rather than traversing the internet. You do not need to connect your functions to a VPC to avoid connectivity to AWS services over the internet. VPC-connected functions allow you to control and audit outbound network access.

You can use security groups to control outbound traffic for VPC-connected functions and network ACLs to block access to CIDR IP ranges or ports. VPC endpoints allow you to enable private communications with supported AWS services without internet access.

You can use VPC Flow Logs to audit traffic going to and from network interfaces in your VPC.

Runtime environment re-use

Each runtime environment processes a single request at a time. After Lambda finishes processing the request, the runtime environment is ready to process an additional request for the same function version. For more information on how Lambda manages runtime environments, see Understanding AWS Lambda scaling and throughput.

Data can persist in the local temporary filesystem path, in globally scoped variables, and in environment variables across subsequent invocations of the same function version. Ensure that you only handle sensitive information within individual invocations of the function by processing it in the function handler, or using local variables. Do not re-use files in the local temporary filesystem to process unencrypted sensitive data. Do not put sensitive or confidential information into Lambda environment variables, tags, or other freeform fields such as Name fields.
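
A minimal handler sketch of this guidance, assuming a hypothetical payments function: the sensitive field lives only in a local variable inside the handler, is never written to /tmp, globals, or environment variables, and only an encrypted form leaves the invocation. The key alias and field name are assumptions for this example.

    import boto3

    kms = boto3.client("kms")      # clients are safe to reuse across invocations

    def handler(event, context):
        # Sensitive value stays in a local variable scoped to this invocation.
        card_number = event["card_number"]
        ciphertext = kms.encrypt(
            KeyId="alias/payments",                 # placeholder key alias
            Plaintext=card_number.encode(),
        )["CiphertextBlob"]
        # Return or store only the encrypted form; the plaintext goes out of scope here.
        return {"token": ciphertext.hex()}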

For more Lambda security information, see the Lambda security whitepaper.

Multiple accounts

AWS recommends using multiple accounts to isolate your resources because they provide natural boundaries for security, access, and billing. Use AWS Organizations to manage and govern individual member accounts centrally. You can use AWS Control Tower to automate many of the account build steps and apply managed guardrails to govern your environment. These include preventative guardrails to limit actions and detective guardrails to detect and alert on non-compliance resources for remediation.

Lambda access controls

Lambda permissions define what a Lambda function can do, and who or what can invoke the function. Consider the following areas when applying access controls to your Lambda functions to ensure least privilege:

Execution role

Lambda functions have permission to access other AWS resources using execution roles. An execution role is an IAM role that the Lambda service assumes; it grants permissions through the identity-based policy statements attached to the role. The Lambda service uses this role to fetch and cache temporary security credentials, which are then available as environment variables during a function’s invocation. Lambda may re-use these credentials across different runtime environments that use the same execution role.

Ensure that each function has its own unique role with the minimum set of permissions.
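
A minimal boto3 sketch of a per-function execution role scoped to a single DynamoDB table; the role name, table ARN, account ID, and Region are placeholders, not values from the post.

    import json
    import boto3

    iam = boto3.client("iam")

    # Trust policy allowing the Lambda service to assume the role.
    trust = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }

    iam.create_role(
        RoleName="orders-fn-role",              # one role per function
        AssumeRolePolicyDocument=json.dumps(trust),
    )

    # Scope permissions to the single table this function needs.
    iam.put_role_policy(
        RoleName="orders-fn-role",
        PolicyName="orders-table-read",
        PolicyDocument=json.dumps({
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Action": ["dynamodb:GetItem", "dynamodb:Query"],
                "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/orders",
            }],
        }),
    )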

Identity/user policies

IAM identity policies are attached to IAM users, groups, or roles. These policies allow users or callers to perform operations on Lambda functions. You can restrict who can create functions, or control what functions particular users can manage.

Resource policies

Resource policies define what identities have fine-grained inbound access to managed services. For example, you can restrict which Lambda function versions can add events to a specific Amazon EventBridge event bus. You can use resource-based policies on Lambda resources to control what AWS IAM identities and event sources can invoke a specific version or alias of your function. You also use a resource-based policy to allow an AWS service to invoke your function on your behalf. To see which services support resource-based policies, see “AWS services that work with IAM”.
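
For example, a hedged boto3 sketch that grants a single EventBridge rule permission to invoke only the prod alias of a function; the function name, alias, statement ID, and ARNs are placeholders.

    import boto3

    lambda_client = boto3.client("lambda")

    # Allow only a specific EventBridge rule to invoke the "prod" alias of the function.
    lambda_client.add_permission(
        FunctionName="order-processor",
        Qualifier="prod",                       # alias (or a version number)
        StatementId="allow-orders-rule",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn="arn:aws:events:us-east-1:123456789012:rule/orders-rule",
    )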

Attribute-based access control (ABAC)

With attribute-based access control (ABAC), you can use tags to control access to your Lambda functions. With ABAC, you can scale an access control strategy by setting granular permissions with tags without requiring permissions updates for every new user or resource as your organization scales. You can also use tag policies with AWS Organizations to standardize tags across resources.
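
A sketch of an ABAC identity policy, assuming your functions and principals are tagged with a hypothetical project tag; confirm the tag-based condition keys that Lambda supports for the specific actions you need before relying on this shape.

    import json
    import boto3

    iam = boto3.client("iam")

    # Allow invoking only functions tagged project=payments, and only when the
    # calling principal carries the same project tag.
    abac_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "lambda:InvokeFunction",
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "aws:ResourceTag/project": "payments",
                    "aws:PrincipalTag/project": "payments",
                }
            },
        }],
    }

    iam.create_policy(
        PolicyName="lambda-abac-payments",
        PolicyDocument=json.dumps(abac_policy),
    )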

Permissions boundaries

Permissions boundaries are a way to delegate permission management safely. The boundary places a limit on the maximum permissions that a policy can grant. For example, you can use a permissions boundary to limit the scope of the execution role to allow only read access to databases. A builder with permission to manage a function, or with write access to the application’s code repository, cannot escalate the permissions beyond the boundary to allow write access.
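
A short boto3 sketch showing where the boundary attaches, assuming a pre-created boundary managed policy; the role name and boundary ARN are placeholders.

    import json
    import boto3

    iam = boto3.client("iam")

    trust = {"Version": "2012-10-17", "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole"}]}

    # The boundary policy (created separately) caps what any policy attached to
    # this execution role can ever grant, for example read-only database access.
    iam.create_role(
        RoleName="reports-fn-role",
        AssumeRolePolicyDocument=json.dumps(trust),
        PermissionsBoundary="arn:aws:iam::123456789012:policy/lambda-read-only-boundary",
    )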

Service control policies

When using AWS Organizations, you can use Service control policies (SCPs) to manage permissions in your organization. These provide guardrails for what actions IAM users and roles within the organization root or OUs can do. For more information, see the AWS Organizations documentation, which includes example service control policies.
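
As one illustrative guardrail, the sketch below creates an SCP that denies Lambda management actions outside approved Regions; the Region list and policy name are assumptions for this example, not from the AWS documentation’s samples.

    import json
    import boto3

    orgs = boto3.client("organizations")

    scp = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyLambdaOutsideApprovedRegions",
            "Effect": "Deny",
            "Action": "lambda:*",
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {"aws:RequestedRegion": ["us-east-1", "eu-west-1"]}
            },
        }],
    }

    # Create the policy; attach it to an OU or account with attach_policy afterwards.
    orgs.create_policy(
        Name="lambda-region-guardrail",
        Description="Deny Lambda management outside approved Regions",
        Type="SERVICE_CONTROL_POLICY",
        Content=json.dumps(scp),
    )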

Code signing

As you are responsible for the code that runs in your Lambda functions, you can ensure that only trusted code runs by using code signing with the AWS Signer service. AWS Signer digitally signs your code packages and Lambda validates the code package before accepting the deployment, which can be part of your automated software deployment process.
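
A minimal boto3 sketch of enforcing code signing on a function, assuming an existing AWS Signer signing profile; the profile ARN and function name are placeholders.

    import boto3

    lambda_client = boto3.client("lambda")

    # Create a code signing configuration that trusts one signing profile and
    # blocks deployments whose signatures cannot be verified.
    csc = lambda_client.create_code_signing_config(
        Description="Only accept packages signed by the release profile",
        AllowedPublishers={
            "SigningProfileVersionArns": [
                "arn:aws:signer:us-east-1:123456789012:/signing-profiles/release/abcd1234"
            ]
        },
        CodeSigningPolicies={"UntrustedArtifactOnDeployment": "Enforce"},
    )

    # Attach the configuration to the function.
    lambda_client.put_function_code_signing_config(
        FunctionName="order-processor",
        CodeSigningConfigArn=csc["CodeSigningConfig"]["CodeSigningConfigArn"],
    )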

Auditing Lambda configuration, permissions and access

You should audit access and permissions regularly to ensure that your workloads are secure. Use the IAM console to view when an IAM role was last used.

IAM last used

IAM access advisor

Use IAM access advisor on the Access Advisor tab in the IAM console to review when an AWS service was last used by a specific IAM user or role. You can use this information to remove unneeded IAM policies and access from your IAM roles.

IAM access advisor

AWS CloudTrail

AWS CloudTrail helps you monitor, log, and retain account activity to provide a complete event history of actions across your AWS infrastructure. You can monitor Lambda API actions to ensure that only appropriate actions are made against your Lambda functions. These include CreateFunction, DeleteFunction, CreateEventSourceMapping, AddPermission, UpdateEventSourceMapping, UpdateFunctionConfiguration, and UpdateFunctionCode.
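
One hedged way to act on these events is an EventBridge rule over CloudTrail management events. The sketch below matches Lambda API calls by name prefix (CloudTrail records Lambda event names with API-version suffixes) and forwards them to a placeholder SNS topic; the rule name and topic ARN are assumptions.

    import json
    import boto3

    events = boto3.client("events")

    pattern = {
        "source": ["aws.lambda"],
        "detail-type": ["AWS API Call via CloudTrail"],
        # Match by prefix because CloudTrail event names carry version suffixes,
        # e.g. CreateFunction20150331.
        "detail": {
            "eventName": [
                {"prefix": "CreateFunction"},
                {"prefix": "DeleteFunction"},
                {"prefix": "AddPermission"},
                {"prefix": "UpdateFunctionConfiguration"},
            ]
        },
    }

    events.put_rule(Name="lambda-config-changes", EventPattern=json.dumps(pattern))
    events.put_targets(
        Rule="lambda-config-changes",
        Targets=[{"Id": "alerts",
                  "Arn": "arn:aws:sns:us-east-1:123456789012:security-alerts"}],
    )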

AWS CloudTrail

IAM Access Analyzer

You can validate policies using IAM Access Analyzer, which provides over 100 policy checks with security warnings for overly permissive policies. To learn more about policy checks provided by IAM Access Analyzer, see “IAM Access Analyzer policy validation”.

You can also generate IAM policies based on access activity from CloudTrail logs, which contain the permissions that the role used in your specified date range.
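
A small boto3 sketch of the ValidatePolicy API applied to an intentionally over-permissive policy; the fields printed are from the Access Analyzer response.

    import json
    import boto3

    analyzer = boto3.client("accessanalyzer")

    # Deliberately broad policy to trigger security warnings.
    candidate_policy = {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}],
    }

    response = analyzer.validate_policy(
        policyDocument=json.dumps(candidate_policy),
        policyType="IDENTITY_POLICY",
    )

    for finding in response["findings"]:
        print(finding["findingType"], finding["issueCode"], finding["findingDetails"])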

IAM Access Analyzer

AWS Config

AWS Config provides you with a record of the configuration history of your AWS resources. AWS Config monitors the resource configuration and includes rules that alert you when resources fall into a non-compliant state.

For Lambda, you can track and alert on changes to your function configuration, along with the IAM execution role. This allows you to gather Lambda function lifecycle data for potential audit and compliance requirements. For more information, see the Lambda Operators Guide.

AWS Config includes Lambda managed config rules such as lambda-concurrency-check, lambda-dlq-check, lambda-function-public-access-prohibited, lambda-function-settings-check, and lambda-inside-vpc. You can also write your own rules.
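
For example, a boto3 sketch that enables one of the managed rules listed above; the rule name is arbitrary, and you can repeat the call with the other managed rule identifiers.

    import boto3

    config = boto3.client("config")

    # Flag Lambda functions that are not attached to a VPC.
    config.put_config_rule(
        ConfigRule={
            "ConfigRuleName": "lambda-inside-vpc",
            "Source": {"Owner": "AWS", "SourceIdentifier": "LAMBDA_INSIDE_VPC"},
            "Scope": {"ComplianceResourceTypes": ["AWS::Lambda::Function"]},
        }
    )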

There are a number of other AWS services to help with security compliance.

  1. AWS Audit Manager: Collect evidence to help you audit your use of cloud services.
  2. Amazon GuardDuty: Detect unexpected and potentially unauthorized activity in your AWS environment.
  3. Amazon Macie: Evaluate your content to identify business-critical or potentially confidential data.
  4. AWS Trusted Advisor: Identify opportunities to improve stability, save money, or help close security gaps.
  5. AWS Security Hub: Provide security checks and recommendations across your organization.

Conclusion

Lambda makes cloud security simpler by taking on more responsibility using the AWS Shared Responsibility Model. Lambda implements strict workload security at scale to isolate your code and prevent network intrusion to your functions. This post provides guidance on assessing and implementing best practices and tools for Lambda to improve your security, governance, and compliance controls. These include permissions, access controls, multiple accounts, and code security. Learn how to audit your function permissions, configuration, and access to ensure that your applications conform to your organizational requirements.

For more serverless learning resources, visit Serverless Land.

AWS Week in Review – August 1, 2022

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-week-in-review-august-1-2022/

AWS re:Inforce returned to Boston last week, kicking off with a keynote from Amazon Chief Security Officer Steve Schmidt and AWS Chief Information Security Officer C.J. Moses:

Be sure to take some time to watch this video and the other leadership sessions, and to use what you learn to take some proactive steps to improve your security posture.

Last Week’s Launches
Here are some launches that caught my eye last week:

AWS Wickr uses 256-bit end-to-end encryption to deliver secure messaging, voice, and video calling, including file sharing and screen sharing, across desktop and mobile devices. Each call, message, and file is encrypted with a new random key and can be decrypted only by the intended recipient. AWS Wickr supports logging to a secure, customer-controlled data store for compliance and auditing, and offers full administrative control over data: permissions, ephemeral messaging options, and security groups. You can now sign up for the preview.

AWS Marketplace Vendor Insights helps AWS Marketplace sellers to make security and compliance data available through AWS Marketplace in the form of a unified, web-based dashboard. Designed to support governance, risk, and compliance teams, the dashboard also provides evidence that is backed by AWS Config and AWS Audit Manager assessments, external audit reports, and self-assessments from software vendors. To learn more, read the What’s New post.

GuardDuty Malware Protection protects Amazon Elastic Block Store (EBS) volumes from malware. As Danilo describes in his blog post, a malware scan is initiated when Amazon GuardDuty detects that a workload running on an EC2 instance or in a container appears to be doing something suspicious. The new malware protection feature creates snapshots of the attached EBS volumes, restores them within a service account, and performs an in-depth scan for malware. The scanner supports many types of file systems and file formats and generates actionable security findings when malware is detected.

Amazon Neptune Global Database lets you build graph applications that run across multiple AWS Regions using a single graph database. You can deploy a primary Neptune cluster in one region and replicate its data to up to five secondary read-only database clusters, with up to 16 read replicas each. Clusters can recover in minutes in the event of an (unlikely) regional outage, with a Recovery Point Objective (RPO) of 1 second and a Recovery Time Objective (RTO) of 1 minute. To learn a lot more and see this new feature in action, read Introducing Amazon Neptune Global Database.

Amazon Detective now Supports Kubernetes Workloads, with the ability to scale to thousands of container deployments and millions of configuration changes per second. It ingests EKS audit logs to capture API activity from users, applications, and the EKS control plane, and correlates user activity with information gleaned from Amazon VPC flow logs. As Channy notes in his blog post, you can enable Amazon Detective and take advantage of a free 30 day trial of the EKS capabilities.

AWS SSO is Now AWS IAM Identity Center in order to better represent the full set of workforce and account management capabilities that are part of IAM. You can create user identities directly in IAM Identity Center, or you can connect your existing Active Directory or standards-based identity provider. To learn more, read this post from the AWS Security Blog.

AWS Config Conformance Packs now provide you with percentage-based scores that will help you track resource compliance within the scope of the resources addressed by the pack. Scores are computed based on the product of the number of resources and the number of rules, and are reported to Amazon CloudWatch so that you can track compliance trends over time. To learn more about how scores are computed, read the What’s New post.

Amazon Macie now lets you perform one-click temporary retrieval of sensitive data that Macie has discovered in an S3 bucket. You can retrieve up to ten examples at a time, and use these findings to accelerate your security investigations. All of the data that is retrieved and displayed in the Macie console is encrypted using customer-managed AWS Key Management Service (AWS KMS) keys. To learn more, read the What’s New post.

AWS Control Tower was updated multiple times last week. CloudTrail Organization Logging creates an org-wide trail in your management account to automatically log the actions of all member accounts in your organization. Control Tower now reduces redundant AWS Config items by limiting recording of global resources to home regions. To take advantage of this change you need to update to the latest landing zone version and then re-register each Organizational Unit, as detailed in the What’s New post. Lastly, Control Tower’s region deny guardrail now includes AWS API endpoints for AWS Chatbot, Amazon S3 Storage Lens, and Amazon S3 Multi Region Access Points. This allows you to limit access to AWS services and operations for accounts enrolled in your AWS Control Tower environment.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
Here are some other news items and customer stories that you may find interesting:

AWS Open Source News and Updates – My colleague Ricardo Sueiras writes a weekly open source newsletter and highlights new open source projects, tools, and demos from the AWS community. Read installment #122 here.

Growy Case Study – This Netherlands-based company is building fully-automated robot-based vertical farms that grow plants to order. Read the case study to learn how they use AWS IoT and other services to monitor and control light, temperature, CO2, and humidity to maximize yield and quality.

Journey of a Snap on Snapchat – This video shows you how a snap flows end-to-end from your camera to AWS, to your friends. With over 300 million daily active users, Snap takes advantage of Amazon Elastic Kubernetes Service (EKS), Amazon DynamoDB, Amazon Simple Storage Service (Amazon S3), Amazon CloudFront, and many other AWS services, storing over 400 terabytes of data in DynamoDB and managing over 900 EKS clusters.

Cutting Cardboard Waste – Bin packing is almost certainly a part of every computer science curriculum! In the linked article from the Amazon Science site, you can learn how an Amazon Principal Research Scientist developed PackOpt to figure out the optimal set of boxes to use for shipments from Amazon’s global network of fulfillment centers. This is an NP-hard problem and the article describes how they built a parallelized solution that explores a multitude of alternative solutions, all running on AWS.

Upcoming Events
Check your calendar and sign up for these online and in-person AWS events:

AWS Global Summits – AWS Global Summits are free events that bring the cloud computing community together to connect, collaborate, and learn about AWS. Registrations are open for the following AWS Summits in August:

IMAGINE 2022 – The IMAGINE 2022 conference will take place on August 3 at the Seattle Convention Center, Washington, USA. It’s a no-cost event that brings together education, state, and local leaders to learn about the latest innovations and best practices in the cloud. You can register here.

That’s all for this week. Check back next Monday for another Week in Review!

Jeff;

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Correlate IAM Access Analyzer findings with Amazon Macie

Post Syndicated from Nihar Das original https://aws.amazon.com/blogs/security/correlate-iam-access-analyzer-findings-with-amazon-macie/

In this blog post, you’ll learn how to detect when unintended access has been granted to sensitive data in Amazon Simple Storage Service (Amazon S3) buckets in your Amazon Web Services (AWS) accounts.

It’s critical for your enterprise to understand where sensitive data is stored in your organization and how and why it is shared. The ability to efficiently find data that is shared with entities outside your account and the contents of that data is paramount. You need a process to quickly detect and report which accounts have access to sensitive data. Amazon Macie is an AWS service that can detect many sensitive data types. Macie is a fully managed data security and data privacy service that uses machine learning and pattern matching to discover and help protect your sensitive data in AWS.

AWS Identity and Access Management (IAM) Access Analyzer helps to identify resources in your organization and accounts, such as S3 buckets or IAM roles, that are shared with an external entity. When you enable IAM Access Analyzer, you create an analyzer for your entire organization or your account. The organization or account you choose is known as the zone of trust for the analyzer. The analyzer monitors the supported resources within your zone of trust, detects each instance of a resource shared outside the zone of trust, and generates a finding about the resource and the external principals that have access to it.

Currently, you can use IAM Access Analyzer and Macie to detect external access and discover sensitive data as separate processes. You can join the findings from both to best evaluate the risk. The solution in this post integrates IAM Access Analyzer, Macie, and AWS Security Hub to automate the process of correlating findings between the services and presenting them in Security Hub.

How does the solution work?

First, IAM Access Analyzer discovers S3 buckets that are shared outside the zone of trust. Next, the solution schedules a Macie sensitive data discovery job for each of these buckets to determine if the bucket contains sensitive data. Upon discovery of shared sensitive data in S3, a custom high severity finding is created in Security Hub for review and incident response.

Solution architecture

This solution is based on a serverless architecture and uses IAM Access Analyzer, Amazon EventBridge, AWS Lambda, Amazon DynamoDB, AWS Step Functions, Amazon Macie, and AWS Security Hub:

Figure 1: Architecture diagram

Figure 1 depicts the following process flow:

  1. IAM Access Analyzer detects shared S3 buckets outside of the zone of trust—the organization or account you choose is known as a zone of trust for the analyzer—and creates the event Access Analyzer Finding in EventBridge.
  2. EventBridge triggers the Lambda function sda-aa-save-findings.
  3. The sda-aa-save-findings function records each finding in DynamoDB.
  4. An EventBridge scheduled event periodically starts a new cycle of the Step Function state machine, which immediately runs the Lambda function sda-macie-submit-scan. The template sets a 15-minute interval, but this is configurable.
  5. The sda-macie-submit-scan function reads the IAM Access Analyzer findings that were created by sda-aa-save-findings from DynamoDB.
  6. sda-macie-submit-scan launches a Macie classification job for each distinct S3 bucket that is related to one or more recent IAM Access Analyzer findings.
  7. Macie performs a sensitive data discovery scan on each requested S3 bucket.
  8. The sda-macie-submit-scan function initiates the Lambda function sda-macie-check-status.
  9. sda-macie-check-status periodically checks the status of each Macie classification job, waiting for all the Macie jobs initiated by this solution to complete.
  10. Upon completion of the sda-macie-check-status function, the step function runs the Lambda function sda-sh-create-findings.
  11. sda-sh-create-findings joins the resulting IAM Access Analyzer and Macie datasets for each S3 bucket.
  12. sda-sh-create-findings publishes a finding to Security Hub for each bucket that has both external access and sensitive data (a minimal sketch of this call follows the list).

    Note: The Macie scan is skipped if the S3 bucket is tagged to be excluded or if it was recently scanned by Macie. See the Cost considerations section for more information on custom configurations.

  13. Information security can review and act on the findings shown in Security Hub.
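
As a minimal sketch of step 12, the call below imports a custom finding into Security Hub in AWS Security Finding Format. The account ID, Region, bucket name, and finding fields are illustrative, not the exact values the solution's Lambda function uses.

    import datetime
    import boto3

    securityhub = boto3.client("securityhub")

    now = datetime.datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%SZ")
    account_id = "123456789012"                 # placeholder
    bucket = "example-shared-bucket"            # placeholder

    securityhub.batch_import_findings(Findings=[{
        "SchemaVersion": "2018-10-08",
        "Id": f"sensitive-data-external-access/{bucket}",
        # Custom findings are imported under the account's default product ARN.
        "ProductArn": f"arn:aws:securityhub:us-east-1:{account_id}:product/{account_id}/default",
        "GeneratorId": "sda-sh-create-findings",
        "AwsAccountId": account_id,
        "Types": ["Sensitive Data Identifications/PII"],
        "CreatedAt": now,
        "UpdatedAt": now,
        "Severity": {"Label": "HIGH"},
        "Title": f"Sensitive data found in externally shared bucket {bucket}",
        "Description": "IAM Access Analyzer reported external access and Macie found sensitive data.",
        "Resources": [{"Type": "AwsS3Bucket", "Id": f"arn:aws:s3:::{bucket}"}],
    }])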

Sample Security Hub output

Figure 2 shows the sample findings that Security Hub will present. Each finding includes:

  • Severity
  • Workflow status
  • Record state
  • Company
  • Product
  • Title
  • Resource

Figure 2: Sample Security Hub findings

The output to Security Hub will display a severity of HIGH with workflow NEW, because this is the first time the event has been observed. The record state is ACTIVE because the workflow state is NEW. The title explains the reason for the event.

For example, if potentially sensitive data is discovered in a bucket that is shared outside a zone of trust, selecting an event will display the resources involved in the finding so you can investigate. For more information, see the Security Hub User Guide.

Notes:

  • Detection of public S3 buckets by IAM Access Analyzer will still occur through Security Hub and will be marked as critical severity. This solution does not add to or augment this finding in Security Hub.
  • If a finding in IAM Access Analyzer is archived, the solution does not update the related finding in Security Hub.

Prerequisites

To use this solution, you need the following:

  • Permission to run AWS CloudFormation
  • Permission to create Lambda functions
  • Permission to create DynamoDB tables
  • Permission to create Step Function state machines
  • Permission to create EventBridge event rules
  • Permission to enable IAM Access Analyzer on the account where sensitive discovery is required
  • Permission to enable Macie on the account
  • Permission to enable Security Hub on the account

Deploy the solution

The solution is deployed through AWS CloudFormation, and you can review the template for options to best suit your specific needs.

  1. Sign in to your AWS account at https://aws.amazon.com/console/.
  2. In the AWS Management Console, navigate to the AWS CloudFormation service, and then choose Create stack.
  3. Under Prerequisite – Prepare template, choose Template is ready.
  4. Under Specify template, choose Amazon S3 URL and provide the following URL:
    https://awsiammedia.s3.amazonaws.com/public/sample/936-correlating-aa-findings-macie/sda-cfn.yml
  5. Choose Next.
  6. Enter the stack name.
  7. The Application code location, S3 Bucket and S3 Key fields will be pre-filled.
  8. Under Service Activations, modify the activations based on the services you presently have running in your account.
  9. Modify the Logging and Monitoring settings if required.
  10. (Optional) Set an alert email address for errors.
  11. Choose Next, then choose Next again.
  12. Under Capabilities, select the check box.
  13. Choose Create Stack. The solution will begin deploying; watch for the CREATE_COMPLETE message.

Figure 3: Sample CloudFormation deployment status

The solution is now deployed and will start monitoring for sensitive data that is being shared. It will send the findings to Security Hub for your teams to investigate.
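
If you prefer to deploy from the command line, the boto3 sketch below launches the same template. The stack name is arbitrary, and parameter entries are omitted because they depend on the template; review the template and add Parameters as needed.

    import boto3

    cfn = boto3.client("cloudformation")

    cfn.create_stack(
        StackName="sda-correlate-aa-macie",
        TemplateURL=("https://awsiammedia.s3.amazonaws.com/public/sample/"
                     "936-correlating-aa-findings-macie/sda-cfn.yml"),
        Capabilities=["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM"],
    )

    # Block until the stack reaches CREATE_COMPLETE.
    cfn.get_waiter("stack_create_complete").wait(StackName="sda-correlate-aa-macie")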

Cost considerations

When you scan large S3 buckets with sensitive data, remember that Macie cost is based on the amount of data scanned. For more information on Macie costs, see Amazon Macie pricing.

This solution allows the following options, which you can use to help manage costs:

  • Use environment variables in Lambda to skip specific tagged buckets
  • Skip recently scanned S3 buckets and reuse prior findings

Figure 4: Screen shot of configurable environment variable

Conclusion

In this post, we discussed how the solution uses Lambda, Step Functions and EventBridge to integrate IAM Access Analyzer with Macie discovery jobs. We reviewed the components of the application, deployed it by using CloudFormation, and reviewed the output a security team would use to take the appropriate actions. We also provided two ways that you can manage the costs associated with the solution.

After you deploy this project, you can modify it to meet your organization’s needs. For example, you can modify the tags to skip specific S3 buckets your organization has already classified to hold sensitive data. Customers who use multiple AWS accounts can designate a centralized Security Hub administrator account to receive the solution alerts from each member account. For more information on this option, see Designating a Security Hub administrator account.

If you have feedback about this post, please submit it in the Comments section below. If you have questions about this post, please start a new thread on the AWS Identity and Access Management forum.

Other resources

For more information on correlating security findings with AWS Security Hub and Amazon EventBridge, refer to this blog post.

Want more AWS Security news? Follow us on Twitter.

Nihar Das

Nihar Das

Nihar has over 20 years of experience in various business domains including financial services. As an AWS Senior Solutions Architect, he is passionate about solving challenges in the cloud and helps financial services customers migrate to AWS and continue to innovate.

Joe Dunn

Joe Dunn

Joe is an AWS Senior Solutions Architect in Financial Services with over 20 years of experience in infrastructure architecture and migration of business-critical workloads to AWS. He helps financial services customers to innovate on the AWS Cloud by providing solutions using AWS products and services.

Armand Aquino

Armand Aquino

Armand is a solutions architect helping financial services organizations design their critical workloads on AWS. In his spare time, he enjoys exploring outdoors and learning Korean.

Journey to Adopt Cloud-Native Architecture Series #5 – Enhancing Threat Detection, Data Protection, and Incident Response

Post Syndicated from Anuj Gupta original https://aws.amazon.com/blogs/architecture/journey-to-adopt-cloud-native-architecture-series-5-enhancing-threat-detection-data-protection-and-incident-response/

In Part 4 of this series, Governing Security at Scale and IAM Baselining, we discussed building a multi-account strategy and improving access management and least privilege to prevent unwanted access and to enforce security controls.

As a refresher from previous posts in this series, our example e-commerce company’s “Shoppers” application runs in the cloud. The company experienced hypergrowth, which posed a number of platform and technology challenges, including enforcing security and governance controls to mitigate security risks.

With the pace of new infrastructure and software deployments, we had to ensure we maintain strong security. This post, Part 5, shows how we detect security misconfigurations, indicators of compromise, and other anomalous activity. We also show how we developed and iterated on our incident response processes.

Threat detection and data protection

With our newly acquired customer base from hypergrowth, we had to make sure we maintained customer trust. We also needed to detect and respond to security events quickly to reduce the scope and impact of any unauthorized activity. We were concerned about vulnerabilities on our public-facing web servers, accidental sensitive data exposure, and other security misconfigurations.

Prior to hypergrowth, application teams scanned for vulnerabilities and maintained the security of their applications. After hypergrowth, we established a dedicated security team and identified tools to simplify the management of our cloud security posture. This allowed us to easily identify and prioritize security risks.

Use AWS security services to detect threats and misconfigurations

We use the following AWS security services to simplify the management of cloud security risks and reduce the burden of third-party integrations. This also minimizes the amount of engineering work required by our security team.

Detect threats with Amazon GuardDuty

We use Amazon GuardDuty to keep up with the newest threat actor tactics, techniques, and procedures (TTPs) and indicators of compromise (IOCs).

GuardDuty saves us time and reduces complexity, because we don’t have to continuously engineer detections for new TTPs and IOCs for static events and machine-learning-based detections. This allows our security analysts to focus on building runbooks and quickly responding to security findings.

Discover sensitive data with Amazon Macie for Amazon S3

To host our external website, we use a few public Amazon Simple Storage Service (Amazon S3) buckets with static assets. We don’t want developers to accidentally put sensitive data in these buckets, and we wanted to understand which S3 buckets contain sensitive information, such as financial or personally identifiable information (PII).

We explored building a custom scanner to search for sensitive data, but maintaining the search patterns was too complex. It was also costly to continuously re-scan files each month. Therefore, we use Amazon Macie to continuously scan our S3 buckets for sensitive data. After Macie makes its initial scan, it will only scan new or updated objects in those S3 buckets, which reduces our costs significantly. We added filter rules to exclude larger files and to limit scanning to the required S3 prefixes, and we set a sampling rate to further optimize the cost of scanning large S3 buckets (in our case, S3 buckets greater than 1 TB).
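
A hedged boto3 sketch of such a job, assuming a hypothetical bucket and thresholds rather than our production values: it schedules a daily scan, samples 10% of objects, and excludes objects larger than roughly 100 MB.

    import boto3

    macie = boto3.client("macie2")

    macie.create_classification_job(
        name="public-assets-scan",
        jobType="SCHEDULED",
        scheduleFrequency={"dailySchedule": {}},
        samplingPercentage=10,                      # scan a sample of eligible objects
        s3JobDefinition={
            "bucketDefinitions": [{
                "accountId": "123456789012",        # placeholder account
                "buckets": ["example-public-assets"],
            }],
            "scoping": {
                "excludes": {"and": [{
                    "simpleScopeTerm": {
                        "comparator": "GT",
                        "key": "OBJECT_SIZE",
                        "values": ["104857600"],    # skip objects larger than ~100 MB
                    }
                }]}
            },
        },
    )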

Scan for vulnerabilities with Amazon Inspector

Because we use a wide variety of operating systems and software, we must scan our Amazon Elastic Compute Cloud (Amazon EC2) instances for known software vulnerabilities, such as Log4J.

We use Amazon Inspector to run continuous vulnerability scans on our EC2 instances and Amazon Elastic Container Registry (Amazon ECR) container images. With Amazon Inspector, we can continuously detect if our developers are deploying and releasing vulnerable software on our EC2 instances and ECR images without setting up a third-party vulnerability scanner and installing additional endpoint agents.

Aggregate security findings with AWS Security Hub

We don’t want our security analysts to arbitrarily act on one security finding over another. This is time-consuming and does not properly prioritize the highest risks to address. We also need to track ownership, view progress of various findings, and build consistent responses for common security findings.

With AWS Security Hub, our analysts can seamlessly prioritize findings from GuardDuty, Macie, Amazon Inspector, and many other AWS services. Our analysts also use Security Hub’s built-in security checks and insights to identify AWS resources and accounts that have a high number of findings and act on them.

Setting up the threat detection services

This is how we set up these services:

Our security analysts use Security Hub-generated Jira tickets to view, prioritize, and respond to all security findings and misconfigurations across our AWS environment.

Through this configuration, our analysts no longer need to pivot between various AWS accounts, security tool consoles, and Regions, which makes the day-to-day management and operations much easier. Figure 1 depicts the data flow to Security Hub.

Aggregation of security services in security tooling account

Figure 1. Aggregation of security services in security tooling account

Delegated administrator setup

Figure 2. Delegated administrator setup
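
A rough sketch of the delegated administrator setup shown in Figure 2, run from the AWS Organizations management account; the security tooling account ID is a placeholder, and other services such as Amazon Inspector have equivalent delegated administrator calls.

    import boto3

    SECURITY_ACCOUNT = "111122223333"   # placeholder security tooling account ID

    # Delegate administration of each service to the security tooling account.
    boto3.client("guardduty").enable_organization_admin_account(
        AdminAccountId=SECURITY_ACCOUNT)
    boto3.client("securityhub").enable_organization_admin_account(
        AdminAccountId=SECURITY_ACCOUNT)
    boto3.client("macie2").enable_organization_admin_account(
        adminAccountId=SECURITY_ACCOUNT)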

Incident response

Before hypergrowth, there was no formal way to respond to security incidents. To prevent future security issues, we built incident response plans and processes to quickly address potential security incidents and minimize the impact and exposure. Following the AWS Security Incident Response Guide and NIST framework, we adopted the following best practices.

Playbooks and runbooks for repeatability

We developed incident response playbooks and runbooks for repeatable responses for security events that include:

  • Playbooks for more strategic scenarios and responses based on some of the sample playbooks found here.
  • Runbooks that provide step-by-step guidance for our security analysts to follow in case an event occurs. We used Amazon SageMaker notebooks and AWS Systems Manager Incident Manager runbooks to develop repeatable responses for pre-identified incidents, such as suspected command and control activity on an EC2 instance.

Automation for quicker response time

After developing our repeatable processes, we identified areas where we could accelerate responses to security threats by automating the response. We used the AWS Security Hub Automated Response and Remediation solution as a starting point.

By using this solution, we didn’t need to build our own automated response and remediation workflow. The code is also easy to read, repeat, and centrally deploy through AWS CloudFormation StackSets. We used some of the built-in remediations like disabling active keys that have not been rotated for more than 90 days, making all Amazon Elastic Block Store (Amazon EBS) snapshots private, and many more. With automatic remediation, our analysts can respond quicker and in a more holistic and repeatable way.

Simulations to improve incident response capabilities

We implemented quarterly incident response simulations. These simulations test how well prepared our people, processes, and technologies are for an incident. We included some cloud-specific simulations like an S3 bucket exposure and an externally shared Amazon Relational Database Service (Amazon RDS) snapshot to ensure our security staff are prepared for an incident in the cloud. We use the results of the simulations to iterate on our incident response processes.

Conclusion

In this blog post, we discussed how to prepare for, detect, and respond to security events in an AWS environment. We identified security services to detect security events, vulnerabilities, and misconfigurations. We then discussed how to develop incident response processes through building playbooks and runbooks, performing simulations, and automation. With these new capabilities, we can detect and respond to a security incident throughout hypergrowth.

Looking for more architecture content? AWS Architecture Center provides reference architecture diagrams, vetted architecture solutions, Well-Architected best practices, patterns, icons, and more!

Other blog posts in this series

Top 2021 AWS Security service launches security professionals should review – Part 1

Post Syndicated from Ryan Holland original https://aws.amazon.com/blogs/security/top-2021-aws-security-service-launches-part-1/

Given the speed of Amazon Web Services (AWS) innovation, it can sometimes be challenging to keep up with AWS Security service and feature launches. To help you stay current, here’s an overview of some of the most important 2021 AWS Security launches that security professionals should be aware of. This is the first of two related posts; Part 2 will highlight some of the important 2021 launches that security professionals should be aware of across all AWS services.

Amazon GuardDuty

In 2021, the threat detection service Amazon GuardDuty expanded the internal AWS security intelligence it consumes to use more of the intel that AWS internal threat detection teams collect, including additional nation-state threat intelligence. Sharing more of the important intel that internal AWS teams collect lets you quickly improve your protection. GuardDuty also launched domain reputation modeling. These machine learning models take all the domain requests from across all of AWS, and feed them into a model that allows AWS to categorize previously unseen domains as highly likely to be malicious or benign based on their behavioral characteristics. In practice, AWS is seeing that these models often deliver high-fidelity threat detections, identifying malicious domains 7–14 days before they are identified and available on commercial threat feeds.

AWS also launched second generation anomaly detection for GuardDuty. Shortly after the original GuardDuty launch in 2017, AWS added additional anomaly detection for user behavior analytics and monitoring for unusual activity of AWS Identity and Access Management (IAM) users. After receiving customer feedback that the original feature was a little too noisy, and that it was difficult to understand why some findings were generated, the GuardDuty analytics team rebuilt this functionality on an entirely new machine learning model, considerably reducing the number of detections and generating a more accurate positive-detection rate. The new model also added additional context that security professionals (such as analysts) can use to understand why the model shows findings as suspicious or unusual.

Since its introduction, GuardDuty has detected when AWS EC2 Role credentials are used to call AWS APIs from IP addresses outside of AWS. Beginning in early 2022, GuardDuty now supports detection when credentials are used from other AWS accounts, inside the AWS network. This is a complex problem for customers to solve on their own, which is why the GuardDuty team added this enhancement. The solution considers that there are legitimate reasons why a source IP address that is communicating with AWS services APIs might be different than the Amazon Elastic Compute Cloud (Amazon EC2) instance IP address, or a NAT gateway associated with the instance’s VPC. The enhancement also considers complex network topologies that route traffic to one or multiple VPCs—for example, AWS Transit Gateway or AWS Direct Connect.

Our customers are increasingly running container workloads in production; helping to raise the security posture of these workloads became an AWS development priority in 2021. GuardDuty for EKS Protection is one recent feature that has resulted from this investment. This new GuardDuty feature monitors Amazon Elastic Kubernetes Service (Amazon EKS) cluster control plane activity by analyzing Kubernetes audit logs. GuardDuty is integrated with Amazon EKS, giving it direct access to the Kubernetes audit logs without requiring you to turn on or store these logs. Once a threat is detected, GuardDuty generates a security finding that includes container details such as pod ID, container image ID, and associated tags. See below for details on how the new Amazon Inspector is also helping to protect containers.

Amazon Inspector

At AWS re:Invent 2021, we launched the new Amazon Inspector, a vulnerability management service that continually scans AWS workloads for software vulnerabilities and unintended network exposure. The original Amazon Inspector was completely re-architected in this release to automate vulnerability management and to deliver near real-time findings to minimize the time needed to discover new vulnerabilities. This new Amazon Inspector has simple one-click enablement and multi-account support using AWS Organizations, similar to our other AWS Security services. This launch also introduces a more accurate vulnerability risk score, called the Inspector score. The Inspector score is a highly contextualized risk score that is generated for each finding by correlating Common Vulnerability and Exposures (CVE) metadata with environmental factors for resources such as network accessibility. This makes it easier for you to identify and prioritize your most critical vulnerabilities for immediate remediation. One of the most important new capabilities is that Amazon Inspector automatically discovers running EC2 instances and container images residing in Amazon Elastic Container Registry (Amazon ECR), at any scale, and immediately starts assessing them for known vulnerabilities. Now you can consolidate your vulnerability management solutions for both Amazon EC2 and Amazon ECR into one fully managed service.

AWS Security Hub

In addition to a significant number of smaller enhancements throughout 2021, in October AWS Security Hub, an AWS cloud security posture management service, addressed a top customer enhancement request by adding support for cross-Region finding aggregation. You can now view all your findings from all accounts and all selected Regions in a single console view, and act on them from an Amazon EventBridge feed in a single account and Region. Looking back at 2021, Security Hub added 72 additional best practice checks, four new AWS service integrations, and 13 new external partner integrations. A few of these integrations are Atlassian Jira Service Management, Forcepoint Cloud Security Gateway (CSG), and Amazon Macie. Security Hub also achieved FedRAMP High authorization to enable security posture management for high-impact workloads.

Amazon Macie

Based on customer feedback, data discovery tool Amazon Macie launched a number of enhancements in 2021. One new feature, which made it easier to manage Amazon Simple Storage Service (Amazon S3) buckets for sensitive data, was criteria-based bucket selection. This Macie feature allows you to define runtime criteria to determine which S3 buckets should be included in a sensitive data-discovery job. When a job runs, Macie identifies the S3 buckets that match your criteria, and automatically adds or removes them from the job’s scope. Before this feature, once a job was configured, it was immutable. Now, for example, you can create a policy where if a bucket becomes public in the future, it’s automatically added to the scan, and similarly, if a bucket is no longer public, it will no longer be included in the daily scan.

Originally Macie included all managed data identifiers available for all scans. However, customers wanted more surgical search criteria. For example, they didn’t want to be informed if there were exposed data types in a particular environment. In September 2021, Macie launched the ability to enable/disable managed data identifiers. This allows you to customize the data types you deem sensitive and would like Macie to alert on, in accordance with your organization’s data governance and privacy needs.

Amazon Detective

Amazon Detective is a service to analyze and visualize security findings and related data to rapidly get to the root cause of potential security issues. In January 2021, Amazon Detective added a convenient, time-saving integration that allows you to start security incident investigation workflows directly from the GuardDuty console. This new hyperlink pivot in the GuardDuty console takes findings directly from the GuardDuty console into the Detective console. Another time-saving capability added was the IP address drill down functionality. This new capability can be useful to security forensic teams performing incident investigations, because it helps quickly determine the communications that took place from an EC2 instance under investigation before, during, and after an event.

In December 2021, Detective added support for AWS Organizations to simplify management for security operations and investigations across all existing and future accounts in an organization. This launch allows new and existing Detective customers to onboard and centrally manage the Detective graph database for up to 1,200 AWS accounts.

AWS Key Management Service

In June 2021, AWS Key Management Service (AWS KMS) introduced multi-Region keys, a capability that lets you replicate keys from one AWS Region into another. With multi-Region keys, you can more easily move encrypted data between Regions without having to decrypt and re-encrypt with different keys for each Region. Multi-Region keys are supported for client-side encryption using direct AWS KMS API calls, or in a simplified manner with the AWS Encryption SDK and Amazon DynamoDB Encryption Client.

AWS Secrets Manager

Last year was a busy year for AWS Secrets Manager, with four feature launches to make it easier to manage secrets at scale, not just for client applications, but also for platforms. In March 2021, Secrets Manager launched multi-Region secrets to automatically replicate secrets for multi-Region workloads. Also in March, Secrets Manager added three new rules to AWS Config, to help administrators verify that secrets in Secrets Manager are configured according to organizational requirements. Then in April 2021, Secrets Manager added a CSI driver plug-in, to make it easy to consume secrets from Amazon EKS by using Kubernetes’s standard Secrets Store interface. In November, Secrets Manager introduced a higher secret limit of 500,000 per account to simplify secrets management for independent software vendors (ISVs) that rely on unique secrets for a large number of end customers. Although launched in January 2022, it’s also worth mentioning Secrets Manager’s release of rotation windows to align automatic rotation of secrets with application maintenance windows.

Amazon CodeGuru and Secrets Manager

In November 2021, AWS announced a new secrets detector feature in Amazon CodeGuru that searches your codebase for hardcoded secrets. Amazon CodeGuru is a developer tool powered by machine learning that provides intelligent recommendations to detect security vulnerabilities, improve code quality, and identify an application’s most expensive lines of code.

This new feature can pinpoint locations in your code with usernames and passwords; database connection strings, tokens, and API keys from AWS; and other service providers. When a secret is found in your code, CodeGuru Reviewer provides an actionable recommendation that links to AWS Secrets Manager, where developers can secure the secret with a point-and-click experience.

Looking ahead for 2022

AWS will continue to deliver experiences in 2022 that meet administrators where they govern, developers where they code, and applications where they run. A lot of customers are moving to container and serverless workloads; you can expect to see more work on this in 2022. You can also expect to see more work around integrations, like CodeGuru Secrets Detector identifying plaintext secrets in code (as noted previously).

To stay up-to-date in the year ahead on the latest product and feature launches and security use cases, be sure to read the Security service launch announcements. Additionally, stay tuned to the AWS Security Blog for Part 2 of this blog series, which will provide an overview of some of the important 2021 launches that security professionals should be aware of across all AWS services.

If you’re looking for more opportunities to learn about AWS security services, check out AWS re:Inforce, the AWS conference focused on cloud security, identity, privacy, and compliance, which will take place June 28-29 in Houston, Texas.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Author

Ryan Holland

Ryan is a Senior Manager with GuardDuty Security Response. His team is responsible for ensuring GuardDuty provides the best security value to customers, including threat intelligence, behavioral analytics, and finding quality.

Author

Marta Taggart

Marta is a Seattle-native and Senior Product Marketing Manager in AWS Security Product Marketing, where she focuses on data protection services. Outside of work you’ll find her trying to convince Jack, her rescue dog, not to chase squirrels and crows (with limited success).

Enabling data classification for Amazon RDS database with Macie

Post Syndicated from Bruno Silveira original https://aws.amazon.com/blogs/security/enabling-data-classification-for-amazon-rds-database-with-amazon-macie/

Customers have been asking us about ways to use Amazon Macie data discovery on their Amazon Relational Database Service (Amazon RDS) instances. This post presents how to do so using AWS Database Migration Service (AWS DMS) to extract data from Amazon RDS, store it on Amazon Simple Storage Service (Amazon S3), and then classify the data using Macie. Macie’s findings will also be made available for the appropriate teams to query with Amazon Athena.

The challenge

Let’s suppose you need to find sensitive data in an RDS-hosted database using Macie, which currently only supports S3 as a data source. Therefore, you will need to extract and store the data from RDS in S3. In addition, you will need an interface for audit teams to audit these findings.

Solution overview

Figure 1: Solution architecture workflow

The architecture of the solution in Figure 1 can be described as:

  1. A MySQL engine running on RDS is populated with the Sakila sample database.
  2. A DMS task connects to the Sakila database, transforms the data into a set of Parquet compressed files, and loads them into the dcp-macie bucket.
  3. A Macie classification job analyzes the objects in the dcp-macie bucket using a combination of techniques such as machine learning and pattern matching to determine whether the objects contain sensitive data and to generate detailed reports on the findings.
  4. Amazon EventBridge routes the Macie findings reports events to Amazon Kinesis Data Firehose.
  5. Kinesis Data Firehose stores these reports in the dcp-glue bucket.
  6. S3 event notification triggers an AWS Lambda function whenever an object is created in the dcp-glue bucket.
  7. The Lambda function named Start Glue Workflow starts a Glue Workflow.
  8. Glue Workflow transforms the data from JSONL to Apache Parquet file format and places it in the dcp-athena bucket. This provides better query performance and optimized storage usage through a binary, column-oriented format.
  9. Athena is used to query and visualize data generated by Macie.

Note: For better readability, the S3 bucket nomenclature omits the suffix containing the AWS Region and AWS account ID used to meet the global uniqueness naming requirement (for example, dcp-athena-us-east-1-123456789012).

The Sakila database schema consists of the following tables:

  • actor
  • address
  • category
  • city
  • country
  • customer

Building the solution

Prerequisites

Before configuring the solution, the AWS Identity and Access Management (IAM) user must have appropriate access granted for the following services:

You can find an IAM policy with the required permissions here.

Step 1 – Deploying the CloudFormation template

You’ll use CloudFormation to quickly and consistently provision the AWS resources illustrated in Figure 1. Through a pre-built template file, it creates the infrastructure using an Infrastructure-as-Code (IaC) approach.

  1. Download the CloudFormation template.
  2. Go to the CloudFormation console.
  3. Select the Stacks option in the left menu.
  4. Select Create stack and choose With new resources (standard).
  5. On Step 1 – Specify template, choose Upload a template file, select Choose file, and select the file template.yaml downloaded previously.
  6. On Step 2 – Specify stack details, enter a name of your preference for Stack name. You might also adjust the parameters as needed, like the parameter CreateRDSServiceRole to create a service role for RDS if it does not exist in the current account.
  7. On Step 3 – Configure stack options, select Next.
  8. On Step 4 – Review, check the box for I acknowledge that AWS CloudFormation might create IAM resources with custom names, and then select Create Stack.
  9. Wait for the stack to show status CREATE_COMPLETE.

Note: Provisioning typically takes around 10 minutes to complete.

Step 2 – Running an AWS DMS task

To extract the data from the Amazon RDS instance, you need to run an AWS DMS task. This makes the data available for Amazon Macie in an S3 bucket in Parquet format.

  1. Go to the AWS DMS console.
  2. In the left menu, select Database migration tasks.
  3. Select the task Identifier named rdstos3task.
  4. Select Actions.
  5. Select the option Restart/Resume.

When the Status changes to Load complete, the task has finished, and you can see the migrated data in your target bucket (dcp-macie).

Inside each folder, you can see one or more Parquet files with names similar to LOAD00000001.parquet. Now you can use Macie to discover whether the database contents exported to S3 contain sensitive data.
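
If you prefer to restart the task programmatically rather than through the console, a minimal boto3 sketch follows. The task ARN is a placeholder; you can look it up in the DMS console or with describe_replication_tasks.

    import boto3

    dms = boto3.client("dms")

    # Placeholder ARN; replace with the ARN of the rdstos3task task.
    task_arn = "arn:aws:dms:us-east-1:123456789012:task:EXAMPLE"

    # "reload-target" reloads the full dataset, equivalent to Restart in the console.
    dms.start_replication_task(
        ReplicationTaskArn=task_arn,
        StartReplicationTaskType="reload-target",
    )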

Step 3 – Running a classification job with Amazon Macie

Now you need to create a data classification job so you can assess the contents of your S3 bucket. The job you create will run once and evaluate the complete contents of your S3 bucket to determine whether it can identify PII among the data. This job uses only the managed data identifiers available with Macie – you could also add your own custom data identifiers.

  1. Go to the Macie console.
  2. Select the S3 buckets option in the left menu.
  3. Choose the S3 bucket dcp-macie containing the output data from the DMS task. You may need to wait a minute and select the Refresh icon if the bucket names do not display.

  4. Select Create job.
  5. Select Next to continue.
  6. Create a job with the following scope.
    1. Sensitive data discovery options: One-time job
    2. Sampling depth: 100%
    3. Leave all other settings with their default values
  7. Select Next to continue.
  8. Select Next again to skip past the Custom data identifiers section.
  9. Give the job a name and description.
  10. Select Next to continue.
  11. Verify the details of the job you have created and select Submit to continue.

You will see a green banner stating that the job was successfully created. The job can take up to 15 minutes to complete, and the Status will change from Active to Complete. To open the findings from the job, select the job's check box, choose Show results, and then select Show findings.

Figure 2: Macie Findings screen
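
If you prefer to create the same one-time job with the AWS SDK for Python (Boto3) instead of the console, a minimal sketch follows. The job name and the bucket name suffix are placeholders.

    import uuid
    import boto3

    macie = boto3.client("macie2")
    account_id = boto3.client("sts").get_caller_identity()["Account"]

    # Placeholder bucket name; use your dcp-macie bucket, including its Region/account suffix.
    bucket = f"dcp-macie-us-east-1-{account_id}"

    macie.create_classification_job(
        clientToken=str(uuid.uuid4()),
        jobType="ONE_TIME",
        name="dcp-classification-job",  # placeholder name
        description="Scan the DMS output for sensitive data",
        samplingPercentage=100,
        s3JobDefinition={
            "bucketDefinitions": [{"accountId": account_id, "buckets": [bucket]}]
        },
    )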

Note: You can navigate through the findings and select each check box to see the details.

Step 4 – Enabling querying on classification job results with Amazon Athena

  1. Go to the Athena console and open the Query editor.
  2. If it's your first time using Athena, you will see the message Before you run your first query, you need to set up a query result location in Amazon S3. Learn more. Select the link presented with this message.
  3. In the Settings window, choose Select and then choose the bucket dcp-assets to store the Athena query results.
  4. (Optional) To store the query results encrypted, check the box for Encrypt query results and select your preferred encryption type. To learn more about Amazon S3 encryption types, see Protecting data using encryption.
  5. Select Save.

Step 5 – Query Amazon Macie results with Amazon Athena

It might take a few minutes for the data to complete the flow between Amazon Macie and AWS Glue. After it’s finished, you’ll be able to see in Athena’s console the table dcp_athena within the database dcp.

Then, select the three dots next to the table dcp_athena and select the Preview table option to see a data preview, or run your own custom queries.

Figure 3: Athena table preview
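
You can also run the same preview query with boto3, as in the following sketch. The output location is a placeholder for the dcp-assets bucket you configured in Step 4.

    import time
    import boto3

    athena = boto3.client("athena")

    execution = athena.start_query_execution(
        QueryString='SELECT * FROM "dcp"."dcp_athena" LIMIT 10;',
        QueryExecutionContext={"Database": "dcp"},
        # Placeholder; use your dcp-assets bucket, including its Region/account suffix.
        ResultConfiguration={"OutputLocation": "s3://dcp-assets-us-east-1-123456789012/"},
    )

    # Poll until the query finishes, then read the first page of results.
    query_id = execution["QueryExecutionId"]
    while True:
        state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(2)

    if state == "SUCCEEDED":
        rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
        print(f"Returned {len(rows) - 1} data rows")  # the first row holds column headers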

As your environment grows, the blog post Top 10 Performance Tuning Tips for Amazon Athena can help you partition your data and consolidate it into larger files if needed.

Clean up

After you finish, to clean up the solution and avoid unnecessary expenses, complete the following steps:

  1. Go to the Amazon S3 console.
  2. Navigate to each of the buckets listed below and delete all its objects:
    • dcp-assets
    • dcp-athena
    • dcp-glue
    • dcp-macie
  3. Go to the CloudFormation console.
  4. Select the Stacks option in the left menu.
  5. Choose the stack you created in Step 1 – Deploying the CloudFormation template.
  6. Select Delete and then select Delete Stack in the pop-up window.

Conclusion

In this blog post, we showed how you can find personally identifiable information (PII), and other data defined as sensitive by Macie's managed data identifiers, in an RDS-hosted MySQL database. You can use this solution with other relational databases such as PostgreSQL, SQL Server, or Oracle, whether hosted on Amazon RDS or Amazon EC2. If you're using Amazon DynamoDB, you may also find the blog post Detecting sensitive data in DynamoDB with Macie useful.

By classifying your data, you can inform your management of appropriate data protection and handling controls for the use of that data.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Bruno Silveira

Bruno is a Solutions Architect Manager in the Public Sector team with a focus on educational institutions in Brazil. His previous career was in government, financial services, utilities, and nonprofit institutions. Bruno is an enthusiast of cloud security and an appreciator of good rock’n roll with a good beer.

Author

Thiago Pádua

Thiago is Solutions Architect in the AWS Worldwide Public Sector team working in the development and support of partners. He is experienced in software development and systems integration, mainly in the telecommunications industry. He has a special interest in microservices, serverless, and containers.

Strengthen the security of sensitive data stored in Amazon S3 by using additional AWS services

Post Syndicated from Jerry Mullis original https://aws.amazon.com/blogs/security/strengthen-the-security-of-sensitive-data-stored-in-amazon-s3-by-using-additional-aws-services/

In this post, we describe the AWS services that you can use to both detect and protect your data stored in Amazon Simple Storage Service (Amazon S3). When you analyze security in depth for your Amazon S3 storage, consider doing the following:

Using these additional AWS services along with Amazon S3 can improve your security posture across your accounts.

Audit and restrict Amazon S3 access with IAM Access Analyzer

IAM Access Analyzer allows you to identify unintended access to your resources and data. Users and developers need access to Amazon S3, but it’s important for you to keep users and privileges accurate and up to date.

Amazon S3 can often house sensitive and confidential information. To help secure your data within Amazon S3, you should be using AWS Key Management Service (AWS KMS) with server-side encryption at rest for Amazon S3. It is also important that you secure the S3 buckets so that you only allow access to the developers and users who require that access. Bucket policies and access control lists (ACLs) are the foundation of Amazon S3 security. Your configuration of these policies and lists determines the accessibility of objects within Amazon S3, and it is important to audit them regularly to properly secure and maintain the security of your Amazon S3 bucket.

IAM Access Analyzer can scan all the supported resources within a zone of trust. Access Analyzer then provides you with insight when a bucket policy or ACL allows access to any external entities that are not within your organization or your AWS account’s zone of trust.

To set up and use IAM Access Analyzer, follow the instructions for Enabling Access Analyzer in the AWS IAM User Guide.

The example in Figure 1 shows creating an analyzer with the zone of trust as the current account, but you can also create an analyzer with the organization as the zone of trust.

Figure 1: Creating IAM Access Analyzer and zone of trust

After you create your analyzer, IAM Access Analyzer automatically scans the resources in your zone of trust and returns the findings from your Amazon S3 storage environment. The initial scan shown in Figure 2 shows the findings of an unsecured S3 bucket.

Figure 2: Example of unsecured S3 bucket findings
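
If you prefer to create the analyzer and retrieve findings programmatically, a minimal boto3 sketch follows. The analyzer name is arbitrary.

    import boto3

    access_analyzer = boto3.client("accessanalyzer")

    response = access_analyzer.create_analyzer(
        analyzerName="account-analyzer",  # arbitrary name
        type="ACCOUNT",                   # use "ORGANIZATION" for an organization-wide zone of trust
    )

    findings = access_analyzer.list_findings(
        analyzerArn=response["arn"],
        filter={"status": {"eq": ["ACTIVE"]}},  # only findings that still need review
    )
    for finding in findings["findings"]:
        print(finding.get("resource"), finding["status"])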

For each finding, you can decide which action you would like to take. As shown in Figure 3, you are given the option to archive the finding (if it indicates intended access) or take action to modify bucket permissions (if it indicates unintended access).

Figure 3: Displays choice of actions to take

After you address the initial findings, Access Analyzer monitors your bucket policies for changes, and notifies you of access issues it finds. Access Analyzer is regional and must be enabled in each AWS Region independently.

Classify and secure sensitive data with Macie

Organizational compliance standards often require the identification and securing of sensitive data. Your organization’s sensitive data might contain personally identifiable information (PII), which includes things such as credit card numbers, birthdates, and addresses.

Macie is a data security and privacy service offered by AWS that uses machine learning and pattern matching to discover sensitive data stored within Amazon S3. You can define your own custom sensitive data categories that might be unique to your business or use case. Macie will automatically provide an inventory of S3 buckets and alert you to unprotected sensitive data.

Figure 4 shows a sample result from a Macie scan in which you can see important information regarding Amazon S3 public access, encryption settings, and sharing.

Figure 4: Sample results from a Macie scan

In addition to finding potential sensitive data, Macie also gives you a severity score based on the privacy risk, as shown in the example data in Figure 5.

Figure 5: Example Macie severity scores

When you use Macie in conjunction with AWS Step Functions, you can also automatically remediate any issues found. You can use this combination to help meet regulations such as General Data Protection Regulation (GDPR) and Health Insurance Portability and Accountability Act (HIPAA). Macie allows you to have constant visibility of sensitive data within your Amazon S3 storage environment.

When you deploy Macie in a multi-account configuration, your usage is rolled up to the master account to provide the total usage for all accounts and a breakdown across the entire organization.

Detect malicious access patterns with GuardDuty

Your customers and users can perform thousands of actions on S3 buckets each day, and discerning access patterns manually becomes extremely time consuming as the volume of data increases. GuardDuty uses machine learning, anomaly detection, and integrated threat intelligence to analyze billions of events across multiple accounts, drawing on data collected in AWS CloudTrail logs for S3 data events as well as S3 access logs, VPC Flow Logs, and DNS logs. GuardDuty can be configured to analyze these logs and notify you of suspicious activity, such as unusual data access patterns, unusual discovery API calls, and more. After you receive a list of findings on these activities, you will be able to make informed decisions to secure your S3 buckets.
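
If GuardDuty is not yet enabled in the Region, a minimal boto3 sketch that turns on S3 protection and then lists S3-related findings might look like the following.

    import boto3

    guardduty = boto3.client("guardduty")

    # Enable GuardDuty with S3 protection turned on.
    detector_id = guardduty.create_detector(
        Enable=True,
        DataSources={"S3Logs": {"Enable": True}},
    )["DetectorId"]

    # List findings whose affected resource is an S3 bucket.
    finding_ids = guardduty.list_findings(
        DetectorId=detector_id,
        FindingCriteria={"Criterion": {"resource.resourceType": {"Equals": ["S3Bucket"]}}},
    )["FindingIds"]

    if finding_ids:
        for finding in guardduty.get_findings(DetectorId=detector_id, FindingIds=finding_ids)["Findings"]:
            print(finding["Type"], finding["Severity"])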

Figure 6 shows a sample list of findings returned by GuardDuty, including the finding type, the affected resource, and the count of occurrences.

Figure 6: Example GuardDuty list of findings

You can select one of the results in Figure 6 to see the IP address and details associated with this potentially malicious IP caller, as shown in Figure 7.

Figure 7: GuardDuty Malicious IP Caller detailed findings

Monitor and remediate configuration changes with AWS Config

Configuration management is important when securing Amazon S3, to prevent unauthorized users from gaining access. It is important that you monitor the configuration changes of your S3 buckets, whether the changes are intentional or unintentional. AWS Config can track all configuration changes that are made to an S3 bucket. For example, if an S3 bucket had its permissions and configurations unexpectedly changed, using AWS Config allows you to see the changes made, as well as who made them.

With AWS Config, you can set up AWS Config managed rules that serve as a baseline for your S3 bucket. When any bucket has configurations that deviate from this baseline, you can be alerted by Amazon Simple Notification Service (Amazon SNS) of the bucket being noncompliant.

AWS Config can be used in conjunction with AWS Lambda. If an S3 bucket is noncompliant, AWS Config can trigger a preprogrammed Lambda function that resolves those issues. This combination can reduce the operational overhead of maintaining compliance within your S3 buckets.
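
As a sketch of what that baseline could look like, the following boto3 snippet adds two AWS managed Config rules that check S3 buckets. The rule names are arbitrary; the SourceIdentifier values are existing AWS managed rules.

    import boto3

    config = boto3.client("config")

    managed_rules = {
        "s3-bucket-public-read-prohibited": "S3_BUCKET_PUBLIC_READ_PROHIBITED",
        "s3-bucket-server-side-encryption-enabled": "S3_BUCKET_SERVER_SIDE_ENCRYPTION_ENABLED",
    }

    for rule_name, source_identifier in managed_rules.items():
        config.put_config_rule(
            ConfigRule={
                "ConfigRuleName": rule_name,
                "Scope": {"ComplianceResourceTypes": ["AWS::S3::Bucket"]},
                "Source": {"Owner": "AWS", "SourceIdentifier": source_identifier},
            }
        )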

Figure 8 shows a sample of AWS Config managed rules selected for configuration monitoring and gives a brief description of what the rule does.

Figure 8: Sample selections of AWS Managed Rules

Figure 9 shows a sample result of a non-compliant configuration and resource inventory listing the type of resource affected and the number of occurrences.

Figure 9: Example of AWS Config non-compliant resources

Conclusion

AWS has many offerings to help you audit and secure your storage environment. In this post, we discussed the particular combination of AWS services that together will help reduce the amount of time and focus your business devotes to security practices. This combination of services will also enable you to automate your responses to any unwanted permission and configuration changes, saving you valuable time and resources to dedicate elsewhere in your organization.

For more information about pricing of the services mentioned in this post, see AWS Free Tier and AWS Pricing. For more information about Amazon S3 security, see Amazon S3 Preventative Security Best Practices in the Amazon S3 User Guide.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Jerry Mullis

Jerry is an Associate Solutions Architect at AWS. His interests are in data migration, machine learning, and device automation. Jerry has previous experience in machine learning research and healthcare management. His certifications include AWS Solutions Architect Pro, AWS Developer Associate, AWS Sysops Admin Associate and AWS Certified Cloud Practitioner. In his free time, Jerry enjoys hiking, playing basketball, and spending time with his wife.

Author

Dave Geyer

Dave is an Associate Solutions Architect at AWS. He has a background in data management and organizational design, and is interested in data analytics and infrastructure security. Dave has advised and worked for customers in the commercial and public sectors, providing them with architectural best practices and recommendations. Dave is interested in the aerospace and financial services industries. Outside of work, he is an adrenaline junkie, and is passionate about mountaineering and high altitudes.

Author

Andrew Chen

Andrew is an Associate Solutions Architect with an interest in data analytics, machine learning, and virtualization of infrastructure. Andrew has previous experience in management consulting in which he worked as a technical lead for various cloud migration projects. In his free time, Andrew enjoys fishing, hiking, kayaking, and keeping up with financial markets.

Using Amazon Macie to Validate S3 Bucket Data Classification

Post Syndicated from Bill Magee original https://aws.amazon.com/blogs/architecture/using-amazon-macie-to-validate-s3-bucket-data-classification/

Securing sensitive information is a high priority for organizations for many reasons. At the same time, organizations are looking for ways to empower development teams to stay agile and innovative. Centralized security teams strive to create systems that align to the needs of the development teams, rather than mandating how those teams must operate.

Security teams who create automation for the discovery of sensitive data have some issues to consider. If development teams are able to self-provision data storage, how does the security team protect that data? If teams have a business need to store sensitive data, they must consider how, where, and with what safeguards that data is stored.

Let’s look at how we can set up Amazon Macie to validate data classifications provided by decentralized software development teams. Macie is a fully managed service that uses machine learning (ML) to discover sensitive data in AWS. If you are not familiar with Macie, read New – Enhanced Amazon Macie Now Available with Substantially Reduced Pricing.

Data classification is part of the security pillar of a Well-Architected application. Following the guidelines provided in the AWS Well-Architected Framework, we can develop a resource-tagging scheme that fits our needs.

Overview of decentralized data validation system

In our example, we have multiple levels of data classification that represent different levels of risk associated with each classification. When a software development team creates a new Amazon Simple Storage Service (S3) bucket, they are responsible for labeling that bucket with a tag. This tag represents the classification of data stored in that bucket. The security team must maintain a system to validate that the data in those buckets meets the classification specified by the development teams.

This separation of roles and responsibilities for development and security teams who work independently requires a validation system that’s decoupled from S3 bucket creation. It should automatically detect new buckets or data in the existing buckets, and validate the data against the assigned classification tags. It should also notify the appropriate development teams of misclassified or unclassified buckets in a timely manner. These notifications can be through standard notification channels, such as email or Slack channel notifications.

Validation and alerts with AWS services

Figure 1. Validation system for data classification

We assume that teams are permitted to create S3 buckets and we will use AWS Config to enforce the following required tags: DataClassification and SupportSNSTopic. The DataClassification tag indicates what type of data is allowed in the bucket. The SupportSNSTopic tag indicates an Amazon Simple Notification Service (SNS) topic. If there are issues found with the data in the bucket, a message is published to the topic, and Amazon SNS will deliver an alert. For example, if there is personally identifiable information (PII) data in a bucket that is classified as non-sensitive, the system will alert the owners of the bucket.

Macie is configured to scan all S3 buckets on a scheduled basis. This configuration ensures that any new bucket and data placed in the buckets is analyzed the next time the Macie job runs.

Macie provides several managed data identifiers for discovering and classifying the data. These include bank account numbers, credit card information, authentication credentials, PII, and more. You can also create custom identifiers (or rules) to gather information not covered by the managed identifiers.

Macie integrates with Amazon EventBridge to allow us to capture data classification events and route them to one or more destinations for reporting and alerting needs. In our configuration, the event initiates an AWS Lambda function. The Lambda function uses custom business logic to validate the data classification inferred by Macie against the classification specified in the DataClassification tag. If a data classification violation is found, the Lambda function then sends a message to the Amazon SNS topic specified in the SupportSNSTopic tag.

The Lambda function also creates custom metrics and sends those to Amazon CloudWatch. The metrics are organized by engineering team and severity. This allows the security team to create a dashboard of metrics based on the Macie findings. The findings can also be filtered per engineering team and severity to determine which teams need to be contacted to ensure remediation.
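
A minimal sketch of that Lambda function follows, assuming the incoming event is a Macie sensitive data finding delivered by EventBridge. The tag keys match the ones described above; the classification-to-violation mapping and the metric namespace are illustrative, not the exact business logic.

    import boto3

    s3 = boto3.client("s3")
    sns = boto3.client("sns")
    cloudwatch = boto3.client("cloudwatch")

    # Illustrative rule: only buckets classified as Sensitive may produce findings.
    CLASSIFICATIONS_THAT_ALLOW_FINDINGS = {"Sensitive"}

    def lambda_handler(event, context):
        finding = event["detail"]
        bucket = finding["resourcesAffected"]["s3Bucket"]["name"]

        tags = {t["Key"]: t["Value"] for t in s3.get_bucket_tagging(Bucket=bucket)["TagSet"]}
        classification = tags.get("DataClassification", "Unclassified")
        topic_arn = tags.get("SupportSNSTopic")

        # Violation: the bucket's declared classification does not allow sensitive data.
        if classification not in CLASSIFICATIONS_THAT_ALLOW_FINDINGS and topic_arn:
            sns.publish(
                TopicArn=topic_arn,
                Subject=f"Data classification violation in {bucket}",
                Message=f"Macie finding {finding['id']} conflicts with DataClassification={classification}.",
            )

        # Custom metric by severity so the security team can build a dashboard.
        cloudwatch.put_metric_data(
            Namespace="DataClassificationValidation",  # illustrative namespace
            MetricData=[{
                "MetricName": "MacieFindings",
                "Dimensions": [
                    {"Name": "Bucket", "Value": bucket},
                    {"Name": "Severity", "Value": finding["severity"]["description"]},
                ],
                "Value": 1,
            }],
        )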

Conclusion

This solution provides a centralized security team with the tools it needs. The team can validate the data classification of an Amazon S3 bucket that is self-provisioned by a development team. New Amazon S3 buckets are automatically included in the Macie jobs and alerts. These are only sent out if the data in the bucket does not conform to the classification specified by the development team. The data auditing process is loosely coupled with the Amazon S3 Bucket creation process, enabling self-service capabilities for development teams, while ensuring proper data classification. Your teams can stay agile and innovative, while maintaining a strong security posture.

Learn more about Amazon Macie and Data Classification.

Automate the archival and deletion of sensitive data using Amazon Macie

Post Syndicated from Subhro Bose original https://aws.amazon.com/blogs/big-data/automate-the-archival-and-deletion-of-sensitive-data-using-amazon-macie/

Customers are looking for ways to securely and cost-efficiently manage large volumes of sensitive data archival and deletion in their data lake by following regulations and data protection and privacy laws, such as GDPR, POPIA, and LGPD. This post describes a way to automatically identify sensitive data stored in your data lake within AWS, tag the data according to its sensitivity level, and apply appropriate lifecycle policies in a secured and cost-effective way.

Amazon Macie is a managed data security and data privacy service that uses machine learning (ML) and pattern matching to discover and protect your sensitive data stored in Amazon Simple Storage Service (Amazon S3). In this post, we show you how to develop a solution using Macie, Amazon Kinesis Data Firehose, Amazon S3, Amazon EventBridge, and AWS Lambda to identify sensitive data across a large number of S3 buckets, tag them, and apply lifecycle policies for transition and deletion.

Solution overview

The following diagram illustrates the architecture of our solution.

The flow of the solution is as follows:

  1. An Amazon S3 bucket contains multiple objects with different sensitivities of data.
  2. A Macie job analyzes the S3 bucket to identify the different sensitivities.
    1. An EventBridge rule is triggered for each finding that Macie generates from the job.
    2. The rule copies the results created by the Macie job to a Kinesis Data Firehose delivery stream.
    3. The delivery stream copies the results to an S3 bucket as a JSON file for future audit requirements.
  3. The arrival of the results triggers a Lambda function that parses the sensitivity metadata from the JSON file.
  4. The function tags the objects in the bucket mentioned in Step 1 according to the sensitivity level of each object, and creates or overwrites an S3 Lifecycle policy accordingly (a minimal sketch of this logic follows this list).
  5. The S3 Lifecycle policy moves data to different storage classes and deletes data based on the configured rules. For example, we implement the following rules:
    1. Archive objects with high sensitivity, tagged as High, after 700 days.
    2. Delete objects tagged as High after 3,000 days.
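
A minimal sketch of the tagging and lifecycle logic from step 4 follows, assuming the bucket, key, and sensitivity level have already been parsed from the Macie finding. The day counts mirror the sample rules above; the transition storage class is an assumption.

    import boto3

    s3 = boto3.client("s3")

    def apply_sensitivity_controls(bucket, key, sensitivity="High"):
        # Tag the object with its sensitivity level.
        s3.put_object_tagging(
            Bucket=bucket,
            Key=key,
            Tagging={"TagSet": [{"Key": "Sensitivity", "Value": sensitivity}]},
        )

        # Write a lifecycle configuration keyed on that tag. Note that this call
        # replaces any lifecycle configuration already attached to the bucket.
        s3.put_bucket_lifecycle_configuration(
            Bucket=bucket,
            LifecycleConfiguration={
                "Rules": [{
                    "ID": f"archive-and-expire-{sensitivity.lower()}",
                    "Status": "Enabled",
                    "Filter": {"Tag": {"Key": "Sensitivity", "Value": sensitivity}},
                    "Transitions": [{"Days": 700, "StorageClass": "GLACIER"}],
                    "Expiration": {"Days": 3000},
                }],
            },
        )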

Create resources with AWS CloudFormation

We provide an AWS CloudFormation template to create the following resources:

  • An S3 bucket named archival-blog-<account_number>-<region_name> as a sample subject bucket as described above.
  • An S3 bucket named archival-blog-results-<account_number>-<region_name> to store the results generated by the Macie job.
  • A Firehose delivery stream to send the results of the Macie job to the S3 bucket.
  • An EventBridge rule that matches the incoming event of a result generated by the Macie job and routes the result to the Firehose delivery stream.
  • A Lambda function to apply tags and S3 Lifecycle policies on the data objects of the subject S3 bucket based on the result generated by the Macie job.
  • AWS Identity and Access Management (IAM) roles and policies with appropriate permissions.

Launch the following stack, providing your stack name:

After the CloudFormation stack is deployed, copy sample_pii.xlsx and recipe.xlsx into archival-blog-<account_number>-<region_name> as sample data for Macie to detect as sensitive.

Next, we scan the subject S3 bucket for sensitive data to tag and attach the appropriate lifecycle policies.

Configure a Macie job

Macie uses ML and pattern matching to discover and protect your sensitive data in AWS. To configure a Macie job, complete the following steps:

  1. On the Macie console, create a new job by choosing Create Job.
  2. Select the bucket that you want to analyze and choose Next.
  3. Select a schedule if you want to run the job on a recurring basis, or One-time job if you want to run the job once. For this post, we select One-time job. You can also choose Scheduled job for periodic jobs in production, but that is out of scope for this post.
  4. Choose Next.
  5. Enter a name for the job and choose Next.
  6. Review the job details and confirm they’re correct before choosing Submit.

The job immediately starts after you submit it.

Review the results

Whenever the Macie job runs, it first generates findings for the sensitive data it discovers.

Second, the Lambda function tags each sensitive object with Sensitivity: High.

Third, the function creates a corresponding S3 Lifecycle policy.

Clean up

When you’re done with this solution, delete the sample data from the subject S3 bucket, delete all the data objects that are stored as Macie results in the S3 bucket for this post, and then delete the CloudFormation stack to remove all the service resources used in the solution.

Conclusion

In this post, we highlighted how you can use Macie to scan all your data stored on Amazon S3 and how to store your security findings in Amazon S3 using EventBridge and Kinesis Data Firehose. We also explored how you can use Lambda to tag the relevant objects in your subject S3 bucket using the Macie job's security findings. An S3 Lifecycle transition policy moves tagged objects across different storage classes to help save cost, and an S3 Lifecycle expiration policy deletes objects that have reached the end of their lifecycle.

Under privacy laws such as GDPR and POPIA, personal data should be retained only as long as it needs to be kept for legal purposes or processing. This post provides a mechanism to archive sensitive data that you might not want to delete immediately, reducing related storage costs, and to delete it after a defined period when the data is no longer required. The archival and deletion periods used above are sample numbers that can be customized within the Lambda function based on your requirements. Additionally, explore Macie's different types of findings. You can use these findings to build capabilities such as sending notifications or creating searches in CloudTrail S3 object logs to see who is accessing specific objects.

If you have any questions, comments, or concerns, please reach out to AWS Support. If you have feedback about this post, submit it in the comments section.


About the Authors

Subhro Bose is a Data Architect in Emergent Technologies and Intelligence Platform in Amazon. He loves working on ways for emergent technologies such as AI/ML, big data, quantum, and more to help businesses across different industry verticals succeed within their innovation journey.
Akshay Chandiramani is Data Analytics Consultant for AWS Professional Services. He is passionate about solving complex big data and MLOps problems for customers to create tangible business outcomes. In his spare time, he enjoys playing snooker and table tennis, and capturing trends on the stock market.
Ikenna Uzoh is a Senior Manager, Business Intelligence (BI) and Analytics at AWS. Before his current role, he was a Senior Big Data and Analytics Architect with AWS Professional Services. He is passionate about building data platforms (data lakes, BI, Machine learning) that help organizations achieve their desired outcomes.

Creating a notification workflow from sensitive data discovery with Amazon Macie, Amazon EventBridge, AWS Lambda, and Slack

Post Syndicated from Bruno Silviera original https://aws.amazon.com/blogs/security/creating-a-notification-workflow-from-sensitive-data-discover-with-amazon-macie-amazon-eventbridge-aws-lambda-and-slack/

Following the example of the EU in implementing the General Data Protection Regulation (GDPR), many countries are implementing similar data protection laws. In response, many companies are forming teams that are responsible for data protection. Considering the volume of information that companies maintain, it’s essential that these teams are alerted when sensitive data is at risk.

This post shows how to deploy a solution that uses Amazon Macie to discover sensitive data. The solution uses Amazon EventBridge and AWS Lambda to automatically notify your company's designated data protection team through a Slack channel when sensitive data that needs to be protected is discovered.

The challenge

Let’s imagine that you’re part of a team that’s responsible for classifying your organization’s data but the data structure isn’t documented. Amazon Macie provides you the ability to run a scheduled classification job that examines your data, and you want to notify the data protection team when there’s new sensitive data to classify. Let’s build a solution to automatically notify the data protection team.

Solution overview

To be scalable and cost-effective, this solution uses serverless technologies and managed AWS services, including:

  • Macie – A fully managed data security and data privacy service that uses machine learning and pattern matching to discover and protect your sensitive data in Amazon Web Services (AWS).
  • EventBridge – A serverless event bus that connects application data from your apps, SaaS, and AWS services. EventBridge can respond to specific events or run according to a schedule. The solution presented in this post uses EventBridge to initiate a custom Lambda function in response to a specific event.
  • Lambda – Runs code in response to events such as changes in data, changes in application state, or user actions. In this solution, a Lambda function is initiated by EventBridge.

Solution architecture

The architecture workflow is shown in Figure 1 and includes the following steps:

  1. Macie runs a classification job and publishes its findings to EventBridge as a JSON object.
  2. The EventBridge rule captures the findings and invokes a Lambda function as a target.
  3. The Lambda function parses the JSON object and then sends a custom message to a Slack channel with the sensitive data finding for the data protection team to evaluate and respond to (a minimal sketch of this function follows this list).
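
A minimal sketch of that Lambda function follows, assuming the webhook URL is supplied through an environment variable named SLACK_WEBHOOK_URL. The message format is illustrative.

    import json
    import os
    import urllib.request

    def lambda_handler(event, context):
        finding = event["detail"]
        message = {
            "text": (
                f"Macie finding: {finding['title']}\n"
                f"Severity: {finding['severity']['description']}\n"
                f"Bucket: {finding['resourcesAffected']['s3Bucket']['name']}"
            )
        }
        request = urllib.request.Request(
            os.environ["SLACK_WEBHOOK_URL"],
            data=json.dumps(message).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            return response.status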

 

Figure 1: Solution architecture workflow

Set up Slack

For this solution, you need a Slack workspace and an incoming webhook. The workspace must be in place before you create the webhook.

Create a Slack workspace

If you already have a Slack workspace in your environment, you can skip forward to creating the webhook.

If you don’t have a Slack workspace, follow the steps in Create a Slack Workspace to create one.

Create an incoming webhook in Slack API

  1. Go to your Slack API.
  2. Choose Start Building to create an app.
  3. Enter the following details for your app:
    • App Name – macie-to-slack.
    • Development Slack Workspace – Choose the Slack workspace—either an existing workspace or one you created for this solution—to receive the Macie findings.
  4. Choose the Create App button.
  5. In the left menu, choose Incoming Webhooks.
  6. At the Activate Incoming Webhooks screen, move the slider from OFF to ON.
  7. Scroll down and choose Add New Webhook to Workspace.
  8. In the screen asking where your app should post, enter the name of the Slack channel in your workspace that you want to send notifications to, and choose Authorize.
  9. On the next screen, scroll down to the Webhook URL section. Make a note of the URL to use later.

Deploy the CloudFormation template with the solution

The deployment of the CloudFormation template automatically creates the following resources:

  • A Lambda function with a name that begins with macie-to-slack-lambdafindingsToSlack-.
  • An EventBridge rule named MacieFindingsToSlack.
  • An IAM role named MacieFindingsToSlackkRole.
  • A permission to invoke the Lambda function named LambdaInvokePermission.

Note: Before you proceed, make sure that you deploy the template to the same Region where Macie is running.

To deploy the CloudFormation template

  1. Download the YAML template to your computer.

    Note: To save the template, you can right-click the Raw button at the top of the code and then select Save link as if you're using Chrome, or the equivalent in your browser. This file is used in Step 4.

  2. Open CloudFormation in the AWS Management Console.
  3. On the Welcome page, choose Create stack and then choose With new resources.
  4. On Step 1 — Specify template, choose Upload a template file, select Choose file and then select the file template.yaml (the file extension might be .YML), then choose Next.
  5. On Step 2 — Specify stack details:
    1. Enter macie-to-slack as the Stack name.
    2. At the Slack Incoming Web Hook URL, paste the webhook URL you copied earlier.
    3. At Slack channel, enter the name of the channel in your workspace that will receive the alerts and choose Next.
    Figure 2: Defining stack details

  6. On Step 3 – Configure Stack options, you can leave the default settings, or change them for your environment. Choose Next to continue.
  7. At the bottom of Step 4 – Review, select I acknowledge that AWS CloudFormation might create IAM resources, and choose Create stack.

    Figure 3: Confirmation before stack creation

  8. Wait for the stack to reach status CREATE_COMPLETE.

Running the solution

At this point, you’ve deployed the solution and your resources are created.

To test the solution, you can schedule a Macie job targeting a bucket that contains a file with sensitive information that Macie can detect.

Note: You can check the Amazon Macie documentation to see the list of supported managed data identifiers.

When the Macie job is complete, any findings are sent to the Slack channel.

Figure 4: Macie finding delivered to Slack channel

Select the link in the message sent to the Slack channel to open that finding in the Macie console, as shown in Figure 5.

Figure 5: Finding details

And you’re done!

Now your Macie finding results are delivered to your Slack channel where they can be easily monitored, reducing response time and risk exposure.

If you deployed this solution for testing purposes, or want to clean it up and move to your production account, you can delete the CloudFormation stack:

  1. Open the CloudFormation console.
  2. Select the stack and choose Delete.

Conclusion

In this blog post, we walked through the steps to configure a notification workflow using Macie, Lambda, and EventBridge to send sensitive data findings to your data protection team via a Slack channel.

Your data protection team will appreciate the timely notifications of sensitive data findings, giving you the ability to focus on creating controls to improve data security and compliance with regulations related to protection and treatment of personal data.

For more information about data privacy on AWS, see Data Privacy FAQ.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Bruno Silveira

Bruno is a Solutions Architect Manager in the Public Sector team with focus on educational institutions in Brazil. His previous career was in government, financial services, utilities, and nonprofit institutions. Bruno is an enthusiast of cloud security and an appreciator of good rock’n roll with a good beer.

Author

Julio Carvalho

Julio is a Principal Security Solutions Architect at AWS for the Latin American financial market. As a security specialist, he helps customers solve protection and compliance challenges on their cloud journey.

Deploy an automated ChatOps solution for remediating Amazon Macie findings

Post Syndicated from Nick Cuneo original https://aws.amazon.com/blogs/security/deploy-an-automated-chatops-solution-for-remediating-amazon-macie-findings/

The amount of data being collected, stored, and processed by Amazon Web Services (AWS) customers is growing at an exponential rate. In order to keep pace with this growth, customers are turning to scalable cloud storage services like Amazon Simple Storage Service (Amazon S3) to build data lakes at the petabyte scale. Customers are looking for new, automated, and scalable ways to address their data security and compliance requirements, including the need to identify and protect their sensitive data. Amazon Macie helps customers address this need by offering a managed data security and data privacy service that uses machine learning and pattern matching to discover and protect your sensitive data that is stored in Amazon S3.

In this blog post, I show you how to deploy a solution that establishes an automated event-driven workflow for notification and remediation of sensitive data findings from Macie. Administrators can review and approve remediation of findings through a ChatOps-style integration with Slack. Slack is a business communication tool that provides messaging functionality, including persistent chat rooms known as channels. With this solution, you can streamline the notification, investigation, and remediation of sensitive data findings in your AWS environment.

Prerequisites

Before you deploy the solution, make sure that your environment is set up with the following prerequisites:

Important: This solution uses various AWS services, and there are costs associated with these resources after the Free Tier usage. See the AWS pricing page for details.

Solution overview

The solution architecture and workflow are detailed in Figure 1.

Figure 1: Solution overview

This solution allows for the configuration of auto-remediation behavior based on finding type and finding severity. For each finding type, you can define whether you want the offending S3 object to be automatically quarantined, or whether you want the finding details to be reviewed and approved by a human in Slack prior to being quarantined. In a similar manner, you can define the minimum severity level (Low, Medium, High) that a finding must have before the solution will take action. By adjusting these parameters, you can manage false positives and tune the volume and type of findings about which you want to be notified and take action. This configurability is important because customers have different security, risk, and regulatory requirements.

Figure 1 details the services used in the solution and the integration points between them. Let’s walk through the full sequence from the detection of sensitive data to the remediation (quarantine) of the offending object.

  1. Macie is configured with sensitive data discovery jobs (scheduled or one-time), which you create and run to detect sensitive data within S3 buckets. When Macie runs a job, it uses a combination of criteria and techniques to analyze objects in S3 buckets that you specify. For a full list of the categories of sensitive data Macie can detect, see the Amazon Macie User Guide.
  2. For each sensitive data finding, an event is sent to Amazon EventBridge that contains the finding details. An EventBridge rule triggers a Lambda function for processing.
  3. The Finding Handler Lambda function parses the event and examines the type of the finding. Based on the auto-remediation configuration, the function either invokes the Finding Remediator function for immediate remediation, or sends the finding details for manual review and remediation approval through Slack.
  4. Delegated security and compliance administrators monitor the configured Slack channel for notifications. Notifications provide high-level finding information, remediation status, and a link to the Macie console for the finding in question. For findings configured for manual review, administrators can choose to approve the remediation in Slack by using an action button on the notification.
  5. After an administrator chooses the Remediate button, Slack issues an API call to an Amazon API Gateway endpoint, supplying both the unique identifier of the finding to be remediated and that of the Slack user. API Gateway proxies the request to a Remediation Handler Lambda function.
  6. The Remediation Handler Lambda function validates the request and request signature, extracts the offending object’s location from the finding, and makes an asynchronous call to the Finding Remediator Lambda function.
  7. The Finding Remediator Lambda function moves the offending object from the source bucket to a designated S3 quarantine bucket with restricted access (a minimal sketch of this function follows this list).
  8. Finally, the Finding Remediator Lambda function uses a callback URL to update the original finding notification in Slack, indicating that the offending object has now been quarantined.
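
A minimal sketch of the Finding Remediator function from steps 7 and 8 follows (omitting the Slack callback), assuming the source bucket and key have already been extracted from the finding and the quarantine bucket name is supplied through an environment variable.

    import os
    import boto3

    s3 = boto3.client("s3")

    def quarantine_object(source_bucket, source_key):
        quarantine_bucket = os.environ["QUARANTINE_BUCKET"]  # assumed environment variable

        # Copy the offending object into the restricted quarantine bucket, then
        # remove it from the source bucket.
        s3.copy_object(
            Bucket=quarantine_bucket,
            Key=f"{source_bucket}/{source_key}",
            CopySource={"Bucket": source_bucket, "Key": source_key},
        )
        s3.delete_object(Bucket=source_bucket, Key=source_key)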

Deploy the solution

Now we’ll walk through the steps for configuring Slack and deploying the solution into your AWS environment by using the AWS CDK. The AWS CDK is a software development framework that you can use to define cloud infrastructure in code and provision through AWS CloudFormation.

The deployment steps can be summarized as follows:

  1. Configure a Slack channel and app
  2. Check the project out from GitHub
  3. Set the configuration parameters
  4. Build and deploy the solution
  5. Configure Slack with an API Gateway endpoint

To configure a Slack channel and app

  1. In your browser, make sure you’re logged into the Slack workspace where you want to integrate the solution.
  2. Create a new channel where you will send the notifications, as follows:
    1. Choose the + icon next to the Channels menu, and select Create a channel.
    2. Give your channel a name, for example macie-findings, and make sure you turn on the Make private setting.

      Important: By providing Slack users with access to this configured channel, you’re providing implicit access to review Macie finding details and approve remediations. To avoid unwanted user access, it’s strongly recommended that you make this channel private and by invite only.

  3. On your Apps page, create a new app by selecting Create New App, and then enter the following information:
    1. For App Name, enter a name of your choosing, for example MacieRemediator.
    2. Select your chosen development Slack workspace that you logged into in step 1.
    3. Choose Create App.
    Figure 2: Create a Slack app

  4. You will then see the Basic Information page for your app. Scroll down to the App Credentials section, and note down the Signing Secret. This secret will be used by the Lambda function that handles all remediation requests from Slack. The function uses the secret with Hash-based Message Authentication Code (HMAC) authentication to validate that requests to the solution are legitimate and originated from your trusted Slack channel. (A minimal sketch of this validation appears after this procedure.)

    Figure 3: Signing secret

  5. Scroll back to the top of the Basic Information page, and under Add features and functionality, select the Incoming Webhooks tile. Turn on the Activate Incoming Webhooks setting.
  6. At the bottom of the page, choose Add New Webhook to Workspace.
    1. Select the macie-findings channel you created in step 2, and choose Allow.
    2. You should now see webhook URL details under Webhook URLs for Your Workspace. Use the Copy button to note down the URL, which you will need later.

      Figure 4: Webhook URL
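
As a reference, the signature check performed by the Remediation Handler follows Slack's published signing scheme. A minimal sketch, assuming the signing secret is stored in an environment variable, might look like this.

    import hashlib
    import hmac
    import os

    def is_valid_slack_request(timestamp, body, received_signature):
        base_string = f"v0:{timestamp}:{body}"
        expected = "v0=" + hmac.new(
            os.environ["SLACK_SIGNING_SECRET"].encode("utf-8"),
            base_string.encode("utf-8"),
            hashlib.sha256,
        ).hexdigest()
        # Constant-time comparison to avoid timing attacks.
        return hmac.compare_digest(expected, received_signature)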

To check the project out from GitHub

The solution source is available on GitHub in AWS Samples. Clone the project to your local machine or download and extract the available zip file.

To set the configuration parameters

In the root directory of the project you’ve just cloned, there’s a file named cdk.json. This file contains configuration parameters to allow integration with the macie-findings channel you created earlier, and also to allow you to control the auto-remediation behavior of the solution. Open this file and make sure that you review and update the following parameters:

  • autoRemediateConfig – This nested attribute allows you to specify for each sensitive data finding type whether you want to automatically remediate and quarantine the offending object, or first send the finding to Slack for human review and authorization. Note that you will still be notified through Slack that auto-remediation has taken place if this attribute is set to AUTO. Valid values are either AUTO or REVIEW. You can use the default values.
  • minSeverityLevel – Macie assigns all findings a Severity level. With this parameter, you can define a minimum severity level that must be met before the solution will trigger action. For example, if the parameter is set to MEDIUM, the solution won’t take any action or send any notifications when a finding has a LOW severity, but will take action when a finding is classified as MEDIUM or HIGH. Valid values are: LOW, MEDIUM, and HIGH. The default value is set to LOW.
  • slackChannel – The name of the Slack channel you created earlier (macie-findings).
  • slackWebHookUrl – For this parameter, enter the webhook URL that you noted down during Slack app setup in the “Configure a Slack channel and app” step.
  • slackSigningSecret – For this parameter, enter the signing secret that you noted down during Slack app setup.

Save your changes to the configuration file.

To build and deploy the solution

  1. From the command line, make sure that your current working directory is the root directory of the project that you cloned earlier. Run the following commands:
    • npm install – Installs all Node.js dependencies.
    • npm run build – Compiles the CDK TypeScript source.
    • cdk bootstrap – Initializes the CDK environment in your AWS account and Region, as shown in Figure 5.

      Figure 5: CDK bootstrap output

    • cdk deploy – Generates a CloudFormation template and deploys the solution resources.

    The resources created can be reviewed in the CloudFormation console and can be summarized as follows:

    • Lambda functions – Finding Handler, Remediation Handler, and Remediator
    • IAM execution roles and associated policy – The roles and policy associated with each Lambda function and the API Gateway
    • S3 bucket – The quarantine S3 bucket
    • EventBridge rule – The rule that triggers the Lambda function for Macie sensitive data findings
    • API Gateway – A single remediation API with proxy integration to the Lambda handler
  2. After you run the deploy command, you’ll be prompted to review the IAM resources deployed as part of the solution. Press y to continue.
  3. Once the deployment is complete, you’ll be presented with an output parameter, shown in Figure 6, which is the endpoint for the API Gateway that was deployed as part of the solution. Copy this URL.

    Figure 6: CDK deploy output

To configure Slack with the API Gateway endpoint

  1. Open Slack and return to the Basic Information page for the Slack app you created earlier.
  2. Under Add features and functionality, select the Interactive Components tile.
  3. Turn on the Interactivity setting.
  4. In the Request URL box, enter the API Gateway endpoint URL you copied earlier.
  5. Choose Save Changes.

    Figure 7: Slack app interactivity

Now that you have the solution components deployed and Slack configured, it’s time to test things out.

Test the solution

The testing steps can be summarized as follows:

  1. Upload dummy files to S3
  2. Run the Macie sensitive data discovery job
  3. Review and act upon Slack notifications
  4. Confirm that S3 objects are quarantined

To upload dummy files to S3

Two sample text files containing dummy financial and personal data are available in the project you cloned from GitHub. If you haven’t changed the default auto-remediation configurations, these two files will exercise both the auto-remediation and manual remediation review flows.

Find the files under sensitive-data-samples/dummy-financial-data.txt and sensitive-data-samples/dummy-personal-data.txt. Take these two files and upload them to S3 by using either the console, as shown in Figure 8, or AWS CLI. You can choose to use any new or existing bucket, but make sure that the bucket is in the same AWS account and Region that was used to deploy the solution.

Figure 8: Dummy files uploaded to S3

To run a Macie sensitive data discovery job

  1. Navigate to the Amazon Macie console, and make sure that your selected Region is the same as the one that was used to deploy the solution.
    1. If this is your first time using Macie, choose the Get Started button, and then choose Enable Macie.
  2. On the Macie Summary dashboard, you will see a Create Job button at the top right. Choose this button to launch the Job creation wizard. Configure each step as follows:
    1. Select S3 buckets: Select the bucket where you uploaded the dummy sensitive data file. Choose Next.
    2. Review S3 buckets: No changes are required, choose Next.
    3. Scope: For Job type, choose One-time job. Make sure Sampling depth is set to 100%. Choose Next.
    4. Custom data identifiers: No changes are required, choose Next.
    5. Name and description: For Job name, enter any name you like, such as Dummy job, and then choose Next.
    6. Review and create: Review your settings; they should look like the following sample. Choose Submit.
Figure 9: Configure the Macie sensitive data discovery job

Macie will launch the sensitive data discovery job. You can track its status from the Jobs page within the Macie console.

To review and take action on Slack notifications

Within five minutes of submitting the data discovery job, you should expect to see two notifications appear in your configured Slack channel. One notification, similar to the one in Figure 10, is informational only and is related to an auto-remediation action that has taken place.

Figure 10: Slack notification of auto-remediation for the file containing dummy financial data

The other notification, similar to the one in Figure 11, requires end user action and is for a finding that requires administrator review. All notifications will display key information such as the offending S3 object, a description of the finding, the finding severity, and other relevant metadata.

Figure 11: Slack notification for human review of the file containing dummy personal data

(Optional) You can review the finding details by choosing the View Macie Finding in Console link in the notification.

In the Slack notification, choose the Remediate button to quarantine the object. The notification will be updated with confirmation of the quarantine action, as shown in Figure 12.

Figure 12: Slack notification of authorized remediation

To confirm that S3 objects are quarantined

Finally, navigate to the S3 console and validate that the objects have been removed from their original bucket and placed into the quarantine bucket listed in the notification details, as shown in Figure 13. Note that you may need to refresh your S3 object listing in the browser.

Figure 13: Slack notification of authorized remediation

Congratulations! You now have a fully operational solution to detect and respond to Macie sensitive data findings through a Slack ChatOps workflow.

Solution cleanup

To remove the solution and avoid incurring additional charges from the AWS resources that you deployed, complete the following steps.

To remove the solution and associated resources

  1. Navigate to the Macie console. Under Settings, choose Suspend Macie.
  2. Navigate to the S3 console and delete all objects in the quarantine bucket.
  3. Run the command cdk destroy from the command line within the root directory of the project. You will be prompted to confirm that you want to remove the solution. Press y.

Summary

In this blog post, I showed you how to integrate Amazon Macie sensitive data findings with an auto-remediation and Slack ChatOps workflow. We reviewed the AWS services used, how they are integrated, and the steps to configure, deploy, and test the solution. With Macie and the solution in this blog post, you can substantially reduce the heavy lifting associated with detecting and responding to sensitive data in your AWS environment.

I encourage you to take this solution and customize it to your needs. Further enhancements could include supporting policy findings, adding additional remediation actions, or integrating with additional findings from AWS Security Hub.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the Amazon Macie forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Nick Cuneo

Nick is an Enterprise Solutions Architect at AWS who works closely with Australia’s largest financial services organisations. His previous roles span operations, software engineering, and design. Nick is passionate about application and network security, automation, microservices, and event driven architectures. Outside of work, he enjoys motorsport and is found most weekends in his garage wrenching on cars.

Detecting sensitive data in DynamoDB with Macie

Post Syndicated from Sheldon Sides original https://aws.amazon.com/blogs/security/detecting-sensitive-data-in-dynamodb-with-macie/

Amazon Macie is a fully managed data security and data privacy service that uses machine learning and pattern matching to discover and protect your sensitive data in Amazon Web Services (AWS). It gives you the ability to automatically scan for sensitive data and get an inventory of your Amazon Simple Storage Service (Amazon S3) buckets. Macie also gives you the added ability to detect which buckets are public, unencrypted, and accessible from other AWS accounts.

In this post, we’ll walk through how to use Macie to detect sensitive data in Amazon DynamoDB tables by exporting the data to Amazon S3 so that Macie can scan the data. An example of why you would deploy a solution like this is if you have potentially sensitive data stored in DynamoDB tables. When we’re finished, you’ll have a solution that can set up on-demand or scheduled Macie discovery jobs to detect sensitive data exported from DynamoDB to S3.

Architecture

In figure 1, you can see an architectural diagram explaining the flow of the solution that you’ll be deploying.

Figure 1: Solution architecture

Here’s a brief overview of the steps that you’ll take to deploy the solution. Some steps you will do manually, while others will be handled by the provided AWS CloudFormation template. The following outline describes the steps taken to extract the data from DynamoDB and store it in S3, which allows Macie to run a discovery job against the data.

  1. Enable Amazon Macie, if it isn’t already enabled.
  2. Deploy a test DynamoDB dataset.
  3. Create an S3 bucket to export DynamoDB data to.
  4. Configure an AWS Identity and Access Management (IAM) policy and role. (These are used by the Lambda function to access the S3 buckets and DynamoDB tables.)
  5. Deploy an AWS Lambda function to export DynamoDB data to S3.
  6. Set up an Amazon EventBridge rule to schedule export of the DynamoDB data.
  7. Create a Macie discovery job to discover sensitive data from the DynamoDB data export.
  8. View the results of the Macie discovery job.

The goal is that when you finish, you have a solution that you can use to set up either on-demand or scheduled Macie discovery jobs to detect sensitive data that was exported from DynamoDB to S3.

Prerequisite: Enable Macie

If Macie hasn’t been enabled in your account, complete Step 1 in Getting started with Amazon Macie to enable Macie. Once you’ve enabled Macie, you can proceed with the deployment of the CloudFormation template.

Deploy the CloudFormation template

In this section, you start by deploying the CloudFormation template that will deploy all the resources needed for the solution. You can then review the output of the resources that have been deployed.

To deploy the CloudFormation template

  1. Download the CloudFormation template: https://github.com/aws-samples/macie-dynamodb-blog/blob/main/src/cft.yaml
  2. Sign in to the AWS Management Console and navigate to the CloudFormation console.
  3. Choose Upload a template file, and then select the CloudFormation template that you downloaded in the previous step. Choose Next.

    Figure 2 - Uploading the CloudFormation template to be deployed

    Figure 2 – Uploading the CloudFormation template to be deployed

  4. For Stack Name, name your stack macie-blog, and then choose Next.

    Figure 3: Naming your CloudFormation stack

    Figure 3: Naming your CloudFormation stack

  5. For Configure stack options, keep the default values and choose Next.
  6. At the bottom of the Review screen, select the I acknowledge that AWS CloudFormation might create IAM resources check box, and then choose Create stack.

    Figure 4: Acknowledging that this CloudFormation template will create IAM roles

    Figure 4: Acknowledging that this CloudFormation template will create IAM roles

You should then see the following screen. It may take several minutes for the CloudFormation template to finish deploying.

Figure 5: CloudFormation stack creation in progress

Figure 5: CloudFormation stack creation in progress

View CloudFormation output

Once the CloudFormation template has been completely deployed, choose the Outputs tab, and you will see the following screen. Here you’ll find the names and URLs for all the AWS resources that are needed to complete the remainder of the solution.

Figure 6: Completed CloudFormation stack output

Figure 6: Completed CloudFormation stack output

For easier reference, open a new browser tab to your AWS Management Console and leave this tab open. This will make it easier to quickly copy and paste the resource URLs as you navigate to different resources during this walkthrough.

Import DynamoDB data

In this section, we walk through importing the test dataset to DynamoDB. You first start by downloading the test CSV datasets, then upload those datasets to S3 and run the Lambda function that imports the data to DynamoDB. Finally, you review the data that was imported into DynamoDB.

Test datasets

Download the following test datasets:

  1. Accounts Info test dataset (accounts.csv): https://github.com/aws-samples/macie-dynamodb-blog/blob/main/datasets/accounts.csv
  2. People test dataset (people.csv): https://github.com/aws-samples/macie-dynamodb-blog/blob/main/datasets/people.csv

Upload data to the S3 import bucket

Now that you’ve downloaded the test datasets, you’ll need to navigate to the data import S3 bucket and upload the data.

To upload the datasets to the S3 import bucket

  1. Navigate to the CloudFormation Outputs tab, where you’ll find the bucket information.

    Figure 7: S3 bucket output values for the CloudFormation stack

    Figure 7: S3 bucket output values for the CloudFormation stack

  2. Copy the ImportS3BucketURL link and navigate to the URL.
  3. Upload the two test CSV datasets, people.csv and accounts.csv, to your S3 bucket.
  4. After the upload is complete, you should see the two CSV files in the S3 bucket. You’ll use these files as your test DynamoDB data.

    Figure 8: Test S3 datasets in the S3 bucket

    Figure 8: Test S3 datasets in the S3 bucket

View the data import Lambda function

Now that you have your test data staged for loading, you’ll import it into DynamoDB by using a Lambda function that was deployed with the CloudFormation template. To start, navigate to the CloudFormation console and get the URL to the Lambda function that will handle the data import to DynamoDB, as shown in figure 9.

Figure 9: CloudFormation output information for the People DynamoDB table

Figure 9: CloudFormation output information for the People DynamoDB table
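
The function's actual code is included with the CloudFormation template, so you don't need to write anything yourself. As a rough, illustrative sketch of what such an import could look like with boto3 (the bucket, key, and table names below are placeholders, not the names deployed by the stack), consider the following:

    import csv
    import io

    import boto3

    s3 = boto3.client("s3")
    dynamodb = boto3.resource("dynamodb")


    def import_csv_to_table(bucket_name, object_key, table_name):
        """Read a CSV object from S3 and write each row to a DynamoDB table."""
        response = s3.get_object(Bucket=bucket_name, Key=object_key)
        body = response["Body"].read().decode("utf-8")

        table = dynamodb.Table(table_name)
        with table.batch_writer() as batch:
            for row in csv.DictReader(io.StringIO(body)):
                batch.put_item(Item=row)


    def lambda_handler(event, context):
        # Placeholder names -- the deployed function derives these from its own configuration.
        import_csv_to_table("import-bucket-placeholder", "people.csv", "people-table-placeholder")
        return {"status": "import complete"}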

To run the data import Lambda function

  1. Copy the LambdaImportS3DataToDynamoURL link and navigate to the URL. You will see the Import-Data-To-DynamoDB Lambda function, as shown in figure 10.

    Figure 10: The Lambda function that imports data to DynamoDB

    Figure 10: The Lambda function that imports data to DynamoDB

  2. Choose the Test button in the upper right-hand corner. In the dialog box, for Event name, enter Test, and replace the event value with an empty JSON object: {}.
  3. Your screen should now look as shown in figure 11. Choose Create.

    Figure 11: Configuring a test event to manually run the Lambda function

    Figure 11: Configuring a test event to manually run the Lambda function

  4. Choose the Test button again in the upper right-hand corner. You should now see the Lambda function running, as shown in figure 12.

    Figure 12: View of the Lambda function running

    Figure 12: View of the Lambda function running

  5. Once the Lambda function is finished running, you can expand the Details section. You should see a screen similar to the one in figure 13. When you see this screen, the test datasets have successfully been imported into the DynamoDB tables.

    Figure 13: View of the data import Lambda function after it runs successfully

    Figure 13: View of the data import Lambda function after it runs successfully

View the DynamoDB test dataset

Now that you have the datasets imported, you can look at the data in the console.

To view the test dataset

  1. Navigate to the two DynamoDB tables. You can do this by getting the URL values from the CloudFormation Outputs tab. Figure 14 shows the URL for the accounts tables.
    Figure 14: Output values for CloudFormation stack DynamoDB account tables

    Figure 14: Output values for CloudFormation stack DynamoDB account tables

    Figure 15 shows the URL for the people tables.

    Figure 15: Output values for CloudFormation stack DynamoDB people tables

    Figure 15: Output values for CloudFormation stack DynamoDB people tables

  2. Copy the AccountsDynamoDBTableURL link value and navigate to it in the browser. Then choose the Items tab.
    Figure 16: View of DynamoDB account-info-macie table data

    Figure 16: View of DynamoDB account-info-macie table data

    You should now see a screen showing data similar to the screen in figure 16. This DynamoDB table stores the test account data that you will use to run a Macie discovery job against after the data has been exported to S3.

  3. Navigate to the PeopleDynamoDBTableURL link that is in the CloudFormation output. Then choose the Items tab.

    Figure 17: View of DynamoDB people table data

    Figure 17: View of DynamoDB people table data

You should now see a screen showing data similar to the screen in figure 17. This DynamoDB table stores the test people data that you will use to run a Macie discovery job against after the data has been exported to S3.

Export DynamoDB data to S3

In the previous section, you set everything up and staged the data to DynamoDB. In this section, you will export data from DynamoDB to S3.

View the EventBridge rule

The EventBridge rule that was deployed earlier allows you to automatically schedule the export of DynamoDB data to S3. You can schedule the export to run at an interval of minutes, hours, or days. The purpose of the EventBridge rule is to provide an automated data pipeline from DynamoDB to S3. For demonstration purposes, you’ll manually run the Lambda function that the EventBridge rule uses, so that you can see the data exported to S3 without having to wait.

To view the EventBridge rule

  1. Navigate to the CloudFormation Outputs tab for the CloudFormation stack you deployed earlier.

    Figure 18: CloudFormation output information for the EventBridge rule

    Figure 18: CloudFormation output information for the EventBridge rule

  2. Navigate to the EventBridgeRule link. You should see the following screen.

    Figure 19: EventBridge rule configuration details page

    Figure 19: EventBridge rule configuration details page

On this screen, you can see that we’ve set the event schedule to run every hour; the 1-hour interval is for demonstration purposes only and can be changed to fit your business needs. To change the interval, choose the Edit button, make your changes, and then save the rule.
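
If you prefer to adjust the schedule programmatically rather than through the console, a minimal sketch with boto3 might look like the following; the rule name is a placeholder, so substitute the name of the rule that the CloudFormation template created in your account:

    import boto3

    events = boto3.client("events")

    # Calling put_rule on an existing rule updates its schedule without changing its targets.
    events.put_rule(
        Name="your-dynamodb-export-rule",   # placeholder: use the deployed rule's name
        ScheduleExpression="rate(1 day)",   # for example, daily instead of hourly
        State="ENABLED",
    )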

In the Target(s) section, we’ve configured a Lambda function named Export-DynamoDB-Data-To-S3 to export data to the S3 bucket that the Macie discovery job will run against. We’ll look at that Lambda function next.

View the data export Lambda function

In this section, you’ll take a look at the Lambda function that handles the exporting of DynamoDB data to the S3 bucket that Macie will run its discovery job against.

To view the Lambda function

  1. Navigate to the CloudFormation Outputs tab for the CloudFormation stack you deployed earlier.

    Figure 20: CloudFormation output information for the Lambda function that exports DynamoDB data to S3

    Figure 20: CloudFormation output information for the Lambda function that exports DynamoDB data to S3

  2. Copy the link value for LambdaExportDynamoDBDataToS3URL and navigate to the URL in your browser. You should see the Python code that will handle the exporting of data to S3. The code has been commented so that you can easily follow it and refactor it for your needs.
  3. Scroll to the Environment variables section.
    Figure 21: Environment variables used by the Lambda function

    Figure 21: Environment variables used by the Lambda function

    You will see two environment variables:

  • bucket_to_export_to – This environment variable is used by the function as the S3 bucket location to save the DynamoDB data to. This is the bucket that the Macie discovery will run against.
  • dynamo_db_tables – This environment variable is a comma-delimited list of the DynamoDB tables to read and export to S3. If you want to export data from another table, simply add it to the comma-delimited list and it will be included in the export. (A minimal sketch of such an export function follows this list.)
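
The deployed function's code ships with the CloudFormation template, but as a minimal sketch of how an export function like this could use those two environment variables (the object key format here only approximates the naming convention shown later), consider the following:

    import json
    import os
    from datetime import datetime, timezone

    import boto3

    s3 = boto3.client("s3")
    dynamodb = boto3.resource("dynamodb")


    def lambda_handler(event, context):
        export_bucket = os.environ["bucket_to_export_to"]
        table_names = [name.strip() for name in os.environ["dynamo_db_tables"].split(",")]

        for table_name in table_names:
            # Scan the full table. A paginated scan is fine for this small test dataset;
            # for large tables you would likely use DynamoDB's native export-to-S3 feature.
            table = dynamodb.Table(table_name)
            response = table.scan()
            items = response["Items"]
            while "LastEvaluatedKey" in response:
                response = table.scan(ExclusiveStartKey=response["LastEvaluatedKey"])
                items.extend(response["Items"])

            timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%d-%H-%M-%S")
            key = f"dynamodb-{table_name}-{os.environ['AWS_REGION']}-{timestamp}.json"
            s3.put_object(Bucket=export_bucket, Key=key, Body=json.dumps(items, default=str))

        return {"exported_tables": table_names}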

Export DynamoDB data

In this section, you manually run the Lambda function to export the DynamoDB table data to S3. As stated previously, you would normally let the EventBridge rule handle the automated export; running the function manually lets you see the export in action without waiting.

To run the export Lambda function

  1. In the console, scroll back to the top of the screen and choose the Test button.
  2. Name the test dynamoDBExportTest, and for the test data, enter an empty JSON object ({}), as shown in figure 22.

    Figure 22: Configuring a test event to manually test the data export Lambda function

    Figure 22: Configuring a test event to manually test the data export Lambda function

  3. Choose Create.
  4. Choose the Test button again to run the Lambda function to export the DynamoDB data to S3.

    Figure 23: View of the screen where you run the Lambda function to export data

    Figure 23: View of the screen where you run the Lambda function to export data

  5. It could take about one minute to export the data from DynamoDB to S3. Once the Lambda function exports the data, you should see a screen similar to the following one.

    Figure 24: The result after you successfully run the data export Lambda function

    Figure 24: The result after you successfully run the data export Lambda function

View the exported DynamoDB data

Now that the DynamoDB data has been exported for Macie to run discovery jobs against, you can navigate to S3 to verify that the files exported to the bucket.

To view the data, navigate to the CloudFormation stack Outputs tab. Find the ExportS3BucketURL, shown in figure 25, and navigate to the link.

Figure 25: CloudFormation output information for the S3 buckets that the DynamoDB data was exported to

Figure 25: CloudFormation output information for the S3 buckets that the DynamoDB data was exported to

You should then see two different JSON files for the two DynamoDB tables that data was exported from, as shown in figure 26.

Figure 26: View of S3 objects that were exported to S3

Figure 26: View of S3 objects that were exported to S3

This is the file naming convention that’s used for the files:

<Service-name>-<DynamoDB-Table-Name>-<AWS-Region>-<DateAndTime>.json

Next, you’ll create a Macie discovery job to run against the files in this S3 bucket to discover sensitive data.

Create the Macie discovery job

In this section, you’ll create a Macie discovery job and view the results after the job has finished running.

To create the discovery job

  1. In the AWS Management Console, navigate to Macie. In the left-hand menu, choose Jobs.

    Figure 27: Navigation menu to Macie discovery jobs

    Figure 27: Navigation menu to Macie discovery jobs

  2. Choose the Create job button.

    Figure 28: Macie discovery job list screen

    Figure 28: Macie discovery job list screen

  3. Using the Bucket Name filter, search for the S3 bucket that the DynamoDB data was exported to. This can be found in the CloudFormation stack output, as shown in figure 29.

    Figure 29: CloudFormation stack output

    Figure 29: CloudFormation stack output

  4. Select the value you see for ExportS3BucketName, as shown in figure 30.

    Note: The value you see for your bucket name will be slightly different, based on the random characters added to the end of the bucket name generated by CloudFormation.

    Figure 30: Selecting the S3 bucket to include in the Macie discovery job

    Figure 30: Selecting the S3 bucket to include in the Macie discovery job

  5. Once you’ve found the S3 bucket, select the check box next to it, and then choose Next.
  6. On the Review S3 Buckets screen, if you’re satisfied with the selected buckets, choose Next.

Following are some important options when setting up Macie data discovery jobs.

Scheduling
You have the following scheduling options for the data discovery job:

  • Daily
  • Weekly
  • Monthly

Data Sampling
This allows you to randomly sample a percentage of the data that the Macie discovery job will run against.

Object criteria
This enables you to target objects based on certain metadata values. The values are:

  • Tags – Target objects with certain tags.
  • Last modified – Target objects based on when they were last modified.
  • File extensions – Target objects based on file extensions.
  • Object size – Target objects based on the file size.

You can include or exclude objects based on these object criteria filters.

Set the discovery job scope

For demonstration purposes, this will be a one-time discovery job.

To set the discovery job scope

  1. On the Scope page that appears after you create the job, set the following options for the job scope:
    1. Select the One-time job option.
    2. Leave Sampling depth set to 100%, and choose Next.

      Figure 31: Selecting the objects that should be in scope for this discovery job

      Figure 31: Selecting the objects that should be in scope for this discovery job

  2. On the Custom data identifiers screen, select account_number, and then choose Next. With a custom data identifier, you can apply custom business logic to look for certain patterns in files stored in S3. In this example, the job generates a finding for any file that contains data with the following format:

    Account Number Format: Starts with “XYZ-” followed by 11 numbers

    The logic to create a custom data identifier can be found in the CloudFormation template.

    Figure 32: Custom data identifiers

    Figure 32: Custom data identifiers

  3. Give your discovery job the name dynamodb-macie-discovery-job. For Description, enter Discovery job to detect sensitive data exported from DynamoDB, and choose Next.
    Figure 33: Giving the Macie discovery job a name and description

    Figure 33: Giving the Macie discovery job a name and description

    You will then see the Review and create screen, as shown in figure 34.

    Figure 34: The Macie discovery job review screen

    Figure 34: The Macie discovery job review screen

    Note: Macie must have proper permissions to decrypt objects that are part of the Macie discovery job. The CloudFormation template that you deployed during the initial setup has already deployed an AWS Key Management Service (AWS KMS) key with the proper permissions.

    For this proof of concept you won’t store the results, so you can select the check box next to Override this requirement. If you wanted to store detailed results of the discovery job long term, you would configure a repository for data discovery results. To view detailed steps for setting this up, see Storing and retaining discovery results with Amazon Macie.

Submit the discovery job

Next, you can submit the discovery job. On the Review and create screen, choose the Submit button to start the discovery job. You should see a screen similar to the following.

Figure 35: A Macie discovery job run that is in progress

Figure 35: A Macie discovery job run that is in progress

The amount of data that is being scanned dictates how long the job will take to run. You can choose the Refresh button at the top of the screen to see the updated status of the job. This job, based on the size of the test dataset, will take about seven minutes to complete.
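
If you later want to create the same kind of one-time job programmatically instead of through the console, a sketch with boto3 could look like the following; the account ID, bucket name, and custom data identifier ID are placeholders that you would replace with the values from your own account and CloudFormation stack output:

    import boto3

    macie = boto3.client("macie2")

    response = macie.create_classification_job(
        jobType="ONE_TIME",
        name="dynamodb-macie-discovery-job",
        description="Discovery job to detect sensitive data exported from DynamoDB",
        samplingPercentage=100,
        customDataIdentifierIds=["your-account-number-identifier-id"],  # placeholder ID
        s3JobDefinition={
            "bucketDefinitions": [
                {
                    "accountId": "111122223333",             # placeholder account ID
                    "buckets": ["your-export-bucket-name"],  # placeholder bucket name
                }
            ]
        },
    )
    print(response["jobId"])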

Review the job results

Now that the Macie discovery job has run, you can review the results to see what sensitive data was discovered in the data exported from DynamoDB.

You should see the following screen once the job has successfully run.

Figure 36: View of the completed Macie discovery job

Figure 36: View of the completed Macie discovery job

On the right, you should see another pane with more information related to the discovery job. The pane should look like the following screen.

Figure 37: Summary showing which S3 bucket the discovery job ran against and start and complete time

Figure 37: Summary showing which S3 bucket the discovery job ran against and start and complete time

Note: If you don’t see this pane, choose the discovery job to display this information.

To review the job results

  1. On the page for the discovery job, in the Show Results list, select Show findings.

    Figure 38: Option to view discovery job findings

    Figure 38: Option to view discovery job findings

  2. The Findings screen appears, as follows.
    Figure 39: Viewing the list of findings generated by the Macie discovery job

    Figure 39: Viewing the list of findings generated by the Macie discovery job

    The discovery job that you ran has two different “High Severity” finding types:

    SensitiveData:S3Object/Personal – The object contains personal information, such as full names or identification numbers.

    SensitiveData:S3Object/Multiple – The object contains more than one type of sensitive data.

    Learn more about Macie findings types.

  3. Choose the SensitiveData:S3Object/Personal finding type, and you will see an information pane appear to the right, as shown in figure 40. Some of the key information that you can find here:

    Severity – The severity of the finding: Low, Medium, or High.
    Resource – The S3 bucket that contains the object that generated the finding.
    Region – The Region where the S3 bucket exists.

    Figure 40: Viewing the severity of the discovery job finding

    Figure 40: Viewing the severity of the discovery job finding

    Since the finding is based on the detection of personal information in the S3 object, you get the number of times and type of personal data that was discovered, as shown in figure 41.

    Figure 41: Viewing the number of social security numbers that were discovered in the finding

    Figure 41: Viewing the number of social security numbers that were discovered in the finding

    Here you can see that 10 names were detected in the data that you exported from the DynamoDB table. The Occurrences of name field shows 10 line ranges, which tells you that the names were found on 10 different lines in the file. If you choose the 10 line ranges link, you are shown the starting line and column in the document where each name was discovered.

    The S3 object that triggered the finding is displayed in the Resource affected section, as shown in figure 42.

    Figure 42: The S3 object that generated the Macie finding

    Figure 42: The S3 object that generated the Macie finding

Now that you know which S3 object contains the sensitive data, you can investigate further to take appropriate action to protect the data.

View the Macie finding details

In this section, you will walk through how to read and download the objects related to the Macie discovery job.

To download and view the S3 object that contains the finding

  1. In the Overview section of the finding details, select the value for the Resource link. You will then be taken to the object in the S3 bucket.

    Figure 43: Viewing the S3 bucket where the object is located that generated the Macie finding

    Figure 43: Viewing the S3 bucket where the object is located that generated the Macie finding

  2. You can then download the S3 object from the S3 bucket and investigate its contents for sensitive data. Select the check box next to the S3 object, and choose the Download button at the top of the screen. Next, we will look at the SensitiveData:S3Object/Multiple finding type that was generated. This finding type lets us know that an object stored in S3 contains multiple types of potentially sensitive data.
  3. In the left navigation menu, navigate back to the Jobs menu.
  4. Choose the job that you created in the previous steps. In the Show Results list, select Show findings.
  5. Select the SensitiveData:S3Object/Multiple finding type. An information pane appears to the right. As with the previous finding, you will see the severity, Region, S3 bucket location, and other relevant information about the finding. For this finding, we will focus on the Custom data identifiers and Personal info sections.
    Figure 44: Details about the sensitive data that was discovered by the Macie discovery job

    Figure 44: Details about the sensitive data that was discovered by the Macie discovery job

    Here you can see that the discovery job found 10 names on 10 different lines in the file. Also, you can see that 10 account numbers were discovered on 10 different lines in the file, based on the custom identifier that was included as part of the discovery job.

    This finding demonstrates how you can use the built-in Macie identifiers, such as names, and also include custom business logic based on your organization’s needs by using Macie custom data identifiers.

    To view the data and investigate further, follow the same steps as in the previous finding you investigated.

  6. Navigate to the top of the screen and in the Overview section, locate the Resource.

    Figure 45: Viewing the S3 bucket where the object is located that generated the Macie finding

    Figure 45: Viewing the S3 bucket where the object is located that generated the Macie finding

  7. Choose Resource, which will take you to the S3 object to download. You can now view the contents of the file and investigate further.

You’ve now created a Macie discovery job to scan for sensitive data stored in an S3 bucket that originated in DynamoDB. You can also automate this solution further by using EventBridge rules to detect Macie findings to take actions against those objects with sensitive data.
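
As a starting point for that kind of automation, the following sketch creates an EventBridge rule that matches Macie finding events and sends them to a Lambda function; the rule name and function ARN are placeholders for resources you would create yourself:

    import json

    import boto3

    events = boto3.client("events")

    events.put_rule(
        Name="macie-findings-to-remediation",  # placeholder rule name
        EventPattern=json.dumps(
            {
                "source": ["aws.macie"],
                "detail-type": ["Macie Finding"],
            }
        ),
        State="ENABLED",
    )

    # The target Lambda function also needs a resource-based permission that allows
    # events.amazonaws.com to invoke it (for example, via lambda add-permission).
    events.put_targets(
        Rule="macie-findings-to-remediation",
        Targets=[
            {
                "Id": "remediation-function",
                "Arn": "arn:aws:lambda:us-east-1:111122223333:function:your-remediation-function",  # placeholder ARN
            }
        ],
    )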

Solution cleanup

To clean up the solution that you just deployed, complete the following steps. You need to do these steps to stop data from being exported from DynamoDB to S3 every hour.

To perform cleanup

  1. Navigate to the S3 buckets used to import and export data. You can find the bucket names in the CloudFormation Outputs tab in the console, as shown in figure 7 and figure 25.
  2. After you’ve navigated to each of the buckets, delete all objects from the bucket.
  3. Navigate to the CloudFormation console, and then delete the CloudFormation stack named macie-blog. After the stack is deleted, the solution will no longer be deployed in your AWS account.

Summary

After deploying the solution, we hope you have a better understanding of how you can use Macie to detect sensitive data from other data sources, such as DynamoDB, as outlined in this post. The following are links to resources that you can use to further expand your knowledge of Amazon Macie capabilities and features.

Additional resources

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Sheldon Sides

Sheldon is a Senior Solutions Architect, focused on helping customers implement native AWS security services. He enjoys using his experience as a consultant and running a cloud security startup to help customers build secure AWS Cloud solutions. His interests include working out, software development, and learning about the latest technologies.

Use Macie to discover sensitive data as part of automated data pipelines

Post Syndicated from Brandon Wu original https://aws.amazon.com/blogs/security/use-macie-to-discover-sensitive-data-as-part-of-automated-data-pipelines/

Data is a crucial part of every business and is used for strategic decision making at all levels of an organization. To extract value from their data more quickly, Amazon Web Services (AWS) customers are building automated data pipelines—from data ingestion to transformation and analytics. As part of this process, my customers often ask how to prevent sensitive data, such as personally identifiable information, from being ingested into data lakes when it’s not needed. They highlight that this challenge is compounded when ingesting unstructured data—such as files from process reporting, text files from chat transcripts, and emails. They also mention that identifying sensitive data inadvertently stored in structured data fields—such as in a comment field stored in a database—is also a challenge.

In this post, I show you how to integrate Amazon Macie as part of the data ingestion step in your data pipeline. This solution provides an additional checkpoint that sensitive data has been appropriately redacted or tokenized prior to ingestion. Macie is a fully managed data security and privacy service that uses machine learning and pattern matching to discover sensitive data in AWS.

When Macie discovers sensitive data, the solution notifies an administrator to review the data and decide whether to allow the data pipeline to continue ingesting the objects. If allowed, the objects will be tagged with an Amazon Simple Storage Service (Amazon S3) object tag to identify that sensitive data was found in the object before progressing to the next stage of the pipeline.

This combination of automation and manual review helps reduce the risk that sensitive data—such as personally identifiable information—will be ingested into a data lake. This solution can be extended to fit your use case and workflows. For example, you can define custom data identifiers as part of your scans, add additional validation steps, create Macie suppression rules to archive findings automatically, or only request manual approvals for findings that meet certain criteria (such as high severity findings).

Solution overview

Many of my customers are building serverless data lakes with Amazon S3 as the primary data store. Their data pipelines commonly use different S3 buckets at each stage of the pipeline. I refer to the S3 bucket for the first stage of ingestion as the raw data bucket. A typical pipeline might have separate buckets for raw, curated, and processed data representing different stages as part of their data analytics pipeline.

Typically, customers will perform validation and clean their data before moving it to a raw data zone. This solution adds validation steps to that pipeline after preliminary quality checks and data cleaning is performed, noted in blue (in layer 3) of Figure 1. The layers outlined in the pipeline are:

  1. Ingestion – Brings data into the data lake.
  2. Storage – Provides durable, scalable, and secure components to store the data—typically using S3 buckets.
  3. Processing – Transforms data into a consumable state through data validation, cleanup, normalization, transformation, and enrichment. This processing layer is where the additional validation steps are added to identify instances of sensitive data that haven’t been appropriately redacted or tokenized prior to consumption.
  4. Consumption – Provides tools to gain insights from the data in the data lake.

 

Figure 1: Data pipeline with sensitive data scan

Figure 1: Data pipeline with sensitive data scan

The application runs on a scheduled basis (four times a day, every 6 hours by default) to process data that is added to the raw data S3 bucket. You can customize the application to perform a sensitive data discovery scan during any stage of the pipeline. Because most customers do their extract, transform, and load (ETL) daily, the application scans for sensitive data on a scheduled basis before any crawler jobs run to catalog the data and after typical validation and data redaction or tokenization processes complete.

You can expect that this additional validation will add 5–10 minutes to your pipeline execution at a minimum. The validation processing time will scale linearly based on object size, but there is a start-up time per job that is constant.

If sensitive data is found in the objects, an email is sent to the designated administrator requesting an approval decision, which they indicate by selecting the link corresponding to their decision to approve or deny the next step. In most cases, the reviewer will choose to adjust the sensitive data cleanup processes to remove the sensitive data, deny the progression of the files, and re-ingest the files in the pipeline.

Additional considerations for deploying this application for regular use are discussed at the end of the blog post.

Application components

The following resources are created as part of the application:

Note: The application uses various AWS services, and there are costs associated with these resources after the Free Tier usage. See AWS Pricing for details. The primary drivers of the solution cost will be the amount of data ingested through the pipeline, both for Amazon S3 storage and data processed for sensitive data discovery with Macie.

The architecture of the application is shown in Figure 2 and described in the text that follows.
 

Figure 2: Application architecture and logic

Figure 2: Application architecture and logic

Application logic

  1. Objects are uploaded to the raw data S3 bucket as part of the data ingestion process.
  2. A scheduled EventBridge rule runs the sensitive data scan Step Functions workflow.
  3. triggerMacieScan Lambda function moves objects from the raw data S3 bucket to the scan stage S3 bucket.
  4. triggerMacieScan Lambda function creates a Macie sensitive data discovery job on the scan stage S3 bucket.
  5. checkMacieStatus Lambda function checks the status of the Macie sensitive data discovery job (a minimal sketch of this check follows the list).
  6. isMacieStatusCompleteChoice Step Functions Choice state checks whether the Macie sensitive data discovery job is complete.
    1. If yes, the getMacieFindingsCount Lambda function runs.
    2. If no, the Step Functions Wait state waits 60 seconds and then restarts Step 5.
  7. getMacieFindingsCount Lambda function counts all of the findings from the Macie sensitive data discovery job.
  8. isSensitiveDataFound Step Functions Choice state checks whether sensitive data was found in the Macie sensitive data discovery job.
    1. If there was sensitive data discovered, run the triggerManualApproval Lambda function.
    2. If there was no sensitive data discovered, run the moveAllScanStageS3Files Lambda function.
  9. moveAllScanStageS3Files Lambda function moves all of the objects from the scan stage S3 bucket to the scanned data S3 bucket.
  10. triggerManualApproval Lambda function tags and moves objects with sensitive data discovered to the manual review S3 bucket, and moves objects with no sensitive data discovered to the scanned data S3 bucket. The function then sends a notification to the ApprovalRequestNotification Amazon SNS topic as a notification that manual review is required.
  11. Email is sent to the email address that’s subscribed to the ApprovalRequestNotification Amazon SNS topic (from the application deployment template) for the manual review user with the option to Approve or Deny pipeline ingestion for these objects.
  12. Manual review user assesses the objects with sensitive data in the manual review S3 bucket and selects the Approve or Deny links in the email.
  13. The decision request is sent from the Amazon API Gateway to the receiveApprovalDecision Lambda function.
  14. manualApprovalChoice Step Functions Choice state checks the decision from the manual review user.
    1. If denied, run the deleteManualReviewS3Files Lambda function.
    2. If approved, run the moveToScannedDataS3Files Lambda function.
  15. deleteManualReviewS3Files Lambda function deletes the objects from the manual review S3 bucket.
  16. moveToScannedDataS3Files Lambda function moves the objects from the manual review S3 bucket to the scanned data S3 bucket.
  17. The next step of the automated data pipeline will begin with the objects in the scanned data S3 bucket.
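
To make the polling loop in steps 5 and 6 concrete, here is a minimal sketch of what a status-check function like checkMacieStatus could look like; the event shape is an assumption for illustration, and the application's actual code is in the sample repository:

    import boto3

    macie = boto3.client("macie2")


    def lambda_handler(event, context):
        # The workflow passes along the ID of the discovery job it started (assumed key name).
        job_id = event["jobId"]

        job = macie.describe_classification_job(jobId=job_id)

        # jobStatus is COMPLETE, RUNNING, PAUSED, and so on; the Choice state that
        # follows branches on this value.
        return {"jobId": job_id, "jobStatus": job["jobStatus"]}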

Prerequisites

For this application, you need the following prerequisites:

You can use AWS Cloud9 to deploy the application. AWS Cloud9 includes the AWS CLI and AWS SAM CLI to simplify setting up your development environment.

Deploy the application with AWS SAM CLI

You can deploy this application using the AWS SAM CLI. AWS SAM uses AWS CloudFormation as the underlying deployment mechanism. AWS SAM is an open-source framework that you can use to build serverless applications on AWS.

To deploy the application

  1. Initialize the serverless application using the AWS SAM CLI from the GitHub project in the aws-samples repository. This will clone the project locally which includes the source code for the Lambda functions, Step Functions state machine definition file, and the AWS SAM template. On the command line, run the following:
    sam init --location gh:aws-samples/amazonmacie-datapipeline-scan
    

    Alternatively, you can clone the GitHub project directly.

  2. Deploy your application to your AWS account. On the command line, run the following:
    sam deploy --guided
    

    Complete the prompts during the guided interactive deployment. The first deployment prompt is shown in the following example.

    Configuring SAM deploy
    ======================
    
            Looking for config file [samconfig.toml] :  Found
            Reading default arguments  :  Success
    
            Setting default arguments for 'sam deploy'
            =========================================
            Stack Name [maciepipelinescan]:
    

  3. Settings:
    • Stack Name – Name of the CloudFormation stack to be created.
    • AWS Region – The Region—for example, us-west-2, eu-west-1, ap-southeast-1—to deploy the application to. This application was tested in the us-west-2 and ap-southeast-1 Regions. Before selecting a Region, verify that the services you need are available there (for example, Macie and Step Functions).
    • Parameter StepFunctionName – Name of the Step Functions state machine to be created—for example, maciepipelinescanstatemachine.
    • Parameter BucketNamePrefix – Prefix to apply to the S3 buckets to be created (S3 bucket names are globally unique, so choosing a random prefix helps ensure uniqueness).
    • Parameter ApprovalEmailDestination – Email address to receive the manual review notification.
    • Parameter EnableMacie – Whether you need Macie enabled in your account or Region. Select yes if you need Macie to be enabled for you as part of this template, or no if you already have Macie enabled.
  4. Confirm changes and provide approval for AWS SAM CLI to deploy the resources to your AWS account by responding y to prompts, as shown in the following example. You can accept the defaults for the SAM configuration file and SAM configuration environment prompts.
    #Shows you resources changes to be deployed and require a 'Y' to initiate deploy
    Confirm changes before deploy [y/N]: y
    #SAM needs permission to be able to create roles to connect to the resources in your template
    Allow SAM CLI IAM role creation [Y/n]: y
    ReceiveApprovalDecisionAPI may not have authorization defined, Is this okay? [y/N]: y
    ReceiveApprovalDecisionAPI may not have authorization defined, Is this okay? [y/N]: y
    Save arguments to configuration file [Y/n]: y
    SAM configuration file [samconfig.toml]: 
    SAM configuration environment [default]:
    

    Note: This application deploys an Amazon API Gateway with two REST API resources without authorization defined to receive the decision from the manual review step. You will be prompted to accept each resource without authorization. A token (Step Functions taskToken) is used to authenticate the requests.

  5. This creates an AWS CloudFormation changeset. Once the changeset creation is complete, you must provide a final confirmation of y to Deploy the changeset? [y/N] when prompted as shown in the following example.
    Changeset created successfully. arn:aws:cloudformation:ap-southeast-1:XXXXXXXXXXXX:changeSet/samcli-deploy1605213119/db681961-3635-4305-b1c7-dcc754c7XXXX
    
    
    Previewing CloudFormation changeset before deployment
    ======================================================
    Deploy this changeset? [y/N]:
    

Your application is deployed to your account using AWS CloudFormation. You can track the deployment events in the command prompt or via the AWS CloudFormation console.

After the application deployment is complete, you must confirm the subscription to the Amazon SNS topic. An email will be sent to the email address entered in Step 3 with a link that you need to select to confirm the subscription. This confirmation provides opt-in consent for AWS to send emails to you via the specified Amazon SNS topic. The emails will be notifications of potentially sensitive data that need to be approved. If you don’t see the verification email, be sure to check your spam folder.

Test the application

The application uses an EventBridge scheduled rule to start the sensitive data scan workflow, which runs every 6 hours. You can manually start an execution of the workflow to verify that it’s working. To test the function, you will need a file that contains data that matches your rules for sensitive data. For example, it is easy to create a spreadsheet, document, or text file that contains names, addresses, and numbers formatted like credit card numbers. You can also use this generated sample data to test Macie.
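
If you want to script a quick test file instead of building one by hand, a small sketch like the following writes a CSV with values formatted like personal data. The card numbers below are standard, publicly documented test values, and whether Macie flags any given value depends on its managed data identifiers, so the linked sample data remains the most reliable input:

    import csv

    rows = [
        {"name": "Jane Doe", "address": "123 Any Street, Anytown, USA", "card_number": "4111 1111 1111 1111"},
        {"name": "John Smith", "address": "456 Main Street, Nowhere, USA", "card_number": "5500 0000 0000 0004"},
    ]

    with open("macie-test-data.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "address", "card_number"])
        writer.writeheader()
        writer.writerows(rows)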

We will test by uploading a file to our S3 bucket via the AWS web console. If you know how to copy objects from the command line, that also works.

Upload test objects to the S3 bucket

  1. Navigate to the Amazon S3 console and upload one or more test objects to the <BucketNamePrefix>-data-pipeline-raw bucket. <BucketNamePrefix> is the prefix you entered when deploying the application in the AWS SAM CLI prompts. You can use any objects as long as they’re a supported file type for Amazon Macie. I suggest uploading multiple objects, some with and some without sensitive data, in order to see how the workflow processes each.

Start the Scan State Machine

  1. Navigate to the Step Functions state machines console. If you don’t see your state machine, make sure you’re connected to the same region that you deployed your application to.
  2. Choose the state machine you created using the AWS SAM CLI as seen in Figure 3. The example state machine is maciepipelinescanstatemachine, but you might have used a different name in your deployment.
     
    Figure 3: AWS Step Functions state machines console

    Figure 3: AWS Step Functions state machines console

  3. Select the Start execution button and copy the value from the Enter an execution name – optional box. Change the Input – optional value, replacing <execution id> with the value you just copied, as follows:
    {
        "id": "<execution id>"
    }
    

    In my example, the <execution id> is fa985a4f-866b-b58b-d91b-8a47d068aa0c from the Enter an execution name – optional box, as shown in Figure 4. You can choose a different ID value if you prefer. This ID is used by the workflow to tag the objects being processed, to ensure that only objects that have been scanned continue through the pipeline. When the EventBridge rule starts the workflow on its schedule, an ID is included in the input to the Step Functions workflow. Then select Start execution again.
     

    Figure 4: New execution dialog box

    Figure 4: New execution dialog box

  4. You can see the status of your workflow execution in the Graph inspector as shown in Figure 5. In the figure, the workflow is at the pollForCompletionWait step.
     
    Figure 5: AWS Step Functions graph inspector

    Figure 5: AWS Step Functions graph inspector

The sensitive data discovery job should run for about five to ten minutes. The jobs scale linearly based on object size, but there is a start-up time per job that is constant. If sensitive data is found in the objects uploaded to the <BucketNamePrefix>-data-pipeline-raw S3 bucket, an email is sent to the address provided during the AWS SAM deployment step, notifying the recipient that an approval decision is needed. The recipient indicates their decision by selecting the link corresponding to approve or deny the next step, as shown in Figure 6.
 

Figure 6: Sensitive data identified email

Figure 6: Sensitive data identified email

When you receive this notification, you can investigate the findings by reviewing the objects in the <BucketNamePrefix>-data-pipeline-manual-review S3 bucket. Based on your review, you can either apply remediation steps to remove any sensitive data or allow the data to proceed to the next step of the data ingestion pipeline. You should define a standard response process to address discovery of sensitive data in the data pipeline. Common remediation steps include review of the files for sensitive data, deleting the files that you do not want to progress, and updating the ETL process to redact or tokenize sensitive data when re-ingesting into the pipeline. When you re-ingest the files into the pipeline without sensitive data, the files will not be flagged by Macie.

The workflow performs the following:

  • If you select Approve, the files are moved to the <BucketNamePrefix>-data-pipeline-scanned-data S3 bucket with an Amazon S3 SensitiveDataFound object tag with a value of true.
  • If you select Deny, the files are deleted from the <BucketNamePrefix>-data-pipeline-manual-review S3 bucket.
  • If no action is taken, the Step Functions workflow execution times out after five days and the file will automatically be deleted from the <BucketNamePrefix>-data-pipeline-manual-review S3 bucket after 10 days.

Clean up the application

You’ve successfully deployed and tested the sensitive data pipeline scan workflow. To avoid ongoing charges for resources you created, you should delete all associated resources by deleting the CloudFormation stack. In order to delete the CloudFormation stack, you must first delete all objects that are stored in the S3 buckets that you created for the application.

To delete the application

  1. Empty the S3 buckets created in this application (<BucketNamePrefix>-data-pipeline-raw S3 bucket, <BucketNamePrefix>-data-pipeline-scan-stage, <BucketNamePrefix>-data-pipeline-manual-review, and <BucketNamePrefix>-data-pipeline-scanned-data).
  2. Delete the CloudFormation stack used to deploy the application.

Considerations for regular use

Before using this application in a production data pipeline, you will need to stop and consider some practical matters. First, the notification mechanism used when sensitive data is identified in the objects is email. Email doesn’t scale: you should expand this solution to integrate with your ticketing or workflow management system. If you choose to use email, subscribe a mailing list so that the work of reviewing and responding to alerts is shared across a team.

Second, the application runs on a scheduled basis (every 6 hours by default). You should consider starting the application when your preliminary validations have completed and the data is ready for a sensitive data scan as part of your pipeline. You can modify the EventBridge rule to run in response to an Amazon EventBridge event instead of on a schedule.

Third, the application currently uses a 60-second Step Functions Wait state when polling for the Macie discovery job completion. In real-world scenarios, the discovery scan will take 10 minutes at a minimum, and possibly much longer. You should evaluate the typical execution times for your application and tune the polling period accordingly. This will help reduce costs related to running Lambda functions and log storage within CloudWatch Logs. The polling period is defined in the Step Functions state machine definition file (macie_pipeline_scan.asl.json) under the pollForCompletionWait state.

Fourth, the application currently doesn’t account for false positives in the sensitive data discovery job results, and it progresses or deletes all identified objects based on the reviewer’s decision. You should consider expanding the application to handle false positives through automation rather than manual review or intervention (such as deleting the files from the manual review bucket or removing the sensitive data tags that were applied).

Last, the solution will stop the ingestion of a subset of objects into your pipeline. This behavior is similar to other validation and data quality checks that most customers perform as part of the data pipeline. However, you should test to ensure that this will not cause unexpected outcomes and address them in your downstream application logic accordingly.

Conclusion

In this post, I showed you how to integrate sensitive data discovery using Macie as an additional validation step in an automated data pipeline. You’ve reviewed the components of the application, deployed it using the AWS SAM CLI, tested to validate that the application functions as expected, and cleaned up by removing deployed resources.

You now know how to integrate sensitive data scanning into your ETL pipeline. You can use automation and—where required—manual review to help reduce the risk of sensitive data, such as personally identifiable information, being inadvertently ingested into a data lake. You can take this application and customize it to fit your use case and workflows, such as using custom data identifiers as part of your scans, adding additional validation steps, creating Macie suppression rules to define cases to archive findings automatically, or only request manual approvals for findings that meet certain criteria (such as high severity findings).

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the Amazon Macie forum.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Brandon Wu

Brandon is a security solutions architect helping financial services organizations secure their critical workloads on AWS. In his spare time, he enjoys exploring outdoors and experimenting in the kitchen.

Discover sensitive data by using custom data identifiers with Amazon Macie

Post Syndicated from Kayla Jing original https://aws.amazon.com/blogs/security/discover-sensitive-data-by-using-custom-data-identifiers-with-amazon-macie/

As you put more and more data in the cloud, you need to rely on security automation to keep it secure at scale. AWS recently launched Amazon Macie, a fully managed service that uses machine learning and pattern matching to help you detect, classify, and better protect your sensitive data stored in the AWS Cloud.

Many data breaches are not the result of malicious activity from unauthorized users, but rather from mistakes made by authorized users. To monitor and manage the security of sensitive data, you must first be able to identify it. In this post, we show you how to use custom data identifiers with Macie to identify sensitive data. Once you know what’s sensitive, you can start designing security controls that operate at scale to monitor and remediate risk automatically.

Macie comes with a set of managed data identifiers that you can use to discover many types of sensitive data. These are somewhat generic and broadly applicable to many organizations. What makes Macie unique is its ability to help you address specific data needs. Macie enables you to expand your sensitive data detection through the new custom data identifiers. Custom data identifiers can be used to highlight organizational proprietary data, intellectual property, and specific scenarios.

Custom data identifiers in Macie help you find and identify sensitive data based on your own organization’s specific needs. In this post, we show you a step-by-step walkthrough of how to define and run custom data identifiers to automatically discover specific sensitive data. Before you begin using custom data identifiers, you need to enable Macie and configure detailed logging. If you haven’t done that already, follow these instructions to enable Macie and these instructions to configure detailed logging.

When to use the Custom Data Identifier resource

To begin, imagine you’re an IT administrator for a manufacturing company that’s headquartered in France. Your company has acquired a few additional local subsidiaries, including an R&D facility in São Paulo, Brazil. The company is migrating to AWS, and in the process is classifying registration information, employee information, and product data into encrypted and non-encrypted storage.

You want to identify sensitive data for the following three scenarios:

  • SIRET-NIC: SIRET-NIC is a unique number assigned to businesses in France. This number is issued by France’s National Institute of Statistics (INSEE) when a business is registered. A sample file that contains SIRET-NIC information is shown in the following figure. Each record in the file includes the GUID, employee name, employee email, company name, the date the number was issued, and the SIRET-NIC number.

    Figure 1: SIRET-NIC dataset

    Figure 1: SIRET-NIC dataset

  • Brazil CPF (Cadastro de Pessoas Físicas – Natural Persons Register): CPF is a unique number assigned by the Brazilian revenue agency to people subject to taxes in the country. Each of your employees residing in the Brazilian office has a CPF.
  • Prototyping naming convention: Your company has products that are publicly available, but also products that are still in the prototyping stage and should be kept confidential. A sample file that contains Brazil CPF numbers and the prototype names is shown in the following figure.

    Figure 2: Brazil CPF and prototype number dataset

    Figure 2: Brazil CPF and prototype number dataset

Configure the Custom Data Identifier resource in the Macie console

To use custom data identifiers to identify your organization’s sensitive information, you must:

  1. Create custom data identifiers.
  2. Create a job to scan your Amazon Simple Storage Service (Amazon S3) bucket to locate the data patterns that match your custom data identifiers.
  3. Respond to the returned results.

The following steps introduce you to the Custom Data Identifier resource in Macie.

Designing Custom Data Identifiers for use with Amazon Macie

In the previous section, you identified three scenarios that your company would like to protect: SIRET-NIC numbers, Brazil CPF numbers, and your prototyping naming convention. You now need to create a specific regex pattern for each of these scenarios. There are different syntaxes and dialects of regular expression languages. Amazon Macie supports a subset of the Perl Compatible Regular Expressions (PCRE) library; you can learn more about this in the Regex support in custom data identifiers section of the documentation. Once the patterns are ready, follow the instructions below to create the custom data identifiers.

Creating Custom Data Identifiers in Amazon Macie

  1. Sign in to the AWS Management Console.
  2. Enter Amazon Macie in the AWS services search box.
  3. Choose Amazon Macie.
  4. In the navigation pane on the left-hand side, under Settings, choose Custom data identifiers as shown in the following figure.

    Figure 3: Custom data identifiers console

    Figure 3: Custom data identifiers console

Create a custom data identifier

  1. Choose Create on the custom data identifier console.
  2. Name: Enter a name for your custom data identifier. Make it descriptive so you know what it does. For example, enter SIRET-NIC for the SIRET-NIC number you use.
  3. Description: Enter a description of the custom data identifier.
  4. Regular expression (regex): Define the pattern you want to identify. Use a regular expression (regex) to create the desired pattern. For example, a SIRET-NIC number contains 14 digits—9 numbers followed by a hyphen and then 5 more numbers. The first part, the 9 numbers, can stay together or be separated by spaces into 3 groups of 3. The specific regex pattern for this is \b(\d{3}\s?){2}\d{3}\-\d{5}\b
  5. Keywords: Define expressions that identify the text to match. The SIRET-NIC number itself is publicly accessible information. But in your case, you want to encrypt the information about companies that were registered during the month the acquisition happened (April 2020), so that the information doesn’t leak to your competitors. The keywords here will therefore be all the days in April.
  6. (Optional) Ignore words: Use this box to enter text that you want to be ignored. In this example scenario, you know that your security training materials always use the example SIRET-NICs 12345789-12345 and 000000000-00000. You can enter these values here so that your security training materials are not flagged as sensitive data containing SIRET-NICs.
  7. Maximum match distance: Use this box to define the proximity between the result and the keywords. If you enter 20, Macie will provide results that include the specified keyword and 20 characters on either side of it.

Note: Do not select Submit yet. After entering the settings and before selecting Submit, you should test your custom data identifier with sample data to confirm that it works.

With all the attributes set, your console will look like what is shown in Figure 4.

Figure 4: SIRET-NIC custom data identifier creation

Figure 4: SIRET-NIC custom data identifier creation
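
If you’d rather script this configuration, a minimal sketch with boto3 could look like the following; the values mirror the console settings above, except that the keyword list is abbreviated here (the scenario calls for every day in April 2020), and the exact keyword strings are an assumption:

    import boto3

    macie = boto3.client("macie2")

    response = macie.create_custom_data_identifier(
        name="SIRET-NIC",
        description="SIRET-NIC numbers for companies registered in April 2020",
        regex=r"\b(\d{3}\s?){2}\d{3}\-\d{5}\b",
        keywords=["April 1, 2020", "April 2, 2020"],         # abbreviated keyword list
        ignoreWords=["12345789-12345", "000000000-00000"],   # training-material examples to ignore
        maximumMatchDistance=20,
    )
    print(response["customDataIdentifierId"])

boto3 also exposes a test_custom_data_identifier operation that mirrors the console’s Evaluate panel described in the next section, if you want to script that check as well.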

Test your SIRET-NIC custom data identifier

Use the Evaluate section on the right-hand panel of the Macie console to confirm that the regex pattern and other configurations for your custom data identifier are correct.

Follow the steps below to use the Evaluate section.

  1. Enter test data in the sample data box.
  2. Select Submit. If the configuration is correct and your custom data identifier is ready, there will be one match per record in the file. The following figure is an example of the Evaluate section using test data. The test data has 3 records, and each record contains a GUID, the employee name, employee email, company name, the date the SIRET-NIC was issued, and the SIRET-NIC number.

    Figure 5: Evaluate, showing sample data

  3. After verifying your SIRET-NIC custom data identifier works in the Evaluate section, now select Submit on the New custom data identifier window to create the custom data identifier.
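
If you prefer to script this step rather than use the console, the same settings can be supplied through the Macie API. The following boto3 sketch is not part of the original walkthrough; the keyword and ignore-word values are illustrative placeholders for the scenario above, and you would list every date in April 2020 as a keyword.

import boto3

macie = boto3.client('macie2')

# Create the SIRET-NIC custom data identifier with the same settings used in the console.
response = macie.create_custom_data_identifier(
    name='SIRET-NIC',
    description='SIRET-NIC numbers for companies registered in April 2020',
    regex=r'\b(\d{3}\s?){2}\d{3}\-\d{5}\b',
    keywords=['01/04/2020', '02/04/2020', '03/04/2020'],  # illustrative; add the remaining April dates
    ignoreWords=['12345789-12345', '000000000-00000'],
    maximumMatchDistance=20,
)
print('Custom data identifier ID:', response['customDataIdentifierId'])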

Create a Brazil CPF Custom Data Identifier

Congrats on creating your first custom data identifier! Now use the same steps to create and test custom data identifiers for the Brazil CPF and prototyping naming convention scenarios. The Brazil CPF number usually shows up in the format of 000.000.000-00.

Use the following values for the Brazil CPF scenario, as shown in the following figure:

  • Name: Brazil CPF
  • Description: The format for Brazil CPF in our sample data is 000.000.000-00
  • Regular expression: \b(\d{3}\.){2}\d{3}\-\d{2}\b

    Figure 6: Brazil CPF custom data identifier

Create a Prototype Name Custom Data Identifier

Assume that your company has a strict and regular naming scheme for prototype part numbers: P, followed by a hyphen, and then 2 capital letters and 4 digits, for example P-AB1234. You want to identify objects in S3 that contain references to private prototype parts. This is a small pattern, so if you're not careful it will cause Macie to flag objects that do not actually contain one of your prototype numbers. We suggest adding \b at the beginning and the end of the regular expression. The \b symbol means a word boundary: whitespace, punctuation, or any other character that is not a letter or a number. With \b, you limit the pattern so that it only matches when the entire word matches. For example, P-AB1234 will match the pattern, but STEP-AB123456 and P-XY123 will not. This gives you finer-grained control and reduces false positives.
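
Before entering the pattern in Macie, you can sanity-check the word-boundary behavior locally. This is a quick, illustrative Python check using the standard re module; it is not required by Macie.

import re

# Illustrative check of the word-boundary behavior described above.
pattern = re.compile(r'\bP\-[A-Z]{2}\d{4}\b')

for sample in ['P-AB1234', 'STEP-AB123456', 'P-XY123']:
    print(sample, '->', 'match' if pattern.search(sample) else 'no match')

# Prints:
# P-AB1234 -> match
# STEP-AB123456 -> no match
# P-XY123 -> no match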

Use the following values for the prototyping name scenario, as shown in the following figure:

  • Name: Prototyping Naming
  • Description: Any prototype name starting with P is private. The format for a private prototype name is P, a hyphen, 2 capital letters, and 4 numbers
  • Regular expression: \bP\-[A-Z]{2}\d{4}\b

    Figure 7: Prototyping naming custom data identifier

You should now see a page like the following figure, indicating that the SIRET-NIC, Brazil CPF, and Prototyping Naming custom data identifiers are successfully configured.

Figure 8: Successfully configured custom data identifier

Set up a Test Bucket to Demonstrate Macie

Before we can see Macie do its work, we have to create a bucket with some test data that we can scan. We’ve provided some sample data files that you can download. Follow these instructions to create a test bucket and load our test data into the test bucket.

  1. Download the sample data and unzip it.
  2. Sign in to the AWS Management Console and open the Amazon S3 console at https://console.aws.amazon.com/s3/.
  3. Choose Create bucket. The Create bucket wizard opens.
  4. In Bucket name, enter a DNS-compliant name for your bucket. The bucket name must:
    • Be unique across all of Amazon S3.
    • Be between 3 and 63 characters long.
    • Not contain uppercase characters.
    • Start with a lowercase letter or number.

    We created a bucket called bucketformacieuse; because bucket names are globally unique, you will have to choose a different name.

  5. In Region, choose the AWS Region where you want the bucket to reside.
  6. Select Create to finish creating the bucket.
  7. Open the bucket you just created and upload the two Excel files you downloaded in step 1.
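
If you'd rather script the bucket setup, the following boto3 sketch performs the same steps. The bucket name and file names are placeholders; substitute your own globally unique bucket name and the file names from the sample data you downloaded.

import boto3

s3 = boto3.client('s3')
bucket = 'your-unique-macie-test-bucket'  # placeholder; bucket names must be globally unique

# Create the bucket (outside us-east-1, also pass
# CreateBucketConfiguration={'LocationConstraint': '<your-region>'}).
s3.create_bucket(Bucket=bucket)

# Upload the unzipped sample files (placeholder names).
for filename in ['sample-file-1.xlsx', 'sample-file-2.xlsx']:
    s3.upload_file(filename, bucket, filename)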

Use Macie to create a job to scan your data

Now you can create a job to scan your Amazon S3 bucket to detect and locate the data patterns defined in the SIRET-NIC, Brazil CPF, and Prototyping Naming custom data identifiers.

To create a job

  1. In the navigation pane, choose Jobs, and then select Create Job on the upper right.
  2. Select Amazon S3 buckets: Select the S3 bucket you want to analyze. In this case, we are using the bucket previously created, bucketformacieuse.
  3. Review Amazon S3 buckets: Verify that you selected the S3 bucket you want the job to scan and analyze.
  4. Scope: Select your scope. For this example, choose the One-time job option as your scope. The scope specifies how often you want the job to run. This can be either a one-time job or a scheduled job. If you choose a scheduled job, you can define how often you want your job to scan your Amazon S3 bucket.
  5. Custom data identifiers: Select the 3 custom data identifiers you created to be associated with this job, and then select Next. This is shown in the following figure.

    Figure 9: Select your custom data identifiers

  6. Name and description: Enter a name and description for the job.
  7. Review and create: Review and verify all your settings, and then select Create.

You now have a job in Macie to scan the Amazon S3 buckets you’ve chosen using the 3 custom data identifiers you created. More information about creating jobs is available in Running sensitive data discovery jobs in Amazon Macie.
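
The console steps above can also be expressed through the API. Here is a minimal boto3 sketch of a one-time job; the custom data identifier IDs, account ID, and bucket name are placeholders.

import boto3

macie = boto3.client('macie2')

# One-time job scoped to a single bucket, using the three custom data identifiers.
job = macie.create_classification_job(
    name='scan-bucketformacieuse',
    description='One-time scan using the SIRET-NIC, Brazil CPF, and Prototyping Naming identifiers',
    jobType='ONE_TIME',
    customDataIdentifierIds=['<siret-nic-id>', '<brazil-cpf-id>', '<prototype-name-id>'],
    s3JobDefinition={
        'bucketDefinitions': [
            {'accountId': '111122223333', 'buckets': ['bucketformacieuse']}
        ]
    },
)
print('Created job:', job['jobId'])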

Respond to results

Macie can only help you stay secure if you respond effectively to the findings it produces. For our example, we'll show you how to review your findings manually. You can look at your findings by bucket, type, or job, or see a collective summary of all findings. In this example, let's look at all findings.

To review your results

  1. In the navigation pane on the left-hand side, choose Findings. Findings include the severity, the type, the resources affected, and when the findings were last updated.
  2. The following figure shows an example of the results you might see on the findings page. There are two findings for the selected job. The compagnie_français.csv and the empresa_brasileira.csv files contain the custom data identifiers that you created earlier and added to the job.

    Figure 10: Findings

  3. Let's look at the details of one of the findings so you can review the results. From the findings page, select the file that contains your custom data identifier for the Brazil CPF: empresa_brasileira.csv. The number of custom data identifiers found in the document is shown in the Result section on the right, as shown in the following figure.

    Figure 11: Findings detail page for the Brazil CPF custom data identifiers

  4. Now look at the findings details for the compagnie_français.csv file. It shows the number of custom data identifiers found in the file. In this case Macie found 13 SIRET-NIC numbers as shown in the following figure.

    Figure 12: Findings page for the French company file

  5. If you configured detailed logging, the results will be saved in the Amazon S3 bucket you specified. The S3 bucket location can be found in the Details section after Detailed result location as shown in the preceding figure.

Now that you’ve used Macie and the Custom Data Identifiers resource to obtain these findings, you can identify what data to place in encrypted storage, and what can be placed in non-encrypted storage when migrating to AWS. Macie and custom data identifiers provide an automated tool to help you enhance protection of your sensitive data by providing you the information to help detect and classify your data in the AWS Cloud.

Using Macie at Scale

Custom data identifiers help you tell Macie what to look for. As you move more and more data to the cloud, you'll need to create new identifiers and new rules. As your rules and identifiers grow, you will also need automation that responds to what is found. For example, a Lambda function might turn on default encryption for a bucket when it finds sensitive data in that bucket, or automatically apply tags to buckets where sensitive data is found so that those buckets and their owners appear on reports for audit and compliance. Once you've done this at small scale, think about how you will automate responses at larger scale.
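
As a hedged sketch of what that automation could look like: the Lambda function below assumes it is triggered by an EventBridge rule that matches Macie finding events (source aws.macie), reads the affected bucket name from the finding's resourcesAffected section, then turns on default encryption and applies a tag. The event shape and tag names are assumptions to adapt to your environment; note that put_bucket_tagging replaces any existing tag set.

import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Assumption: the EventBridge event detail is a Macie finding with resourcesAffected populated.
    bucket = event['detail']['resourcesAffected']['s3Bucket']['name']

    # Turn on default encryption for the bucket where sensitive data was found.
    s3.put_bucket_encryption(
        Bucket=bucket,
        ServerSideEncryptionConfiguration={
            'Rules': [{'ApplyServerSideEncryptionByDefault': {'SSEAlgorithm': 'AES256'}}]
        },
    )

    # Tag the bucket so it shows up on audit and compliance reports.
    # Note: this call replaces the bucket's existing tag set.
    s3.put_bucket_tagging(
        Bucket=bucket,
        Tagging={'TagSet': [{'Key': 'sensitive-data-found', 'Value': 'true'}]},
    )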

Conclusion

The new Custom Data Identifier resource in the newly enhanced Macie can help you detect, classify, and protect sensitive data types unique to your organization. This post focused on the functionality and use of custom data identifiers to automatically discover sensitive data stored in Amazon S3. You can also review the managed data identifiers to see a list of personally identifiable information (PII) that Macie can detect by default. Visit What is Amazon Macie? to learn more.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the Amazon Macie forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Kayla Jing

Kayla is a Solutions Architect at Amazon Web Services based out of Seattle. She has experience in data science with a focus on Data Analytics and Machine Learning.

Author

Joshua Choung

Joshua is a Solutions Architect based out of Seattle. He works with customers to provide architectural and technical guidance and training on their AWS cloud journey.

Author

Laura Reith

Laura is a Solutions Architect at Amazon Web Services. Before AWS, she worked as a Solutions Architect in Taiwan focusing on physical security and retail analytics.

BBVA: Architecture for Large-Scale Macie Implementation

Post Syndicated from Neel Sendas original https://aws.amazon.com/blogs/architecture/bbva-architecture-for-large-scale-macie-implementation/

This post was co-written by Andrew Alaniz, Technical Information Security Officer, and Brady Pratt, Cloud Security Engineer, both at BBVA USA.

Introduction

Data Loss Prevention (DLP) is a common topic among companies that work with any type of sensitive data. One of the challenges is that many people either don't fully understand what DLP is, or have their own definition of it. Regardless of one's interpretation of DLP, one thing is certain: before you can control data loss, you need to find the data sources.

If an organization can’t identify its data, it can’t protect it. BBVA USA, a bank holding company, turned to AWS for advice, and decided to use Amazon Macie to accomplish this in Amazon Simple Storage Service (Amazon S3). Macie is a fully managed data security and data privacy service that uses machine learning and pattern matching to discover and protect your sensitive data in AWS. This blog post will share some of the design and architecture we used to deploy Macie using services, such as AWS Lambda and Amazon CloudWatch.

Data challenges in Amazon S3

Although all S3 buckets are private by default, everyone is aware of the challenges of unsecured S3 buckets exposing data publicly. Amazon has provided a way to prevent that by removing the ability to make buckets public. As with other data storage mechanisms, this doesn't stop anyone from storing sensitive data within AWS and exposing it another way. With Macie, you can classify the data stored in S3 centrally, across accounts, through AWS Organizations.

Recommended architecture

We can break the Macie architecture into two main parts: S3 discovery and evaluation, and S3 sensitive data discovery:

Macie architecture

The setup of discovery and evaluation is simple and straightforward, and should be enabled through AWS Organizations and across all accounts. The cost of this piece is minimal, and it provides valuable insights into the compliance state of S3 buckets.

Once setup of discovery and evaluation is completed, we are ready to move to the next step and configure discovery jobs for our S3 buckets. The architecture includes the use of S3, Amazon CloudWatch Events, Amazon EventBridge, and Lambda. All of the execution should happen in a centralized account, but the event triggers should come from each individual account.

Architectural considerations

When determining the architectural design of the solution, consider a few main components:

Centralization

Utilize AWS Organizations: Macie integrates natively with AWS Organizations, which is a significant advantage. Within AWS Organizations, Macie also allows the delegation of the Macie master account to a subordinate account. The benefit of this is that it allows centralized management while allowing for the compartmentalization of roles.

Ease of management

One of the most challenging things to manage is non-conforming configurations. It’s much easier to manage a standard way to create, name, and configure settings. Once the classification jobs were ready to be created, we had to take into consideration the following when deploying Macie for our use case:

  1. Macie classifies content in a single job across one account.
  2. If you submit multiple jobs that contain the same bucket, Macie will scan the objects multiple times.
  3. Macie jobs are immutable.

Due to these considerations, we decided to create one job per S3 bucket. This allows administrators to search more easily for jobs related to findings.

Cost considerations

Macie plays an essential role, not only in identifying data and improving data collection, but also in compliance. We needed to make a decision about how to determine if an S3 bucket would be included in a classification job. Initially, we considered including all buckets no matter what. The logic here was that even if we make an assumption that a bucket would never have sensitive data in it, an entity with the right role could always add something at a later date.

Finally, we implemented a solution to tag specific buckets that were known to have immutable properties and which would never allow sensitive data to be added. We could do this because we knew exactly what data was in the bucket, who or what created the bucket, and exactly who or what had access to the bucket.

An example of this type of bucket is the S3 bucket used to store VPC Flow Logs. We know that this bucket is only created by provisioning scripts and is only going to store VPC flow logs that contain no sensitive data based on data classification standards. Also, only VPC services and specific security services can access this bucket for anything other than READ. This is controlled organizationally and can be tagged with a simple ignore key/value pair upon creation.

Deploying Macie at BBVA USA

BBVA USA developed an approach to working within AWS that allows guardrails to be applied as accounts are created. One of those guardrails identifies if developers have stored sensitive data in an account. BBVA needed to be able to do this, and do it at scale. If there is a roadblock or a challenge with AWS services, the first place BBVA looks is to support, but the second place is the Technical Account Manager.

After initiating conversations with its account team, BBVA determined that Amazon Macie was the tool to help them with this challenge.

With the help of its technical account manager (TAM), BBVA was able to meet with the Macie Product team and discuss the best options for deploying at scale. Through these conversations, they were even able to influence the Macie product roadmap.

Getting Macie ready to deploy at scale was actually quite simple once the architectural pattern was designed.

Initial job creation

In order to set up jobs for each existing bucket in the organization, it’s a matter of scripting the job creation and adding each bucket from each account into its own job, which is pretty straightforward.

Job creation for new buckets

The recommended architecture and implementation for new buckets:

  1. Whenever a new S3 bucket is added to Organization accounts, trigger a CloudWatch Event in the target account.
  2. Set up a cross account EventBridge to consume the Event. Using the EventBridge allows for a simpler configuration and centralized management of both Events and Lambda.
  3. Trigger a Lambda function in a delegated Macie admin account, which creates classification jobs to apply Macie to all the newly created S3 buckets.
  4. Repeat the same process when a bucket is deleted by triggering a cancel job.
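
As an illustrative sketch of step 3, the Lambda function below runs in the delegated Macie admin account and assumes it receives the CloudTrail CreateBucket event forwarded through EventBridge; it reads the bucket name and source account from the event and creates a one-time classification job for just that bucket. The event field names and the job settings are assumptions to adjust for your environment.

import boto3

macie = boto3.client('macie2')

def lambda_handler(event, context):
    detail = event['detail']
    # Assumption: a CloudTrail CreateBucket event delivered via EventBridge.
    bucket_name = detail['requestParameters']['bucketName']
    account_id = detail['recipientAccountId']

    # Create a classification job dedicated to the new bucket (one job per bucket).
    macie.create_classification_job(
        name=f'scan-{bucket_name}',
        jobType='ONE_TIME',
        s3JobDefinition={
            'bucketDefinitions': [
                {'accountId': account_id, 'buckets': [bucket_name]}
            ]
        },
    )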

Evaluate the state of S3 buckets

To evaluate the S3 buckets across accounts, turn on Macie in the organization master account and delegate administration to a subordinate account used for Macie. This consolidates security features into a centralized security account and limits how many people need access to the master billing account. Finally, enable Macie by default on all organization accounts.

Evaluate the state of S3 buckets

Conclusion

BBVA USA worked directly with the Macie product team by leveraging its relationship with the AWS account team and Enterprise Support. This allowed the company to eventually deploy Macie quickly and at scale. Through Macie, the company is able to track any changes to configurations on buckets that allow a bucket to be public, shared, or replicated with external accounts and if the encryption policies are disabled. Using Macie, BBVA was able to identify buckets that contained sensitive information and put in another control to bolster its AWS governance profile.

New – Enhanced Amazon Macie Now Available with Substantially Reduced Pricing

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-enhanced-amazon-macie-now-available/

Amazon Macie is a fully managed service that helps you discover and protect your sensitive data, using machine learning to automatically spot and classify data for you.

Over time, Macie customers told us what they like, and what they didn’t. The service team has worked hard to address this feedback, and today I am very happy to share that we are making available a new, enhanced version of Amazon Macie!

This new version has simplified the pricing plan: you are now charged based on the number of Amazon Simple Storage Service (S3) buckets that are evaluated, and the amount of data processed for sensitive data discovery jobs. The new tiered pricing plan has reduced the price by 80%. With higher volumes, you can reduce your costs by more than 90%.

At the same time, we have introduced many new features:

  • Expanded sensitive data discovery, including updated machine learning models for personally identifiable information (PII) detection, and customer-defined sensitive data types using regular expressions.
  • Multi-account support with AWS Organizations.
  • Full API coverage for programmatic use of the service with AWS SDKs and AWS Command Line Interface (CLI).
  • Expanded regional availability to 17 Regions.
  • A new, simplified free tier and free trial to help you get started and understand your costs.
  • A completely redesigned console and user experience.

Macie is now tightly integrated with S3 in the backend, providing more advantages:

  • Enabling S3 data events in AWS CloudTrail is no longer a requirement, further reducing overall costs.
  • There is now a continual evaluation of all buckets, issuing security findings for any public bucket, unencrypted buckets, and for buckets shared with (or replicated to) an AWS account outside of your Organization.

The anomaly detection features monitoring S3 data access activity previously available in Macie are now in private beta as part of Amazon GuardDuty, and have been enhanced to include deeper capabilities to protect your data in S3.

Enabling Amazon Macie
In the Macie console, I select Enable Macie. If you use AWS Organizations, you can delegate an AWS account to administer Macie for your Organization.

After it has been enabled, Amazon Macie automatically provides a summary of my S3 buckets in the region, and continually evaluates those buckets to generate actionable security findings for any unencrypted or publicly accessible data, including buckets shared with AWS accounts outside of my Organization.

Below the summary, I see the top findings by type and by S3 bucket. Overall, this page provides a great overview of the status of my S3 buckets.

In the Findings section I have the full list of findings, and I can select them to archive, unarchive, or export them. I can also select one of the findings to see the full information collected by Macie.

Findings can be viewed in the web console and are sent to Amazon CloudWatch Events for easy integration with existing workflow or event management systems, or to be used in combination with AWS Step Functions to take automated remediation actions. This can help meet regulations such as Payment Card Industry Data Security Standard (PCI-DSS), Health Insurance Portability and Accountability Act (HIPAA), General Data Privacy Regulation (GDPR), and California Consumer Protection Act (CCPA).

In the S3 Buckets section, I can search and filter on buckets of interest to create sensitive data discovery jobs across one or multiple buckets to discover sensitive data in objects, and to check encryption status and public accessibility at object level. Jobs can be executed once, or scheduled daily, weekly, or monthly.

For jobs, Amazon Macie automatically tracks changes to the buckets and only evaluates new or modified objects over time. In the additional settings, I can include or exclude objects based on tags, size, file extensions, or last modified date.

To monitor my costs, and the use of the free trial, I look at the Usage section of the console.

Creating Custom Data Identifiers
Amazon Macie supports natively the most common sensitive data types, including personally identifying information (PII) and credential data. You can extend that list with custom data identifiers to discover proprietary or unique sensitive data for your business.

For example, companies often have a specific syntax for their employee IDs. A possible syntax is a capital letter that indicates whether this is a full-time or a part-time employee, followed by a dash and then eight digits. Possible values in this case are F-12345678 or P-87654321.

To create this custom data identifier, I enter a regular expression (regex) to describe the pattern to match:

[A-Z]-\d{8}

To avoid false positives, I ask that the employee keyword is found near the identifier (by default, less than 50 characters apart). I use the Evaluate box to test that this configuration works with sample text, then I select Submit.
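
The Evaluate box is backed by the TestCustomDataIdentifier API, so you can run the same check programmatically. The boto3 sketch below is illustrative; the sample text is an assumption.

import boto3

macie = boto3.client('macie2')

# Test the employee ID pattern against sample text before saving the identifier.
result = macie.test_custom_data_identifier(
    regex=r'[A-Z]-\d{8}',
    keywords=['employee'],
    maximumMatchDistance=50,
    sampleText='New employee record created: F-12345678 (full-time).',
)
print('Matches found:', result['matchCount'])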

Available Now
For Amazon Macie regional availability, please see the AWS Region Table. You can find more information on the new, enhanced Macie in the documentation.

This release of Amazon Macie remains optimized for S3. However, anything you can get into S3, permanently or temporarily, in an object format supported by Macie, can be scanned for sensitive data. This allows you to expand the coverage to data residing outside of S3 by pulling data out of custom applications, databases, and third-party services, temporarily placing it in S3, and using Amazon Macie to identify sensitive data.

For example, we’ve made this even easier with RDS and Aurora now supporting snapshots to S3 in Apache Parquet, which is a format Macie supports. Similarly, in DynamoDB, you can use AWS Glue to export tables to S3 which can then be scanned by Macie. With the new API and SDKs coverage, you can use the new enhanced Amazon Macie as a building block in an automated process exporting data to S3 to discover and protect your sensitive data across multiple sources.

Danilo

AWS Security Profiles: Dan Plastina, VP of Security Services

Post Syndicated from Becca Crockett original https://aws.amazon.com/blogs/security/aws-security-profiles-dan-plastina-vp-security-services/

In the weeks leading up to re:Invent 2019, we’ll share conversations we’ve had with people at AWS who will be presenting at the event so you can learn more about them and some of the interesting work that they’re doing.


How long have you been at AWS, and what do you do as the VP of Security Services?

I’ve been at Amazon for just over two years. I lead the External Security Services organization—our team builds AWS services that help customers improve the security of their workloads. Our services include Amazon Macie, Amazon GuardDuty, Amazon Inspector, and AWS Security Hub.

What drew me to Amazon is the culture of ownership and accountability. I wake up every day and get to help AWS customers do things that transform their world—and I get to do that work with a whole bunch of people who feel the same way and take the same level of ownership. It’s very energizing.

What’s your favorite part of your job?

That’s hard! I love most aspects of my job. Forced to pick one, I’d have to say my favorite part is helping customers. Our Shared Responsibility Model says that AWS is accountable for the security of AWS, and customers are responsible for the resources and workloads they manage in AWS. My job allows me to sit on the customer side of the shared responsibility model. Our team builds the services that help customers improve the security of their workloads on AWS. Being able to help in that way is very rewarding.

One of Amazon’s widely-known leadership principles is Customer Obsession. Can you speak to what that looks like in the context of your work?

Being customer obsessed means that you’re in tune with the needs of the customer you’re working with. In the case of external security services, “customer obsessed” requires you to deeply understand what it means for individual customers to protect their assets in AWS, to empathize with those needs, and then to help them figure out how to get from where they are, to where they want to be. Because of this, I spend a lot of time with customers.

Our team participates in many in-person executive customer briefings. We hold a lot of conference calls. I’m flying to the UK on Monday to meet with customers—and I was there three weeks ago. I’ve spent over six weeks this fall traveling to talk with customers.  That much travel time can be hard, but it’s necessary to be in front of customers and listen to what they tell us. I’m fortunate to have a really strong team and so when I’m not traveling, I’m still able to spend a lot of time thinking about customer needs and about what my team should do next.

You’re on an elevator packed with CISOs, and they want you to explain the difference between Security Hub, GuardDuty, Macie, and Inspector before the doors open. What do you say?

First, I would tell them that the services are best understood as a suite of security services, and that AWS Security Hub offers a single pane of glass [Editor: a management tool that integrates information and offers a unified view] into everything else: Use it to understand the severity and sensitivity of findings across the other services you’re using.

Amazon GuardDuty is a continuous security monitoring and threat detection service. You simply choose to have it on or off in your AWS accounts. When it’s enabled, it detects highly suspicious activity and unauthorized access across the entirety of your AWS workloads. While GuardDuty alerts you to potential threats, Amazon Inspector helps you ensure that you address publicly known software vulnerabilities in a timely manner, removing them as a potential entry point for unauthorized users. Amazon Macie offers a particular focus on protecting your sensitive data by giving you a highly scalable and cost effective way to scan AWS for sensitive data and report back what is found and how it is being protected with access controls and encryption.

Then, I’d invite the entire elevator to come to re:Invent, to learn more about the new work my team is doing.

What can you tell us about your team’s re:Invent plans?

We have some exciting things planned for re:Invent this year. I can’t go into specifics yet, but we’re excited about it. A lot of my team will be present, and we’re looking forward to speaking with customers and learning more about what we should work on for next year.

We’ve got a variety of sessions about Security Hub, GuardDuty, and Inspector. If you can only make it to three security-specific sessions, I recommend Threat management in the cloud (SEC206-R), Automating threat detection and response in AWS (SEC301-R), and Use AWS Security Hub to act on your compliance and security posture (SEC342-R).

Is there some connecting thread to all of the various projects that your teams are working on right now?

I see a few threads. One is the concept of security being priority zero. It’s a theme that we live by at AWS, but customers ask us to stretch a little bit further and include their workloads in our security considerations. So workload security is now priority zero too. We’re spending a lot of time working that out and looking for ways to improve our services.

Another thread is that customers are asking us for prescriptive guidance. They’re saying, “Just tell me how I can ensure that my environment is safe. I promise you won’t offend me. Guide me as much as you can, and I’ll disregard anything that isn’t relevant to my environment.”

What’s one currently available security feature that you wish more customers were aware of?

A service, not a feature: AWS Security Hub. It has the ability to bring together security findings from many different AWS, partner, and customer security detection services. Security Hub takes security findings and normalizes them into the AWS Security Finding Format, ASFF, and then sends them all back out through Amazon CloudWatch Events to many partners that are capable of consuming them.

I think customers underestimate the value of having all of these security events normalized into a format that they can use to write a Splunk Phantom runbook, for example, or a Demisto runbook, or a Lambda function, or to send it to Rapid7 or cut a ticket in Jira. There’s a lot of power in what Security Hub does and it’s very cost effective. Many customers have started to use these capabilities, but I know that not everyone knows about it yet.

How do you stay up to date on important cloud security developments across the industry?

I get a lot of insight from customers. Customers have a lot of questions, and I can take these questions as a good indicator of what’s on peoples’ minds. I then do the research needed to get them smart answers, and in the process I learn things myself.

I also subscribe to a number of newsletters, such as Last Week In AWS, that give some interesting information about what’s trending. Reading our AWS blogs also helps because just keeping up with AWS is hard. There’s a lot going on! Listening to the various feeds and channels that we have is very informative.

And then there's tinkering. I tinker with home automation / Internet of Things projects and with vendor-provided offers, such as those recently provided to me by Splunk, Palo Alto, and Check Point. It's been fun learning partner offerings by building out VLANs, site-to-site tunnels, VPNs, DNS filters, SSL inspection, gateway-level anonymizers, central logging, and intrusion detection systems. You know, the home networking 'essentials' we all need.

You’re into riding Superbikes as a hobby. What’s the appeal?

I ride fast bikes on well-known race tracks all around the US several times a year. I love how speed and focus must come together. Going through different corners requires orchestrating all kinds of different motor and mental skills. It flushes the brain and clears your thoughts like nothing else. So, I appreciate the hobby as a way of escaping from normal day-to-day routine. Honestly, there’s nothing like doing 160 mph down a straightaway to teach you how to focus on what is needed, now.

You’re originally from Montreal. What’s one thing a visitor should eat on a trip there?

Let me give you two, eh. If you find yourself in a small rural Quebec restaurant, you must have poutine, the local 'delicacy'. If you find yourself downtown, near my alma mater, Concordia University, you must enjoy our local student staple, Kojax. That said, it's honestly hard to make a mistake when you're eating in Montreal. They have a lot of good food there.

Want more AWS Security news? Follow us on Twitter.

The AWS External Security Services team is hiring! Want to find out more? Check out our career page.

Dan Plastina

Dan is Vice President, Head of Security Services for Amazon Web Services (AWS). He's most often seen working alongside his team leaders on product design, management, and engineering development efforts to enable business and government customers to secure themselves when using AWS. He has travelled extensively, meeting C-suite and security leaders in all corners of the globe.

Nine AWS Security Hub best practices

Post Syndicated from Ketan Srivastava original https://aws.amazon.com/blogs/security/nine-aws-security-hub-best-practices/

AWS Security Hub is a security and compliance service that became generally available on June 25, 2019. It provides you with extensive visibility into your security and compliance status across multiple AWS accounts, in a single dashboard per region. The service helps you monitor critical settings to ensure that your AWS accounts remain secure, allowing you to notice and react quickly to any changes in your environment.

AWS Security Hub aggregates, organizes, and prioritizes security findings from supported AWS services—that’s Amazon GuardDuty, Amazon Inspector, and Amazon Macie at the time this post was published—and from various AWS partner security solutions. AWS Security Hub also generates its own findings, based on automated, resource-level and account-level configuration and compliance checks using service-linked AWS Config rules plus other analytic techniques. These checks help you keep your AWS accounts compliant with industry standards and best practices, such as the Center for Internet Security (CIS) AWS Foundations standard.

In this post, I’ll provide nine best practices to help you use AWS Security Hub as effectively as possible.

1. Use the AWS Labs script to turn on Security Hub in all your AWS accounts in all regions and to establish your existing Amazon GuardDuty master/member hierarchy

As a best practice, you should continuously monitor all regions across all of your AWS accounts for unauthorized behavior or misconfigurations, even in regions that you don’t use heavily. AWS already recommends that you do this when using monitoring services like AWS Config and AWS CloudTrail. I recommend that you enable Security Hub in every region available in your AWS accounts.

In addition, you can also invite other AWS accounts to enable Security Hub and share findings with your AWS account. If you send an invitation and it is accepted by the other account owner, your Security Hub account is designated as the master account, and any associated Security Hub accounts become your member accounts. Users from the master account will then be able to view Security Hub findings from member accounts.

To simplify these configurations, you can utilize the AWS Labs script available on GitHub, which provides a step-by-step guide to automate this process. This script allows you to enable (and disable) AWS Security Hub simultaneously across a list of associated AWS accounts and bulk-add them to become your Security Hub members; it sends invitations from the master account and automatically accepts invitations in all member accounts. To run the script, you must have the AWS account IDs and root email addresses of the AWS accounts that you want as your Security Hub members. (Note that you should only share your root email address and account ID with AWS accounts that you trust. Visit the IAM best practices page to learn more about how to keep access to your AWS accounts secured.)

By default, the Security Hub master/member association is independent of the relationships that you’ve established between your Amazon GuardDuty or Amazon Macie accounts and other associated accounts. If you have an existing master/member hierarchy in GuardDuty or Macie, you can export that list of accounts into a CSV file and then use it with the script. For example in GuardDuty, use the ListMembers API to export the AWS Account ID and email of all member accounts, as follows:

aws guardduty list-members --detector-id <Detector ID> --query "Members[].[AccountId, Email]" --output text | awk '{print $1 "," $2}'

The output of the above command will be your GuardDuty member account IDs and their corresponding root email addresses, one per line and separated with a comma as shown below:

12345678910,[email protected]
98765432101,[email protected]
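
If you prefer Python to the CLI, a roughly equivalent boto3 sketch is shown below; the detector ID and output file name are placeholders, and large member lists may require paging with NextToken.

import boto3

guardduty = boto3.client('guardduty')

# Export member account IDs and root email addresses to a CSV file for the AWS Labs script.
members = guardduty.list_members(DetectorId='<Detector ID>', OnlyAssociated='true')['Members']

with open('securityhub_members.csv', 'w') as f:
    for member in members:
        f.write(f"{member['AccountId']},{member['Email']}\n")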

2. Enable AWS Config in all AWS accounts and regions and leave the AWS CIS Foundations standard check enabled

When you enable Security Hub in any region, the AWS CIS standard checks are enabled by default. I recommend leaving them enabled; they are important security measures that are applicable to all AWS accounts.

To run most of these checks, Security Hub uses service-linked AWS Config rules. Because of this, you should make sure that AWS Config is turned on and recording all supported resources, including global resources, in all accounts and regions where Security Hub is deployed. You are not charged by AWS Config for these service-linked rules. You are only charged via Security Hub’s pricing model.

3. Use specific managed IAM policies for different types of Security Hub users

You can choose to allow a large group of users to access List and Read Security Hub actions, which will permit them to view your security findings. However, you should allow only a small group of users to access the Security Hub Write actions. This will permit only authorized users to archive, resolve, or remediate the findings.

You can use AWS managed policies to give your employees the permissions they need to get started. These policies are already available in your account and are maintained and updated by AWS. To grant more granular permission to your Security Hub users, I recommend that you create your own customer managed policies. A great place to start with this is to import an existing AWS managed policy. That way, you know that the policy is initially correct, and all you need to do is customize it for your environment.

AWS categorizes each service action into one of five access levels based on what each action does: List, Read, Write, Permissions management, or Tagging. To determine which access level to include in the IAM policies that you assign to your users, you can view the policy summary by navigating from the IAM Console to Policies, then selecting any AWS managed or customer managed policy. Next, on the Summary page, under the Permissions tab, select Policy summary (see Figure 1). For more details and examples of access level classification, see Understanding Access Level Summaries Within Policy Summaries.
 

Figure 1: Policy summary of AWSSecurityHubReadOnlyAccess AWS managed policy

4. Use tags for access controls and cost allocation

A SecurityHub::Hub resource represents the implementation of the AWS Security Hub service per region in your AWS account. Security Hub allows you to assign metadata to your SecurityHub::Hub resource in the form of tags. Each tag is a label consisting of a user-defined key and an optional value that makes it easier for you to identify and manage the AWS resources in your environment.

You can control access permissions by using tags on your SecurityHub::Hub resource. For example, you can allow a group of developer IAM entities to manage and update only the SecurityHub::Hub resources that have the tag key developer associated with them. This can help you restrict access to your production SecurityHub::Hub resources, while allowing your developers to continue testing in their developer environment.
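
For reference, tags can be applied to the SecurityHub::Hub resource through the API as well. The following boto3 sketch is illustrative; the tag key and value are assumptions.

import boto3

securityhub = boto3.client('securityhub')

# Look up the Hub ARN for the current account and region, then tag it.
hub_arn = securityhub.describe_hub()['HubArn']
securityhub.tag_resource(
    ResourceArn=hub_arn,
    Tags={'developer': 'true'},
)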

For more information on the supported tag-based conditions which you can use with the Security Hub APIs, refer to Condition Keys for AWS Security Hub. Please note that when you use tag-based conditions for access control, you must define who can modify those tags.

To make it easier to categorize and track your AWS costs, you can also activate cost allocation tags. This helps you organize your SecurityHub::Hub resource costs. AWS generates a cost allocation report as a CSV file, with your usage and costs grouped according to your active tags. You can apply tags that represent business categories (such as cost centers, application names, or project environments) to organize your costs.

For more information on commonly used tagging categories and effective tagging strategies, read about AWS Tagging Strategies.

5. Integrate and enable your existing security products (with 34 integrations today and more to come)

Numerous tools can help you understand the security and compliance posture of your AWS accounts, but these tools generate their own set of findings, often in different formats. Security Hub normalizes the findings.

With Security Hub, findings generated from integrated providers (both third-party services and AWS services) are ingested using a standard findings format, which eliminates the need for security teams to convert the data. You can currently integrate 34 findings providers to import and/or export findings with Security Hub. Some partner products, like PagerDuty, Splunk, and Slack, can receive findings from Security Hub, although they don’t generate findings.

If you want to add a third-party partner product to your AWS environment, you can choose the Purchase link from the Security Hub console’s Integrations page and navigate to AWS Marketplace. Once purchased, choose the Configure link to navigate to step-by-step instructions to install the product and configure its integration with Security Hub. Then choose Enable integration to create a product subscription in your account for that third-party provider (see Figure 2).

After you enable a subscription, a resource policy is automatically attached to it. The resource policy defines the permissions that Security Hub needs to accept and process the product’s findings. You can also enable the subscription via the API and CloudFormation.
 

Figure 2: Integrating partner findings provider with Security Hub

6. Build out customized remediation playbooks using Amazon CloudWatch Events, AWS Systems Manager Automation documents, and AWS Step Functions to automatically resolve findings that don’t require human intervention

Security Hub automatically sends all findings to Amazon CloudWatch Events. This integration helps you automate your response to threat incidents by allowing you to take specific actions using AWS Systems Manager Automation documents, OpsItems, and AWS Step Functions. Using these tools, you can create your own incident handling plan. This will allow your security team to focus on strengthening the security of your AWS environments rather than on remediating the current findings.
 

Figure 3: Creating a CloudWatch Events Rule for sending matched Security Hub findings to specific Targets

7. Create custom actions to send a copy of a Security Hub finding to a resource that is internal or external to your AWS account, enabling additional visibility and remediation options for the finding

Because of its integration with CloudWatch Events, you can use Security Hub to create custom actions that will send specific findings to ticketing, chat, email, or automated remediation systems. Custom actions can also be sent to your own AWS resources, such as AWS Systems Manager OpsCenter, AWS Lambda or Amazon Kinesis, allowing you to do your own remediation or data capture related to the finding.

For an in-depth look at this architecture, plus specific examples of how to implement custom actions, see How to Integrate AWS Security Hub Custom Actions with PagerDuty and How to Enable Custom Actions in AWS Security Hub.

In addition, Security Hub gives you the option to choose a language-specific AWS SDK so that you can use custom actions to resolve findings programmatically. Below, I’ll demonstrate how you can implement this using AWS Lambda and AWS SDK for Python (Boto3). In my example, I’ll remediate the finding generated by Security Hub for CIS check 2.4, “Ensure CloudTrail trails are integrated with Amazon CloudWatch Logs.” For this walk-through, I assume that you have the necessary AWS IAM permissions to work with Security Hub, CloudWatch Events, Lambda and AWS CloudTrail.
 

Figure 4: Data flow supporting remediation of Security Hub findings using custom actions

As shown in figure 4:

  1. When findings against CIS check 2.4 are generated in Security Hub, Security Hub will send them to CloudWatch Events using custom actions that I’ll describe below.
  2. CloudWatch Events will send the findings to a Lambda function that has been configured as the target.
  3. The Lambda function will utilize a Python script to check whether the finding has been generated against CIS check 2.4. If it has, the Lambda function will identify the affected CloudTrail trail and configure it with CloudWatch Logs to monitor the trail logs.

Prerequisites

  1. You must configure an IAM Role for AWS CloudTrail to assume so that it can deliver events to your CloudWatch Logs log group. For more information about how to do this, refer to the AWS CloudTrail documentation. I’ll refer to this role as the CloudTrail role.
  2. To deploy the Lambda function, you must configure an IAM Role for the Lambda function to assume. I’ll refer to this role as the Lambda execution role. The following sample policy includes the permissions that you’ll assign to it for this example. Please replace <CloudTrail_CloudWatchLogs_Role> with the CloudTrail role that you created in the previous step. Depending on your use case, you can restrict this IAM policy further to grant least privilege, which is a recommended IAM Best Practice.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents",
                "logs:DescribeLogGroups",
                "cloudtrail:UpdateTrail",
                "iam:GetRole"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::012345678910:role/<CloudTrail_CloudWatchLogs_Role>"
        }
    ]
}     

Solution deployment

  1. Create a custom action in AWS Security Hub and associate it with a CloudWatch Events rule that you configure for your Security Hub findings. Follow the instructions laid out in the Security Hub user guide for the exact steps to do this.
  2. Create a Lambda Function, which will complete the auto-remediation of the CIS 2.4 findings:
    1. Open the Lambda Console and select Create function.
    2. On the next page, choose Author from scratch.
    3. Under Basic information, enter a name for your function. For Runtime, select Python 3.7.
       
      Figure 5: Updating basic information to create the Lambda function

    4. Under Permissions, expand Choose or create an execution role.
    5. Under Execution role, select the drop down menu and change the setting to Use an existing role.
    6. Under Existing role, select the Lambda execution role that you created earlier, then select Create function.
       
      Figure 6: Updating basic information and permissions to create the Lambda function

    7. Delete the default function code and paste the code I’ve provided below:
      
              import json, boto3
              cloudtrail_client = boto3.client('cloudtrail')
              cloudwatchlogs_client = boto3.client('logs')
              iam_client = boto3.client('iam')
              
              role_details = iam_client.get_role(RoleName='<CloudTrail_CloudWatchLogs_Role>')
              
              def lambda_handler(event, context):
                  # First off all, let us see if the JSON sent by CWE has any Security Hub findings.
                  if 'detail' in event.keys() and 'findings' in event['detail'].keys() and len(event['detail']['findings']) > 0:
                      print("There are some findings. Let's check them!")
                      print("Number of findings: %i" % len(event['detail']['findings']))
              
                      # Then we need to filter out the findings. In this code snippet, we'll handle only findings related to CloudTrail trails for integration with CloudWatch Logs.
                      for finding in event['detail']['findings']:
                          if 'Title' in finding.keys():
                              if 'Ensure CloudTrail trails are integrated with CloudWatch Logs' in finding['Title']:
                                  print("There's a CloudTrail-related finding. I can handle it!")
              
                                  if 'Compliance' in finding.keys() and 'Status' in finding['Compliance'].keys():
                                      print("Compliance Status: %s" % finding['Compliance']['Status'])
              
                                      # We can skip compliant findings, and evaluate only the non-compliant ones.                        
                                      if finding['Compliance']['Status'] == 'PASSED':
                                          continue
              
                                      # For each non-compliant finding, we need to get specific pieces of information so as to create the correct log group and update the CloudTrail trail.                        
                                      for resource in finding['Resources']:
                                          resource_id = resource['Id']
                                          cloudtrail_name = resource['Details']['Other']['name']
                                          loggroup_name = 'CloudTrail/' + cloudtrail_name
                                          print("ResourceId for the finding is %s" % resource_id)
                                          print("LogGroup name: %s" % loggroup_name)
              
                                          # At this point, we can create the log group using the name extracted from the finding.
                                          try:
                                              response_logs = cloudwatchlogs_client.create_log_group(logGroupName=loggroup_name)
                                          except Exception as e:
                                              print("Exception: %s" % str(e))
              
                                          # For updating the CloudTrail trail, we need to have the ARN of the log group. Let's retrieve it now.                            
                                          response_logsARN = cloudwatchlogs_client.describe_log_groups(logGroupNamePrefix = loggroup_name)
                                          print("LogGroup ARN: %s" % response_logsARN['logGroups'][0]['arn'])
                                          print("The role used by CloudTrail is: %s" % role_details['Role']['Arn'])
              
                                          # Finally, let's update the CloudTrail trail so that it sends logs to the new log group created.
                                          try:
                                              response_cloudtrail = cloudtrail_client.update_trail(
                                                  Name=cloudtrail_name,
                                                  CloudWatchLogsLogGroupArn = response_logsARN['logGroups'][0]['arn'],
                                                  CloudWatchLogsRoleArn = role_details['Role']['Arn']
                                              )
                                          except Exception as e:
                                              print("Exception: %s" % str(e))
                              else:
                                  print("Title: %s" % finding['Title'])
                                  print("This type of finding cannot be handled by this function. Skipping it…")
                          else:
                              print("This finding doesn't have a title and so cannot be handled by this function. Skipping it…")
                  else:
                      print("There are no findings to remediate.")            
              

    8. After pasting the code, replace <CloudTrail_CloudWatchLogs_Role> with your CloudTrail role and select Save to save your Lambda function.
       
      Figure 7: Editing Lambda code to replace the correct CloudTrail role

  3. Go to your CloudWatch console and select Rules in the navigation pane on the left.
    1. From the list of CloudWatch rules that you see, select the rule which you created in Step 1 of this solution deployment.
    2. Then, select Actions on the top right of the page and choose Edit.
    3. On the Step 1: Create rule page, under Targets, choose Lambda function and select the Lambda function you created in Step 2.
    4. Select Configure details.
    5. On the Step 2: Configure rule details page, select Update rule.
       
      Figure 8: Adding your created Lambda function as target for the CloudWatch rule

  4. Configuration is now complete, and you can test your rule. Go to your AWS Security Hub console and select Compliance standards in the navigation pane.
    1. Next, select CIS AWS Foundations.
       
      Figure 9: Compliance standards page in the Security Hub console

    2. Search for the rule 2.4 Ensure CloudTrail trails are integrated with CloudWatch Logs and select it.
       
      Figure 10: Locating CIS check 2.4 in the Security Hub console

    3. If you’ve left the default AWS Security Hub CIS checks enabled (along with AWS Config service in the same region), and if you have CloudTrail trails in that region which are not yet configured to deliver events to CloudWatch Logs, you should see a low severity finding with a Failed Compliance status.
    4. Select the failed finding by selecting the checkbox and choosing the Actions button.
    5. Finally, from the dropdown menu, select the custom action that you created in Step 1 to send the finding to CloudWatch Events. CloudWatch Events will send the finding to your Lambda function, which you configured as the target for the rule in step 3. The Lambda function will automatically identify the affected CloudTrail trail and configure CloudWatch Logs log group for you. The log group will have the same name as your trail for identification purposes. You can modify the code to suit your needs further.

    Note: There may be a delay before the compliance status of the remediated resource changes. Once the CIS AWS Foundations Standard is enabled, Security Hub will run the checks within 2 hours. After that, the checks are automatically run once every 24 hours.

     

    Figure 11: Findings generated against CIS check 2.4 in the Security Hub console

    8. Customize your insights using the default “managed insights” as templates and use them to prioritize resources and findings to act upon

    A Security Hub “insight” is a collection of related findings to which one or more Security Hub filters have been applied. Insights can help you organize your findings and identify security risks that need immediate attention.

    Security Hub offers several managed (default) insights. You can use these as templates for new insights and modify them to fit your use case. You can save these modified queries as new custom insights to gain even greater visibility into your AWS accounts. Refer to the documentation for step-by-step instructions on how to create custom insights, or see the boto3 sketch after Figure 12 for a programmatic example.
     

    Figure 12: Creating a Security Hub custom insight
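
    If you manage your security tooling as code, insights can also be created programmatically with the Security Hub API. The following boto3 sketch is an example of my own, not one of the managed insights; the insight name and filters are illustrative and group failed findings on S3 buckets by resource.

        import boto3

        securityhub = boto3.client("securityhub")

        # Example custom insight: failed findings on S3 buckets, grouped by resource ID
        response = securityhub.create_insight(
            Name="Failed findings on S3 buckets",  # example name
            Filters={
                "ResourceType": [{"Value": "AwsS3Bucket", "Comparison": "EQUALS"}],
                "ComplianceStatus": [{"Value": "FAILED", "Comparison": "EQUALS"}],
            },
            GroupByAttribute="ResourceId",
        )
        print(response["InsightArn"])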

    9. Use the free trial to evaluate what your costs could be

    Security Hub provides a 30-day free trial for all AWS accounts and regions. The trial is a good way to evaluate how much Security Hub will cost, on average, to monitor threats and compliance in your environments. You can view an estimate by navigating from the Security Hub console to Settings, then Usage (see Figure 13).
     

    Figure 13: Estimating your Security Hub costs

    Conclusion

    AWS Security Hub gives you more visibility into the security and compliance status of your AWS environments. Using the Security Hub best practices discussed here, security teams can spend more time on incident remediation and recovery rather than incident detection and organization. Security Hub has undergone HIPAA, ISO, PCI, and SOC certification. To learn more about Security Hub, refer to the AWS Security Hub documentation.

    If you have comments about this post, submit them in the Comments section below. If you have questions about anything in this post, start a new thread on the AWS Security Hub forum or contact AWS Support.

    Want more AWS Security news? Follow us on Twitter.

    Author

    Ketan Srivastava

    Ketan is a Cloud Support Engineer at AWS. He enjoys the fact that, at AWS, there are always so many opportunities to build things better for our customers and learn from these opportunities. Outside of work, he plays MOBAs and travels to new places with his wife. He holds a Master of Science degree from Rochester Institute of Technology.

How to create custom alerts with Amazon Macie

Post Syndicated from Jeremy Haynes original https://aws.amazon.com/blogs/security/how-to-create-custom-alerts-with-amazon-macie/

Amazon Macie is a security service that makes it easy for you to discover, classify, and protect sensitive data in Amazon Simple Storage Service (Amazon S3). Macie collects AWS CloudTrail events and Amazon S3 metadata such as permissions and content classification. In this post, I’ll show you how to use Amazon Macie to create custom alerts for those data sets to notify you of events and objects of interest. I’ll go through the various types of data you can find in Macie, and talk about how to identify fields that are relevant to a given security use case. What you’ll learn is a method of investigation and alerting that you’ll be able to apply to your own situation.

If you’re totally new to Macie, make sure you first head over to the AWS Blog launch post for instructions on how to get started.

Understand the types of data in Macie

There are three sources of data found in Macie: CloudTrail events, S3 bucket metadata, and S3 object metadata. Each data source is stored separately in its own index, so it's important to keep fields from different sources in separate queries.

Within each data source there are two types of fields: Extracted and Generated. Extracted fields come directly from pre-existing AWS fields associated with that data source. For example, Extracted fields from the CloudTrail data source come directly from the events recorded in the CloudTrail logs that correspond to the actions taken by an IAM user, role, or an AWS service. See these CloudTrail log file examples for more details.

Generated fields, on the other hand, are created by Macie. These fields provide additional security value and context for creating more powerful queries. For example, the Internet Service Provider (ISP) field is populated by taking the IP addresses from a CloudTrail event and matching them against a global service provider list. This enables searching for the ISP associated with an API call.

In the sections below, I’ll explore each of the three data sources in more detail. There are reference guides in the Macie documentation dedicated to each source that provide an extensive list of fields and descriptions. I’ll use these lists to discover what fields are appropriate for investigating a particular security use case.

Choose a security use case to instrument alerts

For the purposes of this post, I’ll choose one particular security use case to focus on. The process of discovering relevant data fields collected by Macie and turning those into custom alerts will be the same for any use case you wish to investigate. With that in mind, let’s choose the theme of sensitive or critical data stored in S3 to explore because it can affect many AWS customers.

When beginning to design alerts, the first step is to think about all of the resources, attributes, actions, and identities related to the subject. In this case, I’m looking at sensitive or critical data stored in S3, so the following are some potentially useful fields of data to consider:

  • S3 bucket and object resources
  • S3 configuration and security attributes
  • Read, write, and delete actions on S3
  • IAM users, roles, and access policies associated with the S3 resources

I’ll use this list to guide my search for relevant fields in the Macie reference documentation. First, I’ll dive into the CloudTrail data.

Explore CloudTrail data

The best way to build an understanding of what activities are happening in your AWS environment is by using the CloudTrail data source because it contains your AWS API calls and the identities that made them. If you’re unfamiliar with CloudTrail, head over to the AWS CloudTrail documentation for more information.

Ok, I have a critical S3 bucket and I want to be notified each time an object write is attempted to it outside of the typical access pattern used in my corporate environment. In this case, write actions to the bucket are normally controlled by my organization’s own identity system via federated access. So that means I want to write a query that searches for object write attempts by a non-federated principal. If you’re unfamiliar with what federation is, that’s ok, you can still follow along. If you’re curious about federation, see the IAM Identity Providers and Federation documentation.

I begin by opening the Macie CloudTrail Reference documentation and looking at the section titled “CloudTrail Data Fields Extracted by Macie.” Skimming over the list, I see that the objectsWritten.key description matches my criteria for investigating actions on an S3 resource: “A list of S3 objects’ ARNs that were part of a PutObject, CopyObject, or CompleteMultipartUpload API calls.”

Next, I take a look at the CloudTrail field names that are extracted from the userIdentity object in an event. The userIdentity.type field looks promising, but I’m unsure what values this field can accept. Using the CloudTrail userIdentity Element documentation as a reference, I look up the type field and see that FederatedUser is listed as one of the values.

Great! I have all of the information necessary to write my first custom query. I’ll search for all “user sessions” (a 5-minute aggregation of CloudTrail events corresponding to a user) that have both an attempted write to my S3 bucket and an API call with a userIdentity type that is not FederatedUser. It’s important to note that I’m searching groups of events rather than individual events, so the two conditions don’t have to appear in the same event, only within the same user session. To match values that don’t equal FederatedUser, I use a regular expression and place FederatedUser inside parentheses with a tilde character (~) at the beginning. Here’s what I came up with, using the corresponding Macie field names and example search queries as a formatting guide:
 
objectsWritten.key:/arn:aws:s3:::my_sensitive_bucket.*/ AND userIdentityType.key:/(~FederatedUser)/
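
To make the query’s intent concrete, here is a small Python sketch of the same condition applied to a single raw CloudTrail event. This is only an illustration, not part of Macie, and it assumes S3 data events with the standard requestParameters.bucketName field; note that the Macie query matches at the user-session level, so the two conditions don’t need to occur in the same event.

    import re

    WRITE_APIS = {"PutObject", "CopyObject", "CompleteMultipartUpload"}
    BUCKET_ARN_PATTERN = re.compile(r"arn:aws:s3:::my_sensitive_bucket.*")  # example bucket

    def is_non_federated_write(event):
        """Return True if this CloudTrail event is an object write to the
        sensitive bucket made by an identity other than a federated user."""
        if event.get("eventName") not in WRITE_APIS:
            return False
        bucket = (event.get("requestParameters") or {}).get("bucketName", "")
        if not BUCKET_ARN_PATTERN.match("arn:aws:s3:::%s" % bucket):
            return False
        return (event.get("userIdentity") or {}).get("type") != "FederatedUser"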
 
Now, it’s time to test this query on the Research page and turn it into a custom alert.

Create custom alerts

The first step to create a custom alert is to run the proposed query from the Research page. I do this so I can verify that the results match my expectations and to get an idea of how often the alert will occur. Once I’m happy with the query, setting up a custom alert is only a few clicks away.

  1. In the Macie navigation pane, select Research.
  2. Select the data source matching the query from the options in the drop-down menu. In this case, I select CloudTrail data.
     
    Figure 1: Select the data source on the Research page

  3. To run the search, I copy my custom query, paste it in the query box, and then select Enter.
     
    objectsWritten.key:/arn:aws:s3:::my_sensitive_bucket.*/ AND userIdentityType.key:/(~FederatedUser)/
     
  4. At this point, I verify that the results match my expectations and make any necessary modifications to the query. To learn more about how the features of the Research page can help with verifying and modifying queries, see Using the Macie Research Tab.
  5. Once I’m confident in the results, I select the Save query as alert icon.
     
    Figure 2: Select the “Save query as alert” icon

  6. Now, I fill in the remaining fields: Alert title, Description, Min number of matches, and Severity. For more information about each of these fields, see Macie Adding New Custom Basic Alerts.
     
    Figure 3: Fill in the remaining “Alert title,” “Description,” “Min number of matches,” and “Severity” fields

  7. In the Whitelisted users field, I add any users that appeared in my Research page results and that I’d like to exclude from the alert. For more details on this feature, see Whitelisting Users or Buckets for Basic Alerts.
  8. Finally, I save the alert.

That’s it! I now have a custom basic alert that will alert me every time there’s an attempt to write to my bucket in the same user session as an API call with a userIdentityType other than FederatedUser. Now, I’ll use the S3 object and S3 bucket data sources to look for some more useful fields.

Use S3 object data

The S3 object data source contains fields extracted from S3 API metadata, as well as fields populated by Macie relating to content classification. To expand my search and alerts for sensitive or critical data stored in S3, I’ll look at the S3 Object Data Reference documentation and look for fields that are promising candidates.

I see that S3 Server Side Encryption Settings metadata could be useful because I know that all of the objects in a certain bucket should be encrypted using AES256, and I’d like to be notified every time an object is uploaded that doesn’t match that attribute.

To create the query, I combine the server_encryption field and the bucket field to match on all S3 objects within the specified bucket. Note the forward slashes and “.*” that make this a regular expression search. This allows me to match all buckets that share my project name, even when the full bucket name is different.
 
filesystem_metadata.bucket:/.*my_sensitive_project.*/ AND NOT filesystem_metadata.server_encryption:"AES256"
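
As a complementary spot check outside of Macie, the same condition can be verified with the S3 API. The following boto3 sketch uses a hypothetical bucket name that matches the query’s pattern; it lists objects and flags any that aren’t encrypted with AES256.

    import boto3

    s3 = boto3.client("s3")
    BUCKET = "my_sensitive_project-data"  # hypothetical bucket matching the query's pattern

    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET):
        for obj in page.get("Contents", []):
            # head_object returns the object's server-side encryption setting, if any
            head = s3.head_object(Bucket=BUCKET, Key=obj["Key"])
            if head.get("ServerSideEncryption") != "AES256":
                print("Not AES256-encrypted: %s" % obj["Key"])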
 
Next, I have a different bucket that I know should never have any objects which contain personally identifiable information (PII). This includes such information as names, email addresses, and credit card numbers. For the full list of what Macie considers PII, see Classifying Data with Amazon Macie. I’d like to set up an alert that notifies me every time an object containing any type of PII is added to this bucket. Since this is a data field that’s provided by Macie, I look under the Generated Fields heading and find the field pii_impact. I’m looking for all levels of PII impact, so my query will search for any value which isn’t equal to none.

As before, I’ll combine this with the bucket field to include all S3 objects matching the bucket name.
 
filesystem_metadata.bucket:"my_logs_bucket" AND NOT pii_impact:"none"
 

Use S3 bucket data

The S3 bucket data source contains information extracted from S3 bucket APIs, as well as fields that Macie generates by processing bucket metadata and access control lists (ACLs). Following the same method as before, I head over to the S3 Bucket Data Reference documentation and look for fields that will help me create useful alerts.

There are plenty of fields that could be useful here, depending on how much I know in advance about my bucket and which type of threat I want to protect against. To narrow my search, I decide to add some protection against accidental or unauthorized data destruction.

In the S3 Versioning section, I see that the Multi-Factor Authentication (MFA) Delete setting is one of the available fields. Since I have this bucket locked down so that delete actions require MFA, I can create an alert to notify me every time MFA Delete is disabled.
 
bucket:"my_critical_bucket" AND versioning.MFADelete:"disabled"
 
Another potential path to data destruction is the bucket lifecycle expiration setting, which automatically removes data after a period of time. I’d like to know if someone sets the expiration to a low number of days so that I can intervene before any recent data is lost.
 
bucket:"my_critical_bucket" AND lifecycle_configuration.Rules.Expiration.Days:<3
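
Both of these bucket-level settings can also be read directly from the S3 API, which is a handy way to confirm what the Macie fields report. The following boto3 sketch uses the example bucket name from the queries above; it checks the MFA Delete setting and flags lifecycle rules that expire objects in fewer than 3 days.

    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")
    BUCKET = "my_critical_bucket"  # example bucket from the queries above

    # MFA Delete status is returned with the bucket's versioning configuration
    versioning = s3.get_bucket_versioning(Bucket=BUCKET)
    if versioning.get("MFADelete", "Disabled") != "Enabled":
        print("MFA Delete is not enabled on %s" % BUCKET)

    # Flag lifecycle rules that expire objects after fewer than 3 days
    try:
        lifecycle = s3.get_bucket_lifecycle_configuration(Bucket=BUCKET)
    except ClientError:
        lifecycle = {"Rules": []}  # no lifecycle configuration on the bucket
    for rule in lifecycle.get("Rules", []):
        days = rule.get("Expiration", {}).get("Days")
        if days is not None and days < 3:
            print("Rule %s expires objects after %d days" % (rule.get("ID", "unnamed"), days))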
 
Now that I have gathered another set of potential alert queries, I can walk through the same steps I used for CloudTrail data to turn them into custom alerts. Once saved, I’ll begin to receive notifications in the Macie console whenever a match is found. I can view the alerts I’ve received by selecting Alerts from the Macie navigation pane.

Summary

I described the various types and sources of data available in Macie. After demonstrating how to take a security use case and discover relevant fields, I stepped through the process of creating queries and turning them into custom alerts. My goal has been to show you how to build alerts that are tailored to your specific environment and solve your own individual needs.

If you have comments about this post, submit them in the Comments section below. If you have questions about how to use Macie, or you’d like to request new fields and data sources, start a new thread on the Macie forum or contact AWS Support.

Want more AWS Security news? Follow us on Twitter.