All posts by Nicholas Doropoulos

How to use Amazon Macie to reduce the cost of discovering sensitive data

Post Syndicated from Nicholas Doropoulos original https://aws.amazon.com/blogs/security/how-to-use-amazon-macie-to-reduce-the-cost-of-discovering-sensitive-data/

Amazon Macie is a fully managed data security service that uses machine learning and pattern matching to discover and help protect your sensitive data, such as personally identifiable information (PII), payment card data, and Amazon Web Services (AWS) credentials. Analyzing large volumes of data for the presence of sensitive information can be expensive, due to the nature of compute-intensive operations involved in the process.

Macie offers several capabilities to help customers reduce the cost of discovering sensitive data, including automated data discovery, which can reduce your spend with new data sampling techniques that are custom-built for Amazon Simple Storage Service (Amazon S3). In this post, we will walk through such Macie capabilities and best practices so that you can cost-efficiently discover sensitive data with Macie.

Overview of the Macie pricing plan

Let’s do a quick recap of how customers pay for the Macie service. With Macie, you are charged based on three dimensions: the number of S3 buckets evaluated for bucket inventory and monitoring, the number of S3 objects monitored for automated data discovery, and the quantity of data inspected for sensitive data discovery. You can read more about these dimensions on the Macie pricing page.

The majority of the cost incurred by customers is driven by the quantity of data inspected for sensitive data discovery. For this reason, we will limit the scope of this post to techniques that you can use to optimize the quantity of data that you scan with Macie.

Not all security use cases require the same quantity of data for scanning

Broadly speaking, you can choose to scan your data in two ways—run full scans on your data or sample a portion of it. When to use which method depends on your use cases and business objectives. Full scans are useful when customers have identified what they want to scan. A few examples of such use cases are: scanning buckets that are open to the internet, monitoring a bucket for unintentionally added sensitive data by scanning every new object, or performing analysis on a bucket after a security incident.

The other option is to use sampling techniques for sensitive data discovery. This method is useful when security teams want to reduce data security risks. For example, just knowing that an S3 bucket contains credit card numbers is enough information to prioritize it for remediation activities.

Macie offers both options, and you can discover sensitive data either by creating and running sensitive data discovery jobs that perform full scans on targeted locations, or by configuring Macie to perform automated sensitive data discovery for your account or organization. You can also use both options simultaneously in Macie.

Use automated data discovery as a best practice

Automated data discovery in Macie minimizes the quantity of data scanning that is needed to a fraction of your S3 estate.

When you enable Macie for the first time, automated data discovery is enabled by default. When you already use Macie in your organization, you can enable automatic data discovery in the management console of the Amazon Macie administrator account. This Macie capability automatically starts discovering sensitive data in your S3 buckets and builds a sensitive data profile for each bucket. The profiles are organized in a visual, interactive data map, and you can use the data map to identify data security risks that need immediate attention.

Automated data discovery in Macie starts to evaluate the level of sensitivity of each of your buckets by using intelligent and fully managed data sampling techniques to minimize the quantity of data scanning needed. During evaluation, objects are organized with similar S3 metadata, such as bucket names, object-key prefixes, file-type extensions, and storage class, into groups that are likely to have similar content. Macie then selects small, but representative, samples from each identified group of objects and scans them to detect the presence of sensitive data. Macie has a feedback loop that uses the results of previously scanned samples to prioritize the next set of samples to inspect.

The automated sensitive data discovery feature is designed to detect sensitive data at scale across hundreds of buckets and accounts, which makes it easier to identify the S3 buckets that need to be prioritized for more focused scanning. Because the amount of data that needs to be scanned is reduced, this task can be done at fraction of the cost of running a full data inspection across all your S3 buckets. The Macie console displays the scanning results as a heat map (Figure 1), which shows the consolidated information grouped by account, and whether a bucket is sensitive, not sensitive, or not analyzed yet.

Figure 1: A heat map showing the results of automated sensitive data discovery

Figure 1: A heat map showing the results of automated sensitive data discovery

There is a 30-day free trial period when you enable automatic data discovery on your AWS account. During the trial period, in the Macie console, you can see the estimated cost of running automated sensitive data discovery after the trial period ends. After the evaluation period, we charge based on the total quantity of S3 objects in your account, as well as the bytes that are scanned for sensitive content. Charges are prorated per day. You can disable this capability at any time.

Tune your monthly spend on automated sensitive data discovery

To further reduce your monthly spend on automated sensitive data, Macie allows you to exclude buckets from automated discovery. For example, you might consider excluding buckets that are used for storing operational logs, if you’re sure they don’t contain any sensitive information. Your monthly spend is reduced roughly by the percentage of data in those excluded buckets compared to your total S3 estate.

Figure 2 shows the setting in the heatmap area of the Macie console that you can use to exclude a bucket from automated discovery.

Figure 2: Excluding buckets from automated sensitive data discovery from the heatmap

Figure 2: Excluding buckets from automated sensitive data discovery from the heatmap

You can also use the automated data discovery settings page to specify multiple buckets to be excluded, as shown in Figure 3.

Figure 3: Excluding buckets from the automated sensitive data discovery settings page

Figure 3: Excluding buckets from the automated sensitive data discovery settings page

How to run targeted, cost-efficient sensitive data discovery jobs

Making your sensitive data discovery jobs more targeted make them more cost-efficient, because it reduces the quantity of data scanned. Consider using the following strategies:

  1. Make your sensitive data discovery jobs as targeted and specific as possible in their scope by using the Object criteria settings on the Refine the scope page, shown in Figure 4.
    Figure 4: Adjusting the scope of a sensitive data discovery job

    Figure 4: Adjusting the scope of a sensitive data discovery job

    Options to make discovery jobs more targeted include:

    • Include objects by using the “last modified” criterion — If you are aware of the frequency at which your classifiable S3-hosted objects get modified, and you want to scan the resources that changed at a particular point in time, include in your scope the objects that were modified at a certain date or time by using the “last modified” criterion.
    • Don’t scan CloudTrail logs — Identify the S3 bucket prefixes that contain AWS CloudTrail logs and exclude them from scanning.
    • Consider using random object sampling — With this option, you specify the percentage of eligible S3 objects that you want Macie to analyze when a sensitive data discovery job runs. If this value is less than 100%, Macie selects eligible objects to analyze at random, up to the specified percentage, and analyzes the data in those objects. If your data is highly consistent and you want to determine whether a specific S3 bucket, rather than each object, contains sensitive information, adjust the sampling depth accordingly.
    • Include objects with specific extensions, tags, or storage size — To fine tune the scope of a sensitive data discovery job, you can also define custom criteria that determine which S3 objects Macie includes or excludes from a job’s analysis. These criteria consist of one or more conditions that derive from properties of S3 objects. You can exclude objects with specific file name extensions, exclude objects by using tags as the criterion, and exclude objects on the basis of their storage size. For example, you can use a criteria-based job to scan the buckets associated with specific tag key/value pairs such as Environment: Production.
  2. Specify S3 bucket criteria in your job — Use a criteria-based job to scan only buckets that have public read/write access. For example, if you have 100 buckets with 10 TB of data, but only two of those buckets containing 100 GB are public, you could reduce your overall Macie cost by 99% by using a criteria-based job to classify only the public buckets.
  3. Consider scheduling jobs based on how long objects live in your S3 buckets. Running jobs at a higher frequency than needed can result in unnecessary costs in cases where objects are added and deleted frequently. For example, if you’ve determined that the S3 objects involved contain high velocity data that is expected to reside in your S3 bucket for a few days, and you’re concerned that sensitive data might remain, scheduling your jobs to run at a lower frequency will help in driving down costs. In addition, you can deselect the Include existing objects checkbox to scan only new objects.
    Figure 5: Specifying the frequency of a sensitive data discovery job

    Figure 5: Specifying the frequency of a sensitive data discovery job

  4. As a best practice, review your scheduled jobs periodically to verify that they are still meaningful to your organization. If you aren’t sure whether one of your periodic jobs continues to be fit for purpose, you can pause it so that you can investigate whether it is still needed, without incurring potentially unnecessary costs in the meantime. If you determine that a periodic job is no longer required, you can cancel it completely.
    Figure 6: Pausing a scheduled sensitive data discovery job

    Figure 6: Pausing a scheduled sensitive data discovery job

  5. If you don’t know where to start to make your jobs more targeted, use the results of Macie automated data discovery to plan your scanning strategy. Start with small buckets and the ones that have policy findings associated with them.
  6. In multi-account environments, you can monitor Macie’s usage across your organization in AWS Organizations through the usage page of the delegated administrator account. This will enable you to identify member accounts that are incurring higher costs than expected, and you can then investigate and take appropriate actions to keep expenditure low.
  7. Take advantage of the Macie pricing calculator so that you get an estimate of your Macie fees in advance.

Conclusion

In this post, we highlighted the best practices to keep in mind and configuration options to use when you discover sensitive data with Amazon Macie. We hope that you will walk away with a better understanding of when to use the automated data discovery capability and when to run targeted sensitive data discovery jobs. You can use the pointers in this post to tune the quantity of data you want to scan with Macie, so that you can continuously optimize your Macie spend.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on Amazon Macie re:Post.

Want more AWS Security news? Follow us on Twitter.

Nicholas Doropoulos

Nicholas Doropoulos

Nicholas is an AWS Cloud Security Engineer, a Bestselling Udemy Instructor, and a subject matter expert in AWS Shield, GuardDuty and Certificate Manager. Outside work, he enjoys spending his time with his wife and their beautiful baby son.

Koulick Ghosh

Koulick Ghosh

Koulick is a Senior Product Manager in AWS Security based in Seattle, WA. He loves speaking with customers on how AWS Security services can help make them more secure. In his free-time, he enjoys playing the guitar, reading, and exploring the Pacific Northwest.

10 reasons to import a certificate into AWS Certificate Manager (ACM)

Post Syndicated from Nicholas Doropoulos original https://aws.amazon.com/blogs/security/10-reasons-to-import-a-certificate-into-aws-certificate-manager-acm/

AWS Certificate Manager (ACM) is a service that lets you efficiently provision, manage, and deploy public and private SSL/TLS certificates for use with AWS services and your internal connected resources. The certificates issued by ACM can then be used to secure network communications and establish the identity of websites on the internet or resources on private networks.

So why might you want to import a certificate into ACM, rather than using a certificate issued by ACM? According to the AWS Certificate Manager User Guide topic Importing certificates into AWS Certificate Manager, “you might do this because you already have a certificate from a third-party certificate authority (CA), or because you have application-specific requirements that are not met by ACM issued certificates.”

In this blog post, I’ll list 10 reasons why you might want to import a certificate into ACM, including what specific requirements you might have, and why you might want to use a certificate signed by a third-party CA in the first place.

1. To use an ECDSA certificate for faster TLS connections

Imported Elliptic Curve Digital Signature Algorithm (ECDSA) certificates use smaller keys than ACM issued public RSA certificates, allowing for TLS connections to be established faster. For this reason, ECDSA certificates are particularly useful for systems with limited processing resources, such as Internet of Things (IoT) devices. ACM supports imported certificates with ECDSA in 256, 384, and 521 bit variations. If you want to use an ECDSA certificate for your public-facing web application, you need to get a third-party certificate and then import it into ACM. For more information about supported cryptographic algorithms for imported certificates, see Prerequisites for importing certificates in the AWS Certificate Manager User Guide.

2. To control your certificate’s renewal cycle

When you import a certificate into ACM, you have greater control over its renewal cycle simply because you can re-import it as frequently as you want. You also have control over how often your imported certificate’s private key can be rotated. As a best practice, you should rotate your certificate’s private key based on your certificate’s usage frequency.

Note: When you re-import your certificate, to maintain the existing associations during renewal, ensure that you specify the existing certificate’s Amazon Resource Name (ARN). For more information and step-by-step instructions, see Reimporting a certificate in the AWS Certificate Manager User Guide.

3. To use certificate pinning

You might have an application that requires certificate pinning, which is the practice of bypassing the typical hierarchical model of trust that is governed by certificate authorities. With certificate pinning, a host’s identity is trusted based on a specific certificate or public key. As a certificate pinning best practice, AWS recommends that public certificates issued by ACM should not be pinned because ACM will generate a new public/private key pair at the next renewal phase, which essentially replaces the pinned certificate with a new one, causing service disruption along the process. If you want to use certificate pinning, you can pin an imported certificate because imported certificates are not subject to managed renewal, thereby reducing the risk of production impact.

4. To use a higher-assurance certificate

You might want to use a higher-assurance certificate, such as an organization validation (OV) or extended validation (EV) certificate. Certificates issued by ACM currently only support domain validation (DV). If the domain you want to protect is an application that requires OV or EV, you can import OV or EV certificates into ACM by using a third-party certificate of either type. You can use the ACM API action ImportCertificate to import OV or EV certificates into ACM.

5. To use a self-signed certificate

For internal testing environments where your developers want speed and flexibility, self-signed certificates are issued faster and effortlessly. However, it’s important to know that self-signed certificates are not trusted by default, which means that self-signed certificates need to be installed inside the trust stores of the intended clients, to avoid the risk of your users getting into the habit of ignoring browser warnings. For more information, see the additional requirements for self-signed certificates in Prerequisites for importing certificates in the AWS Certificate Manager User Guide.

6. To use an IP address for the certificate’s subject

By design, the subject field of an ACM certificate can only identify a fully qualified domain name (FQDN). If you want to use an IP address for the certificate’s subject, then you can create the certificate and import it to ACM.

7. To exceed the number of domains allowed by the ACM quotas

Certificates issued by ACM are subject to the ACM service quotas. The default quota for ACM is 10 domain names for each ACM certificate, and you can request an increase to the quota up to a maximum of 100 domain names for each certificate. However, if you import certificates, they are not subject to the quotas, and you can use a public certificate with more than 100 FQDNs in its domain scope without having to go through the process of requesting any limit increases.

8. To use a private certificate issued by ACM Private CA with the IssueCertificate API action

Certificates provisioned with the IssueCertificate API action have a private status and cannot be associated directly with an AWS integrated service, such as an internal Application Load Balancer. Instead, a private certificate issued by AWS Certificate Manager Private Certificate Authority (ACM Private CA) with the IssueCertificate API action needs to be exported and then imported into ACM before the association can be made. The same is true for certificate templates as well, which are configuration templates that can be passed as parameters to the IssueCertificate API action as a means to have greater control over the private certificate’s extensions.

9. To use a private certificate issued by your on-premises CA

You might want to use a private certificate issued by your on-premises CA instead of using ACM Private CA. To administer your internal public key infrastructure (PKI), AWS generally recommends that you use ACM Private CA. However, you might still come across scenarios where a certificate signed by your on-premises CA is better suited for your specific needs. For example, you might want to have a common root of trust, for consistency and interoperability purposes across a hybrid PKI solution. Furthermore, using an external parent CA with ACM Private CA also allows you to enforce CA name constraints. For more information, see Signing private CA certificates with an external CA in the AWS Certificate Manager Private Certificate Authority User Guide.

10. To use a certificate for something other than securing a public website

In addition to securing a public website, you can use certificates for other purposes. For example, you can import client and server certificates as part of an OpenVPN setup. For more information about this example, see How can I generate server and client certificates and their respective keys on a Windows server and upload them to AWS Certificate Manager (ACM)? In addition, you can import a code-signing certificate for use with AWS IoT Device Management. For more information about how to import a code-signing certificate, see (For IoT only) Obtain and import a code-signing certificate in the AWS Signer Developer Guide.

Conclusion

In this blog post, you learned about some of the reasons you might want to import a certificate into AWS Certificate Manager (ACM). For more information about importing certificates into ACM and step-by-step instructions, see Importing certificates into AWS Certificate Manager in the AWS Certificate Manager User Guide. For the latest pricing information, see the AWS Certificate Manager Pricing page on the AWS website. You can also use the AWS pricing calculator to estimate costs.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS Certificate Manager forum or contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Nicholas Doropoulos

Nicholas Doropoulos

Nicholas is a Cloud Security Engineer II, Bestselling Udemy Instructor, AWS Shield, GuardDuty and Certificate Manager SME. In his spare time, he enjoys creating tools, practising his OSINT skills by participating in Search Party CTFs for missing people and registering Google Dorks in Offensive Security’s Google Hacking Database.