Amazon Web Services (AWS) has re-published the whitepaper Architecting for PCI DSS Scoping and Segmentation on AWS to provide guidance on how to properly define the scope of your Payment Card Industry (PCI) Data Security Standard (DSS) workloads that are running in the AWS Cloud. The whitepaper has been refreshed to include updated AWS best practices and technologies, and updates that are applicable to the new PCI DSS v4.0 requirements. The whitepaper looks at how to define segmentation boundaries between your in-scope and out-of-scope resources by using cloud-based AWS services.
The whitepaper is intended for engineers and solution builders, but it also serves as a guide for Qualified Security Assessors (QSAs) and internal security assessors (ISAs) to better understand the different segmentation controls that are available within AWS products and services, along with associated scoping considerations.
Compared to on-premises environments, software-defined networking on AWS transforms the scoping process for applications by providing additional segmentation controls beyond network segmentation. Thoughtful design of your applications and selection of security-impacting services for implementing your required controls can reduce the number of systems and services in your cardholder data environment (CDE).
Amazon Macie is a fully managed data security service that uses machine learning and pattern matching to discover and help protect your sensitive data, such as personally identifiable information (PII), payment card data, and Amazon Web Services (AWS) credentials. Analyzing large volumes of data for the presence of sensitive information can be expensive, due to the nature of compute-intensive operations involved in the process.
Macie offers several capabilities to help customers reduce the cost of discovering sensitive data, including automated data discovery, which can reduce your spend with new data sampling techniques that are custom-built for Amazon Simple Storage Service (Amazon S3). In this post, we will walk through these Macie capabilities and best practices so that you can cost-efficiently discover sensitive data with Macie.
Overview of the Macie pricing plan
Let’s do a quick recap of how customers pay for the Macie service. With Macie, you are charged based on three dimensions: the number of S3 buckets evaluated for bucket inventory and monitoring, the number of S3 objects monitored for automated data discovery, and the quantity of data inspected for sensitive data discovery. You can read more about these dimensions on the Macie pricing page.
The majority of the cost incurred by customers is driven by the quantity of data inspected for sensitive data discovery. For this reason, we will limit the scope of this post to techniques that you can use to optimize the quantity of data that you scan with Macie.
Not all security use cases require the same quantity of data for scanning
Broadly speaking, you can choose to scan your data in two ways—run full scans on your data or sample a portion of it. When to use which method depends on your use cases and business objectives. Full scans are useful when customers have identified what they want to scan. A few examples of such use cases are: scanning buckets that are open to the internet, monitoring a bucket for unintentionally added sensitive data by scanning every new object, or performing analysis on a bucket after a security incident.
The other option is to use sampling techniques for sensitive data discovery. This method is useful when security teams want to reduce data security risks. For example, just knowing that an S3 bucket contains credit card numbers is enough information to prioritize it for remediation activities.
Macie offers both options, and you can discover sensitive data either by creating and running sensitive data discovery jobs that perform full scans on targeted locations, or by configuring Macie to perform automated sensitive data discovery for your account or organization. You can also use both options simultaneously in Macie.
Use automated data discovery as a best practice
Automated data discovery in Macie reduces the quantity of data that needs to be scanned to a fraction of your S3 estate.
When you enable Macie for the first time, automated data discovery is enabled by default. If you already use Macie in your organization, you can enable automated data discovery in the management console of the Amazon Macie administrator account. This Macie capability automatically starts discovering sensitive data in your S3 buckets and builds a sensitive data profile for each bucket. The profiles are organized in a visual, interactive data map, and you can use the data map to identify data security risks that need immediate attention.
Automated data discovery in Macie evaluates the level of sensitivity of each of your buckets by using intelligent, fully managed data sampling techniques that minimize the quantity of data scanning needed. During evaluation, objects with similar S3 metadata, such as bucket names, object-key prefixes, file-type extensions, and storage class, are organized into groups that are likely to have similar content. Macie then selects small, but representative, samples from each identified group of objects and scans them to detect the presence of sensitive data. Macie has a feedback loop that uses the results of previously scanned samples to prioritize the next set of samples to inspect.
The automated sensitive data discovery feature is designed to detect sensitive data at scale across hundreds of buckets and accounts, which makes it easier to identify the S3 buckets that need to be prioritized for more focused scanning. Because the amount of data that needs to be scanned is reduced, this task can be done at a fraction of the cost of running a full data inspection across all your S3 buckets. The Macie console displays the scanning results as a heat map (Figure 1), which shows the consolidated information grouped by account, and whether a bucket is sensitive, not sensitive, or not analyzed yet.
Figure 1: A heat map showing the results of automated sensitive data discovery
There is a 30-day free trial period when you enable automated data discovery on your AWS account. During the trial period, the Macie console shows the estimated cost of running automated sensitive data discovery after the trial period ends. After the trial period, you are charged based on the total quantity of S3 objects in your account, as well as the bytes that are scanned for sensitive content. Charges are prorated per day. You can disable this capability at any time.
Tune your monthly spend on automated sensitive data discovery
To further reduce your monthly spend on automated sensitive data discovery, Macie allows you to exclude buckets from automated discovery. For example, you might consider excluding buckets that are used for storing operational logs, if you’re sure they don’t contain any sensitive information. Your monthly spend is reduced roughly by the percentage of data in those excluded buckets relative to your total S3 estate.
Figure 2 shows the setting in the heatmap area of the Macie console that you can use to exclude a bucket from automated discovery.
Figure 2: Excluding buckets from automated sensitive data discovery from the heatmap
You can also use the automated data discovery settings page to specify multiple buckets to be excluded, as shown in Figure 3.
Figure 3: Excluding buckets from the automated sensitive data discovery settings page
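If you prefer to manage exclusions programmatically, you can achieve the same result through the Macie API. The following is a minimal boto3 sketch, assuming your credentials target the Macie administrator account; the bucket name is hypothetical.

```python
import boto3

macie = boto3.client('macie2')

# Each account or organization has a single classification scope that
# controls which buckets automated discovery analyzes.
scope_id = macie.list_classification_scopes()['classificationScopes'][0]['id']

# Add buckets to the exclusion list so automated discovery skips them.
macie.update_classification_scope(
    id=scope_id,
    s3={
        'excludes': {
            'bucketNames': ['example-operational-logs-bucket'],  # hypothetical
            'operation': 'ADD'
        }
    }
)
```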
How to run targeted, cost-efficient sensitive data discovery jobs
Making your sensitive data discovery jobs more targeted makes them more cost-efficient, because doing so reduces the quantity of data scanned. Consider using the following strategies:
Make your sensitive data discovery jobs as targeted and specific as possible in their scope by using the Object criteria settings on the Refine the scope page, shown in Figure 4.
Figure 4: Adjusting the scope of a sensitive data discovery job
Options to make discovery jobs more targeted include:
Include objects by using the “last modified” criterion — If you are aware of the frequency at which your classifiable S3-hosted objects get modified, and you want to scan the resources that changed at a particular point in time, include in your scope the objects that were modified at a certain date or time by using the “last modified” criterion.
Consider using random object sampling — With this option, you specify the percentage of eligible S3 objects that you want Macie to analyze when a sensitive data discovery job runs. If this value is less than 100%, Macie selects eligible objects to analyze at random, up to the specified percentage, and analyzes the data in those objects. If your data is highly consistent and you want to determine whether a specific S3 bucket, rather than each object, contains sensitive information, adjust the sampling depth accordingly.
Include objects with specific extensions, tags, or storage size — To fine tune the scope of a sensitive data discovery job, you can also define custom criteria that determine which S3 objects Macie includes or excludes from a job’s analysis. These criteria consist of one or more conditions that derive from properties of S3 objects. You can exclude objects with specific file name extensions, exclude objects by using tags as the criterion, and exclude objects on the basis of their storage size. For example, you can use a criteria-based job to scan the buckets associated with specific tag key/value pairs such as Environment: Production.
Specify S3 bucket criteria in your job — Use a criteria-based job to scan only buckets that have public read/write access. For example, if you have 100 buckets with 10 TB of data, but only two of those buckets, containing 100 GB, are public, you could reduce your overall Macie cost by 99% by using a criteria-based job to classify only the public buckets (see the sketch after Figure 5).
Consider scheduling jobs based on how long objects live in your S3 buckets. Running jobs at a higher frequency than needed can result in unnecessary costs in cases where objects are added and deleted frequently. For example, if you’ve determined that the S3 objects involved contain high velocity data that is expected to reside in your S3 bucket for a few days, and you’re concerned that sensitive data might remain, scheduling your jobs to run at a lower frequency will help in driving down costs. In addition, you can deselect the Include existing objects checkbox to scan only new objects.
Figure 5: Specifying the frequency of a sensitive data discovery job
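To illustrate the bucket criteria and sampling options described in the list above, here is a hedged boto3 sketch of a one-time, criteria-based job that targets only publicly accessible buckets; the job name is hypothetical, and you can lower samplingPercentage for a shallower, cheaper pass.

```python
import uuid
import boto3

macie = boto3.client('macie2')

# One-time job restricted to publicly accessible buckets, sampling 100%
# of eligible objects in each matching bucket.
macie.create_classification_job(
    clientToken=str(uuid.uuid4()),
    name='public-buckets-scan',  # hypothetical name
    jobType='ONE_TIME',
    samplingPercentage=100,
    s3JobDefinition={
        'bucketCriteria': {
            'includes': {
                'and': [{
                    'simpleCriterion': {
                        'comparator': 'EQ',
                        'key': 'S3_BUCKET_EFFECTIVE_PERMISSION',
                        'values': ['PUBLIC']
                    }
                }]
            }
        }
    }
)
```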
As a best practice, review your scheduled jobs periodically to verify that they are still meaningful to your organization. If you aren’t sure whether one of your periodic jobs continues to be fit for purpose, you can pause it so that you can investigate whether it is still needed, without incurring potentially unnecessary costs in the meantime. If you determine that a periodic job is no longer required, you can cancel it completely.
Figure 6: Pausing a scheduled sensitive data discovery job
If you don’t know where to start to make your jobs more targeted, use the results of Macie automated data discovery to plan your scanning strategy. Start with small buckets and the ones that have policy findings associated with them.
In multi-account environments, you can monitor Macie’s usage across your organization in AWS Organizations through the usage page of the delegated administrator account. This will enable you to identify member accounts that are incurring higher costs than expected, and you can then investigate and take appropriate actions to keep expenditure low.
Take advantage of the Macie pricing calculator to estimate your Macie fees in advance.
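For ongoing monitoring, you can also pull the usage data programmatically. The following is a small sketch, assuming it runs in the delegated administrator account, that ranks member accounts by estimated month-to-date Macie cost.

```python
import boto3

macie = boto3.client('macie2')  # run from the delegated administrator account

# Aggregate month-to-date Macie usage per account, highest cost first.
resp = macie.get_usage_statistics(
    sortBy={'key': 'total', 'orderBy': 'DESC'},
    timeRange='MONTH_TO_DATE'
)
for record in resp['records']:
    # estimatedCost is returned as a string for each usage type.
    total = sum(float(usage['estimatedCost']) for usage in record['usage'])
    print(record['accountId'], round(total, 2))
```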
Conclusion
In this post, we highlighted the best practices to keep in mind and configuration options to use when you discover sensitive data with Amazon Macie. We hope that you will walk away with a better understanding of when to use the automated data discovery capability and when to run targeted sensitive data discovery jobs. You can use the pointers in this post to tune the quantity of data you want to scan with Macie, so that you can continuously optimize your Macie spend.
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on Amazon Macie re:Post.
Want more AWS Security news? Follow us on Twitter.
Amazon Web Services (AWS) is pleased to announce that AWS CloudHSM is certified for Payment Card Industry Personal Identification Number (PCI PIN) version 3.1.
With CloudHSM, you can manage and access your keys on FIPS 140-2 Level 3 certified hardware, protected with customer-owned, single-tenant hardware security module (HSM) instances that run in your own virtual private cloud (VPC). This PCI PIN attestation gives you the flexibility to deploy your regulated workloads with reduced compliance overhead.
Coalfire, a third-party Qualified Security Assessor (QSA), evaluated CloudHSM. Customers can access the PCI PIN Attestation of Compliance (AOC) report through AWS Artifact.
To learn more about our PCI program and other compliance and security programs, see the AWS Compliance Programs page. As always, we value your feedback and questions; reach out to the AWS Compliance team through the Contact Us page.
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.
Want more AWS Security news? Follow us on Twitter.
We’re continuing to expand the scope of our assurance programs at Amazon Web Services (AWS) and are pleased to announce that six additional services have been added to the scope of our Payment Card Industry Data Security Standard (PCI DSS) certification. This provides our customers with more options to process and store their payment card data and architect their cardholder data environment (CDE) securely on AWS.
AWS was evaluated by Coalfire, a third-party Qualified Security Assessor (QSA). Customers can access the Attestation of Compliance (AOC) report demonstrating our PCI compliance status through AWS Artifact.
To learn more about our PCI program and other compliance and security programs, see the AWS Compliance Programs page. As always, we value your feedback and questions; reach out to the AWS Compliance team through the Contact Us page.
Want more AWS Security news? Follow us on Twitter.
We’re continuing to expand the scope of our assurance programs at Amazon Web Services (AWS) and are pleased to announce that seven new services have been added to the scope of our Payment Card Industry Data Security Standard (PCI DSS) certification. This provides our customers with more options to process and store their payment card data and architect their cardholder data environment (CDE) securely in AWS.
We were evaluated by Coalfire, a third-party Qualified Security Assessor (QSA). Customers can access the Attestation of Compliance (AOC) report demonstrating AWS’ PCI compliance status through AWS Artifact.
To learn more about our PCI program and other compliance and security programs, see the AWS Compliance Programs page. As always, we value your feedback and questions; reach out to the AWS Compliance team through the Contact Us page.
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.
Want more AWS Security news? Follow us on Twitter.
We are excited to announce that Amazon Web Services (AWS) has released the latest 2022 Payment Card Industry 3-D Secure (PCI 3DS) attestation to support our customers in the financial services sector. Although AWS doesn’t perform 3DS functions directly, the AWS PCI 3DS attestation of compliance can help customers to attain their own PCI 3DS compliance for their services running on AWS.
All AWS Regions in scope for PCI DSS were included in the 3DS attestation. AWS was assessed by Coalfire, an independent Qualified Security Assessor (QSA).
AWS compliance reports, including this latest PCI 3DS attestation, are available on demand through AWS Artifact. The 3DS package available in AWS Artifact includes the 3DS Attestation of Compliance (AOC) and Shared Responsibility Guide. To learn more about our PCI program and other compliance and security programs, visit the AWS Compliance Programs page.
We value your feedback and questions. If you have feedback about this post, or want to reach out to our team, submit comments through the Contact Us page.
Want more AWS Security news? Follow us on Twitter.
We’re continuing to expand the scope of our assurance programs at Amazon Web Services (AWS) and are pleased to announce that seven new services have been added to the scope of our Payment Card Industry Data Security Standard (PCI DSS) certification. These new services provide our customers with more options to process and store their payment card data and to architect their cardholder data environment (CDE) securely in AWS.
The Asia Pacific (Jakarta) Region was newly added to scope, and assessed as PCI compliant as part of the Fall 2021 PCI assessment.
We were evaluated by Coalfire, a third-party Qualified Security Assessor (QSA). The Attestation of Compliance (AOC) that shows AWS PCI compliance status is available through AWS Artifact.
We value your feedback and questions—feel free to reach out to our team or give feedback about this post through our Contact Us page.
If you have feedback about this post, submit comments in the Comments section below.
Want more AWS Security news? Follow us on Twitter.
The new AWS PCI 3DS attestation of compliance means customers can now attain their own PCI 3DS compliance for services running on AWS. All AWS Regions in scope for PCI DSS are included in the 3DS attestation. AWS was assessed by Coalfire, an independent Qualified Security Assessor (QSA). AWS compliance reports, including this latest PCI 3DS attestation, are available on demand through AWS Artifact. The 3DS package available in AWS Artifact includes the 3DS Attestation of Compliance (AOC) and a Shared Responsibility Guide.
To learn more about our PCI program and other compliance and security programs, please visit AWS Compliance Programs.
We value your feedback and questions—feel free to reach out to our team or give feedback about this post through our Contact Us page.
If you have feedback about this post, submit comments in the Comments section below.
Want more AWS Security news? Follow us on Twitter.
We’re continuing to expand the scope of our assurance programs at Amazon Web Services (AWS) and are pleased to announce that nine new services have been added to the scope of our Payment Card Industry Data Security Standard (PCI DSS) certification. This provides our customers with more options to process and store their payment card data and architect their cardholder data environment (CDE) securely in AWS.
AWS Local Zones sites were newly assessed as additional infrastructure deployments as part of the spring 2021 PCI assessment.
We were evaluated by Coalfire, a third-party Qualified Security Assessor (QSA). The Attestation of Compliance (AOC) that shows AWS PCI compliance status is available through AWS Artifact.
To learn more about our PCI program and other compliance and security programs, see the AWS Compliance Programs page. As always, we value your feedback and questions; reach out to the AWS Compliance team through the Contact Us page.
If you have feedback about this post, submit comments in the Comments section below.
Many Amazon Web Services (AWS) customer workflows require ingesting sensitive and regulated data such as Payment Card Industry (PCI) data, personally identifiable information (PII), and protected health information (PHI). In this post, I’ll show you a method designed to protect sensitive data for its entire lifecycle in AWS. This method can help enhance your data security posture and be useful for fulfilling the data privacy regulatory requirements applicable to your organization for data protection at rest, in transit, and in use.
An existing method for sensitive data protection in AWS is to use the field-level encryption feature offered by Amazon CloudFront. This CloudFront feature protects sensitive data fields in requests at the AWS network edge. The chosen fields are protected upon ingestion and remain protected throughout the entire application stack. The notion of protecting sensitive data early in its lifecycle in AWS is a highly desirable security architecture. However, CloudFront can protect a maximum of 10 fields and only within HTTP(S) POST requests that carry HTML form encoded payloads.
If your requirements exceed CloudFront’s native field-level encryption feature, such as a need to handle diverse application payload formats, different HTTP methods, and more than 10 sensitive fields, you can implement field-level encryption yourself by using the Lambda@Edge feature in CloudFront. In terms of choosing an appropriate encryption scheme, this problem calls for an asymmetric cryptographic system that allows public keys to be openly distributed to the CloudFront network edges while the corresponding private keys are stored securely within the network core. One such popular asymmetric cryptosystem is RSA. Accordingly, we’ll implement a Lambda@Edge function that uses the RSA cryptosystem to protect an arbitrary number of fields in any HTTP(S) request. We will discuss the solution using an example JSON payload, although this approach can be applied to any payload format.
A complex part of any encryption solution is key management. To address that, I use AWS Key Management Service (AWS KMS). AWS KMS simplifies the solution and offers improved security posture and operational benefits, detailed later.
Solution overview
You can protect data in-transit over individual communications channels using transport layer security (TLS), and at-rest in individual storage silos using volume encryption, object encryption or database table encryption. However, if you have sensitive workloads, you might need additional protection that can follow the data as it moves through the application stack. Fine-grained data protection techniques such as field-level encryption allow for the protection of sensitive data fields in larger application payloads while leaving non-sensitive fields in plaintext. This approach lets an application perform business functions on non-sensitive fields without the overhead of encryption, and allows fine-grained control over what fields can be accessed by what parts of the application.
A best practice for protecting sensitive data is to reduce its exposure in the clear throughout its lifecycle. This means protecting data as early as possible on ingestion and ensuring that only authorized users and applications can access the data only when and as needed. CloudFront, when combined with the flexibility provided by [email protected], provides an appropriate environment at the edge of the AWS network to protect sensitive data upon ingestion in AWS.
Since the downstream systems don’t have access to sensitive data, data exposure is reduced, which helps to minimize your compliance footprint for auditing purposes.
The number of sensitive data elements that may need field-level encryption depends on your requirements. For example:
For healthcare applications, HIPAA regulates 18 personal data elements.
In California, the California Consumer Privacy Act (CCPA) regulates at least 11 categories of personal information—each with its own set of data elements.
The idea behind field-level encryption is to protect sensitive data fields individually, while retaining the structure of the application payload. The alternative is full payload encryption, where the entire application payload is encrypted as a binary blob, which makes it unusable until the entirety of it is decrypted. With field-level encryption, the non-sensitive data left in plaintext remains usable for ordinary business functions. When retrofitting data protection in existing applications, this approach can reduce the risk of application malfunction since the data format is maintained.
The following figure shows how PII data fields in a JSON construction that are deemed sensitive by an application can be transformed from plaintext to ciphertext with a field-level encryption mechanism.
Figure 1: Example of field-level encryption
You can change plaintext to ciphertext as depicted in Figure 1 by using a Lambda@Edge function to perform field-level encryption. I discuss the encryption and decryption processes separately in the following sections.
Field-level encryption process
Let’s discuss the individual steps involved in the encryption process as shown in Figure 2.
Figure 2: Field-level encryption process
Figure 2 shows CloudFront invoking a Lambda@Edge function while processing a client request. CloudFront offers multiple integration points for invoking Lambda@Edge functions. Because you are processing a client request and your encryption behavior is related to requests being forwarded to an origin server, you want your function to run upon the origin request event in CloudFront. The origin request event represents an internal state transition in CloudFront that happens immediately before CloudFront forwards a request to the downstream origin server.
You can associate your Lambda@Edge function with CloudFront as described in Adding Triggers by Using the CloudFront Console. A screenshot of the CloudFront console is shown in Figure 3. The selected event type is Origin Request, and the Include Body check box is selected so that the request body is conveyed to Lambda@Edge.
The Lambda@Edge function acts as a programmable hook in the CloudFront request processing flow. You can use the function to replace the incoming request body with a version in which the sensitive data fields are encrypted.
Step 1 – RSA key generation and inclusion in Lambda@Edge
You can generate an RSA customer managed key (CMK) in AWS KMS as described in Creating asymmetric CMKs. This is done at system configuration time.
Note: You can use your existing RSA key pairs or generate new ones externally by using OpenSSL commands, especially if you need to perform RSA decryption and key management independently of AWS KMS. Your choice won’t affect the fundamental encryption design pattern presented here.
The RSA key creation in AWS KMS requires two inputs: key length and type of usage. In this example, I created a 2048-bit key and assigned its use for encryption and decryption. The cryptographic configuration of an RSA CMK created in AWS KMS is shown in Figure 4.
Figure 4: Cryptographic properties of an RSA key managed by AWS KMS
Of the two encryption algorithms shown in Figure 4 (RSAES_OAEP_SHA_256 and RSAES_OAEP_SHA_1), this example uses RSAES_OAEP_SHA_256. The combination of a 2048-bit key and the RSAES_OAEP_SHA_256 algorithm lets you encrypt a maximum of 190 bytes of data, which is enough for most PII fields. You can choose a different key length and encryption algorithm depending on your security and performance requirements. How to choose your CMK configuration includes information about RSA key specs for encryption and decryption.
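For reference, the key used in this example can also be created programmatically. The following is a minimal boto3 sketch, assuming the caller has kms:CreateKey permission; the description is illustrative. Newer SDK versions use the KeySpec parameter, which older versions call CustomerMasterKeySpec.

```python
import boto3

kms = boto3.client('kms')

# Create a 2048-bit RSA key pair usable for encryption and decryption.
key = kms.create_key(
    KeySpec='RSA_2048',
    KeyUsage='ENCRYPT_DECRYPT',
    Description='Field-level encryption key for CloudFront ingestion'  # illustrative
)
print(key['KeyMetadata']['KeyId'])
```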
Using AWS KMS for RSA key management, rather than managing the keys yourself, eliminates that complexity and can help you:
Enforce IAM and key policies that describe administrative and usage permissions for keys.
Manage cross-account access for keys.
Monitor and alarm on key operations through Amazon CloudWatch.
Audit AWS KMS API invocations through AWS CloudTrail.
Record configuration changes to keys and enforce key specification compliance through AWS Config.
Generate high-entropy keys in an AWS KMS hardware security module (HSM) as required by NIST.
Store RSA private keys securely, without the ability to export.
Perform RSA decryption within AWS KMS without exposing private keys to application code.
Categorize and report on keys with key tags for cost allocation.
Figure 5: RSA public key available for copy or download in the console
Note: As we will see in the sample code in step 3, we embed the public key in the Lambda@Edge deployment package. This is a permissible practice because public keys in asymmetric cryptography systems aren’t a secret and can be freely distributed to entities that need to perform encryption. Alternatively, you can use Lambda@Edge to query AWS KMS for the public key at runtime. However, this introduces latency, increases the load against your AWS KMS request quotas, and increases your AWS costs. General patterns for using external data in Lambda@Edge are described in Leveraging external data in Lambda@Edge.
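If you do choose to retrieve the public key from AWS KMS rather than copying a PEM by hand, the following sketch shows one way to do so at build or deployment time; the key ID is a placeholder. GetPublicKey returns DER-encoded bytes, which pycryptodome’s RSA.importKey() accepts alongside PEM.

```python
import boto3
from Crypto.PublicKey import RSA

kms = boto3.client('kms')

# Fetch the public half of the KMS key once, at build or deployment time.
resp = kms.get_public_key(KeyId='<Your-Key-ID>')
rsa_public_key_obj = RSA.importKey(resp['PublicKey'])  # DER bytes accepted

# Export as PEM text if you want to embed the key in the deployment package.
pem = rsa_public_key_obj.export_key().decode()
```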
Step 2 – HTTP API request handling by CloudFront
CloudFront receives an HTTP(S) request from a client. CloudFront then invokes Lambda@Edge during origin-request processing and includes the HTTP request body in the invocation.
Step 3 – Lambda@Edge performs field-level encryption
The Lambda@Edge function processes the HTTP request body. The function extracts sensitive data fields and performs RSA encryption over their values.
The following code is sample source code for the Lambda@Edge function, implemented in Python 3.7:
import base64
import json

from Crypto.Cipher import PKCS1_OAEP
from Crypto.Hash import SHA256
from Crypto.PublicKey import RSA

# PEM-formatted RSA public key copied over from AWS KMS or your own public key.
RSA_PUBLIC_KEY = "-----BEGIN PUBLIC KEY-----<your key>-----END PUBLIC KEY-----"
RSA_PUBLIC_KEY_OBJ = RSA.importKey(RSA_PUBLIC_KEY)
# OAEP with SHA-256, matching the RSAES_OAEP_SHA_256 algorithm in AWS KMS.
RSA_CIPHER_OBJ = PKCS1_OAEP.new(RSA_PUBLIC_KEY_OBJ, hashAlgo=SHA256)

# Example sensitive data field names in a JSON object.
PII_SENSITIVE_FIELD_NAMES = ["fname", "lname", "email", "ssn", "dob", "phone"]

CIPHERTEXT_PREFIX = "#01#"
CIPHERTEXT_SUFFIX = "#10#"


def lambda_handler(event, context):
    # Extract the HTTP request and its body as per documentation:
    # https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/lambda-event-structure.html
    http_request = event['Records'][0]['cf']['request']
    body = http_request['body']
    org_body = base64.b64decode(body['data'])
    mod_body = protect_sensitive_fields_json(org_body)
    body['action'] = 'replace'
    body['encoding'] = 'text'
    body['data'] = mod_body
    return http_request


def protect_sensitive_fields_json(body):
    # Encrypts sensitive fields in the sample JSON payload shown earlier in this post:
    # [{"fname": "Alejandro", "lname": "Rosalez", … }]
    person_list = json.loads(body.decode("utf-8"))
    for person_data in person_list:
        for field_name in PII_SENSITIVE_FIELD_NAMES:
            if field_name not in person_data:
                continue
            plaintext = person_data[field_name]
            ciphertext = RSA_CIPHER_OBJ.encrypt(bytes(plaintext, 'utf-8'))
            ciphertext_b64 = base64.b64encode(ciphertext).decode()
            # Optionally, add unique prefix/suffix patterns to the ciphertext.
            person_data[field_name] = CIPHERTEXT_PREFIX + ciphertext_b64 + CIPHERTEXT_SUFFIX
    return json.dumps(person_list)
The event structure passed into the Lambda@Edge function is described in Lambda@Edge Event Structure. Following the event structure, you can extract the HTTP request body. In this example, the assumption is that the HTTP payload carries a JSON document based on a particular schema defined as part of the API contract. The input JSON document is parsed by the function, converting it into a Python dictionary. The Python native dictionary operators are then used to extract the sensitive field values.
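To exercise the handler locally, you can feed it a minimal mock of the documented event structure. The following test snippet is a sketch: it assumes the handler above is in scope, that a real PEM key has been substituted into RSA_PUBLIC_KEY, and it populates only the fields the handler actually reads.

```python
import base64
import json

# Minimal origin-request event for local testing, using the sample record
# names from the payload shown earlier in this post.
sample_body = json.dumps([{"fname": "Alejandro", "lname": "Rosalez"}])
event = {
    "Records": [{
        "cf": {
            "request": {
                "body": {
                    "inputTruncated": False,
                    "action": "read-only",
                    "encoding": "base64",
                    "data": base64.b64encode(sample_body.encode()).decode()
                }
            }
        }
    }]
}
print(lambda_handler(event, None))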
Note: If you don’t know your API payload structure ahead of time or you’re dealing with unstructured payloads, you can use techniques such as regular expression pattern searches and checksums to look for patterns of sensitive data and target them accordingly. For example, credit card primary account numbers include a Luhn checksum that can be programmatically detected. Additionally, services such as Amazon Comprehend and Amazon Macie can be leveraged for detecting sensitive data such as PII in application payloads.
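As an illustration of the checksum technique mentioned in the note, a self-contained Luhn validator might look like the following sketch; digit strings that fail this check cannot be valid primary account numbers, which helps filter out regex false positives.

```python
# Self-contained Luhn checksum validator for candidate card numbers.
def passes_luhn(candidate: str) -> bool:
    digits = [int(c) for c in candidate if c.isdigit()]
    if len(digits) < 13:  # PANs are 13-19 digits long
        return False
    checksum = 0
    # Double every second digit from the right; subtract 9 when it exceeds 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0

assert passes_luhn("4111111111111111")  # a well-known test PAN
```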
The example uses the standard optimal asymmetric encryption padding (OAEP) and SHA-256 encryption algorithm properties. These properties are supported by AWS KMS and will allow RSA ciphertext produced here to be decrypted by AWS KMS later.
Note: You may have noticed in the code above that we’re bracketing the ciphertexts with predefined prefix and suffix strings:
This is an optional measure and is being implemented to simplify the decryption process.
The prefix and suffix strings help demarcate ciphertext embedded in unstructured data in downstream processing and also act as embedded metadata. Unique prefix and suffix strings allow you to extract ciphertext through string or regular expression (regex) searches during the decryption process without having to know the data body format or schema, or the field names that were encrypted.
Distinct strings can also serve as indirect identifiers of RSA key pair identifiers. This can enable key rotation and allow separate keys to be used for separate fields depending on the data security requirements for individual fields.
You can ensure that the prefix and suffix strings can’t collide with the ciphertext by bracketing them with characters that don’t appear in ciphertext. For example, a hash (#) character cannot be part of a base64-encoded ciphertext string.
Step 4 – Lambda@Edge returns the modified request body
The Lambda@Edge function returns the modified HTTP body back to CloudFront and instructs it to replace the original HTTP body with the modified one by setting the following flag:
http_request['body']['action'] = 'replace'
Step 5 – Forward the request to the origin server
CloudFront forwards the modified request body provided by Lambda@Edge to the origin server. In this example, the origin server writes the data body to persistent storage for later processing.
Field-level decryption process
An application that’s authorized to access sensitive data for a business function can decrypt that data. An example decryption process is shown in Figure 6. The figure shows a Lambda function as an example compute environment for invoking AWS KMS for decryption. This functionality isn’t dependent on Lambda and can be performed in any compute environment that has access to AWS KMS.
Figure 6: Field-level decryption process
The steps of the process shown in Figure 6 are described below.
Step 1 – Application retrieves the field-level encrypted data
The example application retrieves the field-level encrypted data from persistent storage that had been previously written during the data ingestion process.
Step 2 – Application invokes the decryption Lambda function
The application invokes a Lambda function responsible for performing field-level decryption, sending the retrieved data to Lambda.
Step 3 – Lambda calls the AWS KMS decryption API
The Lambda function uses AWS KMS for RSA decryption. The example calls the AWS KMS Decrypt API, which takes ciphertext as input and returns plaintext. The actual decryption happens in AWS KMS; the RSA private key is never exposed to the application, which is a highly desirable characteristic for building secure applications.
Note: If you choose to use an external key pair, then you can securely store the RSA private key in AWS services like AWS Systems Manager Parameter Store or AWS Secrets Manager and control access to the key through IAM and resource policies. You can fetch the key from the relevant vault by using the vault’s API, then decrypt by using the standard RSA implementation available in your programming language, for example, the cryptography toolkit in Python or javax.crypto in Java.
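As a sketch of that external key pair option, the following code retrieves a PEM private key from AWS Secrets Manager (the secret name is hypothetical) and decrypts with the Python cryptography toolkit, using the same OAEP and SHA-256 properties as the encryption side.

```python
import boto3
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding

secrets = boto3.client('secretsmanager')

# Fetch the PEM-formatted private key from Secrets Manager (hypothetical name).
pem = secrets.get_secret_value(SecretId='fle/rsa-private-key')['SecretString']
private_key = serialization.load_pem_private_key(pem.encode(), password=None)


def rsa_decrypt(ciphertext: bytes) -> str:
    # Must match the encryption-side properties: OAEP with SHA-256.
    plaintext = private_key.decrypt(
        ciphertext,
        padding.OAEP(
            mgf=padding.MGF1(algorithm=hashes.SHA256()),
            algorithm=hashes.SHA256(),
            label=None,
        ),
    )
    return plaintext.decode('utf-8')
```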
The Lambda function Python code for decryption is shown below.
import base64
import re

import boto3

kms_client = boto3.client('kms')

CIPHERTEXT_PREFIX = "#01#"
CIPHERTEXT_SUFFIX = "#10#"


# This Lambda function extracts the event body, searches for and decrypts ciphertext
# fields surrounded by the provided prefix and suffix strings in arbitrary text bodies,
# and substitutes plaintext fields in place.
def lambda_handler(event, context):
    org_data = event["body"]
    mod_data = unprotect_fields(org_data, CIPHERTEXT_PREFIX, CIPHERTEXT_SUFFIX)
    return mod_data


# Helper function that performs a non-greedy regex search for ciphertext strings in the
# input data and performs RSA decryption of them using AWS KMS.
def unprotect_fields(org_data, prefix, suffix):
    regex_pattern = prefix + "(.*?)" + suffix
    mod_data_parts = []
    cursor = 0

    # Search for ciphertexts iteratively using the Python regular expression module.
    for match in re.finditer(regex_pattern, org_data):
        mod_data_parts.append(org_data[cursor: match.start()])
        try:
            # Ciphertext was stored base64-encoded in our example. Decode it.
            ciphertext = base64.b64decode(match.group(1))

            # Decrypt the ciphertext using AWS KMS.
            decrypt_rsp = kms_client.decrypt(
                EncryptionAlgorithm="RSAES_OAEP_SHA_256",
                KeyId="<Your-Key-ID>",
                CiphertextBlob=ciphertext)
            decrypted_val = decrypt_rsp["Plaintext"].decode("utf-8")
            mod_data_parts.append(decrypted_val)
        except Exception as e:
            print("Exception: " + str(e))
            return None
        cursor = match.end()

    mod_data_parts.append(org_data[cursor:])
    return "".join(mod_data_parts)
The function performs a regular expression search in the input data body looking for ciphertext strings bracketed in predefined prefix and suffix strings that were added during encryption.
While iterating over ciphertext strings one by one, the function calls the AWS KMS decrypt() API. The example function uses the same RSA encryption algorithm properties—OAEP and SHA-256—and the Key ID of the public key that was used during encryption in Lambda@Edge.
Note that the Key ID itself is not a secret. Any application can be configured with it, but that doesn’t mean any application will be able to perform decryption. The security control here is that the AWS KMS key policy must allow the caller to use the Key ID to perform the decryption. An additional security control is provided by the Lambda execution role, which must allow calling the AWS KMS decrypt() API.
Step 4 – AWS KMS decrypts ciphertext and returns plaintext
To ensure that only authorized users can perform the decrypt operation, AWS KMS is configured as described in Using key policies in AWS KMS. In addition, the Lambda IAM execution role is configured as described in AWS Lambda execution role so that it can access AWS KMS. If both the key policy and IAM policy conditions are met, AWS KMS returns the decrypted plaintext. Lambda substitutes the plaintext in place of ciphertext in the encapsulating data body.
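The following is a hedged sketch of what the key policy side of those controls might look like, applied with boto3; the account ID, role name, and key ID are placeholders, and a production policy would be tailored to your organization.

```python
import json

import boto3

kms = boto3.client('kms')

# A key policy that lets only the decryption Lambda's execution role
# (hypothetical ARN) call kms:Decrypt, alongside an administration statement.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowKeyAdministration",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
            "Action": "kms:*",
            "Resource": "*"
        },
        {
            "Sid": "AllowFieldLevelDecryption",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:role/decrypt-lambda-role"},
            "Action": "kms:Decrypt",
            "Resource": "*"
        }
    ]
}

kms.put_key_policy(KeyId='<Your-Key-ID>', PolicyName='default',
                   Policy=json.dumps(policy))
```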
Steps three and four are repeated for each ciphertext string.
Step 5 – Lambda returns decrypted data body
Once all the ciphertext has been converted to plaintext and substituted in the larger data body, the Lambda function returns the modified data body to the client application.
Conclusion
In this post, I demonstrated how you can implement field-level encryption integrated with AWS KMS to help protect sensitive data workloads for their entire lifecycle in AWS. Since your Lambda@Edge function is designed to protect data at the network edge, data remains protected throughout the application execution stack. In addition to improving your data security posture, this protection can help you comply with data privacy regulations applicable to your organization.
Since you author your own Lambda@Edge function to perform standard RSA encryption, you have flexibility in terms of payload formats and the number of fields that you consider to be sensitive. The integration with AWS KMS for RSA key management and decryption provides significant simplicity, stronger key security, and rich integration with other AWS security services, enabling an overall strong security solution.
By using encrypted fields with identifiers as described in this post, you can create fine-grained controls for data accessibility to meet the security principle of least privilege. Instead of granting either complete access or no access to data fields, you can ensure least privileges where a given part of an application can only access the fields that it needs, when it needs to, all the way down to controlling access field by field. Field by field access can be enabled by using different keys for different fields and controlling their respective policies.
In addition to protecting sensitive data workloads to meet regulatory and security best practices, this solution can be used to build de-identified data lakes in AWS. Sensitive data fields remain protected throughout their lifecycle, while non-sensitive data fields remain in the clear. This approach can allow analytics or other business functions to operate on data without exposing sensitive data.
If you have feedback about this post, submit comments in the Comments section below.
Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.
We continue to expand the scope of our assurance programs and are pleased to announce that eight additional services have been added to the scope of our Payment Card Industry Data Security Standard (PCI DSS) certification. This gives our customers more options to process and store their payment card data and architect their cardholder data environment (CDE) securely in Amazon Web Services (AWS).
Private AWS Local Zones and AWS Wavelength sites were newly assessed as additional infrastructure deployments as part of the fall 2020 PCI assessment.
We were evaluated by Coalfire, a third-party Qualified Security Assessor (QSA). The Attestation of Compliance (AOC) evidencing AWS PCI compliance status is available through AWS Artifact.
To learn more about our PCI program and other compliance and security programs, see AWS Compliance Programs. As always, we value your feedback and questions. You can contact the compliance team through the Contact Us page.
If you have feedback about this post, submit comments in the Comments section below.
Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.
A new whitepaper is available that summarizes the results of tests by Foregenix comparing Amazon GuardDuty with network intrusion detection systems (IDS) on threat detection of network layer attacks. GuardDuty is a cloud-centric IDS service that uses Amazon Web Services (AWS) data sources to detect a broad range of threat behaviors. Security engineers need to understand how Amazon GuardDuty compares to traditional solutions for network threat detection. Assessors have also asked for clarity on the effectiveness of GuardDuty for meeting compliance requirements, like Payment Card Industry (PCI) Data Security Standard (DSS) requirement 11.4, which requires intrusion detection techniques to be implemented at critical points within a network.
A traditional IDS typically relies on monitoring network traffic at specific network traffic control points, like firewalls and host network interfaces. This allows the IDS to use a set of preconfigured rules to examine incoming data packet information and identify patterns that closely align with network attack types. Traditional IDS have several challenges in the cloud:
Networks are virtualized. Data traffic control points are decentralized and traffic flow management is a shared responsibility with the cloud provider. This makes it difficult or impossible to monitor all network traffic for analysis.
Cloud applications are dynamic. Features like auto-scaling and load balancing continuously change how a network environment is configured as demand fluctuates.
Most traditional IDS require experienced technicians to maintain effective operation and to avoid the common problem of an overwhelming number of false positive findings. As a compliance assessor, I have often seen IDS intentionally de-tuned to reduce false positive noise when expert, continuous support isn’t available.
GuardDuty analyzes tens of billions of events across multiple AWS data sources, such as AWS CloudTrail, Amazon Virtual Private Cloud (Amazon VPC) flow logs, and Amazon Route 53 DNS logs. This gives GuardDuty the ability to analyze event data, such as AWS API calls and AWS Identity and Access Management (IAM) login events, which is beyond the capabilities of traditional IDS solutions. Monitoring AWS API calls from CloudTrail also enables threat detection for AWS serverless services, which sets it apart from traditional IDS solutions. However, without inspection of packet contents, the question remained, “Is GuardDuty truly effective in detecting network level attacks that more traditional IDS solutions were specifically designed to detect?”
AWS asked Foregenix to conduct a test comparing GuardDuty to market-leading IDS to help answer this question. AWS didn’t specify the attacks or architecture to be implemented in the test. It was left up to the independent tester to determine both the threat space covered by market-leading IDS and how to construct a test for determining the effectiveness of the threat detection capabilities of GuardDuty and traditional IDS solutions, which included open-source and commercial IDS.
Foregenix configured a lab environment to support tests that used extensive and complex attack playbooks. The lab environment simulated a real-world deployment composed of a web server, a bastion host, and an internal server used for centralized event logging. The environment was left running under normal operating conditions for more than 45 days. This allowed all tested solutions to build up a baseline of normal data traffic patterns prior to the anomaly detection testing exercises that followed this activity.
Foregenix determined that GuardDuty is at least as effective at detecting network level attacks as other market-leading IDS. They found that GuardDuty was simple to deploy and required no specialized skills to configure the service to function effectively. Also, with its inherent capability of analyzing DNS requests, VPC flow logs, and CloudTrail events, they concluded that GuardDuty was able to effectively identify threats that other IDS could not natively detect and that required extensive manual customization to detect in the test environment. Foregenix recommended that adding a host-based IDS agent on Amazon Elastic Compute Cloud (Amazon EC2) instances would provide an enhanced level of threat defense when coupled with Amazon GuardDuty.
As a PCI Qualified Security Assessor (QSA) company, Foregenix states that they consider GuardDuty to be a qualifying network intrusion detection technique for meeting PCI DSS requirement 11.4. This is important for AWS customers whose applications must maintain PCI DSS compliance. Customers should be aware that individual PCI QSAs might have different interpretations of the requirement, and should discuss this with their assessor before a PCI assessment.
Customer PCI QSAs can also speak with AWS Security Assurance Services, an AWS Professional Services team of PCI QSAs, to obtain more information on how customers can leverage AWS services to help them maintain PCI DSS Compliance. Customers can request Security Assurance Services support through their AWS Account Manager, Solutions Architect, or other AWS support.
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the Amazon GuardDuty forum or contact AWS Support.
Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.