Abstract: It has become common to publish large (billion parameter) language models that have been trained on private datasets. This paper demonstrates that in such settings, an adversary can perform a training data extraction attack to recover individual training examples by querying the language model.
We demonstrate our attack on GPT-2, a language model trained on scrapes of the public Internet, and are able to extract hundreds of verbatim text sequences from the model’s training data. These extracted examples include (public) personally identifiable information (names, phone numbers, and email addresses), IRC conversations, code, and 128-bit UUIDs. Our attack is possible even though each of the above sequences are included in just one document in the training data.
We comprehensively evaluate our extraction attack to understand the factors that contribute to its success. For example, we find that larger models are more vulnerable than smaller models. We conclude by drawing lessons and discussing possible safeguards for training large language models.
We generated a total of 600,000 samples by querying GPT-2 with three different sampling strategies. Each sample contains 256 tokens, or roughly 200 words on average. Among these samples, we selected 1,800 samples with abnormally high likelihood for manual inspection. Out of the 1,800 samples, we found 604 that contain text which is reproduced verbatim from the training set.
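The selection step described above can be illustrated with a short sketch. It assumes that per-token log-probabilities are already available for each generated sample; the `Sample` type and the function names are hypothetical, and ranking by perplexity alone is only one likelihood-based criterion that such an attack might use, not necessarily the authors' exact procedure.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    text: str
    token_logprobs: List[float]  # natural-log probability of each generated token


def perplexity(sample: Sample) -> float:
    """Exponential of the mean negative log-likelihood; lower means more likely."""
    if not sample.token_logprobs:
        raise ValueError("sample has no tokens")
    mean_nll = -sum(sample.token_logprobs) / len(sample.token_logprobs)
    return math.exp(mean_nll)


def select_for_inspection(samples: List[Sample], k: int) -> List[Sample]:
    """Return the k samples with the lowest perplexity (highest likelihood)."""
    return sorted(samples, key=perplexity)[:k]
```

Samples that the model reproduces from memory tend to receive unusually high likelihood, which is why the lowest-perplexity samples are the ones worth inspecting by hand.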
The rest of the blog post discusses the types of data they found.
Where did the last month go? Were you able to catch all of the sessions in the Security, Identity, and Compliance track you hoped to see at AWS re:Invent? If you missed any, don’t worry—you can stream all the sessions released in 2020 via the AWS re:Invent website. Additionally, we’re starting 2021 with all new sessions that you can stream live January 12–15. Here are the new Security, Identity, and Compliance sessions—each session is offered at multiple times, so you can find the time that works best for your location and schedule.
Protecting sensitive data with Amazon Macie and Amazon GuardDuty – SEC210 Himanshu Verma, AWS Speaker
Tuesday, January 12 – 11:00 AM to 11:30 AM PST Tuesday, January 12 – 7:00 PM to 7:30 PM PST Wednesday, January 13 – 3:00 AM to 3:30 AM PST
As organizations manage growing volumes of data, identifying and protecting your sensitive data can become increasingly complex, expensive, and time-consuming. In this session, learn how Amazon Macie and Amazon GuardDuty together provide protection for your data stored in Amazon S3. Amazon Macie automates the discovery of sensitive data at scale and lowers the cost of protecting your data. Amazon GuardDuty continuously monitors and profiles S3 data access events and configurations to detect suspicious activities. Come learn about these security services and how to best use them for protecting data in your environment.
BBC: Driving security best practices in a decentralized organization – SEC211 Apurv Awasthi, AWS Speaker Andrew Carlson, Sr. Software Engineer – BBC
Tuesday, January 12 – 1:15 PM to 1:45 PM PST Tuesday, January 12 – 9:15 PM to 9:45 PM PST Wednesday, January 13 – 5:15 AM to 5:45 AM PST
In this session, Andrew Carlson, engineer at BBC, talks about BBC’s journey while adopting AWS Secrets Manager for lifecycle management of its arbitrary credentials such as database passwords, API keys, and third-party keys. He provides insight on BBC’s secrets management best practices and how the company drives these at enterprise scale in a decentralized environment that has a highly visible scope of impact.
Get ahead of the curve with DDoS Response Team escalations – SEC321 Fola Bolodeoku, AWS Speaker
Tuesday, January 12 – 3:30 PM to 4:00 PM PST Tuesday, January 12 – 11:30 PM to 12:00 AM PST Wednesday, January 13 – 7:30 AM to 8:00 AM PST
This session identifies tools and tricks that you can use to prepare for application security escalations, with lessons learned provided by the AWS DDoS Response Team. You learn how AWS customers have used different AWS offerings to protect their applications, including network access control lists, security groups, and AWS WAF. You also learn how to avoid common misconfigurations and mishaps observed by the DDoS Response Team, and you discover simple yet effective actions that you can take to better protect your applications’ availability and security controls.
Network security for serverless workloads – SEC322 Alex Tomic, AWS Speaker
Thursday, January 14 – 1:30 PM to 2:00 PM PST Thursday, January 14 – 9:30 PM to 10:00 PM PST Friday, January 15 – 5:30 AM to 6:00 AM PST
Are you building a serverless application using services like Amazon API Gateway, AWS Lambda, Amazon DynamoDB, Amazon Aurora, and Amazon SQS? Would you like to apply enterprise network security to these AWS services? This session covers how network security concepts like encryption, firewalls, and traffic monitoring can be applied to a well-architected AWS serverless architecture.
Building your cloud incident response program – SEC323 Freddy Kasprzykowski, AWS Speaker
Wednesday, January 13 – 9:00 AM to 9:30 AM PST Wednesday, January 13 – 5:00 PM to 5:30 PM PST Thursday, January 14 – 1:00 AM to 1:30 AM PST
You’ve configured your detection services and now you’ve received your first alert. This session provides patterns that help you understand what capabilities you need to build and run an effective incident response program in the cloud. It includes a review of some logs to see what they tell you and a discussion of tools to analyze those logs. You learn how to make sure that your team has the right access, how automation can help, and which incident response frameworks can guide you.
Wednesday, January 13 – 2:15 PM to 2:45 PM PST Wednesday, January 13 – 10:15 PM to 10:45 PM PST Thursday, January 14 – 6:15 AM to 6:45 AM PST
Amazon Cognito is a flexible user directory that can meet the needs of a number of customer identity management use cases. Web and mobile applications can integrate with Amazon Cognito in minutes to offer user authentication and get standard tokens to be used in token-based authorization scenarios. This session covers best practices that you can implement in your application to secure and protect tokens. You also learn about new Amazon Cognito features that give you more options to improve the security and availability of your application.
Event-driven data security using Amazon Macie – SEC325 Neha Joshi, AWS Speaker
Thursday, January 14 – 8:00 AM to 8:30 AM PST Thursday, January 14 – 4:00 PM to 4:30 PM PST Friday, January 15 – 12:00 AM to 12:30 AM PST
Amazon Macie sensitive data discovery jobs for Amazon S3 buckets help you discover sensitive data such as personally identifiable information (PII), financial information, account credentials, and workload-specific sensitive information. In this session, you learn about an automated approach to discover sensitive information whenever changes are made to the objects in your S3 buckets.
Thursday, January 14 – 10:15 AM to 10:45 AM PST Thursday, January 14 – 6:15 PM to 6:45 PM PST Friday, January 15 – 2:15 AM to 2:45 AM PST
In this session, learn about several instance containment and isolation techniques, ranging from simple and effective to more complex and powerful, that leverage native AWS networking services and account configuration techniques. If an incident happens, you may have questions like “How do we isolate the system while preserving all the valuable artifacts?” and “What options do we even have?”. These are valid questions, but there are more important ones to discuss amidst a (possible) incident. Join this session to learn highly effective instance containment techniques in a crawl-walk-run approach that also facilitates preservation and collection of valuable artifacts and intelligence.
Trusted connects for government workloads – SEC402 Brad Dispensa, AWS Speaker
Wednesday, January 13 – 11:15 AM to 11:45 AM PST Wednesday, January 13 – 7:15 PM to 7:45 PM PST Thursday, January 14 – 3:15 AM to 3:45 AM PST
Cloud adoption across the public sector is making it easier to provide government workforces with seamless access to applications and data. With this move to the cloud, we also need updated security guidance to ensure public-sector data remain secure. For example, the TIC (Trusted Internet Connections) initiative has been a requirement for US federal agencies for some time. The recent TIC-3 moves from prescriptive guidance to an outcomes-based model. This session walks you through how to leverage AWS features to better protect public-sector data using TIC-3 and the National Institute of Standards and Technology (NIST) Cybersecurity Framework (CSF). Also, learn how this might map into other geographies.
I look forward to seeing you in these sessions. Please see the re:Invent agenda for more details and to build your schedule.
If you have feedback about this post, submit comments in the Comments section below.
Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.
More than 100,000 Zyxel firewalls, VPN gateways, and access point controllers contain a hardcoded admin-level backdoor account that can grant attackers root access to devices via either the SSH interface or the web administration panel.
Installing patches removes the backdoor account, which, according to Eye Control researchers, uses the “zyfwp” username and the “PrOw!aN_fXp” password.
“The plaintext password was visible in one of the binaries on the system,” the Dutch researchers said in a report published before the Christmas 2020 holiday.
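To illustrate how a plaintext credential can be visible in a binary, here is a minimal sketch that extracts runs of printable ASCII from a byte string, similar in spirit to the Unix `strings` utility. The function name and the minimum length are assumptions for illustration; this is not a description of the researchers' actual method.

```python
import re
from typing import List


def printable_strings(blob: bytes, min_len: int = 6) -> List[str]:
    """Return runs of printable ASCII that are at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(blob)]
```

Run against a firmware binary, a scan like this yields candidate strings that an analyst would still need to review by hand; it only shows why unencrypted secrets embedded in a binary are easy to find.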
Data is a crucial part of every business and is used for strategic decision making at all levels of an organization. To extract value from their data more quickly, Amazon Web Services (AWS) customers are building automated data pipelines—from data ingestion to transformation and analytics. As part of this process, my customers often ask how to prevent sensitive data, such as personally identifiable information, from being ingested into data lakes when it’s not needed. They highlight that this challenge is compounded when ingesting unstructured data—such as files from process reporting, text files from chat transcripts, and emails. They also mention that identifying sensitive data inadvertently stored in structured data fields—such as in a comment field stored in a database—is also a challenge.
In this post, I show you how to integrate Amazon Macie as part of the data ingestion step in your data pipeline. This solution provides an additional checkpoint that sensitive data has been appropriately redacted or tokenized prior to ingestion. Macie is a fully managed data security and privacy service that uses machine learning and pattern matching to discover sensitive data in AWS.
When Macie discovers sensitive data, the solution notifies an administrator to review the data and decide whether to allow the data pipeline to continue ingesting the objects. If allowed, the objects will be tagged with an Amazon Simple Storage Service (Amazon S3) object tag to identify that sensitive data was found in the object before progressing to the next stage of the pipeline.
This combination of automation and manual review helps reduce the risk that sensitive data—such as personally identifiable information—will be ingested into a data lake. This solution can be extended to fit your use case and workflows. For example, you can define custom data identifiers as part of your scans, add additional validation steps, create Macie suppression rules to archive findings automatically, or only request manual approvals for findings that meet certain criteria (such as high severity findings).
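As one example of such an extension, the sketch below filters findings so that only those at or above a chosen severity require manual approval. It assumes each finding is a dictionary with a `severity.description` field of `Low`, `Medium`, or `High`, as Macie findings report; the function name and the default threshold are hypothetical and not part of the deployed application.

```python
from typing import Dict, Iterable, List

SEVERITY_ORDER = {"Low": 1, "Medium": 2, "High": 3}


def findings_needing_review(findings: Iterable[Dict], minimum: str = "High") -> List[Dict]:
    """Return the findings whose severity is at or above the minimum level."""
    threshold = SEVERITY_ORDER[minimum]
    selected = []
    for finding in findings:
        label = finding.get("severity", {}).get("description")
        # Findings with a missing or unknown severity are kept for review,
        # erring on the side of caution.
        if SEVERITY_ORDER.get(label, threshold) >= threshold:
            selected.append(finding)
    return selected
```

Keeping findings with an unknown severity in the review set is a deliberate choice here: a filter that silently drops them would weaken the checkpoint.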
Typically, customers will perform validation and clean their data before moving it to a raw data zone. This solution adds validation steps to that pipeline after preliminary quality checks and data cleaning are performed, noted in blue (in layer 3) of Figure 1. The layers outlined in the pipeline are:
Ingestion – Brings data into the data lake.
Storage – Provides durable, scalable, and secure components to store the data—typically using S3 buckets.
Processing – Transforms data into a consumable state through data validation, cleanup, normalization, transformation, and enrichment. This processing layer is where the additional validation steps are added to identify instances of sensitive data that haven’t been appropriately redacted or tokenized prior to consumption.
Consumption – Provides tools to gain insights from the data in the data lake.
Figure 1: Data pipeline with sensitive data scan
The application runs on a scheduled basis (four times a day, every 6 hours by default) to process data that is added to the raw data S3 bucket. You can customize the application to perform a sensitive data discovery scan during any stage of the pipeline. Because most customers do their extract, transform, and load (ETL) daily, the application scans for sensitive data on a scheduled basis before any crawler jobs run to catalog the data and after typical validation and data redaction or tokenization processes complete.
You can expect that this additional validation will add 5–10 minutes to your pipeline execution at a minimum. The validation processing time will scale linearly based on object size, but there is a start-up time per job that is constant.
If sensitive data is found in the objects, an email is sent to the designated administrator requesting an approval decision, which they indicate by selecting the link corresponding to their decision to approve or deny the next step. In most cases, the reviewer will choose to adjust the sensitive data cleanup processes to remove the sensitive data, deny the progression of the files, and re-ingest the files in the pipeline.
Additional considerations for deploying this application for regular use are discussed at the end of the blog post.
The following resources are created as part of the application:
S3 buckets store data in various stages of processing: A raw data bucket for uploading objects for the data pipeline, a scanning bucket where objects are scanned for sensitive data, a manual review bucket holding objects where sensitive data was discovered, and a scanned data bucket for starting the next ingestion step of the data pipeline.
Lambda functions execute the logic to run the sensitive data scans and workflow.
Note: the application uses various AWS services, and there are costs associated with these resources after the Free Tier usage. See AWS Pricing for details. The primary drivers of the solution cost will be the amount of data ingested through the pipeline, both for Amazon S3 storage and data processed for sensitive data discovery with Macie.
The architecture of the application is shown in Figure 2 and described in the text that follows.
Figure 2: Application architecture and logic
Objects are uploaded to the raw data S3 bucket as part of the data ingestion process.
A scheduled EventBridge rule runs the sensitive data scan Step Functions workflow.
triggerMacieScan Lambda function moves objects from the raw data S3 bucket to the scan stage S3 bucket.
triggerMacieScan Lambda function creates a Macie sensitive data discovery job on the scan stage S3 bucket.
checkMacieStatus Lambda function checks the status of the Macie sensitive data discovery job.
getMacieFindingsCount Lambda function counts all of the findings from the Macie sensitive data discovery job.
isSensitiveDataFound Step Functions Choice state checks whether sensitive data was found in the Macie sensitive data discovery job.
If there was sensitive data discovered, run the triggerManualApproval Lambda function.
If there was no sensitive data discovered, run the moveAllScanStageS3Files Lambda function.
moveAllScanStageS3Files Lambda function moves all of the objects from the scan stage S3 bucket to the scanned data S3 bucket.
triggerManualApproval Lambda function tags and moves objects with sensitive data discovered to the manual review S3 bucket, and moves objects with no sensitive data discovered to the scanned data S3 bucket. The function then sends a notification to the ApprovalRequestNotification Amazon SNS topic as a notification that manual review is required.
Email is sent to the email address that’s subscribed to the ApprovalRequestNotification Amazon SNS topic (from the application deployment template) for the manual review user with the option to Approve or Deny pipeline ingestion for these objects.
Manual review user assesses the objects with sensitive data in the manual review S3 bucket and selects the Approve or Deny links in the email.
The decision request is sent from the Amazon API Gateway to the receiveApprovalDecision Lambda function.
manualApprovalChoice Step Functions Choice state checks the decision from the manual review user.
If denied, run the deleteManualReviewS3Files Lambda function.
If approved, run the moveToScannedDataS3Files Lambda function.
deleteManualReviewS3Files Lambda function deletes the objects from the manual review S3 bucket.
moveToScannedDataS3Files Lambda function moves the objects from the manual review S3 bucket to the scanned data S3 bucket.
The next step of the automated data pipeline will begin with the objects in the scanned data S3 bucket.
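The branching in the steps above can be summarized in a small sketch. It models only the routing decisions made by the two Choice states, not the Lambda functions themselves; the function name, the destination labels, and the decision values `approved` and `denied` are hypothetical.

```python
from typing import Dict, List, Optional


def route_objects(
    findings_by_key: Dict[str, int], decision: Optional[str] = None
) -> Dict[str, List[str]]:
    """Map each scanned object key to its destination.

    findings_by_key: object key -> number of Macie findings for that object.
    decision: "approved", "denied", or None while the review is pending.
    """
    result: Dict[str, List[str]] = {"scanned_data": [], "manual_review": [], "deleted": []}
    for key, count in findings_by_key.items():
        if count == 0:
            # No sensitive data found: continue to the next pipeline stage.
            result["scanned_data"].append(key)
        elif decision == "approved":
            result["scanned_data"].append(key)
        elif decision == "denied":
            result["deleted"].append(key)
        else:
            # Sensitive data found and no decision yet: hold for review.
            result["manual_review"].append(key)
    return result
```

Note that the reviewer's decision applies to every flagged object in the execution, which matches the all-or-nothing behavior of the application.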
For this application, you need the following prerequisites:
You can use AWS Cloud9 to deploy the application. AWS Cloud9 includes the AWS CLI and AWS SAM CLI to simplify setting up your development environment.
Deploy the application with AWS SAM CLI
You can deploy this application using the AWS SAM CLI. AWS SAM uses AWS CloudFormation as the underlying deployment mechanism. AWS SAM is an open-source framework that you can use to build serverless applications on AWS.
To deploy the application
Initialize the serverless application using the AWS SAM CLI from the GitHub project in the aws-samples repository. This will clone the project locally which includes the source code for the Lambda functions, Step Functions state machine definition file, and the AWS SAM template. On the command line, run the following:
sam init --location gh:aws-samples/amazonmacie-datapipeline-scan
Deploy your application to your AWS account. On the command line, run the following:
sam deploy --guided
Complete the prompts during the guided interactive deployment. The first deployment prompt is shown in the following example.
Configuring SAM deploy
Looking for config file [samconfig.toml] : Found
Reading default arguments : Success
Setting default arguments for 'sam deploy'
Stack Name [maciepipelinescan]:
Stack Name – Name of the CloudFormation stack to be created.
AWS Region – Region—for example, us-west-2, eu-west-1, ap-southeast-1—to deploy the application to. This application was tested in the us-west-2 and ap-southeast-1 Regions. Before selecting a Region, verify that the services you need are available in those Regions (for example, Macie and Step Functions).
Parameter StepFunctionName – Name of the Step Functions state machine to be created (for example, maciepipelinescanstatemachine).
Parameter BucketNamePrefix – Prefix to apply to the S3 buckets to be created (S3 bucket names are globally unique, so choosing a random prefix helps ensure uniqueness).
Parameter ApprovalEmailDestination – Email address to receive the manual review notification.
Parameter EnableMacie – Whether you need Macie enabled in your account or Region. Select yes if you need Macie to be enabled for you as part of this template; select no if you already have Macie enabled.
Confirm changes and provide approval for AWS SAM CLI to deploy the resources to your AWS account by responding y to prompts, as shown in the following example. You can accept the defaults for the SAM configuration file and SAM configuration environment prompts.
#Shows you resources changes to be deployed and require a 'Y' to initiate deploy
Confirm changes before deploy [y/N]: y
#SAM needs permission to be able to create roles to connect to the resources in your template
Allow SAM CLI IAM role creation [Y/n]: y
ReceiveApprovalDecisionAPI may not have authorization defined, Is this okay? [y/N]: y
ReceiveApprovalDecisionAPI may not have authorization defined, Is this okay? [y/N]: y
Save arguments to configuration file [Y/n]: y
SAM configuration file [samconfig.toml]:
SAM configuration environment [default]:
Note: This application deploys an Amazon API Gateway with two REST API resources without authorization defined to receive the decision from the manual review step. You will be prompted to accept each resource without authorization. A token (Step Functions taskToken) is used to authenticate the requests.
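A minimal sketch of how a handler behind such an endpoint might use the task token is shown below. It is not the application's actual receiveApprovalDecision code: the query parameter names (`taskToken`, `decision`) and the decision values are assumptions, and the Step Functions client is passed in so that the logic can be exercised without AWS access. In a deployed function the client would typically be `boto3.client("stepfunctions")`, whose `send_task_success` call takes a `taskToken` and a JSON `output` string.

```python
import json
from typing import Any, Dict


def handle_decision(event: Dict[str, Any], sfn_client: Any) -> Dict[str, Any]:
    """Resume a paused Step Functions task using the token carried in the link."""
    params = event.get("queryStringParameters") or {}
    token = params.get("taskToken")        # assumed parameter name
    decision = params.get("decision")      # assumed parameter name
    if not token or decision not in ("approved", "denied"):
        return {"statusCode": 400, "body": "missing or invalid parameters"}
    # Step Functions rejects tokens it did not issue, so the token acts
    # as a bearer credential for this one pending task.
    sfn_client.send_task_success(
        taskToken=token,
        output=json.dumps({"decision": decision}),
    )
    return {"statusCode": 200, "body": f"Recorded decision: {decision}"}
```

Because anyone holding the link can submit a decision, the approval email should be treated as sensitive and sent only to trusted reviewers.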
This creates an AWS CloudFormation changeset. Once the changeset creation is complete, you must provide a final confirmation of y to Deploy the changeset? [y/N] when prompted as shown in the following example.
Changeset created successfully. arn:aws:cloudformation:ap-southeast-1:XXXXXXXXXXXX:changeSet/samcli-deploy1605213119/db681961-3635-4305-b1c7-dcc754c7XXXX
Previewing CloudFormation changeset before deployment
Deploy this changeset? [y/N]:
Your application is deployed to your account using AWS CloudFormation. You can track the deployment events in the command prompt or via the AWS CloudFormation console.
After the application deployment is complete, you must confirm the subscription to the Amazon SNS topic. An email will be sent to the email address entered in Step 3 with a link that you need to select to confirm the subscription. This confirmation provides opt-in consent for AWS to send emails to you via the specified Amazon SNS topic. The emails will be notifications of potentially sensitive data that need to be approved. If you don’t see the verification email, be sure to check your spam folder.
Test the application
The application uses an EventBridge scheduled rule to start the sensitive data scan workflow, which runs every 6 hours. You can manually start an execution of the workflow to verify that it’s working. To test the function, you will need a file that contains data that matches your rules for sensitive data. For example, it is easy to create a spreadsheet, document, or text file that contains names, addresses, and numbers formatted like credit card numbers. You can also use this generated sample data to test Macie.
We will test by uploading a file to our S3 bucket via the AWS web console. If you know how to copy objects from the command line, that also works.
Upload test objects to the S3 bucket
Navigate to the Amazon S3 console and upload one or more test objects to the <BucketNamePrefix>-data-pipeline-raw bucket. <BucketNamePrefix> is the prefix you entered when deploying the application in the AWS SAM CLI prompts. You can use any objects as long as they’re a supported file type for Amazon Macie. I suggest uploading multiple objects, some with and some without sensitive data, in order to see how the workflow processes each.
Start the Scan State Machine
Navigate to the Step Functions state machines console. If you don’t see your state machine, make sure you’re connected to the same Region that you deployed your application to.
Choose the state machine you created using the AWS SAM CLI as seen in Figure 3. The example state machine is maciepipelinescanstatemachine, but you might have used a different name in your deployment.
Figure 3: AWS Step Functions state machines console
Select the Start execution button and copy the value from the Enter an execution name – optional box. Change the Input – optional value replacing <execution id> with the value just copied as follows:
{ "id": "<execution id>" }
In my example, the <execution id> is fa985a4f-866b-b58b-d91b-8a47d068aa0c from the Enter an execution name – optional box as shown in Figure 4. You can choose a different ID value if you prefer. This ID is used by the workflow to tag the objects being processed to ensure that only objects that are scanned continue through the pipeline. When the EventBridge scheduled event starts the workflow as scheduled, an ID is included in the input to the Step Functions workflow. Then select Start execution again.
Figure 4: New execution dialog box
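If you prefer to start the execution programmatically, the request can be sketched as follows. The helper name is hypothetical; the returned keys match the parameters of the boto3 Step Functions `start_execution` call (`stateMachineArn`, `name`, `input`), and the input mirrors the `id` field used in the console steps.

```python
import json
import uuid
from typing import Dict, Optional


def build_execution_request(
    state_machine_arn: str, execution_id: Optional[str] = None
) -> Dict[str, str]:
    """Build keyword arguments for a Step Functions start_execution call."""
    # Generate an ID when none is supplied, as the console does for the name.
    execution_id = execution_id or str(uuid.uuid4())
    return {
        "stateMachineArn": state_machine_arn,
        "name": execution_id,
        # The workflow reads this ID to tag the objects it processes.
        "input": json.dumps({"id": execution_id}),
    }
```

With boto3 installed and credentials configured, the result could be passed as `boto3.client("stepfunctions").start_execution(**build_execution_request(arn))`, where `arn` is the ARN of your deployed state machine.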
You can see the status of your workflow execution in the Graph inspector as shown in Figure 5. In the figure, the workflow is at the pollForCompletionWait step.
Figure 5: AWS Step Functions graph inspector
The sensitive data discovery job should run for about five to ten minutes. Job duration scales linearly with object size, but there is a constant start-up time per job. If sensitive data is found in the objects uploaded to the <BucketNamePrefix>-data-pipeline-raw S3 bucket, an email is sent to the address provided during the AWS SAM deployment step, notifying the recipient that an approval decision is needed. The recipient indicates the decision by selecting the corresponding link to approve or deny the next step, as shown in Figure 6.
Figure 6: Sensitive data identified email
When you receive this notification, you can investigate the findings by reviewing the objects in the <BucketNamePrefix>-data-pipeline-manual-review S3 bucket. Based on your review, you can either apply remediation steps to remove any sensitive data or allow the data to proceed to the next step of the data ingestion pipeline. You should define a standard response process to address discovery of sensitive data in the data pipeline. Common remediation steps include review of the files for sensitive data, deleting the files that you do not want to progress, and updating the ETL process to redact or tokenize sensitive data when re-ingesting into the pipeline. When you re-ingest the files into the pipeline without sensitive data, the files will not be flagged by Macie.
The workflow performs the following:
If you select Approve, the files are moved to the <BucketNamePrefix>-data-pipeline-scanned-data S3 bucket with an Amazon S3 SensitiveDataFound object tag with a value of true.
If you select Deny, the files are deleted from the <BucketNamePrefix>-data-pipeline-manual-review S3 bucket.
If no action is taken, the Step Functions workflow execution times out after five days and the file will automatically be deleted from the <BucketNamePrefix>-data-pipeline-manual-review S3 bucket after 10 days.
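The tagging performed on approval can be sketched as below. This is an illustration rather than the application's own code: the function name is hypothetical, and the S3 client is passed in so the logic can be tested without AWS access. The calls used, `get_object_tagging` and `put_object_tagging`, exist on the boto3 S3 client; because `put_object_tagging` replaces the whole tag set, the sketch merges the new tag with any existing ones.

```python
from typing import Any, Dict, List

TAG_KEY = "SensitiveDataFound"


def add_sensitive_data_tag(s3_client: Any, bucket: str, key: str) -> List[Dict[str, str]]:
    """Add SensitiveDataFound=true to an object while keeping its other tags."""
    existing = s3_client.get_object_tagging(Bucket=bucket, Key=key).get("TagSet", [])
    # Drop any earlier copy of the tag so it is not duplicated.
    merged = [tag for tag in existing if tag["Key"] != TAG_KEY]
    merged.append({"Key": TAG_KEY, "Value": "true"})
    s3_client.put_object_tagging(Bucket=bucket, Key=key, Tagging={"TagSet": merged})
    return merged
```

Downstream stages can then read the tag to decide whether an object needs additional handling.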
Clean up the application
You’ve successfully deployed and tested the sensitive data pipeline scan workflow. To avoid ongoing charges for resources you created, you should delete all associated resources by deleting the CloudFormation stack. In order to delete the CloudFormation stack, you must first delete all objects that are stored in the S3 buckets that you created for the application.
To delete the application
Empty the S3 buckets created in this application (<BucketNamePrefix>-data-pipeline-raw, <BucketNamePrefix>-data-pipeline-scan-stage, <BucketNamePrefix>-data-pipeline-manual-review, and <BucketNamePrefix>-data-pipeline-scanned-data).
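Emptying the buckets can also be scripted. The sketch below is a minimal version that assumes the buckets are not versioned (versioned buckets would also need their object versions deleted); the helper names are hypothetical, and the S3 client is injected so the loop can be tested offline. It relies on the boto3 S3 client calls `list_objects_v2` and `delete_objects`, each of which handles up to 1,000 keys per request.

```python
from typing import Any, Dict, List

BUCKET_SUFFIXES = ["raw", "scan-stage", "manual-review", "scanned-data"]


def pipeline_bucket_names(prefix: str) -> List[str]:
    """Return the four bucket names the application creates for a prefix."""
    return [f"{prefix}-data-pipeline-{suffix}" for suffix in BUCKET_SUFFIXES]


def empty_bucket(s3_client: Any, bucket: str) -> int:
    """Delete every object in an unversioned bucket; return the number deleted."""
    deleted = 0
    kwargs: Dict[str, Any] = {"Bucket": bucket}
    while True:
        page = s3_client.list_objects_v2(**kwargs)
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            s3_client.delete_objects(Bucket=bucket, Delete={"Objects": keys})
            deleted += len(keys)
        if not page.get("IsTruncated"):
            return deleted
        # More objects remain: continue from where this page ended.
        kwargs["ContinuationToken"] = page["NextContinuationToken"]
```

Deletion is irreversible, so confirm the bucket names before running anything like this against a real account.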
Before using this application in a production data pipeline, you will need to stop and consider some practical matters. First, the notification mechanism used when sensitive data is identified in the objects is email. Email doesn’t scale: you should expand this solution to integrate with your ticketing or workflow management system. If you choose to use email, subscribe a mailing list so that the work of reviewing and responding to alerts is shared across a team.
Second, the application runs on a schedule (every 6 hours by default). You should consider starting the application when your preliminary validations have completed and the data is ready for a sensitive data scan as part of your pipeline. You can modify the EventBridge rule to run in response to an Amazon EventBridge event instead of on a schedule.
Third, the application currently uses a 60-second Step Functions Wait state when polling for the Macie discovery job completion. In real-world scenarios, the discovery scan will take 10 minutes at a minimum, likely several orders of magnitude longer. You should evaluate the typical execution times for your application and tune the polling period accordingly. This will help reduce costs related to running Lambda functions and log storage within CloudWatch Logs. The polling period is defined in the Step Functions state machine definition file (macie_pipeline_scan.asl.json) under the pollForCompletionWait state.
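Adjusting the polling period amounts to changing one field in the state machine definition. The sketch below assumes that pollForCompletionWait is a top-level Wait state that uses a fixed `Seconds` value; if the definition uses `SecondsPath` or nests the state inside another state, it would need adapting. The function name is hypothetical.

```python
import copy
from typing import Any, Dict


def set_poll_interval(
    definition: Dict[str, Any], seconds: int, state_name: str = "pollForCompletionWait"
) -> Dict[str, Any]:
    """Return a copy of an Amazon States Language definition with a new wait time."""
    if not isinstance(seconds, int) or seconds <= 0:
        raise ValueError("seconds must be a positive integer")
    state = definition["States"][state_name]  # raises KeyError if the state is absent
    if state.get("Type") != "Wait":
        raise ValueError(f"{state_name} is not a Wait state")
    updated = copy.deepcopy(definition)  # leave the caller's definition untouched
    updated["States"][state_name]["Seconds"] = seconds
    return updated
```

You could load macie_pipeline_scan.asl.json with `json.load`, apply the function, write the result back with `json.dump`, and redeploy with the AWS SAM CLI.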
Fourth, the application currently doesn’t account for false positives in the sensitive data discovery job results. Also, the application will progress or delete all objects identified based on the decision by the reviewer. You should consider expanding the application to handle false positives through automation rather than manual review / intervention (such as deleting the files from the manual review bucket or removing the sensitive data tags applied).
Last, the solution will stop the ingestion of a subset of objects into your pipeline. This behavior is similar to other validation and data quality checks that most customers perform as part of the data pipeline. However, you should test to ensure that this will not cause unexpected outcomes and address them in your downstream application logic accordingly.
In this post, I showed you how to integrate sensitive data discovery using Macie as an additional validation step in an automated data pipeline. You’ve reviewed the components of the application, deployed it using the AWS SAM CLI, tested to validate that the application functions as expected, and cleaned up by removing deployed resources.
You now know how to integrate sensitive data scanning into your ETL pipeline. You can use automation and, where required, manual review to help reduce the risk of sensitive data, such as personally identifiable information, being inadvertently ingested into a data lake. You can take this application and customize it to fit your use case and workflows, such as using custom data identifiers as part of your scans, adding additional validation steps, creating Macie suppression rules to define cases to archive findings automatically, or requesting manual approvals only for findings that meet certain criteria (such as high severity findings).
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the Amazon Macie forum.
AWS re:Invent will certainly be different in 2020! Instead of seeing you all in Las Vegas, this year re:Invent will be a free, three-week virtual conference. One thing that will remain the same is the variety of sessions, including many Security, Identity, and Compliance sessions. As we developed sessions, we looked to customers—asking where they would like to expand their knowledge. One way we did this was shared in a recent Security blog post, where we introduced a new customer polling feature that provides us with feedback directly from customers. The initial results of the poll showed that Identity and Access Management and Data Protection are top-ranking topics for customers. We wanted to highlight some of the re:Invent sessions for these two important topics so that you can start building your re:Invent schedule. Each session is offered at multiple times, so you can sign up for the time that works best for your location and schedule.
Managing your Identities and Access in AWS
AWS identity: Secure account and application access with AWS SSO Ron Cully, Principal Product Manager, AWS
AWS SSO provides an easy way to centrally manage access at scale across all your AWS Organizations accounts, using identities you create and manage in AWS SSO, Microsoft Active Directory, or external identity providers (such as Okta Universal Directory or Azure AD). This session explains how you can use AWS SSO to manage your AWS environment, and it covers key new features to help you secure and automate account access authorization.
Getting started with AWS identity services
Becky Weiss, Senior Principal Engineer, AWS
The number, range, and breadth of AWS services are large, but the set of techniques that you need to secure them is not. Your journey as a builder in the cloud starts with this session, in which practical examples help you quickly get up to speed on the fundamentals of becoming authenticated and authorized in the cloud, as well as on securing your resources and data correctly.
AWS identity: Ten identity health checks to improve security in the cloud
Cassia Martin, Senior Security Solutions Architect, AWS
Get practical advice and code to help you achieve the principle of least privilege in your existing AWS environment. From enabling logs to disabling root, the provided checklist helps you find and fix permissions issues in your resources, your accounts, and throughout your organization. With these ten health checks, you can improve your AWS identity and achieve better security every day.
AWS identity: Choosing the right mix of AWS IAM policies for scale
Josh Du Lac, Principal Security Solutions Architect, AWS
This session provides both a strategic and tactical overview of various AWS Identity and Access Management (IAM) policies that provide a range of capabilities for the security of your AWS accounts. You probably already use a number of these policies today, but this session will dive into the tactical reasons for choosing one capability over another. This session zooms out to help you understand how to manage these IAM policies across a multi-account environment, covering their purpose, deployment, validation, limitations, monitoring, and more.
Zero Trust: An AWS perspective
Quint Van Deman, Principal WW Identity Specialist, AWS
AWS customers have continuously asked, “What are the optimal patterns for ensuring the right levels of security and availability for my systems and data?” Increasingly, they are asking how patterns that fall under the banner of Zero Trust might apply to this question. In this session, you learn about the AWS guiding principles for Zero Trust and explore the larger subdomains that have emerged within this space. Then the session dives deep into how AWS has incorporated some of these concepts, and how AWS can help you on your own Zero Trust journey.
This session is for central security teams and developers who manage application permissions. This session reviews a permissions model that enables you to scale your permissions management with confidence. Learn how to set your organization up for access management success with permission guardrails. Then, learn about granting workforce permissions based on attributes, so they scale as your users and teams adjust. Finally, learn about the access analysis tools and how to use them to identify and reduce broad permissions and give users and systems access to only what they need.
Goldman Sachs takes security and access to AWS accounts seriously. While empowering teams with the freedom to build applications autonomously is critical for scaling cloud usage across the firm, guardrails and controls need to be set in place to enable secure administrative access. In this session, learn how the company built its credential brokering workflow and administrator access for its users. Learn how, with its simple application that uses proprietary and AWS services, including Amazon DynamoDB, AWS Lambda, AWS CloudTrail, Amazon S3, and Amazon Athena, Goldman Sachs is able to control administrator credentials and monitor and report on actions taken for audits and compliance.
Do you need an AWS KMS custom key store?
Tracy Pierce, Senior Consultant, AWS
AWS Key Management Service (AWS KMS) has integrated with AWS CloudHSM, giving you the option to create your own AWS KMS custom key store. In this session, you learn more about how a KMS custom key store is backed by an AWS CloudHSM cluster and how it enables you to generate, store, and use your KMS keys in the hardware security modules that you control. You also learn when and if you really need a custom key store. Join this session to learn why you might choose not to use a custom key store and instead use the AWS KMS default.
Using certificate-based authentication on containers & web servers on AWS
Josh Rosenthol, Senior Product Manager, AWS
Kevin Rioles, Manager, Infrastructure & Security, BlackSky
In this session, BlackSky talks about its experience using AWS Certificate Manager (ACM) end-entity certificates for the processing and distribution of real-time satellite geospatial intelligence and monitoring. Learn how BlackSky uses certificate-based authentication on containers and web servers within its AWS environment to help make TLS ubiquitous in its deployments. The session details the implementation, architecture, and operations best practices that the company chose and how it was able to operate ACM at scale across multiple accounts and regions.
The busy manager’s guide to encryption
Spencer Janyk, Senior Product Manager, AWS
In this session, explore the functionality of AWS cryptography services and learn when and where to deploy each of the following: AWS Key Management Service, AWS Encryption SDK, AWS Certificate Manager, AWS CloudHSM, and AWS Secrets Manager. You also learn about defense-in-depth strategies including asymmetric permissions models, client-side encryption, and permission segmentation by role.
Building post-quantum cryptography for the cloud
Alex Weibel, Senior Software Development Engineer, AWS
This session introduces post-quantum cryptography and how you can use it today to secure TLS communication. Learn about recent updates on standards and existing deployments, including the AWS post-quantum TLS implementation (pq-s2n). A description of the hybrid key agreement method shows how you can combine a new post-quantum key encapsulation method with a classical key exchange to secure network traffic today.
Data protection at scale using Amazon Macie
Neel Sendas, Senior Technical Account Manager, AWS
Data Loss Prevention (DLP) is a common topic among companies that work with sensitive data. If an organization can’t identify its sensitive data, it can’t protect it. Amazon Macie is a fully managed data security and data privacy service that uses machine learning and pattern matching to discover and protect your sensitive data in AWS. In this session, we will share details of the design and architecture you can use to deploy Macie at large scale.
While sessions are virtual this year, they will be offered at multiple times with live moderators and “Ask the Expert” sessions available to help answer any questions that you may have. We look forward to “seeing” you in these sessions. Please see the re:Invent agenda for more details and to build your schedule.
In case you aren’t familiar with the security chapters, they were developed to provide easy-to-find, easy-to-consume security content in existing service documentation, so you don’t have to refer to multiple sources when reviewing the security capabilities of an AWS service. The chapters align with the Security Epics of the AWS Cloud Adoption Framework (CAF), including information about the security ‘of’ the cloud and security ‘in’ the cloud, as outlined in the AWS Shared Responsibility Model. The chapters cover the following security topics from the CAF, as applicable for each AWS service:
Identity and access management
Logging and monitoring
Configuration and vulnerability analysis
Security best practices
These topics also align with the control domains of many industry-recognized standards that customers use to meet their compliance needs when using cloud services. This enables customers to evaluate the services against the frameworks they are already using.
We thought it might be helpful to share some of the ways that we’ve seen our customers and partners use the security chapters as a resource to both assess services and configure them securely. We’ve seen customers develop formal service-by-service assessment processes that include key considerations, such as achieving compliance, data protection, isolation of compute environments, automating audits with APIs, and operational access and security, when determining how cloud services can help them address their regulatory obligations.
To support their cloud journey and digital transformation, Fidelity Investments established a Cloud Center of Excellence (CCOE) to assist and enable Fidelity business units to safely and securely adopt cloud services at scale. The CCOE security team created a collaborative approach, inviting business units to partner with them to identify use cases and perform service testing in a safe environment. This ongoing process enables Fidelity business units to gain service proficiency while working directly with the security team so that risks are properly assessed, minimized, and evidenced well before use in a production environment.
Steve MacIntyre, Cloud Security Lead at Fidelity Investments, explains how the availability of the chapters assists them in this process: “As a diversified financial services organization, it is critical to have a deep understanding of the security, data protection, and compliance features for each AWS offering. The AWS security “chapters” allow us to make informed decisions about the safety of our data and the proper configuration of services within the AWS environment.”
Customers have also used information in the security chapters as a key input for refining their cloud governance and for balancing agility and innovation with security as they adopt new services. Because the chapters outline customer responsibilities under the AWS Shared Responsibility Model, they have influenced how a number of AWS customers refine their service assessment processes, enabling customization to meet specific control objectives based on known use cases.
For example, when AWS Partner Network (APN) Partner Deloitte works on cloud strategies with organizations, they advise on topics that range from enterprise-wide cloud adoption to controls needed for specific AWS services.
Devendra Awasthi, Cloud Risk & Compliance Leader at Deloitte & Touche LLP, explained that, “When working with companies to help develop a secure cloud adoption framework, we don’t want them to make assumptions about shared responsibility that lead to a false sense of security. We advise clients to use the AWS service security chapters to identify their responsibilities under the AWS Shared Responsibility Model; the chapters can be key to informing their decision-making process for specific service use.”
Partners and customers, including Deloitte and Fidelity, have been helpful by providing feedback on both the content and structure of the security chapters. Service teams will continue to update the security chapters as new features are released, and in the meantime, we would appreciate your input to help us continue to expand the content. You can give us your feedback by selecting the Feedback button in the lower right corner of any documentation page. We look forward to learning how you use the security chapters within your organization.
AWS developed the Well-Architected Framework to help customers build secure, high-performing, resilient, and efficient infrastructure for their applications. The Well-Architected Framework is based on five pillars — operational excellence, security, reliability, performance efficiency, and cost optimization. We periodically update the framework as we get feedback from customers, partners, and other teams within AWS. Along with the AWS Well-Architected whitepapers, there is an AWS Well-Architected Tool to help you review your architecture for alignment to these best practices.
Changes to identity and access management
The biggest changes are related to identity and access management, and how you operate your workload. Both the security pillar whitepaper, and the review in the tool, now start with the topic of how to securely operate your workload. Instead of just securing your root user, we want you to think about all aspects of your AWS accounts. We advise you to configure the contact information for your account, so that AWS can reach out to the right contact if needed. And we recommend that you use AWS Organizations with Service Control Policies to manage the guardrails on your accounts.
A new best practice is to identify and validate control objectives. This new best practice is about having objectives built from your own threat model, not just a list of controls to help you measure the effectiveness of your risk mitigation. Adding to that is the best practice to automate testing and validation of security controls in pipelines.
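To make the idea of validating security controls in a pipeline concrete, here is a minimal sketch of one such check. It assumes a hypothetical control objective ("no policy statement allows every action on every resource") and works on a policy document that has already been loaded as a dictionary; the function names and the sample policy are illustrative only.

```python
def overly_broad_statements(policy):
    """Return Allow statements that grant every action on every resource."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]

    def as_list(value):
        return value if isinstance(value, list) else [value]

    return [
        s for s in statements
        if s.get("Effect") == "Allow"
        and "*" in as_list(s.get("Action", []))
        and "*" in as_list(s.get("Resource", []))
    ]


def test_policy_has_no_admin_wildcards():
    # Sample policy (made up); a pipeline would load the real document instead.
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "s3:GetObject",
             "Resource": "arn:aws:s3:::example-bucket/*"},
        ],
    }
    assert overly_broad_statements(policy) == []
```

A test runner in the pipeline could run checks like this on every change, so that a policy that violates the objective fails the build before it is deployed.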
Identity and access management is no longer split between credentials and authentication, human and programmatic access. There are now just two questions which have a simpler approach, and which focus on identity and permissions. These questions don’t draw a distinction between humans and machines. You should think about how identity and permissions are applied to your AWS environment, to the systems supporting your workload, and to the workload itself. New best practices include the following:
Define permission guardrails for your organization – Establish common controls that restrict access to all identities in your organization, such as restricting which AWS Regions can be used.
Reduce permissions continuously – As teams and workloads determine what access they need, you should remove the permissions they no longer use, and you should establish review processes to achieve least privilege permissions.
Establish an emergency access process – Have a process that allows emergency access to your workload, so that you can react appropriately in the unlikely event of an issue with an automated process or the pipeline.
Analyze public and cross-account access – Continuously monitor findings that highlight public and cross-account access.
Share resources securely – Govern the consumption of shared resources across accounts, or within your AWS Organization.
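As an illustration of the first practice in this list, a permission guardrail that restricts which AWS Regions can be used might look like the following service control policy, built here as a Python dictionary. Treat it as a sketch: the allowed Regions are example values, and the list of exempted global services is an assumption that is deliberately incomplete and would need to be tailored to your organization.

```python
import json

# Assumption: example Regions; replace with the Regions your organization uses.
ALLOWED_REGIONS = ["eu-west-1", "eu-central-1"]

# Global services are exempted because their requests are not tied to one
# of the allowed Regions. This list is illustrative, not exhaustive.
GLOBAL_SERVICE_ACTIONS = ["iam:*", "organizations:*", "route53:*",
                          "cloudfront:*", "support:*"]

region_guardrail = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyOutsideAllowedRegions",
        "Effect": "Deny",
        "NotAction": GLOBAL_SERVICE_ACTIONS,
        "Resource": "*",
        # Deny any request whose target Region is not in the allowed list.
        "Condition": {"StringNotEquals": {"aws:RequestedRegion": ALLOWED_REGIONS}},
    }],
}

policy_json = json.dumps(region_guardrail, indent=2)
```

The resulting JSON could be attached as a service control policy in AWS Organizations. Because it is a Deny statement, it limits every identity in the affected accounts regardless of the permissions they have been granted.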
Changes to detective controls
The section that was previously called detective controls is now simply called detection. The main update in this section is a new best practice called “automate response to events,” which covers automating your investigation processes, alerts, and remediation. For example, you can use Amazon GuardDuty managed threat detection findings, or the new AWS Security Hub foundational best practices, to trigger a notification to your team with Amazon CloudWatch Events, and you can use an AWS Step Functions workflow to coordinate or automatically remediate what was found.
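The notification trigger described above starts with an event pattern that selects GuardDuty findings. The sketch below shows such a pattern along with a simplified local matcher for trying it out; the matcher only handles exact matches on top-level fields and is not a reimplementation of the service's matching rules, and the sample event is made up for illustration.

```python
import json

# Event pattern for a CloudWatch Events (Amazon EventBridge) rule that
# selects GuardDuty findings.
GUARDDUTY_PATTERN = {
    "source": ["aws.guardduty"],
    "detail-type": ["GuardDuty Finding"],
}


def matches(pattern, event):
    """Simplified matcher: each top-level pattern key must list the event's value."""
    return all(event.get(key) in allowed for key, allowed in pattern.items())


# Sample event (illustrative values only).
sample_event = {
    "source": "aws.guardduty",
    "detail-type": "GuardDuty Finding",
    "detail": {"severity": 8, "type": "Recon:EC2/PortProbeUnprotectedPort"},
}

# The rule itself expects the pattern as a JSON string.
rule_pattern_json = json.dumps(GUARDDUTY_PATTERN)
is_match = matches(GUARDDUTY_PATTERN, sample_event)
```

A rule with this pattern could target a notification topic for your team, a remediation workflow, or both.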
Changes to infrastructure protection
From a networking and compute perspective, there are only a few minor refinements. A new best practice in this section is to enable people to perform actions at a distance. This aligns to the design principle: keep people away from data. For example, you can use AWS Systems Manager to remotely run commands on an EC2 instance without needing to open ports outside your network or interactively connect to instances, which reduces the risk of human error.
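As a sketch of what performing actions at a distance can look like, the helper below assembles the parameters for a Systems Manager Run Command request. It only builds the request; the helper name, instance ID, and command are hypothetical, and sending the request (shown in a comment) assumes that boto3 is installed and credentials are configured.

```python
def build_run_command_request(instance_ids, commands):
    """Assemble keyword arguments for the Systems Manager SendCommand API."""
    if not instance_ids:
        raise ValueError("at least one instance ID is required")
    return {
        "InstanceIds": list(instance_ids),
        # AWS-managed document that runs shell commands on Linux instances.
        "DocumentName": "AWS-RunShellScript",
        "Parameters": {"commands": list(commands)},
    }


# Hypothetical instance ID and command, for illustration only.
request = build_run_command_request(["i-0123456789abcdef0"], ["uptime"])

# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("ssm").send_command(**request)
```

Because the command is delivered through the Systems Manager agent, nobody needs an interactive session on the instance, and no inbound port has to be opened.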
Changes to data protection
One update in the best practice for data at rest is that we changed “provide mechanisms to keep people away from data” to “use mechanisms to keep people away from data.” Simply providing mechanisms is good, but it’s better if they are actually used.
The final section of the security pillar is incident response. The new best practice in this section is “automate containment and recovery capability.” Your incident response should already include plans, pre-deployed tools, and access, so your next step should be automation. The incident response section of the whitepaper has been rewritten and takes on many parts of the very helpful AWS Security Incident Response Guide. You should also check out the incident response hands-on lab, “Incident Response Playbook with Jupyter – AWS IAM.” This lab uses the power of Jupyter notebooks to perform interactive queries against AWS APIs for semi-automated playbooks. As always, a best practice is to plan and practice your incident response so you can continue to learn and improve as you operate your workloads securely in AWS.
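One containment step that lends itself to automation is invalidating temporary credentials that were issued before an incident was detected. The sketch below builds a deny policy using the `aws:TokenIssueTime` condition key, which is the approach the IAM console takes when you revoke a role's active sessions. The function name and the example timestamp are illustrative assumptions, and attaching the policy to a role is left out.

```python
from datetime import datetime, timezone


def revoke_sessions_policy(issued_before):
    """Build a policy denying requests made with credentials issued before the cutoff."""
    cutoff = issued_before.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            # Only sessions whose credentials were issued before the cutoff are denied.
            "Condition": {"DateLessThan": {"aws:TokenIssueTime": cutoff}},
        }],
    }


# Example cutoff (made up): the moment the incident was detected.
policy = revoke_sessions_policy(datetime(2020, 7, 1, 12, 0, tzinfo=timezone.utc))
```

Sessions started after the cutoff are unaffected, which lets legitimate workloads recover by obtaining fresh credentials.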
As a Principal Solutions Architect within the Worldwide Financial Services industry group, one of the most frequently asked questions I receive is whether a particular AWS service is financial-services-ready. In a regulated industry like financial services, moving to the cloud isn’t a simple lift-and-shift exercise. Instead, financial institutions use a formal service-by-service assessment process, often called whitelisting, to demonstrate how cloud services can help address their regulatory obligations. When this process is not well defined, it can delay efforts to migrate data to the cloud.
In this post, I will provide a framework consisting of five key considerations that financial institutions should focus on to help streamline the whitelisting of cloud services for their most confidential data. I will also outline the key AWS capabilities that can help financial services organizations during this process.
Here are the five key considerations:
Achieving compliance
Data protection
Isolation of compute environments
Automating audits with APIs
Operational access and security
For many of the business and technology leaders that I work with, agility and the ability to innovate quickly are the top drivers for their cloud programs. Financial services institutions migrate to the cloud to help develop personalized digital experiences, break down data silos, develop new products, drive down margins for existing products, and proactively address global risk and compliance requirements. AWS customers who use a wide range of AWS services achieve greater agility as they move through the stages of cloud adoption. Using a wide range of services enables organizations to offload undifferentiated heavy lifting to AWS and focus on their core business and customers.
My goal is to guide financial services institutions as they move their company’s highly confidential data to the cloud, in both production environments and mission-critical workloads. The following considerations will help financial services organizations determine cloud service readiness and achieve success in the cloud.
1. Achieving compliance
For financial institutions that use a whitelisting process, the first step is to establish that the underlying components of the cloud service provider’s (CSP’s) services can meet baseline compliance needs. A key prerequisite to gaining this confidence is to understand the AWS shared responsibility model. Shared responsibility means that the secure functioning of an application on AWS requires action on the part of both the customer and AWS as the CSP. AWS customers are responsible for their security in the cloud. They control and manage the security of their content, applications, systems, and networks. AWS manages security of the cloud, providing and maintaining proper operations of services and features, protecting AWS infrastructure and services, maintaining operational excellence, and meeting relevant legal and regulatory requirements.
In order to establish confidence in the AWS side of the shared responsibility model, customers can regularly review the AWS System and Organization Controls 2 (SOC 2) Type II report prepared by an independent, third-party auditor. The AWS SOC 2 report contains confidential information that can be obtained by customers under an AWS non-disclosure agreement (NDA) through AWS Artifact, a self-service portal for on-demand access to AWS compliance reports. Sign in to AWS Artifact in the AWS Management Console, or learn more at Getting Started with AWS Artifact.
Key takeaway: Currently, 116 AWS services are in scope for SOC compliance, which will help organizations streamline their whitelisting process. For more information about which services are in scope, see AWS Services in Scope by Compliance Program.
2. Data protection
Financial institutions use comprehensive data loss prevention strategies to protect confidential information. Customers using AWS data services can employ encryption to mitigate the risk of disclosure, alteration of sensitive information, or unauthorized access. The AWS Key Management Service (AWS KMS) allows customers to manage the lifecycle of encryption keys and control how they are used by their applications and AWS services. Allowing encryption keys to be generated and maintained in the FIPS 140-2 validated hardware security modules (HSMs) in AWS KMS is the best practice and most cost-effective option.
For AWS customers who want added flexibility for key generation and storage, AWS KMS allows them to either import their own key material into AWS KMS and keep a copy in their on-premises HSM, or generate and store keys in dedicated AWS CloudHSM instances under their control. For each of these key material generation and storage options, AWS customers can control all the permissions to use keys from any of their applications or AWS services. In addition, every use of a key or modification to its policy is logged to AWS CloudTrail for auditing purposes. This level of control and audit over key management is one of the tools organizations can use to address regulatory requirements for using encryption as a data privacy mechanism.
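Because every use of a key is logged to CloudTrail, an audit of a single key can start as a simple filter over exported log records. The following sketch assumes the records have already been parsed into dictionaries with the standard CloudTrail fields (`eventSource`, `eventName`, `eventTime`, `userIdentity`, `resources`); the function name, key ARN, and principal in the sample are hypothetical.

```python
def kms_key_activity(records, key_arn):
    """Return (time, principal, action) for CloudTrail records that touch `key_arn`."""
    activity = []
    for record in records:
        if record.get("eventSource") != "kms.amazonaws.com":
            continue  # not a KMS API call
        arns = [r.get("ARN") for r in record.get("resources", [])]
        if key_arn in arns:
            activity.append((
                record.get("eventTime"),
                record.get("userIdentity", {}).get("arn"),
                record.get("eventName"),
            ))
    return activity


# Hypothetical key ARN and records, for illustration only.
KEY = "arn:aws:kms:us-east-1:111122223333:key/example-key-id"
records = [
    {"eventSource": "kms.amazonaws.com", "eventName": "Decrypt",
     "eventTime": "2020-07-01T12:00:00Z",
     "userIdentity": {"arn": "arn:aws:iam::111122223333:role/app-role"},
     "resources": [{"ARN": KEY, "type": "AWS::KMS::Key"}]},
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject",
     "eventTime": "2020-07-01T12:01:00Z", "userIdentity": {}, "resources": []},
]
audit_trail = kms_key_activity(records, KEY)
```

The same filter could be narrowed further, for example to policy changes only, by also checking the event name.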
All AWS services offer encryption features, and most AWS services that financial institutions use integrate with AWS KMS to give organizations control over their encryption keys used to protect their data in the service. AWS offers customer-controlled key management features in twice as many services as any other CSP.
Financial institutions also encrypt data in transit to ensure that it is accessed only by the intended recipient. Encryption in transit must be considered in several areas, including API calls to AWS service endpoints, encryption of data in transit between AWS service components, and encryption in transit within applications. The first two considerations fall within the AWS scope of the shared responsibility model, whereas the last is the responsibility of the customer.
All AWS services offer Transport Layer Security (TLS) 1.2 encrypted endpoints that can be used for all API calls. Some AWS services also offer FIPS 140-2 endpoints in selected AWS Regions. These FIPS 140-2 endpoints use a cryptographic library that has been validated under the Federal Information Processing Standards (FIPS) 140-2 standard. For financial institutions that operate workloads on behalf of the US government, using FIPS 140-2 endpoints helps them to meet their compliance requirements.
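On the client side, an application can refuse to negotiate anything older than TLS 1.2 when it calls an endpoint. The sketch below uses Python's standard `ssl` module to build such a context; the function name is an assumption for illustration, and AWS SDKs manage their own TLS settings, so this applies to HTTPS clients that you configure yourself.

```python
import ssl


def tls12_context():
    """Create a client-side TLS context that rejects protocol versions below TLS 1.2."""
    context = ssl.create_default_context()  # verifies certificates and hostnames
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = tls12_context()
```

The context can be passed to standard-library clients, for example `http.client.HTTPSConnection(host, context=context)`.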
To simplify configuring encryption in transit within an application, which falls under the customer’s responsibility, customers can use the AWS Certificate Manager (ACM) service. ACM enables easy provisioning, management, and deployment of X.509 certificates used for TLS to critical application endpoints hosted in AWS. ACM integrates with Amazon CloudFront, Elastic Load Balancing, Amazon API Gateway, AWS CloudFormation, and AWS Elastic Beanstalk to provide automatic certificate and private key deployment and automated rotation. ACM offers both publicly-trusted and private certificate options to meet the trust model requirements of an application. Organizations may also import their existing public or private certificates to ACM to make use of existing public key infrastructure (PKI) investments.
Key takeaway: AWS KMS allows organizations to manage the lifecycle of encryption keys and control how encryption keys are used for over 50 services. For more information, see AWS Services Integrated with AWS KMS. ACM simplifies the deployment and management of PKI as compared to self-managing in an on-premises environment.
3. Isolation of compute environments
Financial institutions have strict requirements for isolation of compute resources and network traffic control for workloads with highly confidential data. One of the core competencies of AWS as a CSP is to protect and isolate customers’ workloads from each other. Amazon Virtual Private Cloud (Amazon VPC) allows customers to control their AWS environment and keep it separate from other customers’ environments. Amazon VPC enables customers to create a logically separate network enclave within the Amazon Elastic Compute Cloud (Amazon EC2) network to house compute and storage resources. Customers control the private environment, including IP addresses, subnets, network access control lists, security groups, operating system firewalls, route tables, virtual private networks (VPNs), and internet gateways.
Amazon VPC provides robust logical isolation of customers’ resources. For example, every packet flow on the network is individually authorized to validate the correct source and destination before it is transmitted and delivered. It is not possible for information to pass between multiple tenants without specifically being authorized by both the transmitting and receiving customers. If a packet is being routed to a destination without a rule that matches it, the packet is dropped. AWS has also developed the AWS Nitro System, a purpose-built hypervisor with associated custom hardware components that allocates central processing unit (CPU) resources for each instance and is designed to protect the security of customers’ data, even from operators of production infrastructure.
For more information about the isolation model for multi-tenant compute services, such as AWS Lambda, see the Security Overview of AWS Lambda whitepaper. When Lambda executes a function on a customer’s behalf, it manages both provisioning and the resources necessary to run code. When a Lambda function is invoked, the data plane allocates an execution environment to that function or chooses an existing execution environment that has already been set up for that function, then runs the function code in that environment. Each function runs in one or more dedicated execution environments that are used for the lifetime of the function and are then destroyed. Execution environments run on hardware-virtualized lightweight micro-virtual machines (microVMs). A microVM is dedicated to an AWS account, but can be reused by execution environments across functions within an account. Execution environments are never shared across functions, and microVMs are never shared across AWS accounts. AWS continues to innovate in the area of hypervisor security, and resource isolation enables our financial services customers to run even the most sensitive workloads in the AWS Cloud with confidence.
Most financial institutions require that traffic stay private whenever possible and not leave the AWS network unless specifically required (for example, in internet-facing workloads). To keep traffic private, customers can use Amazon VPC to carve out an isolated and private portion of the cloud for their organizational needs. A VPC allows customers to define their own virtual networking environments with segmentation based on application tiers.
To connect to regional AWS services outside of the VPC, organizations may use VPC endpoints, which allow private connectivity between resources in the VPC and supported AWS services. Endpoints are managed virtual devices that are highly available, redundant, and scalable. Endpoints enable private connection between a customer’s VPC and AWS services using private IP addresses. With VPC endpoints, Amazon EC2 instances running in private subnets of a VPC have private access to regional resources without requiring an internet gateway, NAT device, VPN connection, or AWS Direct Connect connection. Furthermore, when customers create an endpoint, they can attach a policy that controls the use of the endpoint to access only specific AWS resources, such as specific Amazon Simple Storage Service (Amazon S3) buckets within their AWS account. Similarly, by using resource-based policies, customers can restrict access to their resources to only allow access from VPC endpoints. For example, by using bucket policies, customers can restrict access to a given Amazon S3 bucket only through the endpoint. This ensures that traffic remains private and only flows through the endpoint without traversing public address space.
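The bucket policy restriction described above can be sketched as follows. The function builds a policy that denies any request that does not arrive through a given VPC endpoint, using the `aws:SourceVpce` condition key; the function name, bucket name, and endpoint ID are placeholders chosen for illustration.

```python
import json


def endpoint_only_bucket_policy(bucket, vpce_id):
    """Build a bucket policy that denies requests not made through `vpce_id`."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyAccessOutsideVpcEndpoint",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            # Requests arriving through any other path fail this condition.
            "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
        }],
    }


# Placeholder names, for illustration only.
policy = endpoint_only_bucket_policy("example-bucket", "vpce-0123456789abcdef0")
policy_json = json.dumps(policy, indent=2)
```

Note that a policy like this also blocks requests made from the AWS Management Console, because those requests do not originate from the endpoint, so it is worth testing on a non-critical bucket first.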
Key takeaway: To help customers keep traffic private, more than 45 AWS services have support for VPC Endpoints.
4. Automating audits with APIs
Visibility into user activities and resource configuration changes is a critical component of IT governance, security, and compliance. On-premises logging solutions require installing agents, setting up configuration files and log servers, and building and maintaining data stores to store the data. This complexity may result in poor visibility and fragmented monitoring stacks, which in turn make issues take longer to troubleshoot and resolve. CloudTrail provides a simple, centralized solution to record AWS API calls and resource changes in the cloud that helps alleviate this burden.
CloudTrail provides a history of activity in a customer’s AWS account to help them meet compliance requirements for their internal policies and regulatory standards. CloudTrail helps identify who or what took which action, what resources were acted upon, when the event occurred, and other details to help customers analyze and respond to activity in their AWS account. CloudTrail management events provide insights into the management (control plane) operations performed on resources in an AWS account. For example, customers can log administrative actions, such as creation, deletion, and modification of Amazon EC2 instances. For each event, they receive details such as the AWS account, IAM user role, and IP address of the user that initiated the action as well as time of the action and which resources were affected.
CloudTrail data events provide insights into the resource (data plane) operations performed on or within the resource itself. Data events are often high-volume activities and include operations, such as Amazon S3 object-level APIs, and AWS Lambda function Invoke APIs. For example, customers can log API actions on Amazon S3 objects and receive detailed information, such as the AWS account, IAM user role, IP address of the caller, time of the API call, and other details. Customers can also record activity of their Lambda functions and receive details about Lambda function executions, such as the IAM user or service that made the Invoke API call, when the call was made, and which function was executed.
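Data events are not logged by default, so a trail has to be told which resources to record. As a rough sketch, the helper below builds an event selector covering Amazon S3 object-level and Lambda Invoke activity in the shape that the CloudTrail `PutEventSelectors` API expects; the helper name and the ARNs are hypothetical.

```python
def data_event_selector(bucket_arns, function_arns):
    """Build a CloudTrail event selector that records S3 and Lambda data events."""
    data_resources = []
    if bucket_arns:
        data_resources.append({
            "Type": "AWS::S3::Object",
            # A trailing slash selects every object in the bucket.
            "Values": [f"{arn}/" for arn in bucket_arns],
        })
    if function_arns:
        data_resources.append({
            "Type": "AWS::Lambda::Function",
            "Values": list(function_arns),
        })
    return {
        "ReadWriteType": "All",
        "IncludeManagementEvents": True,
        "DataResources": data_resources,
    }


# Hypothetical ARNs, for illustration only.
selector = data_event_selector(
    ["arn:aws:s3:::example-bucket"],
    ["arn:aws:lambda:us-east-1:111122223333:function:example-function"],
)
```

With boto3, a selector like this could be applied through `put_event_selectors(TrailName=..., EventSelectors=[selector])`. Because data events are often high volume, scoping the selector to specific buckets and functions helps keep log volume manageable.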
To help customers simplify continuous compliance and auditing, AWS uniquely offers the AWS Config service to help them assess, audit, and evaluate the configurations of AWS resources. AWS Config continuously monitors and records AWS resource configurations, and allows customers to automate the evaluation of recorded configurations against internal guidelines. With AWS Config, customers can review changes in configurations and relationships between AWS resources and dive into detailed resource configuration histories.
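To illustrate what evaluating recorded configurations against an internal guideline can look like, here is a minimal sketch of the decision logic a custom rule might contain. It assumes a hypothetical guideline ("all Amazon EBS volumes must be encrypted") and a configuration item that exposes the volume's `encrypted` flag under `configuration`; the function name and sample items are illustrative only.

```python
def evaluate_volume(configuration_item):
    """Return a compliance result for one recorded configuration item."""
    if configuration_item.get("resourceType") != "AWS::EC2::Volume":
        return "NOT_APPLICABLE"  # the guideline only covers EBS volumes
    encrypted = configuration_item.get("configuration", {}).get("encrypted", False)
    return "COMPLIANT" if encrypted else "NON_COMPLIANT"


# Sample configuration items (made up for illustration).
items = [
    {"resourceType": "AWS::EC2::Volume", "configuration": {"encrypted": True}},
    {"resourceType": "AWS::EC2::Volume", "configuration": {"encrypted": False}},
    {"resourceType": "AWS::S3::Bucket", "configuration": {}},
]
results = [evaluate_volume(item) for item in items]
```

In a real rule, the result would be reported back to AWS Config for each evaluated resource. For common guidelines such as this one, a managed rule may already exist.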
Key takeaway: Over 160 AWS services are integrated with CloudTrail, which helps customers ensure compliance with their internal policies and regulatory standards by providing a history of activity within their AWS account. For more information about how to use CloudTrail with specific AWS services, see AWS Service Topics for CloudTrail in the CloudTrail user guide. For more information on how to enable AWS Config in an environment, see Getting Started with AWS Config.
5. Operational access and security
In our discussions with financial institutions, they’ve told AWS that they are required to have a clear understanding of access to their data. This includes knowing what controls are in place to ensure that unauthorized access does not occur. AWS has implemented layered controls that use preventative and detective measures to ensure that only authorized individuals have access to production environments where customer content resides. For more information about access and security controls, see the AWS SOC 2 report in AWS Artifact.
One of the foundational design principles of AWS security is to keep people away from data to minimize risk. As a result, AWS created an entirely new virtualization platform called the AWS Nitro System. This highly innovative system combines new hardware and software that dramatically increase both performance and security. The AWS Nitro System enables enhanced security with a minimized attack surface because virtualization and security functions are offloaded from the main system board where customer workloads run to dedicated hardware and software. Additionally, the locked-down security model of the AWS Nitro System prohibits all administrative access, including that of Amazon employees, which eliminates the possibility of human error and tampering.
Key takeaway: Review third-party auditor reports (including SOC 2 Type II) available in AWS Artifact, and learn more about the AWS Nitro System.
AWS can help simplify and expedite the whitelisting process for financial services institutions to move to the cloud. When organizations take advantage of a wide range of AWS services, it helps maximize their agility by making use of the existing security and compliance measures built into AWS services to complete whitelisting so financial services organizations can focus on their core business and customers.
After organizations have completed the whitelisting process and determined which cloud services can be used as part of their architecture, the AWS Well-Architected Framework can then be implemented to help build and operate secure, resilient, performant, and cost-effective architectures on AWS.
AWS also has a dedicated team of financial services professionals to help customers navigate a complex regulatory landscape, as well as other resources to guide them in their migration to the cloud – no matter where they are in the process. For more information, see the AWS Financial Services page, or fill out this AWS Financial Services Contact form.
AWS Security Documentation
The security documentation repository shows how to configure AWS services to help meet security and compliance objectives. Cloud security at AWS is the highest priority. AWS customers benefit from a data center and network architecture that are built to meet the requirements of the most security-sensitive organizations.
AWS Compliance Center
The AWS Compliance Center is an interactive tool that provides customers with country-specific requirements and any special considerations for cloud use in the geographies in which they operate. The AWS Compliance Center has quick links to AWS resources to help with navigating cloud adoption in specific countries, and includes details about the compliance programs that are applicable in these jurisdictions. The AWS Compliance Center covers many countries, and more countries continue to be added as they update their regulatory requirements related to technology use.
Typically, when you protect data in Amazon Simple Storage Service (Amazon S3), you use a combination of Identity and Access Management (IAM) policies and S3 bucket policies to control access, and you use the AWS Key Management Service (AWS KMS) to encrypt the data. This approach is well-understood, documented, and widely implemented. However, many customers want to extend the value of encryption beyond basic protection against unauthorized access to the storage layer where the data resides. They want to enforce a separation of duties between which team manages access to the storage layer and which team manages access to the encryption keys. This model ensures that configuration errors made by only one of these teams won’t compromise the data in ways that grant unauthorized access to plaintext data. For example, if the team that owns permissions to the S3 bucket mistakenly grants access to unauthorized users, when those users attempt to access objects in S3 they will fail. Why? Because the separate team who manages access to the keys didn’t grant those users access to use the keys for decryption.
You can create this kind of independent access control by combining KMS encryption with IAM policies and S3 bucket policies. When data is encrypted with a customer-managed KMS customer master key (CMK), the key’s policy acts as an independent access control. Users can be prevented from accessing the data, even though the IAM permissions and the S3 bucket policy would permit the access. Figure 1 shows a Venn diagram of the access that is required. The bucket policy, the IAM policy, and the KMS key policy all play a role. Users have permission for the data only when they are granted permissions in all three policies.
Figure 1: Venn diagram showing the required permissions for access
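The intersection shown in Figure 1 can be sketched as a small Python function. This is a deliberately simplified model of the figure, not of IAM's full policy evaluation logic, and the function name is made up for illustration.

```python
def can_read_plaintext(iam_allows, bucket_policy_allows, key_policy_allows):
    """Model of Figure 1: access requires a grant in all three policies."""
    return iam_allows and bucket_policy_allows and key_policy_allows


# A mistake in the bucket policy alone does not expose plaintext,
# because the key policy still withholds permission to decrypt.
print(can_read_plaintext(True, True, False))  # False
print(can_read_plaintext(True, True, True))   # True
```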
This exercise builds the resources shown in Figure 2:
Three AWS IAM roles
A role (1) with permission to create and manage permissions on an S3 bucket (secure-bucket-admin)
A role (2) with permission to create and manage permissions on a KMS master key (secure-key-admin)
A role (3) with permissions to access (but not manage) a specific S3 bucket and to use (but not manage) a specific AWS KMS customer master key (authorized-users).
An S3 bucket (4) with a custom bucket policy (5) that only allows data to be stored if that data is encrypted with a specific KMS key. The ability to write to or read from this bucket will be restricted to the IAM role authorized-users.
A KMS key (6) with a specific key policy (7) that can only be used by the IAM role authorized-users and only managed by the IAM role secure-key-admin.
Figure 2: Architecture diagram
When you have completed this exercise, you will have:
Created an S3 bucket protected by IAM policies, and a bucket policy that enforces encryption.
Attached the IAM role authorized-users to an EC2 instance so your applications in that instance can assume that role and access encrypted data in the S3 bucket.
Uploaded and downloaded data from the bucket that is protected by the KMS key.
Demonstrated that when the KMS key policy is modified, removing access for the IAM role authorized-users, the applications on the EC2 instance no longer have access to the data in the S3 bucket.
Set things up
For simplicity, I create the S3 bucket, KMS keys, and EC2 instances all in the same region and in the same AWS account. It’s possible to use KMS keys that are owned by a different AWS account, to assume roles across accounts, and to have instances in different regions from the buckets and the keys. I discuss those variations at the end.
I assume you have at least one administrator identity available to you already: one that has broad rights for creating users, creating roles, managing KMS keys, and launching EC2 instances. I will refer to this as your “Admin identity” throughout these instructions. This can be a federated identity (for example, from your corporate identity provider or from a social identity), or it can be an AWS IAM user.
First, I will create 3 policies that grant very specific sets of rights. Then, I will attach those policies to roles: two roles for administrators, and one for software running on EC2 instances. You’re going to create an S3 bucket in Step 3. That bucket, like all S3 buckets, needs a globally unique name. You will reference that bucket’s name in these policies, even though you will create the bucket later. Decide the name of your bucket now. When you reach steps that require you to type or paste a JSON policy document for your bucket policy, remember to use the name of your bucket where I have written secure-demo-bucket.
Step 1a: Create the S3 bucket management policy
While logged in to the console as your Admin user, create an IAM policy in the web console using the JSON tab. Name the policy secure-bucket-admin. When you reach the step to type or paste a JSON policy document, paste the JSON from Listing 1 below. This policy allows broad S3 administration rights (creating, deleting, and modifying policies), so it is a high privilege policy. In an effort to be concise, it grants all permissions to S3 and then takes a few away by explicitly denying them. The intention is to permit managing all aspects of the bucket’s operation, while denying all access to the contents of the bucket. The explicit deny mechanism is important because, due to IAM’s policy evaluation logic, an explicit deny cannot be overridden by subsequent “allow” statements or by attaching additional policies. As the S3 service evolves over time and new features are added, the policy will permit using those new features, without any change to this policy. If you prefer to enable features explicitly, you’ll need to rewrite this policy to explicitly allow only the features you want, and then come back and revise the policy every so often, as S3 features are added that your role needs to use.
Your policy will have an ARN (it will look something like arn:aws:iam::111122223333:policy/secure-bucket-admin). Make a note of this ARN. You will use it later to attach to the secure-bucket-admin role you’ll create in step 2.
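The allow-everything-then-deny pattern described in step 1a can be sketched as follows. This is a minimal illustration rather than a reproduction of Listing 1: the Sid values and the exact list of denied object-level actions are assumptions, so check them against your own requirements before using anything like it.

```python
import json

# Sketch of an S3 administration policy: broad allow, explicit deny on contents.
bucket_admin_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowS3Administration",  # assumed name
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": "*",
        },
        {
            "Sid": "DenyObjectAccess",       # assumed name
            "Effect": "Deny",
            # Assumed set of object-level actions; an explicit deny wins
            # over the broad allow above during policy evaluation.
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:DeleteObjectVersion",
            ],
            "Resource": "*",
        },
    ],
}

print(json.dumps(bucket_admin_policy, indent=2))
```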
Step 1b: Create the KMS administrator policy
While logged in to the console as your Admin user, create an IAM policy in the web console using the JSON tab. Name the policy secure-key-admin. When you reach the step to type or paste a JSON policy document, paste the JSON from Listing 2 below. Be sure to add your own 12-digit AWS account number where I have written 111122223333. This policy allows broad KMS administration rights (creating keys, granting access to keys, and modifying key policies), so it is a high privilege policy. In an effort to be concise, this policy grants all permissions to the KMS service and then denies certain rights through an explicit deny statement. The intention is to permit managing all aspects of KMS keys, while denying all access to perform encryption and decryption using KMS keys. As the KMS service evolves over time and new features are added, the policy will permit using those new features, without any change to this policy. If you prefer to enable features explicitly, you’ll need to rewrite this policy to explicitly allow only the features you want, and then come back and revise the policy every so often, as KMS features are added that your role needs to use.
Your policy will have an ARN (it will look something like arn:aws:iam::111122223333:policy/secure-key-admin). Make a note of this ARN. You will use it later to attach to the secure-key-admin role you’ll create in step 2.
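The same pattern applied to KMS might look like the sketch below. It is not Listing 2: the Sid values, the choice of denied cryptographic actions, and the helper name are assumptions for illustration, with 111122223333 standing in for your own account number.

```python
import json


def key_admin_policy(account_id):
    """Sketch of a KMS administration policy that denies cryptographic use."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowKmsAdministration",  # assumed name
                "Effect": "Allow",
                "Action": "kms:*",
                "Resource": "*",
            },
            {
                "Sid": "DenyCryptographicUse",    # assumed name
                "Effect": "Deny",
                # Assumed set of data-handling actions to withhold from admins.
                "Action": [
                    "kms:Encrypt",
                    "kms:Decrypt",
                    "kms:ReEncrypt*",
                    "kms:GenerateDataKey*",
                ],
                "Resource": f"arn:aws:kms:*:{account_id}:key/*",
            },
        ],
    }


print(json.dumps(key_admin_policy("111122223333"), indent=2))
```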
Step 1c: Create the S3 bucket usage policy
This final policy grants access to read and write encrypted data in the target S3 bucket. This is a narrowly-scoped policy that only grants rights to a single bucket. While logged in to the console as your Admin user, create an IAM policy in the web console using the JSON tab. Name the policy secure-bucket-access.
When you reach the step to type or paste a JSON policy document for your bucket policy, paste the JSON from Listing 3 below, substituting the name of your bucket on the two lines where I have secure-demo-bucket.
Note: In an effort to grant a minimal, but realistic, set of permissions, this IAM policy only grants access to basic get, put, and delete operations. You might have a use for other features, like tagging objects. If so, you will need to change the policy to enable the features you want to use.
Your policy will have an ARN (it will look something like arn:aws:iam::111122223333:policy/secure-bucket-access). Make a note of this ARN. You will use it later to attach to the authorized-users role you’ll create in step 2.
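A narrowly scoped usage policy of the kind described in step 1c could be generated like this. It is a sketch, not Listing 3: the Sid values, the inclusion of s3:ListBucket, and the helper name are assumptions. The bucket name appears in two Resource entries, and the policy contains no KMS actions at all.

```python
import json


def bucket_access_policy(bucket_name):
    """Sketch of a get/put/delete policy limited to a single bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListBucket",        # assumed: lets callers list keys
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": bucket_arn,
            },
            {
                "Sid": "ReadWriteObjects",  # assumed name
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


print(json.dumps(bucket_access_policy("secure-demo-bucket"), indent=2))
```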
You might ask why this policy designed to control access to encrypted objects has no KMS permissions in it. Wouldn’t that prevent the users that assume this IAM role from using the encryption keys? It would normally prevent them, except you have the ability to list the authorized-users IAM role within the resource policy attached to the KMS key you’re about to create. By placing the authorized-users role in the KMS key resource policy, it further enforces the separation of duties so administrators in the account with an ability to modify IAM policies don’t inadvertently escalate privilege to other IAM users/roles and give them permissions to use KMS keys for decryption.
Step 2: Create IAM roles
An AWS IAM role is an identity that you can create in an AWS account that has specific permissions. An IAM role is similar to an IAM user, because it has permission policies that determine what the identity can and cannot do in AWS. It’s different from an IAM user because it’s not associated with a single person. A role can be used by users, by EC2 instances, by AWS services, or by other entities like AWS Lambda functions that you allow to use it. The IAM policies we created in step 1 do not grant permissions until we assign them to roles and assign the roles to users or entities.
Step 2a: Create the S3 bucket management role
This role will be used by administrators who need to manage the properties of the bucket.
Choose Another AWS account under the section labeled Select type of trusted entity.
For the authorized AWS account ID, enter the 12-digit account number for the account that you’re working in. If you intend to authorize AWS IAM users that are defined in a different AWS account to access the S3 bucket and decrypt objects, then you would include that AWS account’s ID number, instead.
Name the IAM role secure-bucket-admin and attach the customer-managed policy named secure-bucket-admin that you created in step 1a to the role that you have created.
Your AWS IAM role will have an ARN (it will look something like arn:aws:iam::111122223333:role/secure-bucket-admin). Make a note of this ARN. You will use it in step 3 when you create your S3 bucket.
Step 2b: Create the KMS key management role
This role will be used by administrators who need to manage the KMS customer master keys that protect the data. The actions you take to manage the keys will be authorized by this role. Importantly, this role has no ability to modify the bucket, grant access to the bucket, or access any of the data in the bucket.
In the Select type of trusted entity section, select Another AWS account.
For the authorized AWS account ID, enter the 12-digit account number for the account that you’re working in. If you intend to authorize AWS IAM users that are defined in a different AWS account, then you would include that AWS account’s ID number, instead.
Name the IAM role secure-key-admin and attach the customer-managed policy named secure-key-admin that you created in step 1b to the role that you have created.
Your AWS IAM role will have an ARN (it will look something like arn:aws:iam::111122223333:role/secure-key-admin). Make a note of this ARN. You will use it in step 4 when you create your KMS key.
Step 2c. Create the bucket usage role
This role will grant permissions to EC2 instances. An EC2 instance running with this role will be able to create and read encrypted data in the protected S3 bucket.
In the Select type of trusted entity section, select AWS service.
Choose EC2 as the service that you will authorize. This authorizes all applications running on that EC2 instance to use credentials with permissions attached to the role.
Name the IAM role authorized-users and attach the customer-managed secure-bucket-access policy that you created in step 1c to the role that you have created.
This role is not for users trying to access the S3 bucket from any arbitrary application that happens to have the role’s credentials. It will only be used by users operating within applications running in AWS EC2 instances.
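The trusted-entity choices made in steps 2a through 2c correspond to role trust policies. The sketch below shows the general shape of the two kinds used here: one that trusts an AWS account (for the two administrator roles) and one that trusts the EC2 service (for authorized-users). The helper names are made up for illustration, and a policy generated by the console may contain additional fields.

```python
import json


def account_trust_policy(account_id):
    """Trust policy letting principals in the given account assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
            "Action": "sts:AssumeRole",
        }],
    }


def ec2_trust_policy():
    """Trust policy letting EC2 instances assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }


print(json.dumps(account_trust_policy("111122223333"), indent=2))
print(json.dumps(ec2_trust_policy(), indent=2))
```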
Step 3: Create an S3 bucket for the encrypted data
Log in to the console using your secure-bucket-admin role. (Either log in with the correct federated identity, or with the AWS IAM user you created in step 1d). Follow the instructions to create a bucket that will hold the encrypted data. In my example, I call my bucket secure-demo-bucket. You chose your own unique bucket name back in step 1. Type that bucket name throughout these steps where I use secure-demo-bucket. You will set a bucket policy and properties on that bucket later.
Step 4: Create a KMS key to encrypt and decrypt the data in the S3 bucket
Log out of the console and log back in using your secure-key-admin role. Create a customer-managed customer master key (CMK) to encrypt and decrypt the data in the S3 bucket you just created. If you already have a customer-managed CMK created that you want to use for this purpose, you can do that. To use your own CMK, skip steps 1-5 below about creating a key and, instead, select your existing key in the KMS console and then follow steps 6-8 to change the key policy to allow the authorized-users role permissions to use the key.
In the AWS console, go to Key Management Service.
Select the Create Key button.
On the Step 1 screen, set a display name (called an “Alias”) for the key and a description. I recommend a meaningful description that tells others what the key is for.
On the Step 2 screen, set tags if you need them to track usage of keys for billing purposes. Tags won’t have a functional impact in this exercise so you can skip this step if you want by selecting Next.
On the Step 3 screen, select key administrators. Pick only the secure-key-admin IAM role. You must not pick the secure-bucket-admin role or the authorized-users role as key administrators to ensure separation of duties. For example, if you were to pick the authorized-users IAM role, then any user that assumed that role could escalate their own (or others’) privileges to use this key to decrypt any other data encrypted under this key in your account. If you were to pick the secure-bucket-admin role, then any user that assumed that role could modify permissions both on the S3 bucket and the KMS key in ways that allowed unauthorized users access to decrypt data.
On the Step 4 screen, select key users. Pick only the authorized-users IAM role you created in step 2c.
On the Step 5 screen, select Finish.
After you have created the key, make note of the key’s ARN. It will look something like this:
You will need it for the next step where you enforce all objects uploaded into the S3 bucket to be encrypted under this key.
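The key administrator and key user selections made in step 4 end up as statements in the key policy. The sketch below approximates those two statements; the Sid values and the exact action lists are assumptions modeled on typical console output, and the policy that the console generates may contain additional statements.

```python
import json


def key_policy(account_id, admin_role="secure-key-admin",
               user_role="authorized-users"):
    """Sketch of a key policy separating key administration from key use."""
    def role_arn(name):
        return f"arn:aws:iam::{account_id}:role/{name}"

    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "KeyAdministration",  # manage the key, but not use it
                "Effect": "Allow",
                "Principal": {"AWS": role_arn(admin_role)},
                "Action": [
                    "kms:Create*", "kms:Describe*", "kms:Enable*",
                    "kms:List*", "kms:Put*", "kms:Update*", "kms:Revoke*",
                    "kms:Disable*", "kms:Get*", "kms:Delete*",
                    "kms:ScheduleKeyDeletion", "kms:CancelKeyDeletion",
                ],
                "Resource": "*",
            },
            {
                "Sid": "KeyUsage",           # use the key, but not manage it
                "Effect": "Allow",
                "Principal": {"AWS": role_arn(user_role)},
                "Action": [
                    "kms:Encrypt", "kms:Decrypt", "kms:ReEncrypt*",
                    "kms:GenerateDataKey*", "kms:DescribeKey",
                ],
                "Resource": "*",
            },
        ],
    }


print(json.dumps(key_policy("111122223333"), indent=2))
```

Removing the authorized-users role from the key users list in the console amounts to removing its ARN from the second statement.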
Step 5: Modify the bucket policy
Log out of the console and log back in with the secure-bucket-admin role. You’re going to attach a bucket policy to the bucket that does two things: it requires objects to be encrypted and it requires them to be encrypted with a specific KMS key. You will accomplish this by explicitly denying any attempt to call PutObject unless the correct conditions are true. This helps you increase your confidence that you will not store unencrypted data in this bucket.
Find the secure-demo-bucket bucket in the S3 web console, and then modify its bucket policy. Use the code from Listing 4 below as the entire bucket policy. Be sure to change secure-demo-bucket to the actual name of the bucket that you’re using in both places where it appears in the policy. You recorded the key’s ARN in step 4; make sure you insert that ARN for your KMS key where I use an example key ARN below.
Note: This bucket policy is not retroactive: If you apply this policy to a bucket that already exists and already has unencrypted objects, nothing happens to the objects that are already in the bucket. They remain unencrypted. They can be fetched or deleted. Once the policy is applied, however, new objects cannot be put in the bucket unless they are correctly encrypted.
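A bucket policy with the two properties described in step 5 can be sketched as a pair of explicit deny statements on PutObject. This is an illustration rather than Listing 4: the Sid values and the helper name are assumptions, and the key ARN passed in below is a placeholder, not a real key.

```python
import json


def encryption_enforcing_bucket_policy(bucket_name, kms_key_arn):
    """Sketch: deny PutObject unless SSE-KMS with one specific key is requested."""
    objects = f"arn:aws:s3:::{bucket_name}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUploadsWithoutKmsEncryption",  # assumed name
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": objects,
                "Condition": {"StringNotEquals": {
                    "s3:x-amz-server-side-encryption": "aws:kms"}},
            },
            {
                "Sid": "DenyUploadsWithOtherKeys",         # assumed name
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": objects,
                "Condition": {"StringNotEquals": {
                    "s3:x-amz-server-side-encryption-aws-kms-key-id": kms_key_arn}},
            },
        ],
    }


# Placeholder key ARN for illustration only.
example_key_arn = ("arn:aws:kms:us-east-1:111122223333:"
                   "key/1234abcd-12ab-34cd-56ef-1234567890ab")
policy = encryption_enforcing_bucket_policy("secure-demo-bucket", example_key_arn)
print(json.dumps(policy, indent=2))
```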
Instead of applying a bucket policy, you could consider turning on S3 default encryption. This feature forces all new objects uploaded to an S3 bucket to be encrypted using the KMS key you created in step 4 unless the user specifies a different key. This feature doesn’t prohibit callers from encrypting objects under other KMS keys, but it ensures that the data is protected even if the user does not specify KMS encryption when putting the object. The bucket policy in Listing 4 is a bit stricter than S3 default encryption because it ensures that no object is ever encrypted by any key other than the CMK created in step 4. That strictness means the attempt to put an object fails, unless the caller explicitly names the KMS keyId in every S3 PUT request. With S3 default encryption, attempts to put an object without specifying encryption will succeed, and the data will be protected by the named KMS CMK.
Step 7: Launch an EC2 instance to demonstrate the solution
The final step to showing how this solution works is to launch an EC2 instance and show that applications running in that instance can write and read data in the S3 bucket you created. If you launch an EC2 instance that has your authorized-users role attached and log in on that instance, you will be able to upload and download objects from the bucket, encrypting and decrypting transparently as you do it. No other identity (for example, other IAM users, other IAM roles, other EC2 instances, and Lambda functions) will be able to upload and download data to this S3 bucket because these other identities don’t have the permissions to use the KMS key that protects the data.
Choose an instance type. Any instance type will work. If you launch an Amazon Linux t2.micro instance, it might qualify for free tier pricing.
For IAM Role, select the authorized-users role from the drop-down menu.
Make sure you specify an SSH key that you have access to, and make sure that you have a way to reach the EC2 instance over the network.
Satisfy yourself that it works as expected
At this point, the solution is complete and is running. I want to demonstrate that the KMS key is providing the independent access controls the way I said it would. I will modify the key policy to remove the instance’s rights to use the KMS key. Then, I will confirm that the commands that had succeeded before now fail after the key policy change. This shows how the KMS key and its policy are completely independent of the S3 bucket policies and the IAM policies.
You will need to download a file onto the EC2 instance that you can then upload, encrypted, to the S3 bucket. If you don’t have a file that you want to use, you can use the AWS Cryptographic Details whitepaper as a reasonable test file.
On the instance, run the following command to download a local copy of the AWS Cryptographic Details whitepaper that you can use as test data:
Side note: You should also read this whitepaper. It’s very informative on how AWS KMS is built and operated to secure your encryption keys.
On the EC2 instance, use the AWS command line to upload the file to the S3 bucket. Note all the options that tell S3 to use KMS encryption and to use the correct key ID. Remember to insert the bucket name for the bucket that you’re using and the ARN of your KMS key from step 4 above.
If all went well, you should see a message like the following, showing that the object was uploaded successfully:
upload: ./KMS-Cryptographic-Details.pdf to s3://secure-demo-bucket/KMS-Cryptographic-Details.pdf
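One plausible shape for an encrypted upload like this, using the AWS CLI's --sse and --sse-kms-key-id options, is sketched below as a Python argument list. The helper name is made up, the key ARN is a placeholder, and the exact command used in this walkthrough may differ.

```python
import shlex


def encrypted_upload_command(local_file, bucket_name, kms_key_arn):
    """Build the argv for an `aws s3 cp` upload that requests SSE-KMS."""
    return [
        "aws", "s3", "cp", local_file, f"s3://{bucket_name}/",
        "--sse", "aws:kms",               # ask S3 to encrypt with KMS
        "--sse-kms-key-id", kms_key_arn,  # name the specific key to use
    ]


cmd = encrypted_upload_command(
    "./KMS-Cryptographic-Details.pdf",
    "secure-demo-bucket",
    # Placeholder key ARN for illustration only.
    "arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab",
)
print(shlex.join(cmd))
```

On the instance, the list could be passed to subprocess.run(cmd, check=True), or the printed line could be pasted into a shell.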
Test 2: Upload an Unencrypted Object
You can now prove that an attempt to upload an unencrypted object from this instance will fail. Run this command to upload a second copy of the PDF file to be called test2.pdf. Be sure to substitute your bucket’s name into the command.
You’ll notice this command doesn’t include the options instructing S3 to use KMS to encrypt the file. You should see this error message:
An error occurred (AccessDenied) when calling the PutObject operation: Access Denied
If you see no error, then double-check that your bucket policy in Step 5 above is correct.
Test 3: Downloading Encrypted Objects
You’ve now proven that the EC2 instance can upload encrypted objects and that unencrypted objects are refused. Now, you can prove that the EC2 instance has access to cause S3 to decrypt the encrypted object in the bucket using the KMS keys. Here’s how: While still on your EC2 instance, run this command, substituting your bucket name, to download a copy of the PDF file:
If this command succeeds, then you will have a file in your current directory on your EC2 instance named test3.pdf. That shows that you have successfully decrypted and downloaded the PDF file.
Test 4: Demonstrate that the key policy regulates access
Now, I will demonstrate the independence of access control provided by the KMS key policy. Leaving the bucket policy and IAM role/policy as they are, you will disable the EC2 instance’s access to the objects using the KMS key policy. The IAM policy for S3 and the bucket policy on the bucket would still normally permit the EC2 instance to access the data. But, because the KMS key policy will prevent use of the key by the authorized-users IAM role, S3 will fail to encrypt or decrypt the object. This means that any commands that execute on the EC2 instance will no longer be able to upload or download data from the S3 bucket.
First, modify the key policy.
Log out of the console and log back in under the secure-key-admin role. Go to the Key Management Service console.
In the left-hand navigation, select Customer managed keys and look for the key with the alias or Key ID that you’re using. The Key ID is the final segment of the full key ARN, the part that follows key/.
Select the Key ID for the key that you’re using to get to the screen where you can edit the key policy.
In the list of Key users, you will see your authorized-users role listed. Select that role, and then select the Remove button to remove its access to use the KMS key.
At this point, the EC2 instance no longer has the permissions to use the KMS key because the key policy no longer grants its role permission to use the key.
Repeat the command that you did in Test 1 that uploaded a PDF file to the bucket. In this case, try to make a second copy of the PDF file into an object named test4.pdf. Run this command, substituting your bucket name and your KMS key ID as required:
An error occurred (AccessDenied) when calling the GetObject operation: Access Denied
These two commands are denied because when S3 tried to invoke KMS to encrypt or decrypt data, the EC2 instance role did not have permission to use the KMS key and thus the request failed. Note that there is no situation where the API call returns the KMS-encrypted data from S3. Either the API call succeeds, and you receive the decrypted data, or the API call fails, and you receive an error. All AWS services that use KMS to encrypt data behave this way—you either get the decrypted data, or you get an error message.
Restoring access to the key
To restore the EC2 instance’s access to the data, you authorize its role again in the KMS key policy:
Go to Key Management Service in the AWS Console.
Select Customer managed keys.
Find the key that you’re using and select it.
Find your authorized-users role in the list of roles, or type “authorized-users” in the search box to find it.
Select the checkbox next to the authorized-users role, and then select Add to add that role as a key user.
The role will now have permission to use the key as it did before.
Useful variations on this solution
Variation 1: Using KMS keys in different AWS accounts
You can use a KMS key that is in a different AWS account for encrypting and decrypting. This allows administrators in a central AWS account to manage KMS keys, while the data itself resides in other AWS accounts. This can offer further separation of roles from the example above because even a highly privileged user (for example, root) in the account in which the authorized-users role exists won’t be able to modify the key policy. The ID of the account in which the authorized-users role exists must be listed in the key policy. For more information, follow the instructions on sharing KMS keys across accounts.
Note that the KMS key and the S3 bucket must always be in the same region. The EC2 instance does not need to be in the same region as the S3 bucket. You will experience higher latency when your EC2 instance is not in the same region as the S3 bucket.
Variation 2: Granting KMS key usage permissions to other AWS services
EC2 is not the only service that can be granted a role this way. Lambda functions can be granted AWS IAM roles that allow them to use KMS keys. That would permit the Lambda functions with the correct roles to manipulate the S3 data, while other entities (users, EC2 instances) could not. Likewise, AWS services such as Amazon Athena might require access to a KMS key if you want to use it to search data stored in S3 that has been encrypted using KMS. If Athena is given permission to assume a role with permissions to use the KMS key, then Athena can successfully execute its search queries because S3 will be allowed to decrypt objects on behalf of Athena, which is acting on your behalf when assuming the authorized-users role.
Variation 3: Creating isolated authorization to encrypt vs decrypt
You can use the KMS key policy to isolate authorization to encrypt versus decrypt data between two identities. For example, if a role has the kms:Encrypt or kms:GenerateDataKey permissions for a key, that means that role can write encrypted data directly or ask an AWS service to do it on its behalf (for example, during an upload to an S3 bucket). If the role does not also have kms:Decrypt permission, it can’t read encrypted data. This write-only permission might be appropriate for data acquisition, security log delivery, or other functions that should not be allowed to read the data they have written. Likewise, if a role has the kms:Decrypt permission, then the role has the ability to read data. But if it lacks the kms:Encrypt permission, it cannot write or modify encrypted data. This kind of isolated authorization is suitable for audit functions and log aggregation functions that need to read data but typically are prohibited from modifying the data/logs that they read. The complete set of permissions for KMS key policies can be found in the AWS KMS Developer Guide.
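The write-only and read-only split described in Variation 3 could be expressed as two key policy statements, sketched below. The role names log-writer and log-auditor, the Sid values, and the helper name are hypothetical and chosen only for illustration.

```python
import json


def split_duty_statements(account_id, writer_role, reader_role):
    """Sketch: one role may only encrypt, the other may only decrypt."""
    def role_arn(name):
        return f"arn:aws:iam::{account_id}:role/{name}"

    write_only = {
        "Sid": "EncryptOnly",  # assumed name
        "Effect": "Allow",
        "Principal": {"AWS": role_arn(writer_role)},
        "Action": ["kms:Encrypt", "kms:GenerateDataKey"],
        "Resource": "*",
    }
    read_only = {
        "Sid": "DecryptOnly",  # assumed name
        "Effect": "Allow",
        "Principal": {"AWS": role_arn(reader_role)},
        "Action": ["kms:Decrypt"],
        "Resource": "*",
    }
    return [write_only, read_only]


# Hypothetical role names for a log delivery function and an audit function.
statements = split_duty_statements("111122223333", "log-writer", "log-auditor")
print(json.dumps(statements, indent=2))
```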
Cost of this solution
Three services with charges are used in this solution: EC2, S3, and KMS. The EC2 instance hours are charged according to standard EC2 pricing. Likewise, storing data in S3 will incur costs according to standard S3 pricing. There is no difference in S3 pricing for storing encrypted versus unencrypted data. Finally, KMS has a fixed price per month for each customer-managed CMK you create, which is described in the KMS pricing page. Each encryption and decryption of an object is a KMS API call and a certain number of KMS API calls are free each month. The number of free KMS API calls, and the price for API calls beyond the free tier, are described on the KMS pricing page.
The combination of IAM policies, S3 bucket policies, and KMS key policies gives you a powerful way to apply independent access control mechanisms on data. This mechanism means that one set of users can be granted rights to do maintenance operations on the buckets themselves, while not having rights to access or manipulate the data itself. Even a user or function with full privileges in S3 would be denied access to this encrypted data unless it also had the rights to use the KMS keys. It gives you an approach to access control that allows key policies to serve as an additional control when IAM policies or S3 bucket policies alone are not sufficient.
CCPA, or the California Consumer Privacy Act, is the upcoming “small GDPR” that applies to all companies that have users from California (i.e. it has extraterritorial application). It is not as massive as GDPR, but you may want to follow its general recommendations.
A few years ago I wrote a technical GDPR guide. Now I’d like to do the same with CCPA. GDPR is much more prescriptive on the fact that you should protect users’ data, whereas CCPA seems to be mainly concerned with the rights of the users – to be informed, to opt out of having their data sold, and to be forgotten. That focus is mainly because other laws in California and the US have provisions about protecting confidentiality of data and data breaches; in that regard GDPR is a more holistic piece of legislation, whereas CCPA covers mostly the aspect of users’ rights (or “consumers”, which is the term used in CCPA). I’ll use “user” as it’s the term more often used in technical discussions.
I’ll list below some important points from CCPA – this is not an exhaustive list of requirements to a software system, but aims to highlight some important bits. And, obviously, I’m not a lawyer, but I’ve been doing data protection consultations and products (like SentinelDB) for the past several years, so I’m qualified to talk about the technical side of privacy regulations.
Right of access – you should be able to export (in a human-readable format, and preferably in a machine-readable one as well) all the data that you have collected about an individual: their account details, their orders, their preferences, their posts and comments, etc.
Deletion – you should delete any data you hold about the user. Exceptions apply, of course, including data used for prevention of fraud, data kept for other legal reasons, data needed for debugging, data necessary to complete the business requirement, or anything that the user can reasonably expect. From a technical perspective, this means you most likely have to delete what’s in your database, but other places where you have personal data, like logs or analytics, can be skipped (provided you don’t use them to reconstruct user profiles, of course).
Notify 3rd party providers that received data from you – when data deletion is requested, you have to somehow send notifications to wherever you’ve sent personal data. This can be a SaaS like Mailchimp, Salesforce or Hubspot, or it can be someone you sold the data to (apparently that’s a major thing in CCPA). So ideally you should know where data has been sent and invoke APIs for forgetting it. Fortunately, most of these companies are already compliant with GDPR anyway, so they have these endpoints exposed. You just have to add the logic. If your company sells data by posting dumps to S3 or sending Excel sheets via email, you have a bigger problem as you have to keep track of those activities and send unstructured requests (e.g. emails).
Data lineage – this is not spelled out as a requirement, but it follows from multiple articles, including the one for deletion as well as the one for disclosing who data was sent to and where data came from in your system (in order to know if you can re-sell it, among other things). In order to avoid buying expensive data lineage solutions, you can either have a spreadsheet (in case of simpler processes), or come up with a meaningful way to tag your data. For example, you can use a separate table with columns (ID, table, sourceType, sourceId, sourceDetails), where ID and table identify a record of personal data in your database, sourceType is the way you have ingested the data (e.g. API call, S3, email) and sourceId is the identifier that you can use to track how it came into your system – API key, S3 bucket name, email “from”, or even company registration ID (data might still be sent around on flash drives, I guess). A similar table can be kept for the outgoing data (with targetType and targetId). It’s a simplified implementation but it might work in cases where a spreadsheet would be too cumbersome to take care of.
Age restriction – if you’ve had the opportunity to know the age of a person whose data you have, you should check it. That means not ignoring the age or date of birth field when you import data from 3rd parties, and also politely asking users about their age. You can’t sell minors’ data by default, so you need to know which records are automatically opted out. If you never ever sell data, well, it’s still a good idea to keep it (per GDPR).
Don’t discriminate if users have used their privacy rights – that’s more of a business requirement, but as technical people we should know that we are not allowed to have logic based on users having used their CCPA (or GDPR) rights. From a data organization perspective, I’d put rights requests in a separate database from the actual data, to make such discriminatory logic harder to implement. You can’t just do a SQL query to check if someone should get a better price; you have to do cross-system integration, and that might dissuade product owners from breaking the law. Furthermore, it will be a good sign in case of audits.
“Do Not Sell My Personal Information” – this should be on the homepage if you have to comply with CCPA. It’s a bit of a harsh requirement, but it should take users to a form where they can opt out of having their data sold. As mentioned in a previous point, this could be a different system to hold users’ CCPA preferences. It might be easier to just have a set of columns in the users’ table, of course.
Identifying users is an important aspect. CCPA speaks about “verifiable requests”. So if someone drops you an email “I want my data deleted”, you should be able to confirm it’s really them. In an online system that can be a button in the user profile (for opting out, for deletion, or for data access) – if they know the password, it’s fairly certain it’s them. However, in some cases, users don’t have accounts in the system. In that case there should be other ways to identify them. SSN sounds like one, and although it’s a terrible thing to use for authentication, with the lack of universal digital identity, especially in the US, it’s hard not to use it at least as part of the identifying information. But it can’t be the only thing – it’s not a password, it’s an identifier. So users sharing their SSN (if you have it), their phone or address, passport or driving license might be some data points to collect for identifying them. Note that once you collect that data, you can’t use it for other purposes, even if you are tempted to. CCPA also requires toll-free phone support, which is hardly applicable to non-US companies even though they have customers in California, but it poses the question of identifying people online based on real-world data rather than account credentials. And please don’t ask users about their passwords over the phone; just initiate a request on their behalf in the system and direct them to log in and confirm it. There should be additional guidelines for identifying users as per 1798.185(a)(7).
Deidentification and aggregate consumer information – aggregated information, e.g. statistics, is not personal data, unless you are able to extract personal data based on it (e.g. if the statistics are split per town and age and you have only two users in a given town, you can easily see who is who). Aggregated data is distinct from deidentified data, which is data that has its identifiers removed. Simply removing identifiers, though, might again not be sufficient to deidentify data – based on several other data points, like IP address (+ logs), physical address (+ snail mail history), phone (+ phone book), one can be uniquely identified. If you can’t reasonably identify a person based on a set of data, it can be considered deidentified. Do make the mental exercise of thinking how to deidentify your data, as then it’s much easier to share it with (or sell it to) third parties. Probably nobody minds being part of an aggregated statistic sold to someone, or an anonymized account used for trend analysis.
Pseudonymization is a measure to be taken in many scenarios to protect data. CCPA mentions it particularly in a research context, but I’d support a generic pseudonymization functionality. That means replacing the identifying information with a pseudonym that’s not reversible unless a secret piece of data is used. Think of it (and you can do that quite literally) as encrypting the identifier(s) with a secret key to form the pseudonym. You can then give that data to third parties to work with it (e.g. to do market segmentation) and then give it back to you. You can then decrypt the pseudonyms and fill the obtained market segment(s) into your own database. The 3rd party doesn’t get personal information, but you still get the relevant data.
Audit trail is not explicitly stated as a requirement, but since you have the obligation to handle users’ requests and track the use of their data in and outside of your system, it’s a good idea to have a form of audit trail – who did what with which data; who handled a particular user request; how was the user identified in order to perform the request, etc.
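The lineage table from the data lineage point above can be sketched with Python’s built-in sqlite3 module. This is only a minimal sketch: the table name data_source, the snake_case column names (including record_table, since “table” is a reserved word in SQL), the helper functions and the sample values are all my assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute(
    """
    CREATE TABLE data_source (
        record_id      TEXT NOT NULL,  -- ID of the personal data record
        record_table   TEXT NOT NULL,  -- table that holds the record
        source_type    TEXT NOT NULL,  -- how it was ingested: api, s3, email...
        source_id      TEXT NOT NULL,  -- API key, bucket name, email "from"...
        source_details TEXT,
        PRIMARY KEY (record_id, record_table)
    )
    """
)


def record_source(conn, record_id, table, source_type, source_id, details=None):
    """Remember where a record of personal data came from."""
    conn.execute(
        "INSERT INTO data_source VALUES (?, ?, ?, ?, ?)",
        (record_id, table, source_type, source_id, details),
    )


def records_from_source(conn, source_type, source_id):
    """List (table, id) pairs of all records ingested from a given source."""
    cursor = conn.execute(
        "SELECT record_table, record_id FROM data_source "
        "WHERE source_type = ? AND source_id = ? "
        "ORDER BY record_table, record_id",
        (source_type, source_id),
    )
    return cursor.fetchall()


record_source(conn, "42", "users", "api", "key-123", "signup form")
record_source(conn, "17", "orders", "api", "key-123")
record_source(conn, "43", "users", "s3", "partner-bucket", "monthly import")

print(records_from_source(conn, "api", "key-123"))
```

A mirror table for outgoing data (with target_type and target_id) would follow the same shape, and it is the one you would query when a deletion request has to be forwarded to 3rd parties.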
As CCPA is not concerned with data confidentiality requirements, I won’t repeat my GDPR advice about using encryption whenever possible (notably, for backups), or about internal security measures for authentication.
CCPA is focused on the rights of your users and you should be able to handle them (and track how you handled them). You can have manual and spreadsheet based processes if you are not too big, and you should definitely check with your legal team if and to what extent CCPA applies to your company. But if you have implemented the GDPR data subject rights, it’s likely that you are already compliant with CCPA in terms of the overall system architecture, except for a few minor details.
With the recent trend towards data protection and privacy, as well as the requirements of data protection regulations like GDPR and CCPA, some organizations are trying to reorganize their personal data so that it has a higher level of protection.
One path that I’ve seen organizations take is to apply the (what I call) “Personal data store” pattern. That is, to extract all personal data from existing systems and store it in a single place, where it’s accessible via APIs (or in some cases directly through the database). The personal data store is well guarded, has a proper audit trail and anomaly detection, and offers privacy-preserving features.
It makes sense to focus one’s data protection efforts predominantly in one place rather than scatter it across dozens of systems. Of course it’s far from trivial to migrate so much data from legacy systems to a new module and then upgrade them to still be able to request and use it when needed. That’s why in some cases the pattern is applied only to sensitive data – medical, biometric, credit cards, etc.
For the sake of completeness, there’s something else called “personal data stores” and it means an architecture where the users themselves store their own data in order to be in control. While this is nice in theory, in practice very few users have the capacity to do so, and while I admire the Solid project, for example, I don’t think it is a viable pattern for many organizations, as in many cases users don’t directly interact with the company, but the company still processes large amounts of their personal data.
So, the personal data store pattern is an architectural approach to personal data protection. It can be implemented as a “personal data microservice” with CRUD operations on predefined data entities, as an external service (e.g. SentinelDB, a project of mine), or simply as a centralized database that has some proxy in front of it to control the access patterns. You can imagine it as externalizing your application’s “users” table and its related tables.
It sounds a little bit like a data warehouse for personal data, but the major difference is that it’s used for operational data, rather than (just) analysis and reporting. All (or most) of your other applications/microservices interact constantly with the personal data store whenever they need to access or update (or “forget”) personal data.
Some of the main features of such a personal data store, the combination of which protects against data breaches, in my view, include:
Easy to use interface (e.g. RESTful web services or simply SQL) – systems that integrate with the personal data store should be built in a way that a simple DAO layer implementation gets swapped and then data that was previously accessed from a local database is now obtained from the personal data store. This is not always easy, as ORM technologies add a layer of complexity.
High level of general security – servers protected with 2FA, access control, segregated networks, restricted physical access, firewalls, intrusion prevention systems, etc. The good thing is that it’s easier to apply all the best practices to a single system (and keep it that way) than to apply them to every system.
Encryption – but not just “data at rest” encryption; especially sensitive data can and should be encrypted with well protected and rotated keys. That way the “honest but curious” admin won’t be able to extract anything from the underlying database.
Audit trail – all infosec and data protection standards and regulations focus on accountability and traceability. There should not be a way to extract or modify personal data without leaving a trace (and ideally, that trace should be protected as well).
Anomaly detection – checking if there is something strange/anomalous in the data access patterns. Such strange access patterns can mean a data breach is happening, and the personal data store can actively block it. There is a lot of software out there that does anomaly detection on network traffic, but it’s much better if the rules (or machine learning) are domain-specific. “Monitor for increased traffic to those servers” is one thing, but it’s much better to be able to say “monitor for out-of-the-ordinary accesses to personal data of such and such kind”.
Pseudonymization – many systems that need the personal data don’t actually need to know who it is about. That includes marketing, including outsourcing to 3rd parties, reporting functionalities, etc. So the personal data store can return data that does not allow a person to be identified, but a pseudo-ID instead. That way, when updates are made back to the personal data store, they can still refer to a particular person, via the pseudonymous ID, but the application that extracted the data in the first place doesn’t get to know who the data was about. This is useful in scenarios where data has to be (temporarily or not) stored in a database that lies outside the personal data store.
Authentication – if the company offers user authentication, this can be done via the personal data store. Passwords, two-factor authentication secrets and other means of authentication are personal data, and an important one as well. An organization may use a single-sign-on internally (e.g. Active Directory), but it doesn’t make sense to put customers there, too, so they are usually stored in a database. During authentication, the personal data store accepts all necessary credentials (username, password, 2FA code), and returns a token to be used for subsequent calls or to be used as a session cookie token.
GDPR (or CCPA or similar) functionalities – e.g. export of all data about a person, forgetting a person. That’s an often overlooked problem, but “give me all data about me that you have” is an enormous issue with large companies that have dozens of systems. It’s next to impossible to extract the data in a sensible way from all the systems. Tracking all these requests is itself a requirement, so the personal data store can keep track of them to present to auditors if needed.
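To make the pseudonymization feature from the list above a bit more concrete, here is a toy in-memory sketch. The class and method names are hypothetical, not the API of SentinelDB or any other product. To stay within Python’s standard library, it derives the pseudo-ID with a keyed hash (HMAC-SHA256) and keeps a private reverse lookup inside the store, instead of literally encrypting the identifier with a secret key.

```python
import hashlib
import hmac
import secrets


class PersonalDataStore:
    """Toy in-memory store that hands out pseudo-IDs instead of real IDs."""

    def __init__(self, secret_key: bytes):
        self._key = secret_key
        self._records = {}       # real user ID -> personal data
        self._pseudo_to_id = {}  # pseudo-ID -> real user ID (stays in the store)

    def _pseudonym(self, user_id: str) -> str:
        # Keyed hash: without the secret key, the pseudo-ID can't be recomputed.
        digest = hmac.new(self._key, user_id.encode("utf-8"), hashlib.sha256)
        return digest.hexdigest()

    def add(self, user_id: str, data: dict) -> None:
        self._records[user_id] = dict(data)
        self._pseudo_to_id[self._pseudonym(user_id)] = user_id

    def export_pseudonymized(self, fields: list) -> dict:
        """Return only the requested fields, keyed by pseudo-ID."""
        return {
            self._pseudonym(uid): {f: rec.get(f) for f in fields}
            for uid, rec in self._records.items()
        }

    def update_by_pseudonym(self, pseudo_id: str, updates: dict) -> None:
        """Apply updates that a consumer sends back using the pseudo-ID."""
        self._records[self._pseudo_to_id[pseudo_id]].update(updates)

    def get(self, user_id: str) -> dict:
        return dict(self._records[user_id])


store = PersonalDataStore(secrets.token_bytes(32))
store.add("user-1", {"email": "alice@example.com", "age": 34, "city": "Sofia"})

# A marketing system sees only age and city under a pseudo-ID...
exported = store.export_pseudonymized(["age", "city"])
pseudo_id, attrs = next(iter(exported.items()))

# ...and sends a market segment back without learning who the person is.
store.update_by_pseudonym(pseudo_id, {"segment": "frequent-buyer"})
print(store.get("user-1")["segment"])
```

In this sketch the caller chooses which fields to export, so it is still up to the store’s access rules to make sure that directly identifying fields (like the email) are never part of a pseudonymized export.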
That’s all easier said than done. In organizations that already have many systems working side by side and processing personal data, migration can be costly. So it’s a good idea to introduce it as early as possible, and have a plan (even if it lasts for years) to move at least sensitive personal data to the well protected silo. This silo is a data engineering effort, a system refactoring effort and an organizational effort. The benefits, though, are reduced long-term cost and reduced risks for data breaches and non-compliance.
You can use the AWS Key Management Service (KMS) custom key store feature to gain more control over your KMS keys. The KMS custom key store integrates KMS with AWS CloudHSM to help satisfy compliance obligations that would otherwise require the use of on-premises hardware security modules (HSMs) while providing the AWS service integrations of KMS. However, the additional control comes with increased cost and potential impact on performance and availability. This post will help you decide if this feature is the best approach for you.
KMS is a fully managed service that generates encryption keys and helps you manage their use across more than 45 AWS services. It also supports the AWS Encryption SDK and other client-side encryption tools, and you can integrate it into your own applications. KMS is designed to meet the requirements of the vast majority of AWS customers. However, there are situations where customers need to manage their keys in single-tenant HSMs that they exclusively control. Previously, KMS did not meet these requirements since it offered only the ability to store keys in shared HSMs that are managed by KMS.
AWS CloudHSM is a service that’s primarily intended to support customer-managed applications that are specifically designed to use HSMs. It provides direct control over HSM resources, but the service isn’t, by itself, widely integrated with other AWS managed services. Before custom key store, this meant that if you required direct control of your HSMs but still wanted to use and store regulated data in AWS managed services, you had to choose between changing those requirements, not using a given AWS service, or building your own solution. KMS custom key store gives you another option.
How does a custom key store work?
With custom key store, you can configure your own CloudHSM cluster and authorize KMS to use it as a dedicated key store for your keys rather than the default KMS key store. Then, when you create keys in KMS, you can choose to generate the key material in your CloudHSM cluster. Your KMS customer master keys (CMKs) never leave the CloudHSM instances, and all KMS operations that use those keys are only performed in your HSMs. In all other respects, the master keys stored in your custom key store are used in a way that is consistent with other KMS CMKs.
This diagram illustrates the primary components of the service and shows how a cluster of two CloudHSM instances is connected to KMS to create a customer controlled key store.
Figure 1: A cluster of two CloudHSM instances is connected to KMS to create a customer controlled key store
Because you control your CloudHSM cluster, you can take direct action to manage certain aspects of the lifecycle of your keys, independently of KMS. Specifically, you can verify that KMS correctly created keys in your HSMs and you can delete key material and restore keys from backup at any time. You can also choose to connect and disconnect the CloudHSM cluster from KMS, effectively isolating your keys from KMS. However, with more control comes more responsibility. It’s important that you understand the availability and durability impact of using this feature, and I discuss the issues in the next section.
KMS customers who plan to use a custom key store tell us they expect to use the feature selectively, deciding on a key-by-key basis where to store them. To help you decide if and how you might use the new feature, here are some important issues to consider.
Here are some reasons you might want to store a key in a custom key store:
You have keys that are required to be protected in a single-tenant HSM or in an HSM over which you have direct control.
You have keys that are explicitly required to be stored in an HSM validated at FIPS 140-2 level 3 overall (the HSMs used in the default KMS key store are validated to level 2 overall, with level 3 in several categories, including physical security).
You have keys that are required to be auditable independently of KMS.
And here are some considerations that might influence your decision to use a custom key store:
Cost — Each custom key store requires that your CloudHSM cluster contains at least two HSMs. CloudHSM charges vary by region, but you should expect costs of at least $1,000 per month, per HSM, if each device is permanently provisioned. This cost occurs regardless of whether you make any requests of the KMS API directly or indirectly through an AWS service.
Performance — The number of HSMs determines the rate at which keys can be used. It’s important that you understand the intended usage patterns for your keys and ensure that you have provisioned your HSM resources appropriately.
Availability — The number of HSMs and the use of availability zones (AZs) impacts the availability of your cluster and, therefore, your keys. The risk of your configuration errors that result in a custom key store being disconnected, or key material being deleted and unrecoverable, must be understood and assessed.
Operations — By using the custom key store feature, you will perform certain tasks that are normally handled by KMS. You will need to set up HSM clusters, configure HSM users, and potentially restore HSMs from backup. These are security-sensitive tasks for which you should have the appropriate resources and organizational controls in place to perform.
Here’s a basic rundown of the steps that you’ll take to create your first key in a custom key store within a given region.
Create a CMK in KMS in the usual way except now select CloudHSM as the source of your key material. You’ll define administrators, users, and policies for the key as you would for any other CMK.
Use the key via the existing KMS APIs, AWS CLI, or the AWS Encryption SDK. Requests to use the key don’t need to be context-aware of whether the key is stored in a custom key store or the default KMS key store.
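As a hedged sketch of the key creation step above, the snippet below prepares a KMS CreateKey request that asks for the key material to be generated in a custom key store. The Origin and CustomKeyStoreId parameters belong to the CreateKey API. The helper names, the key store ID and the FakeKmsClient stand-in (which lets the sketch run without AWS credentials) are assumptions for illustration only.

```python
def build_create_key_request(custom_key_store_id: str, description: str) -> dict:
    """Build CreateKey parameters for a key backed by a custom key store."""
    return {
        "Origin": "AWS_CLOUDHSM",  # generate key material in your CloudHSM cluster
        "CustomKeyStoreId": custom_key_store_id,
        "Description": description,
    }


def create_key_in_custom_store(kms_client, custom_key_store_id, description):
    """Create the key through any client exposing a boto3-style create_key()."""
    request = build_create_key_request(custom_key_store_id, description)
    response = kms_client.create_key(**request)
    return response["KeyMetadata"]["KeyId"]


class FakeKmsClient:
    """Stand-in for a real KMS client so the sketch can be dry-run locally."""

    def __init__(self):
        self.last_request = None

    def create_key(self, **kwargs):
        self.last_request = kwargs
        return {"KeyMetadata": {"KeyId": "fake-key-id", **kwargs}}


# Hypothetical key store ID; use the ID of your own connected custom key store.
fake = FakeKmsClient()
key_id = create_key_in_custom_store(fake, "cks-1234567890abcdef0", "demo key")
print(key_id, fake.last_request["Origin"])
```

With real credentials you would pass a client obtained from boto3.client("kms") instead of the fake one. Once the key exists, callers use it through the same KMS APIs as any other key.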
Some customers need specific controls in place before they can use KMS to manage encryption keys in AWS. The new KMS custom key store feature is intended to satisfy that requirement. You can now apply the controls provided by CloudHSM to keys managed in KMS, without changing access control policies or service integration.
However, by using the new feature, you take responsibility for certain operational aspects that would otherwise be handled by KMS. It’s important that you have the appropriate controls in place and understand the performance and availability requirements of each key that you create in a custom key store.
If you’ve been prevented from migrating sensitive data to AWS because of specific key management requirements that are currently not met by KMS, consider using the new KMS custom key store feature.
A new webpage focused on data privacy in Argentina features FAQs, helpful links, and whitepapers that provide an overview of PDPL considerations, as well as our security assurance frameworks and international certifications, including ISO 27001, ISO 27017, and ISO 27018. You’ll also find details about our Information Request Report and the high bar of security at AWS data centers.
Additionally, we’ve released a new workbook that offers a detailed mapping as to how customers can operate securely under the Shared Responsibility Model while also aligning with Disposition No. 11/2006. The AWS Disposition 11/2006 Workbook can be downloaded from the Argentina Data Privacy page or directly from this link. Both resources are also available in Spanish from the Privacidad de los datos en Argentina page.
Today, we’re happy to announce that the AWS GDPR Data Processing Addendum (GDPR DPA) is now part of our online Service Terms. This means all AWS customers globally can rely on the terms of the AWS GDPR DPA which will apply automatically from May 25, 2018, whenever they use AWS services to process personal data under the GDPR. The AWS GDPR DPA also includes EU Model Clauses, which were approved by the European Union (EU) data protection authorities, known as the Article 29 Working Party. This means that AWS customers wishing to transfer personal data from the European Economic Area (EEA) to other countries can do so with the knowledge that their personal data on AWS will be given the same high level of protection it receives in the EEA.
As we approach the GDPR enforcement date this week, this announcement is an important GDPR compliance component for us, our customers, and our partners. All customers that are using cloud services to process personal data will need to have a data processing agreement in place between them and their cloud services provider if they are to comply with GDPR. As early as April 2017, AWS announced that it had a GDPR-ready DPA available for its customers. In this way, we started offering our GDPR DPA to customers over a year before the May 25, 2018 enforcement date. Now, with the DPA terms included in our online service terms, there is no extra engagement needed by our customers and partners to be compliant with the GDPR requirement for data processing terms.
The AWS GDPR DPA also provides our customers with a number of other important assurances, such as the following:
AWS will process customer data only in accordance with customer instructions.
AWS has implemented and will maintain robust technical and organizational measures for the AWS network.
AWS will notify its customers of a security incident without undue delay after becoming aware of the security incident.
Customers who have already signed an offline version of the AWS GDPR DPA can continue to rely on that GDPR DPA. By incorporating our GDPR DPA into the AWS Service Terms, we are simply extending the terms of our GDPR DPA to all customers globally who will require it under GDPR.
AWS GDPR DPA is only part of the story, however. We are continuing to work alongside our customers and partners to help them on their journey towards GDPR compliance.
The EU’s General Data Protection Regulation (GDPR) describes data processor and data controller roles, and some customers and AWS Partner Network (APN) partners are asking how this affects the long-established AWS Shared Responsibility Model. I wanted to take some time to help folks understand shared responsibilities for us and for our customers in context of the GDPR.
How does the AWS Shared Responsibility Model change under GDPR? The short answer – it doesn’t. AWS is responsible for securing the underlying infrastructure that supports the cloud and the services provided; while customers and APN partners, acting either as data controllers or data processors, are responsible for any personal data they put in the cloud. The shared responsibility model illustrates the various responsibilities of AWS and our customers and APN partners, and the same separation of responsibility applies under the GDPR.
AWS responsibilities as a data processor
The GDPR does introduce specific regulation and responsibilities regarding data controllers and processors. When any AWS customer uses our services to process personal data, the controller is usually the AWS customer (and sometimes it is the AWS customer’s customer). However, in all of these cases, AWS is always the data processor in relation to this activity. This is because the customer is directing the processing of data through its interaction with the AWS service controls, and AWS is only executing customer directions. As a data processor, AWS is responsible for protecting the global infrastructure that runs all of our services. Controllers using AWS maintain control over data hosted on this infrastructure, including the security configuration controls for handling end-user content and personal data. Protecting this infrastructure is our number one priority, and we invest heavily in third-party auditors to test our security controls and make any issues they find available to our customer base through AWS Artifact. Our ISO 27018 report is a good example, as it tests security controls that focus on protection of personal data in particular.
AWS has an increased responsibility for our managed services. Examples of managed services include Amazon DynamoDB, Amazon RDS, Amazon Redshift, Amazon Elastic MapReduce, and Amazon WorkSpaces. These services provide the scalability and flexibility of cloud-based resources with less operational overhead because we handle basic security tasks like guest operating system (OS) and database patching, firewall configuration, and disaster recovery. For most managed services, you only configure logical access controls and protect account credentials, while maintaining control and responsibility of any personal data.
Customer and APN partner responsibilities as data controllers — and how AWS Services can help
Our customers can act as data controllers or data processors within their AWS environment. As a data controller, the services you use may determine how you configure those services to help meet your GDPR compliance needs. For example, AWS Services that are classified as Infrastructure as a Service (IaaS), such as Amazon EC2, Amazon VPC, and Amazon S3, are under your control and require you to perform all routine security configuration and management that would be necessary no matter where the servers were located. With Amazon EC2 instances, you are responsible for managing: guest OS (including updates and security patches), application software or utilities installed on the instances, and the configuration of the AWS-provided firewall (called a security group).
To help you realize data protection by design principles under the GDPR when using our infrastructure, we recommend you protect AWS account credentials and set up individual user accounts with AWS Identity and Access Management (IAM) so that each user is only given the permissions necessary to fulfill their job duties. We also recommend using multi-factor authentication (MFA) with each account, requiring the use of SSL/TLS to communicate with AWS resources, setting up API/user activity logging with AWS CloudTrail, and using AWS encryption solutions, along with all default security controls within AWS Services. You can also use advanced managed security services, such as Amazon Macie, which assists in discovering and securing personal data stored in Amazon S3.
For more information, you can download the AWS Security Best Practices whitepaper or visit the AWS Security Resources or GDPR Center webpages. In addition to our solutions and services, AWS APN partners can provide hundreds of tools and features to help you meet your security objectives, ranging from network security and configuration management to access control and data encryption.
GDPR, or the new General Data Protection Regulation, is a hot topic, as it comes into force on May 25. And of course, the public space is full of opinions and conclusions on the matter. Unfortunately, most of them are wrong. Based on my observations over the past few months, I decided to pick out 7 myths about the Regulation.
Since the end of last year I have been actively consulting small and large companies on the Regulation, running trainings and seminars, and writing technical explanations. And no, I’m not a lawyer, but the Regulation requires knowledge of both the legal and the technological aspects of data protection.
1. “GDPR is clear to me, I’ve understood it”
The most dangerous thing is to think you understand something after only having heard about it or having read two articles on a news site (both for GDPR and in a more general sense). I myself still don’t claim to know every corner of the Regulation. But at conferences, round tables, trainings, meetings, forums and Facebook groups I have heard and read far too much nonsense about GDPR. And it is the kind of nonsense that can be refuted with “That’s not true, see Art. X”. Unfortunately, this category includes lawyers, IT specialists, and people in management positions alike.
All the other myths stem from the myth that we know GDPR. Part of the blame for this lies with the Regulation itself. It is long, it is hard to read, it contains bad legislative practices (3 different hypotheses in one sentence??), and neither the European Commission nor any other European institution has taken the trouble to explain it to the people it applies to, namely almost everyone. The so-called “Article 29 Working Party (from the previous Directive)” has guidance on some issues, but it is just as long and hard to read if you lack context. With legislation of such broad scope, it is a big mistake to leave it unexplained. Yes, it has many nuances and many conditions (which is another of its drawbacks), but at least the general provisions ought to be explained clearly, and from a practical point of view.
So no, let’s not think that we have understood GDPR.
2. “Personal data is secret”
The definition of personal data in the Regulation perhaps characterizes the whole Regulation: hard to read and convoluted:
‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;
In fact, personal data is everything that relates to us. That includes quite obvious things like eye and hair color, height, and so on. And no, personal data is not secret. Our names are not secret, our height is not secret. Our EGN (the Bulgarian national identification number) is not secret (yes, it isn’t). There are special categories of personal data that can be secret (e.g. medical data), but they are handled under a special regime.
There is a distinction that GDPR does not make clearly, unlike one NIST guidance document: some personal data can be used to identify people, and some cannot but still relates to them. We cannot be identified by hair color. But hair color is personal data. We cannot be identified by profession. (By full name and profession, though, we possibly can.) And here is a very important point, stated in the last sentences of Recital 26: data that is personal but cannot be attributed to a specific person, and on the basis of which no such person can be identified, falls outside the scope of the Regulation. And it is not secret at all: “we have 120 customers aged 32 who bought a Sony phone between April and July” is perfectly fine.
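To make the aggregate example above concrete, here is a minimal sketch of turning identifiable purchase records into counts that carry no names. Everything in it is an assumption for illustration: the record fields, the sample data, the function name `aggregate_purchases`, and the `min_count` threshold are hypothetical and do not come from the Regulation or from any particular tool. Whether a given aggregate is truly non-identifying depends on context; very small groups can still point to a single person, which is why the sketch allows suppressing them.

```python
from collections import Counter
from datetime import date

# Hypothetical purchase records; names and fields are made up for illustration.
purchases = [
    {"name": "Customer A", "age": 32, "brand": "Sony", "bought_on": date(2018, 4, 12)},
    {"name": "Customer B", "age": 32, "brand": "Sony", "bought_on": date(2018, 6, 3)},
    {"name": "Customer C", "age": 41, "brand": "Sony", "bought_on": date(2018, 5, 20)},
    {"name": "Customer D", "age": 32, "brand": "Nokia", "bought_on": date(2018, 5, 2)},
    {"name": "Customer E", "age": 32, "brand": "Sony", "bought_on": date(2018, 9, 1)},
]


def aggregate_purchases(records, start, end, min_count=1):
    """Count purchases per (age, brand) made between start and end, inclusive.

    Names are never copied into the result. Groups smaller than min_count
    are dropped, since a very small group may still single out one person.
    """
    counts = Counter()
    for record in records:
        if start <= record["bought_on"] <= end:
            counts[(record["age"], record["brand"])] += 1
    return {group: n for group, n in counts.items() if n >= min_count}


# Counts for April through July, with and without small-group suppression.
report = aggregate_purchases(purchases, date(2018, 4, 1), date(2018, 7, 31))
suppressed = aggregate_purchases(purchases, date(2018, 4, 1), date(2018, 7, 31), min_count=2)
```

The `report` dictionary holds only group sizes keyed by age and brand, which is the shape of the “120 customers aged 32” statement. The `min_count` cut-off is a judgment call for this sketch, not a rule stated in the text.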
So, personal data is not secret; some of it is even plainly visible. The goal of GDPR is to regulate its processing by automated means (or semi-automated ones in structured form, i.e. notebooks). In other words: who has the right to store it, what they are allowed to use it for, and how they must store and use it.
3. “GDPR doesn’t apply to me”
There are almost no exceptions in the Regulation. Companies with fewer than 250 people are not required to keep certain records, and companies that do not carry out large-scale processing and monitoring of data subjects are not obliged to have a data protection officer (DPO; this point is debatable in light of the proposed amendments to the Bulgarian Personal Data Protection Act, which expand the DPO requirements far too much). Everything else applies to everyone who processes personal data. And all EU citizens have all the rights set out in the Regulation.
4. “We’ll be fined 20 million euros”
These fines are the only reason GDPR is popular. Without them, nobody would care about yet another piece of European legislation. But because of the scary fines, all sorts of consultants go around explaining how “well, the fines, you know, are up to 20 million”.
But no matter how often these 20 million are repeated (or, as some over-salt the dish, “fines over 20 million euros”), that doesn’t make them realistic. First, there is a process that all regulators will follow, which includes several steps of “recommendations” before a fine is imposed. The commission comes, finds a non-compliance, makes recommendations, comes back, and checks whether measures have been taken. And if you act in complete bad faith and do nothing, then the fines come. And these fines are proportional to the risk and to the amount of data. It’s not “good day, 20 million”. In my view, the 20 million will be reserved for huge international companies like Google and Facebook that process data on millions of people. There will be no fine for the shopkeeper’s notebook of customer tabs (the right to be forgotten is exercised by crossing the entry out, but only if the shopkeeper has no legitimate interest in keeping it, namely getting the money back :)).
A side note on Bulgarian legislation: it provides for rather high minimum fines (10 thousand leva). This is being contested in the public consultation, it is disproportionate to the minimums in other European countries, and I hope it will drop significantly.
5. “We have to stop processing personal data”
Absolutely not. GDPR does not prohibit the processing of personal data; it simply regulates how and when it is processed. You have the right to process all the data you need to do your job.
Some internet companies recently announced that they are shutting down because of GDPR, since it supposedly does not allow them to process data. In the general case this is nonsense. Either they were operating at a loss anyway and are now looking for an excuse, or they were such a free-for-all, selling your data left and right without your knowledge or consent, that GDPR poses a risk to them. But that is the whole idea: that there be no such practices. Because (as the Regulation states) they pose a risk to the rights and freedoms of data subjects (“data subject”: that has a proud ring to it).
6. “We have to ask for consent for everything”
User consent is only one of the legal bases for processing data. There are quite a few others, and they are even more common in real business. As I noted above, if you can demonstrate a legitimate interest in processing the data in order to do your job, you can do so without consent. Are you allowed to collect the customer’s address and phone number if you deliver food? Of course, otherwise you can’t deliver it. No consent is needed in this case (consent would be needed if you also used the data for purposes other than the delivery). Is consent needed to process personal data within an employment relationship? No, because the Labor Code requires the employer to keep an employment file. Does the bank need to ask for consent to process your personal data for a loan? No, because the data is needed to perform the loan contract (and no, you can’t tell the bank to “forget” your loan; the right to be forgotten applies only in some cases).
My feeling, however, is that we will be flooded with consent declarations and checkboxes that are completely unnecessary... but see point 1. And even when they do need to be there, they will be too general rather than tied to specific purposes (I agree to you processing my data, but for what exactly?).
7. “GDPR compliance is hard and expensive”
...and accordingly the Regulation is a heavy administrative burden, a needless load on business, and so on. Well no, it isn’t. GDPR compliance requires conscious processing of personal data. Yes, it also requires some paperwork: policies and procedures with which to prove that you know what personal data you process and that you process it conscientiously, and that you know citizens have certain rights regarding their data (and that in fact they, not you, own that data). Beyond that, compliance is not burdensome. Of course, if you have no clue what data and business processes you have, it may take time to put them in order, but that is something that should happen anyway, with or without GDPR.
If, for example, a hospital has until now kept patient data on a completely unprotected server that everyone could access without leaving a trace, and there were also 3-4 more servers that nobody knew held data (because “the IT guy” left 2 years ago), then yes, some effort will be needed.
But almost everything in GDPR is “good practice” anyway. These are things that are useful for the business itself, not just for citizens.
Of course, the “holier than the Pope” syndrome is starting to show. Besides the companies that have poured millions into lawyers, consultants, and vendors (with a dismal result in the end, when it turned out that a few people could do the whole job in a month), there are also those that read the Regulation as “better not give any data to anyone, just in case”. Over-caution by big companies such as Twitter and Facebook, for example, risks “hitting” companies that depend on their data. But again, see point 1.
In conclusion, GDPR is not something scary, not something bad, and not “an invention of the bureaucrats in Brussels”. Its clarity leaves much to be desired, and I expect its application will too, but “in principle” it is okay.
And as always happens with legislation covering many people and businesses, in the beginning there will be not just 7 but 77 myths, which will be cleared up with time and practice. There will be growing pains, and there is a risk (especially in smaller and corrupt countries) that someone will “take the hit”, but looking at the big picture, I believe that with this Regulation we will be better off in 5 years in terms of data protection and the consequences of lacking it.
Security is our top priority at AWS, and from the beginning we have built security into the fabric of our services. With the introduction of GDPR (which becomes enforceable on May 25 of 2018), privacy and data protection have become even more ingrained into our security-centered culture. Three weeks ago, well ahead of the deadline, we announced that all AWS services are compliant with GDPR, meaning you can use AWS as a data processor as a way to help solve your GDPR challenges (be sure to visit our GDPR Center for additional information).
When it comes to GDPR compliance, many customers are progressing nicely and much of the initial trepidation is gone. In my interactions with customers on this topic, a few themes have emerged as universal:
GDPR is important. You need to have a plan in place if you process personal data of EU data subjects, not only because it’s good governance, but because GDPR does carry significant penalties for non-compliance.
Solving this can be complex, potentially involving a lot of personnel and multiple tools. Your GDPR process will also likely span across disciplines – impacting people, processes, and technology.
Each customer is unique, and there are many methodologies around assessing your compliance with GDPR. It’s important to be aware of your own individual business attributes.
I thought it might be helpful to share some of our own lessons learned. In our experience in solving the GDPR challenge, the following were keys to our success:
Get your senior leadership involved. We have a regular cadence of detailed status conversations about GDPR with our CEO, Andy Jassy. GDPR is high stakes, and the AWS leadership team knows it. If GDPR doesn’t have the attention it needs with the visibility of top management today, it’s time to escalate.
Centralize the GDPR efforts. Driving all work streams centrally is key. This may sound obvious, but managing this in a distributed manner may result in duplicative effort and/or team members moving in a different direction.
The most important single partner in solving GDPR is your legal team. Having non-legal people make assumptions about how to interpret GDPR for your unique environment is both risky and a potential waste of time and resources. You want to avoid analysis paralysis by getting proper legal advice, collaborating on a direction, and then moving forward with the proper urgency.
Collaborate closely with tech leadership. The “process” people in your organization, the ones who already know how to approach governance problems, are typically comfortable jumping right in to GDPR. But technical teams, including data owners, have set up their software for business application. They may not even know what kind of data they are storing, processing, or transferring to other parts of the business. In the GDPR exercise they need to be aware of (or at least help facilitate) the tracking of data and data elements between systems. This isn’t a typical ask for technical teams, so be prepared to educate and to fully understand data flow.
Don’t live by the established checklists. There are multiple methodologies to solving the compliance challenges of GDPR. At AWS, we ended up establishing core requirements, mapped out by data controller and data processor functions and then, in partnership with legal, decided upon a group of projects based on our known current state. Be careful about using a set methodology, tool or questionnaire to govern your efforts. These generic assessments can help educate, but letting them drive or limit your work could lead to missing something that is key to your own compliance. In this sense, a generic, “one size fits all” solution might not be helpful.
Don’t be afraid to challenge prior orthodoxy. Many times we changed course based on new information. You shouldn’t be afraid to scrap an effort if you determine it’s not working. You should also not be afraid to escalate issues to senior leadership when needed. This is an executive issue.
Look for ways to leverage your work beyond this compliance activity. GDPR requires serious effort, but are the results limited to GDPR compliance? Certainly not. You can use GDPR workflows as a way to ensure better governance moving forward. Privacy and security will require work for the foreseeable future, so make your governance program scalable and usable for other purposes.
One last tip that has made all the difference: think about protecting data subjects and work backwards from there. Customer focus drives us to ask, “what would customers and data subjects want and expect us to do?” Taking GDPR from a pure legal or compliance standpoint may be technically sufficient, but we believe the objectives of security and personal data protection require a more comprehensive view, and you can most effectively shape that view by starting with the individuals GDPR was meant to protect.
If you would like to find out more about our experiences, as well as how we can help you in your efforts, please reach out to us today.
Vice President, AWS Security Assurance
Good news for cloud security experts: following our most popular beta exam ever, the AWS Certified Security – Specialty exam is here. This new exam allows experienced cloud security professionals to demonstrate and validate their knowledge of how to secure the AWS platform.
About the exam
The security exam covers incident response, logging and monitoring, infrastructure security, identity and access management, and data protection. The exam is open to anyone who currently holds a Cloud Practitioner or Associate-level certification. We recommend candidates have five years of IT security experience designing and implementing security solutions, and at least two years of hands-on experience securing AWS workloads.
The exam validates:
An understanding of specialized data classifications and AWS data protection mechanisms.
An understanding of data encryption methods and AWS mechanisms to implement them.
An understanding of secure Internet protocols and AWS mechanisms to implement them.
A working knowledge of AWS security services and features of services to provide a secure production environment.
Competency gained from two or more years of production deployment experience using AWS security services and features.
Ability to make trade-off decisions with regard to cost, security, and deployment complexity given a set of application requirements.
An understanding of security operations and risk.
How to prepare
We have training and other resources to help you prepare for the exam.
In the wake of the Cambridge Analytica scandal, news articles and commentators have focused on what Facebook knows about us. A lot, it turns out. It collects data from our posts, our likes, our photos, things we type and delete without posting, and things we do while not on Facebook and even when we’re offline. It buys data about us from others. And it can infer even more: our sexual orientation, political beliefs, relationship status, drug use, and other personality traits — even if we didn’t take the personality test that Cambridge Analytica developed.
But for every article about Facebook’s creepy stalker behavior, thousands of other companies are breathing a collective sigh of relief that it’s Facebook and not them in the spotlight. Because while Facebook is one of the biggest players in this space, there are thousands of other companies that spy on and manipulate us for profit.
Harvard Business School professor Shoshana Zuboff calls it “surveillance capitalism.” And as creepy as Facebook is turning out to be, the entire industry is far creepier. It has existed in secret far too long, and it’s up to lawmakers to force these companies into the public spotlight, where we can all decide if this is how we want society to operate and — if not — what to do about it.
There are 2,500 to 4,000 data brokers in the United States whose business is buying and selling our personal data. Last year, Equifax was in the news when hackers stole personal information on 150 million people, including Social Security numbers, birth dates, addresses, and driver’s license numbers.
You certainly didn’t give it permission to collect any of that information. Equifax is one of those thousands of data brokers, most of them you’ve never heard of, selling your personal information without your knowledge or consent to pretty much anyone who will pay for it.
Surveillance capitalism takes this one step further. Companies like Facebook and Google offer you free services in exchange for your data. Google’s surveillance isn’t in the news, but it’s startlingly intimate. We never lie to our search engines. Our interests and curiosities, hopes and fears, desires and sexual proclivities, are all collected and saved. Add to that the websites we visit that Google tracks through its advertising network, our Gmail accounts, our movements via Google Maps, and what it can collect from our smartphones.
That phone is probably the most intimate surveillance device ever invented. It tracks our location continuously, so it knows where we live, where we work, and where we spend our time. It’s the first and last thing we check in a day, so it knows when we wake up and when we go to sleep. We all have one, so it knows who we sleep with. Uber used just some of that information to detect one-night stands; your smartphone provider and any app you allow to collect location data knows a lot more.
Surveillance capitalism drives much of the internet. It’s behind most of the “free” services, and many of the paid ones as well. Its goal is psychological manipulation, in the form of personalized advertising to persuade you to buy something or do something, like vote for a candidate. And while the individualized profile-driven manipulation exposed by Cambridge Analytica feels abhorrent, it’s really no different from what every company wants in the end. This is why all your personal information is collected, and this is why it is so valuable. Companies that can understand it can use it against you.
None of this is new. The media has been reporting on surveillance capitalism for years. In 2015, I wrote a book about it. Back in 2010, the Wall Street Journal published an award-winning two-year series about how people are tracked both online and offline, titled “What They Know.”
Surveillance capitalism is deeply embedded in our increasingly computerized society, and if the extent of it came to light there would be broad demands for limits and regulation. But because this industry can largely operate in secret, only occasionally exposed after a data breach or investigative report, we remain mostly ignorant of its reach.
This might change soon. In 2016, the European Union passed the comprehensive General Data Protection Regulation, or GDPR. The details of the law are far too complex to explain here, but some of the things it mandates are that personal data of EU citizens can only be collected and saved for “specific, explicit, and legitimate purposes,” and only with explicit consent of the user. Consent can’t be buried in the terms and conditions, nor can it be assumed unless the user opts in. This law will take effect in May, and companies worldwide are bracing for its enforcement.
Because pretty much all surveillance capitalism companies collect data on Europeans, this will expose the industry like nothing else. Here’s just one example. In preparation for this law, PayPal quietly published a list of over 600 companies it might share your personal data with. What will it be like when every company has to publish this sort of information, and explicitly explain how it’s using your personal data? We’re about to find out.
In the wake of this scandal, even Mark Zuckerberg said that his industry probably should be regulated, although he’s certainly not wishing for the sorts of comprehensive regulation the GDPR is bringing to Europe.
He’s right. Surveillance capitalism has operated without constraints for far too long. And advances in both big data analysis and artificial intelligence will make tomorrow’s applications far creepier than today’s. Regulation is the only answer.
The first step to any regulation is transparency. Who has our data? Is it accurate? What are they doing with it? Who are they selling it to? How are they securing it? Can we delete it? I don’t see any hope of Congress passing a GDPR-like data protection law anytime soon, but it’s not too far-fetched to demand laws requiring these companies to be more transparent in what they’re doing.
One of the responses to the Cambridge Analytica scandal is that people are deleting their Facebook accounts. It’s hard to do right, and doesn’t do anything about the data that Facebook collects about people who don’t use Facebook. But it’s a start. The market can put pressure on these companies to reduce their spying on us, but it can only do that if we force the industry out of its secret shadows.