Tag Archives: GDPR

Dan Solove on Privacy Regulation

2024-04-24 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/04/dan-solove-on-privacy-regulation.html

Law professor Dan Solove has a new article on privacy regulation. In his email to me, he writes: “I’ve been pondering privacy consent for more than a decade, and I think I finally made a breakthrough with this article.” His mini-abstract:

In this Article I argue that most of the time, privacy consent is fictitious. Instead of futile efforts to try to turn privacy consent from fiction to fact, the better approach is to lean into the fictions. The law can’t stop privacy consent from being a fairy tale, but the law can ensure that the story ends well. I argue that privacy consent should confer less legitimacy and power and that it be backstopped by a set of duties on organizations that process personal data based on consent.

Full abstract:

Consent plays a profound role in nearly all privacy laws. As Professor Heidi Hurd aptly said, consent works “moral magic”—it transforms things that would be illegal and immoral into lawful and legitimate activities. As to privacy, consent authorizes and legitimizes a wide range of data collection and processing.

There are generally two approaches to consent in privacy law. In the United States, the notice-and-choice approach predominates; organizations post a notice of their privacy practices and people are deemed to consent if they continue to do business with the organization or fail to opt out. In the European Union, the General Data Protection Regulation (GDPR) uses the express consent approach, where people must voluntarily and affirmatively consent.

Both approaches fail. The evidence of actual consent is non-existent under the notice-and-choice approach. Individuals are often pressured or manipulated, undermining the validity of their consent. The express consent approach also suffers from these problems people are ill-equipped to decide about their privacy, and even experts cannot fully understand what algorithms will do with personal data. Express consent also is highly impractical; it inundates individuals with consent requests from thousands of organizations. Express consent cannot scale.

In this Article, I contend that most of the time, privacy consent is fictitious. Privacy law should take a new approach to consent that I call “murky consent.” Traditionally, consent has been binary—an on/off switch—but murky consent exists in the shadowy middle ground between full consent and no consent. Murky consent embraces the fact that consent in privacy is largely a set of fictions and is at best highly dubious.

Because it conceptualizes consent as mostly fictional, murky consent recognizes its lack of legitimacy. To return to Hurd’s analogy, murky consent is consent without magic. Rather than provide extensive legitimacy and power, murky consent should authorize only a very restricted and weak license to use data. Murky consent should be subject to extensive regulatory oversight with an ever-present risk that it could be deemed invalid. Murky consent should rest on shaky ground. Because the law pretends people are consenting, the law’s goal should be to ensure that what people are consenting to is good. Doing so promotes the integrity of the fictions of consent. I propose four duties to achieve this end: (1) duty to obtain consent appropriately; (2) duty to avoid thwarting reasonable expectations; (3) duty of loyalty; and (4) duty to avoid unreasonable risk. The law can’t make the tale of privacy consent less fictional, but with these duties, the law can ensure the story ends well.

NIS2, DORA, CER and GDPR: A Comparative Overview of Crucial EU Compliance Directives and Regulations

2024-03-26 Editor

Post Syndicated from Editor original https://nebosystems.eu/comparative-guide-dora-gdpr-nis2-cer/

In the evolving regulatory landscape, organizations operating within the EU must navigate through a complex web of regulations and directives, including NIS2 (Network & Information System) Directive, CER (Critical Entities Resilience) Directive, DORA (Digital Operational Resilience Act) and GDPR (General Data Protection Regulation). Each of these frameworks has a distinct focus, from enhancing cybersecurity and operational resilience to protecting personal data and ensuring the resilience of critical entities.

This guide outlines the essential aspects of DORA (EU) 2022/2554, GDPR (EU) 2016/679, NIS2 (EU) 2022/2555 directive and the CER/RCE (EU) 2022/2557 directive, including their scope, objectives, key requirements, sanctions for non-compliance, implementation deadlines, technical and organizational measures, key differences and compliance intersections.

Scope and Applicability

NIS2 (Network & Information System) Directive : Applies to essential and important entities across various sectors expanding the scope of its predecessor, the NIS Directive.
Essential entities include sectors such as energy (including electricity, oil, and gas), transport (air, rail, water and road), banking, financial market infrastructures, health care, drinking water, wastewater, and digital infrastructure. Essential entities are those whose disruption would cause significant impacts on public safety, security, or economic or societal activities.
Important Entities covers postal and courier services, waste management, manufacture, production and distribution of chemicals, food production, distribution and sale, manufacturing of medical devices, computers and electronics, machinery equipment, motor vehicles, digital providers such as online marketplaces, online search engines, and social networking services platforms, and certain entities within the public administration sector.
DORA (Digital Operational Resilience Act): Specifically focuses on the resilience of the financial sector to ICT risks, encompassing a wide range of entities that play pivotal roles in the financial ecosystem. This includes credit institutions, investment firms, insurance and reinsurance companies, payment institutions, electronic money institutions, crypto-asset service providers, central securities depositories, central counterparties, trading venues, managers of alternative investment funds and management companies of undertakings for collective investment in transferable securities (UCITS). Additionally, it covers ICT third-party service providers to these financial entities, emphasizing the importance of digital operational resilience not just within financial entities themselves but also within their extended digital supply chains.
GDPR (General Data Protection Regulation): Has a global reach, affecting any organization that processes personal data of EU citizens, focusing on data protection and privacy regardless of the sector.
CER (Critical Entities Resilience) Directive: Aims to enhance the resilience of critical entities operating in vital sectors such as energy, transport, banking, financial market infrastructure, health, drinking water, waste water, public administration, space, digital infrastructure, production, processing and distribution of food sector within the EU.

Objectives

NIS2 Directive: Seeks to significantly raise cybersecurity standards and improve incident response capabilities across the EU.
DORA: Ensures that the financial sector can withstand, respond to, and recover from ICT-related disruptions and threats.
GDPR: Protects EU citizens’ personal data, ensuring privacy and giving individuals control over their personal information.
CER Directive: Focuses on enhancing the overall resilience of entities that are critical to the maintenance of vital societal or economic activities against a range of non-cyber and cyber threats.

Key Requirements

NIS2 Directive: Mandates robust risk management measures, timely incident reporting, supply chain security and resilience testing among affected entities.
DORA: Requires financial entities to establish comprehensive ICT risk management frameworks, report significant ICT-related incidents, conduct resilience testing and manage risks related to third-party ICT service providers.
GDPR: Enforces principles such as lawful processing, data minimization and transparency; upholds data subjects’ rights; mandates data breach notifications; and requires data protection measures to be embedded in business processes.
CER Directive: Calls for national risk assessments, enhanced security measures, incident notification, and crisis management for critical entities, ensuring they can maintain essential services under adverse conditions.

Sanctions and Penalties

NIS2 Directive: The directive suggests Member States ensure that penalties for non-compliance are effective, proportionate, and dissuasive, but does not specify amounts, leaving it to individual Member States to set.
DORA: Specific sanctions and penalties are not detailed, implying that penalties would be defined at the Member State level or in subsequent regulatory guidance.
GDPR: Known for its strict penalties, organizations can face fines up to €20 million or 4% of their total global turnover, whichever is higher.
CER Directive: Similar to NIS2, the CER Directive leaves the specifics of sanctions and penalties to Member States, emphasizing the need for them to be effective, proportionate, and dissuasive.

Implementation Deadline Date

NIS2 Directive: Member States are required to transpose and apply the measures of the NIS2 Directive by 18 October 2024 .
DORA: The regulation will become applicable from 17 January 2025, marking the deadline for entities within the financial sector to comply with its requirements .
GDPR: This regulation has been in effect since 25 May 2018, requiring immediate compliance from the effective date.
CER Directive: Similar to NIS2, the CER Directive must be transposed and applied by Member States by 18 October 2024 .

Key Differences

While NIS2, DORA, GDPR and CER directives and regulations share common goals related to security and privacy, they differ significantly in their primary focus and applicability:

NIS2 Directive primarily enhances cybersecurity across various critical sectors, emphasizing sector-specific risk management and incident reporting.
DORA focuses on the financial sector’s digital operational resilience, detailing ICT risk management and third-party risk, specific to financial services.
GDPR is dedicated to personal data protection, granting extensive rights to individuals regarding their data, applicable across all sectors.
CER Directive aims to ensure the resilience of entities vital for societal and economic well-being, focusing on both cyber and physical resilience measures.

Overlapping Areas

Despite their differences, these frameworks overlap in several key areas, allowing for synergistic compliance efforts:

Risk Management: NIS2, DORA and the CER Directive all emphasize robust risk management, albeit with different focal points (cybersecurity, ICT and critical entity resilience, respectively).
Incident Reporting: NIS2 and DORA require incident reporting within their respective domains, which can streamline processes for entities covered by both.
Data Protection Measures: GDPR’s data protection principles can complement the cybersecurity measures under NIS2 and CER, enhancing overall data security.

Incident Response and Recovery

NIS2 Directive: Requires entities to have incident response capabilities in place, ensuring timely detection, analysis, and response to incidents. It emphasizes the need for recovery plans to restore services after an incident.
DORA: Mandates financial entities to establish and implement an incident management process capable of responding swiftly to ICT-related incidents, including recovery objectives, restoration of systems, and lessons learned activities.
GDPR: While not explicitly detailing incident response processes, GDPR mandates notification of personal data breaches to supervisory authorities and, in certain cases, to the affected individuals, highlighting the need for an effective response mechanism.
CER Directive: Stresses the importance of having incident response plans, ensuring critical entities can quickly respond to and recover from disruptive incidents, maintaining essential services.

Technical and Organizational Measures

NIS2 Directive: Entities should incorporate state-of-the-art cybersecurity solutions like advanced threat detection systems, comprehensive data encryption, secure network configurations and regular security assessments to safeguard sensitive information. Additional technical measures might include continuous monitoring and anomaly detection systems to identify suspicious activities in real time, and the implementation of Security Information and Event Management (SIEM) systems and next-generation firewalls (NGFWs). Organizational strategies involve establishing a robust cybersecurity governance framework, conducting frequent cybersecurity awareness training, and formulating clear policies for effective incident response and thorough business continuity planning.
DORA: For compliance with DORA, financial entities are advised to utilize secure communication protocols and robust encryption for protecting data during transmission and storage, supplemented by multi-factor authentication systems to enhance access security. Additional technical measures could involve the deployment of advanced cybersecurity tools like Security Information and Event Management (SIEM) systems for integrated threat analysis and response, and next-generation firewalls (NGFWs). On the organizational front, setting up a dedicated ICT risk management team, clearly defining cybersecurity roles, and embedding cybersecurity risk considerations into the overarching risk management framework are essential.
GDPR: In alignment with GDPR, technical safeguards such as strong data encryption, pseudonymization of personal data where feasible, and stringent access control mechanisms are pivotal. Expanding on these, additional technical measures may include the use of Data Loss Prevention (DLP) tools to prevent unauthorized data disclosure or loss and employing regular penetration testing to identify and rectify vulnerabilities. Organizational measures encompass the implementation of comprehensive data protection policies, conducting DPIAs for high-risk data processing activities, and appointing a Data Protection Officer in specific scenarios to oversee data protection strategies and compliance.
CER Directive: Adhering to the CER Directive involves applying network segmentation to isolate and protect critical systems, utilizing intrusion detection and prevention systems, and ensuring resilient data backup and recovery strategies. Enhancing these measures, technical strategies could also include the deployment of next-generation firewalls (NGFWs) and the use of automated patch management systems to ensure timely application of security updates. Organizational approaches include developing a detailed incident management plan, establishing a dedicated crisis management team, and conducting regular resilience testing and drills to validate and improve recovery processes.

Compliance Intersections and Synergies

While each framework has its unique focus, there are notable intersections, particularly in the areas of risk management, incident reporting, and the overarching emphasis on security and resilience. For instance, the risk management strategies advocated by NIS2 and the CER Directive can complement the ICT risk management framework of DORA. GDPR’s requirement for data protection by design and default can also support the cybersecurity measures outlined in NIS2 and CER, promoting a secure and privacy-focused operational environment. Furthermore, the incident reporting mechanisms mandated by both NIS2 and DORA underscore a shared commitment to transparency and accountability in the face of security incidents, which can drive improvements in organizational responses to breaches, including those involving personal data under GDPR. This alignment not only streamlines compliance processes but also fortifies the organization’s overall security framework, enhancing its ability to protect against and respond to cyber threats and operational disruptions. By recognizing and acting upon these synergies, organizations can more effectively allocate resources, avoid duplicative efforts, and foster a culture of continuous improvement in cybersecurity and data protection practices.

Conclusion

Understanding the nuances and requirements of NIS2, DORA, GDPR, and the CER Directive is crucial for organizations operating within the EU, especially those that fall under the scope of multiple frameworks. By recognizing the overlaps and leveraging synergies between these regulations and directives, organizations can streamline their compliance efforts, enhance their operational resilience and data protection measures, and contribute to a safer, more secure digital and physical environment within the EU. This integrated approach not only ensures regulatory compliance but also builds a strong foundation of trust with customers, stakeholders, and regulatory bodies.

For streamlined compliance with EU directives like NIS2, DORA and GDPR, Nebosystems offers expert services tailored to your needs. Learn more about our cybersecurity solutions or get in touch directly.

References:

NIS2 (Network & Information System) Directive (EU) 2022/2555. EUR-Lex.

General Data Protection Regulation (EU) 2016/679. EUR-Lex.

Digital Operational Resilience Act (EU) 2022/2554. EUR-Lex.

Critical Entities Resilience Directive (EU) 2022/2557. EUR-Lex.

Understanding GDPR: A Definitive Guide on Key Requirements and Compliance

2024-03-24 Editor

Post Syndicated from Editor original https://nebosystems.eu/what-is-gdpr-key-requirements-guide/

In the digital landscape where data breaches and privacy concerns are increasingly prevalent, understanding the General Data Protection Regulation (GDPR) is essential for businesses and individuals alike. Implemented on May 25, 2018, GDPR represents a significant overhaul of data protection laws, setting a new global benchmark for privacy rights, security, and compliance.

The GDPR is a comprehensive data protection law that came into effect in the European Union (EU) but has far-reaching implications for companies worldwide. It represents a significant shift in the way personal data of individuals within these regions is collected, stored, processed, and protected by organizations worldwide. It aims to give individuals more control over their personal data and to unify data protection regulations across the EU, thereby simplifying the regulatory environment for international business

Who is Affected?

The GDPR affects:

Organizations within the EU: All entities operating within the EU, regardless of their size, that process personal data are subject to the GDPR.
Organizations outside the EU: Non-EU organizations that offer goods or services to individuals in the EU or monitor the behavior of individuals within the EU are also subject to the GDPR.
Individuals within the EU: The GDPR enhances the rights of EU residents, offering them greater control over their personal data.

The GDPR is built around several key principles that dictate how personal data should be handled, processed, and protected. Understanding these requirements is crucial for any organization striving for compliance:

Lawfulness, Fairness, and Transparency: Processing must be lawful, fair and transparent to the data subject.
Purpose Limitation: Data must be collected for specified, explicit and legitimate purposes, and not further processed in a manner that is incompatible with those purposes.
Data Minimization: The collection of data should be limited to what is necessary in relation to the purposes for which they are processed.
Accuracy: Personal data must be accurate and, where necessary, kept up to date.
Storage Limitation: Data should be retained only as long as necessary for the purposes for which they are processed.
Integrity and Confidentiality (Security): Personal data must be processed in a manner that ensures appropriate security, including protection against unauthorized or unlawful processing, accidental loss, destruction or damage.
Accountability: The data controller is responsible for, and must be able to demonstrate, compliance with all the principles mentioned above.

Rights of Data Subjects

The GDPR enhances and introduces new rights for data subjects, including:

The right to be informed: Individuals have the right to be informed about the collection and use of their personal data.
The right of access: Individuals can access their data and ask how their data is being used.
The right to rectification: Individuals have the right to have inaccurate data corrected.
The right to erasure (Right to be Forgotten): Individuals can request the deletion of their data under certain conditions.
The right to restrict processing: Individuals can request the restriction of processing of their personal data.
The right to data portability: Individuals have the right to receive their personal data in a structured, commonly used, and machine-readable format.
The right to object: Individuals can object to the processing of their personal data in certain circumstances, including for direct marketing.

Additional Requirements:

Consent: When processing is based on consent, it must be freely given, specific, informed, and unambiguous, with a clear affirmative action by the data subject.
Data Protection by Design and by Default: Organizations must implement appropriate technical and organizational measures to meet the principles of data protection effectively and safeguard individual rights. Integrating privacy considerations into the design of systems and processes, known as ‘Privacy by Design,’ is a GDPR principle that emphasizes proactive privacy measures from the outset of any project or process involving personal data.
Data Protection Impact Assessments (DPIAs): DPIAs are required where data processing is likely to result in high risk to the rights and freedoms of individuals, particularly with the use of new technologies.
Data Breach Notification: Organizations must notify the appropriate data protection authority of a data breach within 72 hours, unless the breach is unlikely to result in a risk to the rights and freedoms of individuals. Affected individuals must also be notified if there is a high risk to their rights and freedoms.
Data Protection Officers (DPOs): Organizations must appoint a DPO if they are a public authority, their core activities require large scale, regular and systematic monitoring of individuals, or their core activities consist of large scale processing of special categories of data or data relating to criminal convictions and offences.
One-Stop-Shop: The GDPR introduces a one-stop-shop mechanism for organizations operating in multiple EU countries, meaning they only have to deal with a single supervisory authority.
Cross-Border Data Transfers: The GDPR imposes restrictions on the transfer of personal data outside the EU, ensuring that such transfers only occur to countries or entities providing an adequate level of data protection.
Processors Obligations: Processors are directly responsible for processing personal data in accordance with the GDPR’s mandates, including processing data based on the controller’s documented instructions, ensuring the confidentiality of the processed data, and aiding controllers in meeting their GDPR obligations .
Record Keeping: Controllers and processors must keep detailed records of processing activities.
Security of Processing: Controllers and processors must implement appropriate technical and organizational measures to ensure a level of security appropriate to the risk.
Cooperation Among Supervisory Authorities: Supervisory authorities must cooperate with each other to ensure consistent application of the GDPR across the EU.
Certification Mechanisms, Seals, and Marks: The GDPR encourages the use of certification mechanisms, seals, and marks as evidence of compliance with its provisions, including for controllers or processors not directly subject to the regulation due to their geographical location .

By adhering to these requirements, organizations can ensure compliance with the GDPR, thereby enhancing the protection of personal data and potentially avoiding significant penalties for non-compliance. Non-compliance with the GDPR can result in hefty fines, with penalties reaching up to €20 million or 4% of the annual worldwide turnover of the preceding financial year, whichever is greater, for the most serious infringements.

The GDPR’s impact extends beyond the borders of the EU and EEA, affecting any organization worldwide that processes the personal data of individuals within these regions. Its implementation marks a significant step towards enhancing individuals’ privacy rights and setting a new global standard for data protection.

For organizations seeking to fortify their data protection measures in line with GDPR standards, our Comprehensive GDPR Compliance Cybersecurity Solutions provide a robust framework tailored to meet the unique challenges of your business.

Whether you’re looking to enhance your cybersecurity measures or seeking expert consulting to navigate GDPR compliance, reach out Nebosystems today. Let us help you transform GDPR compliance from a daunting obligation into an opportunity for enhanced data security and trust building.

Reference: General Data Protection Regulation (2016/679). EUR-Lex.

Build a pseudonymization service on AWS to protect sensitive data: Part 2

2024-03-06 Edvin Hallvaxhiu

Post Syndicated from Edvin Hallvaxhiu original https://aws.amazon.com/blogs/big-data/build-a-pseudonymization-service-on-aws-to-protect-sensitive-data-part-2/

Part 1 of this two-part series described how to build a pseudonymization service that converts plain text data attributes into a pseudonym or vice versa. A centralized pseudonymization service provides a unique and universally recognized architecture for generating pseudonyms. Consequently, an organization can achieve a standard process to handle sensitive data across all platforms. Additionally, this takes away any complexity and expertise needed to understand and implement various compliance requirements from development teams and analytical users, allowing them to focus on their business outcomes.

Following a decoupled service-based approach means that, as an organization, you are unbiased towards the use of any specific technologies to solve your business problems. No matter which technology is preferred by individual teams, they are able to call the pseudonymization service to pseudonymize sensitive data.

In this post, we focus on common extract, transform, and load (ETL) consumption patterns that can use the pseudonymization service. We discuss how to use the pseudonymization service in your ETL jobs on Amazon EMR (using Amazon EMR on EC2) for streaming and batch use cases. Additionally, you can find an Amazon Athena and AWS Glue based consumption pattern in the GitHub repo of the solution.

Solution overview

The following diagram describes the solution architecture.

The account on the right hosts the pseudonymization service, which you can deploy using the instructions provided in the Part 1 of this series.

The account on the left is the one that you set up as part of this post, representing the ETL platform based on Amazon EMR using the pseudonymization service.

You can deploy the pseudonymization service and the ETL platform on the same account.

Amazon EMR empowers you to create, operate, and scale big data frameworks such as Apache Spark quickly and cost-effectively.

In this solution, we show how to consume the pseudonymization service on Amazon EMR with Apache Spark for batch and streaming use cases. The batch application reads data from an Amazon Simple Storage Service (Amazon S3) bucket, and the streaming application consumes records from Amazon Kinesis Data Streams.

PySpark code used in batch and streaming jobs

Both applications use a common utility function that makes HTTP POST calls against the API Gateway that is linked to the pseudonymization AWS Lambda function. The REST API calls are made per Spark partition using the Spark RDD mapPartitions function. The POST request body contains the list of unique values for a given input column. The POST request response contains the corresponding pseudonymized values. The code swaps the sensitive values with the pseudonymized ones for a given dataset. The result is saved to Amazon S3 and the AWS Glue Data Catalog, using Apache Iceberg table format.

Iceberg is an open table format that supports ACID transactions, schema evolution, and time travel queries. You can use these features to implement the right to be forgotten (or data erasure) solutions using SQL statements or programming interfaces. Iceberg is supported by Amazon EMR starting with version 6.5.0, AWS Glue, and Athena. Batch and streaming patterns use Iceberg as their target format. For an overview of how to build an ACID compliant data lake using Iceberg, refer to Build a high-performance, ACID compliant, evolving data lake using Apache Iceberg on Amazon EMR.

Prerequisites

You must have the following prerequisites:

An AWS account.
An AWS Identity and Access Management (IAM) principal with privileges to deploy the AWS CloudFormation stack and related resources.
The AWS Command Line Interface (AWS CLI) installed on the development or deployment machine that you will use to run the provided scripts.
An S3 bucket in the same account and AWS Region where the solution is to be deployed.
Python3 installed in the local machine where the commands are run.
PyYAML installed using pip.
A bash terminal to run bash scripts that deploy CloudFormation stacks.
An additional S3 bucket containing the input dataset in Parquet files (only for batch applications). Copy the sample dataset to the S3 bucket.
A copy of the latest code repository in the local machine using git clone or the download option.

Open a new bash terminal and navigate to the root folder of the cloned repository.

The source code for the proposed patterns can be found in the cloned repository. It uses the following parameters:

ARTEFACT_S3_BUCKET – The S3 bucket where the infrastructure code will be stored. The bucket must be created in the same account and Region where the solution lives.
AWS_REGION – The Region where the solution will be deployed.
AWS_PROFILE – The named profile that will be applied to the AWS CLI command. This should contain credentials for an IAM principal with privileges to deploy the CloudFormation stack of related resources.
SUBNET_ID – The subnet ID where the EMR cluster will be spun up. The subnet is pre-existing and for demonstration purposes, we use the default subnet ID of the default VPC.
EP_URL – The endpoint URL of the pseudonymization service. Retrieve this from the solution deployed as Part 1 of this series.
API_SECRET – An Amazon API Gateway key that will be stored in AWS Secrets Manager. The API key is generated from the deployment depicted in Part 1 of this series.
S3_INPUT_PATH – The S3 URI pointing to the folder containing the input dataset as Parquet files.
KINESIS_DATA_STREAM_NAME – The Kinesis data stream name deployed with the CloudFormation stack.
BATCH_SIZE – The number of records to be pushed to the data stream per batch.
THREADS_NUM – The number of parallel threads used in the local machine to upload data to the data stream. More threads correspond to a higher message volume.
EMR_CLUSTER_ID – The EMR cluster ID where the code will be run (the EMR cluster was created by the CloudFormation stack).
STACK_NAME – The name of the CloudFormation stack, which is assigned in the deployment script.

Batch deployment steps

As described in the prerequisites, before you deploy the solution, upload the Parquet files of the test dataset to Amazon S3. Then provide the S3 path of the folder containing the files as the parameter <S3_INPUT_PATH>.

We create the solution resources via AWS CloudFormation. You can deploy the solution by running the deploy_1.sh script, which is inside the deployment_scripts folder.

After the deployment prerequisites have been satisfied, enter the following command to deploy the solution:

sh ./deployment_scripts/deploy_1.sh \
-a <ARTEFACT_S3_BUCKET> \
-r <AWS_REGION> \
-p <AWS_PROFILE> \
-s <SUBNET_ID> \
-e <EP_URL> \
-x <API_SECRET> \
-i <S3_INPUT_PATH>

The output should look like the following screenshot.

The required parameters for the cleanup command are printed out at the end of the run of the deploy_1.sh script. Make sure to note down these values.

Test the batch solution

In the CloudFormation template deployed using the deploy_1.sh script, the EMR step containing the Spark batch application is added at the end of the EMR cluster setup.

To verify the results, check the S3 bucket identified in the CloudFormation stack outputs with the variable SparkOutputLocation.

You can also use Athena to query the table pseudo_table in the database blog_batch_db.

Clean up batch resources

To destroy the resources created as part of this exercise,

in a bash terminal, navigate to the root folder of the cloned repository. Enter the cleanup command shown as the output of the previously run deploy_1.sh script:

sh ./deployment_scripts/cleanup_1.sh \
-a <ARTEFACT_S3_BUCKET> \
-s <STACK_NAME> \
-r <AWS_REGION> \
-e <EMR_CLUSTER_ID>

The output should look like the following screenshot.

Streaming deployment steps

We create the solution resources via AWS CloudFormation. You can deploy the solution by running the deploy_2.sh script, which is inside the deployment_scripts folder. The CloudFormation stack template for this pattern is available in the GitHub repo.

After the deployment prerequisites have been satisfied, enter the following command to deploy the solution:

sh deployment_scripts/deploy_2.sh \
-a <ARTEFACT_S3_BUCKET> \
-r <AWS_REGION> \
-p <AWS_PROFILE> \
-s <SUBNET_ID> \
-e <EP_URL> \
-x <API_SECRET>

The output should look like the following screenshot.

The required parameters for the cleanup command are printed out at the end of the output of the deploy_2.sh script. Make sure to save these values to use later.

Test the streaming solution

In the CloudFormation template deployed using the deploy_2.sh script, the EMR step containing the Spark streaming application is added at the end of the EMR cluster setup. To test the end-to-end pipeline, you need to push records to the deployed Kinesis data stream. With the following commands in a bash terminal, you can activate a Kinesis producer that will continuously put records in the stream, until the process is manually stopped. You can control the producer’s message volume by modifying the BATCH_SIZE and the THREADS_NUM variables.

python3 -m pip install kiner
python3 \
consumption-patterns/emr/1_pyspark-streaming/kinesis_producer/producer.py \
<KINESIS_DATA_STREAM_NAME> \
<BATCH_SIZE> \
<THREADS_NUM>

To verify the results, check the S3 bucket identified in the CloudFormation stack outputs with the variable SparkOutputLocation.

In the Athena query editor, check the results by querying the table pseudo_table in the database blog_stream_db.

Clean up streaming resources

To destroy the resources created as part of this exercise, complete the following steps:

Stop the Python Kinesis producer that was launched in a bash terminal in the previous section.
Enter the following command:

sh ./deployment_scripts/cleanup_2.sh \
-a <ARTEFACT_S3_BUCKET> \
-s <STACK_NAME> \
-r <AWS_REGION> \
-e <EMR_CLUSTER_ID>

The output should look like the following screenshot.

Performance details

Use cases might differ in requirements with respect to data size, compute capacity, and cost. We have provided some benchmarking and factors that may influence performance; however, we strongly advise you to validate the solution in lower environments to see if it meets your particular requirements.

You can influence the performance of the proposed solution (which aims to pseudonymize a dataset using Amazon EMR) by the maximum number of parallel calls to the pseudonymization service and the payload size for each call. In terms of parallel calls, factors to consider are the GetSecretValue calls limit from Secrets Manager (10.000 per second, hard limit) and the Lambda default concurrency parallelism (1,000 by default; can be increased by quota request). You can control the maximum parallelism adjusting the number of executors, the number of partitions composing the dataset, and the cluster configuration (number and type of nodes). In terms of payload size for each call, factors to consider are the API Gateway maximum payload size (6 MB) and the Lambda function maximum runtime (15 minutes). You can control the payload size and the Lambda function runtime by adjusting the batch size value, which is a parameter of the PySpark script that determines the number of items to be pseudonymized per each API call. To capture the influence of all these factors and assess the performance of the consumption patterns using Amazon EMR, we have designed and monitored the following scenarios.

Batch consumption pattern performance

To assess the performance for the batch consumption pattern, we ran the pseudonymization application with three input datasets composed of 1, 10, and 100 Parquet files of 97.7 MB each. We generated the input files using the dataset_generator.py script.

The cluster capacity nodes were 1 primary (m5.4xlarge) and 15 core (m5d.8xlarge). This cluster configuration remained the same for all three scenarios, and it allowed the Spark application to use up to 100 executors. The batch_size, which was also the same for the three scenarios, was set to 900 VINs per API call, and the maximum VIN size was 5 bytes.

The following table captures the information of the three scenarios.

Execution ID	Repartition	Dataset Size	Number of Executors	Cores per Executor	Executor Memory	Runtime
A	800	9.53 GB	100	4	4 GiB	11 minutes, 10 seconds
B	80	0.95 GB	10	4	4 GiB	8 minutes, 36 seconds
C	8	0.09 GB	1	4	4 GiB	7 minutes, 56 seconds

As we can see, properly parallelizing the calls to our pseudonymization service enables us to control the overall runtime.

In the following examples, we analyze three important Lambda metrics for the pseudonymization service: Invocations, ConcurrentExecutions, and Duration.

The following graph depicts the Invocations metric, with the statistic SUM in orange and RUNNING SUM in blue.

By calculating the difference between the starting and ending point of the cumulative invocations, we can extract how many invocations were made during each run.

Run ID	Dataset Size	Total Invocations
A	9.53 GB	1.467.000 – 0 = 1.467.000
B	0.95 GB	1.467.000 – 1.616.500 = 149.500
C	0.09 GB	1.616.500 – 1.631.000 = 14.500

As expected, the number of invocations increases proportionally by 10 with the dataset size.

The following graph depicts the total ConcurrentExecutions metric, with the statistic MAX in blue.

The application is designed such that the maximum number of concurrent Lambda function runs is given by the amount of Spark tasks (Spark dataset partitions), which can be processed in parallel. This number can be calculated as MIN (executors x executor_cores, Spark dataset partitions).

In the test, run A processed 800 partitions, using 100 executors with four cores each. This makes 400 tasks processed in parallel so the Lambda function concurrent runs can’t be above 400. The same logic was applied for runs B and C. We can see this reflected in the preceding graph, where the amount of concurrent runs never surpasses the 400, 40, and 4 values.

To avoid throttling, make sure that the amount of Spark tasks that can be processed in parallel is not above the Lambda function concurrency limit. If that is the case, you should either increase the Lambda function concurrency limit (if you want to keep up the performance) or reduce either the amount of partitions or the number of available executors (impacting the application performance).

The following graph depicts the Lambda Duration metric, with the statistic AVG in orange and MAX in green.

As expected, the size of the dataset doesn’t affect the duration of the pseudonymization function run, which, apart from some initial invocations facing cold starts, remains constant to an average of 3 milliseconds throughout the three scenarios. This because the maximum number of records included in each pseudonymization call is constant (batch_size value).

Lambda is billed based on the number of invocations and the time it takes for your code to run (duration). You can use the average duration and invocations metrics to estimate the cost of the pseudonymization service.

Streaming consumption pattern performance

To assess the performance for the streaming consumption pattern, we ran the producer.py script, which defines a Kinesis data producer that pushes records in batches to the Kinesis data stream.

The streaming application was left running for 15 minutes and it was configured with a batch_interval of 1 minute, which is the time interval at which streaming data will be divided into batches. The following table summarizes the relevant factors.

Repartition

Cluster Capacity Nodes

Number of Executors

Executor’s Memory

Batch Window

Batch Size

VIN Size

1 Primary (m5.xlarge),

3 Core (m5.2xlarge)

9 GiB

60 seconds

900 VINs/API call.

5 Bytes / VIN

The following graphs depict the Kinesis Data Streams metrics PutRecords (in blue) and GetRecords (in orange) aggregated with 1-minute period and using the statistic SUM. The first graph shows the metric in bytes, which peaks 6.8 MB per minute. The second graph shows the metric in record count peaking at 85,000 records per minute.

We can see that the metrics GetRecords and PutRecords have overlapping values for almost the entire application’s run. This means that the streaming application was able to keep up with the load of the stream.

Next, we analyze the relevant Lambda metrics for the pseudonymization service: Invocations, ConcurrentExecutions, and Duration.

The following graph depicts the Invocations metric, with the statistic SUM (in orange) and RUNNING SUM in blue.

By calculating the difference between the starting and ending point of the cumulative invocations, we can extract how many invocations were made during the run. In specific, in 15 minutes, the streaming application invoked the pseudonymization API 977 times, which is around 65 calls per minute.

The following graph depicts the total ConcurrentExecutions metric, with the statistic MAX in blue.

The repartition and the cluster configuration allow the application to process all Spark RDD partitions in parallel. As a result, the concurrent runs of the Lambda function are always equal to or below the repartition number, which is 17.

To avoid throttling, make sure that the amount of Spark tasks that can be processed in parallel is not above the Lambda function concurrency limit. For this aspect, the same suggestions as for the batch use case are valid.

The following graph depicts the Lambda Duration metric, with the statistic AVG in blue and MAX in orange.

As expected, aside the Lambda function’s cold start, the average duration of the pseudonymization function was more or less constant throughout the run. This because the batch_size value, which defines the number of VINs to pseudonymize per call, was set to and remained constant at 900.

The ingestion rate of the Kinesis data stream and the consumption rate of our streaming application are factors that influence the number of API calls made against the pseudonymization service and therefore the related cost.

The following graph depicts the Lambda Invocations metric, with the statistic SUM in orange, and the Kinesis Data Streams GetRecords.Records metric, with the statistic SUM in blue. We can see that there is correlation between the amount of records retrieved from the stream per minute and the amount of Lambda function invocations, thereby impacting the cost of the streaming run.

In addition to the batch_interval, we can control the streaming application’s consumption rate using Spark streaming properties like spark.streaming.receiver.maxRate and spark.streaming.blockInterval. For more details, refer to Spark Streaming + Kinesis Integration and Spark Streaming Programming Guide.

Conclusion

Navigating through the rules and regulations of data privacy laws can be difficult. Pseudonymization of PII attributes is one of many points to consider while handling sensitive data.

In this two-part series, we explored how you can build and consume a pseudonymization service using various AWS services with features to assist you in building a robust data platform. In Part 1, we built the foundation by showing how to build a pseudonymization service. In this post, we showcased the various patterns to consume the pseudonymization service in a cost-efficient and performant manner. Check out the GitHub repository for additional consumption patterns.

About the Authors

Edvin Hallvaxhiu is a Senior Global Security Architect with AWS Professional Services and is passionate about cybersecurity and automation. He helps customers build secure and compliant solutions in the cloud. Outside work, he likes traveling and sports.

Rahul Shaurya is a Principal Big Data Architect with AWS Professional Services. He helps and works closely with customers building data platforms and analytical applications on AWS. Outside of work, Rahul loves taking long walks with his dog Barney.

Andrea Montanari is a Senior Big Data Architect with AWS Professional Services. He actively supports customers and partners in building analytics solutions at scale on AWS.

María Guerra is a Big Data Architect with AWS Professional Services. Maria has a background in data analytics and mechanical engineering. She helps customers architecting and developing data related workloads in the cloud.

Pushpraj Singh is a Senior Data Architect with AWS Professional Services. He is passionate about Data and DevOps engineering. He helps customers build data driven applications at scale.

Reflecting on the GDPR to celebrate Privacy Day 2024

2024-01-26 Emily Hancock http://blog.cloudflare.com/author/emily-hancock/

Post Syndicated from Emily Hancock http://blog.cloudflare.com/author/emily-hancock/ original https://blog.cloudflare.com/reflecting-on-the-gdpr-to-celebrate-privacy-day-2024

Just in time for Data Privacy Day 2024 on January 28, the EU Commission is calling for evidence to understand how the EU’s General Data Protection Regulation (GDPR) has been functioning now that we’re nearing the 6th anniversary of the regulation coming into force.

We’re so glad they asked, because we have some thoughts. And what better way to celebrate privacy day than by discussing whether the application of the GDPR has actually done anything to improve people’s privacy?

The answer is, mostly yes, but in a couple of significant ways – no.

Overall, the GDPR is rightly seen as the global gold standard for privacy protection. It has served as a model for what data protection practices should look like globally, it enshrines data subject rights that have been copied across jurisdictions, and when it took effect, it created a standard for the kinds of privacy protections people worldwide should be able to expect and demand from the entities that handle their personal data. On balance, the GDPR has definitely moved the needle in the right direction for giving people more control over their personal data and in protecting their privacy.

In a couple of key areas, however, we believe the way the GDPR has been applied to data flowing across the Internet has done nothing for privacy and in fact may even jeopardize the protection of personal data. The first area where we see this is with respect to cross-border data transfers. Location has become a proxy for privacy in the minds of many EU data protection regulators, and we think that is the wrong result. The second area is an overly broad interpretation of what constitutes “personal data” by some regulators with respect to Internet Protocol or “IP” addresses. We contend that IP addresses should not always count as personal data, especially when the entities handling IP addresses have no ability on their own to tie those IP addresses to individuals. This is important because the ability to implement a number of industry-leading cybersecurity measures relies on the ability to do threat intelligence on Internet traffic metadata, including IP addresses.

Location should not be a proxy for privacy

Fundamentally, good data security and privacy practices should be able to protect personal data regardless of where that processing or storage occurs. Nevertheless, the GDPR is based on the idea that legal protections should attach to personal data based on the location of the data – where it is generated, processed, or stored. Articles 44 to 49 establish the conditions that must be in place in order for data to be transferred to a jurisdiction outside the EU, with the idea that even if the data is in a different location, the privacy protections established by the GDPR should follow the data. No doubt this approach was influenced by political developments around government surveillance practices, such as the revelations in 2013 of secret documents describing the relationship between the US NSA (and its Five Eyes partners) and large Internet companies, and that intelligence agencies were scooping up data from choke points on the Internet. And once the GDPR took effect, many data regulators in the EU were of the view that as a result of the GDPR’s restrictions on cross-border data transfers, European personal data simply could not be processed in the United States in a way that would be consistent with the GDPR.

This issue came to a head in July 2020, when the European Court of Justice (CJEU), in its “Schrems II” decision¹, invalidated the EU-US Privacy Shield adequacy standard and questioned the suitability of the EU standard contractual clauses (a mechanism entities can use to ensure that GDPR protections are applied to EU personal data even if it is processed outside the EU). The ruling in some respects left data protection regulators with little room to maneuver on questions of transatlantic data flows. But while some regulators were able to view the Schrems II ruling in a way that would still allow for EU personal data to be processed in the United States, other data protection regulators saw the decision as an opportunity to double down on their view that EU personal data cannot be processed in the US consistent with the GDPR, therefore promoting the misconception that data localization should be a proxy for data protection.

In fact, we would argue that the opposite is the case. From our own experience and according to recent research², we know that data localization threatens an organization’s ability to achieve integrated management of cybersecurity risk and limits an entity’s ability to employ state-of-the-art cybersecurity measures that rely on cross-border data transfers to make them as effective as possible. For example, Cloudflare’s Bot Management product only increases in accuracy with continued use on the global network: it detects and blocks traffic coming from likely bots before feeding back learnings to the models backing the product. A diversity of signal and scale of data on a global platform is critical to help us continue to evolve our bot detection tools. If the Internet were fragmented – preventing data from one jurisdiction being used in another – more and more signals would be missed. We wouldn’t be able to apply learnings from bot trends in Asia to bot mitigation efforts in Europe, for example. And if the ability to identify bot traffic is hampered, so is the ability to block those harmful bots from services that process personal data.

The need for industry-leading cybersecurity measures is self-evident, and it is not as if data protection authorities don’t realize this. If you look at any enforcement action brought against an entity that suffered a data breach, you see data protection regulators insisting that the impacted entities implement ever more robust cybersecurity measures in line with the obligation GDPR Article 32 places on data controllers and processors to “develop appropriate technical and organizational measures to ensure a level of security appropriate to the risk”, “taking into account the state of the art”. In addition, data localization undermines information sharing within industry and with government agencies for cybersecurity purposes, which is generally recognized as vital to effective cybersecurity.

In this way, while the GDPR itself lays out a solid framework for securing personal data to ensure its privacy, the application of the GDPR’s cross-border data transfer provisions has twisted and contorted the purpose of the GDPR. It’s a classic example of not being able to see the forest for the trees. If the GDPR is applied in such a way as to elevate the priority of data localization over the priority of keeping data private and secure, then the protection of ordinary people’s data suffers.

Applying data transfer rules to IP addresses could lead to balkanization of the Internet

The other key way in which the application of the GDPR has been detrimental to the actual privacy of personal data is related to the way the term “personal data” has been defined in the Internet context – specifically with respect to Internet Protocol or “IP” addresses. A world where IP addresses are always treated as personal data and therefore subject to the GDPR’s data transfer rules is a world that could come perilously close to requiring a walled-off European Internet. And as noted above, this could have serious consequences for data privacy, not to mention that it likely would cut the EU off from any number of global marketplaces, information exchanges, and social media platforms.

This is a bit of a complicated argument, so let’s break it down. As most of us know, IP addresses are the addressing system for the Internet. When you send a request to a website, send an email, or communicate online in any way, IP addresses connect your request to the destination you’re trying to access. These IP addresses are the key to making sure Internet traffic gets delivered to where it needs to go. As the Internet is a global network, this means it’s entirely possible that Internet traffic – which necessarily contains IP addresses – will cross national borders. Indeed, the destination you are trying to access may well be located in a different jurisdiction altogether. That’s just the way the global Internet works. So far, so good.

But if IP addresses are considered personal data, then they are subject to data transfer restrictions under the GDPR. And with the way those provisions have been applied in recent years, some data regulators were getting perilously close to saying that IP addresses cannot transit jurisdictional boundaries if it meant the data might go to the US. The EU’s recent approval of the EU-US Data Privacy Framework established adequacy for US entities that certify to the framework, so these cross-border data transfers are not currently an issue. But if the Data Privacy Framework were to be invalidated as the EU-US Privacy Shield was in the Schrems II decision, then we could find ourselves in a place where the GDPR is applied to mean that IP addresses ostensibly linked to EU residents can’t be processed in the US, or potentially not even leave the EU.

If this were the case, then providers would have to start developing Europe-only networks to ensure IP addresses never cross jurisdictional boundaries. But how would people in the EU and US communicate if EU IP addresses can’t go to the US? Would EU citizens be restricted from accessing content stored in the US? It’s an application of the GDPR that would lead to the absurd result – one surely not intended by its drafters. And yet, in light of the Schrems II case and the way the GDPR has been applied, here we are.

A possible solution would be to consider that IP addresses are not always “personal data” subject to the GDPR. In 2016 – even before the GDPR took effect – the Court of Justice of the European Union (CJEU) established the view in Breyer v. Bundesrepublik Deutschland that even dynamic IP addresses, which change with every new connection to the Internet, constituted personal data if an entity processing the IP address could link the IP addresses to an individual. While the court’s decision did not say that dynamic IP addresses are always personal data under European data protection law, that’s exactly what EU data regulators took from the decision, without considering whether an entity actually has a way to tie the IP address to a real person³.

The question of when an identifier qualifies as “personal data” is again before the CJEU: In April 2023, the lower EU General Court ruled in SRB v EDPS⁴ that transmitted data can be considered anonymised and therefore not personal data if the data recipient does not have any additional information reasonably likely to allow it to re-identify the data subjects and has no legal means available to access such information. The appellant – the European Data Protection Supervisor (EDPS) – disagrees. The EDPS, who mainly oversees the privacy compliance of EU institutions and bodies, is appealing the decision and arguing that a unique identifier should qualify as personal data if that identifier could ever be linked to an individual, regardless of whether the entity holding the identifier actually had the means to make such a link.

If the lower court’s common-sense ruling holds, one could argue that IP addresses are not personal data when those IP addresses are processed by entities like Cloudflare, which have no means of connecting an IP address to an individual. If IP addresses are then not always personal data, then IP addresses will not always be subject to the GDPR’s rules on cross-border data transfers.

Although it may seem counterintuitive, having a standard whereby an IP address is not necessarily “personal data” would actually be a positive development for privacy. If IP addresses can flow freely across the Internet, then entities in the EU can use non-EU cybersecurity providers to help them secure their personal data. Advanced Machine Learning/predictive AI techniques that look at IP addresses to protect against DDoS attacks, prevent bots, or otherwise guard against personal data breaches will be able to draw on attack patterns and threat intelligence from around the world to the benefit of EU entities and residents. But none of these benefits can be realized in a world where IP addresses are always personal data under the GDPR and where the GDPR’s data transfer rules are interpreted to mean IP addresses linked to EU residents can never flow to the United States.

Keeping privacy in focus

On this Data Privacy Day, we urge EU policy makers to look closely at how the GDPR is working in practice, and to take note of the instances where the GDPR is applied in ways that place privacy protections above all other considerations – even appropriate security measures mandated by the GDPR’s Article 32 that take into account the state of the art of technology. When this happens, it can actually be detrimental to privacy. If taken to the extreme, this formulaic approach would not only negatively impact cybersecurity and data protection, but even put into question the functioning of the global Internet infrastructure as a whole, which depends on cross-border data flows. So what can be done to avert this?

First, we believe EU policymakers could adopt guidelines (if not legal clarification) for regulators that IP addresses should not be considered personal data when they cannot be linked by an entity to a real person. Second, policymakers should clarify that the GDPR’s application should be considered with the cybersecurity benefits of data processing in mind. Building on the GDPR’s existing recital 49, which rightly recognizes cybersecurity as a legitimate interest for processing, personal data that needs to be processed outside the EU for cybersecurity purposes should be exempted from GDPR restrictions to international data transfers. This would avoid some of the worst effects of the mindset that currently views data localization as a proxy for data privacy. Such a shift would be a truly pro-privacy application of the GDPR.

¹ Case C-311/18, Data Protection Commissioner v Facebook Ireland and Maximillian Schrems.
² Swire, Peter and Kennedy-Mayo, DeBrae and Bagley, Andrew and Modak, Avani and Krasser, Sven and Bausewein, Christoph, Risks to Cybersecurity from Data Localization, Organized by Techniques, Tactics, and Procedures (2023).
³ Different decisions by the European data protection authorities, namely the Austrian DSB (December 2021), the French CNIL (February 2022) and the Italian Garante (June 2022), while analyzing the use of Google Analytics, have rejected the relative approach used by the Breyer case and considered that an IP address should always be considered as personal data. Only the decision issued by the Spanish AEPD (December 2022) followed the same interpretation of the Breyer case. In addition, see paragraphs 109 and 136 in Guidelines by Supervisory Authorities for Tele-Media Providers, DSK (2021).
⁴ Single Resolution Board v EDPS, Court of Justice of the European Union, April 2023.

CISPE Code of Conduct Public Register now has 107 compliant AWS services

2023-06-19 Gokhan Akyuz

Post Syndicated from Gokhan Akyuz original https://aws.amazon.com/blogs/security/cispe-code-of-conduct-public-register-now-has-107-compliant-aws-services/

We continue to expand the scope of our assurance programs at Amazon Web Services (AWS) and are pleased to announce that 107 services are now certified as compliant with the Cloud Infrastructure Services Providers in Europe (CISPE) Data Protection Code of Conduct. This alignment with the CISPE requirements demonstrates our ongoing commitment to adhere to the heightened expectations for data protection by cloud service providers. AWS customers who use AWS certified services can be confident that their data is processed in adherence with the European Union’s General Data Protection Regulation (GDPR).

The CISPE Code of Conduct is the first pan-European, sector-specific code for cloud infrastructure service providers, which received a favorable opinion that it complies with the GDPR. It helps organizations across Europe accelerate the development of GDPR compliant, cloud-based services for consumers, businesses, and institutions.

The accredited monitoring body EY CertifyPoint evaluated AWS on January 26, 2023, and successfully audited 100 certified services. AWS added seven additional services to the current scope in June 2023. As of the date of this blog post, 107 services are in scope of this certification. The Certificate of Compliance that illustrates AWS compliance status is available on the CISPE Public Register. For up-to-date information, including when additional services are added, search the CISPE Public Register by entering AWS as the Seller of Record; or see the AWS CISPE page.

AWS strives to bring additional services into the scope of its compliance programs to help you meet your architectural and regulatory needs. If you have questions or feedback about AWS compliance with CISPE, reach out to your AWS account team.

To learn more about our compliance and security programs, see AWS Compliance Programs, AWS General Data Protection Regulation (GDPR) Center, and the EU data protection section of the AWS Cloud Security site. As always, we value your feedback and questions; reach out to the AWS Compliance team through the Contact Us page.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

New Global AWS Data Processing Addendum

2023-04-11 Chad Woolf

Post Syndicated from Chad Woolf original https://aws.amazon.com/blogs/security/new-global-aws-data-processing-addendum/

Navigating data protection laws around the world is no simple task. Today, I’m pleased to announce that AWS is expanding the scope of the AWS Data Processing Addendum (Global AWS DPA) so that it applies globally whenever customers use AWS services to process personal data, regardless of which data protection laws apply to that processing. AWS is proud to be the first major cloud provider to adopt this global approach to help you meet your compliance needs for data protection.

The Global AWS DPA is designed to help you satisfy requirements under data protection laws worldwide, without having to create separate country-specific data processing agreements for every location where you use AWS services. By introducing this global, one-stop addendum, we are simplifying contracting procedures and helping to reduce the time that you spend assessing contractual data privacy requirements on a country-by-country basis.

If you have signed a copy of the previous AWS General Data Protection Regulation (GDPR) DPA, then you do not need to take any action and can continue to rely on that addendum to satisfy data processing requirements. AWS is always innovating to help you meet your compliance obligations wherever you operate. We’re confident that this expanded Global AWS DPA will help you on your journey. If you have questions or need more information, see Data Protection & Privacy at AWS and GDPR Center.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Helping protect personal information in the cloud, all across the world

2023-03-30 Rory Malone

Post Syndicated from Rory Malone original https://blog.cloudflare.com/cloudflare-official-gdpr-code-of-conduct/

Helping protect personal information in the cloud, all across the world

Cloudflare has achieved a new EU Cloud Code of Conduct privacy validation, demonstrating GDPR compliance to strengthen trust in cloud services

Internet privacy laws around the globe differ, and in recent years there’s been much written about cross-border data transfers. Many regulations require adequate protections to be in place before personal information flows around the world, as with the European General Data Protection Regulation (GDPR). The law rightly sets a high bar for how organizations must carefully handle personal information, and in drafting the regulation lawmakers anticipated personal data crossing-borders: Chapter V of the regulation covers those transfers specifically.

Whilst transparency on where personal information is stored is important, it’s also critically important how personal information is handled, and how it is kept safe and secure. At Cloudflare, we believe in protecting the privacy of personal information across the world, and we give our customers the tools and the choice on how and where to process their data. Put simply, we require that data is handled and protected in the same, secure, and careful way, whether our customers choose to transfer data across the world, or for it to remain in one country.

And today we are proud to announce that we have successfully completed our assessment journey and received the EU Cloud Code of Conduct compliance mark as a demonstration of our compliance with the GDPR, protecting personal data in the cloud, all across the world.

It matters how personal information is handled – not just where in the world it is saved

The same GDPR lawmakers also anticipated that organizations would want to handle and protect personal information in a consistent, transparent, and safe way too. Article 40, called ‘Codes of Conduct’ starts:

“The Member States, the supervisory authorities, the Board and the Commission shall encourage the drawing up of codes of conduct intended to contribute to the proper application of this Regulation, taking account of the specific features of the various processing sectors and the specific needs of micro, small and medium-sized enterprises.”

Using codes of conduct to demonstrate compliance with privacy law has a longer history, too. Like the GDPR, the pioneering 1995 EU Data Protection Directive, officially Directive 95/46/EC, also included provision for draft community codes to be submitted to national authorities, and for those codes to be formally approved by an official body of the European Union.

It took a full five years after the GDPR was adopted in 2016 for the first code of conduct to be officially approved. Finally in May 2021, the European Data Protection Board, a group composed of representatives of all the national data protection authorities across the union, approved the “EU Data Protection Code of Conduct for Cloud Service Providers” – the EU Cloud Code of Conduct (or ‘EU Cloud CoC’ for short) as the first official GDPR code of conduct. The EU Cloud CoC was brought to the board by the Belgian supervisory authority on behalf of SCOPE Europe, the organization who collaborated to develop the code over a number of years, including with input from the European Commission, members of the cloud computing community, and European data protection authorities.

The code is a framework for buyers and providers of cloud services. Buyers can understand in a straightforward way how a provider of cloud services will handle personal information. Providers of cloud services undergo an independent assessment to demonstrate to the buyers of their cloud services that they will handle personal information in a safe and codified way. In the case of the EU Cloud CoC and only because the code has received formal approval, buyers of cloud services compliant with code will know that the cloud provider handled customer personal information in a way that is compliant with the GDPR.

What the code covers

The code defines clear requirements for providers of cloud services to implement Article 28 of the GDPR (“Processor”) and related articles. The framework covers data protection policies, as well as technical and organizational security measures. There are sections covering providers’ terms and conditions, confidentiality and recordkeeping, the audit rights of the customer, how to handle potential data breaches, and how the provider approaches subprocessing – when a third-party is subcontracted to process personal data alongside the main data processor – and more.

The framework also covers how personal data may be legitimately transferred internationally, although whilst the EU Cloud CoC covers ensuring this is done in a legally-compliant way, the code itself is not a ‘safeguard’ or a tool for third country transfers. A future update to the code may expand into that with an additional module, but as of March 2023 that is still under development.

Example one
One requirement in the code is to have documented procedures in place to assist customers with their ‘data protection impact assessments’. According to the GDPR, these are:

“…an assessment of the impact of the envisaged processing operations
on the protection of personal data.” – Article 35.1, GDPR

So a cloud service provider should have a written process in place to support customers as they undertake their own assessments. In supporting the customer, the service provider is demonstrating their commitment to the rigorous data protection standards of the GDPR too. Cloudflare meets this requirement, and further supports transparency by publishing details of sub-processors used in the processing of personal data, and directing customers to audit reports available in the Cloudflare dashboard.

There’s also another reference in the GDPR to codes of conduct in the context of data protection impact assessments too:

“Compliance with approved codes of conduct… shall be taken into due account in assessing the impact of the processing operations performed… in particular for the purposes of a data protection impact assessment.” – Article 35.8, GDPR

So when preparing an impact assessment, a cloud customer shall take into account that a service provider complies with an approved code of conduct. Another way that both customers and cloud providers benefit from using codes of conduct!

Example two
Another example of a requirement of the code is that when cloud service providers provide encryption capabilities, they shall be implemented effectively. The requirement clarifies further that this should be undertaken by following strong and trusted encryption techniques, by taking into account the state-of-the-art, and by adequately preventing abusive access to customer personal data. Encryption is critical to protecting personal data in the cloud; without encryption, or with weakened or outdated encryption, privacy and security are not possible. So in using and reviewing encryption appropriately, cloud services providers help meet the requirements of the GDPR in protecting their customers’ personal data.

At Cloudflare, we are particularly proud of our track record: we make effective encryption available, for free, to all our customers. We help our customers understand encryption, and most importantly, we use strong and trusted encryption algorithms and techniques ourselves to protect customer personal data. We have a formal Research Team, including academic researchers and cryptographers who design and deploy state-of-the-art encryption protocols designed to provide effective protection against active and passive attacks, including with resources known to be available to public authorities; and we use trustworthy public-key certification authorities and infrastructure. Most recently this month, we announced that post-quantum crypto should be free, and so we are including it for free, forever.

More information
The code contains requirements described in 87 statements, called controls. You can find more about the EU Cloud CoC, download a full copy of the code, and keep up to date with news at https://eucoc.cloud/en/home

Why this matters to Cloudflare customers

Cloudflare joined the EU Cloud Code of Conduct’s General Assembly last May. Members of the General Assembly undertake an assessment journey which includes declaring named cloud services compliant with the EU Cloud Code, and after completing an independent assessment process by SCOPE Europe, the accredited monitoring body, receive the EU Cloud Code of Conduct compliance mark.

Cloudflare has completed the assessment process and been verified for 47 cloud services.

Cloudflare services that are in scope for EU Cloud Code of Conduct:

Services are verified compliant with the EU Cloud Code of Conduct,
Verification-ID: 2023LVL02SCOPE4316.
For further information please visit https://eucoc.cloud/en/public-register

And we’re not done yet…

The EU Cloud Code of Conduct is the newest privacy validation to add to our growing list of privacy certifications. Two years ago, Cloudflare was one of the first organisations in our industry to have received the new ISO privacy certification, ISO/IEC 27701:2019, and the first Internet performance & security company to be certified to it. Last year, Cloudflare certified to a second international privacy standard related to the processing of personal data, ISO/IEC 27018:2019. Most recently, in January this year Cloudflare completed our annual ISO audit with third-party auditor Schellman; and our new certificate, covering ISO 27001:2013, ISO 27018:2019, and ISO 27701:2019 is now available for customers to download from the Cloudflare dashboard.

And there’s more to come! As we blogged about in January for Data Privacy Day, we’re following the progress of the emerging Global Cross Border Privacy Rules (CBPR) certification with interest. This proposed single global certification could suffice for participating companies to safely transfer personal data between participating countries worldwide, and having already been supported by several governments from North America and Asia, looks very promising in this regard.

Cloudflare certifications

Find out how existing customers may download a copy of Cloudflare’s certifications and reports from the Cloudflare dashboard; new customers may also request these from your sales representative.

For the latest information about our certifications and reports, please visit our Trust Hub.

Build a pseudonymization service on AWS to protect sensitive data, part 1

2022-08-11 Rahul Shaurya

Post Syndicated from Rahul Shaurya original https://aws.amazon.com/blogs/big-data/part-1-build-a-pseudonymization-service-on-aws-to-protect-sensitive-data/

According to an article in MIT Sloan Management Review, 9 out of 10 companies believe their industry will be digitally disrupted. In order to fuel the digital disruption, companies are eager to gather as much data as possible. Given the importance of this new asset, lawmakers are keen to protect the privacy of individuals and prevent any misuse. Organizations often face challenges as they aim to comply with data privacy regulations like Europe’s General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA). These regulations demand strict access controls to protect sensitive personal data.

This is a two-part post. In part 1, we walk through a solution that uses a microservice-based approach to enable fast and cost-effective pseudonymization of attributes in datasets. The solution uses the AES-GCM-SIV algorithm to pseudonymize sensitive data. In part 2, we will walk through useful patterns for dealing with data protection for varying degrees of data volume, velocity, and variety using Amazon EMR, AWS Glue, and Amazon Athena.

Data privacy and data protection basics

Before diving into the solution architecture, let’s look at some of the basics of data privacy and data protection. Data privacy refers to the handling of personal information and how data should be handled based on its relative importance, consent, data collection, and regulatory compliance. Depending on your regional privacy laws, the terminology and definition in scope of personal information may differ. For example, privacy laws in the United States use personally identifiable information (PII) in their terminology, whereas GDPR in the European Union refers to it as personal data. Techgdpr explains in detail the difference between the two. Through the rest of the post, we use PII and personal data interchangeably.

Data anonymization and pseudonymization can potentially be used to implement data privacy to protect both PII and personal data and still allow organizations to legitimately use the data.

Anonymization vs. pseudonymization

Anonymization refers to a technique of data processing that aims to irreversibly remove PII from a dataset. The dataset is considered anonymized if it can’t be used to directly or indirectly identify an individual.

Pseudonymization is a data sanitization procedure by which PII fields within a data record are replaced by artificial identifiers. A single pseudonym for each replaced field or collection of replaced fields makes the data record less identifiable while remaining suitable for data analysis and data processing. This technique is especially useful because it protects your PII data at record level for analytical purposes such as business intelligence, big data, or machine learning use cases.

The main difference between anonymization and pseudonymization is that the pseudonymized data is reversible (re-identifiable) to authorized users and is still considered personal data.

Solution overview

The following architecture diagram provides an overview of the solution.

Solution overview

This architecture contains two separate accounts:

Central pseudonymization service: Account 111111111111 – The pseudonymization service is running in its own dedicated AWS account (right). This is a centrally managed pseudonymization API that provides access to two resources for pseudonymization and reidentification. With this architecture, you can apply authentication, authorization, rate limiting, and other API management tasks in one place. For this solution, we’re using API keys to authenticate and authorize consumers.
Compute: Account 222222222222 – The account on the left is referred to as the compute account, where the extract, transform, and load (ETL) workloads are running. This account depicts a consumer of the pseudonymization microservice. The account hosts the various consumer patterns depicted in the architecture diagram. These solutions are covered in detail in part 2 of this series.

The pseudonymization service is built using AWS Lambda and Amazon API Gateway. Lambda enables the serverless microservice features, and API Gateway provides serverless APIs for HTTP or RESTful and WebSocket communication.

We create the solution resources via AWS CloudFormation. The CloudFormation stack template and the source code for the Lambda function are available in GitHub Repository.

We walk you through the following steps:

Deploy the solution resources with AWS CloudFormation.
Generate encryption keys and persist them in AWS Secrets Manager.
Test the service.

Demystifying the pseudonymization service

Pseudonymization logic is written in Java and uses the AES-GCM-SIV algorithm developed by codahale. The source code is hosted in a Lambda function. Secret keys are stored securely in Secrets Manager. AWS Key Management System (AWS KMS) makes sure that secrets and sensitive components are protected at rest. The service is exposed to consumers via API Gateway as a REST API. Consumers are authenticated and authorized to consume the API via API keys. The pseudonymization service is technology agnostic and can be adopted by any form of consumer as long as they’re able to consume REST APIs.

As depicted in the following figure, the API consists of two resources with the POST method:

API Resources

Pseudonymization – The pseudonymization resource can be used by authorized users to pseudonymize a given list of plaintexts (identifiers) and replace them with a pseudonym.
Reidentification – The reidentification resource can be used by authorized users to convert pseudonyms to plaintexts (identifiers).

The request response model of the API utilizes Java string arrays to store multiple values in a single variable, as depicted in the following code.

Request/Response model

The API supports a Boolean type query parameter to decide whether encryption is deterministic or probabilistic.

The implementation of the algorithm has been modified to add the logic to generate a nonce, which is dependent on the plaintext being pseudonymized. If the incoming query parameters key deterministic has the value True, then the overloaded version of the encrypt function is called. This generates a nonce using the HmacSHA256 function on the plaintext, and takes 12 sub-bytes from a predetermined position for nonce. This nonce is then used for the encryption and prepended to the resulting ciphertext. The following is an example:

Identifier – VIN98765432101234
Nonce – NjcxMDVjMmQ5OTE5
Pseudonym – NjcxMDVjMmQ5OTE5q44vuub5QD4WH3vz1Jj26ZMcVGS+XB9kDpxp/tMinfd9

This approach is useful especially for building analytical systems that may require PII fields to be used for joining datasets with other pseudonymized datasets.

The following code shows an example of deterministic encryption.

If the incoming query parameters key deterministic has the value False, then the encrypt method is called without the deterministic parameter and the nonce generated is a random 12 bytes. This generates a different ciphertext for the same incoming plaintext.

The following code shows an example of probabilistic encryption.

Probabilistic Encryption

The Lambda function utilizes a couple of caching mechanisms to boost the performance of the function. It uses Guava to build a cache to avoid generation of the pseudonym or identifier if it’s already available in the cache. For the probabilistic approach, the cache isn’t utilized. It also uses SecretCache, an in-memory cache for secrets requested from Secrets Manager.

Prerequisites

For this walkthrough, you should have the following prerequisites:

An AWS account
An AWS Identity and Access Management (IAM) principal with privileges to deploy the stack and related resources
An Amazon Simple Storage Service (Amazon S3) bucket in the same account and same Region where the solution is deployed
Code dependencies highlighted in the README

Deploy the solution resources with AWS CloudFormation

The deployment is triggered by running the deploy.sh script. The script runs the following phases:

Checks for dependencies.
Builds the Lambda package.
Builds the CloudFormation stack.
Deploys the CloudFormation stack.
Prints to standard out the stack output.

The following resources are deployed from the stack:

An API Gateway REST API with two resources:
- /pseudonymization
- /reidentification
A Lambda function
A Secrets Manager secret
A KMS key
IAM roles and policies
An Amazon CloudWatch Logs group

You need to pass the following parameters to the script for the deployment to be successful:

STACK_NAME – The CloudFormation stack name.
AWS_REGION – The Region where the solution is deployed.
AWS_PROFILE – The named profile that applies to the AWS Command Line Interface (AWS CLI). command
ARTEFACT_S3_BUCKET – The S3 bucket where the infrastructure code is stored. The bucket must be created in the same account and Region where the solution lives.

Use the following commands to run the ./deployments_scripts/deploy.sh script:

chmod +x ./deployment_scripts/deploy.sh ./deployment_scripts/deploy.sh -s STACK_NAME -b ARTEFACT_S3_BUCKET -r AWS_REGION -p AWS_PROFILE AWS_REGION

Upon successful deployment, the script displays the stack outputs, as depicted in the following screenshot. Take note of the output, because we use it in subsequent steps.

Stack Output

Generate encryption keys and persist them in Secrets Manager

In this step, we generate the encryption keys required to pseudonymize the plain text data. We generate those keys by calling the KMS key we created in the previous step. Then we persist the keys in a secret. Encryption keys are encrypted at rest and in transit, and exist in plain text only in-memory when the function calls them.

To perform this step, we use the script key_generator.py. You need to pass the following parameters for the script to run successfully:

KmsKeyArn – The output value from the previous stack deployment
AWS_PROFILE – The named profile that applies to the AWS CLI command
AWS_REGION – The Region where the solution is deployed
SecretName – The output value from the previous stack deployment

Use the following command to run ./helper_scripts/key_generator.py:

python3 ./helper_scripts/key_generator.py -k KmsKeyArn -s SecretName -p AWS_PROFILE -r AWS_REGION

Upon successful deployment, the secret value should look like the following screenshot.

Encryption Secrets

Test the solution

In this step, we configure Postman and query the REST API, so you need to make sure Postman is installed in your machine. Upon successful authentication, the API returns the requested values.

The following parameters are required to create a complete request in Postman:

PseudonymizationUrl – The output value from stack deployment
ReidentificationUrl – The output value from stack deployment
deterministic – The value True or False for the pseudonymization call
API_Key – The API key, which you can retrieve from API Gateway console

Follow these steps to set up Postman:

Start Postman in your machine.
On the File menu, choose Import.
Import the Postman collection.
From the collection folder, navigate to the pseudonymization request.
To test the pseudonymization resource, replace all variables in the sample request with the parameters mentioned earlier.

The request template in the body already has some dummy values provided. You can use the existing one or exchange with your own.

Choose Send to run the request.

The API returns in the body of the response a JSON data type.

Reidentification

From the collection folder, navigate to the reidentification request.
To test the reidentification resource, replace all variables in the sample request with the parameters mentioned earlier.
Pass to the response template in the body the pseudonyms output from earlier.
Choose Send to run the request.

The API returns in the body of the response a JSON data type.

Pseudonyms

Cost and performance

There are many factors that can determine the cost and performance of the service. Performance especially can be influenced by payload size, concurrency, cache hit, and managed service limits on the account level. The cost is mainly influenced by how much the service is being used. For our cost and performance exercise, we consider the following scenario:

The REST API is used to pseudonymize Vehicle Identification Numbers (VINs). On average, consumers request pseudonymization of 1,000 VINs per call. The service processes on average 40 requests per second, or 40,000 encryption or decryption operations per second. The average process time per request is as follows:

15 milliseconds for deterministic encryption
23 milliseconds for probabilistic encryption
6 milliseconds for decryption

The number of calls hitting the service per month is distributed as follows:

50 million calls hitting the pseudonymization resource for deterministic encryption
25 million calls hitting the pseudonymization resource for probabilistic encryption
25 million calls hitting the reidentification resource for decryption

Based on this scenario, the average cost is $415.42 USD per month. You may find the detailed cost breakdown in the estimate generated via the AWS Pricing Calculator.

We use Locust to simulate a similar load to our scenario. Measurements from Amazon CloudWatch metrics are depicted in the following screenshots (network latency isn’t considered during our measurement).

The following screenshot shows API Gateway latency and Lambda duration for deterministic encryption. Latency is high at the beginning due to the cold start, and flattens out over time.

API Gateway Latency & Lamdba Duration for deterministic encryption. Latency is high at the beginning due to the cold start and flattens out over time.

The following screenshot shows metrics for probabilistic encryption.

metrics for probabilistic encryption

The following shows metrics for decryption.

metrics for decryption

Clean up

To avoid incurring future charges, delete the CloudFormation stack by running the destroy.sh script. The following parameters are required to run the script successfully:

STACK_NAME – The CloudFormation stack name
AWS_REGION – The Region where the solution is deployed
AWS_PROFILE – The named profile that applies to the AWS CLI command

Use the following commands to run the ./deployment_scripts/destroy.sh script:

chmod +x ./deployment_scripts/destroy.sh ./deployment_scripts/destroy.sh -s STACK_NAME -r AWS_REGION -p AWS_PROFILE

Conclusion

In this post, we demonstrated how to build a pseudonymization service on AWS. The solution is technology agnostic and can be adopted by any form of consumer as long as they’re able to consume REST APIs. We hope this post helps you in your data protection strategies.

Stay tuned for part 2, which will cover consumption patterns of the pseudonymization service.

About the authors

Rahul Shaurya is a Senior Big Data Architect with AWS Professional Services. He helps and works closely with customers building data platforms and analytical applications on AWS. Outside of work, Rahul loves taking long walks with his dog Barney.

Andrea Montanari is a Big Data Architect with AWS Professional Services. He actively supports customers and partners in building analytics solutions at scale on AWS.

María Guerra is a Big Data Architect with AWS Professional Services. Maria has a background in data analytics and mechanical engineering. She helps customers architecting and developing data related workloads in the cloud.

Pushpraj is a Data Architect with AWS Professional Services. He is passionate about Data and DevOps engineering. He helps customers build data driven applications at scale.

Cloudflare achieves key cloud computing certifications — and there’s more to come

2022-05-23 Rory Malone

Post Syndicated from Rory Malone original https://blog.cloudflare.com/iso-27018-second-privacy-certification-and-c5/

Cloudflare achieves key cloud computing certifications — and there’s more to come

Back in the early days of the Internet, you could physically see the hardware where your data was stored. You knew where your data was and what kind of locks and security protections you had in place. Fast-forward a few decades, and data is all “in the cloud”. Now, you have to trust that your cloud services provider is putting security precautions in place just as you would have if your data was still sitting on your hardware. The good news is, you don’t have to merely trust your provider anymore. There are a number of ways a cloud services provider can prove it has robust privacy and security protections in place.

Today, we are excited to announce that Cloudflare has taken three major steps forward in proving the security and privacy protections we provide to customers of our cloud services: we achieved a key cloud services certification, ISO/IEC 27018:2019; we completed our independent audit and received our Cloud Computing Compliance Criteria Catalog (“C5”) attestation; and we have joined the EU Cloud Code of Conduct General Assembly to help increase the impact of the trusted cloud ecosystem and encourage more organizations to adopt GDPR-compliant cloud services.

Cloudflare has been committed to data privacy and security since our founding, and it is important to us that we can demonstrate these commitments. Certification provides assurance to our customers that a third party has independently verified that Cloudflare meets the requirements set out in the standard.

ISO/IEC 27018:2019 – Cloud Services Certification

2022 has been a big year for people who like the number ‘two’. February marked the second when the 22nd Feb 2022 20:22:02 passed: the second second of the twenty-second minute of the twentieth hour of the twenty-second day of the second month, of the year twenty-twenty-two! As well as the date being a palindrome — something that reads the same forwards and backwards — on an vintage ‘80s LCD clock, the date and time could be written as an ambigram too — something that can be read upside down as well as the right way up:

When we hit 2022 02 22, our team was busy completing our second annual audit to certify to ISO/IEC 27701:2019, having been one of the first organizations in our industry to have achieved this new ISO privacy certification in 2021, and the first Internet performance & security company to be certified to it. And now Cloudflare has now been certified to a second international privacy standard related to the processing of personal data — ISO/IEC 27018:2019.¹

ISO 27018 is a privacy extension to the widespread industry standards ISO/IEC 27001 and ISO/IEC 27002, which describe how to establish and run an Information Security Management System. ISO 27018 extends the standards into a code of practice for how any personal information should be protected when processed in a public cloud, such as Cloudflare’s.

What does ISO 27018 mean for Cloudflare customers?

Put simply, with Cloudflare’s certifications to both ISO 27701 and ISO 27018, customers can be assured that Cloudflare both has a privacy program that meets GDPR-aligned industry standards and also that Cloudflare protects the personal data processed in our network as part of that privacy program.

These certifications, in addition to the Data Processing Addendum (“DPA”) we make available to our customers, offer our customers multiple layers of assurance that any personal data that Cloudflare processes on their behalf will be handled in a way that meets the GDPR’s requirements.

The ISO 271018 standard contains enhancements to existing ISO 27002 controls and an additional set of 25 controls identified for organizations that are personal data processors. Controls are essentially a set of best practices that processors must meet in terms of data handling practices and transparency about those practices, protecting and encrypting the personal data processed, and handling data subject rights, among others. As an example, one of the ISO 27018 requirements is:

Where the organization is contracted to process personal data, that personal data may not be used for the purpose of marketing and advertising without establishing that prior consent was obtained from the appropriate data subject. Such consent shall not be a condition for receiving the service.

When Cloudflare acts as a data processor for our customers’ data, that data (and any personal data it may contain) belongs to our customers, not to us. Cloudflare does not track our customers’ end users for marketing or advertising purposes, and we never will. We even went beyond what the ISO control required and added this commitment to our customer DPA:

“… Cloudflare shall not use the Personal Data for the purposes of marketing or advertising…”
– 3.1(b), Cloudflare Data Protection Addendum

Cloudflare achieves ISO 27018:2019 Certification

For ISO 27018, Cloudflare was assessed by a third-party auditor, Schellman, between December 2021 and February 2022. Certifying to an ISO privacy standard is a multi-step process that includes an internal and an external audit, before finally being certified against the standard by the independent auditor. Cloudflare’s new single joint certificate, covering ISO 27001:2013, ISO 27018:2019, and ISO 27701:2019 is now available to download from the Cloudflare Dashboard.

C5:2020 – Cloud Computing Compliance Criteria Catalog

ISO 27108 isn’t all we’re announcing: as we blogged in February, Cloudflare has also been undergoing a separate independent audit for the Cloud Computing Compliance Criteria Catalog certification — also known as C5 — which was introduced by the German government’s Federal Office for Information Security (“BSI”) in 2016 and updated in 2020. C5 evaluates an organization’s security program against a standard of robust cloud security controls. Both German government agencies and private companies place a high level of importance on aligning their cloud computing requirements with these standards. Learn more about C5 here.

Today, we’re excited to announce that we have completed our independent audit and received our C5 attestation from our third-party auditors. The C5 attestation report is now available to download from the Cloudflare Dashboard.

And we’re not done yet…

When the European Union’s benchmark-setting General Data Protection Regulation (“GDPR”) was adopted four years ago this week, Article 40 encouraged:

“…the drawing up of codes of conduct intended to contribute to the proper application of this Regulation, taking account of the specific features of the various processing sectors and the specific needs of micro, small and medium-sized enterprises.”

The first code officially approved as GDPR-compliant by the EU one year ago this past weekend is ‘The EU Cloud Code of Conduct’. This code is designed to help cloud service providers demonstrate the protections they provide for the personal data they process on behalf of their customers. It covers all cloud service layers, and its compliance is overseen by accredited monitoring body SCOPE Europe. Initially, cloud service providers join as members of the code’s General Assembly, and then the second step is to undergo an audit to validate their adherence to the code.

Today, we are pleased to announce today that Cloudflare has joined the General Assembly of the EU Cloud Code of Conduct. We look forward to the second stage in this process, undertaking our audit and publicly affirming our compliance to the GDPR as a processor of personal data.

Cloudflare Certifications

Customers may now download a copy of Cloudflare’s certifications and reports from the Cloudflare Dashboard; new customers may request these from your sales representative. For the latest information about our certifications and reports, please visit our Trust Hub.

…
¹The International Organization for Standardization (“ISO”) is an international, nongovernmental organization made up of national standards bodies that develops and publishes a wide range of proprietary, industrial, and commercial standards.

New IDC whitepaper released – Trusted Cloud: Overcoming the Tension Between Data Sovereignty and Accelerated Digital Transformation

2022-04-28 Marta Taggart

Post Syndicated from Marta Taggart original https://aws.amazon.com/blogs/security/new-idc-whitepaper-released-trusted-cloud-overcoming-the-tension-between-data-sovereignty-and-accelerated-digital-transformation/

A new International Data Corporation (IDC) whitepaper sponsored by AWS, Trusted Cloud: Overcoming the Tension Between Data Sovereignty and Accelerated Digital Transformation, examines the importance of the cloud in building the future of digital EU organizations. IDC predicts that 70% of CEOs of large European organizations will be incentivized to generate at least 40% of their revenues from digital by 2025, which means they have to accelerate their digital transformation. In a 2022 IDC survey of CEOs across Europe, 46% of European CEOs will accelerate the shift to cloud as their most strategic IT initiative in 2022.

In the whitepaper, IDC offers perspectives on how operational effectiveness, digital investment, and ultimately business growth need to be balanced with data sovereignty requirements. IDC defines data sovereignty as “a subset of digital sovereignty. It is the concept of data being subject to the laws and governance structures within the country it is collected or pertains to.”

IDC provides a perspective on some of the current discourse on cloud data sovereignty, including extraterritorial reach of foreign intelligence under national security laws, and the level of protection for individuals’ privacy in-country or with cross-border data transfer. The Schrems II decision and its implications with respect to personal data transfers between the EU and US has left many organizations grappling with how to comply with their legal requirements when transferring data outside the EU.

IDC provides the following background on controls in the cloud:

Cloud providers do not have unrestricted access to customer data in the cloud. Organizations retain all ownership and control of their data. Through credential and permission settings, the customer is the controller of who has access to their data.
Cloud providers use a rigorous set of organizational and technical controls based on least privilege to protect data from unauthorized access and inappropriate use.
Most cloud service operations, including maintenance and trouble-shooting, are fully automated. Should human access to customer data be required, it is temporary and limited to what is necessary to provide the contracted service to the customer. All access should be strictly logged, monitored, and audited to verify that activity is valid and compliant.
Technical controls such as encryption and key management assume greater importance. Encryption is considered fundamental to data protection best practices and highly recommended by regulators. Encrypted data processed in memory within hardware-based trusted execution environment (TEEs), also known as enclaves, can alleviate these regulatory concerns by rendering sensitive information invisible to host operating systems and cloud providers. The AWS Nitro System, the underlying platform that runs Amazon EC2 instances, is an industry example that provides such protection capability.
Independent accreditation against official standards are a recognized basis for assessing adherence to privacy and security practices. Approved by the European Data Protection Board, the EU Cloud Code of Conduct and CISPE’s Code of Conduct for Cloud Infrastructure Service Providers provide an accountability framework to help demonstrate compliance with processor obligations under GDPR Article 28. Whilst not required for GDPR compliance, CISPE requires accredited cloud providers to offer customers the option to retain all personal data in their customer content in the European Economic Area (EEA).
Greater data control and security is often cited as a driver to hosting data in-country. However, IDC notes that the physical location of the data has no bearing on mitigating data risk to cyber threats. Data residency can run counter to an organization’s objectives for security and resilience. More and more European organizations now are trusting the cloud for their security needs, as many organizations simply do not have the resource and expertise to provide the same security benefits as large cloud providers can.

For more information about how to translate your data sovereignty requirements into an actionable business and IT strategy, read the full IDC whitepaper Trusted Cloud: Overcoming the Tension Between Data Sovereignty and Accelerated Digital Transformation. You can also read more about AWS commitments to protect EU customers’ data on our EU data protection webpage.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

LGPD workbook for AWS customers managing personally identifiable information in Brazil

2022-04-26 Rodrigo Fiuza

Post Syndicated from Rodrigo Fiuza original https://aws.amazon.com/blogs/security/lgpd-workbook-for-aws-customers-managing-personally-identifiable-information-in-brazil/

Portuguese version

AWS is pleased to announce the publication of the Brazil General Data Protection Law Workbook.

The General Data Protection Law (LGPD) in Brazil was first published on 14 August 2018, and started its applicability on 18 August 2020. Companies that manage personally identifiable information (PII) in Brazil as defined by LGPD will have to comply with and attend to the law.

To better help customers prepare and implement controls that focus on LGPD Chapter VII Security and Best Practices, AWS created a workbook based on industry best practices, AWS service offerings, and controls.

Amongst other topics, this workbook covers information security and AWS controls from:

CIS Controls v8 – framework covering 18 domain controls
NIST Cybersecurity Framework (CSF) – additional NIST CSF information available to consult
NIST Privacy Framework – check additional information on mapping between AWS CAF and NIST Privacy Framework
AWS Cloud Adoption Framework (AWS CAF) – AWS CAF 3.0 now available
AWS Well-Architected Framework – read through correlation between AWS Well-Architected Framework, AWS CAF and NIST CSF

In combination with Brazil General Data Protection Law Workbook, customers can use the detailed Navigating LGPD Compliance on AWS whitepaper.

AWS adheres to a shared responsibility model. Customers will have to observe which services offer privacy features and determine their applicability to their specific compliance requirements. Further information about data privacy at AWS can be found at our Data Privacy Center. Specific information about LGPD and data privacy at AWS in Brazil can be found on our Brazil Data Privacy page.

To learn more about our compliance and security programs, see AWS Compliance Programs. As always, we value your feedback and questions; reach out to the AWS Compliance team through the Contact Us page.

If you have feedback about this post, submit comments in the Comments section below.
Want more AWS Security news? Follow us on Twitter.

Portuguese

Workbook da LGPD para Clientes AWS que gerenciam Informações de Identificação Pessoal no Brasil

A AWS tem o prazer de anunciar a publicação do Workbook Lei Geral de Proteção de Dados do Brasil.

A Lei Geral de Proteção de Dados (LGPD) teve sua primeira publicação em 14 de agosto de 2018 no Brasil e iniciou sua aplicabilidade em 18 de agosto de 2020. Empresas que gerenciam informações pessoais identificáveis (PII) conforme definido na LGPD terão que cumprir e atender às cláusulas da lei.

Para ajudar melhor os clientes a preparar e implementar controles que se concentram no Capítulo VII da LGPD “da Segurança e Boas Práticas”, a AWS criou uma pasta de trabalho com base nas melhores práticas do setor, ofertas de serviços e controles da AWS.

Entre outros tópicos, esta pasta de trabalho aborda a segurança da informação e os controles da AWS de:

CIS v8.0 – framework contendo 18 domínios de controles
NIST Cybersecurity Framework (CSF) – informações adicionais sobre NIST CSF disponíveis para consulta
NIST Privacy Framework (PF) – confira informações adicionais sobre mapeamento entre AWS CAF e NIST PF
AWS Cloud Adoption Framework (AWS CAF) – AWS CAF 3.0 agora disponível
AWS Well-Architected Framework – leia a correlação entre AWS Well-Architected Framework, AWS CAF e NIST CSF

Em combinação com o Workbook Lei Geral de Proteção de Dados do Brasil, os clientes podem usar o whitepaper detalhado Navegando na conformidade com a LGPD na AWS.

A AWS adere a um modelo de responsabilidade compartilhada. Clientes terão que observar quais serviços oferecem recursos de privacidade e determinar sua aplicabilidade aos seus requisitos específicos de compliance. Mais informações sobre a privacidade de dados na AWS podem ser encontradas em nosso Centro de Privacidade de Dados. Informações adicionais sobre LGPD e Privacidade de dados na AWS no Brasil podem ser encontradas em nossa página de Privacidade de Dados no Brasil.

Para saber mais sobre nossos programas de conformidade e segurança, consulte Programas de conformidade da AWS. Como sempre, valorizamos seus comentários e perguntas; entre em contato com a equipe de conformidade da AWS por meio da página Fale conosco.

Se você tiver feedback sobre esta postagem, envie comentários na seção Comentários abaixo.

Quer mais notícias sobre segurança da AWS? Siga-nos no Twitter.

AWS welcomes new Trans-Atlantic Data Privacy Framework

2022-04-25 Michael Punke

Post Syndicated from Michael Punke original https://aws.amazon.com/blogs/security/aws-welcomes-new-trans-atlantic-data-privacy-framework/

Amazon Web Services (AWS) welcomes the new Trans-Atlantic Data Privacy Framework (Data Privacy Framework) that was agreed to, in principle, between the European Union (EU) and the United States (US) last month. This announcement demonstrates the common will between the US and EU to strengthen privacy protections in trans-Atlantic data flows, and will supplement the safeguards AWS and other companies already offer today. AWS commits to undertaking certification in accordance with the Data Privacy Framework as it is adopted, and we look forward to our customers and their end users benefiting from the new safeguards.

The Data Privacy Framework, once finalized, will re-establish a mechanism for certified businesses to conduct trans-Atlantic data transfers between the US and EU. According to the announcement, the new Data Privacy Framework will address the concerns raised by the Court of Justice of the European Union (CJEU) when it invalidated the EU-US Privacy Shield in its Schrems II decision in uly 2020. The Data Privacy Framework will adopt new safeguards to ensure that US intelligence activities are limited to what is necessary and proportionate to protect national security, and also create a new redress system to address the complaints of EU citizens.

As one of the architects of the Trusted Cloud Principles (a cloud-industry initiative to help safeguard the interests of organizations and the basic rights of individuals using cloud), AWS fully supports improved rules and regulations that advance privacy and security protections for any organization that wants to use cloud technologies and maintain control of their data.

While organizations using AWS technology have been able to conduct trans-Atlantic data transfers even after Schrems II, the new Data Privacy Framework will ensure further clarity and agility for our customers in their data transfer assessments. This will help our customers unlock value in terms of growth, digital transformation, and global competitive advantage.

Organizations that want to trade with speed and agility to and from the European Economic Area (EEA) need certainty that their goals to innovate and invest in the best technology for growth is supported by international frameworks promoting privacy across borders. Once finalized, the new Data Privacy Framework, coupled with our continued commitment to privacy at AWS, will provide even more simplicity and confidence for customers who choose to transfer data to and from Europe when using AWS services.

More than ever, our collective security requires mutual trust across both sides of the Atlantic and beyond. We therefore look forward to participating in, and remain committed to, the finalization of the Data Privacy Framework. We also support efforts to build broad consensus around the appropriate balance between privacy and security in forums such as the OECD’s workstream on trusted government access to data held by the private sector.

About AWS privacy and security

AWS is committed to protecting customer data. We continue to help customers successfully meet evolving European laws and standards, and achieve the highest levels of security, privacy, and resilience. AWS already offers comprehensive technical, operational, and contractual measures to protect and transfer customer content outside of Europe in compliance with the General Data Protection Regulation (GDPR) and the Schrems II ruling. Customers can also choose to store their content in the European Union by selecting any one or more of our regions in France, Germany, Ireland, Italy, Sweden, and later in 2022, Spain, with the confidence that their data stays in the AWS Region they select. In addition, customers can use an advanced set of access, encryption, and logging features to maintain full control of their content.

Today, AWS customers can also transfer their data outside of the European Economic Area (EEA) by relying on the new Standard Contractual Clauses (SCCs) included in the AWS Data Processing Addendum (DPA), which is supplemented by our strengthened contractual commitments to protect customer data, such as challenging law enforcement requests that conflict with EU law.

We also have a wide variety of tools available to enhance the security of cross-border data transfers for customers with global services. For example, AWS CloudHSM and AWS Key Management Service (AWS KMS) allow customers to encrypt data in transit and at rest, and securely generate and manage control of encryption keys. By building on top of the AWS Nitro System, our answer to confidential computing, which includes the use of specialized hardware and associated firmware to protect customer code and data during processing from outside access, customers can further secure data during processing, and thereby enhance confidentiality and privacy.

AWS has achieved internationally recognized certifications and attestations that demonstrate compliance with rigorous international privacy and security standards, including the Cloud Infrastructure Services in Europe (CISPE) Data Protection Code of Conduct, Cloud Computing Compliance Controls Catalog (C5), ISO27018, and the Esquema National de Securidad (ENS, Spain).

As well as benefitting from these existing measures, our extensive online resources can help customers more easily complete data-transfer assessments and fulfill their GDPR compliance requirements, in accordance with the European Data Protection Board (EDPB) recommendations. This includes regular Information Request Reports showing requests to access data from governments and our responses.

Further information

Our technical paper Navigating Compliance with EU Data Transfer Requirements and AWS’s Privacy Features for AWS Services provide further information to help customers assess the right services for their individual needs.

If you have questions or need more information, visit our EU Data Protection page.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Enabling data classification for Amazon RDS database with Macie

2021-10-05 Bruno Silveira

Post Syndicated from Bruno Silveira original https://aws.amazon.com/blogs/security/enabling-data-classification-for-amazon-rds-database-with-amazon-macie/

Customers have been asking us about ways to use Amazon Macie data discovery on their Amazon Relational Database Service (Amazon RDS) instances. This post presents how to do so using AWS Database Migration Service (AWS DMS) to extract data from Amazon RDS, store it on Amazon Simple Storage Service (Amazon S3), and then classify the data using Macie. Macie’s resulting findings will also be made available to be queried with Amazon Athena by appropriate teams.

The challenge

Let’s suppose you need to find sensitive data in an RDS-hosted database using Macie, which currently only supports S3 as a data source. Therefore, you will need to extract and store the data from RDS in S3. In addition, you will need an interface for audit teams to audit these findings.

Solution overview

Figure 1: Solution architecture workflow

The architecture of the solution in Figure 1 can be described as:

A MySQL engine running on RDS is populated with the Sakila sample database.
A DMS task connects to the Sakila database, transforms the data into a set of Parquet compressed files, and loads them into the dcp-macie bucket.
A Macie classification job analyzes the objects in the dcp-macie bucket using a combination of techniques such as machine learning and pattern matching to determine whether the objects contain sensitive data and to generate detailed reports on the findings.
Amazon EventBridge routes the Macie findings reports events to Amazon Kinesis Data Firehose.
Kinesis Data Firehose stores these reports in the dcp-glue bucket.
S3 event notification triggers an AWS Lambda function whenever an object is created in the dcp-glue bucket.
The Lambda function named Start Glue Workflow starts a Glue Workflow.
Glue Workflow transforms the data from JSONL to Apache Parquet file format and places it in the dcp-athena bucket. This provides better performance during data query and optimized storage usage using a binary optimized columnar storage.
Athena is used to query and visualize data generated by Macie.

Note: For better readability, the S3 bucket nomenclature omits the suffix containing the AWS Region and AWS account ID used to meet the global uniqueness naming requirement (for example, dcp-athena-us-east-1-123456789012).

The Sakila database schema consists of the following tables:

actor
address
category
city
country
customer

Building the solution

Prerequisites

Before configuring the solution, the AWS Identity and Access Management (IAM) user must have appropriate access granted for the following services:

You can find an IAM policy with the required permissions here.

Step 1 – Deploying the CloudFormation template

You’ll use CloudFormation to provision quickly and consistently the AWS resources illustrated in Figure 1. Through a pre-built template file, it will create the infrastructure using an Infrastructure-as-Code (IaC) approach.

Download the CloudFormation template.
Go to the CloudFormation console.
Select the Stacks option in the left menu.
Select Create stack and choose With new resources (standard).
On Step 1 – Specify template, choose Upload a template file, select Choose file, and select the file template.yaml downloaded previously.
On Step 2 – Specify stack details, enter a name of your preference for Stack name. You might also adjust the parameters as needed, like the parameter CreateRDSServiceRole to create a service role for RDS if it does not exist in the current account.
On Step 3 – Configure stack options, select Next.
On Step 4 – Review, check the box for I acknowledge that AWS CloudFormation might create IAM resources with custom names, and then select Create Stack.
Wait for the stack to show status CREATE_COMPLETE.

Note: It is expected that provisioning will take around 10 minutes to complete.

Step 2 – Running an AWS DMS task

To extract the data from the Amazon RDS instance, you need to run an AWS DMS task. This makes the data available for Amazon Macie in an S3 bucket in Parquet format.

Go to the AWS DMS console.
In the left menu, select Database migration tasks.
Select the task Identifier named rdstos3task.
Select Actions.
Select the option Restart/Resume.

When the Status changes to Load Complete the task has finished and you will be able to see migrated data in your target bucket (dcp-macie).

Inside each folder you can see parquet file(s) with names similar to LOAD00000001.parquet. Now you can use Macie to discover if you have sensitive data in your database contents as exported to S3.

Step 3 – Running a classification job with Amazon Macie

Now you need to create a data classification job so you can assess the contents of your S3 bucket. The job you create will run once and evaluate the complete contents of your S3 bucket to determine whether it can identify PII among the data. As mentioned earlier, this job only uses the managed identifiers available with Macie – you could also add your own custom identifiers.

Go to the Macie console.
Select the S3 buckets option in the left menu.

Choose the S3 bucket dcp-macie containing the output data from the DMS task. You may need to wait a minute and select the Refresh icon if the bucket names do not display.

Select Create job.
Select Next to continue.
Create a job with the following scope.
1. Sensitive data Discovery options: One-time job
2. Sampling Depth: 100%
3. Leave all other settings with their default values
Select Next to continue.
Select Next again to skip past the Custom data identifiers section.
Give the job a name and description.
Select Next to continue.
Verify the details of the job you have created and select Submit to continue.

You will see a green banner stating that The Job was successfully created. The job can take up to 15 minutes to conclude and the Status will change from Active to Complete. To open the findings from the job, select the job’s check box, choose Show results, and select Show findings.

Figure 2: Macie Findings screen

Note: You can navigate in the findings and select each checkbox to see the details.

Step 4 – Enabling querying on classification job results with Amazon Athena

Go to the Athena console and open the Query editor.
If it’s your first-time using Athena you will see a message Before you run your first query, you need to set up a query result location in Amazon S3. Learn more. Select the link presented with this message.
In the Settings window, choose Select and then choose the bucket dcp-assets to store the Athena query results.
(Optional) To store the query results encrypted, check the box for Encrypt query results and select your preferred encryption type. To learn more about Amazon S3 encryption types, see Protecting data using encryption.
Select Save.

Step 5 – Query Amazon Macie results with Amazon Athena.

It might take a few minutes for the data to complete the flow between Amazon Macie and AWS Glue. After it’s finished, you’ll be able to see in Athena’s console the table dcp_athena within the database dcp.

Then, select the three dots next to the table dcp_athena and select the Preview table option to see a data preview, or run your own custom queries.

Figure 3: Athena table preview

As your environment grows, this blog post on Top 10 Performance Tuning Tips for Amazon Athena can help you apply partitioning of data and consolidate your data into larger files if needed.

Clean up

After you finish, to clean up the solution and avoid unnecessary expenses, complete the following steps:

Go to the Amazon S3 console.
Navigate to each of the buckets listed below and delete all its objects:
- dcp-assets
- dcp-athena
- dcp-glue
- dcp-macie
Go to the CloudFormation console.
Select the Stacks option in the left menu.
Choose the stack you created in Step 1 – Deploying the CloudFormation template.
Select Delete and then select Delete Stack in the pop-up window.

Conclusion

In this blog post, we show how you can find Personally Identifiable Information (PII), and other data defined as sensitive, in Macie’s Managed Data Identifiers in an RDS-hosted MySQL database. You can use this solution with other relational databases like PostgreSQL, SQL Server, or Oracle, whether hosted on RDS or EC2. If you’re using Amazon DynamoDB, you may also find useful the blog post Detecting sensitive data in DynamoDB with Macie.

By classifying your data, you can inform your management of appropriate data protection and handling controls for the use of that data.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

New Standard Contractual Clauses now part of the AWS GDPR Data Processing Addendum for customers

2021-09-17 Stéphane Ducable

Post Syndicated from Stéphane Ducable original https://aws.amazon.com/blogs/security/new-standard-contractual-clauses-now-part-of-the-aws-gdpr-data-processing-addendum-for-customers/

Today, we’re happy to announce an update to our online AWS GDPR Data Processing Addendum (AWS GDPR DPA) and our online Service Terms to include the new Standard Contractual Clauses (SCCs) that the European Commission (EC) adopted in June 2021. The EC-approved SCCs give our customers the ability to comply with the General Data Protection Regulation (GDPR) when they transfer personal data subject to GDPR to countries outside the European Economic Area (EEA) that haven’t received an EC adequacy decision (third countries). The new SCCs are now better adapted to how our customers operate their applications or run their workloads in the cloud, because they cover different transfer scenarios, and also provide enhanced safeguards for data transfers.

Achieving compliance with GDPR is critical for hundreds of thousands of AWS customers and AWS. The new SCCs allow all AWS customers that are controllers or processors under GDPR to continue to transfer personal data from their AWS accounts in compliance with GDPR. As part of the online Service Terms, the new SCCs will apply automatically whenever an AWS customer uses AWS services to transfer customer data to third countries.

The updated AWS GDPR DPA incorporating the new SCCs supplements our announcement in February 2021 of strengthened commitments to protect customer data, such as challenging law enforcement requests that conflict with EU law. We have also published the blog post How AWS is helping EU customers navigate the new normal for data protection, and the whitepaper Navigating Compliance with EU Data Transfer Requirements to help AWS customers conduct their data transfer assessments and comply with GDPR, the Schrems II ruling, and the recommendations issued by the European Data Protection Board. AWS is constantly working to ensure that our customers can enjoy the benefits of AWS everywhere they operate, and we welcome the new SCCs because they enable our customers to continue using AWS services in compliance with GDPR. If you have questions or need more information, visit our EU Data Protection page and our GDPR Center.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

How AWS is helping EU customers navigate the new normal for data protection

2021-07-15 Stephen Schmidt

Post Syndicated from Stephen Schmidt original https://aws.amazon.com/blogs/security/how-aws-is-helping-eu-customers-navigate-the-new-normal-for-data-protection/

Achieving compliance with the European Union’s data protection regulations is critical for hundreds of thousands of Amazon Web Services (AWS) customers. Many of them are subject to the EU’s General Data Protection Regulation (GDPR), which ensures individuals’ fundamental right to privacy and the protection of personal data. In February, we announced strengthened commitments to protect customer data, such as challenging law enforcement requests for customer data that conflict with EU law.

Today, we’re excited to announce that we’ve launched two new online resources to help customers more easily complete data transfer assessments and comply with the GDPR, taking into account the European Data Protection Board (EDPB) recommendations. These resources will also assist AWS customers in other countries to understand whether their use of AWS services involves a data transfer.

Using AWS’s new “Privacy Features for AWS Services,” customers can determine whether their use of an individual AWS service involves the transfer of customer data (the personal data they’ve uploaded to their AWS account). Knowing this information enables customers to choose the right action for their applications, such as opting out of the data transfer or creating an appropriate disclosure of the transfer for end user transparency.

We’re also providing additional information on the processing activities and locations of the limited number of sub-processors that AWS engages to provide services that involve the processing of customer data. AWS engages three types of sub-processors:

Local AWS entities that provide the AWS infrastructure.
AWS entities that process customer data for specific AWS services.
Third parties that AWS contracts with to provide processing activities for specific AWS services.

The enhanced information available on our updated Sub-processors page enables customers to assess if a sub-processor is relevant to their use of AWS services and AWS Regions.

These new resources make it easier for AWS customers to conduct their data transfer assessments as set out in the EDPB recommendations and, as a result, comply with GDPR. After completing their data transfer assessments, customers will also be able to determine whether they need to implement supplemental measures in line with the EDPB’s recommendations.

These resources support our ongoing commitment to giving customers control over where their data is stored, how it’s stored, and who has access to it.

Since we opened our first region in the EU in 2007, customers have been able to choose to store customer data with AWS in the EU. Today, customers can store their data in our AWS Regions in France, Germany, Ireland, Italy, and Sweden, and we’re adding Spain in 2022. AWS will never transfer data outside a customer’s selected AWS Region without the customer’s agreement.

AWS customers control how their data is stored, and we have a variety of tools at their disposal to enhance security. For example, AWS CloudHSM and AWS Key Management Service (AWS KMS) allow customers to encrypt data in transit and at rest and securely generate and manage encryption keys that they control.

Finally, our customers control who can access their data. We never use customer data for marketing or advertising purposes. We also prohibit, and our systems are designed to prevent, remote access by AWS personnel to customer data for any purpose, including service maintenance, unless requested by a customer, required to prevent fraud and abuse, or to comply with the law.

As previously mentioned, we challenge law enforcement requests for customer data from governmental bodies, whether inside or outside the EU, where the request conflicts with EU law, is overbroad, or we otherwise have any appropriate grounds to do so.

Earning customer trust is the foundation of our business at AWS, and we know protecting customer data is key to achieving this. We also know that helping customers protect their data in a world with constantly changing regulations, technology, and risks takes teamwork. We would never expect our customers to go it alone.

As we continue to enhance the capabilities customers have at their fingertips, they can be confident that choosing AWS will ensure they have the tools necessary to help them meet the most stringent security, privacy, and compliance requirements.

If you have questions or need more information, visit our EU Data Protection page.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Creating a notification workflow from sensitive data discover with Amazon Macie, Amazon EventBridge, AWS Lambda, and Slack

2021-06-10 Bruno Silviera

Post Syndicated from Bruno Silviera original https://aws.amazon.com/blogs/security/creating-a-notification-workflow-from-sensitive-data-discover-with-amazon-macie-amazon-eventbridge-aws-lambda-and-slack/

Following the example of the EU in implementing the General Data Protection Regulation (GDPR), many countries are implementing similar data protection laws. In response, many companies are forming teams that are responsible for data protection. Considering the volume of information that companies maintain, it’s essential that these teams are alerted when sensitive data is at risk.

This post shows how to deploy a solution that uses Amazon Macie to discover sensitive data. This solution enables you to set up automatic notification to your company’s designated data protection team via a Slack channel when sensitive data that needs to be protected is discovered by Amazon EventBridge and AWS Lambda.

The challenge

Let’s imagine that you’re part of a team that’s responsible for classifying your organization’s data but the data structure isn’t documented. Amazon Macie provides you the ability to run a scheduled classification job that examines your data, and you want to notify the data protection team when there’s new sensitive data to classify. Let’s build a solution to automatically notify the data protection team.

Solution overview

To be scalable and cost-effective, this solution uses serverless technologies and managed AWS services, including:

Macie – A fully managed data security and data privacy service that uses machine learning and pattern matching to discover and protect your sensitive data in Amazon Web Services (AWS).
EventBridge – A serverless event bus that connects application data from your apps, SaaS, and AWS services. EventBridge can respond to specific events or run according to a schedule. The solution presented in this post uses EventBridge to initiate a custom Lambda function in response to a specific event.
Lambda – Runs code in response to events such as changes in data, changes in application state, or user actions. In this solution, a Lambda function is initiated by EventBridge.

Solution architecture

The architecture workflow is shown in Figure 1 and includes the following steps:

Macie runs a classification job and publishes its findings to EventBridge as a JSON object.
The EventBridge rule captures the findings and invokes a Lambda function as a target.
The Lambda function parses the JSON object. The function then sends a custom message to a Slack channel with the sensitive data finding for the data protection team to evaluate and respond to.

Figure 1: Solution architecture workflow

Set up Slack

For this solution, you need a Slack workspace and an incoming webhook. The workspace must be in place before you create the webhook.

Create a Slack workspace

If you already have a Slack workspace in your environment, you can skip forward, to creating the webhook.

If you don’t have a Slack workspace, follow the steps in Create a Slack Workspace to create one.

Create an incoming webhook in Slack API

Go to your Slack API.
Choose Start Building to create an app.
Enter the following details for your app:
- App Name – macie-to-slack.
- Development Slack Workspace – Choose the Slack workspace—either an existing workspace or one you created for this solution—to receive the Macie findings.
Choose the Create App button.
In the left menu, choose Incoming Webhooks.
At the Activate Incoming Webhooks screen, move the slider from OFF to ON.
Scroll down and choose Add New Webhook to Workspace.
In the screen asking where your app should post, enter the name of the Slack channel from your Workspace that you want to send notification to and choose Authorize.
On the next screen, scroll down to the Webhook URL section. Make a note of the URL to use later.

Deploy the CloudFormation template with the solution

The deployment of the CloudFormation template automatically creates the following resources:

A Lambda function that begins with the name named macie-to-slack-lambdafindingsToSlack-.
An EventBridge rule named MacieFindingsToSlack.
An IAM role named MacieFindingsToSlackkRole.
A permission to invoke the Lambda function named LambdaInvokePermission.

Note: Before you proceed, make sure you’re deploying the template to the same Region that your production Macie is running.

To deploy the Cloudformation template

Download the YAML template to your computer.

Note: To save the template, you can right click the Raw button at the top of the code and then select Save link as if you’re using Chrome, or the equivalent in your browser. This file is used in Step 4.
Open CloudFormation in the AWS Management Console.
On the Welcome page, choose Create stack and then choose With new resources.
On Step 1 — Specify template, choose Upload a template file, select Choose file and then select the file template.yaml (the file extension might be .YML), then choose Next.
On Step 2 — Specify stack details:
1. Enter macie-to-slack as the Stack name.
2. At the Slack Incoming Web Hook URL, paste the webhook URL you copied earlier.
3. At Slack channel, enter the name of the channel in your workspace that will receive the alerts and choose Next.
Figure 2: Defining stack details
On Step 3 – Configure Stack options, you can leave the default settings, or change them for your environment. Choose Next to continue.
At the bottom of Step 4 – Review, select I acknowledge that AWS CloudFormation might create IAM resources, and choose Create stack.

Figure 3: Confirmation before stack creation
Wait for the stack to reach status CREATE_COMPLETE.

Running the solution

At this point, you’ve deployed the solution and your resources are created.

To test the solution, you can schedule a Macie job targeting a bucket that contains a file with sensitive information that Macie can detect.

Note: You can check the Amazon Macie documentation to see the list of supported managed data identifiers.

When the Macie job is complete, any findings are sent to the Slack channel.

Figure 4: Macie finding delivered to Slack channel

Select the link in the message sent to the Slack channel to open that finding in the Macie console, as shown in Figure 5.

Figure 5: Finding details

And you’re done!

Now your Macie finding results are delivered to your Slack channel where they can be easily monitored, reducing response time and risk exposure.

If you deployed this for testing purposes, or want to clean this up and move to your production account, you can delete the Cloudformation stack:

Open the CloudFormation console.
Select the stack and choose Delete.

Conclusion

In this blog post we walked through the steps to configure a notification workflow using Macie, Lambda, and EventBridge to send sensitive data findings to your data protection team via a Slack channel.

Your data protection team will appreciate the timely notifications of sensitive data findings, giving you the ability to focus on creating controls to improve data security and compliance with regulations related to protection and treatment of personal data.

For more information about data privacy on AWS, see Data Privacy FAQ.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

How to protect sensitive data for its entire lifecycle in AWS

2021-02-26 Raj Jain

Post Syndicated from Raj Jain original https://aws.amazon.com/blogs/security/how-to-protect-sensitive-data-for-its-entire-lifecycle-in-aws/

Many Amazon Web Services (AWS) customer workflows require ingesting sensitive and regulated data such as Payments Card Industry (PCI) data, personally identifiable information (PII), and protected health information (PHI). In this post, I’ll show you a method designed to protect sensitive data for its entire lifecycle in AWS. This method can help enhance your data security posture and be useful for fulfilling the data privacy regulatory requirements applicable to your organization for data protection at-rest, in-transit, and in-use.

An existing method for sensitive data protection in AWS is to use the field-level encryption feature offered by Amazon CloudFront. This CloudFront feature protects sensitive data fields in requests at the AWS network edge. The chosen fields are protected upon ingestion and remain protected throughout the entire application stack. The notion of protecting sensitive data early in its lifecycle in AWS is a highly desirable security architecture. However, CloudFront can protect a maximum of 10 fields and only within HTTP(S) POST requests that carry HTML form encoded payloads.

If your requirements exceed CloudFront’s native field-level encryption feature, such as a need to handle diverse application payload formats, different HTTP methods, and more than 10 sensitive fields, you can implement field-level encryption yourself using the Lambda@Edge feature in CloudFront. In terms of choosing an appropriate encryption scheme, this problem calls for an asymmetric cryptographic system that will allow public keys to be openly distributed to the CloudFront network edges while keeping the corresponding private keys stored securely within the network core. One such popular asymmetric cryptographic system is RSA. Accordingly, we’ll implement a Lambda@Edge function that uses asymmetric encryption using the RSA cryptosystem to protect an arbitrary number of fields in any HTTP(S) request. We will discuss the solution using an example JSON payload, although this approach can be applied to any payload format.

A complex part of any encryption solution is key management. To address that, I use AWS Key Management Service (AWS KMS). AWS KMS simplifies the solution and offers improved security posture and operational benefits, detailed later.

Solution overview

You can protect data in-transit over individual communications channels using transport layer security (TLS), and at-rest in individual storage silos using volume encryption, object encryption or database table encryption. However, if you have sensitive workloads, you might need additional protection that can follow the data as it moves through the application stack. Fine-grained data protection techniques such as field-level encryption allow for the protection of sensitive data fields in larger application payloads while leaving non-sensitive fields in plaintext. This approach lets an application perform business functions on non-sensitive fields without the overhead of encryption, and allows fine-grained control over what fields can be accessed by what parts of the application.

A best practice for protecting sensitive data is to reduce its exposure in the clear throughout its lifecycle. This means protecting data as early as possible on ingestion and ensuring that only authorized users and applications can access the data only when and as needed. CloudFront, when combined with the flexibility provided by Lambda@Edge, provides an appropriate environment at the edge of the AWS network to protect sensitive data upon ingestion in AWS.

Since the downstream systems don’t have access to sensitive data, data exposure is reduced, which helps to minimize your compliance footprint for auditing purposes.

The number of sensitive data elements that may need field-level encryption depends on your requirements. For example:

For healthcare applications, HIPAA regulates 18 personal data elements.
In California, the California Consumer Privacy Act (CCPA) regulates at least 11 categories of personal information—each with its own set of data elements.

The idea behind field-level encryption is to protect sensitive data fields individually, while retaining the structure of the application payload. The alternative is full payload encryption, where the entire application payload is encrypted as a binary blob, which makes it unusable until the entirety of it is decrypted. With field-level encryption, the non-sensitive data left in plaintext remains usable for ordinary business functions. When retrofitting data protection in existing applications, this approach can reduce the risk of application malfunction since the data format is maintained.

The following figure shows how PII data fields in a JSON construction that are deemed sensitive by an application can be transformed from plaintext to ciphertext with a field-level encryption mechanism.

Figure 1: Example of field-level encryption

You can change plaintext to ciphertext as depicted in Figure 1 by using a Lambda@Edge function to perform field-level encryption. I discuss the encryption and decryption processes separately in the following sections.

Field-level encryption process

Let’s discuss the individual steps involved in the encryption process as shown in Figure 2.

Figure 2: Field-level encryption process

Figure 2 shows CloudFront invoking a Lambda@Edge function while processing a client request. CloudFront offers multiple integration points for invoking Lambda@Edge functions. Since you are processing a client request and your encryption behavior is related to requests being forwarded to an origin server, you want your function to run upon the origin request event in CloudFront. The origin request event represents an internal state transition in CloudFront that happens immediately before CloudFront forwards a request to the downstream origin server.

You can associate your Lambda@Edge with CloudFront as described in Adding Triggers by Using the CloudFront Console. A screenshot of the CloudFront console is shown in Figure 3. The selected event type is Origin Request and the Include Body check box is selected so that the request body is conveyed to Lambda@Edge.

Figure 3: Configuration of Lambda@Edge in CloudFront

The Lambda@Edge function acts as a programmable hook in the CloudFront request processing flow. You can use the function to replace the incoming request body with a request body with the sensitive data fields encrypted.

The process includes the following steps:

Step 1 – RSA key generation and inclusion in Lambda@Edge

You can generate an RSA customer managed key (CMK) in AWS KMS as described in Creating asymmetric CMKs. This is done at system configuration time.

Note: You can use your existing RSA key pairs or generate new ones externally by using OpenSSL commands, especially if you need to perform RSA decryption and key management independently of AWS KMS. Your choice won’t affect the fundamental encryption design pattern presented here.

The RSA key creation in AWS KMS requires two inputs: key length and type of usage. In this example, I created a 2048-bit key and assigned its use for encryption and decryption. The cryptographic configuration of an RSA CMK created in AWS KMS is shown in Figure 4.

Figure 4: Cryptographic properties of an RSA key managed by AWS KMS

Of the two encryption algorithms shown in Figure 4— RSAES_OAEP_SHA_256 and RSAES_OAEP_SHA_1, this example uses RSAES_OAEP_SHA_256. The combination of a 2048-bit key and the RSAES_OAEP_SHA_256 algorithm lets you encrypt a maximum of 190 bytes of data, which is enough for most PII fields. You can choose a different key length and encryption algorithm depending on your security and performance requirements. How to choose your CMK configuration includes information about RSA key specs for encryption and decryption.

Using AWS KMS for RSA key management versus managing the keys yourself eliminates that complexity and can help you:

Enforce IAM and key policies that describe administrative and usage permissions for keys.
Manage cross-account access for keys.
Monitor and alarm on key operations through Amazon CloudWatch.
Audit AWS KMS API invocations through AWS CloudTrail.
Record configuration changes to keys and enforce key specification compliance through AWS Config.
Generate high-entropy keys in an AWS KMS hardware security module (HSM) as required by NIST.
Store RSA private keys securely, without the ability to export.
Perform RSA decryption within AWS KMS without exposing private keys to application code.
Categorize and report on keys with key tags for cost allocation.
Disable keys and schedule their deletion.

You need to extract the RSA public key from AWS KMS so you can include it in the AWS Lambda deployment package. You can do this from the AWS Management Console, through the AWS KMS SDK, or by using the get-public-key command in the AWS Command Line Interface (AWS CLI). Figure 5 shows Copy and Download options for a public key in the Public key tab of the AWS KMS console.

Figure 5: RSA public key available for copy or download in the console

Note: As we will see in the sample code in step 3, we embed the public key in the Lambda@Edge deployment package. This is a permissible practice because public keys in asymmetric cryptography systems aren’t a secret and can be freely distributed to entities that need to perform encryption. Alternatively, you can use Lambda@Edge to query AWS KMS for the public key at runtime. However, this introduces latency, increases the load against your KMS account quota, and increases your AWS costs. General patterns for using external data in Lambda@Edge are described in Leveraging external data in Lambda@Edge.

Step 2 – HTTP API request handling by CloudFront

CloudFront receives an HTTP(S) request from a client. CloudFront then invokes Lambda@Edge during origin-request processing and includes the HTTP request body in the invocation.

Step 3 – Lambda@Edge processing

The Lambda@Edge function processes the HTTP request body. The function extracts sensitive data fields and performs RSA encryption over their values.

The following code is sample source code for the Lambda@Edge function implemented in Python 3.7:

import Crypto
import base64
import json
from Crypto.Cipher import PKCS1_OAEP
from Crypto.PublicKey import RSA

# PEM-formatted RSA public key copied over from AWS KMS or your own public key.
RSA_PUBLIC_KEY = "-----BEGIN PUBLIC KEY-----<your key>-----END PUBLIC KEY-----"
RSA_PUBLIC_KEY_OBJ = RSA.importKey(RSA_PUBLIC_KEY)
RSA_CIPHER_OBJ = PKCS1_OAEP.new(RSA_PUBLIC_KEY_OBJ, Crypto.Hash.SHA256)

# Example sensitive data field names in a JSON object. 
PII_SENSITIVE_FIELD_NAMES = ["fname", "lname", "email", "ssn", "dob", "phone"]

CIPHERTEXT_PREFIX = "#01#"
CIPHERTEXT_SUFFIX = "#10#"

def lambda_handler(event, context):
    # Extract HTTP request and its body as per documentation:
    # https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/lambda-event-structure.html
    http_request = event['Records'][0]['cf']['request']
    body = http_request['body']
    org_body = base64.b64decode(body['data'])
    mod_body = protect_sensitive_fields_json(org_body)
    body['action'] = 'replace'
    body['encoding'] = 'text'
    body['data'] = mod_body
    return http_request


def protect_sensitive_fields_json(body):
    # Encrypts sensitive fields in sample JSON payload shown earlier in this post.
    # [{"fname": "Alejandro", "lname": "Rosalez", … }]
    person_list = json.loads(body.decode("utf-8"))
    for person_data in person_list:
        for field_name in PII_SENSITIVE_FIELD_NAMES:
            if field_name not in person_data:
                continue
            plaintext = person_data[field_name]
            ciphertext = RSA_CIPHER_OBJ.encrypt(bytes(plaintext, 'utf-8'))
            ciphertext_b64 = base64.b64encode(ciphertext).decode()
            # Optionally, add unique prefix/suffix patterns to ciphertext
            person_data[field_name] = CIPHERTEXT_PREFIX + ciphertext_b64 + CIPHERTEXT_SUFFIX 
    return json.dumps(person_list)

The event structure passed into the Lambda@Edge function is described in Lambda@Edge Event Structure. Following the event structure, you can extract the HTTP request body. In this example, the assumption is that the HTTP payload carries a JSON document based on a particular schema defined as part of the API contract. The input JSON document is parsed by the function, converting it into a Python dictionary. The Python native dictionary operators are then used to extract the sensitive field values.

Note: If you don’t know your API payload structure ahead of time or you’re dealing with unstructured payloads, you can use techniques such as regular expression pattern searches and checksums to look for patterns of sensitive data and target them accordingly. For example, credit card primary account numbers include a Luhn checksum that can be programmatically detected. Additionally, services such as Amazon Comprehend and Amazon Macie can be leveraged for detecting sensitive data such as PII in application payloads.

While iterating over the sensitive fields, individual field values are encrypted using the standard RSA encryption implementation available in the Python Cryptography Toolkit (PyCrypto). The PyCrypto module is included within the Lambda@Edge zip archive as described in Lambda@Edge deployment package.

The example uses the standard optimal asymmetric encryption padding (OAEP) and SHA-256 encryption algorithm properties. These properties are supported by AWS KMS and will allow RSA ciphertext produced here to be decrypted by AWS KMS later.

Note: You may have noticed in the code above that we’re bracketing the ciphertexts with predefined prefix and suffix strings:

person_data[field_name] = CIPHERTEXT_PREFIX + ciphertext_b64 + CIPHERTEXT_SUFFIX

This is an optional measure and is being implemented to simplify the decryption process.

The prefix and suffix strings help demarcate ciphertext embedded in unstructured data in downstream processing and also act as embedded metadata. Unique prefix and suffix strings allow you to extract ciphertext through string or regular expression (regex) searches during the decryption process without having to know the data body format or schema, or the field names that were encrypted.

Distinct strings can also serve as indirect identifiers of RSA key pair identifiers. This can enable key rotation and allow separate keys to be used for separate fields depending on the data security requirements for individual fields.

You can ensure that the prefix and suffix strings can’t collide with the ciphertext by bracketing them with characters that don’t appear in cyphertext. For example, a hash (#) character cannot be part of a base64 encoded ciphertext string.

Deploying a Lambda function as a Lambda@Edge function requires specific IAM permissions and an IAM execution role. Follow the Lambda@Edge deployment instructions in Setting IAM Permissions and Roles for Lambda@Edge.

Step 4 – Lambda@Edge response

The Lambda@Edge function returns the modified HTTP body back to CloudFront and instructs it to replace the original HTTP body with the modified one by setting the following flag:

http_request['body']['action'] = 'replace'

Step 5 – Forward the request to the origin server

CloudFront forwards the modified request body provided by Lambda@Edge to the origin server. In this example, the origin server writes the data body to persistent storage for later processing.

Field-level decryption process

An application that’s authorized to access sensitive data for a business function can decrypt that data. An example decryption process is shown in Figure 6. The figure shows a Lambda function as an example compute environment for invoking AWS KMS for decryption. This functionality isn’t dependent on Lambda and can be performed in any compute environment that has access to AWS KMS.

Figure 6: Field-level decryption process

The steps of the process shown in Figure 6 are described below.

Step 1 – Application retrieves the field-level encrypted data

The example application retrieves the field-level encrypted data from persistent storage that had been previously written during the data ingestion process.

Step 2 – Application invokes the decryption Lambda function

The application invokes a Lambda function responsible for performing field-level decryption, sending the retrieved data to Lambda.

Step 3 – Lambda calls the AWS KMS decryption API

The Lambda function uses AWS KMS for RSA decryption. The example calls the KMS decryption API that inputs ciphertext and returns plaintext. The actual decryption happens in KMS; the RSA private key is never exposed to the application, which is a highly desirable characteristic for building secure applications.

Note: If you choose to use an external key pair, then you can securely store the RSA private key in AWS services like AWS Systems Manager Parameter Store or AWS Secrets Manager and control access to the key through IAM and resource policies. You can fetch the key from relevant vault using the vault’s API, then decrypt using the standard RSA implementation available in your programming language. For example, the cryptography toolkit in Python or javax.crypto in Java.

The Lambda function Python code for decryption is shown below.

import base64
import boto3
import re

kms_client = boto3.client('kms')
CIPHERTEXT_PREFIX = "#01#"
CIPHERTEXT_SUFFIX = "#10#"

# This lambda function extracts event body, searches for and decrypts ciphertext 
# fields surrounded by provided prefix and suffix strings in arbitrary text bodies 
# and substitutes plaintext fields in-place.  
def lambda_handler(event, context):    
    org_data = event["body"]
    mod_data = unprotect_fields(org_data, CIPHERTEXT_PREFIX, CIPHERTEXT_SUFFIX)
    return mod_data

# Helper function that performs non-greedy regex search for ciphertext strings on
# input data and performs RSA decryption of them using AWS KMS 
def unprotect_fields(org_data, prefix, suffix):
    regex_pattern = prefix + "(.*?)" + suffix
    mod_data_parts = []
    cursor = 0

    # Search ciphertexts iteratively using python regular expression module
    for match in re.finditer(regex_pattern, org_data):
        mod_data_parts.append(org_data[cursor: match.start()])
        try:
            # Ciphertext was stored as Base64 encoded in our example. Decode it.
            ciphertext = base64.b64decode(match.group(1))

            # Decrypt ciphertext using AWS KMS  
            decrypt_rsp = kms_client.decrypt(
                EncryptionAlgorithm="RSAES_OAEP_SHA_256",
                KeyId="<Your-Key-ID>",
                CiphertextBlob=ciphertext)
            decrypted_val = decrypt_rsp["Plaintext"].decode("utf-8")
            mod_data_parts.append(decrypted_val)
        except Exception as e:
            print ("Exception: " + str(e))
            return None
        cursor = match.end()

    mod_data_parts.append(org_data[cursor:])
    return "".join(mod_data_parts)

The function performs a regular expression search in the input data body looking for ciphertext strings bracketed in predefined prefix and suffix strings that were added during encryption.

While iterating over ciphertext strings one-by-one, the function calls the AWS KMS decrypt() API. The example function uses the same RSA encryption algorithm properties—OAEP and SHA-256—and the Key ID of the public key that was used during encryption in Lambda@Edge.

Note that the Key ID itself is not a secret. Any application can be configured with it, but that doesn’t mean any application will be able to perform decryption. The security control here is that the AWS KMS key policy must allow the caller to use the Key ID to perform the decryption. An additional security control is provided by Lambda execution role that should allow calling the KMS decrypt() API.

Step 4 – AWS KMS decrypts ciphertext and returns plaintext

To ensure that only authorized users can perform decrypt operation, the KMS is configured as described in Using key policies in AWS KMS. In addition, the Lambda IAM execution role is configured as described in AWS Lambda execution role to allow it to access KMS. If both the key policy and IAM policy conditions are met, KMS returns the decrypted plaintext. Lambda substitutes the plaintext in place of ciphertext in the encapsulating data body.

Steps three and four are repeated for each ciphertext string.

Step 5 – Lambda returns decrypted data body

Once all the ciphertext has been converted to plaintext and substituted in the larger data body, the Lambda function returns the modified data body to the client application.

Conclusion

In this post, I demonstrated how you can implement field-level encryption integrated with AWS KMS to help protect sensitive data workloads for their entire lifecycle in AWS. Since your Lambda@Edge is designed to protect data at the network edge, data remains protected throughout the application execution stack. In addition to improving your data security posture, this protection can help you comply with data privacy regulations applicable to your organization.

Since you author your own Lambda@Edge function to perform standard RSA encryption, you have flexibility in terms of payload formats and the number of fields that you consider to be sensitive. The integration with AWS KMS for RSA key management and decryption provides significant simplicity, higher key security, and rich integration with other AWS security services enabling an overall strong security solution.

By using encrypted fields with identifiers as described in this post, you can create fine-grained controls for data accessibility to meet the security principle of least privilege. Instead of granting either complete access or no access to data fields, you can ensure least privileges where a given part of an application can only access the fields that it needs, when it needs to, all the way down to controlling access field by field. Field by field access can be enabled by using different keys for different fields and controlling their respective policies.

In addition to protecting sensitive data workloads to meet regulatory and security best practices, this solution can be used to build de-identified data lakes in AWS. Sensitive data fields remain protected throughout their lifecycle, while non-sensitive data fields remain in the clear. This approach can allow analytics or other business functions to operate on data without exposing sensitive data.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

AWS and EU data transfers: strengthened commitments to protect customer data

2021-02-18 Stephen Schmidt

Post Syndicated from Stephen Schmidt original https://aws.amazon.com/blogs/security/aws-and-eu-data-transfers-strengthened-commitments-to-protect-customer-data/

Last year we published a blog post describing how our customers can transfer personal data in compliance with both GDPR and the new “Schrems II” ruling. In that post, we set out some of the robust and comprehensive measures that AWS takes to protect customers’ personal data.

Today, we are announcing strengthened contractual commitments that go beyond what’s required by the Schrems II ruling and currently provided by other cloud providers to protect the personal data that customers entrust AWS to process (customer data). Significantly, these new commitments apply to all customer data subject to GDPR processed by AWS, whether it is transferred outside the European Economic Area (EEA) or not. These commitments are automatically available to all customers using AWS to process their customer data, with no additional action required, through a new supplementary addendum to the AWS GDPR Data Processing Addendum.

Our strengthened contractual commitments include:

Challenging law enforcement requests: We will challenge law enforcement requests for customer data from governmental bodies, whether inside or outside the EEA, where the request conflicts with EU law, is overbroad, or where we otherwise have any appropriate grounds to do so.
Disclosing the minimum amount necessary: We also commit that if, despite our challenges, we are ever compelled by a valid and binding legal request to disclose customer data, we will disclose only the minimum amount of customer data necessary to satisfy the request.

These strengthened commitments to our customers build on our long track record of challenging law enforcement requests. AWS rigorously limits – or rejects outright – law enforcement requests for data coming from any country, including the United States, where they are overly broad or we have any appropriate grounds to do so.

These commitments further demonstrate AWS’s dedication to securing our customers’ data: it is AWS’s highest priority. We implement rigorous contractual, technical, and organizational measures to protect the confidentiality, integrity, and availability of customer data regardless of which AWS Region a customer selects. Customers have complete control over their data through powerful AWS services and tools that allow them to determine where data will be stored, how it is secured, and who has access.

For example, customers using our latest generation of EC2 instances automatically gain the protection of the AWS Nitro System. Using purpose-built hardware, firmware, and software, AWS Nitro provides unique and industry-leading security and isolation by offloading the virtualization of storage, security, and networking resources to dedicated hardware and software. This enhances security by minimizing the attack surface and prohibiting administrative access while improving performance. Nitro was designed to operate in the most hostile network we could imagine, building in encryption, secure boot, a hardware-based root of trust, a decreased Trusted Computing Base (TCB) and restrictions on operator access. The newly announced AWS Nitro Enclaves feature enables customers to create isolated compute environments with cryptographic controls to assure the integrity of code that is processing highly sensitive data.

AWS encrypts all data in transit, including secure and private connectivity between EC2 instances of all types. Customers can rely on our industry leading encryption features and take advantage of AWS Key Management Services to control and manage their own keys within FIPS-140-2 certified hardware security modules. Regardless of whether data is encrypted or unencrypted, we will always work vigilantly to protect data from any unauthorized access. Find out more about our approach to data privacy.

AWS is constantly working to ensure that our customers can enjoy the benefits of AWS everywhere they operate. We will continue to update our practices to meet the evolving needs and expectations of customers and regulators, and fully comply with all applicable laws in every country in which we operate. With these changes, AWS continues our customer obsession by offering tooling, capabilities, and contractual rights that nobody else does.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Table of Contents

Scope and Applicability

Objectives

Key Requirements

Sanctions and Penalties

Implementation Deadline Date

Key Differences

Overlapping Areas

Incident Response and Recovery

Technical and Organizational Measures

Compliance Intersections and Synergies

Conclusion

Table of Contents

What is GDPR?

Who is Affected?

Key Requirements of GDPR

Rights of Data Subjects

Additional Requirements:

Solution overview

PySpark code used in batch and streaming jobs

Prerequisites

Batch deployment steps

Test the batch solution

Clean up batch resources

Streaming deployment steps

Test the streaming solution

Clean up streaming resources

Performance details

Batch consumption pattern performance

Streaming consumption pattern performance

Conclusion

About the Authors

Location should not be a proxy for privacy

Applying data transfer rules to IP addresses could lead to balkanization of the Internet

Keeping privacy in focus

It matters how personal information is handled – not just where in the world it is saved

An official GDPR Code of Conduct

What the code covers

Let us do a deeper dive into some of the requirements of the EU Cloud CoC and how it can demonstrate compliance with the GDPR

Why this matters to Cloudflare customers

Cloudflare services that are in scope for EU Cloud Code of Conduct:

And we’re not done yet…

Cloudflare certifications

Data privacy and data protection basics

Anonymization vs. pseudonymization

Solution overview

Demystifying the pseudonymization service

Prerequisites

Deploy the solution resources with AWS CloudFormation

Generate encryption keys and persist them in Secrets Manager

Test the solution

Cost and performance

Clean up

Conclusion

About the authors

ISO/IEC 27018:2019 – Cloud Services Certification

C5:2020 – Cloud Computing Compliance Criteria Catalog

And we’re not done yet…

Cloudflare Certifications

Portuguese

Workbook da LGPD para Clientes AWS que gerenciam Informações de Identificação Pessoal no Brasil

About AWS privacy and security

Further information

The challenge

Solution overview

Building the solution

Prerequisites

Step 1 – Deploying the CloudFormation template

Step 2 – Running an AWS DMS task

Step 3 – Running a classification job with Amazon Macie

Step 4 – Enabling querying on classification job results with Amazon Athena

Step 5 – Query Amazon Macie results with Amazon Athena.

Clean up

Conclusion

The challenge

Solution overview

Solution architecture

Set up Slack

Create a Slack workspace

Create an incoming webhook in Slack API